Fast, cheap and automated
Deploying static websites to AWS

For the last year I relied on the Jekyll static site generator for my blog and hosted it on GitHub Pages. This was a super convenient experience; it’s probably the easiest way to kick off a self-hosted website. However, I sat down this weekend, migrated all the content to the Hugo static site generator and set up the hosting in the AWS cloud (Amazon Web Services).

Short background story: why Hugo? It is noticeably faster; the compile time with Hugo is nearly instantaneous. I also like that it is less opinionated and allows more flexibility in the project structure.

Anyway, the topic of this blog post is how to set up an automated and reliable deployment to AWS, regardless of whether the static files were generated by Hugo, Jekyll or any other tool.

The ups and downs of AWS

AWS is a powerful cloud provider. It is made up of numerous small services that are configured independently and can be combined with each other like a construction kit. Moreover, the hosting of static content on AWS is ridiculously cheap. (Considering just S3, my monthly bill is usually around a few cents.)

However, the advanced features of AWS are a downside at the same time: the initial hurdle to get started is high and there is plenty of room for error. Keep this in mind:

If you are new to AWS, you are well advised to take the time to understand everything you do properly, even when this is time-consuming and frustrating in the beginning.

Basic setup

For building and deploying our website, we use the following chain:

  1. GitHub is where all source files of the static website live.
  2. Travis CI builds and pushes the static website upon every change.
  3. AWS S3 holds the generated static files and serves them to the world.
  4. AWS CloudFront (optional) is the CDN service that speeds up load times even more. It also provides an SSL certificate for HTTPS connections.

So, let’s roll up the sleeves!

GitHub

I won’t go into detail about setting up a GitHub repo here. But let me point out once again the importance of not committing any credentials whatsoever to your repo [1]. Neither for AWS, nor for any other service that you use. If it happens after all, you must immediately revoke the affected credentials. (Just erasing them from the commit history is not sufficient!)

Travis CI

In our Travis account, we connect our GitHub repo. Every time we push something to the repo, Travis will run a build: it generates the static pages, builds the assets and deploys the public folder to S3. It executes every build in a clean environment, thus making sure that no artifacts from previous builds or other temporary files make their way into production. In order to tell Travis what to do, we must create a .travis.yml file in the project root:

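# Hugo is written in Go, so we use the Go build environment.
language: go

install:
  # Build the hugo binary from source.
  - go get -v github.com/spf13/hugo
  # Install the AWS CLI for the deploy step.
  - pip install --user awscli

script:
  # Generate the static site into public/.
  - hugo
  # Upload to S3 and delete remote files that no longer exist locally.
  - aws s3 sync public/ s3://YOUR_BUCKET_NAME/ --delete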

Our file consists of two blocks. (All commands get executed in the order they are specified.)

  1. install: Since the binaries for Hugo and the AWS CLI are not part of the Travis default environment, we must install them first.
  2. script: This is where the actual build happens. Note that we use the AWS CLI rather than the out-of-the-box Travis S3 deployment, because the latter doesn’t take care of deleting orphaned files, which is a major annoyance. Replace YOUR_BUCKET_NAME with the name of your bucket.

In order for the AWS CLI to work, we must provide three environment variables in the settings of our Travis project: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION. Make sure that their values are hidden in the build log! More on these keys later.
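If one of these variables is missing, the deploy step fails with a rather unspecific error. A minimal sketch of a fail-fast check that could run at the top of the script step is shown below; the function name check_aws_env is made up for this example and is not part of Travis or the AWS CLI.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if all three variables that the
# AWS CLI reads from the environment are set and non-empty.
check_aws_env() {
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
    # Look up the variable whose name is stored in $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      # Report only the name, never the value, so nothing secret reaches the log.
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
  return 0
}
```

Calling check_aws_env before the sync command stops the build early and names the variable that still needs to be configured.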

We can add additional steps to this configuration as we like. For instance, we could compile our SASS/LESS files by installing some CLI tool (e.g. npm install node-sass) and then invoking it in the script step.

AWS basic setup

IAM (identity and access management)

AWS has powerful mechanisms that allow us to set up fine-grained permissions. First, we go to IAM and create a new user for programmatic access. It’s fine to not assign a group to this user, in which case the user won’t have any default permissions unless we explicitly set them [2]. Next, we go over to Travis and set the two environment variables for access key and access secret. (See above.)

S3 Bucket

S3 is short for Simple Storage Service. It can be used to store all kinds of files. S3 has an HTTP-based interface, so all file operations are performed with regular HTTP requests. This is the reason why the content of S3 buckets can be exposed to the world wide web so easily.

We create a new bucket by giving it a (globally unique) name and choosing a region. Next, we enable static website hosting in the bucket properties, where we can also configure the index and error documents. Finally, we go over to Travis and enter the region identifier as an environment variable. (See above.)
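The same two steps can also be done from the command line. The following is only a sketch: create_site_bucket and the AWS_BIN override are names invented for this example, and index.html and 404.html are assumed document names that have to match what your generator produces.

```shell
#!/bin/sh
# Hypothetical helper. Usage: create_site_bucket BUCKET REGION
# AWS_BIN is an invented override so the commands can be previewed
# (e.g. AWS_BIN=echo) without touching a real account.
create_site_bucket() {
  bucket=$1
  region=$2
  aws_bin=${AWS_BIN:-aws}
  # Create the bucket in the chosen region.
  "$aws_bin" s3 mb "s3://$bucket" --region "$region" || return 1
  # Enable static website hosting with index and error documents.
  "$aws_bin" s3 website "s3://$bucket/" \
    --index-document index.html --error-document 404.html
}
```

Running it with AWS_BIN=echo prints the two commands instead of executing them, which is a cheap way to double-check the bucket name and region first.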

Policies

Unless we specify a policy, Travis cannot upload anything and nobody can view our content in a web browser. Policies can be attached to all kinds of AWS entities, so we can configure them directly in the properties of our S3 bucket. (“Properties” → “Permissions” → “Edit bucket policy”)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TravisCI",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::YOUR_AWS_ACCOUNT_ID:user/TRAVIS_IAM_USER_NAME"
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::BUCKET_NAME",
        "arn:aws:s3:::BUCKET_NAME/*"
      ]
    },
    {
      "Sid": "PublicWebsiteAccess",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::BUCKET_NAME/*"]
    }
  ]
}

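What’s happening here?

  1. The TravisCI statement allows the IAM user that we created for Travis to perform any S3 action on the bucket itself and on all objects in it.
  2. The PublicWebsiteAccess statement allows everyone to read (but not modify) the objects in the bucket, which is what makes the website visible in a web browser.

Of course, you must replace the uppercase parts with your specific information. And again: be sure to fully understand what’s happening here and how AWS policies work.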
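Granting s3:* to the deploy user is convenient, but it is broader than what the sync command needs. The sketch below prints a narrower variant of the policy. It assumes that listing the bucket plus putting and deleting objects is enough for aws s3 sync with --delete; please verify this against your own build before relying on it. The function name print_deploy_policy is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper. Usage: print_deploy_policy ACCOUNT_ID IAM_USER BUCKET
# Prints a bucket policy that limits the deploy user to the actions
# assumed to be needed by "aws s3 sync --delete".
print_deploy_policy() {
  account_id=$1
  user=$2
  bucket=$3
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TravisListBucket",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::${account_id}:user/${user}"},
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::${bucket}"]
    },
    {
      "Sid": "TravisWriteObjects",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::${account_id}:user/${user}"},
      "Action": ["s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::${bucket}/*"]
    },
    {
      "Sid": "PublicWebsiteAccess",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::${bucket}/*"]
    }
  ]
}
EOF
}
```

The output can be written to a file (for example policy.json) and then pasted into the bucket policy editor, or applied with aws s3api put-bucket-policy --bucket BUCKET_NAME --policy file://policy.json.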

DNS settings

Ensure that everything is set up and running by triggering a build. Once the build has succeeded, you can access your website content via the S3 endpoint, which looks something like this: BUCKET_NAME.s3-website-us-east-1.amazonaws.com. Of course, this isn’t a very nice URL, so you will want to configure a CNAME record for your own domain that points to this address. (Consult your DNS provider or domain seller on how to do that.)

Sidenote: It’s not recommended to set CNAME records for root level domains, although this is technically possible. You can set up a CNAME record for www.example.org, but you shouldn’t do so for example.org [3]. However, most DNS providers offer the option to redirect the root domain to a subdomain (like www.example.org).

Advanced setup (optional)

AWS CloudFront and HTTPS

CloudFront is the CDN service of AWS that can be put in front of S3. This means that our content isn’t delivered directly out of the bucket anymore; instead it is served by CDN servers (so-called edge locations) all around the globe.

The biggest benefit for a smaller website like this blog is not performance. (S3 is usually pretty quick already.) Instead, CloudFront gives us the ability to set up an SSL certificate, which would not be possible with S3 alone. Note that even though the CloudFront default certificates are free, CloudFront itself is a bit more expensive than S3 (depending on how you use it) [4].

In order to set up CloudFront, create a Distribution in the AWS Console. It’s up to you whether you want to use the caching mechanism or not. Caching can be annoying sometimes, because file changes take much longer to be rolled out. If you set a custom TTL of 0, CloudFront will check the origin for modifications on each request [5]. Don’t worry about the performance implications: they are most likely negligible.

One further tip for the configuration:

If you decide to use CloudFront, you might consider using s3cmd instead of the AWS CLI in the deploy step, because it provides the option to automatically trigger cache invalidation for uploaded files (with the --cf-invalidate option).
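If you would rather stay with the AWS CLI, an invalidation can also be requested as a separate command after the sync. The sketch below assumes a reasonably recent AWS CLI version and an IAM user that is allowed to create invalidations. The function name invalidate_cdn and the AWS_BIN override are invented for this example, and invalidating /* simply flushes everything rather than only the changed files.

```shell
#!/bin/sh
# Hypothetical helper. Usage: invalidate_cdn DISTRIBUTION_ID
# AWS_BIN is an invented override (e.g. AWS_BIN=echo) for previewing
# the command without contacting AWS.
invalidate_cdn() {
  distribution_id=$1
  # Ask CloudFront to drop all cached paths of this distribution.
  "${AWS_BIN:-aws}" cloudfront create-invalidation \
    --distribution-id "$distribution_id" --paths "/*"
}
```

In .travis.yml this would be one more line in the script block, placed after the aws s3 sync command.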

Provisioning tools

Nowadays, infrastructure can be set up in a modern DevOps fashion with tools like Terraform. This is not just cool, it also brings several benefits like predictability and reproducibility. On the other hand, it would add another layer of complexity and is probably overkill for a simple setup like ours. The infrastructure that is described here can be easily maintained via the AWS web interface (the AWS Management Console).


  1. Leaked credentials are usually abused to mine bitcoins in the compromised accounts.
  2. It’s a common good practice to only grant the exact necessary access rights, even though it is more complicated.
  3. Setting up a CNAME record for the root domain can break email delivery on this domain.
  4. You can estimate your AWS expenses with this handy calculator.
  5. See the AWS docs. This StackOverflow post also provides useful information.