Hosting a Static Blog on AWS with Terraform: S3, CloudFront, and the Gotchas In Between

This blog runs on Jekyll, but Jekyll is just the build tool. The actual hosting stack is AWS: an S3 bucket behind a CloudFront distribution, with ACM for TLS and Route53 for DNS. The whole thing is managed with Terraform. This post walks through the architecture, the Terraform config, and the non-obvious problems I ran into along the way.

The Architecture

The stack is five components wired together:

Browser
  └─ Route53 (A/AAAA alias → CloudFront)
       └─ CloudFront (HTTPS, custom domain, TLS via ACM)
            ├─ CloudFront Function (URI rewrite, viewer-request)
            └─ S3 Origin (private bucket, OAC)

S3 stores the static files built by jekyll build. The bucket is fully private – no public access, no static website hosting enabled.
CloudFront sits in front of S3, handles HTTPS termination, caches responses at the edge, and redirects HTTP requests to HTTPS.
Origin Access Control (OAC) allows CloudFront to fetch from S3 without making the bucket public. It signs requests with SigV4.
ACM provides the TLS certificate. It must be provisioned in us-east-1 regardless of where the rest of your infrastructure lives – CloudFront is a global service and only reads certificates from that region.
Route53 resolves the domain to CloudFront via alias records.

Provider Setup

Because ACM must be in us-east-1, I declare two providers:

provider "aws" {}

provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

The default provider inherits the region from the environment or AWS config (ap-southeast-2 in my case). The aliased provider is used only for the ACM certificate and its validation.

S3 Bucket

The bucket is private with all public access blocked:

resource "aws_s3_bucket" "blog" {
  bucket = var.domain_name
}

resource "aws_s3_bucket_public_access_block" "blog" {
  bucket = aws_s3_bucket.blog.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

The bucket policy grants read access to CloudFront only, scoped to the specific distribution ARN via a condition:

resource "aws_s3_bucket_policy" "blog" {
  bucket = aws_s3_bucket.blog.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "AllowCloudFrontOAC"
        Effect    = "Allow"
        Principal = { Service = "cloudfront.amazonaws.com" }
        Action    = "s3:GetObject"
        Resource  = "${aws_s3_bucket.blog.arn}/*"
        Condition = {
          StringEquals = {
            "AWS:SourceArn" = aws_cloudfront_distribution.blog.arn
          }
        }
      }
    ]
  })
}

The AWS:SourceArn condition is important. Without it, any CloudFront distribution in any account could use your bucket as an origin if they guessed the name.

Origin Access Control

OAC replaced the older Origin Access Identity (OAI) mechanism. It uses SigV4 signing rather than a synthetic IAM principal:

resource "aws_cloudfront_origin_access_control" "blog" {
  name                              = var.domain_name
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

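signing_behavior = "always" means every request from CloudFront to S3 is signed, replacing any Authorization header the viewer sent. The other accepted values are "never" and "no-override" (sign only when the viewer request carries no Authorization header of its own). "always" is the right choice here – the bucket is private and every origin request needs authorisation.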

ACM Certificate

The certificate resource lives in the us_east_1 provider:

resource "aws_acm_certificate" "blog" {
  provider          = aws.us_east_1
  domain_name       = var.domain_name
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

create_before_destroy makes Terraform create the replacement certificate before it destroys the old one. Without it, a replacement would start by trying to delete a certificate that CloudFront is still using, and ACM rejects deletion of a certificate that is in use.

DNS validation creates CNAME records in Route53. Terraform handles this with a for_each over the certificate’s domain_validation_options:

resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.blog.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  zone_id = data.aws_route53_zone.main.zone_id
  name    = each.value.name
  type    = each.value.type
  ttl     = 300
  records = [each.value.record]
}

resource "aws_acm_certificate_validation" "blog" {
  provider                = aws.us_east_1
  certificate_arn         = aws_acm_certificate.blog.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

The aws_acm_certificate_validation resource doesn’t create anything in AWS – it’s a Terraform mechanism that blocks until ACM confirms the certificate is issued. The CloudFront distribution references aws_acm_certificate_validation.blog.certificate_arn rather than aws_acm_certificate.blog.arn to ensure it only gets the ARN once validation is complete.
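
The snippets in this post reference var.domain_name and data.aws_route53_zone.main, but neither declaration is shown. A minimal sketch of what they might look like, assuming the hosted zone already exists in Route53 and that its name arrives through a hypothetical var.zone_name (that variable name is an assumption, not part of the original config):

```hcl
# Hypothetical declarations: the names domain_name and main match the
# references used elsewhere in this post, but the original definitions
# are not shown, so treat the details as assumptions.
variable "domain_name" {
  description = "Fully qualified domain the blog is served from"
  type        = string
}

# Assumed helper variable; not part of the original config.
variable "zone_name" {
  description = "Name of the existing public Route53 hosted zone"
  type        = string
}

# Looks up an existing hosted zone rather than creating one.
data "aws_route53_zone" "main" {
  name         = var.zone_name
  private_zone = false
}
```

Using a data source here keeps the zone itself outside this configuration, so destroying the blog stack cannot take the zone with it.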

CloudFront Distribution

The distribution wires everything together:

resource "aws_cloudfront_distribution" "blog" {
  enabled             = true
  default_root_object = "index.html"
  aliases             = [var.domain_name]

  origin {
    domain_name              = aws_s3_bucket.blog.bucket_regional_domain_name
    origin_id                = "s3-${var.domain_name}"
    origin_access_control_id = aws_cloudfront_origin_access_control.blog.id
  }

  default_cache_behavior {
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    target_origin_id       = "s3-${var.domain_name}"
    viewer_protocol_policy = "redirect-to-https"
    cache_policy_id        = data.aws_cloudfront_cache_policy.caching_optimized.id

    function_association {
      event_type   = "viewer-request"
      function_arn = aws_cloudfront_function.rewrite_uri.arn
    }
  }

  custom_error_response {
    error_code         = 403
    response_code      = 404
    response_page_path = "/404.html"
  }

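  custom_error_response {
    error_code         = 404
    response_code      = 404
    response_page_path = "/404.html"
  }

  # The provider requires a restrictions block even when no geo
  # restriction is wanted.
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }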

  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate_validation.blog.certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }
}

A few things worth noting here.

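bucket_regional_domain_name is used for the origin, not bucket_domain_name. For a bucket outside us-east-1, the global S3 endpoint can answer with a temporary redirect to the regional endpoint (typically while DNS for a newly created bucket is still propagating), and CloudFront passes that redirect on to the viewer. Pointing the origin at the regional endpoint avoids the problem.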

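The cache policy references the AWS-managed Managed-CachingOptimized policy, which is appropriate for static sites. It caches aggressively (the default TTL is 24 hours) and keeps cookies, query strings, and viewer headers out of the cache key, which also means they are not forwarded to S3.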
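
The distribution refers to data.aws_cloudfront_cache_policy.caching_optimized, which is not declared in the snippets above. A sketch of the lookup, assuming the data source is named to match that reference:

```hcl
# Resolves the ID of the AWS-managed cache policy by name, so the
# policy ID does not have to be hard-coded in the distribution.
data "aws_cloudfront_cache_policy" "caching_optimized" {
  name = "Managed-CachingOptimized"
}
```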

sni-only for ssl_support_method is the correct choice unless you need to support very old clients that don’t send SNI (effectively nothing built after 2010). The alternative (vip) costs around $600/month.

Two Gotchas I Hit

Subdirectory requests return 403

default_root_object = "index.html" only applies to the root path (/). A request to /about/ does not get rewritten to /about/index.html automatically – CloudFront forwards it to S3 literally, S3 has no object at that key, and it returns a 403.

The fix is a CloudFront Function on the viewer-request event that rewrites the URI before it reaches the origin:

resource "aws_cloudfront_function" "rewrite_uri" {
  name    = "rewrite-uri-index-html"
  runtime = "cloudfront-js-2.0"
  publish = true
  code    = <<-EOF
    function handler(event) {
      var request = event.request;
      var uri = request.uri;
      if (uri.endsWith('/')) {
        request.uri += 'index.html';
      } else if (!uri.includes('.')) {
        request.uri += '/index.html';
      }
      return request;
    }
  EOF
}

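This runs at the edge before the cache lookup, so the cache key is built from the rewritten URI and rewritten requests are cached correctly too. One caveat: the includes('.') check treats any URI containing a dot as a file, so an extensionless page whose path happens to contain a dot is not rewritten.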

S3 returns 403, not 404, for missing objects

When accessed via OAC, a private S3 bucket returns 403 Forbidden for keys that don’t exist – not 404 Not Found. S3 only answers 404 when the caller also has s3:ListBucket on the bucket. The policy above grants s3:GetObject alone, so S3 declines to confirm or deny whether the key exists.

This means a custom 404 page configured only for HTTP 404 errors won’t fire. You need both:

custom_error_response {
  error_code         = 403
  response_code      = 404
  response_page_path = "/404.html"
}

custom_error_response {
  error_code         = 404
  response_code      = 404
  response_page_path = "/404.html"
}

The first block catches the 403 from S3 and converts it to a 404 response for the browser. Without it, visitors hitting a broken link see a generic 403, which is confusing and leaks implementation detail.

Route53 DNS

Two alias records point the domain at CloudFront – one for IPv4 (A) and one for IPv6 (AAAA):

resource "aws_route53_record" "blog_a" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = var.domain_name
  type    = "A"

  alias {
    name                   = aws_cloudfront_distribution.blog.domain_name
    zone_id                = aws_cloudfront_distribution.blog.hosted_zone_id
    evaluate_target_health = false
  }
}
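
The AAAA record mirrors the A record with only the type changed. A minimal sketch, assuming the resource is called blog_aaaa (a placeholder name, not taken from the original config):

```hcl
# IPv6 counterpart of the A record; everything except the type is the same.
resource "aws_route53_record" "blog_aaaa" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = var.domain_name
  type    = "AAAA"

  alias {
    name                   = aws_cloudfront_distribution.blog.domain_name
    zone_id                = aws_cloudfront_distribution.blog.hosted_zone_id
    evaluate_target_health = false
  }
}
```

An AAAA alias only returns addresses when IPv6 is enabled on the distribution (is_ipv6_enabled = true). The distribution block shown earlier does not set that argument, so check it if IPv6 lookups come back empty.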

Alias records resolve at Route53 without a CNAME hop, which means they work at the zone apex (e.g. example.com rather than just www.example.com). Regular CNAME records cannot be set at the zone apex – this is a DNS spec constraint, not an AWS limitation.

Deployment

Once the Terraform is applied, deployment is three commands:

(cd blog.jeakyl.com && bundle exec jekyll build)
aws s3 sync blog.jeakyl.com/_site/ s3://blog.jeakyl.com --delete
aws cloudfront create-invalidation --distribution-id E1Z05CSWVMASDV --paths "/*"

The --delete flag on s3 sync removes objects from the bucket that no longer exist in the build output. Without it, deleted posts or renamed files persist in S3 indefinitely and remain accessible.

The CloudFront invalidation flushes the cache at all edge locations. Without it, CloudFront serves stale content until the TTL expires. With Managed-CachingOptimized the default TTL is 24 hours, and an object that carries a longer Cache-Control max-age can stay cached for up to a year.
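
The invalidation command hard-codes the distribution ID. One way to avoid that, sketched here with a hypothetical output name, is to expose the ID from Terraform and read it back at deploy time:

```hcl
# Hypothetical output; the name cloudfront_distribution_id is an assumption.
output "cloudfront_distribution_id" {
  description = "Distribution ID to pass to aws cloudfront create-invalidation"
  value       = aws_cloudfront_distribution.blog.id
}
```

The deploy step can then substitute the value, run from the directory that holds the Terraform state:

aws cloudfront create-invalidation --distribution-id "$(terraform output -raw cloudfront_distribution_id)" --paths "/*"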

Summary

The stack is deliberately simple: S3 for storage, CloudFront for edge delivery and HTTPS, ACM for certificates, Route53 for DNS. There are no servers to patch and nothing to scale. The two non-obvious parts are the URI rewrite function (needed because default_root_object only covers /) and the 403-to-404 custom error mapping (needed because S3 returns 403 for missing keys when accessed privately). Both are easy to miss and neither is well documented in the Terraform provider docs.