Enable gzip for Jekyll blogs on Amazon S3

A little over a year ago I moved away from Wordpress and started using Jekyll for this blog. I also started using Amazon S3 as a cheap but very reliable hosting provider.

The past weeks I've looked into ways to speed up my blog even further without using a CDN. One technique that is used very frequently is enabling gzip compression. In this post I'll walk you through the steps I took to serve gzipped content from Amazon S3.

Why gzip?

Let's start with the why: "Why would you want to serve a gzipped copy of your website to users?"

Gzip is very effective in compressing HTML and CSS files since most text is repeated multiple times. Codes like <div> and <p> are used very frequently throughout HTML pages and a lot of whitespace can be compressed as well.

The first question I asked myself was: "How much smaller is gzip going to make my webpages?". I took an uncompressed version of my Hackintosh blog post and compared it to the compressed version. The uncompressed page was 10217 bytes long while the compressed version was just 4538 bytes long. That's a 44% reduction for a pretty small page!

One way road

Serving gzipped content from S3 is a one way road. It's all or nothing. Once enabled, all your visitors have to support gzip compression.

Regular web servers can be configured to only serve gzipped content to clients who support it. On S3 you don't have this option.

Luckily though, gzip support is included in all recent browsers. Dropping support for very old browsers (or bad proxies) isn't a big deal for me. All of my visitors are running pretty recent versions of browsers anyway and web crawlers like Googlebot can also handle gzip.

Deploying

When you want to deploy a gzipped website to S3 you have to compress every file before uploading it! I use a bash script to deploy my website to S3, so it would be nice to make everything automated. So let's modify my original script with these new requirements:

Beside enabling gzip I also had a "nice to have" feature:

Implementation

Here is the new script that I currently use to automatically deploy my website to S3:

##
# Configuration options
##
STAGING_BUCKET='s3://staging.savjee.be/'
LIVE_BUCKET='s3://www.savjee.be/'
SITE_DIR='_site/'

##
# Usage
##
usage() {
cat << _EOF_
Usage: ${0} [staging | live]
    
    staging		Deploy to the staging bucket
    live		Deploy to the live (www) bucket
_EOF_
}
 
##
# Color stuff
##
NORMAL=$(tput sgr0)
RED=$(tput setaf 1)
GREEN=$(tput setaf 2; tput bold)
YELLOW=$(tput setaf 3)

function red() {
    echo "$RED$*$NORMAL"
}

function green() {
    echo "$GREEN$*$NORMAL"
}

function yellow() {
    echo "$YELLOW$*$NORMAL"
}

##
# Actual script
##

# Expecting at least 1 parameter
if [[ "$#" -ne "1" ]]; then
    echo "Expected 1 argument, got $#" >&2
    usage
    exit 2
fi

if [[ "$1" = "live" ]]; then
	BUCKET=$LIVE_BUCKET
	green 'Deploying to live bucket'
else
	BUCKET=$STAGING_BUCKET
	green 'Deploying to staging bucket'
fi


red '--> Running Jekyll'
Jekyll build


red '--> Gzipping all html, css and js files'
find $SITE_DIR \( -iname '*.html' -o -iname '*.css' -o -iname '*.js' \) -exec gzip -9 -n {} \; -exec mv {}.gz {} \;


yellow '--> Uploading css files'
s3cmd sync --exclude '*.*' --include '*.css' --add-header='Content-Type: text/css' --add-header='Cache-Control: max-age=604800' --add-header='Content-Encoding: gzip' $SITE_DIR $BUCKET


yellow '--> Uploading js files'
s3cmd sync --exclude '*.*' --include '*.js' --add-header='Content-Type: application/javascript' --add-header='Cache-Control: max-age=604800' --add-header='Content-Encoding: gzip' $SITE_DIR $BUCKET

# Sync media files first (Cache: expire in 10weeks)
yellow '--> Uploading images (jpg, png, ico)'
s3cmd sync --exclude '*.*' --include '*.png' --include '*.jpg' --include '*.ico' --add-header='Expires: Sat, 20 Nov 2020 18:46:39 GMT' --add-header='Cache-Control: max-age=6048000' $SITE_DIR $BUCKET


# Sync html files (Cache: 2 hours)
yellow '--> Uploading html files'
s3cmd sync --exclude '*.*' --include '*.html' --add-header='Content-Type: text/html' --add-header='Cache-Control: max-age=7200, must-revalidate' --add-header='Content-Encoding: gzip' $SITE_DIR $BUCKET


# Sync everything else
yellow '--> Syncing everything else'
s3cmd sync --delete-removed $SITE_DIR $BUCKET

(The full script is available on GitHub)

With one command I can deploy my website to my staging environment (sh _deploy.sh staging) or to my production environment (sh _deploy.sh live). The script is pretty simple and I'm sure there's room for improvement.

Let's take a more detailed look at the script:

Configuration section

The configuration section is used to define a few variables that are used later on:

Color section

This section provides the rest of the script with 3 functions to echo text in color. It's something I found in a blog post of Fizer Khan on shell scripts.

It's not a real requirement but it makes the output look good:

Gzipping

Because we want to gzip content, we have to find each HTML, CSS and Javascript document in the SITE_DIR directory and compress it with gzip:

File size comparison

Does gzip really matter? Does it really compress webpages with a large factor? Well, yes it does:

Notice that the difference in size for small webpages isn't that great. But once pages become bigger, the advantage of gzip becomes clear.

Posted on

You May Also Enjoy

Subscribe to my newsletter

Monthly newsletter with cool stuff I found on the internet (related to science, technology, biology, and other nerdy things)! Check out past editions.