How to analyze your CloudFront access logs

When I enabled logging for my client's CloudFront distribution, I thought it would be a piece of cake to find a tool that would download, parse and summarize the request logs in a nice report. I knew that wasn't the case three years ago, when I wrote this blog post about analyzing S3 access logs, but I assumed things had improved since then. Instead, I ran into issues with the tools I found.

For example, when I tried out s3stat I encountered the following error:

$ s3stat.py -c <ACCESS_KEY> <SECRET_KEY> <LOGS_BUCKET> <LOG_FILE_PREFIX>
DEBUG:__main__:Downloading of logs completed  
DEBUG:__main__:Creating report  
Unknown option `-p'.  

Most likely I did something wrong, but at any rate I didn't get it to work.

After a few attempts I decided to go for the same tool I used three years ago when analyzing S3 logs: request-log-analyzer.

Below I describe the steps I took to get a nice HTML report of my CloudFront requests.

Enable logging for the CloudFront distribution

The first step is really simple. Just edit the CloudFront distribution and locate the log settings:

[Image: CloudFront log settings]

You will need to enable logging and specify a log bucket. Optionally, you can also specify a prefix and enable logging of cookies. I strongly recommend using a dedicated bucket for this purpose, or at least a unique prefix for your logs.

It can take a few hours before the edge nodes start uploading the access logs to S3. You can find more information about the CloudFront logging feature here.

Download the logs

There are several ways of doing this and you most likely already have your favorite tool for accessing your S3 files. I used aws-cli like this:

mkdir prefix/  
aws s3 sync s3://my-log-bucket/prefix/ prefix/  
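
Before building the report, it can be worth checking that the sync actually fetched some requests. CloudFront delivers its logs as gzip-compressed files, and each file starts with header lines prefixed by `#`. The following is only a minimal sketch, assuming the logs were synced to prefix/ as above and that Ruby is already available (it is needed for the next step anyway). The helper name `count_requests` is made up for this illustration.

```ruby
require "zlib"

# Count the request lines in one gzipped CloudFront log file.
# Header lines (starting with "#") and blank lines are skipped.
# NOTE: count_requests is a hypothetical helper, not part of any tool.
def count_requests(path)
  Zlib::GzipReader.open(path) do |gz|
    gz.count { |line| !line.start_with?("#") && !line.strip.empty? }
  end
end

if __FILE__ == $PROGRAM_NAME
  # Assumes the directory layout from the "aws s3 sync" command above.
  files = Dir.glob("prefix/**/*.gz")
  total = files.sum { |path| count_requests(path) }
  puts "#{files.size} log files, #{total} requests"
end
```

If this reports zero files, the edge nodes have probably not delivered any logs yet, or the bucket or prefix in the sync command does not match the log settings.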

Create the report

As I mentioned before, I used request-log-analyzer to analyze the logs. This tool is written in Ruby, so you need a Ruby installation. If you don't already have one, I would recommend using RVM and installing it with the following commands:

gpg --keyserver hkp://keys.gnupg.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3  
\curl -sSL https://get.rvm.io | bash -s stable --ruby

When you have Ruby, you can install request-log-analyzer with gem install request-log-analyzer. Depending on your Ruby installation you might need root access (prefix the command with sudo).

The tricky part for me was that request-log-analyzer does not have support for the CloudFront access log format. I had to follow the instructions on creating your own file format definition. I've created a GitHub gist of the CloudFront format definition. To start analyzing your logs, first download the gist as cloud_front.rb (right-click the link and choose "Save link as...") and then run:

request-log-analyzer --output html --file report.html --format cloud_front.rb prefix/  

When the tool has finished, you can open report.html in your favorite browser. If you want to make some adjustments, much of cloud_front.rb is quite self-explanatory, and much more can be found in the docs.
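
When adjusting the format definition, it helps to know what the raw log lines look like. Each file has a `#Fields:` header that names the columns, and every request line holds the values separated by tabs. The sketch below is plain Ruby; it is neither the content of the gist nor the request-log-analyzer API. The helper `parse_cloudfront_log` is hypothetical, and the sample line is invented, with a `#Fields:` header shortened to its first nine columns.

```ruby
# Turn CloudFront log lines into hashes keyed by the names from the
# "#Fields:" header. Illustration only: parse_cloudfront_log is a
# hypothetical helper, not part of request-log-analyzer.
def parse_cloudfront_log(lines)
  fields = nil
  records = []
  lines.each do |raw|
    line = raw.chomp
    if line.start_with?("#Fields:")
      # Field names in the header are separated by spaces.
      fields = line.sub("#Fields:", "").split
    elsif line.start_with?("#") || line.empty?
      next # other header lines (e.g. "#Version: 1.0") and blank lines
    else
      raise "no #Fields: header seen before the first request" if fields.nil?
      # Values are tab-separated; -1 keeps trailing empty values.
      records << fields.zip(line.split("\t", -1)).to_h
    end
  end
  records
end

# Invented sample data; real logs contain more columns than shown here.
sample = [
  "#Version: 1.0",
  "#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status",
  %w[2014-05-23 01:13:11 FRA2 182 192.0.2.10 GET d111111abcdef8.cloudfront.net /index.html 200].join("\t"),
]

parse_cloudfront_log(sample).each do |r|
  puts "#{r['sc-status']} #{r['cs-method']} #{r['cs-uri-stem']}"
end
```

Seeing the columns by name like this makes it easier to decide which fields are worth reporting on when you edit cloud_front.rb.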

I hope you enjoy your CloudFront statistics!