Pre-compressing files with Hakyll
Usually, HTTP responses are compressed in real time, on the web server. That incurs some overhead for each request, but is the only way to go when dealing with dynamic content.
We in the static generation camp have some choice, though. Both Nginx and Apache can be configured to serve pre-compressed files. Apart from eliminating the overhead, this approach lets us use levels of compression that aren’t feasible in a real-time setting, but are perfectly reasonable offline.
Faster and smaller responses. What’s not to like?
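On the Nginx side, for instance, this boils down to the gzip_static directive from ngx_http_gzip_static_module; a minimal sketch (the location block is just a placeholder for your own setup):

location / {
    # If the client accepts gzip and a .gz neighbour of the requested file
    # exists, serve the pre-compressed file instead of compressing on the fly.
    gzip_static on;
}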
Don’t obsess over compression levels, though. For files smaller than 100 KB, an extra ten percent of compression translates into mere hundreds of bytes saved. Not too impressive. (But do run your own benchmarks.)
Overall, this isn’t the most impactful optimization out there; still, it’s cheap and accessible, so why not implement it?
Let’s get our hands dirty, then. How shall we go about coding this? Well, we might start with recognizing that we want each input file to produce two outputs—one gzipped, the other not. That, in turn, immediately leads us to Hakyll’s tutorial on versioning. So if you have code along the lines of:
"posts/*" $ do
match $ setExtension "html"
route $ pandocCompiler
compile >>= loadAndApplyTemplate "templates/default.html" postCtx
…you should add a new block that will look like this:
"posts/*" $ version "gzipped" $ do
match $ setExtension "html.gz"
route $ pandocCompiler
compile >>= loadAndApplyTemplate "templates/default.html" postCtx
>>= gzip -- to be defined later
That will definitely work, but you probably aren’t happy about code duplication and the fact that Hakyll will now do the same work twice. Neither am I, so let’s press on!
Usually, duplication is eliminated with snapshots, but if we add saveSnapshot at the end of the first block and use loadSnapshotBody in the second, Hakyll will give us a runtime error due to a dependency cycle: the gzipped version of the item will depend on itself. Bummer!
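To make the failure concrete, here’s roughly what that attempt looks like (the snapshot name "rendered" is my own placeholder):

match "posts/*" $ do
    route $ setExtension "html"
    compile $ pandocCompiler
        >>= loadAndApplyTemplate "templates/default.html" postCtx
        >>= saveSnapshot "rendered"

match "posts/*" $ version "gzipped" $ do
    route $ setExtension "html.gz"
    compile $ do
        id <- getUnderlying
        body <- loadSnapshotBody id "rendered" -- id carries the "gzipped" version here...
        makeItem body >>= gzip                 -- ...so Hakyll reports a dependency cycle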
The thing is, versions are part of identifiers. That’s only logical: to distinguish X from another X, you label one with “gzipped”, and now it’s easy to tell them apart: one is just “X”, the other is “X (gzipped)”. In Hakyll, that means that when you’re running, say, loadSnapshotBody from inside a block wrapped in version "gzipped", you’ll be requesting a snapshot of an identifier that’s labeled “gzipped”. That’s what causes the dependency cycle.
Luckily for us, Hakyll exports functions for manipulating an identifier’s version. So our second code block will now look as follows:
"posts/*" $ version "gzipped" $ do
match $ setExtension "html.gz"
route $ do
compile id <- getUnderlying
<- loadBody (setVersion Nothing id)
body
makeItem body>>= gzip
As you can see, we’re obtaining the current identifier (which is versioned because of version "gzipped") and modifying it so that it references the unversioned item. Note that we must use makeItem there: had we tried to gzip an item returned by load, we’d get a runtime error, because the identifier of the item we’d be returning wouldn’t have the appropriate version.
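For contrast, here’s the load-based variant that the paragraph above warns against; it compiles, but blows up at runtime:

compile $ do
    id <- getUnderlying
    item <- load (setVersion Nothing id) -- an Item String carrying the unversioned identifier
    gzip item                            -- runtime error: the returned item has the wrong version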
One caveat with the loadBody approach is that it won’t work for files compiled with copyFileCompiler (because the latter doesn’t really copy the contents of the file into the cache, from which loadBody reads). For such files, we’ll have to use another approach:
"images/*.svg" $ version "gzipped" $ do
match $ setExtension "svg.gz"
route $ getResourceBody
compile >>= gzip
This code circumvents the problem by reading the file straight from the disk.
With versions sorted out, it’s time to turn our attention to implementing gzip. Luckily, this part is much simpler: Hakyll already provides a means for running external programs. All we have to do is convert the item’s body from String to a lazy ByteString (on the assumption that it’s UTF-8); the reason being that the output of the compressor is not a textual string and might not be representable with String:
gzip :: Item String -> Compiler (Item LBS.ByteString)
gzip = withItemBody
    (unixFilterLBS "gzip" ["--best"]
    . LBS.fromStrict
    . TE.encodeUtf8
    . T.pack)
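(The qualified names above assume the usual aliases:)

import qualified Data.ByteString.Lazy as LBS
import qualified Data.Text            as T
import qualified Data.Text.Encoding   as TE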
And that’s it. You can now add that code to your site’s config and experience the major drawback of this solution, namely the fact that it requires separate rules for different filename extensions. If you have a Markdown file compiled into HTML and a bunch of SVG files that are just copied over, you’ll have to write two rules. If you find a way to scrap that boilerplate, please let me know; my email is at the end of this post.
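For what it’s worth, the repetition inside each rule can at least be pulled into a small helper; the name and signature below are mine, not something Hakyll provides, it only covers the loadBody case, and you still call it once per pattern/extension pair:

gzippedCopy :: Pattern -> String -> Rules ()
gzippedCopy pat ext =
    match pat $ version "gzipped" $ do
        route $ setExtension ext
        compile $ do
            id <- getUnderlying
            body <- loadBody (setVersion Nothing id)
            makeItem (body :: String) >>= gzip

-- e.g. gzippedCopy "posts/*" "html.gz"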
Whoa, you got through that meandering mess of an article! That makes two of us. As a reward, I’m going to tell you how to use Zopfli to gzip your files. It’s the best DEFLATE compressor out there, and using it goes against my earlier advice of not obsessing over the byte count, but whatever; it’s fun.
So, Zopfli. The trouble with that compressor is that it’s not a Unix filter: it doesn’t accept data on standard input. In order to use it, we have to write the item’s body into a temporary file, compress that, then read the result back. Fortunately, Zopfli supports writing the result to stdout; that allows us to make do with the safer of the functions provided by Hakyll. (If that weren’t the case, we’d have to resort to unsafeCompiler.) So here’s the code:
gzip item = do
    (TmpFile tmpFile) <- newTmpFile "gzip.XXXXXXXX"
    withItemBody
        (unixFilter "tee" [tmpFile])
        item
    body <- unixFilterLBS
        "zopfli"
        [ "-c" -- write result to stdout
        , tmpFile]
        (LBS.empty) -- no need to feed anything on stdin
    makeItem body
Simple, right? If you’re using anything less than --i100, though, consider 7-zip: at its best (-mx9) it’s very close to default Zopfli, but 7z is wa-a-ay faster, and can behave as a filter (use -si -so).
P.S. Right before publishing this post, I was searching for some other Hakyll-related stuff and stumbled upon a three-year-old conversation on the mailing list that covers everything but the Zopfli bit. Search engines will kill blogging.
Your thoughts are welcome by email
(here’s why my blog doesn’t have a comments form)