
Migrating from Jekyll to Hakyll

4 May, 2023

I’ve just finished migrating this site from Jekyll to Hakyll. The only noticeable changes are: slightly prettier page URLs, MathML support, and tweaks to syntax highlighting to compensate for using Pandoc. I paid special attention to preserving the Atom feed identifiers so that feed readers aren’t impacted.

You can find the source code at https://github.com/LightAndLight/lightandlight.github.io/.

Background

I’ve been using Jekyll to generate my blog because that’s what GitHub Pages recommended when I first set things up. Recently I’ve been working on a math-heavy post, and I decided that I wanted MathML support for this site. I started exploring possible solutions, and found texmath, which I then learned is used in Pandoc to convert TeX to MathML. I know Hakyll has good Pandoc support, and Haskell is one of my main languages, so I decided to make the switch.

Changes

Prettier URLs

Change: removed trailing slashes from many blog page URLs (e.g. from https://blog.ielliott.io/test-post/ to https://blog.ielliott.io/test-post).

Static site generators create HTML files, which typically have file paths ending in .html. Web servers often map URL paths to filesystem paths to serve files, leading to many URLs ending in .html. I don’t like this; the .html suffix gives me no useful information. Even when it does match the resource’s file type, that type is subject to change.

My Jekyll-based site had “extension-less URLs” (which I call “pretty URLs”), but I consistently used a trailing slash at the end of every URL (e.g. https://blog.ielliott.io/test-post/). These days I prefer to use a trailing slash to signify a “directory-like resource”, under which other resources are “nested”. This aligns with the convention where web servers serve /index.html when / is requested. My blog posts don’t need an index because they’re self-contained, so their URLs shouldn’t have a trailing slash.

GitHub Pages supports extensionless HTML pages, serving file x.html when x is requested, so I removed the trailing slash from each page’s canonical path and made Hakyll generate a file ending in .html at that path (site.hs#L127, site.hs#L322-L333).
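
The path mapping described above can be sketched as a small pure function. This is an illustration, not the code in site.hs: the name canonicalToFile is hypothetical, and it assumes canonical paths are written like /test-post or /test-post/.

```haskell
import Data.List (dropWhileEnd)

-- | Map a canonical URL path to the relative file path to generate.
-- Leading and trailing slashes are dropped and ".html" is appended.
-- The site root is assumed to map to "index.html".
canonicalToFile :: String -> FilePath
canonicalToFile url =
  case trimmed of
    ""   -> "index.html"
    path -> path ++ ".html"
  where
    -- Strip slashes from both ends, e.g. "/test-post/" becomes "test-post".
    trimmed = dropWhileEnd (== '/') (dropWhile (== '/') url)
```

In a Hakyll site, a function along these lines could be plugged into a route with customRoute, after deriving the URL path from the item’s identifier.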

By default, Hakyll’s watch command doesn’t support pretty URLs. For a little while, I manually added .html to the URL in the address bar whenever I clicked a link in my site’s preview. I got sick of this and changed the configuration to resemble GitHub Pages’ pretty URL resolution rules (site.hs#L30-L55). I realised how fortunate it was that I could make this change; the relevant Hakyll changes were only released a week ago!
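
To make the resolution idea concrete, here is a minimal sketch of how a preview server might pick a file for a requested path. The rules are assumptions that only approximate GitHub Pages’ behaviour, the names candidates and resolve are hypothetical, and the sketch deliberately avoids Hakyll’s preview configuration API, so it is not the code in site.hs.

```haskell
import Data.List (find, isSuffixOf)

-- | Files to try, in order, for a requested URL path.
-- Assumed rules: "/" serves index.html, "x/" serves x/index.html,
-- and "x" tries x, then x.html, then x/index.html.
candidates :: String -> [FilePath]
candidates url
  | null path             = ["index.html"]
  | "/" `isSuffixOf` path = [path ++ "index.html"]
  | otherwise             = [path, path ++ ".html", path ++ "/index.html"]
  where
    -- Work with a path relative to the site's output directory.
    path = dropWhile (== '/') url

-- | Pick the first candidate that exists. The existence check is passed
-- in as a pure predicate so the rule can be tried without a filesystem.
resolve :: (FilePath -> Bool) -> String -> Maybe FilePath
resolve exists = find exists . candidates
```

With a site containing index.html and test-post.html, resolve maps /test-post to test-post.html and / to index.html, and returns Nothing for a path with no matching file.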

MathML support

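Changes: equations are now compiled to MathML when the site is generated instead of being rendered client-side by MathJax, and browsers with limited MathML support get an opt-in fallback.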

I’m becoming more aware of sites that use unnecessary JavaScript. I realised that my blog’s use of MathJax for client-side equation rendering was an example of this. The equations on my blog are static; all the information required to render them is present when I generate the site, so I should be able to compile the equations once and serve them to readers. Client-side MathJax is better suited for fast feedback on dynamic equations, like when someone types an equation into a text box.

I played with compiling LaTeX equations to SVGs (latex2svg.py, latex2svg.m4), but realised it would be hard to make that accessible. I then came across MathML and realised that it was the right solution.
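
For readers who want to try build-time MathML, one way to do it with Hakyll is to ask Pandoc’s HTML writer to emit MathML. This is a minimal sketch under that assumption; the name mathCompiler is hypothetical and site.hs may configure Pandoc differently.

```haskell
import Hakyll
  ( Compiler
  , Item
  , defaultHakyllReaderOptions
  , defaultHakyllWriterOptions
  , pandocCompilerWith
  )
import Text.Pandoc.Options
  ( HTMLMathMethod (MathML)
  , WriterOptions (writerHTMLMathMethod)
  )

-- | A Pandoc compiler that renders TeX math as MathML at build time,
-- so pages need no client-side script to display equations.
mathCompiler :: Compiler (Item String)
mathCompiler =
  pandocCompilerWith defaultHakyllReaderOptions writerOptions
  where
    -- Start from Hakyll's default writer options and change only the
    -- math rendering method.
    writerOptions =
      defaultHakyllWriterOptions { writerHTMLMathMethod = MathML }
```

A compiler like this would be used in place of pandocCompiler in the rules for posts that contain math.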

MathML still isn’t ubiquitous, so I added a polyfill script based on https://github.com/fred-wang/mathml-warning.js. If you view a math post in a web browser with limited MathML support, you’ll be prompted to improve the experience by loading external resources:

[Screenshot: a warning message for browsers with poor MathML support. The warning says, "This page uses MathML, which your browser doesn't fully support." Below that, there are 3 options: "Load MathJax" (currently selected), "Apply mathml.css", and "Do nothing". Under those, there is a checkbox (unchecked) labelled "Remember choice for 30 days". At the bottom, there are two buttons: "Ignore" and "Save".]

Syntax highlighting

Change: slightly different syntax highlighting.

Pandoc does syntax highlighting differently to Jekyll, and I prefer Jekyll’s output. I’ll explain why in another post. The consequence is that I had to rewrite my syntax highlighting stylesheet, and code might look a little different due to the way Pandoc marks it up.

Thoughts

I had to reimplement a few things that Jekyll did for me, like the sitemap (site.hs#L196-L206) and previous/next post links (site.hs#L240-L260). I didn’t have to create the Atom feed from scratch, though: Hakyll has a module for that. The whole process was pretty involved (a few days of work) and I think I only had the appetite for it because I’m currently not working.
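
As an illustration of the previous/next links, the core of the problem is pairing each post with its neighbours in an ordered list. This is a minimal sketch of that step only; the name withNeighbours is hypothetical and it is not the code in site.hs.

```haskell
-- | Pair each item with its previous and next neighbours in list order.
-- The first item has no previous neighbour and the last has no next one.
withNeighbours :: [a] -> [(Maybe a, a, Maybe a)]
withNeighbours xs = zip3 prevs xs nexts
  where
    -- Shift the list right by one to line up each item's predecessor.
    prevs = Nothing : map Just xs
    -- Shift the list left by one to line up each item's successor.
    nexts = map Just (drop 1 xs) ++ [Nothing]
```

Given posts sorted by date, each triple carries what a post template needs to render its two links, with Nothing meaning the link is omitted.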

This is the most time I’ve spent working on a Hakyll site, and I think I’ve crossed an “inflection point” in my understanding of the library. I can now build features from scratch instead of searching for recipes on the internet. Normally I would approach a static site generator with some impatience, wanting to “get things done” so that I can return to what I find interesting. This time around, I decided to do a deep dive and I gained a lot of experience.

I’m glad I made the switch. While Pandoc has a few annoying issues, I’m not discouraged from fixing them like I would be if I found a problem with Jekyll. Since I’m proficient with Haskell, fixing these issues would just be a variation on normal software development for me.