In situations like these it seems easy to mistake the excitement of novelty for actual interest in an artistic process. I find it kind of scary to imagine accidentally convincing myself that creativity and motivation follow from having bought something new. Here’s an archetypal story of a situation I’m trying to avoid, based on my experiences with this subject across a few hobbies:
Painting is a large part of my identity. I think of myself as a painter, and the idea that I create paintings is almost existentially important to me. If I’m a painter and I don’t paint, then what even am I? There’s one problem: I haven’t painted anything in months. These days I never have any good ideas, and I’m rarely inspired or motivated to pick up a brush. One day I stumble across a video of someone painting with gouache. I feel something when I see that painting (inspiration, maybe?) and I’ve never used gouache before, so I head out to the art store to buy some. I arrive home excited to try these new paints. In the hours that follow I have fun fooling around, and the part of me that sees myself as a painter is satisfied: I’m doing the thing that I’m supposed to be doing. As days and weeks pass the excitement fades, and the pressures of everyday life eclipse my painting practice. One day I realise that I haven’t painted anything for a while. This realisation is as uncomfortable as ever: I am a painter, so if I haven’t been painting then what have I been doing with myself? I jump on Instagram to look for some inspiration, and the cycle repeats.
There’s a lot going on there: self-concept, motivation, shame, creativity, and conditioning all interacting. Right now I’m most interested in the way “getting new stuff” relates to motivation and creativity.
This topic came to mind recently after reading “When the Cymbals Come In” by Thorsten Ball where I was introduced to the term “Gear Acquisition Syndrome” (GAS). The blurb of “Gear Acquisition Syndrome: Consumption of Instruments and Technology in Popular Music” by Jan-Peter Herbst & Jonas Menze defines it very nicely:
Gear Acquisition Syndrome, also known as GAS, is commonly understood as the musicians’ unrelenting urge to buy and own instruments and equipment as an anticipated catalyst of creative energy and bringer of happiness.
Parts of GAS describe the relationship with my tools that I’ve been trying to avoid. It’s a relief to find a concise, searchable term that gets me into the same “informational neighbourhood”. I found two interesting articles while exploring the topic through Google:
This Guitar World article suggests that this essay from 1996 popularised the terms “Gear Acquisition Syndrome” and “GAS”.
“The Science of Gear Acquisition Syndrome” by Joshua Sariñana
A neuroscientific, psychological, and slightly philosophical take on GAS from a photography perspective. I most enjoyed their take on the relationship between creativity and anxiety. Stress, fear, and shame around creative projects lead to avoidance^{1}, and “gear acquisition” temporarily masks these emotions with a burst of excitement.
I have some heuristics to avoid GAS. I don’t always follow them (but I hope that writing them down will increase my accountability to the idea), and being heuristics, they necessarily won’t work in every situation. I’ll present them here as advice to my future selves.
Demonstrate a commitment to the activity before you think about buying anything to support it. I place a lot of weight on enjoying the process, and when trying something new it’s easy to confuse the excitement of novelty with enjoyment of the activity. New toys compound this. In the beginning it’s more important to make a habit out of the activity, or discover that you don’t actually like it.
Example: You’ve just started rock climbing with friends. Use the rental shoes at the climbing gym for a few months before buying your own.
One measure of commitment to an activity is whether you’re wearing out your gear and using up your materials. It’s important that this stays a measure, rather than a target, in the sense of Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.”
Example: You use your cheap stand mixer at least twice a week and it’s starting to fall apart. It’s time to think about getting a higher-quality model.
If you physically need gear to do the thing, start with cheap gear and keep research to a minimum. As a beginner you can’t perceive most of the differences between similar tools. Perceptual ability and taste only develop as your skills improve. It’s easy to get caught in “analysis paralysis”, comparing gear that would be indistinguishable to you in practice.
Example: You want to learn guitar. Use the crappy hand-me-down that your friend is giving away instead of buying your favourite player’s signature model.
As you become committed to an activity and your skills improve, you will start to notice real flaws and limitations in your current tools. These are no reason to stop doing the thing, and you can usually continue to improve without addressing them. At this point the experience you’ve gained will suggest specific properties you need from a different tool to overcome the limitations of your current one. Crappy gear or not, you are going to do the thing anyway. So when you finally get the better tool, you mostly feel relief from the benefits of using it, rather than excitement or “inspiration” to use it.
Example: Having played tennis weekly for the past 6 months with clunky hire racquets, you appreciate how light and responsive your new one feels while playing.
When I told my girlfriend about “Gear Acquisition Syndrome”, she taught me a Chinese phrase that has a similar meaning: “差生文具多”, which literally means “the poor student has lots of stationery”. It’s inspired by a hypothetical student who is not studying enough, so to get more motivation they go shopping for stationery (instead of studying). A student who does well, on the other hand, can study at any time with the simplest of materials. The phrase is an internet meme in China and we watched a few funny videos about gym-goers with too much fitness gear, home cooks with too many pots, and so on.
I’ve just finished migrating this site from Jekyll to Hakyll. The only noticeable changes are: slightly prettier page URLs, MathML support, and tweaks to syntax highlighting to compensate for using Pandoc. I paid special attention to preserving the Atom feed identifiers so that feed readers aren’t impacted.
You can find the source code at https://github.com/LightAndLight/lightandlight.github.io/.
I’ve been using Jekyll to generate my blog because that’s what GitHub Pages recommended when I first set things up. Recently I’ve been working on a math-heavy post, and I decided that I wanted MathML support for this site. I started exploring possible solutions and found `texmath`, which I then learned is used in Pandoc to convert TeX to MathML. I know Hakyll has good Pandoc support, and Haskell is one of my main languages, so I decided to make the switch^{1}.
Change: removed trailing slashes from many blog page URLs (e.g. from https://blog.ielliott.io/test-post/ to https://blog.ielliott.io/test-post).
Static site generators create HTML files, which typically have file paths ending in `.html`. Web servers often map URL paths to filesystem paths to serve files, leading to many URLs ending in `.html`. I don’t like this; `.html` offers me no useful information. And if it does coincide with the resource’s file type, that is subject to change.
My Jekyll-based site had “extension-less URLs” (which I call “pretty URLs”), but I consistently used a trailing slash at the end of every URL (e.g. https://blog.ielliott.io/test-post/). These days I prefer to use a trailing slash only to signify a “directory-like resource”, under which other resources are “nested”. This aligns with the convention where web servers serve `/index.html` when `/` is requested. My blog posts don’t need an index because they’re self-contained, so their URLs shouldn’t have a trailing slash.
GitHub Pages supports extensionless HTML pages, serving the file `x.html` when `x` is requested, so I removed the trailing slash from each page’s canonical path and made Hakyll generate a file ending in `.html` at that path (site.hs#L127, site.hs#L322-L333).
By default, Hakyll’s `watch` command doesn’t support pretty URLs. For a little while, I manually added `.html` to the URL in the address bar whenever I clicked a link in my site’s preview. I got sick of this and changed the configuration to resemble GitHub Pages’ pretty URL resolution rules (site.hs#L30-L55).
I realised how fortunate it was that I could make this change; the relevant Hakyll changes were only released a week ago!
Changes:
I’m becoming more aware of sites that use unnecessary JavaScript. I realised that my blog’s use of MathJax for client-side equation rendering was an example of this. The equations on my blog are static; all the information required to render them is present when I generate the site, so I should be able to compile the equations once and serve them to readers. Client-side MathJax is better suited for fast feedback on dynamic equations, like when someone types an equation into a text box.
I played with compiling LaTeX equations to SVGs (latex2svg.py, latex2svg.m4), but realised it would be hard to make that accessible. I then came across MathML and realised that it was the right solution.
MathML still isn’t ubiquitous, so I added a polyfill script based on https://github.com/fred-wang/mathml-warning.js. If you view a math post in a web browser with limited MathML support, you’ll be prompted to improve the experience by loading external resources:
Change: slightly different syntax highlighting.
Pandoc does syntax highlighting differently to Jekyll, and I prefer Jekyll’s output. I’ll explain why in another post. The consequence is that I had to rewrite my syntax highlighting stylesheet, and code might look a little different due to the way Pandoc marks it up.
I had to reimplement a few things that Jekyll did for me, like the sitemap (site.hs#L196-206) and previous/next post links (site.hs#L240-L260).
I didn’t have to create the Atom feed from scratch, though: Hakyll has a module for that. The whole process was pretty involved (a few days of work) and I think I only had the appetite for it because I’m currently not working.
This is the most time I’ve spent working on a Hakyll site, and I think I’ve crossed an “inflection point” in my understanding of the library. I can now build features from scratch instead of searching for recipes on the internet. Normally I would approach a static site generator with some impatience, wanting to “get things done” so that I can return to what I find interesting. This time around, I decided to do a deep dive and I gained a lot of experience.
I’m glad I made the switch. While Pandoc has a few annoying issues, I’m not discouraged from fixing them like I would be if I found a problem with Jekyll. Being proficient with Haskell, fixing these issues would just be a variation on normal software development for me.
Recommendation algorithms on social media don’t optimise for general human flourishing. They maximise metrics like engagement, views, clicks, or comments. To social media, all engagement is good, regardless of its impact on your life.
Before centralised social media, we had a very organic recommendation system. You read the blogs of people who you thought were interesting, and they linked to sites that they thought were interesting, and you would follow some of those links and find new content for yourself. In this organic system, we are the recommenders. We can “optimise” our recommendations for things that are highly personal and very difficult to measure, like curiosity, wonder, awe, learning, and insight.
I’d like to contribute to a more distributed, personalised, and organic world wide web, so I’ve created a resources page. I’ll continue to add various web resources that I find interesting, in the hope that others might use it to find something new.
ipso
Ongoing
https://github.com/LightAndLight/ipso
`ipso` is a scripting language that I started working on a bit over 2 years ago. My goal for this project is to have a scripting language that I actually enjoy using. So far I haven’t found a language that I find satisfactory for small administrative programs; Bash and Python have no types, and Haskell is a bit slow and old for interpreted use, for example. `ipso` is my attempt at an answer.
This year I set up a website (https://ipso.dev) and published my first few releases on GitHub.
Some of this year’s features and milestones that I’m proud of:
- `Debug` instances for extensible records and variants (reference docs)
- `ipso` in a CI script

The language itself is pretty stable now, so my focus will be on writing standard library functions.
ray-tracing-in-one-weekend
January
https://github.com/LightAndLight/ray-tracing-in-one-weekend
An implementation of Peter Shirley’s Ray Tracing in One Weekend with some extra features. It was super fun. It’s incredibly satisfying to go from a bunch of math to beautiful images.
The most striking thing I learned was Monte Carlo integration. It’s a way to compute integrals using random numbers. Ray tracing uses it to approximate the colour of a point on a surface. Every point on a surface has a specific, well-defined colour, and that colour can be the result of contributions from an extremely large number of incident rays. The point’s colour can be expressed as an integral, and we use Monte Carlo integration to compute the integral with a varying level of accuracy. For a preview render, we can use few samples and quickly produce a noisy image. For a full render we can use many samples, which will take longer, but will give a very accurate result.
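To make the preview-versus-full-render trade-off concrete, here’s a tiny standalone sketch (my own illustration, not code from the book or my ray tracer; the integrand and the RNG constants are invented for the example). It estimates ∫₀¹ x² dx = 1/3 by averaging the integrand at random points:

```rust
// Monte Carlo estimation of ∫₀¹ x² dx = 1/3.
fn main() {
    // A simple linear congruential generator producing floats in [0, 1).
    // (Illustrative constants; a real renderer would use a better RNG.)
    let mut state: u64 = 42;
    let mut rand01 = move || -> f64 {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (state >> 11) as f64 / (1u64 << 53) as f64
    };

    let mut estimate = |samples: u32| -> f64 {
        let mut sum = 0.0;
        for _ in 0..samples {
            let x = rand01();
            sum += x * x; // evaluate the integrand at a random point
        }
        sum / samples as f64 // the average approximates the integral
    };

    // Few samples: fast but noisy, like a preview render.
    let coarse = estimate(100);
    // Many samples: slower but accurate, like a full render.
    let fine = estimate(1_000_000);
    println!("coarse = {coarse}, fine = {fine}");
}
```

The only knob is the sample count: the estimate converges to the true value as samples increase, which is exactly the preview/full-render dial.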
sylva
January
https://github.com/LightAndLight/sylva
“Sylva” means “forest” in Latin (according to Google Translate). I was playing with some ideas about wikis / “document-based knowledge graphs”.
There were three things I wanted to combine:
This was just a sketch and I don’t plan to do anything with it.
editor-vue
March
https://github.com/LightAndLight/editor-vue
A while ago I built a toy structural code editor using Haskell (GHCJS) and the `reflex` FRP library. I wasn’t happy with the performance. I heard about vue.js and was curious what it would be like to use it instead of `reflex`. I rebuilt some of the code editor using vue.js with TypeScript, enough to get a sense of the coding style and performance of the app. I was impressed by the performance improvements, and found TypeScript tolerable (and much, much better than plain JavaScript).
nix-docs
March / April
https://github.com/LightAndLight/nix-docs
`nix-docs` is an ongoing experiment with reference documentation for some important Nix functions. Most Nix documentation is prose paragraphs, which is pretty bad for reference docs. Reference docs need to be skimmable, terse, and interlinked. Here’s the `nix-docs` page for `mkDerivation`: https://blog.ielliott.io/nix-docs/mkDerivation.html.
This year I updated the styling to match the new NixOS design and wrote a documentation generator for the content (my first iteration was hand-edited HTML that I copied from the Nixpkgs manual).
ccc
May
https://github.com/LightAndLight/ccc
`ccc` stands for cartesian closed category. I was inspired by this podcast with Conal Elliott, and revisited his compiling to categories and calculating compilers categorically papers. One important insight from “calculating compilers categorically” is that translating lambda expressions into CCC syntax sort of “sequentialises” them. The composition operation in a category implies an order of operations: `g ∘ f` is often read as `g` after `f`. It seems to me that CCC syntax is closer to our word-at-a-time-style imperative CPUs.
The first idea I explored in `ccc` was using CCC syntax as an intermediate representation for lambda calculus. This worked out really well; I learned that the lambda-to-CCC translation also performs closure conversion, which is another reason that CCC syntax is easier to compile to imperative code.
The second idea builds on the first. Once we have a program in CCC syntax, a compiler can be defined as a functor from CCC syntax to another cartesian closed category. I think Conal mentioned this in the podcast episode. I wrote a messy SSA compiler as a functor from CCC syntax arrows to “SSA builder arrows” (Haskell functions of type `SSA -> SSA`). It was pretty straightforward because CCC syntax is sequential and closure-converted.
The last idea was to apply these techniques to substructural lambda calculi (i.e. affine and linear lambda calculus). Linear lambda calculus has its own categorical syntax (closed symmetric monoidal category, or CSMC for short), so I wrote a program that translates lambda calculus to CSMC syntax and rejects lambda calculus terms that have non-linear variable usages. I then used the same program structure to translate lambda terms to semicartesian monoidal category syntax, which is just CSMC syntax with a terminal object. That translation allows unused variables while rejecting variable duplication, which makes it affine. The final translation adds a `dup : a -> a ⊗ a` arrow to the semicartesian monoidal category, which gets us back to a cartesian closed category (but with a slightly different syntax) and unrestricted lambda calculus.
This journey led to a style of checking lambda calculus that works for linear, affine, and unrestricted lambda calculus. I think it would be interesting to create a type checker that checks in this style. My intuition says such a type checker might be easier to parallelise.
I also noticed that the CCC syntax I settled on is explicit about parallel computations. While composition (`f ∘ g`) can be thought of as `f` after `g`, the tensor operator (`f ⊗ g`) can be thought of as `f` and `g` in parallel. There’s a sense in which this CCC syntax “reveals” parallelism that’s inherent in the lambda calculus. I’m curious what it would be like to write a multi-core parallel evaluator based on this.
march
June
https://github.com/LightAndLight/march
I wanted to check for broken local links in markdown documents, and create a “move” command that works like `mv` but also renames links. I finished the former but not the latter.
bidirectional-typechecking-with-unification
June
https://github.com/LightAndLight/bidirectional-typechecking-with-unification
This work was inspired by an article about the limitations of unification-based type checking. It seemed to claim that Hindley-Milner / unification-based type checking is very limited, and presented a dichotomy between bidirectional typing and unification that I don’t agree with.
I wrote a Hindley-Milner-based type checker for a language with subtyping by applying bidirectional principles. It has higher-rank polymorphism, existential types, optional record fields, and default record fields, which are all powered by the same subtyping mechanism. Unification and instantiation are also performed by the subtyping mechanism.
The key insight is to allow the subtyping check to transform terms. A type `A` is a subtype of `B` when values of type `A` can be used where values of type `B` are expected. This is often written as `A :> B`, and in code as something like `isSubtypeOf : Type -> Type -> Bool`. My type checker returns evidence that the subtyping relation holds, which could be written as `(a : A) :> B ~> b`, and as a function: `isSubtypeOf : (Expr, Type) -> Type -> Maybe Expr`. The bidirectional style ensures that “checking” types drives subtyping. This is all perfectly compatible with unification-based inference.
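As a rough sketch of what “returning evidence” can look like (the types and names below are invented for illustration; they aren’t the project’s actual definitions), here is a width-subtyping check on records that returns the coerced term instead of a bare boolean:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Type {
    Int,
    // A record type: field names and their types.
    Record(Vec<(String, Type)>),
}

#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Var(String),
    // Project a named field out of a record.
    Project(Box<Expr>, String),
    // Build a record from named expressions.
    Record(Vec<(String, Expr)>),
}

// Evidence-producing subtyping: if a value `e` of type `a` can be used where
// a `b` is expected, return the coerced term; otherwise return None.
fn is_subtype_of(e: &Expr, a: &Type, b: &Type) -> Option<Expr> {
    match (a, b) {
        (Type::Int, Type::Int) => Some(e.clone()),
        (Type::Record(fields_a), Type::Record(fields_b)) => {
            // Width subtyping: `a` may have extra fields. The evidence is a
            // new record literal containing only the expected fields.
            let mut fields = Vec::new();
            for (name, tb) in fields_b {
                let ta = fields_a.iter().find(|(n, _)| n == name).map(|(_, t)| t)?;
                let proj = Expr::Project(Box::new(e.clone()), name.clone());
                fields.push((name.clone(), is_subtype_of(&proj, ta, tb)?));
            }
            Some(Expr::Record(fields))
        }
        _ => None,
    }
}

fn main() {
    let big = Type::Record(vec![
        ("x".to_string(), Type::Int),
        ("y".to_string(), Type::Int),
    ]);
    let small = Type::Record(vec![("x".to_string(), Type::Int)]);
    let r = Expr::Var("r".to_string());
    // The evidence is `{x = r.x}`: a record rebuilt from the expected fields.
    println!("{:?}", is_subtype_of(&r, &big, &small));
}
```

Checking `r : {x: Int, y: Int}` against the expected type `{x: Int}` succeeds and returns the term `{x = r.x}` rather than just “true”; that returned term is the kind of transformation the subtyping check performs.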
This deserves a much clearer explanation in its own blog post. I think it’s a promising result for programming language design.
little
June / July
https://github.com/LightAndLight/little
`little` is my first attempt at a Knuth-style literate programming system. I want to write documents about code that are also the source of truth for the code. Systems like literate Haskell are unsatisfying to me because I have to present the code to the reader in the same order that the code appears in the source file. For example, all literate Haskell articles will begin with a preamble of imports (example article). I want to present code to the reader in a non-linear fashion, in a way that supports my explanation. I imagine that I’d often put import declarations in an appendix, for instance.
`little doc` generates a document that I can publish on the web, and `little code` generates the codebase that is described in the document. Another fun use case is “self-documenting shell scripts” (example). Rather than commenting a bash script, you can write a literate document that describes a bash script, and give the document a shebang line.
`little` uses XML for its markup, so that I can use whatever “presentation” markup I want (Markdown, LaTeX, HTML, etc.). I was surprised by how “not terrible” it felt to use XML for this. I have a strong bias against XML in general, and now that bias has gained some nuance. XML feels alright for markup, that is, for extra information in documents that are mostly text which people will consume by reading. That’s what it was designed for; it’s the eXtensible Markup Language. What I now object to is the use of XML as a data format.
This article has a good heuristic for distinguishing the two uses: if you remove all the tags from your XML document, will it still make sense to a reader? I’ve tried to apply this heuristic to the syntax of `little`.
The code is pretty crappy, so if I continued to work on this I’d rewrite it. I’m optimistic about what I created so far, though.
mininix
August
https://github.com/LightAndLight/mininix
`mininix` is an attempt at understanding how Nix-style build systems work by writing a small one. It includes a content-addressable store, a build database (using SQLite), a parallel build executor, and a typed build language.
I also wanted to improve on the naming of concepts (i.e. use a better word than “derivation”), and to keep typeability in mind from the start (Nix is very very untyped. Would types affect the build system’s design?).
One idea I’d like to explore here is a sort of “local” version of Nix. Instead of having a global store, have a per-project store for build artifacts, similar to `cabal`’s `dist[-newstyle]` directories and `cargo`’s `target` directory.
I’m also interested in whether we can have build systems that reuse existing package declarations. For example, if you want to use Nix to package a Haskell project, you need to convert your `.cabal` file to a Nix expression (or do import from derivation, which I fundamentally disagree with). What if there was a way to use the `.cabal` file without the grossness of import-from-derivation?
top-down-hindley-milner
September
https://github.com/LightAndLight/top-down-hindley-milner
This project shows a sort of “upside down” approach to Hindley-Milner type inference.
This work was inspired by some inaccurate type errors that `ipso` generated, and this algorithm is my solution.
Bidirectional type checking separates inference from checking, and this distinction is important in contrasting “normal” Hindley-Milner to the “top-down” approach. Roughly speaking, Hindley-Milner constructs types through inference in a bottom-up manner, and my algorithm refines types through checking from the top down.
In Hindley-Milner, all the work is done by inference and checking is the trivial case of inference followed by unification with an expected type. In the “top-down” style, checking does all the work, and inference is performed by checking against a fresh metavariable.
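A toy sketch of that division of labour (all names below are invented for illustration; this is not the project’s code). `check` refines the expected type as it walks the term, and `infer` is literally “check against a fresh metavariable, then read off its solution”:

```rust
#[derive(Clone, Debug, PartialEq)]
enum Type {
    Int,
    Bool,
    Meta(usize), // an unsolved type, to be refined by checking
}

enum Expr {
    IntLit(i64),
    BoolLit(bool),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

struct Checker {
    solutions: Vec<Option<Type>>, // solutions for metavariables
}

impl Checker {
    fn fresh(&mut self) -> Type {
        self.solutions.push(None);
        Type::Meta(self.solutions.len() - 1)
    }

    // Follow metavariable solutions to get the "current best" type.
    fn zonk(&self, t: &Type) -> Type {
        match t {
            Type::Meta(m) => match &self.solutions[*m] {
                Some(solved) => self.zonk(solved),
                None => t.clone(),
            },
            _ => t.clone(),
        }
    }

    fn unify(&mut self, a: &Type, b: &Type) -> Result<(), String> {
        let (a, b) = (self.zonk(a), self.zonk(b));
        match (&a, &b) {
            (Type::Meta(m1), Type::Meta(m2)) if m1 == m2 => Ok(()),
            (Type::Meta(m), _) => {
                self.solutions[*m] = Some(b.clone());
                Ok(())
            }
            (_, Type::Meta(m)) => {
                self.solutions[*m] = Some(a.clone());
                Ok(())
            }
            _ if a == b => Ok(()),
            _ => Err(format!("expected {:?}, got {:?}", a, b)),
        }
    }

    // Checking does all the work, refining `expected` as it goes.
    fn check(&mut self, e: &Expr, expected: &Type) -> Result<(), String> {
        match e {
            Expr::IntLit(_) => self.unify(expected, &Type::Int),
            Expr::BoolLit(_) => self.unify(expected, &Type::Bool),
            Expr::If(cond, then, els) => {
                self.check(cond, &Type::Bool)?;
                // Both branches receive the same expected type "from above".
                self.check(then, expected)?;
                self.check(els, expected)
            }
        }
    }

    // Inference is the trivial case: check against a fresh metavariable.
    fn infer(&mut self, e: &Expr) -> Result<Type, String> {
        let m = self.fresh();
        self.check(e, &m)?;
        Ok(self.zonk(&m))
    }
}

fn main() {
    let mut checker = Checker { solutions: Vec::new() };
    let e = Expr::If(
        Box::new(Expr::BoolLit(true)),
        Box::new(Expr::IntLit(1)),
        Box::new(Expr::IntLit(2)),
    );
    println!("{:?}", checker.infer(&e)); // prints Ok(Int)
}
```

Note how the expected type flows top-down into both branches of the `If`, so a mismatch is reported at the branch that disagrees rather than bubbling up from below.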
I want to combine this work with the subtyping work I mentioned earlier.
hover-pill
October
https://github.com/LightAndLight/hover-pill
`hover-pill` is a game I created to learn the Bevy game engine. You can try an early build here. It’s a 3D puzzle/platformer where you fly around as a capsule-shaped object (I’m not a 3D artist) trying to reach the green goal square.
I haven’t done any game development for years, so this project was very refreshing. Once I had all the mechanics working, I asked my girlfriend to test the levels I designed. Each time she completed a level, I created a slightly more difficult one. She enjoyed playing it, and I’m glad that in the end I created something fun.
Bevy uses `wgpu` for graphics, which combined with Rust’s awesome cross-compilation support meant it was pretty easy for me to develop on my desktop (with x86_64 and Vulkan), and then compile a WASM and WebGL version for the web. It was a pleasant surprise, coming from Haskell and GHCJS.
This was my first time using an entity-component-system framework, and I enjoyed it. Data-Oriented Design helped me understand the history behind the patterns. I think there are ideas here that apply outside of game development, but I don’t know what they are yet. One example (and I think it’s where I learned about the DoD book) is this explanation of a “data-oriented” performance improvement in the Zig compiler.
wgpu-mandelbrot
October
https://github.com/LightAndLight/wgpu-mandelbrot
After hover-pill I wanted to learn more about graphics APIs and GPU programming. I realised that computing the Mandelbrot set is an “embarrassingly parallel” problem, so it would be a good fit for GPU programming.
The Mandelbrot renderer runs in realtime. It has a satisfying “blooming” effect as the iteration count ticks up and more points are coloured. The Mandelbrot calculations are performed in a compute shader, and the colours are assigned using a histogram algorithm on the CPU. I couldn’t figure out how to do histogram colouring on the GPU.
To make sense of the WebGPU API, I created this diagram which displays all relevant (Web)GPU resources and their relationships:
I have a much better sense of GPU programming fundamentals, and I think the careful design of WebGPU helped.
It’s higher level than Vulkan, but more explicit than OpenGL. I’ve done a Vulkan tutorial and forgotten almost all of it. Having learned the fundamentals with `wgpu`, I think the Vulkan API would make a lot more sense to me now.
hedge
December
https://github.com/LightAndLight/hedge
`hedge` is a library that makes it easier for me to write web information systems with Haskell. I’ve been developing a sense of style and a set of patterns around writing Haskell web apps, in particular using servant and focusing on server-side rendered resources, and `hedge` is kind of my “kitchen sink” for things that support the style.
I might create a command-line program for setting up a new project, adding endpoints, and other forms of boilerplate I find.
I’m not sure if it will ever lead to something I could call a “framework”, like Rails. Right now I have the sense that it would be more like a pattern language with automated helpers.
Dream.In.Code has been thoroughly archived on web.archive.org, so I scraped the archives for my old posts. I was quite surprised by what I found.
I had forgotten how much of a beginner I was when I started posting there.
My initial posts were variations of “I wrote this code and it doesn’t work and I don’t know why. Help?” I could barely ask a coherent question. Later, my questions became more targeted, like “How do I update all the items in an array?” It was sobering to be reminded of a time when I didn’t know what a `for` loop was, and didn’t really know how to figure it out for myself.
What left an even stronger impression was the quality of answers I received. Every question I asked received patient, respectful responses. No one complained about my writing style (I was 13 at the time, and re-reading these posts caused some eye-rolls). No one berated me when I left out helpful debugging information like log files or compiler errors. No one made me feel bad for asking questions. I think this was the best possible start I could have asked for. I’m not sure where someone would go in 2022 for the same experience. Probably not Reddit or StackOverflow.
When I was learning to code, I had no one to turn to “in real life”. Posting a code snippet to a forum and asking, “pls halp” was all I could do. The members of Dream.In.Code turned that into a positive, constructive experience.
Thanks, Dream.In.Code
After much searching I was able to diagnose and fix the problem, which came down to graphics drivers and browser rendering settings. If your problem is similar to mine, here’s how you might be able to solve it:
Check GPU driver status in the browser.
In Firefox, navigate to `about:support` and Ctrl+F “Graphics”. Look at the `WebGL 1 Driver Renderer` and `WebGL 1 Driver Version` rows. If you don’t see your GPU manufacturer and model in these rows, then you need to install the correct drivers.
For reference, I’m running a Nvidia GeForce RTX 2070. Without drivers, my driver renderer was `VMware, Inc. -- llvmpipe (LLVM 9.0.1, 256 bits)` and my driver version was `3.1 Mesa 20.1.10`.
Install GPU drivers.
I’m on NixOS, so this was as simple as adding

```nix
nixpkgs.config.allowUnfree = true;
services.xserver.videoDrivers = [ "nvidia" ];
```

to my `configuration.nix`, then running `sudo nixos-rebuild switch && reboot`.
Confirm driver installation.
Repeat step 1. If you still can’t see your GPU manufacturer and model, then I can’t help you.
After step 2, my driver renderer was `NVIDIA Corporation -- GeForce RTX 2070/PCIe/SSE2` and my driver version was `4.6.0 NVIDIA 455.38`.
Enable WebRender.
In the Graphics section of `about:support`, check the `Compositing` row. If it says `WebRender`, then you’re done. If it says `Basic`, then you need to enable WebRender.
Navigate to `about:config`, move past the warning, and search for `gfx.webrender.enabled`. Set it to `true` and restart Firefox. Confirm the change by checking the `Compositing` row in `about:support`.
Before, each requestAnimationFrame call lasted 14-15ms in my Canvas implementation, but the app was running well below 60fps. Each frame lasted ~10ms in my WebGL implementation, but the framerate was even worse than the Canvas version!
After following these instructions, both my Canvas implementation and WebGL implementation run at 60fps. Canvas’ frame duration didn’t appear to change, but WebGL’s frame duration dropped to ~4ms. Yay!
De Bruijn indexed terms are functorial in their free variables. This means that given a datatype `Expr`, we can write a function `map_freevars : (Int -> Int) -> Expr -> Expr` such that `map_freevars id == id` and `map_freevars f ∘ map_freevars g == map_freevars (f ∘ g)`. In Haskell, I’d implement this as follows:
```haskell
data Expr
  = Var Int
  | App Expr Expr
  | Lam Expr

map_freevars :: (Int -> Int) -> Expr -> Expr
map_freevars f e =
  case e of
    Var n -> Var (f n)
    App a b -> App (map_freevars f a) (map_freevars f b)
    Lam b -> Lam (map_freevars (\n -> if n == 0 then 0 else 1 + f (n - 1)) b)
```
Now, here’s a direct translation from Haskell to Rust:
```rust
enum Expr {
    Var(usize),
    App(Box<Expr>, Box<Expr>),
    Lam(Box<Expr>),
}

fn map_freevars<F: Fn(usize) -> usize>(f: F, e: &Expr) -> Expr {
    match e {
        Expr::Var(n) => Expr::Var(f(*n)),
        Expr::App(a, b) => Expr::App(Box::new(map_freevars(f, a)), Box::new(map_freevars(f, b))),
        Expr::Lam(b) => Expr::Lam(Box::new(map_freevars(
            |n| if n == 0 { 0 } else { 1 + f(n - 1) },
            b,
        ))),
    }
}
```
This doesn’t typecheck because the call to `map_freevars(f, a)` takes ownership of `f`, which means `f` can no longer be used in the call to `map_freevars(f, b)`.
To avoid this, `map_freevars` should borrow the mapping function:
```rust
fn map_freevars<F: Fn(usize) -> usize>(f: &F, e: &Expr) -> Expr {
    match e {
        Expr::Var(n) => Expr::Var(f(*n)),
        Expr::App(a, b) => Expr::App(Box::new(map_freevars(f, a)), Box::new(map_freevars(f, b))),
        Expr::Lam(b) => Expr::Lam(Box::new(map_freevars(
            &|n| if n == 0 { 0 } else { 1 + f(n - 1) },
            b,
        ))),
    }
}
```
But this doesn’t compile either! The Rust compiler reports
that it reached the recursion limit while instantiating map_freevars::<[closure@...]>
.
Rust generates all its closures at compile time, and this code causes the compiler to
generate a countably infinite number of closures.
For every known closure that is passed to map_freevars
as f
, Rust generates another
closure for |n| if n == 0 { 0 } else { 1 + f(n - 1) }
. But |n| if n == 0 { 0 } else { 1 + f(n - 1) }
is also passed to map_freevars
, so another closure needs to be generated. And that closure is
also passed to map_freevars
, so another closure needs to be generated. And so on.
The next natural step is to use a trait object.
fn map_freevars(f: &dyn Fn(usize) -> usize, e: &Expr) -> Expr {
    match e {
        Expr::Var(n) => Expr::Var(f(*n)),
        Expr::App(a, b) => Expr::App(Box::new(map_freevars(f, a)), Box::new(map_freevars(f, b))),
        Expr::Lam(b) => Expr::Lam(Box::new(map_freevars(
            &|n| {
                if n == 0 {
                    0
                } else {
                    1 + f(n - 1)
                }
            },
            b,
        ))),
    }
}
A &dyn
reference is a pair of pointers; one pointer to a value of a type that implements the trait,
and another pointer to the implementation of the trait for that type^{1}.
This code is perfectly usable, and I’d guess it’s the ‘idiomatic’ Rust solution. But there’s one final step I’d like to take, mostly for educational purposes, and for a small efficiency gain.
For all intents and purposes, there are only two possible ‘origins’ for f
:
1. Passed to map_freevars unchanged, either from a top-level call or from a recursive call at an App node
2. A fresh closure created by map_freevars at a Lam node

This structure is described by the following datatype:
enum Origin<'a, F> {
    Unchanged(F),
    LamNode(&'a Origin<'a, F>),
}
The Origin
datatype can be interpreted as a function from usize
to usize
:
impl <'a, F: Fn(usize) -> usize> Origin<'a, F> {
fn apply(&self, n: usize) -> usize {
match self {
Origin::Unchanged(f) => f(n),
Origin::LamNode(f) => if n == 0 { 0 } else { 1 + f.apply(n-1) }
}
}
}
Challenge: implement Origin::apply
using constant stack space.
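For the curious, here's one possible answer to the challenge (a sketch, with spoilers; the Origin type is restated so the snippet is self-contained): walk the chain of LamNode constructors with a loop, counting how many binders have been passed.

```rust
// Sketch: a constant-stack Origin::apply. Instead of recursing through
// LamNode, we iterate, accumulating the "+1" for each binder passed.
enum Origin<'a, F> {
    Unchanged(F),
    LamNode(&'a Origin<'a, F>),
}

impl<'a, F: Fn(usize) -> usize> Origin<'a, F> {
    fn apply(&self, mut n: usize) -> usize {
        let mut acc = 0; // binders passed so far
        let mut cur = self;
        loop {
            match cur {
                Origin::Unchanged(f) => return acc + f(n),
                Origin::LamNode(inner) => {
                    if n == 0 {
                        // n refers to a variable bound by a binder we passed.
                        return acc;
                    }
                    acc += 1;
                    n -= 1;
                    cur = inner;
                }
            }
        }
    }
}

fn main() {
    let base = Origin::Unchanged(|n: usize| n + 10);
    let one = Origin::LamNode(&base);
    assert_eq!(base.apply(3), 13);
    assert_eq!(one.apply(0), 0); // bound variable is untouched
    assert_eq!(one.apply(3), 13); // 1 + f(2) = 1 + 12
}
```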
Now the Origin::LamNode
constructor replaces the fresh closure in the Lam
branch:
fn map_freevars<'a, F: Fn(usize) -> usize>(f: &'a Origin<'a, F>, e: &Expr) -> Expr {
    match e {
        Expr::Var(n) => Expr::Var(f.apply(*n)),
        Expr::App(a, b) => Expr::App(Box::new(map_freevars(f, a)), Box::new(map_freevars(f, b))),
        Expr::Lam(b) => Expr::Lam(Box::new(map_freevars(&Origin::LamNode(f), b))),
    }
}
This transformation is an example of defunctionalisation.
Here, the practical benefit is that &Origin
is half the size of a &dyn Fn(usize) -> usize
(a single pointer
instead of two), so recursing over a Lam
node uses less stack space.
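The size claim can be sanity-checked directly (a sketch assuming a 64-bit target; the Origin type is restated for self-containedness): a reference to Origin is a thin pointer, while a &dyn reference carries an extra vtable pointer.

```rust
use std::mem::size_of;

// Restated here so the snippet is self-contained.
enum Origin<'a, F> {
    Unchanged(F),
    LamNode(&'a Origin<'a, F>),
}

fn main() {
    type F = fn(usize) -> usize;
    // &Origin is one pointer; &dyn Fn is two (data pointer + vtable pointer).
    assert_eq!(
        size_of::<&'static Origin<'static, F>>() * 2,
        size_of::<&'static dyn Fn(usize) -> usize>()
    );
}
```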
The interface to map_freevars
can then be cleaned up using the worker/wrapper pattern:
fn map_freevars<F: Fn(usize) -> usize>(f: F, e: &Expr) -> Expr {
    fn go<'a, F: Fn(usize) -> usize>(f: &'a Origin<'a, F>, e: &Expr) -> Expr {
        match e {
            Expr::Var(n) => Expr::Var(f.apply(*n)),
            Expr::App(a, b) => Expr::App(Box::new(go(f, a)), Box::new(go(f, b))),
            Expr::Lam(b) => Expr::Lam(Box::new(go(&Origin::LamNode(f), b))),
        }
    }
    go(&Origin::Unchanged(f), e)
}
I haven’t benchmarked the defunctionalised approach and compared it against the trait object implementation. If anyone has suggestions for easily measuring the time and memory usage of Rust programs, preferably by function, then please let me know.
#!/usr/bin/env bash
# remove containers
docker ps --all --format "{{.ID}}" | xargs docker rm
# remove images
docker images --format "{{.ID}}" | xargs docker rmi -f
# remove volumes
docker volume prune
# remove build cache
docker builder prune
Memory-sensitive languages like C++ and Rust use compile-time information to calculate sizes of datatypes. These sizes are used to inform alignment, allocation, and calling conventions in ways that improve runtime performance. Modern languages in this setting support generic types, but so far these languages only allow parameterisation over types, not type constructors. In this article I describe how to enable parameterisation over arbitrary type constructors, while still retaining compile-time calculation of datatype sizes.
The code for this project can be found here.
Many typed languages support some form of generic (parameterised) datatypes. This ability to abstract
over types is known as ‘parametric polymorphism’ (polymorphism for short). In Rust, for example, one
can define a type of polymorphic pairs as struct Pair<A, B>(fst: A, snd: B)
. In this definition, A
and B
are type
variables (or type parameters), and can be substituted for other types:
Pair<bool, bool>
, Pair<bool, char>
, and Pair<String, int32>
are all valid pairs.
The name of a type, without any parameters, is known as a type constructor. Pair
is not a type on its own;
Pair<A, B>
(for some types A
and B
) is. The number of types required to ‘complete’ a type constructor is known
as its arity (so Pair
has arity 2). The arity of a type constructor must always be respected; it’s an error to
provide more or fewer type parameters than expected. For example, Pair<bool>
and
Pair<char, int32, String>
are invalid.
When using C++ or Rust, the compiler will calculate how many bytes of memory each datatype requires. Simple
types like int32
and bool
have a constant size; 4 bytes and 1 byte respectively. The size of datatypes
built out of other simple types is easy to calculate. The simplest way to calculate the size of a struct
is to sum the sizes of the fields, and the simplest way to calculate the size of an enum (or tagged union)
is to find the largest variant, and add 1 (for a tag byte). This is rarely the exact formula used by production
compilers, because they take alignment into account.
This article will assume the simple sizing formula, because the results can easily be adapted to more nuanced
formulae.
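The two naive rules can be written down as a quick sketch (the function names here are illustrative, not from any real compiler):

```rust
// Naive sizing rules: a struct is the sum of its field sizes; an enum is
// one tag byte plus the size of its largest variant.
fn struct_size(field_sizes: &[usize]) -> usize {
    field_sizes.iter().sum()
}

fn enum_size(variant_sizes: &[usize]) -> usize {
    1 + variant_sizes.iter().copied().max().unwrap_or(0)
}

fn main() {
    // struct TwoInts(x: int32, y: int32): 4 + 4 = 8 bytes.
    assert_eq!(struct_size(&[4, 4]), 8);
    // An enum with a bool variant and an int32 variant: 1 + max(1, 4) = 5.
    assert_eq!(enum_size(&[1, 4]), 5);
}
```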
The size of a datatype like struct TwoInts(x: int32, y: int32)
is known immediately at its definition. TwoInts
requires 8 bytes of memory. On the other hand, the size of a generic datatype is not always known at its definition.
What is the size of struct Pair<A, B>(fst: A, snd: B)
? It’s the size of A
plus the size of B
, for some
unknown A
and B
.
This difficulty is usually addressed by only generating code for datatypes and functions when all the generic
types have been replaced with concrete types. This process is known as monomorphisation. If the program contains a
Pair(true, true)
, then the compiler will generate
a new type struct PairBoolBool(fst: bool, snd: bool)
whose size is statically known. If Pair(true, true)
is passed to a function fn swap<A, B>(p: Pair<A, B>) -> Pair<B, A>
, then the compiler generates a new
function fn swapBoolBool(p: PairBoolBool) -> PairBoolBool
. Because this new function only uses types with known
sizes, the code for memory allocation and calling conventions can be generated correctly.
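Here's a sketch of what monomorphisation produces, following the article's PairBoolBool example (the concrete names are illustrative; a real compiler does this internally):

```rust
// The generic definitions the programmer writes:
struct Pair<A, B>(A, B);

fn swap<A, B>(p: Pair<A, B>) -> Pair<B, A> {
    Pair(p.1, p.0)
}

// Conceptually, using swap at Pair<bool, bool> makes the compiler emit a
// specialised copy whose size is statically known:
struct PairBoolBool(bool, bool);

fn swap_bool_bool(p: PairBoolBool) -> PairBoolBool {
    PairBoolBool(p.1, p.0)
}

fn main() {
    // Both versions behave identically on Pair<bool, bool>.
    let p = swap(Pair(true, false));
    assert!(!p.0 && p.1);
    let q = swap_bool_bool(PairBoolBool(true, false));
    assert!(!q.0 && q.1);
}
```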
There are also generic types that don’t depend on the size of their parameters. An example of
this is the pointer, commonly known in Rust as Box<A>
. A pointer has the same size (often 4 or 8 bytes depending
on your CPU) regardless of what it points to. But in order to allocate a new pointer, the size of the item must
be known.
For each generic datatype or function, the compiler keeps track of which type variables are important for sizing calculations. The specifics of this are discussed in the Type Classes section.
A consequence of all this is that in these languages, type variables can only stand for types. But there are good reasons to have type variables that stand for type constructors, too:
struct One<A>(A)
impl <A> One<A> {
    fn map<B, F: Fn(A) -> B>(self, f: F) -> One<B> { ... }
}
struct Two<A>(A, A)
impl <A> Two<A> {
    fn map<B, F: Fn(A) -> B>(self, f: F) -> Two<B> { ... }
}
struct Three<A>(A, A, A)
impl <A> Three<A> {
    fn map<B, F: Fn(A) -> B>(self, f: F) -> Three<B> { ... }
}
Here are some 1-arity container types. The only difference between these datatypes is the number of elements
they contain. They all support a map
operation, which applies a function to all the datatype’s elements. Functions
that use map
need to be implemented once for each type, even when their implementations are identical:
fn incrOne(x: One<int32>) -> One<int32> { x.map(|n| n + 1) }
fn incrTwo(x: Two<int32>) -> Two<int32> { x.map(|n| n + 1) }
fn incrThree(x: Three<int32>) -> Three<int32> { x.map(|n| n + 1) }
To remedy this, there must first be a way to abstract over the type constructors, so that the code can be written once and for all:
fn incr<F>(x: F<int32>) -> F<int32> { x.map(|n| n + 1) } // when F<A> has map, for all types A
Then, there must be some way to rule out invalid types. For example, replacing F
with bool
in F<int32>
is invalid, because bool<int32>
is not a type. This is the job of kinds^{1}.
Kinds describe the ‘shape’ of types (and type constructors) in the same way that types describe the ‘shape’ of values. A type’s kind determines whether or not it takes any parameters. Here’s the syntax of kinds:
kind ::=
Type
kind -> kind
Types that take no arguments (like bool
, char
, and String
) have kind Type
. Types that take one argument,
like One
, have kind Type -> Type
. In the code for incr
above, F
implicitly has kind Type -> Type
. Types
that take more than one argument are represented in curried form. This
means that Two
has kind Type -> Type -> Type
, not (Type, Type) -> Type
. Three
has kind Type -> Type -> Type -> Type
,
and so on.
Curried type constructors are standard in this setting, but not necessary. The results in this article could also be applied to a setting with uncurried type constructors, at cost to expressiveness or implementation complexity.
Kinds put types and type constructors on equal footing. For the remainder of the article, both concepts will be
referred to as types. The kind becomes the distinguishing feature. For example, “type constructor of arity 2” would
be replaced by “type of kind Type -> Type -> Type
”.
Some final jargon: types with a kind other than Type
are known as ‘higher-kinded types’, and parameterising
over higher-kinded types is known as ‘higher-kinded polymorphism’.
Rust uses traits to coordinate sizing calculations. Each
datatype implicitly receives an implementation of the Sized
trait, and every type variable that is relevant for
a sizing calculation is given a Sized
bound. This means that trait resolution, an already useful feature, can
be re-used to perform size calculations.
Closely related to traits is the functional programming concept of type classes^{1}. There are differences between the two, but those differences don’t impact the results of this article. Type classes will prove a more convenient language in which to discuss these ideas.
A type class (or trait) can be considered a predicate on types. A type class constraint (or trait bound) is an assertion that the predicate must be true. For each constraint that is satisfied, there is corresponding ‘evidence’ that the predicate is true.
When a type T
has a Sized
constraint, it is being asserted that the statement “T
has a known size” is true. For
brevity, this will be written as Sized T
. When this statement satisfied (for instance, when T
is int32
), the
evidence produced is the actual size of T
(when Sized int32
(when Sized int32
is satisfied, the evidence
is the number 4
- the size of int32
).
Generic types like Two<A>
have a size that depends on their type parameter. In terms of constraints, it can
be said that Sized A
implies Sized Two<A>
. If A
is int32
, then its size is 4
, which implies that
Two<int32>
has a size of 4 + 4 = 8
. Similarly, of Pair
it can be said that Sized A
implies [ Sized B
implies
Sized Pair<A, B>
]. There is a choice between a curried and an uncurried version; it could also be said that
[ Sized A
and Sized B
] implies Sized Pair<A, B>
, but the curried version will be used for convenience.
Note that type constructors don’t have a size. In other words, only types of kind Type
have a size. A type constructor
such as Two
(of kind Type -> Type
) has a size function. Given the sizes of the type constructor’s parameters,
a size function computes the size of the resulting datatype. Two
’s size function is \a -> a + a
. Pair
’s size
function is \a -> \b -> a + b
(it could also be \(a, b) -> a + b
in an uncurried setting).
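These size functions are ordinary value-level functions, and can be sketched directly (using the article's naive sizes of 1 byte for bool and 4 for int32):

```rust
// Two's size function: \a -> a + a
fn two_size(a: usize) -> usize {
    a + a
}

// Pair's size function, in the uncurried form: \(a, b) -> a + b
fn pair_size(a: usize, b: usize) -> usize {
    a + b
}

fn main() {
    assert_eq!(two_size(4), 8); // Two<int32> is 8 bytes
    assert_eq!(pair_size(1, 4), 5); // Pair<bool, int32> is 5 bytes
}
```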
With the background out of the way, the specific problem can be stated:
When a type of kind Type
is relevant for a size calculation, it is given a Sized
constraint, which will be
satisfied with a concrete size as evidence. What is the equivalent notion of constraint and evidence for
higher-kinded types that contribute to size calculations?
An elegant solution to this problem can be found by introducing quantified class constraints^{2}. Quantified constraints are an extension to type classes that add implication and quantification to the language of constraints, and corresponding notions of evidence.
Here’s new syntax of quantified size constraints:
constraint ::=
Sized type (size constraint)
constraint => constraint (implication constraint)
forall A. constraint (quantification constraint)
The evidence for a constraint c1 => c2
is a function that takes evidence for c1
and produces evidence for c2
, and the
evidence for forall A. c
is just the evidence for c
. The evidence for quantification constraints is a bit more nuanced
in general, but this description is accurate when only considering size constraints.
Concretely, this means that the sizing rules for higher-kinded types can now be expressed using constraints, and size
calculations involving higher-kinded types can be performed using type class resolution. It is now the
case that forall A. Sized A => Sized Two<A>
, and the evidence for this constraint is the function \a -> a + a
.
The relevant constraint for Pair
is forall A. forall B. Sized A => Sized B => Sized Pair<A, B>
with evidence function
\a b -> a + b
.
This extends to types of any kind. For all types, there is a mechanical way to derive an appropriate size constraint based
only on the type’s kind:
T
of kind Type
leads to Sized T
, U
of kind Type -> Type
leads to forall A. Sized A => Sized U<A>
, and so on. In
datatypes and functions, any size-relevant type variables can be assigned a size constraint in this way, and the compiler
will use this extra information when monomorphising definitions.
sized-hkts is a minimal compiler that implements these ideas. It supports higher-kinded polymorphism, functions and algebraic datatypes, and compiles to C. Kinds and size constraints are inferred, requiring no annotations from the user.
Here’s some example code that illustrates the higher-kinded data pattern (source, generated C code):
enum ListF f a { Nil(), Cons(f a, ptr (ListF f a)) }
enum Maybe a { Nothing(), Just(a) }
struct Identity a = Identity(a)
fn validate<a>(xs: ListF Maybe a) -> Maybe (ListF Identity a) {
match xs {
Nil() => Just(Nil()),
Cons(mx, rest) => match mx {
Nothing() => Nothing(),
Just(x) => match validate(*rest) {
Nothing() => Nothing(),
Just(nextRest) => Just(Cons(Identity(x), new[nextRest]))
}
}
}
}
fn main() -> int32 {
let
a = Nil();
b = Cons(Nothing(), new[a]);
c = Cons(Just(1), new[b])
in
match validate(c) {
Nothing() => 11,
Just(xs) => match xs {
Nil() => 22,
Cons(x, rest) => x.0
}
}
}
This code defines a linked list whose elements are wrapped in a generic ‘container’ type. It defines two possible
container types: Maybe
, which is a possibly-empty container, and Identity
, the single-element container.
validate
takes a list whose elements are wrapped in Maybe
and tries to replace all the Just
s with Identity
s.
If any of the elements of the list are Nothing
, then the whole function returns Nothing
.
Points of interest in the generated code include:
- Monomorphised datatypes are generated for ListF Maybe int32, ListF Identity int32, Maybe int32, Identity int32, and Maybe (ListF Identity int32).
- Only one version of validate is generated, because it is only used at one instantiation of a.
- There is no sizeof; the datatype sizes are known after typechecking and inlined during code generation. The compiler knows that ListF Maybe int32 is naively 14 bytes wide (1 + max(1, 1 + 4) + 8), whereas ListF Identity int32 is 13 bytes wide (max(1, 1 + 4) + 8).
- These sizes differ from what a C compiler's sizeof would report, because they ignore alignment for simplicity. At this point, factoring alignment into the size calculations is straightforward.

Quantified class constraints provide an elegant framework for statically-sized higher-kinded types. On its own, this can raise the abstraction ceiling for high-performance languages, but it also serves as the groundwork for ‘zero-cost’ versions of functional programming abstractions such as Functor, Applicative, and Traversable.
This work shows it’s definitely possible for Rust to support higher-kinded types in a reasonable manner, but there are some less theoretical reasons why that might not be a good idea in practice. Adding ‘quantified trait bounds’ would require new syntax, and represents an additional concept for users to learn. Adding a kind system to Rust would also be a controversial change; choosing to keep types uncurried would disadvantage prospective users of the system, and changing to curried types would require rethinking of syntax and educational materials to maintain Rust’s high standard of user experience.
Jones, M. P. (1995). A system of constructor classes: overloading and implicit higher-order polymorphism. Journal of Functional Programming, 5(1), 1-35. ↩︎^{1} ↩︎^{2}
Bottu, G. J., Karachalias, G., Schrijvers, T., Oliveira, B. C. D. S., & Wadler, P. (2017). Quantified class constraints. ACM SIGPLAN Notices, 52(10), 148-161. ↩︎
When you define a datatype, you list the ways to construct values of that type. For example, this definition:
data Bool : Type where {
True;
False
}
says there are two ways to construct a Bool
: True
and False
.
Similarly, this definition:
data These (a : Type) (b : Type) : Type where {
This[a];
That[b];
These[a, b]
}
gives three ways to construct a These a b
(for any values of a
and b
). This[0]
has type
These Int x
, for any x
. That[True]
has type These x Bool
for any x
. These[0, True]
has
type These Int Bool
.
I want to note that constructors aren’t functions; they have a fixed number of arguments and must be fully applied.
Datatypes can also be defined recursively:
data List (a : Type) : Type where {
Nil;
Cons[a, List a]
}
The way you construct a value of a datatype is unique to that datatype; there are a finite number of constructors, and each represents a different way to build a value of that type. In contrast, there is a universal way to destruct values: pattern matching.
If some expression x
has type Bool
then we can destruct x
using pattern matching:
case x of {
True -> ...;
False -> ...
}
A pattern match acknowledges all the ways that a value could have been constructed, and provides
a branch for each possible case. When constructors carry around other values
(like those of These
or List
), pattern matching is used to write programs that extract and
process the inner values:
case y of {
This[a] -> f a;
That[b] -> g b;
These[c, d] -> h c d
}
When a program is running, the value that is being matched will eventually reduce to a constructor form:
case These[0, True] of {
This[a] -> f a;
That[b] -> g b;
These[c, d] -> h c d
}
at which point, the appropriate branch is selected and the contents of the constructor are substituted
to the right of the ->
. The above code will pick the These
branch, substituting 0
for c
and True
for d
, so that the final result is h 0 True
.
Pattern matching is enough to process non-recursive datatypes, but recursive datatypes require recursive function definitions:
sum : List Int -> Int
sum n =
case n of {
Nil -> 0;
Cons[x, xs] -> x + sum xs
}
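For comparison, the same datatype and recursive definition can be transcribed into Rust (a sketch; it is not part of the toy language used in this post):

```rust
// List a, specialised to integers.
enum List {
    Nil,
    Cons(i64, Box<List>),
}

// sum pattern-matches on the two constructors, recursing in the Cons case.
fn sum(xs: &List) -> i64 {
    match xs {
        List::Nil => 0,
        List::Cons(x, rest) => x + sum(rest),
    }
}

fn main() {
    let xs = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    assert_eq!(sum(&xs), 3);
}
```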
Hopefully this is all familiar to you. I’ve covered all this so that it contrasts with codatatypes.
Codatatypes are the dual to datatypes. Formally, this means a lot of things that I don’t yet understand. What follows is how this duality arises in practice.
To begin, I’d like to share some hand-wavy intuition for the concepts I’m discussing.
Datatypes are. They’re finite, fully-evaluated structures. They’re inert; they just exist and won’t ever “do anything”. Haskell doesn’t have true ‘datatypes’ in this sense because its constructors don’t force their arguments to be evaluated, which means you can hide computations inside them. Haskell lets you partially apply constructors, which further diverges from what I’ve laid out here.
Codatatypes do. They have ‘potential energy’; they have the capacity to do more work when prodded. Haskell’s ‘datatypes’ are more codata-like in this respect because they can contain suspended computations.
Since datatypes are defined by their constructors, codatatypes will be defined by their destructors.
This definition:
codata Pair (a : Type) (b : Type) : Type where {
fst : a;
snd : b
}
says that there are two ways to destruct a Pair a b
(for any a
and b
). If some expression x
has
type Pair a b
, then x.fst
has type and a
, and x.snd
has type a b
.
Pair
really is a pair; it has just been defined by the ways you can pull things out of it: you can either
extract the first thing, or you can extract the second.
I also want to note that destructors aren’t functions, either. You can’t partially apply a destructor, and they’re not first-class.
Codatatypes can also be recursive:
codata Stream (a : Type) : Type where {
head : a;
tail : Stream a
}
A stream is like an infinite list; every stream value contains a head and a tail, and no matter how many times you extract the tail, there will always be another stream waiting for you.
There is a universal way to destruct datatypes, and there is a universal way to construct codatatypes.
For lack of a better term, you can call it ‘copattern matching’. Here’s how you would construct a
Pair Int Bool
:
cocase Pair Int Bool of {
fst -> 0;
snd -> True
}
A copattern match acknowledges every way it could be destructed, and provides a branch for each case.
Remember, copattern matching constructs values. The above code is a value that produces 0
when
destructed using fst
, and True
when destructed using snd
. It is defining a pair of 0
with True
.
When a program is running, a value that is being destructed will eventually reduce to a copattern match form.
So x.fst
might reduce to (cocase Pair Int Bool of { fst -> 0; snd -> True }).fst
. At this point,
the appropriate branch in the copattern match will be chosen, and the right hand side of the ->
will be
selected. In this case, (cocase Pair Int Bool of { fst -> 0; snd -> True }).fst
reduces to 0
.
Recursive codatatypes like Stream
need to be constructed by recursive definitions:
countFrom : Int -> Stream Int
countFrom n =
cocase Stream Int of {
head -> n;
tail -> countFrom (n+1)
}
countFrom 0
produces an infinite stream of integers starting at 0
. However, it doesn’t spin forever,
trying to construct the entire stream in one go. This is because a lone copattern match won’t reduce; reduction
only continues after a destructor has been applied and the correct branch has been selected. Because of
this, codatatypes can represent infinite values that are only generated on demand.
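One way to see the on-demand behaviour concretely is to transcribe Stream into Rust, suspending the tail inside a closure (a sketch that models the laziness explicitly, since Rust is strict):

```rust
// A Stream of integers: the head is a plain value, while the tail is a
// suspended computation that only builds the next Stream when forced.
struct Stream {
    head: i64,
    tail: Box<dyn Fn() -> Stream>,
}

fn count_from(n: i64) -> Stream {
    Stream {
        head: n,
        // The recursive call sits inside a closure, so count_from returns
        // immediately instead of looping forever.
        tail: Box::new(move || count_from(n + 1)),
    }
}

fn main() {
    let s = count_from(0);
    assert_eq!(s.head, 0);
    let t = (s.tail)(); // force one step of the infinite stream
    assert_eq!(t.head, 1);
    assert_eq!(((t.tail)()).head, 2);
}
```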
Datatype constructors can carry around values, and so can codatatype destructors. Here’s what that looks like:
codata Lambda (a : Type) (b : Type) : Type where {
apply[a] : b
}
There is one way to destruct a value of type Lambda a b
called apply
, and this destructor takes a
parameter. If f
has type Lambda a b
, and x
has type a
, then f.apply[x]
has type b
.
To create a value of type Lambda a b
, you would use a copattern match:
cocase Lambda a b of {
apply[x] -> ...
}
The destructor’s parameter is abstract and is to be filled by the value that the destructor will be carrying.
For example, (cocase Lambda Int Int of { apply[x] -> x + 1 }).apply[2]
selects the appropriate branch
(there’s only one), and substitutes 2
for x
to the right of the ->
. It steps to 2 + 1
.
So lambdas can be defined as codatatypes. Their destructor corresponds to function application, and copattern matching corresponds to abstraction. This is awesome!
agda-mode
in Emacs), but I remember it was a bit
difficult to get going the very first time. Hopefully this becomes a searchable reference
to getting it all set up quickly.
Prerequisites:
Install AgdaStdlib
globally
# /etc/nixos/configuration.nix
...
environment.systemPackages = with pkgs; [
  ...
  AgdaStdlib
];
...
Link /share/agda
# /etc/nixos/configuration.nix
...
environment.pathsToLink = [ "/share/agda" ];
...
Rebuild: sudo nixos-rebuild switch
Navigate to or create ~/.agda
Create 3 files in ~/.agda
: defaults
, libraries
, standard-library.agda-lib
[isaac:~/.agda]$ touch defaults
[isaac:~/.agda]$ touch libraries
[isaac:~/.agda]$ touch standard-library.agda-lib
Edit standard-library.agda-lib
[isaac:~/.agda]$ cat << EOF >> standard-library.agda-lib
> name: standard-library
> include: /run/current-system/sw/share/agda/
> EOF
This says that there is a library located at the relevant NixOS path.
Edit libraries
[isaac:~/.agda]$ echo "/home/isaac/.agda/standard-library.agda-lib" >> libraries
This registers the .agda-lib
file with Agda.
Edit defaults
[isaac:~/.agda]$ echo "standard-library" >> defaults
This tells Agda to include the standard-library
library by default.
To check your installation, try compiling a simple Agda file:
[isaac:~]$ cat << EOF >> Test.agda
> module Test where
> open import Data.Nat
> EOF
[isaac:~]$ agda Test.agda
Checking Test (/home/isaac/Test.agda).
Let me know whether or not this works for you :)
Later, Dr. Cheng clearly states her position.
…people being obnoxious to me professionally are almost all white guys…
Many people take her word for it, but some (particularly white guys) are skeptical. I’m going to quickly examine some degrees of belief in her claim through the lens of probability theory.
To begin, I want to explain why I have bothered to think about this. I believe Eugenia’s statement; I have no reason to think that she’s mistaken or lying. But in spite of this, for some reason, I empathised with the aforementioned white guy. Why? My unease wasn’t caused by any conscious reasoning process; it just seemed to arise of its own accord.
I’ve been learning that “focusing” can be a helpful way to unpack these confusing feelings. I didn’t go all-out focusing on this, but throughout my inquiry I made sure to be conscious of how my reasoning interacted with the feeling.
To paint a better picture of the feeling, it was something like:
To re-iterate: I don’t think these feelings were rational, which is why I decided to keep digging. Let’s get into it. I’m going to assume basic familiarity with probability theory, and an understanding that probability is in the mind.
I think Eugenia’s claim is this: $P( \text{person was white guy} \; | \; \text{interacted with obnoxious person} ) > 0.5$. In other words: of all the obnoxious researchers she’s interacted with, most are white guys. Let’s look at the conditional probability in terms of Bayes’ Theorem:
$P(WG | O) = \frac{P(O | WG) \cdot P(WG)}{P(O)}$
To start with, I’ll plug in my estimates to show why I don’t disagree with her.
I think mathematics is pretty male-dominated, so I’m going to say $P(WG) = 0.7$. Seven in ten researchers she meets are white dudes.
Let’s then say that $P(O) = 0.1$ — one in ten researchers she interacts with are jerks (am I optimistic or pessimistic about the academic community?).
Lastly there’s $P(O | WG)$: of all the white male researchers she meets, how many act obnoxiously? I’m going to be charitable here and say (for demonstration purposes) that the white guys are no more jerkish on average, so one in ten white male researchers she interacts with are jerks to her. $P(O | WG) = 0.1$.
Now, compute!
$\begin{aligned} ~ & \frac{P(O | WG) \cdot P(WG)}{P(O)} \\\\ = & \; \frac{0.1 \cdot 0.7}{0.1} \\\\ = & \; 0.7 \end{aligned}$
My estimate is consistent with her statement - of all the obnoxious researchers she meets, seven in ten would be white guys, even when assuming zero racial/gender biases.
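The arithmetic above is easy to check mechanically (a sketch; the numbers are the estimates from this post, not data):

```rust
// Bayes' theorem: P(WG|O) = P(O|WG) * P(WG) / P(O)
fn posterior(p_o_given_wg: f64, p_wg: f64, p_o: f64) -> f64 {
    p_o_given_wg * p_wg / p_o
}

fn main() {
    // My estimates: P(O|WG) = 0.1, P(WG) = 0.7, P(O) = 0.1.
    let p = posterior(0.1, 0.7, 0.1);
    assert!((p - 0.7).abs() < 1e-9);
}
```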
Suppose you disagree with me. That is, your estimates are such that $P(WG | O) \le 0.5$. There are two ways to disagree here:
A lower ratio $\frac{P(O | WG)}{P(O)}$. You might take $P(O | WG) = 0.07$, which means $P(WG | O) = 0.49$. You might instead take $P(O) = 0.14$ for a similar result. Either way, you’re claiming that the white male researchers Dr. Cheng meets are nicer on average than the general population of researchers she has met.
A lower $P(WG)$, indicating that you think Dr. Cheng interacts with relatively fewer white male researchers.
Running through these calculations didn’t give me any closure. I agree on paper, and feel that my estimates are appropriate. In fact, I would take $\frac{P(O | WG)}{P(O)}$ to be slightly greater than one, to account for biases like sexism and racism. But that only means I agree more.
The idea that ‘clicked’ with me, that immediately resolved my inner turmoil, was this: somehow I’m implicitly turning $P(WG | O)$ into $P(O | WG)$. $P(O | WG)$ is the term from which stereotypes are born. If most of the white guys you meet are jerks, then your $P(O | WG)$ is high. If you don’t quotient that by the proportion of people who are rude to you in general, then you have a gratuitous stereotype. If you do then you’re completely justified in thinking that ‘white guy’ and ‘obnoxious’ are correlated. So I think that somehow my wires were crossed and I was subconsciously interpreting the conversation as purely a statement about $P(O | WG)$.
I think this kind of error falls in the category of attribute substitution, and I think it’s pretty common. For example in Julia Galef’s video about Bayes’ Theorem, she says that before students learn Bayes’ Theorem, they often give $P(B | A)$ when asked for $P(A | B)$. I don’t know how exactly this sort of thing happens — maybe I’ll explore that some other time.
Anyway, I’m glad that my feelings and beliefs are now in sync on this issue.
Soon after, I realised I wasn’t capable of building any of it.
At the time, even the simplest programs were incredibly confusing, and the complexity of a browser game was far beyond me. I’m lucky that I didn’t give up on programming after realising I couldn’t build any of my cool ideas. For me, just being able to make computers do things was enthralling, and my learning became self-propelled — games be damned.
Contrast this with how I’ve recently approached another activity that I’m interested in, but very bad at: creative writing. For a while now I’ve wanted to learn how to tell stories, and this has been my process so far:
“I want to write stuff.”
“Not just any old stuff, but cool stuff; interesting stuff.”
“But I don’t have any cool ideas!”
“Guess I won’t write about anything then.”
In other words: I’ve learned nothing, because I haven’t done anything. I’ve been paralysed by this desire to only make ‘interesting’ things. It’s easy to imagine that other people have been stuck in similar situations with programming: someone wants to learn to code, but doesn’t know what to build, or only wants to work on seriously cool and ground-breaking projects. And I’m sure it’s not only programming; it seems there’s a common set of problems facing people who take up a craft. To address this, I’m going to think of advice that I would give to beginner programmers and generalise it.
At the start of the learning process, you have barely any ability. This means that by most standards, you will produce a lot of terrible, unskilled work. But that’s okay, because at this stage each piece of work is a step toward improvement. Each thing you make exists so that you can figure out how to make the next one better. Your standards, or taste, can still point you in directions for improvement, but your mistakes and shortcomings are not bad things.
If you have any grand designs, put them on hold. Chances are you don’t have the ability to see them through right now. Sometimes lofty ideas can help you figure out which areas to study – if you dream of making a first person shooter then you should definitely learn about 3D graphics. In general, though, I think it will help to put the big ideas aside until you’re more skilled.
When I started coding, I made calculator and to-do list apps, tiny database-backed webapps and 2D space shooters and brick breakers. None of this was remotely novel; it was a challenge just to repeat what others had built before me. Even though I wasn’t building anything original, I was still creating something that I had never made before, which was a huge learning experience. Even simple programs, like a calculator, require important fundamental skills that I didn’t have. I encountered problems that had been solved countless times by others, but that didn’t diminish the value of solving them for myself.
Getting into this mindset seems especially difficult in ‘creative’ endeavours like writing, game programming, or visual art. It feels like in these areas, there’s a big emphasis on being unique, imaginative, and innovative. While this is a fine standard for experts, it’s the wrong way for beginners to approach a craft. Making shiny new stuff comes later; right now you need to re-make old stuff.
Replicating past work doesn’t mean copying things out line for line. Copy the idea, and figure out how to fill in the details yourself. When I sat down and said “I’m going to write a brick breaking game”, I didn’t go and cut sections of code out of other similar games. I took the un-original concept, broke it into smaller components, and figured out how to make each of those bits. How do I display a ball on the screen? How do I make it move? How do I make it bounce? Each sub-problem required original thinking on my part, and it’s this original thinking that drove my improvement.
Motivation is really important for making consistent progress in a skill, but not all motivations are equally effective. I think the least sustainable motivational source (but most common for beginners) is ‘wanting to make something good’. If this is your main motivation, then you’ll be in a constant state of discouragement — every piece of work a reminder of your lack of ability. These little negatives slowly add up until you resent the activity.
A better motivational source is ‘to improve at something’. With this perspective, you feel accomplishment whenever you notice that your skills have increased, which happens often for beginners who practise regularly. But improvement-based motivation isn’t enough on its own; there are periods where you won’t notice improvement, but it’s important to practise anyway.
The most important and fundamental source of motivation comes from enjoying the activity itself. If you enjoy the craft regardless of standards of quality or ability, then it doesn’t matter how ‘bad’ your work is in the beginning, because producing it is a joy in itself. I’ve spent days and sometimes weeks stuck trying to make my code work the way I intended, but that’s all okay because it’s part of the process. At the end of the day I just like making computers do things.
I mainly wrote this for myself, to get a bit of a handle on why I find writing so difficult and how I can make it easier. I think it has helped. The next thing I write (not on here) will likely be the literary equivalent of a command-line calculator program — and that’s okay.
All this ‘advice’ may seem obvious to some, or lacking in original insight, and I’m sure other people have written about these things to greater effect than me. But that’s beside the point, isn’t it?
Not the act of writing itself, but a particular writing process where you take an idea, think deeply about exactly how you want to express it, and then write it down. I think the process itself provides part of the value, as a form of “brain exercise”, but also the end product, as a foundation upon which we can build more robust ideas.
This post is a funny self-referential exercise where I explore what I’m writing about as I write it.
When I have a thought, I feel pretty confident about it. It seems real and solid and it feels like I understand it. But most of the time this is an illusion. If I try to go deeper and ask myself questions like “Why do you think that?” or “Can you elaborate?” I often find it hard to come up with an answer. When this happens I say my idea lacks ‘substance’ or ‘structure’. There are no justifications or consequences bundled along with the thought; it’s kind of empty. Substance is important for ideas to be meaningful - at best, an insubstantial idea is a platitude, and at worst, outright nonsense. So if I have an idea, I’d like to give it some meaning (or figure out that it is actually as empty as it seems).
If I want to uncover the structure of an idea, I need to think more about it. I need to ask questions and explore perspectives. I’ll have a stack of relevant sub-ideas that all need to be related to the larger idea. If I tried to do all this in my head then I wouldn’t get very far. My working memory is really tiny compared to the ideas I want to tackle. So my first use for writing is as a tool of thought: I can store all my relevant thoughts onto a page and devote my full attention to the particular problem at hand. Sound reasoning is an important component of ‘thinking better’, and writing enables me to devote more attention to that process. Not only can I reason better, but I also have the freedom to reason more, because I can explore more sub-ideas without getting lost. Writing allows me to increase the intensity and volume of reasoning, which seems like it should lead to greater improvement in the area.
In addition to being a personal tool, writing is also used to transfer information between two different minds, and I think being conscious of this is an important part of the process. To me, writing seems like a kind of telepathy: a way to transmit thoughts between minds via a physical medium. But high-fidelity transmission isn’t guaranteed just because you wrote something down. We need to put words together in a way that makes it more likely for the telepathy to be successful. I think succeeding at this sort of language game requires a clear understanding of the subject matter. When you understand what you’re talking about, you can play with the descriptions you use and compare their accuracy, then use the best description in your final work. But if you don’t understand what you’re talking about, then there’s not much to measure your words against in the first place. I think that focusing on communication forces us to search for the ‘essence’ of an idea, which further engages our critical thinking abilities.
My hope is that putting all this effort into exploring and refining an idea creates new intellectual opportunities in the future. Kind of like taking blobs of clay and firing them into bricks: if you want to build a tower, you want to start with the foundation and work up, brick by brick. You won’t get very high by stacking clay. Similarly, this writing process might be refining ideas in a way that is necessary to make further intellectual progress, and without it, there would be a much lower ceiling on what we can achieve.
Having written all this down, I think a summary of this process is: the deliberate practice of organising thought, critical thinking, and effective communication. Writing serves dual purposes: as a tool it enables us to better explore ideas, and treating it as an end in itself requires us to better understand those ideas.
A vacuum cleaner is just not ‘the same kind of thing’ as a monoid.
On one hand, we have something that was built to accomplish a task. Vacuum cleaners suck stuff up. Some of them have wheels and some go on your back. Some have carpet cleaning attachments and some are adapted for tile floors. In the end, we have an object that was made to clean, made of bits that we can point to and say “this bit helps it suck better” or “this bit makes it easier to move” or “this bit makes it look pretty”.
On the other hand, we have something that describes how things can relate to each other. When you say “the natural numbers with addition and 0 form a monoid”, you impose some structure onto the natural numbers. We can prove whether or not the naturals do exhibit this structure, and then use that fact to inform how we should think about them. We can’t ‘point at bits of monoid’ and say how much they contribute to some purpose.
It seems like the popular perception of programming languages falls more in the ‘vacuum cleaner’ camp: that a programming language is just something for translating a description of computation into something machine-readable. If you want to describe computations involving numbers and strings, then you add ‘do number stuff’ and ‘do string stuff’ features to the language. If you find that ‘X-Y-Z’ is a common coding pattern, you might introduce ‘W’ syntax, which does ‘X-Y-Z’ but is easier to type.
I think that this ‘features focused’ development style can cause people to overlook the structure that the features contribute to (or even the structure that the features destroy). Programming language ‘design’ needs a lot of what goes in the ‘monoid’ camp. That is, languages should be treated as more of an abstract thing that gives some structure to computation. Ignoring this aspect of development is what leads to edge cases, unintuitive behaviour, and a general feeling of ‘poor design’.
How many people have been surprised to learn that floating point addition is not associative? It seems reasonable to just expect addition to be associative. Many programming language ‘wats’ exist for a similar reason - they are examples of a language behaving counter to our expectations. In both these cases there are implicit ‘structural contracts’ that are violated. Hidden patterns about how things relate to each other that are so common we just take them for granted, but are not present in certain systems by design.
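To make the floating point example concrete, here is a small standalone check (the names are mine, just for illustration): addition on Integer satisfies the associativity a monoid demands, while Double addition quietly violates it.

```haskell
-- Associativity of (+) holds for Integer...
assocInteger :: Bool
assocInteger = (1 + 2) + 3 == 1 + (2 + (3 :: Integer))

-- ...but not for Double: (0.1 + 0.2) + 0.3 rounds to
-- 0.6000000000000001, while 0.1 + (0.2 + 0.3) rounds to 0.6
assocDouble :: Bool
assocDouble = (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + (0.3 :: Double))
```

Here assocInteger is True and assocDouble is False: an implicit ‘structural contract’ that floating point breaks.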
So I think a big part of what makes a language feel ‘discovered’ as opposed to ‘invented’ is the amount of attention paid to the structure of the thing. ‘Discovered’-seeming languages have more internal consistency and fewer ‘quirks’, because they’re not meant to just ‘turn descriptions of computations into binary’. They have to do this in a way that adheres to a consistent, coherent structure of computation.
The Scope datatype in bound is very safe. The type prevents you from creating invalid De Bruijn terms, like λ. 3. This means that you can’t write useful instances of Plated for types which contain a Scope. When it comes to choosing between bound and Plated, I choose Plated, because we can use it to build functionality similar to bound.
Let’s get some boilerplate out of the road. Here is a datatype for lambda calculus, with De Bruijn indices (B), as well as free variables (F). Notice that lambda abstraction (Abs) doesn’t give a name to the function argument, which means that only Bs can reference them. This is called the “locally nameless” approach.
{-# language DeriveGeneric #-}
import Control.Lens.Plated (Plated(..), gplate, transformM)
import GHC.Generics (Generic)
import qualified Control.Monad.RevState as Reverse
import qualified Control.Monad.State as State
data Expr
  = F String
  | B Int
  | App Expr Expr
  | Abs Expr
  deriving (Eq, Show, Generic)

instance Plated Expr where
  plate = gplate
The core of the bound-like API will be two functions:
abstract :: String -> Expr -> Expr
instantiate :: Expr -> Expr -> Maybe Expr
Let’s do abstract first. abstract name expr finds all the F name nodes in an expr and replaces them with the appropriate De Bruijn index, then wraps the final result in an Abs. The “appropriate index” is the number of Abs constructors that we passed on the way.
For example, abstract "x" (F "x") evaluates to Abs (B 0), because we passed zero Abs constructors to get to the "x", then wrapped the final result in an Abs. abstract "y" (Abs (App (B 0) (F "y"))) evaluates to Abs (Abs (App (B 0) (B 1))) because we passed one Abs to get to the "y", then wrapped the final result in an Abs.
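As a sanity check on those examples, here is a direct-recursion version of abstract (my own throwaway sketch, not the Plated implementation this post builds), restating Expr so the snippet stands alone. It threads the binder count top-down explicitly:

```haskell
data Expr
  = F String
  | B Int
  | App Expr Expr
  | Abs Expr
  deriving (Eq, Show)

-- Count binders on the way down, replacing matching free variables
abstractDirect :: String -> Expr -> Expr
abstractDirect name = Abs . go 0
  where
    go i (F name')
      | name == name' = B i
      | otherwise = F name'
    go i (App f x) = App (go i f) (go i x)
    go i (Abs e) = Abs (go (i + 1) e)
    go _ e = e
```

With this, abstractDirect "x" (F "x") gives Abs (B 0), and abstractDirect "y" (Abs (App (B 0) (F "y"))) gives Abs (Abs (App (B 0) (B 1))), matching the examples above. The reverse-state trick below recovers exactly this top-down count while still using transformM.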
“Do this everywhere” usually means transform :: Plated a => (a -> a) -> a -> a is appropriate. Though in this case, it doesn’t give us any way to count the number of Abs it passes. Instead we will use transformM :: (Monad m, Plated a) => (a -> m a) -> a -> m a with State. Here’s how that looks:
abstract_attempt_1 :: String -> Expr -> Expr
abstract_attempt_1 name = Abs . flip State.evalState 0 . transformM fun
  where
    fun :: Expr -> State.State Int Expr
    fun (F name')
      | name == name' = B <$> State.get
      | otherwise = pure $ F name'
    fun (Abs expr) = Abs expr <$ State.modify (+1)
    fun expr = pure expr
If you see a free variable with the name we’re abstracting over, replace it with a De Bruijn index corresponding to the number of binders we’ve seen. If you see an Abs, increment the counter. If you see something else, don’t do anything special.
This is the right idea, but it doesn’t work, because the transform family of functions acts from the bottom up. When it sees a free variable it can abstract over, it will replace it with B 0, then go upwards through the tree, incrementing the counter. This is the reverse of what we want.
Enter Reverse State.
In reverse state, get accesses the state of the computation after it, not before it. Using regular state, execState (modify (+1) *> modify (*2)) 0 will evaluate to 2, because you set the state to zero, add one, then multiply by two. Using reverse state, the output is 1, because you set the state to zero, multiply by two, then add one.
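If you want to play with this without the rev-state package, here is a minimal hand-rolled sketch (my own names; Control.Monad.RevState provides the real thing). The state flows backwards through the computation while values flow forwards:

```haskell
newtype RevState s a = RevState { runRevState :: s -> (a, s) }

instance Functor (RevState s) where
  fmap g m = RevState $ \s ->
    let (a, s') = runRevState m s in (g a, s')

instance Applicative (RevState s) where
  pure a = RevState $ \s -> (a, s)
  -- The later computation (ma) receives the incoming state;
  -- its output state feeds the earlier one (mf)
  mf <*> ma = RevState $ \s ->
    let (f, s'') = runRevState mf s'
        (a, s')  = runRevState ma s
    in (f a, s'')

instance Monad (RevState s) where
  m >>= k = RevState $ \s ->
    let (a, s'') = runRevState m s'
        (b, s')  = runRevState (k a) s
    in (b, s'')

get :: RevState s s
get = RevState $ \s -> (s, s)

modify :: (s -> s) -> RevState s ()
modify f = RevState $ \s -> ((), f s)

execRevState :: RevState s a -> s -> s
execRevState m = snd . runRevState m
```

Here execRevState (modify (+1) *> modify (*2)) 0 evaluates to 1, matching the example above, whereas ordinary State gives 2.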
This means that if we swap regular state for reverse state in abstract, get refers to a state which is only calculated after bubbling all the way to the top and counting all the Abs constructors. So the correct code looks like this:
abstract :: String -> Expr -> Expr
abstract name = Abs . flip Reverse.evalState 0 . transformM fun
  where
    fun :: Expr -> Reverse.State Int Expr
    fun (F name')
      | name == name' = B <$> Reverse.get
      | otherwise = pure $ F name'
    fun (Abs expr) = Abs expr <$ Reverse.modify (+1)
    fun expr = pure expr
The logic remains the same, except now the state transformations run backwards.
Now for instantiate. instantiate (Abs body) x substitutes x into the appropriate positions in body, and wraps the final result in a Just. If the first argument to instantiate is not an Abs, then the result is Nothing. We substitute x everywhere we find a B that contains the number of binders we have passed.
For example, instantiate (Abs (B 0)) (F "x") evaluates to Just (F "x"), because we found a B 0 when we had passed zero binders (the outer Abs doesn’t count). instantiate (Abs (Abs (App (B 0) (B 1)))) (F "y") evaluates to Just (Abs (App (B 0) (F "y"))), because we found a B 1 when we had passed one binder. The B 0 is not replaced because at that point we had passed one binder, and zero is not one.
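Again as a cross-check, here is a direct-recursion instantiate (my own sketch, not the Plated version), with Expr restated so the snippet stands alone:

```haskell
data Expr
  = F String
  | B Int
  | App Expr Expr
  | Abs Expr
  deriving (Eq, Show)

-- Count binders on the way down, substituting matching indices
instantiateDirect :: Expr -> Expr -> Maybe Expr
instantiateDirect (Abs body) x = Just (go 0 body)
  where
    go i (B n)
      | n == i = x
      | otherwise = B n
    go i (App f a) = App (go i f) (go i a)
    go i (Abs e) = Abs (go (i + 1) e)
    go _ e = e
instantiateDirect _ _ = Nothing
```

It reproduces the examples above: instantiateDirect (Abs (B 0)) (F "x") is Just (F "x"), and instantiateDirect (Abs (Abs (App (B 0) (B 1)))) (F "y") is Just (Abs (App (B 0) (F "y"))).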
We have the same problem as with abstract: counting binders proceeds from the top down, but transformM works from the bottom up. We can use reverse state again to solve this. Here’s the code:
instantiate :: Expr -> Expr -> Maybe Expr
instantiate (Abs body) x = Just $ Reverse.evalState (transformM fun body) 0
  where
    fun :: Expr -> Reverse.State Int Expr
    fun (B n) = do
      n' <- Reverse.get
      pure $
        if n == n'
        then x
        else B n
    fun (Abs expr) = Abs expr <$ Reverse.modify (+1)
    fun expr = pure expr
instantiate _ _ = Nothing
And there we have it: a bound-like API for a datatype using Plated.
I think there are two pressing issues when comparing this code to bound: correctness and generalisation. This approach allows you to write bogus terms, like Abs (B 3), whereas bound does not. I’m okay with this, because I highly value the tools Plated provides. Additionally, the bound combinators work over any term as long as it is a Monad, so abstract and instantiate only have to be written once, whereas we haven’t presented any means for generalisation of the Plated approach.
This is easily fixed: in a follow-up post, I’ll write about how we can use Backpacky Prisms to provide abstract and instantiate as library functions.
Given a grid of integers, find the largest product of n numbers which are adjacent in the same direction (left, right, up, down, or diagonally).
In this diagram, A, B, and C are diagonally adjacent:
0 0 0 0
0 0 0 C
0 0 B 0
0 A 0 0
And in this one, A, B, and C are vertically adjacent:
0 A 0 0
0 B 0 0
0 C 0 0
0 0 0 0
I initially solved the problem with this data structure and operations:
data Grid a
  = Grid
  { width :: !Int
  , height :: !Int
  , xPos :: !Int
  , yPos :: !Int
  , content :: [[a]]
  }

focus :: Grid a -> a
focus (Grid _ _ x y g) = (g !! y) !! x

-- These operations return Nothing if we are at the edge of the grid,
-- otherwise increment/decrement xPos/yPos accordingly
up, left, down, right :: Grid a -> Maybe (Grid a)
The idea being to walk through the grid, and for each position calculate the product of the adjacent elements. For example, the product of the focus and the two neighbours to its right would be:
example1 :: Num a => Grid a -> Maybe a
example1 grid =
  (\b c -> focus grid * b * c) <$>
  fmap focus (right grid) <*>
  fmap focus (right <=< right $ grid)
Grid can be given a Comonad instance, and this process of per-position calculation can be expressed using comonadic operations. If we plug the example1 function into extend, we get the function extend example1 :: Grid a -> Grid (Maybe a). This function walks through the grid, and replaces each cell with the result of running example1 on it and its neighbours.
This is cool in and of itself, but implementing duplicate or extend for Grid is tedious. Grid can actually be implemented as the composition of two comonads, Env and Store, which gives us the correct comonadic behaviour for free.
import Control.Applicative (liftA2)
import Control.Monad ((<=<))
import Control.Comonad ((=>>), extract)
import Control.Comonad.Env (EnvT(..), ask)
import Control.Comonad.Store (Store, store, peek, pos, seek)
import Data.List (maximum)
type Dimensions = (Int, Int)
type Position = (Int, Int)
type Grid a = EnvT Dimensions (Store Position) a
EnvT e w a is an environment of type e paired with an underlying comonad w a. We can inspect the environment with ask :: ComonadEnv e w => w a -> e. extracting from an EnvT just extracts from the underlying comonad, and ignores the environment. The dimensions of the grid are the environment, because they remain static throughout the program.
Store s a consists of some state s, and an “accessor” function of type s -> a. extracting a Store feeds its state into the accessor function. For Grid, the focus position is the state, and the accessor is a function that pulls out the corresponding element from some list of lists.
Three important functions on Store are:
pos :: ComonadStore s w => w a -> s
seek :: ComonadStore s w => s -> w a -> w a
peek :: ComonadStore s w => s -> w a -> a
pos returns the current state, seek replaces the state, and peek runs the accessor function on a different piece of state, leaving the actual state unchanged.
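The comonad package provides all of these, but a hand-rolled Store makes the three operations concrete (a sketch, not the library’s actual definition, which is built on StoreT):

```haskell
-- An accessor function paired with a current position
data Store s a = Store (s -> a) s

extract :: Store s a -> a
extract (Store f s) = f s

-- Read the current position
pos :: Store s a -> s
pos (Store _ s) = s

-- Move the focus without touching the accessor
seek :: s -> Store s a -> Store s a
seek s (Store f _) = Store f s

-- Read a different position without moving the focus
peek :: s -> Store s a -> a
peek s (Store f _) = f s
```

For a 2D grid whose accessor is (\(x, y) -> x + 10 * y) focused at (0, 0), peek (2, 1) reads 12 without moving, while extract after seek (1, 1) gives 11.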
Here’s how we make a grid. Notice that the accessor function passed to store behaves like focus.
mkGrid :: [[a]] -> Maybe (Grid a)
mkGrid [] = Nothing
mkGrid g@(r:rs)
  | rl <- length r
  , all ((== rl) . length) rs =
      Just $
        EnvT
          (rl, length g)
          (store (\(x, y) -> (g !! y) !! x) (0, 0))
  | otherwise = Nothing
If the grid has no rows, or has rows of different lengths, return nothing. Otherwise, calculate the dimensions of the grid, and initialise the store pointing to the top-left cell in the grid.
Now we can implement up, down, left, right:
up :: Grid a -> Maybe (Grid a)
up g =
  let
    (w, h) = ask g
    (x, y) = pos g
  in
    if y > 0 then Just (seek (x, y-1) g) else Nothing

left :: Grid a -> Maybe (Grid a)
left g =
  let
    (w, h) = ask g
    (x, y) = pos g
  in
    if x > 0 then Just (seek (x-1, y) g) else Nothing

down :: Grid a -> Maybe (Grid a)
down g =
  let
    (w, h) = ask g
    (x, y) = pos g
  in
    if y < h-1 then Just (seek (x, y+1) g) else Nothing

right :: Grid a -> Maybe (Grid a)
right g =
  let
    (w, h) = ask g
    (x, y) = pos g
  in
    if x < w-1 then Just (seek (x+1, y) g) else Nothing
Next are some helper functions for calculating the product of a grid element and its neighbours.
iterateM is the monadic equivalent of iterate. productN calculates the product of the current grid element with its adjacent neighbours in some direction. example1 could be redefined as productN 3 right.
iterateM :: Monad m => (a -> m a) -> [a -> m a]
iterateM f = f : fmap (f <=<) (iterateM f)

productN :: Num a => Int -> (Grid a -> Maybe (Grid a)) -> Grid a -> Maybe a
productN n f g =
  foldr
    (\a b -> liftA2 (*) (extract <$> a g) b)
    (pure 1)
    (take n $ iterateM f)
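iterateM can be checked on its own with the Maybe monad (step is a made-up example function): take n (iterateM f) is [f, f <=< f, f <=< f <=< f, ...].

```haskell
import Control.Monad ((<=<))

-- The n-th element composes f with itself n+1 times, monadically
iterateM :: Monad m => (a -> m a) -> [a -> m a]
iterateM f = f : fmap (f <=<) (iterateM f)

-- A step function that fails above 3
step :: Int -> Maybe Int
step x = if x < 3 then Just (x + 1) else Nothing
```

Then map ($ 0) (take 3 (iterateM step)) gives [Just 1, Just 2, Just 3], and map ($ 2) (take 2 (iterateM step)) gives [Just 3, Nothing] — composing one move too many fails, just like walking off the grid.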
Penultimately, we define a function for finding the greatest element in a grid. It peeks at all the elements and finds the greatest one.
maxInGrid :: Ord a => Grid a -> a
maxInGrid g =
  let
    (w, h) = ask g
  in
    maximum $ do
      x <- [0..w-1]
      y <- [0..h-1]
      pure $ peek (x, y) g
Last step. To find the largest product of n adjacent elements, we find the largest product of n adjacent elements horizontally, then vertically, then diagonally, and take the maximum of those.
We can write this logic as a series of extends, because productN n move and maxInGrid are both of the form w a -> b. ((=>>) is the flipped infix version of extend.)
largestProduct :: Int -> Grid Int -> Int
largestProduct n g =
  let
    Just g1 = extract $ g =>> productN n right =>> maxInGrid
    Just g2 = extract $ g =>> productN n down =>> maxInGrid
    Just g3 = extract $ g =>> productN n (down <=< left) =>> maxInGrid
    Just g4 = extract $ g =>> productN n (down <=< right) =>> maxInGrid
  in
    maximum [g1, g2, g3, g4]
I’m still getting an intuition for comonads, but they seem to embody some kind of “environment”, and comonad transformers are like a “composition of environments”. In this example, there are two environments: the grid’s dimensions, and its content.
For more information about comonads, check out Bartosz Milewski’s comonads post and Dan Piponi’s article about comonadic cellular automata.
Footnote: I feel like largestProduct could be simplified if Grid were ComonadApply, but I haven’t tried to figure it out yet.
This post was generated from a Literate Haskell file using Pandoc, so you can load it up into GHCI and play around if you want.
module CPS where
import Control.Applicative
import Control.Monad.Trans.Class
import Data.IORef
import Data.Maybe
import qualified Data.Map as M
import qualified System.Exit as E
The main idea of this style is that the called function has control over how its return value is used. Usually, the caller will pass a function that tells the callee how to use its return value. Here’s what that looks like:
add :: Num a => a -> a -> (a -> r) -> r
add a b = \k -> k (a + b)
add takes two numbers, plus a function that will take the result and do something with it (returning an unknown answer); add then passes its result to this function. We call this ‘extra function’ a continuation, because it specifies how the program should continue.
It’s possible to write any program using this style. I’m not going to prove it. As a challenge, let’s restrict ourselves to writing every function this way, with two exceptions:
exitSuccess :: a -> IO ()
exitSuccess _ = E.exitSuccess

exitFailure :: Int -> IO ()
exitFailure = E.exitWith . E.ExitFailure
exitSuccess and exitFailure do not take a continuation, because the program always ends when they are called.
Let’s define mul and dvd:
mul :: Num a => a -> a -> (a -> r) -> r
mul a b = \k -> k (a * b)

dvd :: Fractional a => a -> a -> (a -> r) -> r
dvd a b = \k -> k (a / b)
Now we can write some programs using this style.
-- Exits with status code 5
prog_1 :: IO ()
prog_1 = add 2 3 exitFailure

-- Exits successfully after multiplying 10 by 10
prog_2 :: IO ()
prog_2 = mul 10 10 exitSuccess

-- Exits with status code (2 + 3) * 5 = 25
prog_3 :: IO ()
prog_3 = add 2 3 (\two_plus_three -> mul two_plus_three 5 exitFailure)
We can factor out the continuation to make our program more modular:
-- Equivalent to \k -> k ((2 + 3) * 5)
prog_4 :: (Int -> r) -> r
prog_4 = \k -> add 2 3 (\two_plus_three -> mul two_plus_three 5 k)

-- Equivalent to \k -> k ((2 + 3) * 5 + 5)
prog_5 :: (Int -> r) -> r
prog_5 = \k -> prog_4 (\res -> add res 5 k)
In these kinds of definitions, we’ll call k the current continuation, standing for how the program will (currently) continue execution.
Here’s a more complex expression:
-- (2 + 3) * (7 + 9) + 5
prog_6 :: Num a => (a -> r) -> r
prog_6 = \k ->
  add 2 3 (\five ->
  add 7 9 (\sixteen ->
  mul five sixteen (\eighty ->
  add eighty 5 k)))
In translating programs to continuation passing style, we transform a tree of computations into a sequence of computations. In doing so, we have reified the flow of the program. We now have a data structure in memory that represents the computations that make up the program. In this case, the data structure is a lot like a linked list: there is a head (‘the computation that will be performed next’) and a tail (‘the computations that will be performed on the result’). It’s this ability to represent the flow of the program as a data structure that sets CPS programs apart from regular programs, as we will see later.
Right now, writing CPS programs in Haskell is too verbose. Fortunately there are some familiar abstractions that will make it elegant:
newtype Cont r a = Cont { runCont :: (a -> r) -> r }

add' :: Num a => a -> a -> Cont r a
add' a b = Cont $ add a b

mul' :: Num a => a -> a -> Cont r a
mul' a b = Cont $ mul a b

dvd' :: Fractional a => a -> a -> Cont r a
dvd' a b = Cont $ dvd a b
instance Functor (Cont r) where
  fmap f c = Cont $ \k -> runCont c (k . f)

instance Applicative (Cont r) where
  pure a = Cont ($ a)
  cf <*> ca = Cont $ \k -> runCont cf (\f -> runCont ca (\a -> k (f a)))

instance Monad (Cont r) where
  ca >>= f = Cont $ \k -> runCont ca (\a -> runCont (f a) k)
It turns out that the return type of these CPS programs, (a -> r) -> r, is a Monad. If you don’t understand these implementations, meditate on them until you do. Here are some hand-wave-y English explanations that may help:
fmap: Continue with the result of c by changing the result from an a to a b, then sending that result to the current continuation.
pure: Send an a to the current continuation.
<*>: Continue with the result of cf by continuing with the result of ca, by sending (the result of cf) applied to (the result of ca) to the current continuation.
>>=: Continue with the result of ca by applying it to f and passing the current continuation on to the value f returned.
So now we can rewrite our previous verbose example:
prog_6' :: Cont r Int
prog_6' = do
  five <- add' 2 3
  sixteen <- add' 7 9
  eighty <- mul' five sixteen
  add' eighty 5
and run it:
prog_7 :: IO ()
prog_7 = runCont prog_6' exitSuccess
Consider the following CPS program:
prog_8 :: (Eq a, Fractional a) => a -> a -> a -> (Maybe a -> r) -> r
prog_8 a b c = \k ->
  add b c
    (\b_plus_c ->
       if b_plus_c == 0
       then k Nothing
       else dvd a b_plus_c (k . Just))
It adds b to c, then if b + c is zero, sends Nothing to the current continuation; otherwise it divides a by b + c, then continues by wrapping that in a Just and sending the Just result to the current continuation.
Because the current continuation is ‘how the program will continue with the result of this function’, sending a result to the current continuation early causes the function to exit early. In this sense, it’s a bit like a jmp or a goto.
It is conceivable that somehow we can write a program like this using the Cont monad. This is where callCC comes in.
callCC stands for ‘call with current continuation’, and is the way we’re going to bring the current continuation into scope when writing CPS programs. Here’s an example of how the previous code snippet should look using callCC:
prog_8' :: (Eq a, Fractional a) => a -> a -> a -> Cont r (Maybe a)
prog_8' a b c = callCC $
  \k -> do
    b_plus_c <- add' b c
    if b_plus_c == 0
      then k Nothing
      else fmap Just $ dvd' a b_plus_c
Here’s how callCC is defined:
callCC :: ((a -> Cont r b) -> Cont r a) -> Cont r a
callCC f = Cont $ \k -> runCont (f (\a -> Cont $ const (k a))) k
We can see that the current continuation is permanently captured by the function passed to f, but it is also used when running the final result of f. So k might be called somewhere inside f, causing f to exit early, or it might not, in which case k is guaranteed to be called after f has finished.
Earlier I said that invoking the current continuation early is like jumping. This is a lot easier to show now that we can use it in our Cont monad. Calling the continuation provided by callCC will jump the program execution to immediately after the call to callCC, and set the result of the callCC call to the argument that was passed to the continuation.
prog_9 = do
  five <- add' 2 3
  res <- callCC $ \k ->
    -- current continuation is never used, so `callCC` is redundant
    mul' 4 5
  -- `res` = 20
  add' five res

prog_10 = do
  five <- add' 2 3
  res <- callCC $ \k -> do
    k 5
    mul' 4 5 -- this computation is never run
  -- program jumps to here, `res` = 5
  add' five res

prog_11 = do
  five <- add' 2 3
  res <- callCC $ \k -> do
    if five > 10
      then k 10 -- branch A
      else mul' 4 5 -- branch B
  -- if branch A was reached, `res` = 10
  -- if branch B was reached, `res` = 20
  add' five res
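To convince yourself that prog_10 really jumps, here is a self-contained restatement of the Cont machinery from this post (with add' and mul' written via pure) that runs it with id as the final continuation:

```haskell
newtype Cont r a = Cont { runCont :: (a -> r) -> r }

instance Functor (Cont r) where
  fmap f c = Cont $ \k -> runCont c (k . f)

instance Applicative (Cont r) where
  pure a = Cont ($ a)
  cf <*> ca = Cont $ \k -> runCont cf (\f -> runCont ca (\a -> k (f a)))

instance Monad (Cont r) where
  ca >>= f = Cont $ \k -> runCont ca (\a -> runCont (f a) k)

callCC :: ((a -> Cont r b) -> Cont r a) -> Cont r a
callCC f = Cont $ \k -> runCont (f (\a -> Cont $ const (k a))) k

add', mul' :: Num a => a -> a -> Cont r a
add' a b = pure (a + b)
mul' a b = pure (a * b)

prog10 :: Cont r Int
prog10 = do
  five <- add' 2 3
  res <- callCC $ \k -> do
    k 5
    mul' 4 5 -- never reached
  -- after the jump, `res` = 5
  add' five res
```

runCont prog10 id evaluates to 10, confirming that mul' 4 5 was skipped.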
We can also embed arbitrary effects in the return type of Cont. In other words, we can create a monad transformer.
newtype ContT r m a = ContT { runContT :: (a -> m r) -> m r }

callCC' :: ((a -> ContT r m b) -> ContT r m a) -> ContT r m a
callCC' f = ContT $ \k -> runContT (f (\a -> ContT $ const (k a))) k

instance Functor (ContT r m) where
  fmap f c = ContT $ \k -> runContT c (k . f)

instance Applicative (ContT r m) where
  pure a = ContT ($ a)
  cf <*> ca = ContT $ \k -> runContT cf (\f -> runContT ca (\a -> k (f a)))

instance Monad (ContT r m) where
  ca >>= f = ContT $ \k -> runContT ca (\a -> runContT (f a) k)

instance MonadTrans (ContT r) where
  lift ma = ContT $ \k -> ma >>= k
Notice that the Functor, Applicative and Monad instances for ContT r m don’t place any constraints on the m. This means that any type constructor of kind (* -> *) can be in the m position. The MonadTrans instance, however, does require that m is a monad. It’s a very simple definition: the result of running the lifted action is piped into the current continuation using >>=.
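Here is a tiny standalone demonstration of that lift definition, using Maybe as the base monad (liftC is my name for it, to avoid the typeclass boilerplate):

```haskell
newtype ContT r m a = ContT { runContT :: (a -> m r) -> m r }

-- The body of `lift` from the MonadTrans instance above:
-- run the action, pipe its result into the continuation
liftC :: Monad m => m a -> ContT r m a
liftC ma = ContT $ \k -> ma >>= k
```

runContT (liftC (Just 5)) (\a -> Just (a * 2)) is Just 10, and lifting a failing action short-circuits: runContT (liftC Nothing) (\a -> Just (a * 2)) is Nothing.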
Now that we have a fully-featured CPS monad, we can start doing magic.
The continuation that callCC provides access to is the current future of program execution, as a single function. That’s why this program-as-a-linear-sequence is so powerful. If you could save the current continuation and call it at a later time somewhere else in your (CPS) program, it would jump ‘back in time’ to the point after that particular callCC.
To demonstrate this, and end with a bang, here’s a simple boolean SAT solver.
-- Language of boolean expressions
data Expr
  = Implies Expr Expr
  | Iff Expr Expr
  | And Expr Expr
  | Or Expr Expr
  | Not Expr
  | Val Bool
  | Var String
  deriving (Eq, Show)
-- Reduces a boolean expression to normal form, substituting variables
-- where possible. There are also some equivalences that are necessary to get
-- the SAT solver working e.g. Not (Not x) = x (I said it was a simple one!)
eval :: M.Map String Expr -- ^ Bound variables
     -> Expr              -- ^ Input expression
     -> Expr
eval env expr =
  case expr of
    Implies p q -> eval env $ Or (Not p) q
    Iff p q -> eval env $ Or (And p q) (And (Not p) (Not q))
    And a b ->
      case (eval env a, eval env b) of
        (Val False, _) -> Val False
        (_, Val False) -> Val False
        (Val True, b') -> b'
        (a', Val True) -> a'
        (a', b') -> And a' b'
    Or a b ->
      case (eval env a, eval env b) of
        (Val True, _) -> Val True
        (_, Val True) -> Val True
        (Val False, b') -> b'
        (a', Val False) -> a'
        (a', b')
          | a' == eval env (Not b') -> Val True
          | otherwise -> Or a' b'
    Not a ->
      case eval env a of
        Val True -> Val False
        Val False -> Val True
        Not a' -> a'
        a' -> Not a'
    Val b -> Val b
    Var name -> fromMaybe (Var name) (M.lookup name env)
-- Returns `Nothing` if the expression is not satisfiable
-- If the expression is satisfiable, returns `Just mapping` with a valid
-- variable mapping
sat :: Expr -> ContT r IO (Maybe (M.Map String Expr))
sat expr = do
  -- A stack of continuations
  try_next_ref <- lift $ newIORef []
  callCC' $ \exit -> do
    -- Run `go` after reducing the expression to normal form without any
    -- variable values
    res <- go (eval M.empty expr) try_next_ref exit
    case res of
      -- If there was a failure, backtrack and try again
      Nothing -> backtrack try_next_ref exit
      Just vars -> case eval vars expr of
        -- If the expression evaluates to true with the results of `go`, finish
        Val True -> exit res
        -- Otherwise, backtrack and try again
        _ -> backtrack try_next_ref exit
  where
    -- To backtrack: try to pop a continuation from the stack. If there are
    -- none left, exit with failure. If there is a continuation then enter it.
    backtrack try_next_ref exit = do
      try_next <- lift $ readIORef try_next_ref
      case try_next of
        [] -> exit Nothing
        next:rest -> do
          lift $ writeIORef try_next_ref rest
          next
    -- It's a tree traversal, but with some twists
    go expr try_next_ref exit = case expr of
      -- Twist 1: When we encounter a variable, we first continue as if it's
      -- true, but also push a continuation on the stack where it is set to false
      Var name -> do
        res <- callCC' $ \k -> do
          lift $ modifyIORef try_next_ref (k (Val False) :)
          pure $ Val True
        -- When this program is first run, `res` = True. But if we pop and
        -- enter the result of `k (Val False)`, we would end up back here
        -- again, with `res` = False
        pure $ Just (M.singleton name res)
      Val b -> pure $ if b then Just M.empty else Nothing
      -- Twist 2: When we get to an Or, only one of the sides needs to be
      -- satisfied. So we first continue by checking the left side, but also
      -- push a continuation where we check the right side instead.
      Or a b -> do
        side <- callCC' $ \k -> do
          lift $ modifyIORef try_next_ref (k b :)
          pure a
        -- Similar to the `Var` example. First run, `side` = a. But if later
        -- we enter the saved continuation then we will return to this point
        -- in the program with `side` = b
        go side try_next_ref exit
      And a b -> do
        a_res <- go a try_next_ref exit
        b_res <- go b try_next_ref exit
        pure $ liftA2 M.union a_res b_res
      Not a -> go a try_next_ref exit
      _ -> go (eval M.empty expr) try_next_ref exit
The solver sets all the variables to True, and if the full expression evaluates to False it flips one to False and automatically re-evaluates the expression, repeating the process until either it finally evaluates to True or all possible combinations of boolean values have been tested. It’s not efficient, but it’s a wonderful illustration of the elegance that CPS enables.
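The search strategy this implements — try True first, remember the alternative, and resume from the most recent alternative on failure — can also be sketched as ordinary recursive backtracking, without continuations. Here is a rough Python analogue; the tuple encoding of expressions and all helper names are mine, not from the original solver:

```python
# Expressions are nested tuples, e.g. ("or", ("var", "a"), ("not", ("var", "b"))).
def evaluate(expr, env):
    """Evaluate an expression given a complete variable assignment."""
    tag = expr[0]
    if tag == "val":
        return expr[1]
    if tag == "var":
        return env[expr[1]]
    if tag == "not":
        return not evaluate(expr[1], env)
    if tag == "and":
        return evaluate(expr[1], env) and evaluate(expr[2], env)
    if tag == "or":
        return evaluate(expr[1], env) or evaluate(expr[2], env)
    raise ValueError(f"unknown tag: {tag}")

def variables(expr):
    """Collect the set of variable names appearing in an expression."""
    tag = expr[0]
    if tag == "var":
        return {expr[1]}
    if tag == "val":
        return set()
    return set().union(*(variables(sub) for sub in expr[1:]))

def sat(expr):
    """Try True first for each variable, backtracking to False on failure."""
    def go(pending, env):
        if not pending:
            return dict(env) if evaluate(expr, env) else None
        name, rest = pending[0], pending[1:]
        for value in (True, False):  # this loop plays the role of the continuation stack
            env[name] = value
            result = go(rest, env)
            if result is not None:
                return result
        del env[name]
        return None
    return go(sorted(variables(expr)), {})
```

Running `sat(("and", ("var", "a"), ("not", ("var", "b"))))` finds the assignment `{"a": True, "b": False}`, while an unsatisfiable expression like `a AND NOT a` yields `None`.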
Every time you rebuild your NixOS configuration, a new entry is added to the bootloader. This is helpful if you ever make a configuration change that breaks on your machine because you can reboot into the last known working state and try something different.
If you don’t need to have access to all your old configurations, you can delete them:
sudo nix-collect-garbage -d
sudo nixos-rebuild switch
30 December 2018
I don’t know if I was confused when I first wrote this, or if the process has improved since then. Either way, these instructions are more complex than necessary, so I’ve updated them.
1. Delete the old (excludes the current) package configurations for the NixOS system: sudo nix-env -p /nix/var/nix/profiles/system --delete-generations old
2. Collect garbage: nix-collect-garbage -d
3. View the remaining generation: nix-env -p /nix/var/nix/profiles/system --list-generations. Take note of this for the next step.
4. Remove unnecessary boot loader entries. I use systemd-boot, so all my entries are located in /boot/loader/entries. To remove all the old entries, run sudo bash -c "cd /boot/loader/entries; ls | grep -v <current-generation-name> | xargs rm" (you might want to back up the entries somewhere to be safe)
Unification is a method of solving equations by substitution. This sentence alone doesn’t give enough information to implement an algorithm, so let’s define some vocabulary to write a more rigorous definition.
term: A term is an abstract syntax tree representing the language that will be used. In order for unification to proceed, a term must have some value that represents a variable, and some values that represent constants - the idea being that variables can be replaced during unification, but constants cannot.
equation: An equation is a pair of terms, written term_1 = term_2.
syntactic equality: Two terms are syntactically equal if their ASTs match exactly.
equivalence: Two terms are equivalent if there exists some substitution that would make them syntactically equal.
solved: An equation is solved if the left and right hand sides are syntactically equal.
substitution: A substitution is a set of mappings from variables to terms, written { var_1 => term_1, ..., var_i => term_i }.
application: A substitution can be applied to a value containing variables - written subs(value). Applying an empty substitution to a value does not change the value. For reasons that will be explained later, a substitution is only valid if no variable on the left side of a mapping occurs in the term on the right side of its respective mapping.
minimal: A substitution is minimal if no variables in the right hand sides of any mapping occur on any left hand side of any mapping. In other words, if applying the substitution is idempotent: subs(subs(value)) = subs(value)
With this vocabulary, we can now better define unification:
Given a set of equations eqs
, find a minimal substitution sub
such that
every equation in sub(eqs)
is solved
Unification is the backbone of type inference in the HM type theory. The actual type inference algorithm is not important here- just how unification works on HM terms.
A term in HM is defined as term := term -> term | primitive | variable, where primitive is an element of a set of primitive types and variable is a string. To satisfy the requirements of unification, primitives are constants and variables are, of course, variables.
Examples of syntactically equal HM terms:
a
and a
primitive
and primitive
a -> a
and a -> a
Examples of equivalent HM terms:
a
and c
primitive
and d
(a -> b) -> c
and d -> e
When conducting type inference for an expression, its type is initially set to a new variable. A set of equations is generated by traversing the expression’s AST; these equations are then unified, which yields a solution for the expression’s type variable.
A simple unification algorithm can be described as follows:
unify(equations):
solutions := {}
ix := 0
while ix < equations.length:
equation := equations[ix]
if solved(equation):
ix++
continue
substitution := {}
if is_variable(equation.lhs):
if occurs(equation.lhs, equation.rhs):
error("Variable occurred on both sides of an equation")
substitution := {equation.lhs => equation.rhs}
ix++
elif is_variable(equation.rhs):
swap_sides(equations[ix])
elif equivalent(equation.lhs, equation.rhs):
substitution := unify(implied_equations(equation))
else:
error("Cannot unify non-equivalent terms")
substitution.apply(solutions)
substitution.apply(equations)
solutions.union(substitution)
return solutions
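The pseudocode translates fairly directly to a runnable sketch. In this hypothetical Python encoding (my choice, not the article’s), a variable is a string, an arrow type is a tuple ('->', left, right), and a primitive is ('prim', name):

```python
def is_var(t):
    return isinstance(t, str)

def occurs(var, term):
    """Does `var` appear anywhere inside `term`?"""
    if is_var(term):
        return term == var
    if term[0] == "->":
        return occurs(var, term[1]) or occurs(var, term[2])
    return False  # a primitive contains no variables

def substitute(subs, term):
    """Apply a substitution {var: term} to a term."""
    if is_var(term):
        return subs.get(term, term)
    if term[0] == "->":
        return ("->", substitute(subs, term[1]), substitute(subs, term[2]))
    return term

def unify(equations):
    """Return a minimal substitution solving all equations, or raise ValueError."""
    equations = list(equations)
    solutions = {}
    while equations:
        lhs, rhs = equations.pop()
        if lhs == rhs:  # already solved (syntactic equality)
            continue
        if not is_var(lhs) and is_var(rhs):
            lhs, rhs = rhs, lhs  # swap so the variable is on the left
        if is_var(lhs):
            if occurs(lhs, rhs):
                raise ValueError(f"occurs check failed: {lhs} in {rhs}")
            sub = {lhs: rhs}
            # Apply the new solution to everything learnt so far
            equations = [(substitute(sub, l), substitute(sub, r))
                         for l, r in equations]
            solutions = {v: substitute(sub, t) for v, t in solutions.items()}
            solutions[lhs] = rhs
        elif lhs[0] == "->" and rhs[0] == "->":
            # Equivalent arrow types imply equations between their parts
            equations += [(lhs[1], rhs[1]), (lhs[2], rhs[2])]
        else:
            raise ValueError(f"cannot unify {lhs} with {rhs}")
    return solutions
```

Unifying the earlier example (a -> b) -> c = d -> e yields {d => a -> b, c => e}, and an equation like a = a -> b raises an error at the occurs check instead of constructing an infinite solution.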
In essence the algorithm is “rearrange an equation so it is a solution, update everything according to this knowledge, remember the solution and continue”. Sounds like something we did a lot in school…
Why do we need occurs? This algorithm requires a substitution to ‘eliminate’ a variable from the problem. If a variable could also appear on the right side of a substitution then it would not be eliminated, instead constructing an infinite solution.
To demonstrate, let’s unify the HM equations {a = b -> c, a = d, b = d, a = c}
without
the occurs check:
equations = {a = b -> c, a = d, b = d, a = c}
solutions = {}
equations = {b -> c = d, b = d, b -> c = c} (removed a = b -> c, applied a => b -> c)
solutions = {a => b -> c} (added a => b -> c)
equations = {b = b -> c, b -> c = c} (removed b -> c = d, applied d => b -> c)
solutions = {a => b -> c, d => b -> c} (added d => b -> c)
equations = {(b -> c) -> c = c} (removed b = b -> c, applied b => b -> c)
solutions = {a => (b -> c) -> c, d => (b -> c) -> c, b => b -> c} (applied then added b => b -> c)
equations = {}
solutions = {a => (b -> (b -> c) -> c) -> c, d => (b -> (b -> c) -> c) -> (b -> c) -> c, b => b -> (b -> c) -> c, c => (b -> c) -> c} (applied then added c => (b -> c) -> c)
apply solutions to original equations - remember that the solutions should solve all the original equations:
a = b -> c
b -> (b -> c) -> c = b -> c (using a => ...)
b -> (b -> (b -> c) -> c) -> (b -> c) -> c = b -> (b -> c) -> c (using c => ...)
no matter how many times we do this the equation will never be solved...
Omitting the occurs check does not unify the equations according to our definition.
The Wikipedia entry for Unification is amazing and goes into much more depth.
The essential methods for enabling LINQ support are Select and SelectMany, implemented as extension methods. They have the following types:
SomeData<B> Select<A,B>(this SomeData<A> a, Func<A,B> f)
SomeData<B> SelectMany<A,B>(this SomeData<A> a, Func<A,SomeData<B>> f)
SomeData<C> SelectMany<A,B,C>(this SomeData<A> a, Func<A,SomeData<B>> f, Func<A,B,C> g) // Overloaded to reduce levels of nesting
With implementations of these three methods, it is possible to write a query expression such as:
SomeData<A> myA = ...;
SomeData<B> myB = ...;
Func<A,B,C> f = ...;
SomeData<C> myC = from a in myA
                  from b in myB
                  select f(a,b);
which will be compiled to something like:
SomeData<C> output = myA.SelectMany(a => myB, (a, b) => f(a, b));
Readers who are familiar with Haskell or similar functional languages will
notice that Select
is fmap
, SelectMany
is >>=
and the
from .. in .. select
syntax is equivalent to Monad comprehensions. The above
code would be written in Haskell as follows:
myA = ...
myB = ...
f a b = ...
myC = do
  a <- myA
  b <- myB
  return $ f a b
LINQ was designed to bring Monad comprehensions to C#. And it does. Almost.
Consider our query from earlier:
...
SomeData<C> myC = from a in myA
                  from b in myB
                  select f(a,b);
This seems like a common pattern. We don’t want to write this code over and
over, so we abstract myA
, myB
and f
and make the query into a method.
SomeData<C> CombineWith<A,B,C>(SomeData<A> myA, SomeData<B> myB, Func<A,B,C> f)
{
    return from a in myA from b in myB select f(a,b);
}
Now say we define a new data type to use with LINQ, call it OtherData<A>
, and
implement Select
and SelectMany
appropriately. We also want to implement
CombineWith
because from .. in .. from .. in .. select ..
is still a common
pattern that we want to avoid writing:
OtherData<C> CombineWith<A,B,C>(OtherData<A> myA, OtherData<B> myB, Func<A,B,C> f)
{
    return from a in myA from b in myB select f(a,b);
}
There is a pattern emerging. For every data type that we want to use with LINQ, one must reimplement all LINQ-specific methods specifically for that type.
This is an issue because it grossly violates DRY (don’t repeat yourself). A well-written program should not have duplicated code - it makes maintenance more laborious and increases the chance of bugs.
So in an effort to save ourselves time, we should abstract over this common pattern. We require a function that specifies: for all generic classes F<?> implementing Select and SelectMany, given an instance of F containing As, another instance of F containing Bs, and a Func<A,B,C>, return an F containing Cs.
It turns out that it’s actually impossible to write this method in C#. I’d like
to write something like
F<C> CombineWith<F<?>,A,B,C>(F<A> myA, F<B> myB, Func<A,B,C> f)
, but C# only
allows abstraction over non-generic types.
To add a little more weight to this revelation, let’s imagine that we could not abstract over the contents of a list, i.e. that the method List<A> Sort<A>(List<A> input) cannot be expressed in this language. Due to this limitation, we would have to create a new list class every time we needed a different element type inside the list, then reimplement Sort for each new class.
ListOfInt.Sort
ListOfBool.Sort
ListOfSomeData.Sort
…
This is again a terrible violation of the “don’t repeat yourself” principle.
You write n
implementations of Sort
, where n
is the number of sortable
classes. Imagine that each implementation used the
proven incorrect version of TimSort.
If you wanted to implement the correct version, you would have to update n
methods.
Also consider the implementation of
List<B> Map<A,B>(List<A> input, Func<A,B> f)
in a generic-less language. You
would have to write a different method for each inhabitant of A
and B
ListOfInt.MapToListOfInt
ListOfInt.MapToListOfBool
ListOfInt.MapToListOfSomeData
ListOfBool.MapToListOfBool
…
You write n^2 Map methods, where n is the number of mappable classes. More generally, in this generic-less language, you write O(n^m) methods, where m is the sum of should-be-generic inputs and should-be-generic outputs, and n is the number of should-be-generic classes.
This exponential growth of redundant nonsense also applies to our CombineWith
issue. For every LINQ-able class, you have to write a separate implementation
of CombineWith
, even though it’s exactly the same code!
Haskell (and other sane functional languages) uses a concept called “Higher Kinded Types” to address this problem. Every type has a “kind” (denoted *). Higher-kinded types are functions from kinds to kinds. Given a data declaration that has a single type variable, say Maybe a = Just a | Nothing, we say that Maybe has kind * -> *, which means that it is a higher-kinded type that takes a type of kind * and returns a type of kind *. In C#, every type must have kind *, i.e. if you have defined the class List<A> then you get a compile error if you refer to List without the type argument.
Let’s take another look at the Haskell implementation of CombineWith:
combineWith :: Monad m => m a -> m b -> (a -> b -> c) -> m c
combineWith myA myB f = do
  a <- myA
  b <- myB
  return $ f a b
In this function, and in the definition of the Monad typeclass (read: interface), m implicitly has kind * -> *. This function will work for any type that is
an instance of Monad (read: implements the Monad interface). In Haskell, this
code only needs to be written once. The cost of implementation and maintenance
of a group of functions has gone from O(n^m) to O(1).
Now you might say, “Well, I don’t use LINQ like that. I only use it for
IEnumerable
things”. This is akin to a user of our imaginary generic-less
language saying “Well, I don’t use Sort like that. I only sort lists of
integers”. It is agreed that a language without generics is counter to
productivity. It follows that a language without higher-kinded types is also
counter to productivity.
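As a point of contrast, here is roughly the single generic CombineWith the post is asking for, written in Python. Dynamic typing sidesteps the static restriction rather than refuting the argument: any object exposing map and flat_map works (those method names are my hypothetical convention — Python has no standard monad interface):

```python
class Maybe:
    """A minimal 'monadic' container exposing map and flat_map."""
    def __init__(self, value, present=True):
        self.value, self.present = value, present

    def map(self, f):
        # Apply f to the contained value, if there is one
        return Maybe(f(self.value)) if self.present else self

    def flat_map(self, f):
        # f returns a new Maybe; absent values short-circuit
        return f(self.value) if self.present else self

def combine_with(ma, mb, f):
    # One implementation for every class with flat_map/map:
    # the generic CombineWith that C# cannot express statically.
    return ma.flat_map(lambda a: mb.map(lambda b: f(a, b)))
```

For example, `combine_with(Maybe(2), Maybe(3), lambda a, b: a + b)` produces a present `Maybe` holding 5, while combining with an absent value propagates the absence — and the same combine_with would work unchanged on any other class implementing the two methods.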
$ sed -i "s/pattern/replacement/g" FILES
Last semester I had to write a static website by hand with no templating,
resulting in a lot of duplicated code across multiple pages. I had already
finished most of the project when I realised that the main page of the
project should be named index.html
instead of home.html
. I renamed the
file, but that left me with countless references to “home.html” that needed
to be changed, and I wanted to change them all at once. Enter sed
.
sed
allows the user to write programs which operate on streams of text.
It is run using the syntax
$ sed OPTIONS.. [SCRIPT] [FILENAME..]
To search and replace using sed
we use the s
command of the form
s/regex/replacement/flags
. Our sed
script would become
s/home\.html/index.html/g
. The .
needs to be escaped because .
on its own
matches any character in regex. The g
flag means to replace every occurrence
of the pattern, instead of just the first.
By default, sed
will only write the altered text to stdout
, so we need to
use the -i
flag to make the alterations inside the source file.
The final command is now
$ sed -i "s/home\.html/index.html/g" *.html
which will apply the sed program to all the HTML files in the directory. Easy!
In this article I will attempt to explain some of the instinctual problem solving techniques that experienced programmers use. Our problem will be “fizzbuzz”; a notorious yet straightforward problem used to separate programmers from non-programmers in job interviews. Its specification:
Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.
The first step in creating an answer is to examine the specification text
for keywords that you can translate into code. Consider the words “from
1 to 100”. If we want to access every number from 1 to 100, then the best
approach is to use a loop. All the other instructions apply to each individual
number, so the loop will contain this logic. for
and while
loops are
equally valid ways to complete the task, however I’ll use a for loop for this
example as it looks much cleaner.
for i in range(1,101): # range(a,b) has range a <= i < b
# do some things
The next few lines of the spec outline three conditions which could change what will be printed. These can all be expressed using “if-else” statements due to their boolean nature. Additionally, if none of the conditions are satisfied, then the number is to be printed. This “default” behaviour can be specified in the “else” section of the statement.
When there are multiple “if-else” statements checking the same variable, it’s best to use “elif” statement for all the options instead of nested “if-else”.
for i in range(1,101):
if multiple_of_three(i):
print("Fizz")
elif multiple_of_five(i):
print("Buzz")
elif multiple_of_three_and_five(i):
print("FizzBuzz")
else:
print(i)
Here I’ve used some placeholder functions to express the divisibility of the
number, but how should they be implemented? The simplest answer is to use
the modulus operator (%
). a % b
calculates the remainder of a / b
, so the
three multiple functions could be replaced by i % 3 == 0
, i % 5 == 0
and
i % 3 == 0 and i % 5 == 0
:
for i in range(1,101):
if i % 3 == 0:
print("Fizz")
elif i % 5 == 0:
print("Buzz")
elif i % 3 == 0 and i % 5 == 0:
print("FizzBuzz")
else:
print(i)
If one didn’t know of the modulus operator, however, the same functionality could be created using arithmetic:
A number a
is a multiple of another number b
if a / b
has no remainder.
There exist integers q and r with 0 <= r < b such that a = q * b + r. If a is a multiple of b then a / b = q, otherwise a / b = q + r / b. Either way, q = floor(a / b). Thus if a - b * floor(a / b) = 0 then a is a multiple of b:
from math import floor # import the floor function from Python's math module
def multiple_of(a,b):
# if a is a multiple of b
return (a - (b * floor(a/b)) == 0)
for i in range(1,101):
if multiple_of(i,3):
print("Fizz")
elif multiple_of(i,5):
print("Buzz")
elif multiple_of(i,3) and multiple_of(i,5):
print("FizzBuzz")
else:
print(i)
The next step is testing the code. For simple programs this can be done by running the code and looking at the output.
$ python fizzbuzz.py
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
Fizz
...
The output of the program lacks any mentions of “FizzBuzz”, printing “Fizz” instead. This is a clue that the problem lies in the condition evaluation. The numbers divisible by both three and five are evaluated as just being divisible by three. To fix this, either
...
# 1. changing condition evaluation order
if i % 3 == 0 and i % 5 == 0:
print("FizzBuzz")
elif i % 3 == 0: # numbers divisible by both three and five will never reach this condition
print("Fizz")
elif i % 5 == 0:
print("Buzz")
else:
print(i)
...
# 2. clarifying logic
if i % 3 == 0 and not i % 5 == 0: # we want to print fizz for numbers that are divisible by three and NOT divisible by five
print("Fizz")
elif i % 5 == 0 and not i % 3 == 0: # the opposite is true here
print("Buzz")
elif i % 3 == 0 and i % 5 == 0:
print("FizzBuzz")
else:
print(i)
Either way, the program now functions correctly.
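To make the fixed logic easy to spot-check, it can be pulled into a function that returns the string instead of printing it (this refactor is mine, not part of the original walkthrough):

```python
def fizzbuzz(i):
    # Check the combined condition first so it isn't shadowed by the others
    if i % 3 == 0 and i % 5 == 0:
        return "FizzBuzz"
    elif i % 3 == 0:
        return "Fizz"
    elif i % 5 == 0:
        return "Buzz"
    else:
        return str(i)

for i in range(1, 101):
    print(fizzbuzz(i))
```

Now individual cases can be verified directly, e.g. fizzbuzz(15) returns "FizzBuzz" rather than "Fizz".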
Now, I can imagine that some people would have questions like “How would I recognise that loops would be useful in this?” or “How do I know to use if-else statements for this problem?” There are three answers to these kinds of questions:
Know your tools.
Knowledge of languages and tools does not define you as a programmer, but this knowledge does influence how effectively you can solve a problem using a given language. Strong knowledge of language features will give you an indication of which tasks are easy or difficult using that language, and help you use the full potential of the language to complete the task.
Practise.
If you only read and never practise you will never reach your full potential. Practise is essential in reinforcing learning.
Get feedback.
Learning is much easier when you have someone more experienced to guide you. As well as practising on your own, get your work reviewed by someone who knows more than you. They will easily be able to point out inefficient or redundant code. Additionally, you need to take note of the tips they give you and then practise integrating them into your work. If you don’t take advice to heart then you will never improve.
In summary, when attempting programming problems you need to:
Learn your tools, use the tools and get feedback on your work to ensure constant improvement.
Good luck.
You input a number and want to manipulate it while printing the result each time. If there were no intermediate IO operations we could use the state monad with the following state changes:
add :: Int -> State Int ()
add n = state $ \s -> ((), s + n)

subtract :: Int -> State Int ()
subtract n = state $ \s -> ((), s - n)
chain them together:
manyOperations :: State Int ()
manyOperations = do
  add 1
  subtract 3
  add 5
  add 7
  subtract 22
then get the result:
(_, result) = runState manyOperations 5 :: ((), Int)
Now let’s consider how to print the state. If we want to preserve the above chaining syntax, we need a monad that threads the state between operations like State does, while also allowing IO actions at each step. This monad is called the state monad transformer.
The state monad transformer is defined as:
newtype StateT s m a = StateT { runStateT :: s -> m (a,s) }
Meaning that given an initial state s and a state transformer st, we can call runStateT st s to get a monad containing the state tuple.
The real beauty (or magic, as some would say) of this monad comes from the bind function. Let’s take a look at its definition:
(>>=) :: Monad n => StateT s n a -> (a -> StateT s n b) -> StateT s n b
m >>= k = StateT $ \s -> do
  ~(a, s') <- runStateT m s
  runStateT (k a) s'
Time to break it down. m is a state transformer. k is a function that takes a result of type a and returns a state transformer. The final state transformer, when run with an initial state, does the following:
Runs m with the initial state s
Passes the result a to k, returning a different state transformer
Runs that new state transformer with the updated state s'
This means that we will be able to keep using a simple chained sequence of monads.
How does this relate to the problem at hand?
The monad component of the state transformer allows us to execute IO operations which have access to the state
during the computation. Here is how the add
and subtract
functions can be written using the state transformer
monad:
add :: Int -> StateT Int IO ()
add n = StateT $ \s -> do
  print (s + n)
  return ((), s + n)

subtract :: Int -> StateT Int IO ()
subtract n = StateT $ \s -> do
  print (s - n)
  return ((), s - n)
We can still chain them using the same syntax as before:
manyOperations :: StateT Int IO ()
manyOperations = do
  add 1
  subtract 3
  add 5
  add 7
  subtract 22
and run it:
main = runStateT manyOperations 5
-- output:
-- 6
-- 3
-- 8
-- 15
-- -7
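The same idea — thread a state value through a sequence of steps while doing IO at each one — can be mimicked without a transformer by representing each step as a function from a state to a (result, new state) pair. A hand-rolled Python sketch (the names are mine, and this is only an analogy, not a real monad transformer):

```python
def add(n):
    def step(s):
        print(s + n)           # the interleaved IO action
        return (None, s + n)   # (result, new state)
    return step

def subtract(n):
    def step(s):
        print(s - n)
        return (None, s - n)
    return step

def run_state(steps, initial):
    """Sequence the steps, threading the state the way >>= does."""
    s = initial
    for step in steps:
        _, s = step(s)
    return s

final = run_state([add(1), subtract(3), add(5), add(7), subtract(22)], 5)
# state goes 6, 3, 8, 15, -7
```

The intermediate states printed (6, 3, 8, 15, -7) line up with the Haskell program's output, and `final` ends at -7.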