Exploring a dependency graph with Prolog

I was working on my (currently private) YouTube feed reader project, watching the dependencies build after I upgraded the GHC version, when I noticed an odd transitive dependency: happy. happy is a parser generator, and this project is a web app. Why does my web app depend on a parser generator?

This reminded me of my recent Haskell dependency graph exploration, where I wondered whether there was a good way to query the dependency graph. It then occurred to me that Datalog might be a good query language for this. I couldn’t find a canonical FOSS Datalog implementation, so I went with GNU Prolog instead. Here’s how it turned out.

Setup

We’re going to turn the output of ghc-pkg dot into a set of Prolog facts. An edge from a to b will be encoded as a fact depends_on(a, b).
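
For example, a dot edge line like "aeson-2.2.3.0" -> "hashable-1.5.0.0" (the package names here are just illustrative) becomes the fact:

depends_on('aeson-2.2.3.0', 'hashable-1.5.0.0').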

$ cat > dot2pl.py <<EOF
import re

# Convert one line of ghc-pkg dot output into Prolog facts.
def dot_to_prolog(dot_line):
    edges = re.findall(r'"(.+)"\s*->\s*"(.+)"', dot_line)

    for source, target in edges:
        print(f"depends_on('{source}', '{target}').")

# The dot output opens with a "digraph {" header...
line = input()
if line != "digraph {":
    raise ValueError(f'expected "digraph {{", got {line!r}')

# ...and closes with "}"; every line in between describes edges.
while True:
    line = input()
    if line == "}":
        break
    dot_to_prolog(line)
EOF
(no output)

Where are my package databases?

$ cabal exec -- bash -c 'cat $(printenv GHC_ENVIRONMENT) | head -n 20'
(output)

Configuration is affected by the following files:
- cabal.project
-- This is a GHC environment file written by cabal. This means you can
-- run ghc or ghci and get the environment of the project as a whole.
-- But you still need to use cabal repl $target to get the environment
-- of specific components (libs, exes, tests etc) because each one can
-- have its own source dirs, cpp flags etc.
--
clear-package-db
global-package-db
package-db /home/isaac/.local/state/cabal/store/ghc-9.8.4-4a67/package.db
package-db /home/isaac/youtube-feeds/dist-newstyle/packagedb/ghc-9.8.4
package-id youtube-feeds-atom-0.1.0.0-inplace
package-id base-4.19.2.0-32b8
package-id ghc-bignum-1.3-455a
package-id ghc-prim-0.11.0-3f63
package-id rts-1.0.2
package-id bytestring-0.12.1.0-2a7c
package-id deepseq-1.5.1.0-88f5
package-id template-haskell-2.21.0.0-bbd9
package-id ghc-boot-th-9.8.4-77b2
package-id pretty-1.1.3.6-6e96

Generate the facts.

$ ghc-pkg --package-db=/home/isaac/.local/state/cabal/store/ghc-9.8.4-4a67/package.db --package-db=/home/isaac/youtube-feeds/dist-newstyle/packagedb/ghc-9.8.4 dot | python3 dot2pl.py > deps.pl
(no output)

Now add some definitions for graph manipulation.

why_depends is named after nix why-depends, which is the canonical way to ask this question of installed Nix derivations. Part of my inspiration for this investigation is the sense that a question like why-depends shouldn’t be baked into the CLI of a tool like Nix. Instead, there should be a language in which it’s easy to ask that question and all the other questions that would never have been anticipated by the tool’s authors.

$ cat > graph.pl <<EOF
% Identify nodes (any package with at least one outgoing dependency edge).
node(X) :- depends_on(X, _).

% Identify paths from one node to another.
why_depends(X, Y, [X, Y]) :- depends_on(X, Y).
why_depends(X, Z, [X, Y | P]) :- depends_on(X, Y), why_depends(Y, Z, [Y | P]).
EOF
(no output)

Exploration

$ gprolog
(output)
GNU Prolog 1.5.0 (64 bits)
Compiled Jul  8 2021, 09:35:47 with gcc
Copyright (C) 1999-2025 Daniel Diaz

| ?-
| ?- [deps]. [graph].
(output)

compiling /home/isaac/youtube-feeds/deps.pl for byte code...
/home/isaac/youtube-feeds/deps.pl compiled, 335 lines read - 53725 bytes written, 8 ms

(2 ms) yes
compiling /home/isaac/youtube-feeds/graph.pl for byte code...
/home/isaac/youtube-feeds/graph.pl compiled, 4 lines read - 993 bytes written, 2 ms

yes

What dependencies do we have?

| ?- setof(N, node(N), Ns).
(output)

Ns = ['MemoTrie-0.6.11','QuickCheck-2.15.0.1','aeson-2.2.3.0','ansi-terminal-1.1.2','ansi-terminal-types-1.1','asn1-encoding-0.9.6','asn1-parse-0.9.5','asn1-types-0.3.4','async-2.2.5','attoparsec-0.14.4','base64-1.0','bifunctors-5.6.2','blaze-textual-0.2.3.1','case-insensitive-1.2.1.0','cborg-0.2.10.0','comonad-5.0.9','contravariant-1.5.5','cookie-0.5.0','crypton-1.0.1','crypton-connection-0.4.3','crypton-x509-1.7.7','crypton-x509-store-1.6.9','crypton-x509-system-1.6.7','crypton-x509-validation-1.6.13','data-default-class-0.2.0.0','data-fix-0.3.4','distributive-0.6.2.1','generics-sop-0.5.1.4','graphviz-2999.20.2.0','happy-lib-2.1.3','hashable-1.5.0.0','http-client-0.7.18','http-client-tls-0.3.6.4','http-date-0.0.11','http-media-0.8.1.1','http-semantics-0.3.0','http-types-0.12.4','http2-5.3.9','indexed-traversable-instances-0.1.2','integer-conversion-0.1.1','iproute-1.7.15','memory-0.18.0','network-control-0.1.3','network-uri-2.6.4.2','old-time-1.1.0.4','pem-0.2.4','pretty-show-1.10','prettyprinter-ansi-terminal-1.1.3','psqueues-0.2.8.0','quickcheck-state-machine-0.10.1','random-1.2.1.3','recv-0.1.0','scientific-0.3.8.0','semialign-1.3.1','semigroupoids-6.0.1','serialise-0.2.6.1','simple-sendfile-0.2.32','socks-0.6.1','sqlite-simple-0.4.19.0','streaming-commons-0.2.2.6','strict-0.5.1','temporary-1.3','text-iso8601-0.1.1','text-short-0.1.6','these-1.2.1','time-compat-1.9.8','time-manager-0.2.2','tls-2.1.6','unix-time-0.4.16','unliftio-0.2.25.0','unordered-containers-0.2.20','uri-encode-1.5.0.7','uuid-types-1.0.6','vault-0.3.1.5','vector-0.13.2.0','wai-3.2.4','warp-3.4.7','witherable-0.5','wl-pprint-text-1.2.0.2','youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0','z-happy-lib-z-backend-glr-2.1.3','z-happy-lib-z-backend-lalr-2.1.3','z-happy-lib-z-frontend-2.1.3','z-happy-lib-z-tabular-2.1.3','z-quickcheck-state-machine-z-no-vendored-treediff-0.10.1']

yes

Looks good.

I can’t spot happy in that list at a glance. Is it in there somewhere?

| ?- setof(N, (node(N), atom_concat(happy, _, N)), Ns).
(output)

Ns = ['happy-lib-2.1.3']

yes

Cool. What are the names of my project’s packages again? (The Suffix^ in the next query marks Suffix as existentially quantified, so setof doesn’t return a separate result set for each suffix.)

| ?- setof(N, Suffix^(node(N), atom_concat('youtube-feeds', Suffix, N)), Ns).
(output)

Ns = ['youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0']

yes

Let’s see whether the first one depends on happy.

| ?- why_depends('youtube-feeds-log-0.1.0.0', 'happy-lib-2.1.3', Path).
(output)

no

It doesn’t. Let’s check them all.

| ?- setof(N, Suffix^(node(N), atom_concat('youtube-feeds', Suffix, N)), Ns), member(Package, Ns), why_depends(Package, 'happy-lib-2.1.3', Path).
(output)

no

Okay, that’s surprising. I definitely saw happy being built.

Well, what depends on happy-lib-2.1.3?

| ?- setof(P, depends_on(P, 'happy-lib-2.1.3'), Ps).
(output)

no

Hm. Okay, I think happy is an executable that’s required to build another package, so the dependency isn’t recorded as an edge in the package database.

Since that didn’t work out, let’s play with this approach by looking for the ways I depend on cborg-0.2.10.0.

| ?- setof(N, Suffix^(node(N), atom_concat('youtube-feeds', Suffix, N)), Ns), member(Package, Ns), why_depends(Package, 'cborg-0.2.10.0', Path).
(output)

Ns = ['youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0']
Package = 'youtube-feeds-oauth2-0.1.0.0'
Path = ['youtube-feeds-oauth2-0.1.0.0','http-client-tls-0.3.6.4','crypton-connection-0.4.3','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'] ? ;

Ns = ['youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0']
Package = 'youtube-feeds-oauth2-0.1.0.0'
Path = ['youtube-feeds-oauth2-0.1.0.0','http-client-tls-0.3.6.4','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'] ? ;

Ns = ['youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0']
Package = 'youtube-feeds-server-0.1.0.0'
Path = ['youtube-feeds-server-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','http-client-tls-0.3.6.4','crypton-connection-0.4.3','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'] ? ;

Ns = ['youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0']
Package = 'youtube-feeds-server-0.1.0.0'
Path = ['youtube-feeds-server-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','http-client-tls-0.3.6.4','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'] ? ;

Ns = ['youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0']
Package = 'youtube-feeds-server-0.1.0.0'
Path = ['youtube-feeds-server-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0','http-client-tls-0.3.6.4','crypton-connection-0.4.3','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'] ? ;

Ns = ['youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0']
Package = 'youtube-feeds-server-0.1.0.0'
Path = ['youtube-feeds-server-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0','http-client-tls-0.3.6.4','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'] ? ;

Ns = ['youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0']
Package = 'youtube-feeds-server-0.1.0.0'
Path = ['youtube-feeds-server-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','http-client-tls-0.3.6.4','crypton-connection-0.4.3','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'] ? ;

Ns = ['youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0']
Package = 'youtube-feeds-server-0.1.0.0'
Path = ['youtube-feeds-server-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','http-client-tls-0.3.6.4','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'] ? ;

Ns = ['youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0']
Package = 'youtube-feeds-youtubev3-api-0.1.0.0'
Path = ['youtube-feeds-youtubev3-api-0.1.0.0','http-client-tls-0.3.6.4','crypton-connection-0.4.3','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'] ? ;

Ns = ['youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0']
Package = 'youtube-feeds-youtubev3-api-0.1.0.0'
Path = ['youtube-feeds-youtubev3-api-0.1.0.0','http-client-tls-0.3.6.4','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'] ? ;

Ns = ['youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0']
Package = 'youtube-feeds-youtubev3-api-0.1.0.0'
Path = ['youtube-feeds-youtubev3-api-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','http-client-tls-0.3.6.4','crypton-connection-0.4.3','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'] ? ;

Ns = ['youtube-feeds-log-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','youtube-feeds-server-0.1.0.0','youtube-feeds-sql-0.1.0.0','youtube-feeds-youtubev3-api-0.1.0.0']
Package = 'youtube-feeds-youtubev3-api-0.1.0.0'
Path = ['youtube-feeds-youtubev3-api-0.1.0.0','youtube-feeds-oauth2-0.1.0.0','http-client-tls-0.3.6.4','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'] ? ;

(2 ms) no

There’s a lot of duplication because some of the project’s packages depend on each other. I think a better way to ask this question is to get all the project’s direct dependencies that transitively depend on the target.

| ?- assertz((project_packages(X) :- setof(N, Suffix^(node(N), atom_concat('youtube-feeds', Suffix, N)), X))).
(output)

yes
| ?- findall(Path, (project_packages(Packages), member(Package, Packages), depends_on(Package, Dep), \+ member(Dep, Packages), why_depends(Dep, 'cborg-0.2.10.0', Path)), Results).
(output)

Results = [['http-client-tls-0.3.6.4','crypton-connection-0.4.3','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'],['http-client-tls-0.3.6.4','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'],['http-client-tls-0.3.6.4','crypton-connection-0.4.3','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0'],['http-client-tls-0.3.6.4','tls-2.1.6','serialise-0.2.6.1','cborg-0.2.10.0']]

yes

It looks like my project only depends on cborg-0.2.10.0 via http-client-tls-0.3.6.4. Let’s verify that more concisely.

| ?- assertz((transitively_depends_on(X, Y) :- depends_on(X, Y))).
(output)

yes
| ?- assertz((transitively_depends_on(X, Z) :- depends_on(X, Y), transitively_depends_on(Y, Z))).
(output)

yes
| ?- setof(Dep, Packages^Package^(project_packages(Packages), member(Package, Packages), depends_on(Package, Dep), \+ member(Dep, Packages), transitively_depends_on(Dep, 'cborg-0.2.10.0')), Results).
(output)

Results = ['http-client-tls-0.3.6.4']

yes

Thoughts and ideas

Include build tool dependencies in the graph

I didn’t manage to figure out how my project transitively depends on happy because it’s probably used as a build tool somewhere. The GHC package database doesn’t seem to store build tool information, so the next step would be to find all my dependencies’ Cabal files and include build-tool-depends information in my Prolog facts.
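
If those facts existed (say, as a hypothetical build_tool_depends/2 predicate), a sketch of the extended path search might look like this:

:- dynamic(build_tool_depends/2).
% Hypothetical facts extracted from build-tool-depends fields, e.g.
% build_tool_depends('some-package-1.2.3', 'happy-2.1.3').

% An edge is either a library dependency or a build-tool dependency.
depends_on_any(X, Y) :- depends_on(X, Y).
depends_on_any(X, Y) :- build_tool_depends(X, Y).

why_depends_any(X, Y, [X, Y]) :- depends_on_any(X, Y).
why_depends_any(X, Z, [X, Y | P]) :- depends_on_any(X, Y), why_depends_any(Y, Z, [Y | P]).

Rerunning the earlier queries with why_depends_any instead of why_depends should then surface the missing happy edge.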

Create a dependency graph for all of Hackage

To really test the performance of this sort of query language, I could build a facts database for the entirety of Hackage by extracting the Cabal files from the Hackage index.

Use Datalog to query the Nix store

The Nix store’s dependency graph was already on my mind. What’s it like to query with Datalog? It will probably be easier than Hackage, because Nix derivations can be dumped as JSON, whereas Hackage uses the Cabal file format.
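
The facts could keep exactly the same shape, with store paths as the nodes (the hashes below are placeholders):

depends_on('/nix/store/xxxxxxxx-curl-8.6.0', '/nix/store/yyyyyyyy-openssl-3.0.13').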

Datalog & Postgres?

I really enjoyed writing these Prolog queries. They make SQL seem clunky in comparison. I’d be interested in trying Datalog for some typical “OLTP” / “CRUD” applications, and it would be nice to avoid reinventing DBMS wheels.

Standalone relational storage engines?

It would be cool if relational databases like Postgres and SQLite had explicitly decoupled storage engines. My (lay) impression is that these are monolithic systems that aren’t intended to support multiple frontends to the same underlying storage.

A standalone relational storage engine would be a C library that efficiently stores and searches records on disk or in memory. This library can then be used by a DBMS for storage management. Because it’s a standalone library, you’re no longer locked in to that particular DBMS. If I want a Datalog frontend, I can write it using the same storage library and have something that’s compatible with existing databases.

The pieces are probably all there to do this with something like Postgres; the storage format is well-documented and I wouldn’t be surprised if it were appropriately decoupled in the Postgres codebase. But I feel like the monolithic style of DBMSs discourages this kind of thinking. In contrast, imagine something like “ZeroMQ for database storage”.
