Extending OCaml programs with Lua (soupault got plugin support)
Most of the time, when people make extensible programs in typed functional languages,
they make a DSL, not least because it's much easier to make a DSL in a language with algebraic types
and pattern matching than in one without.
Some use cases really require a general-purpose language though. That's where things get more interesting. Commonly used embeddable interpreters such as Lua, Guile, or Chicken are written in C. It's possible to make OCaml or Haskell bindings for them and such bindings do exist, but that's two high level languages communicating through a low level one.
It would be much better to be able to expose native types to the embedded language in a type-safe and more or less convenient fashion. Here's my take on it.
The use case: soupault website generator
Soupault is a website generator based on HTML rewriting instead of template processing. I made it for my own website out of conceptual disagreement with the “classic” workflow with Markdown and front matter that no one seems to question.
I don’t mind it for blogs like this one since the blog format itself is rigid, but for non-blog websites it easily becomes limiting and forces you to either mix Markdown with HTML or invent custom extensions. Both approaches arguably defeat the purpose of Markdown.
Soupault works directly on HTML and uses CSS selectors for locating elements.
That allows you to use any imaginable formatting and also offers
features impossible with classic generators. You can make every page look different if you want:
how much of the page is a template and how much is content is up to you. Its TOC and footnotes widgets
can reuse existing ids and make the links persist even if the heading text changes completely.
It’s also quite easy to use as a drop-in workflow upgrade for handwritten websites or other generators,
without losing original URLs—it doesn’t force any workflow on you.
The cost of the templateless approach is that if something is not already supported, it cannot be done at all. Generators that use logicless templates exclusively have the same problem. The usual way to solve that problem is to support plugins.
Most generators fall into two categories: easily extensible but slow, or fast but not extensible. For those written in interpreted languages, like Jekyll, adding plugin support is trivial, and plugins are easy to distribute.
Hugo is well known for speed thanks to being a native executable, but it’s not extensible.
Wyam, written for .NET, is a rare example of a middle ground.
Can we combine the native full speed of core components with extensibility? Soupault is written in OCaml, which does support dynamic linking and it’s quite easy to use, but the real problem is distributing plugins. The users would need to compile them for their platform, and it’s obviously much harder than just dropping a file into the plugins directory.
The Lua-ML project
I discovered the Lua-ML project around the time I started working on soupault, so I immediately started wondering if I could use it.
Lua-ML is a pure OCaml implementation of Lua. The great thing about it is that it’s fully modular. You can replace any part with your own module as long as the interface is compatible. Link in new modules as if they were a part of the standard library? Sure. Replace a part of the standard library with your module? That’s possible. Replace the AST interpreter but keep the library? That’s possible too. There are no black boxes.
These are the good parts. The bad part is that it comes from a now defunct research project, a compiler backend named C--. One of the project members, Christian Lindig, salvaged it from C-- and published it on GitHub, but had not actively worked on it since.
The last time any non-maintenance work was done on it was around 2005 or so. I was to become its first real user, too.
It had a complicated build process due to its use of literate programming (even though most modules had no documentation)
and there was a fork of a 2004 version of the
Hashtbl module from the standard library inside it.
Fortunately, Christian turned out to be very far from an unreachable maintainer. He gave me write access to the original repository so that I could fix the issues, and answered all questions about the codebase he could answer.
After some long nights spent messing with the code, and with a patch by Gabriel Radanne that adds OPAM packaging, the build process was sane enough to make it a build dependency.
Another bad part is that it implements Lua 2.5,
which was a rather limited language. Many improvements, including
for loops, were only made later.
But it’s a start.
For a simple interpreter example you can easily play with, check out luaclient.ml. My goal for this post is to walk through a more realistic example from the soupault codebase that can be found in plugin_api.ml.
That was a rather long introduction. Let’s see what using Lua-ML actually looks like.
We’ll start with a plugin example. This is a very simple plugin replicating a
feature found in many website generators: turning relative links into absolute URLs.
    -- Converts relative links to absolute URLs
    -- e.g. "/about" -> "https://www.example.com/about"

    -- Get the URL from the widget config
    site_url = config["site_url"]

    if not Regex.match(site_url, "(.*)/$") then
      site_url = site_url .. "/"
    end

    links = HTML.select(page, "a")

    -- That's Lua 2.5, hand-cranked iteration...
    index, link = next(links)

    while index do
      href = HTML.get_attribute(link, "href")
      if href then
        -- Check if URL schema is present
        if not Regex.match(href, "^([a-zA-Z0-9]+):") then
          -- Remove leading slashes
          href = Regex.replace(href, "^/*", "")
          href = site_url .. href
          HTML.set_attribute(link, "href", href)
        end
      end
      index, link = next(links, index)
    end
Assembling the interpreter
“Abstract” types are known as
userdata in Lua. To expose our type to Lua,
we need to make a module matching this signature:
    module type USERDATA = sig
      type 'a t                (* type parameter will be Lua value *)
      val tname : string       (* name of this type, for projection errors *)
      val eq : ('a -> 'a -> bool) -> 'a t -> 'a t -> bool
      val to_string : ('a -> string) -> 'a t -> string
    end
So, we need a type, a string name for it, and functions for equality and string conversion.
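To make the shape of that signature concrete, here is a minimal, self-contained sketch that uses only the standard library. The signature is copied from above; the `Point` and `Pair` modules are hypothetical examples, not part of Lua-ML or soupault. `Point` ignores the type parameter, while `Pair` stores values of type `'a` and therefore has to use the callbacks it is given.

```ocaml
(* The signature from the article, repeated so this compiles on its own. *)
module type USERDATA = sig
  type 'a t
  val tname : string
  val eq : ('a -> 'a -> bool) -> 'a t -> 'a t -> bool
  val to_string : ('a -> string) -> 'a t -> string
end

(* Hypothetical userdata type that does not contain any Lua values,
   so the callbacks for 'a are simply ignored. *)
module Point : USERDATA with type 'a t = float * float = struct
  type 'a t = float * float
  let tname = "point"
  let eq _ (x1, y1) (x2, y2) = x1 = x2 && y1 = y2
  let to_string _ (x, y) = Printf.sprintf "(%g, %g)" x y
end

(* Hypothetical userdata type that does contain values of type 'a,
   so it delegates to the callbacks for comparing and printing them. *)
module Pair : USERDATA with type 'a t = 'a * 'a = struct
  type 'a t = 'a * 'a
  let tname = "pair"
  let eq value_eq (a1, b1) (a2, b2) = value_eq a1 a2 && value_eq b1 b2
  let to_string value_to_string (a, b) =
    Printf.sprintf "<%s, %s>" (value_to_string a) (value_to_string b)
end

let () =
  print_endline (Point.to_string (fun _ -> "") (1.0, 2.5));
  print_endline (Pair.to_string string_of_int (1, 2))
```

The soupault module shown next follows the `Point` pattern: its type does not depend on the parameter at all.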
The lambdasoup library uses phantom types to distinguish between element nodes and non-elements (roots, text, and whitespace) for better type safety: internally all nodes have the same structure, but their types are artificially made different so that you can’t do things that make no sense, like inserting a child into a text node. We’ll artificially force that type to be monomorphic with a simple sum type wrapper and some conversion/coercion functions:
    (* Plugin_error is an exception assumed to be defined elsewhere in the codebase *)
    module Html = struct
      type soup_wrapper =
        | GeneralNode of Soup.general Soup.node
        | ElementNode of Soup.element Soup.node
        | SoupNode of Soup.soup Soup.node

      type 'a t = soup_wrapper

      let tname = "html"

      (* Conversions come first because eq and to_string below use them *)
      let from_soup s = SoupNode s

      let from_element e = ElementNode e

      let to_element n =
        match n with
        | ElementNode n -> n
        | _ -> raise (Plugin_error "Expected an element, but found a document")

      let to_general n =
        match n with
        | GeneralNode n -> n
        | ElementNode n -> Soup.coerce n
        | SoupNode n -> Soup.coerce n

      let eq _ = fun x y -> Soup.equal_modulo_whitespace (to_general x) (to_general y)

      let to_string _ s = Soup.to_string (to_general s)

      let select soup selector =
        to_general soup |> Soup.select selector |> Soup.to_list |> List.map (fun x -> ElementNode x)

      let get_attribute node attr_name =
        to_element node |> Soup.attribute attr_name

      let set_attribute node attr_name attr_value =
        to_element node |> Soup.set_attribute attr_name attr_value
    end
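For readers who have not met phantom types before, the following is a small standalone sketch of the idea, using only the standard library. Everything in it (`Node`, `wrapped`, `Wrong_kind`, `append_child_dynamic`) is a made-up stand-in, not the lambdasoup API. It shows the two halves of the trick: the phantom parameter makes misuse a compile-time error, and the sum type wrapper trades that for a run-time check so that a single monomorphic type can be handed to a dynamically typed language.

```ocaml
(* A toy node library with a phantom type parameter. *)
module Node : sig
  type element
  type text
  type 'kind t
  val element : string -> element t
  val text : string -> text t
  (* Only elements can receive children; a text parent does not type check. *)
  val append_child : element t -> 'kind t -> unit
  val describe : 'kind t -> string
end = struct
  type element
  type text
  type raw = { label : string; mutable children : raw list }
  (* Internally every node has the same representation. *)
  type 'kind t = raw
  let element name = { label = "<" ^ name ^ ">"; children = [] }
  let text s = { label = s; children = [] }
  let append_child parent child = parent.children <- parent.children @ [child]
  let describe n =
    Printf.sprintf "%s with %d children" n.label (List.length n.children)
end

(* The wrapper erases the phantom parameter: one type for all nodes. *)
type wrapped =
  | Element of Node.element Node.t
  | Text of Node.text Node.t

exception Wrong_kind of string

(* The check the type system used to do now happens at run time. *)
let to_element = function
  | Element e -> e
  | Text _ -> raise (Wrong_kind "expected an element, but found text")

let append_child_dynamic parent child =
  let parent = to_element parent in
  match child with
  | Element c -> Node.append_child parent c
  | Text c -> Node.append_child parent c

let () =
  let div = Element (Node.element "div") in
  let hello = Text (Node.text "hello") in
  append_child_dynamic div hello;
  print_endline (Node.describe (to_element div));
  match append_child_dynamic hello div with
  | () -> print_endline "unexpectedly succeeded"
  | exception Wrong_kind msg -> print_endline ("error: " ^ msg)
```

Writing `Node.append_child (Node.text "a") (Node.text "b")` directly would be rejected by the compiler, whereas the wrapped version only fails when it runs. That is the price of exposing the type to scripts.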
Now we need to make modules that provide embedding and projection for our types (that is, conversion
to and from Lua values). For that we need to feed our module to a functor from Lua.Lib.Combine.
It provides several functors for different numbers of modules to handle. We’ll use T2, the
two-module one, to handle the built-in
Luaiolib.T module (that provides I/O functions) and our module at once:
    module T = Lua.Lib.Combine.T2 (Luaiolib.T) (Html)

    module LuaioT = T.TV1
    module HtmlT = T.TV2
The LuaioT and HtmlT modules are now ready to use. Use for what exactly? For assembling a complete Lua library.
The regex module works with strings, which are supported by Lua-ML without resorting to custom types,
so it’s just a simple wrapper for
ocaml-re and we do not need to do anything special with it.
    module Re_wrapper = struct
      let replace ?(all=false) s pat sub =
        try
          let re = Re.Perl.compile_pat pat in
          Re.replace ~all:all ~f:(fun _ -> sub) re s
        with Re__Perl.Parse_error | Re__Perl.Not_supported ->
          raise (Plugin_error (Printf.sprintf "Malformed regex \"%s\"" pat))

      (* ... *)
Assembling the library
This is the complicated part. The first stage is to create a functor that will convert our
Html module to a Lua library and register the Lua-visible
HTML and Regex modules in the interpreter state.
The functor will take a
Lua.Lib.TYPEVIEW module set up with type
'a Html.t and use it to make a module (Map in the code below)
with embedding and projection functions.
Simply creating such a module will not yet expose it to Lua. For that we need to pass a list of
(function name, function) tuples to C.register_module.
Lua-friendly functions are created from OCaml functions using combinators such as V.efunc and **-> from the
C module, which is passed in as the Lua.Lib.CORE functor argument.
    module MakeLib
      (HtmlV: Lua.Lib.TYPEVIEW with type 'a t = 'a Html.t) :
      Lua.Lib.USERCODE with type 'a userdata' = 'a HtmlV.combined =
    struct
      type 'a userdata' = 'a HtmlV.combined

      module M (C: Lua.Lib.CORE with type 'a V.userdata' = 'a userdata') = struct
        module V = C.V
        let ( **-> ) = V.( **-> )
        let ( **->> ) x y = x **-> V.result y

        module Map = struct
          let html = HtmlV.makemap V.userdata V.projection
        end (* Map *)

        let init g =
          C.register_module "HTML" [
            "select",
              V.efunc (Map.html **-> V.string **->> (V.list Map.html)) Html.select;
            "get_attribute",
              V.efunc (Map.html **-> V.string **->> V.option V.string) Html.get_attribute;
            "set_attribute",
              V.efunc (Map.html **-> V.string **-> V.string **->> V.unit) Html.set_attribute;
            (* ... *)
          ] g;

          C.register_module "Regex" [
            "replace",
              V.efunc (V.string **-> V.string **-> V.string **->> V.string)
                (Re_wrapper.replace ~all:false);
            (* ... *)
          ] g
      end (* M *)
    end (* MakeLib *)
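The `**->` and `**->>` combinators can look like magic at first. Below is a deliberately tiny re-implementation of the idea using only the standard library. It is a sketch under my own assumptions, not the actual Lua-ML code: the `value` type, the `map` and `func` records, and the names `str`, `num`, and `call` are all invented for illustration. The point it tries to show is that a chain of combinators builds, step by step, a function that projects dynamic arguments into OCaml values, applies the OCaml function, and embeds the result back.

```ocaml
(* A tiny dynamically typed value, standing in for Lua values. *)
type value =
  | VNil
  | VNumber of float
  | VString of string
  | VFunction of (value list -> value list)

exception Projection of string

(* An embedding/projection pair for an OCaml type 'a. *)
type 'a map = { embed : 'a -> value; project : value -> 'a }

let str = {
  embed = (fun s -> VString s);
  project = (function VString s -> s | _ -> raise (Projection "string expected"));
}

let num = {
  embed = (fun n -> VNumber n);
  project = (function VNumber n -> n | _ -> raise (Projection "number expected"));
}

(* How to apply an OCaml value of type 'f to a list of dynamic arguments. *)
type 'f func = { apply : 'f -> value list -> value list }

(* End of the chain: no arguments left, embed the result. *)
let result (r : 'a map) : 'a func =
  { apply = (fun v args ->
      match args with
      | [] -> [r.embed v]
      | _ -> raise (Projection "too many arguments")) }

(* One more argument: project it, apply, and let the rest handle the tail. *)
let ( **-> ) (arg : 'a map) (rest : 'b func) : ('a -> 'b) func =
  { apply = (fun f args ->
      match args with
      | [] -> raise (Projection "too few arguments")
      | x :: xs -> rest.apply (f (arg.project x)) xs) }

let ( **->> ) arg r = arg **-> result r

(* Turn a typed OCaml function into a dynamic value. *)
let efunc (ty : 'f func) (f : 'f) : value = VFunction (ty.apply f)

let call f args =
  match f with
  | VFunction g -> g args
  | _ -> raise (Projection "function expected")

let concat = efunc (str **-> str **->> str) (fun a b -> a ^ b)
let repeat = efunc (str **-> num **->> str)
    (fun s n -> String.concat "" (List.init (int_of_float n) (fun _ -> s)))

let () =
  match call concat [VString "foo"; VString "bar"] with
  | [VString s] -> print_endline s
  | _ -> print_endline "unexpected result"
```

Because both operators start with `**`, they are right-associative, so `str **-> str **->> str` reads like the OCaml type `string -> string -> string`, and the type checker rejects an OCaml function that does not match the description.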
Now we need to link those modules together:
    module W = Lua.Lib.WithType (T)

    module C =
      Lua.Lib.Combine.C5
        (Luaiolib.Make(LuaioT))
        (Luacamllib.Make(LuaioT))
        (W (Luastrlib.M))
        (W (Luamathlib.M))
        (MakeLib (HtmlT))
And finally create an interpreter module:
    module I =
      Lua.MakeInterp
        (Lua.Parser.MakeStandard)
        (Lua.MakeEval (T) (C))
Passing values to the interpreter
That’s all good, but to make it possible for plugins to modify internal values of our program, we need to pass them to the interpreter.
This is where the
HtmlT module we created is needed. It provides a makemap
function that creates a record whose fields are functions, among them the
embed and project functions we need:
    let lua_of_soup s =
      let v = HtmlT.makemap I.Value.userdata I.Value.projection in
      v.embed s

    let soup_of_lua l =
      let v = HtmlT.makemap I.Value.userdata I.Value.projection in
      v.project l
Running the interpreter
Finally we can set up an environment and run Lua code in it:
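    let () =
      let state = I.mk () in
      let soup = Soup.parse "<p>hello world</p>" in
      (* Make the parsed page visible to Lua code as the global "page" *)
      let () = I.register_globals ["page", lua_of_soup (Html.SoupNode soup)] state in
      (* The returned list of Lua values is ignored here *)
      let _ = I.dostring state "print(page)" in
      ()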
The I.dostring and I.dofile functions currently return a list of Lua values.
That is not very easy to work with and, worse, execution errors are only logged
to stderr, so the caller has no easy way to see whether plugin execution succeeded
or failed. That’s definitely one of the things to fix.