in which web pages are served but also fetched

I'm not much of a "web programmer". But when I do make web sites, I mostly like to write programs that spit out HTML files, and then put them on my web server. It's simple, fast, and gets the job done for most of the web sites I've needed. Every so often I come across something that needs a little more functionality than static pages can offer, and in those cases I'd put up a 15-line CGI script to save off some data for later analysis or whatever. Works great, and I highly recommend it when it fits the problem at hand.

But what about sites that are mostly dynamic? Using CGI in that case might not be a great fit. Or what if you need to use a server that doesn't support CGI at all? If you're already using nginx for your site, then OpenResty looks like a compelling choice. But I haven't used nginx in ages. I was curious about more standalone alternatives. That's when I discovered moonmint.

The moonmint web framework was created by Calvin Rose, who also created the first version of Fennel¹. It is written in Lua, but it's very easy to use from Fennel. Now of course Lua does not have any functionality built-in from which you can make a web server. You can't open a socket without some third-party code. Often people use luasocket for this, but moonmint takes a different approach and uses luv instead, which is a library that provides bindings to libuv, a multi-platform asynchronous I/O framework.

My goal was to build a hyper-personalized search engine that indexed just the links I've posted. This felt like a good fit for moonmint. I found that SQLite contains a very capable full-text search engine called fts5 and decided to use that to store the pages. For the final piece of the puzzle, I used pandoc to convert HTML into plain text suitable for indexing.

Once I had all the pieces put together, the first working version of the site came together very quickly. I fed it a list of URLs from my bookmarks file and was able to index and query them with ease in around 160 lines of code. But the problem with this first version is that I was feeding the URLs directly to pandoc, which was very convenient when the pages loaded, but pandoc would index them regardless of the response code, so I got a lot of 404 pages in the index. So I needed to start making these HTTP requests myself in order to ensure that only successful requests got their pages included.

And this is where things started to get tricky, because making HTTPS requests on the Lua runtime is ... unfortunately a bit of an achilles' heel at this point. There are a lot of options, but none of them are without significant problems.

The moonmint framework actually came with a module for making HTTPS requests, but its relied on a luarocks dependency for its TLS functionality, and that dependency has since introduced incompatible changes. Due to luarocks version philosophy of "just give me whatever version you feel like", it has stopped working with moonmint. These kinds of problems are predictably somewhat common with luarocks, so I try to use other ways to get libraries when possible².
Installing via apt-get instead of luarocks will tend to get you more stable results. There are two Lua HTTP clients available in apt-get on debian stable, which is what I run on my machines: lua-http and lua-curl. Neither of these are packaged for use with Lua 5.4.
I've used luasocket for projects in the past; mostly for IRC stuff. It's pretty good for that! It doesn't do TLS on its own tho. There is a luasec library that offers a similar API but with added TLS capabilities. It also wraps luasocket's HTTP client in an HTTPS variant. However, luasec does not perform any certificate verification by default! This makes its out-of-the-box behavior very unsafe, and I have to recommend against using it for anything. While it is possible to enable certificate verification, leaving such an unsafe default shows very poor decision-making, and I would not trust this library to do other things right if they get such a basic thing wrong.
The latest one I've seen is lua-https, created by the folks behind LÃVE, the 2D game framework. My project has nothing to do with games, but the LÃVE folks tend to do a good job with their APIs, and I figured it would be worth giving it a shot. It's not available in apt-get yet, but unlike most Lua libraries that have C code included, it's pretty easy to build yourself, so I added it as a git submodule and threw a handful of lines into my Makefile to compile it³. The only downside I've seen so far with this library is that it does not support streaming the responses. For my indexer, it would be nice if we could look at the HTTP headers before fetching the whole request, so I could skip known-unusable mime types like videos. However, given the alternatives, this seemed like the best choice for me.

My situation is a bit unusual. I have indexing runs which happen in batch jobs completely disconnected from the web app. They populate the SQLite database, which is only ever queried from the web app. The web app doesn't make any of its own HTTPS requests. If it did, using lua-https would mean that those requests would block until they completed, during which the server could not process any other requests. The site I'm building is a hyper-personalized search engine. You're welcome to use it too, of course, but it's really most useful to me. It has a target audience of one. So I'm not at all concerned about problems like blocking the main thread. If I were, I would need to replace it with a non-blocking equivalent, ideally tied into the event loop of luv.⁴

My search engine is also unusual because its indexer does not really "crawl" at all; that is, it fetches a list of pages but does not recursively branch out to the pages that those pages link to. That means the index remains pretty small. For a general-purpose search engine, this would render it more or less useless, but for a hyper-personalized search engine, it's honestly not that bad, and it makes it much simpler to code. I've gone into more detail on the site about why I built my own search engine in this unusual way so I won't get too much more into that here, but all my URLs so far come either from my bookmarks file, or from links I have posted on my social media accounts. My next planned step is to start indexing links that are posted by accounts that I follow.

Requests right now are very fast; between 20 and 80 milliseconds⁵ depending on how many results there are. I would expect that hundreds of concurrent users would be supported easily without much noticeable slowdown. But if I was interested in scaling this up beyond that, I would put several different lua processes behind a load balancer. I put the server behind Caddy in order to do the TLS termination and to let me have multiple sites on the same server, but I only run one application server for the search engine. If you needed to handle more traffic, balancing across many different server processes across a few different ports is easy to do with a small tweak to the Caddy config.

Anyway, if you are interested in how to build a simple web app in Fennel, take a look at the code! At just under 350 lines, it's pretty straightforward and readable. I'm not sure I would necessarily recommend Moonmint over Openresty for web applications in general, but if you're doing something with simple I/O requirements and want a solution with fewer moving parts, give it a try.

[1] When I first looked at moonmint, I had an eerie sense of deja-vu. It was a project written in Lua by Calvin Rose in 2016 which had not seen any further development since 2016. Those exact same circumstances could also describe Fennel when I first found it! It's about 1500 lines of code. My own fork of moonmint removes the non-functioning OpenSSL stuff.

[2] There's a newer project called deps.fnl which tries to work around the problems inherent in the luarocks model. I did not try it for this because I needed a few dependencies which it couldn't handle like pandoc and a lua library for robots.txt that was kept in svn. (yes, really!) But if you need a library that's hard to use without luarocks but want an actual reliable build, you should give it a look.

[3] The advantage of this approach is it makes it pretty easy to build the dependency with static linking. When you're writing a CLI tool to distribute, this can be really handy. I used this technique for another Fennel project of mine in the past, but it wasn't relevant for a web application.

[4] This is one place where Openresty might be a better choice; it's already got non-blocking functionality for HTTPS requests and database queries and whatever you'd want to do. And the non-blocking calls can be nicely abstracted away using coroutines so that the code reads linearly and doesn't fall prey to the callback-soup style that is common among async programs.

[5] These response times are coming from my home server, which is a 4-core Thinkpad from 2016 on my home DSL. This machine also hosts my two social media servers. The search engine process consumes around 20 MB of resident RAM.

« older | 2025-04-08T15:33:17Z

à¹