Site update, self-hosted search via pagefind
Published: 01-07-2023 21:32 | Author: Remy van Elst
This is a static site, meaning that no server-side processing occurs. All HTML is generated out of a few folders full of markdown source and then uploaded to the cluster. Searching on this site was always provided by a text-box form that sent you to Google with 'site:raymii.org' appended to your query. That works fine, but it sends all search data to Google. With my recent removal of all Google Ads on this site, as well as tracking via Google Analytics, sending searches via Google seems wrong.
I recently found the pagefind program, which I now use here. It is a self-hosted static site search engine of sorts.
You can read all site-updated articles here.
Pagefind is written in Rust and runs after my static site generator binary. It indexes all generated HTML and provides an API to query that index, including a search UI which you can see at the bottom of every page here. It is perfect for my static site setup, and it aims to not use much storage or bandwidth.
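As a rough sketch of what querying that API can look like from TypeScript: Pagefind's JavaScript module exposes a search() call, and each result carries a data() loader that fetches the details for that page on demand. The interface names, the searchSite helper and the bundle path in the trailing comment are assumptions made for illustration; they are not taken from this site's actual setup.

```typescript
// Minimal shape of the parts of the Pagefind JS API used below
// (declared here for illustration, only the fields this sketch needs).
interface PagefindResultData {
  url: string;
  excerpt: string;
  meta: Record<string, string>;
}

interface PagefindResult {
  id: string;
  data: () => Promise<PagefindResultData>;
}

interface PagefindApi {
  search: (term: string) => Promise<{ results: PagefindResult[] }>;
}

// Hypothetical helper: run a query and load the page data
// for the first `limit` hits only, so little is fetched.
async function searchSite(
  pagefind: PagefindApi,
  term: string,
  limit = 5
): Promise<PagefindResultData[]> {
  const search = await pagefind.search(term);
  return Promise.all(search.results.slice(0, limit).map((r) => r.data()));
}

// In a browser this might be used as follows. The bundle path is an
// assumption; it depends on the Pagefind version and the site layout.
// const pagefind = await import("/_pagefind/pagefind.js");
// const hits = await searchSite(pagefind, "QObject");
```

Loading the data for only the first few results keeps the number of extra requests small, which fits the low-bandwidth goal described above.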
The search box used to look like this:
When you entered a term and pressed ENTER you were sent to Google:
Now the search box looks like this:
I know, it's such a major change! Searching is instant and shows the results right on the page:
Notable changes include thumbnails and publication dates. I have not done any configuration whatsoever for the thumbnails; pagefind just figured that out by itself. Cool!
What is pagefind?
Quoting the pagefind website:
Pagefind is a fully static search library that aims to perform well on large sites, while using as little of your users' bandwidth as possible, and without hosting any infrastructure.
Pagefind runs after Hugo, Eleventy, Jekyll, Next, Astro, SvelteKit, or any other SSG. The installation process is always the same: Pagefind only requires a folder containing the built static files of your website, so in most cases no configuration is needed to get started.
The goal of Pagefind is that websites with tens of thousands of pages should be searchable by someone in their browser, while consuming a reasonable amount of bandwidth. Pagefind's search index is split into chunks, so that searching in the browser only ever needs to load a small subset of the search index. Pagefind can run a full-text search on a 10,000 page site with a total network payload under 300KB, including the Pagefind library itself. For most sites, this will be closer to 100KB.
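To make the chunking idea from the quote concrete, here is a toy sketch in TypeScript. It is not Pagefind's actual index format or chunking rule. Grouping words by first letter, and the names chunkKey, buildChunks and lookup, are all assumptions chosen to keep the example small. The point is only that a lookup touches one chunk instead of the whole index.

```typescript
// Toy inverted index: word -> URLs of the pages that contain it.
type Index = Map<string, string[]>;

// Hypothetical chunking rule: words are grouped by their first letter.
function chunkKey(word: string): string {
  return word.charAt(0);
}

// "Build time": split the index for all pages (URL -> text) into chunks.
function buildChunks(pages: Record<string, string>): Map<string, Index> {
  const chunks = new Map<string, Index>();
  Object.keys(pages).forEach((url) => {
    const words = pages[url]
      .toLowerCase()
      .split(/\W+/)
      .filter((w) => w.length > 0);
    words.forEach((word) => {
      const key = chunkKey(word);
      const chunk = chunks.get(key) ?? new Map<string, string[]>();
      const urls = chunk.get(word) ?? [];
      if (urls.indexOf(url) === -1) urls.push(url);
      chunk.set(word, urls);
      chunks.set(key, chunk);
    });
  });
  return chunks;
}

// "Search time": look up one word, touching only the chunk that can
// contain it. `fetched` records which chunks a browser would download.
function lookup(
  chunks: Map<string, Index>,
  term: string,
  fetched: string[]
): string[] {
  const word = term.toLowerCase();
  const key = chunkKey(word);
  fetched.push(key);
  return chunks.get(key)?.get(word) ?? [];
}
```

In this toy version a search for a single word downloads exactly one chunk, no matter how many pages were indexed. That is the property that lets the on-disk index be much larger than the network payload of a single search.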
In my case this site has 489 articles as of the time this page was written. The search index is around 5 MB in size, measured as files on the filesystem, and that includes a WebAssembly runtime.
Using the performance analyzer in the network tab of the Firefox devtools, I can see that searching for the term QObject uses around 250 kB, excluding the images:
This matches the statement above regarding network payload. The search term QObject returns 8 results currently.