Site update, self-hosted search via pagefind

Published: 01-07-2023 21:32 | Author: Remy van Elst

This is a static site, meaning that no server-side processing occurs. All HTML is generated from a few folders of Markdown source and then uploaded to the cluster. Searching on this site was always provided by a text-box form that sent you to Google with '' appended to your query. That works fine, but it sends all search data to Google. With my recent removal of all Google Ads on this site, as well as tracking via Google Analytics, sending searches via Google seems wrong.

I recently found the pagefind program, which I now use here. It is a self-hosted search engine for static sites.

You can read all site-updated articles here.

Pagefind is written in Rust and runs after my static site generator binary. It indexes all generated HTML and provides an API to query that index, including a search UI which you can see at the bottom of every page here. That is perfect for my static site setup, and it aims to use little storage or bandwidth.
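A rough sketch of what "runs after my static site generator" could look like as the last step of a Python build script. This is not the script used for this site. It assumes a `pagefind` binary on the PATH and the `--site` flag of Pagefind 1.x (releases before 1.0 named this flag `--source`). The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pagefind_command(site_dir: Path, binary: str = "pagefind") -> list[str]:
    """Build the argument list for indexing an already-generated site.

    Assumption: the `--site` flag of Pagefind 1.x; releases before 1.0
    named this flag `--source`.
    """
    return [binary, "--site", str(site_dir)]


def index_site(site_dir: Path) -> None:
    """Run Pagefind over the generated HTML as the final build step."""
    if not site_dir.is_dir():
        raise FileNotFoundError(f"no generated site at {site_dir}")
    cmd = pagefind_command(site_dir)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("pagefind binary not found on PATH")
    # check=True makes a failed indexing run fail the whole build.
    subprocess.run(cmd, check=True)
```

Calling `index_site(Path("out"))` once the generator has finished would index the folder `out`, which is a placeholder for wherever the generated HTML ends up.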

The search box used to look like this:

[Image: old search box]

When you entered a term and pressed ENTER you were sent to Google:

[Image: old search via Google]

Now the search box looks like this:

[Image: new search box]

I know, it's such a major change! Searching is instant and shows the results right on the page:

[Image: new search results]

Notable changes include thumbnails and publication dates. I have not done any configuration whatsoever for the thumbnails; pagefind figured that out by itself. Cool!

What is pagefind?

Quoting the pagefind website:

Pagefind is a fully static search library that aims to perform well on large sites, while using as little of your users' bandwidth as possible, and without hosting any infrastructure.

Pagefind runs after Hugo, Eleventy, Jekyll, Next, Astro, SvelteKit, or any other SSG. The installation process is always the same: Pagefind only requires a folder containing the built static files of your website, so in most cases no configuration is needed to get started.

After indexing, Pagefind adds a static search bundle to your built files, which exposes a JavaScript search API that can be used anywhere on your site. Pagefind also provides a prebuilt UI that can be used with no configuration (you can see the prebuilt UI at the top of this page).

The goal of Pagefind is that websites with tens of thousands of pages should be searchable by someone in their browser, while consuming a reasonable amount of bandwidth. Pagefind's search index is split into chunks, so that searching in the browser only ever needs to load a small subset of the search index. Pagefind can run a full-text search on a 10,000 page site with a total network payload under 300KB, including the Pagefind library itself. For most sites, this will be closer to 100KB.
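To make the chunking idea in the quote concrete, here is a toy illustration in Python. It is not Pagefind's actual index format or algorithm. It only shows the general principle that a lookup needs to touch just one small part of a split index. Grouping by first letter and the function names are assumptions made purely for this example.

```python
from collections import defaultdict


def build_chunks(pages: dict[str, str]) -> dict[str, dict[str, set[str]]]:
    """Build a word -> page URLs index, grouped into chunks by first letter."""
    chunks: dict[str, dict[str, set[str]]] = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            chunks[word[0]].setdefault(word, set()).add(url)
    return dict(chunks)


def search(chunks: dict[str, dict[str, set[str]]], term: str) -> set[str]:
    """Look up a term by reading only the one chunk that could contain it."""
    term = term.lower()
    if not term:
        return set()
    # Only this chunk would have to be downloaded by a browser.
    chunk = chunks.get(term[0], {})
    return chunk.get(term, set())


# Made-up pages, only for demonstration.
pages = {
    "/a.html": "QObject signals and slots",
    "/b.html": "static site search",
    "/c.html": "Qt QObject lifetime",
}
chunks = build_chunks(pages)
hits = search(chunks, "QObject")
```

In this toy, a search for `QObject` reads only the `q` chunk and leaves the other three chunks untouched.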

In my case, this site has 489 articles at the time of writing. The search index is around 5MB in size (measured as files on the filesystem, including a WebAssembly runtime).

Using the network tab of the Firefox developer tools, I can see that searching for the term QObject transfers around 250kB, excluding the images:

[Image: search usage]

This is in line with the quoted claim of a total network payload under 300KB. The search term QObject returns 8 results currently.
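For anyone who wants to repeat this comparison on their own site, a minimal sketch: sum the size of the generated search bundle on disk and relate it to the bytes a single search transferred. The helper names are hypothetical, and the figures used at the end are the rounded numbers from this article, not fresh measurements.

```python
from pathlib import Path


def directory_size(root: Path) -> int:
    """Total size in bytes of all regular files below root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def fraction_loaded(transferred_bytes: int, index_bytes: int) -> float:
    """Share of the on-disk index that one search pulled over the network."""
    if index_bytes <= 0:
        raise ValueError("index size must be positive")
    return transferred_bytes / index_bytes


# Rounded figures from the article: about 250 kB per search, about 5 MB on disk.
share = fraction_loaded(250_000, 5_000_000)
```

With those rounded numbers, a single search loads roughly 5% of the index that sits on disk. `directory_size` can be pointed at the folder Pagefind wrote its bundle to, to get the on-disk figure for another site.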

Tags: blog , ingsoc , pagefind , python , raymiiorg , rust , search , site