Benchmarks

TrailBase is merely the sum of its parts. It’s the result of marrying one of the lowest-overhead languages, one of the fastest HTTP servers, and one of the lightest relational SQL databases, while mostly avoiding extra overhead. We can expect it to go fast, but how fast exactly? Let’s compare it to a few amazing, and certainly more weathered, alternatives such as SupaBase, PocketBase, and vanilla SQLite.

Disclaimer

Generally, benchmarks are tricky, both to do well and to interpret. Benchmarks never show how fast something can go, merely how fast the author managed to make it go. Micro-benchmarks, especially, offer keyhole insights, which may be biased and may not apply to your workload.

Performance also doesn’t exist in a vacuum. If something is super fast but doesn’t do what you need it to do, performance is an elusive luxury. Doing less naturally makes it easier to go fast. That isn’t a bad thing in itself, but it means that comparing a highly specialized solution to a more general one may be misleading, unfair, or simply irrelevant for you.

We tried our hardest to give all contenders the best shot [1][2]. If you have any suggestions on how to make anyone go faster, how to make the comparison more apples-to-apples, or if you see any issues, let us know. Even with a chunky grain of salt, we hope the results provide some interesting insights. In the end, nothing beats benchmarking your own setup and workloads.

Insertion Benchmarks

Total Time for 100k Insertions

The graph shows the overall time it takes to insert 100k messages into a mock chat-room table setup, i.e. less is better.

Maybe surprisingly, remote TrailBase is slightly quicker than in-process vanilla SQLite with drizzle and Node.js [3]. This is an apples-to-oranges comparison, intended to establish an upper bound on the cost incurred by IPC leaving the process boundary [4] and by adding features like authorization, i.e. more table look-ups.
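A throughput measurement like the one above can be sketched as a small worker-pool harness. This is not TrailBase's actual benchmark code; `insertRecord` is a hypothetical stand-in for whatever client call performs a single insert (an HTTP request, a driver call, etc.):

```typescript
// Minimal sketch of a total-time / throughput measurement.
// `insertRecord` is assumed to perform one insert; `concurrency` workers
// pull indices from a shared counter until `total` inserts are done.
async function measureInsertions(
  insertRecord: (i: number) => Promise<void>,
  total: number,
  concurrency: number,
): Promise<{ seconds: number; insertsPerSecond: number }> {
  const start = performance.now();
  let next = 0;
  const worker = async () => {
    while (true) {
      const i = next++; // single-threaded JS: no race on the counter
      if (i >= total) return;
      await insertRecord(i);
    }
  };
  await Promise.all(Array.from({ length: concurrency }, worker));
  const seconds = (performance.now() - start) / 1000;
  return { seconds, insertsPerSecond: total / seconds };
}
```

For example, `measureInsertions((i) => client.insert({ message: `msg ${i}` }), 100_000, 64)` would approximate the 100k-insertion runs discussed here, where `client` is whichever SDK is under test.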

Looking at the other, more similar setups, the measurements suggest that for this test TrailBase can insert 100k records 100+ times faster than Payload [2], more than 20 times faster than SupaBase [5], and comfortably 10 times faster than PocketBase [1].

The total time of inserting a large batch of data tells only part of the story. Let’s have a look at resource consumption to get an intuition for provisioning or footprint requirements, i.e. what kind of machine one would need:

TrailBase & PocketBase Utilization

The graph shows the CPU utilization and memory consumption (RSS) of both PocketBase and TrailBase. Squinting, they look fairly similar apart from TrailBase finishing earlier. They load roughly 3 and 4 CPUs, respectively, with PocketBase’s CPU consumption being slightly more variable [6].

In terms of memory consumption, they’re also fairly similar, consuming between 100MB and 120MB, making them suitable for a tiny VPS or a toaster. Though, when TrailBase finished, it had not yet saturated.

Note that RSS is not an accurate measure of how much memory a process needs but rather how much memory it has allocated from the OS. Allocators typically request large chunks of memory up front and only occasionally return memory, depending on consumption, fragmentation, and strategy. This is also why TrailBase’s allocated memory doesn’t drop after the load goes away: TrailBase currently uses mimalloc, which holds on to memory at idle. Independently, fragmentation is a problem where a garbage collector running compactions can help.
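The RSS numbers above can be sampled from within a Node.js process itself; a minimal sketch (not how the benchmark actually collected its data, which presumably observed the server processes externally):

```typescript
// Sketch: sample this process's resident set size (RSS).
// RSS reflects memory the OS has handed to the allocator, not live data,
// which is why it plateaus rather than dropping when load goes away.
import { memoryUsage } from "node:process";

function rssMegabytes(): number {
  return memoryUsage().rss / (1024 * 1024);
}

// e.g. log a sample every second while a benchmark runs:
// setInterval(() => console.log(`rss=${rssMegabytes().toFixed(1)}MB`), 1000);
```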

Now shifting to SupaBase, things get a bit more involved due to its layered architecture including a dozen separate services providing various functionality:

SupaBase Memory Usage

Looking at SupaBase’s memory usage, it increased from roughly 6GB at rest to 7GB fully loaded. This means that, out of the box, SupaBase has roughly 50 times the memory footprint of either PocketBase or TrailBase. In all fairness, a lot of SupaBase’s functionality isn’t needed for this benchmark, and it might be possible to shed less critical services; e.g. removing supabase-analytics would save ~40% of memory. That said, we don’t know how feasible this is in practice.

SupaBase CPU utilization

As for CPU usage, one can see it jump up to roughly 9 cores (the benchmark ran on a machine with 8 physical cores and 16 threads: a 7840U). Most of the CPU time seems to be consumed by supabase-rest, the API frontend, with postgres itself hovering at only about 0.7 cores. Also, supabase-analytics definitely seems to be in use.

Latency and Read Performance

Let’s take a closer look at latency distributions. To keep things manageable we’ll focus on PocketBase and TrailBase, which are architecturally simpler and more comparable.

For TrailBase, reads were on average 9 times faster (with the C# client) and insertions 10 times faster. The latter is in line with the throughput results we’ve seen above.

Latencies are generally lower for TrailBase. Maybe interestingly, the insert latencies for the Dart and C# clients are fairly similar, while reads with Dart take about 3 times longer than with C#. TrailBase is fast enough that we’re being bottlenecked by the client: the minimum latency you can expect from Dart and Node.js is around 3-5ms. With sub-millisecond read latencies at full tilt, TrailBase is in the same ballpark as something like Redis; however, this is primary data and not a cache! Not needing a dedicated cache eliminates an entire class of issues around consistency and invalidation, and can greatly simplify a stack.

Looking at the latency distributions, we can see that the spread is well contained for TrailBase. For PocketBase, read latencies are also well contained; however, insert latencies show a more significant “tail”, with the p90 latency being roughly 5 times slower than the p50. Slow insertions can take north of 100ms. This may be related to GC pauses, scheduling, or more generally the same CPU variability we observed earlier.
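Tail behavior like the p50 vs. p90 gap above can be computed from raw latency samples with a simple nearest-rank percentile. This sketch is not the benchmark's actual analysis code, just the standard textbook definition:

```typescript
// Nearest-rank percentile: sort the samples, take the value at rank
// ceil(p/100 * n). Enough to eyeball p50 vs p90 tails.
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

For example, a distribution where `percentile(xs, 90)` is 5x `percentile(xs, 50)` has exactly the kind of insert-latency tail described above.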

JavaScript-Runtime Benchmarks

The benchmark sets up a custom HTTP endpoint /fibonacci?n=&lt;N&gt; using the same slow recursive Fibonacci implementation for both PocketBase and TrailBase. This is meant as a proxy for a computationally heavy workload to primarily benchmark the performance of the underlying JavaScript engines: goja for PocketBase and V8 for TrailBase. In other words, the impact of any overhead within PocketBase or TrailBase is diminished by the time it takes to compute fibonacci(N) for sufficiently large N.
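The workload boils down to the deliberately naive, exponential-time recursive Fibonacci (the exact benchmark code may differ slightly; this mirrors the textbook definition):

```typescript
// Deliberately slow O(2^n) recursion, used as a CPU-bound proxy workload.
function fibonacci(n: number): number {
  return n <= 1 ? n : fibonacci(n - 1) + fibonacci(n - 2);
}

// A handler for `/fibonacci?n=<N>` might then look like this
// (hypothetical routing API, not TrailBase's or PocketBase's actual one):
// addRoute("GET", "/fibonacci", (req) =>
//   String(fibonacci(Number(req.query("n")))));
```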

We found that for N=40, V8 (TrailBase) is around 40 times faster than goja (PocketBase):

Interestingly, PocketBase has an initial warm-up period of ~30s during which it doesn’t parallelize. Not being familiar with goja’s execution model, we can only speculate: one would expect similar behavior from a conservative JIT threshold in combination with a global interpreter lock 🤷. However, even after all cores are in use, completing the benchmark takes significantly longer.

With the addition of V8 to TrailBase, we’ve seen a significant increase in the memory baseline, which now dominates the overall footprint. In this setup, TrailBase consumes roughly 4 times more memory than PocketBase. If memory footprint is a major concern for you, constraining the number of V8 threads (--js-runtime-threads) is an effective remedy.

Final Words

We’re very happy to confirm that TrailBase’s APIs and JS/ES6/TS runtime are quick. The significant performance gap we observed, especially for the APIs, is likely a consequence of how quick SQLite is, making even small overheads matter that much more.

With the numbers fresh off the press, prudence is of the essence, and ultimately nothing beats benchmarking your own setup and workloads. In any case, we hope this was at least somewhat insightful. Let us know if you see anything that can or should be improved. The benchmarks are available on GitHub.


Footnotes

  1. Trying to give PocketBase the best chance, the binary was built with the latest Go compiler (v1.23.1 at the time of writing), CGO_ENABLED=1 (which, according to PB’s own documentation, will use a faster C-based SQLite driver), and GOAMD64=v4 (for less portable but more aggressive CPU optimizations). We found this setup to be roughly 20% faster than the static, pre-built binary release.

  2. We picked Payload as a representative of popular Node.js CMSes; it itself claims to be many times faster than popular options like Strapi or Directus. We used a v3 pre-release, as recommended, with the SQLite/drizzle database adapter marked as beta. We manually turned on WAL mode and filed an issue with Payload; otherwise, stock Payload was ~300x slower. That said, we’ve seen drizzle perform very well, meaning there’s probably something else going wrong here. Given the magnitude, the competition would likely have performed better.

  3. Our setup with drizzle and Node.js is certainly not the fastest possible. For example, one could drop down to SQLite in C or another low-level language with less FFI overhead. That said, drizzle is a great, popular choice which mostly serves as a point of reference and sanity check.

  4. The actual magnitude of the IPC overhead will depend on the communication cost. For our benchmarks we were using the loopback network device.

  5. The SupaBase benchmark setup skips row-level access checks. Technically, this is in its favor from a performance standpoint; however, looking at the overall load on its constituents, with PG being only a sliver, it probably would not make much of an overall difference, nor would PG17’s vectorization, which has been released since the benchmarks were run. That said, these claims deserve re-validation.

  6. We’re unsure what causes these 1-core swings. Runtime effects, such as garbage collection, may play a role; however, we would have expected those to show on shorter time-scales. This could also indicate a contention or thrashing issue 🤷.