Benchmarks
TrailBase is merely the sum of its parts. It’s the result of marrying one of the lowest-overhead languages, one of the fastest HTTP servers, and one of the lightest relational SQL databases, while mostly avoiding extra expenditures. We can expect it to go fast, but how fast exactly? Let’s compare it to a few amazing, and certainly more weathered, alternatives such as SupaBase, PocketBase, and vanilla SQLite.
Disclaimer
Generally, benchmarks are tricky, both to do well and to interpret. Benchmarks never show how fast something can go, but merely how fast the author managed to make it go. Micro-benchmarks, especially, offer keyhole insights, which may be biased and may not apply to your workload.
Performance also doesn’t exist in a vacuum. If something is super fast but doesn’t do what you need it to do, performance is an elusive luxury. Doing less naturally makes it easier to go fast, which is not a bad thing; however, it means that comparing a highly specialized solution to a more general one may be misleading, unfair, or irrelevant for you.
We tried our hardest to give all contenders the best shot[^1][^2]. If you have any suggestions on how to make anyone go faster, how to make the comparison more apples-to-apples, or if you see any issues, let us know. Even with a chunky grain of salt, we hope the results can provide some interesting insights. In the end, nothing beats benchmarking your own setup and workloads.
Insertion Benchmarks
Total Time for 100k Insertions
The graph shows the overall time it takes to insert 100k messages into a mock chat-room table setup, i.e. less is better.
Maybe surprisingly, remote TrailBase is slightly quicker than in-process vanilla SQLite with drizzle and Node.js[^3]. This is an apples-to-oranges comparison and was intended to establish an upper bound on the cost incurred by IPC, i.e. leaving the process boundary[^4], and by adding features like authorization, i.e. more table look-ups.
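To make that baseline concrete, here is a minimal sketch of what such an in-process drizzle/Node.js insertion loop could look like. The chat-message schema, batch size, and file name are hypothetical stand-ins, not the actual benchmark code (which lives in the repo):

```ts
// Hypothetical in-process baseline: bulk-insert 100k mock chat messages
// with drizzle on top of better-sqlite3.
import Database from "better-sqlite3";
import { drizzle } from "drizzle-orm/better-sqlite3";
import { sqliteTable, integer, text } from "drizzle-orm/sqlite-core";

// Assumed mock chat-room schema; the real benchmark schema may differ.
const messages = sqliteTable("message", {
  id: integer("id").primaryKey(),
  room: text("room").notNull(),
  author: text("author").notNull(),
  body: text("body").notNull(),
});

const sqlite = new Database("bench.db");
sqlite.pragma("journal_mode = WAL"); // see the Payload footnote: WAL matters.
const db = drizzle(sqlite);

const N = 100_000;
const BATCH = 500;
for (let i = 0; i < N; i += BATCH) {
  const rows = Array.from({ length: Math.min(BATCH, N - i) }, (_, j) => ({
    room: "room0",
    author: `user${(i + j) % 100}`,
    body: `message ${i + j}`,
  }));
  db.insert(messages).values(rows).run();
}
```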
Looking at the other, more similar setups, the measurements suggest that for this test TrailBase can insert 100k records 100+ times faster than Payload[^2], more than 20 times faster than SupaBase[^5], and comfortably 7 times faster than PocketBase[^1].
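The remote contenders, by contrast, receive every record as an HTTP request over the loopback device. Roughly, the client side reduces to the following; the endpoint path and record fields are illustrative assumptions, not any contender’s exact API:

```ts
// Insert n mock chat messages against a record API over loopback.
// NOTE: `/api/records/v1/message` is a hypothetical path for illustration.
async function insertMany(base: string, n: number): Promise<void> {
  for (let i = 0; i < n; i++) {
    const res = await fetch(`${base}/api/records/v1/message`, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({
        room: "room0",
        author: `user${i % 100}`,
        body: `message ${i}`,
      }),
    });
    if (!res.ok) throw new Error(`insert failed: ${res.status}`);
  }
}

// Usage: await insertMany("http://localhost:4000", 100_000);
```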
The total time of inserting a large batch of data tells only part of the story. Let’s have a look at resource consumption to get an intuition for provisioning or footprint requirements, i.e. what kind of machine one would need:
TrailBase & PocketBase Utilization
The graph shows the CPU utilization and memory consumption (RSS) of both PocketBase and TrailBase. Squinting, they look fairly similar, apart from TrailBase finishing earlier. They load roughly 4 to 5 CPUs, with PocketBase’s utilization being more variable[^6].
In terms of memory consumption, they’re also fairly similar, consuming between 120MB and 150MB, which makes them suitable for a tiny VPS or a toaster. Note that RSS is not an accurate measure of how much memory a process needs but rather how much memory it has allocated from the OS. Allocators will typically request large chunks of memory and only occasionally return them, depending on usage, fragmentation, and strategy. This is also why the allocated memory doesn’t drop after the load goes away. Independently, fragmentation is a problem where a garbage collector running compactions can help.
Now shifting to SupaBase, things get a bit more involved due to its layered architecture, which comprises a dozen separate services providing various functionality:
SupaBase Memory Usage
Looking at SupaBase’s memory usage, it increased from roughly 6GB at rest to 7GB fully loaded. This means that out of the box, SupaBase has roughly 50 times the memory footprint of either PocketBase or TrailBase. In all fairness, a lot of SupaBase’s functionality isn’t needed for this benchmark and it might be possible to shed less critical services; for example, removing supabase-analytics would save ~40% of the memory. That said, we don’t know how feasible this is in practice.
SupaBase CPU utilization
As for CPU usage, one can see it jump up to roughly 9 cores (the benchmark ran on a machine with 8 physical cores and 16 threads: 7840U). Most of the CPUs seem to be consumed by supabase-rest, the API frontend, with postgres itself hovering at only about 0.7 cores. Also, supabase-analytics definitely seems to be in use.
Latency and Read Performance
Let’s take a closer look at latency distributions. To keep things manageable we’ll focus on PocketBase and TrailBase, which are architecturally simpler and more comparable.
Compared to PocketBase, TrailBase’s median read latencies were about 17 times lower (with the C# client) and its insert latencies almost 7 times lower. The latter is well in line with the throughput results we’ve seen above.
Latencies are generally lower for TrailBase. Maybe interestingly, the insert latencies for the Dart and C# clients are fairly similar, while reads with Dart take about 3 times longer compared to C#. TrailBase is fast enough that we’re being bottlenecked on the client: the minimum latency you can expect from Dart and Node.js is around 3-5ms. Since PocketBase’s latencies sit well above that client-side floor, its bottleneck is likely on the server side.
With sub-millisecond read latencies at full tilt, TrailBase is in the same ballpark as something like Redis; however, this is primary data and not a cache! Not needing a dedicated cache eliminates an entire class of issues around consistency and invalidation, and can greatly simplify a stack.
Looking at the latency distributions, we can see that the spread is well contained for TrailBase. For PocketBase, read latencies are also well contained, but insert latencies show a more significant tail, with the p90 latency being roughly 3 times slower than the p50 and the slowest insertions taking north of 100ms. This may be related to GC pauses, scheduling, or more generally the same CPU variability we observed earlier.
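For orientation, percentile figures like p50 and p90 simply come from sorting per-request latencies and picking ranks. A minimal sketch of such a measurement, not the actual harness (`insertOne` is a hypothetical request function):

```ts
// Collect per-request latencies and return them sorted ascending.
async function measure(
  request: () => Promise<void>,
  samples: number,
): Promise<number[]> {
  const latenciesMs: number[] = [];
  for (let i = 0; i < samples; i++) {
    const start = performance.now();
    await request(); // one read or insert against the server under test
    latenciesMs.push(performance.now() - start);
  }
  return latenciesMs.sort((a, b) => a - b);
}

// Nearest-rank percentile over a sorted sample.
function percentile(sorted: number[], p: number): number {
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
}

// Usage with a hypothetical `insertOne`:
//   const lat = await measure(insertOne, 10_000);
//   console.log(`p50=${percentile(lat, 50)}ms p90=${percentile(lat, 90)}ms`);
```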
File System Performance
File systems play an important role in the performance of storage systems; we’ll therefore take a quick look at their impact on SQLite’s, and thus TrailBase’s, performance.
Note that the numbers aren’t directly comparable to the ones above, due to being taken on a different machine with more storage options, specifically an AMD 8700G with a 2TB WD SN850x NVMe SSD running a 6.12 kernel.
Interestingly, the read performance appears identical, suggesting that caching is at play without much actual file system interaction. In the future we should re-run with a larger data set to observe how performance degrades once not everything fits into memory anymore. Yet, most queries being served from cache is a not-too-uncommon reality, and it’s reassuring to see that caches are indeed working independently of the file system 😅.
The write latencies are a bit more interesting. Unsurprisingly, modern copy-on-write (CoW) file systems have a bit more overhead. The relative ranking is roughly in line with Phoronix’s results[^7], with the added OpenZFS (2.2.7) falling in line with its peers. Also, the differences are attenuated by the constant overhead TrailBase adds over vanilla SQLite.
We won’t discuss the specific trade-offs and baggage coming with each file system, but hope that the numbers can help guide the optimization of your own production setup. In the end it’s a trade-off between performance, maturity, reliability, physical disk space, and feature set. For example, CoW snapshots may or may not be important to you.
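Relatedly, how hard SQLite leans on the file system’s sync path is itself tunable. A small sketch of the pragmas one might vary when reproducing such write-latency measurements, assuming the better-sqlite3 driver:

```ts
// Pragmas that dominate SQLite's write-path fsync behavior.
import Database from "better-sqlite3";

const db = new Database("bench.db");
db.pragma("journal_mode = WAL");   // fewer fsyncs than the rollback journal
db.pragma("synchronous = NORMAL"); // WAL-safe; FULL syncs on every commit
```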
JavaScript Performance
The benchmark sets up a custom HTTP endpoint `/fibonacci?n=<N>` using the same slow recursive Fibonacci implementation for both PocketBase and TrailBase. This is meant as a proxy for a computationally heavy workload to primarily benchmark the performance of the underlying JavaScript engines: goja for PocketBase and V8 for TrailBase. In other words, the impact of any overhead within PocketBase or TrailBase is diminished by the time it takes to compute `fibonacci(N)` for sufficiently large `N`.

We found that for `N=40`, V8 (TrailBase) is around 40 times faster than goja (PocketBase):
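For reference, a minimal sketch of the kind of handler in question; the intentionally naive recursion is the point, while the route registration differs between the two runtimes and is only hinted at in the comments:

```ts
// Intentionally slow, exponential-time recursive Fibonacci; fibonacci(40)
// keeps the JavaScript engine busy long enough to dwarf framework overhead.
function fibonacci(n: number): number {
  return n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2);
}

// Hypothetical wiring: both PocketBase (goja) and TrailBase (V8) let you
// register custom JS routes; the exact registration APIs differ, see the
// benchmark repository. Conceptually:
//   GET /fibonacci?n=<N>  ->  String(fibonacci(N))
```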
Interestingly, PocketBase has an initial warm-up period of ~30s during which it doesn’t parallelize. Not being familiar with goja’s execution model, one might expect such behavior from a conservative JIT threshold in combination with a global interpreter lock 🤷. However, even after saturating all cores, completing the benchmark takes significantly longer.
With the addition of V8 to TrailBase, we’ve experienced a significant increase in the memory baseline, which now dominates the overall footprint. In this setup, TrailBase consumes roughly 4 times more memory than PocketBase. If memory footprint is a major concern for you, constraining the number of V8 threads (`--js-runtime-threads`) will be an effective remedy.
Final Words
We’re very happy to confirm that TrailBase’s APIs and JS/ES6/TS runtime are quick. The significant performance gap we observed for API access is a consequence of how fast SQLite itself is, which makes even small overheads matter that much more.
With the numbers fresh off the press, prudence is of the essence, and ultimately nothing beats benchmarking your own setup and workloads. In any case, we hope this was at least somewhat insightful. Let us know if you see anything that can or should be improved. The benchmarks are available on GitHub.
Footnotes
[^1]: Trying to give PocketBase the best chance, we built v0.25.0 with the latest Go compiler (v1.23.5 at the time of writing), the mattn/go-sqlite3 driver (which, based on PB’s docs, uses the faster C-based SQLite), and `GOAMD64=v4` (for less portable but more aggressive CPU optimizations).

[^2]: We picked Payload as a representative of popular Node.js CMSes; it itself claims to be many times faster than popular options like Strapi or Directus. We were using a v3 pre-release, as recommended, together with the SQLite/drizzle database adapter marked as beta. We manually turned on WAL mode and filed an issue with Payload; otherwise stock Payload was ~300x slower. That said, we’ve seen drizzle perform very well, meaning there’s probably something very wrong here. Given the magnitude, the competition would likely have performed better.

[^3]: Our setup with drizzle and Node.js is certainly not the fastest possible. For example, we could drop down to SQLite in C or another low-level language with less FFI overhead. That said, drizzle is a great, popular choice which mostly serves as a point of reference and sanity check.

[^4]: The actual magnitude of the IPC overhead will depend on the communication cost. For our benchmarks we were using the loopback network device.

[^5]: The SupaBase benchmark setup skips row-level access checks. Technically, this is in its favor from a performance standpoint; however, looking at the overall load on its constituents with PG being only a sliver, it probably would not make much of an overall difference, nor would PG17’s vectorization, which has been released since the benchmarks were run. That said, these claims deserve re-validation.

[^6]: We’re unsure as to what causes these 1-core swings. Runtime effects, such as garbage collection, may play a role; however, we would have expected those to show on shorter time-scales. This could also indicate a contention or thrashing issue 🤷.

[^7]: Despite being close, XFS’s and ext4’s relative ranking is swapped, which could be due to a variety of reasons like mount options, kernel version, hardware, …