Models & Relations

Modeling Data

TrailBase gives you full, untethered access to SQLite, as such data is modeled on top of ISO SQL and SQLite concepts. This means data is organized as rows or records across columns of a table as defined by their schema. Relationships between records are expressed by simply referencing other records via their primary key, which works across table boundaries. Data can then be joined together within the same database at access or query time. If you’re new to SQL and this sounds abstract, don’t worry it will become clear very soon and SQL is an evergreen useful well beyond TrailBase.

One of the main benefits of SQL databases is that you can define your models based on intrinsic properties of the underlying data and their relations, rather than having to worry about how the data is or will be used in the future. Instead, SQL queries let you flexibly define what data is being accessed, how it’s transformed and what’s returned. How data is accessed is never explicitly defined and is derived by the query optimizer. This means, if you discover new use-cases in your data that require combining data in ways that would be slow, you can optimize it after the fact by adding or removing indexes without having to touch the models, the queries, or the downstream code consuming the data 🎉.

Tables, Schemas & Data Types

When creating a new table to hold your data, you define a table schema telling the database what columns there are, what kind of data they contain, and potential constraints. For example¹,

CREATE TABLE post (
    id        INTEGER PRIMARY KEY,
    created   INTEGER NOT NULL DEFAULT (UNIXEPOCH()),
    author    INTEGER NOT NULL REFERENCES _user ON DELETE CASCADE,

    title     TEXT    NOT NULL,
    body      TEXT    NOT NULL
);

This creates a table to hold posts in a blog storing an integer creation timestamp, the author, the title and lastly the contents. We also set it up such that an author deleting their account, will cascade deleting their posts as well.

Coming from other SQL databases, it may come as a surprise that despite the data types above SQLite isn’t strictly typed by default, e.g. the title column above may hold values other than TEXT. SQLite interprets types merely as affinities to judge how literals and parameters should be interpreted on insert or or update but will happily accept other values. While “flexible”, this has far reaching consequences on downstream code, now has to deal explicitly with unexpected data types. We therefore encourage working with STRICT schemas whenever possible. In fact, TrailBase APIs are type-safe and thus require the underlying tables to be STRICT, i.e.:

CREATE TABLE post (
    id        INTEGER PRIMARY KEY,
    -- ...
) STRICT;

At the fundamental storage-level SQLite supports the following data types: NULL, INTEGER, REAL, TEXT, BLOB, which all type affinities boil down to. For example, columns with JSON or JSONB type affinities are stored as TEXT or BLOB, respectively².

When defining STRICT tables there’s no affinity, limiting you to SQLite’s fundamental types. However, TrailBase provides you with some extensions to enforce strict schema compatibility on a sub-column-level, e.g.

CREATE TABLE my_table (
    id        INTEGER PRIMARY KEY,
    any_json  TEXT CHECK(is_json(any_json)),
    my_json   TEXT CHECK(jsonschema('MyRegisteredJsonSchema', my_json))
) STRICT;

Constraints

Using TrailBase you can make use of any of SQLite’s column and table constraints. We’ve already encountered some of the former, e.g. NOT NULL, REFERENCES or CHECK, which all constrain the values a column may contain.

Similarly, one can define more complex table constraints constraining tuples of values, e.g.

CREATE TABLE fully_qualifed_ids (
    prefix    TEXT NOT NULL,
    suffix    TEXT NOT NULL,

    UNIQUE (prefix, suffix)
) STRICT;

For a complete list of constraints, check out the SQLite manual.

Generated Columns

SQLite’s generated columns allow you to either materialized derived columns at modification-time or compute values for virtual columns on the fly. Check out SQLite manual for comprehensive guidance.

Relations

SQL typically distinguishes between three types of relations:

1:1 relations, e.g. each user has exactly one user profile.
1:M or one-to-many relations, e.g. users may have many blog posts but each post belongs to exactly one user.
N:M or many-to-many relations, e.g. a blog post may be tagged with many tags, and each tag can be assigned to many posts.

In practice, all relations are simply edges, i.e. tuples of the shape (parent id, child id). There’s some freedom as to where these edges are stored. In case of 1:1 and 1:M relations they can be denormalized into the child record effectively becoming (primary key, foreign key). Alternatively, edges can always be stored in a separate “bridge” table as (foreign key, foreign key) edge records. In case of N:M relations, this is even necessary to achieve the required cardinality, since foreign keys can only reference a single record. This is not a limitation of TrailBase but rather common SQL practice. Linking children to their parents individually² with foreign keys exposes the relationship to the database allowing actions like ON DELETE or ON UPDATE to propagate. For example, a user deletion may trigger related data to be deleted automatically.

Let’s look at the following data schema for a blog example,

Each block represents a table schema. We can see:

a 1:1 relation between users and user profiles,
a 1:M relationship between users and posts,
and an N:M relationship between posts and tags using the post_tag bridge table.

We could have implemented the 1:1 user-profile and 1:M user-post relationships via separate bridge tables with appropriate uniqueness constraints, however pulling the parent key into the child record leads to less indirection.

In order to combine related data we can simply join on the keys. For example to get a list of all users with profiles:

SELECT * FROM _user AS U
    JOIN profile AS P ON U.id = P.user;

To connect posts and tags we have to do two joins across the post_tag bridge table:

SELECT * FROM post AS P
    LEFT JOIN post_tag AS PT ON P.id = PT.post
    LEFT JOIN tag AS T ON T.ID = PT.tag;

Record APIs

There’s an elephant in the room: while this is all pretty standard fair, at least in SQL land, connecting relations across TrailBase’s record APIs without joins can be a lot more work. Naively, for N:M relations, one would have to expose 3 different APIs including the bridge table, manage their ACLs and then have the client manually stitch the edges.

For simple, single hop child ⇨ parent traversals we can use TrailBase’s builtin expansions, i.e. a referenced parent record will be joined and then nested into the child record. To do so, we simply add the foreign key column to the record API’s expand configuration property. This, in turn, changes the child’s data representation, and thus the API definition, to create an affordance to nest the parent record. Concretely, the API’s JSON schema definition of read responses changes to:

type Schema = {
  // ...
  <foreign_key_col>: {
    id: <number | string>,
    data?: ParentSchema,
  };
}

Any pre-existing bindings for the API will need to be re-generated or updated.

Subsequently, joins can be requested for any of the configured foreign key columns during read and list operations, e.g.:

curl -H "Content-Type: application/json" \
  ${SITE}/api/records/v1/post_tag/1?expand=post,tag

{
  post: {
    id: 1,
    data: {
      id: 1,
      author: 'AZUECMTAfqOZJjHfEzAqkA==',
      title: 'first post',
      body: 'body'
    }
  },
  tag: {
    id: 5,
    data: {
      id: 5,
      label: "whimsical"
    }
  }
}

For more complex relations and traversals, rather than manually stitching on the client-side, we recommend to push more responsibility to the server. Concretely, we can expose a single API tailored for specific client use-cases implementing a server-side join using VIEWs:

CREATE VIEW post_tag_view AS SELECT * FROM post AS P
    LEFT JOIN post_tag AS PT ON P.id = PT.post
    LEFT JOIN tag AS T ON T.ID = PT.tag;

More generally, views can be useful to decouple an API definition from the underlying data model. For example, you may want to restructure your data model or APIs while keeping the other stable.

Alternatively, custom JS/TS handlers can provide a more free-form approach to push joining edges and other responsibilities to the server.

This is not meant as full SQL tutorial. We’ll keep it simple so hopefully it is clear from context what any statement intends to do. Note further that SQL keywords are case-insensitive. We’ll use all-caps to highlight them. ↩
This also means than one can use SQLite’s builtin JSON operators on any TEXT column as long as it parses as JSON. ↩ ↩²