Skip to content

Key Concepts

Incremental Views

Epsio's views are very similar to traditional SQL views in that they represent a query- but unlike a traditional SQL view, Epsio's views are both incremental and materialized.

This means that:

  1. Query results are always persisted; no additional computation needs to be done when the view is queried.
  2. The view is updated incrementally as the underlying data changes, meaning that the view is always up to date without ever needing to run the complete query again.

Epsio does this by taking the view's query and building a streaming "pipeline" that every new change goes through. The pipeline is essentially a series of stateful "operators" connected to each other, each doing a specific manipulation on the change and passing on corresponding changes.

Epsio Operators

Epsio supports a wide series of SQL operators and functions, including all types of joins, aggregations, subqueries, CTEs and more. Detailed here are how Epsio implements some of the most common operators.

Filters

Filters are one of the simplest nodes, receiving changes and passing them on if they match certain criteria, regardless of what the modification of the change is. For example, if we had the following query:

SELECT .. FROM ... WHERE animal = 'cats'

The filter node would simply pass on all changes where the animal field is cats, and discard all other changes. So if we were to pass in 3.4 cats:

FilterCats

the filter node would pass on the change as is. But if we were to pass in dogs:

FilterDogs

the filter node would discard the change, and not pass it on to the next node.

Note that filters do not need to hold any state in order to do their operation.

Joins (Inner, Left, Right)

Epsio supports all types of JOINs (inner, left, right) between tables. We'll describe how Epsio implements inner joins here; the more complex joins use the Join node combined with other nodes to achieve the desired effect.

Inner Join

The Join node holds the data coming in from both inputs mapped by their Join key. If we had the following query:

SELECT u.name, um.disabled FROM users u JOIN user_metadata um ON u.id = um.user_id ... 

Epsio would first create mappings (in practices, this is in on an disk KV store) for both sides of the join, with their key being id on the users side, and user_id on the user_metadata side. In practice, it would look something like this:

Mappings

When new changes are streamed through one of the Join's inputs, the Join will first store the change on one of the sides, and then check if there is a corresponding change on the other side. If there is, it will emit a new change with the data from both sides. Let's imagine we stream a change without a corresponding change:

NodeMappings

As we can see, the Join node updates it's left mappings, but outputs nothing. Next, let's stream a corresponding change:

Output

As we can see, this time the Join node found a corresponding change on the other side, and therefore not only updated it's internal mappings, but also outputted a change (which perhaps the next node would take care of).

Group By

The Group By is essentially implemented with a Reduce node (a node that is actually also used to implement left/right joins!) which holds a mapping of groups to their corresponding values. For example, if we were getting the count of people per name, the mappings would look something like this:

GroupBy

The Reduce node always outputs a negative change (a change that removes data) for the previous value of the group, and a positive change (a change that adds data) for the new value of the group. Let's imagine we had the above mappings, and then streamed a change adding another Jojo:

GroupByOutput

Simple, yet satisfying.

How it all fits together

In general, a combination of these nodes (together with filters, maps, and a multitude of others) make up the picture of a complete query. By allowing changes to run in parallel Epsio also provides a very high throughput, which still maintaining very low costs by holding all information in storage and using very little compute.

Eventual Consistency

Eventual Consistency

Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. Wikipedia

When underlying data is changed in a query defined in Epsio, the change might not be immediately reflected in Epsio. However, the incremental update process ensures that Epsio will eventually be consistent with the data. This might result in brief periods (usually only a few milliseconds) where the view lags slightly behind the latest state of the data.