Epsio's views are very similar to traditional SQL views in that they represent a query- but unlike a traditional SQL view, Epsio's views are both incremental and materialized.
This means that:
- Query results are always persisted; no additional computation needs to be done when the view is queried.
- The view is updated incrementally as the underlying data changes, meaning that the view is always up to date without ever needing to run the complete query again.
Epsio does this by taking the view's query and building a streaming "pipeline" that every new change goes through. The pipeline is essentially a series of stateful "operators" connected to each other, each doing a specific manipulation on the change and passing on corresponding changes.
Epsio supports a wide series of SQL operators and functions, including all types of joins, aggregations, subqueries, CTEs and more. Detailed here are how Epsio implements some of the most common operators.
Filters are one of the simplest nodes, receiving changes and passing them on if they match certain criteria, regardless of what the modification of the change is. For example, if we had the following query:
The filter node would simply pass on all changes where the
animal field is
cats, and discard all other changes. So if we were to pass in 3.4 cats:
the filter node would pass on the change as is. But if we were to pass in dogs:
the filter node would discard the change, and not pass it on to the next node.
Note that filters do not need to hold any state in order to do their operation.
Joins (Inner, Left, Right)¶
Epsio supports all types of
JOINs (inner, left, right) between tables.
We'll describe how Epsio implements inner joins here; the more complex joins use the Join node combined with other nodes to achieve the desired effect.
The Join node holds the data coming in from both inputs mapped by their Join key. If we had the following query:
Epsio would first create mappings (in practices, this is in on an disk KV store) for both sides of the join, with their key being
id on the users side, and
user_id on the user_metadata side.
In practice, it would look something like this:
When new changes are streamed through one of the Join's inputs, the Join will first store the change on one of the sides, and then check if there is a corresponding change on the other side. If there is, it will emit a new change with the data from both sides. Let's imagine we stream a change without a corresponding change:
As we can see, the Join node updates it's left mappings, but outputs nothing. Next, let's stream a corresponding change:
As we can see, this time the Join node found a corresponding change on the other side, and therefore not only updated it's internal mappings, but also outputted a change (which perhaps the next node would take care of).
The Group By is essentially implemented with a Reduce node (a node that is actually also used to implement left/right joins!) which holds a mapping of groups to their corresponding values. For example, if we were getting the count of people per name, the mappings would look something like this:
The Reduce node always outputs a negative change (a change that removes data) for the previous value of the group, and a positive change (a change that adds data) for the new value of the group. Let's imagine we had the above mappings, and then streamed a change adding another Jojo:
Simple, yet satisfying.
How it all fits together¶
In general, a combination of these nodes (together with filters, maps, and a multitude of others) make up the picture of a complete query. By allowing changes to run in parallel Epsio also provides a very high throughput, which still maintaining very low costs by holding all information in storage and using very little compute.
Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. Wikipedia
When underlying data is changed in a query defined in Epsio, the change might not be immediately reflected in Epsio. However, the incremental update process ensures that Epsio will eventually be consistent with the data. This might result in brief periods (usually only a few milliseconds) where the view lags slightly behind the latest state of the data.