Epsio HA Deployment

ENTERPRISE DEPLOYMENT ONLY

At this time, Epsio running in "HA mode" is only available for enterprise deployments.

Epsio HA is a high-availability deployment of Epsio that provides automatic failover for your Epsio instance and enhances reliability. In the event of a hardware or software failure on the primary instance, Epsio HA seamlessly transitions view maintenance to a replica node, ensuring Epsio views remain up-to-date without the downtime of a full view rebuild. This makes it especially valuable for mission-critical applications where uptime and data freshness are paramount.

Beyond failure scenarios, Epsio HA also enables upgrades of the Epsio engine without the downtime of view re-building. To perform an upgrade, simply update a replica node. Once the upgrade is complete, you can promote that node to become the new primary, ensuring continuous availability throughout the process.

Architecture

When deploying Epsio in HA mode, two or more Epsio instances are deployed and connected to the same database. The primary node is responsible for updating the views in the original database, while the replica nodes maintain the results of the views internally. In case of a failure of the primary node, one of the replica nodes will automatically take over as the new primary node and continue updating the views from the point the previous primary stopped.

[Figure: Epsio HA Architecture]

Internal Mechanism

Epsio instances coordinate via the hosting database to determine their respective roles, i.e. whether to initialize as a primary node or a standby (replica) node. The first Epsio instance to initialize will attempt to create a table called epsio.instances. This table contains metadata on all instances connected to the hosting database (such as last_seen, a heartbeat timestamp constantly updated by each instance), as well as their respective roles. At all times, only one instance in this table is marked with the primary role.

Upon startup, each Epsio instance checks the epsio.instances table to decide whether to initialize as a primary node or a standby node. If the table lists another instance as primary and that instance's heartbeat has been updated recently, the new instance starts up as a standby. The primary Epsio instance updates its heartbeat every 30 seconds. If a standby sees that the heartbeat has not been updated for over one minute, it begins the process of taking over as primary. The failover process consists of changing the previous primary's role in epsio.instances to standby, setting the standby's own role to primary, and then beginning to write view results to the hosting database. If the previous primary comes back up, it restarts as a standby. These rows are updated while holding locks acquired with SELECT ... FOR UPDATE, to ensure that no two Epsio instances check and write to the table at the same time.
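The role decision described above can be sketched in a few lines of Python. This is an illustrative model only, not Epsio's implementation: the InstanceRow fields other than last_seen, and the function names decide_role and fail_over, are assumptions made for the example. Only the 30-second heartbeat and the one-minute takeover threshold come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Timings taken from the description above.
HEARTBEAT_INTERVAL = timedelta(seconds=30)  # how often the primary refreshes its heartbeat
FAILOVER_TIMEOUT = timedelta(minutes=1)     # silence after which a standby takes over


@dataclass
class InstanceRow:
    """Hypothetical shape of one row in epsio.instances (column names are assumed)."""
    instance_id: str
    role: str            # "primary" or "standby"
    last_seen: datetime  # heartbeat timestamp


def decide_role(rows, me, now):
    """Return the role that instance `me` should hold at time `now`."""
    # Look for another instance currently marked as primary.
    primary = next(
        (r for r in rows if r.role == "primary" and r.instance_id != me), None
    )
    if primary is None:
        return "primary"  # nobody else holds the role
    if now - primary.last_seen > FAILOVER_TIMEOUT:
        return "primary"  # heartbeat is stale: take over
    return "standby"      # a healthy primary exists


def fail_over(rows, me):
    """Demote every other instance and promote `me`.

    In a real deployment this would run inside a transaction that holds
    the SELECT ... FOR UPDATE locks, so two standbys cannot both promote.
    """
    for r in rows:
        r.role = "primary" if r.instance_id == me else "standby"
```

With a primary whose last heartbeat is 45 seconds old, decide_role returns "standby"; once the heartbeat is more than 60 seconds old it returns "primary", at which point fail_over swaps the two roles.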

To keep view definitions consistent across all nodes, the primary node updates the epsio.views_backup table in the source database with the latest view definition whenever a view is created, dropped, or modified. Replica nodes continuously poll this backup table to stay aligned with the primary, ensuring they always have up-to-date view definitions. This synchronization enables any node to seamlessly take over in the event of a failure.
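One polling step on a replica can be pictured as a comparison between the definitions it holds locally and those recorded in the backup table. The sketch below is a hypothetical model: it assumes each side can be represented as a mapping from view name to definition text, and the function name plan_sync is invented for illustration. The actual layout of epsio.views_backup is not described here.

```python
def plan_sync(local, backup):
    """Compare local view definitions with the backup table contents.

    Both arguments are assumed to be dicts of {view_name: definition_sql}.
    Returns the view names the replica would need to create, drop, or update
    to match the primary.
    """
    local_names = local.keys()
    backup_names = backup.keys()
    return {
        # Present on the primary but not yet known locally.
        "create": sorted(backup_names - local_names),
        # Known locally but dropped on the primary.
        "drop": sorted(local_names - backup_names),
        # Present on both sides with a different definition.
        "update": sorted(
            name for name in backup_names & local_names
            if backup[name] != local[name]
        ),
    }
```

For example, if the replica holds views a and b while the backup table lists a modified b plus a new view c, the plan is to create c, drop a, and update b.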

Deployment Steps

To deploy Epsio in High Availability (HA) mode, simply install Epsio on multiple instances—ideally across different availability zones—and configure them to connect to the same database. The first instance installed will automatically take on the role of the primary node, while the others will recognize themselves as replicas. Replica nodes are designed to automatically detect their role and will refrain from updating views in the original database while acting as replicas.

Once the installation is complete, you can verify the status of each node by running the command list_instances on your database:

SELECT * FROM epsio.list_instances();
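The same check can be scripted, for instance as part of a post-deployment health check. The helper below is a minimal sketch that assumes a Python DB-API (PEP 249) connection to the database, such as one opened with a PostgreSQL driver. The function name fetch_instances is hypothetical, and because the columns returned by epsio.list_instances() are not listed here, the helper reads them from the cursor rather than hard-coding any.

```python
def fetch_instances(conn):
    """Run epsio.list_instances() and return each row as a dict.

    `conn` is any DB-API 2.0 connection to the database Epsio is attached to.
    Column names are taken from the cursor description, so no assumption is
    made about what the function returns.
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT * FROM epsio.list_instances();")
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

Printing the returned rows after installation shows which instance currently holds the primary role.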

Other helpful commands for HA deployments are:

  1. list_views: Show the status of views on all instances.
  2. instance_restart_view: Restart a view on a specific Epsio instance.
  3. promote_instance: Promote a specific Epsio instance to primary.