Skip to content

CDC Forwarder

The CDC forwarder is built to consume changes from the user's database and send them to the Execution Engine. It natively integrates with every supported database, foregoing any need for the user to run their own Debezium/Kafka/etc. The forwarder is built in-house in Rust (together with native replication tooling each DB exposes), and is designed for performance, fault tolerance, and correctness.

Fault Tolerance

The forwarder is designed to run continuously, regardless of events in the user's database. It handles various scenarios, such as:

  1. Schema changes in base tables being replicated: The forwarder and associated views automatically adapt to these changes when possible.
  2. Database recovery.
  3. Database crashes or restarts.

The CDC forwarder also supports graceful shutdown, retaining its last state for a seamless restart. In the event of a forwarder crash, the forwarder will be automatically reset. The forwarder will either resume from the last known state or, if necessary, remove existing replication mechanisms (e.g., Postgres slots) and start fresh.

Correctness

The CDC Forwarder outputs change data capture (CDC) from multiple tables into a single, unified stream consumed by the Execution Engine. This stream also includes "Transaction End" messages, ensuring that Epsio views remain consistent across tables.

This approach contrasts with other streaming engines that rely on a Debezium + Kafka setup, where each table is replicated into its own topic. In such setups, a view may be transactionally correct for a single table, but joining two tables can produce inconsistent results, as they are consumed independently.

Epsio's CDC Forwarder eliminates this issue by guaranteeing transactional correctness across all tables, ensuring accurate and reliable results in every scenario.

Performance

Unlike other CDC consumers such as Debezium, the CDC forwarder is specifically designed for high-throughput workloads, with a strong emphasis on parallelism. The forwarder writes to an internal buffer that is consumed by the Execution Engine. This design decouples the forwarder from the Execution Engine, ensuring that even if the Execution Engine's read rate is slow, the forwarder minimizes the time replication data remains in the database.

Database Integration

For different database types, the forwarder works differently:

  • Postgres: Epsio on startup creates a logical replication slot to read from, which the forwarder then consumes WAL changes from.
  • MySQL: The forwarder reads the Binlog stream.
  • MSSQL: The forwarder continuously polls for CDC events, using MSSQL procedures.