When We Recommend Log-Based CDC vs Trigger-Based at 137Foundry
A behind-the-scenes look at the decision we make most often on data integration projects: log-based change data capture or trigger-based. We have had this conversation enough times that the pattern is worth writing down.
The short answer is that log-based wins for most teams once they understand what they are buying, but trigger-based is the right answer often enough that the framing "log-based is always better" is misleading. It is worth walking through what we actually weigh when a client team asks which one fits.
The default position
When we walk into a new client engagement and the question of CDC comes up, our starting position is log-based. Not because it is universally better, but because the failure modes of log-based are operational (which we can mitigate) and the failure modes of trigger-based are structural (which we cannot).
Log-based fails when:
- The replication slot fills the WAL and the database runs out of disk.
- The CDC tool falls behind for long enough that the source database reclaims the log space.
- A schema evolution is not handled correctly by the consumer.
All three of these failure modes are observable, monitorable, and recoverable with the right operational practice. We can set alerts on replication slot lag. We can configure WAL retention to survive realistic outages. We can use a schema registry to handle evolution. The work is real, but it is bounded.
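As a rough illustration of the slot-lag alert, here is a minimal sketch assuming a PostgreSQL 10+ source. The query uses the `pg_replication_slots` view and `pg_wal_lsn_diff`; the thresholds, the `SlotAlert` type, and the `classify_slots` function are hypothetical names chosen for this example, not part of any tool.

```python
from dataclasses import dataclass

# PostgreSQL 10+: bytes of WAL each replication slot is forcing the server to keep.
SLOT_LAG_SQL = """
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
"""


@dataclass
class SlotAlert:
    slot_name: str
    retained_bytes: int
    severity: str  # "warn" or "page"


def classify_slots(rows, warn_bytes=5 * 1024**3, page_bytes=20 * 1024**3):
    """Turn (slot_name, retained_bytes) rows from SLOT_LAG_SQL into alerts.

    The 5 GiB / 20 GiB thresholds are illustrative assumptions; size them
    against the free disk on the actual source database.
    """
    alerts = []
    for slot_name, retained in rows:
        if retained is None:  # slot has not reserved any WAL yet
            continue
        retained = int(retained)
        if retained >= page_bytes:
            alerts.append(SlotAlert(slot_name, retained, "page"))
        elif retained >= warn_bytes:
            alerts.append(SlotAlert(slot_name, retained, "warn"))
    return alerts
```

In practice the rows would come from running `SLOT_LAG_SQL` through whatever database driver and scheduler the team already uses, with the alerts forwarded to the existing paging system.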
Trigger-based fails when:
- The trigger logic has a bug that corrupts the source data.
- The trigger overhead degrades source-database write performance under load.
- The audit-table cleanup process loses track of which rows have shipped.
These failure modes are harder to recover from because they affect the source database directly. A trigger bug that has been silently producing wrong audit rows for a week is a deeper problem than a log-based pipeline that has been silently lagging. The blast radius is larger.
So we start with log-based and only deviate when something specific about the client situation makes log-based impractical.
When trigger-based is the right answer
Specific client scenarios where we recommend trigger-based over log-based:
The source database is a managed service that does not expose replication-level access. Some managed databases on smaller cloud providers do not give you the ability to consume the log directly. If the database is on a provider that does not offer logical replication, log-based CDC is just not available. Trigger-based works on any database that supports triggers, which is essentially all of them.
The team does not have the operational headroom to run Kafka or a managed CDC service. Log-based CDC is operationally cheap once you have Kafka and Debezium running, and operationally expensive if you have to set them up just for CDC. For a five-person engineering team with no existing Kafka infrastructure, the trigger-based pattern (write to an audit table, read it on a schedule, ship the changes) is often the better answer simply because the team can run it without becoming distributed-systems engineers.
The throughput is low enough that the trigger overhead is invisible. A source database doing tens of writes per second can take the trigger overhead with no measurable performance impact. At thousands of writes per second, the same trigger overhead becomes measurable and starts to matter. We profile the actual write rate before recommending one or the other.
The team needs a queryable change history for non-streaming use cases. A trigger-based audit table is a normal SQL table. Analysts can query it directly. Reports can run against it. The log-based pattern requires the data to flow through Kafka before it is queryable, which is fine for streaming consumers but inconvenient for ad-hoc analysis.
In each of these cases, trigger-based is not a compromise; it is the better answer for the specific situation.
When log-based is the obvious choice
The flip side: situations where we do not even discuss trigger-based.
Sub-second latency requirements. Trigger-based pipelines have an inherent lag between the trigger fire and the audit-table reader catching the new row. Log-based pipelines can deliver sub-second end-to-end latency. For real-time fraud detection, live dashboards, or operational data syncing, log-based is the only credible answer.
High-throughput sources where trigger overhead matters. A database doing thousands of writes per second under steady load will feel the trigger overhead. We measure it with a load test before going live, and if the overhead exceeds the team's tolerance, log-based is the path.
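To give a feel for what such a measurement looks like, here is a toy sketch that times the same batch of inserts with and without an audit trigger. It uses an in-memory SQLite database as a stand-in, so the ratio it reports says nothing about a production system; the table names and the `time_inserts` and `overhead_ratio` helpers are hypothetical. A real load test would replay a representative workload against a staging copy of the actual source database.

```python
import sqlite3
import time


def time_inserts(n_rows, with_trigger):
    """Return seconds taken to insert n_rows, optionally with an audit trigger."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    if with_trigger:
        conn.executescript("""
        CREATE TABLE events_audit (
            audit_id INTEGER PRIMARY KEY,
            event_id INTEGER,
            payload  TEXT
        );
        CREATE TRIGGER events_ai AFTER INSERT ON events BEGIN
            INSERT INTO events_audit (event_id, payload)
            VALUES (NEW.id, NEW.payload);
        END;
        """)
    payloads = (("x" * 100,) for _ in range(n_rows))
    start = time.perf_counter()
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO events (payload) VALUES (?)", payloads)
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed


def overhead_ratio(n_rows=20000, repeats=3):
    """Best-of-N time with the trigger divided by best-of-N time without it."""
    base = min(time_inserts(n_rows, False) for _ in range(repeats))
    trig = min(time_inserts(n_rows, True) for _ in range(repeats))
    return trig / base
```

Taking the minimum over several repeats reduces noise from the machine running the test. Whatever ratio comes out is then compared against the team's tolerance, which is a judgment call rather than a fixed number.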
Existing Kafka infrastructure. If the client is already running Kafka for other reasons, adding Debezium for CDC is a small marginal cost. The Kafka cluster pays for itself across multiple use cases, and CDC is just one more producer.
Complex schemas with frequent evolution. Log-based tools handle schema evolution more cleanly than hand-rolled trigger pipelines. For a source database that changes schema every couple of weeks, the schema-evolution cost of trigger-based piles up.
What we do not recommend
Two patterns we actively recommend against:
Building log-based CDC from scratch. Parsing the PostgreSQL WAL or the MySQL binlog directly is possible. The libraries exist. Teams who do this end up rebuilding Debezium badly. The standard tools have solved the operational problems already; rolling your own gives you control at the cost of two engineers full-time for a year. Almost never the right call.
Mixing log-based and trigger-based on the same database without clear boundaries. We have seen teams run log-based CDC for some tables and trigger-based for others on the same source database, with no documentation of which is which. The next engineer who joins has to spend a week figuring out the layout before they can change anything. Pick one pattern per database when possible. If two patterns are required, document the boundary explicitly.
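One lightweight way to document the boundary is a manifest checked into the repository, plus a check that flags any table missing from it. This is only a sketch: the table names, the `CaptureMode` enum, and `undocumented_tables` are hypothetical, and a wiki page or a comment in the migration files could serve the same purpose.

```python
from enum import Enum


class CaptureMode(Enum):
    LOG = "log-based"
    TRIGGER = "trigger-based"
    NONE = "not captured"


# Hypothetical example tables; one entry per table in the source database.
CAPTURE_MANIFEST = {
    "public.orders": CaptureMode.LOG,
    "public.payments": CaptureMode.LOG,
    "legacy.invoices": CaptureMode.TRIGGER,
    "public.sessions": CaptureMode.NONE,
}


def undocumented_tables(actual_tables, manifest=CAPTURE_MANIFEST):
    """Return tables that exist in the database but are missing from the manifest."""
    return sorted(set(actual_tables) - set(manifest))
```

Running `undocumented_tables` in CI against the list of tables from the database catalog means a new table cannot appear without someone deciding, and recording, how it is captured.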
The conversation with the client
When we sit down with a client engineering team and walk through the choice, the conversation usually goes like this:
We start by asking three questions:
1. What is your latency tolerance? (Sub-second, seconds, minutes, hours?)
2. What is your peak write throughput on the source tables you want to capture?
3. Do you have schema-write access and replication access to the source database?
The answers point us at one of the two patterns most of the time. The edge cases (where the answers do not converge on a clear winner) are usually situations where the team should pilot both and pick based on measured behavior.
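The way the three answers narrow the choice can be written down as a small function. This is a deliberately simplified sketch of the reasoning in this article, not a rule we apply mechanically: the `Answers` type, the `recommend` function, and the 100 and 1000 writes-per-second cutoffs (standing in for "tens" and "thousands") are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    latency_tolerance_s: float     # question 1
    peak_writes_per_s: float       # question 2
    has_replication_access: bool   # question 3
    has_schema_write_access: bool  # question 3


def recommend(a: Answers, low_rate: float = 100.0, high_rate: float = 1000.0) -> str:
    """Map the three answers to a starting recommendation."""
    # Sub-second latency or sustained high write rates point at log-based.
    needs_log = a.latency_tolerance_s < 1.0 or a.peak_writes_per_s >= high_rate

    if a.has_replication_access:
        if needs_log or not a.has_schema_write_access:
            return "log-based"
        if a.peak_writes_per_s <= low_rate:
            # Both patterns fit; operational headroom decides.
            return "either"
        return "pilot both"

    # No access to the log: triggers are the only capture mechanism left.
    if a.has_schema_write_access and not needs_log:
        return "trigger-based"
    return "blocked"
```

For example, `recommend(Answers(60, 50, False, True))` returns `"trigger-based"`, while a team that needs sub-second latency but has no replication access gets `"blocked"`, which in practice means revisiting either the requirement or the hosting arrangement.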
A pilot is cheaper than a wrong choice. We typically build a small log-based pipeline on one table and a trigger-based pipeline on another table, measure latency and overhead over a few weeks, and let the data settle the question. The cost of the pilot is two engineer-weeks. The cost of choosing wrong and discovering it six months in is twenty engineer-weeks.
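The arithmetic behind that claim is short. Under the simplifying assumption that a pilot always leads to the right choice, the pilot pays for itself whenever the chance of choosing wrong without it exceeds the ratio of the two costs. The function name below is hypothetical.

```python
def pilot_break_even(pilot_cost_weeks=2.0, wrong_choice_cost_weeks=20.0):
    """Probability of a wrong choice above which the pilot is the cheaper path.

    Expected cost without a pilot is p * wrong_choice_cost_weeks; the pilot
    costs pilot_cost_weeks. Setting the two equal and solving for p gives
    the ratio returned here.
    """
    return pilot_cost_weeks / wrong_choice_cost_weeks


print(pilot_break_even())  # 0.1
```

With the figures above, the pilot is worth running if there is more than roughly a one-in-ten chance of picking the wrong pattern without it.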
The bigger picture
The CDC pattern conversation is part of a larger 137Foundry data integration engagement, where we look at the broader architecture: source databases, downstream consumers, latency requirements, operational headroom, build-vs-buy on the tooling. CDC is one question among several.
The longer reference we point clients to is the guide How to Implement Change Data Capture Without Polling Your Database, which lays out all three CDC patterns (log-based, trigger-based, timestamp polling) with a decision rule for picking between them. The internal decision-making we do on client projects is more nuanced than the article (because we have specific client context), but the framework is the same.
For broader background on the underlying ideas, the Wikipedia entry on change data capture is a reasonable starting point.
The 137Foundry view
The CDC pattern is the first technical choice on most data integration projects, and the choice carries through the rest of the build. Get it right and the rest of the architecture follows naturally. Get it wrong and you fight against the pattern for the life of the system.
We tend to recommend log-based when the operational headroom exists, trigger-based when it does not, and timestamp polling only for low-stakes pipelines where the failure modes are acceptable. Most clients land somewhere in the first two categories. Few land in the third.
The right answer is the one that fits your specific situation. The wrong answer is the one that fits someone else's situation but feels safer because they have written about it. We aim to make the choice deliberately, with the client team in the room, on every project.
That is most of what makes data integration work at 137Foundry different from a recipe-driven approach. The recipes are useful background; the decisions still have to be made specifically for each system.

















