Benchmarking SQL Server CDC to ClickHouse
Up to 7x faster than Fivetran and 5x faster than Airbyte.
Change Data Capture (CDC) from SQL Server to ClickHouse is one of the most common replication patterns we see at Supermetal. As more customers evaluate Supermetal for this workload, we benchmarked both snapshot and CDC at scale against Fivetran and Airbyte.
Overview
Supermetal is deployed as a single Rust binary. It uses Arrow-native primitives for performance and object storage for durability. No need to wrangle Kafka, Debezium, or other JVM-based tooling. Read more in the architecture docs.
- Native CDC: Reads directly from SQL Server's internal change tables.
- Parallel snapshots: Large tables are automatically split on the primary key and read concurrently across all available cores.
- Broad edition support: SQL Server 2016+, including AWS RDS, Azure SQL Database, Azure SQL Managed Instance, and standard on-prem editions.
- Native deduplication: Updates and deletes are handled natively using ClickHouse's ReplacingMergeTree engine, with no manual merge logic or row-level mutations.
- Zero-copy writes: Supermetal stages Parquet files on object storage (S3), and ClickHouse pulls them directly. There is no extra network hop through the replication process.
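To make the deduplication model concrete, here is a minimal Python simulation of how SQL Server CDC changes could map onto a ReplacingMergeTree table. The `__$operation` codes (1 = delete, 2 = insert, 3 = update before-image, 4 = update after-image) and the `__$start_lsn` / `__$seqval` columns are standard SQL Server CDC. `ReplacingMergeTree(version, is_deleted)` is a real ClickHouse engine signature. The choice of LSN plus seqval as the version, and the names `ChangeRow`, `to_versioned_row`, and `collapse_like_final`, are hypothetical illustrations, not Supermetal's actual internals. `collapse_like_final` roughly approximates what a `SELECT ... FINAL` returns.

```python
from dataclasses import dataclass
from typing import Any, Optional

# __$operation codes written by SQL Server CDC.
OP_DELETE = 1
OP_INSERT = 2
OP_UPDATE_BEFORE = 3  # before-image of an update
OP_UPDATE_AFTER = 4   # after-image of an update


@dataclass(frozen=True)
class ChangeRow:
    start_lsn: bytes  # __$start_lsn: 10-byte commit LSN
    seqval: bytes     # __$seqval: 10-byte order of the change within its transaction
    operation: int    # __$operation
    key: Any          # primary-key value
    values: dict      # remaining column values


VersionedRow = tuple[Any, int, int, dict]  # (key, version, is_deleted, values)


def to_versioned_row(change: ChangeRow) -> Optional[VersionedRow]:
    """Map one CDC change to a ReplacingMergeTree-style row, or None to skip it.

    Hypothetical choice: the version is (start_lsn, seqval) read as a single
    big-endian integer, so a later change always gets a larger version.
    """
    if change.operation == OP_UPDATE_BEFORE:
        return None  # the matching after-image carries the new state
    version = int.from_bytes(change.start_lsn + change.seqval, "big")
    is_deleted = 1 if change.operation == OP_DELETE else 0
    return change.key, version, is_deleted, change.values


def collapse_like_final(rows: list[VersionedRow]) -> dict:
    """Approximate reading ReplacingMergeTree(version, is_deleted) with FINAL:
    keep the highest-version row per key, then drop keys whose newest row is
    a delete. On equal versions, the later row in the input wins."""
    latest: dict = {}
    for key, version, is_deleted, values in rows:
        if key not in latest or version >= latest[key][0]:
            latest[key] = (version, is_deleted, values)
    return {k: vals for k, (_, deleted, vals) in latest.items() if not deleted}
```

The version grows with commit order. Replaying or reordering the same changes, for example after a retry, therefore converges to the same state, with no row-level mutations. In ClickHouse itself, collapsing happens during background merges. Until a merge runs, queries need `FINAL` (or an equivalent) to see the deduplicated view.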
Benchmark Setup
We used the TPC-H dataset at scale factors 20 through 50 (31 GB to 77 GB) to measure performance across two scenarios:
- Snapshot: Full replication of all 8 TPC-H tables from SQL Server to ClickHouse Cloud.
- CDC: Sustained upsert rates from 1K to 30K ops/sec, each held for 5 minutes.
Fivetran and Airbyte were tested using their cloud-managed services under default configurations.
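Every TPC-H table except nation and region grows linearly with the scale factor, and lineitem is by far the largest. The sketch below multiplies out the standard TPC-H cardinalities at SF1 to show the row counts at the two ends of the tested range. The lineitem count is approximate, because its exact size varies slightly with the scale factor. Row share is not the same as byte share, so treat the result only as a rough check on the ~70% lineitem figure in the snapshot results.

```python
# Standard TPC-H row counts at scale factor 1. nation and region are fixed;
# lineitem is approximate (its exact count varies slightly with the SF).
SF1_ROWS = {
    "lineitem": 6_000_000,
    "orders": 1_500_000,
    "partsupp": 800_000,
    "part": 200_000,
    "customer": 150_000,
    "supplier": 10_000,
    "nation": 25,
    "region": 5,
}
FIXED_TABLES = {"nation", "region"}


def rows_at_scale(sf: int) -> dict[str, int]:
    """Row count per table at the given scale factor."""
    return {
        table: rows if table in FIXED_TABLES else rows * sf
        for table, rows in SF1_ROWS.items()
    }


def lineitem_row_share(sf: int) -> float:
    """Fraction of all rows that belong to lineitem."""
    counts = rows_at_scale(sf)
    return counts["lineitem"] / sum(counts.values())


for sf in (20, 50):
    counts = rows_at_scale(sf)
    print(f"SF{sf}: {sum(counts.values()):,} rows, "
          f"lineitem {lineitem_row_share(sf):.1%} of rows")
```

At both SF20 and SF50, lineitem comes out at about 69% of all rows. That is in line with the ~70% cited for it in the snapshot results.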
Snapshot Performance
We ran a full snapshot (backfill) at scale factors 20 through 50 from SQL Server to ClickHouse Cloud, and tested Supermetal against Fivetran and Airbyte at each scale factor.
The gap widens with each scale factor: Supermetal is up to 7x faster than Fivetran and 5x faster than Airbyte.
- Throughput: Supermetal sustained ~1.25M rows/sec (~240 MB/s) from SQL Server, consistent across all scale factors.
- Parallelism: Large tables are chunked and read concurrently. Smaller tables complete in minutes, but total sync time is gated by lineitem (~70% of the dataset).
- Pipelining: Parquet is staged to S3 while reads are still running. By the time the source read finishes, more than 85% of the data is already in ClickHouse Cloud.
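The Parallelism and Pipelining points above can be sketched in a few lines of Python. The sketch assumes a single integer key with a roughly uniform distribution; skewed keys would need split boundaries sampled from the data instead. `split_pk_range` cuts the key range into contiguous chunks. `pipelined_snapshot` reads those chunks concurrently and hands each batch to an upload pool as soon as its read completes, so staging overlaps with the reads still in flight. Both `read_chunk` and `stage_batch` are hypothetical callables standing in for a range query against SQL Server and a Parquet write to S3. This is an illustration of the technique, not Supermetal's implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable


def split_pk_range(min_pk: int, max_pk: int, num_chunks: int) -> list[tuple[int, int]]:
    """Split the inclusive key range [min_pk, max_pk] into contiguous,
    non-overlapping inclusive ranges whose widths differ by at most one."""
    total = max_pk - min_pk + 1
    if total <= 0:
        return []
    num_chunks = max(1, min(num_chunks, total))
    base, extra = divmod(total, num_chunks)
    ranges, start = [], min_pk
    for i in range(num_chunks):
        width = base + (1 if i < extra else 0)
        ranges.append((start, start + width - 1))
        start += width
    return ranges


def pipelined_snapshot(
    read_chunk: Callable[[int, int], Any],  # hypothetical: SELECT ... WHERE pk BETWEEN lo AND hi
    stage_batch: Callable[[Any], Any],      # hypothetical: write one Parquet file to S3
    ranges: list[tuple[int, int]],
    read_workers: int = 4,
    upload_workers: int = 2,
) -> list[Any]:
    """Read key ranges concurrently and stage each batch as soon as its read
    finishes, so uploads overlap with the reads that are still running."""
    with ThreadPoolExecutor(read_workers) as readers, \
            ThreadPoolExecutor(upload_workers) as uploaders:
        reads = [readers.submit(read_chunk, lo, hi) for lo, hi in ranges]
        # as_completed yields reads in finish order, so staging starts early.
        uploads = [uploaders.submit(stage_batch, done.result())
                   for done in as_completed(reads)]
        # Results arrive in completion order, not key order.
        return [u.result() for u in uploads]
```

Even with chunking, a table that holds most of the rows still sets the wall-clock floor, which is why lineitem gates the total sync time.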
CDC Performance
We tested CDC across sustained rates from 1K to 30K ops/sec. Fivetran and Airbyte run on batched schedules (typically 5–60 minute sync intervals), so their end-to-end latency is dominated by the sync interval and is not directly comparable to the second-level CDC latency measured here.
- Sustained latency: Supermetal maintains 2–3s end-to-end latency from 1K through 20K ops/sec.
- Saturation: Throughput drops off only above 20K ops/sec, where SQL Server reaches its own write limit (~23K ops/sec on this RDS configuration).
- Target performance: ClickHouse Cloud write latency stays flat at ~100ms across all load tiers. Because Supermetal stages Parquet to S3 for ClickHouse Cloud to pull, ingestion does not bottleneck the pipeline.
- Latency drivers: Two primary factors determined end-to-end CDC latency in this benchmark:
  - SQL Server polling (1s): The polling interval of the SQL Server CDC capture job. It determines how quickly a committed transaction appears in the change tables for Supermetal to read.
  - Supermetal flush (250ms): How often Supermetal bundles data into Parquet and stages it to S3. This benchmark used 250ms; the default is 5s, configurable via the UI or API.
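A back-of-the-envelope budget shows how much of the end-to-end latency the two configured intervals above can explain. This minimal sketch assumes that changes arrive uniformly relative to each periodic stage's schedule. Under that assumption, a stage with interval T adds T/2 on average and at most T. The ~100ms ClickHouse Cloud write latency is treated as a fixed cost. The `PeriodicStage` and `latency_budget` names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicStage:
    name: str
    interval_s: float  # a change waits between 0 and interval_s at this stage


def latency_budget(stages: list[PeriodicStage], fixed_s: float = 0.0) -> tuple[float, float]:
    """Return (mean, worst-case) added latency in seconds.

    Assumes change arrivals are uniform with respect to each stage's schedule,
    so a periodic stage adds interval/2 on average and at most one interval.
    fixed_s covers stages with a roughly constant cost.
    """
    mean = fixed_s + sum(s.interval_s / 2 for s in stages)
    worst = fixed_s + sum(s.interval_s for s in stages)
    return mean, worst


stages = [
    PeriodicStage("SQL Server capture polling", 1.0),
    PeriodicStage("Supermetal flush", 0.25),
]
mean_s, worst_s = latency_budget(stages, fixed_s=0.1)  # ~100ms ClickHouse write
print(f"mean ~{mean_s:.2f}s, worst ~{worst_s:.2f}s")
```

With the benchmark's settings, these stages add roughly 0.7s on average and 1.35s at worst, and the 1s capture polling is the largest configured term. The rest of the measured 2–3s comes from parts of the pipeline that this simple budget does not model.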
Try Supermetal
Supermetal ships high-performance connectors for SQL Server and ClickHouse, deployed as a single binary on any machine. No Kafka, no JVM, no cluster. The trial includes 1,000 hours of free sync.
macOS / Linux:
curl -fsSL https://trial.supermetal.io/install.sh | sh
Windows (PowerShell):
iwr -useb https://trial.supermetal.io/install.ps1 | iex
Quickstart · SQL Server docs · ClickHouse docs · Architecture