Announcing Apache Doris Target

Direct CDC from operational databases to VeloDB Cloud and Apache Doris with native Merge-on-Write.

Shubham Sinha, Co-Founder

Supermetal now replicates into Apache Doris and VeloDB Cloud from every supported source: Postgres, MySQL, MongoDB, SQL Server, and Oracle. CDC updates and deletes flow through Doris's native Merge-on-Write.

A single Rust binary handles both snapshot and CDC, loading into Doris via the S3 table-valued function (TVF) or Stream Load.

Single Process CDC

A typical Doris CDC stack chains three components between the source database and Doris:

┌───────────┐     ┌───────────┐     ┌───────────┐     ┌─────────────┐     ┌───────────────┐
│ Source DB │ ──► │ Debezium  │ ──► │ Kafka     │ ──► │ Flink/Spark │ ──► │ Apache Doris  │
└───────────┘     └───────────┘     └───────────┘     └─────────────┘     └───────────────┘
                  (Source Conn.)  (Message Broker)  (Compute Cluster)

Debezium decodes the change log to row-oriented Avro or JSON for Kafka. Flink decodes from Kafka, transforms, and re-encodes for Stream Load. Every hop pays per-row encode/decode.

Apache Doris is built for high-throughput, low-latency ingestion. The multi-hop pipeline limits throughput and adds latency at every hop. Every failure traces through three systems.

Supermetal runs as a single process, deployed directly in your infrastructure:

┌───────────┐     ┌────────────┐     ┌──────────────┐     ┌─────────────────┐
│ Source DB │ ──► │ Supermetal │ ──► │ Object Store │ ──► │  VeloDB Cloud / │
└───────────┘     └────────────┘     │  (optional)  │     │  Apache Doris   │
                                     └──────────────┘     └─────────────────┘
                                       S3 / Azure

Supermetal encodes rows once into Arrow at the source. They stay columnar through Parquet into Doris. With an object store buffer, Doris pulls those Parquet files via the S3 TVF. Without one, Supermetal writes Parquet to local disk and sends it to Doris via Stream Load.
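To make "row-oriented" versus "columnar" concrete, here is a toy sketch in plain Python. It is not Arrow and not Supermetal's implementation; the function name `rows_to_columns` and the sample fields are hypothetical. It only shows the shape change that happens once at the source: a list of per-row records becomes one list per column.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list per column (toy columnar batch)."""
    if not rows:
        return {}
    # Assumption for this sketch: the first row defines the column set.
    names = list(rows[0].keys())
    columns: dict[str, list[Any]] = {name: [] for name in names}
    for row in rows:
        for name in names:
            # Missing fields become None so every column keeps the same length.
            columns[name].append(row.get(name))
    return columns


batch = rows_to_columns([
    {"id": 1, "status": "open", "amount": 10.5},
    {"id": 2, "status": "paid", "amount": 99.0},
    {"id": 3, "status": "open", "amount": 4.25},
])
print(batch["status"])
```

In the multi-hop pipeline, a pivot like this (or its inverse) is repeated at each hop as rows are decoded and re-encoded. Here it happens once.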

Updates and Deletes

Supermetal creates Doris Unique Key tables with Merge-on-Write for every CDC target.

With a source primary key, the Unique Key uses those columns, and _sm_version (derived from the source's transaction-log position) serves as the sequence column, so merges stay correctly ordered under retries. Deletes set Doris's hidden __DORIS_DELETE_SIGN__ column.
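The effect of a sequence column plus a delete sign can be modeled in a few lines. This is a toy in-memory simulation, not Doris's storage engine and not Supermetal's code; the class `UniqueKeyTable` and its methods are hypothetical. It assumes an incoming row replaces the stored one only when its sequence value is not older.

```python
from dataclasses import dataclass, field


@dataclass
class UniqueKeyTable:
    """Toy model of a unique-key table with a sequence column and a delete sign."""

    # key -> (version, deleted, payload)
    rows: dict = field(default_factory=dict)

    def apply(self, key, version, payload=None, deleted=False):
        current = self.rows.get(key)
        # Assumption: a change with an older sequence value never overwrites a newer one.
        if current is not None and version < current[0]:
            return
        self.rows[key] = (version, deleted, payload)

    def visible(self):
        # Rows flagged as deleted are hidden from readers.
        return {k: p for k, (_, d, p) in self.rows.items() if not d}


t = UniqueKeyTable()
t.apply(key=1, version=100, payload={"status": "open"})
t.apply(key=1, version=120, payload={"status": "paid"})
t.apply(key=1, version=100, payload={"status": "open"})  # retried older change, ignored
t.apply(key=2, version=130, payload={"status": "open"})
t.apply(key=2, version=140, deleted=True)                # delete arrives as a flagged row
print(t.visible())
```

Because ordering is decided by the version rather than by arrival order, replaying a batch after a failure leaves the table in the same state.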

Without a source primary key, the Unique Key uses _sm_id, a row-content hash. Replays and retries dedupe against the hash, keeping inserts idempotent. _sm_version and _sm_deleted are regular columns. Schema changes (column drops or renames) invalidate the hash and require resync.
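A content-hash key can be sketched as follows. The hash function (SHA-256) and the canonical JSON encoding are assumptions chosen for illustration; the post does not describe how _sm_id is actually derived, and `row_content_hash` and `upsert` are hypothetical names.

```python
import hashlib
import json


def row_content_hash(row: dict) -> str:
    """Hash a row's full content; key order does not affect the result."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


target: dict[str, dict] = {}


def upsert(row: dict) -> None:
    # Keyed by content hash, so a replayed insert lands on the same key.
    target[row_content_hash(row)] = row


event = {"user": "ada", "action": "login", "ts": "2024-01-01T00:00:00Z"}
upsert(event)
upsert(dict(event))  # replay of the same insert
print(len(target))

# Dropping a column changes the hashed content, so old and new hashes no longer match.
without_action = {k: v for k, v in event.items() if k != "action"}
print(row_content_hash(event) == row_content_hash(without_action))
```

The second comparison shows why a column drop or rename forces a resync: rows written before the change carry hashes that the new row shape can never reproduce.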

Performance

Postgres to VeloDB Cloud on the TPC-H dataset. The snapshot covers SF10–SF50. CDC runs at 5K–50K ops/sec.

Source                 Supermetal            Target
-------------------    ------------------    ------------------
Postgres on AWS RDS    AWS EC2               VeloDB Cloud
db.m5.2xlarge          m8azn.xlarge          VeloDB cloud-26.03
8 vCPU / 32 GB RAM     4 vCPU / 16 GB RAM    8 vCPU / 64 GB RAM
400 GiB gp3            Amazon Linux 2023     400 GB cache
us-west-2              us-west-2             us-west-2
[Chart: Cumulative Row Volume. 433M rows in 6m 11s.]

SF50 (433M rows) loads in 6m 11s. SF10 (86.6M rows) finishes in 1m 16s. Supermetal reads all 8 tables in parallel at a sustained ~1.5M rows/sec (~290 MB/sec), writing Parquet to object storage. Doris pulls those files via the S3 TVF while the source read is still running.
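As a quick sanity check, the whole-run average rate can be computed from the row counts and durations quoted above. This is plain arithmetic on the published numbers; the helper name `rows_per_second` is illustrative. These averages span the entire load from start to finish, so they are a different measurement from the sustained source read rate.

```python
def rows_per_second(rows: float, minutes: int, seconds: int) -> float:
    """Average rows per second over a wall-clock duration."""
    return rows / (minutes * 60 + seconds)


sf50 = rows_per_second(433e6, 6, 11)   # 433M rows in 6m 11s
sf10 = rows_per_second(86.6e6, 1, 16)  # 86.6M rows in 1m 16s

print(f"SF50: {sf50 / 1e6:.2f}M rows/sec end to end")
print(f"SF10: {sf10 / 1e6:.2f}M rows/sec end to end")
```

The two scale factors land within a few percent of each other, so the end-to-end rate held roughly steady as the dataset grew fivefold.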

[Chart: CDC Latency Under Load. Postgres to VeloDB Cloud, 5s flush interval. Series: Target Load, Throughput, Total (p100). p100 end-to-end ~7–9s through 25K ops/sec.]

End-to-end p100 latency stays 7–9s at 5K–25K ops/sec. The 5s floor is the default flush interval (configurable), with Doris write under 2s at every tier.

Throughput matches the target through 30K ops/sec, slipping to 96% at 40K. Postgres logical decoding is single-threaded and saturates around 40K rows/sec on this RDS instance. Latency growth above that point comes from the source, not Supermetal. The Breakdown View shows read latency climbing at higher tiers.


Get started in minutes

Shell (sh):
curl -fsSL https://trial.supermetal.io/install.sh | sh

PowerShell:
iwr -useb https://trial.supermetal.io/install.ps1 | iex

Questions? Check out our docs or reach out to us.