How Goldsky Migrated Terabytes of Mission-Critical Web3 Data

Data Migrated: 30 TB

Rows Migrated: 18.9B

Revenue Expansion: $500K

Databases: 24

Tables: 70,806

Unlocked in late-stage pipeline: $1M+

Goldsky

Goldsky is the modern backend for crypto-enabled products. Goldsky handles the hard parts of building on crypto rails: realtime data, reliable connectivity, and onchain execution, so teams can ship better products, faster.

One of Goldsky's core products is Subgraphs, a managed blockchain indexing service. The service filters and transforms data from smart contracts and entire chains, then exposes it through GraphQL APIs. This means developers can query blockchain data without running their own nodes or managing databases.

The Challenge

Goldsky needed to migrate Graph Node deployments backed by multi-terabyte Postgres databases built over years of block processing.

The default approach would be to re-index from genesis, but this wasn't viable. The team projected:

Weeks to months of rebuild time processing millions of blocks and events
Seven figures in archive node access and compute costs
Missing an upcoming migration deadline

The better option was to migrate the Postgres database directly. Graph Node reads its state from Postgres and resumes syncing from where it left off, so a database migration would preserve all the indexing work already done.

But the database migration had its own problems:

1:1 Schema Preservation: Goldsky's application layer depends on the exact structure of Graph Node's schema. Any change to target data types, indexes, or partitioning would break production queries. Most migration tools generate their own approximation of the target schema and do not preserve the source-to-target mapping 1:1.
Schema Complexity: Graph Node uses advanced Postgres features like custom types, arrays, block-range partitioning, multi-level TOAST columns, and full-text/range types.
Orchestration: The team needed to migrate schemas in a specific priority order through an API-driven workflow, not manual orchestration.

Goldsky needed a migration tool that could replicate Graph Node's complex schema exactly and provide an API to orchestrate schema-by-schema migration.

Why Supermetal

Schema fidelity: Supermetal replicates Postgres schemas 1:1. No schema transformations or approximations.
Performance on complex tables: Built on Rust and Apache Arrow, Supermetal maintains high throughput even on wide, partitioned tables.
Single binary: Runs as a single binary with no external dependencies or services.
Zero data egress: All data processing happens inside the customer's VPC. Supermetal never accesses customer data or infrastructure.

Migration Deep Dive

Paymahn Moghadasian, Lead Engineer at Goldsky, was responsible for the migration.

Paymahn deployed Supermetal as a single pod (single binary) in Goldsky's Kubernetes cluster. He ran a few quick migrations to gain confidence in Supermetal and understand the underlying performance characteristics.

Once comfortable, Paymahn used Supermetal's API to write scripts that orchestrated the full migration. The scripts provisioned connectors for each schema in priority order, letting him control which schemas migrated first. He monitored progress through Supermetal's API. The entire workflow was scriptable, so Paymahn could repeat the process across multiple Graph Node deployments without manual intervention.
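A priority-ordered orchestration loop of the kind described above might look roughly like the following Python sketch. It does not use Supermetal's actual API, which is not documented here: `start_connector` and `is_finished` are hypothetical callables standing in for whatever API calls create a connector for a schema and report its status, and `SchemaJob`, the polling interval, and the choice to run schemas strictly one at a time are assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class SchemaJob:
    schema: str     # Postgres schema to migrate
    priority: int   # lower value migrates first (assumed convention)


def migrate_in_priority_order(
    jobs: Iterable[SchemaJob],
    start_connector: Callable[[str], str],   # hypothetical: returns a connector id
    is_finished: Callable[[str], bool],      # hypothetical: polls connector status
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Migrate schemas one at a time, highest priority first."""
    completed: List[str] = []
    for job in sorted(jobs, key=lambda j: j.priority):
        connector_id = start_connector(job.schema)
        # Block until this schema is done before starting the next one.
        while not is_finished(connector_id):
            sleep(poll_seconds)
        completed.append(job.schema)
    return completed
```

Passing the API calls in as plain callables keeps the ordering logic independent of any particular client, so the same loop can be pointed at each Graph Node deployment in turn.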

Goldsky maintained full control over the deployment. They owned networking, IAM, monitoring, and change control. Supermetal ran entirely inside their VPC, and no Supermetal operator accessed production data or infrastructure.

Supermetal saved us probably weeks to months of manual work writing our own data migration software. Supermetal handled our scale with ease and the team behind Supermetal was incredibly responsive to any edge cases we discovered. We have since found new use cases for Supermetal in our business.
- Paymahn Moghadasian

Performance

Supermetal's snapshot performance comes from parallel chunking. The agent divides large tables into chunks based on Postgres ctid ranges and processes multiple chunks simultaneously. This happens both within a single table (intra-table parallelization) and across multiple tables (inter-table parallelization). For partitioned tables, Supermetal scans child partitions independently rather than treating the whole table as a single unit.
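The ctid-range chunking idea can be illustrated with a short Python sketch. This is not Supermetal's implementation (which is built on Rust and Apache Arrow); the function names, the chunk size, and the `run_query` callback are hypothetical. It assumes the table's page count has already been obtained, for example by dividing `pg_relation_size` by the server's block size.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def ctid_chunks(total_pages: int, pages_per_chunk: int) -> List[Tuple[int, int]]:
    """Split heap pages [0, total_pages) into half-open (start, end) ranges."""
    if pages_per_chunk <= 0:
        raise ValueError("pages_per_chunk must be positive")
    return [
        (start, min(start + pages_per_chunk, total_pages))
        for start in range(0, total_pages, pages_per_chunk)
    ]


def chunk_query(table: str, start_page: int, end_page: int) -> str:
    """Build a SELECT restricted to one ctid range.

    `table` is assumed to be a trusted, already-quoted identifier.
    Tuple offsets within a page start at 1, so '(page,0)' sorts before
    every tuple on that page.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE ctid >= '({start_page},0)'::tid AND ctid < '({end_page},0)'::tid"
    )


def read_in_parallel(
    queries: Sequence[str],
    run_query: Callable[[str], int],  # hypothetical: runs one query, returns rows read
    workers: int = 4,
) -> List[int]:
    """Run chunk queries concurrently; results keep the order of `queries`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_query, queries))
```

For a partitioned table, the same planning step would be applied to each child partition separately, and the resulting queries from all partitions and tables could share one worker pool.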

One high-throughput workload illustrates this: 322 million rows spread across 22 tables, including seven tables with ~40 million rows each. Supermetal read the entire workload in 7 minutes.

Row Throughput

Read rate peaked at 1.58M rows/sec

Tables: 22

Rows: 322M
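As a rough cross-check of these figures, the average read rate implied by 322 million rows in 7 minutes can be computed directly. The snippet below is only back-of-the-envelope arithmetic on the numbers quoted above; it treats "7 minutes" as exactly 420 seconds, which is an approximation.

```python
rows = 322_000_000          # rows in the workload
elapsed_seconds = 7 * 60    # "7 minutes", treated as exact
peak_rate = 1_580_000       # reported peak, rows/sec

average_rate = rows / elapsed_seconds
print(f"average: about {average_rate:,.0f} rows/sec")
print(f"peak is about {peak_rate / average_rate:.2f}x the average")
```

Under that assumption the average works out to roughly 767K rows/sec, so the reported peak is about twice the sustained average over the whole run.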

Outcome

Goldsky successfully migrated 30 TB (18.9 billion rows) across 24 production databases. The database migration avoided months of re-indexing and seven figures in RPC and compute costs, enabled over $500K in revenue expansion tied to time-sensitive launches, and unlocked $1M+ in late-stage pipeline. In Paymahn's words:

Supermetal is easy to use, highly reliable and backed by a very competent team. This migration went smoothly and now we are planning on using Supermetal in other parts of our business.
- Paymahn Moghadasian

What's Next

After saving months of re-indexing time and eliminating seven figures in projected infrastructure costs, Goldsky is evaluating Supermetal across its products for use cases such as change data capture (CDC) to Kafka, product analytics, and a real-time customer data platform (CDP).

Get in Touch

Supermetal is a low-footprint, real-time data integration platform that provides unparalleled compute economics, reliability, and security, all within your infrastructure.

Try Supermetal or contact us to discuss your data integration needs.