
Cucumber Expressions for Cross-Chain Development


Here’s a situation I keep running into: you’re building a protocol that lives on multiple chains. Maybe it’s EVM plus Solana. Maybe you’ve added TON or Aptos. The smart contracts are written in completely different languages, the toolchains have nothing in common, and somewhere around week three you realize the validation logic on Chain B doesn’t match what Chain A does anymore.

Nobody changed it on purpose. It just… drifted.

The multi-chain testing problem

Working across ecosystems is already hard when the tech is similar. EVM-to-EVM, you’re still dealing with different gas models, different block times, slightly different precompile support. But at least you’re writing Solidity on both sides.

Once you cross VM boundaries, the complexity goes up by an order of magnitude. Solana programs are Rust (or Anchor-flavored Rust). TON contracts are Tact or FunC. Aptos and Sui use different dialects of Move. Each ecosystem has its own testing frameworks, its own deployment tooling, its own way of thinking about state.

So what happens in practice? A few things, and none of them are good:

Specs don’t get written. There’s no time. You’re already juggling three different development environments, learning the quirks of each chain’s SDK. Writing formal specifications feels like a luxury.

Logic gets missed. Validation that exists on one chain’s contracts doesn’t make it into the other chain’s implementation. Or it does, but the edge cases are handled differently. I ran into this on a permissioned RWA protocol built for institutional investors. On one chain, a specific user permission level bypassed the destination address check entirely. On another chain, the receiver address was never verified at all. For a permissioned protocol where you need to know exactly who’s holding what at all times, that’s a serious problem. Role-based access control is already tricky on a single chain. Spread it across multiple ecosystems with different account models and permission patterns, and the surface area for these kinds of gaps gets wide fast.

Specs drift. This one’s the most insidious. You change the implementation on one chain (maybe a new fee structure, maybe a different slippage calculation), and the other chain’s contracts don’t get updated. The specs, if they existed, are now wrong for at least one chain. The gap widens silently.

Where Cucumber expressions fit in

Cucumber expressions (and the broader Gherkin syntax) aren’t new. They’ve been around in web development for years. But they solve a specific problem that maps surprisingly well to cross-chain development: they let you describe behavior in plain language, then wire that description to chain-specific implementations underneath.

A feature file might look like this:

Feature: Cross-chain token transfer

  Scenario: User transfers tokens with valid amount
    Given a user with 1000 tokens on the source chain
    When the user initiates a transfer of 500 tokens to the destination chain
    Then the source chain balance should be 500
    And the destination chain should receive a transfer message for 500 tokens

  Scenario: Transfer below minimum threshold is rejected
    Given a user with 1000 tokens on the source chain
    When the user initiates a transfer of 5 tokens to the destination chain
    Then the transfer should be rejected with "below minimum threshold"

The scenarios read like plain English. A product manager can understand them. A new developer joining the team can read them on day one and know what the system is supposed to do.

But here’s the part that matters for cross-chain: the step definitions behind these scenarios are chain-specific. Your “Given a user with 1000 tokens on the source chain” step has one implementation that sets up a Solana account with the right token balance, and another that deploys an EVM test fixture. The Gherkin layer doesn’t care. It just checks that the behavior matches.

// Step definition for EVM (glue code loaded only when testing the EVM side)
const { Given } = require('@cucumber/cucumber');

Given('a user with {int} tokens on the source chain', async function (amount) {
  // deployEVMFixture() is the project's own test fixture helper
  this.userAccount = await deployEVMFixture();
  await this.tokenContract.mint(this.userAccount.address, amount);
});

// Step definition for Solana (same expression, kept in a separate glue package)
const { Keypair } = require('@solana/web3.js');
const { mintTo } = require('@solana/spl-token');

Given('a user with {int} tokens on the source chain', async function (amount) {
  this.userKeypair = Keypair.generate();
  await mintTo(this.connection, this.payer, this.mint,
    this.userTokenAccount, this.mintAuthority, amount);
});

You run the same feature files against each chain’s implementation. If the behavior diverges, you’ll know.

The multi-contract blind spot

There’s another thing that makes cross-chain testing hard, and it’s subtler: most implementations aren’t a single contract. They’re a system of contracts that interact with each other. A token contract, an access control contract, a vault, a router. On each chain, the responsibilities might be split differently.

I was once convinced that a specific contract in the system was handling a particular piece of validation. It wasn’t. The validation I was thinking of lived in a different contract on that chain, and the interaction between the two just happened to produce the right behavior in the cases I’d tested manually. In the cases I hadn’t tested, it didn’t.

This is exactly the kind of thing that Cucumber-style testing catches. Because you’re writing scenarios that describe the full end-to-end behavior (“when a user with role X tries to transfer to address Y, it should be rejected”), you’re testing the entire contract system as a unit. You don’t need to know which contract is responsible for which check. You just verify that the outcome is correct.
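Concretely, a role-based scenario like that stays at the level of outcomes. The role name and rejection message below are made up for illustration; the point is that no step mentions which contract does the check:

```gherkin
Scenario: Unverified user cannot transfer to an unwhitelisted address
  Given a user with the "unverified" role and 1000 tokens on the source chain
  And the destination address is not on the whitelist
  When the user initiates a transfer of 100 tokens to the destination address
  Then the transfer should be rejected with "address not whitelisted"
  And the source chain balance should remain 1000
```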

Compare that to the alternative: going repo by repo, contract by contract, writing unit tests for each piece in isolation. It’s tedious, it’s error-prone, and it gives you false confidence. All your unit tests can pass while the system as a whole does the wrong thing.

What this actually buys you

The setup cost is real. You need to write the feature files, wire up step definitions for each chain, and maintain the glue code. It’s not free.

But the payoff is that you’re testing observable outcomes rather than implementation details. You’re not asserting that a specific Rust struct has a specific value. You’re asserting that a transfer of 500 tokens results in a balance of 500. That’s something you can verify across any chain, regardless of what’s happening under the hood.

There’s also way less test rewriting. When the implementation changes (you refactor how two contracts interact, or you move logic from one contract to another), the feature files don’t change. The behavior is the same, so the scenarios still pass. You only update the step definitions if the way you set up or query the chain changes. Contrast that with traditional unit tests where a refactor can mean rewriting dozens of tests that were tightly coupled to the old structure.

It creates a living spec, too. When the product requirements change, you update the feature file first. If Chain A’s implementation passes the new scenarios but Chain B’s doesn’t, the drift is immediately visible. You’re not relying on someone remembering to update a Confluence page.

And the readability matters more than you might think. When you’re bringing on a Solana specialist to work on one chain’s implementation, they can read the feature files and understand what the system needs to do without learning Solidity. When a Move developer picks up the Aptos implementation, they have the same reference point.

The tooling gap problem

I want to be honest about a real friction point here. When you’re building cross-chain, there’s a decent chance your backend services don’t all use the same language. Maybe your orchestrator is in Rust because most of your team knows it, but the TON ecosystem has better TypeScript libraries. I ran into this on a project involving Tact smart contracts on TON: the Rust crate ecosystem for TON is thin, and the crates that do exist are missing features. We ended up needing a small service written in TypeScript just to handle TON interactions.

Cucumber has implementations in most languages (cucumber-js, cucumber-rs, godog for Go), so you can write step definitions in whatever language each chain’s tooling demands. The feature files stay the same regardless. That said, if your step definitions span three languages, you’re now maintaining three sets of glue code. It’s a tradeoff.

A note on devops

There’s a related problem that Cucumber doesn’t directly solve but can help with: operational visibility.

If something breaks on one chain’s contracts, you probably need a specialist to even understand the error. A Solana program returning Error: 0x1 means nothing to someone who’s only worked with EVM reverts. You need runbooks, and those runbooks need to be readable by people who aren’t deep in each chain’s internals.

I’ve been thinking about pairing Cucumber scenarios with operational runbooks. The scenarios describe what correct behavior looks like. The runbooks describe what to check when the behavior is wrong. Together, they give your on-call engineers something to work with even when the chain-specific expert isn’t available.

This is more speculative, and I haven’t fully fleshed it out yet. But the pattern of “describe the expected behavior in plain language, then have chain-specific implementations underneath” seems like it could apply to monitoring and incident response too, not just testing.

Is it worth the setup cost?

It depends on your project. If you’re building on a single chain, this is overkill. If you’re on two EVM chains, you can probably get away with shared Solidity test suites and some scripting.

But if you’re genuinely multi-VM (EVM plus Solana, or three-plus ecosystems), and the contracts need to maintain behavioral parity, I think the upfront investment pays for itself quickly. The first time your CI catches a spec drift that would’ve made it to testnet, you’ll have saved more time than the setup cost.

The feature files also serve as documentation that stays honest, since they fail when they’re wrong. That alone puts them ahead of every Google Doc spec I’ve ever worked with.

If you want to try this, cucumber.io has solid docs for getting started. The feature file syntax takes maybe an hour to learn. The real work is in the step definitions, and that’s work you’d be doing anyway in your test suites. You’re just structuring it differently.