Flatten rootfs qcow2 before archiving (cross-worker portability) by breardon2011 · Pull Request #132 · diggerhq/opencomputer

breardon2011 · 2026-04-14T04:53:59Z

Summary

After PR #128 merged, prod surfaced a residual cross-worker fork corruption: forks that land on a different worker from the one that created the checkpoint come up with EBADMSG on every file read. Same-worker forks work fine.

Root cause

The cached rootfs.qcow2 for each sandbox is a thin qcow2 overlay with backing file = /data/firecracker/images/default.ext4. The base ext4 image is rebuilt independently on each worker from the same Dockerfile, but:

- mkfs.ext4 generates a random UUID per build.
- The ext4 inode table layout can vary between builds.

So byte content of default.ext4 differs between workers even when logical filesystem content is identical. The qcow2 overlay only stores cluster deltas; unchanged clusters are resolved through the backing file. On a cross-worker download, the target worker resolves those unchanged clusters through ITS OWN default.ext4 (different bytes), and the guest's restored ext4 metadata (captured in the memory snapshot at savevm time) fails checksum verification → EBADMSG.

Verified by inspection: md5 of default.ext4 differs between oc-worker-1 and oc-worker-2; the cross-worker forks that failed were the ones that had to download from S3 and resolve through a different backing.
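The inspection described above can be approximated without QEMU tooling. The following is a minimal sketch, not the procedure actually used in this PR: it reads the backing-file name directly from the qcow2 header (magic, version, backing_file_offset, backing_file_size, all big-endian) and hashes a file in chunks. The function names `read_backing_file` and `md5_of` are illustrative and do not come from the repository.

```python
import hashlib
import struct

QCOW2_MAGIC = b"QFI\xfb"


def read_backing_file(path):
    """Return the backing file name stored in a qcow2 header, or None if unset."""
    with open(path, "rb") as f:
        header = f.read(20)
        if len(header) < 20 or header[:4] != QCOW2_MAGIC:
            raise ValueError(f"{path} does not look like a qcow2 image")
        # Bytes 4..19, big-endian: version (u32), backing_file_offset (u64),
        # backing_file_size (u32).
        _version, offset, size = struct.unpack(">IQI", header[4:20])
        if offset == 0:
            return None  # self-contained image, no backing file
        f.seek(offset)
        return f.read(size).decode("utf-8")


def md5_of(path, chunk_size=1 << 20):
    """Hash a (possibly large) file in chunks and return the hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running `md5_of` on default.ext4 on each worker and comparing the digests reproduces the check above; `read_backing_file` on a thin overlay should return the default.ext4 path, and `None` once the image has been flattened.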

Fix

Before archiving, qemu-img rebase -b "" <rootfs> merges backing-file content into the overlay so the qcow2 is self-contained. Unlike qemu-img convert, rebase preserves internal savevm snapshots — critical because loadvm on the destination needs the cp-<id> snapshot intact.
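As an illustration of the flatten step, here is a hedged sketch that wraps the rebase command quoted above and then checks the result with `qemu-img info --output=json`. It is not the PR's implementation: the helper names are hypothetical, and the post-flatten verification is an added assumption about what a caller might want to confirm (no `backing-filename` left, and the checkpoint snapshot still listed).

```python
import json
import subprocess


def flatten_command(rootfs):
    """Build the rebase invocation; the empty -b argument drops the backing file."""
    return ["qemu-img", "rebase", "-b", "", rootfs]


def is_portable(info, snapshot_name):
    """Check a dict parsed from `qemu-img info --output=json`.

    Portable here means: no backing file is recorded, and the named internal
    snapshot is still present in the image.
    """
    has_backing = "backing-filename" in info
    names = {snap.get("name") for snap in info.get("snapshots", [])}
    return not has_backing and snapshot_name in names


def flatten_and_verify(rootfs, snapshot_name):
    """Flatten rootfs in place, then fail loudly if the result is not portable."""
    subprocess.run(flatten_command(rootfs), check=True)
    result = subprocess.run(
        ["qemu-img", "info", "--output=json", rootfs],
        check=True, capture_output=True, text=True,
    )
    if not is_portable(json.loads(result.stdout), snapshot_name):
        raise RuntimeError(f"{rootfs} is not self-contained after rebase")
```

A caller would pass the snapshot name in the cp-<id> form used by the checkpoint being archived.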

Applied to both archival paths:

- CreateCheckpoint: reflink-stage to a temp archive dir, flatten rootfs there, tar+upload. Leaves the local-fork cacheDir copy as a thin overlay so same-worker forks stay fast.
- doHibernate: flatten in-place in archiveDir before tar.
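The CreateCheckpoint path can be sketched as stage, flatten, then tar. This is a simplified illustration under stated assumptions, not the repository's code: a plain `shutil.copytree` stands in for reflink staging, the upload step is omitted, the flatten step is passed in as a callable, and `archive_checkpoint` and its parameters are hypothetical names.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def archive_checkpoint(cache_dir, archive_path, flatten, rootfs_name="rootfs.qcow2"):
    """Stage a copy of cache_dir, flatten the staged rootfs, and tar the result.

    The files in cache_dir are never modified, so the local copy stays a thin
    overlay for same-worker forks. `flatten` is any callable taking the staged
    rootfs path (for example, a wrapper around the qemu-img rebase command).
    """
    cache_dir = Path(cache_dir)
    with tempfile.TemporaryDirectory() as staging:
        staged = Path(staging) / "archive"
        # Assumption: a regular copy replaces the reflink staging used in the PR.
        shutil.copytree(cache_dir, staged)
        flatten(staged / rootfs_name)
        with tarfile.open(archive_path, "w") as tar:
            for item in sorted(staged.iterdir()):
                tar.add(item, arcname=item.name)
    return archive_path
```

The doHibernate path differs only in that there is no staging copy: the flatten callable would run directly on the rootfs in archiveDir before the tar step.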

Same approach as autoscaling-etc branch's MigrateToS3Flatten — that branch already solved this for live migration but the fix never made it to main.

Tradeoff

Archive size grows from ~150MB to ~1.5GB because the base ext4 content is now embedded. Necessary for cross-worker correctness; size optimization can come later via deterministic default.ext4 builds (fixed UUID + fixed hash_seed).
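For the deterministic-build follow-up, a minimal sketch of the mkfs invocation with a pinned UUID and hash seed is shown below. `-U` and `-E hash_seed=` are existing mke2fs options, but the helper name and the example UUID values are hypothetical, and pinning these two values alone may not yield byte-identical images: timestamps and the way the image is populated (not described in this PR) can still differ between builds.

```python
import uuid


def deterministic_mkfs_command(image, fs_uuid, hash_seed):
    """Build an mkfs.ext4 command line with a fixed filesystem UUID and hash seed."""
    # Normalise and validate both values; uuid.UUID raises ValueError on bad input.
    fs_uuid = str(uuid.UUID(fs_uuid))
    hash_seed = str(uuid.UUID(hash_seed))
    return ["mkfs.ext4", "-U", fs_uuid, "-E", f"hash_seed={hash_seed}", image]


# Hypothetical fixed values; every worker would need to use the same pair.
cmd = deterministic_mkfs_command(
    "default.ext4",
    "00000000-0000-4000-8000-000000000001",
    "00000000-0000-4000-8000-000000000002",
)
```

If the images did become byte-identical across workers, the thin overlay could be archived as-is and the archive would shrink back toward its previous size.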

Test plan

- Verified root cause by comparing md5 of default.ext4 across prod workers and inspecting the qcow2's backing file field.
- After merge + deploy, re-run scripts/integration-tests/02-fork-no-corruption.ts against prod; all 10 forks should pass regardless of which worker they land on.
- Re-run sdks/typescript/examples/test-secret-store-fork.ts against prod; it should return to 27/27.

The cached rootfs.qcow2 for a sandbox is a thin qcow2 overlay with its backing file set to `/data/firecracker/images/default.ext4`. The base ext4 image is rebuilt independently on each worker from the same Dockerfile, but `mkfs.ext4` generates a random UUID per build, so the raw byte content of default.ext4 differs between workers even when the logical filesystem content is identical.

When a checkpoint or hibernation archive is uploaded from worker A and downloaded on worker B, the qcow2's backing reference still points at the local path. Worker B resolves unchanged clusters through ITS OWN default.ext4, which has different bytes. The restored guest's ext4 metadata (captured in the memory snapshot at savevm time) references specific cluster contents and checksums; the mismatch surfaces in the guest as EBADMSG ("Bad message") on every file read, so `ip`, `hostname`, dynamic linker lookups, etc. all fail and the fork is effectively useless.

Observed in prod (opencomputer-prod eastus2) with PR #128 deployed: 10-checkpoint fork test, 5/10 forks corrupt. Looking at the worker distribution, every fork that landed on the SAME worker as the source sandbox passed, and every fork that landed on the OTHER worker corrupted. md5 of default.ext4 differed between the two workers.

Fix: before archiving, `qemu-img rebase -b "" <rootfs>` merges the backing file's data into the overlay so the qcow2 is fully self-contained and cross-worker portable. `rebase` preserves internal savevm snapshots (unlike `convert`), which is required so loadvm on the destination can still restore the "cp-<id>" snapshot.

Applied to both archival paths:

- CreateCheckpoint: reflink-stage the archive files to a temp dir, flatten rootfs there, tar+upload. Keeps the local-fork cacheDir copy as a thin overlay (fast same-worker forks still work).
- doHibernate: flatten happens in-place in archiveDir before tar.

Tradeoff: archive size grows from ~150MB to ~1.5GB because the base ext4 content is now embedded. Necessary for cross-worker correctness; optimization for size can come later via deterministic default.ext4 builds (fixed UUID + fixed hash_seed). Same approach used on the autoscaling-etc branch for MigrateToS3Flatten.


breardon2011 wants to merge 1 commit into main from fix/cross-worker-rootfs-backing
