feat(rivetkit): sqlite vfs v2 #4673
NathanFlurry wants to merge 19 commits into `04-15-chore_rename_sandbox_-_kitchen_sink`
Conversation
🚅 Deployed to the rivet-pr-4673 environment in rivet-frontend
Preview packages published to npm. Install with:

```
npm install rivetkit@pr-4673
```

Engine binary is shipped via Docker images:

```
docker pull rivetdev/engine:slim-bd7bc7b
docker pull rivetdev/engine:full-bd7bc7b
```

Individual packages:

```
npm install rivetkit@pr-4673
npm install @rivetkit/react@pr-4673
npm install @rivetkit/rivetkit-native@pr-4673
npm install @rivetkit/workflow-engine@pr-4673
```
`xWrite` in the v2 VFS was calling `resolve_pages` for every dirty page, even when the write was a full-page aligned overwrite. For page numbers beyond `db_size_pages` (newly allocated pages), this meant fetching from the engine to get data that doesn't exist. A 256-row INSERT transaction (~1MB of dirty data) was making 288 round trips to the engine: one `get_pages` RTT per new page allocation. At 128ms staging RTT that's ~37s of theoretical network time vs the ~130ms needed for just the final commit.

Now:
- Full-page aligned writes (`offset % page_size == 0 && amt % page_size == 0`) skip `resolve_pages` entirely and use a zero-filled page as the base.
- Partial writes filter out pages > `db_size_pages` before calling `resolve_pages`, synthesizing zero pages locally for new allocations.

Direct engine profiling tests verify 0 engine fetches for 1MB, 5MB, and 100-hot-row-updates workloads.

Adds new stress-test workloads to the kitchen-sink bench: churnInsertDelete, mixedOltpLarge, growingAggregation, indexCreationOnLargeTable, bulkUpdate1000Rows, truncateAndRegrow, manySmallTables.
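The two dispositions above can be sketched as a pair of helper functions. This is a minimal illustration, not the real VFS code: `is_full_page_aligned` and `base_page_for_write` are hypothetical names, and the actual implementation batches the remaining pages into one `resolve_pages` call.

```rust
/// A write can skip `resolve_pages` entirely when it overwrites whole
/// pages exactly: there is no existing data to merge with.
fn is_full_page_aligned(offset: u64, amt: u64, page_size: u64) -> bool {
    offset % page_size == 0 && amt % page_size == 0
}

/// Choose the base buffer for a single dirty page in a partial write.
/// Pages past the current database size (`db_size_pages`) have no backing
/// data in the engine, so a zero page is synthesized locally instead of
/// making a round trip that would return nothing.
fn base_page_for_write(page_no: u64, db_size_pages: u64, page_size: usize) -> Option<Vec<u8>> {
    if page_no > db_size_pages {
        // Newly allocated page: zero-filled base, no engine fetch.
        Some(vec![0u8; page_size])
    } else {
        // Existing page: caller must include it in the resolve_pages batch.
        None
    }
}
```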
The commit, compaction, and takeover paths all read `meta_key` and then write `meta_key` within the same transaction. Meta reads were using snapshot isolation, which does NOT register a read conflict range in UniversalDB (following FoundationDB semantics). Without a read conflict range, two concurrent transactions could both read meta at the same snapshot, both plan mutations based on that stale read, and both commit with last-write-wins semantics.

For commit vs compaction this caused `head_txid` to rewind: commit advances from N to N+1, compaction then writes its own meta with head=N preserved (from its snapshot), overwriting the commit's head=N+1. The VFS sees its own successful commit response but the engine state is behind, and the next commit fence-mismatches with "commit head_txid X+1 did not match current head_txid X".

Fix: introduce `tx_get_value_serializable` in udb and use it for meta reads in the fast-path commit, the compaction shard write, and the takeover TOCTOU re-check. Serializable reads register the key in the read conflict range, so concurrent writes by other transactions trigger a retry.

Adds regression tests in the v2 VFS Direct-engine harness:
- autocommit_inserts_maintain_head_txid_consistency
- autocommit_survives_close_reopen
- concurrent_multi_actor_autocommits
- stress_concurrent_multi_actor_autocommits

Local bench against the engine now passes Insert single x100/x1000/x10000 (autocommit workloads) that previously timed out or fence-mismatched. The bench still has occasional fence errors under heavy concurrent multi-actor load. Those do not reproduce in the Direct engine stress test (5 actors x 200 interleaved commits), so the remaining issue is not in the storage layer and needs separate investigation.
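The race can be modeled with a toy conflict checker. This is an assumption-laden sketch, not the UniversalDB API: `Txn`, `read`, and `can_commit` are hypothetical, and it only illustrates why a snapshot read of `meta_key` slips past conflict detection while a serializable read forces a retry.

```rust
use std::collections::HashSet;

/// Toy model of optimistic-concurrency conflict detection.
struct Txn {
    read_conflicts: HashSet<&'static str>,
}

impl Txn {
    fn new() -> Self {
        Txn { read_conflicts: HashSet::new() }
    }

    /// Snapshot reads register nothing, so they are invisible to conflict
    /// detection; serializable reads add the key to the read conflict range.
    fn read(&mut self, key: &'static str, serializable: bool) {
        if serializable {
            self.read_conflicts.insert(key);
        }
    }

    /// At commit, the transaction must retry if any key in its read conflict
    /// range was written by another transaction since its snapshot.
    fn can_commit(&self, written_since_snapshot: &HashSet<&'static str>) -> bool {
        self.read_conflicts.is_disjoint(written_since_snapshot)
    }
}
```

With a snapshot read of "meta_key", a compaction transaction commits cleanly even after a concurrent commit has already advanced meta (the lost update that rewinds `head_txid`); with a serializable read, the same situation is detected and retried.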
Both the RocksDB and Postgres drivers called `substitute_versionstamp_if_incomplete` inside `Operation::Set` paths, which reads the last 4 bytes of the value as an "offset" and unconditionally truncates them before checking whether the value is actually a versionstamped payload. The helper then swallows the validation error, so a value whose trailing 4 bytes happen to form a plausible u32 pointing at a 0x33 byte gets silently corrupted: 4 bytes lost, no error surfaced.

This showed up in the v2 SQLite slow path: large transactions (~50 MiB) split into 7-8 MiB staged chunks, UDB chunks the serde_bare blob into 10 KiB physical chunks, and a small fraction of those chunks have trailing bytes that look like a versionstamp offset. One corrupted chunk is enough to break serde_bare decoding at commit_finalize time with "unexpected end of input", aborting the commit.

Versionstamp substitution only belongs on SetVersionstampedKey and SetVersionstampedValue mutation types, which go through AtomicOp. Plain `Set` must preserve the bytes verbatim, so this change removes the substitution from both driver Set paths and drops the now-unused import.

Adds bench-parity Direct-engine tests in rivetkit-typescript/packages/sqlite-native/src/v2/vfs.rs so kitchen-sink bench workloads (large TX insert 500KB/10MB/50MB, churn insert/delete, mixed OLTP, bulk update 1000 rows, truncate + regrow, many small tables, index creation on 10k rows, growing aggregation, insert TX x10000) run against the real engine without the full stack. The 50MB test is the minimal reproducer for this bug and failed reliably before the fix.
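The corrected gating can be sketched as follows. This is a simplified illustration under assumed names (`Mutation`, `prepare_value`): the real driver code differs, and the real substitution writes a 10-byte commit versionstamp at the decoded offset rather than just discarding the suffix.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Mutation {
    Set,
    SetVersionstampedKey,
    SetVersionstampedValue,
}

/// Only versionstamped mutation types carry a trailing 4-byte offset suffix;
/// a plain `Set` value is opaque bytes and must pass through untouched.
fn prepare_value(op: Mutation, value: Vec<u8>) -> Vec<u8> {
    match op {
        // The bug: the drivers ran the substitution here too, stripping 4
        // bytes from arbitrary payloads. The fix is to preserve them verbatim.
        Mutation::Set => value,
        Mutation::SetVersionstampedKey | Mutation::SetVersionstampedValue => {
            let mut v = value;
            let tail = v.split_off(v.len() - 4);
            let _offset = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
            // (the real driver patches the commit versionstamp in at `_offset`)
            v
        }
    }
}
```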
…tories (US-060, US-061, US-062)
SQLite VFS v2 Review (Updated 2026-04-18)

This updates my previous review, reflecting commits through US-062 and the latest chore commits.

Improvements Since Last Review

Issues

Blocker: Temporary
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
