Add chunked payload generation and network transport #50
…elpers

- cloudsync_payload_chunks: add exclude_filter_site_id flag (SQLite hidden column / PG 4th arg) to stream changes from all sites except filter_site_id, as the /check download path needs; setting it without a site_id is an error
- add cloudsync_uuid_text()/cloudsync_uuid_blob() scalar functions on SQLite and PostgreSQL to convert site_id between its 16-byte binary form and the canonical UUID string (tolerant of dashed/undashed input), so string-based callers can pass a site_id to cloudsync_payload_chunks
- sqlite vtab: rewrite best_index to assign argv in canonical column order, fixing a latent argument-ordering bug
- perf: throttle the v3 fragment stale-group GC to at most once per 60s per connection (cloudsync_context.last_fragment_cleanup), removing an O(n^2) full-table scan that ran on every applied fragment
- add PostgreSQL 1.0->1.1 migration for the new chunked-payload SQL surface
- build: neutralize the ambient build env for curl's ./configure (CURL_CONFIG_ENV) so exported LDFLAGS/CPPFLAGS/LIBS don't break it
- test: rename PG 39_payload_chunks.sql -> 52 (39 was duplicated); add multi-site exclude, UUID roundtrip and stale-GC-throttle coverage (SQLite unit + PG)
- docs: API.md (new argument + two functions) and CHANGELOG

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
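The site_id conversion described in this commit can be illustrated with a minimal C sketch. The names uuid_to_text and uuid_from_text are hypothetical stand-ins, not the extension's cloudsync_uuid_text()/cloudsync_uuid_blob() implementations, and skipping dashes at any position is an assumption about what "tolerant of dashed/undashed input" means.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if it is not hex. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Format 16 bytes as canonical 8-4-4-4-12 lowercase text (36 chars + NUL). */
void uuid_to_text(const uint8_t in[16], char out[37]) {
    static const char digits[] = "0123456789abcdef";
    size_t pos = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < 16; i++) {
        if (i == 4 || i == 6 || i == 8 || i == 10) out[pos++] = '-';
        out[pos++] = digits[in[i] >> 4];
        out[pos++] = digits[in[i] & 0x0f];
    }
    out[pos] = '\0';
}

/* Parse dashed or undashed hex text into 16 bytes.
   Returns 0 on success, -1 on malformed input. */
int uuid_from_text(const char *text, uint8_t out[16]) {
    size_t nibbles = 0;
    assert(text != NULL && out != NULL);
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '-') continue;              /* assumption: dashes are simply skipped */
        int v = hex_value(*p);
        if (v < 0 || nibbles >= 32) return -1;
        if (nibbles % 2 == 0) out[nibbles / 2] = (uint8_t)(v << 4);
        else out[nibbles / 2] |= (uint8_t)v;
        nibbles++;
    }
    return nibbles == 32 ? 0 : -1;
}
```

With these helpers, a string-based caller could turn a textual site_id into the 16-byte form before passing it on, and the reverse direction yields the dashed lowercase form.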
Update: commit 92a048c
Adds the
Changes
Verified: SQLite TODO
Add dedicated integration coverage for chunked network sync using INTEGRATION_TEST_CHUNKED_DATABASE_ID and a single-table chunked_payload_items schema. Exercise both oversized TEXT values split into multiple v3 fragment payloads and multi-row non-v3 payload streams, then send cleanup deletes to limit remote storage growth. Rename the network trace build switch from SYNC_BENCH_DEBUG to the generic NETWORK_TRACE=1 so commands such as make NETWORK_TRACE=1 e2e and make NETWORK_TRACE=1 sync-bench compile with CLOUDSYNC_NETWORK_TRACE.
Expose INTEGRATION_TEST_CHUNKED_DATABASE_ID from repository secrets to the main build job so the dedicated chunked payload e2e tests can run in CI. Forward the same variable into the linux-musl arm64 Docker container and Android emulator test script, matching the existing integration test secret handling.
Add a chunked network failure-path integration test using INTEGRATION_TEST_CHUNKED_DATABASE_ID and a local-only chunked_payload_failure_items schema. Generate multiple non-v3 chunks, expect remote apply to fail because the table is absent remotely, and verify send_dbversion does not advance after the failed send.
Remove the dead old_eof placeholder from payload_chunks_filter. Document the cursor layout contract around the memset-from-eof reset so future field moves preserve cursor-lifetime state and per-scan state ownership.
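The "memset-from-eof" reset mentioned in this commit is a common C idiom: fields that must survive for the cursor's lifetime sit before a marker field, and everything from the marker onward is zeroed at the start of each scan. The sketch below uses a hypothetical chunk_cursor struct; the field names other than eof are invented for illustration and do not reflect the real payload_chunks cursor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cursor layout. The contract: fields declared before `eof`
   live as long as the cursor; fields from `eof` onward belong to one scan. */
typedef struct {
    void *vtab;         /* cursor-lifetime: owning virtual table */
    void *scratch;      /* cursor-lifetime: reusable buffer */
    int   eof;          /* per-scan state starts here */
    long  rowid;
    long  chunk_index;
} chunk_cursor;

/* Reset only the per-scan state by zeroing from `eof` to the end of the struct. */
void chunk_cursor_reset_scan(chunk_cursor *c) {
    assert(c != NULL);
    memset((char *)c + offsetof(chunk_cursor, eof), 0,
           sizeof(*c) - offsetof(chunk_cursor, eof));
}
```

The design trade-off is that moving a field across the eof boundary silently changes whether it is preserved, which is presumably why the commit documents the layout contract.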
Introduce a shared CLOUDSYNC_PAYLOAD_FRAGMENT_SIZE_FIXPOINT_ITERATIONS constant for payload fragment planning. Use the constant in both SQLite and PostgreSQL chunk planners and document why the bounded fixpoint loop is sufficient.
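A bounded fixpoint loop for fragment planning might look like the sketch below. Everything here is an assumption made for illustration: the header layout (a fixed part plus two variable-width integers), the simplified varint_size rule, the FIXPOINT_ITERATIONS value, and the function name plan_fragment_count. The point it shows is the circular dependency: the fragment count affects the header size, which affects the per-fragment capacity, which affects the count.

```c
#include <assert.h>
#include <stddef.h>

#define FIXPOINT_ITERATIONS 4  /* hypothetical bound, not the project's value */

/* Assumed encoding: integers below 128 take 1 byte, larger ones 2 bytes. */
static size_t varint_size(size_t v) { return v < 128 ? 1 : 2; }

/* Returns the number of fragments needed for `total` payload bytes, or 0 if
   the chunk limit cannot hold a header or no stable plan is found in time. */
size_t plan_fragment_count(size_t total, size_t max_chunk, size_t fixed_header) {
    size_t count = 1;
    assert(total > 0);
    for (int i = 0; i < FIXPOINT_ITERATIONS; i++) {
        /* header carries a fragment index and a fragment count */
        size_t header = fixed_header + 2 * varint_size(count);
        if (header >= max_chunk) return 0;
        size_t capacity = max_chunk - header;
        size_t next = (total + capacity - 1) / capacity;  /* ceiling division */
        if (next == count) return count;                  /* plan is stable */
        count = next;
    }
    return 0;
}
```

In this model the header only grows as the count grows, so the count never decreases between iterations and settles after a few rounds; that is one plausible reason a small fixed bound can be sufficient.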
Change dbutils_settings_get_value to parse text-backed integer settings with base 10 instead of base 0. This avoids surprising octal handling for values with leading zeroes while preserving the documented decimal byte values used by payload_max_chunk_size. Add a unit assertion that '010' reads back as 10.
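The difference between the two bases can be seen directly with the standard strtol function. The wrapper names below are hypothetical and only isolate the base argument; they are not the body of dbutils_settings_get_value.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper mirroring the described fix: always parse as decimal. */
long parse_setting_decimal(const char *text) {
    assert(text != NULL);
    return strtol(text, NULL, 10);
}

/* The earlier behavior, for comparison: base 0 picks the radix from the
   prefix, so a leading "0" means octal and "0x" means hexadecimal. */
long parse_setting_autodetect(const char *text) {
    assert(text != NULL);
    return strtol(text, NULL, 0);
}
```

With base 0, the text '010' is read as octal 8; with base 10 it is read as 10, which matches the unit assertion added in this commit.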
The receive path advanced check_dbversion/check_seq per applied chunk to the chunk's last row, which can fall mid-db_version. Since the server's cloudsync_payload_chunks resumes on db_version > since with no seq cursor, a stop between chunks of a split db_version silently skipped the un-applied rows on the next /check (data loss).

Mirror the send path: advance the receive cursor only after the whole chunk stream is applied, to the stream watermark, never per chunk. cloudsync_payload_apply gains a C-level checkpoint argument (watermark / none / last-applied); the /check response signals watermark + final chunk, and falls back to legacy monolithic behavior when absent. The public single-arg SQL function and the send path are unchanged; re-delivered rows stay idempotent.

Adds do_test_payload_chunks_split_dbversion reproducing a single db_version split across >=2 v2 chunks with partial apply.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
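The cursor rule from this commit can be modeled in a few lines of C. The types and the receive_stream function are hypothetical simplifications: applying rows is reduced to a counter, and the watermark is passed in directly rather than read from a /check response.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one received chunk. */
typedef struct {
    long last_db_version;  /* db_version of the chunk's last row (may be mid-version) */
    int  is_final;         /* set on the last chunk of the stream */
} chunk;

typedef struct {
    long check_dbversion;  /* receive cursor persisted between /check calls */
} receive_state;

/* Applies up to `available` chunks and returns how many were applied.
   The cursor moves only when the final chunk has been applied, and then
   it moves to the stream watermark, never to a per-chunk last row. */
size_t receive_stream(receive_state *st, const chunk *chunks,
                      size_t available, long watermark) {
    size_t applied = 0;
    assert(st != NULL && (chunks != NULL || available == 0));
    for (size_t i = 0; i < available; i++) {
        applied++;  /* rows would be applied here; re-delivery is idempotent */
        if (chunks[i].is_final) {
            st->check_dbversion = watermark;
            break;
        }
        /* not final: the cursor stays put even though rows were applied */
    }
    return applied;
}
```

If the stream is interrupted after two of three chunks, the cursor still points at the old position, so the next request re-delivers the window instead of skipping the rows that were never applied.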
Add a server-side spool so the /check download path can page a window's chunk stream one chunk per call, instead of having the network driver (libpq / sqlitecloud-go) re-materialize the whole stream into memory.

- cloudsync_payload_spool table + cloudsync_payload_spool_fill/_drop on both engines (SQLite C, lazy-create; PG plpgsql, table created at install). fill generates a window's chunks once (idempotent, atomic), marks the last chunk is_final, and self-GCs abandoned streams (24h TTL).
- cloudsync_network_check_internal echoes a best-effort page cursor to/from /check so the stateless server serves the next spool page; retrocompatible (optional response field, sent in every request).
- Tests: do_test_payload_spool (SQLite) and a spool block in 52_payload_chunks.sql (PG) covering byte-identity vs direct generation, idempotent re-fill, empty window, drop, and stale-GC.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
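The fill-once, page-by-cursor behavior can be sketched with an in-memory stand-in. The spool struct and the spool_fill/spool_page functions are hypothetical; the real spool is a database table, chunks are BLOBs rather than ints, and TTL-based GC is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define SPOOL_MAX 8

/* Hypothetical in-memory stand-in for one spooled stream. */
typedef struct {
    int    filled;               /* 1 once the window has been generated */
    size_t count;
    int    payload[SPOOL_MAX];   /* stand-in for chunk BLOBs */
    int    is_final[SPOOL_MAX];  /* set only on the last chunk */
} spool;

/* Generates the stream once; a repeated call is a no-op (idempotent).
   Returns the number of chunks held by the spool. */
size_t spool_fill(spool *s, const int *chunks, size_t n) {
    assert(s != NULL && n <= SPOOL_MAX);
    if (s->filled) return s->count;
    for (size_t i = 0; i < n; i++) {
        s->payload[i] = chunks[i];
        s->is_final[i] = (i + 1 == n);
    }
    s->count = n;
    s->filled = 1;
    return n;
}

/* Serves the page at `cursor`. Returns 1 and fills the outputs, or 0 when
   the cursor is past the end of the stream. */
int spool_page(const spool *s, size_t cursor,
               int *payload, int *is_final, size_t *next_cursor) {
    assert(s != NULL && payload != NULL && is_final != NULL && next_cursor != NULL);
    if (!s->filled || cursor >= s->count) return 0;
    *payload = s->payload[cursor];
    *is_final = s->is_final[cursor];
    *next_cursor = cursor + 1;
    return 1;
}
```

Because the server side is stateless, the client echoes the returned cursor on its next request; that is the role the page cursor plays in this model.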
Add cloudsync_payload_spool_drop_chunk(stream_id, chunk_index) for SQLite and PostgreSQL so the server can idempotently delete one S3-backed spool chunk after it is safely persisted elsewhere. The implementation follows the existing spool drop convention by returning the deleted row count and keeps stream-level TTL/GC behavior unchanged. Tests cover scoped deletion, missing chunks, repeated drops, and preservation of other chunks and streams.
Add cloudsync_payload_blob_checked(since_db_version, since_seq, filter_site_id, exclude_filter_site_id, max_estimated_payload_size) for SQLite and PostgreSQL so legacy /check callers can request one payload in a single SQL round trip while the extension rejects oversized windows before materializing the BLOB. The implementation captures a stable watermark, estimates the uncompressed monolithic payload size internally, preserves (db_version, seq) cursor semantics, returns NULL for empty windows, and raises a specific limit-exceeded error when the estimate is above the configured maximum. Update PostgreSQL extension SQL/migration files, API docs, changelog, and SQLite/PostgreSQL tests covering successful encode, empty windows, include/exclude site filters, missing site errors, and limit failures.
Document that cloudsync_payload_blob_checked scans successful windows twice so callers understand the I/O tradeoff of guarding monolithic payload allocation. Add cloudsync_payload_context_free() and use it from SQLite and PostgreSQL checked-blob paths so malloc-backed payload buffers are released on errors and NULL results. Refactor the PostgreSQL encode pass to reuse the existing PayloadChunksState row extraction helpers, avoiding a duplicated 9-column SPI mapping while preserving the checked window semantics.
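The two-scan trade-off described here can be shown with a small model: a first pass estimates the size and rejects oversized windows before any allocation, and a second pass builds the buffer. The function encode_checked, its status codes, and the use of C strings as rows are all hypothetical; the real function works on change rows and raises a SQL error rather than returning a code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes for the sketch. */
enum { ENCODE_OK = 0, ENCODE_EMPTY = 1, ENCODE_TOO_LARGE = 2, ENCODE_NOMEM = 3 };

/* On ENCODE_OK the caller owns *out and must free() it.
   On every other result *out is NULL and nothing needs freeing. */
int encode_checked(const char *const *rows, size_t nrows, size_t max_estimated,
                   char **out, size_t *out_len) {
    size_t estimate = 0;
    assert(out != NULL && out_len != NULL);
    *out = NULL;
    *out_len = 0;
    if (nrows == 0) return ENCODE_EMPTY;           /* empty window: NULL result */

    /* Pass 1: estimate the size without materializing anything. */
    for (size_t i = 0; i < nrows; i++) estimate += strlen(rows[i]);
    if (estimate > max_estimated) return ENCODE_TOO_LARGE;

    /* Pass 2: allocate and encode; this is the second scan of the window. */
    char *buf = malloc(estimate ? estimate : 1);
    if (buf == NULL) return ENCODE_NOMEM;
    size_t pos = 0;
    for (size_t i = 0; i < nrows; i++) {
        size_t len = strlen(rows[i]);
        memcpy(buf + pos, rows[i], len);
        pos += len;
    }
    *out = buf;
    *out_len = pos;
    return ENCODE_OK;
}
```

Rejected windows cost one scan and no allocation, while successful windows pay for both scans; that is the I/O cost the documentation note refers to.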
Advertise the check-spool-cursor capability so /check can return cursor-based spool pages. Decode data.payload inline base64 responses and apply the raw CloudSync payload bytes directly, sharing the same watermark/final checkpoint handling used by URL downloads. Keep the request field as cursor and read nextCursor from responses for the next page.
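Decoding an inline base64 page into raw payload bytes could look like the following sketch. The function base64_decode is a hypothetical, self-contained decoder for the standard padded alphabet; it is not the helper the extension uses, and the extraction of data.payload from the JSON response is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of one base64 character in the standard alphabet, or -1. */
static int b64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes padded base64 text into `out`. Returns the decoded length, or -1
   on malformed input or when `out_cap` is too small. */
long base64_decode(const char *in, uint8_t *out, size_t out_cap) {
    size_t len, pos = 0;
    assert(in != NULL && out != NULL);
    len = strlen(in);
    if (len % 4 != 0) return -1;
    for (size_t i = 0; i < len; i += 4) {
        int pad = 0;
        uint32_t quad = 0;
        for (size_t j = 0; j < 4; j++) {
            char c = in[i + j];
            int v = 0;
            if (c == '=') {
                /* padding is only valid in the last two slots of the last group */
                if (i + 4 != len || j < 2) return -1;
                pad++;
            } else {
                if (pad) return -1;          /* data after padding */
                v = b64_value(c);
                if (v < 0) return -1;
            }
            quad = (quad << 6) | (uint32_t)v;
        }
        size_t nbytes = 3 - (size_t)pad;
        if (pos + nbytes > out_cap) return -1;
        out[pos++] = (uint8_t)(quad >> 16);
        if (nbytes > 1) out[pos++] = (uint8_t)(quad >> 8);
        if (nbytes > 2) out[pos++] = (uint8_t)quad;
    }
    return (long)pos;
}
```

The decoded bytes would then go through the same apply path, with the same watermark/final checkpoint handling, as a payload fetched from a URL.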
Update the Unreleased section to describe the check-spool-cursor capability and the /check cursor contract. Document that inline data.payload base64 pages and data.url artifact pages are both supported, with cursor requests and nextCursor responses.
Update on the chunked download path and the latest commits:
This branch now covers the end-to-end /check chunked receive flow, including the client-side cursor protocol and the database functions the CloudSync service can use to generate and page chunks safely.
What changed in the client network path:
Database surface added for the CloudSync service:
Download spool support for the service:
Small chunk optimization:
The latest commits documenting and implementing this are:
…anges

Rework the chunked-download client API for ergonomics and caller control.

- cloudsync_network_sync() now drains an entire chunked /check stream in a single call, fetching already-available chunks back-to-back with no delay. wait_ms/max_retries are spent only while the server payload is not yet ready (HTTP 202), not while paging through chunks already available.
- Add cloudsync_network_receive_changes([max_chunks]) as the canonical receive function: drains all available chunks by default; max_chunks caps pages per call for progress/traffic control, resuming across calls via the in-memory page cursor. cloudsync_network_check_changes() is retained as a deprecated, fully-functional alias (removed in a future major).
- Add a shared network_drain_changes() helper backing both sync and receive_changes.
- Surface new JSON fields: receive.chunks/bytes/complete and send.chunks/bytes. receive.rows and receive.tables are now cumulative across the whole drain.

Docs (API.md, CHANGELOG), integration tests (single-sync drain, capped receive, and deliberate alias coverage), the sync benchmark, the example apps, and the .claude command docs are migrated to the new name.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
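The capped drain and its resume-by-cursor behavior can be modeled as below. The drain_state and drain_result types and the drain_changes function are hypothetical; treating max_chunks == 0 as "no cap" is an assumption, and the HTTP 202 wait/retry handling is deliberately left out of the model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client state kept in memory between calls. */
typedef struct {
    size_t next_page;    /* page cursor: next chunk to fetch */
    size_t total_pages;  /* chunks the simulated server has ready */
} drain_state;

typedef struct {
    size_t chunks;    /* chunks applied by this call */
    int    complete;  /* 1 when the whole stream has been drained */
} drain_result;

/* Drains available chunks back-to-back. A non-zero `max_chunks` caps the
   pages handled per call; the cursor lets the next call resume. */
drain_result drain_changes(drain_state *st, size_t max_chunks) {
    drain_result r = {0, 0};
    assert(st != NULL);
    while (st->next_page < st->total_pages) {
        if (max_chunks != 0 && r.chunks >= max_chunks) return r;  /* cap reached */
        st->next_page++;  /* fetch and apply one page */
        r.chunks++;
    }
    r.complete = 1;
    return r;
}
```

A caller that wants progress reporting could loop on the capped form until complete is set, while a plain sync corresponds to the uncapped call.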
Follow-up commit
Summary
This PR adds chunk-aware payload generation and send-path transport support to sqlite-sync. It keeps the existing monolithic payload APIs intact while adding a streaming path for large rowsets and oversized individual BLOB/TEXT values.
Implemented changes:
- cloudsync_payload_chunks():
  - … since_db_version, filter_site_id / local site id, and until_db_version.
- payload_max_chunk_size:
- cloudsync_payload_encode() remains supported for monolithic payloads.
- cloudsync_payload_apply() accepts legacy payloads, monolithic payloads, and v3 fragment payloads regardless of the local chunk-size setting.
- cloudsync_network_send_changes():
  - … cloudsync_payload_chunks() instead of building one large payload first.
  - … /apply backend contract, either inline as blob or through upload url.
- … dblink, required by the test suite.

Compatibility

Existing users of cloudsync_payload_encode() and current network APIs continue to work. The new chunking behavior is opt-in for direct SQL callers via cloudsync_payload_chunks(), and automatic for the built-in network send path. Incoming payload apply remains format-compatible with older payloads and does not reject payloads based on the local payload_max_chunk_size.

Companion backend PR
Testing
- make
- make unittest
- make test reached and passed the SQLite/unit portion, then stopped at the remote e2e stage because INTEGRATION_TEST_DATABASE_ID is not set locally.
- make postgres-docker-debug-rebuild
- psql -U postgres -d postgres -f test/postgresql/full_test.sql (Failures: 0). This was run inside the debug container because the debug PostgreSQL build listens on loopback inside the container and the host make postgres-docker-run-test connection is closed by that setup.
- git diff --check