Skip to content

Add chunked payload generation and network transport#50

Open
marcobambini wants to merge 17 commits into
mainfrom
codex/chunked-payloads-network
Open

Add chunked payload generation and network transport#50
marcobambini wants to merge 17 commits into
mainfrom
codex/chunked-payloads-network

Conversation

@marcobambini

@marcobambini marcobambini commented May 29, 2026

Copy link
Copy Markdown
Member

Summary

This PR adds chunk-aware payload generation and send-path transport support to sqlite-sync. It keeps the existing monolithic payload APIs intact while adding a streaming path for large rowsets and oversized individual BLOB/TEXT values.

Implemented changes:

  • Adds cloudsync_payload_chunks():
    • SQLite virtual table interface.
    • PostgreSQL set-returning function interface.
    • Three optional inputs: since_db_version, filter_site_id / local site id, and until_db_version.
    • Emits per-chunk metadata including chunk index, payload size, row count, db version range, and stable watermark.
  • Adds global persisted setting payload_max_chunk_size:
    • Default: 5 MB.
    • Technical minimum: 256 KB, with lower values clamped.
    • Controls only payload generation, not apply-time acceptance.
  • Adds v3 fragment payload support for oversized single BLOB/TEXT values:
    • Large values can be split across multiple transport chunks transparently.
    • Receivers stage fragments internally and apply the final value once all fragments arrive.
    • Duplicate fragment delivery is idempotent and stale incomplete fragment groups are cleaned up.
  • Keeps backward compatibility:
    • Existing cloudsync_payload_encode() remains supported for monolithic payloads.
    • cloudsync_payload_apply() accepts legacy payloads, monolithic payloads, and v3 fragment payloads regardless of the local chunk-size setting.
  • Updates cloudsync_network_send_changes():
    • Streams outgoing changes from cloudsync_payload_chunks() instead of building one large payload first.
    • Sends each chunk through the existing /apply backend contract, either inline as blob or through upload url.
    • Advances the local send checkpoint only after the chunk stream completes successfully.
    • Merges remote status responses monotonically while multiple chunks are in flight.
  • Updates PostgreSQL Docker debug images to build/install dblink, required by the test suite.
  • Updates API, performance docs, README, and changelog.
  • Adds extensive SQLite unit coverage and PostgreSQL SQL coverage for chunk sizing, large-value fragmentation, out-of-order fragments, stale fragment cleanup, and legacy monolithic apply compatibility.

Compatibility

Existing users of cloudsync_payload_encode() and current network APIs continue to work. The new chunking behavior is opt-in for direct SQL callers via cloudsync_payload_chunks(), and automatic for the built-in network send path. Incoming payload apply remains format-compatible with older payloads and does not reject payloads based on the local payload_max_chunk_size.

Companion backend PR

Testing

  • make
  • make unittest
  • make test reached and passed the SQLite/unit portion, then stopped at the remote e2e stage because INTEGRATION_TEST_DATABASE_ID is not set locally.
  • make postgres-docker-debug-rebuild
  • Full PostgreSQL suite via Docker container: psql -U postgres -d postgres -f test/postgresql/full_test.sql (Failures: 0). This was run inside the debug container because the debug PostgreSQL build listens on loopback inside the container and the host make postgres-docker-run-test connection is closed by that setup.
  • git diff --check

marcobambini and others added 2 commits May 29, 2026 11:58
…elpers

- cloudsync_payload_chunks: add exclude_filter_site_id flag (SQLite hidden
  column / PG 4th arg) to stream changes from all sites except filter_site_id,
  as the /check download path needs; setting it without a site_id is an error
- add cloudsync_uuid_text()/cloudsync_uuid_blob() scalar functions on SQLite and
  PostgreSQL to convert site_id between its 16-byte binary form and the canonical
  UUID string (tolerant of dashed/undashed input), so string-based callers can
  pass a site_id to cloudsync_payload_chunks
- sqlite vtab: rewrite best_index to assign argv in canonical column order,
  fixing a latent argument-ordering bug
- perf: throttle the v3 fragment stale-group GC to at most once per 60s per
  connection (cloudsync_context.last_fragment_cleanup), removing an O(n^2)
  full-table scan that ran on every applied fragment
- add PostgreSQL 1.0->1.1 migration for the new chunked-payload SQL surface
- build: neutralize the ambient build env for curl's ./configure (CURL_CONFIG_ENV)
  so exported LDFLAGS/CPPFLAGS/LIBS don't break it
- test: rename PG 39_payload_chunks.sql -> 52 (39 was duplicated); add multi-site
  exclude, UUID roundtrip and stale-GC-throttle coverage (SQLite unit + PG)
- docs: API.md (new argument + two functions) and CHANGELOG

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@andinux

andinux commented May 30, 2026

Copy link
Copy Markdown
Collaborator

Update — commit 92a048c

Adds the /check download-path support to cloudsync_payload_chunks plus related fixes.

Changes

  • exclude_filter_site_id flag on cloudsync_payload_chunks (SQLite hidden column / PG 4th arg): stream all sites except filter_site_id; error if set without a site_id.
  • New cloudsync_uuid_text() / cloudsync_uuid_blob() (SQLite + PG) to convert site_id between 16-byte binary and canonical UUID string, so string callers (the /check endpoint) can pass a site_id.
  • perf: throttled the v3 fragment stale-GC to once/60s per connection — removes an O(n²) full-table scan that ran on every applied fragment.
  • best_index rewritten to assign argv in canonical column order (latent arg-ordering bug fix).
  • PostgreSQL 1.0→1.1 migration for the new SQL surface.
  • build: hermetic env for curl ./configure (CURL_CONFIG_ENV).
  • tests: renamed 39_payload_chunks.sql → 52 (39 was duplicated); added multi-site exclude, UUID roundtrip, and stale-GC-throttle coverage.

Verified: SQLite unittest green; PG full_test.sql → Failures: 0.

TODO

  • Cross-engine v3 fragment round-trip test (SQLite ↔ PG) to lock the payload interop invariant.
  • Integration tests on CI: network send path + receive split-delivery (needs INTEGRATION_TEST_CHUNKED_DATABASE_ID).
  • Backend (sqliteai/cloudsync#45): make /check download client-capability-aware — monolithic/v2 for ≤1.0.x clients, chunked/v3 only for new ones; adopt the generate-once download model. EDIT: new implementation plan proposed with commit https://github.com/sqliteai/cloudsync/commit/a7ee135
  • Minor cleanups: strtoll(...,10) in dbutils_settings_get_value; contract comment on the memset-from-eof vtab reset; named constant for the fragment-sizing iteration count.

andinux and others added 13 commits June 1, 2026 18:55
Add dedicated integration coverage for chunked network sync using INTEGRATION_TEST_CHUNKED_DATABASE_ID and a single-table chunked_payload_items schema.

Exercise both oversized TEXT values split into multiple v3 fragment payloads and multi-row non-v3 payload streams, then send cleanup deletes to limit remote storage growth.

Rename the network trace build switch from SYNC_BENCH_DEBUG to the generic NETWORK_TRACE=1 so commands such as make NETWORK_TRACE=1 e2e and make NETWORK_TRACE=1 sync-bench compile with CLOUDSYNC_NETWORK_TRACE.
Expose INTEGRATION_TEST_CHUNKED_DATABASE_ID from repository secrets to the main build job so the dedicated chunked payload e2e tests can run in CI.

Forward the same variable into the linux-musl arm64 Docker container and Android emulator test script, matching the existing integration test secret handling.
Add a chunked network failure-path integration test using INTEGRATION_TEST_CHUNKED_DATABASE_ID and a local-only chunked_payload_failure_items schema.

Generate multiple non-v3 chunks, expect remote apply to fail because the table is absent remotely, and verify send_dbversion does not advance after the failed send.
Remove the dead old_eof placeholder from payload_chunks_filter.

Document the cursor layout contract around the memset-from-eof reset so future field moves preserve cursor-lifetime state and per-scan state ownership.
Introduce a shared CLOUDSYNC_PAYLOAD_FRAGMENT_SIZE_FIXPOINT_ITERATIONS constant for payload fragment planning.

Use the constant in both SQLite and PostgreSQL chunk planners and document why the bounded fixpoint loop is sufficient.
Change dbutils_settings_get_value to parse text-backed integer settings with base 10 instead of base 0.

This avoids surprising octal handling for values with leading zeroes while preserving the documented decimal byte values used by payload_max_chunk_size. Add a unit assertion that '010' reads back as 10.
The receive path advanced check_dbversion/check_seq per applied chunk to
the chunk's last row, which can fall mid-db_version. Since the server's
cloudsync_payload_chunks resumes on db_version > since with no seq cursor,
a stop between chunks of a split db_version silently skipped the un-applied
rows on the next /check (data loss).

Mirror the send path: advance the receive cursor only after the whole chunk
stream is applied, to the stream watermark, never per chunk. cloudsync_payload_apply
gains a C-level checkpoint argument (watermark / none / last-applied); the
/check response signals watermark + final chunk, and falls back to legacy
monolithic behavior when absent. The public single-arg SQL function and the
send path are unchanged; re-delivered rows stay idempotent.

Adds do_test_payload_chunks_split_dbversion reproducing a single db_version
split across >=2 v2 chunks with partial apply.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a server-side spool so the /check download path can page a window's
chunk stream one chunk per call, instead of having the network driver
(libpq / sqlitecloud-go) re-materialize the whole stream into memory.

- cloudsync_payload_spool table + cloudsync_payload_spool_fill/_drop on
  both engines (SQLite C, lazy-create; PG plpgsql, table created at
  install). fill generates a window's chunks once (idempotent, atomic),
  marks the last chunk is_final, and self-GCs abandoned streams (24h TTL).
- cloudsync_network_check_internal echoes a best-effort page cursor
  to/from /check so the stateless server serves the next spool page;
  retrocompatible (optional response field, sent in every request).
- Tests: do_test_payload_spool (SQLite) and a spool block in
  52_payload_chunks.sql (PG) covering byte-identity vs direct generation,
  idempotent re-fill, empty window, drop, and stale-GC.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add cloudsync_payload_spool_drop_chunk(stream_id, chunk_index) for SQLite and PostgreSQL so the server can idempotently delete one S3-backed spool chunk after it is safely persisted elsewhere.

The implementation follows the existing spool drop convention by returning the deleted row count and keeps stream-level TTL/GC behavior unchanged. Tests cover scoped deletion, missing chunks, repeated drops, and preservation of other chunks and streams.
Add cloudsync_payload_blob_checked(since_db_version, since_seq, filter_site_id, exclude_filter_site_id, max_estimated_payload_size) for SQLite and PostgreSQL so legacy /check callers can request one payload in a single SQL round trip while the extension rejects oversized windows before materializing the BLOB.

The implementation captures a stable watermark, estimates the uncompressed monolithic payload size internally, preserves (db_version, seq) cursor semantics, returns NULL for empty windows, and raises a specific limit-exceeded error when the estimate is above the configured maximum.

Update PostgreSQL extension SQL/migration files, API docs, changelog, and SQLite/PostgreSQL tests covering successful encode, empty windows, include/exclude site filters, missing site errors, and limit failures.
Document that cloudsync_payload_blob_checked scans successful windows twice so callers understand the I/O tradeoff of guarding monolithic payload allocation.

Add cloudsync_payload_context_free() and use it from SQLite and PostgreSQL checked-blob paths so malloc-backed payload buffers are released on errors and NULL results.

Refactor the PostgreSQL encode pass to reuse the existing PayloadChunksState row extraction helpers, avoiding a duplicated 9-column SPI mapping while preserving the checked window semantics.
Advertise the check-spool-cursor capability so /check can return cursor-based spool pages.

Decode data.payload inline base64 responses and apply the raw CloudSync payload bytes directly, sharing the same watermark/final checkpoint handling used by URL downloads.

Keep the request field as cursor and read nextCursor from responses for the next page.
Update the Unreleased section to describe the check-spool-cursor capability and the /check cursor contract.

Document that inline data.payload base64 pages and data.url artifact pages are both supported, with cursor requests and nextCursor responses.
@andinux

andinux commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

Update on the chunked download path and the latest commits:

This branch now covers the end-to-end /check chunked receive flow, including the client-side cursor protocol and the database functions the CloudSync service can use to generate and page chunks safely.

What changed in the client network path:

  • The built-in /check client advertises X-CloudSync-Capabilities: check-status-response, check-spool-cursor.
  • The /check request sends cursor as the page index to request.
  • Cursor-mode responses use nextCursor when another page is available.
  • The client can now apply both response transports:
    • data.payload: base64-encoded raw CloudSync payload bytes, decoded and applied inline for small chunks.
    • data.url: artifact URL for larger chunks, downloaded and applied as before.
  • The same watermark/final semantics are used for inline and URL pages. Non-final pages apply without advancing the durable receive checkpoint; the final page advances check_dbversion to the stream watermark. This prevents the receive cursor from landing in the middle of a source db_version if a chunk stream is interrupted.
  • Legacy monolithic /check responses remain compatible: a top-level url with no watermark still uses the previous last-applied checkpoint behavior.

Database surface added for the CloudSync service:

  • cloudsync_payload_chunks(...) generates bounded transport chunks from the changes stream. It is available as a SQLite virtual table and a PostgreSQL set-returning function. It returns payload bytes plus chunk metadata such as chunk_index, payload size, row count, db_version min/max, and watermark.
  • payload_max_chunk_size controls chunk size. The default is 5 MB, with a 256 KB minimum clamp.
  • exclude_filter_site_id lets /check generate changes from every site except the requesting peer, so a client does not receive its own uploaded changes back.
  • cloudsync_payload_blob_checked(...) supports old clients that still need one monolithic response. It estimates payload size before materializing the legacy blob and rejects unsafe monolithic responses early.
  • cloudsync_uuid_text(...) and cloudsync_uuid_blob(...) convert between the binary 16-byte site_id stored by the extension and UUID strings used by service endpoints.

Download spool support for the service:

  • cloudsync_payload_spool stores a prepared chunk stream for a /check window.
  • cloudsync_payload_spool_fill(stream_id, since_db_version, filter_site_id, exclude_filter_site_id) generates the stream once into the spool. It is idempotent for a stream and marks only the last chunk as final.
  • The service can return one spooled page per /check call using the requested cursor and reply with nextCursor while more pages remain.
  • cloudsync_payload_spool_drop(stream_id) removes the whole prepared stream after completion or cancellation.
  • cloudsync_payload_spool_drop_chunk(stream_id, chunk_index) supports early cleanup for one URL-backed page after it has been safely persisted outside the spool.
  • Stale spool streams are garbage-collected during fill so abandoned streams do not accumulate indefinitely.

Small chunk optimization:

  • If the prepared page is small enough, the service can skip artifact storage and return data.payload inline as base64 of the raw payload bytes.
  • If the page is larger, the service returns the same metadata shape with data.url instead.
  • From the client perspective, both shapes are equivalent: apply the page, then advance the in-memory page cursor only after successful apply/download.

The latest commits documenting and implementing this are:

  • fb0f8fe feat(network): support inline check payloads
  • 2c1da93 docs(changelog): document inline check payloads

andinux and others added 2 commits June 18, 2026 15:51
…anges

Rework the chunked-download client API for ergonomics and caller control.

- cloudsync_network_sync() now drains an entire chunked /check stream in a
  single call, fetching already-available chunks back-to-back with no delay.
  wait_ms/max_retries are spent only while the server payload is not yet
  ready (HTTP 202), not while paging through chunks already available.
- Add cloudsync_network_receive_changes([max_chunks]) as the canonical
  receive function: drains all available chunks by default; max_chunks caps
  pages per call for progress/traffic control, resuming across calls via the
  in-memory page cursor. cloudsync_network_check_changes() is retained as a
  deprecated, fully-functional alias (removed in a future major).
- Add a shared network_drain_changes() helper backing both sync and
  receive_changes.
- Surface new JSON fields: receive.chunks/bytes/complete and send.chunks/bytes.
  receive.rows and receive.tables are now cumulative across the whole drain.

Docs (API.md, CHANGELOG), integration tests (single-sync drain, capped
receive, and deliberate alias coverage), the sync benchmark, the example
apps, and the .claude command docs are migrated to the new name.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@andinux

andinux commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Follow-up commit 285bc69 — drain-all receive, chunk cap, and receive_changes rename

Reworks the chunked-download client API for ergonomics and caller control (extension only; no server/PostgreSQL changes). Behavior against legacy/monolithic servers is unchanged.

Behavior

  • cloudsync_network_sync() drains the whole chunked /check stream in a single call — already-available chunks are fetched back-to-back with no delay. wait_ms/max_retries are now spent only while the server payload is not yet ready (HTTP 202), not while paging through chunks that are already available. Previously a multi-chunk stream needed several sync() calls and wasted a wait_ms delay on each staged fragment.
  • New cloudsync_network_receive_changes([max_chunks]) is the canonical receive function:
    • drains all currently-available chunks by default (does not wait for not-yet-ready preparation);
    • optional max_chunks caps how many chunks are applied per call, for caller-driven progress / traffic control. The page cursor persists in memory on the network context, so a capped drain resumes on the next call — the caller just loops while receive.complete is false and manages no cursor itself. If the connection/process is lost mid-drain the cursor is lost safely: the next call re-drains from the start (idempotent apply, no skipped rows).
  • cloudsync_network_check_changes([max_chunks]) is now a deprecated, fully-functional alias of receive_changes (same behavior, same fields). To be removed in a future major.
  • Shared network_drain_changes() helper now backs both sync and receive_changes.

New JSON fields (additive)

  • receive.chunks (int), receive.bytes (int64), receive.complete (bool)
  • send.chunks (int), send.bytes (int64)
  • receive.rows / receive.tables are now cumulative across the whole drain (previously last-page only).
  • bytes = serialized (uncompressed) cloudsync payload bytes, transport-independent — same meaning on send and receive; not compressed wire size.

Docs / tests / migration

  • API.md: new receive_changes section (max_chunks, drain-all + lost-cursor-is-safe note, all fields), check_changes deprecation stub, send/sync field additions, TOC.
  • CHANGELOG.md: Added / Deprecated / Changed entries (same unreleased 1.1.0 — no version bump).
  • Integration tests: single-call multi-chunk drain (chunks>1, complete=1); capped-receive (receive_changes(1) never applies >1 chunk, reports complete=0 mid-stream, resumes to completion); one test kept on the check_changes alias for backward-compat coverage.
  • Migrated the sync benchmark, both example apps, and the .claude command docs to receive_changes.

Verification

make, make NETWORK_TRACE=1, and make dist/integration build clean under -Wall -Wextra; make unittest passes; smoke-tested that all four signatures register; git diff --check clean. The chunked e2e tests run only against a companion-server deployment with INTEGRATION_TEST_CHUNKED_DATABASE_ID set.

Inline-vs-S3 transport remains intentionally non-assertable in the public JSON (no transport field); verify the inline path manually with a NETWORK_TRACE=1 build ([cloudsync-network] check page transport=inline …).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants