3 changes: 3 additions & 0 deletions .github/workflows/main.yml
@@ -89,6 +89,7 @@ jobs:
INTEGRATION_TEST_CLOUDSYNC_ADDRESS: ${{ secrets.INTEGRATION_TEST_CLOUDSYNC_ADDRESS }}
INTEGRATION_TEST_OFFLINE_DATABASE_ID: ${{ secrets.INTEGRATION_TEST_OFFLINE_DATABASE_ID }}
INTEGRATION_TEST_FAILURE_DATABASE_ID: ${{ secrets.INTEGRATION_TEST_FAILURE_DATABASE_ID }}
INTEGRATION_TEST_CHUNKED_DATABASE_ID: ${{ secrets.INTEGRATION_TEST_CHUNKED_DATABASE_ID }}

steps:

@@ -137,6 +138,7 @@ jobs:
-e INTEGRATION_TEST_CLOUDSYNC_ADDRESS="${{ env.INTEGRATION_TEST_CLOUDSYNC_ADDRESS }}" \
-e INTEGRATION_TEST_OFFLINE_DATABASE_ID="${{ env.INTEGRATION_TEST_OFFLINE_DATABASE_ID }}" \
-e INTEGRATION_TEST_FAILURE_DATABASE_ID="${{ env.INTEGRATION_TEST_FAILURE_DATABASE_ID }}" \
-e INTEGRATION_TEST_CHUNKED_DATABASE_ID="${{ env.INTEGRATION_TEST_CHUNKED_DATABASE_ID }}" \
alpine:latest \
tail -f /dev/null
docker exec alpine sh -c "apk update && apk add --no-cache gcc make curl sqlite openssl-dev musl-dev linux-headers"
@@ -212,6 +214,7 @@ jobs:
export INTEGRATION_TEST_CLOUDSYNC_ADDRESS="$INTEGRATION_TEST_CLOUDSYNC_ADDRESS"
export INTEGRATION_TEST_OFFLINE_DATABASE_ID="$INTEGRATION_TEST_OFFLINE_DATABASE_ID"
export INTEGRATION_TEST_FAILURE_DATABASE_ID="$INTEGRATION_TEST_FAILURE_DATABASE_ID"
export INTEGRATION_TEST_CHUNKED_DATABASE_ID="$INTEGRATION_TEST_CHUNKED_DATABASE_ID"
$(make test PLATFORM=$PLATFORM ARCH=$ARCH -n)
EOF
echo "::endgroup::"
229 changes: 228 additions & 1 deletion API.md

Large diffs are not rendered by default.

17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,23 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [Unreleased]

### Added

- **Chunked payload generation** via `cloudsync_payload_chunks()`, available as a SQLite virtual table and as a PostgreSQL set-returning function. The API emits transport-sized payload chunks and transparently fragments oversized BLOB/TEXT values into v3 fragment payloads.
- **`payload_max_chunk_size` global setting** for controlling generated chunk size. The default is 5 MB and values below the 256 KB technical minimum are clamped.
- **`exclude_filter_site_id` argument** for `cloudsync_payload_chunks()`. When set, the function streams changes from every site **except** `filter_site_id`, which is what the `/check` download path needs (a peer must not receive its own changes back). The default (omitted/`false`) preserves the existing single-site behavior. Passing the flag without a `filter_site_id` is an error.
- **`cloudsync_uuid_text()` / `cloudsync_uuid_blob()`** scalar functions on both SQLite and PostgreSQL, converting between the 16-byte binary `site_id` and its canonical UUID string. `cloudsync_uuid_text()` takes an optional `dash_format` argument (default `true`); `cloudsync_uuid_blob()` accepts dashed or undashed, case-insensitive input. These let string-based callers (e.g. the `/check` endpoint) pass a `site_id` to `cloudsync_payload_chunks()`.
- **Payload chunking documentation** in `API.md` and `PERFORMANCE.md`, including the explicit memory note that chunking bounds transport payloads but the database must still materialize a completed single BLOB/TEXT value when it is applied.
- **PostgreSQL `1.0 -> 1.1` upgrade script** (`migrations/cloudsync--1.0--1.1.sql`) for the new chunked-payload SQL surface, so existing deployments can `ALTER EXTENSION cloudsync UPDATE`.
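
The conversion described for `cloudsync_uuid_text()` / `cloudsync_uuid_blob()` can be illustrated with a minimal Python sketch. It is not the extension's implementation: `uuid_text` and `uuid_blob` are hypothetical stand-ins that only mirror the documented behavior (a 16-byte `site_id`, an optional `dash_format` defaulting to `true`, and dashed or undashed case-insensitive input).

```python
import uuid


def uuid_text(site_id: bytes, dash_format: bool = True) -> str:
    """Render a 16-byte site_id as a UUID string (hypothetical stand-in)."""
    if len(site_id) != 16:
        raise ValueError("site_id must be exactly 16 bytes")
    u = uuid.UUID(bytes=site_id)
    # str(u) is the dashed 8-4-4-4-12 form; u.hex is the same digits undashed.
    return str(u) if dash_format else u.hex


def uuid_blob(text: str) -> bytes:
    """Parse a dashed or undashed, case-insensitive UUID string into 16 bytes."""
    # uuid.UUID strips hyphens and parses the hex digits case-insensitively.
    return uuid.UUID(hex=text).bytes


site_id = bytes(range(16))
dashed = uuid_text(site_id)
undashed = uuid_text(site_id, dash_format=False)
# Both textual forms round-trip to the same binary site_id.
assert uuid_blob(dashed.upper()) == uuid_blob(undashed) == site_id
```

A string-based caller would perform the equivalent of `uuid_blob()` before handing a `site_id` to `cloudsync_payload_chunks()`.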

### Changed

- `cloudsync_payload_apply()` now accepts legacy payloads, monolithic payloads, and v3 fragment payloads without enforcing the local `payload_max_chunk_size`, preserving compatibility between peers with different settings.
- `cloudsync_network_send_changes()` now streams outgoing changes through `cloudsync_payload_chunks()` instead of first building one monolithic payload. This bounds transport payload size for the built-in network path and lets large rowsets or oversized BLOB/TEXT values flow through the same `/apply` endpoint as regular payloads.
- The chunked-download receive path advances the local receive checkpoint (`check_dbversion` / `check_seq`) **only after a chunk stream has been fully applied**, jumping straight to the stream watermark — never into the middle of a source `db_version`. This mirrors the send path and ensures a stop between chunks cannot skip the un-applied rows of a `db_version` split across chunks on the next `/check` (the server resumes on `db_version > since`, with no intra-version cursor). `cloudsync_payload_apply()` no longer advances the receive checkpoint per applied chunk; the built-in network `/check` path drives it from the server's watermark and final-chunk signal, and falls back to the previous monolithic behavior when the server sends no watermark. Re-delivered rows remain idempotent.
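
The checkpoint rule in the last entry can be sketched as follows. This is a simplified Python model, not the extension's code: the `Chunk` type, its fields, and `receive_stream` are hypothetical names, and the no-watermark fallback to the monolithic behavior is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Chunk:
    rows: list                       # hypothetical decoded rows in this chunk
    is_final: bool                   # hypothetical final-chunk signal
    watermark: Optional[int] = None  # hypothetical stream watermark (a db_version)


def receive_stream(chunks: Iterable[Chunk], checkpoint: int,
                   apply_rows: Callable[[list], None]) -> int:
    """Apply every chunk; return the checkpoint to persist afterwards."""
    watermark = None
    for chunk in chunks:
        apply_rows(chunk.rows)  # applying is idempotent, so re-delivery is safe
        if chunk.watermark is not None:
            watermark = chunk.watermark
        if chunk.is_final:
            # Whole stream applied: jump straight to the stream watermark.
            return watermark if watermark is not None else checkpoint
    # Stream stopped between chunks: keep the old checkpoint so the next
    # /check re-requests the partially delivered db_version in full.
    return checkpoint


applied = []
stream = [Chunk([1, 2], False), Chunk([3], True, watermark=7)]
assert receive_stream(stream, 5, applied.extend) == 7      # completed stream
assert receive_stream(stream[:1], 5, applied.extend) == 5  # interrupted stream
```

Because the checkpoint never lands inside a source `db_version`, an interrupted stream costs a re-download but cannot skip rows.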

## [1.0.20] - 2026-05-26

### Changed
12 changes: 9 additions & 3 deletions Makefile
@@ -86,6 +86,12 @@ COV_FILES = $(filter-out $(SRC_DIR)/lz4.c $(NETWORK_DIR)/network.c $(SQLITE_IMPL
CURL_LIB = $(CURL_DIR)/$(PLATFORM)/libcurl.a
TEST_TARGET = $(patsubst %.c,$(DIST_DIR)/%$(EXE), $(notdir $(TEST_SRC)))

# Build curl hermetically: neutralize the developer's ambient build env so
# curl's ./configure compile tests aren't broken by overrides leaking in
# (e.g. exported LDFLAGS/CPPFLAGS/LIBS pointing at Homebrew). Build flags for
# curl are supplied explicitly via CURL_CONFIG.
CURL_CONFIG_ENV = LDFLAGS= CPPFLAGS= LIBS= CFLAGS=

# Platform-specific settings
ifeq ($(PLATFORM),windows)
TARGET := $(DIST_DIR)/cloudsync.dll
@@ -185,7 +191,7 @@ endif
T_LDFLAGS += -fprofile-arcs -ftest-coverage
endif

-ifdef SYNC_BENCH_DEBUG
+ifdef NETWORK_TRACE
CFLAGS += -DCLOUDSYNC_NETWORK_TRACE
endif

@@ -292,7 +298,7 @@ sync-bench: $(TARGET) $(DIST_DIR)/sync_bench$(EXE)
./$(DIST_DIR)/sync_bench$(EXE)

sync-bench-debug:
-$(MAKE) SYNC_BENCH_DEBUG=1 sync-bench
+$(MAKE) NETWORK_TRACE=1 sync-bench

OPENSSL_TARBALL = $(OPENSSL_DIR)/$(OPENSSL_VERSION).tar.gz

@@ -326,7 +332,7 @@ else
unzip $(CURL_DIR)/src/curl.zip -d $(CURL_DIR)/src/.
endif

-cd $(CURL_SRC) && ./configure \
+cd $(CURL_SRC) && $(CURL_CONFIG_ENV) ./configure \
--without-libpsl \
--disable-alt-svc \
--disable-ares \
17 changes: 11 additions & 6 deletions PERFORMANCE.md
@@ -41,7 +41,7 @@ SELECT ... FROM cloudsync_changes WHERE db_version > <last_synced_version>

Each metadata table has an **index on `db_version`**, so payload generation scales primarily with the number of new changes, plus a small per-synced-table overhead to construct the `cloudsync_changes` query. It does not diff the full dataset. In SQLite, each changed column also performs a primary-key lookup in the base table to retrieve the current value.

-The resulting payload is LZ4-compressed before transmission.
+The legacy `cloudsync_payload_encode()` API builds one monolithic LZ4-compressed payload before transmission. For large deltas, `cloudsync_payload_chunks()` can be used instead: it streams a sequence of payload chunks bounded by the `payload_max_chunk_size` setting (default 5 MB, minimum 256 KB). If a single encoded BLOB/TEXT value is larger than the chunk budget, the value is split into transparent v3 fragments and reassembled by `cloudsync_payload_apply()` on the receiver.
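
The chunk budget and fragmentation rule can be sketched in Python. This is an illustration under stated assumptions, not the extension's encoder: `effective_chunk_size` and `split_value` are hypothetical helpers, "MB" and "KB" are assumed to be binary units, and the real v3 fragments carry header overhead that is ignored here.

```python
from typing import List, Optional

DEFAULT_CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB default (binary units assumed)
MIN_CHUNK_SIZE = 256 * 1024           # 256 KB technical minimum


def effective_chunk_size(requested: Optional[int] = None) -> int:
    """Return the chunk budget, clamping values below the minimum."""
    if requested is None:
        return DEFAULT_CHUNK_SIZE
    return max(requested, MIN_CHUNK_SIZE)


def split_value(value: bytes, budget: int) -> List[bytes]:
    """Split one oversized value into fragments no larger than the budget."""
    return [value[i:i + budget] for i in range(0, len(value), budget)]


budget = effective_chunk_size(1000)          # too small, clamped to the minimum
fragments = split_value(b"x" * 600_000, budget)
assert all(len(f) <= budget for f in fragments)
assert b"".join(fragments) == b"x" * 600_000  # receiver-side reassembly
```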

#### Pull: Payload Application

@@ -69,7 +69,7 @@ When the application runs sync off the main thread, perceived latency depends on

- **Sync interval**: How often the app triggers a push/pull cycle. More frequent syncs mean smaller deltas (smaller D) and faster individual sync operations, at the cost of more network round-trips.
- **Network latency**: The round-trip time to the sync server. LZ4 compression reduces payload size, but latency is dominated by the network hop itself for small deltas.
-- **Payload size**: Proportional to D x average column value size. Large BLOBs or TEXT values will increase transfer time linearly.
+- **Payload size**: Proportional to D x average column value size. Large BLOBs or TEXT values will increase transfer time linearly. Use `cloudsync_payload_chunks()` when transport payloads may be large; it limits each generated transport payload but does not change the size of the final database value.

The extension does not impose a sync schedule -- the application controls when and how often to sync. A typical pattern is to sync on a timer (e.g., every 5-30 seconds) or on specific events (app foreground, user action).

@@ -118,7 +118,11 @@ Normal application reads are not directly instrumented by the extension. No trig

When a new device syncs for the first time (`db_version = 0`), the push payload contains the **entire dataset**: every column of every row across all synced tables. The payload size is proportional to `N * C` (total rows times columns).

-The payload is built entirely in memory, starting with a 512 KB buffer (`CLOUDSYNC_PAYLOAD_MINBUF_SIZE` in `src/cloudsync.c`) and growing via `realloc` as needed. Peak memory usage is at least the full uncompressed payload size and can be higher during compression. For a database with 1 million rows and 10 columns of average 50 bytes each, the uncompressed payload could reach ~500 MB before LZ4 compression.
+With the legacy `cloudsync_payload_encode()` API, the payload is built entirely in memory, starting with a 512 KB buffer (`CLOUDSYNC_PAYLOAD_MINBUF_SIZE` in `src/cloudsync.c`) and growing via `realloc` as needed. Peak memory usage is at least the full uncompressed payload size and can be higher during compression. For a database with 1 million rows and 10 columns of average 50 bytes each, the uncompressed payload could reach ~500 MB before LZ4 compression.

For large initial syncs, prefer `cloudsync_payload_chunks()`. It keeps each generated transport payload bounded by `payload_max_chunk_size` and can fragment a single oversized BLOB/TEXT column across multiple v3 fragment payloads. This prevents the transport payload itself from growing without bound and avoids constructing a monolithic v2 payload during v3 apply.

Important limitation: chunking does **not** make a single database cell streamable all the way into the storage engine. When the last fragment of a very large BLOB/TEXT value arrives, the receiver must still materialize the completed value once in order to bind/store it in the destination database. Size `payload_max_chunk_size` for transport safety, but size application memory limits for the largest individual value you allow.
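
The limitation can be made concrete with a small Python model of the receiving side. It is a sketch only: `FragmentAssembler` is a hypothetical class, fragments are assumed to arrive in order with a known count, and the real receiver logic inside `cloudsync_payload_apply()` may differ.

```python
from typing import List, Optional


class FragmentAssembler:
    """Collect in-order fragments of one value (hypothetical model)."""

    def __init__(self, total_fragments: int):
        self.total = total_fragments
        self.parts: List[bytes] = []

    def add(self, fragment: bytes) -> Optional[bytes]:
        """Return the completed value once the last fragment arrives."""
        self.parts.append(fragment)
        if len(self.parts) < self.total:
            return None  # still waiting; each transport payload stayed small
        # The whole value exists in memory at this point, whatever the chunk
        # size was, because it must be bound and stored as one BLOB/TEXT value.
        value = b"".join(self.parts)
        self.parts.clear()
        return value


assembler = FragmentAssembler(total_fragments=3)
assert assembler.add(b"ab") is None
assert assembler.add(b"cd") is None
assert assembler.add(b"e") == b"abcde"
```

In this model the memory needed at the final step grows with the size of the value, not with `payload_max_chunk_size`, which is why the two limits are sized separately.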

Subsequent syncs are incremental (proportional to D, changes since the last sync), so the first sync is the expensive one. Applications with large datasets should plan for this -- for example, by seeding new devices from a database snapshot rather than syncing from scratch.

@@ -185,6 +189,7 @@ CloudSync: sync_time ~ O(D) -- grows with changes since last sy
2. **`db_version` index**: Enables efficient range scans for delta extraction.
3. **Deferred batch merge**: Column changes for the same primary key are accumulated and flushed as a single SQL statement.
4. **Prepared statement caching**: Merge statements are compiled once and reused across rows.
-5. **LZ4 compression**: Reduces payload size for network transfer.
-6. **Per-column tracking**: Only changed columns are included in the sync payload, not entire rows.
-7. **Early exit on stale data**: The CLS algorithm skips rows where the incoming causal length is lower than the local one, avoiding unnecessary column-level comparisons.
+5. **Chunked payload generation**: `cloudsync_payload_chunks()` bounds transport payload size and handles oversized single values with transparent v3 fragments.
+6. **LZ4 compression**: Reduces payload size for network transfer.
+7. **Per-column tracking**: Only changed columns are included in the sync payload, not entire rows.
+8. **Early exit on stale data**: The CLS algorithm skips rows where the incoming causal length is lower than the local one, avoiding unnecessary column-level comparisons.
1 change: 1 addition & 0 deletions README.md
@@ -219,6 +219,7 @@ See the full guide: **[Row-Level Security Documentation](./docs/row-level-securi
## Documentation

- **[API Reference](./API.md)**: all functions, parameters, and examples
- **[Performance & Overhead](./PERFORMANCE.md)**: sync cost model, payload chunking, and large-value memory notes
- **[Installation Guide](./docs/installation.md)**: platform-specific setup (Swift, Android, Expo, React Native, Flutter, WASM)
- **[Block-Level LWW Guide](./docs/block-lww.md)**: line-level text merge for markdown and documents
- **[Row-Level Security Guide](./docs/row-level-security.md)**: multi-tenant access control with server-enforced policies
1 change: 1 addition & 0 deletions docker/postgresql/Dockerfile
@@ -6,6 +6,7 @@ FROM postgres:${POSTGRES_TAG}
# and install the matching server-dev package
RUN apt-get update && apt-get install -y \
build-essential \
postgresql-contrib-${PG_MAJOR} \
postgresql-server-dev-${PG_MAJOR} \
git \
make \
4 changes: 3 additions & 1 deletion docker/postgresql/Dockerfile.debug
@@ -44,7 +44,9 @@ RUN set -eux; \
cd /usr/src/postgresql-17; \
./configure --enable-debug --enable-cassert --without-icu CFLAGS="-O0 -g3 -fno-omit-frame-pointer"; \
make -j"$(nproc)"; \
-make install
+make install; \
+make -C contrib/dblink -j"$(nproc)"; \
+make -C contrib/dblink install

ENV PATH="/usr/local/pgsql/bin:${PATH}"
ENV LD_LIBRARY_PATH="/usr/local/pgsql/lib:${LD_LIBRARY_PATH}"
4 changes: 3 additions & 1 deletion docker/postgresql/Dockerfile.debug-no-optimization
@@ -44,7 +44,9 @@ RUN set -eux; \
cd /usr/src/postgresql-17; \
./configure --enable-debug --enable-cassert --without-icu CFLAGS="-O0 -g3 -fno-omit-frame-pointer"; \
make -j"$(nproc)"; \
-make install
+make install; \
+make -C contrib/dblink -j"$(nproc)"; \
+make -C contrib/dblink install

ENV PATH="/usr/local/pgsql/bin:${PATH}"
ENV LD_LIBRARY_PATH="/usr/local/pgsql/lib:${LD_LIBRARY_PATH}"