feat(migrations): add onDuplicate option to CSV/JSON/Appwrite imports#11910
feat(migrations): add onDuplicate option to CSV/JSON/Appwrite imports#11910premtsd-code wants to merge 45 commits into1.9.xfrom
Conversation
Exposes two new optional boolean params on the three migration
creation endpoints so CSV / JSON / appwrite-to-appwrite imports can
choose how to handle rows whose IDs already exist at the destination.
Endpoints updated (app/controllers/api/migrations.php):
- POST /v1/migrations/appwrite
- POST /v1/migrations/csv/imports
- POST /v1/migrations/json/imports
Parameter semantics:
- overwrite=true -> destination uses upsertDocuments instead of
createDocuments; existing rows are replaced
with imported values
- skip=true -> destination wraps createDocuments in
skipDuplicates; existing rows are preserved
unchanged, duplicate-id rows silently no-op
- both false -> default; fails fast on DuplicateException
(original behavior, unchanged)
- both true -> overwrite wins (upsert subsumes skip)
Both params are stored in the migration Document's options array
(matches the existing pattern for destination behavior config like
path, size, delimiter, bucketId, etc.) and read back in the worker's
processDestination() to instantiate DestinationAppwrite with the
new constructor params.
Feature-branch note: depends on utopia-php/migration#feat/skip-duplicates
(DestinationAppwrite constructor params) which in turn depends on
utopia-php/database#852 (skipDuplicates scope guard). composer.json is
temporarily pinned to dev-feat/skip-duplicates and
dev-csv-import-upsert-v2 respectively; both must be reset to proper
release versions once the upstream PRs merge.
Three new test methods in MigrationsBase, following the existing testCreateCSVImport setup pattern: - testCreateCSVImportSkipDuplicates Seeds documents.csv, mutates one row, re-imports with skip=true. Asserts the mutated row keeps its mutated value (not overwritten by the CSV's original value) and the row count stays at 100. - testCreateCSVImportOverwrite Seeds documents.csv, mutates one row, re-imports with overwrite=true. Asserts the mutated row is restored to the CSV's original value (proving upsertDocuments actually replaced the row) and the row count stays at 100. - testCreateCSVImportDefaultFailsOnDuplicate Regression guard: re-imports documents.csv with no flags. Asserts the migration goes to status=failed with errors populated, proving the default duplicate-throws behavior is preserved. All three share a prepareCsvImportFixture() helper that sets up database + table (name, age columns) + bucket + documents.csv upload. Returns the known first-row id + original name/age so tests can mutate and assert on a predictable row. Reuses the existing documents.csv fixture (100 rows with \$id as the first column). No new fixture files needed.
🔄 PHP-Retry SummaryFlaky tests detected across commits: Commit
|
| Test | Retries | Total Time | Details |
|---|---|---|---|
LegacyTransactionsCustomClientTest::testBulkOperations |
1 | 240.35s | Logs |
VectorsDBConsoleClientTest::testGetCollectionLogs |
1 | 4ms | Logs |
DatabasesConsoleClientTest::testGetCollectionLogs |
1 | 4ms | Logs |
Commit 443f5cf - 1 flaky test
| Test | Retries | Total Time | Details |
|---|---|---|---|
TablesDBCustomServerTest::testListDocumentsWithCache |
1 | 770ms | Logs |
Commit 40aa8cb - 4 flaky tests
| Test | Retries | Total Time | Details |
|---|---|---|---|
LegacyCustomClientTest::testUpdateWithExistingRelationships |
1 | 240.51s | Logs |
LegacyCustomServerTest::testCreatedBetween |
1 | 240.21s | Logs |
VectorsDBConsoleClientTest::testGetCollectionLogs |
1 | 3ms | Logs |
DatabasesConsoleClientTest::testGetCollectionLogs |
1 | 8ms | Logs |
Commit 8f32d01 - 2 flaky tests
| Test | Retries | Total Time | Details |
|---|---|---|---|
VectorsDBConsoleClientTest::testGetCollectionLogs |
1 | 8ms | Logs |
DatabasesConsoleClientTest::testGetCollectionLogs |
1 | 7ms | Logs |
Commit b79637e - 26 flaky tests
| Test | Retries | Total Time | Details |
|---|---|---|---|
UsageTest::testFunctionsStats |
1 | 10.23s | Logs |
UsageTest::testPrepareSitesStats |
1 | 7ms | Logs |
UsageTest::testEmbeddingsTextUsageDoesNotBreakProjectUsage |
1 | 5ms | Logs |
DatabasesStringTypesTest::testDeleteStringTypeColumns |
1 | 30.05s | Logs |
TablesDBCustomServerTest::testListAttributes |
1 | 243.16s | Logs |
TablesDBCustomServerTest::testListIndexes |
1 | 64ms | Logs |
TablesDBCustomServerTest::testCreateDocument |
1 | 31ms | Logs |
TablesDBCustomServerTest::testUpsertDocument |
1 | 18ms | Logs |
TablesDBCustomServerTest::testListDocuments |
1 | 20ms | Logs |
TablesDBCustomServerTest::testListDocumentsWithCache |
1 | 20ms | Logs |
TablesDBCustomServerTest::testListDocumentsCacheBustedByAttributeChange |
1 | 22ms | Logs |
TablesDBCustomServerTest::testListDocumentsCachedWithoutSelectQuery |
1 | 23ms | Logs |
TablesDBCustomServerTest::testListDocumentsCachePurgedByUpdate |
1 | 21ms | Logs |
TablesDBCustomServerTest::testGetDocument |
1 | 21ms | Logs |
TablesDBCustomServerTest::testGetDocumentWithQueries |
1 | 17ms | Logs |
TablesDBCustomServerTest::testQueryBySequenceType |
1 | 19ms | Logs |
TablesDBCustomServerTest::testListDocumentsAfterPagination |
1 | 26ms | Logs |
TablesDBCustomServerTest::testListDocumentsBeforePagination |
1 | 23ms | Logs |
TablesDBCustomServerTest::testListDocumentsLimitAndOffset |
1 | 20ms | Logs |
TablesDBCustomServerTest::testDocumentsListQueries |
1 | 19ms | Logs |
TablesDBCustomServerTest::testUpdateDocument |
1 | 18ms | Logs |
TablesDBCustomServerTest::testDeleteDocument |
1 | 17ms | Logs |
TablesDBCustomServerTest::testDefaultPermissions |
1 | 20ms | Logs |
TablesDBCustomServerTest::testPersistentCreatedAt |
1 | 23ms | Logs |
TablesDBCustomServerTest::testNotSearch |
1 | 26ms | Logs |
TablesDBCustomServerTest::testCreatedBetween |
1 | 240.26s | Logs |
Note: Flaky test results are tracked for the last 5 commits
✨ Benchmark resultsComparing 1.9.x (before) to feat/skip-duplicates (after). Before
After
Delta
Top API waits
|
This comment has been minimized.
This comment has been minimized.
… database to 5.3.22
Greptile SummaryThis PR adds an Two P1 issues remain open: Confidence Score: 3/5Not safe to merge — two P1 defects mean the feature is non-functional for CSV/JSON imports and the dependency lock is unreproducible. Two P1 findings (DestinationCSV/DestinationJSON not receiving onDuplicate, and dev-branch pin) prevent a clean merge. Multiple P1s pull the score below the 4/5 ceiling.
Important Files Changed
|
8a00770 to
f3c2502
Compare
This comment has been minimized.
This comment has been minimized.
7057189 to
d0603c4
Compare
…lerance utopia-php/migration's DestinationAppwrite now handles schema tolerance on re-migration (PR #171 on feat/skip-duplicates): it pre-checks destination `_metadata` for each database / table / column / index and tolerates in Skip/Upsert mode. Re-runs no longer produce schema-level errors, so the E2E tests can drop the status-tolerant workaround and assert strict 'completed' outcomes. Changes: - composer.json: pin utopia-php/migration to dev-feat/skip-duplicates (aliased to 1.9.99 for stability resolution). Will be replaced with a fixed 1.10.0 tag once the migration PR lands. - testAppwriteMigrationRowsOnDuplicate: replace the tolerant runMigrationAssertingRowSuccess helper with performMigrationSync on the Skip and Upsert re-runs. Asserts 'completed' status on every run, destination row content matches the expected value per mode (Mutated preserved on Skip, Original restored on Upsert). Helper method removed. - testAppwriteMigrationReRunIsIdempotent (new): seeds two rows on source, runs the migration three times back-to-back (fresh, Skip re-run, Upsert re-run) against unchanged source data, asserts strict 'completed' on every run and row content is stable across all three. Exercises the schema-tolerance path end-to-end: every database/table/column on destination already exists with a matching spec, so DestinationAppwrite's pre-check returns Tolerate for every resource.
Branch is iterating on utopia-php/migration's re-migration tolerance. Other test matrices (unit, general, abuse, screenshots, benchmark, and every other e2e service) add ~30+ minutes to CI without exercising code this PR touches. Restrict the matrix to the Migrations service and skip the unrelated test jobs until the migration work is ready to merge. All jobs marked with 'TEMP:' comments + 'if: false' — revert to the full matrix before merging to main. Static analysis (lint, phpstan, composer audit, specs, locale, security) still runs on every PR push.
This comment has been minimized.
This comment has been minimized.
Picks up the PR #171 refactor + unit tests: - resolveSchemaAction decision point consolidation - deleteAttributeCompletely primitive (two-way cleanup in one place) - Hardened sourceIsNewer against MySQL zero-date sentinel - 14 unit tests locking the decision matrix
This comment has been minimized.
This comment has been minimized.
Picks up the UpdateInPlace branch — database/table metadata drift is now reconciled on Upsert-newer (renames, enable toggles, table permissions / documentSecurity) via updateDocument, without touching child rows.
… dest drift
Two new E2E tests exercising the schema-tolerance UpdateInPlace path
added in utopia-php/migration's DestinationAppwrite.
testAppwriteMigrationUpsertUpdatesContainerMetadata (positive):
- Fresh migration copies source database + table + column + row to dest.
- Mutates source database name (PUT /databases/:id) and table
name/permissions/rowSecurity/enabled (PUT /tablesdb/:db/tables/:id).
- One-second sleep before mutation ensures source's $updatedAt is
strictly greater than dest's at second granularity (strtotime
comparison).
- Upsert re-migration asserts:
- 'completed' status.
- dest database name matches source's new name.
- dest table name / enabled / rowSecurity match source's new values.
- child row's 'name' attribute is untouched — UpdateInPlace only
rewrites container metadata, not rows.
testAppwriteMigrationSkipPreservesContainerDrift (negative):
- Fresh migration, then mutate BOTH dest (simulating ops tightening
permissions post-migration) and source (divergence).
- Skip re-migration asserts dest kept its tightened values — Skip's
strict "don't touch" contract protects dev→prod cutover workflows
from accidentally wiping ops-side drift on schema re-sync.
Both tests use performMigrationSync for strict 'completed' assertions.
Runtime ~18s combined. Existing testAppwriteMigrationRowsOnDuplicate
and testAppwriteMigrationReRunIsIdempotent regression-tested locally.
… + nullable timestamps)
… to 'table' vocabulary)
testAppwriteMigrationUpsertDropsOrphanColumn: adds a column directly on destination (simulating post-rename orphan or dest-only drift), runs Upsert, asserts the orphan is dropped and source-declared column survives. Covers the per-table orphan cleanup fired inside createRecord before rows land. testAppwriteMigrationSkipKeepsOrphanColumn: same setup, Skip mode. Asserts the orphan survives, proving the cleanup is correctly gated to Upsert only.
This comment has been minimized.
This comment has been minimized.
Three new e2e tests in MigrationsBase covering the schema reconciliation paths added in utopia-php/migration: - testAppwriteMigrationUpsertUpdatesAttributeInPlace: PATCH source required/default (SDK-reachable), assert dest reflects change and the pre-existing row's column data is preserved (drop+recreate would have wiped it). - testAppwriteMigrationSkipPreservesAttributeDrift: leaf-level analog of the existing container-drift Skip test — guards Skip from ever consulting timestamps. - testAppwriteMigrationUpsertUpdatesRelationshipOnDeleteInPlace: PATCH source onDelete cascade->restrict (SDK-reachable), assert dest reflects change and structural fields (relationType, twoWay) untouched. composer.lock: utopia-php/migration 6e6f825 -> a36d95f (mechanical helpers replacement, parseTimestamp dedup, match dispatch, comment trim).
…ration The previous version of this test created a one-way relationship, which falls through to DropAndRecreate (one-way + onDelete change is gated off in updateRelationshipInPlace because utopia's updateRelationship partner-cascade throws on one-way). It never exercised the in-place path it was named for. Converted to two-way (parents.kids ↔ children.parent), and asserted both parent- and partner-side onDelete on dest. Partner-side assertion is the regression guard for the partner-meta refresh that was missing from updateRelationshipInPlace. composer.lock: utopia-php/migration a36d95f -> c76de9a (partner-side onDelete sync fix).
testAppwriteMigrationUpsertTwoWayRecreateSkipsPartnerSide exercises the DropAndRecreate path on a two-way relationship that the partner- side pair-key dedup guards. Source recreates the relationship between runs, forcing parent-side createdAt diff. Test asserts the migration completes cleanly and partner-table rows survive — without dedup, the partner pass re-fires DropAndRecreate and destroys those rows. composer.lock: utopia-php/migration c76de9a -> c13e77d (partner-side pair-key dedup restored).
Two coverage gaps closed: - testAppwriteMigrationUpsertOneWayRelationshipDropAndRecreate exercises the path that updateRelationshipInPlace gates off: one-way + onDelete change → returns false → falls through to DropAndRecreate via deleteRelationship. Coverage was lost when testAppwriteMigrationUpsertUpdatesRelationshipOnDeleteInPlace was converted to two-way to actually hit the in-place path. - testAppwriteMigrationUpsertAttributeRecreateDropsAndRecreates pins the createdAt-different leaf path: source drops + recreates the attribute (createdAt advances), re-migration must DropAndRecreate on dest and re-flow the row data through the row pass. Companion to testAppwriteMigrationUpsertUpdatesAttributeInPlace which covers the same-createdAt + newer-updatedAt path. Migration package already at 09c1b21 (the maintainability commit) from the previous lock bump — no further composer.lock change needed.
testAppwriteMigrationUpsertSameSpecRecreateTolerates exercises the new spec-match guard added in utopia-php/migration c8d1789. Source drops + recreates a column with the EXACT same spec as before; createdAt advances but specs match → action is forced to Tolerate. Asserts dest column's $createdAt stays at first-migration value (proving Tolerate, not DropAndRecreate). Row pass under Upsert still propagates source's new row value. Companion to testAppwriteMigrationUpsertAttributeRecreateDropsAndRecreates which exercises the spec-DIFFERS path: same precondition (drop + recreate), different outcome (DropAndRecreate vs Tolerate) gated on spec equality. composer.lock: utopia-php/migration 24fd23b -> c8d1789 (spec-match guard).
After dropping createdAt from resolveSchemaAction, source-side recreate no longer routes through DropAndRecreate via the outer decision. The inner fallthrough still drops+recreates when the spec diff is a non-SDK change, so this test now toggles 'array' (a non-SDK field) on recreate to actually exercise the drop+recreate path it pins. Also clarifies the two-way recreate test's docblock — with createdAt gone and identical spec on recreate, it exercises spec-match + pair-key dedup (both tolerate paths) rather than parent-side drop. End-state assertions unchanged.
… before count/validator)
Maintainer review on utopia-php/migration#171 renamed OnDuplicate::Upsert -> OnDuplicate::Overwrite (value 'upsert' -> 'overwrite') to align with Appwrite terms (skip / overwrite / fail). Applying the cross-repo ripple here: - app/controllers/api/migrations.php: 3 endpoint param descriptions updated ('upsert' -> 'overwrite' in the help text). The validator still uses OnDuplicate::values() so it auto-picks up the new value. - tests/e2e/Services/Migrations/MigrationsBase.php: all 'onDuplicate' => 'upsert' -> 'overwrite'; method names testAppwriteMigrationUpsert* -> testAppwriteMigrationOverwrite*; comments / assertion messages / local var names switched. - Left untouched: utopia's upsertDocuments operation, transaction TransactionState 'upsert' action, Operation validator — those refer to the database-level upsert primitive, not the OnDuplicate enum. composer.lock: utopia-php/migration 7d71505 -> b8ae7bc.
# Conflicts: # app/controllers/api/migrations.php # composer.lock
| $this->dbForProject, | ||
| $this->getDatabasesDB, | ||
| Config::getParam('collections', [])['databases']['collections'], | ||
| OnDuplicate::tryFrom($options['onDuplicate'] ?? '') ?? OnDuplicate::Fail, |
There was a problem hiding this comment.
onDuplicate parameter never reaches the worker — controller files are missing from this PR
The three controller files that must accept and store onDuplicate in the migration document's options array were not changed in this PR:
src/Appwrite/Platform/Modules/Migrations/Http/Migrations/Appwrite/Create.php— nooptionskey at all in the created documentsrc/Appwrite/Platform/Modules/Migrations/Http/Migrations/CSV/Imports/Create.php—optionsonly containspathandsizesrc/Appwrite/Platform/Modules/Migrations/Http/Migrations/JSON/Imports/Create.php—optionsonly containspathandsize
Because none of these controllers accept or persist onDuplicate, $options['onDuplicate'] ?? '' on line 295 will always be an empty string, OnDuplicate::tryFrom('') returns null, and the worker always uses OnDuplicate::Fail. The feature is completely non-functional as shipped regardless of what value a client sends.
| 'endpoint' => $this->webEndpoint, | ||
| 'projectId' => $this->getProject()['$id'], | ||
| 'apiKey' => $this->getProject()['apiKey'], | ||
| 'onDuplicate' => 'overwrite', |
There was a problem hiding this comment.
'onDuplicate' => 'overwrite' does not match the PR description's documented enum value
Every "upsert" test throughout this file uses 'onDuplicate' => 'overwrite' (lines 834, 910, 989, 1160, 1328, 2635, 3006), but the PR description states the valid values are "fail" | "skip" | "upsert". If the API's WhiteList validator is built from OnDuplicate::values() and 'upsert' is the actual enum backing value, every one of these tests will be rejected with a 400 validation error. If 'overwrite' is in fact the correct backing value, then the PR description and its summary table are wrong. The two must be reconciled before merge.
1.9.x reorganized app/controllers/api/migrations.php into the
Platform/Modules/Migrations structure but dropped the onDuplicate
param. After the merge, every e2e migration test that passed
'onDuplicate' => 'overwrite' would 400 since the param wasn't in the
allowlist anymore.
Restoring it on the three endpoints that take row-level conflict
behavior: Appwrite/Create, CSV/Imports/Create, JSON/Imports/Create.
Each:
- Imports OnDuplicate + WhiteList.
- Adds optional ->param('onDuplicate', OnDuplicate::Fail->value,
new WhiteList(OnDuplicate::values()), …).
- Threads $onDuplicate through the action signature.
- Stores it on the migration document's 'options' attribute so
Workers/Migrations.php can pick it up via
OnDuplicate::tryFrom($options['onDuplicate'] ?? '')
?? OnDuplicate::Fail.
Worker code already reads options['onDuplicate'] (unchanged) — no
edits needed there.
| "utopia-php/logger": "0.6.*", | ||
| "utopia-php/messaging": "0.22.*", | ||
| "utopia-php/migration": "1.9.*", | ||
| "utopia-php/migration": "dev-feat/skip-duplicates as 1.9.99", |
There was a problem hiding this comment.
Dev-branch pin still present despite PR description claiming tagged releases
composer.json still uses "dev-feat/skip-duplicates as 1.9.99" and composer.lock resolves to the live dev-branch commit d7c3d0536dbb9e37a4894cd0b3b0134a483e04e8. The PR description states "Both upstream deps are tagged releases — no dev-branch pins remain" and references utopia-php/migration@1.9.2, but neither the constraint nor the lock file reflects a tagged release. The stability-flags entry ("utopia-php/migration": 20) overrides prefer-stable: true, and any force-push to that branch would silently pull in a different, untested commit on the next composer install.
| "utopia-php/migration": "dev-feat/skip-duplicates as 1.9.99", | |
| "utopia-php/migration": "1.9.*", |
Summary
Exposes a single
onDuplicatestring param on the three migration creation endpoints so CSV / JSON / appwrite-to-appwrite imports can choose how to handle rows whose IDs already exist at the destination.Each now accepts optional
onDuplicate: "fail" | "skip" | "upsert"(default"fail").Parameter semantics
"fail"(default)createDocuments(), aborts on firstDuplicateException. Original behavior, unchanged."skip"createDocuments()wrapped inskipDuplicates()scope guard. Duplicate-id rows silently no-op at the adapter layer (INSERT IGNOREequivalent). Existing rows are preserved."upsert"upsertDocuments(). Existing rows are replaced with imported values.String +
WhiteListvalidator at the REST boundary — matches the existing Appwrite convention for enum-style params (priority,encryption,period,grant_type). Allowed values are drawn fromUtopia\Migration\Destinations\OnDuplicate::values()so the set is owned by the destination implementation.The param is stored in the migration Document's
optionsarray alongside existing fields (path,size, etc.), matching the existing pattern for destination behavior config. The worker'sprocessDestination()reads it back, converts the string viaOnDuplicate::tryFrom(...), and passes the enum toDestinationAppwrite's constructor.Scope
onDuplicateonly gates the row-write path inDestinationAppwrite(therowBufferflush increateRecord()). It does not affect other resource types — databases, tables, columns, users, teams, buckets, etc. still follow their existing duplicate-handling semantics (throw on conflict). Re-running an Appwrite→Appwrite migration with the full resource tree will still error on schema resources;onDuplicatecorrectly handles only the row-level flow.Changes
app/controllers/api/migrations.php— adds->param('onDuplicate', OnDuplicate::Fail->value, new WhiteList(OnDuplicate::values()), ...)to the Appwrite, CSV, and JSON migration create endpoints; stores inoptions; threads into action function signaturessrc/Appwrite/Platform/Workers/Migrations.php—processDestination()passesOnDuplicate::tryFrom($options['onDuplicate'] ?? '') ?? OnDuplicate::Failto the destination constructorcomposer.json— bumpsutopia-php/databaseto5.3.*andutopia-php/migrationto1.9.*(both tagged releases)tests/e2e/Services/Migrations/MigrationsBase.php— 7 E2E test methods (3 CSV, 3 JSON, 1 Appwrite→Appwrite) + JSON fixture helper + row-success poll helperE2E test coverage
CSV imports (3 tests)
testCreateCSVImportSkipDuplicates— importdocuments.csv(100 rows), mutate one row's age from 56 → 22, re-import withonDuplicate=skip. Asserts the mutated row keepsage=22and row count stays at 100.testCreateCSVImportOverwrite— same setup, re-import withonDuplicate=upsert. Asserts the row is restored to the CSV's originalage=56.testCreateCSVImportDefaultFailsOnDuplicate— regression guard. Re-import with noonDuplicateparam (defaults to"fail"). Asserts migration status isfailedwith errors populated.JSON imports (3 tests — same shape as CSV)
testCreateJSONImportSkipDuplicatestestCreateJSONImportOverwritetestCreateJSONImportDefaultFailsOnDuplicateAppwrite → Appwrite migration (1 test)
testAppwriteMigrationRowsOnDuplicate— seeds a source database/table/column/row, runs a first (full) migration to copy schema + row to the destination, mutates the destination row, then re-runs withonDuplicate=skip(expects row preserved) andonDuplicate=upsert(expects row restored to source value). Uses a schema-error-tolerant poll helper (runMigrationAssertingRowSuccess) because re-creating an existing database/table/column on destination always errors — those errors are orthogonal toonDuplicate, which only gates row writes. Assertions targetstatusCounters['row']directly rather than overall migration status.Fixture helpers
prepareCsvImportFixture/prepareJsonImportFixture— create database, table, columns (withstatus=availablewait), bucket, upload fixture file. Return the IDs + first-row metadata for the tests.runMigrationAssertingRowSuccess— Appwrite→Appwrite-specific poll helper that tolerates non-row errors.All tests verified locally against a full Appwrite stack (CSV + JSON: ~10s total, 153 assertions).
Dependencies
Both upstream deps are tagged releases — no dev-branch pins remain:
utopia-php/database@5.3.22— provides theskipDuplicates()scope guard (utopia-php/database#852)utopia-php/migration@1.9.2— provides theOnDuplicateenum andDestinationAppwriterefactor (utopia-php/migration#169)Test plan
composer validateclean (composer.json + composer.lock hash in sync)testAppwriteMigrationDatabasesRowand all sibling tests)