Enhancement of Weaviate migration script #691

Open
sorphwer wants to merge 3 commits into langgenius:main from sorphwer:main

sorphwer commented Mar 1, 2026

This PR enhances the migration script based on our internal experiments.

Key improvements:
It fixes the following failure scenarios in the old script:

  1. Schema type mismatch: The old script copies properties as-is, preserving the uuid type for document_id/doc_id. Dify expects the text type, so the migration appears to succeed but Dify fails at runtime.

  2. UUID object insertion failure: When the source collection has uuid-typed fields, the Weaviate client returns Python UUID objects. Writing these into text-typed fields causes batch insert errors, leading to data loss or an aborted migration.

  3. moduleConfig rejection: A stale moduleConfig on chunk_index left over from older Weaviate versions can cause collection creation to fail on newer Weaviate versions, aborting the migration entirely.

  4. Partial migration blind spot: Collections whose vectorConfig had already been migrated, but which still carried the wrong property types, were skipped with "NEW SCHEMA (skip)", leaving silent incompatibilities.
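Items 2 and 4 come down to two checks: coercing UUID values before insertion, and deciding whether a collection still needs migrating based on more than vectorConfig alone. The sketch below illustrates both under stated assumptions: schemas are plain dicts in Weaviate's REST schema shape (`properties`, `dataType`, `moduleConfig`, `vectorConfig`), and `needs_migration` and `coerce_uuid_values` are hypothetical names, not the functions in this PR's script.

```python
import uuid
from typing import Any

# Assumed from the PR description: Dify expects these IDs as text.
TEXT_ID_PROPERTIES = {"document_id", "doc_id"}
# Assumed from the PR description: property that may carry a stale moduleConfig.
STALE_MODULE_CONFIG_PROPERTY = "chunk_index"


def needs_migration(class_schema: dict[str, Any]) -> list[str]:
    """Return why a collection still needs migrating; an empty list means it is clean.

    Checking only vectorConfig is what caused the "NEW SCHEMA (skip)" blind spot,
    so property types and stale moduleConfig are inspected as well.
    """
    reasons: list[str] = []
    if not class_schema.get("vectorConfig"):
        reasons.append("missing vectorConfig")
    for prop in class_schema.get("properties", []):
        name = prop.get("name")
        if name in TEXT_ID_PROPERTIES and prop.get("dataType") != ["text"]:
            reasons.append(f"{name} has dataType {prop.get('dataType')}, expected ['text']")
        if name == STALE_MODULE_CONFIG_PROPERTY and prop.get("moduleConfig"):
            reasons.append(f"{name} carries a stale moduleConfig")
    return reasons


def _to_text(value: Any) -> Any:
    """Convert uuid.UUID values (including inside lists) to their string form."""
    if isinstance(value, uuid.UUID):
        return str(value)
    if isinstance(value, list):
        return [_to_text(item) for item in value]
    return value


def coerce_uuid_values(properties: dict[str, Any]) -> dict[str, Any]:
    """Prepare an object's properties for insertion into text-typed fields."""
    return {key: _to_text(value) for key, value in properties.items()}
```

A collection is only skipped when `needs_migration` returns an empty list, so a vectorConfig-only check can no longer hide a wrong property type.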

Handle uuid→text conversion for document_id/doc_id and remove the spurious
moduleConfig from chunk_index during schema migration. This fixes
property type incompatibilities that could break Dify at runtime even
when vectorConfig is already correct.
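A minimal sketch of this property rewrite, assuming property definitions are dicts in Weaviate's REST schema shape; `fix_properties` is a hypothetical helper, not the script's actual function.

```python
import copy
from typing import Any

# Assumed from the PR description: IDs Dify expects as text rather than uuid.
TEXT_ID_PROPERTIES = {"document_id", "doc_id"}


def fix_properties(properties: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a corrected copy of a collection's property definitions."""
    fixed = []
    for prop in copy.deepcopy(properties):  # never mutate the source schema
        name = prop.get("name")
        if name in TEXT_ID_PROPERTIES and prop.get("dataType") == ["uuid"]:
            # Dify expects these IDs as text, not Weaviate's uuid type.
            prop["dataType"] = ["text"]
        if name == "chunk_index":
            # Drop moduleConfig that newer Weaviate versions may reject.
            prop.pop("moduleConfig", None)
        fixed.append(prop)
    return fixed
```

Working on a deep copy keeps the fetched source schema unchanged, which matters if the original definition is needed for recovery.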

Co-Authored-By: Claude Opus 4.6 <[email protected]>
sorphwer requested a review from RiskeyL as a code owner March 1, 2026 03:54
dosubot (bot) added the labels size:L (This PR changes 100-499 lines, ignoring generated files.) and bug (Something isn't working) Mar 1, 2026
sorphwer requested a review from ZhouhaoJiang March 1, 2026 03:57
sorphwer and others added 2 commits March 1, 2026 11:57
Document how to configure Weaviate connection for both in-container
and local (port-forward) scenarios, and clarify derived values.
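The connection settings might be derived roughly as follows. This is a sketch assuming the script reads Dify's `WEAVIATE_ENDPOINT` and `WEAVIATE_API_KEY` variables; `WEAVIATE_GRPC_PORT`, `WeaviateConnection`, and `resolve_connection` are hypothetical, and 50051 is Weaviate's default gRPC port.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlparse

DEFAULT_GRPC_PORT = 50051  # Weaviate's default gRPC port


@dataclass
class WeaviateConnection:
    http_host: str
    http_port: int
    http_secure: bool
    grpc_port: int
    api_key: Optional[str]


def resolve_connection(env: Optional[Mapping[str, str]] = None) -> WeaviateConnection:
    """Derive host, port, and TLS settings from a single endpoint URL."""
    env = os.environ if env is None else env
    # In-container default; for a local port-forward use http://localhost:8080.
    endpoint = env.get("WEAVIATE_ENDPOINT", "http://weaviate:8080")
    parsed = urlparse(endpoint)
    secure = parsed.scheme == "https"
    return WeaviateConnection(
        http_host=parsed.hostname or "localhost",
        http_port=parsed.port or (443 if secure else 80),
        http_secure=secure,
        # Hypothetical override; falls back to Weaviate's default gRPC port.
        grpc_port=int(env.get("WEAVIATE_GRPC_PORT", DEFAULT_GRPC_PORT)),
        api_key=env.get("WEAVIATE_API_KEY"),
    )
```

Inside the Dify containers, `http://weaviate:8080` resolves through the Docker network; when running locally through a port-forward, point `WEAVIATE_ENDPOINT` at `http://localhost:8080` and forward the gRPC port as well. The derived fields roughly correspond to the host, port, and security parameters of the v4 Python client's `weaviate.connect_to_custom`.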

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Reorder replace_old_collection to prevent data loss on failure:
- Fetch schema BEFORE deleting anything
- Wrap data copy in try/except to preserve migrated collection on error
- Add count verification after copy, keep migrated as backup on mismatch
- Only delete the migrated collection after full verification passes
- Print recovery instructions (collection name) on every failure path
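The reordered flow can be sketched against an in-memory stand-in for Weaviate. `InMemoryStore`, `copy_all`, and this `replace_old_collection` signature are hypothetical; only the ordering is meant to mirror the steps listed above (fetch the schema first, guarded copy, count check, delete the migrated collection last).

```python
from typing import Any, Callable, Optional


class InMemoryStore:
    """Hypothetical stand-in for Weaviate: collection name -> schema and objects."""

    def __init__(self) -> None:
        self.schemas: dict[str, dict[str, Any]] = {}
        self.data: dict[str, list[dict[str, Any]]] = {}

    def create(self, name: str, schema: dict[str, Any]) -> None:
        self.schemas[name] = dict(schema)
        self.data[name] = []

    def delete(self, name: str) -> None:
        self.schemas.pop(name, None)
        self.data.pop(name, None)


def copy_all(store: InMemoryStore, src: str, dst: str) -> None:
    for obj in store.data[src]:
        store.data[dst].append(dict(obj))


def replace_old_collection(
    store: InMemoryStore,
    old_name: str,
    migrated_name: str,
    copy_objects: Optional[Callable[[InMemoryStore, str, str], None]] = None,
) -> bool:
    copy_objects = copy_objects or copy_all
    # 1. Fetch the migrated schema and count BEFORE deleting anything.
    schema = dict(store.schemas[migrated_name])
    expected = len(store.data[migrated_name])
    # 2. Recreate the old name with the new schema.
    store.delete(old_name)
    store.create(old_name, schema)
    # 3. Guarded copy: on error, the migrated collection stays untouched.
    try:
        copy_objects(store, migrated_name, old_name)
    except Exception as exc:
        print(f"Copy failed ({exc}); data preserved in '{migrated_name}'")
        return False
    # 4. Verify counts; keep the migrated collection as a backup on mismatch.
    actual = len(store.data[old_name])
    if actual != expected:
        print(f"Count mismatch ({actual} != {expected}); backup kept in '{migrated_name}'")
        return False
    # 5. Only now is it safe to drop the migrated collection.
    store.delete(migrated_name)
    return True
```

Returning `False` instead of raising keeps the migrated collection intact on every failure path, so the printed collection name is enough to recover manually.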

Co-Authored-By: Claude Opus 4.6 <[email protected]>
RiskeyL (Collaborator) commented Mar 1, 2026

Hi @DhruvGorasiya, could you please take a look at this PR and review the updates to the migration script?
