Enhancement of Weaviate migration script#691
Open
sorphwer wants to merge 3 commits into langgenius:main
Handle uuid→text conversion for document_id/doc_id and remove spurious moduleConfig from chunk_index during schema migration. This fixes property type incompatibilities that could cause issues even when vectorConfig is already correct.

Fixes the following failure scenarios in the old script:

1. Schema type mismatch: Old script copies properties as-is, preserving uuid type for document_id/doc_id. Dify expects text type, so the migrated collection appears successful but Dify fails at runtime.
2. UUID object insertion failure: When source collection has uuid-typed fields, the Weaviate client returns Python UUID objects. Writing these into text-typed fields causes batch insert errors, leading to data loss or migration abort.
3. moduleConfig rejection: Stale moduleConfig on chunk_index from older Weaviate versions can cause collection creation to fail on newer Weaviate, aborting migration entirely.
4. Partial migration blindspot: Collections already migrated for vectorConfig but still carrying wrong property types were skipped with "NEW SCHEMA (skip)", leaving silent incompatibilities.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
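The two conversions described in this commit can be sketched roughly as follows. This is not the code from the PR: the function names `normalize_properties` and `normalize_object` are hypothetical, and the sketch assumes the schema is handled as plain dictionaries in which each property has a `name` and a list-valued `dataType`.

```python
import copy
import uuid

# Property names come from the commit message; the dict layout is an assumption.
ID_PROPERTIES = {"document_id", "doc_id"}


def normalize_properties(properties):
    """Return a copy of the property list with the types Dify expects."""
    fixed = []
    for prop in properties:
        prop = copy.deepcopy(prop)  # never mutate the source schema
        if prop.get("name") in ID_PROPERTIES and prop.get("dataType") == ["uuid"]:
            prop["dataType"] = ["text"]
        if prop.get("name") == "chunk_index":
            # Drop stale moduleConfig left over from older Weaviate versions.
            prop.pop("moduleConfig", None)
        fixed.append(prop)
    return fixed


def normalize_object(obj_properties):
    """Convert uuid.UUID values to strings so they fit text-typed fields."""
    return {
        key: str(value) if isinstance(value, uuid.UUID) else value
        for key, value in obj_properties.items()
    }
```

Applying `normalize_properties` when the target collection is created and `normalize_object` to every object before it is written would keep the schema and the data consistent with each other, which is the point of scenarios 1 and 2.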
Document how to configure Weaviate connection for both in-container and local (port-forward) scenarios, and clarify derived values.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Reorder replace_old_collection to prevent data loss on failure:

- Fetch schema BEFORE deleting anything
- Wrap data copy in try/except to preserve migrated collection on error
- Add count verification after copy, keep migrated as backup on mismatch
- Only delete the migrated collection after full verification passes
- Print recovery instructions (collection name) on every failure path

Co-Authored-By: Claude Opus 4.6 <[email protected]>
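The ordering described in this commit might look something like the sketch below. It is an illustration only: `InMemoryStore` is a hypothetical stand-in rather than the Weaviate client API, and the body of `replace_old_collection` is one reading of the steps listed above, not the function from the PR.

```python
class InMemoryStore:
    """Hypothetical stand-in for the vector store; not the Weaviate client API."""

    def __init__(self):
        self.schemas = {}
        self.objects = {}

    def create(self, name, schema):
        self.schemas[name] = schema
        self.objects[name] = []

    def delete(self, name):
        self.schemas.pop(name, None)
        self.objects.pop(name, None)

    def count(self, name):
        return len(self.objects[name])


def replace_old_collection(store, original, migrated):
    """Swap `migrated` into `original`, keeping `migrated` until verified."""
    # 1. Fetch the schema and expected count BEFORE deleting anything.
    schema = store.schemas[migrated]
    expected = store.count(migrated)

    # 2. Recreate the original collection with the migrated schema.
    store.delete(original)
    store.create(original, schema)

    # 3. Copy the data; on error the migrated collection is left untouched.
    try:
        for obj in store.objects[migrated]:
            store.objects[original].append(dict(obj))
    except Exception as exc:
        print(f"Copy failed ({exc}); data is preserved in '{migrated}'")
        return False

    # 4. Verify counts; on mismatch keep the migrated collection as a backup.
    actual = store.count(original)
    if actual != expected:
        print(f"Count mismatch ({actual} != {expected}); backup kept in '{migrated}'")
        return False

    # 5. Only now is it safe to delete the migrated collection.
    store.delete(migrated)
    return True
```

Every failure path returns before step 5 and prints the name of the collection that still holds the data, so a failed run can be recovered by hand.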
Hi @DhruvGorasiya, could you please take a look at this PR and review the updates to the migration script?
This PR enhances the migration script based on our internal experiments.

Key improvement: it fixes the following failure scenarios in the old script:

1. Schema type mismatch: The old script copies properties as-is, preserving the uuid type for document_id/doc_id. Dify expects the text type, so the migration appears to succeed but Dify fails at runtime.
2. UUID object insertion failure: When the source collection has uuid-typed fields, the Weaviate client returns Python UUID objects. Writing these into text-typed fields causes batch insert errors, leading to data loss or an aborted migration.
3. moduleConfig rejection: Stale moduleConfig on chunk_index from older Weaviate versions can cause collection creation to fail on newer Weaviate, aborting the migration entirely.
4. Partial migration blindspot: Collections already migrated for vectorConfig but still carrying wrong property types were skipped with "NEW SCHEMA (skip)", leaving silent incompatibilities.
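The partial migration blindspot comes down to the skip check looking only at vectorConfig. A minimal sketch of a stricter check is shown here; the function name `needs_migration` and the schema layout (a dict with an optional `vectorConfig` key and a `properties` list) are assumptions for illustration, not the script's actual code.

```python
def needs_migration(schema):
    """Return the reasons a collection still needs migrating (empty if none)."""
    reasons = []
    if "vectorConfig" not in schema:
        reasons.append("missing vectorConfig")
    for prop in schema.get("properties", []):
        name = prop.get("name")
        # A correct vectorConfig is not enough: property types must match too.
        if name in ("document_id", "doc_id") and prop.get("dataType") == ["uuid"]:
            reasons.append(f"{name} is uuid-typed")
        if name == "chunk_index" and "moduleConfig" in prop:
            reasons.append("chunk_index carries moduleConfig")
    return reasons
```

A collection would then be skipped only when the returned list is empty, so one that already has vectorConfig but still carries a uuid-typed doc_id is picked up again.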