DO NOT MERGE Add script to compare MIxS schema releases using GitHub API #1026

Closed
turbomam wants to merge 9 commits into main from 845-what-are-our-options-for-diffing-schema-changes

Conversation

@turbomam
Member

  • Created diff_two_linkml_mixs_releases.py script to fetch release information
  • Script lists all GitHub releases with commit hashes and YAML files
  • Finds mixs.yaml files in approved directories (src/, model/)
  • Supports GitHub API authentication via local/.env file
  • Added python-dotenv and requests dependencies to pyproject.toml
  • Created local/.env.template for GitHub token setup
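The lookup of `mixs.yaml` in the approved directories could be sketched roughly as follows. This is only an illustration, not the script's actual code: the signature of `find_mixs_yaml_path`, the `APPROVED_DIRS` constant, and the shallowest-path tie-break are all assumptions.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional

# Assumption: only files under these top-level directories are considered.
APPROVED_DIRS = ("src", "model")


def find_mixs_yaml_path(paths: Iterable[str]) -> Optional[str]:
    """Return the shallowest mixs.yaml under an approved directory, or None."""
    candidates = []
    for raw in paths:
        path = PurePosixPath(raw)
        if path.name == "mixs.yaml" and path.parts[0] in APPROVED_DIRS:
            candidates.append(raw)
    if not candidates:
        return None
    # Hypothetical tie-break: prefer the path with the fewest components,
    # then alphabetical order, so the result is deterministic.
    return min(candidates, key=lambda p: (len(PurePosixPath(p).parts), p))
```

Restricting the search to a short allow-list keeps generated copies of the schema elsewhere in the repository from being picked up by accident.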

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings July 16, 2025 12:04
@turbomam turbomam linked an issue Jul 16, 2025 that may be closed by this pull request

@github-actions
Contributor

github-actions bot commented Jul 16, 2025

PR Preview Action v1.6.2

🚀 View preview at
https://GenomicsStandardsConsortium.github.io/mixs/pr-preview/pr-1026/

Built to branch gh-pages at 2025-07-21 16:44 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@turbomam turbomam requested a review from Copilot July 16, 2025 12:27

turbomam and others added 2 commits July 16, 2025 08:33
…ions

- Identify populated keys (non-None values) in old_schema.schema and new_schema.schema
- Compare keys between versions showing only-in-old, only-in-new, and common keys
- Print counts and key names without printing values to avoid huge lists
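A minimal sketch of the key comparison this commit describes, assuming each schema can be viewed as a plain mapping (for an object such as `old_schema.schema`, something like `vars(obj)` might provide one). The helper names `populated_keys`, `compare_keys`, and `print_summary` are hypothetical.

```python
from typing import Any, Dict, List, Mapping, Set


def populated_keys(obj: Mapping[str, Any]) -> Set[str]:
    """Keys whose value is not None."""
    return {key for key, value in obj.items() if value is not None}


def compare_keys(old: Mapping[str, Any], new: Mapping[str, Any]) -> Dict[str, List[str]]:
    """Split populated keys into only-in-old, only-in-new, and common."""
    old_keys = populated_keys(old)
    new_keys = populated_keys(new)
    return {
        "only_in_old": sorted(old_keys - new_keys),
        "only_in_new": sorted(new_keys - old_keys),
        "common": sorted(old_keys & new_keys),
    }


def print_summary(report: Dict[str, List[str]]) -> None:
    """Print counts and key names only; values are never printed."""
    for label, keys in report.items():
        print(f"{label} ({len(keys)}): {', '.join(keys)}")
```

Reporting key names without values keeps the output short even when a key such as the slot list holds thousands of entries.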

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Add GitHub token validation with regex patterns for different token types
- Add timeout parameters (10s) to all requests.get() calls to prevent hanging
- Replace hardcoded commit hashes with symbolic constants DEFAULT_OLD_COMMIT/DEFAULT_NEW_COMMIT
- Refactor main block to extract duplicated logic into build_release_info_dict() function
- Update commit documentation to clearly indicate comparison versions (mixs6.0.0 vs main 2025-07-14)
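The token validation mentioned above might look something like this sketch. The regular expressions are assumptions about current GitHub token shapes, not the patterns used in the script, and GitHub may change these formats; the return convention of `validate_github_token` is also hypothetical.

```python
import re
from typing import Optional

# Assumed token shapes (illustrative only).
TOKEN_PATTERNS = {
    "prefixed token": re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),
    "fine-grained personal access token": re.compile(r"github_pat_[A-Za-z0-9_]{22,}"),
    "legacy 40-character hex token": re.compile(r"[0-9a-f]{40}"),
}


def validate_github_token(token: Optional[str]) -> Optional[str]:
    """Return the name of the matching token type, or None if nothing matches."""
    if not token:
        return None
    candidate = token.strip()
    for kind, pattern in TOKEN_PATTERNS.items():
        # fullmatch rejects tokens with stray prefixes or trailing characters.
        if pattern.fullmatch(candidate):
            return kind
    return None
```

A check like this only catches obviously malformed values (for example, an unedited placeholder copied from a template); it cannot tell whether a token is still valid. The timeout change needs no helper, since `requests.get()` accepts a `timeout` argument directly, e.g. `requests.get(url, timeout=10)`.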

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@turbomam turbomam requested a review from Copilot July 16, 2025 12:41
Contributor

Copilot AI left a comment

Pull Request Overview

Adds a new Python script to compare MIxS schema releases via the GitHub API, along with necessary dependency updates and an environment template.

  • Introduce diff_two_linkml_mixs_releases.py to fetch release metadata, locate mixs.yaml, and diff schema keys.
  • Update pyproject.toml to include python-dotenv and requests.
  • Provide local/.env.template for GitHub token setup.

Reviewed Changes

Copilot reviewed 3 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/scripts/diff_two_linkml_mixs_releases.py | New script to fetch and compare MIxS schema releases |
| pyproject.toml | Added python-dotenv and requests dependencies |
| local/.env.template | Template for setting GITHUB_TOKEN |

Comments suppressed due to low confidence (3)

src/scripts/diff_two_linkml_mixs_releases.py:1

  • Core functions like validate_github_token, find_mixs_yaml_path, and API interactions currently lack unit tests. Consider adding tests to ensure these utilities work correctly and handle edge cases.
"""

src/scripts/diff_two_linkml_mixs_releases.py:321

  • [nitpick] The script relies on hard-coded default commits and has no CLI interface. Consider adding argparse support so users can specify old/new commit SHAs or tags at runtime instead of editing the code.
if __name__ == "__main__":
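One possible shape for the suggested CLI, as a sketch only. The flag names `--old` and `--new` are hypothetical, and the default values below are placeholders rather than the commits the script actually pins.

```python
import argparse
from typing import List, Optional

# Placeholders; the real script defines its own default commits.
DEFAULT_OLD_COMMIT = "OLD_COMMIT_SHA_PLACEHOLDER"
DEFAULT_NEW_COMMIT = "NEW_COMMIT_SHA_PLACEHOLDER"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the old/new revisions, falling back to the module defaults."""
    parser = argparse.ArgumentParser(
        description="Compare two MIxS schema releases."
    )
    parser.add_argument(
        "--old",
        default=DEFAULT_OLD_COMMIT,
        help="commit SHA or tag of the older schema",
    )
    parser.add_argument(
        "--new",
        default=DEFAULT_NEW_COMMIT,
        help="commit SHA or tag of the newer schema",
    )
    # argv=None makes argparse read sys.argv[1:], the usual CLI behaviour.
    return parser.parse_args(argv)
```

Keeping the existing constants as defaults means running the script with no arguments would behave as it does today, while an invocation such as `--old <sha> --new <sha>` overrides them without editing the code.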

pyproject.toml:27

  • The script imports yaml and linkml_runtime, but neither PyYAML nor linkml-runtime is listed in the main dependencies. Consider adding pyyaml and linkml-runtime to ensure the script runs when installed.
python-dotenv = "^1.0.0"

Raises:
ValueError: If specified commits are not found or have no mixs.yaml file.
"""
releases = get_releases()

Copilot AI Jul 16, 2025

[nitpick] The logic to build release_info is duplicated in both load_schema_views and build_release_info_dict. Consider refactoring into a shared helper function to reduce duplication.

return None


def get_releases() -> List[Tuple[str, datetime]]:

Copilot AI Jul 16, 2025

The get_releases function fetches only the first page of GitHub releases. If there are more than 30 releases, some will be missed. Consider adding pagination support to retrieve all pages.
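Pagination of this kind can be sketched by following the `rel="next"` URL in the `Link` response header until none remains. The sketch below uses the standard library's `urllib` rather than `requests` so that it stays self-contained; `fetch_page`, `next_url`, and `get_all_releases` are hypothetical names, not functions from the script.

```python
import json
import re
import urllib.request
from typing import Callable, List, Optional, Tuple

# One page of results: (decoded JSON list, raw Link header or None).
Page = Tuple[List[dict], Optional[str]]


def fetch_page(url: str, token: Optional[str] = None, timeout: float = 10.0) -> Page:
    """Fetch one page of JSON results plus its Link header."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response), response.headers.get("Link")


def next_url(link_header: Optional[str]) -> Optional[str]:
    """Extract the rel="next" URL from a Link header, if present."""
    if not link_header:
        return None
    match = re.search(r'<([^>]+)>;\s*rel="next"', link_header)
    return match.group(1) if match else None


def get_all_releases(first_url: str, fetch: Callable[[str], Page] = fetch_page) -> List[dict]:
    """Follow next links until the last page and collect every release."""
    releases: List[dict] = []
    url: Optional[str] = first_url
    while url:
        items, link_header = fetch(url)
        releases.extend(items)
        url = next_url(link_header)
    return releases
```

Assuming the repository lives at `GenomicsStandardsConsortium/mixs`, a first URL of `https://api.github.com/repos/GenomicsStandardsConsortium/mixs/releases?per_page=100` would also cut the number of round trips, since the endpoint returns 30 items per page by default. With `requests`, the parsed `response.links` mapping could replace the manual header parsing.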

turbomam and others added 4 commits July 16, 2025 08:44
…efactor duplicated logic

- Add PyYAML ^6.0 dependency to pyproject.toml for explicit yaml import support
- Add pagination support to get_releases() function to fetch all releases (not just first 30)
- Refactor duplicated release_info building logic into build_full_release_info() helper function
- Improve the is_populated() function so it properly detects empty collections, strings, and LinkML objects
- Add keywords test case to validate proper population detection
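A rough sketch of what recursive population detection could look like. It assumes LinkML model objects behave as Python dataclasses, and the rules chosen here (blank strings and containers holding only empty values count as unpopulated, while `0` and `False` count as populated) are illustrative guesses rather than the script's actual behaviour.

```python
import dataclasses
from typing import Any


def is_populated(value: Any) -> bool:
    """Return True if value carries any real content, checking recursively."""
    if value is None:
        return False
    if isinstance(value, str):
        # Whitespace-only strings are treated as empty.
        return bool(value.strip())
    if isinstance(value, dict):
        return any(is_populated(item) for item in value.values())
    if isinstance(value, (list, tuple, set, frozenset)):
        return any(is_populated(item) for item in value)
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        # Assumption: a model object is populated if any of its fields is.
        return any(
            is_populated(getattr(value, field.name))
            for field in dataclasses.fields(value)
        )
    # Numbers, booleans, and other scalars count as populated.
    return True
```

Under these rules a field such as `keywords` is reported as populated only when it contains at least one non-empty entry, which is the case the keywords test is meant to cover.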

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
…ties

- Add dynamic schema traversal using SchemaView all_* methods
- Implement recursive populated value detection for complex structures
- Add element-level difference detection for enums, classes, and slots
- Include structured pattern analysis for slot settings usage
- Add comprehensive filtering for expected name changes and inter-type refactoring
- Improve GitHub API handling with authentication and rate limiting
- Enhance output formatting with better diff visualization

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@jfy133
Collaborator

jfy133 commented Aug 11, 2025

Note: I would suggest putting this in draft mode rather than having 'DO NOT MERGE' in the title ;)

@turbomam
Member Author

Closing — this exploratory diff script has been superseded by PR #1115, which implements a unified mixs-legacy-diff CLI tool covering v4+ schema comparisons with CI tests, format-agnostic readers (Excel xls/xlsx + LinkML YAML), and proper handling of the schema path change at v6.2.0.

Keeping the branch for reference.

@turbomam turbomam closed this Mar 10, 2026
@turbomam turbomam deleted the 845-what-are-our-options-for-diffing-schema-changes branch March 27, 2026 20:57

Development

Successfully merging this pull request may close these issues.

what are our options for diffing schema changes?

3 participants