Breaking Change: Adding Module Name to SimName Breaks Parsing Logic #1184

@fso42

Description

FSO: Claude test. Files are identified correctly, but the issues are sometimes wrong!

Problem Statement

A change was made to add the module name (modName) as the third component in simulation names in avaframe/com1DFA/com1DFA.py:2972-2984. This change will break existing functionality across multiple modules due to hardcoded assumptions about simName structure.

Current Implementation

# Location: avaframe/com1DFA/com1DFA.py:2972-2984
simName = "_".join(
    filter(
        None,
        [
            relNameSim,
            simHash,
            modName,              # NEWLY ADDED - THIS BREAKS THINGS
            defID,
            frictIndi or volIndi,
            row._asdict()["simTypeList"],
            cfgSim["GENERAL"]["modelType"],
        ],
    )
)

Structure Change

  • Before: relNameSim_simHash_defID_frictIndi_simType_modelType
  • After: relNameSim_simHash_modName_defID_frictIndi_simType_modelType
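
The positional shift can be reproduced with a minimal sketch of the same join. The component values are hypothetical placeholders, not real AvaFrame output, and `build_sim_name` is an illustrative helper rather than a function in the code base.

```python
def build_sim_name(rel_name, sim_hash, def_id, indi, sim_type, model_type, mod_name=None):
    """Join the non-empty components with "_", mirroring the snippet above."""
    parts = [rel_name, sim_hash, mod_name, def_id, indi, sim_type, model_type]
    return "_".join(filter(None, parts))


# Hypothetical component values, for illustration only
args = ("relAlr", "0a1b2c3d4e", "C", "M", "null", "dfa")
old_name = build_sim_name(*args)
new_name = build_sim_name(*args, mod_name="com1DFA")

old_parts = old_name.split("_")
new_parts = new_name.split("_")

# Indices 0 and 1 (release name, hash) are unchanged;
# everything from index 2 onward moves one position to the right.
print(old_parts[2], new_parts[2], new_parts[3])
```

Any code that reads defID, the indicator, simType or modelType by a fixed index counted from the front is therefore affected, while code that reads only the release name or the hash is not.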

Impact Analysis

Critical Breaking Changes (Must Fix)

1. File Parsing Logic

File: avaframe/in3Utils/fileHandlerUtils.py:605-760

  • Issue: Array indexing assumes specific positions (infoParts[0] = simHash, infoParts[1] = modified indicator)
  • Result: All metadata extraction will be incorrect
  • Lines: 619, 623, 755, 757-759

2. Configuration Loading

File: avaframe/in3Utils/cfgUtils.py:530-540

  • Issue: simHash = infoParts[0] is claimed to pick up modName instead. NOT CORRECT: modName is inserted after simHash, so infoParts[0] still holds simHash
  • Result: Configuration files won't load correctly
  • Lines: 540

3. AIMEC Analysis

File: avaframe/ana3AIMEC/dfa2Aimec.py:55-60

  • Issue: Mass balance file parsing reconstructs simName incorrectly
  • Result: AIMEC analysis will fail to match simulation files
  • Lines: 60

4. Component Count Validation

File: avaframe/in3Utils/fileHandlerUtils.py:757

  • Issue: if len(infoParts) == 6: expects exactly 6 components
  • Result: Metadata extraction for modified simulations will fail
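
Why an exact-length check stops matching can be seen in a short sketch. The names below are hypothetical, including the trailing `ppr` suffix, and the literal 6 in the real code may count different components than the ones shown here. The only point is that the count grows by one.

```python
def info_parts(name):
    """Components after the release name, split on "_" (sketch of the pattern described above)."""
    return name.split("_")[1:]


# Hypothetical names; the second one adds the module name after the hash
old_name = "relAlr_0a1b2c3d4e_C_M_null_dfa_ppr"
new_name = "relAlr_0a1b2c3d4e_com1DFA_C_M_null_dfa_ppr"

old_count = len(info_parts(old_name))
new_count = len(info_parts(new_name))

# A check written for the old layout no longer fires for the new one
matches_old = old_count == 6
matches_new = new_count == 6
print(old_count, new_count, matches_old, matches_new)
```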

Low Impact (Still Work)

5. Release Name Extraction

Files: Multiple files use simName.split('_')[0]

  • avaframe/ana5Utils/DFAPathGeneration.py:693
  • avaframe/ana1Tests/rotationTest.py:66,122
  • avaframe/log2Report/generateReport.py:256-261
  • Status: These continue to work (release name is still first component)
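
A quick check of this pattern, using hypothetical names, might look as follows. It also shows the limit of `split('_')[0]`: a release name that itself contains an underscore is truncated, in either format.

```python
# Hypothetical simNames in the old and the new layout
old_name = "relAlr_0a1b2c3d4e_C_M_null_dfa"
new_name = "relAlr_0a1b2c3d4e_com1DFA_C_M_null_dfa"

rel_old = old_name.split("_")[0]
rel_new = new_name.split("_")[0]

# Limit of the pattern: an underscore inside the release name cuts it short
rel_truncated = "rel_Alr_0a1b2c3d4e_com1DFA_C_M_null_dfa".split("_")[0]
print(rel_old, rel_new, rel_truncated)
```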

Required Fixes

Phase 1: Critical Fixes (Required for functionality)

  • Update fileHandlerUtils.py - Fix infoParts indexing throughout the file
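  • Review cfgUtils.py - Verify the simHash = infoParts[0] lookup; the proposed change to infoParts[1] conflicts with the new layout, in which simHash is still the first component after the release name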
  • Update dfa2Aimec.py - Fix mass balance file parsing logic

Phase 2: Backward Compatibility (Recommended)

  • Implement Format Detection - Create parser that handles both old and new formats
  • Add migration utilities
  • Update all parsing functions to use new parser
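
A migration utility could be as small as the sketch below. `to_new_format` is a hypothetical helper, not an existing AvaFrame function. It assumes the same known-module check and `_AF_` handling as the parser suggested later in this issue, and that the new format keeps the `_AF_` separator where it was present.

```python
KNOWN_MODULES = ("com1DFA", "com8MoTPSA", "com9MoTVoellmy")


def to_new_format(sim_name, mod_name):
    """Insert mod_name after the simHash of an old-format simName (hypothetical helper)."""
    if mod_name not in KNOWN_MODULES:
        raise ValueError(f"unknown module name: {mod_name!r}")

    # Split off the release name, keeping track of which separator was used
    sep = "_AF_" if "_AF_" in sim_name else "_"
    release_name, _, info = sim_name.partition(sep)
    info_parts = info.split("_")
    if not info_parts[0]:
        raise ValueError(f"simName has no simHash component: {sim_name!r}")

    # Already in the new format: return unchanged so the helper is idempotent
    if len(info_parts) > 1 and info_parts[1] in KNOWN_MODULES:
        return sim_name

    info_parts.insert(1, mod_name)
    return release_name + sep + "_".join(info_parts)


# Hypothetical old-format name
migrated = to_new_format("relAlr_0a1b2c3d4e_C_M_null_dfa", "com1DFA")
print(migrated)
```

Because the old format does not record which module produced a simulation, the caller has to supply `mod_name`; the helper cannot infer it from the name.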

Phase 3: Update Test Data

  • Update Benchmark Data - Update all hardcoded simNames in benchmarks/simParametersDict.py
  • Regenerate test reference data

Implementation Strategy

Option A: Breaking Change (Faster)

  • Update all parsing logic to expect new format
  • Requires regenerating all existing simulation data
  • Timeline: 2-3 days

Option B: Backward Compatible (Safer)

  • Implement dual-format parser
  • Gradually migrate existing data
  • Timeline: 4-5 days

Suggested Backward Compatible Parser

def parse_sim_name(sim_name):
    """Parse a simulation name in either the old or the new format.

    Old: relName_simHash_defID_[frictIndi|volIndi]_simType_modelType
    New: relName_simHash_modName_defID_[frictIndi|volIndi]_simType_modelType
    """
    if "_AF_" in sim_name:
        name_parts = sim_name.split("_AF_")
        release_name = name_parts[0]
        info_parts = name_parts[1].split("_")
    else:
        name_parts = sim_name.split("_")
        release_name = name_parts[0]
        info_parts = name_parts[1:]

    if not info_parts or not info_parts[0]:
        raise ValueError(f"simName has no simHash component: {sim_name!r}")

    # Detect format by checking if the component after simHash is a known module name
    known_modules = ["com1DFA", "com8MoTPSA", "com9MoTVoellmy"]
    is_new_format = len(info_parts) > 1 and info_parts[1] in known_modules

    # Components after simHash (and after modName in the new format):
    # defID, optional frictIndi/volIndi, simType, modelType
    rest = info_parts[2:] if is_new_format else info_parts[1:]

    return {
        "release_name": release_name,
        "sim_hash": info_parts[0],
        "module": info_parts[1] if is_new_format else None,  # None = unknown/legacy
        "def_id": rest[0] if rest else None,
        # The indicator is dropped from the name when empty, so it is only
        # present when all four trailing components are
        "modified": rest[1] if len(rest) == 4 else None,
        "sim_type": rest[-2] if len(rest) >= 3 else None,
        "model_type": rest[-1] if len(rest) >= 2 else None,
    }

Files to Modify

Critical (Must Fix)

  • avaframe/in3Utils/fileHandlerUtils.py
  • avaframe/in3Utils/cfgUtils.py
  • avaframe/ana3AIMEC/dfa2Aimec.py

Testing Data

  • benchmarks/simParametersDict.py
  • All test configuration files

Risk Assessment

Risk Level: HIGH

Risks:

  • Data loss if parsing fails
  • Broken analysis pipelines
  • Incompatible with existing simulation results
  • Time-consuming to fix across entire codebase

Mitigation:

  • Implement backward compatibility
  • Comprehensive testing
  • Staged rollout
  • Backup existing data

Next Steps

  1. Decide on approach (Breaking vs. Backward Compatible)
  2. Assign developer(s) to implement fixes
  3. Create detailed task breakdown
  4. Set up testing environment
  5. Begin implementation


Labels: confirmed (Something isn't working)
