
feat(export): per-community subfolders + path-aware filenames for to_obsidian #844

Open: fitliferepo wants to merge 1 commit into safishamsi:v7 from fitliferepo:feat/obsidian-per-community-folders


@fitliferepo

Why

When to_obsidian is run on a real codebase, the current output has three issues that get in the way of using the vault:

| Issue | Before | After |
| --- | --- | --- |
| Filename collisions | `__init__()_1.md`, `__init__()_2.md` (no way to tell which file) | `app-routers-pipeline-py--init.md`, `app-models-py--init.md` (path-traceable) |
| Flat vault | All N node notes in one folder; hard to browse | One subfolder per community, `_OVERVIEW.md` inside |
| Cross-community links | `[[_COMMUNITY_X]]` (dangle once you rename overviews) | `[[folder/_OVERVIEW\|name]]` (always resolves) |

Tested on a ~1,700-node codebase: 0 dangling wikilinks, 0 lost files, 64 navigable subfolders instead of 1,700+ files at root.

What

graphify/export.py:

  • to_obsidian: pre-computes a unique folder name per community id (handles label collisions), routes each node note into its community folder, replaces _COMMUNITY_*.md with per-folder _OVERVIEW.md, emits a top-level _INDEX.md with communities sorted by size.
  • to_canvas: uses the same path-aware naming and emits canvas file refs as <folder>/<filename>.md so the canvas resolves against the new layout.
  • Filename derivation: <last-3-source-path-segments-slug>--<label-slug>. Falls back to plain label slug if source_file is missing, then to node if both are empty.
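
A minimal sketch of the filename derivation described above, for readers who want to see the rule in isolation. The names `slug` and `node_stem` and the exact slugging regex are assumptions for illustration, not the code in `graphify/export.py`; the behaviour for a node that has a source path but an empty label is also an assumption, since the description does not cover it.

```python
import re
from pathlib import PurePosixPath


def slug(text: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics to '-', trim hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def node_stem(label: str, source_file: str = "") -> str:
    """Hypothetical helper: <last-3-path-segments-slug>--<label-slug>."""
    label_slug = slug(label)
    # Keep only the last three path segments so stems stay short but traceable.
    parts = PurePosixPath(source_file.replace("\\", "/")).parts[-3:] if source_file else ()
    path_slug = slug("-".join(parts))
    if path_slug and label_slug:
        return f"{path_slug}--{label_slug}"
    # No source path: plain label slug. Nothing usable at all: "node".
    # (Path-only fallback is an assumption, not stated in the PR.)
    return label_slug or path_slug or "node"


print(node_stem("__init__()", "app/routers/pipeline.py"))
print(node_stem("__init__()", "app/models.py"))
```

With these assumptions, the two `__init__()` nodes from the table come out as `app-routers-pipeline-py--init` and `app-models-py--init`.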

Tests

4 new tests + 1 updated:

  • test_to_obsidian_writes_per_community_subfolders — subfolder + overview + index emit
  • test_to_obsidian_filenames_disambiguate_by_source_path — no _1 suffixes for same-label nodes
  • test_to_obsidian_unique_folders_when_community_labels_collide — three communities labelled "main" → main/, main-2/, main-3/
  • test_to_obsidian_has_no_dangling_wikilinks — vault-wide invariant
  • test_to_canvas_file_paths_relative_to_vault — updated to assert canvas refs resolve to real files on disk
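
For illustration, a vault-wide dangling-link check could look roughly like the sketch below. It is not the test in `test_export.py`; the function name `dangling_wikilinks` and the resolution rule (a target counts as resolved if it matches either a bare note stem or a vault-relative path without `.md`) are assumptions.

```python
import re
from pathlib import Path

# Captures the link target of [[target]], [[target|alias]] and [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def dangling_wikilinks(vault: Path) -> list[tuple[str, str]]:
    """Return (note, target) pairs whose target matches no note in the vault."""
    notes = sorted(vault.rglob("*.md"))
    stems = {n.stem for n in notes}
    paths = {n.relative_to(vault).with_suffix("").as_posix() for n in notes}
    missing = []
    for note in notes:
        for target in WIKILINK.findall(note.read_text(encoding="utf-8")):
            target = target.strip()
            if target not in stems and target not in paths:
                missing.append((note.relative_to(vault).as_posix(), target))
    return missing
```

An empty return value corresponds to the "0 dangling wikilinks" claim. A check of this shape only verifies that each target exists somewhere in the vault, not that the target is unique.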

All 19 tests in test_export.py pass. Full suite (test_export.py test_cli_export.py test_build.py test_cluster.py test_analyze.py): 84 passed. Pre-existing failures in test_multilang.py (SQL) and test_ollama.py are unrelated and reproduce on stock v7.

Compatibility

  • to_obsidian and to_canvas keep the same signatures and return types.
  • Downstream tooling that opens the directory as an Obsidian vault is unaffected — Obsidian resolves [[bare-name]] links across subfolders automatically.
  • If users have callers that read _COMMUNITY_*.md at the vault root, they'll need to switch to reading <folder>/_OVERVIEW.md.

…obsidian

Restructure to_obsidian and to_canvas output to address three long-standing
issues with the Obsidian vault layout:

1. Filename collisions. The old logic produced files like `__init__()_1.md`,
   `__init__()_2.md` when multiple nodes shared a label. New logic slugs the
   last 3 source-path segments + the label, so two `__init__()` nodes in
   different modules get distinct, source-traceable filenames like
   `app-routers-pipeline-py--init` and `app-models-py--init`.

2. Flat vault layout. All node notes used to land in one directory. Now each
   node is written into a per-community subfolder, and each folder gets a
   `_OVERVIEW.md` (replacing top-level `_COMMUNITY_*.md`). A new `_INDEX.md`
   at vault root lists every community sorted by size.

3. Dangling cross-community links. The old "Connections to other communities"
   section emitted `[[_COMMUNITY_X]]` wikilinks; with the new layout these are
   written as `[[<folder>/_OVERVIEW|<community name>]]`, which points at the
   per-folder overview.

Edge cases handled:
- Community labels that slug to the same string (e.g. three communities all
  labelled "main") now get unique folder names `main/`, `main-2/`, `main-3/`,
  each with its own _OVERVIEW.
- Nodes with no source_file fall back to the plain label slug; nodes with
  neither a label nor a source_file fall back to `node.md`.
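
The folder pre-computation for colliding labels can be sketched as follows. This is an illustration only: `unique_folder_names`, the `slug` helper, the iteration order (ascending community id), and the `community-<id>` fallback for labels that slug to an empty string are all assumptions, not the code in this commit.

```python
import re


def slug(text: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics to '-', trim hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def unique_folder_names(community_labels: dict[int, str]) -> dict[int, str]:
    """Map each community id to a folder name that no other community uses."""
    folders: dict[int, str] = {}
    used: set[str] = set()
    for cid in sorted(community_labels):
        base = slug(community_labels[cid]) or f"community-{cid}"
        name, n = base, 1
        # Append -2, -3, ... until the name is free.
        while name in used:
            n += 1
            name = f"{base}-{n}"
        used.add(name)
        folders[cid] = name
    return folders


print(unique_folder_names({0: "main", 1: "main", 2: "Main"}))
```

Under these assumptions, three communities labelled "main" map to `main`, `main-2`, and `main-3`.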

to_canvas updated to match: canvas file refs now point at
`<community-folder>/<filename>.md` so the canvas resolves against the new
vault structure.

Adds 4 new tests covering subfolder layout, path-disambiguated filenames,
unique-folder behaviour under colliding community labels, and a vault-wide
no-dangling-wikilinks invariant. Updates the existing canvas test to assert
that every canvas file ref maps to a file actually written by to_obsidian.
@safishamsi
Owner

Thanks for this — per-community subfolders is a great UX improvement and the test coverage is solid. Two issues to fix before we merge:

1. Duplicated helper code (must fix)
_slug, _node_stem, and the folder-precompute block are duplicated verbatim between to_obsidian and to_canvas. If they ever drift, canvas refs will silently point to the wrong files. Please extract them to module-level helpers shared by both functions.

2. Cross-folder wikilink collisions (must fix)
used_in_folder only deduplicates stems within a folder, so two nodes in different communities can share the same <path>--<label> stem. Wikilinks are written as bare [[neighbor_label]], so Obsidian's shortest-path resolver picks one arbitrarily on collision. The existing test_to_obsidian_has_no_dangling_wikilinks test passes either way because it only checks that some file with that stem exists, not that it's unique. Please either (a) enforce global stem uniqueness across all folders, or (b) write wikilinks as fully-qualified [[community_folder/stem]] paths.
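
To make the second point concrete, a stricter invariant could look something like the sketch below. The helper names `ambiguous_stems` and `qualified_link` are hypothetical; `_OVERVIEW` is ignored on the assumption that overviews are always linked by folder-qualified path.

```python
from collections import defaultdict
from pathlib import Path


def ambiguous_stems(vault: Path, ignore: tuple[str, ...] = ("_OVERVIEW",)) -> dict[str, list[str]]:
    """Return stems that appear in more than one folder of the vault."""
    by_stem: dict[str, list[str]] = defaultdict(list)
    for note in sorted(vault.rglob("*.md")):
        if note.stem not in ignore:
            by_stem[note.stem].append(note.relative_to(vault).as_posix())
    return {stem: paths for stem, paths in by_stem.items() if len(paths) > 1}


def qualified_link(folder: str, stem: str) -> str:
    """Option (b): a folder-qualified wikilink that cannot resolve ambiguously."""
    return f"[[{folder}/{stem}]]"
```

For option (a), a test asserting `ambiguous_stems(vault) == {}` would fail on the collision described here, where the current no-dangling-links test still passes.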

Minor

  • _INDEX.md uses community_labels.get(cid, ...) raw in [[folder/_OVERVIEW|name]] — if a label contains | or ] the wikilink breaks. Worth sanitizing.
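
One possible way to sanitize the alias, as a sketch: replace the characters that terminate or split a wikilink with spaces before interpolating the label. The helper names and the `untitled` fallback are assumptions.

```python
import re


def wikilink_alias(label: str) -> str:
    """Strip characters that would break [[target|alias]] syntax."""
    cleaned = re.sub(r"[\[\]|]", " ", label)
    # Collapse the whitespace left behind; fall back if nothing remains.
    return " ".join(cleaned.split()) or "untitled"


def overview_link(folder: str, label: str) -> str:
    """Build the _INDEX.md entry for one community."""
    return f"[[{folder}/_OVERVIEW|{wikilink_alias(label)}]]"


print(overview_link("main", "a|b]c"))
```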

Once these are fixed and the PR is rebased onto current main (v0.7.19), we'll merge. Thanks!
