
Docs parser improvements #334

Merged
joeperpetua merged 17 commits into N4S4:master from Sartohanix:docs_parser-improvements on Feb 21, 2026

@Sartohanix (Contributor) commented Feb 16, 2026

Improvement suggestion for the docs_parser.py script

Motivation

Add more flexibility to the parsing of internal API names. The current implementation relies on regex parsing of each .py file's source, looking for "api_name = <...>" (e.g. api_name = "SYNO.Core.X.Y") inside the methods of the implemented classes.

One suggestion, implemented in this PR, is to allow class-scope attributes _API_NAME = ... or _api_name = ... to be interpreted as the default internal API for the methods of that class, unless a method specifies its own (via api_name = ...).

In addition, the current regex implementation has at least two bugs:

  1. The METHOD_API_NAME_PATTERN() regex leaks from a given method's def x(): into the methods that follow it: if a method does not contain api_name = ... but a later method in the file does, the later method's value is wrongly matched as the "internal API" of the first one, and no warning is issued. This currently affects the following methods:

    • File core_certificate.py, class Certificate
      • list_cert
      • set_default_cert
      • delete_certificate
    • File photos.py, class Photos
      • list_folders
      • list_teams_folders
      • count_folders
      • count_team_folders
      • lookup_folder
      • lookup_team_folder
      • share_album
      • share_team_folder
    • File directory_server.py, class DirectoryServer
      • list_directory_objects
      • modify_user_info
      • delete_item
    • File core_package.py, class Package
      • install_package
  2. For files implementing multiple classes, the "Supported APIs" list does not group API names by class: all APIs found in a file are listed under every class of that file. The only current example is core_share.py, with its Share, SharePermission, KeyManagerStore and KeyManagerAutoKey classes.
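To make bug 1 concrete, here is a minimal sketch of how a lazy regex compiled with re.DOTALL and anchored at a method's def can run past the end of that method. The pattern is a hypothetical stand-in, not the actual METHOD_API_NAME_PATTERN from docs_parser.py, and the upload_cert method and API string in the sample source are made up for illustration.

```python
from __future__ import annotations

import re


# Hypothetical stand-in for METHOD_API_NAME_PATTERN (not the real pattern).
# With re.DOTALL, the lazy ".*?" may cross newlines and therefore method
# boundaries: it stops at the first api_name assignment found anywhere
# after "def <method>(", even one that belongs to a later method.
def find_api_name(source: str, method: str) -> str | None:
    pattern = re.compile(
        rf'def {re.escape(method)}\(.*?api_name = "([^"]+)"', re.DOTALL
    )
    match = pattern.search(source)
    return match.group(1) if match else None


# Illustrative source: list_cert has no api_name of its own,
# upload_cert (a hypothetical method) does.
SAMPLE_SOURCE = '''
class Certificate:
    def list_cert(self):
        return self.request_data()

    def upload_cert(self):
        api_name = "SYNO.Core.Certificate"
        return self.request_data(api_name)
'''

print(find_api_name(SAMPLE_SOURCE, "list_cert"))    # leaked from upload_cert
print(find_api_name(SAMPLE_SOURCE, "upload_cert"))  # legitimately its own
```

Any approach that bounds the search to the method's own body avoids the leak; this PR does so by walking the AST instead of refining the regex.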

Proposal

File parsing is now performed through a recursive walk of the ast tree. This walk was already happening implicitly through the docstring_extractor package helpers; making it explicit allows a more flexible identification of the file structure, in particular of api_name values. More specifically, the AST parser now looks for assignments of the form

  • _API_NAME = ... or _api_name = ... in class-scope nodes
  • api_name = ... in method-scope nodes

When provided, the class-level attribute _API_NAME (or _api_name) acts as the default internal API name for all methods of the class. A method-scope api_name = ... overrides it for that method.
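The lookup rules above can be sketched with the standard ast module. This is a simplified illustration under stated assumptions, not the PR's extract_file_info: it only inspects top-level classes, only recognizes plain string assignments, and the sample method bodies and API strings are hypothetical. It does show both behaviours together: the class-scope default with a method-scope override, and the per-class grouping that bug 2 was missing.

```python
from __future__ import annotations

import ast

CLASS_KEYS = {"_API_NAME", "_api_name"}  # class-scope defaults
METHOD_KEYS = {"api_name"}               # method-scope override


def string_assignment(node: ast.AST, names: set[str]) -> str | None:
    """Return the value of `<name> = "<str>"` when <name> is in `names`."""
    if (
        isinstance(node, ast.Assign)
        and isinstance(node.value, ast.Constant)
        and isinstance(node.value.value, str)
    ):
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id in names:
                return node.value.value
    return None


def method_api_name(func: ast.FunctionDef | ast.AsyncFunctionDef) -> str | None:
    """First api_name assignment met while walking this function's own subtree."""
    for node in ast.walk(func):
        value = string_assignment(node, METHOD_KEYS)
        if value is not None:
            return value
    return None


def extract_api_names(source: str) -> dict[str, dict[str, str | None]]:
    """Map each top-level class to {method name: resolved internal API name}."""
    result: dict[str, dict[str, str | None]] = {}
    for cls in ast.parse(source).body:
        if not isinstance(cls, ast.ClassDef):
            continue
        default = None
        for stmt in cls.body:  # class-scope _API_NAME / _api_name
            value = string_assignment(stmt, CLASS_KEYS)
            if value is not None:
                default = value
        methods: dict[str, str | None] = {}
        for stmt in cls.body:
            if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
                own = method_api_name(stmt)
                methods[stmt.name] = own if own is not None else default
        result[cls.name] = methods
    return result


# Hypothetical file with two classes: Share relies on the class default for
# list_folders and overrides it in get_permissions.
SAMPLE = '''
class Share:
    _API_NAME = "SYNO.Core.Share"

    def list_folders(self):
        return self.request_data()

    def get_permissions(self):
        api_name = "SYNO.Core.Share.Permission"
        return self.request_data(api_name)


class KeyManagerStore:
    def verify(self):
        api_name = "SYNO.Core.Share.KeyManager.Store"
        return self.request_data(api_name)
'''

print(extract_api_names(SAMPLE))
```

Because every method is resolved inside its own FunctionDef node, a method without api_name falls back to the class default (or None) instead of borrowing a later method's value, and each class only reports its own APIs.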

Misc. changes and improvements

  • The API summary list (docs/apis/readme.md) is now sorted alphabetically, as are the per-class sublists of internal APIs.
  • A few newline tweaks in the generated .md files, yielding better-looking raw markdown (in particular for <div></div> blocks containing markdown text decoration)
  • Some harmless refactoring (in particular, function definitions are now grouped by purpose); sorry that this makes the diff a bit tedious to review
  • Updated the docs-parser task in Taskfile.yaml to pass the -l argument to docs_parser.py, so the task now generates the API list (docs/apis/readme.md)
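For the alphabetical sorting mentioned above, here is a minimal sketch, assuming the parser yields a mapping from class name to {method name: internal API name or None}. The bullet layout it emits is a hypothetical stand-in for the real docs/apis/readme.md format, and the function names are illustrative.

```python
from __future__ import annotations


def supported_apis(methods: dict[str, str | None]) -> list[str]:
    """Unique internal API names of one class, sorted alphabetically."""
    return sorted({api for api in methods.values() if api is not None})


def summary_lines(classes: dict[str, dict[str, str | None]]) -> list[str]:
    """One bullet per class (sorted), with its sorted APIs as nested bullets."""
    lines: list[str] = []
    for cls in sorted(classes):
        lines.append(f"- {cls}")
        lines.extend(f"  - {api}" for api in supported_apis(classes[cls]))
    return lines


# Hypothetical parser output: insertion order is deliberately unsorted, and it
# includes a duplicated API and a method without any API.
PARSED = {
    "Share": {
        "set_permissions": "SYNO.Core.Share.Permission",
        "list_folders": "SYNO.Core.Share",
        "get_folder": "SYNO.Core.Share",
    },
    "KeyManagerStore": {
        "verify": "SYNO.Core.Share.KeyManager.Store",
        "helper": None,
    },
}

print("\n".join(summary_lines(PARSED)))
```

Plain sorted() is case-sensitive (uppercase sorts before lowercase); passing key=str.lower would give a case-insensitive order if that is preferred.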

Tests & output

The proposed implementation appears to produce the expected output at both the markdown and Docusaurus levels. Fixing bugs 1 and 2 above removed the (previously incorrect) internal API names of the methods listed. This will be corrected by updating the malformed methods in the affected classes.

@joeperpetua (Collaborator) left a comment

Thanks for the PR! It is very well explained and researched.
I made a few minor suggestions inline, please take a look.

Regarding the new parsing approach, since extract_file_info is effectively becoming a custom AST parser, I think we should move it to its own module. Keeping it separate from the main logic and helpers would make the parser much more approachable for new contributors (we could do the same for the styling helpers, too).

Let me know if you have time/would like to tackle that refactor in this PR. If not, we can merge this as-is and handle the cleanup in the future.

@Sartohanix (Contributor, Author)

Thanks for the follow-up. Factoring out the parser and formatter into their own modules would certainly be cleaner, and I can take care of that directly in this PR. Where do you suggest creating these new files? Maybe a new 'docs_parser' folder?

@joeperpetua (Collaborator)

I think we can move docs_parser.py to scripts/docs_parser/main.py and then split it into different modules.

For example:

  • scripts/docs_parser/config.py: all the constants / regex / etc
  • scripts/docs_parser/ui.py: all the md/html generators
  • scripts/docs_parser/parser.py: the AST heavy logic
  • scripts/docs_parser/utils.py: generic helpers
  • scripts/docs_parser/main.py: all the logic and generation importing the modules

Then we should be able to run it as a module from the repository root: py -m scripts.docs_parser.main.

What do you think?

@Sartohanix (Contributor, Author)

scripts/docs_parser sounds like the right place. The file splitting sounds about right, but I'll see whether config.py and utils.py are really necessary.

I think main.py could also be renamed to __main__.py, allowing the shorter invocation py -m scripts.docs_parser.

@Sartohanix (Contributor, Author) commented Feb 17, 2026

I've made some further modifications to get_docs_status() / gen_doc_metadata(): docs_status.yaml is now parsed more cleanly, with proper error handling when classes are missing from it.

I took the liberty of changing the YAML syntax from lists of single-key dicts to a standard mapping. I believe this has a few advantages: warnings/errors on duplicate keys and more direct parsing. Also, the sorting (for sidebar display positions) is now performed explicitly in the code instead of relying on manual item ordering.

This is a minor change and can be reverted to the previous form if preferred.
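The difference between the two YAML layouts is easiest to see on the Python side, i.e. on what a loader such as yaml.safe_load returns for each. The sketch below works on those already-loaded structures; the class entries, the status field, the helper names, and the choice of alphabetical order for sidebar positions are all assumptions, since the PR text does not show docs_status.yaml or the actual sort key.

```python
from __future__ import annotations

# Old layout, a list of single-key mappings (hypothetical entries):
#   - Share: {status: partial}
#   - Certificate: {status: done}
OLD_STYLE = [
    {"Share": {"status": "partial"}},
    {"Certificate": {"status": "done"}},
]

# New layout, one plain mapping:
#   Share: {status: partial}
#   Certificate: {status: done}
NEW_STYLE = {
    "Share": {"status": "partial"},
    "Certificate": {"status": "done"},
}


def flatten_old_style(entries: list[dict[str, dict]]) -> dict[str, dict]:
    """Extra step the old layout needed; duplicates must be caught by hand."""
    merged: dict[str, dict] = {}
    for entry in entries:
        for name, meta in entry.items():
            if name in merged:
                raise ValueError(f"duplicate class in docs status: {name}")
            merged[name] = meta
    return merged


def sidebar_positions(status: dict[str, dict]) -> dict[str, int]:
    """Positions from an explicit sort (alphabetical here, by assumption)."""
    return {name: pos for pos, name in enumerate(sorted(status), start=1)}


def class_status(status: dict[str, dict], class_name: str) -> dict:
    """Fail with a readable message when a class is missing from the file."""
    if class_name not in status:
        raise KeyError(f"{class_name!r} is missing from docs_status.yaml")
    return status[class_name]


print(flatten_old_style(OLD_STYLE) == NEW_STYLE)
print(sidebar_positions(NEW_STYLE))
```

With the mapping layout, the loader output is already the dict the code needs. YAML requires mapping keys to be unique, but whether a duplicate is actually reported depends on the tooling: PyYAML's loaders, for instance, keep the last value without complaint, so such warnings typically come from an editor, a linter, or a stricter loader.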

@Sartohanix (Contributor, Author)

Things look good now. I'm marking the PR ready for review, but don't hesitate to let me know if some more tweaks are needed.

One last suggestion would be to bump the Docusaurus version in documentation/package.json (latest is 3.9.2, current is 3.7.0) to resolve the complaints emitted at build time, though I'm not sure it's really needed or desirable.

Sartohanix marked this pull request as ready for review on February 18, 2026 01:11

@joeperpetua (Collaborator) left a comment

Everything looks good to me, great job!

The parser will now be much easier to maintain and scale. The docs_status changes are a good addition 👍

I will open a new issue for the Docusaurus update, as it is not really related to this PR.

@Sartohanix (Contributor, Author)

Validation checks should now pass: fixed the pre-commit failure.

joeperpetua merged commit ba60878 into N4S4:master on Feb 21, 2026 (2 checks passed)
Sartohanix mentioned this pull request on Feb 21, 2026