Skip to content

Static config model#1181

Merged
ryneeverett merged 6 commits intoGothenburgBitFactory:developfrom
Lotram:static-config-model
Mar 15, 2026
Merged

Static config model#1181
ryneeverett merged 6 commits intoGothenburgBitFactory:developfrom
Lotram:static-config-model

Conversation

@Lotram
Copy link
Copy Markdown
Collaborator

@Lotram Lotram commented Jan 14, 2026

Before changing the actual config validation implementation, I think it's easier to use another Config model once everything has been validated, and use this model downstream.
I plan to change the validate_config implementation once we agree on these changes, and this Config model.

What do you think @ryneeverett ?

There is a failing test related to flavors in test_load.py, It will be fixed once I modify the validation step

Tests should not directly use abstract classes, but now use basic implementation instead
@Lotram Lotram force-pushed the static-config-model branch from 98f028d to 8497f04 Compare January 14, 2026 14:02
@Lotram Lotram marked this pull request as draft January 14, 2026 14:02
Copy link
Copy Markdown
Collaborator

@ryneeverett ryneeverett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you're going with my "consumer view" proposal from #1167:

Maybe we're asking too much from pydantic to provide the ideal configuration interface for both inputs and outputs. Maybe we should consider another step in which the validation model is transformed into a view (like your quoted example) that is better optimized for consumers of the model.

That's fine and it's looking good. Some thoughts:

  • I think Config should be renamed to ConfigView or something that more particularly describes its specific purpose. Of course, once it is passed out of bugwarrior/config to other parts of the code base it can be referred to simply as "config", but within the bugwarrior.config package that is too vague of a name.
  • it would be nice to keep the configuration "view" separate from the configuration file validation (schema.py). So maybe validate_config returns the validated bugwarrior_config_model and Config moves to load.py and is instantiated from validated_data in load_config.
  • Is the get_service() -> ServiceConfig.service_class change necessary? I'd rather not require services to verbosely add that boilerplate, and it also violates the separation of concerns in service implementations -- ServiceConfig is for user configuration.

Comment thread bugwarrior/config/schema.py Outdated
@Lotram
Copy link
Copy Markdown
Collaborator Author

Lotram commented Jan 15, 2026

think Config should be renamed to ConfigView or something that more particularly describes its specific purpose. Of course, once it is passed out of bugwarrior/config to other parts of the code base it can be referred to simply as "config", but within the bugwarrior.config package that is too vague of a name.

I have no problem with this, except that my plan is to remove the existing dynamic config object, so there would be only 1 Config model after all my changes. Still, I can rename ConfigView if you prefere

it would be nice to keep the configuration "view" separate from the configuration file validation (schema.py). So maybe validate_config returns the validated bugwarrior_config_model and Config moves to load.py and is instantiated from validated_data in load_config.

Sure.

Is the get_service() -> ServiceConfig.service_class change necessary? I'd rather not require services to verbosely add that boilerplate, and it also violates the separation of concerns in service implementations -- ServiceConfig is for user configuration.

So I'm not entirely satisfied with this solution either, I agree about the boilerplate. Not so much about separation of concerns, as the Config and Service objects are tightly coupled anyway, but I get your point

My main concern with the existing implementation is that we need to call entrypoints (through importlib) everytime we want to retrieve a Service class, but IMO we should do it only once at the beginning, and then have the classes available somewhere.

We could do

class ServiceConfig(BaseModel):
   @cached_property
   def service_class(self):
        return get_service(self.service)

which does not really change anything about separation of concerns, but at least remove any boilerplate (at the cost of a slightly less precise typing, but that's not really a problem).

I might be able to avoid some usees of service_class by iterating over Config.services instead (but then I need to find a way to build this services property without ServiceConfig.service_class)

@ryneeverett
Copy link
Copy Markdown
Collaborator

I guess I misinterpreted your plan. Forget about the naming and code moving suggestions since they seem to be based on scaffolding.

I plan to change the validate_config implementation once we agree on these changes, and this Config model.

I don't have any objections to the Config model at the moment, but it seems like it's going to have to change a lot. As long as the configuration file is fully validated by pydantic I don't foresee any objections.

Not so much about separation of concerns, as the Config and Service objects are tightly coupled anyway

Perhaps I used the wrong terminology, but my concern here was not from an architectural point of view but from a service implementer's point of view. Somebody writing a service (who might not know python or have much of any programming experience) should be able to write their Config class as a simple data structure describing user configuration.

However, I would distinguish between unidirectional coupling and bidirectional coupling, the latter being worse. Config doesn't need to know about Service, which I think is another valid point against the current service_class methods.

My main concern with the existing implementation is that we need to call entrypoints (through importlib) everytime we want to retrieve a Service class

The same Service class is only being retrieved more than once if a user has multiple instances of the same service configured, right? Why couldn't we just make get_service a cached function if this is a concern?

@Lotram
Copy link
Copy Markdown
Collaborator Author

Lotram commented Jan 22, 2026

@ryneeverett I'd like your opinion on commit c6e2a8. My opinion is the current implementation relies on a mix of function validations and some pydantic model validations, I think we should choose either one method or the other, so I suggest two alternatives for validate_config.

  • validate_config_using_type_adapters, that uses only TypeAdapter and some other checks, since we don't really need to create a model to validate objects that we won't use afterwards (i.e. other flavors, and their targets)
  • validate_config_using_model, that only uses a pydantic model with some model_validator functions. Opposite solution, almost everything is done in the model, we just need a function to set the type of service_configs list

Since we want to validate all flavors, I think we should also validate all service configs, not only the ones targeted by the choosen flavor.

A note on get_service_config_union_type: it can seems complex at first sight, but using a union discriminator is actually the most pydantic way to have polymorphism, IMO. We could define this union at import time, but this would require to load all available services, not only the ones referenced in the config, I'm not sure it's worth it.

Also, error messages may be wrong for now, I'll make sure they're are close as possible to current ones when we choose one implem or the other

Copy link
Copy Markdown
Collaborator

@ryneeverett ryneeverett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is the current implementation relies on a mix of function validations and some pydantic model validations, I think we should choose either one method or the other...

My impression is that the type adapter approach is a more consistent and complete implementation of the current approach of "pre-validation" followed by the model validation. It seems like an improvement, but I'm not sure it satisfies your either/or principle.

The WholeConfig would actually replace Config, right? And with the WholeConfig approach, we eliminate the "stages" of validation, right? This seems nicer for users because they get all the validation errors at once.

We could define this union at import time, but this would require to load all available services, not only the ones referenced in the config, I'm not sure it's worth it.

I would tend to agree. It would be nice if unconfigured services weren't imported since this could result in side effects or exceptions. We shouldn't insist that users have a perfect installation and python environment when it comes to features they're not using.

Since we want to validate all flavors, I think we should also validate all service configs, not only the ones targeted by the choosen flavor.

👍 Shouldn't we also be validating that all targets are defined in all flavors rather than just the selected one?

Comment thread bugwarrior/config/schema.py Outdated
Comment thread bugwarrior/config/schema.py Outdated
Comment thread bugwarrior/config/schema.py Outdated
@Lotram
Copy link
Copy Markdown
Collaborator Author

Lotram commented Jan 23, 2026

My impression is that the type adapter approach is a more consistent and complete implementation of the current approach of "pre-validation" followed by the model validation. It seems like an improvement, but I'm not sure it satisfies your either/or principle.

Yes, with that option, we would have the validate part using type adapters, no models, ensuring everything is ok. The model would then only be used for downstream processing (kind of like the config view you were talking earlier). The biggest improvement IMO is we don't have to create a dynamic model simply for validation

The WholeConfig would actually replace Config, right? And with the WholeConfig approach, we eliminate the "stages" of validation, right?

Yes exactly, with that solution, validation is almost entirely included in the model. IMO The small downside is: this object is carrying a lot of data we won't ever use after validation, making the developer experience a bit more complex, but that's probably fine.

This seems nicer for users because they get all the validation errors at once.

In both cases, I think we can indeed aim for that. That's what I was starting to do with the validation_errors in validate_config_using_type_adapters, gathering all errors and raising them all at once.
I think either solution should be completely transparent for the end user, and we should choose the simplest / most readable / easiest to maintain implementation from a developer perspective (as long as we can indeed guarantee it won't change anything for the user)

@ryneeverett
Copy link
Copy Markdown
Collaborator

IMO The small downside is: this object is carrying a lot of data we won't ever use after validation, making the developer experience a bit more complex, but that's probably fine.

Hmm, I was hoping we could move the data that is purely for validation purposes into "private" fields to make it clear that they aren't meant to be consumed but it seems that Pydantic considers any underscored fields to not be part of the model for validation either. A couple alternative ideas:

  • delete that data from the model before returning it
  • prefix with private_ rather than _

In both cases, I think we can indeed aim for that. That's what I was starting to do with the validation_errors in validate_config_using_type_adapters, gathering all errors and raising them all at once. I think either solution should be completely transparent for the end user, and we should choose the simplest / most readable / easiest to maintain implementation from a developer perspective (as long as we can indeed guarantee it won't change anything for the user)

👍

@Lotram
Copy link
Copy Markdown
Collaborator Author

Lotram commented Jan 26, 2026

So just to be clear, do you have any preference between the two solutions ? If not, I think I'll go with the validate_config_using_type_adapters one, so we can have a lighter model.

@ryneeverett
Copy link
Copy Markdown
Collaborator

So just to be clear, do you have any preference between the two solutions ? If not, I think I'll go with the validate_config_using_type_adapters one, so we can have a lighter model.

From the perspective of the maintainability of schema.py, I prefer WholeConfig. I would characterize model definitions as pydantic's high level API, whereas type adapters are a lower level API that people are less familiar with and is less intuitively obvious. I also think lengthy functions like validate_config_using_type_adapters are a bit of a code smell, but to be fair it is less smelly than the current validate_config.

From the perspective of the maintainability of bugwarrior as a whole, I'd rather compartmentalize implementation details within schema.py than let them leak out, so I agree that having a cleaner model to pass on to consumers is desirable. Can we not delete the fields we don't want to expose from the model before returning it? If there's no decent way to return a "lighter model" to consumers then I'd agree that the type adapter approach is preferable.

@ryneeverett
Copy link
Copy Markdown
Collaborator

On the other hand, the two compromises we'd have to make for WholeConfig would seem to undercut its advantages:

  • Deleting fields is probably a violation of pydantic's high level API, makes the code less intuitively obvious, and is arguably a code smell.
  • In order for ServiceConfigType to work, we either need to rely on namespace magic (regardless of whether it's documented) or add an intermediary model using create_model, which arguably isn't pydantic's high level API.

So I guess I'd be in favor of the type adapter approach too.

@Lotram
Copy link
Copy Markdown
Collaborator Author

Lotram commented Jan 26, 2026

From the perspective of the maintainability of bugwarrior as a whole, I'd rather compartmentalize implementation details within schema.py than let them leak out, so I agree that having a cleaner model to pass on to consumers is desirable. Can we not delete the fields we don't want to expose from the model before returning it? If there's no decent way to return a "lighter model" to consumers then I'd agree that the type adapter approach is preferable.

If we think the model validation is cleaner, we could also use a first model to validate everything, and have another one afterwards (coming back to the "view" idea).

In order for ServiceConfigType to work, we either need to rely on namespace magic (regardless of whether it's documented) or add an intermediary model using create_model, which arguably isn't pydantic's high level API.

Since the only non-static part of the model is the list of service config classes, we could use a type adapter only for this part. Something like :

class WholeConfig(SchemaBase):
    all_service_configs: list[ServiceConfig]
    main_section: str
    config_path: str
    flavors: list[MainSectionConfig]

def validate_config_using_model(
    config: dict, main_section: str, config_path: str
) -> "WholeConfig":
    config = format_config(config)

    ServiceConfigType = get_service_config_union_type(config["services"])

    all_service_configs = TypeAdapter(list[ServiceConfigType]).validate_python(
        config["services"]
    )
    return WholeConfig(
        all_service_configs=all_service_configs,
        main_section=main_section,
        flavors=config.pop("flavors"),
        config_path=config_path,
        **config,
    )

Other alternative, regarding the pydantic doc discussion, since it does use "The locals of the current class", we could do

WholeConfig.ServiceConfigType = get_service_config_union_type(config["services"])
# instead of  ServiceConfigType = get_service_config_union_type(config["services"])

Comment thread bugwarrior/config/schema.py Outdated
@ryneeverett
Copy link
Copy Markdown
Collaborator

If we did go with the two-model idea, it seems like we'd want to build the "view" out of a generic dataclass rather than a pydantic model. Otherwise, it seems like we'd have to feed the raw data into both models and validate/parse them both which seems wrong and also opens up the theoretical possibility of "passing" the validation stage and then failing the view-model validation stage. I suppose we'd only need to define the top-level dataclass though and could simply assign the top-level fields we want from the pydantic model, right?

@Lotram
Copy link
Copy Markdown
Collaborator Author

Lotram commented Jan 28, 2026

If we did go with the two-model idea, it seems like we'd want to build the "view" out of a generic dataclass rather than a pydantic model. Otherwise, it seems like we'd have to feed the raw data into both models and validate/parse them both which seems wrong and also opens up the theoretical possibility of "passing" the validation stage and then failing the view-model validation stage.

I'm not sure I understand what you're saying here. While we can use a dataclass, I don't see why we can't use another pydantic model, by doing something like:

class ConfigView(SchemaBase):
    service_configs: list[ServiceConfig]
    main: MainSectionConfig


class WholeConfig(SchemaBase):
    ... 
    def to_config_view(self):
        return ConfigView(
            service_configs=self.service_configs,
            main=self.main,
            hooks=self.hooks,
            notifications=self.notifications,
        )

There is no additional validation, except for the types, and no parsing.

I'll work on a final version this week-end.

@ryneeverett
Copy link
Copy Markdown
Collaborator

I see. I hadn't realized that pydantic models accept instantiated submodels as inputs just as well as the data to instantiate those submodels. But that does seem to be the case:

>>> from typing import List, Optional
...
... from pydantic import BaseModel
...
...
... class Foo(BaseModel):
...     count: int
...     size: Optional[float] = None
...
...
... class Bar(BaseModel):
...     apple: str = 'x'
...     banana: str = 'y'
...
...
... class Spam(BaseModel):
...     foo: Foo
...     bars: List[Bar]
...
>>> foo = Foo(count='4')
>>> n = Spam(foo=foo, bars=[{'apple': 'x1'}, {'apple': 'x2'}])
>>> m = Spam(foo={'count': 4}, bars=[{'apple': 'x1'}, {'apple': 'x2'}])
>>> n == m
True

Comment thread tests/config/test_load.py
Comment thread bugwarrior/services/jira.py Outdated
Comment thread bugwarrior/config/load.py
@Lotram Lotram force-pushed the static-config-model branch from e1e737c to 4aaf503 Compare February 1, 2026 21:53
@ryneeverett
Copy link
Copy Markdown
Collaborator

Breaking validation out of schema.py seems good. I'm thinking Config should go with it since it's not part of the configuration schema per se and load.py already imports it from validation.py

Extract validation logic from schema.py into dedicated validation.py module.
The new Config class provides structured access to configuration
with typed properties (main, service_configs, services)
instead of returning a raw dict.

Key changes:
- Normalize config structure with flavors/services during parsing
- Introduce get_service_instances() to streamline service instantiation
- Improve error formatting for validation errors
@Lotram Lotram force-pushed the static-config-model branch from 66834ac to b30659c Compare February 4, 2026 17:29
@Lotram
Copy link
Copy Markdown
Collaborator Author

Lotram commented Feb 4, 2026

@ryneeverett I squashed most of my commits, and applied your comments.

Comment thread bugwarrior/config/load.py Outdated
Comment thread bugwarrior/config/load.py Outdated
Comment thread bugwarrior/config/load.py Outdated
Comment thread bugwarrior/config/validation.py Outdated
main=main,
hooks=hooks,
notifications=notifications,
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than completely doing validation prior to constructing the model, couldn't we put the Config instantiation in a try/except block and then append any errors to error_messages? Seems like this would eliminate a lot of the repetitive code here, would avoid further repetition if we were to add more fields to Config some day, and would keep our aberration from typical pydantic usage to a minimum.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually yes I thought about this, but ensuring we raise all errors, while avoiding duplicates, is actually tricky:

We could definitely avoid pre-validating hooks and notifications, but then: if there are some service-related errors, we would catch them both during pre-validation and Config validation (same for flavors). Avoiding this would require further checks, not sure that's better.

I wanted to avoid mixing type adapter and model validations, that's why I went with the full type adapter validation, then passing it to the model. I still can revert it back to using the WholeConfig model validation.

I agree the validation repetition does not feel right though

Copy link
Copy Markdown
Collaborator

@ryneeverett ryneeverett Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it currently stands, if errors are encountered then the values we're passing into Config are undefined. The only way we'd get repeated errors is if we passed the same user input into the model in these cases. Rather than doing that, couldn't we define them as generic dummy values that we know will not generate any errors? For example, in the except block after validating service configs, we could define service_configs = [].

Edit: I just realized that my comment about them being undefined is untrue because you're initializing them first. I think the point stands though.

Copy link
Copy Markdown
Collaborator Author

@Lotram Lotram Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could pass service_configs = [] , but main still needs to be defined though. What should happen if there is a flavor / main_section related error ? We won't be able to define main, and we'll get another error, right.

Other alternative: we define another temp model, from fields that are neither flavors nor services (i.e. only notifications and hooks for now). Something like:

   
    try:
        extra_options = ExtraConfigOptions.model_validate(config) # where config does not have either flavors nor service keys anymore.
    except pydantic.ValidationError as error:
        error_messages.append(error.errors()) # some extra formatting required here 

    if error_messages:
        raise_validation_error(
            "".join(error_messages), config_path, error_count=len(error_messages)
        )

    main = flavors[main_section]
    filtered_service_configs = [service_configs[target] for target in main.targets]

    return Config(
        service_configs=filtered_service_configs, main=main, extra_options=extra_options
    )

class ExtraConfigOptions(BaseConfig):
    hooks: Hooks = Hooks()
    notifications: Notifications = Notifications()


class Config(BaseConfig):
    service_configs: list[ServiceConfig]
    main: MainSectionConfig
    extra_options: ExtraConfigOptions

We could also do the same as this, but with Config inheriting from ExtraConfigOptions, but I think it makes more sense to actually have these as a field.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could pass service_configs = [] , but main still needs to be defined though. What should happen if there is a flavor / main_section related error ? We won't be able to define main, and we'll get another error, right.

Similarly we could create an inherently valid value -- main = MainSectionConfig(targets=[]).

Comment thread bugwarrior/config/load.py Outdated
Comment thread bugwarrior/config/schema.py
Comment thread bugwarrior/config/validation.py Outdated
@Lotram Lotram marked this pull request as ready for review February 9, 2026 10:44
Copy link
Copy Markdown
Collaborator

@ryneeverett ryneeverett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I just happened to check in on this PR to see if I dropped the ball. Somehow I got zero notifications of your activity this week.

I'm not very enthusiastic about ExtraOptions. The extra namespace and structure doesn't have any meaning beyond the validation implementation. Moreover, I still think that using the errors from the Config construction has virtues beyond eliminating repetition:

  • It's the most typical way to use pydantic.
  • More future-proofed in that we'll never have an issue where we somehow accidentally pass invalid data into the Config constructor and then end up with a second phase of validation errors that aren't properly formatted.

Comment thread bugwarrior/config/load.py Outdated
Comment thread bugwarrior/config/schema.py
Comment thread bugwarrior/config/validation.py Outdated
main=main,
hooks=hooks,
notifications=notifications,
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could pass service_configs = [] , but main still needs to be defined though. What should happen if there is a flavor / main_section related error ? We won't be able to define main, and we'll get another error, right.

Similarly we could create an inherently valid value -- main = MainSectionConfig(targets=[]).

@ryneeverett
Copy link
Copy Markdown
Collaborator

@Lotram No pressure if you've been busy but I wanted to ping you in case github notifications on this PR are also broken for you.

@Lotram
Copy link
Copy Markdown
Collaborator Author

Lotram commented Feb 23, 2026

Sorry I was quite busy the past few weeks, I'll definitely try to finish this before the end of the week (probably during the week end)

@Lotram Lotram force-pushed the static-config-model branch from 2a1dea0 to 48b01ce Compare March 2, 2026 12:41
@Lotram
Copy link
Copy Markdown
Collaborator Author

Lotram commented Mar 2, 2026

@ryneeverett I removed ExtraOptions. I'm not a big fan of having to define an "empty" config to validate hooks and notifications, but I have nothing better to suggest, so let's keep that for now, and we can create a new PR if we find a better solution

Comment thread bugwarrior/config/validation.py Outdated
Comment thread bugwarrior/config/validation.py Outdated
Comment thread bugwarrior/config/validation.py Outdated
Comment thread bugwarrior/config/validation.py Outdated
Comment thread bugwarrior/config/validation.py Outdated
Comment thread tests/config/test_validation.py Outdated
Comment thread tests/config/test_validation.py Outdated
Comment thread tests/config/test_data.py Outdated
Comment thread tests/base.py
Comment thread tests/test_clickup.py
Comment thread bugwarrior/config/validation.py Outdated
Comment thread tests/config/test_load.py
Comment thread tests/base.py
Comment thread tests/test_gitlab.py Outdated
- narrow type in validation.py, using asserts
- rename services_list to _raw_service_configs
- remove conf property in test_gitlab.py
- restore test in test_load.py, without broken assertion
@ryneeverett ryneeverett merged commit 3a8bf0d into GothenburgBitFactory:develop Mar 15, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants