Skip to content

Conversation

@Lotram
Copy link
Contributor

@Lotram Lotram commented Jan 14, 2026

Before changing the actual config validation implementation, I think it's easier to use another Config model once everything has been validated, and use this model downstream.
I plan to change the validate_config implementation once we agree on these changes, and this Config model.

What do you think @ryneeverett ?

There is a failing test related to flavors in test_load.py, It will be fixed once I modify the validation step

Tests should not directly use abstract classes, but now use basic implementation instead
@Lotram Lotram force-pushed the static-config-model branch from 98f028d to 8497f04 Compare January 14, 2026 14:02
@Lotram Lotram marked this pull request as draft January 14, 2026 14:02
Copy link
Collaborator

@ryneeverett ryneeverett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you're going with my "consumer view" proposal from #1167:

Maybe we're asking too much from pydantic to provide the ideal configuration interface for both inputs and outputs. Maybe we should consider another step in which the validation model is transformed into a view (like your quoted example) that is better optimized for consumers of the model.

That's fine and it's looking good. Some thoughts:

  • I think Config should be renamed to ConfigView or something that more particularly describes its specific purpose. Of course, once it is passed out of bugwarrior/config to other parts of the code base it can be referred to simply as "config", but within the bugwarrior.config package that is too vague of a name.
  • it would be nice to keep the configuration "view" separate from the configuration file validation (schema.py). So maybe validate_config returns the validated bugwarrior_config_model and Config moves to load.py and is instantiated from validated_data in load_config.
  • Is the get_service() -> ServiceConfig.service_class change necessary? I'd rather not require services to verbosely add that boilerplate, and it also violates the separation of concerns in service implementations -- ServiceConfig is for user configuration.

@Lotram
Copy link
Contributor Author

Lotram commented Jan 15, 2026

think Config should be renamed to ConfigView or something that more particularly describes its specific purpose. Of course, once it is passed out of bugwarrior/config to other parts of the code base it can be referred to simply as "config", but within the bugwarrior.config package that is too vague of a name.

I have no problem with this, except that my plan is to remove the existing dynamic config object, so there would be only 1 Config model after all my changes. Still, I can rename ConfigView if you prefere

it would be nice to keep the configuration "view" separate from the configuration file validation (schema.py). So maybe validate_config returns the validated bugwarrior_config_model and Config moves to load.py and is instantiated from validated_data in load_config.

Sure.

Is the get_service() -> ServiceConfig.service_class change necessary? I'd rather not require services to verbosely add that boilerplate, and it also violates the separation of concerns in service implementations -- ServiceConfig is for user configuration.

So I'm not entirely satisfied with this solution either, I agree about the boilerplate. Not so much about separation of concerns, as the Config and Service objects are tightly coupled anyway, but I get your point

My main concern with the existing implementation is that we need to call entrypoints (through importlib) everytime we want to retrieve a Service class, but IMO we should do it only once at the beginning, and then have the classes available somewhere.

We could do

class ServiceConfig(BaseModel):
   @cached_property
   def service_class(self):
        return get_service(self.service)

which does not really change anything about separation of concerns, but at least remove any boilerplate (at the cost of a slightly less precise typing, but that's not really a problem).

I might be able to avoid some usees of service_class by iterating over Config.services instead (but then I need to find a way to build this services property without ServiceConfig.service_class)

@ryneeverett
Copy link
Collaborator

I guess I misinterpreted your plan. Forget about the naming and code moving suggestions since they seem to be based on scaffolding.

I plan to change the validate_config implementation once we agree on these changes, and this Config model.

I don't have any objections to the Config model at the moment, but it seems like it's going to have to change a lot. As long as the configuration file is fully validated by pydantic I don't foresee any objections.

Not so much about separation of concerns, as the Config and Service objects are tightly coupled anyway

Perhaps I used the wrong terminology, but my concern here was not from an architectural point of view but from a service implementer's point of view. Somebody writing a service (who might not know python or have much of any programming experience) should be able to write their Config class as a simple data structure describing user configuration.

However, I would distinguish between unidirectional coupling and bidirectional coupling, the latter being worse. Config doesn't need to know about Service, which I think is another valid point against the current service_class methods.

My main concern with the existing implementation is that we need to call entrypoints (through importlib) everytime we want to retrieve a Service class

The same Service class is only being retrieved more than once if a user has multiple instances of the same service configured, right? Why couldn't we just make get_service a cached function if this is a concern?

@Lotram
Copy link
Contributor Author

Lotram commented Jan 22, 2026

@ryneeverett I'd like your opinion on commit c6e2a8. My opinion is the current implementation relies on a mix of function validations and some pydantic model validations, I think we should choose either one method or the other, so I suggest two alternatives for validate_config.

  • validate_config_using_type_adapters, that uses only TypeAdapter and some other checks, since we don't really need to create a model to validate objects that we won't use afterwards (i.e. other flavors, and their targets)
  • validate_config_using_model, that only uses a pydantic model with some model_validator functions. Opposite solution, almost everything is done in the model, we just need a function to set the type of service_configs list

Since we want to validate all flavors, I think we should also validate all service configs, not only the ones targeted by the choosen flavor.

A note on get_service_config_union_type: it can seems complex at first sight, but using a union discriminator is actually the most pydantic way to have polymorphism, IMO. We could define this union at import time, but this would require to load all available services, not only the ones referenced in the config, I'm not sure it's worth it.

Also, error messages may be wrong for now, I'll make sure they're are close as possible to current ones when we choose one implem or the other

Copy link
Collaborator

@ryneeverett ryneeverett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is the current implementation relies on a mix of function validations and some pydantic model validations, I think we should choose either one method or the other...

My impression is that the type adapter approach is a more consistent and complete implementation of the current approach of "pre-validation" followed by the model validation. It seems like an improvement, but I'm not sure it satisfies your either/or principle.

The WholeConfig would actually replace Config, right? And with the WholeConfig approach, we eliminate the "stages" of validation, right? This seems nicer for users because they get all the validation errors at once.

We could define this union at import time, but this would require to load all available services, not only the ones referenced in the config, I'm not sure it's worth it.

I would tend to agree. It would be nice if unconfigured services weren't imported since this could result in side effects or exceptions. We shouldn't insist that users have a perfect installation and python environment when it comes to features they're not using.

Since we want to validate all flavors, I think we should also validate all service configs, not only the ones targeted by the choosen flavor.

👍 Shouldn't we also be validating that all targets are defined in all flavors rather than just the selected one?

if "service" in service
]
return Annotated[
reduce(or_, service_config_classes), Field(discriminator='service')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get what's going on here. Seems like we want to deduplicate service classes but I don't follow how the bitwise OR operator does that or why there isn't an explicit Union type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit it's not really explicit. Basically, let's say our targets are a GitHub and Jira services, we want to build the following type: GitHubServiceConfig | JiraServiceConfig, so pydantic can use the proper subclass when instanciating each service config, using the discriminator.

the or_ function is simply the function called by the | operator. The Union is created by doing type1 | type2

I need to check, but I may be able to do Union[*service_config_classes] instead of reduce(or_, service_config_classes), result is the same, but probably easier to understand

Edit, yes it works:

In [9]: Union[*service_config_classes] == reduce(or_, service_config_classes)
Out[9]: True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply need to check it works in all supported python versions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I wasn't realizing that pipes were overloaded to make Unions in type annotations so it makes sense now. Nonetheless, the Union[*service_config_classes] form is preferable if it works.

# NOTE: I think this is an elegant way to 'hide' the dynamical part of the config definition
# but it might be a bit too implicit
# for more info, see https://docs.pydantic.dev/latest/internals/resolving_annotations/#resolving-annotations-at-class-definition
ServiceConfigType = get_service_config_union_type(config["services"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely following. is the relevant example in the linked docs 'UnknownType`? It isn't clear to me from the docs that the namespace in which a model is instantiated is included when looking up types. (And how do type checkers handle this?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep the relevant example here is UnknownType (+ explanations about how types are resolved)

it isn't clear to me from the docs that the namespace in which a model is instantiated is included when looking up types.

Among other thing, the doc says:

For the locals argument, Pydantic will try to resolve symbols in the following namespaces, sorted by highest priority:

  • A namespace created on the fly, containing the current class name ({cls.__name__: cls}).

which is exactly the point of this line. Afterwards, we have:

In [1]: locals()["ServiceConfigType"]
Out[1]: typing.Annotated[typing.Union[bugwarrior.services.github.GithubConfig, .... (other config types)], FieldInfo(annotation=NoneType, required=True, discriminator='service') 

so the type can be resolved when building the WholeConfig object.

I admit it's not explicit at all, but unfortunately I found no other way to override the type of all_service_configs at runtime (other alternative being the one we don't want: building this type before model creation, but using all service config classes, not just the ones targeted in the config).

Another solution would be to create a new model with create_model:

    ServiceConfigType = get_service_config_union_type(config["services"])

    _WholeConfig = create_model(
        "WholeConfig", __base__=WholeConfig, all_service_configs=list[ServiceConfigType]
    )
    return _WholeConfig(
        all_service_configs=config.pop("services"),
        main_section=main_section,
        flavors=config.pop("flavors"),
        config_path=config_path,
        **config,
    )

And how do type checkers handle this?

Simple trick (which I should have put directly above WholeConfig instead):

if TYPE_CHECKING:
    ServiceConfigType = ServiceConfig

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the locals argument, Pydantic will try to resolve symbols in the following namespaces, sorted by highest priority:

  • A namespace created on the fly, containing the current class name ({cls.__name__: cls}).

which is exactly the point of this line.

I can't find any way to read this that indicates that the locals of the namespace in which the model is instantiated is searched for annotations. My interpretation of this particular line is that {'WholeConfig': WholeConfig} is included -- purely for recursive references, like the docs say.

I admit it's not explicit at all, but unfortunately I found no other way to override the type of all_service_configs at runtime (other alternative being the one we don't want: building this type before model creation, but using all service config classes, not just the ones targeted in the config).

Another solution would be to create a new model with create_model:

    ServiceConfigType = get_service_config_union_type(config["services"])

    _WholeConfig = create_model(
        "WholeConfig", __base__=WholeConfig, all_service_configs=list[ServiceConfigType]
    )
    return _WholeConfig(
        all_service_configs=config.pop("services"),
        main_section=main_section,
        flavors=config.pop("flavors"),
        config_path=config_path,
        **config,
    )

If this behavior is in fact documented then I'm fine with leaving it as is and just adding a comment explaining that the local namespace is searched for type annotations when instantiating a model. If it is undocumented then I'd be more in favor of this create_model solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I quoted the wrong bullet point, I should have used this one:

The parent namespace of the class, if different from the globals described above. This is the locals of the frame where the class is being defined. For Base, because the class is being defined in the module directly, this namespace won't be used as it will result in the globals being used again. For Model, the parent namespace is the locals of the frame of inner().

Our use-case is similar to their Model one, because in our case, the WholeConfig model is being built inside the validate_config_using_model (even though the class itself is defined outside).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but I can't agree with your interpretation. The parent namespace of the class is not "different from the globals described above" -- it is the module's __dict__. The docs clearly say "the locals of the frame where the class is being defined". That the locals in which the class is being built/instantiated are included seems to be undocumented behavior.

@Lotram
Copy link
Contributor Author

Lotram commented Jan 23, 2026

My impression is that the type adapter approach is a more consistent and complete implementation of the current approach of "pre-validation" followed by the model validation. It seems like an improvement, but I'm not sure it satisfies your either/or principle.

Yes, with that option, we would have the validate part using type adapters, no models, ensuring everything is ok. The model would then only be used for downstream processing (kind of like the config view you were talking earlier). The biggest improvement IMO is we don't have to create a dynamic model simply for validation

The WholeConfig would actually replace Config, right? And with the WholeConfig approach, we eliminate the "stages" of validation, right?

Yes exactly, with that solution, validation is almost entirely included in the model. IMO The small downside is: this object is carrying a lot of data we won't ever use after validation, making the developer experience a bit more complex, but that's probably fine.

This seems nicer for users because they get all the validation errors at once.

In both cases, I think we can indeed aim for that. That's what I was starting to do with the validation_errors in validate_config_using_type_adapters, gathering all errors and raising them all at once.
I think either solution should be completely transparent for the end user, and we should choose the simplest / most readable / easiest to maintain implementation from a developer perspective (as long as we can indeed guarantee it won't change anything for the user)

@ryneeverett
Copy link
Collaborator

IMO The small downside is: this object is carrying a lot of data we won't ever use after validation, making the developer experience a bit more complex, but that's probably fine.

Hmm, I was hoping we could move the data that is purely for validation purposes into "private" fields to make it clear that they aren't meant to be consumed but it seems that Pydantic considers any underscored fields to not be part of the model for validation either. A couple alternative ideas:

  • delete that data from the model before returning it
  • prefix with private_ rather than _

In both cases, I think we can indeed aim for that. That's what I was starting to do with the validation_errors in validate_config_using_type_adapters, gathering all errors and raising them all at once. I think either solution should be completely transparent for the end user, and we should choose the simplest / most readable / easiest to maintain implementation from a developer perspective (as long as we can indeed guarantee it won't change anything for the user)

👍

@Lotram
Copy link
Contributor Author

Lotram commented Jan 26, 2026

So just to be clear, do you have any preference between the two solutions ? If not, I think I'll go with the validate_config_using_type_adapters one, so we can have a lighter model.

@ryneeverett
Copy link
Collaborator

So just to be clear, do you have any preference between the two solutions ? If not, I think I'll go with the validate_config_using_type_adapters one, so we can have a lighter model.

From the perspective of the maintainability of schema.py, I prefer WholeConfig. I would characterize model definitions as pydantic's high level API, whereas type adapters are a lower level API that people are less familiar with and is less intuitively obvious. I also think lengthy functions like validate_config_using_type_adapters are a bit of a code smell, but to be fair it is less smelly than the current validate_config.

From the perspective of the maintainability of bugwarrior as a whole, I'd rather compartmentalize implementation details within schema.py than let them leak out, so I agree that having a cleaner model to pass on to consumers is desirable. Can we not delete the fields we don't want to expose from the model before returning it? If there's no decent way to return a "lighter model" to consumers then I'd agree that the type adapter approach is preferable.

@ryneeverett
Copy link
Collaborator

On the other hand, the two compromises we'd have to make for WholeConfig would seem to undercut its advantages:

  • Deleting fields is probably a violation of pydantic's high level API, makes the code less intuitively obvious, and is arguably a code smell.
  • In order for ServiceConfigType to work, we either need to rely on namespace magic (regardless of whether it's documented) or add an intermediary model using create_model, which arguably isn't pydantic's high level API.

So I guess I'd be in favor of the type adapter approach too.

@Lotram
Copy link
Contributor Author

Lotram commented Jan 26, 2026

From the perspective of the maintainability of bugwarrior as a whole, I'd rather compartmentalize implementation details within schema.py than let them leak out, so I agree that having a cleaner model to pass on to consumers is desirable. Can we not delete the fields we don't want to expose from the model before returning it? If there's no decent way to return a "lighter model" to consumers then I'd agree that the type adapter approach is preferable.

If we think the model validation is cleaner, we could also use a first model to validate everything, and have another one afterwards (coming back to the "view" idea).

In order for ServiceConfigType to work, we either need to rely on namespace magic (regardless of whether it's documented) or add an intermediary model using create_model, which arguably isn't pydantic's high level API.

Since the only non-static part of the model is the list of service config classes, we could use a type adapter only for this part. Something like :

class WholeConfig(SchemaBase):
    all_service_configs: list[ServiceConfig]
    main_section: str
    config_path: str
    flavors: list[MainSectionConfig]

def validate_config_using_model(
    config: dict, main_section: str, config_path: str
) -> "WholeConfig":
    config = format_config(config)

    ServiceConfigType = get_service_config_union_type(config["services"])

    all_service_configs = TypeAdapter(list[ServiceConfigType]).validate_python(
        config["services"]
    )
    return WholeConfig(
        all_service_configs=all_service_configs,
        main_section=main_section,
        flavors=config.pop("flavors"),
        config_path=config_path,
        **config,
    )

Other alternative, regarding the pydantic doc discussion, since it does use "The locals of the current class", we could do

WholeConfig.ServiceConfigType = get_service_config_union_type(config["services"])
# instead of  ServiceConfigType = get_service_config_union_type(config["services"])

@ryneeverett
Copy link
Collaborator

If we did go with the two-model idea, it seems like we'd want to build the "view" out of a generic dataclass rather than a pydantic model. Otherwise, it seems like we'd have to feed the raw data into both models and validate/parse them both which seems wrong and also opens up the theoretical possibility of "passing" the validation stage and then failing the view-model validation stage. I suppose we'd only need to define the top-level dataclass though and could simply assign the top-level fields we want from the pydantic model, right?

@Lotram
Copy link
Contributor Author

Lotram commented Jan 28, 2026

If we did go with the two-model idea, it seems like we'd want to build the "view" out of a generic dataclass rather than a pydantic model. Otherwise, it seems like we'd have to feed the raw data into both models and validate/parse them both which seems wrong and also opens up the theoretical possibility of "passing" the validation stage and then failing the view-model validation stage.

I'm not sure I understand what you're saying here. While we can use a dataclass, I don't see why we can't use another pydantic model, by doing something like:

class ConfigView(SchemaBase):
    service_configs: list[ServiceConfig]
    main: MainSectionConfig


class WholeConfig(SchemaBase):
    ... 
    def to_config_view(self):
        return ConfigView(
            service_configs=self.service_configs,
            main=self.main,
            hooks=self.hooks,
            notifications=self.notifications,
        )

There is no additional validation, except for the types, and no parsing.

I'll work on a final version this week-end.

@ryneeverett
Copy link
Collaborator

I see. I hadn't realized that pydantic models accept instantiated submodels as inputs just as well as the data to instantiate those submodels. But that does seem to be the case:

>>> from typing import List, Optional
...
... from pydantic import BaseModel
...
...
... class Foo(BaseModel):
...     count: int
...     size: Optional[float] = None
...
...
... class Bar(BaseModel):
...     apple: str = 'x'
...     banana: str = 'y'
...
...
... class Spam(BaseModel):
...     foo: Foo
...     bars: List[Bar]
...
>>> foo = Foo(count='4')
>>> n = Spam(foo=foo, bars=[{'apple': 'x1'}, {'apple': 'x2'}])
>>> m = Spam(foo={'count': 4}, bars=[{'apple': 'x1'}, {'apple': 'x2'}])
>>> n == m
True

@Lotram Lotram force-pushed the static-config-model branch from e1e737c to 4aaf503 Compare February 1, 2026 21:53
@ryneeverett
Copy link
Collaborator

Breaking validation out of schema.py seems good. I'm thinking Config should go with it since it's not part of the configuration schema per se and load.py already imports it from validation.py

Extract validation logic from schema.py into dedicated validation.py module.
The new Config class provides structured access to configuration
with typed properties (main, service_configs, services)
instead of returning a raw dict.

Key changes:
- Normalize config structure with flavors/services during parsing
- Introduce get_service_instances() to streamline service instantiation
- Improve error formatting for validation errors
@Lotram Lotram force-pushed the static-config-model branch from 66834ac to b30659c Compare February 4, 2026 17:29
@Lotram
Copy link
Contributor Author

Lotram commented Feb 4, 2026

@ryneeverett I squashed most of my commits, and applied your comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants