-
Notifications
You must be signed in to change notification settings - Fork 218
Static config model #1181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Static config model #1181
Conversation
Tests should not directly use abstract classes, but now use basic implementation instead
98f028d to
8497f04
Compare
ryneeverett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like you're going with my "consumer view" proposal from #1167:
Maybe we're asking too much from pydantic to provide the ideal configuration interface for both inputs and outputs. Maybe we should consider another step in which the validation model is transformed into a view (like your quoted example) that is better optimized for consumers of the model.
That's fine and it's looking good. Some thoughts:
- I think
Configshould be renamed toConfigViewor something that more particularly describes its specific purpose. Of course, once it is passed out of bugwarrior/config to other parts of the code base it can be referred to simply as "config", but within the bugwarrior.config package that is too vague of a name. - it would be nice to keep the configuration "view" separate from the configuration file validation (schema.py). So maybe
validate_configreturns the validatedbugwarrior_config_modelandConfigmoves to load.py and is instantiated fromvalidated_datainload_config. - Is the
get_service()->ServiceConfig.service_classchange necessary? I'd rather not require services to verbosely add that boilerplate, and it also violates the separation of concerns in service implementations -- ServiceConfig is for user configuration.
I have no problem with this, except that my plan is to remove the existing dynamic config object, so there would be only 1 Config model after all my changes. Still, I can rename
Sure.
So I'm not entirely satisfied with this solution either, I agree about the boilerplate. Not so much about separation of concerns, as the My main concern with the existing implementation is that we need to call entrypoints (through importlib) everytime we want to retrieve a We could do which does not really change anything about separation of concerns, but at least remove any boilerplate (at the cost of a slightly less precise typing, but that's not really a problem). I might be able to avoid some usees of |
|
I guess I misinterpreted your plan. Forget about the naming and code moving suggestions since they seem to be based on scaffolding.
I don't have any objections to the Config model at the moment, but it seems like it's going to have to change a lot. As long as the configuration file is fully validated by pydantic I don't foresee any objections.
Perhaps I used the wrong terminology, but my concern here was not from an architectural point of view but from a service implementer's point of view. Somebody writing a service (who might not know python or have much of any programming experience) should be able to write their Config class as a simple data structure describing user configuration. However, I would distinguish between unidirectional coupling and bidirectional coupling, the latter being worse. Config doesn't need to know about Service, which I think is another valid point against the current
The same Service class is only being retrieved more than once if a user has multiple instances of the same service configured, right? Why couldn't we just make |
|
@ryneeverett I'd like your opinion on commit c6e2a8. My opinion is the current implementation relies on a mix of function validations and some pydantic model validations, I think we should choose either one method or the other, so I suggest two alternatives for
Since we want to validate all flavors, I think we should also validate all service configs, not only the ones targeted by the choosen flavor. A note on Also, error messages may be wrong for now, I'll make sure they're are close as possible to current ones when we choose one implem or the other |
ryneeverett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion is the current implementation relies on a mix of function validations and some pydantic model validations, I think we should choose either one method or the other...
My impression is that the type adapter approach is a more consistent and complete implementation of the current approach of "pre-validation" followed by the model validation. It seems like an improvement, but I'm not sure it satisfies your either/or principle.
The WholeConfig would actually replace Config, right? And with the WholeConfig approach, we eliminate the "stages" of validation, right? This seems nicer for users because they get all the validation errors at once.
We could define this union at import time, but this would require to load all available services, not only the ones referenced in the config, I'm not sure it's worth it.
I would tend to agree. It would be nice if unconfigured services weren't imported since this could result in side effects or exceptions. We shouldn't insist that users have a perfect installation and python environment when it comes to features they're not using.
Since we want to validate all flavors, I think we should also validate all service configs, not only the ones targeted by the choosen flavor.
👍 Shouldn't we also be validating that all targets are defined in all flavors rather than just the selected one?
bugwarrior/config/schema.py
Outdated
| if "service" in service | ||
| ] | ||
| return Annotated[ | ||
| reduce(or_, service_config_classes), Field(discriminator='service') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get what's going on here. Seems like we want to deduplicate service classes but I don't follow how the bitwise OR operator does that or why there isn't an explicit Union type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit it's not really explicit. Basically, let's say our targets are a GitHub and Jira services, we want to build the following type: GitHubServiceConfig | JiraServiceConfig, so pydantic can use the proper subclass when instanciating each service config, using the discriminator.
the or_ function is simply the function called by the | operator. The Union is created by doing type1 | type2
I need to check, but I may be able to do Union[*service_config_classes] instead of reduce(or_, service_config_classes), result is the same, but probably easier to understand
Edit, yes it works:
In [9]: Union[*service_config_classes] == reduce(or_, service_config_classes)
Out[9]: True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simply need to check it works in all supported python versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I wasn't realizing that pipes were overloaded to make Unions in type annotations so it makes sense now. Nonetheless, the Union[*service_config_classes] form is preferable if it works.
bugwarrior/config/schema.py
Outdated
| # NOTE: I think this is an elegant way to 'hide' the dynamical part of the config definition | ||
| # but it might be a bit too implicit | ||
| # for more info, see https://docs.pydantic.dev/latest/internals/resolving_annotations/#resolving-annotations-at-class-definition | ||
| ServiceConfigType = get_service_config_union_type(config["services"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely following. is the relevant example in the linked docs 'UnknownType`? It isn't clear to me from the docs that the namespace in which a model is instantiated is included when looking up types. (And how do type checkers handle this?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep the relevant example here is UnknownType (+ explanations about how types are resolved)
it isn't clear to me from the docs that the namespace in which a model is instantiated is included when looking up types.
Among other thing, the doc says:
For the locals argument, Pydantic will try to resolve symbols in the following namespaces, sorted by highest priority:
- A namespace created on the fly, containing the current class name (
{cls.__name__: cls}).
which is exactly the point of this line. Afterwards, we have:
In [1]: locals()["ServiceConfigType"]
Out[1]: typing.Annotated[typing.Union[bugwarrior.services.github.GithubConfig, .... (other config types)], FieldInfo(annotation=NoneType, required=True, discriminator='service')
so the type can be resolved when building the WholeConfig object.
I admit it's not explicit at all, but unfortunately I found no other way to override the type of all_service_configs at runtime (other alternative being the one we don't want: building this type before model creation, but using all service config classes, not just the ones targeted in the config).
Another solution would be to create a new model with create_model:
ServiceConfigType = get_service_config_union_type(config["services"])
_WholeConfig = create_model(
"WholeConfig", __base__=WholeConfig, all_service_configs=list[ServiceConfigType]
)
return _WholeConfig(
all_service_configs=config.pop("services"),
main_section=main_section,
flavors=config.pop("flavors"),
config_path=config_path,
**config,
)
And how do type checkers handle this?
Simple trick (which I should have put directly above WholeConfig instead):
if TYPE_CHECKING:
ServiceConfigType = ServiceConfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the locals argument, Pydantic will try to resolve symbols in the following namespaces, sorted by highest priority:
- A namespace created on the fly, containing the current class name (
{cls.__name__: cls}).which is exactly the point of this line.
I can't find any way to read this that indicates that the locals of the namespace in which the model is instantiated is searched for annotations. My interpretation of this particular line is that {'WholeConfig': WholeConfig} is included -- purely for recursive references, like the docs say.
I admit it's not explicit at all, but unfortunately I found no other way to override the type of
all_service_configsat runtime (other alternative being the one we don't want: building this type before model creation, but using all service config classes, not just the ones targeted in the config).Another solution would be to create a new model with
create_model:ServiceConfigType = get_service_config_union_type(config["services"]) _WholeConfig = create_model( "WholeConfig", __base__=WholeConfig, all_service_configs=list[ServiceConfigType] ) return _WholeConfig( all_service_configs=config.pop("services"), main_section=main_section, flavors=config.pop("flavors"), config_path=config_path, **config, )
If this behavior is in fact documented then I'm fine with leaving it as is and just adding a comment explaining that the local namespace is searched for type annotations when instantiating a model. If it is undocumented then I'd be more in favor of this create_model solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I quoted the wrong bullet point, I should have used this one:
The parent namespace of the class, if different from the globals described above. This is the locals of the frame where the class is being defined. For Base, because the class is being defined in the module directly, this namespace won't be used as it will result in the globals being used again. For
Model, the parent namespace is the locals of the frame of inner().
Our use-case is similar to their Model one, because in our case, the WholeConfig model is being built inside the validate_config_using_model (even though the class itself is defined outside).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but I can't agree with your interpretation. The parent namespace of the class is not "different from the globals described above" -- it is the module's __dict__. The docs clearly say "the locals of the frame where the class is being defined". That the locals in which the class is being built/instantiated are included seems to be undocumented behavior.
Yes, with that option, we would have the
Yes exactly, with that solution, validation is almost entirely included in the model. IMO The small downside is: this object is carrying a lot of data we won't ever use after validation, making the developer experience a bit more complex, but that's probably fine.
In both cases, I think we can indeed aim for that. That's what I was starting to do with the |
Hmm, I was hoping we could move the data that is purely for validation purposes into "private" fields to make it clear that they aren't meant to be consumed but it seems that Pydantic considers any underscored fields to not be part of the model for validation either. A couple alternative ideas:
👍 |
|
So just to be clear, do you have any preference between the two solutions ? If not, I think I'll go with the |
From the perspective of the maintainability of schema.py, I prefer WholeConfig. I would characterize model definitions as pydantic's high level API, whereas type adapters are a lower level API that people are less familiar with and is less intuitively obvious. I also think lengthy functions like From the perspective of the maintainability of bugwarrior as a whole, I'd rather compartmentalize implementation details within schema.py than let them leak out, so I agree that having a cleaner model to pass on to consumers is desirable. Can we not delete the fields we don't want to expose from the model before returning it? If there's no decent way to return a "lighter model" to consumers then I'd agree that the type adapter approach is preferable. |
|
On the other hand, the two compromises we'd have to make for WholeConfig would seem to undercut its advantages:
So I guess I'd be in favor of the type adapter approach too. |
If we think the model validation is cleaner, we could also use a first model to validate everything, and have another one afterwards (coming back to the "view" idea).
Since the only non-static part of the model is the list of service config classes, we could use a type adapter only for this part. Something like : Other alternative, regarding the pydantic doc discussion, since it does use "The locals of the current class", we could do |
|
If we did go with the two-model idea, it seems like we'd want to build the "view" out of a generic dataclass rather than a pydantic model. Otherwise, it seems like we'd have to feed the raw data into both models and validate/parse them both which seems wrong and also opens up the theoretical possibility of "passing" the validation stage and then failing the view-model validation stage. I suppose we'd only need to define the top-level dataclass though and could simply assign the top-level fields we want from the pydantic model, right? |
I'm not sure I understand what you're saying here. While we can use a dataclass, I don't see why we can't use another pydantic model, by doing something like: There is no additional validation, except for the types, and no parsing. I'll work on a final version this week-end. |
|
I see. I hadn't realized that pydantic models accept instantiated submodels as inputs just as well as the data to instantiate those submodels. But that does seem to be the case: >>> from typing import List, Optional
...
... from pydantic import BaseModel
...
...
... class Foo(BaseModel):
... count: int
... size: Optional[float] = None
...
...
... class Bar(BaseModel):
... apple: str = 'x'
... banana: str = 'y'
...
...
... class Spam(BaseModel):
... foo: Foo
... bars: List[Bar]
...
>>> foo = Foo(count='4')
>>> n = Spam(foo=foo, bars=[{'apple': 'x1'}, {'apple': 'x2'}])
>>> m = Spam(foo={'count': 4}, bars=[{'apple': 'x1'}, {'apple': 'x2'}])
>>> n == m
True |
e1e737c to
4aaf503
Compare
|
Breaking validation out of schema.py seems good. I'm thinking Config should go with it since it's not part of the configuration schema per se and load.py already imports it from validation.py |
Extract validation logic from schema.py into dedicated validation.py module. The new Config class provides structured access to configuration with typed properties (main, service_configs, services) instead of returning a raw dict. Key changes: - Normalize config structure with flavors/services during parsing - Introduce get_service_instances() to streamline service instantiation - Improve error formatting for validation errors
66834ac to
b30659c
Compare
|
@ryneeverett I squashed most of my commits, and applied your comments. |
Before changing the actual config validation implementation, I think it's easier to use another
Configmodel once everything has been validated, and use this model downstream.I plan to change the validate_config implementation once we agree on these changes, and this
Configmodel.What do you think @ryneeverett ?
There is a failing test related to flavors in test_load.py, It will be fixed once I modify the validation step