Static config model#1181
Conversation
Tests should not directly use abstract classes, but now use basic implementation instead
98f028d to
8497f04
Compare
ryneeverett
left a comment
There was a problem hiding this comment.
It seems like you're going with my "consumer view" proposal from #1167:
Maybe we're asking too much from pydantic to provide the ideal configuration interface for both inputs and outputs. Maybe we should consider another step in which the validation model is transformed into a view (like your quoted example) that is better optimized for consumers of the model.
That's fine and it's looking good. Some thoughts:
- I think
Configshould be renamed toConfigViewor something that more particularly describes its specific purpose. Of course, once it is passed out of bugwarrior/config to other parts of the code base it can be referred to simply as "config", but within the bugwarrior.config package that is too vague of a name. - it would be nice to keep the configuration "view" separate from the configuration file validation (schema.py). So maybe
validate_configreturns the validatedbugwarrior_config_modelandConfigmoves to load.py and is instantiated fromvalidated_datainload_config. - Is the
get_service()->ServiceConfig.service_classchange necessary? I'd rather not require services to verbosely add that boilerplate, and it also violates the separation of concerns in service implementations -- ServiceConfig is for user configuration.
I have no problem with this, except that my plan is to remove the existing dynamic config object, so there would be only 1 Config model after all my changes. Still, I can rename
Sure.
So I'm not entirely satisfied with this solution either, I agree about the boilerplate. Not so much about separation of concerns, as the My main concern with the existing implementation is that we need to call entrypoints (through importlib) everytime we want to retrieve a We could do which does not really change anything about separation of concerns, but at least remove any boilerplate (at the cost of a slightly less precise typing, but that's not really a problem). I might be able to avoid some usees of |
|
I guess I misinterpreted your plan. Forget about the naming and code moving suggestions since they seem to be based on scaffolding.
I don't have any objections to the Config model at the moment, but it seems like it's going to have to change a lot. As long as the configuration file is fully validated by pydantic I don't foresee any objections.
Perhaps I used the wrong terminology, but my concern here was not from an architectural point of view but from a service implementer's point of view. Somebody writing a service (who might not know python or have much of any programming experience) should be able to write their Config class as a simple data structure describing user configuration. However, I would distinguish between unidirectional coupling and bidirectional coupling, the latter being worse. Config doesn't need to know about Service, which I think is another valid point against the current
The same Service class is only being retrieved more than once if a user has multiple instances of the same service configured, right? Why couldn't we just make |
|
@ryneeverett I'd like your opinion on commit c6e2a8. My opinion is the current implementation relies on a mix of function validations and some pydantic model validations, I think we should choose either one method or the other, so I suggest two alternatives for
Since we want to validate all flavors, I think we should also validate all service configs, not only the ones targeted by the choosen flavor. A note on Also, error messages may be wrong for now, I'll make sure they're are close as possible to current ones when we choose one implem or the other |
ryneeverett
left a comment
There was a problem hiding this comment.
My opinion is the current implementation relies on a mix of function validations and some pydantic model validations, I think we should choose either one method or the other...
My impression is that the type adapter approach is a more consistent and complete implementation of the current approach of "pre-validation" followed by the model validation. It seems like an improvement, but I'm not sure it satisfies your either/or principle.
The WholeConfig would actually replace Config, right? And with the WholeConfig approach, we eliminate the "stages" of validation, right? This seems nicer for users because they get all the validation errors at once.
We could define this union at import time, but this would require to load all available services, not only the ones referenced in the config, I'm not sure it's worth it.
I would tend to agree. It would be nice if unconfigured services weren't imported since this could result in side effects or exceptions. We shouldn't insist that users have a perfect installation and python environment when it comes to features they're not using.
Since we want to validate all flavors, I think we should also validate all service configs, not only the ones targeted by the choosen flavor.
👍 Shouldn't we also be validating that all targets are defined in all flavors rather than just the selected one?
Yes, with that option, we would have the
Yes exactly, with that solution, validation is almost entirely included in the model. IMO The small downside is: this object is carrying a lot of data we won't ever use after validation, making the developer experience a bit more complex, but that's probably fine.
In both cases, I think we can indeed aim for that. That's what I was starting to do with the |
Hmm, I was hoping we could move the data that is purely for validation purposes into "private" fields to make it clear that they aren't meant to be consumed but it seems that Pydantic considers any underscored fields to not be part of the model for validation either. A couple alternative ideas:
👍 |
|
So just to be clear, do you have any preference between the two solutions ? If not, I think I'll go with the |
From the perspective of the maintainability of schema.py, I prefer WholeConfig. I would characterize model definitions as pydantic's high level API, whereas type adapters are a lower level API that people are less familiar with and is less intuitively obvious. I also think lengthy functions like From the perspective of the maintainability of bugwarrior as a whole, I'd rather compartmentalize implementation details within schema.py than let them leak out, so I agree that having a cleaner model to pass on to consumers is desirable. Can we not delete the fields we don't want to expose from the model before returning it? If there's no decent way to return a "lighter model" to consumers then I'd agree that the type adapter approach is preferable. |
|
On the other hand, the two compromises we'd have to make for WholeConfig would seem to undercut its advantages:
So I guess I'd be in favor of the type adapter approach too. |
If we think the model validation is cleaner, we could also use a first model to validate everything, and have another one afterwards (coming back to the "view" idea).
Since the only non-static part of the model is the list of service config classes, we could use a type adapter only for this part. Something like : Other alternative, regarding the pydantic doc discussion, since it does use "The locals of the current class", we could do |
|
If we did go with the two-model idea, it seems like we'd want to build the "view" out of a generic dataclass rather than a pydantic model. Otherwise, it seems like we'd have to feed the raw data into both models and validate/parse them both which seems wrong and also opens up the theoretical possibility of "passing" the validation stage and then failing the view-model validation stage. I suppose we'd only need to define the top-level dataclass though and could simply assign the top-level fields we want from the pydantic model, right? |
I'm not sure I understand what you're saying here. While we can use a dataclass, I don't see why we can't use another pydantic model, by doing something like: There is no additional validation, except for the types, and no parsing. I'll work on a final version this week-end. |
|
I see. I hadn't realized that pydantic models accept instantiated submodels as inputs just as well as the data to instantiate those submodels. But that does seem to be the case: >>> from typing import List, Optional
...
... from pydantic import BaseModel
...
...
... class Foo(BaseModel):
... count: int
... size: Optional[float] = None
...
...
... class Bar(BaseModel):
... apple: str = 'x'
... banana: str = 'y'
...
...
... class Spam(BaseModel):
... foo: Foo
... bars: List[Bar]
...
>>> foo = Foo(count='4')
>>> n = Spam(foo=foo, bars=[{'apple': 'x1'}, {'apple': 'x2'}])
>>> m = Spam(foo={'count': 4}, bars=[{'apple': 'x1'}, {'apple': 'x2'}])
>>> n == m
True |
e1e737c to
4aaf503
Compare
|
Breaking validation out of schema.py seems good. I'm thinking Config should go with it since it's not part of the configuration schema per se and load.py already imports it from validation.py |
Extract validation logic from schema.py into dedicated validation.py module. The new Config class provides structured access to configuration with typed properties (main, service_configs, services) instead of returning a raw dict. Key changes: - Normalize config structure with flavors/services during parsing - Introduce get_service_instances() to streamline service instantiation - Improve error formatting for validation errors
66834ac to
b30659c
Compare
|
@ryneeverett I squashed most of my commits, and applied your comments. |
| main=main, | ||
| hooks=hooks, | ||
| notifications=notifications, | ||
| ) |
There was a problem hiding this comment.
Rather than completely doing validation prior to constructing the model, couldn't we put the Config instantiation in a try/except block and then append any errors to error_messages? Seems like this would eliminate a lot of the repetitive code here, would avoid further repetition if we were to add more fields to Config some day, and would keep our aberration from typical pydantic usage to a minimum.
There was a problem hiding this comment.
Actually yes I thought about this, but ensuring we raise all errors, while avoiding duplicates, is actually tricky:
We could definitely avoid pre-validating hooks and notifications, but then: if there are some service-related errors, we would catch them both during pre-validation and Config validation (same for flavors). Avoiding this would require further checks, not sure that's better.
I wanted to avoid mixing type adapter and model validations, that's why I went with the full type adapter validation, then passing it to the model. I still can revert it back to using the WholeConfig model validation.
I agree the validation repetition does not feel right though
There was a problem hiding this comment.
As it currently stands, if errors are encountered then the values we're passing into Config are undefined. The only way we'd get repeated errors is if we passed the same user input into the model in these cases. Rather than doing that, couldn't we define them as generic dummy values that we know will not generate any errors? For example, in the except block after validating service configs, we could define service_configs = [].
Edit: I just realized that my comment about them being undefined is untrue because you're initializing them first. I think the point stands though.
There was a problem hiding this comment.
I could pass service_configs = [] , but main still needs to be defined though. What should happen if there is a flavor / main_section related error ? We won't be able to define main, and we'll get another error, right.
Other alternative: we define another temp model, from fields that are neither flavors nor services (i.e. only notifications and hooks for now). Something like:
try:
extra_options = ExtraConfigOptions.model_validate(config) # where config does not have either flavors nor service keys anymore.
except pydantic.ValidationError as error:
error_messages.append(error.errors()) # some extra formatting required here
if error_messages:
raise_validation_error(
"".join(error_messages), config_path, error_count=len(error_messages)
)
main = flavors[main_section]
filtered_service_configs = [service_configs[target] for target in main.targets]
return Config(
service_configs=filtered_service_configs, main=main, extra_options=extra_options
)
class ExtraConfigOptions(BaseConfig):
hooks: Hooks = Hooks()
notifications: Notifications = Notifications()
class Config(BaseConfig):
service_configs: list[ServiceConfig]
main: MainSectionConfig
extra_options: ExtraConfigOptionsWe could also do the same as this, but with Config inheriting from ExtraConfigOptions, but I think it makes more sense to actually have these as a field.
There was a problem hiding this comment.
I could pass
service_configs = [], butmainstill needs to be defined though. What should happen if there is aflavor/main_sectionrelated error ? We won't be able to definemain, and we'll get another error, right.
Similarly we could create an inherently valid value -- main = MainSectionConfig(targets=[]).
ryneeverett
left a comment
There was a problem hiding this comment.
FYI, I just happened to check in on this PR to see if I dropped the ball. Somehow I got zero notifications of your activity this week.
I'm not very enthusiastic about ExtraOptions. The extra namespace and structure doesn't have any meaning beyond the validation implementation. Moreover, I still think that using the errors from the Config construction has virtues beyond eliminating repetition:
- It's the most typical way to use pydantic.
- More future-proofed in that we'll never have an issue where we somehow accidentally pass invalid data into the Config constructor and then end up with a second phase of validation errors that aren't properly formatted.
| main=main, | ||
| hooks=hooks, | ||
| notifications=notifications, | ||
| ) |
There was a problem hiding this comment.
I could pass
service_configs = [], butmainstill needs to be defined though. What should happen if there is aflavor/main_sectionrelated error ? We won't be able to definemain, and we'll get another error, right.
Similarly we could create an inherently valid value -- main = MainSectionConfig(targets=[]).
|
@Lotram No pressure if you've been busy but I wanted to ping you in case github notifications on this PR are also broken for you. |
|
Sorry I was quite busy the past few weeks, I'll definitely try to finish this before the end of the week (probably during the week end) |
2a1dea0 to
48b01ce
Compare
|
@ryneeverett I removed |
- narrow type in validation.py, using asserts - rename services_list to _raw_service_configs - remove conf property in test_gitlab.py - restore test in test_load.py, without broken assertion
Before changing the actual config validation implementation, I think it's easier to use another
Configmodel once everything has been validated, and use this model downstream.I plan to change the validate_config implementation once we agree on these changes, and this
Configmodel.What do you think @ryneeverett ?
There is a failing test related to flavors in test_load.py, It will be fixed once I modify the validation step