Conversation
force-pushed from f1252ab to c1ff598
This adds the implementation for the file objectstore removal code. It should work in both HA and non-HA modes, as the witness for the object store should happen everywhere. The s3 objectstore is a lot more difficult and requires more thought. It would be wise to wait for [1], which cleans up the s3 objectstore and might help the implementation of this.

1. juju#20222
force-pushed from 3b47ad1 to f83acda
force-pushed from dcd55cb to 06f3e9c
manadart left a comment:
Some nits, but this is very cogent, and works perfectly.
```go
ShouldRestart: func(err error) bool {
	return true
},
RestartDelay: time.Second * 10,
```
Yeah, probably, I really want to get juju/worker#41 in at some point, which gives us the best of both worlds.
All our workers restart after 10 seconds... I'm happy to revisit this as a whole.
force-pushed from 5ef5600 to 12ea488
```go
// We need to compute the sha256 hash here; juju by default uses SHA384,
// but s3 defaults to SHA256.
// If the reader is a Seeker, then we can seek back to the beginning of
// the file, so that we can read it again.
```
Feels like the object store model should compute and store both when an object is first stored.
Even though we only index by sha384, having the other values precomputed and stored in metadata can prove useful, like here.
That's a big change; happy to make it. I'll do it in a follow-up.
```go
func (w *drainWorker) Report() map[string]any {
	return map[string]any{
		"namespace":  w.namespace,
		"rootBucket": w.rootBucket,
```
Nice to have, but can we please get some statistics here?

- Objects drained.
- Objects to be drained.
- Current object draining (hash, size).

This will help very much in seeing whether the drainer is stuck or just slowly progressing.
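A sketch of what that report could look like. The `drainWorker` type and its counter fields here are illustrative stand-ins that the drain loop would update as it goes, not Juju's real implementation:

```go
package main

import "fmt"

// drainWorker is a standalone stand-in showing the shape a Report() could
// take with drain statistics. The counter fields are assumed bookkeeping
// updated by the drain loop.
type drainWorker struct {
	namespace   string
	rootBucket  string
	drained     int
	toDrain     int
	currentHash string
	currentSize int64
}

func (w *drainWorker) Report() map[string]any {
	return map[string]any{
		"namespace":        w.namespace,
		"rootBucket":       w.rootBucket,
		"objects-drained":  w.drained,
		"objects-to-drain": w.toDrain,
		"current-object":   fmt.Sprintf("%s (%d bytes)", w.currentHash, w.currentSize),
	}
}

func main() {
	w := &drainWorker{
		namespace: "model-1", rootBucket: "juju",
		drained: 3, toDrain: 7,
		currentHash: "abc123", currentSize: 2048,
	}
	fmt.Println(w.Report())
}
```

With the counts in the report, a stuck drainer (counts frozen) is distinguishable at a glance from a slow one (counts creeping).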
This is the very bare bones for the objectstore drainer.
We require this because we need to migrate all objects for every model.
The drainer needs to operate on a model namespace and not globally.
This begins the wire up of the manifold. This will allow us to start testing the draining implementation.
We're still missing the draining tests, but that will follow. We now handle models being drained.
We need to ensure we have all the dependencies in the worker manifold when we create the worker.
This reuses the drainer tests from the prior implementation that worked and moved them to this location.
Stop the agentconfigupdater bouncing the objectstore when it detects a different s3 type. This is brute force; we want to drain before the bounce.
When the objectstore has finished draining, we need to flush the objectstore. This does that before completing.
When draining from the file objectstore to the s3 objectstore, we need to drain the files and reload the objectstore workers. This wires everything up so it all slots together nicely.
If there is an error setting the objectstore type, handle that error.
Now that we need to consult the objectstore type every time the worker comes up, we need to account for that in tests.
When we get a change request, record the objectstore type, otherwise it won't be correctly written.
Return an error with a not supported tag, not a NotSupportedf one with a trailing string.
It's easier to read the non-generic type than the generic one.
If the worker already exists, we'll hang and prevent the worker from completing. This causes no end of issues. The solution is to crash the worker and cause it to start again.
If we end up with duplicated namespaces it will cause issues; prevent that from happening.
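The last commit's guard can be sketched as below; `dedupeNamespaces` is an illustrative helper, not code from this changeset:

```go
package main

import "fmt"

// dedupeNamespaces collapses the namespace list to unique entries,
// preserving order. Draining the same namespace twice would register
// duplicate workers, which is exactly the issue the guard prevents.
func dedupeNamespaces(namespaces []string) []string {
	seen := make(map[string]struct{}, len(namespaces))
	out := make([]string, 0, len(namespaces))
	for _, ns := range namespaces {
		if _, ok := seen[ns]; ok {
			continue
		}
		seen[ns] = struct{}{}
		out = append(out, ns)
	}
	return out
}

func main() {
	fmt.Println(dedupeNamespaces([]string{"controller", "model-a", "controller", "model-b"}))
}
```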
force-pushed from 12ea488 to 20742f3
Check the objectstore type in the change-config closure, where we can guarantee that it doesn't change whilst we set it.
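The point is that the check and the write happen under the same lock, so there is no window between them. A toy sketch of the pattern; the `config` type here is a stand-in, not Juju's agent config:

```go
package main

import (
	"fmt"
	"sync"
)

// config is a toy stand-in for the agent config. The pattern is the same
// as the change-config closure: re-read the objectstore type inside the
// locked section so it cannot change between the check and the write.
type config struct {
	mu              sync.Mutex
	objectStoreType string
}

// SetObjectStoreType writes the new type only if it differs, reporting
// whether a change was made. Because the comparison happens under the same
// lock as the write, the check-then-set is atomic.
func (c *config) SetObjectStoreType(t string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.objectStoreType == t {
		return false
	}
	c.objectStoreType = t
	return true
}

func main() {
	c := &config{objectStoreType: "file"}
	fmt.Println(c.SetObjectStoreType("s3"), c.SetObjectStoreType("s3"))
}
```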
/merge
…iner #20363

~~Requires #20222 to land first.~~

----

> [!NOTE]
> Fixes one of the issues from the parent change request.

In order to move into HA, you need the agents moving. This changeset does that. It was originally left out of the objectstore drainer as that was becoming rather large. Draining the agents' objectstore is actually rather trivial. We don't allow it to run in parallel with other models, just because that is hard to manage; instead, we do it upfront and drain it before moving on.

-----

This pull request introduces several enhancements and fixes related to object store management, controller state services, and database schema consistency. The most significant changes include the addition of a new method to retrieve all model namespaces, improvements to object store interfaces and reporting, and updates to dependency wiring and schema constraints.

**Controller State and Service Improvements:**

* Added a new `GetModelNamespaces` method to the controller state and service layers, allowing retrieval of all model namespaces. This includes interface, implementation, and test coverage.
* The `ObjectStoreServices` struct now exposes a `Controller()` method to access the controller service, facilitating service composition.

**Object Store Enhancements:**

* Introduced the `ObjectStoreFlusher` interface for flushing object store workers.
* Promoted `selectFileHash` to `SelectFileHash` for broader accessibility and updated all usages accordingly.
* Improved the `Report()` methods for both file and S3 object stores to include a `type` field, aiding diagnostics and introspection.

**Dependency and Manifold Wiring:**

* Expanded the configuration and dependency injection for the `object-store-drainer` manifold, adding several new dependencies and configuration fields to support more robust operation.

**Database Schema Fixes:**

* Corrected the values and foreign key constraint in the `object_store_drain_phase_type` table to ensure referential integrity and accurate phase representation.

**Cleanup and Refactoring:**

* Removed unused or redundant code related to object store type changes in the agent config updater, simplifying logic and reducing unnecessary agent restarts.

These changes collectively improve the maintainability, observability, and correctness of the object store and controller service subsystems.

## QA steps

Setup s3 (I'm using minio, but anything will suffice, even another Juju model with an s3 compatible application):

```sh
docker run -d \
  -p 9000:9000 \
  -p 9001:9001 \
  --user $(id -u):$(id -g) \
  --name minio1 \
  -e "MINIO_ROOT_USER=ROOTUSER" \
  -e "MINIO_ROOT_PASSWORD=CHANGEME123" \
  -v /home/jenkins/.local/minio/data:/data \
  quay.io/minio/minio server /data --console-address ":9001"
```

minio is now a community edition, so the browser version is basic.

```sh
$ docker exec -it <container> bash
$ mc alias set foo http://localhost:9000 ROOTUSER CHANGEME123
$ mc admin info foo
$ mc admin accesskey create foo
# Note down the access keys
```

Bootstrap:

> [!NOTE]
> Don't use localhost for the container address, otherwise it can't access it. Use your own address from `ip a`.

```sh
$ juju bootstrap lxd test
$ juju add-model default
$ juju deploy juju-qa-test
```

Take note of the controller uuid; you can inspect that model in the minio CLI/dashboard.

```sh
juju controller-config "object-store-type=s3" "object-store-s3-endpoint=<endpoint>" "object-store-s3-static-key=<key>" "object-store-s3-static-secret=<secret>"
```

Move into HA...

```sh
$ juju add-unit -m controller "controller" -n 2
```
Requires #20291 to land first.

> [!NOTE]
> We require #20291 to land first as it re-writes the objectstore tests to better handle the test worker. That makes it a lot cleaner to apply the fixes to the tests on top of it.
This wires up the draining of models from the file backed objectstore to the s3 backed objectstore. It sits on top of all the work that took place at the sprint: the objectstore fortress, drainer, and facade. Once we spot a draining in progress, we use the raw underlying types:
There is complexity in ensuring the ordering of the process: each model must be drained from the files to s3, and then, importantly, all of the existing objectstore workers must be flushed so that they pick up the new s3 configuration. Once that is done, we complete the process and unlock the guard.
If there is an error during the process, we report the error and require manual intervention.
It is required that all files are backed up before performing this process, as once a file has been drained, it is removed from the prior location.
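The ordering described above can be sketched as follows; the function signatures are illustrative, not Juju's real interfaces:

```go
package main

import "fmt"

// drainAll drains every model namespace first, then flushes all existing
// objectstore workers so they pick up the new s3 configuration, and only
// then unlocks the guard. Any failure aborts early, leaving the guard
// locked for manual intervention.
func drainAll(namespaces []string, drain func(string) error, flush func() error, unlock func()) error {
	for _, ns := range namespaces {
		if err := drain(ns); err != nil {
			return fmt.Errorf("draining %q: %w", ns, err)
		}
	}
	if err := flush(); err != nil {
		return fmt.Errorf("flushing objectstore workers: %w", err)
	}
	unlock()
	return nil
}

func main() {
	var steps []string
	err := drainAll([]string{"controller", "model-a"},
		func(ns string) error { steps = append(steps, "drain:"+ns); return nil },
		func() error { steps = append(steps, "flush"); return nil },
		func() { steps = append(steps, "unlock") },
	)
	fmt.Println(err, steps)
}
```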
> [!NOTE]
> This currently doesn't drain the controller of agents; that'll follow in another PR. Trying to move into HA after the draining process will fail, as there are no file backed object stores.
> [!WARNING]
> This doesn't allow s3 to s3 storage; we require changes to how we store the underlying data. This requires modifications to the controller charm.
This pull request introduces several updates to enhance object store management and expand functionality for retrieving model namespaces. Key changes include improvements to the object store drainer configuration, the addition of a new `GetModelNamespaces` method, and updates to object store metadata handling.

**Object Store Enhancements:**

* Expanded `objectstoredrainer.ManifoldConfig` in `manifolds.go` to include additional dependencies and services, such as `GetControllerService`, `GetGuardService`, and `NewDrainerWorker`, to improve object store draining capabilities. (`cmd/jujud-controller/agent/machine/manifolds.go`)
* Added the `ObjectStoreFlusher` interface for flushing object store workers. (`core/objectstore/objectstore.go`)
* Introduced the `SelectFileHash` function to replace the private `selectFileHash` function, ensuring consistent hash selection across object store implementations. (`internal/objectstore/base.go`; `internal/objectstore/fileobjectstore.go`; `internal/objectstore/s3objectstore.go`)

**Model Namespace Retrieval:**

* Added the `GetModelNamespaces` method to the `State` interface and implemented it in the `State` struct, allowing retrieval of all model namespaces from the database. (`domain/controller/service/service.go`; `domain/controller/state/state.go`)
* Added tests for the `GetModelNamespaces` method, including scenarios where no namespaces are found. (`domain/controller/service/service_test.go`; `domain/controller/state/state_test.go`)

**Schema and Dependency Updates:**

* Updated `object_store_drain_phase_type` to correct phase type ordering and added a foreign key constraint for `phase_type_id`. (`domain/schema/controller/sql/0014-objectstore-metadata.sql`)
* Updated `manifolds_test.go` to reflect new requirements for the `object-store-drainer` manifold. (`cmd/jujud-controller/agent/machine/manifolds_test.go`)

These changes collectively improve the robustness and functionality of the system, particularly in managing object store operations and retrieving model-related data.
## QA steps
Setup s3 (I'm using minio, but anything will suffice, even another Juju model with an s3 compatible application):
minio is now a community edition, so the browser version is basic.
Bootstrap:
> [!NOTE]
> Don't use localhost for the container address, otherwise it can't access it. Use your own address from `ip a`.

```sh
$ juju bootstrap lxd test
$ juju add-model default
$ juju deploy ubuntu
```

Take note of the controller uuid; you can inspect that model in the minio CLI/dashboard.
Watch the logs for `object store draining completed successfully`.

Make sure that the blobs have been removed:
There should only be the controller left.
Also check that the objectstore workers were flushed: