Description
What happened?
I am using Acto to test the Strimzi Kafka operator. Below is my config.json:
{
  "deploy": {
    "steps": [
      {
        "apply": {
          "file": "data/strimzi-kafka-operator/bundle.yaml",
          "operator": true
        }
      }
    ]
  },
  "crd_name": "kafkas.kafka.strimzi.io",
  "seed_custom_resource": "data/strimzi-kafka-operator/cr.yaml",
  "analysis": {
    "github_link": "https://github.com/strimzi/strimzi-kafka-operator.git",
    "commit": "ef60183b123245490900dd103a0cf2e15a4f5d3e",
    "entrypoint": null,
    "type": "Kafka",
    "package": "github.com/kapi/src/main/java/io/strimzi/api/kafka"
  }
}
Alarm 1:
Acto reports an inconsistency on status.observedGeneration. It tried to change the value at path status.observedGeneration from 3 to 0 on the Kafka cluster, but the change was not reflected in the system state.
"crash": null,
"health": null,
"operator_log": null,
"consistency": {
  "message": "Found no matching fields for input",
  "input_diff": {
    "prev": 3,
    "curr": 0,
    "path": {
      "path": [
        "status",
        "observedGeneration"
      ]
    }
  },
  "system_state_diff": null
},
"differential": null,
"custom": null
Alarm 2:
This alarm is caused by a misoperation vulnerability in the Kafka operator.
"crash": {
"message": "Pod test-cluster-kafka-0 crashed"
},
This alarm shows that the Kafka cluster crashed. Acto set spec.kafka.authorization.type to custom and added spec.kafka.authorization.tokenEndpointUri to the Kafka cluster's CR.
"dictionary_item_added": {
"root['spec']['kafka']['authorization'][type]": {
"prev": "NotPresent",
"curr": "custom",
"path": {
"path": [
"spec",
"kafka",
"authorization",
"type"
]
}
},
"root['spec']['kafka']['authorization'][tokenEndpointUri]": {
"prev": "NotPresent",
"curr": "ACTOKEY",
"path": {
"path": [
"spec",
"kafka",
"authorization",
"tokenEndpointUri"
]
}
}
}
}
What did you expect to happen?
Alarm 1:
Here we can see that status.observedGeneration is created/updated after a reconciliation of the Kafka cluster: https://github.com/strimzi/strimzi-kafka-operator/blob/ef60183b123245490900dd103a0cf2e15a4f5d3e/cluster-operator/src/main/java/io/strimzi/operator/cluster/operator/assembly/KafkaAssemblyOperator.java#L150
It is a system-managed field, so manually updating it does not trigger a reconciliation. No status field is passed into the KafkaReconciler function: https://github.com/strimzi/strimzi-kafka-operator/blob/ef60183b123245490900dd103a0cf2e15a4f5d3e/cluster-operator/src/main/java/io/strimzi/operator/cluster/operator/assembly/KafkaReconciler.java#L180C5-L192C24
Alarm 2:
Here we can see that the authorization type 'custom' does not have a tokenEndpointUri property: https://github.com/strimzi/strimzi-kafka-operator/blob/ef60183b123245490900dd103a0cf2e15a4f5d3e/api/src/main/java/io/strimzi/api/kafka/model/KafkaAuthorizationCustom.java#L27
Only the type 'keycloak' has a tokenEndpointUri property: https://github.com/strimzi/strimzi-kafka-operator/blob/ef60183b123245490900dd103a0cf2e15a4f5d3e/api/src/main/java/io/strimzi/api/kafka/model/KafkaAuthorizationKeycloak.java#L26
This indicates that the configuration is invalid. The operator should reject this kind of erroneous desired state.
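To make the expected rejection concrete, here is a minimal Python sketch of the kind of check I have in mind. It is not the operator's code (the operator is written in Java), and the names FIELD_ALLOWED_TYPES and validate_authorization are hypothetical. The only rule it encodes is the one visible in the linked model classes: tokenEndpointUri belongs to the keycloak type, not to custom.

```python
from typing import Any, Dict, List, Set

# Hypothetical rule table: which authorization types may carry a given field.
# Only the rule grounded in the linked model classes is listed here.
FIELD_ALLOWED_TYPES: Dict[str, Set[str]] = {
    "tokenEndpointUri": {"keycloak"},
}


def validate_authorization(authorization: Dict[str, Any]) -> List[str]:
    """Return a list of error messages; an empty list means no rule was violated."""
    errors: List[str] = []
    auth_type = authorization.get("type")
    for field, allowed_types in FIELD_ALLOWED_TYPES.items():
        if field in authorization and auth_type not in allowed_types:
            errors.append(
                f"field '{field}' is not valid for authorization type '{auth_type}'"
            )
    return errors


# The desired state from Alarm 2 would be rejected by this check.
alarm_2_authorization = {"type": "custom", "tokenEndpointUri": "ACTOKEY"}
print(validate_authorization(alarm_2_authorization))
```

A check like this would surface the error at admission or at the start of reconciliation, before any broker pod is rolled.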
Root Cause
Alarm 1:
Thus, this is a false alarm. The operator's behavior is correct: it did not update the system state because status.observedGeneration is a system-managed field, and changing it does not trigger a rolling update.
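For triaging alarms of this kind, a small Python sketch along the following lines could flag consistency alarms whose input path lies under the status subtree. This is not part of Acto; the function names and the triage labels are hypothetical, and the dictionary layout simply mirrors the alarm output shown above.

```python
from typing import Any, Dict, List, Optional


def is_status_path(path: List[Any]) -> bool:
    """True if the field path starts at the top-level 'status' subtree."""
    return bool(path) and path[0] == "status"


def triage_consistency_alarm(consistency: Optional[Dict[str, Any]]) -> str:
    """Classify a consistency alarm as 'none', 'likely-false', or 'needs-review'."""
    if consistency is None:
        return "none"
    input_diff = consistency.get("input_diff") or {}
    path = (input_diff.get("path") or {}).get("path") or []
    # Status fields are written by the operator, so a user-side change to them
    # is not expected to propagate to the system state.
    if is_status_path(path):
        return "likely-false"
    return "needs-review"


# The consistency section of Alarm 1, copied from the report above.
alarm_1_consistency = {
    "message": "Found no matching fields for input",
    "input_diff": {
        "prev": 3,
        "curr": 0,
        "path": {"path": ["status", "observedGeneration"]},
    },
    "system_state_diff": None,
}
print(triage_consistency_alarm(alarm_1_consistency))
```

This only marks the alarm as a likely false positive; a human should still confirm it against the operator source, as done above.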
Alarm 2:
This is a true alarm. Acto applied an invalid configuration for spec.kafka.authorization that does not properly configure custom authorization. This made all Kafka broker pods unavailable and left the whole cluster non-functional. In the end, the cluster crashed because it was unable to recover from the error state.