We record ZooKeeper-related stats because they are useful for troubleshooting.
Latency-related stats are especially valuable because they can indicate whether ZooKeeper message processing is a bottleneck.
TimeGauge.builder("replica.zk.latency", this, TimeUnit.MILLISECONDS,
                  self -> serverStats(self).getAvgLatency())
         .tag("type", "avg")
         .register(meterRegistry);
TimeGauge.builder("replica.zk.latency", this, TimeUnit.MILLISECONDS,
                  self -> serverStats(self).getMaxLatency())
         .tag("type", "max")
         .register(meterRegistry);
TimeGauge.builder("replica.zk.latency", this, TimeUnit.MILLISECONDS,
                  self -> serverStats(self).getMinLatency())
         .tag("type", "min")
         .register(meterRegistry);
It seems that the latency-related values are never reset (e.g. max latency is the maximum over all requests since server start).
It may be worth resetting them periodically via serverStats(self).resetLatency() so that we can visualize the latency of recent requests.
Note that, unlike Micrometer, which applies a recency bias, this is a hard reset, so it may be difficult to add alerts without false positives (although we weren't really looking at this metric before anyway).
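One possible way to soften the hard reset is to copy the values at the end of each window and only then reset, so the gauges always report the last complete window instead of a freshly cleared value. The sketch below is only an illustration under assumptions: `LatencySource`, `InMemoryLatencySource` and `WindowedLatency` are hypothetical names that do not exist in the code base, and `LatencySource` merely stands in for the latency accessors of ZooKeeper's `ServerStats`.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the latency accessors of ZooKeeper's ServerStats. */
interface LatencySource {
    long getMinLatency();
    long getMaxLatency();
    void resetLatency();
}

/** Hypothetical in-memory source, used only to make this sketch runnable. */
final class InMemoryLatencySource implements LatencySource {
    private long min = Long.MAX_VALUE;
    private long max;

    synchronized void record(long latencyMillis) {
        min = Math.min(min, latencyMillis);
        max = Math.max(max, latencyMillis);
    }

    @Override
    public synchronized long getMinLatency() {
        // Report 0 when nothing has been recorded in the current window.
        return min == Long.MAX_VALUE ? 0 : min;
    }

    @Override
    public synchronized long getMaxLatency() {
        return max;
    }

    @Override
    public synchronized void resetLatency() {
        min = Long.MAX_VALUE;
        max = 0;
    }
}

/** Keeps the values of the last complete window, then hard-resets the source. */
final class WindowedLatency {
    private final LatencySource source;
    private final AtomicLong lastMax = new AtomicLong();
    private final AtomicLong lastMin = new AtomicLong();

    WindowedLatency(LatencySource source) {
        this.source = source;
    }

    /**
     * Intended to be called on a fixed schedule (e.g. once per minute).
     * Requests recorded between the reads and the reset are not captured.
     */
    void rotate() {
        lastMax.set(source.getMaxLatency());
        lastMin.set(source.getMinLatency());
        source.resetLatency();
    }

    long maxOfLastWindow() {
        return lastMax.get();
    }

    long minOfLastWindow() {
        return lastMin.get();
    }
}
```

With something like this, the gauges would read `maxOfLastWindow()` and `minOfLastWindow()` rather than calling `serverStats(self)` directly, and `rotate()` would run from a scheduled task. The window length is a trade-off: a short window tracks recent requests closely but makes the values noisier for alerting.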
Code referenced above: centraldogma/server/src/main/java/com/linecorp/centraldogma/server/internal/replication/EmbeddedZooKeeper.java, lines 89 to 100 in 923af4f