Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness of the storage backend by introducing a comprehensive suite of end-to-end tests. These tests validate critical aspects of the storage demotion pipeline, ensuring data integrity and accessibility even when memory replicas are evicted or client processes fail. The new test framework utilizes an in-process master, simplifying test orchestration and improving reliability. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable end-to-end test suite for the storage backend, which is crucial for validating the storage demotion pipeline. The tests cover key scenarios including offload/readback, cross-client access, eviction, and fault tolerance. The implementation is solid, but I have a few suggestions to enhance code quality, correctness, and test robustness. Specifically, there's a bug in the client wrapper creation where a parameter is ignored, a test that could be made stricter, and opportunities to improve code consistency and reduce duplication.
| auto client_opt = Client::Create(hostname, metadata_connstring, protocol, | ||
| std::nullopt, master_server_entry); |
There was a problem hiding this comment.
The device_name parameter is unused in CreateClientWrapper. It should be passed to Client::Create to properly configure the client with the specified device. The current implementation hardcodes std::nullopt, which ignores the device name provided by the caller in the new test.
| auto client_opt = Client::Create(hostname, metadata_connstring, protocol, | |
| std::nullopt, master_server_entry); | |
| auto client_opt = Client::Create(hostname, metadata_connstring, protocol, | |
| device_name, master_server_entry); |
| auto err = reader->GetWithExpectedSize(keys[i], kValueSize, got); | ||
| if (err == ErrorCode::OK) { | ||
| EXPECT_EQ(got, expected_values[i]) | ||
| << "Data mismatch for " << keys[i]; | ||
| } else { | ||
| LOG(WARNING) | ||
| << "Key " << keys[i] << " not readable after writer " | ||
| << "death: " << toString(err); | ||
| } |
There was a problem hiding this comment.
The test DiskReplicaSurvivesWriterDeath should fail if reading from the disk replica fails after the writer's death. Currently, it only logs a warning, which could mask a regression. It's better to use ASSERT_EQ to ensure the read operation is successful, making the test more robust.
std::string got;
ASSERT_EQ(reader->GetWithExpectedSize(keys[i], kValueSize, got), ErrorCode::OK)
<< "Failed to read key " << keys[i] << " after writer death";
EXPECT_EQ(got, expected_values[i])
<< "Data mismatch for " << keys[i];| bool ClientTestWrapper::HasDiskReplica(const std::string& key) { | ||
| auto query_result = client_->Query(key); | ||
| if (!query_result.has_value()) return false; | ||
| for (const auto& replica : query_result.value().replicas) { | ||
| if (replica.is_disk_replica()) return true; | ||
| } | ||
| return false; | ||
| } | ||
|
|
||
| bool ClientTestWrapper::HasLocalDiskReplica(const std::string& key) { | ||
| auto query_result = client_->Query(key); | ||
| if (!query_result.has_value()) return false; | ||
| for (const auto& replica : query_result.value().replicas) { | ||
| if (replica.is_local_disk_replica()) return true; | ||
| } | ||
| return false; | ||
| } | ||
|
|
||
| bool ClientTestWrapper::HasMemoryReplica(const std::string& key) { | ||
| auto query_result = client_->Query(key); | ||
| if (!query_result.has_value()) return false; | ||
| for (const auto& replica : query_result.value().replicas) { | ||
| if (replica.is_memory_replica()) return true; | ||
| } | ||
| return false; | ||
| } |
There was a problem hiding this comment.
These three functions (HasDiskReplica, HasLocalDiskReplica, HasMemoryReplica) have very similar implementations. You can reduce code duplication and improve readability by using std::any_of. This would make the code more concise and maintainable. Note that you might need to include <algorithm>.
bool ClientTestWrapper::HasDiskReplica(const std::string& key) {
auto query_result = client_->Query(key);
if (!query_result.has_value()) return false;
const auto& replicas = query_result.value().replicas;
return std::any_of(replicas.begin(), replicas.end(),
[](const auto& r) { return r.is_disk_replica(); });
}
bool ClientTestWrapper::HasLocalDiskReplica(const std::string& key) {
auto query_result = client_->Query(key);
if (!query_result.has_value()) return false;
const auto& replicas = query_result.value().replicas;
return std::any_of(replicas.begin(), replicas.end(),
[](const auto& r) { return r.is_local_disk_replica(); });
}
bool ClientTestWrapper::HasMemoryReplica(const std::string& key) {
auto query_result = client_->Query(key);
if (!query_result.has_value()) return false;
const auto& replicas = query_result.value().replicas;
return std::any_of(replicas.begin(), replicas.end(),
[](const auto& r) { return r.is_memory_replica(); });
}| } | ||
|
|
||
| // Wait for client TTL expiry so master cleans up the writer. | ||
| sleep(DEFAULT_CLIENT_LIVE_TTL_SEC + 5); |
There was a problem hiding this comment.
For consistency with the rest of the file, which uses std::this_thread::sleep_for, consider replacing the C-style sleep() call with its modern C++ equivalent. This improves code consistency and type safety.
| sleep(DEFAULT_CLIENT_LIVE_TTL_SEC + 5); | |
| std::this_thread::sleep_for(std::chrono::seconds(DEFAULT_CLIENT_LIVE_TTL_SEC + 5)); |
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Thanks a lot. This is a very thorough test! We will run it locally to conduct more detailed testing of the storage backend. |
Description
Implemented a new storage_backend_e2e_test that validates the storage demotion pipeline using an in-process master (InProcMaster) instead of external master + etcd orchestration.
Key changes:
Added storage_backend_e2e_test target and test registration in CMakeLists.txt.
Added new E2E coverage in storage_backend_e2e_test.cpp for:
offload and readback
cross-client readback
disk-only read after eviction
disk replica survival after writer exit
graceful behavior under quota pressure
Extended client_wrapper.h and client_wrapper.cpp with replica inspection helpers (HasDiskReplica, HasMemoryReplica, etc.) and GetWithExpectedSize to support disk-only assertions.
Stabilized eviction coverage by enforcing valid mount sizing and asserting eviction behavior in a way that matches the implementation (at least one expired seed key evicted, then validated readable from disk).
Module
mooncake-transfer-engine)mooncake-store)mooncake-ep)mooncake-integration)mooncake-p2p-store)mooncake-wheel)mooncake-pg)mooncake-rl)Type of Change
How Has This Been Tested?
This change itself is about e2e tests for testing storage backend.
Checklist
./scripts/code_format.shbefore submitting.