Skip to content

Raftsimple (copy of raftexample) - Simple key-value store example #376

Open
ntdkhiem wants to merge 22 commits intoetcd-io:mainfrom
ntdkhiem:raftsimple
Open

Raftsimple (copy of raftexample) - Simple key-value store example #376
ntdkhiem wants to merge 22 commits intoetcd-io:mainfrom
ntdkhiem:raftsimple

Conversation

@ntdkhiem
Copy link

@ntdkhiem ntdkhiem commented Jan 30, 2026

It has been over a year since the last PR for issue #2, so I just wanted to take a crack at it for the sake of my own learning.

The goal is to provide a clean, minimal example of implementing a key-value store with Raft. I have stripped down the complexity of the existing raftexample from etcd. While I initially removed complex persistence and transport layers to focus purely on the core Raft flow, I have now introduced a simple network layer using net/http and implemented basic WAL persistence. Thanks to @mhagger's suggestions in etcd-io/etcd#15471, the core implementation remains much simpler and easier for me to follow.

This PR is now ready to be reviewed. The basic node communication is working, and I have completed the following:

[X] Re-implemented a basic snapshotting mechanism.

[X] Added basic WAL persistence.

[X] Implemented a simple network layer with net/http.

[X] Added tests.

As a first-time contributor to this project, I am very open to feedback on the structure and approach. Does this simplified design align with what you envisioned @ahrtr?

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ntdkhiem
Once this PR has been reviewed and has the lgtm label, please assign spzala for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link

Hi @ntdkhiem. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ntdkhiem ntdkhiem force-pushed the raftsimple branch 2 times, most recently from e91e416 to aeefc80 Compare February 3, 2026 02:14
@ntdkhiem ntdkhiem marked this pull request as ready for review February 3, 2026 02:16
Signed-off-by: Khiem Nguyen <[email protected]>
@ntdkhiem
Copy link
Author

@ahrtr I fully understand that this issue is very low in priority for you right now. If I may, what are something I can do to help facilitating your reviewing process?

  • Is there too many etcd-related context in this PR that I need to change?
  • Do I need to enhance the snapshotting functionality or any other features to make this PR a legit one?
  • Do I need to follow certain standards or methodologies?
  • Should I create a quick documentation explaining how I use raft in this PR?
  • Should I add more unit and integration test cases to test the overall program?
  • Any other things you have in mind?

@ahrtr
Copy link
Member

ahrtr commented Feb 11, 2026

Sorry, did not get time to review this recently.

Before I review it sometime later, can you please

  • ensure all the workflow green
  • show me the steps on how you verify the example locally

@ahrtr
Copy link
Member

ahrtr commented Feb 11, 2026

pls resolve the Go Vulnerability in a separate PR, once it's merged, then rebase this PR. thx

@ntdkhiem
Copy link
Author

ntdkhiem commented Feb 11, 2026

Wait...This is weird. I accidentally pushed a new commit to resolve the vulnerability before reading your latest comment. But, when I reverted the commit and forced push, it still passed the Go Vulnerability Checker / test. Should I make a separate PR anyway?

@ahrtr
Copy link
Member

ahrtr commented Feb 11, 2026

Should I make a separate PR anyway?

yes, pls

@ntdkhiem
Copy link
Author

ntdkhiem commented Feb 11, 2026

Sorry, did not get time to review this recently.

Before I review it sometime later, can you please

  • ensure all the workflow green
  • show me the steps on how you verify the example locally

Totally understandable!

The project is located in contrib/raftsimple. After you built the project, running it will spin up 3 Raft nodes by default (but you can specify the number of nodes with -nodes). Each Raft node will have a key-value store API, exposing at port 9120+nodeID (where nodeID is one-based indexing)

The API allows four methods: GET, POST, PUT, DELETE to either retrieve/create a key or add/delete a Raft node.

On a separate terminal:

curl http://127.0.0.1:9121/foo -XPUT -d bar

will add/replace the key foo with the value bar to the store.

curl http://127.0.0.1:9121/foo

will retrieve the value from key foo. Will throw Failed to GET if the key doesn't exist.

curl http://127.0.0.1:9121/4 -XPOST

will create a new Raft node and key-value store API, exposing at port 9124. Will throw Failed to POST if the target node is already existed.

curl http://127.0.0.1:9121/1 -XDELETE

will delete Raft node 1, along with its key-value store API. That means port 9121 should be no longer available.

curl http://127.0.0.1:9122/1 -XPOST

will revive Raft node 1, kicking off its snapshot reloading, along with its key-value store API at port 9121.

During executing these commands, you can monitor the main terminal to see Raft in effect.

@ntdkhiem
Copy link
Author

pls resolve the Go Vulnerability in a separate PR, once it's merged, then rebase this PR. thx

Sorry, I'm quite confused about your comment. The vulnerability is found in the Go standard library that I use in my code. A fix would be to bump up Go version from 1.25.6 to 1.25.7 in contrib/raftsimple/go.mod. If I make a PR for this, wouldn't it merge this entire project to the codebase?

Thank you in advance.

@ahrtr
Copy link
Member

ahrtr commented Feb 11, 2026

sorry, I mixed the raftexample/go.mod with the go.mod under the root directory. You need to bump your go.mod in this PR.

@ahrtr
Copy link
Member

ahrtr commented Feb 12, 2026

I see that you reuse most of the existing source code of raftexample, which is good.

  • Please follow exactly the same usage as raftexample. e.g remove the flag --nodes. Read the readme.
  • Essentially we need to reuse raftexample, and just to replace the etcd stuff (e.g. snap, wal etc.) with a simplified implementation.
  • I don't mind multiple commits in one PR, but ensure they are well organized (try to remove some unnecessary intermediate commits)
  • add a readme for this example. I expect you should be able to reuse raftexample/README.md with minimal change.

@ntdkhiem
Copy link
Author

ntdkhiem commented Feb 17, 2026

@ahrtr , I’ve simplified the network layer to run all nodes within a single process using in-memory communication.

If following the exact usage, the raftexample runs on multiple processes, communicating via TCP. I will have to rely on rafthttp from etcd (defeats the purpose of no etcd-related context) or refactor the network layer to use TCP connection.

Am I required to do that or is there some flexibility in this layer? I thought this was supposed to only demonstrate Raft and make sure everything is as simple as possible. Please advice.

PS: Thanks for reminding me. I will clean up the git history and add README once everything is good to go.

@ntdkhiem
Copy link
Author

@ahrtr a quick nudge.

@ahrtr
Copy link
Member

ahrtr commented Feb 21, 2026

If following the exact usage, the raftexample runs on multiple processes, communicating via TCP. I will have to rely on rafthttp from etcd (defeats the purpose of no etcd-related context) or refactor the network layer to use TCP connection.

HTTP/TCP transport isn’t the main challenge. The main thing is to implement the snapshot & wal stuff correctly, but with minimal functionality so that we don't get stuck too much in the k/v database details.

We wanted to implement a self-contained raft example in the first place, which means it shouldn't depend on etcd. But since since it's just the example only which depends on etcd, and also I may not have enough bandwidth to followup this. If it's too difficult for you to defeat the etcd stuff (e.g. snap, wal etc), I am OK to just copy the whole raftexample into raft for now. But please note that the snap package will be deprecated and removed eventually, so you need to implement a simplified equivalent version or just copy that source code into the example.

@ntdkhiem
Copy link
Author

@ahrtr, thank you for letting me know that HTTP/TCP is not an issue, and for sharing your priorities. I implemented basic snapshotting earlier in the process, and WAL is what I'm tackling next. It isn't too much effort to implement the TCP protocol, but I just wanted to raise it so we can align on what this example should look like for a beginner getting started with Raft.

Khiem Nguyen added 2 commits March 1, 2026 20:11
Khiem Nguyen and others added 11 commits March 1, 2026 20:12
- POST method doens't work at the moment.

Signed-off-by: Khiem Nguyen <[email protected]>
- ref: etcd-io/etcd#15471

Signed-off-by: Khiem Nguyen <[email protected]>
Signed-off-by: Khiem Nguyen <[email protected]>
Signed-off-by: Khiem Nguyen <[email protected]>
- Resolve GO-2026-4337

Signed-off-by: Khiem Nguyen <[email protected]>
- When a node get deleted and restored, it attempts to replay all
messages which will also include the one entry removing itself. When
this happens, the node is removed and the rest of the cluster panic
because it couldn't send MsgHeartBeat to that node.

Signed-off-by: Khiem Nguyen <[email protected]>
Signed-off-by: Khiem Nguyen <[email protected]>
Signed-off-by: Khiem Nguyen <[email protected]>
ntdkhiem added 5 commits March 1, 2026 20:21
simulate creating, crashing, and recovery life cycle.

Signed-off-by: Khiem Nguyen <[email protected]>
Signed-off-by: Khiem Nguyen <[email protected]>
Signed-off-by: Khiem Nguyen <[email protected]>
- add test to ensure correct snapshot implementation

Signed-off-by: Khiem Nguyen <[email protected]>
@ntdkhiem
Copy link
Author

ntdkhiem commented Mar 2, 2026

Seems like implementing network layer with net/http isn't that big of a challenge.

I implemented the basic WAL and a simple network layer. I will do a bit more testing to make sure I'm not missing anything.

If you have some seconds to shed out of your calendar, what do you think about my current code base?

Signed-off-by: Khiem Nguyen <[email protected]>
@ntdkhiem ntdkhiem changed the title Raftsimple - Simple key-value store example Raftsimple (copy of raftexample) - Simple key-value store example Mar 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants