feat: Raft-based High Availability using Apache Ratis #3731

Open
robfrank wants to merge 164 commits into main from ha-redesign

@robfrank
Collaborator

Summary

Redesigns the ArcadeDB HA stack using Apache Ratis (Raft consensus) for leader election, log replication, and cluster management. Replaces the custom replication protocol with proven consensus semantics.

Closes #3730

What's included

New module: ha-raft

  • RaftHAPlugin — server plugin with ServiceLoader discovery
  • RaftHAServer — wraps Ratis RaftServer/RaftClient, peer list parsing, leader election
  • RaftReplicatedDatabase — intercepts TX commit and schema changes, submits to Raft log
  • ArcadeStateMachine — applies committed entries (WAL replay + schema ops) on each node
  • RaftLogEntryCodec — serializes TX and SCHEMA entries for the Raft log
  • SnapshotManager — checksum-based incremental snapshot for node bootstrap
  • ClusterMonitor — replication lag tracking and health reporting
  • GetClusterHandler: serves the /api/v1/cluster endpoint for cluster status
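
To make the codec's role concrete, here is a minimal sketch of how TX and SCHEMA entries could be framed for the Raft log. The class name `LogEntrySketch`, the type codes, and the byte layout are all assumptions made for illustration; the actual `RaftLogEntryCodec` format is not described in this summary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch, not the PR's RaftLogEntryCodec: one framed entry per Raft log record.
final class LogEntrySketch {
  enum Type { TX, SCHEMA }

  final Type type;
  final String database;
  final byte[] payload; // e.g. WAL bytes for TX, a serialized schema operation for SCHEMA

  LogEntrySketch(final Type type, final String database, final byte[] payload) {
    this.type = type;
    this.database = database;
    this.payload = payload;
  }

  // Assumed layout: [1 byte type][UTF database name][4 byte payload length][payload]
  byte[] encode() {
    final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeByte(type.ordinal());
      out.writeUTF(database);
      out.writeInt(payload.length);
      out.write(payload);
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Reads the fields back in the same order and rejects unknown type codes.
  static LogEntrySketch decode(final byte[] raw) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
      final int code = in.readUnsignedByte();
      if (code >= Type.values().length)
        throw new IllegalArgumentException("Unknown entry type code: " + code);
      final String database = in.readUTF();
      final byte[] payload = new byte[in.readInt()];
      in.readFully(payload);
      return new LogEntrySketch(Type.values()[code], database, payload);
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A leading type byte lets the state machine dispatch on the entry kind before it touches the payload, which is one common way to keep WAL replay and schema operations on separate code paths.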

New module: e2e-ha

End-to-end container tests using TestContainers + Toxiproxy for realistic cluster scenarios.

Key features

  • Automatic leader election via Raft protocol
  • Majority quorum enforcement — prevents split-brain by design
  • Leader command forwarding (followers transparently proxy writes)
  • Crash recovery via Raft log replay
  • Snapshot-based resync for lagging nodes
  • Static peer list configuration (host:raftPort:httpPort:priority)
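
The peer format and the majority rule listed above can be sketched as follows. This is an illustration only: the class `PeerListSketch`, the comma separator between peers, and the requirement that all four fields be present are assumptions, and the code is not taken from `RaftHAServer`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of parsing the static peer list; not the PR's RaftHAServer code.
final class PeerListSketch {
  record Peer(String host, int raftPort, int httpPort, int priority) {}

  // Parses a comma-separated list of host:raftPort:httpPort:priority entries.
  // The comma separator is an assumption; IPv6 literals are not handled.
  static List<Peer> parse(final String peerList) {
    final List<Peer> peers = new ArrayList<>();
    for (final String entry : peerList.split(",")) {
      final String[] parts = entry.trim().split(":");
      if (parts.length != 4)
        throw new IllegalArgumentException("Expected host:raftPort:httpPort:priority but got '" + entry + "'");
      peers.add(new Peer(parts[0], Integer.parseInt(parts[1]), Integer.parseInt(parts[2]),
          Integer.parseInt(parts[3])));
    }
    return peers;
  }

  // A Raft majority is strictly more than half of the configured peers.
  static int majority(final int clusterSize) {
    return clusterSize / 2 + 1;
  }
}
```

With three peers the majority is two, so a partition that isolates a single node leaves that node unable to commit writes. That is the property behind the split-brain claim.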

Test plan

  • 25 unit tests (codec, state machine, peer parsing, config validation, snapshots)
  • 14 integration tests (2/3-node replication, failover, crash recovery, split-brain, quorum loss)
  • 9 e2e container tests (partitions, rolling restarts, network delay/packet loss, leader failover)
  • All tests compile and pass (mvn test -pl ha-raft and mvn compile test-compile -pl e2e-ha)

🤖 Generated with Claude Code

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a fundamental shift in ArcadeDB's high availability architecture by integrating Apache Ratis, a robust Raft consensus implementation. This change aims to provide stronger consistency guarantees, improved fault tolerance, and a more reliable distributed system. The new design replaces custom replication logic with a proven consensus algorithm, backed by extensive testing for various failure scenarios and network conditions.

Highlights

  • Raft-based High Availability: The ArcadeDB HA stack has been redesigned using Apache Ratis (Raft consensus) for leader election, log replication, and cluster management, replacing the custom replication protocol.
  • New ha-raft Module: A new Maven module (arcadedb-ha-raft) has been introduced, containing core components like RaftHAPlugin, RaftHAServer, ArcadeStateMachine, RaftReplicatedDatabase, RaftLogEntryCodec, SnapshotManager, and ClusterMonitor.
  • Enhanced Configuration: New global configuration properties (HA_IMPLEMENTATION, HA_RAFT_PORT, HA_REPLICATION_LAG_WARNING, HA_RAFT_PERSIST_STORAGE, HA_RAFT_SNAPSHOT_THRESHOLD, HA_CLUSTER_TOKEN) have been added to support and configure the Raft HA implementation.
  • Robust Testing Framework: A comprehensive suite of unit, integration, and end-to-end container tests (e2e-ha module) has been added to validate Raft HA under various scenarios, including leader failover, replica crash/recovery, network partitions, and packet loss, using TestContainers and Toxiproxy.
  • Improved Log Readability: Ratis internal logs are now suppressed, and human-readable cluster event messages, including leadership changes and replica lag monitoring, are logged using ArcadeDB server names.
  • Replica-to-Leader Auth Forwarding: A mechanism has been implemented to securely forward authenticated requests from replica nodes to the leader using a shared cluster-internal trust token, addressing issues with session-based authentication.

Ignored Files
  • Ignored by pattern: .github/workflows/** (2)
    • .github/workflows/e2e-ha.yml
    • .github/workflows/mvn-test.yml

@github-actions
Contributor

📜 License Compliance Check

✅ License check passed. See artifacts for full report.

License Summary (first 50 lines)

Lists of 402 third-party dependencies.
     (Apache License 2.0) LZ4 Java Compression (at.yawk.lz4:lz4-java:1.10.4 - https://github.com/yawkat/lz4-java)
     (EPL 2.0) (GNU Lesser General Public License) Logback Classic Module (ch.qos.logback:logback-classic:1.5.32 - http://logback.qos.ch/logback-classic)
     (EPL 2.0) (GNU Lesser General Public License) Logback Core Module (ch.qos.logback:logback-core:1.5.32 - http://logback.qos.ch/logback-core)
     (Apache 2) ArcadeDB BOLT Protocol (com.arcadedb:arcadedb-bolt:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-bolt/)
     (Apache 2) ArcadeDB Console (com.arcadedb:arcadedb-console:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-console/)
     (Apache 2) ArcadeDB Engine (com.arcadedb:arcadedb-engine:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-engine/)
     (Apache 2) ArcadeDB GraphQL (com.arcadedb:arcadedb-graphql:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-graphql/)
     (Apache 2) ArcadeDB Gremlin (com.arcadedb:arcadedb-gremlin:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-gremlin/)
     (Apache 2) ArcadeDB gRPC Stubs (com.arcadedb:arcadedb-grpc:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-grpc/)
     (Apache 2) ArcadeDB gRPC Client (com.arcadedb:arcadedb-grpc-client:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-grpc-client/)
     (Apache 2) ArcadeDB gRpcW (com.arcadedb:arcadedb-grpcw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-grpcw/)
     (Apache 2) ArcadeDB HA Raft (com.arcadedb:arcadedb-ha-raft:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-ha-raft/)
     (Apache 2) ArcadeDB Integration (com.arcadedb:arcadedb-integration:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-integration/)
     (Apache 2) ArcadeDB load tests (com.arcadedb:arcadedb-load-tests:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-load-tests/)
     (Apache 2) ArcadeDB Metrics (com.arcadedb:arcadedb-metrics:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-metrics/)
     (Apache 2) ArcadeDB MongoDB Wire Protocol (com.arcadedb:arcadedb-mongodbw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-mongodbw/)
     (Apache 2) ArcadeDB Network (com.arcadedb:arcadedb-network:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-network/)
     (Apache 2) ArcadeDB PostgresW (com.arcadedb:arcadedb-postgresw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-postgresw/)
     (Apache 2) ArcadeDB RedisW (com.arcadedb:arcadedb-redisw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-redisw/)
     (Apache 2) ArcadeDB Server (com.arcadedb:arcadedb-server:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-server/)
     (Apache 2) ArcadeDB Studio (com.arcadedb:arcadedb-studio:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-studio/)
     (Apache 2) ArcadeDB Test Utils (com.arcadedb:arcadedb-test-utils:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-test-utils/)
     (Apache License 2.0) HPPC Collections (com.carrotsearch:hppc:0.7.1 - http://labs.carrotsearch.com/hppc.html/hppc)
     (Apache License 2.0) Metrics Core (com.codahale.metrics:metrics-core:3.0.2 - http://metrics.codahale.com/metrics-core/)
     (The Apache License, Version 2.0) com.conversantmedia:disruptor (com.conversantmedia:disruptor:1.2.21 - https://github.com/conversant/disruptor)
     (Apache License 2.0) Jackson-annotations (com.fasterxml.jackson.core:jackson-annotations:2.20 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) Jackson-annotations (com.fasterxml.jackson.core:jackson-annotations:2.21 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) Jackson-core (com.fasterxml.jackson.core:jackson-core:2.21.1 - https://github.com/FasterXML/jackson-core)
     (Apache License 2.0) Jackson-core (com.fasterxml.jackson.core:jackson-core:2.21.2 - https://github.com/FasterXML/jackson-core)
     (Apache License 2.0) jackson-databind (com.fasterxml.jackson.core:jackson-databind:2.21.1 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) jackson-databind (com.fasterxml.jackson.core:jackson-databind:2.21.2 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) Jackson-dataformat-YAML (com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.21.1 - https://github.com/FasterXML/jackson-dataformats-text)
     (Apache License 2.0) Jackson datatype: JSR310 (com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.21.1 - https://github.com/FasterXML/jackson-modules-java8/jackson-datatype-jsr310)
     (Apache License 2.0) Caffeine cache (com.github.ben-manes.caffeine:caffeine:2.3.1 - https://github.com/ben-manes/caffeine)
     (Apache License 2.0) docker-java-api (com.github.docker-java:docker-java-api:3.7.1 - https://github.com/docker-java/docker-java)
     (Apache License 2.0) docker-java-transport (com.github.docker-java:docker-java-transport:3.7.1 - https://github.com/docker-java/docker-java)
     (Apache License 2.0) docker-java-transport-zerodep (com.github.docker-java:docker-java-transport-zerodep:3.7.1 - https://github.com/docker-java/docker-java)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) btf (com.github.java-json-tools:btf:1.3 - https://github.com/java-json-tools/btf)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) jackson-coreutils (com.github.java-json-tools:jackson-coreutils:2.0 - https://github.com/java-json-tools/jackson-coreutils)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) jackson-coreutils-equivalence (com.github.java-json-tools:jackson-coreutils-equivalence:1.0 - https://github.com/java-json-tools/jackson-coreutils)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) json-patch (com.github.java-json-tools:json-patch:1.13 - https://github.com/java-json-tools/json-patch)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) json-schema-core (com.github.java-json-tools:json-schema-core:1.2.14 - https://github.com/java-json-tools/json-schema-core)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) json-schema-validator (com.github.java-json-tools:json-schema-validator:2.2.14 - https://github.com/java-json-tools/json-schema-validator)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) msg-simple (com.github.java-json-tools:msg-simple:1.2 - https://github.com/java-json-tools/msg-simple)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) uri-template (com.github.java-json-tools:uri-template:0.10 - https://github.com/java-json-tools/uri-template)
     (Apache License 2.0) (GNU Lesser General Public License) javaparser-core (com.github.javaparser:javaparser-core:3.26.3 - https://github.com/javaparser/javaparser-core)
     (Apache License 2.0) JCIP Annotations under Apache License (com.github.stephenc.jcip:jcip-annotations:1.0-1 - http://stephenc.github.com/jcip-annotations)
     (Apache License 2.0) Google Android Annotations Library (com.google.android:annotations:4.1.1.4 - http://source.android.com/)

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new Raft-based High Availability (HA) implementation for ArcadeDB, replacing the existing custom protocol. It includes a new ha-raft Maven module with components like RaftLogEntryCodec for serializing WAL and schema changes, ArcadeStateMachine for applying these changes, RaftHAServer for managing the Ratis cluster, and RaftReplicatedDatabase for intercepting database operations. New global configuration properties have been added for Raft-specific settings, and the server startup logic is updated to integrate the new HA plugin. A new /api/v1/cluster endpoint reports cluster status. Extensive unit and integration tests have been added to cover various replication, failover, and network partition scenarios. The .gitignore file was updated.

Feedback includes:

  • a dependency misconfiguration in ha-raft/pom.xml
  • a suggestion for .gitignore hygiene
  • a potential edge case in TransactionManager.applyChanges regarding bucket count updates
  • an incomplete replica lag monitoring implementation in RaftHAServer
  • a recommendation for more specific error messages in AbstractServerHttpHandler for cluster token validation

Comment on lines +118 to +123
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>

high

There appears to be a dependency misconfiguration in this pom.xml. The arcadedb-server artifact is included twice: once with provided scope (which is correct for a plugin) and again here with compile scope. The compile scope dependency also uses a hardcoded version (26.4.1-SNAPSHOT) instead of the ${project.parent.version} property.

This can lead to unpredictable build behavior and dependency conflicts. The compile scope dependency on arcadedb-server should be removed to avoid these issues.

*.lsmvecidx
*.metadata.json

notes.txt

medium

The file notes.txt appears to be a personal note file. It's generally better to add such user-specific files to your global .gitignore file (e.g., ~/.config/git/ignore) or the repository's local exclude file (.git/info/exclude) rather than the project's shared .gitignore. This helps keep the project's ignore list clean and focused on project-specific generated files and artifacts.

Comment on lines +414 to +416
if (changed) {
  for (final Map.Entry<Integer, Integer> entry : bucketRecordDelta.entrySet()) {
    final LocalBucket bucket = (LocalBucket) database.getSchema().getBucketById(entry.getKey());

medium

This loop over bucketRecordDelta is now inside if (changed). This is a good optimization to avoid unnecessary work when no pages were actually modified in the transaction. However, the changed flag is only set to true inside the loop over tx.pages. If a transaction only contains bucket count changes but no page modifications (which might be a rare edge case), this change would prevent the bucket counts from being updated.

If it's guaranteed that any transaction affecting bucket counts will also have modified pages, then this is fine. Otherwise, it might be safer to update the bucket counts regardless of the changed flag, as was the case previously.

Comment on lines +425 to +436
private void checkReplicaLag() {
    try {
        final var division = raftServer.getDivision(raftGroup.getGroupId());
        final var info = division.getInfo();
        if (!info.isLeader())
            return;
        final long commitIndex = info.getLastAppliedIndex();
        clusterMonitor.updateLeaderCommitIndex(commitIndex);
    } catch (final Exception e) {
        LogManager.instance().log(this, Level.FINE, "Error checking replica lag", e);
    }
}

medium

The implementation of checkReplicaLag seems incomplete. As the comment inside the method notes, the Ratis public API may not expose per-follower match indexes, so this implementation only updates the leader's commit index in the ClusterMonitor. This means the lag monitoring feature will not function as intended because it cannot calculate or report any lag for individual replicas.

Consider either removing this monitoring logic until it can be fully implemented or adding a more prominent // TODO and logging a warning at startup that replica lag monitoring is not yet functional.

Comment on lines +290 to +292
sendErrorResponse(exchange, 401, "Invalid cluster token", null, null);
return null;
}

medium

The response for an invalid cluster token is a generic 401 error. For improved diagnostics in a misconfigured cluster, it would be helpful to provide a more specific error message in the response body, such as {"error": "Invalid cluster token"}. This would make it easier for operators to identify the root cause of authentication failures when forwarding requests between nodes.

@claude

claude bot commented Mar 26, 2026

Code Review: Raft-based HA using Apache Ratis

This is a well-architected and thoroughly tested addition. The use of Apache Ratis is a sound choice for proven consensus semantics, and the plugin-based design with ServiceLoader is clean. Below are issues found, ordered by severity.


Critical

1. Duplicate arcadedb-server dependency with hardcoded version in ha-raft/pom.xml

At the bottom of ha-raft/pom.xml, there is a redundant dependency on arcadedb-server version 26.4.1-SNAPSHOT with compile scope. The module already declares arcadedb-server with provided scope (correct per project conventions). The second declaration overrides provided with compile, which adds the server to the module's runtime classpath, hardcodes a version that will break when the project version changes, and violates the wire-protocol module dependency rules. Remove the duplicate dependency entirely.

2. ATTRIBUTIONS.md not updated for Apache Ratis dependency

Project guidelines require updating ATTRIBUTIONS.md when adding dependencies. Apache Ratis 3.2.0 (ratis-server, ratis-grpc, ratis-metrics-default) is a significant new dependency not reflected in ATTRIBUTIONS.md. Since Apache Ratis has its own NOTICE file, relevant notices must also be incorporated into the main NOTICE file.


High Priority

3. Nine docs/plans/ files should not be committed to the repository

Nine AI-generated implementation plan documents are being added to docs/plans/. These are AI conversation artifacts rather than permanent project documentation. Several are 2000+ lines and will confuse future contributors. They should be removed from the PR.

4. SnapshotManager / installSnapshot() not wired up

The SnapshotManager Javadoc states that full installSnapshot() integration with Ratis is not yet wired. Replicas that fall behind past log compaction require a manual data copy from the leader. With default HA_RAFT_SNAPSHOT_THRESHOLD=10000, any replica that misses 10,000+ entries becomes permanently unable to rejoin without manual intervention. This should either be implemented before merge, or clearly marked experimental with a tracking issue.

5. HA_SERVER_LIST format is a breaking change without validation

The format changed from hostname:port to hostname:raftPort:httpPort[:priority] with no backward compatibility. Existing users upgrading will experience silent misconfiguration. Add a validator that warns about the old 2-token format.
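A validator along these lines could catch the legacy format at startup. This is a minimal sketch, not the PR's actual parser: the class name `PeerEntryValidator` and its `validate` method are hypothetical, and only the token counts (2 for the old format, 3 or 4 for the new one) come from the description above. It assumes host names contain no colons, so IPv6 literals would need extra handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: classifies each HA_SERVER_LIST entry by its token count.
final class PeerEntryValidator {
  // Returns one human-readable problem per bad entry; an empty list means the list looks valid.
  static List<String> validate(final String serverList) {
    final List<String> problems = new ArrayList<>();
    for (final String raw : serverList.split(",")) {
      final String entry = raw.trim();
      if (entry.isEmpty())
        continue;
      final String[] tokens = entry.split(":");
      if (tokens.length == 2) {
        problems.add("'" + entry + "' uses the legacy hostname:port format; "
            + "expected hostname:raftPort:httpPort[:priority]");
      } else if (tokens.length < 3 || tokens.length > 4) {
        problems.add("'" + entry + "' has " + tokens.length + " token(s); expected 3 or 4");
      } else {
        // Every token after the host name must be numeric (ports and optional priority).
        for (int i = 1; i < tokens.length; i++) {
          try {
            Integer.parseInt(tokens[i]);
          } catch (final NumberFormatException e) {
            problems.add("'" + entry + "' has a non-numeric field: " + tokens[i]);
            break;
          }
        }
      }
    }
    return problems;
  }
}
```

Whether a problem should abort startup or only log a warning is a policy choice; the review asks for at least a warning.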

6. No TLS for Raft gRPC communication

All inter-node Raft traffic is unencrypted. This should be documented clearly, and a follow-up issue should track TLS support.


Medium Priority

7. maven-shade-plugin in ha-raft/pom.xml without configuration

The build section only references maven-shade-plugin with no configuration. This produces a fat JAR inappropriate for a module deployed alongside arcadedb-server and may cause classpath conflicts with Ratis classes. Either configure it with proper dependency relocations or remove it.

8. ArcadeStateMachine.applySchemaEntry() - no rollback on partial failure

If file creation succeeds for some files but throws on a later one, the state machine is left in a partially-applied state. Created files are not rolled back, potentially leaving a replica in an inconsistent schema state.
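One way to avoid the partially-applied state is to track what was created and undo it on failure. The sketch below is an assumption about how that could look, not the code in `applySchemaEntry()`: `AllOrNothingFileCreator` is a hypothetical helper and it only covers file creation, not other schema side effects.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: creates all of the given files or none of them.
final class AllOrNothingFileCreator {
  static void createAll(final List<Path> files) throws IOException {
    final List<Path> created = new ArrayList<>();
    try {
      for (final Path file : files) {
        Files.createFile(file); // throws if the file exists or the parent directory is missing
        created.add(file);
      }
    } catch (final IOException | RuntimeException e) {
      // Undo in reverse order so the directory looks as it did before the call.
      for (int i = created.size() - 1; i >= 0; i--) {
        try {
          Files.deleteIfExists(created.get(i));
        } catch (final IOException cleanupError) {
          e.addSuppressed(cleanupError);
        }
      }
      throw e;
    }
  }
}
```

Cleanup failures are attached as suppressed exceptions so the original cause is what the caller sees.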

9. HA_RAFT_PERSIST_STORAGE leaks a test-only concern into production config

This flag is documented as not intended for production use. Having it in GlobalConfiguration (user-visible) is confusing. At minimum, prefix the description with [TEST ONLY].

10. ClusterMonitor.leaderCommitIndex - potential negative lag in getReplicaLags()

updateLeaderCommitIndex() and updateReplicaMatchIndex() are not synchronized, so a lag of -1 could briefly appear in monitoring output. An AtomicLong for leaderCommitIndex would be more idiomatic.
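A minimal sketch of the suggested fix follows. `LagTracker` is a hypothetical stand-in for `ClusterMonitor`; only the three method names are taken from the review, and the clamping to zero is an added assumption about what monitoring output should show.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for ClusterMonitor; internals are illustrative.
final class LagTracker {
  private final AtomicLong leaderCommitIndex = new AtomicLong(-1);
  private final Map<String, Long> replicaMatchIndexes = new ConcurrentHashMap<>();

  void updateLeaderCommitIndex(final long index) {
    // Never move backwards, even if updates arrive out of order.
    leaderCommitIndex.accumulateAndGet(index, Math::max);
  }

  void updateReplicaMatchIndex(final String replicaId, final long matchIndex) {
    replicaMatchIndexes.merge(replicaId, matchIndex, Math::max);
  }

  Map<String, Long> getReplicaLags() {
    final long commit = leaderCommitIndex.get(); // read once so every replica uses the same baseline
    final Map<String, Long> lags = new HashMap<>();
    for (final Map.Entry<String, Long> e : replicaMatchIndexes.entrySet())
      lags.put(e.getKey(), Math.max(0L, commit - e.getValue())); // clamp transient negatives
    return lags;
  }
}
```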


Minor / Style

11. Typo in test class filename: ClusterDatatbaseChecker.java should be ClusterDatabaseChecker.java.

12. deserializeWalTransaction() should move to RaftLogEntryCodec for better cohesion.

13. mvn-test.yml: replacing actions/setup-java built-in Maven cache with a manual actions/cache step may be less precise - confirm this is intentional.


Summary

| Severity | Count |
|----------|-------|
| Critical | 2 |
| High | 4 |
| Medium | 4 |
| Minor | 3 |

The critical pom.xml dependency bug and missing ATTRIBUTIONS.md update must be fixed before merge. The incomplete installSnapshot() integration, the breaking HA_SERVER_LIST format change, and the AI planning artifacts in docs/plans/ also need resolution.

The core architecture and test coverage are excellent - the Raft consensus approach is the right foundation for long-term HA correctness.

@claude

claude bot commented Mar 26, 2026

Code Review: Raft-based High Availability (Apache Ratis)

This PR introduces a new ha-raft module implementing ArcadeDB's HA stack on Apache Ratis. The design is architecturally sound — using Raft for consensus, a proxy DatabaseInternal for intercepting commits, and a clean HAReplicatedDatabase interface. That said, there are several critical correctness issues and security gaps that should be addressed before merging.


Critical / Data-Correctness Bugs

1. Leadership-change race in ArcadeStateMachine.applyTxEntry()

if (raftHAServer != null && raftHAServer.isLeader()) {
    // Skip – leader already committed locally
    return;
}

Leadership is checked at apply time, not at commit time. Ratis delivers applyTransaction asynchronously. If the node loses leadership between submitting the log entry and the state machine callback firing, it will skip applying a transaction it now needs as a follower. Conversely, a newly elected leader that inherits uncommitted entries from the previous term would also skip applying them. The originator should be encoded in the log entry itself (e.g., as a peer ID field), not inferred from current role.
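The suggestion to encode the originator in the entry could be sketched as follows. The byte layout, the class name `OriginTaggedEntry`, and the `shouldSkipOn` method are all assumptions made for illustration; the PR's `RaftLogEntryCodec` format is not shown in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical entry layout: [int idLength][id bytes][payload bytes].
final class OriginTaggedEntry {
  final String originPeerId;
  final byte[] payload;

  OriginTaggedEntry(final String originPeerId, final byte[] payload) {
    this.originPeerId = originPeerId;
    this.payload = payload;
  }

  byte[] encode() {
    final byte[] id = originPeerId.getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(Integer.BYTES + id.length + payload.length)
        .putInt(id.length).put(id).put(payload).array();
  }

  static OriginTaggedEntry decode(final byte[] bytes) {
    final ByteBuffer buf = ByteBuffer.wrap(bytes);
    final byte[] id = new byte[buf.getInt()];
    buf.get(id);
    final byte[] payload = new byte[buf.remaining()];
    buf.get(payload);
    return new OriginTaggedEntry(new String(id, StandardCharsets.UTF_8), payload);
  }

  // The skip decision depends only on data stored in the entry, not on the current Raft role.
  boolean shouldSkipOn(final String localPeerId) {
    return originPeerId.equals(localPeerId);
  }
}
```

Skipping on the originator is only safe if that node really did apply the transaction locally, so a real implementation would probably also need to record which log indexes were applied (for example, to handle a crash followed by log replay).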

2. Two-phase commit is not atomic under concurrent leadership change

In RaftReplicatedDatabase.commit():

final RaftClientReply reply = raftHAServer.getClient().io().send(...);
if (!reply.isSuccess()) throw ...;
if (leader)
    tx.commit2ndPhase(phase1);   // <- leadership may have changed here

If the node loses leadership between send() returning success and commit2ndPhase() executing, the state machine's applyTxEntry() will now not skip (since isLeader() is false), applying the transaction a second time. The ignoreErrors=true path in applyChanges() silently discards these duplicate-page errors, masking the double-application.

3. Non-atomic schema version increment

schemaJson.put("schemaVersion", schemaJson.getLong("schemaVersion") + 1);

The version is read and incremented client-side before replication. Concurrent DDL operations can both read the same base version and both submit version+1, causing a version collision on replicas.
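One possible remedy is to carry the base version in the log entry and let the state machine accept it only if the base still matches. Because every replica applies entries in the same log order, every replica then makes the same accept or reject decision. The sketch below assumes that approach; `VersionedSchema` and `applyIfCurrent` are hypothetical names, not part of the PR.

```java
// Hypothetical state-machine-side guard for schema changes.
final class VersionedSchema {
  private long version;

  VersionedSchema(final long initialVersion) {
    this.version = initialVersion;
  }

  // Applies the change only if it was built against the current version.
  // Returns false for a stale entry so the caller can reject or retry the DDL.
  synchronized boolean applyIfCurrent(final long baseVersion) {
    if (baseVersion != version)
      return false;
    version = baseVersion + 1;
    return true;
  }

  synchronized long version() {
    return version;
  }
}
```

With two concurrent DDL operations that both read version 5, the first entry in the log moves the schema to 6 and the second is rejected instead of silently producing a second version 6.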

4. No bounds checking in deserializeWalTransaction

final int pageCount = buf.getInt();
tx.pages = new WALFile.WALPage[pageCount];
for (int i = 0; i < pageCount; i++) {
    final int deltaSize = page.changesTo - page.changesFrom + 1;
    final byte[] content = new byte[deltaSize];

A corrupted or malicious log entry with a large pageCount causes an OOM allocation, and changesTo < changesFrom causes NegativeArraySizeException. There is no magic byte / version marker to detect truncated entries. Since Raft log entries persist to disk, corruption can reproduce across restarts.
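The checks could look roughly like this. It is a sketch under stated assumptions: `WalEntryBounds` is a hypothetical helper, and the limits `MAX_PAGES` and `MAX_PAGE_SIZE` are placeholder values, not ArcadeDB's real constraints.

```java
import java.nio.ByteBuffer;

// Hypothetical validation helpers; the limits are illustrative only.
final class WalEntryBounds {
  static final int MAX_PAGES = 100_000;     // assumed upper bound on pages per transaction
  static final int MAX_PAGE_SIZE = 1 << 20; // assumed upper bound on a page (1 MiB)

  // Reads the page count and rejects truncated or implausible values before allocating.
  static int readPageCount(final ByteBuffer buf) {
    if (buf.remaining() < Integer.BYTES)
      throw new IllegalArgumentException("Truncated entry: missing page count");
    final int pageCount = buf.getInt();
    if (pageCount < 0 || pageCount > MAX_PAGES)
      throw new IllegalArgumentException("Invalid page count: " + pageCount);
    return pageCount;
  }

  // Validates the delta range and checks that the buffer really holds that many bytes.
  static int deltaSize(final int changesFrom, final int changesTo, final ByteBuffer buf) {
    if (changesFrom < 0 || changesTo < changesFrom || changesTo >= MAX_PAGE_SIZE)
      throw new IllegalArgumentException("Invalid delta range: " + changesFrom + ".." + changesTo);
    final int size = changesTo - changesFrom + 1;
    if (size > buf.remaining())
      throw new IllegalArgumentException("Truncated entry: need " + size + " bytes, have " + buf.remaining());
    return size;
  }
}
```

A magic byte or format version at the start of each entry, as the review suggests, would complement these checks but is not shown here.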


Security Concerns

5. Cluster token stored without file permission hardening

initClusterToken() writes the shared token to cluster-token.txt with default OS permissions (typically world-readable). An attacker with local filesystem access to any node can read this token and forge X-ArcadeDB-Cluster-Token headers against any peer. Please apply 600 permissions after writing:

tokenFile.toPath().toFile().setReadable(false, false);
tokenFile.toPath().toFile().setReadable(true, true);

Or use Files.setPosixFilePermissions.
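A sketch of the POSIX variant is below. `TokenFileWriter` and `writeOwnerOnly` are hypothetical names. The file is created with owner-only permissions before the token is written, so there is no window in which other users can read it; on file systems without POSIX attributes it falls back to the `java.io.File` setters, which is best-effort only.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical helper for writing a secret with owner-only access (mode 600).
final class TokenFileWriter {
  static void writeOwnerOnly(final Path file, final String token) throws IOException {
    final boolean posix = FileSystems.getDefault().supportedFileAttributeViews().contains("posix");
    if (posix) {
      // Create the empty file with rw------- first, then fill it.
      Files.deleteIfExists(file);
      Files.createFile(file,
          PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rw-------")));
    }
    Files.write(file, token.getBytes(StandardCharsets.UTF_8));
    if (!posix) {
      // Best-effort fallback for non-POSIX file systems.
      final File f = file.toFile();
      f.setReadable(false, false);
      f.setWritable(false, false);
      f.setReadable(true, true);
      f.setWritable(true, true);
    }
  }
}
```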

6. Cluster-internal auth bypasses authorisation, not just authentication

A successfully validated cluster token causes the handler to resolve the user from X-ArcadeDB-Forwarded-User with no further authorisation checks. If an attacker forges or steals the cluster token (see item 5) and knows any valid username, they can execute any command as that user on any node. The cluster-auth path should apply the same database/operation authorisation checks as normal user auth.
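The separation between the two checks can be illustrated with a small sketch. The permission model here (a map from user name to allowed databases) is a deliberate simplification and an assumption; ArcadeDB's real security model is not shown in this thread, and `ForwardedUserAuthorizer` is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical permission model: user name -> databases the user may access.
final class ForwardedUserAuthorizer {
  private final Map<String, Set<String>> grants;

  ForwardedUserAuthorizer(final Map<String, Set<String>> grants) {
    this.grants = grants;
  }

  // A valid cluster token only proves the caller is a peer; the forwarded
  // user must still be authorised for the target database.
  boolean isAllowed(final boolean clusterTokenValid, final String forwardedUser, final String database) {
    if (!clusterTokenValid || forwardedUser == null)
      return false;
    final Set<String> databases = grants.get(forwardedUser);
    return databases != null && databases.contains(database);
  }
}
```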

7. Leader-forwarding uses plain HTTP

URI.create("http://" + leaderHttpAddress + "/api/v1/command/" + databaseName)

The cluster token and original Authorization header are forwarded in cleartext. Any network-level observer between nodes can capture credentials. This should use HTTPS or an mTLS sidecar, and should be documented as a deployment requirement.


Code Quality

8. ClusterDatatbaseChecker.java should not be committed

ha-raft/src/test/java/com/arcadedb/server/ha/raft/ClusterDatatbaseChecker.java is a one-off developer utility with hardcoded local paths (/Users/frank/projects/...), no JUnit annotations (it has a main() method only so it will never run in CI), no license header, and a typo in the class name. Please remove it.

9. docs/plans/ directory contains AI assistant planning artifacts

Nine markdown files under docs/plans/ contain prompts like > **For Claude:** REQUIRED SUB-SKILL: .... These are AI tooling artifacts, not user- or developer-facing documentation, and should not be committed to the public repository.

10. PluginManager hard-codes class name as a string

final boolean isRaftPlugin = autoDiscoverRaft && "RaftHAPlugin".equals(name);

If the class is renamed this check silently breaks. A marker interface or an annotation would be more robust.
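A type-based check could look like the sketch below. The marker interface `RaftPluginMarker` and the other names are hypothetical; the PR does not define them, and `ExampleRaftPlugin` exists only to make the sketch self-contained.

```java
// Hypothetical marker interface; not part of the PR.
interface RaftPluginMarker {
}

// Stand-in for the real plugin class, used only to make the sketch runnable.
final class ExampleRaftPlugin implements RaftPluginMarker {
}

final class PluginSelector {
  // Type-based check: keeps working if the implementing class is renamed.
  static boolean isRaftPlugin(final Class<?> pluginClass, final boolean autoDiscoverRaft) {
    return autoDiscoverRaft && RaftPluginMarker.class.isAssignableFrom(pluginClass);
  }
}
```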

11. Wildcard imports in several files

RaftReplicatedDatabase.java uses import com.arcadedb.database.*;, import com.arcadedb.engine.*;, import java.util.*; etc. The existing codebase avoids wildcard imports.

12. Commented-out code in ArcadeDBServer.java

//  private             ServerMonitor                         serverMonitor;

Dead code like this should be removed before merging.


Performance

13. Synchronous Raft io().send() on every commit

final RaftClientReply reply = raftHAServer.getClient().io().send(Message.valueOf(entry));

Every transaction commit blocks the calling thread waiting for quorum acknowledgement. For workloads with many small transactions this will be a severe throughput regression compared to the existing pipelined async HA. Consider documenting this trade-off and, if possible, exploring a pipeline or batch-commit option in Ratis (async().send()).

14. Duplicate static HttpClient

There is a private static final HttpClient HTTP_CLIENT in both RaftReplicatedDatabase and PostCommandHandler. These should be shared at the server level.


Completeness

15. Snapshot-based resync is not implemented

ArcadeStateMachine.takeSnapshot() explicitly notes this is not implemented. This means any replica that falls behind past the Raft log compaction point cannot rejoin without a manual data copy. This is a significant operational gap. Please either implement it or document the limitation prominently in release notes and ops docs.


Test Coverage

16. No test for leadership-change race during commit

The two critical races described in items 1 and 2 above are not tested. The integration tests cover clean failovers, but not leadership changes that occur mid-commit.

17. No test for token file persistence across node restarts

There is no test verifying that after a restart, initClusterToken() correctly re-reads the existing token from cluster-token.txt instead of generating a new one (which would cause a cluster-wide auth failure).
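The behaviour such a test would pin down can be sketched as a load-or-create routine. This is an assumption about the intended semantics of `initClusterToken()`, whose real code is not shown here; `ClusterTokenStore` and `loadOrCreate` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical token store: reuses an existing token, generates one only if none is stored.
final class ClusterTokenStore {
  static String loadOrCreate(final Path file) throws IOException {
    if (Files.exists(file)) {
      final String existing = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
      if (!existing.isEmpty())
        return existing; // a restart must see the same token as before
    }
    final String token = UUID.randomUUID().toString();
    Files.write(file, token.getBytes(StandardCharsets.UTF_8));
    return token;
  }
}
```

A restart test would call the routine twice against the same file and assert that both calls return the same value.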


Licensing Compliance

18. ATTRIBUTIONS.md and NOTICE not updated for Apache Ratis

Per CLAUDE.md:

When adding a dependency, you MUST update ATTRIBUTIONS.md and, if Apache-licensed with a NOTICE file, incorporate required notices into the main NOTICE file.

Apache Ratis is Apache 2.0 and has its own NOTICE file. Neither ATTRIBUTIONS.md nor NOTICE is updated in this PR.


Summary

| Category | Issue | Severity |
|----------|-------|----------|
| Correctness | Leadership-change race in state machine apply | Critical |
| Correctness | Two-phase commit not atomic under leadership change | Critical |
| Correctness | Non-atomic schema version increment | High |
| Correctness | No bounds checking in deserializeWalTransaction | High |
| Security | Cluster token file not permission-hardened | High |
| Security | Cluster-internal auth bypasses authorisation | High |
| Security | Leader-forwarding over plain HTTP | Medium |
| Performance | Synchronous send on every commit | High |
| Completeness | Snapshot resync not implemented | High |
| Tests | No test for leadership-change race | High |
| Tests | No test for token persistence across restarts | Medium |
| Code Quality | ClusterDatatbaseChecker dev utility committed | Medium |
| Code Quality | docs/plans/ AI artifacts committed | Low |
| Code Quality | String-based plugin name check in PluginManager | Low |
| Code Quality | Wildcard imports | Low |
| Compliance | ATTRIBUTIONS.md / NOTICE not updated | Medium |

The architectural direction is solid. The critical correctness and security issues should be resolved before this merges to avoid data-loss scenarios in production clusters.

@github-actions
Contributor

📜 License Compliance Check

✅ License check passed. See artifacts for full report.

License Summary (first 50 lines)

Lists of 402 third-party dependencies.
     (Apache License 2.0) LZ4 Java Compression (at.yawk.lz4:lz4-java:1.10.4 - https://github.com/yawkat/lz4-java)
     (EPL 2.0) (GNU Lesser General Public License) Logback Classic Module (ch.qos.logback:logback-classic:1.5.32 - http://logback.qos.ch/logback-classic)
     (EPL 2.0) (GNU Lesser General Public License) Logback Core Module (ch.qos.logback:logback-core:1.5.32 - http://logback.qos.ch/logback-core)
     (Apache 2) ArcadeDB BOLT Protocol (com.arcadedb:arcadedb-bolt:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-bolt/)
     (Apache 2) ArcadeDB Console (com.arcadedb:arcadedb-console:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-console/)
     (Apache 2) ArcadeDB Engine (com.arcadedb:arcadedb-engine:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-engine/)
     (Apache 2) ArcadeDB GraphQL (com.arcadedb:arcadedb-graphql:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-graphql/)
     (Apache 2) ArcadeDB Gremlin (com.arcadedb:arcadedb-gremlin:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-gremlin/)
     (Apache 2) ArcadeDB gRPC Stubs (com.arcadedb:arcadedb-grpc:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-grpc/)
     (Apache 2) ArcadeDB gRPC Client (com.arcadedb:arcadedb-grpc-client:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-grpc-client/)
     (Apache 2) ArcadeDB gRpcW (com.arcadedb:arcadedb-grpcw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-grpcw/)
     (Apache 2) ArcadeDB HA Raft (com.arcadedb:arcadedb-ha-raft:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-ha-raft/)
     (Apache 2) ArcadeDB Integration (com.arcadedb:arcadedb-integration:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-integration/)
     (Apache 2) ArcadeDB load tests (com.arcadedb:arcadedb-load-tests:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-load-tests/)
     (Apache 2) ArcadeDB Metrics (com.arcadedb:arcadedb-metrics:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-metrics/)
     (Apache 2) ArcadeDB MongoDB Wire Protocol (com.arcadedb:arcadedb-mongodbw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-mongodbw/)
     (Apache 2) ArcadeDB Network (com.arcadedb:arcadedb-network:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-network/)
     (Apache 2) ArcadeDB PostgresW (com.arcadedb:arcadedb-postgresw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-postgresw/)
     (Apache 2) ArcadeDB RedisW (com.arcadedb:arcadedb-redisw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-redisw/)
     (Apache 2) ArcadeDB Server (com.arcadedb:arcadedb-server:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-server/)
     (Apache 2) ArcadeDB Studio (com.arcadedb:arcadedb-studio:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-studio/)
     (Apache 2) ArcadeDB Test Utils (com.arcadedb:arcadedb-test-utils:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-test-utils/)
     (Apache License 2.0) HPPC Collections (com.carrotsearch:hppc:0.7.1 - http://labs.carrotsearch.com/hppc.html/hppc)
     (Apache License 2.0) Metrics Core (com.codahale.metrics:metrics-core:3.0.2 - http://metrics.codahale.com/metrics-core/)
     (The Apache License, Version 2.0) com.conversantmedia:disruptor (com.conversantmedia:disruptor:1.2.21 - https://github.com/conversant/disruptor)
     (Apache License 2.0) Jackson-annotations (com.fasterxml.jackson.core:jackson-annotations:2.20 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) Jackson-annotations (com.fasterxml.jackson.core:jackson-annotations:2.21 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) Jackson-core (com.fasterxml.jackson.core:jackson-core:2.21.1 - https://github.com/FasterXML/jackson-core)
     (Apache License 2.0) Jackson-core (com.fasterxml.jackson.core:jackson-core:2.21.2 - https://github.com/FasterXML/jackson-core)
     (Apache License 2.0) jackson-databind (com.fasterxml.jackson.core:jackson-databind:2.21.1 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) jackson-databind (com.fasterxml.jackson.core:jackson-databind:2.21.2 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) Jackson-dataformat-YAML (com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.21.1 - https://github.com/FasterXML/jackson-dataformats-text)
     (Apache License 2.0) Jackson datatype: JSR310 (com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.21.1 - https://github.com/FasterXML/jackson-modules-java8/jackson-datatype-jsr310)
     (Apache License 2.0) Caffeine cache (com.github.ben-manes.caffeine:caffeine:2.3.1 - https://github.com/ben-manes/caffeine)
     (Apache License 2.0) docker-java-api (com.github.docker-java:docker-java-api:3.7.1 - https://github.com/docker-java/docker-java)
     (Apache License 2.0) docker-java-transport (com.github.docker-java:docker-java-transport:3.7.1 - https://github.com/docker-java/docker-java)
     (Apache License 2.0) docker-java-transport-zerodep (com.github.docker-java:docker-java-transport-zerodep:3.7.1 - https://github.com/docker-java/docker-java)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) btf (com.github.java-json-tools:btf:1.3 - https://github.com/java-json-tools/btf)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) jackson-coreutils (com.github.java-json-tools:jackson-coreutils:2.0 - https://github.com/java-json-tools/jackson-coreutils)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) jackson-coreutils-equivalence (com.github.java-json-tools:jackson-coreutils-equivalence:1.0 - https://github.com/java-json-tools/jackson-coreutils)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) json-patch (com.github.java-json-tools:json-patch:1.13 - https://github.com/java-json-tools/json-patch)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) json-schema-core (com.github.java-json-tools:json-schema-core:1.2.14 - https://github.com/java-json-tools/json-schema-core)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) json-schema-validator (com.github.java-json-tools:json-schema-validator:2.2.14 - https://github.com/java-json-tools/json-schema-validator)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) msg-simple (com.github.java-json-tools:msg-simple:1.2 - https://github.com/java-json-tools/msg-simple)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) uri-template (com.github.java-json-tools:uri-template:0.10 - https://github.com/java-json-tools/uri-template)
     (Apache License 2.0) (GNU Lesser General Public License) javaparser-core (com.github.javaparser:javaparser-core:3.26.3 - https://github.com/javaparser/javaparser-core)
     (Apache License 2.0) JCIP Annotations under Apache License (com.github.stephenc.jcip:jcip-annotations:1.0-1 - http://stephenc.github.com/jcip-annotations)
     (Apache License 2.0) Google Android Annotations Library (com.google.android:annotations:4.1.1.4 - http://source.android.com/)

@robfrank
Collaborator Author

Reviewed the automated bot feedback — no actionable items:

github-advanced-security:

  • Workflow permissions — already set at the setup job level (lines 15-19).
  • Outdated sub-dependencies (micrometer-core, antlr4-runtime, assertj-core, junit-jupiter) — transitive deps from Ratis, not directly controllable. No CVEs flagged, just version staleness.

gemini-code-assist:

  • Duplicate arcadedb-server dependency (HIGH) — false positive. The two entries are provided scope (main code) + test/test-jar scope (test utilities), which is the correct wire protocol module pattern. The actual compile-scope duplicate was removed in 0a5dd7f4c.
  • notes.txt in .gitignore (MEDIUM) — the repo already has other personal/IDE entries in .gitignore. Moving to global gitignore is a style preference, not a correctness issue.
  • TransactionManager bucket counts inside if(changed) (MEDIUM) — not a real edge case. Bucket record deltas are computed from page modifications; a transaction with bucket count changes but zero page writes cannot occur in practice.
  • checkReplicaLag incomplete (MEDIUM) — known limitation: Ratis public API does not expose per-follower match indexes. The method correctly tracks what is available (leader commit index). Nothing to fix without a Ratis API change.
  • Cluster token error message (MEDIUM) — already implemented. Line 290 already returns "Invalid cluster token" in the response body.

@gemini-code-assist
Contributor

Thank you for your detailed response, @robfrank. I'd like to clarify a couple of points regarding the feedback provided by claude[bot].

  1. ClusterMonitor.leaderCommitIndex - potential negative lag: claude[bot]'s concern (Comment 5, Medium 10) was specifically about the potential for leaderCommitIndex - matchIndex to result in a negative value due to a lack of synchronization between updateLeaderCommitIndex() and updateReplicaMatchIndex(). While the Ratis API might not expose per-follower match indexes directly, the internal consistency of ClusterMonitor itself should prevent such transient negative lags. Using an AtomicLong for leaderCommitIndex or ensuring proper synchronization around updates and reads of both leaderCommitIndex and replicaMatchIndexes would address this.

  2. Cluster-internal auth bypasses authorisation: Regarding the cluster token, claude[bot]'s point (Comment 6, Security 6) was not about the error message for an invalid token, but rather that after a valid cluster token is presented, the system bypasses further authorization checks for the X-ArcadeDB-Forwarded-User. The recommendation was that the cluster-auth path should still apply the same database/operation authorization checks as normal user authentication to prevent an attacker with a stolen token from gaining full administrative access if they know a valid username.

@claude

claude bot commented Mar 26, 2026

PR Review: feat: Raft-based High Availability using Apache Ratis

This is a well-structured and ambitious PR. The TDD approach is followed, separation of concerns is clean, the ha-raft module correctly uses provided scope for arcadedb-server (matching CLAUDE.md guidelines), and the ServiceLoader plugin registration is idiomatic. The tests cover a good range of scenarios including split-brain, failover, and crash recovery.

That said, here are specific concerns worth addressing before merge:


Critical / Correctness

1. ArcadeStateMachine.applyTxEntry() skips application on the leader — risky assumption

The state machine skips applying entries on the leader because "it was already applied via commit2ndPhase". This creates a window where the leader's database state and its Raft log can diverge:

  • If the leader calls commit2ndPhase then crashes before Ratis persists the entry, the followers commit the entry while the leader never did.
  • After the leader restarts and replays its log, it will skip re-applying the entry it never fully applied, leaving it in a permanently inconsistent state.

The correct design is for all nodes (including the leader) to apply every committed log entry. If the leader already applied it locally before the commit, Ratis's idempotency (via log index) and ignoreErrors=true in applyChanges should make re-applying safe.
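One way to make re-application harmless on every node is to guard the apply step with the last applied log index, so that a replayed entry becomes a no-op. The following is only a minimal sketch under that assumption: `IdempotentApplier` is a hypothetical class, not part of the PR or of Ratis, and a real implementation would have to persist the index atomically with the data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the PR's ArcadeStateMachine.
final class IdempotentApplier {
  private long lastAppliedIndex = -1;                 // in-memory only in this sketch
  private final List<String> applied = new ArrayList<>();

  /** Applies the entry unless this index (or a later one) was already applied. */
  synchronized boolean apply(final long logIndex, final String payload) {
    if (logIndex <= lastAppliedIndex)
      return false;                                   // replayed entry: skip it
    applied.add(payload);                             // stand-in for the WAL replay
    lastAppliedIndex = logIndex;
    return true;
  }

  synchronized List<String> appliedPayloads() {
    return List.copyOf(applied);
  }
}
```

With such a guard, leader and followers can run the same apply path, and a restart that replays the log does not apply an entry twice.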

2. takeSnapshot() does not write an actual snapshot

ArcadeStateMachine.takeSnapshot() simply returns lastAppliedTermIndex.getIndex() without writing the database files to the Ratis snapshot directory. Ratis treats this as a successful snapshot and may compact the log. When a node falls behind far enough that the required log entries have been compacted, it will be unable to catch up — there is no snapshot to install. The SnapshotManager checksum utilities exist but installSnapshot() is not wired into the Ratis StateMachine.Storage API. This effectively makes lagging-node recovery broken in practice, despite being listed as a key feature.

3. Commit protocol has a failure window on the leader

In RaftReplicatedDatabase.commit(), the leader executes:

  1. commit1stPhase (local WAL write)
  2. Send to Raft (blocks until committed by quorum)
  3. commit2ndPhase (local page flush)

If step 3 fails (e.g., disk error), the followers have committed the entry but the leader's local database is in a half-applied state. On leader restart it will skip re-applying (per issue #1), making the leader permanently inconsistent with its own log.


Security

4. Cluster auth token forwarding over plain HTTP

forwardCommandToLeaderViaRaft() sends X-ArcadeDB-Cluster-Token as an HTTP header over a plain http:// connection. If the cluster is not behind a TLS-terminating proxy, the token is transmitted in clear text across the network. Consider:

  • Enforcing HTTPS for inter-node communication, or
  • Adding a note that this requires TLS at the deployment layer, and logging a warning if the leader address is http://.
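The warning suggested in the second option could look roughly like this. It is a sketch only: `LeaderAddressCheck` and its method name are hypothetical, and it uses `java.util.logging` rather than whatever logging facade the project uses.

```java
import java.net.URI;
import java.util.logging.Logger;

// Hypothetical helper; names are assumptions, not taken from the PR.
final class LeaderAddressCheck {
  private static final Logger LOG = Logger.getLogger(LeaderAddressCheck.class.getName());

  /** Returns true when the leader URL uses TLS; logs a warning otherwise. */
  static boolean isEncrypted(final String leaderUrl) {
    final String scheme = URI.create(leaderUrl).getScheme();
    final boolean https = "https".equalsIgnoreCase(scheme);
    if (!https)
      LOG.warning("Cluster token would be sent in clear text to " + leaderUrl);
    return https;
  }
}
```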


Missing ATTRIBUTIONS.md update

5. Apache Ratis dependency not added to ATTRIBUTIONS.md

CLAUDE.md explicitly requires updating ATTRIBUTIONS.md (and NOTICE if applicable) for every new dependency. Apache Ratis is Apache 2.0 licensed and ships a NOTICE file — that content needs to be incorporated.


Design / Robustness

6. validateConfiguration() only warns, does not prevent misconfiguration

Two-node MAJORITY quorum will cause writes to block whenever one node is down. NONE quorum on a multi-node cluster allows split-brain. Currently the plugin only logs warnings. For cases that are definitively unsafe (e.g., NONE quorum with ≥3 nodes), consider throwing a ConfigurationException to prevent the server from starting with a known-broken configuration.
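A fail-fast check for the definitively unsafe case might be sketched as follows. The `Quorum` enum and `QuorumValidator` class are hypothetical, and `IllegalStateException` stands in for the project's `ConfigurationException`, whose constructor is not shown in this thread.

```java
// Hypothetical sketch; the enum and exception type are assumptions.
final class QuorumValidator {
  enum Quorum { NONE, MAJORITY }

  /** Rejects configurations that are known to be unsafe instead of only warning. */
  static void validate(final Quorum quorum, final int clusterSize) {
    if (clusterSize < 1)
      throw new IllegalArgumentException("Cluster size must be at least 1");
    if (quorum == Quorum.NONE && clusterSize >= 3)
      throw new IllegalStateException(
          "NONE quorum with " + clusterSize + " nodes allows split-brain; refusing to start");
  }
}
```

Softer cases, such as a two-node MAJORITY cluster, could remain warnings since they reduce availability rather than safety.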

7. ThreadLocal schema WAL buffers are never cleaned up

schemaWalBuffer and schemaBucketDeltaBuffer in RaftReplicatedDatabase are ThreadLocal<List<...>> that accumulate data inside recordFileChanges() but are only cleared at the end of that method. If a thread in the server's HTTP thread pool carries these locals between requests without the cleanup path running (e.g., due to an exception), the buffers can leak. Use try/finally in recordFileChanges() to guarantee cleanup.
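The try/finally shape being asked for can be illustrated in isolation. This is a sketch with hypothetical names (`SchemaBufferScope`, `recordAndSend`); it does not reproduce the actual `recordFileChanges()` body, only the cleanup guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the cleanup pattern; not the PR's RaftReplicatedDatabase.
final class SchemaBufferScope {
  private static final ThreadLocal<List<byte[]>> BUFFER = ThreadLocal.withInitial(ArrayList::new);

  static void buffer(final byte[] walEntry) {
    BUFFER.get().add(walEntry);
  }

  static int pending() {
    return BUFFER.get().size();
  }

  /** Runs the action that fills the buffer, hands a copy to the sender, and always clears the thread-local. */
  static void recordAndSend(final Runnable action, final Consumer<List<byte[]>> sender) {
    try {
      action.run();
      sender.accept(List.copyOf(BUFFER.get()));
    } finally {
      BUFFER.remove();   // executed on success and on exception alike
    }
  }
}
```

Because `remove()` sits in the `finally` block, a pooled HTTP thread starts its next request with an empty buffer even when the previous one failed midway.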

8. Leader failover during HTTP command forwarding has no retry

forwardCommandToLeaderViaRaft() does a single HTTP POST with no retry. If leadership transfers between the isLeader() check and the actual forward, the request will fail with a non-informative error. A retry loop with a fresh leader lookup (capped at ~3 attempts) would make the failover transparent to clients.
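A capped retry with a fresh leader lookup per attempt could be sketched like this. `LeaderRetry` is hypothetical, the lookup and send steps are passed in as functions rather than calling the real Ratis or HTTP code, and backoff between attempts is omitted for brevity.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch; the real forwarding code would plug in its own lookup and HTTP call.
final class LeaderRetry {
  /** Resolves the leader before each attempt and retries up to maxAttempts times on failure. */
  static <T> T forward(final Supplier<String> leaderLookup, final Function<String, T> send, final int maxAttempts) {
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      final String leader = leaderLookup.get();     // fresh lookup on every attempt
      try {
        return send.apply(leader);
      } catch (final RuntimeException e) {
        last = e;                                   // leadership may have moved; try again
      }
    }
    throw new IllegalStateException("Forwarding failed after " + maxAttempts + " attempts", last);
  }
}
```

Retrying is only safe for commands that can be repeated without side effects, so a real implementation would need to distinguish "not the leader" rejections from failures that happened after execution started.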

9. Schema entry decoder silently swallows IOException

The WAL section decoder inside decodeSchemaEntry() catches IOException to remain "backward-compatible with older log entries". Silent swallowing of deserialization errors can mask data corruption. At minimum, log at FINE level with the entry length and position so that issues are diagnosable.


Minor / Style

10. Typo in test utility filename

ClusterDatatbaseChecker.java — "Datatbase" should be "Database".

11. docs/plans/ should not be committed

Six AI implementation plan files (2026-02-15-ha-raft-impl.md, 2026-02-15-ha-raft-redesign.md, etc.) are committed to the repository. These contain AI-specific prompts ("For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans…") and step-by-step scaffolding that is not useful to users or contributors. They add noise to the git log and inflate repo size. These should be removed before merge.

12. Over-broad GitHub Actions permissions in e2e-ha.yml

The e2e-ha.yml workflow requests contents: write, packages: write, attestations: write, and id-token: write. A test workflow only needs contents: read (and packages: read if pulling images from GHCR). Drop the unused permissions to follow least-privilege.

13. GetClusterHandler resolves RaftHAPlugin on every request

The handler calls server.getPlugin(...) on every HTTP request rather than caching the reference at construction time. This is a hot path for monitoring tools. Consider injecting RaftHAPlugin in the constructor (or via RaftHAPlugin.registerAPI()), with a null check returning 503 if not yet started.


Summary

The architectural approach (Ratis + WAL page diffs as log entries + wrapping DatabaseInternal) is sound, and the test breadth is impressive. The two showstopper issues before this can be relied upon in production are (1) the leader skip in applyTxEntry which can cause post-crash inconsistency, and (2) the unimplemented snapshot installation which makes lagging-node recovery unavailable. Everything else is addressable incrementally.

@github-actions
Contributor

📜 License Compliance Check

✅ License check passed. See artifacts for full report.


@claude

claude bot commented Mar 26, 2026

PR Review posted via separate file.

@claude

claude bot commented Mar 26, 2026

FULL REVIEW POSTED BELOW - please see the next comment for the complete review content

@claude

claude bot commented Mar 26, 2026

PR Review: Raft-based High Availability using Apache Ratis

This is an ambitious and well-structured PR. Architecture is sound and test coverage is impressive (unit + integration + e2e with Toxiproxy). Issues found, ordered by severity:

Critical 1: Snapshot resync incomplete but log purging enabled
SnapshotManager is documented as not yet wired to Ratis installSnapshot(), yet setPurgeUptoSnapshotIndex=true is set in RaftHAServer. Raft compacts the log after snapshots but lagging replicas cannot catch up. Either disable log purging until installSnapshot() is fully implemented, or complete the integration before merging.

Critical 2: deserializeWalTransaction duplicates WAL internal format
ArcadeStateMachine.deserializeWalTransaction() hand-rolls binary WAL deserialization via a raw ByteBuffer. If any field in WALFile.WALPage or WALFile.WALTransaction changes, this silently breaks without any compile-time signal. Delegate to WALFile's own deserialization methods instead.

Critical 3: ignoreErrors=true swallows replay failures
Both applyTxEntry and applySchemaEntry pass ignoreErrors=true to applyChanges(). Data corruption during WAL replay is silently skipped and can leave replicas in an inconsistent state.

Concurrency 4: databaseWrapper field is not volatile
In ArcadeDBServer, databaseWrapper is written by setDatabaseWrapper() and read by createDatabase()/getDatabase() from different threads with no synchronization or volatile. Threads may see a stale null.

Concurrency 5: rewrapDatabases() is not atomic
rewrapDatabases() iterates the databases ConcurrentHashMap and calls put() during iteration. A concurrent getDatabase() can return the old unwrapped ServerDatabase, giving a caller an un-replicated handle.

Concurrency 6: Static HTTP_CLIENT in RaftReplicatedDatabase
HttpClient.newHttpClient() as a static field is shared across all instances and tests. HttpClient thread pools will not be shut down between tests, risking leaks and cross-test interference.

Design 7: HA_SERVER_LIST format change breaks existing users
HA_SERVER_LIST description changed from hostname:port to host:raftPort:httpPort[:priority], incompatible with the legacy HAServer parser. Introduce a dedicated HA_RAFT_SERVER_LIST key for the Raft format.

Design 8: Peer identity tied to server name numeric suffix
findLocalPeerId extracts peer index from trailing digits of server name (arcadedb-2 becomes peer-2). This breaks if a peer is removed from the middle of the list.

Design 9: maven-shade-plugin has no configuration
ha-raft/pom.xml declares maven-shade-plugin with no executions or configuration, producing an uberjar with all Ratis/gRPC/Netty transitive deps. Add proper configuration or remove the plugin.

Security 10: /api/v1/cluster authentication
GetClusterHandler exposes full cluster topology. Confirm authentication is enforced by the parent class and add a test rejecting unauthenticated requests.

Security 11: Cluster token in plaintext
initClusterToken() writes the UUID token to cluster-token.txt and sends it as a plain HTTP header, X-ArcadeDB-Cluster-Token. Ensure the file permissions are 600 and that the file is excluded from snapshots.

Minor 12: ATTRIBUTIONS.md and NOTICE not updated for Apache Ratis 3.2.0.
Minor 13: docs/plans/ AI-generated plan files with local worktree paths committed to the public repo. Remove them.
Minor 14: RaftHAPlugin.registerAPI always registers /api/v1/cluster even when HA_IMPLEMENTATION=legacy. Gate the registration.
Minor 15: Commented-out ServerMonitor line in ArcadeDBServer. Remove before merging.
Minor 16: Typo: ClusterDatatbaseChecker.java should be ClusterDatabaseChecker.java.

The primary blockers are the incomplete snapshot resync with enabled log purging (1), WAL deserialization duplication (2), ignoreErrors=true during replay (3), and the HA_SERVER_LIST breaking change (7). Concurrency issues (4-6) and the committed plan documents (13) should also be fixed before merge.

Generated with Claude Code

@codacy-production

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation | Diff coverage
Report missing for 99a4a12 [1] | 69.25%

Coverage variation details
Commit | Coverable lines | Covered lines | Coverage
Common ancestor commit (99a4a12) | Report Missing | Report Missing | Report Missing
Head commit (89915ae) | 147352 | 95427 | 64.76%

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Scope | Coverable lines | Covered lines | Diff coverage
Pull request (#3731) | 1047 | 725 | 69.25%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


Footnotes

  1. Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.

@codecov

codecov bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 73.17881% with 81 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.48%. Comparing base (a763aee) to head (41e8e82).
⚠️ Report is 3 commits behind head on main.

Files with missing lines | Patch % | Lines
...om/arcadedb/server/ha/raft/ArcadeStateMachine.java | 72.88% | 41 Missing and 23 partials ⚠️
...n/java/com/arcadedb/engine/TransactionManager.java | 25.00% | 7 Missing and 2 partials ⚠️
...java/com/arcadedb/database/DatabaseComparator.java | 53.84% | 4 Missing and 2 partials ⚠️
.../src/main/java/com/arcadedb/database/Database.java | 0.00% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3731      +/-   ##
==========================================
- Coverage   64.70%   64.48%   -0.22%     
==========================================
  Files        1577     1568       -9     
  Lines      116168   116232      +64     
  Branches    24649    24693      +44     
==========================================
- Hits        75161    74950     -211     
- Misses      30743    30909     +166     
- Partials    10264    10373     +109     


@claude

claude bot commented Mar 26, 2026

Code Review: Raft-Based High Availability (#3731)

This is an ambitious and well-structured PR that replaces the custom replication protocol with Apache Ratis (Raft consensus). The architecture is sound and the code is generally clean. Below is my detailed review.


Summary

82 files changed, ~15,900 additions. Introduces a new ha-raft Maven module with 10 core components, 25 unit tests, 14 integration tests, and 9 e2e container tests. The design correctly uses ServiceLoader for plugin discovery, binary serialization for efficiency, and proper Ratis abstractions.


Critical Issues

1. Snapshot-based recovery not implemented (ArcadeStateMachine.java)

The comments explicitly acknowledge that takeSnapshot() / restoreSnapshot() are not implemented. This is a hard blocker for production readiness: if a replica falls far enough behind that Raft log entries are purged (after the 10,000-entry threshold), the replica cannot resync and is permanently stuck. This should either be implemented or the PR scope should be clearly documented as "not production-ready until snapshot is implemented."

2. Race condition in cluster token initialization (RaftHAServer.java ~lines 380-398)

If multiple nodes start simultaneously, they may both attempt to create and write the cluster token file without any locking or atomic file creation. Use Files.createFile() (which atomically throws FileAlreadyExistsException if the file exists) or a similar safe create-or-read pattern.
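A create-or-read pattern built on `Files.createFile()` might look like the sketch below. `ClusterTokenFile` and `loadOrCreate` are hypothetical names, and the sketch assumes the token is a UUID string stored as the sole content of the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical sketch; not the PR's initClusterToken().
final class ClusterTokenFile {
  /** Creates the token file if it is absent; otherwise returns the token already stored. */
  static String loadOrCreate(final Path file) throws IOException {
    try {
      Files.createFile(file);                       // atomic: fails if the file already exists
      final String token = UUID.randomUUID().toString();
      Files.writeString(file, token, StandardCharsets.UTF_8);
      return token;
    } catch (final FileAlreadyExistsException e) {
      return Files.readString(file, StandardCharsets.UTF_8).trim();
    }
  }
}
```

One gap remains in this sketch: a concurrent reader can observe the file after creation but before the token is written. Writing to a temporary file and moving it into place, or retrying on an empty read, would close that window.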


High Priority Issues

3. ThreadLocal buffer leak (RaftReplicatedDatabase.java lines 77-78)

private static final ThreadLocal<List<byte[]>> schemaWalBuffer = ...
private static final ThreadLocal<List<Map<Integer, Integer>>> schemaBucketDeltaBuffer = ...

If an exception is thrown between the buffer-clear and buffer-send operations, the buffers are left populated. Wrap buffer usage in try-finally to guarantee cleanup. In a thread-pool environment this causes state to leak into the next operation on the same thread.

4. Blocking HTTP call without timeout (RaftReplicatedDatabase.java ~line 927)

final HttpResponse<String> response = HTTP_CLIENT.send(builder.build(), HttpResponse.BodyHandlers.ofString());

This blocking call has no timeout. If the leader is slow or unreachable, the calling thread hangs indefinitely. Configure HttpClient with a connectTimeout and use sendAsync with a timeout, or at minimum document the reliance on OS-level TCP timeouts.
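Both timeouts can be set with the standard `java.net.http` builders. The sketch below is illustrative: `TimedForwarding` and its method names are hypothetical, and the durations are left to the caller rather than taken from the PR.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical sketch; class and method names are assumptions.
final class TimedForwarding {
  /** Client that gives up if the TCP connection cannot be established in time. */
  static HttpClient newClient(final Duration connectTimeout) {
    return HttpClient.newBuilder().connectTimeout(connectTimeout).build();
  }

  /** POST request that fails with HttpTimeoutException if no response arrives in time. */
  static HttpRequest newCommandRequest(final URI uri, final String body, final Duration requestTimeout) {
    return HttpRequest.newBuilder(uri)
        .timeout(requestTimeout)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }
}
```

With the request timeout set, the existing blocking `send()` call can stay as it is; it throws instead of hanging when the leader does not answer.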

5. Silent error handling in forwarded command responses (RaftReplicatedDatabase.java ~lines 947-965)

When the leader returns an error response (e.g. {"error": "type not found"}), parseResultSetFromJson() appears to return an empty result set silently. The calling client has no way to distinguish "command succeeded but returned 0 results" from "command failed on leader." Add explicit error status checking on the HTTP response before parsing.
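The explicit status check could be as small as the following sketch. It assumes the leader signals failures with a non-2xx status code, which this thread does not confirm; `ForwardedResponse` is a hypothetical helper.

```java
// Hypothetical sketch; assumes errors are reported through non-2xx HTTP status codes.
final class ForwardedResponse {
  /** Returns the body for 2xx responses; throws otherwise so an error is never parsed as an empty result. */
  static String bodyOrThrow(final int statusCode, final String body) {
    if (statusCode < 200 || statusCode >= 300)
      throw new IllegalStateException("Leader returned HTTP " + statusCode + ": " + body);
    return body;
  }
}
```

In the forwarding path this would wrap `response.statusCode()` and `response.body()` before the result set is parsed.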


Medium Priority Issues

6. Database name not URL-encoded in forwarded HTTP URI

"http://" + leaderHttpAddress + "/api/v1/command/" + getName()

While ArcadeDB name validation likely prevents special characters, URL-encode the database name as defense-in-depth (e.g. using URLEncoder.encode()).
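One caveat when following this suggestion: `URLEncoder` implements form encoding and writes a space as `+`, which is not correct inside a URL path. A sketch that accounts for this, using a hypothetical `CommandUri` helper:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; only the URI layout is taken from the snippet above.
final class CommandUri {
  static String build(final String leaderHttpAddress, final String databaseName) {
    // URLEncoder emits '+' for spaces; a path segment needs %20 instead.
    // A literal '+' in the name is already encoded as %2B, so the replace is safe.
    final String segment = URLEncoder.encode(databaseName, StandardCharsets.UTF_8).replace("+", "%20");
    return "http://" + leaderHttpAddress + "/api/v1/command/" + segment;
  }
}
```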

7. Server name parsing is fragile (RaftHAServer.findLocalPeerId() ~lines 186-195)

Parsing the peer ID by looking for the last hyphen/separator assumes a specific naming convention. A server name like arc-prod-node-1 has multiple separators, making the numeric suffix extraction error-prone. Add an explicit validation that the parsed suffix is a valid integer and fail fast with a descriptive message.
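A stricter parse that fails fast might be sketched as below. `PeerIndexParser` is hypothetical, and the accepted separators (`-` and `_`) are an assumption, since the actual naming rules are not spelled out here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the separator set is an assumption.
final class PeerIndexParser {
  // Greedy prefix, then one separator, then the trailing digits.
  private static final Pattern TRAILING_INDEX = Pattern.compile("^.*[-_](\\d+)$");

  /** Extracts the numeric suffix of a server name or throws with a descriptive message. */
  static int parse(final String serverName) {
    final Matcher matcher = TRAILING_INDEX.matcher(serverName);
    if (!matcher.matches())
      throw new IllegalArgumentException(
          "Server name '" + serverName + "' must end with a separator and a numeric index, e.g. 'arcadedb-2'");
    try {
      return Integer.parseInt(matcher.group(1));
    } catch (final NumberFormatException e) {
      throw new IllegalArgumentException("Numeric suffix of '" + serverName + "' is out of range", e);
    }
  }
}
```

A name such as `arc-prod-node-1` resolves to index 1 because only the last separator counts, while a name without a numeric suffix is rejected at startup instead of being mapped to an arbitrary peer.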

8. Leader forwarding idempotency gap

When a replica forwards a DDL command to the leader and the leader crashes mid-execution, a retry from the replica could re-execute the same DDL on the new leader, causing unexpected side effects. Consider adding an idempotency/request ID header to forwarded commands.


Code Style / Minor Issues

9. Broad Exception catch — Several places catch Exception broadly which masks programming errors. Split into IOException, InterruptedException, etc. and rethrow unexpected exceptions.

10. notes.txt in .gitignore — Appears to be a developer artifact; remove if not intentional.

11. Test assertions — Per project conventions, prefer assertThat(x).isTrue() over assertTrue(x) where applicable in new tests.


Security Notes

  • The X-ArcadeDB-Cluster-Token sent over HTTP for leader forwarding should be accompanied by a recommendation/enforcement to use HTTPS for cluster-internal communication, otherwise the token is exposed in plaintext on the wire.
  • /api/v1/cluster (GetClusterHandler) exposes all peer HTTP addresses and leadership info. Consider requiring authentication for this endpoint if it is not already enforced.

Performance Notes

  • Binary codec and ThreadLocal buffers are good choices for GC pressure reduction.
  • The async CompletableFuture return from applyTransaction() is correct for Ratis.
  • Schema replication is synchronous through Raft, which is correct for consistency but means large index creations will serialize writes — acceptable as a known trade-off, just worth documenting.

Test Coverage

Good breadth of testing. Specific gaps worth addressing before production:

  • No split-brain / network partition recovery test (beyond quorum-loss)
  • No concurrent DDL from multiple replicas simultaneously
  • No test for the snapshot threshold boundary (what happens at entry 10,001 without snapshot support)
  • No rolling-upgrade compatibility test

Summary Assessment

The architecture is solid and the Raft integration is well-done. The code is readable and follows project conventions well. The main concerns are:

  1. Snapshot recovery must be implemented (or the PR explicitly scoped as a stepping stone)
  2. Cluster token race condition needs a fix before multi-node startup is reliable
  3. ThreadLocal cleanup and HTTP timeout should be addressed before merging

Happy to discuss any of these points further.

@github-actions
Contributor

📜 License Compliance Check

✅ License check passed. See artifacts for full report.

     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) msg-simple (com.github.java-json-tools:msg-simple:1.2 - https://github.com/java-json-tools/msg-simple)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) uri-template (com.github.java-json-tools:uri-template:0.10 - https://github.com/java-json-tools/uri-template)
     (Apache License 2.0) (GNU Lesser General Public License) javaparser-core (com.github.javaparser:javaparser-core:3.26.3 - https://github.com/javaparser/javaparser-core)
     (Apache License 2.0) JCIP Annotations under Apache License (com.github.stephenc.jcip:jcip-annotations:1.0-1 - http://stephenc.github.com/jcip-annotations)
     (Apache License 2.0) Google Android Annotations Library (com.google.android:annotations:4.1.1.4 - http://source.android.com/)



@claude

claude bot commented Mar 26, 2026

Code Review: Raft-based HA using Apache Ratis

Overall, this is a well-structured and well-tested implementation of a Raft-based HA stack. The use of Apache Ratis (Apache 2.0, battle-tested) is a sound choice, and the module structure correctly follows the project's provided-scope convention for the server dependency. The feature-flag approach (HA_IMPLEMENTATION=raft) keeping the legacy HA untouched is the right call for a safe migration path.

That said, there are several issues that should be addressed before merge.


BLOCKERS

1. System.out.println debug output must be removed

ClusterDatatbaseChecker.java contains a System.out.println call. Per project guidelines, all debug System.out calls must be removed before finishing. Replace with LogManager.instance().log(...) at FINE level, or remove entirely.

2. ATTRIBUTIONS.md and NOTICE not updated for Apache Ratis

The project requires: When adding a dependency, you MUST update ATTRIBUTIONS.md and, if Apache-licensed with a NOTICE file, incorporate required notices into the main NOTICE file. Apache Ratis is Apache-licensed and ships a NOTICE file. Neither ATTRIBUTIONS.md nor the project NOTICE appears to have been updated in this PR.

3. Unresolved TODO in codec — empty bucket deltas

RaftLogEntryCodec contains: final Map<Integer, Integer> bucketDeltas = Map.of(); // TODO: extract from phase1. An empty bucket-deltas map in a TX entry means replicas will not apply record-count updates correctly. This needs to be resolved before production use, or the TODO needs a clear follow-up issue and the known limitation documented.


IMPORTANT ISSUES

4. ThreadLocal buffers should call .remove() not just .clear()

RaftReplicatedDatabase uses static ThreadLocal buffers (schemaWalBuffer, schemaBucketDeltaBuffer). All cleanup paths call .get().clear(), which empties the list but keeps the ThreadLocal entry alive for the thread. In thread-pool environments (Undertow reuses threads), this is a per-thread memory leak. Use .remove() after consuming the buffers — withInitial will re-initialise them automatically on the next access.
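
A minimal sketch of the suggested cleanup, assuming a buffer declared with ThreadLocal.withInitial. The class and method names (SchemaBufferSketch, record, drain, pending) are hypothetical and are not taken from RaftReplicatedDatabase:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the schema buffers described above.
final class SchemaBufferSketch {
  private static final ThreadLocal<List<String>> BUFFER = ThreadLocal.withInitial(ArrayList::new);

  // Appends an entry to the calling thread's buffer.
  static void record(final String entry) {
    BUFFER.get().add(entry);
  }

  // Copies the buffered entries, then drops the thread's ThreadLocal entry entirely.
  static List<String> drain() {
    try {
      return new ArrayList<>(BUFFER.get());
    } finally {
      BUFFER.remove(); // withInitial(...) builds a fresh list on the next get()
    }
  }

  // Number of entries currently buffered for the calling thread.
  static int pending() {
    return BUFFER.get().size();
  }
}
```

Putting remove() in a finally block means the entry is released even when consuming the buffer throws.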

5. installSnapshot() not implemented

One test is disabled: @Disabled("Requires ArcadeStateMachine.installSnapshot() for snapshot-based replica resync"). Without installSnapshot(), any replica that falls behind Raft log compaction cannot catch up automatically — it requires manual intervention. This is a significant operational gap. The Javadoc documents the limitation, but it should also be called out explicitly in the PR description and tracked as a follow-up issue.

6. Typo in test class name

ClusterDatatbaseChecker should be ClusterDatabaseChecker.


MINOR ISSUES

7. maven-shade-plugin — verify no classpath conflicts

The ha-raft pom uses maven-shade-plugin without explicit relocations. Ratis' bundled Protobuf and gRPC classes could collide with whatever the server already ships. Verify the shaded jar relocates conflicting packages, or confirm the distribution assembly handles this correctly.

8. HA_CLUSTER_TOKEN sent as plain HTTP header — review logging exposure

Verify the token is never logged at any level in AbstractServerHttpHandler or the forwarding code in RaftReplicatedDatabase. Also consider auto-generating a strong random UUID on first start when HA_CLUSTER_TOKEN is not explicitly set, rather than allowing an empty value in production.
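
One possible shape for the fallback, as a sketch only. ClusterTokenSketch and resolveToken are hypothetical names. Every node must end up with the same token, so a value generated this way would still have to be persisted and shared across the cluster, which this sketch does not attempt:

```java
import java.util.UUID;

// Hypothetical helper; not part of the PR.
final class ClusterTokenSketch {
  // Returns the configured token, or a random UUID string when none is set.
  static String resolveToken(final String configured) {
    if (configured != null && !configured.isBlank())
      return configured;
    // Assumption: a random (type 4) UUID is acceptable as a token value here.
    return UUID.randomUUID().toString();
  }
}
```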

9. findLastSeparatorIndex() fragile naming assumption

Peer ID resolution relies on the server name having a numeric suffix separated by - or _ (e.g., arcadedb-0). If a deployment uses a different naming convention, findLocalPeerId() fails silently. Add a ConfigurationException with a descriptive message when no match is found.
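
A sketch of the fail-fast behaviour, assuming the peer index is the numeric suffix after the last - or _. The nested ConfigurationException is a local stand-in for the project's exception type, and parsePeerIndex is a hypothetical name:

```java
// Hypothetical sketch; the real findLocalPeerId() may differ.
final class PeerIdSketch {
  // Local stand-in for the project's ConfigurationException.
  static final class ConfigurationException extends RuntimeException {
    ConfigurationException(final String message) {
      super(message);
    }
  }

  // Index of the last '-' or '_' in the name, or -1 when neither is present.
  static int findLastSeparatorIndex(final String serverName) {
    return Math.max(serverName.lastIndexOf('-'), serverName.lastIndexOf('_'));
  }

  // Parses the numeric suffix, failing loudly instead of silently.
  static int parsePeerIndex(final String serverName) {
    final int sep = findLastSeparatorIndex(serverName);
    final String suffix = sep < 0 ? "" : serverName.substring(sep + 1);
    if (suffix.isEmpty() || !suffix.chars().allMatch(Character::isDigit))
      throw new ConfigurationException("Cannot derive peer index from server name '" + serverName
          + "': expected a numeric suffix after '-' or '_' (e.g. arcadedb-0)");
    try {
      return Integer.parseInt(suffix);
    } catch (final NumberFormatException e) {
      throw new ConfigurationException("Peer index in server name '" + serverName + "' is out of range");
    }
  }
}
```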

10. E2E workflow passes -DskipTests — suppresses the tests it is meant to run

The e2e-ha.yml workflow runs ./mvnw verify -DskipTests ... -pl e2e-ha. The -DskipTests flag skips all tests, defeating the purpose of the job. It should be -DskipTests=false or the flag should be omitted.


POSITIVE HIGHLIGHTS

  • Solid test pyramid: 25 unit + 14 integration (in-process MiniRaftCluster) + 9 e2e (Testcontainers + Toxiproxy) is genuinely comprehensive.
  • Correct use of provided scope for arcadedb-server, matching the documented wire-protocol module pattern.
  • volatile fields in ArcadeStateMachine, ConcurrentHashMap in ClusterMonitor — concurrency handled correctly.
  • Persisting last-applied term:index to disk prevents re-applying already-committed entries on restart.
  • Legacy HA stack untouched; HA_IMPLEMENTATION=legacy default ensures a safe rollout path.
  • Apache Ratis 3.2.0 is Apache 2.0 licensed — no license concern with the dependency choice itself.

@codacy-production

codacy-production bot commented Mar 26, 2026

Coverage summary from Codacy


Coverage variation: -8.84%
Diff coverage: 72.61%

Coverage variation details
                                   Coverable lines    Covered lines     Coverage
Common ancestor commit (85719a2)   115414             84829             73.50%
Head commit (b05d46c)              147349 (+31935)    95281 (+10452)    64.66% (-8.84%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
                        Coverable lines    Covered lines    Diff coverage
Pull request (#3731)    1044               758              72.61%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%



@claude

claude bot commented Mar 27, 2026

PR Review: Raft-based High Availability using Apache Ratis

This is a substantial architectural change. The overall approach is sound and the test coverage is commendable. Here are the issues found.


CRITICAL / BLOCKERS

1. ATTRIBUTIONS.md and NOTICE not updated
Per the project's CLAUDE.md, adding Apache Ratis (3.2.0, Apache-licensed with a NOTICE file) requires updating ATTRIBUTIONS.md and incorporating required NOTICE content into the main NOTICE file. Neither file appears to have been changed in this PR.

2. Snapshot-based resync is not wired — incomplete feature
SnapshotManager.java exists as utilities only; the Javadoc explicitly admits: "Full installSnapshot() integration with Ratis is not yet wired — replicas that fall behind past log compaction require a manual data copy from the leader." Once Raft log compaction runs, a lagging replica cannot auto-recover. This is a significant operational gap that should be documented or blocked before merging.

3. Inter-node command forwarding uses plain HTTP with no auth fallback
In RaftReplicatedDatabase.forwardCommandToLeaderViaRaft, the URI is always constructed with http:// (no TLS). The cluster token is only added when clusterToken is not null and not blank. With no token configured, forwarded writes carry no inter-node authentication.

4. rewrapDatabases() has a non-atomic replace race
The databases map is a ConcurrentHashMap, but the rewrap reads and replaces non-atomically. Between entry.getValue() and databases.put(key, newServerDb), another thread may be serving the old unwrapped ServerDatabase. Use databases.replace(key, oldValue, newValue) or synchronize the rewrap.
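A minimal sketch of the compare-and-swap variant suggested in item 4, assuming each map value can be re-wrapped by a plain function. `AtomicRewrap` and `rewrapAll` are hypothetical names, not ArcadeDB code; the only real API relied on is `ConcurrentHashMap.replace(key, oldValue, newValue)`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical helper: the real databases map and wrapper types live in the server module.
final class AtomicRewrap {
  /** Replaces every value with wrap(value), retrying an entry if another thread changed it meanwhile. */
  static <K, V> void rewrapAll(final ConcurrentHashMap<K, V> map, final UnaryOperator<V> wrap) {
    for (final K key : map.keySet()) {
      while (true) {
        final V current = map.get(key);
        if (current == null)
          break; // entry was removed concurrently, nothing left to rewrap
        // replace() only succeeds if the mapping is still 'current', so a concurrent
        // update is never silently overwritten; on failure we re-read and retry.
        if (map.replace(key, current, wrap.apply(current)))
          break;
      }
    }
  }
}
```

Note that `replace` compares values with `equals`, so this sketch assumes the wrapped type does not override `equals` in a way that makes old and new wrappers indistinguishable.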


SIGNIFICANT ISSUES

5. Wildcard imports in RaftReplicatedDatabase
The code uses `import com.arcadedb.database.*`, `import com.arcadedb.engine.*`, and `import java.util.*`. Project coding standards require explicit imports (no wildcards).

6. Commented-out code left in ArcadeDBServer.java
There is a leftover commented-out private ServerMonitor serverMonitor declaration that should be removed.

7. Typo in ClusterDatatbaseChecker.java
The file and class are named ClusterDatatbaseChecker (double-t in "Database"). Should be ClusterDatabaseChecker.

8. forwardCommandToLeaderViaRaft — misleading name
The method name implies Raft transport, but the implementation uses HttpClient to the leader's HTTP endpoint. Consider renaming to forwardCommandToLeaderViaHttp.

9. persistLastApplied uses StandardOpenOption.SYNC on every transaction
Forcing an OS-level sync on every committed Raft entry is a performance bottleneck. Ratis itself tracks what is committed — losing this file on crash only means replaying a few already-applied entries (handled gracefully with ignoreErrors=true). Consider writing it periodically instead.

10. ClusterMonitor.getReplicaLags() creates a ConcurrentHashMap unnecessarily
The result map is built and returned, never shared. A regular HashMap avoids unnecessary overhead.

11. startLagMonitor / stopLagMonitor — no synchronization on executor lifecycle
Both methods access lagMonitorExecutor from potentially different threads (state machine callback vs stop) without synchronization. Use synchronized or AtomicReference to guard the executor.
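One way to guard the executor, sketched with `synchronized` methods. `LagMonitorLifecycle` is a hypothetical stand-in for the relevant part of the real class, and the scheduling period is an assumed parameter.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the start/stop pair described above.
final class LagMonitorLifecycle {
  private ScheduledExecutorService executor; // guarded by "this"

  /** Starts the monitor; returns false if it is already running. */
  synchronized boolean start(final Runnable task, final long periodMs) {
    if (executor != null)
      return false;
    executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleAtFixedRate(task, 0, periodMs, TimeUnit.MILLISECONDS);
    return true;
  }

  /** Stops the monitor; returns false if it was not running. */
  synchronized boolean stop() {
    if (executor == null)
      return false;
    executor.shutdownNow();
    executor = null;
    return true;
  }

  synchronized boolean isRunning() {
    return executor != null;
  }
}
```

Because creation, scheduling and shutdown all happen under the same monitor, a `stop()` racing with `start()` can never shut down an executor before its task has been scheduled.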

12. ArcadeStateMachine.applyTxEntry — leader may double-apply transactions
The applyTransaction switch always calls applyTxEntry. If the leader's state machine also applies log entries that were already committed locally via commit2ndPhase, page version conflicts could occur. Please clarify how this is guarded and ensure it is covered by tests.


MINOR / STYLE

13. Plan documents committed to the repo
The docs/plans/ directory contains large AI implementation plan documents with Claude-specific directives. These are internal development artefacts and should not be committed to the public repo.

14. findLocalPeerId — poor error message on non-numeric suffix
If the server name suffix is non-numeric, Integer.parseInt throws a bare NumberFormatException with no context. Wrapping with a descriptive IllegalArgumentException would improve diagnosability.
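A sketch of the wrapping, assuming the peer index is the numeric suffix after the last `-` or `_` in the server name. `PeerIdParser.parsePeerIndex` is a hypothetical helper, not the actual `findLocalPeerId` code.

```java
// Hypothetical helper illustrating a descriptive failure for non-numeric suffixes.
final class PeerIdParser {
  static int parsePeerIndex(final String serverName) {
    final int sep = Math.max(serverName.lastIndexOf('-'), serverName.lastIndexOf('_'));
    final String suffix = serverName.substring(sep + 1); // whole name if no separator is present
    try {
      return Integer.parseInt(suffix);
    } catch (final NumberFormatException e) {
      // Keep the original exception as the cause so the stack trace is not lost.
      throw new IllegalArgumentException(
          "Cannot derive peer index from server name '" + serverName
              + "': expected a numeric suffix after the last '-' or '_', found '" + suffix + "'", e);
    }
  }
}
```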

15. ha-raft not included in package/distribution
The package module does not appear to include arcadedb-ha-raft. Is this intentional? If intended for production use, it should be included in the assembly and documented.

16. e2e-ha workflow uses --fail-never
Using --fail-never means test failures will not fail the CI job. Consider removing it for this workflow (unlike coverage jobs where it is intentional).


POSITIVE HIGHLIGHTS

  • Good use of BaseStateMachine from Ratis — follows the correct extension point.
  • RaftLogEntryCodec is well-tested with round-trip tests including edge cases.
  • The TransactionManager fix (wrapping bucket delta updates in if-changed) is a correct and useful bug fix.
  • Log-level downgrade from WARNING to FINE for expected Raft replay conditions in TransactionManager is the right approach.
  • Peer list parsing handles multiple formats cleanly and is well-tested.
  • GetClusterHandler lazy-resolves RaftHAServer to handle plugin startup ordering correctly.

Reviewed with Claude Code

@claude

claude bot commented Mar 27, 2026

Code Review

This is a substantial and ambitious PR — replacing a custom HA protocol with Apache Ratis (a proven Raft implementation) is the right architectural direction. The module structure is clean, the test coverage is broad, and the e2e container test setup with Toxiproxy is excellent for realistic failure scenarios.

That said, there are several blocking issues that need to be addressed before merge.


🔴 Critical — Blockers

1. Data integrity: Leadership-change race in ArcadeStateMachine.applyTxEntry()

The state machine checks isLeader() at apply time to decide whether to call applyChanges(). This is incorrect. By the time an entry is applied, the node that submitted it may have lost leadership, and the new leader's state machine will skip the entry because that node is now the leader, even though it never applied the change locally. The Raft contract guarantees that a committed entry must be applied on all nodes, regardless of current role. Leadership must not gate applyChanges().

2. Data integrity: Two-phase commit not atomic under concurrent leadership change

If a node loses leadership between send() returning (first phase success) and commit2ndPhase() executing, both the old leader and the new leader's state machine will apply the same transaction. ignoreErrors=true in applyChanges() masks the duplicate silently. This can produce phantom writes on replicas.

3. ha-raft/pom.xml — Duplicate arcadedb-server dependency with hardcoded version

There appear to be two declarations for arcadedb-server: one correctly scoped as provided and a second compile-scope declaration with a hardcoded version (26.4.1-SNAPSHOT). This violates the project's wire protocol module dependency rules and will cause the provided scope to be overridden, bundling the server jar into the module artifact.

4. ATTRIBUTIONS.md not updated

ha-raft introduces Apache Ratis (ratis-server, ratis-grpc, ratis-metrics-default, etc.). Per project guidelines, ATTRIBUTIONS.md must be updated and the Ratis NOTICE file content must be incorporated into the main NOTICE file. This is required before merge.


🟠 High — Should Fix

5. Security: Cluster token file is world-readable

The cluster token written to cluster-token.txt uses the JVM default file permissions, leaving it world-readable on most Linux systems. Any local user can read it and forge X-ArcadeDB-Cluster-Token headers to bypass auth on any node. Fix:

Files.setPosixFilePermissions(tokenFile, PosixFilePermissions.fromString("rw-------"));
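Setting permissions after the file already exists leaves a short window in which the token is readable with default permissions. A hedged alternative is to create the file with owner-only permissions from the start. `ClusterTokenFile.writeOwnerOnly` is a hypothetical helper; it assumes a POSIX file system (on others `Files.createFile` rejects the attribute with `UnsupportedOperationException`) and that the file does not exist yet.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper, not the code from the PR.
final class ClusterTokenFile {
  /** Creates the file as rw------- and writes the token; returns the resulting permission string. */
  static String writeOwnerOnly(final Path tokenFile, final String token) {
    final Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rw-------");
    final FileAttribute<Set<PosixFilePermission>> attr = PosixFilePermissions.asFileAttribute(ownerOnly);
    try {
      Files.createFile(tokenFile, attr); // fails if the file already exists
      Files.write(tokenFile, token.getBytes(StandardCharsets.UTF_8));
      return PosixFilePermissions.toString(Files.getPosixFilePermissions(tokenFile));
    } catch (final IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```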

6. Security: X-ArcadeDB-Forwarded-User auth bypass

When a valid cluster token is present, the request user is resolved directly from X-ArcadeDB-Forwarded-User with no further authorization check. A stolen cluster token + any known username grants full unrestricted access to every command on every node. At minimum, the forwarded user should still be validated against the server's security manager, or explicitly scoped to a service account.

7. SnapshotManager.installSnapshot() is not wired up

ArcadeStateMachine.takeSnapshot() is explicitly noted as a stub. Any replica that falls behind past log compaction cannot rejoin without manual data copy, making crash recovery for lagging nodes a manual operation. This should either be implemented or documented as a known limitation with a follow-up issue reference.

8. docs/plans/ AI planning artifacts committed to the repo

Nine files in docs/plans/ are AI-generated implementation plans, including prompts like > **For Claude:** REQUIRED SUB-SKILL:. These are working notes, not project documentation. They should be removed from the PR. If planning docs are desired long-term, a dedicated non-committed location or a separate branch is more appropriate.


🟡 Medium — Should Address

9. No bounds checking in RaftLogEntryCodec.deserializeWalTransaction()

A corrupted or malicious log entry can pass an arbitrary pageCount value, causing OOM, or changesTo < changesFrom, causing NegativeArraySizeException. A magic byte and a version marker should be added to detect truncation. Bounds should be validated (pageCount < MAX_PAGES, changesFrom <= changesTo).
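The kind of validation meant here can be sketched as follows. The wire layout (an int page count, then per page an int `changesFrom`, an int `changesTo` and the delta bytes) and the `MAX_PAGES` value are assumptions for illustration only; the real `RaftLogEntryCodec` format may differ.

```java
import java.nio.ByteBuffer;

// Hypothetical validator over an assumed layout: [pageCount][from][to][bytes]...
final class WalEntryBounds {
  static final int MAX_PAGES = 65_536; // assumed cap, not a value from the PR

  /** Walks the entry and returns the page count, or throws if any field is out of bounds. */
  static int validate(final ByteBuffer buf) {
    if (buf.remaining() < Integer.BYTES)
      throw new IllegalArgumentException("Truncated entry: missing page count");
    final int pageCount = buf.getInt();
    if (pageCount < 0 || pageCount > MAX_PAGES)
      throw new IllegalArgumentException("Invalid page count: " + pageCount);

    for (int i = 0; i < pageCount; i++) {
      if (buf.remaining() < 2 * Integer.BYTES)
        throw new IllegalArgumentException("Truncated entry: missing range for page " + i);
      final int from = buf.getInt();
      final int to = buf.getInt();
      if (from < 0 || to < from)
        throw new IllegalArgumentException("Invalid change range [" + from + ", " + to + "] for page " + i);
      final long length = (long) to - from + 1; // long arithmetic avoids int overflow
      if (length > buf.remaining())
        throw new IllegalArgumentException("Truncated entry: page " + i + " needs " + length + " bytes");
      buf.position(buf.position() + (int) length); // skip the delta payload
    }
    return pageCount;
  }
}
```

Checking `remaining()` before every read means no array is ever allocated from an unvalidated size.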

10. ClusterMonitor.leaderCommitIndex is not an AtomicLong

getReplicaLags() reads leaderCommitIndex (updated from one thread) while another thread may be calling updateLeaderCommitIndex(). A non-volatile plain long can produce a briefly negative lag value which may confuse monitoring consumers. Use AtomicLong or volatile long.
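A sketch of the suggested fix, assuming lag is simply the leader commit index minus each replica's match index. `LagTracker` and its method names are illustrative, not the actual `ClusterMonitor` API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative stand-in for the lag bookkeeping described above.
final class LagTracker {
  private final AtomicLong leaderCommitIndex = new AtomicLong();
  private final Map<String, Long> replicaMatchIndexes = new ConcurrentHashMap<>();

  void updateLeaderCommitIndex(final long index) {
    leaderCommitIndex.set(index);
  }

  void updateReplicaMatchIndex(final String replicaId, final long index) {
    replicaMatchIndexes.put(replicaId, index);
  }

  /** Returns a private snapshot; a plain HashMap is enough because the result is never shared. */
  Map<String, Long> getReplicaLags() {
    final long commit = leaderCommitIndex.get(); // read once so all lags use the same baseline
    final Map<String, Long> lags = new HashMap<>();
    for (final Map.Entry<String, Long> e : replicaMatchIndexes.entrySet())
      lags.put(e.getKey(), Math.max(0L, commit - e.getValue())); // clamp transient negatives
    return lags;
  }
}
```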

11. ArcadeStateMachine.applySchemaEntry() — no rollback on partial failure

If schema file creation fails partway through (e.g. disk full), replicas can end up with partial schema state with no recovery path. Writes should be staged to temp files and atomically renamed, or a rollback mechanism should be added.
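The stage-then-rename idea can be sketched with the standard `Files.move` and `ATOMIC_MOVE`. `AtomicFileWriter` is a hypothetical helper; it assumes the temp file sits next to the target (same file system), and it does not fsync, so it covers partial writes but not power-loss durability.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: readers see either the old file or the complete new one.
final class AtomicFileWriter {
  static void writeAtomically(final Path target, final byte[] content) {
    final Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    try {
      Files.write(tmp, content);                                // stage the full content first
      Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);  // then publish in one step
    } catch (final IOException e) {
      try {
        Files.deleteIfExists(tmp);                              // leave no half-written staging file
      } catch (final IOException ignored) {
        // best effort cleanup; the original failure is the one worth reporting
      }
      throw new UncheckedIOException(e);
    }
  }
}
```

Whether an existing target is replaced by `ATOMIC_MOVE` is implementation specific according to the JDK documentation, so callers that overwrite should verify the behavior on their platform.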

12. HA_RAFT_PERSIST_STORAGE config key appears to be test-only

This is exposed in user-visible GlobalConfiguration without labeling it as internal/debug-only. Either add appropriate documentation, or move it to test configuration.


🔵 Minor

13. ClusterDatatbaseChecker.java — not a real test

  • Typo in filename: Datatbase should be Database
  • Contains only a main() method with no JUnit annotations — it will never run in CI
  • Has hardcoded local paths (/Users/frank/projects/...)
  • Missing Apache 2.0 license header

This should either be converted to a proper @Test-annotated integration test, or removed.

14. PluginManager hard-codes the class name as a string

if (plugin.getClass().getSimpleName().equals("RaftHAPlugin")) { ... }

A rename silently breaks this check. Use instanceof RaftHAPlugin or a marker interface instead.
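A sketch of the marker-interface option, which also works if the class performing the check cannot reference `RaftHAPlugin` directly. All types below (`ServerPluginLike`, `HighAvailabilityPlugin`, the two stub plugins) are hypothetical stand-ins, not the project's real interfaces.

```java
// Hypothetical stand-in for the server's plugin interface.
interface ServerPluginLike { }

// Hypothetical marker: any HA implementation opts in by implementing it.
interface HighAvailabilityPlugin extends ServerPluginLike { }

final class RaftHAPluginStub implements HighAvailabilityPlugin { }

final class MetricsPluginStub implements ServerPluginLike { }

final class PluginChecks {
  /** Type-based check: renaming the implementing class cannot break it. */
  static boolean isHaPlugin(final ServerPluginLike plugin) {
    return plugin instanceof HighAvailabilityPlugin;
  }
}
```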

15. Performance: synchronous quorum on every commit

Every transaction blocks on io().send() until a quorum ack arrives. This is correct for durability but represents a severe throughput regression compared to the existing pipelined async HA for small-transaction workloads. At minimum, document this trade-off and consider a configurable fire-and-forget mode for non-critical writes (with appropriate durability warnings).


Test Coverage Gaps

  • No test for the leadership-change race during commit (issues 1 and 2 above)
  • No test for cluster token file permissions after node restart
  • RaftFullSnapshotResyncIT tests the snapshot path, but installSnapshot() is a stub — the test may be passing against no-op code

Overall, the design is solid and the Raft approach is the right long-term direction. The top priorities before this can merge safely are the two data-integrity races (items 1 and 2) and the security issues (items 5 and 6).

robfrank added 17 commits April 11, 2026 23:51
Adds ImportDatabaseScenarioIT which starts a 3-node Raft HA cluster,
stages a JSONL fixture inside the leader container, issues an import
database command, and verifies that all 500 Person records are visible
on every node via RemoteDatabase with FIXED connection strategy.
@github-actions
Contributor

📜 License Compliance Check

✅ License check passed. See artifacts for full report.

License Summary (first 50 lines)

Lists of 387 third-party dependencies.
     (Apache License 2.0) LZ4 Java Compression (at.yawk.lz4:lz4-java:1.10.4 - https://github.com/yawkat/lz4-java)
     (EPL 2.0) (GNU Lesser General Public License) Logback Classic Module (ch.qos.logback:logback-classic:1.5.32 - http://logback.qos.ch/logback-classic)
     (EPL 2.0) (GNU Lesser General Public License) Logback Core Module (ch.qos.logback:logback-core:1.5.32 - http://logback.qos.ch/logback-core)
     (Apache 2) ArcadeDB BOLT Protocol (com.arcadedb:arcadedb-bolt:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-bolt/)
     (Apache 2) ArcadeDB Console (com.arcadedb:arcadedb-console:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-console/)
     (Apache 2) ArcadeDB Engine (com.arcadedb:arcadedb-engine:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-engine/)
     (Apache 2) ArcadeDB GraphQL (com.arcadedb:arcadedb-graphql:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-graphql/)
     (Apache 2) ArcadeDB Gremlin (com.arcadedb:arcadedb-gremlin:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-gremlin/)
     (Apache 2) ArcadeDB gRPC Stubs (com.arcadedb:arcadedb-grpc:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-grpc/)
     (Apache 2) ArcadeDB gRPC Client (com.arcadedb:arcadedb-grpc-client:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-grpc-client/)
     (Apache 2) ArcadeDB gRpcW (com.arcadedb:arcadedb-grpcw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-grpcw/)
     (Apache 2) ArcadeDB HA Raft (com.arcadedb:arcadedb-ha-raft:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-ha-raft/)
     (Apache 2) ArcadeDB Integration (com.arcadedb:arcadedb-integration:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-integration/)
     (Apache 2) ArcadeDB load tests (com.arcadedb:arcadedb-load-tests:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-load-tests/)
     (Apache 2) ArcadeDB Metrics (com.arcadedb:arcadedb-metrics:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-metrics/)
     (Apache 2) ArcadeDB MongoDB Wire Protocol (com.arcadedb:arcadedb-mongodbw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-mongodbw/)
     (Apache 2) ArcadeDB Network (com.arcadedb:arcadedb-network:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-network/)
     (Apache 2) ArcadeDB PostgresW (com.arcadedb:arcadedb-postgresw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-postgresw/)
     (Apache 2) ArcadeDB RedisW (com.arcadedb:arcadedb-redisw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-redisw/)
     (Apache 2) ArcadeDB Server (com.arcadedb:arcadedb-server:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-server/)
     (Apache 2) ArcadeDB Studio (com.arcadedb:arcadedb-studio:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-studio/)
     (Apache 2) ArcadeDB Test Utils (com.arcadedb:arcadedb-test-utils:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-test-utils/)
     (Apache License 2.0) HPPC Collections (com.carrotsearch:hppc:0.7.1 - http://labs.carrotsearch.com/hppc.html/hppc)
     (Apache License 2.0) Metrics Core (com.codahale.metrics:metrics-core:3.0.2 - http://metrics.codahale.com/metrics-core/)
     (The Apache License, Version 2.0) com.conversantmedia:disruptor (com.conversantmedia:disruptor:1.2.21 - https://github.com/conversant/disruptor)
     (Apache License 2.0) Jackson-annotations (com.fasterxml.jackson.core:jackson-annotations:2.20 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) Jackson-annotations (com.fasterxml.jackson.core:jackson-annotations:2.21 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) Jackson-core (com.fasterxml.jackson.core:jackson-core:2.21.1 - https://github.com/FasterXML/jackson-core)
     (Apache License 2.0) Jackson-core (com.fasterxml.jackson.core:jackson-core:2.21.2 - https://github.com/FasterXML/jackson-core)
     (Apache License 2.0) jackson-databind (com.fasterxml.jackson.core:jackson-databind:2.21.1 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) jackson-databind (com.fasterxml.jackson.core:jackson-databind:2.21.2 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) Jackson-dataformat-YAML (com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.21.1 - https://github.com/FasterXML/jackson-dataformats-text)
     (Apache License 2.0) Jackson datatype: JSR310 (com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.21.1 - https://github.com/FasterXML/jackson-modules-java8/jackson-datatype-jsr310)
     (Apache License 2.0) Caffeine cache (com.github.ben-manes.caffeine:caffeine:2.3.1 - https://github.com/ben-manes/caffeine)
     (Apache License 2.0) docker-java-api (com.github.docker-java:docker-java-api:3.7.1 - https://github.com/docker-java/docker-java)
     (Apache License 2.0) docker-java-transport (com.github.docker-java:docker-java-transport:3.7.1 - https://github.com/docker-java/docker-java)
     (Apache License 2.0) docker-java-transport-zerodep (com.github.docker-java:docker-java-transport-zerodep:3.7.1 - https://github.com/docker-java/docker-java)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) btf (com.github.java-json-tools:btf:1.3 - https://github.com/java-json-tools/btf)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) jackson-coreutils (com.github.java-json-tools:jackson-coreutils:2.0 - https://github.com/java-json-tools/jackson-coreutils)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) jackson-coreutils-equivalence (com.github.java-json-tools:jackson-coreutils-equivalence:1.0 - https://github.com/java-json-tools/jackson-coreutils)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) json-patch (com.github.java-json-tools:json-patch:1.13 - https://github.com/java-json-tools/json-patch)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) json-schema-core (com.github.java-json-tools:json-schema-core:1.2.14 - https://github.com/java-json-tools/json-schema-core)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) json-schema-validator (com.github.java-json-tools:json-schema-validator:2.2.14 - https://github.com/java-json-tools/json-schema-validator)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) msg-simple (com.github.java-json-tools:msg-simple:1.2 - https://github.com/java-json-tools/msg-simple)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) uri-template (com.github.java-json-tools:uri-template:0.10 - https://github.com/java-json-tools/uri-template)
     (Apache License 2.0) (GNU Lesser General Public License) javaparser-core (com.github.javaparser:javaparser-core:3.26.3 - https://github.com/javaparser/javaparser-core)
     (Apache License 2.0) JCIP Annotations under Apache License (com.github.stephenc.jcip:jcip-annotations:1.0-1 - http://stephenc.github.com/jcip-annotations)
     (Apache License 2.0) Google Android Annotations Library (com.google.android:annotations:4.1.1.4 - http://source.android.com/)

@claude

claude bot commented Apr 11, 2026

Code Review: PR #3731 - Raft-based High Availability using Apache Ratis

This is a substantial PR that replaces ArcadeDB's custom replication protocol with Raft consensus. The architecture is well thought-out and the test coverage is impressive. Below are my findings, organized by severity.


Critical Issues

1. Security: Constant-time comparison uses platform-default charset (SnapshotHttpHandler.java)

String.getBytes() uses the platform default charset, which may differ between nodes. The token comparison MessageDigest.isEqual(expectedToken.getBytes(), ...) must use StandardCharsets.UTF_8 explicitly on both sides.
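A minimal sketch of the charset-explicit comparison. `TokenComparison.tokensMatch` is a hypothetical helper; the JDK calls (`MessageDigest.isEqual`, `StandardCharsets.UTF_8`) are real.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical helper: both sides are encoded with the same explicit charset.
final class TokenComparison {
  static boolean tokensMatch(final String expected, final String provided) {
    if (expected == null || provided == null)
      return false;
    return MessageDigest.isEqual(
        expected.getBytes(StandardCharsets.UTF_8),
        provided.getBytes(StandardCharsets.UTF_8));
  }
}
```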

2. Security: Cluster token derived from root password breaks password rotation

In RaftHAServer.initClusterToken(), the cluster token is derived via PBKDF2 from clusterName + ":" + rootPassword. Changing the root password silently invalidates the cluster token and breaks inter-node communication. This relationship should be documented prominently, or redesigned (e.g., require explicit HA_CLUSTER_TOKEN in production).
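To make the coupling concrete, here is a sketch of a PBKDF2 derivation over clusterName + ":" + rootPassword. The salt choice, iteration count, key length, algorithm variant and Base64 output are all assumptions for illustration; the actual parameters used by initClusterToken() are not shown in this review.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical derivation: illustrates why a password change yields a different token.
final class ClusterTokenDerivation {
  static String deriveToken(final String clusterName, final String rootPassword) {
    final char[] input = (clusterName + ":" + rootPassword).toCharArray();
    final byte[] salt = clusterName.getBytes(StandardCharsets.UTF_8); // assumed salt, must be non-empty
    final PBEKeySpec spec = new PBEKeySpec(input, salt, 10_000, 256); // assumed cost and key length
    try {
      final byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
      return Base64.getEncoder().encodeToString(key);
    } catch (final GeneralSecurityException e) {
      throw new IllegalStateException("PBKDF2 derivation failed", e);
    } finally {
      spec.clearPassword();
    }
  }
}
```

Every node that derives its token this way must see the same root password at the same time; a node restarted after a rotation would compute a token its peers no longer accept.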

3. Bug: Thread-local WAL buffer shared across nested recordFileChanges() calls

In RaftReplicatedDatabase.recordFileChanges(), when the method returns early (already recording), any commit() calls inside a nested recordFileChanges() will operate on the same static ThreadLocal buffers as the outer scope, corrupting the outer recording's WAL buffer. A stack-based approach is needed.
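One possible shape of the stack-based approach, reduced to strings instead of WAL buffers. `RecordingScopes`, `record` and `onChange` are hypothetical names; whether nested recordings should be isolated or merged into the outer one is a design decision this sketch does not settle (it isolates them).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration: each nested recording gets its own buffer on a per-thread stack.
final class RecordingScopes {
  private static final ThreadLocal<Deque<List<String>>> STACK = ThreadLocal.withInitial(ArrayDeque::new);

  /** Runs body and returns only the changes captured in this scope. */
  static List<String> record(final Runnable body) {
    final Deque<List<String>> stack = STACK.get();
    final List<String> buffer = new ArrayList<>();
    stack.push(buffer);
    try {
      body.run();
    } finally {
      stack.pop(); // always restore the outer scope, even if body throws
    }
    return buffer;
  }

  /** Called where a commit would capture its changes: appends to the innermost active scope. */
  static void onChange(final String change) {
    final List<String> current = STACK.get().peek();
    if (current != null)
      current.add(change);
  }
}
```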

4. Bug: installDatabaseSnapshot opens a database before extraction completes safely

In ArcadeStateMachine.installDatabaseSnapshot(), server.getDatabase(databaseName) is called immediately after the ZIP extraction loop. If extraction fails partway through (disk full, I/O error), the database is opened in a corrupt state. The getDatabase() call should only happen after validating complete extraction.

5. Phase 2 failure / double-apply risk in RaftReplicatedDatabase.commit()

If Phase 2 (local apply) fails after Raft consensus is reached, the code steps down. However, the entry is already in the Raft log and will be applied again when the node rejoins as a follower via applyTxEntry(ignoreErrors=true). This double-application path needs a test confirming it is safe.


High Priority Issues

6. Wildcard imports in RaftReplicatedDatabase.java

import com.arcadedb.database.*;
import com.arcadedb.engine.*;

Project guidelines require explicit per-class imports. All other files in this PR follow this correctly.

7. Fully qualified class references throughout ArcadeStateMachine.java and RaftHAPlugin.java

The project guideline explicitly states: "don't use fully qualified names if possible, always import the class." Dozens of inline java.net.*, java.io.*, java.util.zip.*, java.nio.* references violate this.

8. ATTRIBUTIONS.md not updated for Apache Ratis

Apache Ratis is Apache 2.0 licensed (good), but project guidelines state: "When adding a dependency, you MUST update ATTRIBUTIONS.md." Apache Ratis also has a NOTICE file - required notices must be incorporated into the main NOTICE file per Apache 2.0 license terms.

9. ha-raft/config/server-users.jsonl committed with credential hash

Even hashed credentials should not be committed to the repository. This file should be in .gitignore or replaced with a template.

10. ha-raft/config/mcp-config.json appears to be a local dev artifact

These config files should not be part of the module and will confuse future contributors.

11. SnapshotManager Javadoc admits feature is incomplete

The Javadoc states that full installSnapshot() integration with Ratis is not yet wired, and replicas that fall behind past log compaction require a manual data copy from the leader. This is a significant limitation that should be called out in the PR description with a tracking issue filed.


Medium Priority Issues

12. RaftGroupCommitter.flushBatch(): batch entries are NOT atomic

Each entry is sent via separate raftClient.async().send(msg) calls. The class name and Javadoc may mislead readers into thinking a batch is atomic. The Javadoc should clarify this is throughput batching only, not atomic commit.

13. Per-entry timeout measured from get() call, not batch submission

If the first entry takes nearly the full quorumTimeout, subsequent entries may incorrectly time out even though they were just dispatched.

14. Server name parsing is fragile (RaftHAServer.findLocalPeerId)

The mapping relies on parsing the numeric suffix after the last - or _. This constraint is not enforced at configuration time and could surprise users with custom naming conventions.

15. HTTP port offset heuristic is fragile (RaftHAServer.getHttpPortOffset)

The fallback magic number 46 (= 2480 - 2434) ties the default HTTP and Raft port values together implicitly. Clusters with custom ports will silently compute wrong HTTP addresses for new peers.

16. Leader forwarding escalates to "root" user

When no security context is available, the command forward uses "root" as the username. Silently escalating to root on the receiving node is a security concern that deserves scrutiny and explicit documentation.

17. Quorum.parse() does not handle all values listed in HA_QUORUM config

GlobalConfiguration.HA_QUORUM lists: "none", "one", "two", "three", "majority", "all". But Quorum.parse() only handles "majority" and "all" - any other value throws ConfigurationException. The allowed values must be aligned.


Low Priority / Style Issues

18. Typo: ClusterDatatbaseChecker.java - "Datatbase" should be "Database"

19. logLevel = WARNING for informational cluster configuration print

printClusterConfiguration() logs an ASCII peer table at Level.WARNING. Level.INFO would be more appropriate.

20. AI-generated planning documents committed under docs/plans/ and docs/superpowers/

Internal working documents (multiple 2026-02-*, 2026-03-*, and 2026-04-* files) should not be part of the final merge.

21. Dependency scope for ha-raft module needs clarification

arcadedb-engine and arcadedb-network are compile-scope dependencies. Per project guidelines, if this module is loaded as a plugin JAR alongside the server assembly, these should be provided to avoid class duplication. The shade plugin configuration also needs clarification on what gets included.


Positive Observations

  • Using Apache Ratis (Apache 2.0) is an excellent choice that eliminates the maintenance burden of a custom consensus implementation.
  • The 3-phase commit design (Phase 1 under read lock, replication without lock, Phase 2 under read lock) is correct and well-documented in code comments.
  • RaftGroupCommitter batching is a solid approach to amortizing round-trip latency.
  • Zip-slip protection in installDatabaseSnapshot() is good security hygiene.
  • MessageDigest.isEqual usage shows awareness of timing attacks (just needs charset fix).
  • Test coverage is strong: unit tests for codec, state machine, config validation, and quorum, plus integration tests for 2-node/3-node replication, failover, crash recovery, and concurrent writes.
  • HALog with configurable verbosity is a good operational tooling addition.
  • Peer display names (e.g., arcadedb-0 (localhost:2480)) in log messages improve observability.

Required Changes Before Merge

  1. Fix getBytes() charset in token comparison (SnapshotHttpHandler)
  2. Replace wildcard imports with explicit imports (RaftReplicatedDatabase)
  3. Replace inline fully-qualified class names with proper imports (ArcadeStateMachine, RaftHAPlugin)
  4. Update ATTRIBUTIONS.md and NOTICE with Apache Ratis information
  5. Remove or gitignore ha-raft/config/server-users.jsonl and ha-raft/config/mcp-config.json
  6. Remove docs/plans/ and docs/superpowers/ planning documents
  7. Fix Quorum.parse() to align with HA_QUORUM allowed values
  8. Fix ClusterDatatbaseChecker typo
  9. Clarify ha-raft module packaging model (shade configuration, dependency scopes)
  10. File a tracking issue for the incomplete SnapshotManager Ratis integration

Reviewed with Claude Code

- Health monitor: background thread detects Ratis CLOSED/EXCEPTION state
  after network partitions and auto-recovers via restartRatisIfNeeded()
- Leader proxy: follower transparently forwards write requests to the
  leader instead of returning HTTP 400, with loop prevention and body cap
- Client-side HA failover: RemoteDatabase gains opt-in ReadConsistency
  (EVENTUAL/READ_YOUR_WRITES/LINEARIZABLE), commit-index tracking via
  X-ArcadeDB-Commit-Index header, and election retry on HTTP 503
- Constant-time cluster token comparison via MessageDigest.isEqual
- Snapshot throttling: semaphore limits concurrent downloads (default 2)
- Symlink rejection: skip symbolic links in snapshot ZIP to prevent
  path traversal from misconfigured deploys
- gRPC flow control window: configurable via HA_GRPC_FLOW_CONTROL_WINDOW
  for faster catch-up replication after partitions
- 8 new GlobalConfiguration keys for all tunables

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

📜 License Compliance Check

✅ License check passed. See artifacts for full report.

     (Apache License 2.0) (GNU Lesser General Public License) javaparser-core (com.github.javaparser:javaparser-core:3.26.3 - https://github.com/javaparser/javaparser-core)
     (Apache License 2.0) JCIP Annotations under Apache License (com.github.stephenc.jcip:jcip-annotations:1.0-1 - http://stephenc.github.com/jcip-annotations)
     (Apache License 2.0) Google Android Annotations Library (com.google.android:annotations:4.1.1.4 - http://source.android.com/)

@claude

claude bot commented Apr 12, 2026

Code Review - Raft-based HA (Apache Ratis)

This is an ambitious and well-structured addition. The test coverage (unit + integration + e2e container tests) is a real highlight. Below are findings organized by severity.


Critical Issues

1. takeSnapshot() / installSnapshot() not implemented - log purging active

ArcadeStateMachine.java returns the last-applied index without writing database files to the Ratis snapshot directory. RaftHAServer sets setPurgeUptoSnapshotIndex=true. Any replica that falls behind the log compaction threshold cannot rejoin: no snapshot to install and required log entries already purged. The SnapshotManager.java Javadoc acknowledges this is not yet wired to the Ratis StateMachine.Storage API.

Required: Implement full snapshot/install support before merge, or disable setPurgeUptoSnapshotIndex and document the limitation with a tracking issue.

2. Leadership-change race in applyTxEntry()

ArcadeStateMachine.applyTxEntry() uses raftHAServer.isLeader() at apply time to decide whether to skip applying a transaction. If the node loses leadership between submitting the log entry and the applyLogEntry callback firing, it skips a transaction it now needs as a follower, causing permanent divergence. The originator identity must be encoded in the log entry itself (a peerId field), not inferred from current role.
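A minimal sketch of the suggestion above, assuming a hypothetical envelope class (TxLogEntry) and helper (OriginCheck) that are not part of the PR. The point is only that the skip decision reads immutable data stored in the entry rather than the node's current role; it does not by itself make the local commit and the apply atomic.

```java
import java.util.Objects;

/** Hypothetical log-entry envelope: the originating peer travels with the payload. */
final class TxLogEntry {
  final String originPeerId;
  final byte[] walPayload;

  TxLogEntry(String originPeerId, byte[] walPayload) {
    this.originPeerId = Objects.requireNonNull(originPeerId, "originPeerId");
    this.walPayload = Objects.requireNonNull(walPayload, "walPayload");
  }
}

final class OriginCheck {
  private OriginCheck() {}

  /**
   * True when this node submitted the entry. The answer depends only on the
   * entry contents and the node's stable peer id, never on isLeader().
   */
  static boolean submittedByThisNode(TxLogEntry entry, String localPeerId) {
    return entry.originPeerId.equals(localPeerId);
  }
}
```

With this shape a node that was demoted after submitting still recognizes its own entry, and a newly elected leader does not mistake inherited entries for its own.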

3. Bounds checking missing in deserializeWalTransaction

RaftLogEntryCodec.java reads pageCount from the buffer without validating it against the remaining buffer length. A corrupted log entry with a large pageCount triggers an OOM allocation. Additionally, changesTo < changesFrom causes NegativeArraySizeException. Since Raft log entries persist to disk, this corruption reproduces across restarts. Add bounds validation and a magic-byte/version header.
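One possible shape for the validation, as a sketch only. The magic value, the version byte, the header layout and the class name WalEntryValidator are assumptions made for illustration and do not reflect ArcadeDB's actual WAL format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

final class WalEntryValidator {
  // Illustrative constants, not the real on-disk format.
  static final int MAGIC = 0x41524341; // ASCII "ARCA"
  static final byte FORMAT_VERSION = 1;
  static final int PAGE_HEADER_BYTES = 2 * Integer.BYTES; // changesFrom + changesTo

  private WalEntryValidator() {}

  /** Validates magic, version and page count; throws on a malformed entry. */
  static int readPageCount(ByteBuffer buf) throws IOException {
    if (buf.remaining() < Integer.BYTES + Byte.BYTES + Integer.BYTES)
      throw new IOException("Truncated entry header: " + buf.remaining() + " bytes");
    int magic = buf.getInt();
    if (magic != MAGIC)
      throw new IOException("Bad magic: 0x" + Integer.toHexString(magic));
    byte version = buf.get();
    if (version != FORMAT_VERSION)
      throw new IOException("Unsupported format version: " + version);
    int pageCount = buf.getInt();
    // Every page needs at least a header, so the remaining bytes bound the count.
    if (pageCount < 0 || pageCount > buf.remaining() / PAGE_HEADER_BYTES)
      throw new IOException("Invalid page count: " + pageCount);
    return pageCount;
  }

  /** Reads one page delta after checking that the range is ordered and fits the buffer. */
  static byte[] readDelta(ByteBuffer buf) throws IOException {
    if (buf.remaining() < PAGE_HEADER_BYTES)
      throw new IOException("Truncated page header");
    int changesFrom = buf.getInt();
    int changesTo = buf.getInt();
    long deltaSize = (long) changesTo - changesFrom + 1; // long avoids int overflow
    if (changesFrom < 0 || changesTo < changesFrom || deltaSize > buf.remaining())
      throw new IOException("Invalid delta range: " + changesFrom + ".." + changesTo);
    byte[] delta = new byte[(int) deltaSize];
    buf.get(delta);
    return delta;
  }
}
```

Bounding pageCount by the bytes actually present means a corrupted count is rejected before any array is allocated.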

4. ignoreErrors=true swallows WAL replay failures

ArcadeStateMachine.applyTxEntry() and applySchemaEntry() both pass ignoreErrors=true to applyChanges(). Silent corruption during replay leaves replicas in an inconsistent state with no diagnostic signal. At minimum, log these failures at ERROR level.

5. ATTRIBUTIONS.md and NOTICE not updated for Apache Ratis

Apache Ratis 3.2.0 (ratis-server, ratis-grpc, ratis-metrics-default) is a new Apache 2.0 dependency with its own NOTICE file. Project guidelines require updating both ATTRIBUTIONS.md and the main NOTICE file.


High Severity

6. Cluster token file world-readable

RaftHAServer.initClusterToken() writes the cluster token to cluster-token.txt with default OS permissions (typically 644). Any local user can read the token and forge X-ArcadeDB-Cluster-Token headers. Apply Files.setPosixFilePermissions(path, PosixFilePermissions.fromString("rw-------")) after writing. Use Files.createFile() (atomic) to prevent a race on simultaneous startup.
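A sketch of the two calls combined, assuming a POSIX file system (on other file systems createFile with a POSIX attribute throws UnsupportedOperationException). The class name TokenFileWriter is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

final class TokenFileWriter {
  private TokenFileWriter() {}

  /**
   * Creates the token file with owner-only permissions, then writes the token.
   * Throws FileAlreadyExistsException if another process created the file first.
   */
  static void writeToken(Path tokenPath, String token) throws IOException {
    Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rw-------");
    FileAttribute<Set<PosixFilePermission>> attr = PosixFilePermissions.asFileAttribute(ownerOnly);
    // Permissions are applied at creation, so the file is never group- or world-readable.
    Files.createFile(tokenPath, attr);
    Files.write(tokenPath, token.getBytes(StandardCharsets.UTF_8));
  }
}
```

Passing the permissions to createFile, instead of tightening them after the write, closes the short window in which the token would sit on disk with default permissions.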

7. Cluster token derived from root password

Token generated via PBKDF2 from clusterName + ":" + rootPassword. Rotating the root password silently invalidates the cluster token and breaks inter-node communication. Document this prominently, or require an explicit HA_CLUSTER_TOKEN setting for production.

8. Two-phase commit not atomic under leadership change

In RaftReplicatedDatabase.commit(), after send() (quorum reached) but before commit2ndPhase(), a leadership change means applyTxEntry() no longer skips the entry, potentially applying it twice. The ignoreErrors=true path silently masks duplicate-page errors.

9. ThreadLocal buffer lifecycle in recordFileChanges()

schemaWalBuffer and schemaBucketDeltaBuffer in RaftReplicatedDatabase are ThreadLocal<List<...>>. An exception between buffer-clear and buffer-send leaves buffers populated for the next call on the same thread. Wrap in try/finally. Nested recordFileChanges() calls also corrupt the outer recording WAL buffer.
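The try/finally shape could look roughly like this. SchemaChangeRecorder and its String-based buffer are simplified stand-ins for the real ThreadLocal buffers, and the sketch does not address the nested-call problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class SchemaChangeRecorder {
  // Stand-in for schemaWalBuffer; the element type is simplified to String.
  private static final ThreadLocal<List<String>> BUFFER = ThreadLocal.withInitial(ArrayList::new);

  private SchemaChangeRecorder() {}

  static void record(String change) {
    BUFFER.get().add(change);
  }

  static int pending() {
    return BUFFER.get().size();
  }

  /** Runs the action, sends whatever it recorded, and always clears the buffer afterwards. */
  static void recordFileChanges(Runnable action, Consumer<List<String>> sender) {
    List<String> buffer = BUFFER.get();
    buffer.clear();
    try {
      action.run();
      sender.accept(List.copyOf(buffer));
    } finally {
      buffer.clear(); // runs on success and on exception alike
    }
  }
}
```

Because the clear sits in finally, a failure in either the action or the send cannot leak entries into the next call on the same thread.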

10. Blocking HTTP forward without timeout

HTTP_CLIENT.send(builder.build(), ...) in RaftReplicatedDatabase has no connect or read timeout. If the leader is slow or unreachable, the calling thread blocks indefinitely. Configure connectTimeout on the HttpClient instance and a request timeout on each HttpRequest, since the connect timeout alone does not bound a slow response.
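For reference, both timeouts are available in the java.net.http API. The sketch below uses illustrative values and a hypothetical LeaderForwarding holder class; neither comes from the PR.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

final class LeaderForwarding {
  // Timeout values are illustrative, not taken from the PR.
  static final Duration CONNECT_TIMEOUT = Duration.ofSeconds(5);
  static final Duration REQUEST_TIMEOUT = Duration.ofSeconds(30);

  // Bounds the time spent establishing the connection.
  static final HttpClient HTTP_CLIENT = HttpClient.newBuilder()
      .connectTimeout(CONNECT_TIMEOUT)
      .build();

  private LeaderForwarding() {}

  /** Builds a forwarded POST whose response must arrive within REQUEST_TIMEOUT. */
  static HttpRequest buildForward(URI leaderUri, String body) {
    return HttpRequest.newBuilder(leaderUri)
        .timeout(REQUEST_TIMEOUT)
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }
}
```

When either limit expires, send() throws an HttpTimeoutException (HttpConnectTimeoutException for the connect phase), which the caller can turn into a retry or a clear error.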

11. Leader forwarding over plain HTTP

RaftHAServer.forwardCommandToLeaderViaRaft() uses "http://". Both X-ArcadeDB-Cluster-Token and Authorization headers are transmitted in cleartext. Add a prominent deployment warning that a TLS reverse proxy is required for production clusters not on a trusted private network.

12. Non-atomic schema version increment

RaftReplicatedDatabase increments schema version client-side before replication. Concurrent DDL operations can both read the same base version and both submit version+1, causing a version collision on replicas. Version assignment should happen inside the Raft state machine at apply time.

13. Committed credential and config artifacts

  • ha-raft/config/server-users.jsonl commits a password hash to the repository. Remove it and add to .gitignore.
  • ha-raft/config/mcp-config.json is a local dev artifact that should not be committed.

Medium Severity

14. maven-shade-plugin declared without configuration

ha-raft/pom.xml declares maven-shade-plugin with no executions or configuration. This silently produces a fat JAR bundling all Ratis, gRPC, and Netty transitive deps, causing classpath conflicts alongside arcadedb-server. Configure it with proper dependency relocations or remove it.

15. arcadedb-engine and arcadedb-network should be provided scope

When ha-raft is loaded as a plugin alongside the server assembly, these classes are already on the classpath. compile scope causes class duplication.

16. Quorum.parse() misaligned with HA_QUORUM allowed values

GlobalConfiguration.HA_QUORUM lists: "none", "one", "two", "three", "majority", "all". Quorum.parse() only handles "majority" and "all" - any other value throws ConfigurationException. Align the parser with all allowed values.
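A sketch of a parser that accepts every listed value. QuorumSetting is a hypothetical enum, and IllegalArgumentException stands in for ConfigurationException to keep the example self-contained. Whether the Raft implementation can honour all six values is a separate design question; narrowing the allowed values of HA_QUORUM would be the alternative.

```java
import java.util.Locale;

enum QuorumSetting {
  NONE, ONE, TWO, THREE, MAJORITY, ALL;

  /** Case-insensitive parse that names the accepted values when it fails. */
  static QuorumSetting parse(String value) {
    if (value == null || value.isBlank())
      throw new IllegalArgumentException("Quorum value is missing");
    try {
      return valueOf(value.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException(
          "Unsupported quorum '" + value + "'; expected one of none, one, two, three, majority, all", e);
    }
  }
}
```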

17. Fragile server name parsing and HTTP port offset heuristic

findLocalPeerId() parses the numeric suffix after the last - or _. A name like arc-prod-db-1 has multiple separators and fails silently. getHttpPortOffset() falls back to magic number 46 (= 2480 - 2434). Enforce the naming convention at config validation time.

18. GetClusterHandler resolves plugin on every request

server.getPlugin(...) is called on every /api/v1/cluster request. Cache the reference in the constructor.

19. registerAPI() unconditional regardless of HA implementation

RaftHAPlugin.registerAPI() registers /api/v1/cluster even when HA_IMPLEMENTATION=legacy. Gate registration on the active implementation.

20. validateConfiguration() warns but does not block unsafe quorum combos

NONE quorum on a multi-node cluster allows split-brain. Consider throwing ConfigurationException to prevent server start for definitively unsafe combinations.


Low Severity / Code Quality

  • Wildcard imports in RaftReplicatedDatabase.java violate project convention (explicit per-class imports required).
  • Fully qualified class names in ArcadeStateMachine.java and RaftHAPlugin.java - project guideline: always import the class and use just the name.
  • Typo: ClusterDatatbaseChecker.java should be ClusterDatabaseChecker.
  • printClusterConfiguration() logs the peer table at Level.WARNING - should be Level.INFO.
  • docs/plans/ and docs/superpowers/: 14 AI-generated planning files (~12K lines) bloat repo history without user-facing value - consider removing.
  • Database name not URL-encoded in forwarded URIs in RaftReplicatedDatabase.
  • When the leader returns an HTTP error, parseResultSetFromJson() appears to return an empty result set - clients cannot distinguish success-with-zero-results from failure-on-leader.

Summary

The architecture is solid and replacing custom consensus logic with Apache Ratis is the right call. The test matrix is thorough. The main blockers are: the missing snapshot implementation (lagging-node recovery is impossible with log compaction enabled), the leadership-change race in applyTxEntry, the codec bounds-checking gap, and the credential/compliance issues.

Generated with Claude Code

…andler

44 handler subclasses each created their own LeaderProxy with its own
HttpClient and thread pool. Moved the instance to HttpServer so all
handlers share one HttpClient, reducing thread count in multi-node
test clusters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

📜 License Compliance Check

✅ License check passed. See artifacts for full report.


@claude

claude bot commented Apr 12, 2026

PR Review: Raft-based High Availability using Apache Ratis

This is a significant architectural change - replacing the custom replication protocol with Apache Ratis (Raft consensus). The overall direction is right. Below is a detailed review by severity.


Critical - Data Correctness

1. Leadership-change race in ArcadeStateMachine.applyTxEntry()

The state machine skips applying a transaction when raftHAServer.isLeader() returns true at apply time, assuming the leader already committed locally. However, Ratis delivers applyTransaction asynchronously. If the node loses leadership between submitting the log entry and the callback firing, it skips an entry it now needs as a follower. Conversely, a newly elected leader inheriting uncommitted entries from the previous term will also incorrectly skip them.

Fix: encode the originator's peer ID in the log entry itself (e.g., a leaderPeerId field in the codec) and compare against the stable peer identity, not the transient isLeader() query.

2. Two-phase commit not atomic under concurrent leadership change (RaftReplicatedDatabase.java)

In commit(), the sequence is: raftHAServer.getClient().io().send(entry) -> check reply -> tx.commit2ndPhase(phase1). If the node loses leadership between send() succeeding and commit2ndPhase() executing, the state machine's applyTxEntry() will no longer skip that entry (since isLeader() is now false), resulting in the transaction being applied twice. The ignoreErrors=true path in applyChanges() silently discards duplicate-page errors, masking the double-application. These two bugs together can cause replica data corruption in failover scenarios.

3. Non-atomic schema version increment (RaftReplicatedDatabase.java)

The schema version is read and incremented client-side before replication: schemaJson.put("schemaVersion", schemaJson.getLong("schemaVersion") + 1). Concurrent DDL operations on the leader can both read the same base version and both submit version+1, producing a version collision on replicas with no conflict detection.

4. No bounds checking in deserializeWalTransaction (ArcadeStateMachine.java)

pageCount is read from the buffer and used directly as the array size: new WALFile.WALPage[pageCount]. A corrupted or malicious log entry with a very large pageCount causes an OOM allocation. Additionally, deltaSize = page.changesTo - page.changesFrom + 1 with changesTo < changesFrom causes NegativeArraySizeException. No magic bytes or format version markers are present to detect truncated entries. Because Raft log entries persist to disk, a corrupted entry can reproduce across restarts, potentially preventing the node from ever recovering.


High - Security

5. Cluster token file not permission-hardened (RaftHAServer.java)

initClusterToken() writes the shared intra-cluster trust token to cluster-token.txt using standard Java I/O with no explicit permission restriction. On Linux, new files default to umask-based permissions (commonly 644), making the token world-readable. Any local OS user on a node can read it and forge X-ArcadeDB-Cluster-Token headers.

Fix: Files.setPosixFilePermissions(tokenPath, Set.of(OWNER_READ, OWNER_WRITE)) after writing.

6. Cluster-internal auth bypasses authorization, not just authentication

A validated cluster token causes the handler to resolve the user from X-ArcadeDB-Forwarded-User with no further authorization check. An attacker who obtains the token (see issue 5) and knows any valid username can execute any operation as that user on any node. The cluster-auth fast path must still apply the same per-database, per-operation authorization checks as normal user auth.

7. Leader-forwarding over plain HTTP (RaftReplicatedDatabase.java)

The forwarded request is built with URI.create("http://" + leaderHttpAddress + ...). Both the cluster token and the original Authorization header are transmitted in cleartext. Any network-level observer between nodes can capture credentials. This must either use HTTPS, or a deployment requirement for a secure (e.g., private) network must be explicitly documented.


High - Build / Module Integrity

8. Duplicate arcadedb-server dependency with wrong scope in ha-raft/pom.xml

Per project conventions (CLAUDE.md "Wire Protocol Module Dependencies"), protocol modules must declare arcadedb-server with provided scope. There appears to be a second arcadedb-server declaration with compile scope and a hardcoded version 26.4.1-SNAPSHOT. This overrides provided, adds the server to the runtime classpath, and will break when the project version changes.

9. maven-shade-plugin referenced without configuration (ha-raft/pom.xml)

A bare maven-shade-plugin reference with no configuration will produce a fat JAR that incorrectly bundles Ratis classes. When deployed alongside arcadedb-server this causes classpath conflicts. Either configure it with proper <relocations> and <filters>, or remove it.

10. Snapshot-based resync not implemented (SnapshotManager.java, ArcadeStateMachine.java)

ArcadeStateMachine.takeSnapshot() is not implemented. With the default HA_RAFT_SNAPSHOT_THRESHOLD=10000, any replica that falls behind by more than 10,000 Raft log entries cannot rejoin without manual intervention. This must either be implemented before merge, or clearly labeled experimental with a tracking issue and explicit release notes warning.

11. ATTRIBUTIONS.md and NOTICE not updated for Apache Ratis

Per CLAUDE.md: "When adding a dependency, you MUST update ATTRIBUTIONS.md and, if Apache-licensed with a NOTICE file, incorporate required notices into the main NOTICE file." Apache Ratis (ratis-server, ratis-grpc, ratis-netty, ratis-metrics-default) is Apache 2.0 licensed with its own NOTICE file. Neither ATTRIBUTIONS.md nor the project's NOTICE file has been updated in this PR.

12. HA_SERVER_LIST is a breaking format change with no migration path

The peer list format changed from hostname:port to hostname:raftPort:httpPort[:priority]. Existing users upgrading will supply the old two-token format and get silent misconfiguration (peers fail to connect with no clear error). Add a validator that detects the old format and emits a descriptive startup error.
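Such a validator might look like the sketch below. PeerListParser, the default priority of 0 and the exact error wording are assumptions, and IPv6 literals are not handled.

```java
import java.util.ArrayList;
import java.util.List;

final class PeerListParser {
  /** One parsed peer; priority defaults to 0 when omitted (an assumption of this sketch). */
  static final class Peer {
    final String host;
    final int raftPort;
    final int httpPort;
    final int priority;

    Peer(String host, int raftPort, int httpPort, int priority) {
      this.host = host;
      this.raftPort = raftPort;
      this.httpPort = httpPort;
      this.priority = priority;
    }
  }

  private PeerListParser() {}

  static List<Peer> parse(String serverList) {
    List<Peer> peers = new ArrayList<>();
    for (String raw : serverList.split(",")) {
      String entry = raw.trim();
      String[] parts = entry.split(":");
      if (parts.length == 2) // legacy two-token entry: fail loudly instead of misconfiguring
        throw new IllegalArgumentException("Peer '" + entry
            + "' uses the old host:port format; expected host:raftPort:httpPort[:priority]");
      if (parts.length < 3 || parts.length > 4)
        throw new IllegalArgumentException("Malformed peer '" + entry
            + "'; expected host:raftPort:httpPort[:priority]");
      int priority = parts.length == 4 ? Integer.parseInt(parts[3]) : 0;
      peers.add(new Peer(parts[0], port(parts[1], entry), port(parts[2], entry), priority));
    }
    return peers;
  }

  private static int port(String text, String entry) {
    int value = Integer.parseInt(text); // NumberFormatException is an IllegalArgumentException
    if (value < 1 || value > 65535)
      throw new IllegalArgumentException("Port out of range in peer '" + entry + "': " + value);
    return value;
  }
}
```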


Medium

13. ClusterDatatbaseChecker.java must not be committed

ha-raft/src/test/java/com/arcadedb/server/ha/raft/ClusterDatatbaseChecker.java contains a main() method with hardcoded local filesystem paths (/Users/frank/projects/...), has no JUnit annotations (it will never run in CI), has a typo in the class name ("Datatbase"), and has no license header. This is a developer debugging utility that was accidentally committed - please remove it.

14. AI planning documents committed to the repository

docs/plans/ and docs/superpowers/ contain multiple markdown files totaling ~13,000 lines with content like > **For Claude:** REQUIRED SUB-SKILL:. These are AI-agent planning artifacts, not user- or developer-facing documentation, and should not be in the public repository.

15. ClusterMonitor.leaderCommitIndex is not thread-safe

updateLeaderCommitIndex() and updateReplicaMatchIndex() are called from different threads but the field is a plain long. Use AtomicLong or volatile long.
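A sketch of the AtomicLong variant. LagTracker is a hypothetical stand-in for ClusterMonitor, and the choice to never let an index move backwards is an assumption of this sketch rather than something the PR states.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class LagTracker {
  private final AtomicLong leaderCommitIndex = new AtomicLong(-1);
  private final Map<String, Long> replicaMatchIndex = new ConcurrentHashMap<>();

  /** Safe to call from any thread; a stale update cannot lower the stored index. */
  void updateLeaderCommitIndex(long index) {
    leaderCommitIndex.accumulateAndGet(index, Long::max);
  }

  void updateReplicaMatchIndex(String peerId, long index) {
    replicaMatchIndex.merge(peerId, index, Long::max);
  }

  /** Entries the replica is behind the leader; an unknown replica counts from index -1. */
  long lagOf(String peerId) {
    long match = replicaMatchIndex.getOrDefault(peerId, -1L);
    return Math.max(0, leaderCommitIndex.get() - match);
  }
}
```

A volatile long would also fix visibility, but the atomic max additionally keeps out-of-order updates from different threads from regressing the value.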

16. HA_RAFT_PERSIST_STORAGE flag needs a warning

This flag disables Raft log persistence but is documented without any warning. At minimum, its description should be prefixed with [TEST ONLY - disabling this WILL cause data loss].


Low

17. Wildcard imports

RaftReplicatedDatabase.java uses import com.arcadedb.database.*, import com.arcadedb.engine.*, import java.util.*, etc. The existing codebase avoids wildcard imports - please resolve to explicit single-class imports.

18. Fragile plugin type check

"RaftHAPlugin".equals(name) breaks silently on class rename. A marker interface (e.g., HAPlugin extends ServerPlugin) would be more robust.


Test Coverage Gaps

  • No test for the leadership-change race during commit (issues 1 and 2)
  • No test verifying cluster token persistence across node restarts (a new token on restart would break all inter-node auth)

Positive Observations

  • The choice of Apache Ratis over a home-grown 2PC protocol is the right long-term call.
  • The plugin/ServiceLoader integration is clean and correctly decoupled from the server core.
  • The e2e test suite using TestContainers + Toxiproxy covers meaningful failure scenarios (network partitions, packet loss, rolling restarts).
  • HALog abstraction for structured HA events is a good observability pattern.
  • The GetClusterHandler endpoint for cluster status is the right operational interface to add.
  • Good test coverage breadth: 25 unit + 14 integration + 9 e2e tests.

Summary

Severity                       Count
Critical (data correctness)    4
High (security + build)        8
Medium                         4
Low                            2

The four critical correctness bugs and the security issues (especially token file permissions and auth bypass) must be resolved before merging to main. The missing snapshot resync should either be implemented or explicitly documented as a known limitation with a tracking issue.

…tion

During test teardown, servers stop one by one. Remaining nodes' Ratis
enters CLOSED state when peers become unreachable. The HealthMonitor
would restart Ratis every 3 seconds, each restart creating new gRPC
and election threads, eventually hitting OOM (unable to create native
thread). Setting HA_HEALTH_CHECK_INTERVAL=0 in BaseRaftHATest disables
the monitor for all HA tests. Tests that need the monitor explicitly
(e.g. RaftHealthMonitorRecoveryIT) can override and re-enable it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codacy-production

codacy-production bot commented Apr 12, 2026

Not up to standards ⛔

🔴 Issues 14 medium · 24 minor

Alerts:
⚠ 38 issues (≤ 0 issues of at least minor severity)

Results:
38 new issues

Category Results
BestPractice 13 medium
CodeStyle 24 minor
Complexity 1 medium

View in Codacy

🟢 Metrics 909 complexity

Metric Results
Complexity 909

View in Codacy

🟢 Coverage 61.58% diff coverage · -8.41% coverage variation

Metric Results
Coverage variation -8.41% coverage variation
Diff coverage 61.58% diff coverage

View coverage diff in Codacy

Coverage variation details
Commit                              Coverable lines    Covered lines     Coverage
Common ancestor commit (a763aee)    116168             85454             73.56%
Head commit (41e8e82)               147446 (+31278)    96060 (+10606)    65.15% (-8.41%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Scope                   Coverable lines    Covered lines    Diff coverage
Pull request (#3731)    2730               1681             61.58%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%


@claude

claude bot commented Apr 12, 2026

Review Update - April 12, 2026

Checking the current state of the branch against issues raised in earlier reviews. Several things have been addressed; a few blockers remain.

Addressed since last review

  • ThreadLocal buffers - schemaWalBuffer and schemaBucketDeltaBuffer are now cleared in a finally block in recordFileChanges. Good.
  • Cluster token security - The file-based token is gone. The token is now derived via PBKDF2 (100k iterations, SHA-256) from clusterName + rootPassword at startup. No file to protect. The file-permission concern is moot.
  • Snapshot recovery - Functional snapshot install is now in ArcadeStateMachine.notifyInstallSnapshotFromLeader(), which downloads a ZIP from the leader, validates against zip-slip, and re-opens the database. The core path works.
  • Zip-slip protection - targetFile.toPath().normalize().toAbsolutePath().startsWith(dbPath) check is present. Good.
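For readers unfamiliar with the zip-slip check mentioned above, this is the general pattern as a standalone sketch. ZipEntryPaths is a hypothetical helper, not the PR's code, and it does not resolve symbolic links.

```java
import java.io.IOException;
import java.nio.file.Path;

final class ZipEntryPaths {
  private ZipEntryPaths() {}

  /** Resolves a ZIP entry name under dbPath and rejects names that escape it. */
  static Path resolveInside(Path dbPath, String entryName) throws IOException {
    Path root = dbPath.toAbsolutePath().normalize();
    Path target = root.resolve(entryName).normalize();
    // Path.startsWith compares whole path elements, so /data/db2 is not inside /data/db.
    if (!target.startsWith(root))
      throw new IOException("Zip entry escapes target directory: " + entryName);
    return target;
  }
}
```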

Remaining blockers

1. ATTRIBUTIONS.md not updated (mandatory per project rules)

ha-raft/pom.xml adds ratis-server, ratis-grpc, and ratis-metrics-default (Apache Ratis 3.2.2, Apache 2.0). Apache Ratis ships a NOTICE file, so both ATTRIBUTIONS.md and the project NOTICE file must be updated. This is a hard requirement per CLAUDE.md. The current ATTRIBUTIONS.md has no Ratis entry.

2. No bounds checking in codec / WAL deserialization (crash safety)

In RaftLogEntryCodec.decodeTxEntry():

  • compressedLength = dis.readInt() - no upper-bound check before new byte[compressedLength]. A corrupt entry with Integer.MAX_VALUE causes OOM and crashes the node.
  • Same for uncompressedLength and deltaCount.

In ArcadeStateMachine.deserializeWalTransaction():

  • pageCount = buf.getInt() - no check for negative or unreasonably large values.
  • deltaSize = page.changesTo - page.changesFrom + 1 - if changesTo < changesFrom (corrupt data), new byte[deltaSize] throws NegativeArraySizeException.
  • No buf.remaining() checks before reads.

These entries come from Raft log entries written by cluster peers, so exploitation requires a compromised peer - but defensive bounds checks prevent accidental corruption from taking down the whole cluster. Suggested fix for the codec:

private static final int MAX_ENTRY_BYTES = 256 * 1024 * 1024; // 256 MB sanity cap

int compressedLength = dis.readInt();
if (compressedLength < 0 || compressedLength > MAX_ENTRY_BYTES)
  throw new IOException("Invalid compressed length: " + compressedLength);

3. Leadership-change TOCTOU in applyTxEntry

Both applyTxEntry and applySchemaEntry check raftHAServer.isLeader() at apply time and skip if true. The problem: a node that was leader when it submitted the entry to the Raft log but lost leadership before Ratis calls applyTransaction will now apply the entry through the state machine (because isLeader() returns false), on top of whatever its own commit path already did. Meanwhile, the new leader will skip applying it (because it is leader). The entry is in the log on both nodes, but the new leader never applies it and the old leader may apply it a second time. This is a permanent inconsistency.

The canonical Raft fix is: never skip log application on the state machine. Apply every entry on every node unconditionally. The leader should handle duplicate-write prevention at the layer above (e.g., write-ahead the result before submitting to Raft, or use idempotent apply semantics). The current skip-on-leader design is fundamentally at odds with Raft's guarantee that all nodes apply the same log in the same order.
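As a rough illustration of "idempotent apply semantics", the sketch below applies every entry regardless of the node's role and uses only the log index to detect duplicates. It is a toy model under stated assumptions: IdempotentApplier and its methods are hypothetical and do not appear in the PR, the payload is a plain string rather than a WAL transaction, and it assumes the leader does not commit the change locally before the entry comes back through the log.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a toy state machine, not the PR's ArcadeStateMachine. */
final class IdempotentApplier {
  private long lastAppliedIndex = -1;
  private final List<String> applied = new ArrayList<>();

  /**
   * Applies the entry no matter whether this node is currently leader or follower.
   * Returns false if the index was already applied, for example on replay after a restart.
   */
  synchronized boolean apply(long logIndex, String payload) {
    if (logIndex <= lastAppliedIndex)
      return false; // duplicate delivery: the decision depends only on the log, never on the role
    applied.add(payload);
    lastAppliedIndex = logIndex;
    return true;
  }

  /** Returns a copy of the payloads applied so far, in log order. */
  synchronized List<String> appliedEntries() {
    return new ArrayList<>(applied);
  }

  synchronized long lastAppliedIndex() {
    return lastAppliedIndex;
  }
}
```

Because the skip decision is a function of the log index alone, two nodes that see the same log make the same decision for every entry, whichever of them happens to be leader at apply time.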


Minor items still open

  • SnapshotManager is now dead code - recovery bypasses it entirely and goes through ArcadeStateMachine. The class still claims it handles snapshot recovery in its Javadoc. Either wire it in or remove it to avoid confusion.
  • docs/plans/ files - 13 planning documents (4 HA-specific) are still in the diff. These are internal design artifacts and should not be committed to the repository.
  • PBKDF2 token degrades silently - if SERVER_ROOT_PASSWORD is null or blank, initClusterToken() uses just the cluster name as the KDF input. Add a check and fail fast if the root password is missing rather than producing a weak token.
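For the PBKDF2 item, a fail-fast check could look like the sketch below. It is an assumption-laden example rather than the PR's initClusterToken(): the class and method names, the use of the cluster name as salt, the iteration count, and the Base64 output are all hypothetical. Only the JDK calls (SecretKeyFactory, PBEKeySpec) are real APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Hypothetical sketch of a fail-fast cluster token derivation. */
final class ClusterTokenSketch {
  private static final int ITERATIONS = 100_000; // assumed value, not taken from the PR
  private static final int KEY_BITS = 256;

  private ClusterTokenSketch() {
  }

  static String deriveToken(String clusterName, String rootPassword) {
    if (clusterName == null || clusterName.isBlank())
      throw new IllegalArgumentException("Cluster name is required to derive the cluster token");
    // Refuse to continue rather than derive a token from the cluster name alone.
    if (rootPassword == null || rootPassword.isBlank())
      throw new IllegalStateException("Root password is missing: refusing to derive a weak cluster token");
    try {
      PBEKeySpec spec = new PBEKeySpec(rootPassword.toCharArray(),
          clusterName.getBytes(StandardCharsets.UTF_8), ITERATIONS, KEY_BITS);
      byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
      return Base64.getEncoder().encodeToString(key);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("Cannot derive cluster token", e);
    }
  }
}
```

Failing at startup makes the misconfiguration visible to the operator immediately, instead of leaving a cluster protected by a token that anyone who knows the cluster name can reproduce.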

Summary

| Issue | Status |
|---|---|
| ThreadLocal buffer leak | Fixed |
| Cluster token file permissions | N/A - redesigned to PBKDF2 |
| Snapshot recovery functional | Fixed |
| Zip-slip protection | Fixed |
| ATTRIBUTIONS.md / NOTICE | Still missing |
| Codec bounds checking | Still missing |
| Leader-skip TOCTOU race | Still present |
| docs/plans/ AI artifacts in repo | Still present |
| SnapshotManager dead code confusion | Minor - still present |
| PBKDF2 silent degradation | Minor - still present |

The TOCTOU issue in the state machine is the most architecturally significant remaining concern. Updating ATTRIBUTIONS.md is a hard requirement per project rules and is still outstanding.

@github-actions
Contributor

📜 License Compliance Check

✅ License check passed. See artifacts for full report.

License Summary (first 50 lines)

Lists of 387 third-party dependencies.
     (Apache License 2.0) LZ4 Java Compression (at.yawk.lz4:lz4-java:1.10.4 - https://github.com/yawkat/lz4-java)
     (EPL 2.0) (GNU Lesser General Public License) Logback Classic Module (ch.qos.logback:logback-classic:1.5.32 - http://logback.qos.ch/logback-classic)
     (EPL 2.0) (GNU Lesser General Public License) Logback Core Module (ch.qos.logback:logback-core:1.5.32 - http://logback.qos.ch/logback-core)
     (Apache 2) ArcadeDB BOLT Protocol (com.arcadedb:arcadedb-bolt:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-bolt/)
     (Apache 2) ArcadeDB Console (com.arcadedb:arcadedb-console:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-console/)
     (Apache 2) ArcadeDB Engine (com.arcadedb:arcadedb-engine:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-engine/)
     (Apache 2) ArcadeDB GraphQL (com.arcadedb:arcadedb-graphql:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-graphql/)
     (Apache 2) ArcadeDB Gremlin (com.arcadedb:arcadedb-gremlin:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-gremlin/)
     (Apache 2) ArcadeDB gRPC Stubs (com.arcadedb:arcadedb-grpc:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-grpc/)
     (Apache 2) ArcadeDB gRPC Client (com.arcadedb:arcadedb-grpc-client:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-grpc-client/)
     (Apache 2) ArcadeDB gRpcW (com.arcadedb:arcadedb-grpcw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-grpcw/)
     (Apache 2) ArcadeDB HA Raft (com.arcadedb:arcadedb-ha-raft:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-ha-raft/)
     (Apache 2) ArcadeDB Integration (com.arcadedb:arcadedb-integration:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-integration/)
     (Apache 2) ArcadeDB load tests (com.arcadedb:arcadedb-load-tests:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-load-tests/)
     (Apache 2) ArcadeDB Metrics (com.arcadedb:arcadedb-metrics:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-metrics/)
     (Apache 2) ArcadeDB MongoDB Wire Protocol (com.arcadedb:arcadedb-mongodbw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-mongodbw/)
     (Apache 2) ArcadeDB Network (com.arcadedb:arcadedb-network:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-network/)
     (Apache 2) ArcadeDB PostgresW (com.arcadedb:arcadedb-postgresw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-postgresw/)
     (Apache 2) ArcadeDB RedisW (com.arcadedb:arcadedb-redisw:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-redisw/)
     (Apache 2) ArcadeDB Server (com.arcadedb:arcadedb-server:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-server/)
     (Apache 2) ArcadeDB Studio (com.arcadedb:arcadedb-studio:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-studio/)
     (Apache 2) ArcadeDB Test Utils (com.arcadedb:arcadedb-test-utils:26.4.1-SNAPSHOT - https://arcadedata.com/arcadedb-test-utils/)
     (Apache License 2.0) HPPC Collections (com.carrotsearch:hppc:0.7.1 - http://labs.carrotsearch.com/hppc.html/hppc)
     (Apache License 2.0) Metrics Core (com.codahale.metrics:metrics-core:3.0.2 - http://metrics.codahale.com/metrics-core/)
     (The Apache License, Version 2.0) com.conversantmedia:disruptor (com.conversantmedia:disruptor:1.2.21 - https://github.com/conversant/disruptor)
     (Apache License 2.0) Jackson-annotations (com.fasterxml.jackson.core:jackson-annotations:2.20 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) Jackson-annotations (com.fasterxml.jackson.core:jackson-annotations:2.21 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) Jackson-core (com.fasterxml.jackson.core:jackson-core:2.21.1 - https://github.com/FasterXML/jackson-core)
     (Apache License 2.0) Jackson-core (com.fasterxml.jackson.core:jackson-core:2.21.2 - https://github.com/FasterXML/jackson-core)
     (Apache License 2.0) jackson-databind (com.fasterxml.jackson.core:jackson-databind:2.21.1 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) jackson-databind (com.fasterxml.jackson.core:jackson-databind:2.21.2 - https://github.com/FasterXML/jackson)
     (Apache License 2.0) Jackson-dataformat-YAML (com.fasterxml.jackson.dataformat:jackson-dataformat-yaml:2.21.1 - https://github.com/FasterXML/jackson-dataformats-text)
     (Apache License 2.0) Jackson datatype: JSR310 (com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.21.1 - https://github.com/FasterXML/jackson-modules-java8/jackson-datatype-jsr310)
     (Apache License 2.0) Caffeine cache (com.github.ben-manes.caffeine:caffeine:2.3.1 - https://github.com/ben-manes/caffeine)
     (Apache License 2.0) docker-java-api (com.github.docker-java:docker-java-api:3.7.1 - https://github.com/docker-java/docker-java)
     (Apache License 2.0) docker-java-transport (com.github.docker-java:docker-java-transport:3.7.1 - https://github.com/docker-java/docker-java)
     (Apache License 2.0) docker-java-transport-zerodep (com.github.docker-java:docker-java-transport-zerodep:3.7.1 - https://github.com/docker-java/docker-java)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) btf (com.github.java-json-tools:btf:1.3 - https://github.com/java-json-tools/btf)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) jackson-coreutils (com.github.java-json-tools:jackson-coreutils:2.0 - https://github.com/java-json-tools/jackson-coreutils)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) jackson-coreutils-equivalence (com.github.java-json-tools:jackson-coreutils-equivalence:1.0 - https://github.com/java-json-tools/jackson-coreutils)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) json-patch (com.github.java-json-tools:json-patch:1.13 - https://github.com/java-json-tools/json-patch)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) json-schema-core (com.github.java-json-tools:json-schema-core:1.2.14 - https://github.com/java-json-tools/json-schema-core)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) json-schema-validator (com.github.java-json-tools:json-schema-validator:2.2.14 - https://github.com/java-json-tools/json-schema-validator)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) msg-simple (com.github.java-json-tools:msg-simple:1.2 - https://github.com/java-json-tools/msg-simple)
     (Apache Software License, version 2.0) (Lesser General Public License, version 3 or greater) uri-template (com.github.java-json-tools:uri-template:0.10 - https://github.com/java-json-tools/uri-template)
     (Apache License 2.0) (GNU Lesser General Public License) javaparser-core (com.github.javaparser:javaparser-core:3.26.3 - https://github.com/javaparser/javaparser-core)
     (Apache License 2.0) JCIP Annotations under Apache License (com.github.stephenc.jcip:jcip-annotations:1.0-1 - http://stephenc.github.com/jcip-annotations)
     (Apache License 2.0) Google Android Annotations Library (com.google.android:annotations:4.1.1.4 - http://source.android.com/)
