fix: OpenCypher aggregation with CASE returns multiple rows (#3858) by robfrank · Pull Request #3859 · ArcadeData/arcadedb

robfrank · 2026-04-12T17:02:49Z

Summary

Fix expression parser mis-parsing sum(CASE WHEN ... END) as a bare CaseExpression instead of FunctionCallExpression("sum", [CaseExpression]), which caused the planner to treat it as a non-aggregation grouping key and split results into multiple rows
Add text-length guards to tryParseSpecialFunctions and parseExpressionFromText so recursive CASE detection only matches when the CASE covers the full expression (same pattern already used for reduce/pattern comprehension)
Add two regression tests: pure aggregation with CASE (single-row result) and mixed aggregation+grouping with CASE

Test plan

New test aggregationWithCaseNoImplicitGroupBy - exact reproduction of the reported query
New test aggregationWithCaseAndGroupByKey - CASE inside aggregation alongside a real grouping key
All 8 CypherCaseTest tests pass
All 5147 Cypher/OpenCypher engine tests pass (0 failures)
All 62 aggregation-related tests pass
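The two regression tests correspond to the two grouping situations in Cypher's implicit GROUP BY. A RETURN that contains only aggregates yields one row. Any non-aggregate expression becomes a grouping key. The stdlib-only sketch below shows why misclassifying sum(CASE ...) as a key splits the result. It assumes a hypothetical Row(type, value) input, and the queries in the comments are hypothetical and evaluated by hand rather than run through ArcadeDB:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ImplicitGroupingDemo {
  // Hypothetical input rows, not an ArcadeDB type.
  record Row(String type, int value) {}

  // Hypothetical query: RETURN sum(CASE WHEN value > 0 THEN 1 ELSE 0 END)
  // Correct: no grouping key, so the whole input collapses into one row.
  static long sumPositive(final List<Row> rows) {
    long total = 0;
    for (final Row r : rows)
      total += r.value() > 0 ? 1 : 0;
    return total;
  }

  // Buggy: the bare CASE is treated as a grouping key, so rows are split by its
  // distinct values (0 and 1) and another aggregate, here count(*), is computed per group.
  static Map<Integer, Long> caseAsGroupingKey(final List<Row> rows) {
    final Map<Integer, Long> countPerCaseValue = new LinkedHashMap<>();
    for (final Row r : rows)
      countPerCaseValue.merge(r.value() > 0 ? 1 : 0, 1L, Long::sum);
    return countPerCaseValue;
  }

  // Hypothetical query: RETURN type, sum(CASE ...) - one row per real grouping key.
  static Map<String, Long> sumPositiveByType(final List<Row> rows) {
    final Map<String, Long> perType = new LinkedHashMap<>();
    for (final Row r : rows)
      perType.merge(r.type(), r.value() > 0 ? 1L : 0L, Long::sum);
    return perType;
  }

  public static void main(final String[] args) {
    final List<Row> rows = List.of(new Row("a", 5), new Row("a", -1), new Row("b", 3), new Row("b", 0));
    System.out.println(sumPositive(rows));       // 2
    System.out.println(caseAsGroupingKey(rows)); // {1=2, 0=2}
    System.out.println(sumPositiveByType(rows)); // {a=1, b=1}
  }
}
```

With the bug, a hypothetical RETURN count(*), sum(CASE ...) would come back as two rows, one per distinct CASE value, instead of a single row.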

🤖 Generated with Claude Code

The expression parser's recursive CASE detection found CaseExpression nodes nested inside function arguments (e.g. sum(CASE...)) and returned the bare CASE, discarding the outer aggregation wrapper. This caused the planner to treat the expression as a non-aggregation grouping key, splitting results into multiple rows instead of one.

Add text-length guards to tryParseSpecialFunctions and parseExpressionFromText so a found CASE context is only used when it covers the full expression, matching the pattern already used for reduce and pattern comprehension.

Co-Authored-By: Claude Opus 4.6 (1M context)
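The root cause can be reproduced with a toy model. In this sketch, a depth-first search returns the first CASE anywhere in the subtree. Without a guard, that nested CASE replaces the whole sum(...) call. The Node, Case, and Call types and the parse methods are hypothetical stand-ins, not ArcadeDB's ANTLR contexts or CypherExpressionBuilder methods:

```java
import java.util.List;

public class CaseNestingDemo {
  // Hypothetical miniature AST; these types are illustrative, not ArcadeDB's parser classes.
  interface Node {
    String text(); // whitespace-free text, mimicking ANTLR's getText()
  }

  record Case(String body) implements Node {
    public String text() {
      return "CASE" + body + "END";
    }
  }

  record Call(String name, List<Node> args) implements Node {
    public String text() {
      final StringBuilder sb = new StringBuilder(name).append('(');
      for (int i = 0; i < args.size(); i++) {
        if (i > 0)
          sb.append(',');
        sb.append(args.get(i).text());
      }
      return sb.append(')').toString();
    }
  }

  // Depth-first search that returns the first CASE anywhere in the subtree.
  static Case findCaseRecursive(final Node node) {
    if (node instanceof Case c)
      return c;
    if (node instanceof Call call)
      for (final Node arg : call.args()) {
        final Case found = findCaseRecursive(arg);
        if (found != null)
          return found;
      }
    return null;
  }

  // Before the fix: a CASE nested in an argument replaces the whole expression.
  static Node parseUnguarded(final Node node) {
    final Case c = findCaseRecursive(node);
    return c != null ? c : node;
  }

  // After the fix: the CASE is used only if it spans (almost) the full text.
  static Node parseGuarded(final Node node) {
    final Case c = findCaseRecursive(node);
    return c != null && c.text().length() >= node.text().length() - 2 ? c : node;
  }

  public static void main(final String[] args) {
    final Node expr = new Call("sum", List.of(new Case("WHENx>0THEN1ELSE0")));
    System.out.println(parseUnguarded(expr).getClass().getSimpleName()); // Case: sum() wrapper lost
    System.out.println(parseGuarded(expr).getClass().getSimpleName());   // Call: aggregation kept
  }
}
```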

claude · 2026-04-12T17:04:11Z

Code Review

Good fix for the reported issue. The root cause is well identified: findCaseExpressionRecursive would find a CaseExpression nested inside a function call argument and return it as if it were the entire expression, causing the outer aggregation wrapper (sum(...)) to be lost.

Approach consistency

The length-based guard caseCtx.getText().length() >= nodeText.length() - 2 is consistent with the identical pattern already used in tryParseComprehensions and tryParseListPredicates for reduce, allReduce, patternComprehension, listComprehension, and listItemsPredicate. This is the right approach given the existing design.

Minor concern: magic number `-2`

The - 2 tolerance is unexplained in the new guards, though tryParseListPredicates at least has the inline comment // allow for whitespace. The new guards in ExpressionTypeDetector and CypherExpressionBuilder would benefit from the same:

if (caseCtx != null && caseCtx.getText().length() >= exprText.length() - 2) // allow for whitespace

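It is worth documenting why 2 is sufficient. ANTLR's getText() on a rule context concatenates only the tokens in the parse tree, so whitespace that the lexer skips (or sends to a hidden channel) never appears in it. The only length difference therefore comes from delimiters such as parentheses (1 char each). A CASE wrapped in one pair of parentheses, (CASE...), differs by exactly 2 and is accepted. A single-character function name wrapping a CASE gives f(CASE...) vs CASE..., a difference of 3, so the guard correctly rejects it. This reasoning is not obvious from the code alone.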
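The arithmetic behind the - 2 tolerance can be checked in isolation. This minimal sketch uses getText()-style strings with no whitespace. The class and the coversFullExpression helper are hypothetical names for illustration and are not part of the PR:

```java
public class LengthGuardDemo {
  // Tolerance used by the guards in the PR: one character per enclosing parenthesis.
  static final int TOLERANCE = 2;

  // True when the nested construct's text covers the whole expression text,
  // allowing at most TOLERANCE extra characters (i.e. one pair of parentheses).
  static boolean coversFullExpression(final String innerText, final String exprText) {
    return innerText.length() >= exprText.length() - TOLERANCE;
  }

  public static void main(final String[] args) {
    // getText()-style string: whitespace tokens are not part of the text.
    final String caseText = "CASEWHENx>0THEN1ELSE0END";
    System.out.println(coversFullExpression(caseText, caseText));                 // true  (bare CASE)
    System.out.println(coversFullExpression(caseText, "(" + caseText + ")"));     // true  (difference 2)
    System.out.println(coversFullExpression(caseText, "f(" + caseText + ")"));    // false (difference 3)
    System.out.println(coversFullExpression(caseText, "sum(" + caseText + ")"));  // false (difference 5)
  }
}
```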

Potential edge case: `EXISTS` lacks the same guard

In tryParseSpecialFunctions, EXISTS detection does not have the length guard:

final Cypher25Parser.ExistsExpressionContext existsCtx = builder.findExistsExpressionRecursive(ctx);
if (existsCtx != null)
    return builder.parseExistsExpression(existsCtx);

If EXISTS can appear as a function argument in any valid query pattern (e.g. a user-defined function wrapping it), it would have the same mis-parsing bug. This is likely not a practical issue since EXISTS is typically a top-level predicate, but it is worth noting for completeness or adding a TODO comment.

Tests

The two regression tests are well written:

aggregationWithCaseNoImplicitGroupBy - exact reproduction of issue #3858, "OpenCypher: Aggregation with CASE statement returns multiple rows (implicit GROUP BY)", clean and focused
aggregationWithCaseAndGroupByKey - correct coverage of the grouping-key variant

Both use the project-standard assertThat(...).isEqualTo(...) style and properly assert there are no extra rows.

One small nit: assertThat((boolean) results.hasNext()).isTrue() could be written as assertThat(results.hasNext()).isTrue() since hasNext() already returns a primitive boolean - the cast is redundant (and existing tests in this file already do the same, so this is low priority).
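A stdlib-only sketch shows why the cast changes nothing. Java's overload resolution first looks for a method that applies without boxing. Because hasNext() already yields a primitive boolean, the primitive overload is chosen with or without the cast. The two check methods are hypothetical stand-ins for AssertJ's primitive and reference assertThat overloads:

```java
import java.util.Iterator;
import java.util.List;

public class BooleanOverloadDemo {
  // Hypothetical stand-ins for a primitive and a reference assertThat overload.
  static String check(final boolean value) {
    return "primitive boolean overload";
  }

  static String check(final Object value) {
    return "Object overload";
  }

  public static void main(final String[] args) {
    final Iterator<String> results = List.of("row").iterator();
    // hasNext() already returns a primitive boolean, so both calls pick check(boolean).
    System.out.println(check((boolean) results.hasNext())); // primitive boolean overload
    System.out.println(check(results.hasNext()));           // primitive boolean overload
  }
}
```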

Summary

The fix is correct, well-tested, and consistent with existing patterns in the parser. The main actionable suggestion is adding // allow for whitespace inline comments to match the style in tryParseListPredicates, and optionally adding a note about the EXISTS case. Otherwise this looks good to merge.

codacy-production · 2026-04-12T17:04:14Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

🟢 Metrics 4 complexity

Metric Results

Complexity 4

gemini-code-assist

Code Review

This pull request addresses a parsing bug (issue #3858) where CASE expressions nested inside aggregations were mis-parsed as the primary expression, leading to incorrect implicit grouping. The fix introduces a length guard in CypherExpressionBuilder and ExpressionTypeDetector to ensure that CASE expressions are only matched when they span the full expression text. Regression tests were added to confirm the fix. Review feedback indicates that the same length guard should be applied to EXISTS and shortestPath expressions to prevent similar mis-parsing issues, and notes a minor efficiency improvement regarding the placement of the node.getText() call.

gemini-code-assist · 2026-04-12T17:04:27Z

engine/src/main/java/com/arcadedb/query/opencypher/parser/CypherExpressionBuilder.java

+    final String nodeText = node.getText();
    final Cypher25Parser.CaseExpressionContext caseCtx = findCaseExpressionRecursive(node);
-    if (caseCtx != null)
+    if (caseCtx != null && caseCtx.getText().length() >= nodeText.length() - 2)
      return parseCaseExpression(caseCtx);

    final Cypher25Parser.ExtendedCaseExpressionContext extCaseCtx = findExtendedCaseExpressionRecursive(node);
-    if (extCaseCtx != null)
+    if (extCaseCtx != null && extCaseCtx.getText().length() >= nodeText.length() - 2)
      return parseExtendedCaseExpression(extCaseCtx);


The node.getText() call is performed eagerly, which is inefficient if no CASE expression is found in the subtree. More importantly, the recursive search logic used here is also applied to EXISTS (line 258) and shortestPath expressions, but they lack the length guard. This means expressions like count(EXISTS { ... }) or my_func(shortestPath(...)) will still be mis-parsed as just the inner special expression, losing the outer wrapper and potentially causing incorrect query results. You should apply the same guard to all special expressions that use recursive search to ensure consistent and correct parsing.

Applied the length guard to EXISTS in both tryParseSpecialFunctions and parseExpressionFromText, and to shortestPath in tryParseSpecialFunctions. Also consolidated the exprText/nodeText variable to the top with an inline comment explaining the - 2 tolerance. All 5147 Cypher tests still pass.
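The efficiency point is about evaluation order. If the whole-expression text is built before the null check, every expression pays for it. If it is built inside the short-circuiting &&, only expressions that contain a candidate pay. In this sketch, getText is a hypothetical counting stand-in for ParserRuleContext.getText(), not the ANTLR method itself:

```java
public class LazyTextDemo {
  static int getTextCalls = 0;

  // Hypothetical stand-in for ParserRuleContext.getText(): builds a new String and counts calls.
  static String getText(final String source) {
    getTextCalls++;
    return source.replace(" ", "");
  }

  // Eager: the full-expression text is built even when no CASE was found.
  static boolean eagerGuard(final String caseText, final String source) {
    final String nodeText = getText(source);
    return caseText != null && caseText.length() >= nodeText.length() - 2;
  }

  // Lazy: && short-circuits, so getText() runs only when a CASE candidate exists.
  static boolean lazyGuard(final String caseText, final String source) {
    return caseText != null && caseText.length() >= getText(source).length() - 2;
  }

  public static void main(final String[] args) {
    eagerGuard(null, "a + b");
    lazyGuard(null, "a + b");
    System.out.println(getTextCalls); // 1: only the eager version paid for getText()
  }
}
```

When several guards (CASE, extended CASE, EXISTS, shortestPath) share the same text, computing it once at the top, as in the consolidated version, trades one possibly unneeded call for not repeating it per guard.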

gemini-code-assist · 2026-04-12T17:04:27Z

engine/src/main/java/com/arcadedb/query/opencypher/parser/ExpressionTypeDetector.java

+    final String exprText = ctx.getText();
    final Cypher25Parser.CaseExpressionContext caseCtx = builder.findCaseExpressionRecursive(ctx);
-    if (caseCtx != null)
+    if (caseCtx != null && caseCtx.getText().length() >= exprText.length() - 2)
      return builder.parseCaseExpression(caseCtx);

    final Cypher25Parser.ExtendedCaseExpressionContext extCaseCtx = builder.findExtendedCaseExpressionRecursive(ctx);
-    if (extCaseCtx != null)
+    if (extCaseCtx != null && extCaseCtx.getText().length() >= exprText.length() - 2)
      return builder.parseExtendedCaseExpression(extCaseCtx);


The length guard added here for CASE expressions should also be applied to EXISTS (line 58) and shortestPath (line 75) expressions. These expressions use the same recursive search mechanism and are susceptible to the same mis-parsing bug when nested inside other functions (e.g., count(EXISTS { ... })). Applying the guard consistently across all special functions in tryParseSpecialFunctions will prevent similar issues for those expression types and ensure the parser correctly identifies the top-level expression.

Fixed. Applied the length guard to EXISTS (line 58) and shortestPath (line 75) in tryParseSpecialFunctions, and moved the exprText variable above all checks with a block comment explaining the rationale and the - 2 tolerance. 5147 Cypher tests pass.

Extend the same text-length guard added for CASE to EXISTS and shortestPath detection in both tryParseSpecialFunctions and parseExpressionFromText. These use the same recursive descent pattern and are susceptible to the same class of mis-parsing when nested inside function arguments. Add inline comments explaining the - 2 tolerance (whitespace that ANTLR's getText() strips).

Co-Authored-By: Claude Opus 4.6 (1M context)
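With the extension, every recursively found special construct goes through the same guard, checked in order. If none spans the expression, the caller falls through to general parsing, so a wrapper such as count(...) survives. This is a minimal sketch under the assumption that each find*Recursive result is modeled as a (kind, text) candidate. The enum, record, and tryParseSpecial method are hypothetical and are not ArcadeDB's API:

```java
import java.util.List;
import java.util.Optional;

public class SpecialFunctionChainDemo {
  // Hypothetical kinds standing in for the ANTLR contexts searched recursively.
  enum Kind { CASE, EXISTS, SHORTEST_PATH }

  // A construct found somewhere in the subtree, with its getText()-style text.
  record Candidate(Kind kind, String text) {}

  // Applies the same guard to every candidate in order. Returns empty when none
  // spans the expression, so the caller falls through to the general parser.
  static Optional<Kind> tryParseSpecial(final String exprText, final List<Candidate> found) {
    for (final Candidate c : found)
      if (c.text().length() >= exprText.length() - 2)
        return Optional.of(c.kind());
    return Optional.empty();
  }

  public static void main(final String[] args) {
    final String exists = "EXISTS{(n)-->()}";
    final List<Candidate> found = List.of(new Candidate(Kind.EXISTS, exists));
    System.out.println(tryParseSpecial(exists, found));                   // Optional[EXISTS]
    System.out.println(tryParseSpecial("count(" + exists + ")", found)); // Optional.empty
  }
}
```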

claude · 2026-04-12T17:30:52Z

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

ArcadeDB is a Multi-Model DBMS (Database Management System) built for extreme performance. It's a Java-based project that supports multiple data models (Graph, Document, Key/Value, Search Engine, Time Series, Vector Embedding) and query languages (SQL, Cypher, Gremlin, GraphQL, MongoDB Query Language).

Response Formatting

Never use the em dash character (—) in responses. Use a normal dash (-), a comma, or rephrase instead.

Project Instructions

Before writing any code:

State how you will verify this change works (e.g., unit tests, integration tests, manual testing)
Write the tests first (TDD approach) whenever possible
Ensure code adheres to existing coding standards and styles
Then implement the code
Run verification and iterate until it passes
Run all connected tests that could be affected by the change to ensure nothing is broken (no need to run the whole suite, it would take too long)

General design principles:

reuse existing components whenever possible
don't use fully qualified names if possible; always import the class and just use the name
don't include a new dependency unless it is strictly necessary, and it MUST be Apache 2.0 compatible:
- ✅ ALLOWED: Apache 2.0, MIT, BSD (2/3-Clause), EPL 1.0/2.0, UPL 1.0, EDL 1.0, LGPL 2.1+ (for libraries only), CC0/Public Domain
- ❌ FORBIDDEN: GPL, AGPL, proprietary licenses without explicit permission, SSPL, Commons Clause
- When adding a dependency, you MUST update ATTRIBUTIONS.md and, if Apache-licensed with a NOTICE file, incorporate required notices into the main NOTICE file
for Studio (webapp), limit to jquery and bootstrap 5. If necessary use 3rd party libs, but they must be Apache 2.0 compatible (see allowed licenses above)
always bear in mind PERFORMANCE. It must always be your mantra: performance and light pressure on the garbage collector. If you can, prefer arrays of primitives to Lists of Objects
if you need to use JSON, use the class com.arcadedb.serializer.json.JSONObject. Leverage the getter methods that accept a default value as the 2nd argument, so you don't need to check whether the value is present or null, which means less boilerplate code
same thing for JSON arrays: use com.arcadedb.serializer.json.JSONArray class
code styles:
adhere to the existing code
if statements with only one child statement don't require curly braces; keep it simple
use the final keyword when possible on variables and parameters
all new server-side code must be tested with a test case. Check existing test case to see the framework and style to use
write a regression test
after every change in the backend (Java), compile the project and fix all the issues until the compilation passes
test all the new and old components you've modified before considering the job finished. Please do not provide something untested
always keep in mind speed and security with ArcadeDB, do not introduce security hazard or code that could slow down other parts unless requested/approved
do not commit on git, I will do it after a review
remove any System.out you used for debug when you have finished
For test cases, prefer this syntax: assertThat(property.isMandatory()).isTrue();
don't add Claude as author of any source code
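A minimal illustration of several of the style rules above: final parameters and locals, brace-less single-statement if and for bodies, and primitive arrays instead of boxed lists. The class and method names are hypothetical and only demonstrate the conventions:

```java
import java.util.List;

public class StyleDemo {
  // Preferred: primitive array, no boxing, less garbage-collector pressure.
  static long sum(final int[] values) {
    long total = 0;
    for (final int v : values)
      total += v;
    return total;
  }

  // Works, but every element is a boxed Integer object on the heap.
  static long sumBoxed(final List<Integer> values) {
    long total = 0;
    for (final Integer v : values)
      total += v;
    return total;
  }

  // Single-statement if without braces; final parameters.
  static int clamp(final int value, final int max) {
    if (value > max)
      return max;
    return value;
  }

  public static void main(final String[] args) {
    System.out.println(sum(new int[] { 1, 2, 3 }));  // 6
    System.out.println(sumBoxed(List.of(1, 2, 3))); // 6
    System.out.println(clamp(10, 5));                // 5
  }
}
```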

Build and Development Commands

Maven (Java)

Build entire project: mvn clean install
Build without tests: mvn clean install -DskipTests
Run unit tests: mvn test
Run integration tests: mvn test -DskipITs=false
Build specific module: cd <module> && mvn clean install

Studio Frontend (Node.js)

Build frontend: cd studio && npm run build
Development mode: cd studio && npm run dev
Security audit: cd studio && npm run security-audit

Server Operations

Start server: Use packaged scripts in package/src/main/scripts/server.sh (Unix) or server.bat (Windows)
Console: Use package/src/main/scripts/console.sh or console.bat

Distribution Builder

The modular distribution builder (package/arcadedb-builder.sh) creates custom ArcadeDB distributions:

Production builds (download from releases):

cd package
./arcadedb-builder.sh --version=26.1.0 --modules=gremlin,studio

Development builds (use local Maven repository):

# Build modules first
mvn clean install -DskipTests

# Create distribution with local modules
cd package
VERSION=$(mvn -f ../pom.xml help:evaluate -Dexpression=project.version -q -DforceStdout)
./arcadedb-builder.sh \
    --version=$VERSION \
    --modules=console,gremlin,studio \
    --local-repo \
    --skip-docker

Testing the builder:

cd package
./test-builder-local.sh

Testing Commands

Run specific test class: mvn test -Dtest=ClassName
Run tests with specific pattern: mvn test -Dtest="*Pattern*"
Performance tests: Located in src/test/java/performance/ packages

Codebase Navigation Map

ANTLR Grammars

engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLLexer.g4 — SQL lexer
engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLParser.g4 — SQL parser
engine/src/main/antlr4/com/arcadedb/query/opencypher/grammar/Cypher25Lexer.g4 — Cypher lexer
engine/src/main/antlr4/com/arcadedb/query/opencypher/grammar/Cypher25Parser.g4 — Cypher parser

SQL Engine Key Files

Parser AST nodes (170+ classes): engine/src/main/java/com/arcadedb/query/sql/parser/
- SuffixIdentifier.java — property access (e.g., record.field)
- BaseIdentifier.java, LevelZeroIdentifier.java — identifier resolution
- Expression.java, BaseExpression.java, MathExpression.java — expression evaluation
- Projection.java, ProjectionItem.java — SELECT projection handling
- SelectStatement.java, MatchStatement.java — statement AST roots
- NestedProjection.java, NestedProjectionItem.java — nested projection (e.g., {*})
- FunctionCall.java, MethodCall.java — function/method invocation
- Modifier.java — chained modifiers (array selectors, method calls, suffix identifiers)
- WhereClause.java, BooleanExpression.java — filter conditions
- LetClause.java, LetItem.java — LET variable bindings
Executor steps (158 classes): engine/src/main/java/com/arcadedb/query/sql/executor/
- SelectExecutionPlanner.java — main SELECT execution planner
- ProjectionCalculationStep.java — projection evaluation step
- LetExpressionStep.java, GlobalLetExpressionStep.java — LET evaluation
- FetchFromTypeStep.java, FetchFromIndexStep.java — data source steps
- FilterStep.java, FilterByClustersStep.java — filtering steps
SQL methods (50+ classes): engine/src/main/java/com/arcadedb/query/sql/method/
- string/ — toLowerCase, toUpperCase, trim, split, etc.
- collection/ — size, keys, values, sort, etc.
- conversion/ — asInteger, asString, asList, asJSON, etc.
SQL functions: engine/src/main/java/com/arcadedb/function/sql/
- graph/ — out, in, both, outE, inE, bothE, shortestPath, dijkstra, etc.
- coll/ — difference, intersect, symmetricDifference
- fulltext/ — search field/index functions

OpenCypher Engine Key Files

AST (40+ classes): engine/src/main/java/com/arcadedb/query/opencypher/ast/
- CypherStatement.java, MatchClause.java, ReturnClause.java, WhereClause.java
- CreateClause.java, MergeClause.java, DeleteClause.java, SetClause.java
- PatternElement.java, NodePattern.java, RelationshipPattern.java
Executor: engine/src/main/java/com/arcadedb/query/opencypher/executor/
Optimizer: engine/src/main/java/com/arcadedb/query/opencypher/optimizer/
Planner: engine/src/main/java/com/arcadedb/query/opencypher/planner/
Tests: engine/src/test/java/com/arcadedb/query/opencypher/

Graph Engine

engine/src/main/java/com/arcadedb/graph/
- Vertex.java, MutableVertex.java, ImmutableVertex.java — vertex types
- Edge.java, MutableEdge.java, ImmutableEdge.java — edge types
- GraphEngine.java — core graph operations
- EdgeSegment.java, MutableEdgeSegment.java — edge storage segments
- EdgeLinkedList.java — edge linked list structure
- EdgeIterator.java, VertexIterator.java — traversal iterators

Server / HTTP

HTTP handlers: server/src/main/java/com/arcadedb/server/http/handler/
- DatabaseAbstractHandler.java — base handler (wraps commands in transactions)
- PostCommandHandler.java — POST /command endpoint
- PostQueryHandler.java, GetQueryHandler.java — query endpoints
HA: server/src/main/java/com/arcadedb/server/ha/
Security: server/src/main/java/com/arcadedb/server/security/

Test Locations (by module)

engine/src/test/java/ — 746 test files (SQL, Cypher, graph, storage, schema, indexing)
server/src/test/java/ — 114 test files (HTTP API, HA, security)
gremlin/src/test/java/ — 29 test files
integration/src/test/java/ — 22 test files
bolt/src/test/java/ — 10 test files
graphql/src/test/java/ — 9 test files

Architecture Overview

Core Modules

engine/: Core database engine, storage, indexing, query execution (SQL, OpenCypher, Polyglot)
server/: HTTP/REST API, WebSocket support, clustering/HA, MCP server
network/: Network communication layer
console/: CLI console for interactive database operations
studio/: Web-based administration interface (JavaScript/Node.js)
metrics/: Server metrics collection and reporting
integration/: Integration utilities
test-utils/: Shared test utilities

Wire Protocol Modules

gremlin/: Apache Tinkerpop Gremlin support
graphql/: GraphQL API support
mongodbw/: MongoDB wire protocol compatibility
redisw/: Redis wire protocol compatibility
postgresw/: PostgreSQL wire protocol compatibility
bolt/: Neo4j Bolt wire protocol compatibility
grpc/: gRPC protocol definitions
grpcw/: gRPC wire protocol module
grpc-client/: gRPC client library

Key Engine Components

Database Management: com.arcadedb.database.* - Database lifecycle, transactions, ACID compliance
Storage Engine: com.arcadedb.engine.* - Low-level storage, page management, WAL
SQL Query Engine: com.arcadedb.query.sql.* - SQL query parsing, execution planning
OpenCypher Engine: com.arcadedb.query.opencypher.* - Native Cypher implementation with ANTLR parser, AST, optimizer (filter pushdown, index selection, expand-into, join ordering), and step-based execution. Has both optimizer and legacy execution paths — changes to clause handling may need updates in multiple paths
Polyglot Engine: com.arcadedb.query.polyglot.* - GraalVM-based scripting support
Schema Management: com.arcadedb.schema.* - Type definitions, property management
Index System: com.arcadedb.index.* - LSM-Tree indexes, full-text, vector indexes
Graph Engine: com.arcadedb.graph.* - Vertex/Edge management, graph traversals
Serialization: com.arcadedb.serializer.* - Binary serialization, JSON handling

Server Components

HTTP API: com.arcadedb.server.http.* - REST endpoints, request handling
High Availability: com.arcadedb.server.ha.* - Clustering, replication, leader election
Security: com.arcadedb.server.security.* - Authentication, authorization
Monitoring: com.arcadedb.server.monitor.* - Metrics, query profiling, health checks
MCP: com.arcadedb.server.mcp.* - Model Context Protocol server support

Development Guidelines

Java Version

Required: Java 21+ (main branch)
Legacy: Java 17 support on java17 branch

Code Structure

Uses Maven multi-module project structure
Low-level Java optimization for performance ("LLJ: Low Level Java")
Minimal garbage collection pressure design
Thread-safe implementations throughout

Testing Approach

Framework: JUnit 5 (Jupiter) with AssertJ assertions
Unit tests in each module's src/test/java
Integration tests with IT suffix
Performance tests in performance/ packages
TestContainers used in e2e/ and load-tests/ modules for containerized testing
Separate test databases in databases/ for isolation

Database Features to Consider

ACID Transactions: Full transaction support with isolation levels
Multi-Model: Single database can store graphs, documents, key/value pairs
Query Languages: SQL (OrientDB-compatible), Cypher, Gremlin, MongoDB queries
Indexing: LSM-Tree indexes, full-text (Lucene), vector embeddings
High Availability: Leader-follower replication, automatic failover
Wire Protocols: HTTP/JSON, PostgreSQL, MongoDB, Redis, Neo4j Bolt, gRPC compatibility

Common Development Tasks

Adding New Features

Create tests first (TDD approach)
Implement in appropriate module
Update schema if needed
Add integration tests
Update documentation

Working with Indexes

LSM-Tree implementation in com.arcadedb.index.lsm.*
Index creation via Schema API
Performance testing with large datasets recommended

Query Development

SQL parsing in com.arcadedb.query.sql.*
SQL execution plans in com.arcadedb.query.sql.executor.*
OpenCypher engine in com.arcadedb.query.opencypher.* — has ast/, parser/, executor/, optimizer/, planner/, rewriter/ sub-packages
OpenCypher tests in engine/src/test/java/com/arcadedb/query/opencypher/
Test with various query patterns and data sizes

Server Development

HTTP handlers in com.arcadedb.server.http.handler.*
Security integration required for new endpoints
WebSocket support for real-time features

Wire Protocol Module Dependencies

Standard: All wire protocol modules (gremlin, graphql, mongodbw, redisw, postgresw, bolt, grpcw) must use provided scope for arcadedb-server dependency
Rationale: Server remains the assembly point; prevents dependency duplication in distributions
Pattern:
- Main server dependency → scope: provided
- Server test-jar → scope: test
- Cross-module test dependencies → scope: test only (e.g., postgresw should not depend on gremlin for compilation)
- Integration/format handlers → scope: compile only if in src/main/java (e.g., gremlin's GraphML/GraphSON handlers)
Enforcement: Code review process ensures:
- Protocol modules do NOT depend on other protocol modules in compile scope
- Each protocol module has arcadedb-server in provided scope only (not compile)
- Only the server assembly (package module) and coverage reporting modules can aggregate protocol modules

Example:

<!-- Main server dependency: provided scope, never compile -->
<dependency>
    <groupId>com.arcadedb</groupId>
    <artifactId>arcadedb-server</artifactId>
    <version>${project.parent.version}</version>
    <scope>provided</scope>
</dependency>
<!-- Server test-jar: test scope only -->
<dependency>
    <groupId>com.arcadedb</groupId>
    <artifactId>arcadedb-server</artifactId>
    <version>${project.parent.version}</version>
    <scope>test</scope>
    <type>test-jar</type>
</dependency>

Important Notes

Pre-commit hooks: This project uses pre-commit for code quality checks (trailing whitespace, Prettier for Java/XML formatting, etc.)
Code formatting: Prettier with requirePragma: true and printWidth: 160 — only formats files with a @format pragma
Security: Never log or expose sensitive data (passwords, tokens, etc.)
Performance: Always consider memory and CPU impact of changes
Compatibility: Maintain backward compatibility for API changes
Licensing: All code must comply with Apache 2.0 license
Modular Builder: Script to create custom distributions with selected modules (see package/README-BUILDER.md)

claude · 2026-04-12T17:31:55Z

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

ArcadeDB is a Multi-Model DBMS (Database Management System) built for extreme performance. It's a Java-based project that supports multiple data models (Graph, Document, Key/Value, Search Engine, Time Series, Vector Embedding) and query languages (SQL, Cypher, Gremlin, GraphQL, MongoDB Query Language).

Response Formatting

Never use the em dash character (—) in responses. Use a normal dash (-), a comma, or rephrase instead.

Project Instructions

Before writing any code:

State how you will verify this change works (e.g., unit tests, integration tests, manual testing)
Write the tests first (TDD approach) whenever possible
Ensure code adheres to existing coding standards and styles
Then implement the code
Run verification and iterate until it passes
Run all the connected tests could be affected by the change to ensure nothing is broken (no need to run the whole suite, it would take too long)

General design principles:

reuse existing components whenever is possible
don't use fully qualified names if possible, always import the class and just use the name
don't include a new dependency unless is strictly necessary, and they MUST be Apache 2.0 compatible:
- ✅ ALLOWED: Apache 2.0, MIT, BSD (2/3-Clause), EPL 1.0/2.0, UPL 1.0, EDL 1.0, LGPL 2.1+ (for libraries only), CC0/Public Domain
- ❌ FORBIDDEN: GPL, AGPL, proprietary licenses without explicit permission, SSPL, Commons Clause
- When adding a dependency, you MUST update ATTRIBUTIONS.md and, if Apache-licensed with a NOTICE file, incorporate required notices into the main NOTICE file
for Studio (webapp), limit to jquery and bootstrap 5. If necessary use 3rd party libs, but they must be Apache 2.0 compatible (see allowed licenses above)
always bear in mind PERFORMANCE. It must be always your mantra: performance and lightweight on garbage collector. If you can, prefer using arrays of primitives to List of Objects
if you need to use JSON, use the class com.arcadedb.serializer.json.JSONObject. Leverage the getter methods that accept the default value as 2nd argument, so you don't need to check if they present or not null = less boilerplate code
same thing for JSON arrays: use com.arcadedb.serializer.json.JSONArray class
code styles:
adhere to the existing code
if statements with only one child sub-statement don't require a curly brace open/close, keep it simple
use the final keyword when possible on variables and parameters
all new server-side code must be tested with a test case. Check existing test case to see the framework and style to use
write a regression test
after every change in the backend (Java), compile the project and fix all the issues until the compilation passes
test all the new and old components you've modified before considering the job finished. Please do not provide something untested
always keep in mind speed and security with ArcadeDB, do not introduce security hazard or code that could slow down other parts unless requested/approved
do not commit on git, I will do it after a review
remove any System.out you used for debug when you have finished
For test cases, prefer this syntax: assertThat(property.isMandatory()).isTrue();
don't add Claude as author of any source code

Build and Development Commands

Maven (Java)

Build entire project: mvn clean install
Build without tests: mvn clean install -DskipTests
Run unit tests: mvn test
Run integration tests: mvn test -DskipITs=false
Build specific module: cd <module> && mvn clean install

Studio Frontend (Node.js)

Build frontend: cd studio && npm run build
Development mode: cd studio && npm run dev
Security audit: cd studio && npm run security-audit

Server Operations

Start server: Use packaged scripts in package/src/main/scripts/server.sh (Unix) or server.bat (Windows)
Console: Use package/src/main/scripts/console.sh or console.bat

Distribution Builder

The modular distribution builder (package/arcadedb-builder.sh) creates custom ArcadeDB distributions:

Production builds (download from releases):

cd package
./arcadedb-builder.sh --version=26.1.0 --modules=gremlin,studio

Development builds (use local Maven repository):

# Build modules first
mvn clean install -DskipTests

# Create distribution with local modules
cd package
VERSION=$(mvn -f ../pom.xml help:evaluate -Dexpression=project.version -q -DforceStdout)
./arcadedb-builder.sh \
    --version=$VERSION \
    --modules=console,gremlin,studio \
    --local-repo \
    --skip-docker

Testing the builder:

cd package
./test-builder-local.sh

Testing Commands

Run specific test class: mvn test -Dtest=ClassName
Run tests with specific pattern: mvn test -Dtest="*Pattern*"
Performance tests: Located in src/test/java/performance/ packages

Codebase Navigation Map

ANTLR Grammars

engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLLexer.g4 — SQL lexer
engine/src/main/antlr4/com/arcadedb/query/sql/grammar/SQLParser.g4 — SQL parser
engine/src/main/antlr4/com/arcadedb/query/opencypher/grammar/Cypher25Lexer.g4 — Cypher lexer
engine/src/main/antlr4/com/arcadedb/query/opencypher/grammar/Cypher25Parser.g4 — Cypher parser

SQL Engine Key Files

Parser AST nodes (170+ classes): engine/src/main/java/com/arcadedb/query/sql/parser/
- SuffixIdentifier.java — property access (e.g., record.field)
- BaseIdentifier.java, LevelZeroIdentifier.java — identifier resolution
- Expression.java, BaseExpression.java, MathExpression.java — expression evaluation
- Projection.java, ProjectionItem.java — SELECT projection handling
- SelectStatement.java, MatchStatement.java — statement AST roots
- NestedProjection.java, NestedProjectionItem.java — nested projection (e.g., {*})
- FunctionCall.java, MethodCall.java — function/method invocation
- Modifier.java — chained modifiers (array selectors, method calls, suffix identifiers)
- WhereClause.java, BooleanExpression.java — filter conditions
- LetClause.java, LetItem.java — LET variable bindings
Executor steps (158 classes): engine/src/main/java/com/arcadedb/query/sql/executor/
- SelectExecutionPlanner.java — main SELECT execution planner
- ProjectionCalculationStep.java — projection evaluation step
- LetExpressionStep.java, GlobalLetExpressionStep.java — LET evaluation
- FetchFromTypeStep.java, FetchFromIndexStep.java — data source steps
- FilterStep.java, FilterByClustersStep.java — filtering steps
SQL methods (50+ classes): engine/src/main/java/com/arcadedb/query/sql/method/
- string/ — toLowerCase, toUpperCase, trim, split, etc.
- collection/ — size, keys, values, sort, etc.
- conversion/ — asInteger, asString, asList, asJSON, etc.
SQL functions: engine/src/main/java/com/arcadedb/function/sql/
- graph/ — out, in, both, outE, inE, bothE, shortestPath, dijkstra, etc.
- coll/ — difference, intersect, symmetricDifference
- fulltext/ — search field/index functions

OpenCypher Engine Key Files

AST (40+ classes): engine/src/main/java/com/arcadedb/query/opencypher/ast/
- CypherStatement.java, MatchClause.java, ReturnClause.java, WhereClause.java
- CreateClause.java, MergeClause.java, DeleteClause.java, SetClause.java
- PatternElement.java, NodePattern.java, RelationshipPattern.java
Executor: engine/src/main/java/com/arcadedb/query/opencypher/executor/
Optimizer: engine/src/main/java/com/arcadedb/query/opencypher/optimizer/
Planner: engine/src/main/java/com/arcadedb/query/opencypher/planner/
Tests: engine/src/test/java/com/arcadedb/query/opencypher/

Graph Engine

engine/src/main/java/com/arcadedb/graph/
- Vertex.java, MutableVertex.java, ImmutableVertex.java — vertex types
- Edge.java, MutableEdge.java, ImmutableEdge.java — edge types
- GraphEngine.java — core graph operations
- EdgeSegment.java, MutableEdgeSegment.java — edge storage segments
- EdgeLinkedList.java — edge linked list structure
- EdgeIterator.java, VertexIterator.java — traversal iterators

Server / HTTP

HTTP handlers: server/src/main/java/com/arcadedb/server/http/handler/
- DatabaseAbstractHandler.java — base handler (wraps commands in transactions)
- PostCommandHandler.java — POST /command endpoint
- PostQueryHandler.java, GetQueryHandler.java — query endpoints
HA: server/src/main/java/com/arcadedb/server/ha/
Security: server/src/main/java/com/arcadedb/server/security/
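The note that DatabaseAbstractHandler "wraps commands in transactions" describes a common pattern. Below is a minimal sketch of that pattern: begin, run the command, commit on success, and roll back and rethrow on failure. `TxDatabase`, `executeInTransaction`, and `RecordingDb` are hypothetical names for illustration only. They are not ArcadeDB's `Database` API, and the real handler may also deal with already-active transactions, retries, or locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of "wraps commands in transactions". TxDatabase is an illustrative
// interface, not ArcadeDB's Database API; the real handler's behavior may differ.
final class TransactionWrapSketch {

  interface TxDatabase {
    void begin();
    void commit();
    void rollback();
  }

  // Runs the command inside a transaction: commit on success, roll back and rethrow on failure.
  static <T> T executeInTransaction(final TxDatabase db, final Supplier<T> command) {
    db.begin();
    try {
      final T result = command.get();
      db.commit();
      return result;
    } catch (final RuntimeException e) {
      db.rollback();
      throw e;
    }
  }

  // Test double that records the calls it receives.
  static final class RecordingDb implements TxDatabase {
    final List<String> calls = new ArrayList<>();
    public void begin() { calls.add("begin"); }
    public void commit() { calls.add("commit"); }
    public void rollback() { calls.add("rollback"); }
  }

  public static void main(final String[] args) {
    final RecordingDb ok = new RecordingDb();
    System.out.println(executeInTransaction(ok, () -> "result") + " " + ok.calls); // result [begin, commit]

    final RecordingDb failing = new RecordingDb();
    try {
      executeInTransaction(failing, () -> { throw new IllegalStateException("boom"); });
    } catch (final IllegalStateException e) {
      System.out.println(failing.calls); // [begin, rollback]
    }
  }
}
```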

Test Locations (by module)

engine/src/test/java/ — 746 test files (SQL, Cypher, graph, storage, schema, indexing)
server/src/test/java/ — 114 test files (HTTP API, HA, security)
gremlin/src/test/java/ — 29 test files
integration/src/test/java/ — 22 test files
bolt/src/test/java/ — 10 test files
graphql/src/test/java/ — 9 test files

Architecture Overview

Core Modules

engine/: Core database engine, storage, indexing, query execution (SQL, OpenCypher, Polyglot)
server/: HTTP/REST API, WebSocket support, clustering/HA, MCP server
network/: Network communication layer
console/: CLI console for interactive database operations
studio/: Web-based administration interface (JavaScript/Node.js)
metrics/: Server metrics collection and reporting
integration/: Integration utilities
test-utils/: Shared test utilities

Wire Protocol Modules

gremlin/: Apache TinkerPop Gremlin support
graphql/: GraphQL API support
mongodbw/: MongoDB wire protocol compatibility
redisw/: Redis wire protocol compatibility
postgresw/: PostgreSQL wire protocol compatibility
bolt/: Neo4j Bolt wire protocol compatibility
grpc/: gRPC protocol definitions
grpcw/: gRPC wire protocol module
grpc-client/: gRPC client library

Key Engine Components

Database Management: com.arcadedb.database.* - Database lifecycle, transactions, ACID compliance
Storage Engine: com.arcadedb.engine.* - Low-level storage, page management, WAL
SQL Query Engine: com.arcadedb.query.sql.* - SQL query parsing, execution planning
OpenCypher Engine: com.arcadedb.query.opencypher.* - Native Cypher implementation with ANTLR parser, AST, optimizer (filter pushdown, index selection, expand-into, join ordering), and step-based execution. Has both optimizer and legacy execution paths — changes to clause handling may need updates in multiple paths
Polyglot Engine: com.arcadedb.query.polyglot.* - GraalVM-based scripting support
Schema Management: com.arcadedb.schema.* - Type definitions, property management
Index System: com.arcadedb.index.* - LSM-Tree indexes, full-text, vector indexes
Graph Engine: com.arcadedb.graph.* - Vertex/Edge management, graph traversals
Serialization: com.arcadedb.serializer.* - Binary serialization, JSON handling

Server Components

HTTP API: com.arcadedb.server.http.* - REST endpoints, request handling
High Availability: com.arcadedb.server.ha.* - Clustering, replication, leader election
Security: com.arcadedb.server.security.* - Authentication, authorization
Monitoring: com.arcadedb.server.monitor.* - Metrics, query profiling, health checks
MCP: com.arcadedb.server.mcp.* - Model Context Protocol server support

Development Guidelines

Java Version

Required: Java 21+ (main branch)
Legacy: Java 17 support on java17 branch

Code Structure

Uses Maven multi-module project structure
Low-level Java optimization for performance ("LLJ: Low Level Java")
Minimal garbage collection pressure design
Thread-safe implementations throughout

Testing Approach

Framework: JUnit 5 (Jupiter) with AssertJ assertions
Unit tests in each module's src/test/java
Integration tests with IT suffix
Performance tests in performance/ packages
TestContainers used in e2e/ and load-tests/ modules for containerized testing
Separate test databases in databases/ for isolation

Database Features to Consider

ACID Transactions: Full transaction support with isolation levels
Multi-Model: Single database can store graphs, documents, key/value pairs
Query Languages: SQL (OrientDB-compatible), Cypher, Gremlin, MongoDB queries
Indexing: LSM-Tree indexes, full-text (Lucene), vector embeddings
High Availability: Leader-follower replication, automatic failover
Wire Protocols: HTTP/JSON, PostgreSQL, MongoDB, Redis, Neo4j Bolt, gRPC compatibility

Common Development Tasks

Adding New Features

Create tests first (TDD approach)
Implement in appropriate module
Update schema if needed
Add integration tests
Update documentation

Working with Indexes

LSM-Tree implementation in com.arcadedb.index.lsm.*
Index creation via Schema API
Performance testing with large datasets recommended

Query Development

SQL parsing in com.arcadedb.query.sql.*
SQL execution plans in com.arcadedb.query.sql.executor.*
OpenCypher engine in com.arcadedb.query.opencypher.* — has ast/, parser/, executor/, optimizer/, planner/, rewriter/ sub-packages
OpenCypher tests in engine/src/test/java/com/arcadedb/query/opencypher/
Test with various query patterns and data sizes

Server Development

HTTP handlers in com.arcadedb.server.http.handler.*
Security integration required for new endpoints
WebSocket support for real-time features

Wire Protocol Module Dependencies

Standard: All wire protocol modules (gremlin, graphql, mongodbw, redisw, postgresw, bolt, grpcw) must use provided scope for arcadedb-server dependency
Rationale: Server remains the assembly point; prevents dependency duplication in distributions
Pattern:
- Main server dependency → scope: provided
- Server test-jar → scope: test
- Cross-module test dependencies → scope: test only (e.g., postgresw should not depend on gremlin for compilation)
- Integration/format handlers → scope: compile only if in src/main/java (e.g., gremlin's GraphML/GraphSON handlers)
Enforcement: Code review process ensures:
- Protocol modules do NOT depend on other protocol modules in compile scope
- Each protocol module has arcadedb-server in provided scope only (not compile)
- Only the server assembly (package module) and coverage reporting modules can aggregate protocol modules

Example:

<dependency>
    <groupId>com.arcadedb</groupId>
    <artifactId>arcadedb-server</artifactId>
    <version>${project.parent.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.arcadedb</groupId>
    <artifactId>arcadedb-server</artifactId>
    <version>${project.parent.version}</version>
    <scope>test</scope>
    <type>test-jar</type>
</dependency>
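The enforcement rules above are applied through code review. As a hedged illustration, the two scope rules could also be checked mechanically. The sketch below is not part of ArcadeDB's build. `ProtocolScopeCheck` and `Dependency` are hypothetical names, and it assumes protocol artifacts are named `arcadedb-<module>`. It only covers "arcadedb-server in provided (or test) scope" and "no non-test dependency on another protocol module". It does not model the exception for compile-scope format handlers in src/main/java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical checker for the wire-protocol dependency rules; ArcadeDB enforces them via code
// review, not via this class. Artifact IDs are assumed to follow "arcadedb-<module>".
final class ProtocolScopeCheck {

  record Dependency(String module, String artifactId, String scope) {}

  static final Set<String> PROTOCOL_MODULES =
      Set.of("gremlin", "graphql", "mongodbw", "redisw", "postgresw", "bolt", "grpcw");

  static List<String> violations(final List<Dependency> deps) {
    final List<String> problems = new ArrayList<>();
    for (final Dependency dep : deps) {
      if (!PROTOCOL_MODULES.contains(dep.module()))
        continue; // the rules only apply to protocol modules

      // arcadedb-server: "provided" for main code, "test" for the test-jar; never compile.
      if (dep.artifactId().equals("arcadedb-server") && !dep.scope().equals("provided") && !dep.scope().equals("test"))
        problems.add(dep.module() + ": arcadedb-server in scope " + dep.scope());

      // Another protocol module may only be used in test scope.
      final String target = dep.artifactId().replaceFirst("^arcadedb-", "");
      if (PROTOCOL_MODULES.contains(target) && !target.equals(dep.module()) && !dep.scope().equals("test"))
        problems.add(dep.module() + ": depends on " + target + " in scope " + dep.scope());
    }
    return problems;
  }

  public static void main(final String[] args) {
    final List<Dependency> deps = List.of(
        new Dependency("postgresw", "arcadedb-server", "provided"), // ok
        new Dependency("postgresw", "arcadedb-server", "test"),     // ok (test-jar)
        new Dependency("postgresw", "arcadedb-gremlin", "compile"), // violation
        new Dependency("bolt", "arcadedb-server", "compile"));      // violation
    violations(deps).forEach(System.out::println);
  }
}
```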

Important Notes

Pre-commit hooks: This project uses pre-commit for code quality checks (trailing whitespace, Prettier for Java/XML formatting, etc.)
Code formatting: Prettier with requirePragma: true and printWidth: 160 — only formats files with a @format pragma
Security: Never log or expose sensitive data (passwords, tokens, etc.)
Performance: Always consider memory and CPU impact of changes
Compatibility: Maintain backward compatibility for API changes
Licensing: All code must comply with Apache 2.0 license
Modular Builder: Script to create custom distributions with selected modules (see package/README-BUILDER.md)

claude · 2026-04-12T17:35:19Z

Code Review

Clean fix for issue #3858. The length-guard approach is consistent with established patterns in ExpressionTypeDetector.

Strengths

Follows existing pattern from reduce, patternComprehension, and listItemsPredicate
Surgical fix with no unrelated changes
Two solid regression tests covering both pure-aggregation and mixed aggregation+grouping cases
Code style matches project conventions (single-line if without braces, final locals, AssertJ)
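To show why the two regression tests matter, here is a toy model of the bug's effect. It is not ArcadeDB planner code. It evaluates a hypothetical query, `RETURN count(*) AS total, sum(CASE WHEN n.age > 30 THEN 1 ELSE 0 END) AS matches`, in two ways. The first is correct, with both items treated as aggregations. The second treats the bare CASE as a grouping key, as the mis-parse did. The query shape, the `Row` data, and all names are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Toy model only (not ArcadeDB planner code). Hypothetical query:
//   RETURN count(*) AS total, sum(CASE WHEN n.age > 30 THEN 1 ELSE 0 END) AS matches
final class CaseAggregationSketch {

  record Row(String name, int age) {}

  // CASE WHEN n.age > 30 THEN 1 ELSE 0 END
  static int caseValue(final Row row) {
    return row.age() > 30 ? 1 : 0;
  }

  // Correct: both return items are aggregations, so there is exactly one output row.
  static List<String> correctRows(final List<Row> rows) {
    final int matches = rows.stream().mapToInt(CaseAggregationSketch::caseValue).sum();
    return List.of("{total=" + rows.size() + ", matches=" + matches + "}");
  }

  // Buggy: the sum(...) wrapper is lost, the CASE becomes a grouping key, and count(*)
  // is computed once per distinct CASE value, producing several output rows.
  static List<String> buggyRows(final List<Row> rows) {
    final Map<Integer, Long> groups = rows.stream()
        .collect(Collectors.groupingBy(CaseAggregationSketch::caseValue, TreeMap::new, Collectors.counting()));
    final List<String> out = new ArrayList<>();
    groups.forEach((key, count) -> out.add("{total=" + count + ", matches=" + key + "}"));
    return out;
  }

  public static void main(final String[] args) {
    final List<Row> rows = List.of(new Row("a", 25), new Row("b", 35), new Row("c", 40));
    System.out.println(correctRows(rows)); // [{total=3, matches=2}]
    System.out.println(buggyRows(rows));   // [{total=1, matches=0}, {total=2, matches=1}]
  }
}
```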

Concerns

1. count(*) is not guarded (minor risk)

In tryParseSpecialFunctions, the countStar check has no length guard. If count(*) ever appears as a sub-expression inside another construct (e.g., coalesce(count(*), 0)), it would suffer the same mis-parsing issue that CASE had. Worth adding a guard for consistency and future safety, even if no current bug exists.

2. The -2 tolerance comment could be more precise

The comment says the -2 "allows for whitespace that ANTLR's getText() strips." Since getText() strips whitespace from both strings being compared, neither contains whitespace, so the tolerance must be covering something else (likely minor structural differences in how parent nodes represent their text boundaries). The comment matches the pattern already used in tryParseListPredicates, so it is consistent, but a more precise explanation would help future maintainers understand when -2 is the right threshold.
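A minimal sketch of the length-guard idea discussed in concerns 1 and 2: a special form found by recursive search (CASE, count(*)) is accepted only if its text covers the whole expression, within a small tolerance. `LengthGuardSketch`, `coversFullExpression`, and `LENGTH_TOLERANCE` are hypothetical names, and the meaning given to the tolerance here is an assumption, not the actual code in tryParseSpecialFunctions. Replacing the literal with a named constant is one way to address the magic-number comment. Note that in this sketch a tolerance of 2 still accepts a CASE wrapped in a single pair of parentheses. Whether the real threshold is meant to allow that is not confirmed here.

```java
// Hypothetical sketch of the length guard discussed in this review. The class, method,
// and constant names are illustrative only; they are not ArcadeDB's actual API.
final class LengthGuardSketch {

  // Assumed meaning of the "-2" tolerance: the matched sub-expression may be at most
  // this many characters shorter than the full expression text.
  static final int LENGTH_TOLERANCE = 2;

  // Accept a match (CASE, count(*), ...) only if it spans (almost) the whole expression.
  // A CASE nested as an argument of sum(...) is rejected, so the caller falls back to
  // parsing the outer function call instead of returning the bare CASE.
  static boolean coversFullExpression(final String fullText, final String matchedText) {
    return matchedText.length() >= fullText.length() - LENGTH_TOLERANCE;
  }

  public static void main(final String[] args) {
    // Texts as getText() would return them, with whitespace tokens already omitted.
    final String caseText = "CASEWHENx>0THEN1ELSE0END";

    System.out.println(coversFullExpression(caseText, caseText));                 // true
    System.out.println(coversFullExpression("sum(" + caseText + ")", caseText));  // false
    System.out.println(coversFullExpression("coalesce(count(*),0)", "count(*)")); // false
    System.out.println(coversFullExpression("(" + caseText + ")", caseText));     // true (2-char difference)
  }
}
```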

3. Nit - unnecessary boolean cast in tests

assertThat((boolean) results.hasNext()).isTrue() - the explicit boolean cast is unnecessary since hasNext() already returns a primitive boolean.

Bottom line

The fix is correct and well-tested. The main suggestion before merge: add the length guard to count(*) for completeness. Everything else is minor.

gemini-code-assist bot reviewed Apr 12, 2026

robfrank commented Apr 12, 2026

Summary

Test plan

claude bot commented Apr 12, 2026

Code Review

Approach consistency

Minor concern: magic number -2

Potential edge case: EXISTS lacks the same guard

Tests

Summary

codacy-production bot commented Apr 12, 2026 • edited

Up to standards ✅

gemini-code-assist bot left a comment

Code Review

gemini-code-assist bot Apr 12, 2026

robfrank Apr 12, 2026

gemini-code-assist bot Apr 12, 2026

robfrank Apr 12, 2026

claude bot commented Apr 12, 2026

claude bot commented Apr 12, 2026

claude bot commented Apr 12, 2026

claude bot commented Apr 12, 2026

claude bot commented Apr 12, 2026 • edited