resolve subgraph staging sequences via regclass by jrgemignani · Pull Request #2446 · apache/age

jrgemignani · 2026-06-22T23:18:49Z

The vertex/edge staging copies in create_subgraph() generated new graphids with nextval(%L), which binds the sequence as a string literal and invokes the nextval(text) overload. That re-resolves the schema-qualified sequence name on each call.

Cast the literal to regclass (nextval(%L::regclass)) so the sequence is resolved once to its OID, matching how AGE defines its label id defaults (nextval('schema.seq'::regclass)). Applied to both the vertex and edge staging queries, in sql/age_subgraph.sql and the identical body in the age--1.7.0--y.y.y.sql upgrade template so the upgrade-path catalog comparison still matches.

Behavior is unchanged; all 38 regression tests pass against PostgreSQL 18.

Addresses Copilot review feedback on #2441.

Co-authored-by: GitHub Copilot (Claude Opus 4.8) <[email protected]>

modified: age--1.7.0--y.y.y.sql
modified: sql/age_subgraph.sql

The vertex/edge staging copies in create_subgraph() generated new graphids with nextval(%L), which binds the sequence as a string literal and invokes the nextval(text) overload. That re-resolves the schema-qualified sequence name on each call. Cast the literal to regclass (nextval(%L::regclass)) so the sequence is resolved once to its OID, matching how AGE defines its label id defaults (nextval('schema.seq'::regclass)). Applied to both the vertex and edge staging queries, in sql/age_subgraph.sql and the identical body in the age--1.7.0--y.y.y.sql upgrade template so the upgrade-path catalog comparison still matches. Behavior is unchanged; all 38 regression tests pass against PostgreSQL 18. Addresses Copilot review feedback on apache#2441. Co-authored-by: GitHub Copilot (Claude Opus 4.8) <[email protected]> modified: age--1.7.0--y.y.y.sql modified: sql/age_subgraph.sql

Copilot

Pull request overview

Updates the dynamic SQL in ag_catalog.create_subgraph() to call nextval() with a regclass-typed sequence reference, avoiding repeated schema-qualified sequence name resolution during vertex/edge staging while keeping behavior consistent with AGE’s existing sequence default patterns.

Changes:

Use nextval(%L::regclass) (instead of nextval(%L)) for vertex staging ID generation.
Use nextval(%L::regclass) (instead of nextval(%L)) for edge staging ID generation.
Apply the same change to both the canonical SQL and the extension upgrade template to keep upgrade-path objects identical.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
sql/age_subgraph.sql	Casts formatted sequence literal to `regclass` in vertex/edge staging `nextval()` calls.
age--1.7.0--y.y.y.sql	Mirrors the same `regclass` casting change in the upgrade template for catalog match parity.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

gregfelice

LGTM — approving.

Confirmed the change is behavior-preserving: %L is fed dst_seq_fqn (the fully-qualified sequence name), so nextval(%L::regclass) yields nextval('schema.seq'::regclass), matching how AGE defines its label-id defaults. Both the old nextval(text) overload and '...'::regclass resolve the name to an OID through the same identifier-parsing path; the cast just resolves the constant once at plan time instead of re-resolving per row via the text overload. Same semantics, less per-row work.

Also verified the identical change is applied to both sql/age_subgraph.sql and the age--1.7.0--y.y.y.sql upgrade template, so the upgrade-path catalog comparison still matches. Existing regress/sql/subgraph.sql exercises create_subgraph, and the change is behavior-preserving, so no new test is needed.

* Make age extension usable from shared_preload_libraries (#2438) When AGE is loaded via shared_preload_libraries, its hooks (post_parse_analyze, set_rel_pathlist, object_access) are active before CREATE EXTENSION age is run. This causes errors when non-Cypher queries trigger those hooks and ag_catalog does not yet exist. Changes: - Add is_age_extension_exist() with a relcache callback cache so that checking pg_extension is not repeated on every hook invocation. - Guard post_parse_analyze, set_rel_pathlist, and object_access hooks with is_age_extension_exist() so they become no-ops when the extension is not installed. - Refactor ag_ProcessUtility_hook to detect CREATE/DROP EXTENSION age and broadcast a relcache invalidation via CacheInvalidateRelcacheByRelid(ExtensionRelationId) so other backends update their cached extension state. - Wrap DROP EXTENSION processing in PG_TRY/PG_CATCH to restore object_access_hook if the drop fails (e.g. dependent objects). - Skip _PG_init during pg_upgrade (IsBinaryUpgrade) to avoid hook registration when the binary-upgrade machinery is running. - Add regression tests that verify hooks do not error when ag_catalog schema is absent. * feature: add create_subgraph() (#2441) Add the feature create_subgraph() for materialized induced-subgraph extraction. Add ag_catalog.create_subgraph(new_graph, from_graph, node_filter, relationship_filter) which materializes a new, persistent, fully Cypher-queryable AGE graph as the induced subgraph of an existing graph. Selection follows the graph-theory induced-subgraph definition as operationalized by Neo4j GDS gds.graph.filter(): * a vertex is kept iff node_filter holds ('*' keeps all); * an edge is kept iff relationship_filter holds AND both of its endpoints were kept (no dangling edges). Filters are arbitrary Cypher predicates bound to `n` (nodes) and `r` (relationships) and are evaluated by AGE's own Cypher engine against the source graph, so the full predicate language is available; label selection uses label(n)/label(r) since the match pattern is fixed. Implementation notes: * Result is a real, ACID, registered graph (create_graph + create_v/ elabel), not a virtual view; it composes with cypher() and itself. * Entity graphids are reassigned from the destination labels' own sequences (graphid encodes a per-graph label id), and edge endpoints are remapped through an old->new vertex map, enforcing the induced rule via inner joins. * Source label tables are read with FROM ONLY to avoid double-copying children under PostgreSQL table inheritance. * Properties of any agtype are preserved; self-loops and parallel edges (multigraph structure) are retained. * SECURITY INVOKER: reads respect the caller's table privileges and RLS; the new graph is owned by the caller. * Validates NULL/identical graph names, missing source, pre-existing destination, and a reserved dollar-quote token in predicates. Wire-up: * sql/age_subgraph.sql (new) registered in sql/sql_files after age_pg_upgrade; identical body added to age--1.7.0--y.y.y.sql so the upgrade-path catalog comparison matches. * regress/sql/subgraph.sql + expected output (new), added to REGRESS. Covers full copy, vertex-induced, node+rel, label-only edge drop, bipartite, empty result, composability, self-loops/parallel edges, property fidelity, and error cases over a ~4500-vertex / 2000-edge source graph. All 38 regression tests pass against PostgreSQL 18. Co-authored-by: GitHub Copilot (Claude Opus 4.8) <[email protected]> modified: Makefile modified: age--1.7.0--y.y.y.sql new file: regress/expected/subgraph.out new file: regress/sql/subgraph.sql new file: sql/age_subgraph.sql modified: sql/sql_files * cypher_vle: add ORDER BY to non-deterministic RETURN queries (#2434) Several VLE regression queries RETURN multiple rows without an ORDER BY, so their row order depends on traversal/scan order and can vary between runs and platforms. Add ORDER BY ASC to those queries (on the path, edge-list, or graphid as appropriate) so the expected output is stable. The queries are pinned by path (p), edge list (e), or graphid (id(u)/id(v)/id(e[n])) depending on what each RETURN projects. Full audit of cypher_vle: all 38 multi-row result blocks were checked. After this change, every multi-row RETURN is deterministically ordered except the two SELECT * FROM show_list_use_vle('list01') calls, which are already deterministic because the function body orders its results with RETURN v ORDER BY id(v) (added in #2417); their result blocks are unchanged by this commit. This is a test-only change (regress/sql/cypher_vle.sql and regress/expected/cypher_vle.out); no extension C code or SQL is modified. Row counts are unchanged (pure reordering). All 37 regression tests pass (installcheck) on PostgreSQL 18.3. Co-authored-by: GitHub Copilot <noreply@github.com> modified: regress/expected/cypher_vle.out modified: regress/sql/cypher_vle.sql * age_global_graph: stabilize regression tests (#2431) age_global_graph: stabilize regression tests under concurrent xid load Wrap both vertex_stats() context-building phases in a single BEGIN ISOLATION LEVEL REPEATABLE READ; ... COMMIT; transaction so the three calls share one snapshot. This prevents the snapshot-fallback path in is_ggctx_invalid() from purging an already-built graph context when concurrent xid activity (autovacuum, parallel installcheck, replication, shared CI) advances the snapshot between calls, which would otherwise make the targeted delete_global_graphs(name) checks return false instead of the expected true. Read Committed is insufficient because it acquires a fresh snapshot per statement; REPEATABLE READ pins one snapshot for the whole transaction. Also add explicit ORDER BY id to the three direct-SQL label-table SELECTs (_ag_label_vertex x2, _ag_label_edge) that return multiple rows, so their output no longer depends on heap scan order. This is a test-only change (regress/sql/age_global_graph.sql and regress/expected/age_global_graph.out); no extension C code or SQL is modified. All 37 regression tests pass (installcheck) on PostgreSQL 18.3. Co-authored-by: GitHub Copilot <noreply@github.com> modified: regress/expected/age_global_graph.out modified: regress/sql/age_global_graph.sql * Makefile: add installcheck-existing target and improve readability (#2437) Add an "installcheck-existing" target that runs the regression suite against an already-running PostgreSQL server, complementing the default "installcheck" (which builds a private temp instance). It points pg_regress at the server via the standard libpq variables (PGHOST/PGPORT/PGUSER; PGDATABASE defaults to contrib_regression) and lets pg_regress create the database and load the extension through --load-extension=age. It deliberately avoids --use-existing -- that option skips database creation and disables --load-extension -- so no manual CREATE EXTENSION step is required. The upgrade test (age_upgrade) is excluded because it stages synthetic extension files into the local sharedir that a running server would not see. Readability and maintainability improvements (no behavior change): - Derive age_sql from AGE_CURR_VER (read from age.control) so the version number is defined in exactly one place. - Add section banners and a top-of-file layout/target index; group the scattered upgrade-test pieces and move the ag_scanner flex rule in with the other parser-generation rules. - Wrap the long REGRESS_OPTS and EXTRA_CLEAN assignments across lines. - Fix the DATA filter-out pattern to use a double dash (age--%--y.y.y.sql) matching the actual template filename; the prior single-dash pattern only matched via greedy '%' expansion. - Anchor the age.control version regex (/^default_version/). - Replace the hardcoded "31 tests" comment with generic wording and add a "help" target listing the common targets. Verified on PostgreSQL 18: make installcheck (temp instance) and make installcheck-existing both pass; clean rebuild and make clean unaffected. Co-authored-by: GitHub Copilot <noreply@github.com> modified: Makefile * Make ag_catalog ownership and built-in resolution explicit (#2440) AGE places all of its objects in the ag_catalog schema. Make the assumptions around that schema explicit so installs and upgrades behave predictably regardless of how a database is provisioned: - Ownership-checked install: CREATE EXTENSION age installs into ag_catalog only when that schema does not already exist under a different owner, keeping ownership of AGE's catalog well-defined. - Deterministic name resolution: the pg_upgrade helper functions resolve built-ins from pg_catalog first and schema-qualify their format()/hashtext() calls, so their behavior does not depend on what else is defined in ag_catalog. - README note describing ag_catalog ownership and the install-time check. The upgrade script applies the same helper changes so existing installations get them on ALTER EXTENSION UPDATE. Adds an extension_security regression test covering the ownership check and the qualified-call / search_path properties. Assisted-by: GitHub Copilot (Claude Opus 4.8) modified: Makefile modified: README.md modified: age--1.7.0--y.y.y.sql new file: regress/expected/extension_security.out new file: regress/sql/extension_security.sql modified: sql/age_main.sql modified: sql/age_pg_upgrade.sql Resolved Conflicts: Makefile * cypher_with: add ORDER BY to non-deterministic RETURN queries (#2436) Several cypher_with regression queries RETURN multiple rows without an ORDER BY, so their row order depends on heap/scan order and can vary between runs, build types, and platforms. Add ORDER BY ASC to those queries so the expected output is stable. Ordering keys use id() (a single int64 that bypasses the locale-sensitive string comparison path and is reproducible from the test's deterministic setup order), or the projected path/scalar where that is what the query returns. Where the underlying vertex/edge was dropped by a WITH projection, its id is threaded through as an alias rather than reordering the projection. Full audit of cypher_with: all 23 multi-row result blocks were checked. After this change, every multi-row, non-EXPLAIN RETURN is deterministically ordered. The two remaining unordered multi-row blocks are left as-is: - "RETURN lbl" returns two identical "Person" rows, so order cannot drift; - the 13 EXPLAIN (VERBOSE, COSTS OFF) plan blocks emit a fixed serial plan (no parallel/gather nodes), so their row order is already deterministic. This is a test-only change (regress/sql/cypher_with.sql and regress/expected/cypher_with.out); no extension C code or SQL is modified. Row counts are unchanged (pure reordering). All 37 regression tests pass (installcheck) on PostgreSQL 18.3. Co-authored-by: GitHub Copilot <noreply@github.com> modified: regress/expected/cypher_with.out modified: regress/sql/cypher_with.sql * Add shortest_path / all_shortest_paths SRFs (#2430) Add two C set-returning functions that compute unweighted (hop-count) shortest paths over the cached global graph adjacency via BFS, callable both at the SQL top level and inside a cypher() RETURN: - age_shortest_path(...) -> the single shortest path (0 or 1 rows) - age_all_shortest_paths(...) -> every shortest path, one per row The signature follows the natural Cypher argument order (graph, start, end, edge_types, direction, min_hops, max_hops), registered in sql/agtype_typecast.sql (install) and age--1.7.0--y.y.y.sql (upgrade). Unimplemented parameters fail loudly: multiple relationship types and a non-zero min_hops raise ERRCODE_FEATURE_NOT_SUPPORTED. A single edge type (string or one-element array) is honored, and a NULL endpoint yields no rows per Cypher null semantics (wrong-typed endpoints / NULL graph still error). To call the SRFs inside a cypher() RETURN, transform_cypher_return now sets query->hasTargetSRFs (it was the only results-producing clause that didn't, so the planner never added a ProjectSet node), and transform_FuncCall auto-prepends the graph name for snake_case shortest_path / all_shortest_paths. camelCase names are reserved for the future native grammar. Robustness: - BFS guards against non-existent endpoints (returns 0 rows instead of crashing) and honors CHECK_FOR_INTERRUPTS. - An unknown edge label now matches no edges instead of silently traversing all of them (get_label_relation returns InvalidOid). Adds the age_shortest_path regression test (directed/undirected, label filtering, parallel edges, self-loops, max_hops, the not-supported stubs, NULL and non-existent endpoint/graph guards). 38/38 installcheck pass. Co-authored-by: Copilot <copilot@github.com> modified: Makefile modified: age--1.7.0--y.y.y.sql modified: sql/agtype_typecast.sql modified: src/backend/parser/cypher_clause.c modified: src/backend/parser/cypher_expr.c modified: src/backend/utils/adt/age_vle.c new file: regress/expected/age_shortest_path.out new file: regress/sql/age_shortest_path.sql * Support pattern expressions as boolean expressions (#2360) * Support pattern expressions in WHERE clause via GLR parser (issue #1577) Enable bare graph patterns as boolean expressions in WHERE clauses: MATCH (a:Person), (b:Person) WHERE (a)-[:KNOWS]->(b) -- now valid, equivalent to EXISTS(...) RETURN a.name, b.name Previously, this required wrapping in EXISTS(): WHERE EXISTS((a)-[:KNOWS]->(b)) The bare pattern syntax is standard openCypher and is used extensively in Neo4j. Its absence was the most frequently cited migration blocker. Implementation approach: - Switch the Cypher parser from LALR(1) to Bison GLR mode. GLR handles the inherent ambiguity between parenthesized expressions '(' expr ')' and graph path nodes '(' var_name label_opt props ')' by forking the parse stack and discarding the failing path. - Add anonymous_path as an expr_atom alternative with %dprec 1 (lower priority than expression path at %dprec 2). The action wraps the pattern in a cypher_sub_pattern + EXISTS SubLink, reusing the same transform_cypher_sub_pattern() machinery as explicit EXISTS(). - Extract make_exists_pattern_sublink() helper shared by both EXISTS(pattern) and bare pattern rules. - Fix YYLLOC_DEFAULT to use YYRHSLOC() for GLR compatibility. - %dprec annotations on expr_var/var_name_opt resolve the reduce/reduce conflict between expression variables and pattern node variables. Conflict budget: 7 shift/reduce (path extension vs arithmetic on -/<), 3 reduce/reduce (expr_var vs var_name_opt on )/}/=). All are expected and handled correctly by GLR forking + %dprec disambiguation. All 32 regression tests pass (31 existing + 1 new). New pattern_expression test covers: bare patterns, NOT patterns, labeled nodes, AND/OR combinations, left-directed patterns, anonymous nodes, multi-hop patterns, EXISTS() backward compatibility, and non-pattern expression regression checks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Address Copilot review: comment placement, %expect docs, test wording 1. Move "Helper function to create an ExplainStmt node" comment from above make_exists_pattern_sublink() to above make_explain_stmt() where it belongs. 2. Add block comment documenting the %expect/%expect-rr conflict budget: 7 S/R from path vs arithmetic on - and <, 3 R/R from expr_var vs var_name_opt on ) } =. 3. Clarify test comment: "Regular expressions" -> "Regular (non-pattern) expressions" to avoid confusion with regex. Regression test: pattern_expression OK. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Address Copilot round 3: broaden scope, remove %expect fragility - Pattern expressions are now accepted anywhere an expr is valid (RETURN, WITH, SET, CASE, boolean combinations), not only WHERE. This matches openCypher semantics and documents the broader surface area that was already implicitly enabled by adding anonymous_path to expr_atom. Added regression tests for each new context: RETURN projection (bare and AS-aliased), mixed with other projections, CASE WHEN, boolean AND/OR combinators, SET to persist a computed boolean property, and WITH ... WHERE pipeline. - Remove the hardcoded `%expect 7` / `%expect-rr 3` conflict budget from cypher_gram.y. The exact conflict counts can drift across Bison versions and distros, which would break builds even though the grammar is correct (GLR handles the conflicts at runtime via fork + %dprec). Instead, pass -Wno-conflicts-sr / -Wno-conflicts-rr via BISONFLAGS in the Makefile so the build stays clean without binding us to a specific Bison release. Kept a block comment in the grammar explaining why GLR conflicts are expected and how they resolve. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Address jrgemignani review: keep -Werror, restore %expect budget Reverts the broad `-Werror` drop and the no-%expect approach from the prior round on jrgemignani's request. The earlier framing — that conflict counts drift across Bison versions, so %expect is fragile — overcorrected: it removed the only build-time alarm bell for unintended new conflicts. Makefile: keep -Werror so any unexpected Bison warning (unused rules, undeclared types, etc.) still fails the build; downgrade only the two conflict categories to plain warnings via -Wno-error=conflicts-sr -Wno-error=conflicts-rr. pgxs auto-adds -Wno-deprecated, so existing %name-prefix= / %pure-parser directives remain non-erroring. cypher_gram.y: add `%expect 7` and `%expect-rr 3` matching the Bison 3.8.2 totals. Bison treats %expect as exact-match, not as a ceiling — any deviation fails the build and forces an audit of the new conflicts. Comment updated to reflect that future Bison versions reporting different counts should bump the numbers explicitly with a version note in the commit message, rather than removing the directive. No grammar or runtime change. Cassert installcheck 34/34 AGE tests green. * Add follow-up regression coverage for pattern expressions (#2360) Addresses the non-blocking test-coverage follow-ups from the review: pattern expressions in additional contexts opened up by allowing anonymous_path as an expr_atom. New cases (all verified against a PG18 build): - Single-node pattern on a bound variable (a:Label). Documented as an EXISTS existence check, NOT an openCypher label predicate: a matching label is always true, and a non-matching label hits AGE's pre-existing "multiple labels for variable" restriction (captured as expected error). - Pattern expressions inside list and map literals. - Pattern expressions as function arguments: collect() shows correct per-row booleans; count() counts all rows (non-null bool) -- documented so the value is not mistaken for a bug. - Pattern expression in OPTIONAL MATCH ... WHERE (null-preserving). - EXISTS() and a bare pattern combined in one predicate. make installcheck: 33/33 green. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add reduce() list folding function (#2435) Implement the openCypher reduce(acc = init, var IN list | body) expression, which folds an arbitrary expression over a list, threading an accumulator across the elements in list order. This closes a long-standing gap (reduce() was previously unsupported) and works both at the SQL top level and inside a cypher() RETURN/WHERE. Implementation -------------- reduce() is desugared, at transform time, into a correlated scalar subquery over a new ordered aggregate rather than a new executor node, so no PostgreSQL core changes are required: CASE WHEN list IS NULL THEN NULL ELSE COALESCE((SELECT ag_catalog.age_reduce( <init>, '<serialized-body>'::text, r.elem ORDER BY r.ord) FROM unnest(<list>) WITH ORDINALITY AS r(elem, ord)), <init>) END - A new cypher_reduce extensible node carries the accumulator/element names and the init/list/body expressions (grammar production, keyword, and the copy/out/read serialization plumbing). - The fold body is transformed against a throwaway two-column agtype namespace, its accumulator and element Var references are rewritten to PARAM_EXEC params 0 and 1, and it is serialized with nodeToString() into a text argument. - age_reduce_transfn (a custom agtype aggregate transition function) deserializes and compiles the body once per group with ExecInitExpr, then evaluates it per element with ExecEvalExpr, rebinding the two params. The body is normalized to agtype at transform time so a boolean or other non-agtype result cannot be misread as a by-reference Datum. Semantics --------- - List order is preserved (unnest WITH ORDINALITY + aggregate ORDER BY). - An empty list yields the initial value; a NULL list yields NULL. - The list and initial value may reference outer-query variables (e.g. reduce(total = 0, n IN nodes(p) | total + n.age)); the body may reference only the accumulator and element. - Arithmetic, string, list-building, boolean/comparison (AND/OR/=/>), CASE, and element property-access bodies are all supported. - Outer-variable, query-parameter, nested-reduce, and aggregate references inside the body raise a clean ERRCODE_FEATURE_NOT_SUPPORTED error. - reduce is registered as a safe keyword so it remains usable as a property or map key, preserving backward compatibility. Tests ----- Adds the age_reduce regression test (registered in the install SQL and the upgrade template so age_upgrade passes), covering: arithmetic/product/string folds; order sensitivity; empty/NULL list; NULL element and NULL init; list-building and CASE bodies; boolean and comparison bodies; element property access; multiple and nested (in list/init) reduce(); reduce() in boolean expressions, WHERE, and list comprehensions; folds over collected nodes and node list properties; the not-supported rejections; and reduce as a map key. Following reviewer feedback, three further semantics-coverage gaps are pinned directly so the mechanisms that make the aggregate desugaring correct are exercised by tests rather than only correct by inspection: - A fold body that produces null mid-fold and then recovers: the agtype 'null' running state is a readable value, so a later element folds back out of it (distinct from "null propagates to a null result", which was already covered). - An empty list with a NULL initial value: COALESCE(<no rows>, init) yields NULL, kept distinct from a body that legitimately folds to agtype 'null', which must not be resurrected to the initial value. - A type error and a runtime division-by-zero error in the body: both abort cleanly out of the standalone per-element evaluator rather than corrupting the running aggregate state. All multi-row results are ordered. 42/42 installcheck pass. Future work ----------- The body restriction (accumulator and element only) is a property of the standalone expression evaluation and can be relaxed without core changes: - Allow loop-invariant outer-variable and cypher $parameter references in the body by capturing them as additional eager aggregate arguments bound to extra param slots. - Support a nested reduce() inside the body via an SPI-based evaluation fallback for subquery-bearing bodies. Aggregates inside the body remain intentionally unsupported, matching the openCypher specification. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> modified: Makefile modified: age--1.7.0--y.y.y.sql new file: regress/expected/age_reduce.out new file: regress/sql/age_reduce.sql modified: sql/age_aggregate.sql modified: src/backend/nodes/ag_nodes.c modified: src/backend/nodes/cypher_copyfuncs.c modified: src/backend/nodes/cypher_outfuncs.c modified: src/backend/nodes/cypher_readfuncs.c modified: src/backend/parser/cypher_analyze.c modified: src/backend/parser/cypher_clause.c modified: src/backend/parser/cypher_gram.y modified: src/backend/utils/adt/agtype.c modified: src/include/nodes/ag_nodes.h modified: src/include/nodes/cypher_copyfuncs.h modified: src/include/nodes/cypher_nodes.h modified: src/include/nodes/cypher_outfuncs.h modified: src/include/nodes/cypher_readfuncs.h modified: src/include/parser/cypher_kwlist.h * resolve subgraph staging sequences via regclass (#2446) The vertex/edge staging copies in create_subgraph() generated new graphids with nextval(%L), which binds the sequence as a string literal and invokes the nextval(text) overload. That re-resolves the schema-qualified sequence name on each call. Cast the literal to regclass (nextval(%L::regclass)) so the sequence is resolved once to its OID, matching how AGE defines its label id defaults (nextval('schema.seq'::regclass)). Applied to both the vertex and edge staging queries, in sql/age_subgraph.sql and the identical body in the age--1.7.0--y.y.y.sql upgrade template so the upgrade-path catalog comparison still matches. Behavior is unchanged; all 38 regression tests pass against PostgreSQL 18. Addresses Copilot review feedback on #2441. Co-authored-by: GitHub Copilot (Claude Opus 4.8) <[email protected]> modified: age--1.7.0--y.y.y.sql modified: sql/age_subgraph.sql * Support relationship-type filters and a minimum hop count (#2442) Support relationship-type filters and a minimum hop count in shortest_path SRFs age_shortest_path / age_all_shortest_paths gain two related capabilities, both following openCypher / Neo4j semantics. Relationship-type filtering: the edge_types argument now accepts an array of types; an edge matches when its label is any one of the requested types. A bare string or a one-element array keeps the single-type behaviour, an empty string/array or NULL means no filter, and an unknown type matches nothing. sp_run_bfs takes an Oid set rather than a single oid, and sp_compute_paths resolves the argument into that set. Minimum hop count: the new min_hops argument is a lower bound on the path length. When it does not exceed the true shortest distance it imposes no constraint, so the normal BFS shortest-path result is returned. When it exceeds the shortest distance, BFS cannot produce a qualifying path, so the search falls back to the variable-length-edge depth-first engine (sp_minhops_fallback), which enumerates edge-distinct paths (relationship-uniqueness / trail semantics) and returns the shortest path(s) whose length is at least min_hops. This regime permits revisiting a vertex and closed walks back to the start, but never reusing an edge. A private memory context bounds the search and a cost guard caps the number of examined paths, raising PROGRAM_LIMIT_EXCEEDED (with a hint to bound the search with a maximum hop count) when the cap is exceeded. The hard regime combined with multiple relationship types is unsupported, because the VLE engine matches a single label; that case raises FEATURE_NOT_SUPPORTED. Regression coverage spans single- and multi-type filters, directed and undirected reachability, multiplicity of equal-length paths, max_hops bounds, NULL and non-existent endpoints, and both min_hops regimes, including a vertex-revisiting longer path (sp_revisit) and a closed-walk cycle back to the start (sp_tri). The in-cypher() Tier 1 call forms are exercised as well. Review feedback addressed: 1. Error messages now report the function actually called. age_shortest_path and age_all_shortest_paths share their argument-resolution helpers, which hard-coded an "age_shortest_path" prefix regardless of the caller; the caller's name is now threaded through so each function reports its own (this also corrects a mislabeled multi-type min_hops error). A new regression case (sp_errname) pins the behaviour for both functions. 2. age_all_shortest_paths now bounds the number of materialized result paths. The shortest-path DAG can contain exponentially many equal-length paths, all built up front before the first row streams; enumeration is capped at SP_MAX_RESULT_PATHS (1,000,000), raising PROGRAM_LIMIT_EXCEEDED with a hint to narrow the search, mirroring the existing min-hops candidate cap. 3. The BFS search state (visited table, frontier queue, predecessor multiset, and intermediate path arrays) now lives in a private scratch memory context that is deleted once the surviving result Datums are built in the SRF context, rather than persisting in multi_call_memory_ctx for the life of the SRF. This bounds peak memory to the result set plus one search and matches the pattern sp_minhops_fallback already used. 4. A second review round tightened memory hygiene and reporting: the pnstrdup'd relationship-type name is freed once resolved to an oid (it was retained for the life of the SRF) in both the array and scalar cases; the invalid-direction error now carries the called function's name like the other argument errors; the min-hops fallback's private context is renamed to a caller-neutral "age shortest path minhops" (it is shared by both SRFs); and the multi-type label-filter comment is corrected to note that an unknown type merely contributes no matches -- known types in the same set still match, and only an all-unknown set leaves just the zero-length path. 41/41 installcheck. Co-authored-by: Copilot <copilot@github.com> modified: regress/expected/age_shortest_path.out modified: regress/sql/age_shortest_path.sql modified: src/backend/utils/adt/age_vle.c * Fix single-node labeled pattern expressions not filtering by label (#2443) (#2444) A single-node labeled pattern used as a boolean expression -- e.g. `WHERE (a:Person)`, `WHERE EXISTS((a:Person))` -- was accepted but did not test the bound vertex's label. It desugars to an EXISTS sub-pattern, and make_path_join_quals() returned early for vertex-only patterns (list_length(entities) < 3), emitting no quals. With no edge to carry a correlation, the sub-pattern referenced nothing from the enclosing query, so the planner produced an uncorrelated one-time InitPlan that was trivially true whenever any vertex of that label existed -- the predicate matched every outer row. Emit an explicit label-id filter for a vertex-only pattern whose vertex carries a non-default label and whose variable is declared in an ancestor parse state (i.e. a correlated reference). make_qual() builds a name-based id reference that resolves to the outer variable, so the filter both correlates the sub-pattern to that variable and enforces the label. Freshly scanned, non-correlated vertices (no ancestor binding) are untouched, so plain MATCH (a:Person) and "does any X exist" EXISTS checks are unaffected. Add regression coverage in pattern_expression: WHERE (a:Person), WHERE NOT (a:Person), and EXISTS((a:Company)) against a graph with a non-Person vertex. All 41 regression tests pass. * Preserve null-valued keys in map literals (#2391) (#2412) * Preserve null-valued keys in map literals (#2391) Map literals such as RETURN {a: null} previously dropped keys whose values were null, producing {} instead of {"a": null}. This diverged from the openCypher / Neo4j semantics where map literals preserve every key the user wrote, including those bound to null. Root cause: cypher_map.keep_null defaulted to false (zero-initialised), so the grammar-produced node fed agtype_build_map_nonull, which strips null entries. Call sites that legitimately need strip-null semantics (CREATE node/edge property maps and SET = assignments) already set keep_null=false explicitly, and the MATCH pattern path sets it to true explicitly. Flipping the grammar default to true therefore only affects the cases that were buggy (bare map expressions and nested map values), and leaves CREATE/SET behaviour unchanged. Two preexisting tests encoded the old buggy output and are updated: expr.out (bare RETURN maps now keep the null value) and agtype.out (a nested map inside an orderability test no longer drops its null entry, shifting one row in the ORDER BY result). Dedicated regression coverage for #2391 is added to regress/sql/expr.sql. * Address review: move 'End of tests' marker and drop order claim - Move 'End of tests' to after the Issue 2391 test block so it marks the actual end of the file. - Remove 'in order' from the mixed-values comment since map key ordering is not guaranteed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Update accessor EXPLAIN expected output for null-preserving map literals Map literals now build with agtype_build_map (keep_null) instead of agtype_build_map_nonull, so the accessor-optimization EXPLAIN plan in expr.out must reflect the new function name. Also strip a stray trailing CRLF on the last line of expr.sql/expr.out that leaked a carriage return into the regression output. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add nested-map write coverage and clarify top-level strip wording A reviewer noted the keep_null=true default reaches further than the PR summary stated: CREATE / SET = only override keep_null=false on the top-level property map, not recursively. A nested map value is its own cypher_map node, so it now inherits the new default and preserves its null-valued keys (e.g. CREATE (n {a: {b: null}}) stores a -> {"b": null}). - regress/sql/expr.sql, regress/expected/expr.out: pin the nested case under a write with CREATE (n:Nested {a: {b: null}}), MATCH ... SET n = {a: {b: null}}, and a MATCH verify. - src/backend/parser/cypher_gram.y: clarify the map: rule comment to state the CREATE / SET override is top-level only and nested map values keep the null-preserving default. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix Node.js driver CI build broken by @types/node drift (#2452) The Node.js driver CI (npm install -> npm run build -> tsc) failed with parser errors in node_modules/@types/node/ffi.d.ts (TS1139/TS1005/TS1109/ TS1128). package-lock.json is gitignored, so CI resolves dependencies purely from package.json. @types/node was only pulled transitively via a wildcard range (@types/pg and jest depend on @types/node@*), so a fresh install grabbed the latest (26.x). That version uses `const` type parameters (a TypeScript 5.0 feature) in ffi.d.ts, which typescript@4.9 cannot parse. skipLibCheck does not suppress these parser-level errors. The runtime Node version is unrelated: @types/node is resolved from the npm dependency graph, not the Node.js runtime. Fix: - Add a bounded direct devDependency "@types/node": "^20.19.0" so a fresh install constrains the typings to the Node 20 LTS line, which is compatible with typescript@4.9 and keeps the toolchain consistent (eslint 7 / typescript-eslint 4 / TS 4.9 / Node 20 typings). - Pin CI to Node 20 (setup-node@v4, node-version: 20) for reproducibility and to match the pinned typings; replaces the deprecated setup-node@v3 and floating node-version: latest. Verified: a clean, no-lockfile install (matching CI) now resolves @types/node@20.19.43 and tsc builds successfully. Co-authored-by: Copilot <copilot@github.com> modified: .github/workflows/nodejs-driver.yaml modified: drivers/nodejs/package.json * Fix stack overflow and precision loss in toFloatList() conversion (#2451) toFloatList()'s AGTV_FLOAT branch formatted each element with sprintf(buffer, "%f", ...) into a fixed 64-byte stack buffer and then re-parsed the string back into a float. This had two defects: 1. Stack overflow. "%f" prints the full integer part with no width limit, so a large magnitude overflows the 64-byte buffer. The value is query-reachable: RETURN toFloatList([1.0e308]) needs ~317 bytes (309 integer digits + ".000000") and smashes the stack. This is the issue reported in #2410. 2. Precision loss. "%f" emits only 6 fractional digits, so the format-and-reparse round trip was lossy -- toFloatList([0.123456789]) returned 0.123457. The element is already a float8, so the whole format/reparse step is unnecessary. Assign elem->val.float_value directly. This removes the stack buffer entirely (no magic buffer size to justify) and fixes both the overflow and the precision loss at once. Also harden toStringList(): its "%.*g"/"%ld" conversions use bounded formats and were never overflow-prone, but switch them from sprintf to snprintf as defensive depth. Add regression coverage to regress/sql/expr.sql for both the large magnitude case (no overflow) and precision preservation. This reimplements the fix originally proposed by David Christensen in #2410, whose report identified the sprintf overflow. Co-authored-by: David Christensen <david.christensen@snowflake.com> * Support outer references in reduce() fold bodies (#2448) Allow a reduce(acc = init, var IN list | body) fold body to reference loop-invariant values from the enclosing query -- outer-query variables and cypher() parameters -- in addition to the accumulator and element. These were previously rejected with ERRCODE_FEATURE_NOT_SUPPORTED. How it works ------------ The fold body is still compiled to a standalone expression evaluated by age_reduce_transfn, so an outer reference (which cannot be evaluated there) is captured at transform time and supplied as a value: - After the accumulator and element are rewritten to PARAM_EXEC params 0 and 1, transform_cypher_reduce() walks the body and replaces each loop-invariant outer reference -- one that references an outer Var or a cypher() $parameter but not the accumulator/element -- with a new PARAM_EXEC param 2, 3, ... in body order. Capture is at leaf granularity: only the bare outer value is hoisted out of the body, while the operators, function calls and CASE/AND/OR/coalesce branches around it stay in the serialized body. Because a captured value becomes an aggregate argument that the executor evaluates eagerly and unconditionally, hoisting a whole computed subtree (for example "1/z" under a never-taken CASE branch) would defeat the fold's short-circuiting; capturing only the leaf keeps evaluation under the body's own control flow. The one exception is an outer reference that is not itself agtype-typed -- most commonly the graphid inside a graph vertex/edge variable -- whose smallest enclosing agtype-typed subtree is captured whole, since it cannot stand alone as an agtype[] extra. - The captured expressions are passed to the aggregate as a trailing agtype[] argument; age_reduce(agtype, text, agtype, agtype[]) and its transition function gain this argument. - age_reduce_transfn sizes its param array to 2 + the number of captures and binds the captured values to params 2.. on every row. Because the captures are evaluated in the outer query context as ordinary aggregate arguments, a correlated capture is re-evaluated per group, so an outer value that varies per row (for example under UNWIND) is folded with the correct value. Each capture slot is rebound on every row, and the trailing extras argument is read only when the aggregate actually passes it (PG_NARGS), keeping the transition safe under direct age_reduce() SQL calls and an older 4-argument signature. This keeps the no-core-patch design: the body is still a serialized standalone expression, and the only new machinery is the captured-value plumbing. Still rejected -------------- Subqueries in the body (including a nested reduce()) and aggregate functions remain unsupported and raise a clean ERRCODE_FEATURE_NOT_SUPPORTED error: a subquery cannot be planned as a plain aggregate argument, and an aggregate in a per-element fold is undefined per the openCypher specification. Tests ----- age_reduce gains an "Outer references in the fold body" section covering a plain outer variable, an outer variable used as a multiplier, two distinct outer variables, a property of an outer graph variable, the same outer variable referenced more than once, a property of an outer map, a subexpression that mixes an outer reference with the element (only the loop-invariant part is captured), an outer reference inside a CASE branch of the body, a NULL outer value propagating through the fold, multiple captures mixing a NULL and a non-NULL outer value, an outer variable that changes per row (captured per group), and a cypher() parameter supplied via a prepared statement. A "Short-circuit evaluation is preserved for outer references in the body" section verifies that a guarded outer sub-expression is not evaluated on a branch that is not taken: a never-taken CASE THEN branch, a never-taken CASE ELSE branch, an OR and an AND that short-circuit, and a coalesce -- each of which would divide by zero if the outer "1/w" were hoisted into an eagerly evaluated aggregate argument -- plus a guarded branch that is taken and evaluates its outer division normally. The previously-rejected outer-variable case is moved out of the not-supported section, which now covers a nested reduce() (any subquery in the body is unsupported) and an aggregate in the body. The same change also broadens the base reduce() coverage with value-type folds (a float accumulator, negative numbers, a map accumulator passed through unchanged, and list elements indexed in the body), function calls in the fold body (a scalar function over the element and the list itself produced by a function), reduce() composed with surrounding expressions (consumed by another function and used in a comparison), and syntax-error checks for each required piece of the form -- the "= init", ", var IN list", and "| body" clauses, plus a rejected qualified iterator variable. 42/42 installcheck pass. Co-authored-by: Copilot <copilot@github.com> modified: age--1.7.0--y.y.y.sql modified: regress/expected/age_reduce.out modified: regress/sql/age_reduce.sql modified: sql/age_aggregate.sql modified: src/backend/parser/cypher_clause.c modified: src/backend/utils/adt/agtype.c * Fix segfault and out-of-bounds reads in file loaders on malformed rows (#2453) * Fix segfault and out-of-bounds reads in file loaders on malformed rows load_edges_from_file() and load_labels_from_file() build their COPY parser with only format=csv and header=false, so COPY uses its default comma delimiter. A file delimited by anything else (or a malformed row) then parses with an unexpected column count, and the loaders indexed the parsed fields without validating that count: - process_edge_row() reads the four fixed fields fields[0..3] unconditionally. A non-comma-delimited edge file parses as a single column, so fields[1..3] are out of bounds -> segfault (issue #2449). - create_agtype_from_list()/_i() pair header[i] with fields[i] for all i < nfields, so a row with more fields than the header reads header[i] out of bounds. Add bounds validation that turns these into clear errors: - Edge header must have >= 4 columns; a smaller count almost always means the wrong delimiter, so the error carries a hint. - Each edge row must have >= 4 columns and no more than the header's. - Each label row must have no more than the header's column count. Rows with fewer trailing columns than the header remain allowed, matching existing behavior (exercised by the existing conversion tests). This closes the segfault and out-of-bounds reads. The silent mis-parsing of a non-comma file whose header and rows share the same (wrong) column count is not detectable here; adding a delimiter option to the load functions is a separate follow-up. Adds a regression test in age_load using a pipe-delimited edge file. Addresses #2449. * loader guards: clarify error wording and add per-row regression coverage Address review feedback on the nfields guards: - Error messages now say "the header's %d columns" (was "the header's %d"), making the count's unit explicit. - Add regression cases exercising the per-row guards, which previously only had coverage for the mis-delimited-header path: * an edge row with fewer than 4 columns * an edge row with more columns than the header * a label row with more columns than the header Each asserts a clean ERROR (these were the out-of-bounds reads the guards now catch). * ci: pin Build/Regression runner to ubuntu-24.04 and guard Bison version for GLR grammar (#2445) * ci: pin runner to ubuntu-24.04 + guard Bison version for GLR grammar The Cypher GLR grammar pins exact shift/reduce and reduce/reduce conflict counts via %expect / %expect-rr in cypher_gram.y, and Bison treats %expect as exact-match: a different Bison version can report different counts and break the build. ubuntu-latest floats to new Ubuntu releases (and new Bison versions), which would silently shift those counts. Pin runs-on to ubuntu-24.04 to freeze Bison at 3.8.x, and add a guard step that fails loudly with a pointer to cypher_gram.y if Bison ever drifts off 3.8.x. Reproducibility comes from pinning the variable rather than widening the conflict-count tolerance, keeping the exact-match alarm for genuinely new grammar conflicts intact. * ci: make Bison version parse robust (awk + explicit empty guard) Address Copilot review on #2445: the previous grep-based parse could silently yield an empty version and fail with a confusing 'Bison != 3.8.x' message. Parse the version field with awk and error explicitly when it can't be determined. Resolved conflict: .github/workflows/installcheck.yaml * Add automatic header-dependency tracking to the Makefile (#2454) AGE lists OBJS explicitly and relies on PGXS, whose built-in dependency tracking only runs when the server was built with --enable-depend (often disabled). Consequently, editing a header did not recompile the .c files that include it, leaving stale .o files. This is especially dangerous for node/struct headers: a stale ag_nodes.o keeps an outdated node_size, so _readExtensibleNode under-allocates and readNode corrupts the heap ("unrecognized node type"). Emit a .d file beside each object via -MMD -MP and -include them, deriving DEPFILES from OBJS. The mechanism is self-contained (independent of --enable-depend): -MMD skips system headers and -MP tolerates deleted headers. On servers built with --enable-depend, PGXS appends its own -MF after CFLAGS (last -MF wins), so this degrades cleanly to PGXS's tracking. Add DEPFILES to EXTRA_CLEAN and *.d to .gitignore. Co-authored-by: Copilot <copilot@github.com> modified: .gitignore modified: Makefile --------- Co-authored-by: serdarmumcu <serdar.mumcu@udemy.com> Co-authored-by: Greg Felice <gregfelice@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Prashant Chinnam <5108573+crprashant@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: David Christensen <david.christensen@snowflake.com>

The vertex/edge staging copies in create_subgraph() generated new graphids with nextval(%L), which binds the sequence as a string literal and invokes the nextval(text) overload. That re-resolves the schema-qualified sequence name on each call. Cast the literal to regclass (nextval(%L::regclass)) so the sequence is resolved once to its OID, matching how AGE defines its label id defaults (nextval('schema.seq'::regclass)). Applied to both the vertex and edge staging queries, in sql/age_subgraph.sql and the identical body in the age--1.7.0--y.y.y.sql upgrade template so the upgrade-path catalog comparison still matches. Behavior is unchanged; all 38 regression tests pass against PostgreSQL 18. Addresses Copilot review feedback on #2441. Co-authored-by: GitHub Copilot (Claude Opus 4.8) <[email protected]> modified: age--1.7.0--y.y.y.sql modified: sql/age_subgraph.sql

jrgemignani requested review from MuhammadTahaNaveed, Copilot and gregfelice June 22, 2026 23:18

jrgemignani added priority 1.8.0 labels Jun 22, 2026

Copilot started reviewing on behalf of jrgemignani June 22, 2026 23:19 View session

Copilot AI reviewed Jun 22, 2026

View reviewed changes

jrgemignani requested a review from muhammadshoaib June 24, 2026 21:11

gregfelice approved these changes Jun 26, 2026

View reviewed changes

gregfelice merged commit ce073ef into apache:master Jun 29, 2026
6 checks passed

gregfelice mentioned this pull request Jul 2, 2026

Support outer references in reduce() fold bodies #2448

Merged

jrgemignani mentioned this pull request Jul 2, 2026

Master to pg19 #2456

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

resolve subgraph staging sequences via regclass#2446

resolve subgraph staging sequences via regclass#2446
gregfelice merged 1 commit into
apache:masterfrom
jrgemignani:subgraph_addendum

jrgemignani commented Jun 22, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

gregfelice left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

jrgemignani commented Jun 22, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

gregfelice left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants