Fix: FDW OPTIONS encoding accepts symbolic names (issue #1726)#1727
Open
talmacschen-arch wants to merge 3 commits into
Open
Fix: FDW OPTIONS encoding accepts symbolic names (issue #1726)#1727talmacschen-arch wants to merge 3 commits into
talmacschen-arch wants to merge 3 commits into
Conversation
Both the FDW catalog reader (src/backend/access/external/external.c)
and the gp_exttable_fdw option validator
(gpcontrib/gp_exttable_fdw/option.c) parsed the "encoding" OPTIONS value
with atoi(). atoi("UTF8") returns 0 (PG_SQL_ASCII) and PG_VALID_ENCODING(0)
is true, so symbolic names like 'UTF8', 'utf-8', 'GBK' silently fell through
validation and were stored as SQL_ASCII at read time. By contrast, the
legacy CREATE EXTERNAL TABLE ... ENCODING ... path resolves names via
pg_char_to_encoding() and persists a numeric form into OPTIONS — only the
FDW OPTIONS entry point bypassed that translation.
Add a small shared helper parse_fdw_encoding_option(const char *) in
src/backend/access/external/external.c (declared in
src/include/access/external.h):
- first try pg_char_to_encoding(name) — same logic as the legacy path;
- otherwise try a strict numeric form via strtol() with end-of-string
and PG_VALID_ENCODING() checks (atoi is intentionally avoided, since
atoi("UTF8")==0 is the bug being fixed);
- otherwise ereport(ERROR).
Both the validator and GetExtFromForeignTableOptions() call this helper.
On-disk values in pg_foreign_table.ftoptions are stored verbatim as the
user wrote them; correctness is established at read time. This avoids a
ProcessUtility_hook approach, which is unworkable here because the
extension's _PG_init runs lazily on the first dlopen, after the current
statement's hook check has already passed.
Affected scope: gp_exttable_fdw (used by gp_exttable_server). The
standalone pxf_fdw is unaffected — its validator already routes encoding
through ProcessCopyOptions, which is name-aware.
Behavior change on upgrade: existing rows whose ftoptions literally contain
encoding=<name> have, until now, been silently interpreted as SQL_ASCII.
After this fix they are interpreted as the named encoding. This will be
called out in the release notes; a detection query is provided in the PR
description for operators who wish to pin specific tables to numeric form
before upgrade.
Tests added in gpcontrib/gp_exttable_fdw/{input,output}/gp_exttable_fdw.source
cover encoding '6' / 'UTF8' / 'utf-8' / 'GBK' / 'bogus' and an
ALTER FOREIGN TABLE ... OPTIONS (SET encoding 'UTF8') path. The pre-existing
encoding '-1' error case has its expected error message updated to match
the new helper's wording.
There was a problem hiding this comment.
Hi, @talmacschen-arch welcome!🎊 Thanks for taking the effort to make our project better! 🙌 Keep making such awesome contributions!
The new tests added in the previous commit had column header lines
without the trailing-space padding that psql's aligned output emits
to match the separator. The pre-existing ext_special_uri header
(' a | b') was also unintentionally stripped of its trailing space
during the same edit.
Pure whitespace fix. No behavior change.
pg_regress diffs the expected and actual .out files strictly, including
the final newline count. The new encoding test block ended with a
stray empty line (";\n\n") while psql produces ";\n", causing a 1-line
diff at end-of-file. Pure whitespace fix.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #1726
What does this PR do?
Make the
gp_exttable_fdwFDW accept both symbolic encoding names (UTF8,utf-8,GBK, …) and numeric encoding IDs in theencodingOPTIONS entry, mirroring what the legacyCREATE EXTERNAL TABLE … ENCODING …path has always accepted, and what the rest of PostgreSQL accepts everywhere else (client_encoding,pg_dump,pg_conversion,CREATE DATABASE, …).Before this PR, both the FDW catalog reader (
src/backend/access/external/external.c:280) and the option validator (gpcontrib/gp_exttable_fdw/option.c:139,142) parsed the value withatoi().atoi("UTF8")silently returns0(SQL_ASCII) andPG_VALID_ENCODING(0)is true, so:encoding '6'encoding 'UTF8'encoding 'utf-8'encoding 'GBK'encoding 'bogus'Affected scope is
gp_exttable_fdw(used bygp_exttable_server). The standalonepxf_fdwis unaffected — its validator routesencodingthroughProcessCopyOptions, which is already name-aware.Implementation
A small shared helper
parse_fdw_encoding_option(const char *)is added insrc/backend/access/external/external.c(declared insrc/include/access/external.h):pg_char_to_encoding(value)— the same translation the legacyCREATE EXTERNAL TABLEpath uses.strtol+ end-of-string +PG_VALID_ENCODINGchecks).atoiis intentionally avoided:atoi("UTF8")silently mistranslates and that is the bug being fixed.ereport(ERROR).Both the FDW validator (
gp_exttable_permission_check) and the read path (GetExtFromForeignTableOptions) call this helper. No on-disk format change: values are stored verbatim inpg_foreign_table.ftoptionsexactly as the user wrote them, and resolved at read time. This keepspg_dumpoutput round-trip-safe and avoids silently rewriting catalog rows during upgrade.The bypass on the runtime PXF error message ("
Define the external table with ENCODING UTF8...") lives in the separateapache/cloudberry-pxf-serverrepository and will be addressed in a follow-up PR there.Type of Change
Test Plan
Added cases to
gpcontrib/gp_exttable_fdw/input/gp_exttable_fdw.source:encoding '6'→ resolves to UTF8 (control)encoding 'UTF8'/encoding 'utf-8'→ resolves to UTF8 (the fix)encoding 'GBK'→ resolves to GBKencoding 'bogus'→ ereportsALTER FOREIGN TABLE … OPTIONS (SET encoding 'UTF8')on a table thatwas originally created with
encoding '0'→ reads as UTF8 after the ALTERThe pre-existing
encoding '-1'ereport test was kept; only the error message text was updated to the new"\"-1\" is not a valid encoding name or code"form.gpcontrib/gp_exttable_fdw/)make installcheck(to be run by reviewer / CI)make -C src/test installcheck-cbdb-parallel(to be run)Impact
User-facing changes (release-note worthy⚠️ ):
Foreign tables that were created before this fix with a non-numeric
encodingoption (e.g.encoding 'UTF8') had the literal name silently stored inpg_foreign_table.ftoptions. With the oldatoi()reader they took effect asSQL_ASCII. After this fix, the same on-disk row is interpreted as the named encoding (e.g.UTF8) — i.e. the encoding actually requested by the user.For most affected users this is the desired correction (data was already being silently mishandled). Operators who want to identify which existing tables are about to flip semantics can run:
If a table appears that should not flip, the operator can pin it by re-issuing
ALTER FOREIGN TABLE … OPTIONS (SET encoding '<numeric>')before upgrade.Performance: none. Two extra string compares + one syscache-free name lookup on each foreign-table read; same for validation.
Dependencies: none.
pg_char_to_encoding()is already declared insrc/include/mb/pg_wchar.h:567and used by the legacy path.Checklist
Additional Context
The design discussion in the issue (
#1726) considered a write-time normalization variant viaProcessUtility_hook, where the option value would be rewritten to its numeric form before persisting intoftoptions. That approach was abandoned because the hook is only installed by the extension's_PG_init, which does not run until the first call into the extension's shared library — and on the very firstCREATE FOREIGN TABLEof a fresh backend, the hook check has already been passed by the time the validator triggersdlopen. Adopting it would require listinggp_exttable_fdwinsession_preload_libraries, which changes deployment expectations. The chosen read-side resolution works without any preload requirement.