
Xmp toolkit as a submodule #3501

Open
serghov wants to merge 20 commits into Exiv2:main from serghov:xmp-toolkit-v2025.03

@serghov
Contributor

serghov commented Feb 18, 2026

This PR replaces the old vendored copy of the XMP SDK with the official https://github.com/adobe/XMP-Toolkit-SDK as a git submodule.
This brings the XMP implementation from a forked "XMP Core 4.4.0-Exiv2" to the upstream "XMP Core 6.0.0".

There are breaking changes

The LangAlt encoding path has been substantially rewritten to handle three separate SDK behavioral differences:

  • Alias resolution: XMP-Toolkit-SDK resolves standard aliases (e.g. tiff:ImageDescription -> dc:description[?xml:lang="x-default"]) to a specific array item. The workaround detects this with a try/catch and falls back to SetProperty for the x-default value only, silently dropping other language variants for aliased properties.
  • Duplicate lang handling: The old code blindly incremented an index counter when appending array items, which could create duplicate entries for the same language. The new code checks for existing items by comparing xml:lang qualifiers before appending. (The new XMP toolkit complains if we don't do this!)
  • NormalizeLangArray workaround: The SDK's NormalizeLangArray overwrites the second item's value with x-default's value when a LangAlt array has exactly 2 items. The workaround pre-emptively sets x-default to match the language-specific entry, making the normalization a no-op. This is fragile; it depends on SDK internals and may break if the normalization logic changes.
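The duplicate-language check and the two-item workaround from the list above can be illustrated without the SDK. This is a minimal sketch, assuming a LangAlt array is modeled as a plain vector of (xml:lang, value) pairs; `LangItem`, `upsertLangItem` and `preSyncXDefault` are hypothetical names, not exiv2 or XMP-Toolkit-SDK API.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one rdf:li item of an rdf:Alt (LangAlt) array.
struct LangItem {
    std::string lang;   // value of the xml:lang qualifier
    std::string value;
};

// Update the item whose xml:lang matches, or append a new one.
// Blindly appending (the old behavior) could create two items with the
// same language, which the new toolkit complains about.
void upsertLangItem(std::vector<LangItem>& arr, const std::string& lang,
                    const std::string& value) {
    for (auto& item : arr) {
        if (item.lang == lang) {
            item.value = value;
            return;
        }
    }
    arr.push_back({lang, value});
}

// Workaround sketch: if the array holds exactly x-default plus one other
// language, copy that language's value into x-default, so that a later
// "overwrite item 2 with x-default" normalization changes nothing.
void preSyncXDefault(std::vector<LangItem>& arr) {
    if (arr.size() != 2) return;
    LangItem* xdef = nullptr;
    LangItem* other = nullptr;
    for (auto& item : arr) {
        if (item.lang == "x-default") xdef = &item; else other = &item;
    }
    if (xdef != nullptr && other != nullptr) xdef->value = other->value;
}
```

As the PR notes, the second function relies on how the SDK normalizes two-item arrays, so it is only a no-op guard under that assumption.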

The toolkit now generates slightly different output, so some test cases had to be modified. Most of the differences are benign, such as the XMP SDK encoding its own version in the generated XML.

serghov force-pushed the xmp-toolkit-v2025.03 branch from c8af03d to c54c6c4 February 18, 2026 15:15
@kevinbackhouse
Collaborator

Thanks for working on this!

serghov force-pushed the xmp-toolkit-v2025.03 branch 2 times, most recently from 7588bd4 to b101fff February 19, 2026 09:27
serghov force-pushed the xmp-toolkit-v2025.03 branch 2 times, most recently from 2e731eb to b6a052d February 19, 2026 10:01
Update xmpsdk/meson.build to use the new submodule source paths,
add required platform-specific compile definitions (UNIX_ENV,
MAC_ENV, WIN_ENV, BanAllEntityUsage, EXV_ADOBE_XMPSDK), and
propagate xmp_args to the library build in src/meson.build.

SXMPMeta::DeleteNamespace has never been implemented in the
upstream XMP-Toolkit-SDK (throws kXMPErr_Unimplemented). The old
in-source copy had a custom implementation that was added by the
exiv2 project. Log a warning when a namespace re-registration
with a different prefix is attempted.
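A minimal sketch of what re-registration can look like when no working delete call exists. It assumes a plain URI-to-prefix map and that the existing binding is simply kept while a warning is produced; `NamespaceRegistry` and its `registerNs` member are hypothetical and do not reproduce exiv2's actual implementation.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical registry mapping namespace URI -> prefix.
class NamespaceRegistry {
public:
    // Returns a warning when the URI is already registered under a different
    // prefix. Without a working DeleteNamespace the old binding cannot be
    // removed, so in this sketch it is kept and only a warning is produced.
    std::optional<std::string> registerNs(const std::string& uri,
                                          const std::string& prefix) {
        auto it = uriToPrefix_.find(uri);
        if (it == uriToPrefix_.end()) {
            uriToPrefix_.emplace(uri, prefix);
            return std::nullopt;
        }
        if (it->second == prefix) return std::nullopt;  // identical re-registration
        return "namespace " + uri + " already registered with prefix '" +
               it->second + "'; cannot re-register as '" + prefix + "'";
    }

    std::string prefixOf(const std::string& uri) const {
        auto it = uriToPrefix_.find(uri);
        return it == uriToPrefix_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> uriToPrefix_;
};
```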
XMP-Toolkit-SDK's NormalizeLangArray (called during serialization)
overwrites the second item's value with x-default's when a LangAlt
array has exactly 2 items. The old in-source SDK had this behavior
disabled since 2007 (commented out by Andreas Huggel as unexpected

The new XMP-Toolkit-SDK correctly skips timezone conversion for
date-only values (no time component). The old SDK unconditionally
applied the local timezone offset, making the test result depend
on the machine's timezone (e.g. 08:00:00 on UTC+8).
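To illustrate the difference, the sketch below models a date value with an optional time component. `XmpDate`, `localTimeOfDay` and the `skipDateOnly` switch (standing in for old vs. new SDK behavior) are hypothetical, and day rollover is deliberately ignored to keep the example short.

```cpp
#include <cstdio>
#include <string>

// Hypothetical, simplified XMP date value.
struct XmpDate {
    int year = 0, month = 0, day = 0;
    bool hasTime = false;  // false for date-only values such as "2009-05-01"
    int hour = 0, minute = 0, second = 0;
};

// Time of day after shifting by a local offset in minutes.
// skipDateOnly == false mimics the old behavior: a date-only value is treated
// as 00:00:00 and shifted anyway, so UTC+8 yields "08:00:00".
// skipDateOnly == true mimics the new behavior: date-only values get no time.
std::string localTimeOfDay(const XmpDate& d, int offsetMinutes, bool skipDateOnly) {
    if (!d.hasTime && skipDateOnly) return "";
    int total = (d.hour * 60 + d.minute + offsetMinutes) % (24 * 60);
    if (total < 0) total += 24 * 60;  // wrap negative offsets; date change ignored
    char buf[9];
    std::snprintf(buf, sizeof buf, "%02d:%02d:%02d", total / 60, total % 60, d.second);
    return buf;
}
```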
Instead of hardcoding "XMP Core 4.4.0-Exiv2" in test expectations
(which breaks on every SDK upgrade), detect the version dynamically
by writing minimal XMP to a temp file and extracting the x:xmptk
attribute.  The version is exposed as $xmp_toolkit_version for use
in test string templates via the existing CaseMeta variable expansion.
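The detection step boils down to pulling one attribute out of a serialized packet. A minimal C++ sketch of that extraction, assuming the attribute is written as `x:xmptk="..."` with double quotes; `extractXmpToolkit` is a hypothetical helper, and the test suite performs this step in its own harness code rather than with this function.

```cpp
#include <string>

// Returns the value of the x:xmptk attribute in a serialized XMP packet,
// or an empty string if it is absent. Plain string search, no XML parsing.
std::string extractXmpToolkit(const std::string& packet) {
    const std::string key = "x:xmptk=\"";
    auto start = packet.find(key);
    if (start == std::string::npos) return "";
    start += key.size();
    auto end = packet.find('"', start);
    if (end == std::string::npos) return "";
    return packet.substr(start, end - start);
}
```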
Replace all hardcoded "XMP Core 4.4.0-Exiv2" strings with the
$xmp_toolkit_version variable detected at runtime (see previous commit).

Affects 11 test cases across 6 files:
  - test_pr1475_HEIC (1 test): x:xmptk in XMP packet output
  - test_pr1475_HIF  (2 tests): same
  - test_pr_2000     (4 text tests): same
  - test_issue_1112: XMP sidecar output
  - test_issue_1229: XMP sidecar output (refactored local variable
    to class attribute so CaseMeta expansion can reach it)
  - test_issue_799:  XMP packet output

For test_pr_2000.TestVerboseExtractRawMetadataToStdout (binary test):
instead of comparing against a static reference file that embeds the
version string, generate the reference at test time by extracting to
a temp file, then verify that stdout extraction produces identical
bytes.  This makes the test fully version-independent.

XMP-Toolkit-SDK 6.0.0 resolves property aliases differently than
the old in-source 4.4.0 build:

  tiff:Artist         → dc:creator          (was kept as separate property)
  tiff:Copyright      → dc:rights           (was kept as separate property)
  tiff:ImageDescription → dc:description    (was kept as separate property)
  tiff:Software       → xmp:CreatorTool     (new alias direction)
  exif:DateTimeDigitized → xmp:CreateDate   (no longer emitted separately)

When the canonical form (dc:*, xmp:*) is already present in the file,
the aliased tiff:/exif: form is no longer duplicated in the output.
The underlying metadata is unchanged — only the property key shown
by "exiv2 -px" / "exiv2 -Pkycvt" differs.
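The table can be read as a many-to-one mapping, which is why an alias and its canonical property collapse into a single output key. A sketch using only the five pairs listed above; `kAliases`, `resolveAlias` and `visibleKeys` are hypothetical, the SDK keeps its own alias registry, and the `[?xml:lang="x-default"]` item selector on dc:description is omitted for brevity.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Alias -> canonical property, copied from the table above.
const std::map<std::string, std::string> kAliases = {
    {"tiff:Artist", "dc:creator"},
    {"tiff:Copyright", "dc:rights"},
    {"tiff:ImageDescription", "dc:description"},
    {"tiff:Software", "xmp:CreatorTool"},
    {"exif:DateTimeDigitized", "xmp:CreateDate"},
};

// Non-aliased keys are returned unchanged.
std::string resolveAlias(const std::string& key) {
    auto it = kAliases.find(key);
    return it == kAliases.end() ? key : it->second;
}

// Keys that remain after resolution: an alias and its canonical form
// end up as one entry, so the aliased form is no longer duplicated.
std::set<std::string> visibleKeys(const std::vector<std::string>& keys) {
    std::set<std::string> out;
    for (const auto& k : keys) out.insert(resolveAlias(k));
    return out;
}
```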

Bugfix tests updated:
  - test_issue_540: removed 4 tiff alias properties, added CreatorTool
  - test_issue_937: removed Xmp.exif.DateTimeDigitized from -g Date output

Regression reference files regenerated (same root cause):
  - ReaganLargePng.png.out, ReaganSmallPng.png.out (DateTimeDigitized)
  - exiv2-bug1225.exv.out (DateTimeDigitized + Artist/CreateDate)
  - exiv2-bug540.jpg.out (Artist/Copyright/ImageDescription/Software)
  - exiv2-bug884c.jpg.out (DateTimeDigitized)
  - exiv2-bug937.jpg.out (DateTimeDigitized)
  - issue_ghsa_5p8g_9xf3_gfrr_poc.{exv,webp}.out (Software→CreatorTool)

XMP-Toolkit-SDK 6.0.0 registers the IPTC Extensions namespace
(http://iptc.org/std/Iptc4xmpExt/2008-02-29/) with prefix
"Iptc4xmpExt" instead of "iptcExt" used by the old in-source build.

This only affects the XML prefix in serialized XMP packets (the -pX
output).  The exiv2 internal property names (Xmp.iptcExt.*) shown by
-pa are unchanged since those come from exiv2's own namespace registry.

Also switches x:xmptk to $xmp_toolkit_version (same as previous commit).

XMP-Toolkit-SDK 6.0.0 does not support SXMPMeta::DeleteNamespace, so
re-registering a prefix with a different URI cannot reuse the original
prefix.  Instead, the SDK assigns a new prefix (e.g. "imageapp_1_").

Old behavior (SDK 4.4.0, DeleteNamespace worked):
  1. reg imageapp=orig/  → prefix "imageapp" maps to orig/
  2. reg imageapp=dest/  → deletes orig/, re-registers as dest/
  3. serialize           → still wrote orig/ (property bound to orig/)
  4. reload              → reversed dest/→orig/ in internal registry

New behavior (SDK 6.0.0, DeleteNamespace unimplemented):
  1. reg imageapp=orig/  → prefix "imageapp" maps to orig/
  2. reg imageapp=dest/  → SDK assigns "imageapp_1_" for dest/
  3. serialize           → writes imageapp_1_=dest/
  4. reload              → no conflict (fresh process), no warning

Changes:
  - xmp_packets[1]: xmlns:imageapp_1_="dest/" (was imageapp="orig/")
  - stderr[3]: empty (was "Updating namespace URI" warning)
  - Also switches x:xmptk to $xmp_toolkit_version
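The new behavior can be modeled as a prefix table that never deletes entries. This sketch assumes the SDK derives a fresh prefix by appending `_<n>_` to the suggested one; only the `imageapp_1_` case is confirmed by the test output above, so the numbering for n > 1 is a guess, and `PrefixTable` is a hypothetical name.

```cpp
#include <map>
#include <string>

// Hypothetical URI <-> prefix table without any delete operation.
class PrefixTable {
public:
    // Returns the prefix actually bound to `uri`. If the suggested prefix is
    // already taken by another URI, try "<prefix>_1_", "<prefix>_2_", ...
    std::string registerNs(const std::string& uri, const std::string& suggested) {
        if (auto it = uriToPrefix_.find(uri); it != uriToPrefix_.end()) return it->second;
        std::string prefix = suggested;
        for (int n = 1; prefixToUri_.count(prefix) != 0; ++n)
            prefix = suggested + "_" + std::to_string(n) + "_";
        uriToPrefix_[uri] = prefix;
        prefixToUri_[prefix] = uri;
        return prefix;
    }

private:
    std::map<std::string, std::string> uriToPrefix_;
    std::map<std::string, std::string> prefixToUri_;
};
```

Steps 1 and 2 of the new behavior correspond to the first two calls in a usage like `registerNs("orig/", "imageapp")` followed by `registerNs("dest/", "imageapp")`.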
XMP-Toolkit-SDK 6.0.0 handles edge-case XMP differently:

Stricter — issue_1901 poc4.xmp:
  The file has malformed namespace URIs (xmlns:x=" ").  The old SDK
  was lenient enough to parse them; the new SDK rejects the XMP
  entirely, so exiv2 now reports "No Exif data found" (exit 253)
  instead of silently succeeding (exit 0).

Stricter — issue_94_poc3.pgf:
  Fuzz-PoC with garbled XMP.  Old SDK produced 9 lines of mangled
  Xmp.rdf.* properties; new SDK rejects the malformed RDF and
  produces only the 2 non-XMP Exif lines.  Arguably more correct.

More lenient — ReaganLargeJpg.jpg:
  Old SDK threw error 203 "Duplicate property or field node" and
  produced no XMP output (62 lines total).  New SDK parses the
  duplicate properties successfully, exposing 72 additional XMP
  properties (134 lines total).

XMP-Toolkit-SDK registers standard aliases like tiff:ImageDescription →
dc:description[?xml:lang="x-default"] that resolve to a specific array
item rather than the array itself.  When XmpParser::encode() tried to
call CountArrayItems or AppendArrayItem on such aliases, the SDK's
internal path resolution found a simple value node (the x-default item)
instead of an array node, causing "XMP Toolkit error 102: The named
property is not an array" and crashing programs like xmpsample.

Fix by detecting this situation with a try-catch around CountArrayItems.
When the alias-to-array-item case is detected, fall back to SetProperty
which correctly handles alias resolution for the x-default value.  Other
language variants are not representable through a single-value alias and
belong on the canonical property (e.g. dc:description).
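The detect-and-fall-back pattern can be sketched against a stand-in object. `FakeMeta` and `encodeXDefault` are hypothetical; they only mimic the call names mentioned above (CountArrayItems, AppendArrayItem, SetProperty) and are not the SDK's `SXMPMeta` signatures, and a generic `std::runtime_error` stands in for the SDK's own exception type.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the SDK object. Paths in itemAliases resolve to a
// single array item, so array calls on them throw, similar to
// "XMP Toolkit error 102: The named property is not an array".
struct FakeMeta {
    std::set<std::string> itemAliases;
    std::map<std::string, std::vector<std::string>> arrays;
    std::map<std::string, std::string> simple;

    int countArrayItems(const std::string& path) const {
        if (itemAliases.count(path) != 0)
            throw std::runtime_error("The named property is not an array");
        auto it = arrays.find(path);
        return it == arrays.end() ? 0 : static_cast<int>(it->second.size());
    }
    void appendArrayItem(const std::string& path, const std::string& value) {
        arrays[path].push_back(value);
    }
    void setProperty(const std::string& path, const std::string& value) {
        simple[path] = value;  // the real SDK would route this through the alias
    }
};

// Writes the x-default value of a LangAlt property. Returns true when the
// alias fallback was taken; other language variants are not written then.
bool encodeXDefault(FakeMeta& meta, const std::string& path, const std::string& value) {
    try {
        meta.countArrayItems(path);  // throws for alias-to-array-item paths
    } catch (const std::runtime_error&) {
        meta.setProperty(path, value);
        return true;
    }
    meta.appendArrayItem(path, value);
    return false;
}
```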
The new XMP-Toolkit-SDK resolves tiff:Software as an alias for
xmp:CreatorTool.  When Exif.Image.Software is converted to XMP, it now
appears as Xmp.xmp.CreatorTool instead of Xmp.tiff.Software.  The
reverse conversion (XMP → Exif) no longer produces Exif.Image.Software
because CreatorTool is not mapped back to the tiff namespace.

Affects testcases 10 and 11 in the conversions bash test.

The XMP toolkit version string changed from "XMP Core 4.4.0-Exiv2" (20
chars) to "XMP Core 6.0.0" (14 chars), making every serialized XMP
packet 6 bytes smaller.  This shifts JPEG APP1 segment sizes and offsets
in the stdin test, and RIFF container / XMP chunk sizes in the webp
test.  No semantic change to the metadata content.
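The size change can be checked at compile time. A small sketch, assuming the toolkit string appears exactly once per packet and each APP1 segment carries one packet; `newApp1Length` is a hypothetical helper.

```cpp
#include <cstddef>
#include <string_view>

constexpr std::string_view kOldToolkit = "XMP Core 4.4.0-Exiv2";
constexpr std::string_view kNewToolkit = "XMP Core 6.0.0";

static_assert(kOldToolkit.size() == 20, "old version string is 20 chars");
static_assert(kNewToolkit.size() == 14, "new version string is 14 chars");

// Bytes saved per serialized packet.
constexpr std::size_t kDelta = kOldToolkit.size() - kNewToolkit.size();
static_assert(kDelta == 6, "each packet shrinks by 6 bytes");

// A JPEG APP1 length field counts its own 2 bytes plus the payload,
// so it drops by the same 6 bytes as the packet.
constexpr std::size_t newApp1Length(std::size_t oldLength) { return oldLength - kDelta; }
```

Because 6 is even, RIFF's pad-to-even rule neither adds nor removes a padding byte, so the webp XMP chunk size and the enclosing RIFF size both shrink by exactly 6.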
Three types of changes in the xmpparser bash test reference:

1. Version string: "XMP Core 4.4.0-Exiv2" → "XMP Core 6.0.0" in all
   serialized XMP packets (4 occurrences).

2. Namespace prefix modernization: the new SDK re-serializes the
   BlueSquare.xmp file using modern prefixes (xap → xmp, xapMM →
   xmpMM, xapRights → xmpRights), which changes the diff output
   between the original and round-tripped file.

3. xmpsample alias resolution: tiff:ImageDescription is now resolved
   through its alias to dc:description[?xml:lang="x-default"].  The
   tiff:ImageDescription property no longer appears as a separate
   element in the serialized XMP, and dc:description's x-default value
   reflects the alias write.

Fresh clones without --recurse-submodules would fail with cryptic
source-file-not-found errors.  Add an explicit check for the SDK's
presence in both CMake and Meson, with a clear error message telling
the user to run git submodule update --init --recursive.
serghov force-pushed the xmp-toolkit-v2025.03 branch from b6a052d to db4d811 February 19, 2026 10:09