Skip to content

Conversation

@johnathan-becker
Copy link
Contributor

This change updates our libxml2 integration to correctly handle node adoption and manual linking when building against libxml2 2.12.0 or newer. In 2.12.x, libxml2 rewrote significant portions of its tree-manipulation code, including adding internal normalization during both xmlSetTreeDoc and manual node linking. Previously, manually linking child nodes avoided automatic text‑node merging, but in 2.12.x the library now performs merging internally even when bypassing xmlSetTreeDoc. To prevent unwanted normalization, this update adds a manual doc‑update routine for subtree nodes that sets the doc pointer without triggering merges, ensures every text node receives a unique name pointer so libxml2 will not merge them based on pointer equality, updates attributes and namespace nodes safely, and applies these updates before assigning the parent to avoid libxml2 normalization during insertion. For older libxml2 versions we continue to use xmlDOMWrapAdoptNode or xmlSetTreeDoc as before. This fixes text‑node coalescing introduced by libxml2’s internal changes and restores the expected behavior for manually linked XML nodes.

@johnathan-becker johnathan-becker requested a review from rfm as a code owner January 15, 2026 18:55
@johnathan-becker
Copy link
Contributor Author

I will be adding tests for this ticket in the next few hours sorry.

@johnathan-becker
Copy link
Contributor Author

tests have been pushed

@rfm
Copy link
Contributor

rfm commented Jan 17, 2026

I'm not familiar with libxml2, but the explanation of the change here sounds reasonable, so this looks good in principle.
Unfortunately testcases are failing here on 4 of the 7 test platforms, so that needs to be addressed.

@johnathan-becker
Copy link
Contributor Author

I'm not familiar with libxml2, but the explanation of the change here sounds reasonable, so this looks good in principle. Unfortunately testcases are failing here on 4 of the 7 test platforms, so that needs to be addressed.

Ah ok I will work on this thank you.

@johnathan-becker
Copy link
Contributor Author

Ok @rfm I have tried this on different linux platforms and I believe this fixes the issue. I can't seem to be able to run the CI so can you pleas run it for me?

@rfm
Copy link
Contributor

rfm commented Jan 20, 2026

Looks like it's fine on linux but failing on windows :-(

@johnathan-becker
Copy link
Contributor Author

Ah ok. Thank you so much for trying I need to setup a Windows environment to mirror your CI environment. I will start on that now. I really appreciate you being patient there is actually several bugs in newer libxml2 versions so I guess I'm not suprised.

@rfm
Copy link
Contributor

rfm commented Jan 20, 2026

Thanks for working on this. It's good to see someone knows what they are doing with libxml2 :-)

@johnathan-becker
Copy link
Contributor Author

Thanks for working on this. It's good to see someone knows what they are doing with libxml2 :-)

Of course I don't mind! I believe I have fixed this now I tested all Windows platforms except MSVC because I don't have any of that setup. I will verify all the linux builds tomorrow morning before we run the checks again. Turns out Windows is on an even later version of Libxml2 and it has problems with namespace prefixes. Kind of amazing how many bugs different versions of libxml have.

@rfm
Copy link
Contributor

rfm commented Jan 21, 2026

Looking at the logs, I think what's still wrong with the testcases still is the use of the non-portable exception handler code '@Try {...} @catch {...}'
The correct (portable) way to write an exception handler is
NS_DURING ... NS_HANDLER ... NS_ENDHANDLER
where the macros expand to exception handler code appropriate to the platform on which the code is built.
So this should be easy to fix.

@johnathan-becker
Copy link
Contributor Author

Ah ok I have made that change. I did not realize that was the difference between these handlers. Thank you for that.

@rfm
Copy link
Contributor

rfm commented Jan 21, 2026

FYI NeXTstep exception handling used those macros to wrap setjmp() calls for exception handling. Newer compilers added support for exception handling as an optional extension to the language itself. But if you are using an older compiler, or a compiler where the appropriate flag to turn on the language extensions has not been used, then you need the older mechanism.
The error logs looked to me like the second of those issues. Possibly there is some issue with the configuration of gnustep-make so it's not using the correct compiler flags, but even if that is the case it would still make sense to use the more portable macros so that the tests should run on as many platforms as possible.

@johnathan-becker
Copy link
Contributor Author

FYI NeXTstep exception handling used those macros to wrap setjmp() calls for exception handling. Newer compilers added support for exception handling as an optional extension to the language itself. But if you are using an older compiler, or a compiler where the appropriate flag to turn on the language extensions has not been used, then you need the older mechanism. The error logs looked to me like the second of those issues. Possibly there is some issue with the configuration of gnustep-make so it's not using the correct compiler flags, but even if that is the case it would still make sense to use the more portable macros so that the tests should run on as many platforms as possible.

Ah ok thank you for clarifying that it makes sense. I believe all the checks have passed now thank you so much @rfm.

Copy link
Contributor

@rfm rfm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good now, thanks.

@rfm rfm merged commit 348b85c into gnustep:master Jan 22, 2026
8 checks passed
@rfm
Copy link
Contributor

rfm commented Jan 22, 2026

One note ... the testcases show that the code contains some memory leaks; bugs in libxml2 I expect.
This is on debian with the package libxml2-dev/stable,now 2.12.7+dfsg+really2.9.14-2.1+deb13u2 amd64

If you define the environment variable GNUSTEP_WITH_ASAN=1 before building the base library and running the tests, you will see that description.m and xpath.m fail (due to leaked memory), and the log file reports that the memory leaked was allocated by xmlStrndup and xmlStrdup

I'm not sure how important it is, and tracking down the leaks would likely be very time consuming (perhaps compiling libxml2 with address sanitizer would help), but I thought you would want to know.

@johnathan-becker
Copy link
Contributor Author

One note ... the testcases show that the code contains some memory leaks; bugs in libxml2 I expect. This is on debian with the package libxml2-dev/stable,now 2.12.7+dfsg+really2.9.14-2.1+deb13u2 amd64

If you define the environment variable GNUSTEP_WITH_ASAN=1 before building the base library and running the tests, you will see that description.m and xpath.m fail (due to leaked memory), and the log file reports that the memory leaked was allocated by xmlStrndup and xmlStrdup

I'm not sure how important it is, and tracking down the leaks would likely be very time consuming (perhaps compiling libxml2 with address sanitizer would help), but I thought you would want to know.

Ok @rfm I am probably going to look into this still but it may be tomorrow. Thank you for merging for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants