epub: free libxml2 strings with xmlFree instead of g_free#720
Open
maxheise wants to merge 1 commit into
Open
Conversation
backend/epub/epub-document.c reads many strings out of the parsed container.xml and OPF package document through the helper xml_get_data_from_node(), which returns an xmlChar* allocated by libxml2 (it calls xmlGetProp or xmlNodeListGetString internally). A string obtained from libxml2 must be released with xmlFree, because libxml2 allocates it through its own allocator (xmlMalloc). The code released most of these strings with GLib's g_free, and in a few places stored a libxml2 string into a struct field that is later freed with g_free, or never freed it at all. Mixing the two allocators is wrong: it only happens to work while libxml2 and GLib share the system malloc, and breaks if libxml2 is built with a custom allocator (xmlMemSetup) or where the two libraries do not share one heap. A few sites also leaked the libxml2 string outright. This change makes every libxml2 string in the file use the matching allocator, using one of three shapes depending on who owns the string: 1. Where the code owns the free site and the freed pointer is the libxml2 string itself, switch g_free to xmlFree. This covers the relativepath in get_uri_to_content() and setup_document_content_list() (the OPF full-path and item href), the newnode->key error paths and free_tree_nodes() (the spine idref), free_link_nodes() linktext, the href in get_child_list(), the class attribute in change_to_night_sheet(), the tocfilename in setup_document_index() (returned by get_toc_file_name()/epub_document_get_nav_file()), the stylesheet name in epub_document_check_add_night_sheet() (returned by epub_document_get_alternate_stylesheet()), and mathml plus href in epub_document_add_mathJax(). The companion g_free calls in those functions that release g_filename_to_uri / g_strdup / g_strdup_printf results (for example dataptr->value, dataptr->pagelink, the filepath and filename locals) are GLib strings and are left as g_free. 2. Where a libxml2 string was copied and then leaked, free it. The "toc" attribute ncx in get_toc_file_name() is used only to look up the manifest item and is now xmlFree'd after that lookup. The "version" attribute in epub_document_get_info() is copied by g_string_new() and is now captured in a local, passed to g_string_new(), then xmlFree'd. 3. Where a libxml2 string is stored into a struct whose destructor frees it with g_free and cannot be changed, store a GLib copy and free the libxml2 original. This applies to the EvDocumentInfo fields title, author, subject and creator (ev_document_info_free() frees them with g_free), and to EpubDocument.docTitle (freed with g_free in the finalize handler), which is assigned both in setup_document_index() and through epub_document_set_document_title(). Each of these now does epubinfo->field = g_strdup((const char*)s) followed by xmlFree(s). The NULL / "unknown" fallback branches were already g_strdup'd and are unchanged. docTitle keeps its existing g_free at finalize, which is now correct because it always holds a g_strdup'd copy. g_strdup(NULL) returns NULL and xmlFree(NULL) is a no-op, so the empty-metadata cases behave exactly as before; the docTitle while-loop in setup_document_index() also keeps its original termination behaviour. For valid documents the visible result is unchanged; only the allocator used to free each string is corrected, and the previously leaked toc and version strings are now released.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hello,
backend/epub/epub-document.c reads many strings out of the parsed
container.xml and OPF document through the helper xml_get_data_from_node(),
which returns an xmlChar* allocated by libxml2 (internally xmlGetProp or
xmlNodeListGetString). A libxml2 string should be released with xmlFree,
since libxml2 allocates it through its own allocator. Most of these
results were released with GLib's g_free instead; a few were stored into
a struct that frees them with g_free, and two were leaked. Mixing the
allocators happens to work while libxml2 and GLib share the system
malloc, but it is not guaranteed (it breaks with a custom libxml2
allocator, or where the two libraries do not share one heap). The file
already xmlFree's some of these results in a couple of places, so the
handling is currently inconsistent.
The change makes every libxml2 string use the matching allocator, in one
of three shapes depending on who owns it:
string, g_free becomes xmlFree (relativepath in get_uri_to_content and
setup_document_content_list, the newnode->key error paths and
free_tree_nodes, linktext in free_link_nodes, the href in
get_child_list, class in change_to_night_sheet, tocfilename in
setup_document_index, the stylesheet name in
epub_document_check_add_night_sheet, and mathml/href in
epub_document_add_mathJax). The neighbouring g_free calls that release
g_filename_to_uri / g_strdup results are left as-is.
the toc attribute in get_toc_file_name, and the version attribute in
epub_document_get_info (copied by g_string_new).
g_free and cannot be changed, it is copied with g_strdup and the
original xmlFree'd: the EvDocumentInfo title/author/subject/creator
fields (freed by ev_document_info_free with g_free) and docTitle
(freed with g_free at finalize). The NULL / "unknown" fallback
branches already used g_strdup and are unchanged.
For valid documents the visible result is unchanged; only the allocator
used for each string is corrected, and the two previously leaked strings
are now released.
Thanks for taking a look.
Best regards
Max