-
Notifications
You must be signed in to change notification settings - Fork 262
Description
There are currently three kinds of strings interfacing llvm_sys: CStr, str and LLVMString. On the LLVM C++ side, the first two corresponds to StringRef, and the last one to strdup (which is always null terminated and hence we will ignore it here). From its documentation,
StringRef - Represent a constant reference to a string, i.e. a character array and a length, which need not be null terminated.
The issue is there seems to be a lack of consistency in inkwell (and more so in LLVM, with its sprawling amount of API over the years) regarding nul termination. For example, in ContextImpl::create_string_attribute the key and val parameters are taken as &str (as is) and fed into the API, while for Attribute::get_string_value it is assumed the returned one is a c-style string, and hence from_ptr, which would result in an UB since there is no guarantee there would be a following nul byte, and even if there is, it is not in the same memory allocation!
Another example is Module::add_function which takes a &str for name, converted into a &CStr and given to LLVMAddFunction. Then in FunctionValue::get_name, the byte slice with length returned by LLVMGetValueName2 does not include any trailing nul byte, hence cannot create a CStr for the similar reason as before. However, Context::metadata_string and MetadataValue::get_string_value do work as intended with the nul terminator included in the constructed slice.
IMO, we could make the entire API non-nul terminating, i.e. always use &str when interfacing with the C API, and always assume any string gotten back is also valid &str (or else panic), but that is very destructive. A better solution would be to find and test every instance where the C API returns a char pointer (possibly with length), and adjust any getter/setter accordingly.