I just realized there is a critical problem with the piece of code below.
When I commented on it in #163, @tijmenbaarda stated that this function was necessary because Blazegraph did not support blank nodes, so we left it in.
At first sight, this function just replaces real blank nodes with an emulation of blank nodes. However, it actually introduces a risk of collision between unrelated nodes. Real blank nodes are intrinsically distinct; two blank nodes can only be the same node if they occur within the same representation. The emulation, on the other hand, produces URIs consisting of a small random number embedded in a fixed string. It is only a matter of time before the same random number is reused. The duplicates will be generated on different dates, possibly by different WSGI workers, and most likely in the process of serializing different queries. Once that happens, two unrelated nodes share a single URI and the store treats them as one resource. This will compromise data integrity.
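To put a rough number on "only a matter of time", here is a minimal sketch of the birthday-problem estimate. The function name `collision_probability` and the space of one million possible random suffixes are my own assumptions for illustration; the actual range used by `replace_blank_node` may differ, and it also assumes the random numbers are drawn uniformly.

```python
import math


def collision_probability(draws: int, space: int) -> float:
    """Probability that at least two of `draws` uniform random picks
    from `space` possible values coincide (the birthday problem)."""
    if draws > space:
        return 1.0  # pigeonhole: a repeat is guaranteed
    # Sum the logs of the per-draw "still distinct" probabilities
    # (1 - i/space) to avoid underflow for large numbers of draws.
    log_all_distinct = sum(math.log1p(-i / space) for i in range(draws))
    return 1.0 - math.exp(log_all_distinct)


# Hypothetical figures: one million possible suffixes (an assumption,
# not the real range used by replace_blank_node).
for n in (100, 1_000, 10_000):
    print(n, round(collision_probability(n, 1_000_000), 4))
```

The point is that the number of emulated nodes needed for a likely collision grows only with the square root of the size of the random space, so even a seemingly large range is exhausted quickly by a long-running service.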
As for the argument that Blazegraph does not support blank nodes: this should not be true. Blank nodes are too essential to leave them out of any serious RDF implementation. In fact, the Blazegraph wiki mentions support for blank nodes and even an alternative mode of supporting them.
@tijmenbaarda, could you retrace the origin of your belief that Blazegraph does not support blank nodes? I'm sure there is a real problem that needs to be addressed, but we must find an approach that does not involve replace_blank_node or anything similar.