Find solution to Virtuoso SPARQL troubles #30

@amoeba

Years ago, when we set up d1lod, we decided to insert RDF data in a triplestore-agnostic way so the triplestore could be swapped out without too much work. That's why we settled on inserting data via SPARQL INSERT statements.

While revisiting d1lod and repurposing it for Slinky, I've run into two related issues with this approach:

  1. Virtuoso has some sort of arbitrary, undocumented size limit on SPARQL statements. Its SPARQL engine just pukes once the query string gets over a certain length. I don't think we ran into this during the GeoLink work, and I only noticed it because one particular dataset got turned into a too-long SPARQL INSERT query.
  2. If I split the query up and insert it in batches, we run into another problem: blank nodes. If a query references a bnode as an object but the definition of that bnode (the triples where it's the subject) ends up in the next query, Virtuoso complains. AFAICT this is a Virtuoso Open Source bug and may not apply to other triplestores.
    • This makes sense because I don't think bnodes really work across multiple queries. I considered making each bnode a proper HTTP IRI (i.e., skolemizing) but wanted to avoid that because I want our output to still match science-on-schema.
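One possible workaround for the second issue, short of skolemizing, is to batch triples so that every triple mentioning a given bnode lands in the same query. The following is a minimal sketch of that idea, not d1lod's actual code: the function names, the tuple-of-strings triple representation, the `_:` prefix check, and the `max_chars` budget are all assumptions made for illustration. Note that a bnode group larger than the budget is still emitted as one oversized batch, so this alone would not help if a single group exceeds Virtuoso's limit.

```python
from collections import defaultdict


def is_bnode(term):
    # Assumption: terms are N-Triples-style strings and bnodes start with "_:".
    return term.startswith("_:")


def group_by_bnode(triples):
    """Split triples into bnode-connected groups plus bnode-free triples."""
    parent = {}

    def find(x):
        # Union-find lookup: walk up to the representative of x's group.
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    # Triples linking two bnodes merge their groups.
    for s, _, o in triples:
        if is_bnode(s) and is_bnode(o):
            parent[find(s)] = find(o)

    groups = defaultdict(list)
    loose = []
    for triple in triples:
        s, _, o = triple
        bnode = s if is_bnode(s) else (o if is_bnode(o) else None)
        if bnode is None:
            loose.append(triple)
        else:
            groups[find(bnode)].append(triple)
    return list(groups.values()), loose


def serialized_len(triples):
    # Rough length of the triples once written into a query body.
    return sum(len(" ".join(t)) + 3 for t in triples)


def batch_triples(triples, max_chars):
    """Pack triples into batches without splitting a bnode group."""
    groups, loose = group_by_bnode(triples)
    units = groups + [[t] for t in loose]
    batches, current = [], []
    for unit in units:
        too_big = serialized_len(current) + serialized_len(unit) > max_chars
        if current and too_big:
            batches.append(current)
            current = []
        current.extend(unit)
    if current:
        batches.append(current)
    return batches
```

Each batch could then be wrapped in its own `INSERT DATA { ... }` statement. Since bnode labels are scoped to a single request, keeping a group together means no query refers to a bnode defined elsewhere.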

SPARQL may just not be the right tool for this workload. I considered the alternative RDF data loading methods Virtuoso provides, but it looks like all they have is a system that loads data from a local filesystem via I-SQL commands.

I had been meaning to look at Blazegraph for a few years and I see that it has a nice HTTP bulk data loading REST API where you can just send serialized RDF to an endpoint. We aren't using any special functionality from Virtuoso so this might be a good point to switch.
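For comparison, sending serialized RDF over HTTP might look roughly like the sketch below. It assumes Blazegraph accepts a POST whose body is the RDF document and whose `Content-Type` names the serialization; the endpoint URL, the `kb` namespace, and the helper name `build_load_request` are placeholders for illustration and should be checked against the Blazegraph REST API documentation and the actual deployment.

```python
import urllib.request

# Assumption: placeholder endpoint; adjust host, port, and namespace.
ENDPOINT = "http://localhost:9999/blazegraph/namespace/kb/sparql"


def build_load_request(turtle, endpoint=ENDPOINT):
    """Build (but do not send) a POST request carrying a Turtle document."""
    return urllib.request.Request(
        endpoint,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


def load_turtle(turtle, endpoint=ENDPOINT):
    """Send the document and return the HTTP status code."""
    with urllib.request.urlopen(build_load_request(turtle, endpoint)) as resp:
        return resp.status
```

Because the whole document travels in one request body, bnodes and the triples that describe them stay together, and there is no query string to outgrow.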

Feedback or thoughts welcomed. I'll update here with what I figure out.
