I have more or less finished implementing parsing and basic evaluation for SPARQL 1.1 property paths in Sesame. The only thing still missing is negated property sets. Negated sets of properties… Who the heck wants that? Honestly.
Anyway. Apart from the fact that the property path syntax is not quite simple to parse, there are some concepts involved which make evaluation of property-paths tricky – especially if arbitrary-length paths are involved.
Arbitrary-length paths are paths specified with a ‘+’ or ‘*’ modifier. For example, the path
?x foaf:knows+/foaf:name ?name specifies a match where either
?x knows someone with name
?name, or knows someone who knows someone with name
?name, or … you get the point. The trouble with such queries in ‘traditional’ evaluation is that there is no a-priori length: you can not simply use the SPARQL parser to build a set of joins.
A path of fixed length
n will usually be translated by the parser to
n-1 joins on individual statement patterns. In order to allow evaluation of non-fixed length paths, we need something that amounts to graph traversal. Fortunately, Sesame’s default evaluation strategy (essentially lazy iteration over bindings) is well suited for this task. What I have done is the following:
- the SPARQL parser translates an arbitrary-length property path to a new algebra operator, called (rather appropriately I thought)
- the default
EvaluationStrategyfor such an operator is based on a dynamically expanding iterator, which creates new joins while being evaluted.
Effectively, this new iterator (called
PathIteration, currently located as an inner class in the
EvaluationStrategyImpl) implements a depth-first graph traversal strategy. It reports back results iteratively and expands the path sequence length (using an ever-increasing iterative creation of nested joins) until it reaches a length at which the nested join no longer produces any matches. Cycle-detection is implemented by the simple expedient of adding an boolean comparison operator (NEQ) on top of our join: if the start node of the path has the same value as the end node, we have a cycle.