The SPARQL query language is extensible by nature: it allows implementors to define their own custom operators if the standard set of operators is not sufficient for the needs of some application.

Sesame’s SPARQL engine has been designed with this extensibility in mind: it allows you to define your own custom functions and use them as part of your SPARQL queries, like any other function. In this new recipe, I’ll show how to create a simple custom function. Specifically, we are going to implement a boolean function that detects if some string literal is a palindrome. read more…

From Sesame 2.4.1 is a bugfix release, fixing various reported issues in the new SPARQL 1.1 query functionality and a number of stability/scalability improvements. Some highlights:

  • SPARQL GROUP BY with complex expressions as arguments are now evaluated correctly (SES-774)
    SUM and AVG now silently ignore non-numeric values (SES-788)

  • Performance improvements in processing of aggregate operators (SES-789)
  • Fix for native store data corruption issue (SES-776)
  • Various other fixes and improvements in handling of property paths and aggregate operators

See the release notes for a full overview of all changes.

I have finally found the time to set up my new weblog and website for Rivuli Development. Rivuli Development is all about the use of semantic technologies: RDF, SPARQL, Linked Data, and so on. In other words, it is about what I do as a freelance technical consultant and software developer.

Currently, the content is still a bit sparse, but you can expect regular blog postings here on topics such as software design and implementation for SPARQL 1.1, usage tips and tricks for the Sesame framework, and more general musings on Semantic Web and Linked Data. I will also further extend the site with additional static content, including more general information about what linked data and semantic web are all about, and what advantages these technologies can provide to business.

I am very proud to announce the release of Sesame 2.4.0. This is a major new release featuring support for SPARQL 1.1 Query.

Sesame 2.4.0 implements all features of SPARQL 1.1 Query as outlined in the
October 14 W3C Working Draft, with the exception of basic federated query. SPARQL 1.1 for Sesame is developed by Ontotext in cooperation withAduna.

The list of new SPARQL query features includes:

  • Use of expressions in the SELECT clause
  • Property paths
  • Subqueries
  • Negation: (NOT) EXISTS and MINUS
  • Set membership: (NOT) IN
  • Conditionals: IF
  • Various new builtin functions: COALESCE, BNODE, IRI, isNumeric, strLang, strDt

Apart from this impressive array of new query language features, Sesame 2.4.0 also implements a number of bug fixes and improvements, including scalability and performance improvement in the Native store. For a full overview, see the release notes.

I’ve just finished the implementation of SPARQL 1.1 property paths in Sesame. The major thing still missing was the implementation of negated property sets.

Negated property sets enable you to formulate a query like, for example: “give me back two resources x and y which are related in any direction via some property, but not via foaf:knows“. In SPARQL 1.1, this would look like:

   SELECT ?x ?y
   WHERE { 
           ?x !(foaf:knows|^foaf:knows) ?y .

The current SPARQL 1.1 draft defines a new abstract symbol for supporting negated property sets, called NegatedPropertySet. The SPARQL algebra, in turn, maps this abstract symbol directly to a new algebraic operator, so the algebra is extended with an additional operator in order to support negated property sets, and it also gives a specific evaluation semantics for this new operator.

However, although perhaps useful in terms of brevity, it is in fact not necessary to thus extend the algebra. Negated property sets do not actually introduce additional expressivity to the language (in contrast to, for example, arbitrary-length property paths): the above query could have been formulated in SPARQL 1.0:

   SELECT ?x ?y
   WHERE { 
       { ?x ?p1 ?y . FILTER (?p1 != foaf:knows) }
       { ?y ?p2 ?x . FILTER (?p2 != foaf:knows) }

This simple fact makes it possible to implement negated property sets without having to extend Sesame’s query model with an additional algebra operator. The advantage of this is that all of Sesame’s existing query optimizing/rewriting/evaluation strategies can immediately handle negated property sets, without having to be ‘recalibrated’ or indeed extended to take a complex additional operator into account.

So, the SPARQL parser processes a negated property set and translates it to the necessary collection of Joins, Filter comparisons and Unions. The algorithm is roughly as follows:

 let NPS be a negated property set with elements e_1...e_n.
 let s be the subject variable of the NPS.
 let ap be the (anonymous) predicate variable of the NPS.
 let O be the set of object variables of the NPS. 
 let F and F_i be two sets of filter conditions.
 let p(e) be the predicate IRI of e.

 for each e in NPS :
    create a filter condition f: p(e) != ap .
    if e is inverted: add f to F_i, otherwise add f to F .
 let J be a Join on basic graph patterns. 
 if F is not empty:
    for each o in O :
       add BGP(s, ap, o) to J .

 let I be a Join on basic graph patterns. 
 if F_i is not empty:
    for each o in O :
       add BGP(o, ap, s) to I .
 if I and J are both not empty:
    return Union(Filter(J, F), Filter(I, F_i)) .
 else if I is not empty :
    return Filter(I, F_i).
 else if J is not empty :
    return Filter(J, F).

The end result of applying this algorithm to the example SPARQL 1.1 query we saw above would be the following (slightly adapted for readability) Sesame query algebra expression:

Projection({x, y}, 
      Filter(StatementPattern(y, ap, x), Compare(!=, ap, foaf:knows)),
      Filter(StatementPattern(x, ap, y), Compare(!=, ap, foaf:knows))

Short and sweet. Of course, it gets less short and sweet when using more complex property sets, or property sets in combination with other property path features, but the algorithm caters to that. Such more complex expression just result in a larger set of unions and joins.

I have more or less finished implementing parsing and basic evaluation for SPARQL 1.1 property paths in Sesame. The only thing still missing is negated property sets. Negated sets of properties… Who the heck wants that? Honestly.

Anyway. Apart from the fact that the property path syntax is not quite simple to parse, there are some concepts involved which make evaluation of property-paths tricky – especially if arbitrary-length paths are involved.

Arbitrary-length paths are paths specified with a ‘+’ or ‘*’ modifier. For example, the path ?x foaf:knows+/foaf:name ?name specifies a match where either ?x knows someone with name ?name, or knows someone who knows someone with name ?name, or … you get the point. The trouble with such queries in ‘traditional’ evaluation is that there is no a-priori length: you can not simply use the SPARQL parser to build a set of joins.

A path of fixed length n will usually be translated by the parser to n-1 joins on individual statement patterns. In order to allow evaluation of non-fixed length paths, we need something that amounts to graph traversal. Fortunately, Sesame’s default evaluation strategy (essentially lazy iteration over bindings) is well suited for this task. What I have done is the following:

  1. the SPARQL parser translates an arbitrary-length property path to a new algebra operator, called (rather appropriately I thought) ArbitraryLengthPath.
  2. the default EvaluationStrategy for such an operator is based on a dynamically expanding iterator, which creates new joins while being evaluted.

Effectively, this new iterator (called PathIteration, currently located as an inner class in the EvaluationStrategyImpl) implements a depth-first graph traversal strategy. It reports back results iteratively and expands the path sequence length (using an ever-increasing iterative creation of nested joins) until it reaches a length at which the nested join no longer produces any matches. Cycle-detection is implemented by the simple expedient of adding an boolean comparison operator (NEQ) on top of our join: if the start node of the path has the same value as the end node, we have a cycle.