The SPARQL query language is extensible by nature: it allows implementors to define their own custom operators if the standard set of operators is not sufficient for the needs of some application.

Sesame’s SPARQL engine has been designed with this extensibility in mind: it allows you to define your own custom function and use it in your SPARQL queries just like any other function. In this recipe, I’ll show how to create a simple custom function. Specifically, we are going to implement a boolean function that detects whether some string literal is a palindrome.

[toc]

The palindrome function

Suppose we have the following RDF data:
[text light="true"]
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix ex: <http://example.org/> .

ex:a rdfs:label "step on no pets" .
ex:b rdfs:label "go on, try it" .
[/text]
We would like to be able to formulate a SPARQL query that allows us to retrieve all resources that have a palindrome as their label:
[text light="true"]
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cfn: <http://example.org/custom-function/>
SELECT ?x ?label
WHERE {
?x rdfs:label ?label .
FILTER(cfn:palindrome(str(?label)))
}
[/text]
The expected result of this query, given the above data, would be:

x      label
ex:a   "step on no pets"

Unfortunately, the function cfn:palindrome is not a standard SPARQL function, so this query won’t work. We could of course simply retrieve all label values and iterate over them in application code to detect whether they’re palindromes, but if we add a custom function instead, we let the query engine do the filtering for us and avoid pulling every label into our own code.

There are two basic steps required to add a custom function to Sesame:

  1. implementing a Java class for the function;
  2. creating a JAR file containing your function code and a Service Provider Interface (SPI) configuration file.

Implementing the custom function as a Java class

In Sesame’s SPARQL engine, functions are expected to implement the org.openrdf.query.algebra.evaluation.function.Function interface.

[java]
package org.example.customfunction;

import org.openrdf.query.algebra.evaluation.function.Function;

public class PalindromeFunc implements Function { }
[/java]

The Function interface defines two methods: evaluate() and getURI(). The latter of these is a simple method that returns a string representation of the URI of the function:

[java]
// define a constant for the namespace of our custom function
public static final String NAMESPACE = "http://example.org/custom-function/";

/**
 * return the URI 'http://example.org/custom-function/palindrome' as a String
 */
public String getURI() {
    return NAMESPACE + "palindrome";
}
[/java]

The proof of the pudding, of course, is the implementation of the evaluate() method: this is where the function logic lives. In this method we check that the incoming value is, first of all, a valid argument for the function and, second, a palindrome, and we return the result. Putting everything together, we get the following class:

[java]
package org.example.customfunction;

import org.openrdf.model.Literal;
import org.openrdf.model.Value;
import org.openrdf.model.ValueFactory;
import org.openrdf.query.algebra.evaluation.ValueExprEvaluationException;
import org.openrdf.query.algebra.evaluation.function.Function;

/**
 * a custom SPARQL function that determines whether an input literal string is
 * a palindrome.
 */
public class PalindromeFunc implements Function {

    // define a constant for the namespace of our custom function
    public static final String NAMESPACE = "http://example.org/custom-function/";

    /**
     * return the URI 'http://example.org/custom-function/palindrome' as a String
     */
    public String getURI() {
        return NAMESPACE + "palindrome";
    }

    /**
     * Executes the palindrome function.
     *
     * @return A boolean literal representing true if the input argument is a
     *         palindrome, false otherwise.
     *
     * @throws ValueExprEvaluationException
     *         if more than one argument is supplied or if the supplied argument
     *         is not a literal.
     */
    public Value evaluate(ValueFactory valueFactory, Value... args)
        throws ValueExprEvaluationException
    {
        // our palindrome function expects only a single argument, so throw an
        // error if there's more than one
        if (args.length != 1) {
            throw new ValueExprEvaluationException(
                    "palindrome function requires exactly 1 argument, got "
                    + args.length);
        }

        Value arg = args[0];

        // check that the argument is a literal; if not, throw an error
        if (!(arg instanceof Literal)) {
            throw new ValueExprEvaluationException(
                    "invalid argument (literal expected): " + arg);
        }

        // get the actual string value that we want to check for palindrome-ness
        String label = ((Literal)arg).getLabel();

        // we invert our string
        String inverted = "";
        for (int i = label.length() - 1; i >= 0; i--) {
            inverted += label.charAt(i);
        }

        // a string is a palindrome if it is equal to its own inverse
        boolean palindrome = inverted.equalsIgnoreCase(label);

        // a function is always expected to return a Value object, so we return
        // our boolean result as a Literal
        return valueFactory.createLiteral(palindrome);
    }
}
[/java]
As you can see, you are free to implement whatever function logic you like: in the above example we created a simple function that only returns true or false, but since the actual return type of a Sesame function is Value, you can just as well implement functions that return string literals, numbers, or URIs.

There is one important caveat: the evaluate() method is invoked for every single solution in the query result, so you should make sure that the implementation is not overly complex or memory-intensive. For example, you should typically not implement a function that keeps a list of all solutions evaluated so far, at least not if you want to avoid major scalability issues.
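
Because evaluate() runs once per solution, it also pays to keep the string handling itself cheap. As a minimal sketch (the helper class and method names here are my own, not part of the Sesame API), the character-by-character concatenation in the loop above can be replaced with StringBuilder.reverse(), which avoids allocating a new String on every iteration:

```java
public class PalindromeCheck {

    static boolean isPalindrome(String s) {
        // reverse the input and compare case-insensitively,
        // exactly as evaluate() does with its hand-rolled loop
        return new StringBuilder(s).reverse().toString().equalsIgnoreCase(s);
    }

    public static void main(String[] args) {
        System.out.println(isPalindrome("step on no pets")); // true
        System.out.println(isPalindrome("go on, try it"));   // false
    }
}
```

For short labels the difference is negligible, but on large literals the quadratic concatenation shows up quickly.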

Once we have created the Java class for our function, we need some way to add it to Sesame. This is where the Service Provider Interface (SPI) comes into play.

Creating an SPI configuration

Sesame’s set of functions is dynamically determined through the use of a javax.imageio.spi.ServiceRegistry class. Specifically, Sesame has a class called org.openrdf.query.algebra.evaluation.function.FunctionRegistry which keeps track of all implementations of the Function interface. Java’s SPI mechanism depends on the presence of configuration files in the JAR files that contain service implementations. This configuration file is expected to be present in the directory META-INF/services in your JAR file.

In the case of Sesame’s SPARQL function registry, the name of this configuration file should be org.openrdf.query.algebra.evaluation.function.Function (in other words, the file name is equal to the fully-qualified name of the service interface we are providing an implementation for). The contents are really quite simple: an SPI configuration is a plain text file containing the fully-qualified name of each Java class that provides an SPI implementation, one per line. So in our case, the contents of the file would be:
[text]
org.example.customfunction.PalindromeFunc
[/text]
Apart from this configuration file, your JAR file should of course also contain the actual compiled class. All of this is fairly easy to do, for example from your Eclipse project:

  1. create a directory META-INF and a subdirectory META-INF/services within the src directory of your project (or, if you happen to be slightly more organized and actually use Maven’s prescribed project directory structure, within src/main/resources);
  2. add a text file named org.openrdf.query.algebra.evaluation.function.Function to this new directory, containing a single line with the fully-qualified name of your custom function class (in our example, that’s org.example.customfunction.PalindromeFunc);
  3. use Eclipse’s export function (or alternatively Maven’s package command) to create a JAR file (select the project, then click ‘File’ -> ‘Export’ -> ‘JAR file’). Make sure the JAR file produced contains both your compiled code and the service registry configuration file.
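
The same SPI file can also be created from the command line. A sketch, assuming the Maven directory layout mentioned above (the project paths are illustrative):

```shell
# create the SPI configuration directory in the Maven resources tree
mkdir -p src/main/resources/META-INF/services

# the file name must be the fully-qualified name of the service interface;
# its single line is the fully-qualified name of our implementation class
echo "org.example.customfunction.PalindromeFunc" \
  > src/main/resources/META-INF/services/org.openrdf.query.algebra.evaluation.function.Function
```

Running Maven’s package command afterwards will then copy the file into META-INF/services inside the resulting JAR.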

Once you have a proper JAR file, you need to add it to the runtime classpath of your Sesame deployment and, if necessary, restart Sesame. After that, you’re done: Sesame should automatically pick up your new custom function, and from now on you can use it in your SPARQL queries.
