PyKDE4: Queries with Nepomuk

In one of my previous blog posts I dealt with tagging files and resources with Nepomuk. But Nepomuk is not only about storing metadata, it is also about retrieving and _interrogating_ data. Normally, this would mean querying the metadata database directly, using queries written in SPARQL. But this is not intuitive, can be inefficient (if you do things the wrong way) and is error-prone (oops, I messed up a parameter!).
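To give an idea of what "by hand" means, here is a minimal plain-Python sketch (no PyKDE4 needed) that builds such a SPARQL query as a string. The helper names build_tag_query and escape_sparql_literal are hypothetical, and matching the tag through nao:prefLabel is an assumption: it has not been tested against a live Nepomuk store and may need adjusting for yours.

```python
# Namespace of the NAO ontology, which defines hasTag and prefLabel.
NAO_PREFIX = "http://www.semanticdesktop.org/ontologies/2007/08/15/nao#"


def escape_sparql_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_tag_query(tag_label):
    """Return a SPARQL query string for resources tagged with tag_label.

    Hypothetical helper: assumes the tag is matched via nao:prefLabel.
    """
    return (
        "PREFIX nao: <%s>\n"
        "SELECT DISTINCT ?resource WHERE {\n"
        "  ?resource nao:hasTag ?tag .\n"
        "  ?tag nao:prefLabel \"%s\" .\n"
        "}" % (NAO_PREFIX, escape_sparql_literal(tag_label))
    )


# A label containing quotes is exactly the kind of parameter that is easy
# to get wrong when the query is assembled manually.
query_text = build_tag_query('my "special" tag')
print(query_text)
```

Forgetting the escaping step, or mistyping a single term, yields a query that fails or silently matches nothing, which is the kind of mistake the high level API helps to avoid.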

Fortunately, the Nepomuk developers have come up with a high level API to query already stored metadata, and today’s post will deal with querying tags in Nepomuk. As per the past tutorials, the full source code is available in the kdeexamples module.

Let’s start off with the basic imports:

import sys

import PyQt4.QtCore as QtCore

import PyKDE4.kdecore as kdecore
import PyKDE4.kdeui as kdeui
from PyKDE4.kio import KIO
from PyKDE4.nepomuk import Nepomuk
from PyKDE4.soprano import Soprano

Then let’s create a simple class that will be used for the rest of this exercise:

class NepomukTagQueryExample(QtCore.QObject):

    def __init__(self, parent=None):

        super(NepomukTagQueryExample, self).__init__(parent)

__init__ is just used to construct the instance, nothing more. The bulk of the work is in the query_tag() method, which we’ll take a look at in parts.

    def query_tag(self, tag):

        """Query for a specific tag."""

        tag = Nepomuk.Tag(tag)

First of all, we convert the tag we want to query into a proper Nepomuk.Tag() instance. Of course, we should use an already existing tag: although Nepomuk.Tag() automatically creates new tags, it makes little sense to query for a newly created tag, doesn’t it?

For our job, we need to use properties which define the terms of our query. As we’re looking for tags, we’ll use Soprano.Vocabulary.NAO.hasTag():

        soprano_term_uri = Soprano.Vocabulary.NAO.hasTag()
        nepomuk_property = Nepomuk.Types.Property(soprano_term_uri)

The first call generates a URI pointing to the RDF resource for this specific term, which is then wrapped as a Nepomuk.Types.Property in the second call. While the C++ API docs don’t show this, I found it to be necessary, or the Python interpreter would raise a TypeError. Notice that this is not the only term we can use: aside from tags, there are a lot of other URIs we can use for querying, listed in the Soprano API docs.
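For orientation, the URI behind hasTag() is just the NAO ontology namespace followed by the term name. The plain-Python sketch below (no PyKDE4 involved) shows how it is composed; nao_term is a hypothetical helper introduced only for this illustration, and in real code you would keep using the Soprano.Vocabulary.NAO functions.

```python
# Namespace of the Nepomuk Annotation Ontology (NAO).
NAO_NAMESPACE = "http://www.semanticdesktop.org/ontologies/2007/08/15/nao#"


def nao_term(local_name):
    """Hypothetical helper: build the full URI of a NAO term from its name."""
    return NAO_NAMESPACE + local_name


has_tag_uri = nao_term("hasTag")

# The part after '#' is the term name, the part before it is the ontology.
ontology, separator, local_name = has_tag_uri.rpartition("#")

print(has_tag_uri)
print(local_name)
```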

Once we have our property set up, it’s time to define which kind of query we’re going to use. In this case, since we want to check for the presence of tags, we use a Nepomuk.Query.ComparisonTerm, which is a query term used to match values of specific properties (in our case, tags):

        comparison_term = Nepomuk.Query.ComparisonTerm(nepomuk_property,
                Nepomuk.Query.ResourceTerm(tag))

Our tag is wrapped in a ResourceTerm, which exists exactly for this purpose. Now we make the proper query: in this specific case, we want to look up tagged _files_, so we use a FileQuery. We could also get other items, such as mails (in Akonadi): in that case we could use a Nepomuk.Query.Query():

        query = Nepomuk.Query.FileQuery(comparison_term)

Lastly, we want to get some results out of this query. There are different methods, but for this tutorial we’ll use the tried-and-tested KIO technology:

        search_url = query.toSearchUrl()
        search_job = KIO.listDir(kdecore.KUrl(search_url))
        search_job.entries.connect(self.search_slot)
        search_job.result.connect(search_job.entries.disconnect)

First we convert the query to a nepomuksearch:// URL, which we then pass to KIO.listDir to list the entries. Unlike the job in my previous post on KIO, this one emits entries() every time results are found, so we connect that signal to our search_slot method. We also connect the job’s result() signal so that the entries() signal is disconnected once the job is over.

Finally, let’s take a look at the search_slot function:

    def search_slot(self, job, data):

        # We may get invalid entries, so skip those
        if not data:
            return

        for item in data:
            print item.stringValue(KIO.UDSEntry.UDS_DISPLAY_NAME)

Entries are emitted as lists of UDSEntry objects: to get something at least understandable, we turn each of them into its file name, which is obtained by the stringValue() call using KIO.UDSEntry.UDS_DISPLAY_NAME.
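Since the slot can be called several times for a single job, sometimes with nothing useful in it, the batch handling logic can be tried out without KDE at all. The sketch below is only an illustration of that logic: FakeEntry is a hypothetical stand-in that mimics nothing but stringValue(), and collect_names is a made-up helper, not part of KIO.

```python
class FakeEntry(object):
    """Hypothetical stand-in for a KIO.UDSEntry (only stringValue() is mimicked)."""

    # Placeholder for KIO.UDSEntry.UDS_DISPLAY_NAME; the real constant differs.
    DISPLAY_NAME = "display-name"

    def __init__(self, name):
        self._fields = {self.DISPLAY_NAME: name}

    def stringValue(self, field):
        # Return an empty string for fields that were never set.
        return self._fields.get(field, "")


def collect_names(batches):
    """Gather display names from several batches, skipping empty ones."""
    names = []
    for batch in batches:
        if not batch:
            # Same guard as in search_slot: ignore empty or invalid batches.
            continue
        for entry in batch:
            names.append(entry.stringValue(FakeEntry.DISPLAY_NAME))
    return names


# Three simulated emissions of entries(), the second one being empty.
batches = [
    [FakeEntry("report.odt")],
    [],
    [FakeEntry("photo.jpg"), FakeEntry("notes.txt")],
]
names = collect_names(batches)
print(names)
```

In the real slot the names are printed as they arrive instead of being collected, but the guard against empty batches works the same way.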

That’s it. As you can see, it was pretty easy. Of course there’s more than that. For further reading, take a look at Nepomuk’s Query API docs, and Query Examples. Bear in mind however that to the best of my knowledge, the “fancy operators” mentioned there will not work with Python.

Happy Nepomuk querying!
