PyKDE4: Queries with Nepomuk

In one of my previous blog posts I dealt with tagging files and resources with Nepomuk. But Nepomuk is not only about storing metadata; it is also about retrieving and querying it. Normally this would mean querying the metadata database directly, using queries written in SPARQL. That is not intuitive, can be inefficient (if you do things the wrong way) and is error-prone (oops, I messed up a parameter!).

Fortunately, the Nepomuk developers have come up with a high-level API to query already stored metadata, and today’s post will deal with querying tags in Nepomuk. As with the previous tutorials, the full source code is available in the kdeexamples module.

Let’s start off with the basic imports:

import sys

import PyQt4.QtCore as QtCore

import PyKDE4.kdecore as kdecore
import PyKDE4.kdeui as kdeui
from PyKDE4.kio import KIO
from PyKDE4.nepomuk import Nepomuk
from PyKDE4.soprano import Soprano

Then let’s create a simple class that will be used for the rest of this exercise:

class NepomukTagQueryExample(QtCore.QObject):

    def __init__(self, parent=None):

        super(NepomukTagQueryExample, self).__init__(parent)

__init__ is just used to construct the instance, nothing more. The bulk of the work is in the query_tag() method, which we’ll look at in parts.

    def query_tag(self, tag):

        """Query for a specific tag."""

        tag = Nepomuk.Tag(tag)

First of all we convert the tag we want to query into a proper Nepomuk.Tag() instance. Of course, we should use an already existing tag: although Nepomuk.Tag() automatically creates new tags, it makes little sense to query for a newly created one, does it?

To build the query we need a property that defines what we are matching on. As we’re looking for tags, we’ll use Soprano.Vocabulary.NAO.hasTag():

        soprano_term_uri = Soprano.Vocabulary.NAO.hasTag()
        nepomuk_property = Nepomuk.Types.Property(soprano_term_uri)

The first call returns a URI pointing to the RDF resource for this specific term, which the second call then wraps in a Nepomuk.Types.Property. While the C++ API docs don’t show this step, I found it to be necessary, or the Python interpreter would raise a TypeError. Note that hasTag() is not the only term we can use: aside from tags, there are many other URIs we can use for querying, listed in the Soprano API docs.
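
For the curious: the URI behind Soprano.Vocabulary.NAO.hasTag() is just the NAO (Nepomuk Annotation Ontology) namespace with the term name appended. The plain-Python sketch below, which needs no PyKDE4, builds such a URI by hand to make that visible. The NAO_NS constant and the nao_term() helper are hypothetical names used for illustration only; in real code you should keep using the Soprano vocabulary functions.

```python
# Namespace of the Nepomuk Annotation Ontology (NAO).
# NAO_NS and nao_term() are illustrative names, not part of PyKDE4 or Soprano.
NAO_NS = "http://www.semanticdesktop.org/ontologies/2007/08/15/nao#"


def nao_term(name):
    """Return the full URI of a bare NAO term name such as 'hasTag'."""
    # Reject prefixed or already expanded names to avoid double expansion
    if not name or "#" in name or ":" in name:
        raise ValueError("expected a bare term name, got %r" % (name,))
    return NAO_NS + name


if __name__ == "__main__":
    # The same URI that Soprano.Vocabulary.NAO.hasTag() points to
    print(nao_term("hasTag"))
```

A misspelled URI in a SPARQL query is not an error, it simply matches nothing, which is a good reason to let the vocabulary helpers produce these strings.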

Once we have our property set up, it’s time to define which kind of query we’re going to use. In this case, since we want to check for the presence of tags, we use a Nepomuk.Query.ComparisonTerm, which is a query term used to match values of specific properties (in our case, tags):

        comparison_term = Nepomuk.Query.ComparisonTerm(nepomuk_property,
                Nepomuk.Query.ResourceTerm(tag))

Our tag is wrapped in a ResourceTerm, which exists exactly for this purpose. Now we build the actual query. In this specific case we want to look up tagged files, so we use a FileQuery. We could also retrieve other items, such as mails (in Akonadi), in which case we would use a plain Nepomuk.Query.Query() instead. Here we stick to files:

        query = Nepomuk.Query.FileQuery(comparison_term)
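
To see what the high-level API spares you, here is a rough, hand-written approximation of the SPARQL such a query boils down to: a triple pattern matching resources whose nao:hasTag value is our tag, optionally restricted to files. This is only an illustration built with string formatting, not the query Nepomuk actually generates (which is more elaborate). build_tag_query() is a hypothetical helper, the tag URI in the example is made up, and modelling FileQuery’s restriction as the nfo:FileDataObject type is an assumption of this sketch.

```python
NAO_HAS_TAG = "http://www.semanticdesktop.org/ontologies/2007/08/15/nao#hasTag"
NFO_FILE = "http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_tag_query(tag_uri, files_only=True):
    """Build a SPARQL SELECT for resources tagged with tag_uri.

    Rough analogue of ComparisonTerm(hasTag, ResourceTerm(tag)); with
    files_only=True it also mimics the file restriction of FileQuery.
    No escaping is done: tag_uri is assumed to be a well-formed URI.
    """
    patterns = ["?r <%s> <%s> ." % (NAO_HAS_TAG, tag_uri)]
    if files_only:
        patterns.append("?r <%s> <%s> ." % (RDF_TYPE, NFO_FILE))
    return "SELECT DISTINCT ?r WHERE { %s }" % " ".join(patterns)


if __name__ == "__main__":
    # Hypothetical tag URI, for illustration only
    print(build_tag_query("nepomuk:/res/example-tag"))
```

Writing every query like this by hand, with the right brackets, types and escaping, is exactly the error-prone work that ComparisonTerm and FileQuery take care of.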

Lastly, we want to get some results out of this query. There are different methods, but for this tutorial we’ll use the tried-and-tested KIO technology:

        search_url = query.toSearchUrl()
        search_job = KIO.listDir(kdecore.KUrl(search_url))
        search_job.entries.connect(self.search_slot)
        search_job.result.connect(search_job.entries.disconnect)

First we convert the query to a nepomuksearch:// URL, which we then pass to KIO.listDir() to list the matching entries. Unlike the job in my previous post on KIO, this one emits entries() every time new results are found, so we connect that signal to our search_slot method. We also connect the job’s result() signal so that entries() gets disconnected once the job is over.
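
If the signal flow is hard to picture, the following plain-Python sketch mimics it without KDE: a stand-in job emits entries() once per batch of results and result() once at the end, and result() is wired to drop every connection to entries(). The Signal and FakeListJob classes and the demo() function are hypothetical stand-ins for the Qt signals and the KIO job, not PyQt or KIO API, and the file names are made up.

```python
class Signal(object):
    """Minimal stand-in for a Qt signal: a list of connected callables."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def disconnect(self):
        # With no slot given, drop every connection
        del self._slots[:]

    def count(self):
        return len(self._slots)

    def emit(self, *args):
        # Iterate over a copy so slots may disconnect while we emit
        for slot in list(self._slots):
            slot(*args)


class FakeListJob(object):
    """Stand-in for the job returned by KIO.listDir()."""

    def __init__(self, batches):
        self.entries = Signal()  # emitted as (job, list_of_entries)
        self.result = Signal()   # emitted as (job,) when the job is done
        self._batches = batches

    def run(self):
        for batch in self._batches:
            self.entries.emit(self, batch)
        self.result.emit(self)


def demo(batches):
    received = []

    def search_slot(job, data):
        # Skip empty batches, as the real slot does
        if not data:
            return
        received.extend(data)

    job = FakeListJob(batches)
    job.entries.connect(search_slot)
    # Once the job reports its result, stop listening for entries
    job.result.connect(lambda finished_job: job.entries.disconnect())
    job.run()
    return received, job
```

For example, demo([["a.txt", "b.txt"], [], ["c.png"]]) collects all three names across the batches, and afterwards nothing is connected to entries() any more. The lambda makes it explicit that disconnect() is called without a slot argument, even though result() passes the finished job along.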

Finally, let’s take a look at the search_slot function:

    def search_slot(self, job, data):

        # We may get invalid entries, so skip those
        if not data:
            return

        for item in data:
            print item.stringValue(KIO.UDSEntry.UDS_DISPLAY_NAME)

Entries are emitted as a list of UDSEntry objects. To get something human-readable out of them, we extract each entry’s display name by calling stringValue() with KIO.UDSEntry.UDS_DISPLAY_NAME.
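
If you would rather collect the names than print them (for example, to show them in a widget), the loop in search_slot can be factored into a small helper. The sketch below runs without KDE: FakeUDSEntry is a hypothetical stand-in for KIO.UDSEntry, the field keys are placeholders rather than KIO’s real constants, and falling back to the plain entry name when the display name is empty is an assumption of this sketch, not something the original slot does.

```python
# Placeholder field keys; the real constants live in KIO.UDSEntry
UDS_NAME = "name"
UDS_DISPLAY_NAME = "display_name"


class FakeUDSEntry(object):
    """Stand-in for KIO.UDSEntry: a map from field keys to values."""

    def __init__(self, fields):
        self._fields = dict(fields)

    def stringValue(self, field):
        # Assumed to mirror the real method: empty string for missing fields
        return self._fields.get(field, "")


def display_names(data):
    """Return a human-readable name for each entry in a batch."""
    names = []
    for item in data or []:
        # Prefer the display name, fall back to the raw entry name
        name = item.stringValue(UDS_DISPLAY_NAME) or item.stringValue(UDS_NAME)
        if name:
            names.append(name)
    return names
```

With the real classes, search_slot() would call display_names(data) and hand the list on, instead of printing each entry.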

That’s it. As you can see, it was pretty easy. Of course, there is a lot more to it: for further reading, take a look at Nepomuk’s Query API docs and the Query Examples. Bear in mind, however, that to the best of my knowledge the “fancy operators” mentioned there will not work with Python.

Happy Nepomuk querying!

4 thoughts on “PyKDE4: Queries with Nepomuk”

  1. I would like Nepomuk queries to be simple for people using dynamic languages, but the combination of the Nepomuk query language and KIO UDSEntries doesn’t really look very straightforward to me.

    I don’t know what the right answer is, but I think to compose a Nepomuk query you really need to start with SPARQL and then convert it into the Nepomuk equivalent. I don’t think most people know SPARQL, but they probably don’t know the Nepomuk query language either.

    We should avoid any code involving UDSEntries for people wanting to learn KDE programming in dynamic languages, as they shouldn’t be required to learn such low level stuff. Maybe we need to layer a more ‘user friendly’ api on top of KIO for this sort of use case?

  2. There are other ways to query, one perhaps that’s easier to use is the QueryServiceClient (http://2tu.us/3dyt) which is a front-end to the dbus search client.
    This doesn’t output the results as UDSEntries, and can also be async. I’ll try to review my tutorial using this approach.

  3. A couple of things -

    * You might want to use the Python equivalent of ‘using namespace Soprano::Vocabulary’. It really improves the readability of the code.

    * As Richard pointed out, UDS Entries are quite low level, and not particularly fun to deal with. You could just issue a query using the QueryServiceClient, which will return a list of results, each of which will have a Nepomuk::Resource. You can then use Nepomuk::Resource::genericLabel() if you want to display it.

    Queries are essentially asynchronous, but you can run them synchronously, via QueryServiceClient::syncQuery

    * For simple stuff like tags, one might want to use a ‘desktop query’, which supports stuff like “hasTag:tagName”. Look at QueryServiceClient::desktopQuery().

    * Sometime last year after Akademy, Sebastian added a bunch of convenience operators for constructing queries. So, you can do something like this -

    Nepomuk::Tag tag("tagName");
    Query::Term term = ( NAO::hasTag() == Query::ResourceTerm(tag) );

    instead of using ComparisonTerm.

    I think I can add auto-conversion from Nepomuk::Resource to Query::ResourceTerm, so that additional Query::ResourceTerm won’t be required. I’ll see.
