Small 4store How-To
Revision as of 16:07, 7 September 2012
The 4store RDF storage is "an efficient, scalable and stable RDF database".
Even though its creators, Garlik, have moved on to the newer 5store, the project is still being developed and honestly became even more interesting with v1.1.5, especially since it adds support for ORDER BY combined with GROUP BY - e.g. ordering by an average - which was not supported in previous versions.
It's recommended that you download the release tarball, not the Git snapshot. Then:
./configure --prefix /some/folder/ --with-storage-path=/some/folder/4store --with-config-file=/some/folder/4store.conf CFLAGS=-O2 CPPFLAGS=-O3
make -j8
make install
Of course, you can leave the folders to their defaults if you so choose.
Then, to start 4store:
4s-backend-setup [kbname]
4s-backend [kbname]
4s-httpd -p [port] [kbname]
The above will set up storage for kbname, run a backend to serve it, and start an HTTP server on port (the default is 8080). If you want your httpd to be accessible only locally, use -H 127.0.0.1 to restrict it to listening on localhost only.
Import some data using
curl -T data.ttl 'http://localhost:[port]/data/data.ttl'
Then go to http://localhost:[port]/test/ and test some SPARQL queries!
Python to 4store
It's recommended (and not just for talking to 4store, but in general) to use the Requests library. It makes urllib2 look like it's written using runes.
Also, while PUT works fine from requests, you need to read the whole file into memory first and then PUT it to 4store, because 4store requires the data length to be provided (when PUTting a file using requests directly from a file object, the length is not sent). For this reason I prefer to use curl via subprocess - it saves the process's memory and is in fact faster. Alternatively, you can use POST to add statements to the KB bit by bit.
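As a sketch of the POST route: 4store's HTTP server accepts form-encoded POSTs to /data/ with data, graph and mime-type fields. The helper name and the example graph URI below are my own, not part of 4store:

```python
def turtle_payload(turtle, graph):
    # build the form-encoded fields for 4store's /data/ endpoint;
    # "mime-type" tells 4store how to parse the "data" field
    return {"data": turtle,
            "graph": graph,
            "mime-type": "application/x-turtle"}

# usage, assuming a 4store httpd running on localhost:8000:
#   import requests
#   triple = '<http://example.org/s> <http://example.org/p> "o" .'
#   r = requests.post("http://localhost:8000/data/",
#                     data=turtle_payload(triple, "http://example.org/g"))
```

This trades the speed of a bulk curl import for the convenience of adding a handful of triples at a time from running code.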
The following is a simple example of loading example.ttl file to 4store, and querying it to return first 10 "rows":
#! /usr/bin/python
import requests
import subprocess
import sys

# SPARQL endpoint
host = "http://localhost:8000/"

# load example.ttl via curl; curl -T sends the data length,
# which 4store requires
subprocess.call(["curl", "-T", "example.ttl",
                 "-H", "Content-Type: text/turtle",
                 host + "data/example.ttl"])

query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o . } LIMIT 10"
data = {"query": query, "output": "text"}

r = requests.post(host + "sparql/", data=data)
if r.status_code != requests.codes.ok:  # something went wrong
    sys.exit(1)

# print the results; for output=text we get TSV (tab-separated values)
# the first line of the output holds the variable names, thus the [1:]
for line in r.text.split("\n")[1:]:
    if line != "":
        print line.split("\t")
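If you want the bindings as dictionaries rather than raw lists, the TSV returned by output=text can be turned into one dict per row with a small helper (the function name is mine; it assumes, as above, that the first line holds the variable names):

```python
def parse_tsv(text):
    # 4store's output=text is TSV: the first line names the variables,
    # each following non-empty line is one row of bindings
    lines = [l for l in text.split("\n") if l != ""]
    if not lines:
        return []
    header = lines[0].split("\t")
    return [dict(zip(header, row.split("\t"))) for row in lines[1:]]
```

Feeding it r.text from the script above would yield dicts keyed by ?s, ?p and ?o.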