[Semantic Web] An Introduction to the Jena Framework, with Hands-On Examples

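References:
Web 3.0 and Semantic Web Programming (Chinese edition)
Web 3.0 and Semantic Web Programming (English edition)
Download the code that accompanies the book
[Semantic Web] [Reading Notes] Web 3.0 and Semantic Web Programming (1): Introduction to Semantic Web Programming
[Semantic Web] [Reading Notes] Web 3.0 and Semantic Web Programming (2): Foundations of Semantic Web Programming: Ontologies and Knowledge Modeling
[Semantic Web] [Reading Notes] Web 3.0 and Semantic Web Programming (3): Querying, Reasoning, and Semantic Web Application Framework Design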
Getting started with Apache Jena
Using Jena with Eclipse

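1. Overview
2. RDF and the Jena RDF API
- 2.1 Core Concepts
- 2.2 Relevant Jena Packages
- 2.3 Hands-On Practice
  - 2.3.1 Creating an RDF Graph
  - 2.3.2 Writing Out an RDF Graph
  - 2.3.3 Reading an RDF Graph
  - 2.3.4 Accessing Information in an RDF Model (Navigating and Querying)
  - 2.3.5 Model Operations
3. Ontologies and the Jena Ontology API
- 3.1 Loading an Ontology File
- 3.2 Saving an Ontology File
- 3.3 Creating an Ontology Model
4. Querying and Inference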

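1. Overview

The figure above shows how the different APIs interact within the Jena framework. The RDF API is Jena's core and most fundamental API; it supports querying, reading, and writing RDF data models. The Ontology API extends the RDF API to support working with ontology files. The SPARQL API offers a richer query interface. The Inference API provides rule-based reasoning engines. The Store API handles the storage of ontologies.

2. RDF and the Jena RDF API

This section draws mainly on two tutorials from the official site: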

The RDF API
An Introduction to RDF and the Jena RDF API (a community-contributed Chinese translation of this tutorial is also available).

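2.1 Core Concepts

Graphs, models

In Jena, RDF triples are stored in a data structure called Model. A Model represents an RDF graph, so called because it contains a collection of RDF nodes connected to one another by labelled relations.

Jena API	Purpose
`Model`	A rich Java API with many convenience methods for Java application developers. We will usually focus on the `Model` API.
`Graph`	A simpler Java API intended for extending Jena's functionality.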

Nodes: resources, literals and blank nodes

Since RDF is a graph, the nodes in the graph represent either resources or values (literals).

A resource represented as a URI (for an explanation of URIs, see the article What Do URIs Mean Anyway?) denotes a named thing: it has an identity. We can use that identity to refer directly to the resource, as we will see below. Another kind of node in the graph is a literal, which just represents a data value such as the string “ten” or the number 10. Literals representing values other than strings may have an attached datatype, which helps an RDF processor correctly convert the string representation of the literal into the correct value in the computer. By default, RDF assumes the datatypes defined by XSD are available, but in fact any datatype URI may be used.
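To make the difference between a plain literal and a typed literal concrete, here is a minimal sketch. It assumes the `createLiteral` and `createTypedLiteral` methods of `Model` and the `XSDDatatype.XSDinteger` constant; the class and method names are illustrative and do not come from the tutorial.

```java
import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.rdf.model.Literal;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// Illustrative class name; not part of Jena or the original tutorial.
final class TypedLiteralSketch {

    // A plain string literal: its lexical form is simply the text itself.
    static String plainLexicalForm() {
        Model model = ModelFactory.createDefaultModel();
        Literal plain = model.createLiteral("ten");
        return plain.getLexicalForm();
    }

    // A typed literal: the lexical form "10" plus the xsd:integer datatype.
    static Literal typedTen() {
        Model model = ModelFactory.createDefaultModel();
        return model.createTypedLiteral("10", XSDDatatype.XSDinteger);
    }

    // The attached datatype lets Jena convert the string "10" into a Java int.
    static int typedValue() {
        return typedTen().getInt();
    }

    // The datatype is itself identified by a URI.
    static String typedDatatypeUri() {
        return typedTen().getDatatypeURI();
    }

    public static void main(String[] args) {
        System.out.println(plainLexicalForm());
        System.out.println(typedValue() + " has datatype " + typedDatatypeUri());
    }
}
```

The typed literal carries its datatype URI with it, which is what allows `getInt()` to return a number rather than a string.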

Triples and Properties

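In an RDF graph, a relation made up of two nodes and the edge connecting them is called a triple, as shown in the figure below. A triple is generally written as (Subject, Predicate, Object). In Jena, a triple is represented by the Statement interface.

According to the RDF standard, only a resource can be the subject of a triple, whereas the object can be either a resource or a literal. Accordingly, the methods for extracting the parts of a Statement are:

getSubject() returns a Resource
getObject() returns an RDFNode
getPredicate() returns a Property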

Namespaces

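Namespaces distinguish URIs that share the same local name. For example, Fuji apples and Gala apples are both apples, but they are different varieties, so two namespaces, "Fuji" and "Gala", can be used to tell them apart.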
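The apple analogy can be sketched in code as follows. The two namespace URIs under `example.org` and the class name are made up for illustration; `setNsPrefix`, `shortForm`, and `expandPrefix` are prefix-mapping methods available on a Jena `Model`.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

// Illustrative class; the namespaces below are hypothetical.
final class NamespaceSketch {
    static final String FUJI_NS = "http://example.org/fuji#";
    static final String GALA_NS = "http://example.org/gala#";

    // An empty model that knows the two prefixes.
    static Model buildModel() {
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("fuji", FUJI_NS);
        model.setNsPrefix("gala", GALA_NS);
        return model;
    }

    // Same local name, different namespaces: two distinct resources.
    static boolean applesAreDistinct() {
        Model model = buildModel();
        Resource fujiApple = model.createResource(FUJI_NS + "Apple");
        Resource galaApple = model.createResource(GALA_NS + "Apple");
        return fujiApple.getLocalName().equals(galaApple.getLocalName())
                && !fujiApple.equals(galaApple);
    }

    // Abbreviates a full URI using the registered prefixes.
    static String abbreviate(String uri) {
        return buildModel().shortForm(uri);
    }

    // Expands a prefixed name back into the full URI.
    static String expand(String prefixedName) {
        return buildModel().expandPrefix(prefixedName);
    }

    public static void main(String[] args) {
        System.out.println(applesAreDistinct());
        System.out.println(abbreviate(FUJI_NS + "Apple"));
        System.out.println(expand("gala:Apple"));
    }
}
```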

2.2 Relevant Jena Packages

Note: org.apache.jena is abbreviated as oaj.

Package	Purpose
`oaj.rdf.model`	The core Jena module, used for creating and manipulating RDF graphs.
`oaj.riot`	Reading and writing RDF.
`oaj.datatypes`	Core interfaces for describing datatypes to Jena. See the Typed literals how-to.
`oaj.ontology`	Abstractions and convenience classes for accessing and manipulating ontologies represented in RDF. See the section on the Ontology API below.
`oaj.rdf.listeners`	Listening for changes to the statements in a model.
`oaj.reasoner`	The reasoner subsystem supports a range of inference engines which derive additional information from an RDF model. See the section on inference below.
`oaj.shared`	Common utility classes.
`oaj.vocabulary`	A package containing constant classes with predefined constant objects for classes and properties defined in well-known vocabularies.
`oaj.xmloutput`	Writing RDF/XML.

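2.3 Hands-On Practice

Example programs can be downloaded from the official site.

2.3.1 Creating an RDF Graph

First, let us try to build the following RDF graph:

The Java program is as follows: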

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.VCARD;

public class createRDF {
    // some definitions
    static String personURI = "http://somewhere/JohnSmith";
    static String fullName = "John Smith";

    public static void main(String[] args) {
        // create an empty Model
        Model model = ModelFactory.createDefaultModel();

        // create the resource
        Resource johnSmith = model.createResource(personURI);

        // add the property
        johnSmith.addProperty(VCARD.FN, fullName);
    }
}

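We first define two constants for the person's URI and full name. Next, we create an empty RDF graph, model, by calling ModelFactory.createDefaultModel(). We then add a resource, johnSmith, to the graph and attach a property to it. Creating the resource and adding the property can also be written more compactly as a chained call: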

Resource johnSmith = model.createResource(personURI).addProperty(VCARD.FN, fullName);

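That completes the simplest possible RDF graph, which contains a single triple. Next, let us see how to build a slightly more complex RDF graph:

The program is just as simple: again create an empty RDF graph, then add the nodes.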

package dealWithRDF;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.VCARD;

public class createRDF {
    // some definitions
    static String personURI  = "http://somewhere/JohnSmith";
    static String givenName  = "John";
    static String familyName = "Smith";
    static String fullName   = givenName + " " + familyName;

    public static void main(String[] args) {
        // create an empty Model
        Model model = ModelFactory.createDefaultModel();

        // create the resource and add the properties cascading style
        Resource johnSmith = model.createResource(personURI)
                .addProperty(VCARD.FN, fullName)
                .addProperty(VCARD.N, model.createResource()
                        .addProperty(VCARD.Given, givenName)
                        .addProperty(VCARD.Family, familyName));
    }
}

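In Jena, every triple is called a statement. In the program above, each addProperty call adds one statement to the RDF graph. To obtain all the statements in the graph, call the Model's listStatements() method, which returns a StmtIterator. For each statement, the getSubject(), getPredicate(), and getObject() methods described above return its Resource, Property, and RDFNode, which we then print.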

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;
import org.apache.jena.vocabulary.VCARD;

public class createRDF {
    // some definitions
    static String personURI  = "http://somewhere/JohnSmith";
    static String givenName  = "John";
    static String familyName = "Smith";
    static String fullName   = givenName + " " + familyName;

    public static void main(String[] args) {
        // create an empty Model
        Model model = ModelFactory.createDefaultModel();

        // create the resource and add the properties cascading style
        Resource johnSmith = model.createResource(personURI)
                .addProperty(VCARD.FN, fullName)
                .addProperty(VCARD.N, model.createResource()
                        .addProperty(VCARD.Given, givenName)
                        .addProperty(VCARD.Family, familyName));

        // list the statements in the Model
        StmtIterator iter = model.listStatements();

        // print out the subject, predicate and object of each statement
        while (iter.hasNext()) {
            Statement stmt = iter.nextStatement();    // get next statement
            Resource subject = stmt.getSubject();     // get the subject
            Property predicate = stmt.getPredicate(); // get the predicate
            RDFNode object = stmt.getObject();        // get the object

            System.out.print(subject.toString());
            System.out.print(" " + predicate.toString() + " ");
            if (object instanceof Resource) {
                System.out.print(object.toString());
            } else {
                // object is a literal
                System.out.print("\"" + object.toString() + "\"");
            }
            System.out.println(".");
        }
    }
}

The final output is:

http://somewhere/JohnSmith http://www.w3.org/2001/vcard-rdf/3.0#N 41ba8556-85e4-465e-966e-32da92e83951.
http://somewhere/JohnSmith http://www.w3.org/2001/vcard-rdf/3.0#FN "John Smith".
41ba8556-85e4-465e-966e-32da92e83951 http://www.w3.org/2001/vcard-rdf/3.0#Family "Smith".
41ba8556-85e4-465e-966e-32da92e83951 http://www.w3.org/2001/vcard-rdf/3.0#Given "John".

Because the RDF graph contains a blank node, and blank node identifiers are generated at random, the identifier shown above will differ from run to run.

2.3.2 Writing Out an RDF Graph

In Jena, calling model.write(System.out); prints the RDF model to the console. The default serialization is RDF/XML:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#">
  <rdf:Description rdf:about="http://somewhere/JohnSmith">
    <vcard:N rdf:parseType="Resource">
      <vcard:Family>Smith</vcard:Family>
      <vcard:Given>John</vcard:Given>
    </vcard:N>
    <vcard:FN>John Smith</vcard:FN>
  </rdf:Description>
</rdf:RDF>

RDFDataMgr can likewise write the RDF graph in different formats:

Format	Output statement
RDF/XML	`RDFDataMgr.write(System.out, model, Lang.RDFXML);`
TURTLE	`RDFDataMgr.write(System.out, model, Lang.TURTLE);`
N-Triples	`RDFDataMgr.write(System.out, model, Lang.NTRIPLES);`
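As a rough illustration of how the formats differ, the sketch below serializes a one-triple model into strings rather than to the console. The class name and helper methods are illustrative; the only Jena calls assumed are `RDFDataMgr.write(OutputStream, Model, Lang)` and the `VCARD` vocabulary class.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.vocabulary.VCARD;

// Illustrative class; not part of Jena or the original tutorial.
final class SerializeSketch {

    // A model with one triple and a registered "vcard" prefix.
    static Model buildModel() {
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("vcard", VCARD.getURI());
        model.createResource("http://somewhere/JohnSmith")
             .addProperty(VCARD.FN, "John Smith");
        return model;
    }

    // Writes the model into an in-memory buffer and returns it as text.
    static String serialize(Lang lang) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        RDFDataMgr.write(out, buildModel(), lang);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // N-Triples: one triple per line, always with full URIs.
    static String asNTriples() {
        return serialize(Lang.NTRIPLES);
    }

    // Turtle: more compact, abbreviates URIs with the registered prefixes.
    static String asTurtle() {
        return serialize(Lang.TURTLE);
    }

    public static void main(String[] args) {
        System.out.println(asNTriples());
        System.out.println(asTurtle());
    }
}
```

Writing to a buffer instead of `System.out` is convenient when the serialized form needs to be inspected or compared programmatically.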

2.3.3 Reading an RDF Graph

We save the RDF graph created above as the file vc-db-1.rdf and read it back through an input stream.

import java.io.InputStream;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;

public class readRDF {
    static final String inputFileName = "vc-db-1.rdf";

    public static void main(String[] args) {
        // create an empty model
        Model model = ModelFactory.createDefaultModel();

        // use the RDFDataMgr to find the input file
        InputStream in = RDFDataMgr.open(inputFileName);
        if (in == null) {
            throw new IllegalArgumentException("File: " + inputFileName + " not found");
        }

        // read the RDF/XML file
        model.read(in, null);

        // write it to standard out
        model.write(System.out);
    }
}
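Reading does not have to go through a file. As a minimal sketch, the example below parses a Turtle snippet held in a string, assuming the `RDFDataMgr.read(Model, InputStream, Lang)` overload; the class name and the Turtle text are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.vocabulary.VCARD;

// Illustrative class; not part of Jena or the original tutorial.
final class ReadFromStringSketch {
    static final String JOHN = "http://somewhere/JohnSmith";

    // A one-triple document written in Turtle.
    static final String TURTLE =
            "@prefix vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> .\n"
          + "<" + JOHN + "> vcard:FN \"John Smith\" .\n";

    // Parses the Turtle text into a fresh model; the syntax is stated explicitly.
    static Model parse() {
        Model model = ModelFactory.createDefaultModel();
        InputStream in = new ByteArrayInputStream(TURTLE.getBytes(StandardCharsets.UTF_8));
        RDFDataMgr.read(model, in, Lang.TURTLE);
        return model;
    }

    // Number of triples that ended up in the model.
    static long tripleCount() {
        return parse().size();
    }

    // Checks that the parsed triple is really there.
    static boolean hasFullName() {
        Model model = parse();
        return model.contains(model.getResource(JOHN), VCARD.FN, "John Smith");
    }

    public static void main(String[] args) {
        System.out.println(tripleCount() + " triple(s), full name present: " + hasFullName());
    }
}
```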

2.3.4 Accessing Information in an RDF Model (Navigating and Querying)

Once a model has been built, we can access the information it contains. (Full code)

// retrieve the John Smith vcard resource from the model
Resource vcard = model.getResource(johnSmithURI);

// retrieve the value of the N property
Resource name = (Resource) vcard.getRequiredProperty(VCARD.N).getObject();

// retrieve the full name property
String fullName = vcard.getRequiredProperty(VCARD.FN).getString();

// add two nickname properties to vcard
vcard.addProperty(VCARD.NICKNAME, "Smith")
     .addProperty(VCARD.NICKNAME, "Adman");

// set up the output
System.out.println("The nicknames of \"" + fullName + "\" are:");

// list the nicknames
StmtIterator iter = vcard.listProperties(VCARD.NICKNAME);
while (iter.hasNext()) {
    System.out.println("    " + iter.nextStatement().getObject().toString());
}

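Given a resource's URI, Model.getResource(String uri) retrieves the corresponding resource object from the model. The method always returns a Resource: if the resource exists in the model it is returned, and otherwise a new Resource object is created and returned.

Resource.getRequiredProperty(Property p) retrieves a property of the resource and throws an exception if the resource has no such property.

Statement	Purpose
`Resource name = (Resource) vcard.getRequiredProperty(VCARD.N).getObject();`	Retrieves the resource that the property points to. Because getObject() returns an RDFNode, the result is cast to Resource here; if the object is known to be a resource, the cast can be avoided by calling getResource() on the statement instead.
`String fullName = vcard.getRequiredProperty(VCARD.FN).getString();`	Retrieves the property value as a string.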
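The accessors above can be sketched in a self-contained form. This example assumes that `Statement.getResource()` avoids the explicit cast, that `getProperty` returns null for a missing property, and that `getRequiredProperty` throws `PropertyNotFoundException` in that case; the class and method names are illustrative.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.shared.PropertyNotFoundException;
import org.apache.jena.vocabulary.VCARD;

// Illustrative class; not part of Jena or the original tutorial.
final class PropertyAccessSketch {

    // Builds the John Smith vcard with a nested name node and no nickname.
    static Resource buildVcard() {
        Model model = ModelFactory.createDefaultModel();
        return model.createResource("http://somewhere/JohnSmith")
                .addProperty(VCARD.FN, "John Smith")
                .addProperty(VCARD.N, model.createResource()
                        .addProperty(VCARD.Given, "John")
                        .addProperty(VCARD.Family, "Smith"));
    }

    // getResource() returns the object as a Resource without a cast.
    static String givenName() {
        Resource name = buildVcard().getRequiredProperty(VCARD.N).getResource();
        return name.getRequiredProperty(VCARD.Given).getString();
    }

    // getProperty() signals a missing property by returning null.
    static boolean hasNickname() {
        return buildVcard().getProperty(VCARD.NICKNAME) != null;
    }

    // getRequiredProperty() signals a missing property with an exception.
    static boolean requiredNicknameThrows() {
        try {
            buildVcard().getRequiredProperty(VCARD.NICKNAME);
            return false;
        } catch (PropertyNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(givenName());
        System.out.println(hasNickname());
        System.out.println(requiredNicknameThrows());
    }
}
```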

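The tutorial also presents some basic query methods, but they are far less powerful than SPARQL, so they are not covered further here. In practice, SPARQL is the usual choice.

2.3.5 Model Operations

Jena provides three operations for combining models: union, intersection, and difference. They are implemented by .union(Model), .intersection(Model), and .difference(Model) respectively.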
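A small sketch of the three operations on two overlapping models follows. The nickname value "Johnny" and the class name are made up for illustration; each operation returns a new model and leaves its operands unchanged.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.vocabulary.VCARD;

// Illustrative class; not part of Jena or the original tutorial.
final class ModelSetOpsSketch {
    static final String JOHN = "http://somewhere/JohnSmith";

    // First model: one statement (the full name).
    static Model first() {
        Model model = ModelFactory.createDefaultModel();
        model.createResource(JOHN).addProperty(VCARD.FN, "John Smith");
        return model;
    }

    // Second model: the same full name plus a hypothetical nickname.
    static Model second() {
        Model model = ModelFactory.createDefaultModel();
        model.createResource(JOHN)
             .addProperty(VCARD.FN, "John Smith")
             .addProperty(VCARD.NICKNAME, "Johnny");
        return model;
    }

    // Union: every statement that appears in either model (duplicates merged).
    static long unionSize() {
        return first().union(second()).size();
    }

    // Intersection: only the statements that appear in both models.
    static long intersectionSize() {
        return first().intersection(second()).size();
    }

    // Difference: statements in the second model that are not in the first.
    static long differenceSize() {
        return second().difference(first()).size();
    }

    public static void main(String[] args) {
        System.out.println(unionSize() + " " + intersectionSize() + " " + differenceSize());
    }
}
```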

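3. Ontologies and the Jena Ontology API

The OWL language is divided into three sublanguages: OWL Lite, OWL DL, and OWL Full. For the differences and relationships among them, see the article Understanding OWL Lite, OWL DL, and OWL Full in the OWL Ontology Language.

To get a better feel for Jena's Ontology API, we work through an example based on the ESWC ontology. This ontology provides a simple model for describing concepts and activities related to academic conferences; some of its concepts and the relationships between them are shown in the figure below:

3.1 Loading an Ontology File

An ontology file (for example, a local one) can be loaded with the read method, which has several overloads: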

read( String url )
read( Reader reader, String base )
read( InputStream reader, String base )
read( String url, String lang )
read( Reader reader, String base, String Lang )
read( InputStream reader, String base, String Lang )

For example:

OntModel m = ModelFactory.createOntologyModel();
File myFile = new File("example.ttl");
m.read(new FileInputStream(myFile), "", "TURTLE");
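The same `read` overloads also accept a `Reader`, which makes it easy to try things out without a file on disk. The sketch below loads a tiny ontology from a string; the `example.org` namespace and the two class names are hypothetical stand-ins, not taken from the ESWC ontology.

```java
import java.io.StringReader;

import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.ModelFactory;

// Illustrative class; the namespace and class names are hypothetical.
final class LoadOntologySketch {
    static final String NS = "http://example.org/conference#";

    // Two OWL classes, one declared as a subclass of the other.
    static final String TURTLE =
            "@prefix owl:  <http://www.w3.org/2002/07/owl#> .\n"
          + "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
          + "@prefix conf: <" + NS + "> .\n"
          + "conf:Artefact a owl:Class .\n"
          + "conf:Paper a owl:Class ; rdfs:subClassOf conf:Artefact .\n";

    // Uses the read(Reader, base, lang) overload with an in-memory reader.
    static OntModel load() {
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
        m.read(new StringReader(TURTLE), null, "TURTLE");
        return m;
    }

    // getOntClass returns the class if it is declared in the ontology.
    static boolean definesPaperClass() {
        return load().getOntClass(NS + "Paper") != null;
    }

    // For a class that was never declared, getOntClass returns null.
    static boolean definesWorkshopClass() {
        return load().getOntClass(NS + "Workshop") != null;
    }

    public static void main(String[] args) {
        System.out.println(definesPaperClass() + " " + definesWorkshopClass());
    }
}
```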

3.2 Saving an Ontology File

An ontology file is saved with the write method, which is used in much the same way as read:

public static void outprintmodel(Model outmodel, String name) throws IOException {
    String filepath = name;
    // try-with-resources closes the stream even if writing fails
    try (FileOutputStream fileOS = new FileOutputStream(filepath)) {
        // Alternative: use an RDFWriter; in that case the write call below is not needed
        // RDFWriter rdfWriter = outmodel.getWriter("RDF/XML");
        // rdfWriter.setProperty("showXMLDeclaration", "true");
        // rdfWriter.setProperty("showDoctypeDeclaration", "true");
        // rdfWriter.write(outmodel, fileOS, null);
        outmodel.write(fileOS, "Turtle");
    }
}

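3.3 Creating an Ontology Model

An ontology model is an extension of the Jena RDF model that provides additional capabilities for handling ontologies. Ontology models are created through the Jena ModelFactory. The simplest way is OntModel m = ModelFactory.createOntologyModel();. A model created this way uses the default settings: the OWL Full language, in-memory storage, and RDFS inference.

org.apache.jena.ontology.OntModel is dedicated to working with ontologies. It is a sub-interface of org.apache.jena.rdf.model.Model, so it offers everything Model does plus some additional functionality.

To create an ontology model for a specific language, pass the language URI: createOntologyModel(String languageURI). If you cannot remember the URIs, use the corresponding constants in ProfileRegistry instead.

Ontology language	Language URI	ProfileRegistry constant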
RDFS	http://www.w3.org/2000/01/rdf-schema#	ProfileRegistry.RDFS_LANG
OWL Full	http://www.w3.org/2002/07/owl#	ProfileRegistry.OWL_LANG
OWL DL	http://www.w3.org/TR/owl-features/#term_OWLDL	ProfileRegistry.OWL_DL_LANG
OWL Lite	http://www.w3.org/TR/owl-features/#term_OWLLite	ProfileRegistry.OWL_LITE_LANG

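createOntologyModel also accepts another kind of argument: createOntologyModel(OntModelSpec spec). Each OntModelSpec value denotes a different configuration of the ontology model, as listed below; see the OntModelSpec documentation for details.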

OntModelSpec	Language profile	Storage model	Reasoner
OWL_MEM	OWL full	in-memory	none
OWL_MEM_TRANS_INF	OWL full	in-memory	transitive class-hierarchy inference
OWL_MEM_RULE_INF	OWL full	in-memory	rule-based reasoner with OWL rules
OWL_MEM_MICRO_RULE_INF	OWL full	in-memory	optimised rule-based reasoner with
OWL_MEM_MINI_RULE_INF	OWL full	in-memory	rule-based reasoner with subset of OWL rules
OWL_DL_MEM	OWL DL	in-memory	none
OWL_DL_MEM_RDFS_INF	OWL DL	in-memory	rule reasoner with RDFS-level entailment-rules
OWL_DL_MEM_TRANS_INF	OWL DL	in-memory	transitive class-hierarchy inference
OWL_DL_MEM_RULE_INF	OWL DL	in-memory	rule-based reasoner with OWL rules
OWL_LITE_MEM	OWL Lite	in-memory	none
OWL_LITE_MEM_TRANS_INF	OWL Lite	in-memory	transitive class-hierarchy inference
OWL_LITE_MEM_RDFS_INF	OWL Lite	in-memory	rule reasoner with RDFS-level entailment-rules
OWL_LITE_MEM_RULES_INF	OWL Lite	in-memory	rule-based reasoner with OWL rules
RDFS_MEM	RDFS	in-memory	none
RDFS_MEM_TRANS_INF	RDFS	in-memory	transitive class-hierarchy inference
RDFS_MEM_RDFS_INF	RDFS	in-memory	rule reasoner with RDFS-level entailment-rules
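The practical effect of the Reasoner column can be seen in a small sketch that builds the same data under two specifications. It assumes the `OntModelSpec.OWL_MEM` and `OntModelSpec.OWL_MEM_RDFS_INF` constants; the namespace, class names, and individual are hypothetical.

```java
import org.apache.jena.ontology.Individual;
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.vocabulary.RDF;

// Illustrative class; the namespace and names are hypothetical.
final class OntModelSpecSketch {
    static final String NS = "http://example.org/conference#";

    // Asserts: Paper is a subclass of Artefact, and paper1 is a Paper.
    // Then asks whether the model also says that paper1 is an Artefact.
    static boolean paperIsArtefact(OntModelSpec spec) {
        OntModel m = ModelFactory.createOntologyModel(spec);
        OntClass artefact = m.createClass(NS + "Artefact");
        OntClass paper = m.createClass(NS + "Paper");
        artefact.addSubClass(paper);
        Individual paper1 = m.createIndividual(NS + "paper1", paper);
        return m.contains(paper1, RDF.type, artefact);
    }

    // No reasoner: only the asserted statements are visible.
    static boolean withoutReasoner() {
        return paperIsArtefact(OntModelSpec.OWL_MEM);
    }

    // RDFS reasoner: the type is inferred through the subclass link.
    static boolean withRdfsReasoner() {
        return paperIsArtefact(OntModelSpec.OWL_MEM_RDFS_INF);
    }

    public static void main(String[] args) {
        System.out.println("OWL_MEM: " + withoutReasoner());
        System.out.println("OWL_MEM_RDFS_INF: " + withRdfsReasoner());
    }
}
```

Choosing a specification without a reasoner keeps queries cheap, while the inferencing specifications trade some performance for the extra entailed statements.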

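By default, an ontology file that is written out contains only four prefixes.

Commonly used methods for building an ontology are summarized below:

Task	Method
Define a prefix	`setNsPrefix(String prefix, String uri)`
Create a class	`createClass(String uri)`
Create an individual	`createIndividual(String uri, Resource cls)`
Create a DatatypeProperty	`createDatatypeProperty(String uri)`
Create a sub-property	Taking DatatypeProperty as an example: first obtain the relevant DatatypeProperty, then call its `addSubProperty(Property prop)` method. Sub-classes and other kinds of sub-properties are created in a similar way.
Add a triple	The most basic option is the `add(s,p,o)` method, which has several overloads for different cases; see the documentation for details.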
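The methods in the table can be combined as in the following sketch, which builds a very small ontology from scratch. The namespace, the class and property names, and the individual are hypothetical and only loosely modelled on a conference domain.

```java
import org.apache.jena.ontology.DatatypeProperty;
import org.apache.jena.ontology.Individual;
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.vocabulary.RDFS;

// Illustrative class; the namespace and all names are hypothetical.
final class BuildOntologySketch {
    static final String NS = "http://example.org/conference#";

    static OntModel build() {
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
        m.setNsPrefix("conf", NS);                        // define a prefix

        OntClass person = m.createClass(NS + "Person");   // create classes
        OntClass author = m.createClass(NS + "Author");
        person.addSubClass(author);                       // Author is a subclass of Person

        DatatypeProperty name = m.createDatatypeProperty(NS + "name");
        DatatypeProperty nickname = m.createDatatypeProperty(NS + "nickname");
        name.addSubProperty(nickname);                    // nickname is a sub-property of name

        Individual alice = m.createIndividual(NS + "alice", author);
        m.add(alice, name, "Alice");                      // add a triple directly
        return m;
    }

    static boolean authorIsSubClassOfPerson() {
        OntModel m = build();
        return m.contains(m.getResource(NS + "Author"), RDFS.subClassOf,
                m.getResource(NS + "Person"));
    }

    static boolean nicknameIsSubPropertyOfName() {
        OntModel m = build();
        return m.contains(m.getResource(NS + "nickname"), RDFS.subPropertyOf,
                m.getResource(NS + "name"));
    }

    static String aliceName() {
        OntModel m = build();
        return m.getResource(NS + "alice")
                .getRequiredProperty(m.getProperty(NS + "name")).getString();
    }

    public static void main(String[] args) {
        build().write(System.out, "TURTLE");
    }
}
```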

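4. Querying and Inference

Querying and inference in Jena rely mainly on SPARQL and Jena rules. Both are fairly straightforward to apply; the theory consists mostly of the syntax of the two languages, which is documented on the official site. For an introduction to SPARQL, see my other note, SPARQL: the Query Language of the Semantic Web. For querying, inference, and the design of semantic web frameworks, see another article.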
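For a first impression of the SPARQL side only, here is a minimal sketch that runs a SELECT query against an in-memory model. It assumes the ARQ classes `QueryExecutionFactory`, `QueryExecution`, `ResultSet`, and `QuerySolution` are on the classpath; the class name and the second person in the data are illustrative. Jena rules are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.vocabulary.VCARD;

// Illustrative class; not part of Jena or the original tutorial.
final class SparqlSketch {

    // Two people, each with a full name.
    static Model buildModel() {
        Model model = ModelFactory.createDefaultModel();
        model.createResource("http://somewhere/JohnSmith").addProperty(VCARD.FN, "John Smith");
        model.createResource("http://somewhere/SarahJones").addProperty(VCARD.FN, "Sarah Jones");
        return model;
    }

    // Selects every full name in the model, sorted alphabetically.
    static String fullNames() {
        String queryString =
                "PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> "
              + "SELECT ?name WHERE { ?person vcard:FN ?name } ORDER BY ?name";
        List<String> names = new ArrayList<>();
        // try-with-resources releases the query execution when done
        try (QueryExecution qexec = QueryExecutionFactory.create(queryString, buildModel())) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution solution = results.next();
                names.add(solution.getLiteral("name").getString());
            }
        }
        return String.join(", ", names);
    }

    public static void main(String[] args) {
        System.out.println(fullNames());
    }
}
```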