Decoration
max planck institut
informatik
mpii logo Minerva of the Max Planck Society
 
URDF - Efficient Reasoning in Uncertain RDF Knowledge Bases

URDF: Efficient Reasoning in Uncertain RDF Knowledge Bases

Our work on URDF is motivated by the huge explosion of Web-extracted, semantic knowledge bases (including our own YAGO-NAGA project) we have seen in recent years.We believe that query processing (in other words: reasoning) over Web-extracted data provides one of the best showcases (and in fact one of the biggest challenges) for uncertain data management we have seen so far. A guiding theme for the project is to resolve uncertainty at query time. Inference in URDF is always triggered by a user query, and so you will always find a query (either formulated in SPARQL syntax or in a first-order logical representation) when dealing with URDF.

On the right-hand side you see a screenshot of UViz, the visualization frontend of URDF.
Click here for an interactive demo of UViz.

Research Topics

Reasoning with Soft and Hard Rules

URDF employs rules formulated in first-order logic (FOL) for reasoning. More specifically, we focus on a decidable subset of FOL (generally referred to as the Bernays-Schönfinkel-Ramsey class), which can be evaluated by grounding rules against the base facts captured by the knowledge base. Reasoning in URDF consists of two phases: 1) grounding the query pattern against the rules and 2) the actual (propositional or probabilistic) consistency reasoning step.


Lineage-enabled Query Answering over Correlated Base Facts

Correlations (or more precisely: dependencies) among facts in a knowledge base may arise from two sources: 1) static input dependencies derived from real-world observations at extraction time; and 2) dynamic dependencies induced at query time by the lineage structure obtained from grounding the rules. Correlations can again be considered as both soft and hard dependencies, e.g., capturing strict logical dependencies like implication, bi-implication, mutual exclusion and transitivity, or in the form of arbitrary soft (i.e., weighted) correlations, both of which can be captured via factors that connect RDF facts. Our approach aims to incorporate both types of soft and hard constraints into a unified framework for consistency reasoning over Web-extracted RDF data.

Spatial and Temporal Consistency Reasoning

Recent trends in information extraction allow us to not only extract large semantic knowledge bases from structured or loosely structured Web sources, but to also extract additional annotations along with the facts these knowledge bases contain. Among the most important types of annotations are spatial and temporal annotations. In particular the latter temporal annotations help us to reflect that a majority of facts is not static but highly ephemeral in the real world, i.e., facts are valid for only a limited amount of time, or multiple facts stand in temporal dependencies with each other. Time-annotated knowledge bases pose new problems in terms of consistency reasoning. In addition to atemporal constraints, like that a person can have only one place of birth, temporal constraints can express that, for example, the date of birth should be before the beginning of a marriage of a person. Our current work investigates both solving temporal inconsistencies and deriving temporal confidence histograms for different types of temporal predicates and constraints.

Learning Soft Rules from Web-extracted Data

Knowledge bases extracted from the Web are inherently incomplete. Deduction is one way to address this issue. In order to automatically learn soft inference rules (Horn clauses), we investigate how to adapt Inductive Logic Programming (ILP) techniques to large and incomplete knowledge bases. ILP is a well known technique which lies in the intersection of machine learning and logic. Although powerful, ILP algorithms are inherently expensive as there is a combinatorial growth of the search space when constructing these rules, as the number of predicates increases and the size of the background knowledge grows. In addition, the evaluation of each rule becomes more expensive, as the number of the training examples obtained from a Web-extracted knowledge base is rising. Moreover, for noisy and incomplete knowledge bases, the automatic (i.e., unsupervised) selection of positive and negative training examples needed for learning new rules poses a major challenge to the adoption of these techniques.

Accelerating Rule-Based Reasoning over Disk-Resident RDF Data

Mohamed's Master thesis investigates the implementation of a rule-based query processor over RDF data that are assumed to be disk resident. It provides an integration of the rule-based URDF reasoner with the RDF-3X SPARQL engine, and it presents optimizations along several dimensions: (i) top-down vs. bottom-up query evaluation techniques, (ii) caching to reduce both disk accesses and rule evaluations, (iii) regrouping of extensional and intensional predicates into blocks which allows better utilization of the underlying storage engine's ability to optimize traditional relational queries, and (iv) a probabilistic way of looking at join ordering and cost estimation in this context. It also shows the incompleteness of one of the most cited presentations of the QSQR algorithm for recursive query processing and proposes a fix for it.

Visualization Techniques for Reasoning with Uncertain RDF Data

UViz is a fully Web-enabled visualization frontend for a variety of reasoning tasks in URDF. UViz is built in a client-server architecture using the Adobe Flex RIA framework, and it runs in most common Web browsers via Flash. The Adobe BlazeDS service is used for a fast binary client-server communication, and the Flare visualization toolkit provides a sophisticated graph-rendering library that drives our visualization frontend. UViz delivers an intuitive and highly dynamic visualization experience to visually support and access the URDF reasoner, including UI tasks like exploring the knowledge base, rule-based and statistical reasoning, resolving inconsistencies, and explaining answers through lineage. Timm's Master thesis presents an in-depth description of UViz.

People



Papers



Theses



Search MPII (type ? for help)