Graphix Query Model
Table of Contents
Property Graph Data Model
A graph is a structure composed of sets of graph elements known as vertices and edges. Edges connect exactly two vertices together. A property graph has four defining characteristics that differentiate itself from your bog-standard undirected simple graph:
- Each edge is directed. Consequently, given an edge e that connects two vertices we can talk about the source vertex and the destination vertex of e.
- Each graph element is associated with a label. In the context of the property graph model labels are used to group vertices with other vertices, and edge with other edges.
- More than one edge may connect two pairs of vertices.
- Each graph element possesses a set of key-value pairs, also known as properties.
The property graph model (in contrast to other types of graph models) has received a lot of attention this past decade for its flexibility and ability to express other types of graph models by omitting or re-purposing certain characteristics of the property graph model itself. For example, if your domain requires weights on your vertices or edges, you simply add a weight property to the appropriate graph elements. If your domain requires undirected graphs, then simply assign an arbitrary direction to your edge and ignore the direction when querying. Property graphs can even be used to represent RDF graphs by repurposing graph element labels as URIs.
To better explain this section on property graph models (and the majority of this entire reference), we introduce the Gelp example. We want to model the following statement:
Users and their friends make Reviews about Businesses.
Graphically, we can represent this statement as follows:
In the figure above, we describe the graph schema of Gelp. User, Review, and Business are labels that are used to categorize vertices. Between these vertices, there are three types of edges:
- A User could be FRIENDS_WITH another User.
- A Review is MADE_BY a User.
- A Review is ABOUT a Business.
FRIENDS_WITH, MADE_BY, and ABOUT are labels that are used to categorize these different types of edges.
Let us now take a look at an instance of Gelp.
In the figure above, there are eight vertices. The properties of each vertex are given in the ADM instance immediately below the vertex itself.
Property Graph Query Model
It should come as no surprise that the connected structure of graphs allow us to pose numerous types of questions that would be difficult or impossible to express in other types of data models. In this broad universe of questions we can ask about a graph, we focus on two areas:
- Pattern Matching
- Path Finding
Graph databases should ideally allow support for both operations above.
Pattern Matching Queries
Let us first describe the problem of pattern matching in graphs. A graph pattern can be thought of as another graph, framing the problem of pattern matching as finding all valid mappings of our graph pattern to the graph we are querying. To give an example, let us define a graph pattern composed of a User, a Review, and a MADE_BY edge.
To match the pattern above to the instance of the Gelp graph above yields the following mappings:
Navigational Queries (Path Finding)
Next, we’ll describe the problem of path finding, or finding a path between two vertex instances. In contrast to pattern matching, in path finding we do not know the exact pattern between two vertices. Instead of describing a pattern, we describe a path to find all valid mappings of our path description to edge instances in our graph. When we use regular expressions to describe our paths, we classify our path finding problem as answering regular path queries (RPQs). To give an example, let us define a graphical path description of a path between 1 and 5 hops, composed only of FRIENDS_WITH edge instances. The vertices we are interested in finding a path between are the user “Mary” and the user “Kevin” (directed from “Mary” to “Kevin”).
To match the path description above between the “Mary” and the “Kevin” vertices yields the following paths:
In our result set, we only consider paths that do not repeat edges or vertices. Different systems have different semantics with respect to what paths they consider. In Graphix, cycles are never considered unless explicitly specified.
Navigational Pattern Matching Queries
The two prior sections describe two distinct classes of graph queries, though it’s easy to see how we create queries that involve both pattern matching and path finding (with regular expressions). The superclass of queries described here is known formally as conjunctive regular path queries, or navigational pattern matching queries. To extend the FRIENDS_WITH example from the previous section, suppose we are now interested in finding such a path between any pair of users that have made a review. Graphically, we describe our query as such:
To evaluate the navigational pattern matching query above yields the following results:
Navigational pattern matching provides the foundation of most graph query languages (e.g. SPARQL, Cypher, Gremlin).
Graphix Query Model
Having described the property graph model and common classes of queries on property graphs, let us now describe the query model for Graphix. All gSQL++ queries return ADM (Asterix data model, a superset of JSON) instances. gSQL++ paths in particular refer to a JSON object of two fields (Vertices
and Edges
), each of which contains an array of objects (for ease of use, Graphix users can access the vertices and edges of a path using the PATH_VERTICES
and PATH_EDGES
functions respectively instead of using a field-accessor to “Vertices” / “Edges”).
With respect to pattern matching semantics, Graphix (by default) will not repeat vertices or edges in a pattern. With respect to path finding, Graphix (by default) will consider all paths that do not repeat vertices or edges.