Declarative Objectivity (DO) Language : Querying for Data : Navigating a Graph
4 
Navigating a Graph
In this chapter, you’ll see how to perform graph queries with DO to explore the interconnectedness of data within a federated database.
This chapter builds on the concepts and features shown in Performing Graph Queries. That is, you’ll continue to view your data as a graph and specify patterns that qualify the connections among the graph’s vertices.
In this chapter, you’ll go beyond simply traversing vertices to obtain values from them—you’ll see how to navigate a graph to return entire walks that connect vertices of interest. For example, you can use this approach to learn about the flow of communication among people who are closely or distantly related in a graph of social contacts.
Sample Schema and Data Set
The examples for illustrating DO navigation will be based on the following sample schema for a simple social-contact data set. This data set represents persons making social contact with each other by sending email messages, posting items on social-networking sites, sending text messages, or making telephone calls.
Here is a very small subgraph from the data set. As you can see, this data set models the flow of communication among various people through various channels.
The schema pairs each edge with an inverse edge, and the diagram shows these pairs as two-headed arrows. For readability, the diagram labels only the left-to-right directed edges, although you can find out the names of the inverse edges from the schema diagram above. For example, if a Person object (such as the one named Reiko) has a receivedPosts edge leading to a SocialMediaPost object, that SocialMediaPost object will have an inverse recipients edge leading back to the Person object.
Data Stored as Graphs—Revisited
In Data Stored as Graphs in the previous chapter, we saw that the interconnected objects in a federated database form a graph that can be traversed by DO graph queries. Such queries follow directed edges from one vertex to the next and can return stored values that are reachable from those vertices.
As useful as it is to extract stored values, however, a graph can tell you more. For example, you can use a DO graph query to discover some or all of the paths you can take between two vertices of interest. Discovering different ways to traverse between vertices can help you develop new questions to ask as you explore your data set.
In this documentation, we’ll use the term navigation to refer to query activities that help you map out your data, and we’ll use traversal to refer to following an edge from one vertex to another.
Paths, Trails, and Walks
At this point, we need to generalize the notion of a path. Recall from Paths in a Graph that a path is a chain of alternating vertices and edges in which no vertex or edge is repeated. We distinguished a path from a trail, in which no edge is repeated, although a vertex might be visited more than once.
For example, the following diagram shows both a path and a trail leading from Mark to Lisa:
The path (darker blue) reaches Lisa through an Email vertex.
The trail (lighter blue) traverses first to Jose through a SocialMediaPost vertex, then back to Mark through another SocialMediaPost vertex and finally reaches Lisa through the Email vertex.
Note:Although a trail can revisit a vertex, it doesn’t have to. Consequently, all paths are trails, but not all trails are paths.
Both the path and the trail show how Mark communicated with Lisa—namely, through an email message. But only the trail tells us that Mark also obtained some information from Jose. If the social-media posts were sent earlier than the email message, we could infer that Mark may have passed information from Jose to Lisa.
Paths and trails are special cases of walks, where a walk is a sequence of consecutive connected edges that begins on a vertex and ends on a vertex, without traversing any edge more than once. Strictly speaking, what we called a path pattern in the previous chapter is actually a walk pattern.
You can use DO graph queries to return walks of either kind. The kind of walk you choose depends on your requirements:
Paths are simpler and faster to find, but might not show all the possible connections between a pair of vertices.
Trails reveal all connections between a pair of vertices, but querying for trails takes longer and produces larger result sets.
Returning Individual Edges
A walk is composed of edges, so before you learn how to query for entire walks, it is helpful to see how an individual edge is represented in the output of a DO graph query. Recall from Assigning a Variable to an Edge that you assign a variable to an edge by inserting it into the corresponding arrow in the pattern. The query returns the edge when you include the variable in the RETURN clause.
For example, the following query asks for all edges that directly connect a particular Person vertex to a SocialMediaPost vertex:
MATCH (p:Person {firstName:'Mark'})-[x]->(s:SocialMediaPost) RETURN x;
The returned edges represent social media posts that a person named Mark either sent or received.
Viewing the Result Set
If you execute the example statement using the DO runner, the output (in the default format) lists multiple edges, one for each SocialMediaPost object to which the specified Person object is connected. Each edge is listed as a separate projection. Here’s an excerpt of the output, showing just two (out of many) returned projections:
DO> MATCH (p:Person {firstName:'Mark'})-[x]->(s:SocialMediaPost) RETURN x;
{
  ...
  _Projection
  {
    x:EDGE
    {
      from:3-3-7-121,
      fromClass:'Person',
      attribute:'sentPosts',
      to:3-3-311-386,
      toClass:'SocialMediaPost'
    }
  },
  _Projection
  {
    x:EDGE
    {
      from:3-3-7-121,
      fromClass:'Person',
      attribute:'receivedPost',
      to:3-3-307-28,
      toClass:'SocialMediaPost'
    }
  },
  ...
}
Examining an Edge Projection
Let’s examine the first of the two edge projections shown above in the sample DO runner output. This projection corresponds to one of the edges in the graph diagram we’ve been working with. The projection describes the parts that make up that edge:
The object identifier and class of the “from” vertex (the Person object named Mark).
The name of the attribute through which the “from” vertex references the “to” vertex (sentPosts).
The object identifier and class of the “to” vertex (a particular SocialMediaPost object).
Learning From the Results
Let’s compare the two edge projections shown in the sample DO runner output above. As expected, both edges specify the same “from” vertex (the Person object named Mark), and each edge specifies a different “to” vertex. Interestingly, the two edge projections specify different attributes (receivedPost and sentPosts), and therefore describe two different types of edges that can connect a Person object to a SocialMediaPost object.
The edge pattern -[x]-> in our example matches both types of edges. If you want to return only the edges that pass through a particular attribute, you can optionally include the attribute name along with the variable:
MATCH (p:Person {firstName:'Mark'})-[x:sentPosts]->(s:SocialMediaPost) RETURN x;
Now the returned edges represent just social media posts that Mark sent.
Returning Walks From a DO Graph Query
A DO graph query can return the walks that are matched by an entire pattern. This is especially useful when you know (or suspect) that certain vertices are connected, and you want to see exactly how.
A returned walk is a projection consisting of an ordered sequence of edges, which may form either a path or a trail. Recall from Paths, Trails, and Walks that a path has no repeated vertices, while a trail allows vertices to be revisited, but only through different edges.
A DO graph query returns walks by:
Assigning a Variable to a Walk
Returning Walks That are Paths or Returning Walks That are Trails
(Optionally) Limiting the Number of Returned Walks
Note:Many of the examples in these sections will use the -[*N]->pattern notation for matching a number of consecutive edges. To review this notation, see Specifying a Number of Consecutive Edges.
Assigning a Variable to a Walk
You enable a DO graph query to return walks by assigning a variable to the entire pattern within a MATCH clause. You do this by inserting the variable and an equals sign between the MATCH keyword and the first component of the pattern. You may optionally enclose the pattern in parentheses.
For example, both of the following MATCH clauses assign the variable w to walks that represent all the ways in which one Person vertex is connected to another by two consecutive edges:
or
As with variables assigned to vertices or edges, the variable you assign to a walk can be any string you choose, provided it begins with a letter. A variable name can be as long or as short as you want:
MATCH emailsOrSocialMediaPosts = (p1:Person)-[*2]->(p2:Person) ... 
A variable name does not need to correspond to any particular name or value in the database.
Note:Do not use the string paths, which is a reserved keyword.
Returning Walks That are Paths
You can return walks that are paths to ensure that each returned walk consists of edges between distinct vertices.
You query for paths by assigning a variable to an entire pattern, and then specifying that variable in the RETURN clause. You can optionally include the PATHS keyword to explicitly indicate that paths are to be returned instead of trails. For example, both of the following graph queries return all paths that connect one specific Person vertex to another through two consecutive edges:
MATCH w = (p1:Person {firstName:'Mark'})-[*2]->(p2:Person {firstName:'Lisa'}) 
   RETURN w;
or 
MATCH PATHS w = (p1:Person {firstName:'Mark'})-[*2]->(p2:Person {firstName:'Lisa'}) 
   RETURN w;
In either case, the returned paths represent all communications between Mark and Lisa that were conducted either through email or social media. (Note that the returned paths won’t represent phone calls or text messages—for this, the pattern would need to match paths with at least four edges to include a PhoneCall or TextMessage vertex.)
Viewing the Result Set
If you execute the example statement using the DO runner, the output (in the default format) lists multiple walks describing the matched paths. Each walk is listed as a separate projection. Here’s an excerpt of the output, showing two of the returned walk projections:
DO> MATCH PATHS w = (p1:Person {firstName:'Mark'})-[*2]->(p2:Person {firstName:'Lisa'}) RETURN w;
{
  ...
  _Projection
  {
    w:WALK
    {
      EDGE
      {
        from:3-3-7-121,
        fromClass:'Person',
        attribute:'sentEmails',
        to:3-3-300-251,
        toClass:'Email'
      },
      EDGE
      {
        from:3-3-300-251,
        fromClass:'Email',
        attribute:'recipients',
        to:3-3-8-196,
        toClass:'Person'
      }
    }
  },
  _Projection
  {
    w:WALK
    {
      EDGE
      {
        from:3-3-7-121,
        fromClass:'Person',
        attribute:'receivedEmails',
        to:3-3-290-122,
        toClass:'Email'
      },
      EDGE
      {
        from:3-3-290-122,
        fromClass:'Email',
        attribute:'recipients',
        to:3-3-8-196,
        toClass:'Person'
      }
    }
  }
  ... 
}
Examining a Walk Projection for a Path
Let’s examine the first walk projection shown above in the sample DO runner output. This walk projection describes the parts that make up a path from Mark to Lisa through a particular email.
Notice that this walk projection consists of two edge descriptions that share a common vertex—namely, the Email object with the OID 3-3-300-251.
This sample walk projection shows the following general characteristics of a walk:
The “from” vertex of the first edge description is the beginning of the walk.
The “to” vertex of the last edge description is the end of the walk.
The “to” vertex of one edge description is the “from” vertex of the next edge description.
Learning From the Results
The second walk projection shown in Viewing the Result Set illustrates how we can use the result set to learn about the data. If we inspect the OID of the shared Email vertex, and note the attribute names in the component edge descriptions, we see that this walk projection reveals a path that is not in our original graph diagram. This newly discovered path shows us that Lisa and Mark both received the same email from some third party.
We can refine our query to find the name of that third party:
MATCH w = 
   (p1:Person {firstName:'Mark'})-[:receivedEmails]->(e:Email)-[:recipients]->(p2:Person {firstName:'Lisa'}) 
   RETURN e.sender.firstName;
 
{
  _Projection
  {
    e.sender.firstName :'Samuel'
  }
}
Returning Walks That are Trails
You can return walks that are trails to allow vertices to be revisited within a given walk, provided they are revisited through different edges.
You query for trails by i the TRAILS keyword in addition to assigning a variable to an entire pattern and specifying that variable in the RETURN clause. For example, the following graph query returns all trails that connect one specific Person vertex to another through two consecutive edges:
MATCH TRAILS w = (p1:Person {firstName:'Mark'})-[*2]->(p2:Person {firstName:'Lisa'}) 
   RETURN w;
This query uses the same pattern we saw in Returning Walks That are Paths, so the returned trails represent all communication by email or social media between Mark and Lisa. Because this pattern describes the shortest possible walk between two Person vertices in our data set, the result set for trails turns out to be identical to the result set for paths. (The pattern would need to match more edges to enable the walk to revisit any vertex.) Consequently, the DO runner output for trails is the same as that shown for paths in Viewing the Result Set above. Although this example does not reveal any new walks connecting Mark and Lisa, it does demonstrate that every path is also a trail, so a graph query that finds trails will include paths in its result set.
Now let’s look at a more complex graph query whose result set includes some trails that are not paths. The following graph query returns all trails that connect three specific Person vertices, where the vertices for Mark and Jose are connected through two edges, and vertices for Jose and Lisa are connected through four edges:
MATCH TRAILS w = (p1:Person {firstName:'Mark'})-[*2]->(p2:Person {firstName:'Jose'})-[*4]->(p3:Person {firstName:'Lisa'}) 
   RETURN w;
The trails returned by this query represent all communications between Mark and Lisa that involve some kind of contact between Mark and Jose.
Examining a Walk Projection for a Trail
If you execute the example statement using the DO runner, the output (in the default format) lists multiple walks, each describing a matched trail. Rather than showing an excerpt from the output, we’ll choose just one walk projection and examine it. The chosen walk projection describes the parts that make up a trail from Mark to Jose to Lisa through a particular set of posts and emails.
Notice that this walk projection consists of six edge descriptions. (Because the trail is lengthy, the output was edited a little to combine most of the curly braces with lines of text.) What makes this walk a trail (and not a path) is the presence of a revisited vertex. In particular, the “to” vertex of EDGE 4 is the same as the “from” vertex of EDGE 1. (This same vertex is also listed as the “from” vertex in EDGE 5, indicating that the trail continues beyond the revisited vertex.)
Getting Useful Results
When querying for trails, it can be especially important to make the query pattern as precise as possible.
For example, consider the following graph query, which returns all trails that connect one specific Person vertex to another through four consecutive nonspecific edges:
MATCH TRAILS w = (p1:Person {firstName:'Mark'})-[*4]->(p2:Person {firstName:'Lisa'}) 
   RETURN w;
This query can return some potentially useful results. For example, among the walks that might be returned, we could find out that Mark received a social-media post from someone who also sent an email to Lisa. Or we might find out that Mark owns a phone that sent a text message to a phone owned by Lisa.
However, because the pattern is not specific about which type of edges to traverse, this query can also return trails that simply “backtrack” over pairs of inverse edges without revealing any particularly useful information. An example of such a trail is shown in the following diagram. This trail traverses from Mark to an email he has sent, returns to Mark by way of the inverse edge, and then proceeds through a second email to Lisa.
You may be able to avoid bloating your result set with non-informative trails by using more precise patterns to describe the trails of interest. For example, the pattern in the following statement matches trails only if they pass through some intermediate Person vertex that is different from the vertices representing Mark and Lisa:
MATCH TRAILS w = (p1:Person {firstName:'Mark'})-[*2]->(x:Person)-[*2]->(p2:Person {firstName:'Lisa'}) 
   WHERE p1 != x AND x != p2 
   RETURN w;
A second statement would be needed to find trails that pass through an intermediate Phone vertex:
MATCH TRAILS w = (p1:Person {firstName:'Mark'})-[*2]->(x:Phone)-[*2]->(p2:Person {firstName:'Lisa'}) 
   RETURN w;
Limiting the Number of Returned Walks
You can limit the number of walks that are returned by a DO graph query. Doing so can be especially useful when you are testing a query that could have a particularly large result set, and could therefore take a long time to complete.
You limit a graph query’s result set by including the LIMIT keyword with an integer count, and then enclosing the pattern in parentheses. The LIMIT keyword follows the variable assignment.
For example, the following statement returns at most two walk projections:
MATCH w = LIMIT 2 ((p1:Person {firstName:'Mark'})-[*2]->(p2:Person {firstName:'Lisa'}))
   RETURN w;
The ordering of returned walks is determined solely by the DO query language processor. Consequently, you cannot organize the walks in any particular order before requesting the first N of them.