Objectivity/DB Spark Adapter : Spark Adapter Tutorial : Creating Relationships
5 
Creating Relationships
In this topic, you will:
Learn how relationships are defined in the schema.
Learn how to create objects and add relationships between them using join operations.
Understanding Relationships in Schema
Let’s look at how relationships are defined in the schema by examining some class definitions:
1. Open the etc\schema.txt file and consider the class definitions for the Subscriber and Phone classes. In particular, note the attributes that implement the relationships between these two classes:
...
   CREATE CLASS com.thingspan.spark.demo.Subscriber 
   {
      ... 
      phones: List { Element: Reference { Referenced: com.thingspan.spark.demo.Phone } },
      ... 
   }
 
   CREATE CLASS com.thingspan.spark.demo.Phone 
   {
      ... 
      subscriber: Reference { Referenced: com.thingspan.spark.demo.Subscriber },
      ... 
   }
...
As you can see, objects can have relationships to other objects through references or lists of references. According to the schema, a subscriber will own one or more phones, and each phone will have a single subscriber.
2. (Optional) Examine other classes in the schema file to look at other attributes that represent relationships.
The complete schema for the new classes is represented by the following UML diagram.
In addition to the relationships between subscriber and phone objects, note that each phone has a list of incoming and outgoing phone calls and a city where it is registered. Each phone call will have an originating phone (the fromPhone) and a receiving phone (the toPhone).
Creating Objects with Relationships
Now let’s create the subscriber and phone objects and add the relationships between them. There are three Gradle tasks for this purpose:
createSubscribers creates Subscriber instances and provides values for the simple attributes.
createPhones creates Phone instances and provides values for the simple attributes.
createSubscriberAndPhoneRelationships creates the relationships between Subscriber and Phone objects using inner joins.
Creating Subscribers, Phones, and the Relationships Between Them
To create the Subscriber and Phone objects and the relationships between them:
1. Open src\main\scala\com\thingspan\spark\demo\CreateSubscribers.scala and CreatePhones.scala in your preferred editor.
Note the relationship fields for these two classes. Subscribers have an array of phones, and phones have a single subscriber to match the existing schema in the federated database. The types for both are Long as they will need to hold OIDs, which are long integer types in a federated database.
Subscriber Class
Phone Class
case class Subscriber(
  idNumber: Int,
  phones: ArrayBuffer[Long],
  firstName: String,
  lastName: String
)
case class Phone(
  idNumber: Int,
  phoneName: String
  subscriber: Long,
  incomingCalls: ArrayBuffer[Long],
  outgoingCalls: ArrayBuffer[Long]
)
Both classes have an idNumber field. These fields are designed to hold a pair of matching values that would be assigned by the entity providing phones to subscribers. These assigned values will be used as a column for performing a join operation later in the tutorial.
Important: When creating an object with a reference attribute, you cannot omit the value for the reference attribute. The value must either be set to a valid reference or to 0. Otherwise an attempt will be made to resolve the unset reference attribute and problems will result.
2. From the ObjySparkTutorial directory, execute the Gradle tasks to create the Subscriber and Phone objects:
gradlew createSubscribers
gradlew createPhones
Output messages show information about the created objects, for example:
Subscribers
Phones
|idNumber|phones|firstName|lastName|
+--------+------+---------+--------+
|       1|    []|      Kim|   White|
|       2|    []|     Kate|   Black|
|       3|    []|    Helen|    Wood|
|       4|    []|     Rose|   Quinn|
|       5|    []|      Ali|   Jones|
|       6|    []|    Patty|   Jones|
|       7|    []|      Kim|   Clark|
|       8|    []|      Sue|    Sims|
|       9|    []|      Pat|    Wood|
|      10|    []|     Rose|   Black|
+--------+------+---------+--------+
+--------+----------+-------------+-------------+
|idNumber| phoneName|incomingCalls|outgoingCalls|
+--------+----------+-------------+-------------+
|       1|    xPHONE|           []|           []|
|       2| Turbocomm|           []|           []|
|       3| neuPhonez|           []|           []|
|       4|    xPHONE|           []|           []|
|       5| Turbocomm|           []|           []|
|       6|FloatPhone|           []|           []|
|       7|    xPHONE|           []|           []|
|       8| fastPhone|           []|           []|
|       9| neuPhonez|           []|           []|
|      10| fastPhone|           []|           []|
+--------+----------+-------------+-------------+
You can see that the information for related objects has yet to be provided. For example, Subscriber objects do not yet have their phones arrays populated.
3. Look inside one or more of the data\loc# directories to verify that the database files are being distributed into these locations.
4. Open src\main\scala\com\thingspan\spark\demo\createSubscriberAndPhoneRelationships.scala.
Look at the code that reads in data frames for subscribers and phones. In particular, note that objy.addOidColumn is used for both. This creates a new column with the given name that includes the OID for each object loaded from the federated database.
Subscriber Data Frame
Phone Data Frame
val subscriberDF = sqlContext.read.
  format("com.objy.spark.sql").
  option("objy.bootFilePath", bootFile).
  option("objy.dataClassName",
    "com.thingspan.spark.demo.Subscriber").
  option("objy.addOidColumn", "subscriberOid").
  load
 
subscriberDF.registerTempTable("subscriberTable")
val phoneDF = sqlContext.read.
  format("com.objy.spark.sql").
  option("objy.bootFilePath", bootFile).
  option("objy.dataClassName",
    "com.thingspan.spark.demo.Phone").
  option("objy.addOidColumn", "phoneOid").
  load
 
phoneDF.registerTempTable("phoneTable")
5. Now look at the code that provides the array of Phone objects for the phones attribute of each Subscriber object.
sqlContext.udf.register("toBuffer", (num: Long) => new ArrayBuffer[Long](1) += num)
...
queryForUpdateSubscriberTable.append("SELECT subscriberOid, toBuffer(phoneOid) as phones ")
queryForUpdateSubscriberTable.append("FROM phoneTable INNER JOIN subscriberTable ")
queryForUpdateSubscriberTable.append("ON subscriberTable.idNumber=phoneTable.idNumber ")
queryForUpdateSubscriberTable.append("ORDER BY subscriberOid")
After the subscriberOid column, you can see that a toBuffer function is used to create a buffered array with the phoneOids, and this is used as the value for the phones column. An INNER JOIN of the subscriberTable and the phoneTable is performed by matching the idNumbers of Phone objects and Subscriber objects.
6. (Optional) Look at the code that provides the Subscriber for a given phone.
queryForUpdatePhoneTable.append("SELECT phoneOid, subscriberOid as subscriber ")
queryForUpdatePhoneTable.append("FROM phoneTable INNER JOIN subscriberTable ")
queryForUpdatePhoneTable.append("ON subscriberTable.idNumber=phoneTable.idNumber ")
queryForUpdatePhoneTable.append("ORDER BY phoneOid")
7. From the ObjySparkTutorial directory, run the task as follows:
gradlew createSubscriberAndPhoneRelationships
8. Output messages show information about the created relationships:
You can see that each subscriber now has an array of phones, and each phone now has a single subscriber.
Subscriber data frame:
+---------------+------------------+
|  subscriberOid|            phones|
+---------------+------------------+
|844437815099396|[1407387768520708]|
|844437815099399|[1407387768520710]|
|844437815099402|[1407387768520712]|
|844437815164929|[1407387768586241]|
|844437815164932|[1407387768586243]|
|844437815164935|[1407387768586245]|
|844437815164938|[1407387768586247]|
|844437815164941|[1407387768586249]|
|844437815164944|[1407387768586251]|
|844437815164947|[1407387768586253]|
+---------------+------------------+
Phone data frame:
+----------------+---------------+
|        phoneOid|     subscriber|
+----------------+---------------+
|1407387768520708|844437815099396|
|1407387768520710|844437815099399|
|1407387768520712|844437815099402|
|1407387768586241|844437815164929|
|1407387768586243|844437815164932|
|1407387768586245|844437815164935|
|1407387768586247|844437815164938|
|1407387768586249|844437815164941|
|1407387768586251|844437815164944|
|1407387768586253|844437815164947|
+----------------+---------------+
 
Creating Relationships Between Phones and PhoneCalls
In this section you’ll create PhoneCall objects, then create the relationships between Phones and PhoneCalls.
1. Create the PhoneCall objects:
gradlew createPhoneCalls
2. (Optional) Open src\main\scala\com\thingspan\spark\demo\CreatePhoneCallAndPhoneRelationships.scala in your preferred editor and look at the sections of code that create the relationships.
The incoming and outgoing call attributes are one-to-many relationships. Each of these attributes requires two inner join operations in order to provide the array of values. The first inner join uses the idNumber on Phone and PhoneCall tables to perform a join. The second inner join is performed after a reduce operation organizes each of the relationships (tuples) in the to-many into OIDs that can be written back to the data frame.
3. Create the relationships:
gradlew createPhoneCallAndPhoneRelationships
Output messages show information about the created relationships.
 
Verifying Results
1. From the ObjySparkTutorial directory, start the DO runner in interactive mode as follows:
objy DO -outputFormat JSON -bootFile data\customers.boot 
The DO prompt displays.
2. Enter the following statement, which queries for every PhoneCall object:
FROM com.thingspan.spark.demo.PhoneCall RETURN *;
Examine the output JSON array, which shows the field values for each PhoneCall object:
[
 
  {
    "__class__":"com.thingspan.spark.demo.PhoneCall",
    "__identifier__":"7-3-1-4",
    "idNumber":1,
    "fromPhone":5-3-2-7,
    "toPhone":5-3-2-5,
    "fromPhoneId":7,
    "toPhoneId":6
  },
  {
    "__class__":"com.thingspan.spark.demo.PhoneCall",
    "__identifier__":"7-3-1-5",
    "idNumber":2,
    "fromPhone":5-3-1-6,
    "toPhone":5-3-2-13,
    "fromPhoneId":2,
    "toPhoneId":10
  },
  ...
Note that the fromPhone and toPhone attributes hold the OIDs for the originating and target phone. Thus you can see that the relationships are established.
3. (Optional) Try the following DO statements, which leverage relationships to qualify objects and return them.
DO Statement
Returns
FROM com.thingspan.spark.demo.PhoneCall WHERE toPhone.phoneName='Turbocomm' RETURN *;
All phone calls where the target phone (the toPhone) is a Turbocomm phone.
FROM com.thingspan.spark.demo.Subscriber WHERE ANY(phones, phoneName='Turbocomm') RETURN *;
All subscribers that own at least one Turbocomm phone.
4. To exit the DO runner, type /EXIT.