Joining Data From Different Sources
In this section, you load a new data frame with selected customer data from the federated database, then perform an INNER JOIN that combines this data with data loaded from a JSON file. The postal code, which is present in both data sources, is used as the common column for the join.
The federated database created previously will serve as an application data store, a Spark data source, and a metadata store.
1. Open src\main\scala\com\thingspan\spark\demo\RunDemo.scala and examine the code that loads the data sources:
val zipsDF = sqlContext.read.json(zipFile)
val customersDF = sqlContext.read.format("com.objy.spark.sql").
  option("objy.bootFilePath", bootFile).
  option("objy.dataClassName", "com.thingspan.spark.demo.Customer").
  load()
The code loads a JSON file (etc\zips.json) that maps zip codes to city names, and it loads the Objectivity data source that contains the customer objects already present in the federated database.
For more information about options for the Objectivity/DB Spark Adapter (the data frame reader), see Reader Option Reference.
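To get a feel for the JSON side of step 1 without the federated database, the following is a minimal sketch, assuming the Spark 1.x SQLContext API used in this tutorial. The object name ReadJsonSketch, the helper sampleZips, and the two sample records are made up for illustration; only the _id and city field names come from the tutorial's query. Note that sqlContext.read.json expects one JSON object per line by default.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object ReadJsonSketch {
  // Writes two made-up zip records to a temporary file (one JSON object
  // per line) and loads them the same way the tutorial loads zips.json.
  def sampleZips(sqlContext: SQLContext): DataFrame = {
    val sample = Seq(
      """{"_id": "94558", "city": "NAPA"}""",
      """{"_id": "95648", "city": "LINCOLN"}"""
    ).mkString("\n")
    val zipFile = Files.createTempFile("zips-sample", ".json")
    Files.write(zipFile, sample.getBytes(StandardCharsets.UTF_8))
    sqlContext.read.json(zipFile.toString)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReadJsonSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val zipsDF = sampleZips(new SQLContext(sc))
      zipsDF.printSchema() // the schema is inferred from the JSON records
      zipsDF.show()
    } finally sc.stop()
  }
}
```

Calling printSchema() on a freshly loaded data frame is a quick way to confirm that the column you intend to join on is present and has the type you expect.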
2. Still in RunDemo.scala, examine the code that saves the results of the join operation, creating new target customers in the Objectivity/DB federated database:
  option("objy.bootFilePath",  bootFile).
  option("objy.dataClassName", "com.thingspan.spark.demo.TargetCustomer").
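The two options above are only part of the statement; its beginning and end are not shown here. As a rough sketch of how such a save might be structured, assuming the adapter is driven through Spark's standard DataFrameWriter (write, format, option, save): the object SaveSketch, the helper saveTargetCustomers, and its parameters are hypothetical names, and the actual RunDemo.scala code may differ, for example by setting a save mode.

```scala
import org.apache.spark.sql.DataFrame

object SaveSketch {
  // Hypothetical helper, not taken from RunDemo.scala.
  // The format name, option keys, and class name are the ones shown in
  // this tutorial; routing them through write ... save() is an assumption.
  def saveTargetCustomers(resultDF: DataFrame, bootFile: String): Unit =
    resultDF.write.format("com.objy.spark.sql").
      option("objy.bootFilePath", bootFile).
      option("objy.dataClassName", "com.thingspan.spark.demo.TargetCustomer").
      save()
}
```

Running this requires the Objectivity/DB Spark Adapter on the classpath and a valid boot file path, so treat it as a reading aid rather than a drop-in replacement for the tutorial code.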
3. Look at the runDemo task in the build.gradle file in the ObjySparkTutorial directory and examine the Spark SQL query that is used:
\"SELECT customers.firstName, customers.lastName, zips.city, customers.age FROM customers INNER JOIN zips ON customers.zipCode=zips._id WHERE customers.firstName = 'Daz'\"
This query selects fields from both the temporary customers table created from the federated database and the temporary zips table created from the JSON source file. The query performs a join on the zip codes and then filters for customers whose first name is Daz.
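To see how the join and the filter interact, here is a small self-contained sketch that runs the same query against two in-memory tables, assuming the Spark 1.x SQLContext API. The case classes Customer and Zip, the object JoinSketch, and all sample rows are hypothetical stand-ins, not the adapter's classes or the tutorial's data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}

// Hypothetical stand-ins for the tutorial's data (not the adapter's classes).
case class Customer(firstName: String, lastName: String, zipCode: String, age: Int)
case class Zip(_id: String, city: String)

object JoinSketch {
  // Registers two small temporary tables and runs the tutorial's query.
  def targetRows(sqlContext: SQLContext): Array[Row] = {
    import sqlContext.implicits._
    val sc = sqlContext.sparkContext

    val customersDF = sc.parallelize(Seq(
      Customer("Daz", "Jones", "94558", 26), // matches a zip and passes the filter
      Customer("Ann", "Lee", "94558", 41),   // matches a zip but fails the filter
      Customer("Daz", "Smith", "00000", 65)  // no matching zip, dropped by the join
    )).toDF()
    val zipsDF = sc.parallelize(Seq(Zip("94558", "NAPA"))).toDF()

    customersDF.registerTempTable("customers")
    zipsDF.registerTempTable("zips")

    sqlContext.sql(
      "SELECT customers.firstName, customers.lastName, zips.city, customers.age " +
      "FROM customers INNER JOIN zips ON customers.zipCode = zips._id " +
      "WHERE customers.firstName = 'Daz'"
    ).collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JoinSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try targetRows(new SQLContext(sc)).foreach(println)
    finally sc.stop()
  }
}
```

With this sample data, only the Daz Jones row survives: the inner join discards the customer whose zip code has no match in zips, and the WHERE clause discards the customer with a different first name.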
4. From the ObjySparkTutorial directory, run the task as follows:
gradlew runDemo
At the end of the console output, you can see that the results for the queried customers named Daz now include the city of residence. For example:
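+---------+--------+-------------+---+
|firstName|lastName|         city|age|
+---------+--------+-------------+---+
|      Daz|   Jones|         NAPA| 26|
|      Daz|   Smith|   TRAVIS AFB| 65|
|      Daz|   Jones|      LINCOLN| 29|
+---------+--------+-------------+---+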
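The same result can also be expressed with Spark's DataFrame API instead of a SQL string. The following is a sketch only, not the approach the tutorial's runDemo task takes; the object JoinApiSketch and the helper targetCustomers are hypothetical names, and the column names are the ones used in the tutorial's query.

```scala
import org.apache.spark.sql.DataFrame

object JoinApiSketch {
  // Inner-joins customers to zips on the zip code, keeps customers whose
  // first name is Daz, and selects the same four columns as the SQL query.
  def targetCustomers(customersDF: DataFrame, zipsDF: DataFrame): DataFrame =
    customersDF.
      join(zipsDF, customersDF("zipCode") === zipsDF("_id"), "inner").
      where(customersDF("firstName") === "Daz").
      select(
        customersDF("firstName"),
        customersDF("lastName"),
        zipsDF("city"),
        customersDF("age"))
}
```

Passing the two data frames loaded in step 1 to this helper and calling show() on the result should print a table in the same shape as the one above.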