Writing to the Federated Database
The Objectivity/DB Spark Adapter can write the contents of a data frame to an Objectivity/DB federated database. For example, the adapter can:
- Read data from an external source (such as a JSON file) into a data frame, then write that data to a federated database, akin to an extract, transform, and load (ETL) operation.
- Append the results of a Spark job to an existing federated database.
- Replace the contents of an existing federated database with the contents of a data frame.
- Update objects in a federated database by reference or key field with the contents of a data frame.
Writing a Data Frame to a Federated Database
To write data to a federated database, use the Objectivity/DB Spark Adapter with the write API of the Spark SQL DataFrame writer interface.
The following example writes instances of a new target customer class to the federated database. Because this is a new class, its schema does not yet exist in the federated database, so the adapter creates it; see Schema Generation.
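import org.apache.spark.sql.SaveMode

// targetCustomersDF is an existing data frame; bootFile holds the path
// to the boot file of the target federated database.
targetCustomersDF.write.
	mode(SaveMode.Overwrite).
	format("com.objy.spark.sql").
	option("objy.bootFilePath",  bootFile).
	option("objy.dataClassName", "com.objy.spark.demo.TargetCustomer").
	save()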
In this example:
- The SaveMode modifier, Overwrite, is used because the objects do not yet exist in the federated database, so an append is not appropriate (see the table below).
- The format call specifies the package name of the Objectivity/DB Spark Adapter implementation (com.objy.spark.sql).
- Two additional options specify the boot file for the federated database and the fully qualified name of the target class.
The following table describes the Spark SaveMode modifiers that are available.

| Spark SQL Save Mode | If Instances of the Named Class Exist |
| --- | --- |
| SaveMode.ErrorIfExists (default) | Throw an exception. |
| SaveMode.Append | Append the contents of the data frame to the existing data. |
| SaveMode.Overwrite | Overwrite existing data with the contents of the data frame; see Writer Option Reference. |
| SaveMode.Ignore | Do not save the contents of the data frame and do not change the existing data (similar to a CREATE TABLE IF NOT EXISTS in SQL). |
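As a rough sketch of the ETL case, the following reads a JSON file into a data frame and appends its rows to the existing instances of a class. The application name, the local master setting, and the two paths taken from the command line are placeholders chosen for illustration; the format string, option names, and class name are the ones used in the example above.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object AppendCustomers {
  def main(args: Array[String]): Unit = {
    // Assumed: a local Spark session; cluster settings will differ.
    val spark = SparkSession.builder()
      .appName("objy-append-sketch")
      .master("local[*]")
      .getOrCreate()

    // Placeholder paths, supplied as program arguments.
    val jsonPath = args(0) // JSON file holding the source records
    val bootFile = args(1) // boot file of the target federated database

    // Extract: read the external source into a data frame.
    val customersDF = spark.read.json(jsonPath)

    // Load: append the rows to the existing instances of the class.
    customersDF.write
      .mode(SaveMode.Append)
      .format("com.objy.spark.sql")
      .option("objy.bootFilePath", bootFile)
      .option("objy.dataClassName", "com.objy.spark.demo.TargetCustomer")
      .save()

    spark.stop()
  }
}
```

Leaving out the mode call falls back to SaveMode.ErrorIfExists, so a second run against the same class would throw an exception rather than append.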
Creating and Updating Objects and Relationships
For most database models, the standard save modes for writing data frames (above) are sufficient. Appending rows to a table or dropping and recreating a table is a relatively straightforward operation. Because relationships in an RDBMS are typically modeled by rows containing foreign keys for edge endpoints, appending rows to such a table is the equivalent of adding to a relationship in an Objectivity/DB federated database.
Updating a relationship is more like updating row data in an RDBMS and is not supported by the standard save modes on the Spark data frame writer. This is somewhat problematic because an update operation is more complex and expensive than an append or replace operation.
The Objectivity/DB Spark Adapter provides two additional options, updateByOid and updateByValue, to support updates of this type. The two options are mutually exclusive.
These options can also be used to update simple attribute values on existing objects when used with the Append save mode.

| Spark SQL Save Mode | Update Option | Description |
| --- | --- | --- |
| SaveMode.Append | updateByOid | Find the object by OID, then modify the object in one of two ways: update simple attribute values, or append a value (or an array of values) to an existing relationship, effectively adding a new relationship to an existing object. |
| SaveMode.Append | updateByValue | Seek the object by column value; if the object does not exist, create it. Then optionally modify the object in one of two ways: update simple attribute values, or append a value (or an array of values) to an existing relationship, effectively adding a new relationship. |
| SaveMode.Overwrite | updateByOid | Find the object by OID, then replace the contents of the relationship with those in the data frame, without overwriting the object itself. Include only the columns that are to be updated in the data frame; existing values for any other column it contains will be lost. |
| SaveMode.Overwrite | updateByValue | Find the object by value column, then replace the contents of the relationship with those in the data frame. Include only the columns that are to be updated in the data frame; existing values for any other column it contains will be lost. |
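The following is a minimal sketch of an attribute update by OID using the Append save mode. Several details are assumptions rather than documented behavior: that the reader accepts the same format string and connection options as the writer, that objy.addOidColumn takes the name of the OID column to add, and that the class has a simple attribute named status. The column name customerOid is likewise hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.lit

object UpdateByOidSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("objy-update-by-oid-sketch")
      .master("local[*]")
      .getOrCreate()

    val bootFile = args(0) // placeholder: path to the boot file
    val className = "com.objy.spark.demo.TargetCustomer"

    // Assumption: the reader mirrors the writer's format string and
    // connection options, and objy.addOidColumn names the added column.
    val customersDF = spark.read
      .format("com.objy.spark.sql")
      .option("objy.bootFilePath", bootFile)
      .option("objy.dataClassName", className)
      .option("objy.addOidColumn", "customerOid")
      .load()

    // Keep only the OID column plus the attribute to change.
    // "status" is a hypothetical simple attribute of the class.
    val updatesDF = customersDF
      .select("customerOid")
      .withColumn("status", lit("contacted"))

    // Append + updateByOid: update attribute values on existing objects.
    updatesDF.write
      .mode(SaveMode.Append)
      .format("com.objy.spark.sql")
      .option("objy.bootFilePath", bootFile)
      .option("objy.dataClassName", className)
      .option("objy.updateByOid", "customerOid")
      .save()

    spark.stop()
  }
}
```

Selecting only the OID column and the changed attribute keeps the write narrow, which matters most with SaveMode.Overwrite, where the description warns that values can be lost.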
Schema Mapping
When a data frame for an existing class is written to the federated database, the writer checks that the schemas are compatible, then performs all necessary conversions between the field types of the data frame and the attributes for the existing class in the federated database. These conversions can include:
- Widening of numeric types. For example, storing a ShortType from the data frame as a 64-bit integer representation in the federated database.
- Conversion of references. When a LongType column maps to a reference attribute in the schema class, its value is interpreted as an encoded OID.
- Conversion to string. Most types can be represented as a String value.
If this is not the desired behavior, the writer supports a strictSchema option that enforces exact conversions.
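Independently of the adapter, conversions can also be made explicit with standard Spark SQL casts before the data frame reaches the writer. The sketch below assumes two hypothetical columns, quantity and zipCode; it does not show the strictSchema option itself, because its full name and accepted values are not given in this section.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{LongType, StringType}

object ExplicitCasts {
  // Returns a copy of df whose column types are chosen explicitly
  // instead of being left to the writer's automatic conversions.
  // "quantity" and "zipCode" are hypothetical column names.
  def withExplicitTypes(df: DataFrame): DataFrame =
    df
      // Widen the numeric column to a 64-bit integer.
      .withColumn("quantity", col("quantity").cast(LongType))
      // Represent the code as a string.
      .withColumn("zipCode", col("zipCode").cast(StringType))
}
```

Calling ExplicitCasts.withExplicitTypes(df).printSchema() before the write shows the types the writer will receive.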
Schema Generation
Even though the objy.dataClassName option on the writer interface is required, the named class need not exist in the target federated database. The writer can define and create the Objectivity/DB schema based on the data frame schema definition, then write the rows of the data frame as instances of the new type.
This is a useful feature for extract, transform, and load (ETL) operations where data from one source is to be brought into an Objectivity/DB federated database. Remember that the schema mapping is very literal unless guidance is provided. For example, a LongType column will be written as a numeric attribute even if it logically represents an association (an encoded OID reference) in the data frame.
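Because the generated schema follows the data frame schema literally, it can help to review the column types before the first write. The helper below is only an illustrative sketch using the standard Spark schema API; the object and method names are made up for this example. It lists the top-level LongType columns, which are the ones that would be written as plain numeric attributes.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.LongType

object SchemaReview {
  // Names of all top-level LongType columns in the data frame. With a
  // literal schema mapping these become numeric attributes, so any that
  // actually hold encoded OIDs deserve a closer look before writing.
  def longColumns(df: DataFrame): Seq[String] =
    df.schema.fields.filter(_.dataType == LongType).map(_.name).toSeq
}
```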
Writer Option Reference
objy.bootFilePath and objy.dataClassName are the only options for which you must supply values.

| Writer Option Name | Default | Description |
| --- | --- | --- |
| objy.bootFilePath | NONE | The path to the boot file that the adapter uses to connect to the Objectivity/DB federated database. |
| objy.dataClassName | NONE | The target data class in the Objectivity/DB federated database. If the class exists, the writer attempts to map the fields in the data frame to the attributes specified by the database schema. If the class does not exist, it is created and mapped to the translated data frame schema; see Schema Mapping and Schema Generation. |
| objy.updateByOid | NONE | Indicates that rows in the data frame identify existing objects that must be updated. The value is the name of a column holding the encoded OIDs of the target objects (added using the objy.addOidColumn option on the reader; see Reader Option Reference). |
| objy.updateByValue | NONE | Indicates that rows in the data frame identify existing objects that must be updated. The value is the name of a column holding values for the key fields of the target objects. |
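To show the required and optional writer options together, here is a hedged sketch of an update keyed by value. The key column customerId, the attribute status, and the sample rows are all hypothetical; only the format string and the option names come from this reference.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object UpdateByValueSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("objy-update-by-value-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val bootFile = args(0) // placeholder: path to the boot file

    // Hypothetical data: customerId is the key field, status is a
    // simple attribute to set on each matching object.
    val updatesDF = Seq(
      ("C-1001", "contacted"),
      ("C-1002", "declined")
    ).toDF("customerId", "status")

    updatesDF.write
      .mode(SaveMode.Append)
      .format("com.objy.spark.sql")
      // Required options.
      .option("objy.bootFilePath", bootFile)
      .option("objy.dataClassName", "com.objy.spark.demo.TargetCustomer")
      // Optional: look up target objects by the value in this column.
      .option("objy.updateByValue", "customerId")
      .save()

    spark.stop()
  }
}
```

With the Append save mode, the update table above indicates that a row whose key matches no existing object results in a new object being created.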