Objectivity/DB Spark Adapter : Spark Adapter Reference : Reading (Loading) Data From the Federated Database
Reading (Loading) Data From the Federated Database
Reading data from a federated database is accomplished using the Objectivity/DB Spark Adapter and the load API on the SparkSQL DataFrame reader interface. The load API is standard for the SQLContext and is generalized for reading any integrated data source.
Loading Data from Objectivity/DB
The following example creates a data frame representing every customer instance in the federated database. The Objectivity/DB Spark Adapter is specified with the implementation package name (com.objy.spark.sql) as the format.
val customersDF = sqlContext.read.
	format("com.objy.spark.sql").
	option("objy.bootFilePath", "data/customers.boot").
	option("objy.dataClassName", "com.objy.spark.demo.Customer").
	load()
The boot file for the federated database and the full class name for the class of interest must be provided. The resulting data frame for a given class always includes every instance of that class in the federated database.
Reader Option Reference
The objy.bootFilePath and objy.dataClassName are the only options for which you must supply values.
 
Reader Option Name
Default
Description
objy.bootFilePath
NONE
The path to the boot file that will be used by the adapter to connect to the Objectivity/DB federated database.
objy.dataClassName
NONE
Each data frame is based on an Objectivity/DB class. The data frame represents all instances of this class that exist in the target federated database. The data frame schema will be based on the Objectivity/DB schema definition for this class.
The class must exist in the target federation in order for the data frame to be constructed.
objy.addOidColumn
NONE
In addition to the attributes defined in the target class, add a new temporary column to hold a long integer representation of the OID for each instance.
The value for this option is a string name for the OID column (for example, "customerId"). The column name cannot conflict with any attribute name in the target class. By default this column is not created.
When you read data from the federated database into a data frame, you need to use this option if your intention is to update specific objects by OID when you write data back to the federated database.
objy.sessionPurpose
spark_read
Overrides the default transaction mode, which provides a multiple readers and one writer (MROW) mode for file locking; see About Lock Servers in Objectivity/DB Administration.
Optionally specify spark_read_nomrow  to disable the default concurrency mechanism.
objy.containersPerPartition
5
Internally, the adapter informs Spark of physical partitions within the underlying data source. Spark uses this information to divide the work of querying and reading data among its workers within the cluster. For Objectivity/DB, the natural unit of partitioning is the container. The default value of 5 means that each partition gets 5 containers.
 
For a very large federated database, there may be many thousands of containers holding objects of the target data type. Because each partition requires a new transaction, you can consider using a higher value for this option to reduce the number of transactions and improve performance.
Schema Translation
SparkSQL data frames use a high level schema abstraction that applies to all data loaded from supported data sources. This allows all data to be filtered, queried and even joined using a common schema system. This schema is described by the following set of classes in the Spark distribution:
 
Spark Class
Description
StructType
A sequence of StructFields, analogous to the Objectivity/DB schema class. These are used to represent the top level class and any nested or embedded types it contains.
StructField
Contains a name and DataType for the represented field. Also specifies whether the field may be null and any other metadata specific to the underlying implementation. Analagous to the Objectivity/DB schema attribute.
DataType
Base class for all supported types for fields
When a data frame is created from an Objectivity/DB schema class, a conversion is performed between the Objectivity/DB type system and the StructType representation. The following table outlines the various attribute types in Objectivity/DB schema classes and their equivalent data type in the data frame:
 
Family of
Objectivity/DB Data Types
Description
Equivalent
Spark Data Types
Boolean
Values that are either true or false
 
   Binary encoding
Byte
8-bit Boolean values
BooleanType
Character
Characters within a string
 
   Binary encoding
B8, B16,
B32
8-bit, 16-bit, and 32-bit character values.
StringType
No specific character type in Spark SQL
Integer
Signed or unsigned whole numbers
 
   Signed encoding
 
B8
Signed 8-bit
ByteType
B16
Signed 16-bit
ShortType
B32
Signed 32-bit
IntegerType
B64
Signed 64-bit
LongType
   Unsigned encoding
B8
Unsigned 8-bit
ShortType
 
B16
Unsigned 16-bit
IntegerType
 
B32
Unsigned 32-bit
LongType
 
B64
Unsigned 64-bit
LongType
Real
Real numbers with a fractional part
 
   IEEE encoding
B32
32-bit real (floating-point) value
FloatType
 
B64
64-bit real (floating-point) value
DoubleType
String
Sequences of characters
 
   Byte encoding
Fixed, Variable, Optimized
Strings that use an 8-bit character encoding such as ASCII or ISO-Latin1, and are either fixed-length, variable-length, or optimized for (but not limited to) a particular length.
StringType
   Utf8 encoding
   Utf16 encoding
   Utf32 encoding
Fixed, Variable, Optimized
Strings that use Unicode UTF-8, UTF-16, or UTF-32 character encoding, and are either fixed-length, variable-length, or optimized for (but not limited to) a particular length.
StringType
Date
Dates on the Gregorian calendar, with day precision
 
   OBJY encoding
4-byte date values from January 1, 0001 to December 31, 9999.
DateType
   ODMG encoding
   SQL, NullableSQL encodings
Backward compatibility with ODMG or Objectivity/SQL++ date values.
DateTime
Date-times (timestamps) with 100-nanosecond precision
TimestampType
   OBJY encoding
8-byte date-time values from January 1, 0001, 00:00:00.0000000 to December 31, 9999, 23:59:59.9999999.
   SQL, NullableSQL encodings
   JavaDate encoding
   JavaTimeStamp encoding
Backward compatibility with Objectivity/SQL++ or Objectivity for Java date-time values.
Reference
Object references encapsulating the Objectivity/DB object identifier (OID) of a persistent object of a referenceable class described in the schema.
LongType
List
Sequences of elements that may be non-unique
Lists of ordered or unordered elements.
Fixed-size or variable-size lists.
ArrayType
For in-depth information about Objectivity/DB schema types, see Objectivity/DB Schema in Objectivity/DB Administration.