External Query Agents

This is informal user documentation for external query agents (EQA), which is a new feature of the Parallel Query Engine in Objectivity Release 10.1.

Purpose of EQA

Objectivity/DB 10.1 extends the Parallel Query Engine so that a predicate scan can fetch information from locations other than Objectivity databases. This completes the concept of a federation by allowing information to be aggregated from multiple sources. Examples of external information sources include relational databases, XML document repositories, and Google queries.

EQA Users

This feature is relevant to designers and programmers of certain kinds of applications, which can be characterized by terms such as data discovery, data mining, or data aggregation.

Conceptual Model

There are many kinds of potential information sources and many possible usage scenarios. To address this, Objectivity lets users implement gateway code that can be "plugged in" to a query server such that an Objectivity predicate scan can operate on an external data source as if it were part of the Objectivity federation. Providing this ability as an extension to the Parallel Query Server allows an external search to proceed in parallel with searches of other sources.

For these external sources, EQA provides only for reading data, not for modifying or adding to any external data source. It is also assumed that the data is changing infrequently, if at all. ACID transaction coordination with external databases is not supported (although the query agent could obtain locks on the external source for the duration of its scan).

The query agent must convert the data that it finds into the form of an instance of the persistent-capable class corresponding to the iterator class being used for the predicate scan. The client program receives the data in the form of a transient instance whose OID is only meaningful within the current transaction. The client program can optionally create a persistent copy of the object. It makes more sense for the client program to handle the caching policy because it would be very difficult to design a sufficiently general application-independent policy.

An external query agent is represented in the federation by a pseudo-database, which has a name and number like any other database. Instead of a file pathname, the catalog entry for the pseudo-database specifies the name of the query server plugin and, optionally, a parameter string for that plugin. The parameter string can be used to specify a particular external data source where multiple sources are available.

The following figure shows how an application's scan operation on a pseudo-database triggers the plugin mechanism to load the external query agent plugin. The query agent then accesses the external data source, creates transient instances representing qualified items, and sends the results back through the query server.

[Figure: External query agent flow]

The only operation that can be performed on a pseudo-database is a predicate scan; an attempt to open the database in any other way results in an exception.

Limitations

  • This feature is not supported for federations using the old catalog format (pre-Release 9.0).
  • The query result objects cannot be large objects (larger than one database page), nor may they have any attached VArrays that are large objects.

Task Descriptions

To use this feature, users need to:
  1. Define the persistent-capable class or classes that will represent information from external sources, if suitable classes don't already exist in the schema.
  2. Implement a query server plugin to construct instances of that class from data in an external data source that matches a given predicate.
  3. Create a pseudo-database in the catalog of the Objectivity federation to represent the external query agent.
  4. Install a plugin specification file that associates the pseudo-database with the plugin library file.

The application program will:

  1. Use parallelScan to scan a pseudo-database or a federation containing one or more pseudo-databases.
  2. Optionally implement its own cache of data thus discovered.

Implement a Query Server Plugin

An external query agent is defined by C++ code in a shared library that:

  • Defines a subclass of the abstract class ooExternalQueryAgent (which is defined in the header ooPQE.h)
  • Implements a createPlugin entry point that returns an instance of the implementation class

A plugin specification file associates the shared library with the "QueryAgent" extension point (a point in the kernel’s execution that accepts the plugin).

The implementation code follows this pattern, in which AnAgent is a placeholder for the user's class name:

  
   #include <ooPQE.h>

   class AnAgent : public ooExternalQueryAgent {
   public:
      AnAgent();   // could have parameters
      ~AnAgent();  // optional; define if needed
      virtual void executeAgent(const char* agentParameters,
                                const ooObjectQualifier* qualifier,
                                ooClusterStrategy& clusterResults,
                                ooExternalQueryManager& manager);
   private:
      // ... declare internal fields and methods ...
   };

   extern "C"
   #ifdef _MSC_VER  // only for Microsoft Windows
   __declspec(dllexport)
   #endif
   void* createPlugin(const char* pluginKey)
   {
      return new AnAgent(); // could pass pluginKey as argument if needed
   }
   

The work is done by implementing this method:

   virtual void ooExternalQueryAgent::executeAgent(const char* agentParameters,
                                                   const ooObjectQualifier* qualifier,
                                                   ooClusterStrategy& clusterResults,
                                                   ooExternalQueryManager& manager)

This method will be called from one of the query threads to perform the scan of an external data source.
The arguments are:

agentParameters
The parameter string specified when the pseudo-database was created, or NULL if none.
qualifier
The object qualifier, which contains the type number and the predicate query in the form of an expression tree; see the Functional Specification for Objectivity Expression Tree Interface on the Customer Support Web site. Each result value must be an instance of the class corresponding to this type number, or of a subclass of it. The method is not required to use this parameter, because results are automatically tested against the qualifier afterwards; it is provided so that the agent can limit the data requested from the external data source or filter the input data before constructing a result object. Typically, the agent translates the qualifier into a query for the external data source.
clusterResults
Cluster strategy to be used by the agent when creating result objects. The agent creates objects as if they were persistent, but this special cluster strategy causes them to be allocated on a transient page that will be transmitted back to the client instead of being written to a database file. This cluster strategy does not need or use a "near" object parameter.
manager
This argument is an instance of the following class (declared in file ooPQE.h):

   class ooExternalQueryManager {
   public:
      void reportResult(ooHandle(ooObj)& result);
      void reportError(const char* message);
      ooBoolean cancelledScan();
   private:
      // ... [private members not shown] ...
   };
              

The member function reportResult is to be called for each result to be reported to the client, passing the object handle as the argument.

The member function reportError may be called to report an error. The message will be passed on to appear in an exception thrown in the client application.

The member function cancelledScan may be called to test whether the client application has cancelled the scan. If this returns true, the query agent function should return immediately because any results reported after that point are ignored. (This function simply tests a status variable; it does not involve any additional communication with the client process, so there is negligible cost to calling it frequently.)
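
The contract of these three member functions can be exercised without a query server. The following is a minimal sketch under stated assumptions: FakeQueryManager is a hypothetical stand-in (not an Objectivity class) that accepts plain strings instead of object handles and simulates a client that cancels after a given number of results, and scanSource is an illustrative name for an agent-style loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for ooExternalQueryManager. It only mimics the three
// member functions described above and is NOT the Objectivity class.
class FakeQueryManager {
public:
    // cancelAfter: number of accepted results after which the simulated
    // client is considered to have cancelled the scan.
    explicit FakeQueryManager(std::size_t cancelAfter) : cancelAfter_(cancelAfter) {}

    void reportResult(const std::string& result) {
        // Results reported after cancellation are ignored.
        if (!cancelledScan()) results_.push_back(result);
    }
    void reportError(const char* message) { errors_.push_back(message); }
    bool cancelledScan() const { return results_.size() >= cancelAfter_; }

    const std::vector<std::string>& results() const { return results_; }
    const std::vector<std::string>& errors() const { return errors_; }

private:
    std::size_t cancelAfter_;
    std::vector<std::string> results_;
    std::vector<std::string> errors_;
};

// Agent-style loop: poll cancelledScan() before each item, report an error
// and return on bad input. Returns how many source items were examined.
template <typename Manager>
std::size_t scanSource(const std::vector<std::string>& source, Manager& manager) {
    std::size_t examined = 0;
    for (const std::string& item : source) {
        if (manager.cancelledScan()) break;  // cheap status check
        ++examined;
        if (item.empty()) {
            manager.reportError("empty record in external source");
            return examined;
        }
        manager.reportResult(item);
    }
    return examined;
}
```

Because the loop is a template over the manager type, the same loop body could in principle be adapted to the real manager class; only the result type would change.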

The executeAgent method will be called in the context of a read-only transaction so that the plugin can optionally look up schema information for the type number. Note that the plugin is not expected to perform any other operations on the Objectivity federation.

For each resulting data item found in the external data source, executeAgent constructs an instance of the class using the provided cluster strategy, and then reports the result to the query server by calling the reportResult method. If the result object satisfies the qualifier, the query server then sends a copy of this data back to the client application's query manager. When there are no more results, executeAgent will return. It can report an error by calling the reportError method before returning. If an ooException is thrown, it will be caught and handled by the caller, so this method does not need to do that.

For example, the skeleton of such a method might look like this:

   void AnAgent::executeAgent(const char* agentParameters,
                              const ooObjectQualifier* qualifier,
                              ooClusterStrategy& clusterResults,
                              ooExternalQueryManager& manager)
   {
      // Type of the requested results and the predicate to apply.
      ooTypeNumber typeNumber = qualifier->getShapeNumber();
      const ooExpression* predicate = qualifier->getPredicateExpression();

      // ... initialize the external search ...
      while (!manager.cancelledScan()) {
         // ... find next result in external data source ...
         if (no_more_results)
            break;
         // Allocate the result on the transient page via the cluster strategy.
         ooHandle(appClass) result = new(clusterResults) appClass(init_params);
         // ... any additional initialization of the result object ...
         manager.reportResult(result);
      }
      // ... cleanup ...
   }
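
For a relational source, the step that initializes the external search might translate the predicate into a WHERE clause. The following is a minimal sketch only: Predicate, makeCompare, makeBinary, quoteSqlString, and toSqlWhere are hypothetical names, and the simplified tree shown here is an assumption that does not reproduce the real Objectivity expression tree interface.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified predicate tree (NOT the Objectivity interface).
struct Predicate {
    enum class Kind { Compare, And, Or };
    Kind kind;
    // Used when kind == Compare:
    std::string field;    // attribute name, assumed to be a trusted identifier
    std::string op;       // assumed to be one of "=", "<>", "<", ">", "<=", ">="
    std::string literal;  // string value to compare against
    // Used when kind == And or Or:
    std::shared_ptr<Predicate> left, right;
};

std::shared_ptr<Predicate> makeCompare(std::string field, std::string op,
                                       std::string literal) {
    auto p = std::make_shared<Predicate>();
    p->kind = Predicate::Kind::Compare;
    p->field = std::move(field);
    p->op = std::move(op);
    p->literal = std::move(literal);
    return p;
}

std::shared_ptr<Predicate> makeBinary(Predicate::Kind kind,
                                      std::shared_ptr<Predicate> left,
                                      std::shared_ptr<Predicate> right) {
    auto p = std::make_shared<Predicate>();
    p->kind = kind;
    p->left = std::move(left);
    p->right = std::move(right);
    return p;
}

// Wraps a value in single quotes, doubling any embedded single quote.
std::string quoteSqlString(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        out += c;
        if (c == '\'') out += '\'';
    }
    out += "'";
    return out;
}

// Recursively renders the tree as a parenthesized SQL condition.
std::string toSqlWhere(const Predicate& p) {
    switch (p.kind) {
    case Predicate::Kind::Compare:
        return p.field + " " + p.op + " " + quoteSqlString(p.literal);
    case Predicate::Kind::And:
        return "(" + toSqlWhere(*p.left) + " AND " + toSqlWhere(*p.right) + ")";
    case Predicate::Kind::Or:
        return "(" + toSqlWhere(*p.left) + " OR " + toSqlWhere(*p.right) + ")";
    }
    return "";
}
```

Since results are tested against the qualifier again after they are reported, a translated query of this kind only has to return a superset of the matching rows. A production agent would more likely use the parameterized-query facility of its database driver than build literals by hand.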
        

The agent can create auxiliary objects to link to the result object. All such objects must be created using the provided cluster strategy and attached to the root result object before it is passed to the provided reporting function. These objects must not be modified or referenced after the result has been reported. A result object could also have links to previously existing persistent objects, but the agent cannot create new persistent objects.

The implementation of the executeAgent method is included in a shared library that is linked to the shared version of the Objectivity/DB library. This agent library file will be dynamically loaded by the query server when needed, as directed by a plugin specification file.

Create a Pseudo-Database

There are two approaches for creating a pseudo-database representing an external query agent:

  • Use an updated administrative tool
  • Use a new method on ooRefHandle(ooFDObj)

Administrative Tool

The oonewdb program has a new option for defining a pseudo-database. For example:

oonewdb -db externalDB -host server12 -queryagent externalDBGateway fd.boot

This defines a database named externalDB that will be implemented by a plugin named externalDBGateway to be loaded by the query server running on host server12 (unless overridden by a parallel scan custom task assigner). The new options are:

-queryagent agentName
This option is used to create a pseudo-database representing a query server plugin that will access data from some external source. The option value string is the plugin name, as referenced in the plugin specification file.

This option is mutually exclusive with the -filepath option, and it requires that -host be specified. When -queryagent is used, the -ap and -weight options have no effect.

-queryparameters string
This may optionally be used to specify a parameter string to pass to the plugin. For example, there could be two pseudo-databases that use the same plugin to access different external databases, where the plugin parameter string specifies which external database is to be accessed.
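
How the plugin interprets the parameter string is up to the plugin. As a minimal sketch, assuming a "key=value;key=value" convention chosen purely for illustration (neither the format nor the function name parseAgentParameters comes from Objectivity), an agent might decode the string like this:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Parses a parameter string of the assumed form "key=value;key=value".
// Entries without '=' or with an empty key are skipped.
std::map<std::string, std::string> parseAgentParameters(const char* agentParameters) {
    std::map<std::string, std::string> params;
    if (agentParameters == nullptr) return params;  // NULL when none was given

    const std::string text(agentParameters);
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find(';', start);
        if (end == std::string::npos) end = text.size();

        const std::string pair = text.substr(start, end - start);
        const std::size_t eq = pair.find('=');
        if (eq != std::string::npos && eq > 0) {
            params[pair.substr(0, eq)] = pair.substr(eq + 1);
        }
        start = end + 1;  // continue after the separator
    }
    return params;
}
```

With this convention, two pseudo-databases could share one plugin while passing, for example, "source=orders" and "source=customers" as their parameter strings.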

Such a pseudo-database can be deleted by oodeletedb (with -catalogonly effectively always true because the external data source is not affected). The following tools cannot be used on a pseudo-database: ooattachdb, oochangedb, oocopydb, oonewdbimage, and oodeletedbimage. Also, oobackupx and oorestorex will back up and restore only the catalog entry, not any associated external data.

C++ API

A new method on ooRefHandle(ooFDObj) lets you create a pseudo-database:

unsigned ooRefHandle(ooFDObj)::newQueryAgent(const char* DBname, const char* hostName, const char* queryAgent, const char* parameters = NULL, unsigned userDBID = 0, unsigned pageSize = 0)

Creates a pseudo-database representing a query server plugin. Returns the assigned DB number if successful, or 0 for failure. Note that the result can be directly assigned to a variable of type ooHandle(ooDBObj).

The arguments are:

DBname
The name of the database. (May be NULL.)
hostName
The name of the query server host. (This is the default if not overridden by a custom task assigner.)
queryAgent
Name of the query agent. (Must not be null or empty.) This corresponds to the plugin "name" value in the plugin specification file.
parameters
Optional parameter string to pass to the query agent.
userDBID
DB number to be used; if 0, an available number is assigned automatically.
pageSize
Page size for holding the transient objects created as query results. If 0, defaults to the federation's default page size.

For such a database:

  • ooHandle(ooDBObj)::numContObjs() returns 0. This can be used to test whether a handle represents a real database or a pseudo-database.
  • ooHandle(ooDBObj)::isReadOnly() returns true.

An error will be signaled for any attempt to open such a database other than within parallelScan.

Install a Plugin Specification File

The query server dynamically loads the agent library file when needed, as directed by a plugin specification file such as the following:

	   <ObjectivityPlugins>
	     <Plugin extensionPoint="QueryAgent">
	       <CppImplementation library="C:\directory\library.dll"
				key="key string" />
	       <Value name="name" value="agentName" />
	     </Plugin>
	   </ObjectivityPlugins>  
  
  

Here, agentName is the query agent name designated when the pseudo-database was created.

The key string is optional; it is passed as the argument of the createPlugin entry point function, which can use it as it wishes. This plugin specification is in a file with a .plugin suffix placed in the plugins directory of the Objectivity installation before starting the query server.

Use Parallel Scan on a Pseudo-Database

After an application calls parallelScan and ooItr(appClass)::next sets the iterator handle to represent one of the results from an external query agent, the fields of the object can be referenced in the usual way through ooHandle(appClass)::operator->. However, the object is not actually a persistent object, which imposes a few limitations:
  • The object's OID is only meaningful within the current transaction.
  • Consequently, the object cannot be directly added to an association or persistent collection.
  • The transient instance that is being referenced may be deallocated when there are no longer any open handles referencing it.
  • The object should not be modified because any changes will be lost. Trying to open the object for update causes an error to be signaled.
You can assign the iterator to another handle, which keeps the transient instance alive until that handle is closed or the transaction is ended. The application can optionally make the data persistent by creating a new persistent instance copied or initialized from the transient data.

A new ooBoolean ooHandle(ooObj)::isTransient() method can be used to test whether a scan result is a transient instance instead of a persistent object.
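
Because caching policy is left to the client, one possible approach is to copy each transient result into an application-managed store keyed by an identifier from the external source. The following is a minimal sketch under stated assumptions: CustomerRecord and ResultCache are hypothetical names, the first copy seen for a key is kept, and a std::unordered_map stands in for what would be new persistent objects in a real client.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical application data copied out of a transient scan result.
struct CustomerRecord {
    std::string externalKey;  // identifier taken from the external source
    std::string name;
};

// Application-level cache with a "first copy wins" policy. In a real client
// the stored copy would be a newly created persistent object.
class ResultCache {
public:
    // Copies the record unless its key is already cached.
    // Returns true if a new copy was stored.
    bool remember(const CustomerRecord& transientResult) {
        return cache_.emplace(transientResult.externalKey, transientResult).second;
    }

    // Returns the cached copy for key, or nullptr if there is none.
    const CustomerRecord* find(const std::string& key) const {
        auto it = cache_.find(key);
        return it == cache_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::unordered_map<std::string, CustomerRecord> cache_;
};
```

The copy is taken by value, so it stays valid after the transient instance is deallocated at the end of the transaction. Other policies, such as refreshing an existing entry, would change only the body of remember.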

Accessing Pseudo-Databases Using APIs

When iterating over databases in a federation, a scan operation skips any query agent pseudo-databases by default. An optional visitForeign parameter to the following functions can be set to true to access the pseudo-databases.

ooStatus ooRefHandle(ooFDObj)::contains(ooItr(ooDBObj)& dbItr, ooMode mode = oocNoOpen, ooBoolean visitForeign = oocFalse)
ooStatus ooItr(ooDBObj)::scan(const ooHandle(ooFDObj)& FDhandle, ooMode mode = oocNoOpen, ooBoolean visitForeign = oocFalse)
	

Backward Compatibility

Database Format

No incompatibility is introduced, but applications or tools built with pre-10.0 releases will give a misleading error if they try to open a pseudo-database that they don't understand.

Code

There is no effect on existing applications that are re-built using Release 10.0 as long as they do not attempt to open arbitrary database numbers.

However, if an application built with a pre-10.0 version of Objectivity tries to iterate over a federation that contains pseudo-databases, it will get an obscure error when trying to open them.

Related Documentation

Refer to the following sources for more information:

  • The information about extending Objectivity/DB features in the Objectivity/DB Administration manual in your Objectivity documentation.
  • The information about predicate-query language (PQL) in the Objectivity documentation for your programming language interface.
  • The Functional Specification for Objectivity Expression Tree Interface on the Customer Support Web site.

Date: Friday, October 26, 2012
Product: Objectivity/DB
Version: 10.2.1, 10.2