In InfiniteGraph, adding storage locations is simple with the AddStorageLocation tool. An added location can be used for storage when the current main storage location is unavailable (for example, it can no longer be reached over the network, or it has reached its maximum storage capacity). Spreading data evenly across multiple storage locations during ingest is harder, because it is a multi-step process. This blog post outlines the steps to distribute data evenly across storage locations on the same or different hosts.
In InfiniteGraph, the placement system places data according to a placement model. Unless an advanced placement configuration is specified, a default placement model is used. The default placement model places most of the user graph data in one of two groups: Vertex and Edge. If a custom placement model is used, these groups can be overridden (see the footnote on custom placement models at the end of this post).
When setting up the storage locations (local or remote), you can use the AddStorageLocation tool. To achieve an even distribution of the data, you must also pre-create containers in each storage location for each placement group using the CreateContainers tool. This "primes" the storage locations, so the placement system spreads the data out according to how many containers were pre-created in each storage location.
Image 1: Multiple Storage Locations with Pre-Created Containers
Adding storage locations can be done at any time, but pre-creating containers for placement groups needs to be done after the groups are created (i.e., after the placement model is imported into the database). If no custom placement model is used, the default groups (Vertex, Edge) are created along with the database. Otherwise, the user-defined groups are created when the custom placement model is imported via the igimportplacement tool. Once the groups exist, you can add the storage locations and create the containers for each group.
Batch Processing Tool
User data will not be distributed evenly across the set of storage locations until the storage locations are added and containers are pre-created in each of them. The commands to add locations and create containers can be batch loaded using the objy execute tool, which processes all of the commands in a text file at once. The text file contains a list of all of your addstoragelocation commands (one for each storage location) and createcontainers commands (one for each group in each storage location). The number of containers created in a storage location determines how much data is placed there, so creating a high number of containers in every location lets you achieve a fair distribution. Note: Pre-creating the containers should also improve ingest speed, because container creation is done as an administrative step instead of dynamically at application runtime.
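To make the link between container counts and data share concrete, here is a toy simulation. It is only an illustration under stated assumptions: it models placement as a plain round-robin over all pre-created containers, which is a simplification and not InfiniteGraph's actual placement logic, and the function name and storage paths are made up for the example.

```python
from collections import Counter
from itertools import cycle


def simulate_round_robin(containers_per_location, num_objects):
    """Toy model: hand objects to pre-created containers in round-robin order.

    containers_per_location maps a storage location to the number of
    containers pre-created there. Returns objects placed per location.
    """
    # One list entry per container, tagged with the location that holds it.
    containers = [
        location
        for location, count in containers_per_location.items()
        for _ in range(count)
    ]
    placed = Counter()
    # zip stops after num_objects items (or immediately if there are no containers).
    for _, location in zip(range(num_objects), cycle(containers)):
        placed[location] += 1
    return dict(placed)


# Hypothetical locations in host::path form.
even = simulate_round_robin(
    {"hostA::/data/loc1": 100, "hostB::/data/loc2": 100}, 10_000
)
skewed = simulate_round_robin(
    {"hostA::/data/loc1": 100, "hostB::/data/loc2": 300}, 10_000
)
print(even)    # equal container counts give an equal split
print(skewed)  # three times the containers gives three times the objects
```

In this model, equal container counts give equal shares, and a location with three times as many containers receives three times as many objects.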
Here is a downloadable version of a sample script which includes some createcontainers commands. Because the storage paths and boot file name are specific to your installation, some fields need to be edited. All remote paths can be specified in the form host::path. In the script, I add one storage location and create 100 containers for each group in that storage location. For each additional storage location, copy these four commands and configure them for that location (see the footnote on command counts at the end of this post). Once the file contains the correct field values, you can execute the commands by calling objy execute -infile <path-to-script-file>.
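Copying and editing the commands by hand gets tedious with many storage locations, so the script file could also be generated. The sketch below is one possible way to do that. The template strings are placeholders, not real command syntax: paste in the actual addstoragelocation and createcontainers lines from the sample script and mark the parts that vary with {location}, {group}, and {count}. The function name, paths, and group list are assumptions for illustration; use the group names from your own placement model.

```python
def build_setup_script(locations, groups, add_template, create_template,
                       containers_per_group=100):
    """Return one add-location line per location, followed by one
    create-containers line per group for that location."""
    lines = []
    for location in locations:
        lines.append(add_template.format(location=location))
        for group in groups:
            lines.append(create_template.format(
                location=location, group=group, count=containers_per_group))
    return lines


# Placeholder templates: replace the text in angle brackets with the real
# options from the sample script. They are NOT valid command syntax as-is.
ADD_TEMPLATE = "addstoragelocation <options for {location}>"
CREATE_TEMPLATE = "createcontainers <options for {count} containers, group {group}, in {location}>"

# Hypothetical locations (host::path form) and groups.
locations = ["hostA::/data/loc1", "hostB::/data/loc2"]
groups = ["Vertex", "Edge"]

script_lines = build_setup_script(locations, groups, ADD_TEMPLATE, CREATE_TEMPLATE)
script_text = "\n".join(script_lines) + "\n"
print(script_text)

# Once script_text is saved to a file, it would be run as described above:
objy_command = ["objy", "execute", "-infile", "<path-to-script-file>"]
```

For n locations and g groups this produces n + g × n lines, which is a quick sanity check that no location or group was missed.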
If you want the script file to be processed at create time, so that it integrates more easily into your setup tasks, you can add the property IG.Placement.CommandSetupFile to the properties file (with the property value being the path to the edited script file). The script file will then be executed during the placement setup phase at database create time. Note: The IG.Placement.CommandSetupFile property is ignored after creation time, so it must be passed in with the properties file when the graph database is created. If the database has already been created, use the objy execute tool to batch process these commands instead.
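If the properties file is itself produced by a setup script, the property could be added programmatically. The helper below is a minimal sketch that assumes the properties file uses plain key=value lines; that format, the function name, and the example path are assumptions, so check them against your own properties file.

```python
def with_command_setup_file(properties_text, script_path):
    """Return properties text with IG.Placement.CommandSetupFile set once.

    Assumes one key=value entry per line. Any existing entry for the key
    is dropped so that the new value is the only one present.
    """
    key = "IG.Placement.CommandSetupFile"
    kept = [
        line for line in properties_text.splitlines()
        if not line.strip().startswith(key)
    ]
    kept.append(f"{key}={script_path}")
    return "\n".join(kept) + "\n"


# Hypothetical existing content and script path.
existing = "IG.Placement.CommandSetupFile=/old/setup.txt\n"
updated = with_command_setup_file(existing, "/opt/graph/setup_commands.txt")
print(updated)
```

Since the property is only read at create time, a step like this would have to run before the graph database is created.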
 If a custom placement model is being used, you can still follow these steps to distribute the data, but you may be using different group names instead of, or in addition to, the default groups given above. Also, I would highly recommend using "Random-Relaxed" or "RoundRobin-Relaxed" as the selection attribute value in the ContainerGroupConfiguration for each of your object placers if you want to achieve an even distribution.
 You should end up with n addstoragelocation commands for n storage locations and 3 times n createcontainers commands, one for each of the 3 default groups in each storage location.