CSV Plugin for C++ Client

Ease of use is an important design goal of VDMS. We realize that many users have data in CSV or other tabular formats and want to ingest it into VDMS. Prior to VDMS v2.4.0, users had to write their own functions to parse their data and convert it into JSON queries for ingestion. Now, users can ingest their data into VDMS with the CSV Plugin for the C++ Client. This capability is currently not available in the Python client.

This document describes how to use the CSV Plugin for the C++ Client. It covers:

Basic Building Blocks
Insert Commands

Basic Building Blocks

Constraints

Building block constraints are specified using the prefix 'cons_' followed by an index. This prefix is case-sensitive.

cons_1 specifies the first constraint of the constraints JSON block.

Examples of different types of constraints are:

cons_1	cons_2	cons_3
"age>=20,<=90"	gender==M	name==Poonam

The table will result in the following constraints block:

"constraints": {
   "age": [ ">=", 20, "<=", 90 ],
   "gender": [ "==", "M" ],
   "name": [ "==", "Poonam" ]
}
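To illustrate how a cons_ cell expands into one entry of the constraints block, here is a minimal C++ sketch. It is not the plugin's implementation: `isNumber` and `constraintToJson` are hypothetical names, and the sketch assumes operators are built only from the characters `<`, `>`, `=`, and `!`, and that values contain neither commas nor those characters. Numeric values are emitted unquoted and all other values as JSON strings, matching the constraints block above.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: true only if the whole string is a plain number such as "20" or "55.9".
bool isNumber(const std::string& s) {
  if (s.empty() || s.find_first_not_of("0123456789+-.eE") != std::string::npos) return false;
  char* end = nullptr;
  std::strtod(s.c_str(), &end);
  return *end == '\0';
}

// Hypothetical helper: turns a cons_ cell such as "age>=20,<=90" into
// "age": [">=", 20, "<=", 90] (the key, then operator/value pairs).
// Assumes values contain no commas and no operator characters.
std::string constraintToJson(const std::string& cell) {
  const std::string opChars = "<>=!";
  const std::size_t keyEnd = cell.find_first_of(opChars);
  if (keyEnd == std::string::npos || keyEnd == 0)
    throw std::invalid_argument("expected <key><op><value>: " + cell);

  std::string json = "\"" + cell.substr(0, keyEnd) + "\": [";
  std::size_t pos = keyEnd;
  while (pos != std::string::npos) {
    // Each comma-separated piece is an operator followed by a value, e.g. "<=90".
    const std::size_t comma = cell.find(',', pos);
    const std::string piece =
        cell.substr(pos, comma == std::string::npos ? std::string::npos : comma - pos);
    const std::size_t valStart = piece.find_first_not_of(opChars);
    const std::string op = piece.substr(0, valStart);
    const std::string value = valStart == std::string::npos ? "" : piece.substr(valStart);
    if (op.empty() || value.empty())
      throw std::invalid_argument("bad constraint piece: " + piece);

    if (pos != keyEnd) json += ", ";
    json += "\"" + op + "\", " + (isNumber(value) ? value : "\"" + value + "\"");
    pos = comma == std::string::npos ? std::string::npos : comma + 1;
  }
  return json + "]";
}
```

For the table above, `constraintToJson("gender==M")` returns `"gender": ["==", "M"]`; the spacing the plugin itself produces may differ.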

Operations

In a CSV file, each operation is provided as a column. Column names are case-sensitive and are specified using the ops_ prefix followed by the operation type. Examples of each operation are shown in the following table:

Operation	Column Name	Example Value	Description
Threshold	ops_threshold	150	Threshold value: pixels above this value keep their values; all other pixels are set to zero.
Crop	ops_crop	"255,224,15,10"	Comma-delimited values for `x`, `y`, `width`, and `height`, respectively.
Resize	ops_resize	"200,175"	Comma-delimited values for `width` and `height`, respectively.
Flip	ops_flip	1	Follows the OpenCV convention: 0 for a vertical flip, >0 for a horizontal flip, and <0 for both.
Rotate	ops_rotate	"30,TRUE"	Comma-delimited values for the rotation angle and a flag indicating whether the image is resized so that the rotated image fits (resize = true) or kept the same size (resize = false).
Interval	ops_interval	"10,50,2"	Comma-delimited values for the first frame, last frame, and step size (frames in between), respectively.

If all of the above operations are provided in a single query, the JSON equivalent is:

"operations": [
   {
      "type": "crop",
      "x": 255,
      "y": 224,
      "width": 15,
      "height": 10
   },
   {
      "type": "resize",
      "width": 200,
      "height": 175
   },
   {
      "type": "flip",
      "code": 1
   },
   {
      "type": "rotate",
      "angle": 30,
      "resize": true
   },
   {
      "type": "interval",
      "start": 10,
      "stop": 50,
      "step": 2
   },
   {
      "type": "threshold",
      "value": 150
   }
]
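The mapping from an ops_ cell to one entry of the operations array can be sketched as follows. `operationToJson` and `splitCommas` are hypothetical helpers written for this illustration, not part of the plugin; they copy values verbatim (so they assume each cell already holds valid numbers) and treat `TRUE` or `true` in the rotate flag as JSON `true`. The JSON keys are taken from the example above.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Splits "255,224,15,10" into {"255", "224", "15", "10"}.
std::vector<std::string> splitCommas(const std::string& s) {
  std::vector<std::string> parts;
  std::stringstream ss(s);
  std::string item;
  while (std::getline(ss, item, ',')) parts.push_back(item);
  return parts;
}

// Hypothetical helper: builds one entry of the "operations" array from an ops_ column.
// Values are copied verbatim, so they are assumed to be valid numbers already.
std::string operationToJson(const std::string& column, const std::string& cell) {
  const std::vector<std::string> v = splitCommas(cell);
  auto expect = [&](std::size_t n) {
    if (v.size() != n) throw std::invalid_argument("wrong number of values for " + column);
  };
  if (column == "ops_threshold") {
    expect(1);
    return "{\"type\": \"threshold\", \"value\": " + v[0] + "}";
  }
  if (column == "ops_crop") {
    expect(4);
    return "{\"type\": \"crop\", \"x\": " + v[0] + ", \"y\": " + v[1] +
           ", \"width\": " + v[2] + ", \"height\": " + v[3] + "}";
  }
  if (column == "ops_resize") {
    expect(2);
    return "{\"type\": \"resize\", \"width\": " + v[0] + ", \"height\": " + v[1] + "}";
  }
  if (column == "ops_flip") {
    expect(1);
    return "{\"type\": \"flip\", \"code\": " + v[0] + "}";
  }
  if (column == "ops_rotate") {
    expect(2);
    // The CSV flag TRUE/FALSE becomes a JSON Boolean.
    const std::string resize = (v[1] == "TRUE" || v[1] == "true") ? "true" : "false";
    return "{\"type\": \"rotate\", \"angle\": " + v[0] + ", \"resize\": " + resize + "}";
  }
  if (column == "ops_interval") {
    expect(3);
    return "{\"type\": \"interval\", \"start\": " + v[0] + ", \"stop\": " + v[1] +
           ", \"step\": " + v[2] + "}";
  }
  throw std::invalid_argument("unknown operation column: " + column);
}
```

Checking the value count per operation catches a common CSV mistake, such as an ops_resize cell with only one number, before any query is sent.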

Properties

Building block properties are specified using the prefix prop_. A property whose value is a date is specified with the prefix prop_date: instead. These prefixes are case-sensitive.

A column named prop_propertyname represents a property with propertyname as its key in the properties JSON block. Properties can have string, numeric, Boolean, and alphanumeric values.

Date/time values must be written in the format YYYY-MM-DDThh:mm:ssTZD (for example, 1995-01-29T18:45:12-08:00). Currently, every component of this format must be specified.

Examples of different types of properties are:

prop_name	prop_age	prop_date:DoB	prop_hasdog	prop_gender	prop_weight	prop_email	prop_address
Poonam	26	1995-01-29T18:45:12-08:00	TRUE	F	55.9	mydummymail@gmail.com	90 kings land, Jodhpur

The table above results in the following JSON:

"properties" : {
   "DoB" : {"_date" : "1995-01-29T18:45:12-08:00"},
   "age" : 26,
   "gender" : "F",
   "hasdog" : true,
   "name" : "Poonam",
   "weight" : 55.9,
   "email" : "mydummymail@gmail.com",
   "address" : "90 kings land, Jodhpur"
}
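The value typing shown above (TRUE becomes a Boolean, 26 and 55.9 stay numbers, dates are wrapped in `_date`, everything else becomes a string) can be sketched in C++ as follows. This is a hypothetical illustration, not the plugin's code: `jsonString`, `isNumber`, and `propertyToJson` are names chosen here, and only the uppercase TRUE/FALSE spelling used in the example is treated as Boolean.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Wraps s in double quotes, escaping backslashes and double quotes
// (control characters are not handled in this sketch).
std::string jsonString(const std::string& s) {
  std::string out = "\"";
  for (char c : s) {
    if (c == '"' || c == '\\') out += '\\';
    out += c;
  }
  return out + "\"";
}

// True only for plain numbers such as "26" or "55.9"; words like "Nan" are rejected.
bool isNumber(const std::string& s) {
  if (s.empty() || s.find_first_not_of("0123456789+-.eE") != std::string::npos) return false;
  char* end = nullptr;
  std::strtod(s.c_str(), &end);
  return *end == '\0';
}

// Hypothetical helper: converts one prop_ or prop_date: cell into a "key": value pair.
std::string propertyToJson(const std::string& column, const std::string& cell) {
  const std::string datePrefix = "prop_date:";
  const std::string propPrefix = "prop_";
  // Check prop_date: first, because every date column also starts with prop_.
  if (column.rfind(datePrefix, 0) == 0)
    return jsonString(column.substr(datePrefix.size())) + ": {\"_date\": " + jsonString(cell) + "}";
  if (column.rfind(propPrefix, 0) != 0)
    throw std::invalid_argument("not a property column: " + column);

  std::string value;
  if (cell == "TRUE") value = "true";  // Boolean spelling used in the example
  else if (cell == "FALSE") value = "false";
  else if (isNumber(cell)) value = cell;
  else value = jsonString(cell);
  return jsonString(column.substr(propPrefix.size())) + ": " + value;
}
```

The stricter character check in `isNumber` matters because `std::strtod` alone would accept a name such as "Nan" or "inf" as a floating-point value.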

Rectangle

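This is used to set the coordinates of a bounding box. To provide a rectangle value, enter four comma-delimited numeric values in the cell, representing x, y, w, and h, respectively. Because the value contains commas, enclose it in double quotes in the CSV file, as in "120,50,40,40".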
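A rectangle cell can be validated and turned into the rectangle object of a query roughly as follows. This is a minimal sketch, assuming the four values are plain decimal numbers; `Rect`, `parseRectangle`, and `rectangleToJson` are hypothetical names, and the JSON keys `x`, `y`, `w`, `h` come from the AddBoundingBox example in this document.

```cpp
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical type holding one bounding-box rectangle.
struct Rect {
  double x, y, w, h;
};

// Hypothetical helper: parses a cell such as "120,50,40,40" (x, y, w, h).
Rect parseRectangle(const std::string& cell) {
  std::vector<double> vals;
  std::stringstream ss(cell);
  std::string item;
  while (std::getline(ss, item, ',')) {
    std::size_t used = 0;
    const double d = std::stod(item, &used);  // throws std::invalid_argument if not numeric
    if (used != item.size()) throw std::invalid_argument("not a number: " + item);
    vals.push_back(d);
  }
  if (vals.size() != 4) throw std::invalid_argument("expected 4 values (x,y,w,h): " + cell);
  return {vals[0], vals[1], vals[2], vals[3]};
}

// Writes the rectangle in the {"x": .., "y": .., "w": .., "h": ..} form.
std::string rectangleToJson(const Rect& r) {
  std::ostringstream out;
  out << std::setprecision(15) << "{\"x\": " << r.x << ", \"y\": " << r.y << ", \"w\": " << r.w
      << ", \"h\": " << r.h << "}";
  return out.str();
}
```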

Insert Commands

AddBoundingBox

Bounding Boxes are a special kind of entity, similar to images. The AddBoundingBox call allows applications to add a region of interest to VDMS, where the entity added has a preset class of BoundingBox. For more details about the command, see AddBoundingBox.

Let's take the following JSON queries for ingesting two Bounding Boxes as an example:

[
   {
      "AddBoundingBox" : {
         "rectangle" : {
            "x": 120, "y": 50, "w": 40, "h": 40
         },
         "properties" : {
            "name" : "jowe",
            "id": 1
         }
      }
   },
   {
      "AddBoundingBox" : {
         "rectangle" : {
            "x": 100, "y": 20, "w": 20, "h": 40
         },
         "properties" : {
            "name" : "suitcase",
            "id": 2
         }
      }
   }
]

The equivalent CSV file is as follows:

RectangleBound,prop_name,prop_id
"120,50,40,40",jowe,1
"100,20,20,40",suitcase,2

In this CSV, the first row provides the column names needed for the AddBoundingBox command and each additional row represents details for one bounding box. The first column is mandatory while the remaining columns are based on the other information needed for the entries.

The following table provides details for the CSV format:

Column Name	Description	Has Position Constraint?	Column Position	Is field mandatory?
RectangleBound	This column is required and provides a rectangular region of interest in the format `x,y,w,h`	Yes	1	Yes
prop_name	This column is user-defined. It specifies the value for property named `name`	No	-	No
prop_id	This column is user-defined. It specifies the value for property named `id`	No	-	No
cons_1	This column is user-defined and not shown in the example. It specifies the first constraint needed to find the image to link to the bounding box	No	-	No

Note: An empty cell means that the column does not apply to that bounding box.
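Because cells such as "120,50,40,40" contain commas, a reader of this CSV must honor double quotes instead of splitting each line on every comma. The sketch below follows the common CSV quoting rules of RFC 4180 (a doubled quote inside a quoted field is a literal quote) and assumes the plugin accepts the same convention; `splitCsvLine` is a hypothetical helper, and it does not handle line breaks inside quoted fields. Empty cells come back as empty strings, which is how a reader can detect columns that do not apply to a row.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line into fields. Commas inside double-quoted
// fields are kept, and a doubled quote ("") inside a quoted field becomes a literal quote.
std::vector<std::string> splitCsvLine(const std::string& line) {
  std::vector<std::string> fields;
  std::string field;
  bool inQuotes = false;
  for (std::size_t i = 0; i < line.size(); ++i) {
    const char c = line[i];
    if (inQuotes) {
      if (c == '"') {
        if (i + 1 < line.size() && line[i + 1] == '"') {
          field += '"';  // escaped quote
          ++i;
        } else {
          inQuotes = false;  // closing quote
        }
      } else {
        field += c;
      }
    } else if (c == '"') {
      inQuotes = true;
    } else if (c == ',') {
      fields.push_back(field);  // end of the current cell (possibly empty)
      field.clear();
    } else {
      field += c;
    }
  }
  if (inQuotes) throw std::invalid_argument("unterminated quoted field");
  fields.push_back(field);
  return fields;
}
```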

AddConnection

This command lets an application connect two entities, with a specific class for the relationship and, if needed, properties. The source and destination references (ref1 and ref2) correspond to references created with the AddEntity or FindEntity command (using "_ref"). For more details about the command, see AddConnection.

Let's take the following JSON queries for adding a connection between two entities as an example:

[
   {
      "FindEntity" :
      {
            "_ref" : 1,
            "class" : "Person",
            "constraints" :
            {
               "id" : ["==", 1]
            }
      }
   },
   {
      "FindEntity" :
      {
            "_ref" : 3,
            "class" : "Person",
            "constraints" :
            {
               "id" : ["==", 2]
            }
      }
   },
   {
      "AddConnection" :
      {
         "class": "BloodRelation",
         "ref1" : 1,
         "ref2" : 3,
         "properties" : {
            "type": "brother"
         }
      }
   }
]

The equivalent CSV file is as follows:

ConnectionClass,Person@id,Person@id,prop_type
BloodRelation,1,2,brother

In this CSV, the first row provides the column names needed for the AddConnection command and each additional row represents a connection between two entities. The first column is mandatory while the remaining columns are based on the other information needed for the entries.

The following table provides details for the CSV format:

Column Name	Description	Has Position Constraint?	Column Position	Is field mandatory?
ConnectionClass	This column is required and provides a class for the connection	Yes	1	Yes
Person@id	This column is user-defined. It specifies the constraints to find the first entity for a connection. In this example, the first entity must have class `Person` with property `id` equal to 1	No	-	No
Person@id	This column is user-defined. It specifies the constraints to find the second entity for a connection. In this example, the second entity must have class `Person` with property `id` equal to 2	No	-	No
prop_type	This column is user-defined. It specifies the value for property named `type`	No	-	No

Note: An empty cell means that the column does not apply to that connection.
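A Class@property header such as Person@id carries two pieces of information: the entity class and the property that the cell value must match. The sketch below shows one way to split such a header and build the matching FindEntity query from the JSON example above. The helper names are hypothetical, and the `_ref` number is passed in by the caller because this document does not say how the plugin assigns references.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <utility>

// True only for plain numbers such as "1" or "2.5".
bool isNumber(const std::string& s) {
  if (s.empty() || s.find_first_not_of("0123456789+-.eE") != std::string::npos) return false;
  char* end = nullptr;
  std::strtod(s.c_str(), &end);
  return *end == '\0';
}

// Hypothetical helper: splits a header such as "Person@id" into {"Person", "id"}.
std::pair<std::string, std::string> splitEntityHeader(const std::string& header) {
  const std::size_t at = header.find('@');
  if (at == std::string::npos || at == 0 || at + 1 == header.size())
    throw std::invalid_argument("expected <Class>@<property>: " + header);
  return {header.substr(0, at), header.substr(at + 1)};
}

// Hypothetical helper: builds the FindEntity query for one end of the connection.
// The ref value is supplied by the caller (an assumption; the plugin's numbering is not documented).
std::string findEntityJson(const std::string& header, const std::string& cell, int ref) {
  const std::pair<std::string, std::string> p = splitEntityHeader(header);
  const std::string value = isNumber(cell) ? cell : "\"" + cell + "\"";
  return "{\"FindEntity\": {\"_ref\": " + std::to_string(ref) + ", \"class\": \"" + p.first +
         "\", \"constraints\": {\"" + p.second + "\": [\"==\", " + value + "]}}}";
}
```

With the CSV row above, the two Person@id cells produce the two FindEntity queries whose references AddConnection then uses as ref1 and ref2.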

AddDescriptor

VDMS natively supports operations on high-dimensional feature vectors, allowing efficient similarity searches, which are particularly useful in ML pipelines. Feature vectors, or descriptors, are intermediate results of various machine learning or computer vision algorithms run on visual data. These vectors can be labeled and classified to build search indexes. VDMS does not extract descriptors, but once they are available, it can store them, index them, and search them for similarity. For more details about the command, see AddDescriptor.

Let's take the following JSON queries for ingesting a descriptor as an example:

[
   {
      "AddDescriptor" : {
         "set" : "Test_14096",
         "label" : "Rocky",
         "properties" : {
            "age" : 34,
            "gender" : "M"
         }
      }
   },
   {
      "AddDescriptor" : {
         "set" : "Test_14096",
         "label" : "Carolyn",
         "properties" : {
            "age" : 67,
            "gender" : "F"
         }
      }
   }
]

[descriptor1,descriptor2]

The equivalent CSV file is as follows:

DescriptorClass,label,prop_age,prop_gender,inputdata
Test_14096,Rocky,34,M,blob_1.txt
Test_14096,Carolyn,67,F,blob_2.txt

In this CSV, the first row provides the column names needed for the AddDescriptor command and each additional row represents details for one descriptor. The first column is mandatory while the remaining columns are based on the other information needed for the descriptors.

The following table provides details for the CSV format:

Column Name	Description	Has Position Constraint?	Column Position	Is field mandatory?
DescriptorClass	This column is required and provides the name of the DescriptorSet to which the descriptor is added	Yes	1	Yes
label	This column specifies the label of the descriptor	No	-	No
prop_age	This column is user-defined. It specifies the value for property named `age`	No	-	No
prop_gender	This column is user-defined. It specifies the value for property named `gender`	No	-	No
inputdata	This column is required and provides the path to the file that contains the actual blob for the descriptor (feature vector)	No	-	Yes

Note: An empty cell means that the column does not apply to that descriptor.
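The rules in this table (DescriptorClass must be the first column, and inputdata must appear somewhere) can be checked before any row is processed. The following is a minimal sketch; `validateHeader` is a hypothetical helper, and the same shape would apply to the other commands by changing the first-column name and the list of required columns.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical check: returns an empty string if the header satisfies the rules,
// otherwise a short description of the first problem found.
std::string validateHeader(const std::vector<std::string>& header,
                           const std::string& firstColumn,
                           const std::vector<std::string>& required) {
  if (header.empty() || header[0] != firstColumn)
    return "first column must be " + firstColumn;
  for (const std::string& name : required) {
    // Required columns without a position constraint may appear anywhere.
    if (std::find(header.begin(), header.end(), name) == header.end())
      return "missing required column " + name;
  }
  return "";
}
```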

AddDescriptorSet

A DescriptorSet is a group of descriptors with a fixed number of dimensions that are the result of the same algorithm for feature extraction. For instance, we can create a DescriptorSet and insert multiple descriptors obtained by using OpenFace (128 dimensions), and then index and perform matching operations over those descriptors. For more details about the command, see AddDescriptorSet.

Let's take the following JSON queries for adding two descriptor sets as an example:

[
   {
      "AddDescriptorSet" : {
         "name": "pretty_faces",
         "dimensions": 128,
         "metric": "L2",
         "engine": "FaissFlat",
         "properties": {
            "algorithm": "OpenFace"
         }
      }
   },
   {
      "AddDescriptorSet" : {
         "name": "Test_14096",
         "dimensions": 1024,
         "metric": "IP",
         "engine": "FaissIVFFlat",
         "properties": {
            "algorithm": "random"
         }
      }
   }
]

The equivalent CSV file is as follows:

DescriptorType,dimensions,distancemetric,searchengine,prop_algorithm
pretty_faces,128,L2,FaissFlat,OpenFace
Test_14096,1024,IP,FaissIVFFlat,random

In this CSV, the first row provides the column names needed for the AddDescriptorSet command and each additional row represents details for one descriptor set. The first column is mandatory while the remaining columns are based on the other information needed for the descriptor sets.

The following table provides details for the CSV format:

Column Name	Description	Has Position Constraint?	Column Position	Is field mandatory?
DescriptorType	This column is required and provides the name of the DescriptorSet	Yes	1	Yes
dimensions	This column is required and specifies the number of dimensions of the feature vector	No	-	Yes
distancemetric	This column specifies the method used to calculate distances (IP or L2)	No	-	No
searchengine	This column specifies the underlying implementation for indexing and computing distances (TileDBDense, TileDBSparse, FaissFlat, or FaissIVFFlat)	No	-	No
prop_algorithm	This column is user-defined. It specifies the value for property named `algorithm`	No	-	No

Note: An empty cell means that the column does not apply to that DescriptorSet.
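Putting the AddDescriptorSet columns together, one CSV row can be turned into the JSON query shown earlier roughly as below. This is a hypothetical sketch: the column-to-key mapping (DescriptorType to name, distancemetric to metric, searchengine to engine) is inferred from the example rather than taken from the plugin source, prop_ values are always emitted as strings, unknown columns are ignored, and the allowed metrics and engines are the ones listed in the table above.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: builds an AddDescriptorSet query from one CSV row.
// The mapping is inferred from the example; prop_ values are always emitted as strings.
std::string descriptorSetToJson(const std::vector<std::string>& header,
                                const std::vector<std::string>& row) {
  const std::map<std::string, std::string> keyFor = {{"DescriptorType", "name"},
                                                     {"dimensions", "dimensions"},
                                                     {"distancemetric", "metric"},
                                                     {"searchengine", "engine"}};
  const std::set<std::string> metrics = {"IP", "L2"};
  const std::set<std::string> engines = {"TileDBDense", "TileDBSparse", "FaissFlat",
                                         "FaissIVFFlat"};
  if (header.size() != row.size()) throw std::invalid_argument("row and header differ in length");

  std::string fields, props;
  for (std::size_t i = 0; i < header.size(); ++i) {
    const std::string& col = header[i];
    const std::string& val = row[i];
    if (val.empty()) continue;  // an empty cell means the column does not apply to this row
    if (col == "distancemetric" && metrics.count(val) == 0)
      throw std::invalid_argument("unsupported metric: " + val);
    if (col == "searchengine" && engines.count(val) == 0)
      throw std::invalid_argument("unsupported engine: " + val);

    if (col.rfind("prop_", 0) == 0) {
      if (!props.empty()) props += ", ";
      props += "\"" + col.substr(5) + "\": \"" + val + "\"";
    } else if (keyFor.count(col) != 0) {
      if (!fields.empty()) fields += ", ";
      // dimensions is numeric; the other fixed columns are strings.
      fields += "\"" + keyFor.at(col) + "\": " + (col == "dimensions" ? val : "\"" + val + "\"");
    }
  }
  if (!props.empty()) fields += ", \"properties\": {" + props + "}";
  return "{\"AddDescriptorSet\": {" + fields + "}}";
}
```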

AddEntity

Adds a new entity to VDMS. For more details about the command, see AddEntity.

Let's take the following JSON queries for adding three entities as an example:

[
   {
      "AddEntity" : {
         "_ref" : 1,
         "class" : "Person",
         "constraints" : {
            "age" : [">=", 10, "<=", 30],
            "name" : ["==", "Ali"]
         },
         "properties" : {
            "DoB" : {"_date" : "2018-02-27T13:45:12-08:00"},
            "age" : 30,
            "gender" : "M",
            "hasdog" : true,
            "name" : "Ali"
         }
      }
   },
   {
      "AddEntity" : {
         "_ref" : 2,
         "class" : "Person",
         "constraints" : {
            "age" : ["<=",10],
            "name" : ["==","alex"]
         },
         "properties" : {
            "age" : 10,
            "gender" : "M",
            "hasdog" : false,
            "name" : "alex"
         }
      }
   },
   {
      "AddEntity" : {
         "_ref" : 3,
         "class" : "Person",
         "constraints" : {
            "age" : [">=",10],
            "name" : [ "==","Shri"]
         },
         "properties" : {
            "DoB" :{"_date" : "2018-02-27T13:45:12-08:00"},
            "age" : 33,
            "gender" : "F",
            "name" : "Shri"
         }
      }
   }
]

The equivalent CSV file is as follows:

EntityClass,prop_name,prop_age,cons_1,prop_date:DoB,cons_2,prop_hasdog,prop_gender
Person,Ali,30,"age>=10,<=30",2018-02-27T13:45:12-08:00,name==Ali,TRUE,M
Person,alex,10,age<=10,,name==alex,FALSE,M
Person,Shri,33,age>=10,2018-02-27T13:45:12-08:00,name==Shri,,F

In this CSV, the first row provides the column names needed for the AddEntity command and each additional row represents details for one entity. The first column is mandatory while the remaining columns are based on the other information needed for the entities.

The following table provides details for the CSV format:

Column Name	Description	Has Position Constraint?	Column Position	Is field mandatory?
EntityClass	This column is required and provides the class for the entity	Yes	1	Yes
prop_name	This column is user-defined. It specifies the value for property named `name`	No	-	No
prop_age	This column is user-defined. It specifies the value for property named `age`	No	-	No
cons_1	This column is user-defined and specifies a constraint for a conditional add	No	-	No
prop_date:DoB	This column is user-defined. It specifies the date value for property named `DoB`	No	-	No
cons_2	This column is user-defined and specifies the second constraint to use for the conditional add	No	-	No
prop_hasdog	This column is user-defined. It specifies the value for property named `hasdog`	No	-	No
prop_gender	This column is user-defined. It specifies the value for property named `gender`	No	-	No

Note: An empty cell means that the column does not apply to that entity.
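An AddEntity header mixes property, date, and constraint columns in any order after the first column, so a reader has to dispatch on each column's case-sensitive prefix. A minimal sketch of that dispatch follows; `ColumnKind` and `classifyColumn` are hypothetical names, and the point it illustrates is that prop_date: must be tested before prop_, since every date column also starts with prop_.

```cpp
#include <string>

// Hypothetical categories for the column names described in this document.
enum class ColumnKind { Property, DateProperty, Constraint, Operation, Fixed };

// Classifies a header cell by its case-sensitive prefix. "prop_date:" is checked
// before "prop_" because every date column also starts with "prop_".
ColumnKind classifyColumn(const std::string& name) {
  auto startsWith = [&](const std::string& prefix) { return name.rfind(prefix, 0) == 0; };
  if (startsWith("prop_date:")) return ColumnKind::DateProperty;
  if (startsWith("prop_")) return ColumnKind::Property;
  if (startsWith("cons_")) return ColumnKind::Constraint;
  if (startsWith("ops_")) return ColumnKind::Operation;
  return ColumnKind::Fixed;  // e.g. EntityClass, label, format
}
```

For the header above, EntityClass is classified as Fixed, cons_1 and cons_2 as constraints, prop_date:DoB as a date property, and the remaining prop_ columns as ordinary properties.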

AddImage

Adds a new image to VDMS. For more details about the command, see AddImage.

The following JSON query adds an image and performs the threshold and resize operations:

[
   {
      "AddImage" : {
         "format" : "png",
         "properties" : {
            "type" : "scan",
            "part" : "brain"
         },
         "operations" : [
            {"type" : "threshold", "value" : 150},
            {"type" : "resize", "width" : 200, "height" : 200}
         ]
      }
   }
]

[imageblob]

The equivalent CSV file is as follows:

ImagePath,prop_type,prop_part,format,ops_threshold,ops_resize
/C:/Documents/images/click1.jpg,scan,brain,png,150,"200,200"

In this CSV, the first row provides the column names needed for the AddImage command and each additional row represents details for one image. The first column is mandatory while the remaining columns are based on the other information needed for the images.

The following table provides details for the CSV format:

Column Name	Description	Has Position Constraint?	Column Position	Is field mandatory?
ImagePath	This column is required and provides the file path which contains the actual blob for the image	Yes	1	Yes
prop_type	This column is user-defined. It specifies the value for property named `type`	No	-	No
prop_part	This column is user-defined. It specifies the value for property named `part`	No	-	No
format	This column specifies the format used to store the image [jpg, png, tdb, bin]	No	-	No
ops_threshold	This column is optional and specifies the value to use for the threshold operation	No	-	No
ops_resize	This column is optional and specifies the `width,height` to use for the resize operation	No	-	No

Note: An empty cell means that the column does not apply to that image.
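The ImagePath column (like inputdata for AddDescriptor and VideoPath for AddVideo) names a file whose bytes become the blob sent with the query. A minimal sketch of loading such a file on the client follows; `readBlob` is a hypothetical helper, and this document does not describe how the plugin itself reads files.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file named in a path column into memory.
// The file is opened in binary mode so image and video bytes are not altered.
std::vector<char> readBlob(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path);
  return std::vector<char>(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```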

AddVideo

This call allows an application to add (and preprocess) a video in VDMS. The video blob is an encoded binary array using any of the supported containers or encodings. For more details about the command, see AddVideo.

The following JSON query adds a Video to VDMS:

[
   {
      "AddVideo" : {
         "codec" : "h264",
         "container" : "avi",
         "properties" : {
            "name" : "memories"
         },
         "index_frames" : true,
         "operations" : [
            {"type" : "threshold", "value" : 200}
         ]
      }
   }
]

The equivalent CSV file is as follows:

VideoPath,prop_name,format,compressto,ops_threshold,frameindex
C:/Documents/Videos/v1.mp4,memories,avi,h264,200,TRUE

In this CSV, the first row provides the column names needed for the AddVideo command and each additional row represents details for one video. The first column is mandatory while the remaining columns are based on the other information needed for the videos.

The following table provides details for the CSV format:

Column Name	Description	Has Position Constraint?	Column Position	Is field mandatory?
VideoPath	This column is required and provides the file path which contains the actual video	Yes	1	Yes
prop_name	This column is user-defined. It specifies the value for property named `name`	No	-	No
format	This column specifies the container used for the video file [mp4, avi, mov]	No	-	No
compressto	This column specifies the codec the video is transcoded to [xvid, h264, h263]	No	-	No
ops_threshold	This column is optional and specifies the value to use for the threshold operation	No	-	No
frameindex	This column is optional and triggers key-frame index extraction on the video	No	-	No
fromserver	This column is not shown in the example, but it can be used to read the video file from the server	No	-	No
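The fixed AddVideo columns map onto differently named JSON fields (format to container, compressto to codec, frameindex to index_frames), as the example above shows. The sketch below encodes that mapping and checks values against the bracketed option lists in the table, assuming those lists are exhaustive. `videoFieldJson` is a hypothetical helper; VideoPath, prop_, ops_, and fromserver columns are outside its scope.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical sketch: maps a fixed AddVideo column to its JSON field, assuming the
// bracketed option lists in the column table are the only accepted values.
std::string videoFieldJson(const std::string& column, const std::string& value) {
  static const std::set<std::string> containers = {"mp4", "avi", "mov"};
  static const std::set<std::string> codecs = {"xvid", "h264", "h263"};
  if (column == "format") {
    if (containers.count(value) == 0) throw std::invalid_argument("unsupported container: " + value);
    return "\"container\": \"" + value + "\"";
  }
  if (column == "compressto") {
    if (codecs.count(value) == 0) throw std::invalid_argument("unsupported codec: " + value);
    return "\"codec\": \"" + value + "\"";
  }
  if (column == "frameindex") {
    // The CSV flag TRUE/FALSE becomes a JSON Boolean.
    return std::string("\"index_frames\": ") + (value == "TRUE" || value == "true" ? "true" : "false");
  }
  throw std::invalid_argument("not a fixed AddVideo column: " + column);
}
```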