OneContext Structured Query Language

OneContext lets you filter the embeddings in your vector database at runtime using a custom "Structured Query Language".

The syntax is similar to that of NoSQL databases such as MongoDB, even though a SQL database sits underneath it.

The syntax is built from operators, which come in two levels: aggregators, which combine several conditions into one, and comparators, which test a single metadata field against a supplied value.

Aggregators

The aggregator operators you can use are:

| Operator | Description |
| --- | --- |
| `$and` | Returns True if and only if all of the conditions in this block return True. |
| `$or` | Returns True if any of the conditions in this block return True. |
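To make the aggregator semantics concrete, here is a minimal Python sketch of how each aggregator combines the results of its child conditions. It only illustrates the table above and is not OneContext's implementation. The helper name `apply_aggregator` is hypothetical.

```python
from typing import Iterable


def apply_aggregator(operator: str, child_results: Iterable[bool]) -> bool:
    """Combine the boolean results of a block's child conditions (hypothetical helper)."""
    if operator == "$and":
        # True if and only if every child condition is True.
        return all(child_results)
    if operator == "$or":
        # True if at least one child condition is True.
        return any(child_results)
    raise ValueError(f"Unknown aggregator: {operator}")


print(apply_aggregator("$and", [True, True, False]))  # False
print(apply_aggregator("$or", [True, False, False]))  # True
```

With Python's `all`/`any`, an empty `$and` block evaluates to True and an empty `$or` block to False. How OneContext treats empty blocks is not stated here.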

Comparators

The comparator operators you can use are:

| Operator | Description | Supplied value type | DB value type |
| --- | --- | --- | --- |
| `$eq` | Returns True if the value returned from the DB is equal to the supplied value. | `string \| int \| float` | `string \| int \| float` |
| `$gt` | Returns True if the value returned from the DB is greater than the supplied value. | `int \| float` | `int \| float` |
| `$lt` | Returns True if the value returned from the DB is less than the supplied value. | `int \| float` | `int \| float` |
| `$in` | Returns True if the value returned from the DB is contained in the supplied array. | `array<string \| int \| float>` | `string \| int \| float` |
| `$contains` | Returns True if the array value returned from the DB contains the supplied value. | `string \| int \| float` | `array<string \| int \| float>` |
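The comparator table can be read as a set of predicates. Each one compares the value stored in the DB against the value supplied in the filter. The following Python sketch encodes that reading. `COMPARATORS` and `compare` are hypothetical names, and the code only mirrors the documented semantics. OneContext evaluates filters against its database, not with this code.

```python
from typing import Any, Callable

# Hypothetical mapping from each comparator to a predicate over
# (value stored in the DB, value supplied in the filter).
COMPARATORS: dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda stored, supplied: stored == supplied,
    "$gt": lambda stored, supplied: stored > supplied,
    "$lt": lambda stored, supplied: stored < supplied,
    # $in: the stored scalar must be one of the supplied values.
    "$in": lambda stored, supplied: stored in supplied,
    # $contains: the stored array must include the supplied scalar.
    "$contains": lambda stored, supplied: supplied in stored,
}


def compare(operator: str, stored: Any, supplied: Any) -> bool:
    """Apply one comparator to a stored value and a supplied value."""
    try:
        predicate = COMPARATORS[operator]
    except KeyError:
        raise ValueError(f"Unknown comparator: {operator}") from None
    return predicate(stored, supplied)


print(compare("$gt", 0.8, 0.5))  # True
print(compare("$in", "test_file2.txt", ["test_file1.txt", "test_file2.txt"]))  # True
print(compare("$contains", ["news", "test"], "test"))  # True
```

`$in` and `$contains` are mirror images of each other. `$in` checks a scalar field against a supplied list, while `$contains` checks whether an array field includes a supplied scalar.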

Putting it all together

Using these building blocks, you can compose fairly advanced filters across your embeddings at runtime by nesting aggregators inside one another and combining them with comparators.

For example, in Python you could define a metadata filter like this:

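```python
override_meta_filters = {"$and": [
    # The chunk must come from one of these files...
    {"$or": [
        {"file_name": {"$eq": "test_file1.txt"}},
        {"file_name": {"$in": ["test_file1.txt", "test_file2.txt", "test_file3.txt"]}},
    ]},
    # ...carry the "test" tag...
    {"tag": {"$eq": "test"}},
    # ...and score highly on both nested metadata scores.
    {"lexrank.percentile_score": {"$gt": 0.5}},
    {"kmeans.percentile_score": {"$gt": 0.4}},
]}
```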

Indexing into the metadata_json dictionary

The `metadata_json` field on each `Chunk` object is stored and indexed as a Postgres `jsonb` field. That means you can index into nested values, as the example above does with `lexrank.percentile_score` and `kmeans.percentile_score`. In the `metadata_json` dictionary itself, these attributes appear as nested dictionaries, for example:

```json
{"lexrank": {"outright_score": 1.83, "percentile_score": 0.8},
 "kmeans": {"outright_score": 2.3, "percentile_score": 0.9}}
```

However, OneContext's structured query language lets you simply "dot" into these nested fields: `lexrank.percentile_score` refers to the `percentile_score` key inside the `lexrank` dictionary.
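Because `metadata_json` is a Postgres `jsonb` column, a dotted key maps naturally onto a `jsonb` path. The sketch below shows one way such a key could be rendered as a Postgres expression. It uses the `#>>` operator, which extracts the value at a path as text, and adds a cast for numeric comparisons. This is an assumption about how a translation could work, not a description of the SQL OneContext actually generates. The helper `jsonb_path_expr` is hypothetical.

```python
def jsonb_path_expr(dotted_key: str, column: str = "metadata_json", numeric: bool = False) -> str:
    """Render a dotted key as a Postgres jsonb path expression (hypothetical helper)."""
    parts = dotted_key.split(".")
    for part in parts:
        # Keep the sketch safe: only allow simple identifiers in path elements.
        if not part.replace("_", "").isalnum():
            raise ValueError(f"Unsupported key segment: {part!r}")
    # '#>>' extracts the value at the given text[] path as text.
    expr = f"({column} #>> '{{{','.join(parts)}}}')"
    # Cast to a number so that $gt/$lt compare numerically rather than as text.
    return f"{expr}::float" if numeric else expr


print(jsonb_path_expr("lexrank.percentile_score", numeric=True))
# (metadata_json #>> '{lexrank,percentile_score}')::float
```

The cast matters because `#>>` returns text. Comparing text values would order `"10"` before `"9"`, which is not what `$gt` and `$lt` are meant to do.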