OneContext Structured Query Language
OneContext allows you to use a custom "Structured Query Language" to filter your embeddings in your vectorDB at runtime.
The syntax is similar to the query syntax of NoSQL databases like MongoDB, even though it operates on a SQL database at its core.
The syntax is based around the application of operators. There are two levels of operators, which you can interpret as "aggregators" and "comparators".
Aggregators
The aggregator operators you can use are:
Key | Value Description |
---|---|
$and | Returns True if and only if all of the conditions in this block return True. |
$or | Returns True if any of the conditions in this block return True. |
Comparators
The comparator operators you can use are:
Key | Value Description | Supplied Value Type | Returned Value Type |
---|---|---|---|
$eq | Returns True if the value returned from the DB is equal to the supplied value. | string, int, or float | string, int, or float |
$gt | Returns True if the value returned from the DB is greater than the supplied value. | int or float | int or float |
$lt | Returns True if the value returned from the DB is less than the supplied value. | int or float | int or float |
$in | Returns True if the value returned from the DB is contained by the supplied array. | array of string, int, or float | string, int, or float |
$contains | Returns True if the array value returned from the DB contains the supplied value. | string, int, or float | array of string, int, or float |
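To make the two operator levels concrete, here is a minimal local sketch in Python of how a filter of this shape could be evaluated against a single metadata record. It is not OneContext's implementation, since the real filtering happens in the database. The names `COMPARATORS`, `matches`, and the sample record are hypothetical. Treating a missing field as a non-match, and treating several keys in one block as an implicit "and", are assumptions made for illustration.

```python
import operator

# Comparator semantics as described in the table above.
# "stored" is the value held in the DB; "supplied" is the value in the filter.
COMPARATORS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$in": lambda stored, supplied: stored in supplied,
    "$contains": lambda stored, supplied: supplied in stored,
}


def matches(filter_node: dict, record: dict) -> bool:
    """Evaluate a filter against one flat metadata record (illustrative only)."""
    results = []
    for key, value in filter_node.items():
        if key == "$and":
            # True only if every child condition is True
            results.append(all(matches(child, record) for child in value))
        elif key == "$or":
            # True if at least one child condition is True
            results.append(any(matches(child, record) for child in value))
        elif key not in record:
            # Assumption: a field that is absent never matches
            results.append(False)
        else:
            stored = record[key]
            results.append(
                all(COMPARATORS[op](stored, supplied) for op, supplied in value.items())
            )
    # Assumption: several keys in one block are combined with "and"
    return all(results)


# Hypothetical record and filter, used only to exercise the sketch
record = {"file_name": "a.txt", "tags": ["draft", "test"], "score": 0.7}
flt = {"$and": [
    {"$or": [
        {"file_name": {"$eq": "b.txt"}},
        {"file_name": {"$in": ["a.txt", "c.txt"]}},
    ]},
    {"tags": {"$contains": "test"}},
    {"score": {"$gt": 0.5}},
]}
print(matches(flt, record))  # True
```

Note the direction of the two membership operators: `$in` checks a scalar stored value against a supplied array, while `$contains` checks a stored array against a supplied scalar.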
Putting it all together
Using the above building blocks, you can put together an advanced composite filter across your embeddings at runtime.
For example, in Python you could define some metadata filters like the below:
override_meta_filters = {
    "$and": [
        {"$or": [
            {"file_name": {"$eq": "test_file1.txt"}},
            {"file_name": {"$in": ["test_file1.txt", "test_file2.txt", "test_file3.txt"]}},
        ]},
        {"tag": {"$eq": "test"}},
        {"lexrank.percentile_score": {"$gt": 0.5}},
        {"kmeans.percentile_score": {"$gt": 0.4}},
    ]
}
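Because these filters are plain nested dictionaries, a misplaced bracket or a misspelled operator is easy to miss. The following is a hedged sketch of a client-side shape check you could run before sending a filter. The function `validate_filter` and its error messages are hypothetical and not part of OneContext; the check only covers the operators listed in the tables above.

```python
AGGREGATORS = {"$and", "$or"}
COMPARATORS = {"$eq", "$gt", "$lt", "$in", "$contains"}


def validate_filter(node, path="filter"):
    """Return a list of problems found; an empty list means the shape looks fine."""
    if not isinstance(node, dict):
        return [f"{path}: expected a dict, got {type(node).__name__}"]
    problems = []
    for key, value in node.items():
        here = f"{path}.{key}"
        if key in AGGREGATORS:
            # Aggregators hold a list of nested conditions
            if not isinstance(value, list):
                problems.append(f"{here}: expected a list of conditions")
            else:
                for i, child in enumerate(value):
                    problems.extend(validate_filter(child, f"{here}[{i}]"))
        elif str(key).startswith("$"):
            problems.append(f"{here}: unknown aggregator")
        elif not isinstance(value, dict) or not value:
            # A field name must map to {comparator: supplied_value}
            problems.append(f"{here}: expected a dict like {{'$eq': value}}")
        else:
            for op, supplied in value.items():
                if op not in COMPARATORS:
                    problems.append(f"{here}: unknown comparator {op!r}")
                elif op == "$in" and not isinstance(supplied, list):
                    problems.append(f"{here}: $in expects a list")
    return problems


override_meta_filters = {
    "$and": [
        {"$or": [
            {"file_name": {"$eq": "test_file1.txt"}},
            {"file_name": {"$in": ["test_file1.txt", "test_file2.txt", "test_file3.txt"]}},
        ]},
        {"tag": {"$eq": "test"}},
        {"lexrank.percentile_score": {"$gt": 0.5}},
        {"kmeans.percentile_score": {"$gt": 0.4}},
    ]
}
print(validate_filter(override_meta_filters))  # []
print(validate_filter({"tag": {"$equals": "test"}}))
```

The second call reports the misspelled `$equals` comparator together with the path at which it was found.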
Indexing into the metadata_json dictionary
The metadata_json fields on Chunk objects are all stored and indexed as Postgres jsonb fields. That means you can index into them, as shown in the above example with lexrank.percentile_score and kmeans.percentile_score. These attributes would themselves appear in the metadata_json dictionary as nested dictionaries, for example:
{"lexrank" : {"outright_score" : 1.83, "percentile_score" : 0.8},
"kmeans" : {"outright_score" : 2.3, "percentile_score" : 0.9}}
However, OneContext's structured query language means you can just "dot" into the nested fields.
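As a rough mental model of what "dotting" into a nested field means, the sketch below resolves a dotted key against a nested Python dictionary. The helper `dot_get` is hypothetical and only mirrors the lookup locally; OneContext performs the equivalent lookup against the jsonb field in the database.

```python
def dot_get(metadata: dict, dotted_key: str, default=None):
    """Walk a nested dict one path segment at a time, e.g. 'lexrank.percentile_score'."""
    current = metadata
    for part in dotted_key.split("."):
        # Stop early if the path leaves the dict structure or the key is absent
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


metadata_json = {
    "lexrank": {"outright_score": 1.83, "percentile_score": 0.8},
    "kmeans": {"outright_score": 2.3, "percentile_score": 0.9},
}
print(dot_get(metadata_json, "lexrank.percentile_score"))       # 0.8
print(dot_get(metadata_json, "kmeans.percentile_score") > 0.4)  # True
print(dot_get(metadata_json, "lexrank.missing"))                # None
```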