Install the SDK
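The SDK is assumed here to be published on PyPI under the same name as the import used below; if your plan's onboarding docs give a different package name, use that instead:

pip install onecontext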
Initial Setup
To start using OneContext, set the following environment variables:
ONECONTEXT_API_KEY=<your api key> #(1)!
OPENAI_API_KEY=<your openai key> #(2)!
BASE_URL=<your base url> #(3)!
from onecontext import OneContext
oc = OneContext() # reads api_key from ONECONTEXT_API_KEY env variable
- You can get one of these by signing up here
- If you want us to send your context-augmented prompts directly to OpenAI, add your OpenAI API key to the environment variables. Otherwise, leave it blank, and you can send them on your end.
- If you're on the starter plan, this will simply be https://api.onecontext.ai. If you're on the enterprise plan, this will be the URL of your private instance of OneContext.
Create your first knowledge base
A knowledge base is a collection of files. We create our first knowledge base and upload a file:
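A minimal sketch of this step. The method names create_knowledgebase and upload_file, and the sample file path, are assumptions for illustration; check the SDK reference for the exact calls:

knowledgebase = oc.create_knowledgebase(name="my_kb")  # assumed method name
knowledgebase.upload_file("./example.pdf")  # assumed method name and hypothetical file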
Create a Vector Index
We want to chunk and embed the files in our knowledge base, but first we need somewhere to store our vectors. We create a vector index and specify the embedding model that the vector index should expect:
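A sketch of the call, assuming a create_vector_index method that takes the index name and the embedding model (the exact signature is an assumption):

vector_index = oc.create_vector_index(
    "my_vector_index",  # referenced later by the ChunkWriter step
    model="BAAI/bge-base-en-v1.5",  # must match the embedder used at ingestion
)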
By specifying the model, we create a vector index of the appropriate dimensions and ensure that embeddings from a different model are never written to this index.
Create an Ingestion Pipeline
We are ready to deploy our first ingestion pipeline:

ingestion_pipeline = oc.deploy_pipeline("my_ingestion_pipeline", pipeline_yaml_path="./ingestion.yaml")

Here ingestion.yaml defines the pipeline steps:
steps:
  - step: KnowledgeBaseFiles
    name: input
    step_args:
      # specify the source knowledge bases to watch
      knowledgebase_names: ["my_kb"]
    inputs: []
  - step: Preprocessor
    name: preprocessor
    step_args: {}
    inputs: [input]
  - step: Chunker
    name: simple_chunker
    step_args:
      chunk_size_words: 320
      chunk_overlap: 30
    inputs: [preprocessor]
  - step: SentenceTransformerEmbedder
    name: sentence-transformers
    step_args:
      model_name: BAAI/bge-base-en-v1.5
    inputs: [simple_chunker]
  - step: ChunkWriter
    name: save
    step_args:
      vector_index_name: my_vector_index
    inputs: [sentence-transformers]
Let's break down the steps.
The KnowledgeBaseFiles step tells the pipeline to watch the "my_kb" knowledge base. When the pipeline is first deployed, all existing files in the knowledge base are run through it; any file subsequently uploaded to the knowledge base triggers the pipeline to run again.
The Chunker defines how the files will be split into chunks.
The SentenceTransformerEmbedder step specifies the embedding model that will be used to embed the chunks.
Finally, the ChunkWriter step writes the chunks to the vector index we created earlier.
Create a Query Pipeline
Having indexed the files, we now create a pipeline to query the vector index:
steps:
  - step: SentenceTransformerEmbedder
    name: query_embedder
    step_args:
      model_name: BAAI/bge-base-en-v1.5
      include_metadata: [title, file_name]
      query: "placeholder"
    inputs: []
  - step: Retriever
    name: retriever
    step_args:
      vector_index_name: my_vector_index
      top_k: 100
      metadata_filters: {}
    inputs: [query_embedder]
  - step: Reranker
    name: reranker
    step_args:
      query: "placeholder"
      model_name: BAAI/bge-reranker-base
      top_k: 5
      metadata_filters: {}
    inputs: [retriever]
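We deploy it the same way as the ingestion pipeline. The pipeline name and the ./query.yaml file name below are our own choices for this example:

query_pipeline = oc.deploy_pipeline("my_query_pipeline", pipeline_yaml_path="./query.yaml")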
Here we create a simple three-step query pipeline.
- The SentenceTransformerEmbedder step embeds the query.
- The Retriever step performs a similarity search against the index we defined earlier. This step has high recall and is great for retrieving many candidate vectors.
- The Reranker step uses a cross-encoder model to narrow down the results to only the most relevant chunks.
Run the Query Pipeline
We can run the query pipeline and override any of the default step arguments defined in our pipeline at runtime by passing a dictionary of the form:

{step_name: {step_arg: step_arg_value}}
query = "What "
retriever_top_k = 50  # cast a wide net at the retrieval stage
top_k = 5  # keep only the best chunks after reranking
override_args = {
    "query_embedder": {"query": query},
    "retriever": {
        "top_k": retriever_top_k,
    },
    "reranker": {"top_k": top_k, "query": query},
}
chunks = query_pipeline.run(override_args)
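The run returns the reranked chunks. This guide doesn't show the shape of the returned objects, so as a minimal sketch we simply print them to inspect the results:

for chunk in chunks:
    print(chunk)  # inspect the returned chunk objects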
For much more information on the steps you can add to your pipeline, and what functionality you can get out of pipelines, see the pipelines page.