Louvain Community Detection
The LouvainCommunityDetection step assigns a "topic" label to each chunk. This is useful when you don't know the salient topics in the content in advance, for example when summarising long-form podcasts or books.
Under the hood, we use an optimised Louvain community detection algorithm to cluster the embeddings passed to this step. For each cluster, we then pass a small random sample of embeddings, drawn from around the cluster centroid, to a large language model and ask it to generate a label for the cluster from that sample. Finally, we assign the resulting label to every embedding in that cluster.
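The labelling stage described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the step's actual implementation: the cluster assignments are taken as given (for example, from a Louvain run), "around the cluster centroid" is interpreted as the pool_size members closest to the centroid by Euclidean distance, and the LLM call is replaced by a hypothetical label_fn that receives the text of the sampled chunks. The names label_clusters, pool_size, sample_size, label_fn and fake_llm are all illustrative.

```python
import math
import random
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def label_clusters(chunks, assignments, label_fn, pool_size=10, sample_size=3, seed=0):
    """Assign one label per cluster and copy it to every member (illustrative sketch).

    chunks:      dict chunk_id -> (text, embedding)
    assignments: dict chunk_id -> cluster id (e.g. output of a Louvain run)
    label_fn:    hypothetical stand-in for the LLM call; takes a list of texts
                 and returns a label string
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible

    # Group chunk ids by cluster.
    members = defaultdict(list)
    for chunk_id, cluster in assignments.items():
        members[cluster].append(chunk_id)

    labels = {}
    for ids in members.values():
        center = centroid([chunks[i][1] for i in ids])
        # Assumed meaning of "around the centroid": the pool_size closest members.
        nearest = sorted(ids, key=lambda i: math.dist(chunks[i][1], center))[:pool_size]
        # Draw a small random sample from that pool and ask for a label.
        sample = rng.sample(nearest, min(sample_size, len(nearest)))
        label = label_fn([chunks[i][0] for i in sample])
        # Every chunk in the cluster receives the same label.
        for i in ids:
            labels[i] = label
    return labels


def fake_llm(texts):
    """Hypothetical LLM stand-in: names the topic after the first word of a sample."""
    return texts[0].split()[0]


chunks = {
    "c1": ("football match report", [1.0, 0.0]),
    "c2": ("football transfer news", [0.9, 0.1]),
    "c3": ("baking bread at home", [0.0, 1.0]),
    "c4": ("baking cakes for beginners", [0.1, 0.9]),
}
assignments = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}
print(label_clusters(chunks, assignments, fake_llm))
# → {'c1': 'football', 'c2': 'football', 'c3': 'baking', 'c4': 'baking'}
```

Because each cluster is labelled once and the label is copied to every member, the number of label_fn calls in this sketch grows with the number of clusters rather than the number of chunks.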
Step Args
Key | Value Type | Value Description |
---|---|---|
resolution | float | A parameter for the Louvain model that controls cluster granularity. In the standard formulation, higher values produce more, smaller communities and lower values produce fewer, larger ones. For more, see here. |
model | str | The language model used to generate a label for each cluster from its sample of embeddings. |
If your pipeline includes a LouvainCommunityDetection step, the label assigned to each Chunk is stored in the metadata_json field of that Chunk in the VectorDB. As with the LexRank step, the Louvain metadata lives in its own dictionary within metadata_json, keyed by the STEP_NAME, so the label is found at metadata_json --> STEP_NAME --> "label". For example, if your Louvain step is named "louvain_step", each Chunk in the vector index will have {"louvain_step": {"label": LABEL}} in its metadata_json field.
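Reading the labels back is then a matter of indexing into metadata_json. The sketch below assumes, hypothetically, that your VectorDB client returns each Chunk as a dict with an "id" key and a "metadata_json" key holding either a dict or a JSON string; adjust the access pattern to whatever your client actually returns. The helpers louvain_label and group_by_topic are illustrative names.

```python
import json
from collections import defaultdict


def louvain_label(chunk, step_name="louvain_step"):
    """Return metadata_json[step_name]["label"] for one chunk, or None if absent."""
    metadata = chunk.get("metadata_json") or {}
    if isinstance(metadata, str):  # assumption: some clients return the field as a JSON string
        metadata = json.loads(metadata)
    return metadata.get(step_name, {}).get("label")


def group_by_topic(chunks, step_name="louvain_step"):
    """Map each topic label to the ids of the chunks that carry it."""
    groups = defaultdict(list)
    for chunk in chunks:
        groups[louvain_label(chunk, step_name)].append(chunk["id"])
    return dict(groups)


# Hypothetical chunk records: the "id" key and the dict/string mix are assumptions.
chunks = [
    {"id": "a", "metadata_json": {"louvain_step": {"label": "Transfer news"}}},
    {"id": "b", "metadata_json": '{"louvain_step": {"label": "Transfer news"}}'},
    {"id": "c", "metadata_json": {"louvain_step": {"label": "Match tactics"}}},
]
print(group_by_topic(chunks))
# → {'Transfer news': ['a', 'b'], 'Match tactics': ['c']}
```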