Louvain Community Detection

The LouvainCommunityDetection step assigns a "topic" label to each chunk. This is useful when you do not know the salient topics in the content in advance, for example when summarising long-form podcasts or books.

Under the hood, we use an optimised Louvain community detection algorithm to cluster the embeddings passed to this step. For each cluster, we draw a random sample of embeddings from around the cluster centroid, pass that small sample to a large language model, and ask it to produce a label for the cluster. We then assign the resulting label to every embedding in that cluster.
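The sampling and labelling stages described above can be sketched as follows. This is a minimal illustration, not the step's actual implementation: it assumes the cluster assignments have already been produced by the Louvain stage, that the model is shown the chunk text behind each sampled embedding (the description above does not say exactly what is sent), and that "around the centroid" means "among the nearest few members". The names sample_near_centroid, label_clusters, pool_size, sample_size and stub_label_fn are all hypothetical, and stub_label_fn stands in for the language model call.

```python
import math
import random
from collections import Counter, defaultdict


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sample_near_centroid(members, embeddings, pool_size=5, sample_size=3, seed=0):
    """Randomly sample chunk ids from the pool_size members nearest the centroid."""
    c = centroid([embeddings[m] for m in members])
    ranked = sorted(members, key=lambda m: math.dist(embeddings[m], c))
    pool = ranked[:pool_size]
    rng = random.Random(seed)
    return rng.sample(pool, min(sample_size, len(pool)))


def label_clusters(assignments, embeddings, texts, label_fn, **sample_kwargs):
    """Label each cluster from a small sample, then copy the label to all members."""
    clusters = defaultdict(list)
    for chunk_id, cluster_id in assignments.items():
        clusters[cluster_id].append(chunk_id)

    labels = {}
    for members in clusters.values():
        sample = sample_near_centroid(members, embeddings, **sample_kwargs)
        label = label_fn([texts[m] for m in sample])
        for m in members:
            labels[m] = label
    return labels


def stub_label_fn(sample_texts):
    """Hypothetical stand-in for the language model: most common word in the sample."""
    words = " ".join(sample_texts).lower().split()
    return Counter(words).most_common(1)[0][0]


# Toy data: two well-separated clusters (assignments would come from Louvain).
embeddings = {"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [0.0, 0.1],
              "d": [5.0, 5.0], "e": [5.1, 5.0]}
texts = {"a": "cooking pasta", "b": "cooking rice", "c": "cooking soup",
         "d": "tax returns", "e": "tax law"}
assignments = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

labels = label_clusters(assignments, embeddings, texts, stub_label_fn)
```

Labelling from a small sample keeps the model call cheap regardless of cluster size, at the cost of the label reflecting only the chunks nearest the centre of the cluster.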

Step Args
Key         Value Type  Value Description
resolution  float       A parameter for the Louvain model. For more see here.
model       str         The language model used to generate the label titles from the sample of embeddings from each cluster.
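As background on the resolution argument: in the common generalised form of modularity that Louvain implementations optimise, the resolution appears as a multiplier $\gamma$ on the expected-edge term. The formula below is that standard definition, given here on the assumption that this step follows the same convention, which is not confirmed above.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \gamma \, \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

Here $A_{ij}$ is the edge weight between nodes $i$ and $j$, $k_i$ is the weighted degree of node $i$, $m$ is the total edge weight, and $\delta(c_i, c_j)$ is 1 when both nodes are in the same community and 0 otherwise. Under this convention, $\gamma = 1$ recovers ordinary modularity, values above 1 tend to produce more, smaller clusters (and so more fine-grained topics), and values below 1 tend to produce fewer, larger ones.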

If you include a LouvainCommunityDetection step, the label assigned to each Chunk is stored in the metadata_json field of that Chunk in the VectorDB. As with the LexRank step, the Louvain metadata sits in its own dictionary within metadata_json, keyed by the STEP_NAME, so the label is found at metadata_json --> STEP_NAME --> "label". For example, if your Louvain step is called "louvain_step", each Chunk in the vector index will have {"louvain_step": {"label": LABEL}} in its metadata_json field.
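Reading the label back out of a Chunk might look like the sketch below. The helper name get_louvain_label is hypothetical, and the code assumes metadata_json arrives either as a JSON string or as an already-parsed dictionary; how you fetch the Chunk from the VectorDB is not shown.

```python
import json


def get_louvain_label(metadata_json, step_name):
    """Return metadata_json[step_name]["label"], or None if it is absent."""
    if isinstance(metadata_json, str):
        metadata = json.loads(metadata_json)  # assumption: stored as a JSON string
    else:
        metadata = metadata_json or {}        # assumption: already a dict (or empty)
    return metadata.get(step_name, {}).get("label")


# Example with a step named "louvain_step" and a made-up label.
chunk_metadata = '{"louvain_step": {"label": "Personal finance"}}'
label = get_louvain_label(chunk_metadata, "louvain_step")
```

Returning None rather than raising keeps the lookup safe for Chunks that were indexed without a Louvain step.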