LouvainCommunityDetection

Louvain Community Detection

The LouvainCommunityDetection step assigns a "topic" label to each chunk. This is useful when you do not know the salient topics in the content in advance, for example when summarising long-form podcasts or books.

Under the hood, we use an optimised Louvain Community Detection algorithm to cluster the embeddings passed to this step. For each cluster, we then pass a random sample of its embeddings (drawn from around the cluster centroid) to a large language model and ask it to produce a label for the cluster from this small sample. The resulting label is then assigned to every embedding in that cluster.
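The sampling and labelling stage described above can be sketched roughly as follows. This is a minimal illustration, not the step's actual implementation: it assumes the Louvain clustering has already produced a cluster id per embedding, and the names `label_clusters`, `sample_near_centroid`, `pool_size`, `sample_size` and `label_fn` are hypothetical. In particular, `label_fn` stands in for the call to the large language model.

```python
import math
import random
from collections import defaultdict


def centroid(vectors):
    """Return the component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sample_near_centroid(vectors, pool_size, sample_size, rng):
    """Randomly pick up to sample_size positions from the pool_size
    vectors that lie closest to the centroid (hypothetical strategy)."""
    c = centroid(vectors)
    by_distance = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], c))
    pool = by_distance[:pool_size]
    return rng.sample(pool, min(sample_size, len(pool)))


def label_clusters(embeddings, cluster_ids, label_fn,
                   pool_size=5, sample_size=3, seed=0):
    """Label every embedding with the label produced for its cluster.

    cluster_ids[i] is the (assumed, precomputed) cluster of embeddings[i].
    label_fn receives the indices of the sampled embeddings and returns a
    label; in the real step this role is played by the language model.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for index, cluster_id in enumerate(cluster_ids):
        members[cluster_id].append(index)

    labels = [None] * len(embeddings)
    for indices in members.values():
        vectors = [embeddings[i] for i in indices]
        positions = sample_near_centroid(vectors, pool_size, sample_size, rng)
        sampled = [indices[p] for p in positions]
        label = label_fn(sampled)
        # Propagate the single label to every member of the cluster.
        for i in indices:
            labels[i] = label
    return labels
```

Only a handful of items per cluster reach the model, so the labelling cost grows with the number of clusters rather than the number of chunks.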

Step Args

Key	Value Type	Value Description
`resolution`	`float`	The resolution parameter of the Louvain algorithm, which influences how many communities are found and how large they are. For more see here.
`model`	`str`	The language model used to generate the label for each cluster from its sample of embeddings.
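As a rough illustration of the two arguments above, the sketch below holds them in a plain dictionary and checks their types. The dictionary layout, the `validate_louvain_args` helper and the model name are all assumptions made for this example; they are not taken from the product's configuration format.

```python
def validate_louvain_args(args):
    """Check a hypothetical args dict against the types in the table above."""
    resolution = args.get("resolution")
    model = args.get("model")

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(resolution, bool) or not isinstance(resolution, (int, float)):
        raise TypeError("resolution must be a float")
    if not isinstance(model, str) or not model:
        raise TypeError("model must be a non-empty str")

    return {"resolution": float(resolution), "model": model}


# Illustrative values only; "example-model-name" is a placeholder.
step_args = validate_louvain_args({"resolution": 1.0, "model": "example-model-name"})
```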

If you include a LouvainCommunityDetection step, the label assigned to each Chunk is stored in the metadata_json field of that Chunk in the VectorDB. As with the LexRank step, the Louvain metadata lives in its own dictionary inside metadata_json, keyed by the STEP_NAME, so the label is found at metadata_json --> STEP_NAME --> "label". For example, if your Louvain step is called "louvain_step", then each Chunk in the vector index will have {"louvain_step" : { "label": LABEL }} in its metadata_json field.
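Reading the label back out of a Chunk's metadata might look like the sketch below. The helper name `get_louvain_label` is hypothetical, and the code assumes metadata_json arrives either as a JSON string or as an already-parsed dictionary; how you fetch the Chunk from the VectorDB is not shown.

```python
import json


def get_louvain_label(metadata_json, step_name):
    """Return the label stored under metadata_json[step_name]["label"],
    or None if the step or the label is missing."""
    if isinstance(metadata_json, str):
        metadata = json.loads(metadata_json)
    else:
        metadata = metadata_json
    step_metadata = metadata.get(step_name) or {}
    return step_metadata.get("label")


# Example with a step named "louvain_step" and an illustrative label.
raw = '{"louvain_step": {"label": "Pricing strategy"}}'
label = get_louvain_label(raw, "louvain_step")
```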