Chunking documents is good for RAG. It lets you pass only the relevant sections of your documents to the LLM, potentially saving tokens while still ensuring it has the relevant context.

One approach is to chunk documents twice! You split each document into large parent chunks, then split each parent into smaller child chunks. During the RAG workflow, your query is matched against the small child chunks, and the retriever then returns the parent chunk that each matching child belongs to. Matching against the more concise child chunks makes the similarity search more precise, while returning the parent gives the LLM the surrounding context.

There are variations on this idea, such as LangChain's MultiVectorRetriever, which can index LLM-generated summaries or hypothetical questions in place of the raw child chunks, but let's see how we can implement the parent-child version using LangChain.

First, define your LLM and embedding model, and wrap the embeddings in a file-system cache so repeated runs don't re-embed the same text

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from dotenv import load_dotenv
from os import getenv

load_dotenv()

# Azure OpenAI chat model that will answer over the retrieved context
llm = AzureChatOpenAI(
    azure_deployment=getenv("GPT4O_NAME"), max_tokens=4000, temperature=0
)
# Azure OpenAI embeddings used to index the child chunks
openai_embeddings = AzureOpenAIEmbeddings(
    azure_deployment=getenv("EMBEDDINGS_NAME"),
    api_key=getenv("OPENAI_API_KEY"),
    azure_endpoint=getenv("AZURE_OPENAI_ENDPOINT"),
)
# Cache document and query embeddings on disk so re-runs are cheap
docs_store = LocalFileStore("./static/cache/docs_cache")
query_store = LocalFileStore("./static/cache/query_cache")

embeddings = CacheBackedEmbeddings.from_bytes_store(
    openai_embeddings,
    document_embedding_cache=docs_store,
    query_embedding_cache=query_store,
    namespace=openai_embeddings.model,  # keyed per model, so switching models invalidates the cache
)
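
To sanity-check the cache, you can embed the same text twice; the second call is served from the local file store instead of hitting the Azure endpoint. A minimal sketch (the sample string is just an illustration):

first = embeddings.embed_documents(["parent-child chunking for RAG"])
second = embeddings.embed_documents(["parent-child chunking for RAG"])  # cache hit, no API call
assert first == second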

Next, define the text splitters: a coarse one for the parent chunks and a finer one for the child chunks

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Large parent chunks: the context that will be handed to the LLM
parent_splitter = RecursiveCharacterTextSplitter(
    chunk_size=12000,
    length_function=len,
    is_separator_regex=False,
)
# Small child chunks: what the query is actually matched against
child_splitter = RecursiveCharacterTextSplitter(chunk_size=4000)
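
As a rough illustration (the sample text is placeholder), splitting the same string at both granularities shows how each parent chunk fans out into several children:

sample_text = "Quarterly revenue grew across all regions. " * 700  # ~30,000 characters
parent_chunks = parent_splitter.split_text(sample_text)
child_chunks = child_splitter.split_text(sample_text)
print(len(parent_chunks), len(child_chunks))  # e.g. 3 parents vs 8 children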

Import the ParentDocumentRetriever

from langchain.retrievers import ParentDocumentRetriever

Now, we need a vector store for the child-chunk embeddings as well as a docstore for the full parent chunks

from langchain.storage import InMemoryStore
from langchain_chroma import Chroma

I am using Chroma as the vector store here; you could swap in another vector store such as FAISS. For the docstore, a simple in-memory store will do

# Chroma indexes the child-chunk embeddings and persists them to disk
vector_store = Chroma(
    collection_name="full_documents",
    embedding_function=embeddings,
    persist_directory="./static/chroma",
)
# The docstore holds the full parent chunks, keyed by generated IDs
store = InMemoryStore()
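
If you are curious what the docstore interface looks like, it is a plain key-value store; the retriever writes each parent chunk under a generated ID. A quick throwaway demo (the key and value here are hypothetical):

demo_store = InMemoryStore()
demo_store.mset([("parent-1", "full text of a parent chunk")])
print(demo_store.mget(["parent-1"]))  # ['full text of a parent chunk']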

Define our retriever, wiring together the two splitters and the two stores

retriever = ParentDocumentRetriever(
    vectorstore=vector_store,  # where the child chunks get indexed
    docstore=store,  # where the full parent chunks live
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

Finally, create your documents and add them to the retriever; it splits them into parent and child chunks, indexes the children in Chroma, and stores the parents in the docstore

retriever.add_documents(docs)  # docs: a list of Document objects, e.g. produced by a document loader
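
With everything wired up, you can see the mechanism in action: a raw similarity search on the vector store returns the small child chunks, while the retriever resolves those hits to their parent chunks, which you can then hand to the LLM. A minimal sketch (the query string is just an example):

query = "What does the report say about revenue growth?"

# A raw search returns the small child chunks...
child_hits = vector_store.similarity_search(query)
print(len(child_hits[0].page_content))  # at most ~4,000 characters

# ...while the retriever returns the larger parents those children belong to
parent_hits = retriever.invoke(query)
print(len(parent_hits[0].page_content))  # at most ~12,000 characters

# Hand the retrieved context to the LLM defined earlier
context = "\n\n".join(d.page_content for d in parent_hits)
answer = llm.invoke(f"Answer using only this context:\n\n{context}\n\nQuestion: {query}")
print(answer.content)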