
#323: Add the LLM to the Python Friday RAG

Last week we turned blog posts written in Markdown into embeddings in Chroma. That gives us the data we need for our Python Friday RAG (Retrieval Augmented Generation). In this post we reuse the LangChain configuration from earlier posts to connect our script to a local LLM running inside LM Studio.

Installation

If you do not already have these packages installed, now is a good time to do so:

uv pip install chromadb langchain_openai langchain 
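Our script talks to LM Studio's local server on port 1234, so a quick check before the first run can save some head-scratching. The sketch below assumes the server exposes the OpenAI-compatible `GET /v1/models` endpoint, which LM Studio serves once its local server is started. The function name `lm_studio_models` is made up for this example:

```python
import json
import urllib.request


def lm_studio_models(base_url="http://localhost:1234/v1", timeout=2.0):
    """Return the model IDs the local server reports, or None if it is unreachable.

    Assumes an OpenAI-compatible GET {base_url}/models endpoint that answers
    with {"data": [{"id": ...}, ...]}.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; ValueError covers non-JSON replies
        return None
    return [model.get("id") for model in payload.get("data", [])]


if __name__ == "__main__":
    models = lm_studio_models()
    if models is None:
        print("LM Studio server not reachable on localhost:1234")
    else:
        print("Available models:", models)
```

If the list does not contain the model name we pass to `ChatOpenAI` (here `openai/gpt-oss-20b`), the model is probably not loaded yet.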

Build and augment the prompt

The central part of our RAG is augmenting the prompt we send to the LLM with data from our Chroma vector store. We put this work into a class: the constructor wires everything up, _format_prompt() fetches the matching documents and builds the augmented prompt, get_chain() builds the LangChain chain, and handle_query() connects our prompt-building method to LangChain:

import chromadb
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnableLambda
from langchain_core.messages import HumanMessage

class RAGAssistant:
    def __init__(self, collection_name, db_path, model_name="openai/gpt-oss-20b"):
        # Initialize database internally
        self.client = chromadb.PersistentClient(path=db_path)
        self.collection = self.client.get_collection(name=collection_name)

        # Initialize LLM internally
        self.llm = ChatOpenAI(
            base_url="http://localhost:1234/v1",
            api_key="lm-studio",
            model=model_name,
            temperature=0.1
        )

    def _format_prompt(self, question):
        """Internal method to query Chroma and build the string."""
        results = self.collection.query(query_texts=[question], n_results=5)

        context_string = ""
        for i, (doc, meta) in enumerate(zip(results["documents"][0], results["metadatas"][0])):
            # Hits stored without metadata come back as None, so fall back to an empty dict
            ref = (meta or {}).get("Reference", "Unknown")
            context_string += f"DOCUMENT ID: {i}\nREFERENCE LABEL: {ref}\nCONTENT: {doc}\n---\n"

        return f"""### Instruction
Answer the user's question using ONLY the provided data in the "Context" section.

### Constraints
1. NO INLINE CITATIONS: Do not mention the source, reference title, or document ID within the body of your answer. Write the answer as a seamless narrative.
2. STRICT GROUNDING: Use only the provided Context. If the answer is missing, say "I do not have enough information in the provided context."
3. UNIQUE REFERENCES: At the very end of your response, provide a section titled "References:".
4. DEDUPLICATION: In the "References:" section, list every unique REFERENCE LABEL that contributed to your answer. Even if multiple context blocks have the same label, list it only once.
   - Format: " - {{reference}}"

### Context
{context_string}

### User Question
{question}

### Answer
"""

    def handle_query(self, user_input):
        """This matches the signature LangChain expects."""
        full_prompt = self._format_prompt(user_input)
        return [HumanMessage(content=full_prompt)]

    def get_chain(self):
        """Builds the chain using the instance method."""
        return RunnableLambda(self.handle_query) | self.llm
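To see what _format_prompt() works with, here is a stand-alone sketch that runs without Chroma or LM Studio. It feeds a hand-made dictionary shaped like the result of `collection.query()` for a single query text (lists of lists, one inner list per query) into a hypothetical `build_context()` helper. The helper also collects the reference labels without duplicates, the same rule constraint 4 asks the LLM to follow, so we could compare the model's "References:" section against it. The documents and labels are made up, and the `None` metadata entry is an assumption about how hits without metadata come back:

```python
def build_context(results):
    """Build the context string and the de-duplicated reference labels.

    `results` is assumed to have the shape of a single-query Chroma result:
    {"documents": [[...]], "metadatas": [[...]]}.
    """
    context_parts = []
    references = []
    for i, (doc, meta) in enumerate(zip(results["documents"][0], results["metadatas"][0])):
        # Fall back to {} when a hit has no metadata at all
        ref = (meta or {}).get("Reference", "Unknown")
        context_parts.append(f"DOCUMENT ID: {i}\nREFERENCE LABEL: {ref}\nCONTENT: {doc}\n---\n")
        references.append(ref)
    # dict.fromkeys keeps the first occurrence of each label, in retrieval order
    unique_refs = list(dict.fromkeys(references))
    return "".join(context_parts), unique_refs


# Made-up data in the nested shape Chroma returns
fake_results = {
    "documents": [["PEP stands for ...", "A PEP is a design document ...", "Use open() ..."]],
    "metadatas": [[
        {"Reference": "#17: What is PEP? / PEP?"},
        {"Reference": "#17: What is PEP? / PEP?"},
        None,
    ]],
}

context, refs = build_context(fake_results)
print(context)
print(refs)  # ['#17: What is PEP? / PEP?', 'Unknown']

# Inside an f-string, doubled braces survive as literal braces,
# which is why the prompt template writes {{reference}}:
print(f'Format: " - {{reference}}"')  # Format: " - {reference}"
```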

Wire it up and loop through the questions

The final part is where we initialise our class, build the chain and add our well-known loop that allows us to ask multiple questions in one session:

# 1. Instantiate the assistant (No globals!)
assistant = RAGAssistant(
    collection_name="posts", 
    db_path="./PythonFridayRAG.chroma"
)

# 2. Get the chain
chain = assistant.get_chain()

# 3. Loop
print("--- Python Friday RAG Chatbot Started (Type 'exit' to stop) ---")
while True:
    user_query = input("\nYou: ")

    if user_query.strip().lower() in ["quit", "exit", "end"]:
        print("Goodbye!")
        break

    try:
        result = chain.invoke(user_query)

        print("\n🧾 Answer:\n")
        print(result.content)
        print("\n" + "-"*30)

    except Exception as e:
        print(f"\n❌ An error occurred: {e}")
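The loop mixes input(), the chain and printing, which makes it hard to try out without a running model. Here is a sketch of one way to pull it apart, assuming we pass in the input function and an answer callable. `chat_loop` and `fake_answer` are hypothetical names, and the stub answer stands in for the LLM:

```python
EXIT_WORDS = {"quit", "exit", "end"}


def chat_loop(read_input, answer, exit_words=EXIT_WORDS):
    """Run the question loop until the user types one of the exit words.

    read_input: callable returning the next user line (input() in the real script)
    answer: callable turning a question into answer text
    Returns the list of (question, answer-or-error) pairs.
    """
    transcript = []
    while True:
        user_query = read_input("\nYou: ")
        # strip() so that "exit " or " Exit" also ends the session
        if user_query.strip().lower() in exit_words:
            print("Goodbye!")
            break
        try:
            reply = answer(user_query)
        except Exception as e:  # keep the session alive, like the original loop
            reply = f"An error occurred: {e}"
        print(f"\nAnswer:\n\n{reply}\n" + "-" * 30)
        transcript.append((user_query, reply))
    return transcript


# Demo without LM Studio: scripted inputs and a stand-in answer function
scripted = iter(["What is PEP?", "boom", "exit"])


def fake_answer(question):
    if question == "boom":
        raise RuntimeError("model not loaded")
    return f"(stub answer for {question!r})"


history = chat_loop(lambda prompt: next(scripted), fake_answer)
```

In the script itself the call would be `chat_loop(input, lambda q: chain.invoke(q).content)`, since invoking the chain returns a chat message whose text sits in `.content`.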

Ask about Python Friday on the command line

We can now run our script and ask questions about Python. If the blog has posts that match the question, we get an answer built from them, complete with code examples and references. If the posts do not cover the question, our RAG says so:

--- Python Friday RAG Chatbot Started (Type 'exit' to stop) ---

You: What is PEP?

🧾 Answer:

A Python Enhancement Proposal (PEP) is a design document that provides 
information to the Python community or describes a new feature for Python, 
its processes, or environment. It offers a concise technical specification 
of the feature and explains the rationale behind it. PEPs serve as the 
primary mechanism for proposing major new features, gathering community 
input on issues, and documenting the design decisions made in Python. The 
author of a PEP is responsible for building consensus within the community 
and recording any dissenting opinions.

**References:**
 - #17: What is PEP? / PEP?

------------------------------

You: How can we read a file?

🧾 Answer:

To read a file in Python you use the built‑in `open()` function to obtain 
a file object and then call one of its read methods.

1. **Open the file**
   ```python
   with open("filename.txt", "r") as f:
       ...
   ```
   The `"r"` mode tells Python that you want to read the file (the default). 
   Using a `with` statement ensures the file is closed automatically, even 
   if an error occurs.

2. **Read the whole contents at once**
   ```python
   content = f.read()
   print(content)
   ```
   This returns the entire file as a single string.

3. **Read line by line**
   *Using `readline()` in a loop*
   ```python
   while True:
       line = f.readline()
       if not line:          # empty string signals end of file
           break
       print(line)
   ```
   *Or using `readlines()` to get a list of all lines*
   ```python
   lines = f.readlines()
   for line in lines:
       print(line)
   ```

4. **Handle different encodings**
   If the file isn’t encoded in UTF‑8, specify the correct encoding when 
   opening it:
   ```python
   with open("umlaute.txt", "r", encoding="iso-8859-15") as f:
       content = f.read()
       print(content)
   ```

These techniques cover reading text files safely and flexibly in Python.

---

**References:**
 - #16: Working With Files / Reading and writing text files
 - #16: Working With Files / Reading a file line by line
 - #16: Working With Files / Fixing encoding problems

------------------------------

You: what is claude?

🧾 Answer:

I do not have enough information in the provided context.

References:
 - #17: What is PEP?

------------------------------
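In the last answer the model correctly admits it has no information, yet it still lists "#17: What is PEP?" as a reference. This is likely because `query()` always returns the `n_results` nearest chunks, however far away they are, so the context is never empty. Chroma's query result should also contain the distances of these hits (smaller means closer). A sketch of dropping hits above a threshold before building the prompt, where `filter_by_distance` is a hypothetical helper and the threshold of 1.0 is a placeholder that would need tuning for the embedding model in use:

```python
def filter_by_distance(results, max_distance):
    """Keep only hits whose distance is at most max_distance.

    Assumes a single-query Chroma-style result with "documents",
    "metadatas" and "distances"; returns a dict of the same nested shape.
    """
    kept = [
        (doc, meta, dist)
        for doc, meta, dist in zip(
            results["documents"][0], results["metadatas"][0], results["distances"][0]
        )
        if dist <= max_distance
    ]
    return {
        "documents": [[doc for doc, _, _ in kept]],
        "metadatas": [[meta for _, meta, _ in kept]],
        "distances": [[dist for _, _, dist in kept]],
    }


# Made-up distances for illustration only
fake = {
    "documents": [["PEP text", "Unrelated text"]],
    "metadatas": [[{"Reference": "#17: What is PEP? / PEP?"}, {"Reference": "#16: Working With Files"}]],
    "distances": [[0.4, 1.6]],
}
filtered = filter_by_distance(fake, max_distance=1.0)
print(filtered["documents"])  # [['PEP text']]
```

If `filtered["documents"][0]` ends up empty, the script could skip the LLM call and print the "I do not have enough information" sentence directly, without a References section. Whether this helps depends on how cleanly a threshold separates relevant from unrelated posts.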

Next

With Chroma and LangChain we can build a nice little RAG application that answers questions based on posts on PythonFriday.dev. That is a great example of how we can build such a system for ourselves. Next week we add a user interface to get more of a ChatGPT experience.