#302: Create an LLM Client With Chat History

LLMs are stateless. That helps them scale, but it leaves us as users with a problem: the chat bot knows nothing about the answer it gave to the previous question. If we ask a follow-up question, we end up with something like this:

How many entries are there?

Could you please clarify what you are referring to when you ask about the number of entries? Are you asking about entries in a list, dictionary, or some other data structure? Or perhaps you are referring to something else entirely? Providing more context will help me give you an accurate and concise response.

Let us see how we can use LangChain to add conversational memory to a chat bot.

The old chat bot

In post #278 we wrote this little chatbot that allows us to ask multiple questions:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:1234/v1", api_key="not_needed")
prompt = "Please ask your question or 'end' to quit: "

question = input(prompt)
print("You entered: " + question)

while question.strip() != "end":
    stream = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "You are a Python expert. Provide accurate and concise responses."
            },
            {
                "role": "user",
                "content": question,
            },
        ],
        model="gpt-4o",
        stream=True,
    )
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")

    question = input(f"\n\n{prompt}")

Unfortunately, it does not remember our conversation and cannot answer follow-up questions. While we could extend this bot with our own low-level memory implementation, we switch tracks and rewrite the bot with LangChain.
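
For comparison, here is a minimal sketch of what such a low-level memory could look like, assuming we simply keep every question and answer in a plain list and send the whole list with each request. The names `SYSTEM_MESSAGE`, `build_messages`, and `remember` are hypothetical helpers, not part of the original script, and the stored answer is a placeholder rather than real model output.

```python
# Hypothetical helpers; only the message format is taken from the script above.
SYSTEM_MESSAGE = {
    "role": "system",
    "content": "You are a Python expert. Provide accurate and concise responses.",
}


def build_messages(history, question):
    """Return the system prompt, all earlier turns, and the new question."""
    return [SYSTEM_MESSAGE, *history, {"role": "user", "content": question}]


def remember(history, question, answer):
    """Append one finished question/answer pair to the history."""
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": answer})


history = []
# Placeholder answer, standing in for whatever the model actually returned.
remember(history, "What is the Zen of Python?", "A collection of aphorisms ...")

messages = build_messages(history, "How many entries are there?")
for message in messages:
    print(message["role"], "->", message["content"])
```

In the loop above, this would probably mean passing `messages=build_messages(history, question)` to `client.chat.completions.create`, collecting the streamed chunks into one string, and calling `remember` once the answer is complete.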

Install packages

For our LangChain bot we need these packages:

uv pip install -U langchain langchain_openai

A LangChain bot with memory

Last week we had an example bot that translates English text to French. We reuse a few bits from that solution, change the prompt to answer questions, and add a custom handler to keep track of (multiple) sessions. While this flexibility is a bit too much for a minimalistic CLI tool, it is necessary when we move our bot to a web application. The script ends with the loop that allows us to enter multiple questions:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory

# 1. Define the LLM
llm = ChatOpenAI(
    model="mistral",
    openai_api_base="http://localhost:1234/v1",
    openai_api_key="not-needed",
    temperature=0.7
)

# 2. Define a prompt that includes chat history + user input
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a Python expert. Provide accurate and concise responses."),
    MessagesPlaceholder(variable_name="history"),   # past messages are inserted here
    ("human", "{input}")                            # current user message
])

# 3. Create a chain
chain = prompt | llm

# 4. Message history store (per session)
chat_histories = {}

def get_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_histories:
        chat_histories[session_id] = InMemoryChatMessageHistory()
    return chat_histories[session_id]

# 5. Wrap in RunnableWithMessageHistory
chat_with_history = RunnableWithMessageHistory(
    chain,
    get_history,
    input_messages_key="input",   # matches {input} in the prompt
    history_messages_key="history"
)

# 6. Loop for questions
session_id = "user1"

while True:
    user_input = input("You: ")
    if user_input.strip().lower() in ["quit", "exit", "end"]:
        break
    response = chat_with_history.invoke(
        {"input": user_input},
        config={"configurable": {"session_id": session_id}}
    )
    print("Bot:", response.content)

What did we gain?

If we now run our bot, we can ask follow-up questions that our bot can relate to its previous answers:

You: what is the Zen of Python?
Bot: The Zen of Python is a collection of 19 aphorisms that capture the 
philosophy behind the Python programming language. It was written by Tim 
Peters, a Python community member. You can access it in a Python 
interpreter by typing `import this`. Here are the key points:

1. Beautiful is better than ugly.
2. Explicit is better than implicit.
3. Simple is better than complex.
4. Complex is better than complicated.
5. Flat is better than nested.
6. Sparse is better than dense.
7. Readability counts.
8. Special cases aren't special enough to break the rules.
9. Although practicality beats purity.
10. Errors should never pass silently.
11. Unless explicitly silenced.
12. In the face of ambiguity, refuse the temptation to guess.
13. There should be one-- and preferably only one --obvious way to do it.
14. Although that way may not be obvious at first unless you're Dutch.
15. Now is better than never.
16. Although never is often better than *right* now.
17. If the implementation is hard to explain, it's a bad idea.
18. If the implementation is easy to explain, it may be a good idea.
19. Namespaces are one honking great idea -- let's do more of those!

You: How many entries are there?
Bot: There are 19 entries in The Zen of Python.

I am not that happy with having to implement the conversation history store ourselves. Older LangChain examples looked more like what I had in mind. Unfortunately, LangChain version 1 removed many of those simpler approaches, and the new way to do it requires even more knowledge and packages. That is why we leave it here for the time being.

Next week we explore our options to answer more current questions by creating a chat bot that can talk to a search engine.