#331: Short-Term Memory in LangGraph

A while back we added chat history to our LLM client so that we did not have to keep repeating ourselves. The same problem is now back in our LangGraph workflows: without a checkpointer, each invocation starts with no memory of earlier turns. Luckily, LangGraph has a simple way to add memory to a workflow. Let us see how we can do that.

Update LangGraph

Before we dive into the code, make sure that you have a current version of LangGraph:

uv pip install -U langgraph langchain-openai langchain-core
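
To confirm which versions actually ended up in the environment, a small check along these lines can help. It is only a sketch built on the standard library's importlib.metadata; the helper name installed_versions is made up for illustration and is not part of LangGraph.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names as used in the install command above.
PACKAGES = ["langgraph", "langchain-openai", "langchain-core"]


def installed_versions(packages: list) -> dict:
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Hypothetical helper: report missing packages instead of raising.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        installed: Optional[str] = ver
        print(f"{name}: {installed or 'not installed'}")
```

Running the script prints one line per package, which makes it easy to spot a package that is missing from the active environment.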

Upcoming changes

When we use the InMemorySaver directly, we end up with warning messages. Unfortunately, LangChain already emits a warning about an upcoming change that the other tools have not implemented yet. Therefore, we need to take a few extra steps when we initialise our InMemorySaver:

import sys
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.checkpoint.serde.jsonplus import JsonPlusSerializer
from langgraph.graph import START, END, MessagesState, StateGraph

sys.stdout.reconfigure(encoding="utf-8")

serializer = JsonPlusSerializer()
serializer.allowed_objects = ["messages", "core"]
memory = InMemorySaver(serde=serializer)

Build the workflow

With the initialised InMemorySaver in place, we can wire up the LLM and build a minimal workflow around the chatbot function:

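# Assumed minimal node (its definition is not part of this listing): send the
# conversation so far to the model and append the reply to the state.
llm = ChatOpenAI()  # pass model="..." to pick a specific model


def chatbot(state: MessagesState) -> dict:
    return {"messages": [llm.invoke(state["messages"])]}


workflow = StateGraph(state_schema=MessagesState)

workflow.add_node("chatbot", chatbot)

workflow.add_edge(START, "chatbot")
workflow.add_edge("chatbot", END)

# Use the configured memory object
graph = workflow.compile(checkpointer=memory)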

The only noticeable difference from our usual way of building the workflow is the checkpointer=memory parameter in the compile() call. Here we tell LangGraph to use our InMemorySaver, which is the first important step towards remembering our session.

Interact with the workflow

The second part of getting memory for our session is to pass a thread_id whenever we interact with our graph. That way LangGraph can take over and track the conversation history, cleanly separated by thread:

def ask(question: str, thread_id: str) -> str:
    config = {"configurable": {"thread_id": thread_id}}
    result = graph.invoke({"messages": [{"role": "user", "content": question}]}, config)
    return result["messages"][-1].content


# Thread A: the model is told a name, then asked to recall it on the next turn.
print("Thread A, turn 1:", ask("Hi! My name is Johnny.", thread_id="A"))
print("Thread A, turn 2:", ask("What is my name?", thread_id="A"))

# Thread B: same question, fresh thread_id. The checkpointer has no history for
# it, so the model cannot answer. This is the point: short-term memory is
# *thread-scoped*.
print("Thread B, turn 1:", ask("What is my name?", thread_id="B"))

If we run our script, the model knows our name in thread A, while in thread B that context is unknown:

Thread A, turn 1: Hey Johnny! 👋 How’s it going? What can I help you with today?
Thread A, turn 2: Your name is Johnny.
Thread B, turn 1: I don’t actually have any way to know your name—unless you tell me!
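
To make the thread scoping concrete, here is a toy model of the idea in plain Python. It is not how LangGraph implements its checkpointer: ToyThreadMemory, stub_model and toy_ask are hypothetical names, and the "model" is a stub that only looks for a name in the history it is given.

```python
from collections import defaultdict


class ToyThreadMemory:
    """Keeps one message list per thread_id (conceptual stand-in for a checkpointer)."""

    def __init__(self) -> None:
        self._threads = defaultdict(list)

    def load(self, thread_id: str) -> list:
        # Return a copy so callers cannot mutate the stored history by accident.
        return list(self._threads[thread_id])

    def append(self, thread_id: str, message: dict) -> None:
        self._threads[thread_id].append(message)


def stub_model(messages: list) -> str:
    """Stand-in for an LLM: recalls a name only if an earlier user message states it."""
    for message in messages:
        text = message["content"]
        if message["role"] == "user" and "My name is " in text:
            name = text.split("My name is ", 1)[1].rstrip(".!")
            return f"Your name is {name}."
    return "I don't know your name yet."


def toy_ask(memory: ToyThreadMemory, question: str, thread_id: str) -> str:
    # Store the question, answer from this thread's history only, store the answer.
    memory.append(thread_id, {"role": "user", "content": question})
    answer = stub_model(memory.load(thread_id))
    memory.append(thread_id, {"role": "assistant", "content": answer})
    return answer


if __name__ == "__main__":
    toy_memory = ToyThreadMemory()
    toy_ask(toy_memory, "Hi! My name is Johnny.", thread_id="A")
    print("Thread A:", toy_ask(toy_memory, "What is my name?", thread_id="A"))
    print("Thread B:", toy_ask(toy_memory, "What is my name?", thread_id="B"))
```

The stub answers correctly in thread A because the earlier message sits in that thread's list, and fails in thread B because its list starts empty. The real checkpointer stores full graph state rather than a bare message list, but the scoping by thread_id follows the same principle.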

Next

Adding short-term memory to a LangGraph application is as easy as using an in-memory checkpointer and always specifying a thread_id. With those two components in place, LangGraph keeps track of each conversation, and we no longer need to constantly repeat ourselves.

Next week we build on top of this approach and write our own long-term memory solution.