#335: Handle the Raspberry-Test in LangGraph

A common way to test the "quality" of an AI solution is to ask how many r’s are in the word raspberry. LLMs are notoriously bad at such questions, but that does not mean we have to accept defeat in our AI application. Let us figure out how we can handle these types of tests.

Why are these tests useless?

An LLM does not store words and letters. Instead, all it knows are tokens, embeddings and a lot of statistics about which token to choose next. With this knowledge base, the model has no reliable way to count the letters in a word.
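To make the token view concrete, here is a toy sketch. The vocabulary and the token IDs are invented for illustration; real tokenizers learn their vocabulary from data and split words differently from model to model.

```python
# Hypothetical vocabulary: piece of text -> token ID (invented for this example).
TOY_VOCAB = {"rasp": 101, "berry": 102, "r": 1, "a": 2, "s": 3, "p": 4, "b": 5, "e": 6, "y": 7}


def toy_tokenize(text: str) -> list[int]:
    """Greedy longest-match split of `text` into IDs from TOY_VOCAB."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shorter ones.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"No token for {text[i]!r}")
    return ids


token_ids = toy_tokenize("raspberry")
print(token_ids)                 # the model sees two opaque IDs, not nine letters
print("raspberry".count("r"))    # Python sees the characters and can count them
```

In this toy setup the word arrives as two numbers, and neither number says anything about the letters inside it. Plain Python, on the other hand, works on the characters directly.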

Why are these kinds of tests so popular? They are easy to do, and they nearly always give the desired result of a bad-looking AI. The number of letters is mostly wrong, but there are even worse outcomes. Last month, Google’s AI Overview tool produced this mind-bogglingly bad answer:

User: "How many Ps are in Google?"

AI Overview: There are 2 “P”s in the word “Google”.

Let us see how we can prevent such terrible answers inside our LangGraph applications.

Create a reusable subgraph

A check for this kind of test is a good candidate for a reusable solution. In last week’s post we created a subgraph, so we can build on that knowledge and add a little more code.

We create our letter_counter module by putting the code in a file named letter_counter.py. We start with a regular expression that detects the questions this module can answer, and the function handles_letter_counter() uses it to check the user input.
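Before wiring anything into a graph, it can help to see what the pattern accepts. The sketch below copies the regular expression from the module and runs it against a few sample questions; the helper name extract() and the sample questions are only for illustration.

```python
import re

# Same pattern as in letter_counter.py.
LETTER_COUNT_RE = re.compile(
    r"how many ['\"]?(\w)['\"]?'?s?\s+(?:are |is )?in\s+(?:the word )?['\"]?(\w+)",
    re.IGNORECASE,
)


def extract(text: str):
    """Return (letter, word) if the pattern matches, else None. Illustrative helper."""
    m = LETTER_COUNT_RE.search(text)
    return (m.group(1), m.group(2)) if m else None


samples = [
    "how many r are in raspberry?",
    "How many Ps are in Google?",
    "how many r's in raspberry",
    "how many calories in an apple?",
    "how many ms in a second?",
]
for question in samples:
    print(f"{question!r} -> {extract(question)}")
```

The first three questions yield a letter and a word, and the calorie question is rejected because more than one character follows "how many". The last sample shows a limit of the pattern: it reads "ms" as the plural of the letter m and captures "a" as the word, so a unit question can slip through as a letter-counting question.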

The function make_letter_counter() is our factory method that builds the subgraph. It wraps the helper functions that extract the letter and the word, do the counting in Python and format the answer. The parameters input_key and output_key help us to integrate this subgraph into the larger graph of our application.

The main method glues the LLM and our functions together and runs with the default question about the raspberry if we do not pass another question:

import re
import sys
from typing import TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import START, END, StateGraph

sys.stdout.reconfigure(encoding="utf-8")


LETTER_COUNT_RE = re.compile(
    r"how many ['\"]?(\w)['\"]?'?s?\s+(?:are |is )?in\s+(?:the word )?['\"]?(\w+)",
    re.IGNORECASE,
)


def handles_letter_counter(text: str) -> bool:
    """True if this helper can answer `text` (a 'how many <letter> in <word>' question)."""
    return LETTER_COUNT_RE.search(text) is not None


def make_letter_counter(llm: ChatOpenAI, input_key: str = "question", output_key: str = "answer"):
    State = TypedDict(
        "LetterCounterState",
        {input_key: str, "letter": str, "word": str, "count": int, output_key: str},
    )

    def parse_question(state: State) -> dict:
        question = state[input_key]
        print(f"[parse_question] LLM extracting (letter, word) from: {question!r}")
        response = llm.invoke([
            ("system", "Extract the single letter and the single word the user "
                       "wants counted. Reply with EXACTLY the letter, a space, "
                       "then the word. No quotes, no extra text. "
                       "Example: 'how many r in raspberry?' -> r raspberry"),
            ("user", question),
        ])
        parts = response.content.strip().split()
        if len(parts) == 2 and len(parts[0]) == 1 and parts[1].isalpha():
            letter, word = parts[0], parts[1]
        else:
            m = LETTER_COUNT_RE.search(question)
            if not m:
                raise ValueError(f"Could not parse a letter and word from: {question!r}")
            print("[parse_question] LLM output unusable, used regex fallback")
            letter, word = m.group(1), m.group(2)
        return {"letter": letter, "word": word}

    def count_letter(state: State) -> dict:
        letter = state["letter"]
        word = state["word"]
        count = word.lower().count(letter.lower())
        print(f"[count_letter] Python counted {letter!r} in {word!r}: {count}")
        return {"count": count}

    def format_answer(state: State) -> dict:
        letter = state["letter"]
        word = state["word"]
        count = state["count"]
        sentence = f"There are {count} '{letter.lower()}'s in '{word.lower()}'."
        print(f"[format_answer] {sentence}")
        return {output_key: sentence}

    g = StateGraph(State)

    g.add_node("parse_question", parse_question)
    g.add_node("count_letter", count_letter)
    g.add_node("format_answer", format_answer)

    g.add_edge(START, "parse_question")
    g.add_edge("parse_question", "count_letter")
    g.add_edge("count_letter", "format_answer")
    g.add_edge("format_answer", END)

    return g.compile()

def main() -> None:
    llm = ChatOpenAI(
        base_url="http://localhost:1234/v1",
        api_key="lm-studio",
        model="openai/gpt-oss-20b",
        temperature=0.1,
    )

    letter_counter_subgraph = make_letter_counter(llm)

    question = sys.argv[1] if len(sys.argv) > 1 else "how many r are in raspberry?"

    result = letter_counter_subgraph.invoke({"question": question})

    print("\n--- ANSWER ---")
    print(result["answer"])
    print("--- /ANSWER ---")


if __name__ == "__main__":
    main()
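The validation and fallback inside parse_question() can also be tried without a running model. The sketch below repeats that logic in a hypothetical standalone function, resolve_letter_and_word() (the name is not part of the module), and feeds it hand-written strings in place of real LLM replies:

```python
import re

# Same pattern as in letter_counter.py.
LETTER_COUNT_RE = re.compile(
    r"how many ['\"]?(\w)['\"]?'?s?\s+(?:are |is )?in\s+(?:the word )?['\"]?(\w+)",
    re.IGNORECASE,
)


def resolve_letter_and_word(llm_reply: str, question: str) -> tuple[str, str]:
    """Accept the reply only if it is '<letter> <word>', else fall back to the regex."""
    parts = llm_reply.strip().split()
    if len(parts) == 2 and len(parts[0]) == 1 and parts[1].isalpha():
        return parts[0], parts[1]
    m = LETTER_COUNT_RE.search(question)
    if not m:
        raise ValueError(f"Could not parse a letter and word from: {question!r}")
    return m.group(1), m.group(2)


question = "How many R are in Raspberry?"

# A well-formed reply is used as is.
print(resolve_letter_and_word("r raspberry", question))

# A chatty reply fails validation, so the regex result from the question is used.
letter, word = resolve_letter_and_word("The letter is r, the word is raspberry.", question)
print(letter, word)

# The counting step lowercases both sides, so the capitals do not matter.
print(word.lower().count(letter.lower()))
```

The point of this design is that the model only has to extract two values, and even that output is checked before the deterministic counting step runs.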

We can run this script on its own to check if everything works as expected:

$ python .\letter_counter.py

[parse_question] LLM extracting (letter, word) from: 'how many r are in raspberry?'
[count_letter] Python counted 'r' in 'raspberry': 3
[format_answer] There are 3 'r's in 'raspberry'.

--- ANSWER ---
There are 3 'r's in 'raspberry'.
--- /ANSWER ---

Use the subgraph

In a different file we can now build our LangGraph application as usual and import our letter_counter module. We need the two functions make_letter_counter() and handles_letter_counter() for the routing and the counting.

After connecting our LLM, we can initialise the letter_counter node and create a route() function that routes the traffic within a conditional edge inside our graph. We build our graph as we did before, and with the conditional edge we route everything about counting letters to our letter_counter node, while everything else goes to the chat node that calls the LLM to answer our questions.

import sys
from typing import TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import START, END, StateGraph

from letter_counter import make_letter_counter, handles_letter_counter

sys.stdout.reconfigure(encoding="utf-8")

llm = ChatOpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",
    model="openai/gpt-oss-20b",
    temperature=0.1,
)


class ChatState(TypedDict):
    prompt: str
    reply: str


letter_counter = make_letter_counter(llm, input_key="prompt", output_key="reply")


def route(state: ChatState) -> str:
    decision = "count" if handles_letter_counter(state["prompt"]) else "chat"
    print(f"[route] -> {decision}")
    return decision


def chat(state: ChatState) -> dict:
    print("[chat] Answering with the raw LLM ...")
    response = llm.invoke([("user", state["prompt"])])
    return {"reply": response.content}


workflow = StateGraph(ChatState)

workflow.add_node("letter_counter", letter_counter)
workflow.add_node("chat", chat)

workflow.add_conditional_edges(START, route, {"count": "letter_counter", "chat": "chat"})
workflow.add_edge("letter_counter", END)
workflow.add_edge("chat", END)

graph = workflow.compile()
png_bytes = graph.get_graph(xray=1).draw_mermaid_png()
with open("subgraph_with_letter_counter.png", "wb") as f:
    f.write(png_bytes)


def main() -> None:
    question = sys.argv[1] if len(sys.argv) > 1 else "how many r are in raspberry?"
    result = graph.invoke({"prompt": question})

    print("\n--- ANSWER ---")
    print(result["reply"])
    print("--- /ANSWER ---")


if __name__ == "__main__":
    main()

This gives us this graphical representation of our application:

We can see two branches, the branch with the chat node for LLM questions and the branch for the letter counting with all its parts of the subgraph.

We can now run our application and check if it can handle the raspberry and still give useful answers to similar but different questions:

$ python .\subgraph_with_letter_counter.py
[route] -> count
[parse_question] LLM extracting (letter, word) from: 'how many r are in raspberry?'
[count_letter] Python counted 'r' in 'raspberry': 3
[format_answer] There are 3 'r's in 'raspberry'.

--- ANSWER ---
There are 3 'r's in 'raspberry'.
--- /ANSWER ---


$ python .\subgraph_with_letter_counter.py "how many calories in an apple?"
[route] -> chat
[chat] Answering with the raw LLM ...

--- ANSWER ---
A medium‑sized (about 182 g) apple contains roughly **95–100 calories**.
- Small apple (~149 g): ~80 cal
- Large apple (~223 g): ~115 cal

The exact number varies with variety, ripeness, and size, but most nutrition labels list about 95 kcal for a standard medium apple.
--- /ANSWER ---
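The routing decision itself is plain Python, so it can be checked without LangGraph or a model. This is a minimal sketch, assuming a hypothetical make_route() factory and a deliberately crude stand-in predicate; in the application the predicate is handles_letter_counter().

```python
from typing import Callable

# Same mapping as passed to add_conditional_edges(): route label -> node name.
EDGES = {"count": "letter_counter", "chat": "chat"}


def make_route(can_count: Callable[[str], bool]) -> Callable[[dict], str]:
    """Build a route() function around any predicate (hypothetical helper)."""
    def route(state: dict) -> str:
        return "count" if can_count(state["prompt"]) else "chat"
    return route


def crude_predicate(text: str) -> bool:
    # Stand-in only: the real check is the regular expression in letter_counter.py.
    return text.lower().startswith("how many r ")


route = make_route(crude_predicate)

for prompt in ("how many r are in raspberry?", "how many calories in an apple?"):
    label = route({"prompt": prompt})
    print(f"{prompt!r} -> {label} -> node {EDGES[label]!r}")
```

Keeping the predicate separate from the router makes it possible to test both pieces on their own before they are connected to the graph.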

Next

Since people will not stop asking silly questions like the number of r's in raspberry, we had better make sure that our solutions can handle them. As we saw in this post, it does not take much effort to prevent disastrous answers like the number of p's in Google. We just need to go the extra mile, create a reusable subgraph once and use it in our applications.

Next week we explore our options to turn our tools into an MCP server and use it with an LLM.