#299: A Minimalistic FAQ Bot

A few weeks back I stumbled upon a course on Python chatbots by John Bura. I am still impressed by how little it takes to create a FAQ bot that answers your questions without burning resources with an LLM. Let us see how this minimalistic bot works.

Where the magic happens: vectors

The bot we are going to build uses vectors for its seemingly magical ability to answer questions. We turn the question the user asks into a vector and use it in a similarity search against the pre-vectorised questions of our FAQ. We pick the FAQ question that is closest to the user's question and respond with the answer of that entry. This gives us surprisingly good results without burning GPU cycles on an LLM.
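The similarity search boils down to cosine similarity: the dot product of two vectors divided by the product of their lengths. A score of 1 means both vectors point in the same direction, a score of 0 means they share no words at all. Here is a minimal sketch with made-up term-count vectors (hypothetical numbers, not taken from the FAQ) that computes the score by hand and compares it with scikit-learn's cosine_similarity:

```python
import numpy
from sklearn.metrics.pairwise import cosine_similarity


def cosine(a, b):
    """Cosine similarity of two 1-D vectors: dot product over the product of their norms."""
    return float(numpy.dot(a, b) / (numpy.linalg.norm(a) * numpy.linalg.norm(b)))


# Hypothetical term counts over a tiny vocabulary: ["python", "friday", "docker"]
faq_question = numpy.array([1.0, 1.0, 0.0])
user_question = numpy.array([1.0, 1.0, 1.0])
unrelated = numpy.array([0.0, 0.0, 1.0])

manual = cosine(faq_question, user_question)
library = cosine_similarity([faq_question], [user_question])[0, 0]

print(round(manual, 4))                  # 0.8165
print(abs(manual - library) < 1e-9)      # True
print(cosine(faq_question, unrelated))   # 0.0
```

The bot does the same thing, only with TF-IDF weights instead of raw counts and with one vector per FAQ question.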

FAQs for the FAQ bot

As a first step we need a list of questions and answers that we can hand to our FAQ bot. Without this data, the bot has nothing to work with and is unable to answer our questions. For this post we go with these 10 questions and put them in the faq.csv file:

Question,Answer
"What is Python Friday?","Python Friday is a weekly blog series created by Johnny Graber — an ongoing lab journal of hands‑on Python experiments, mini‑tutorials, and code snippets on practical topics and tools."
"Who writes it, and why did it start?","Johnny Graber began posting weekly in early 2020 (on his original blog Improve & Repeat), then moved the series to its own site when it surpassed 250 episodes — driven by a goal to learn something new each Friday."
"How often are new blog posts published?","New posts are published roughly once every Friday (i.e. one new Python Friday post per workweek), with occasional skips around public holidays."
"What topics does Python Friday cover?","The blog spans a wide spectrum: categories include AI, API, analytics, async, testing, web, data‐visualisation, development, security, and more — covering libraries like Flask, FastAPI, pandas, pytest, Docker, and LLMs."
"How are topics organized on the site?","Each post is tagged with specific keywords (e.g. flask, docker, pandas) and categorized under broader buckets like web, testing, AI, analytics, and async, helping readers filter by interest."
"Is Python Friday suitable for beginners or more advanced users?","It’s aimed at both: some ‘basics’ and education posts walk through fundamentals, while other posts use advanced features like async, Docker, testing frameworks, database persistence, and LLM integrations."
"Does it include tutorials and code examples?","Yes — there’s a GitHub repo with code examples for many posts, and each article often includes working snippets, demos, and guided projects (e.g. a Flask or FastAPI API, or Docker container)."
"Does it cover modern Python tools like Docker, pytest, and LLMs?","Definitely—recent posts explore using Docker containers, writing automated tests with pytest, async programming, and even integrating small LLMs or APIs, under relevant tags like API, testing, and AI."
"How can I browse posts by topic?","You can use the Tags page to filter posts by library or feature (e.g. flask, pandas, pytest) or browse the main categories list via the site’s sidebar/nav menu."
"Can readers suggest topics or participate?","Yes — the author asked readers for future topic suggestions in the #52 retrospective post (“What topics interest you?”), and the site offers links to contact the author."

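Several answers contain commas, so in faq.csv those fields must be wrapped in double quotes, as in the entries above; otherwise the parser would split an answer into several columns. A quick sketch, using an in-memory string with shortened versions of two FAQ rows instead of the real file, shows that pandas.read_csv keeps quoted fields intact:

```python
import io

import pandas

# Shortened versions of two FAQ rows, written as comma-separated values with quoted fields
csv_text = '''Question,Answer
"What is Python Friday?","A weekly blog series with hands-on Python experiments, mini-tutorials, and code snippets."
"How often are new blog posts published?","Roughly once every Friday, with occasional skips around public holidays."
'''

# io.StringIO lets read_csv parse the string as if it were a file
df = pandas.read_csv(io.StringIO(csv_text))
print(df.shape)            # (2, 2)
print(df.Answer.iloc[1])   # Roughly once every Friday, with occasional skips around public holidays.
```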
As you will find out when you work with the bot, the more precise and distinct your questions are, the better the results you get. Therefore, I suggest you start with a few questions, write the bot and then iterate on the questions until you are happy with the answers.
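One way to make that iteration less of a guessing game is to check how similar the FAQ questions are to each other: if two questions score high against each other, the bot will have a hard time telling them apart. The following sketch does that with the same TfidfVectorizer and cosine_similarity the bot uses. The helper name find_similar_pairs and the cut-off of 0.3 are my own assumptions for illustration, not part of the course:

```python
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def find_similar_pairs(questions, min_score=0.3):
    """Return (score, first, second) for every pair of questions scoring at least
    min_score, most similar first. min_score is a hypothetical starting value."""
    vectors = TfidfVectorizer().fit_transform(questions)
    scores = cosine_similarity(vectors)  # square matrix: every question vs. every question
    pairs = [
        (float(scores[i, j]), questions[i], questions[j])
        for i, j in combinations(range(len(questions)), 2)
        if scores[i, j] >= min_score
    ]
    return sorted(pairs, reverse=True)


questions = [
    "What is Python Friday?",
    "What topics does Python Friday cover?",
    "How can I browse posts by topic?",
]
for score, first, second in find_similar_pairs(questions):
    print(f"{score:.2f}  {first!r} <-> {second!r}")
```

Note that TF-IDF treats "topic" and "topics" as different words, which is why the last question does not pair up with the second one.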

Tools we need

We need Pandas, NumPy and scikit-learn for our little FAQ bot. We can install the packages with this command:

uv pip install pandas numpy scikit-learn

Load the FAQ

In the first part of our bot, we need to load our CSV file into a Pandas DataFrame. That way we have a structure that we can access by index and use as the brain of our bot.

We then fit the TfidfVectorizer from scikit-learn on the words of all questions and answers, use it to turn our questions into vectors and store those in the variable vectorized_questions:

"""
FAQ Bot based on the course "Python Chatbot Bootcamp with Pandas, NumPy and SciKit"
https://mammothclub.com/course-learn/python-chatbot-bootcamp-with-pandas-numpy-and-scikit
"""

import pandas
import numpy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load the FAQ data
df = pandas.read_csv("faq.csv")
print(df)
df.dropna(inplace=True)

# Initialize the vectorizer and fit it to the questions and answers
vectorizer = TfidfVectorizer()
vectorizer.fit(numpy.concatenate((df.Question,
                                  df.Answer)))

# Embed the questions into a vectorized format
vectorized_questions = vectorizer.transform(df.Question)
print(vectorized_questions)
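The vectorizer only knows the words it saw during fit(); transform() silently drops everything else. Because we fit on questions and answers, words from the answers are part of the vocabulary too, but the question vectors only have non-zero entries for words that appear in the questions themselves. A small sketch with a shortened, made-up two-entry FAQ (not the full file) makes this visible:

```python
import numpy
import pandas
from sklearn.feature_extraction.text import TfidfVectorizer

# Shortened, made-up two-entry FAQ for illustration
df = pandas.DataFrame({
    "Question": ["What is Python Friday?", "Does it cover Docker?"],
    "Answer": ["A weekly blog series.", "Yes, posts explore Docker containers."],
})

# Fit on questions and answers, as in the bot above
vectorizer = TfidfVectorizer()
vectorizer.fit(numpy.concatenate((df.Question, df.Answer)))
vectorized_questions = vectorizer.transform(df.Question)

# "weekly" only appears in an answer, yet it is part of the vocabulary ...
print("weekly" in vectorizer.vocabulary_)  # True
# ... but no question vector has a non-zero entry for it
weekly_column = vectorized_questions[:, vectorizer.vocabulary_["weekly"]]
print(weekly_column.nnz)  # 0

# Words the vectorizer never saw are dropped, leaving an all-zero vector
unknown = vectorizer.transform(["Tell me about penguins"])
print(unknown.nnz)  # 0
```

An all-zero vector has a cosine similarity of 0 with every FAQ question, and in that case numpy.argmax simply returns the first FAQ entry.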

Answer the user questions

The second part of our bot gives us the interactivity. A loop lets the user enter a question. We turn this question into a vector with the same vectorizer we used in part one and compute its cosine similarity with every vector of our FAQ questions. The FAQ question with the highest score wins, and we go back to our DataFrame to fetch the answer of that entry:

print("FAQ Bot is ready. Type your question or 'exit' to quit.")
while True:
    user_input = input("Question? ")
    if user_input.lower() == 'exit':
        print("Exiting the chat. Goodbye!")
        break
    print(user_input)

    # Vectorize the user input and compute similarities with the questions
    vectorized_input = vectorizer.transform([user_input])
    similarities = cosine_similarity(vectorized_input, vectorized_questions)

    # Find the closest question based on cosine similarity
    closest_question = numpy.argmax(similarities,
                                    axis=1)

    print("Similarities: ", similarities)
    print("Closest question: ", closest_question)

    answer = df.Answer.iloc[closest_question].values[0]
    print("Answer:\n", answer)
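When the bot picks the wrong entry, it helps to look at the runner-ups as well, not only at the single best match. Here is a sketch that ranks the FAQ questions by similarity with numpy.argsort and returns the top three. The function name top_matches, the value of k and the three-entry FAQ are assumptions for this example:

```python
import numpy
import pandas
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up three-entry FAQ for illustration
df = pandas.DataFrame({
    "Question": [
        "What is Python Friday?",
        "Does it cover Docker?",
        "How can I browse posts by topic?",
    ],
    "Answer": [
        "A weekly blog series.",
        "Yes, posts explore Docker containers.",
        "Use the Tags page.",
    ],
})

vectorizer = TfidfVectorizer()
vectorizer.fit(numpy.concatenate((df.Question, df.Answer)))
vectorized_questions = vectorizer.transform(df.Question)


def top_matches(question, k=3):
    """Hypothetical helper: return up to k (question, score) pairs, most similar first."""
    scores = cosine_similarity(vectorizer.transform([question]), vectorized_questions)[0]
    best = numpy.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return [(df.Question.iloc[i], float(scores[i])) for i in best]


for text, score in top_matches("Does Python Friday cover Docker?"):
    print(f"{score:.2f}  {text}")
```

If the top two scores are close, that is a good hint that the two FAQ questions need clearer wording.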

Run the bot

We can now run the bot and ask questions about the Python Friday blog. Here is a session I did that shows how the bot answered my questions:

FAQ Bot is ready. Type your question or 'exit' to quit.
Question?
What is Python Friday?

Answer:
 Python Friday is a weekly blog series created by Johnny Graber — an ongoing 
 lab journal of hands‑on Python experiments, mini‑tutorials, and code snippets 
 on practical topics and tools.

Question?
Why does Johnny write it?

Answer:
 Johnny Graber began posting weekly in early 2020 (on his original blog Improve 
 & Repeat), then moved the series to its own site when it surpassed 250 episodes 
 — driven by a goal to learn something new each Friday.

Question?
Does it cover Docker?

Answer:
 Definitely—recent posts explore using Docker containers, writing automated tests 
 with pytest, async programming, and even integrating small LLMs or APIs, under 
 relevant tags like API, testing, and AI.

Question?
How about web programming?

Answer:
 You can use the Tags page to filter posts by library or feature (e.g. flask, 
 pandas, pytest) or browse the main categories list via the site’s sidebar/nav 
 menu.

Question?
Why is the sky blue?

Answer:
 Python Friday is a weekly blog series created by Johnny Graber — an ongoing 
 lab journal of hands‑on Python experiments, mini‑tutorials, and code snippets 
 on practical topics and tools.

This output comes directly from the script; all I did was remove the similarity scores and the index of the closest question to improve readability. As you can see, we got useful answers for most questions. The question about web programming only got a loosely related answer, and when we ask about things that are not in the FAQ at all, like the colour of the sky, we end up with an answer that has nothing to do with the question. Not bad for such a basic bot.

Limitations

This bot relies solely on the information in the FAQ; there is no LLM or other AI tool generating the answers. Therefore, the bot knows nothing beyond the FAQ and answers with an entry from the FAQ even if the user asked about something else entirely. That is the price we must pay for such a simple approach.

We can improve this behaviour a bit with a similarity threshold. If the score of the best match stays below that threshold, the bot answers with a pre-defined sentence reminding the user that it only covers a specific application or website. That way we get a better user experience without adding much complexity. However, finding the right threshold is not that easy and requires a lot of experimentation.
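A minimal sketch of that idea, assuming a fallback text and a threshold of 0.2 that I picked for illustration only; you would need to tune the value against your own FAQ. The helper answer_question is hypothetical and wraps the same vectorizer and similarity search as the bot above:

```python
import numpy
import pandas
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical values: tune THRESHOLD by trying real questions against your FAQ
THRESHOLD = 0.2
FALLBACK = "Sorry, I can only answer questions about the Python Friday blog."

# Made-up two-entry FAQ for illustration
df = pandas.DataFrame({
    "Question": ["What is Python Friday?", "Does it cover Docker?"],
    "Answer": ["A weekly blog series.", "Yes, posts explore Docker containers."],
})

vectorizer = TfidfVectorizer()
vectorizer.fit(numpy.concatenate((df.Question, df.Answer)))
vectorized_questions = vectorizer.transform(df.Question)


def answer_question(question):
    """Return the best-matching answer, or FALLBACK if the best score is below THRESHOLD."""
    similarities = cosine_similarity(vectorizer.transform([question]), vectorized_questions)[0]
    best = int(numpy.argmax(similarities))
    if similarities[best] < THRESHOLD:
        return FALLBACK
    return df.Answer.iloc[best]


print(answer_question("Does Python Friday cover Docker?"))
print(answer_question("Tell me about penguins"))
```

The second question shares no words with the FAQ, so its best score is 0 and the bot falls back to the pre-defined sentence instead of returning the first entry.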

Conclusion

With the right concept we can create a bot that answers questions without needing an LLM or another AI tool. The better and more diverse our questions in the FAQ are, the better the answers we can provide to our users.

As impressive as this bot is, we quickly run into requirements that call for a full Retrieval Augmented Generation (RAG) system. Over the next weeks we explore the different parts we need to build our own RAG, but first we take a small break and celebrate post #300.