#299: A Minimalistic FAQ Bot
A few weeks back I stumbled upon a course on Python chatbots by John Bura. I am still impressed by how little it takes to create a FAQ bot that answers your questions without burning resources with an LLM. Let us see how this minimalistic bot works.
Where the magic happens: vectors
The bot we are going to build uses vectors for its magical abilities to answer questions. We turn the question the user asks into a vector and use this vector in a similarity search against the pre-vectorised questions of our FAQ. We pick the closest question in the FAQ and respond with the answer part of that entry. This gives us an astounding result without burning GPU cycles with an LLM.
FAQs for the FAQ bot
As a first step we need a list of questions and answers that we can hand to our FAQ bot. Without this data, the bot has nothing to work with and is unable to answer our questions. For this post we go with these 10 questions and put them in the faq.csv file:
| Question | Answer |
|---|---|
| "What is Python Friday?" | "Python Friday is a weekly blog series created by Johnny Graber — an ongoing lab journal of hands‑on Python experiments, mini‑tutorials, and code snippets on practical topics and tools." |
| "Who writes it, and why did it start?" | "Johnny Graber began posting weekly in early 2020 (on his original blog Improve & Repeat), then moved the series to its own site when it surpassed 250 episodes — driven by a goal to learn something new each Friday." |
| "How often are new blog posts published?" | "New posts are published roughly once every Friday (i.e. one new Python Friday post per workweek), with occasional skips around public holidays." |
| "What topics does Python Friday cover?" | "The blog spans a wide spectrum: categories include AI, API, analytics, async, testing, web, data‐visualisation, development, security, and more — covering libraries like Flask, FastAPI, pandas, pytest, Docker, and LLMs." |
| "How are topics organized on the site?" | "Each post is tagged with specific keywords (e.g. flask, docker, pandas) and categorized under broader buckets like web, testing, AI, analytics, and async, helping readers filter by interest." |
| "Is Python Friday suitable for beginners or more advanced users?" | "It’s aimed at both: some ‘basics’ and education posts walk through fundamentals, while other posts use advanced features like async, Docker, testing frameworks, database persistence, and LLM integrations." |
| "Does it include tutorials and code examples?" | "Yes — there’s a GitHub repo with code examples for many posts, and each article often includes working snippets, demos, and guided projects (e.g. a Flask or FastAPI API, or Docker container)." |
| "Does it cover modern Python tools like Docker, pytest, and LLMs?" | "Definitely—recent posts explore using Docker containers, writing automated tests with pytest, async programming, and even integrating small LLMs or APIs, under relevant tags like API, testing, and AI." |
| "How can I browse posts by topic?" | "You can use the Tags page to filter posts by library or feature (e.g. flask, pandas, pytest) or browse the main categories list via the site’s sidebar/nav menu." |
| "Can readers suggest topics or participate?" | "Yes — the author asked readers for future topic suggestions in the #52 retrospective post (“What topics interest you?”), and the site offers links to contact the author." |
As you will find out when you work with the bot, the better and more distinct your questions are, the better the results you get. Therefore, I suggest you start with a few questions, write the bot, and then iterate on the questions until you are happy with the results.
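One way to check how distinct the questions are is to compare them with each other. The following is only a sketch and not part of the original bot: it assumes scikit-learn is installed (see the next section) and uses four questions from the table above. Pairs with a high score share many words and are easy for the bot to mix up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Four questions taken from the FAQ table above.
questions = [
    "What topics does Python Friday cover?",
    "How are topics organized on the site?",
    "How can I browse posts by topic?",
    "Can readers suggest topics or participate?",
]

# Vectorise the questions and compare every question with every other one.
vectors = TfidfVectorizer().fit_transform(questions)
similarity = cosine_similarity(vectors)

# Collect each pair once (i < j) and skip the diagonal, which is always 1.
pairs = [
    (float(similarity[i, j]), questions[i], questions[j])
    for i in range(len(questions))
    for j in range(i + 1, len(questions))
]

# Most similar pairs first: these are candidates for rewording.
for score, first, second in sorted(pairs, reverse=True):
    print(f"{score:.2f}  {first}  <->  {second}")
```

If two questions end up close to each other, rewording one of them with more specific terms usually helps the bot tell them apart.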
Tools we need
We need Pandas, NumPy and scikit-learn for our little FAQ bot. We can install the packages with pip: pip install pandas numpy scikit-learn
Load the FAQ
In the first part of our bot, we need to load our CSV file into a Pandas DataFrame. That way we have a structure that we can access by index and use as the brain of our bot.
We then use the TfidfVectorizer from scikit-learn to turn our questions into vectors and store them in the variable vectorized_questions:
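A minimal sketch of this first part could look as follows. It assumes that faq.csv starts with the header row Question,Answer (matching the table above). To keep the example runnable on its own, two shortened sample rows stand in for the file.

```python
import io

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Two shortened sample rows stand in for faq.csv so the sketch runs on its own.
# With the real file, use pd.read_csv("faq.csv") instead.
SAMPLE_CSV = io.StringIO(
    "Question,Answer\n"
    '"What is Python Friday?","A weekly blog series about Python."\n'
    '"How often are new blog posts published?","Roughly once every Friday."\n'
)

faq = pd.read_csv(SAMPLE_CSV)

# Learn the vocabulary from the FAQ questions and turn each question
# into a TF-IDF vector (one row per question).
vectorizer = TfidfVectorizer()
vectorized_questions = vectorizer.fit_transform(faq["Question"])

# Shape is (number of questions, size of the vocabulary).
print(vectorized_questions.shape)
```

We keep the vectorizer around because the question of the user must later be transformed with the same vocabulary.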
Answer the user questions
The second part of our bot gives us the interactivity. We have a loop that allows the user to enter a question. We take this question, use the same vectorizer as we did in part one and turn the question into a vector. We can now see which vectors of our FAQ are similar to the vector representing the question of the user. When we find a match, we go back to our DataFrame with the FAQ and get the answer part of the entry:
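The following sketch shows one way to write this second part. The function names find_answer and chat are my own choices for illustration, and a small DataFrame with shortened answers stands in for the loaded faq.csv. I assume cosine similarity for the comparison and NumPy's argmax to pick the best match.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Small stand-in for the DataFrame loaded from faq.csv (shortened answers).
faq = pd.DataFrame({
    "Question": [
        "What is Python Friday?",
        "How often are new blog posts published?",
        "How can I browse posts by topic?",
    ],
    "Answer": [
        "A weekly blog series about Python.",
        "Roughly once every Friday.",
        "Use the Tags page or the categories list.",
    ],
})

vectorizer = TfidfVectorizer()
vectorized_questions = vectorizer.fit_transform(faq["Question"])


def find_answer(user_question):
    """Return the answer of the most similar FAQ question and its score."""
    # transform (not fit_transform) reuses the vocabulary learned from the FAQ.
    user_vector = vectorizer.transform([user_question])
    similarities = cosine_similarity(user_vector, vectorized_questions)[0]
    best_index = int(np.argmax(similarities))
    return faq["Answer"].iloc[best_index], float(similarities[best_index])


def chat():
    """Read questions in a loop until the user types 'exit'."""
    print("FAQ Bot is ready. Type your question or 'exit' to quit.")
    while True:
        user_question = input("Question?\n").strip()
        if user_question.lower() == "exit":
            break
        answer, score = find_answer(user_question)
        print(f"Similarity: {score:.2f}")
        print(f"Answer:\n{answer}")
```

Calling chat() starts the interactive session. find_answer can also be called directly, which makes it easy to try out questions in a test.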
Run the bot
We can now run the bot and ask questions about the Python Friday blog. Here is a session I did that shows how the bot answered my questions:
FAQ Bot is ready. Type your question or 'exit' to quit.
Question?
What is Python Friday?
Answer:
Python Friday is a weekly blog series created by Johnny Graber — an ongoing
lab journal of hands‑on Python experiments, mini‑tutorials, and code snippets
on practical topics and tools.
Question?
Why does Johnny write it?
Answer:
Johnny Graber began posting weekly in early 2020 (on his original blog Improve
& Repeat), then moved the series to its own site when it surpassed 250 episodes
— driven by a goal to learn something new each Friday.
Question?
Does it cover Docker?
Answer:
Definitely—recent posts explore using Docker containers, writing automated tests
with pytest, async programming, and even integrating small LLMs or APIs, under
relevant tags like API, testing, and AI.
Question?
How about web programming?
Answer:
You can use the Tags page to filter posts by library or feature (e.g. flask,
pandas, pytest) or browse the main categories list via the site’s sidebar/nav
menu.
Question?
Why is the sky blue?
Answer:
Python Friday is a weekly blog series created by Johnny Graber — an ongoing
lab journal of hands‑on Python experiments, mini‑tutorials, and code snippets
on practical topics and tools.
This comes directly from the script; all I did was remove the output about the similarity of the questions to improve readability. As you can see, we got useful answers for most questions. Only when we ask about things that are not in the FAQ do we end up with answers that have nothing to do with the question of the user. Not bad for such a basic bot.
Limitations
This bot relies solely on information from the FAQ; there is no LLM or other artificial intelligence tool generating the answers. Therefore, the bot does not know anything beyond the FAQ and will answer with an entry from the FAQ even if the user asked about something else entirely. That is the price we must pay for such a simplistic approach.
We can improve this behaviour a bit with a similarity threshold. If the similarity of the best match stays below the threshold, we can answer with a pre-defined sentence as a reminder that this is a bot for a specific application or web site. That way we get a better user experience without increasing the complexity too much. However, finding the right threshold is not that easy and requires a lot of experimentation.
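A threshold check could be sketched like this. The value 0.3 and the fallback sentence are assumptions for illustration and not tuned values; the questions are taken from the FAQ, with shortened answers as stand-ins.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Small stand-in for the FAQ (the real bot would load faq.csv).
questions = [
    "How often are new blog posts published?",
    "How can I browse posts by topic?",
    "Does it include tutorials and code examples?",
]
answers = [
    "Roughly once every Friday.",
    "Use the Tags page or the categories list.",
    "Yes, many posts come with code examples.",
]

# Both values are assumptions for this sketch; tune them with real questions.
THRESHOLD = 0.3
FALLBACK = "Sorry, I can only answer questions about the Python Friday blog."

vectorizer = TfidfVectorizer()
vectorized_questions = vectorizer.fit_transform(questions)


def find_answer(user_question, threshold=THRESHOLD):
    """Return the best FAQ answer, or the fallback if nothing is close enough."""
    user_vector = vectorizer.transform([user_question])
    similarities = cosine_similarity(user_vector, vectorized_questions)[0]
    best_index = int(np.argmax(similarities))
    # Below the threshold we do not trust the match and use the fallback.
    if similarities[best_index] < threshold:
        return FALLBACK
    return answers[best_index]


print(find_answer("How often are new blog posts published?"))
print(find_answer("Why is the sky blue?"))
```

In this small sample the sky question shares no word with any FAQ question, so its similarity is 0 and the bot uses the fallback. With a larger FAQ, common words such as "is" or "the" that also appear in a FAQ question can still lift an unrelated question above the threshold, which is one reason why tuning takes experimentation.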
Conclusion
With the right concept we can create a bot that answers questions without needing an LLM or another AI tool. The better and more diverse our questions in the FAQ are, the better the answers we can provide to our users.
As impressive as this bot is, we quickly end up with requirements that call for a full Retrieval Augmented Generation (RAG) system. Over the next weeks we explore the different parts we need to create our own RAG, but first we take a small break and celebrate post #300.