#305: Chat With a Database in LangChain
The CSV bot we created last week uses a lot of unsafe practices. If you are not that happy with running Python code generated by an LLM on your machine, you may want to try a different approach. One approach I find helpful is to load the CSV file into an SQLite database and then use a SQL agent to query the data. Let us find out how we can do that.
Installation
We do not need to put our code into a Docker container as we did last week. We can use our usual environment and make sure that we have these packages installed:
Create the SQL agent
We first need to get the data from the sample data set of Seaborn into a CSV file and then load it into an SQLite database. For the LLM we use a low temperature to get less variability on the answers. With that in place, we can instantiate the SQL agent that gets its ability to talk to the database from the SQLDatabaseToolkit. We finish our agent with the question loop that allows us to ask multiple questions.
Talk to the database
We can now start asking questions about our data and the SQL bot will answer it with a noticeable delay:
✅ SQL Agent ready! Ask me anything about the Titanic database.
Type 'exit' to quit.
🧠 Question: describe the data
💬 Answer: The database contains information about passengers on a ship voyage.
The table includes details such as whether the passenger survived (survived),
their class of travel (pclass), sex, age, number of siblings or spouses aboard
(sibsp), number of parents or children aboard (parch), fare paid (fare), port
of embarkation (embarked), travel class (class), a description of the passenger
(who), whether they were an adult male (adult_male), deck on which their cabin
was located (deck), town of embarkation (embark_town), whether they were alive
at the end of the voyage (alive), and whether they were alone aboard the ship
(alone).
Here are some sample rows from the passengers table:
| survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|----------|--------|--------|------|-------|-------|----------|----------|---------|--------|------------|-------|-------------|-------|-------|
| 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | 1 | None | Southampton | no | 0 |
| 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | 0 | C | Cherbourg | yes | 0 |
| 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | 0 | None | Southampton | yes | 1 |
| 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | 0 | C | Southampton | yes | 0 |
| 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | 1 | None | Southampton | no | 1 |
| 0 | 3 | male | None | 0 | 0 | 8.4583 | Q | Third | man | 1 | None | Queenstown | no | 1 |
| 0 | 1 | male | 54.0 | 0 | 0 | 51.8625 | S | First | man | 1 | E | Southampton | no | 1 |
| 0 | 3 | male | 2.0 | 3 | 1 | 21.075 | S | Third | child | 0 | None | Southampton | no | 0 |
| 1 | 3 | female | 27.0 | 0 | 2 | 11.1333 | S | Third | woman | 0 | None | Southampton | yes | 0 |
| 1 | 2 | female | 14.0 | 1 | 0 | 30.0708 | C | Second | child | 0 | None | Cherbourg | yes | 0 |
This data provides a comprehensive overview of the passengers, their demographics,
and their travel details during the voyage.
🧠 Question: how many rows are there?
💬 Answer: There are 891 rows in the database.
🧠 Question: how many people embarked in which town?
💬 Answer: Here are the number of people who embarked from each town:
- None: 2
- Cherbourg: 168
- Queenstown: 77
- Southampton: 644
🧠 Question: how many people embarked in which port?
💬 Answer: The number of people who embarked from each port is as follows:
- Port C: 168 people
- Port Q: 77 people
- Port S: 644 people
There are also 2 people with no recorded embarkation port.
🧠 Question: what's the average age of survivors
💬 Answer: The average age of survivors is approximately 28.34 years.
As we can see with this conversation, the SQL bot gives much more concise answers and is less prone to make up answers. Another benefit of using SQLite is that we can run SQL statements on our own to verify the findings of the bot.
What happens here?
If you want to see what exactly goes on in the background, you can make this little change in the configuration for the SQL bot:
After activating the verbose output, we can see every step the agent takes to figure out how it can answer our question. We see the SQL statements; the conclusion it takes from it and how it plans its next step. As you can imagine, that takes up a lot of output and that is the reason while the chat bot is not the fastest when it comes to answering our questions.
Next
With the SQL bot we get a more reliable approach that can work with our structured data. Not only do we not run generated Python code, we also can easily verify the results by running our own SQL statements. I like this a lot more than the CSV bot and need to check how I can speed up the thinking time.
Next week we continue with our own data and see if we can create a bot to talk with a PDF file.