#269: Introducing DuckDB
DuckDB is an in-process SQL OLAP (Online Analytical Processing) database designed for efficient analytical queries. Known as the "SQLite for analytics", it integrates seamlessly with Python, making it ideal for data analysis. Let us explore the benefits it can bring to our tool box. We focus on the Python integration, but you can use DuckDB with many more programming languages.
Why choose DuckDB?
The main benefits of DuckDB for me are these points:
- Simplicity and Integration: DuckDB runs within the host application. Therefore, we do not need a separate database server and can run SQL queries on our data structures (like Pandas DataFrames) without any additional work.
- Performance: Optimized for analytical queries, DuckDB utilizes a columnar storage format and vectorized query execution, enabling rapid data processing even on large datasets that are larger than our memory.
- Flexibility: DuckDB supports various data formats, including CSV, Parquet, and JSON, and can query data from local files or cloud storage services like S3.
Where DuckDB is not the right fit
DuckDB is made for analytics, not for transactions. If you want to write a lot of small transactions, you will not be happy with the performance you get. You can use it that way, but other systems like SQLite, are much better suited for such a workload.
Installation
We can install DuckDB using uv or pip:
Querying data
The main reason to use DuckDB is to query data, where we can choose from these options: • Query a (CSV) file directly:
• Read a (CSV) file into a relation and then query it: • Query a Pandas data frame:If we run any of the options from above, the show() method will give us a console friendly output:

Work with the result
If we want to do something more than just show the result, we can use the fetchall() method and then iterate through our result:
DuckDB returns results in batches, which is great for handling large result sets. However, when working with smaller sets or for exploratory purposes, this approach may not be the most intuitive.
Convert results into other formats
The fetchall() turns the result into Python objects. If we want to use the result with other Python tools, we can go for a more specific converter that can give us the data back as a Pandas or Polars data frame, an Arrow table or a NumPy array:
Just make sure that you import the tool you want to export to - otherwise you end up with a ModuleNotFoundError exception.
Write results to the disk
If we want to store the result of our query as a CSV or a Parquet file, we do not need to take a detour through Pandas. Instead, we can write the result directly to the disk:
Next
With DuckDB we get a tool that is perfectly integrated into Python. We can use it without any set-up and directly query all the data we have in our variables or files. So far, we used the in-memory part of DuckDB. Next week we look at tables and how we can persist them in DuckDB.