#273: Query Excel Files With DuckDB
Excel is the backbone of many companies. While it comes with a nice format to share tabular data between departments, it is not ideal to work with it in an automated way. Even with the integrated Python and the development functionality, Excel feels something limited compared to what we can do with regular Python. In this post we use Excel only as a storage format and do the work in plain Python and DuckDB.
Load the Excel extension
To get the capabilities to work with Excel in DuckDB, we need to install the Excel extension. For that we can use this query as we learned in the last post to install and load the extension:
Read data from Excel
If we have a file named demo.xlsx, we can use this code to read the data and show the optimised view for the console:
If we want to be more specific on how we load the data, we can use this code:
This extended way of loading data is useful if we have several sheets with data or if we do not have a header column.
Create a table from Excel
We can create a table based on existing data the same way we did for a CSV file:
Query the data
We can use the data directly or load it into a table and then query an Excel sheet as we would do it with any other data source supported by DuckDB:
If you need help with the syntax, check out the post on querying data in DuckDB.
Write to Excel
To close the loop with Excel, we can filter data and then write it to (another) Excel file with this code:
I had to switch to the connection-based approach, since the direct one failed with various errors. Depending on the data you want to write and your Excel version, you may end up with a file that Excel needs to repair before it can open it (see issue #15).
Conclusion
With DuckDB we can not only work seamlessly with Python and the objects we have inside our application, but we can even run SQL against an Excel file. This gives us all the flexibility we need to solve problems in our day-to-day work.
This post concludes this short series on DuckDB. What we covered so far should give you a head start for your own exploration. We may return in the future, but first I need to cover a few topics that currently grab my attention.