#274: Working With Excel Files in Pandas
Last week we saw how we could work seamlessly with Excel in DuckDB. If you have Pandas and want to get a similar experience, this post is for you.
Install openpyxl
If we try to open an Excel file with Pandas, we may run into this error:
ImportError: Missing optional dependency 'openpyxl'. Use pip or conda to install openpyxl.
We can fix this problem by installing openpyxl with this command:
Optional: Install pyarrow
If you want more than int and object as the data types for your columns of your Excel file, you should install pyarrow as well:
Load an Excel file into a data frame
To load an Excel file with the first column as its index, we can use this command:
This gives us a data frame that we can work with like we did with any other data frame:
In post #186 I collected the most helpful ways to filter data in Pandas.
Store a data frame in an Excel file
We can use this command to write a data frame to an Excel file:
The index=True keeps the index column while sheet_name='GB' sets the name of the sheet:

If you need more control when writing the Excel file, the library XlsxWriter may be a great starting place.
Conclusion
With the help of openpyxl and pyarrow we can access Excel files from Pandas and use them like any other data source. This is an immense help if you must work with Excel files and want to automate your manual workflow.