#275: Print Pandas DataFrames as Markdown
When we work with Pandas, we keep our data in a DataFrame. This format fits the whole workflow well, but it becomes awkward when we print it to the console and then want to reuse that output in a document. Luckily for us, there is a nice helper that converts the DataFrame to Markdown.
The problem
We can create a DataFrame like this:
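A minimal sketch of such a DataFrame follows. The values are taken from the tables shown later in this post; the dictionary layout and the variable name df are assumptions.

```python
import pandas as pd

# Sample data reconstructed from the tables in this post
df = pd.DataFrame(
    {
        "Position": [1, 2, 3],
        "Product": ["abcd", "56-8UI.L", "L1"],
        "Size": [90, 1000, 3],
    }
)
```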
If we print it to the console, we get a nicely formatted table that uses spaces to align the different columns:
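As a sketch, assuming the sample data used in the tables of this post (the variable names df and console_table are illustrative), printing the DataFrame could look like this:

```python
import pandas as pd

# Sample data reconstructed from the tables in this post
df = pd.DataFrame(
    {
        "Position": [1, 2, 3],
        "Product": ["abcd", "56-8UI.L", "L1"],
        "Size": [90, 1000, 3],
    }
)

# to_string() gives the same space-aligned layout that print(df) shows
console_table = df.to_string()
print(console_table)
#    Position   Product  Size
# 0         1      abcd    90
# 1         2  56-8UI.L  1000
# 2         3        L1     3
```

The columns only line up because every character in a console font has the same width.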
As long as we stay in the console, everything works nicely. But when we copy this output somewhere that uses a different (proportional) font, the columns no longer line up, and we need to put in a lot of effort to fix the table.
Install tabulate
There is a small package called tabulate that Pandas uses as an optional dependency for this. We can install it with pip (pip install tabulate) or with conda (conda install tabulate).
If we skip this step and go directly to the next section, we will end up with this error message:
ImportError: Missing optional dependency 'tabulate'. Use pip or conda to install tabulate.
df.to_markdown() as a solution
After installing the optional dependency tabulate, we can use the df.to_markdown() method of Pandas to render our DataFrame in Markdown syntax. It returns the table as a string, which we can print to the console:
We now get this nicely formatted table that uses Markdown syntax:
|    |   Position | Product   |   Size |
|---:|-----------:|:----------|-------:|
|  0 |          1 | abcd      |     90 |
|  1 |          2 | 56-8UI.L  |   1000 |
|  2 |          3 | L1        |      3 |
If we do not want to see the index column of the DataFrame, we can pass index=False to the function:
|   Position | Product   |   Size |
|-----------:|:----------|-------:|
|          1 | abcd      |     90 |
|          2 | 56-8UI.L  |   1000 |
|          3 | L1        |      3 |
This works nicely with everything that is text based. If we have a tool that renders Markdown, we can paste the output from above directly into our Markdown file to display the data as a proper table:
| Position | Product | Size |
|---|---|---|
| 1 | abcd | 90 |
| 2 | 56-8UI.L | 1000 |
| 3 | L1 | 3 |
Leverage tabulate
We can pass additional parameters to the df.to_markdown() function, and Pandas forwards them to tabulate so that it renders our table in a different way. We are not limited to Markdown. With the parameter tablefmt="latex", for example, we get a table created by the LaTeX formatter:
Check the documentation for tabulate to see all the options we can choose from.
Conclusion
The df.to_markdown() function and the tabulate package turn our space-separated tables into Markdown (or other formats) that we can use in our documents with ease. Even if we do not directly use Markdown, this change in the table formatting may still be a significant improvement.