
#185: Creating DataFrames in Pandas

While working on my upcoming blog post on filtering data in Pandas, I noticed a little gap in my knowledge: How can we create a DataFrame without the help of a CSV file? Let us find out what options we have.

Turn a dictionary into a DataFrame

One straightforward way to create a DataFrame is to use a dictionary. Each key of the dictionary becomes the name of a column, while its list of values is put on separate rows below that column:

import pandas as pd

d = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data=d)
df

This gives us a DataFrame that is two columns wide and three rows long:

The values for column A are stacked vertically below each other.
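As a quick check of this column-wise placement, here is a minimal sketch that reuses the dictionary from above. The `from_dict(..., orient='index')` call is not part of the original example; it is included only as a contrast, to show what happens when the keys are treated as row labels instead of column names.

```python
import pandas as pd

d = {'A': [1, 2, 3], 'B': [4, 5, 6]}

# Default behaviour: each key becomes a column, each list runs down the rows.
df = pd.DataFrame(data=d)
print(df.shape)          # (3, 2): three rows, two columns
print(df['A'].tolist())  # [1, 2, 3]

# Addition for illustration: orient='index' turns the keys into row labels.
df_rows = pd.DataFrame.from_dict(d, orient='index')
print(df_rows.shape)     # (2, 3): two rows, three columns
```

Note that `shape` reports rows first and columns second.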

Turn a list of lists into a DataFrame

We can create a list containing other lists and turn that into a DataFrame:

rows = []
rows.append(['A', 11, 20])
rows.append(['B', 16, 28])
rows.append(['C', 14, 32])

column_names=['Option','Min','Max']
df = pd.DataFrame(rows, columns=column_names)
df

We are free to name the columns as we like, and the items stay in the order in which we put them into the list(s):

The values of the lists keep their order in the mapping of the columns and the rows.
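To see the row-wise mapping in action, the following sketch builds the same list of lists in one step and reads one row back. The second DataFrame, created without the `columns` argument, is an extra illustration that is not in the original example: in that case pandas falls back to integer column labels.

```python
import pandas as pd

rows = [['A', 11, 20], ['B', 16, 28], ['C', 14, 32]]

# With explicit names, each inner list becomes one row.
df = pd.DataFrame(rows, columns=['Option', 'Min', 'Max'])
print(df.loc[1].tolist())        # the second inner list: ['B', 16, 28]

# Without names, the columns are simply numbered 0, 1, 2.
df_unnamed = pd.DataFrame(rows)
print(list(df_unnamed.columns))  # [0, 1, 2]
```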

Turn a NumPy ndarray into a DataFrame

If we already process our data with NumPy, we can turn an ndarray directly into a DataFrame:

import numpy as np

data = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
df = pd.DataFrame(data, columns=['x', 'y', 'z'])
df

This takes the three values we have in each tuple and places them horizontally into our DataFrame:

The tuple (1,2,3) is placed so that 1 goes into column x, 2 into y and 3 into z.

Be aware that this places the values differently than a dictionary does: here each tuple fills a row, whereas each list in a dictionary fills a column.
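The difference in placement is easiest to see side by side. This is a small sketch that feeds the same three tuples once through an ndarray and once through a dictionary; the variable names (`values`, `df_array`, `df_dict`, `df_transposed`) are made up for this illustration.

```python
import numpy as np
import pandas as pd

values = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
names = ['x', 'y', 'z']

# From an ndarray: each tuple becomes a row.
df_array = pd.DataFrame(np.array(values), columns=names)

# From a dictionary: each tuple becomes a column.
df_dict = pd.DataFrame({name: list(col) for name, col in zip(names, values)})

print(df_array['x'].tolist())  # [1, 4, 7]
print(df_dict['x'].tolist())   # [1, 2, 3]

# Transposing the array first reproduces the dictionary layout.
df_transposed = pd.DataFrame(np.array(values).T, columns=names)
print(df_transposed['x'].tolist())  # [1, 2, 3]
```

If the data is already organised by column, transposing with `.T` before building the DataFrame is one way to get the layout a dictionary would give.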

Turn a CSV string into a DataFrame

If we have more data or really like CSV, we can wrap a string of CSV formatted values in a StringIO object, which behaves like a file. We can then pass that object to the read_csv() function without the need to save our values into a file:

from io import StringIO

data = StringIO("""
A,B,C
1,2,3
4,5,6
7,8,9
""")

df = pd.read_csv(data)
df

This code allows us to keep doing what we already know and turn CSV into a DataFrame:

The first row in the CSV string is used as a header, the rest of the values stay in the same order as we specified them in the string.
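One detail worth checking is the empty line right after the opening triple quotes. The sketch below suggests it does no harm, because `read_csv()` skips blank lines by default. The semicolon-separated variant at the end is an extra illustration that is not part of the original example.

```python
from io import StringIO

import pandas as pd

csv_text = """
A,B,C
1,2,3
4,5,6
"""

# The blank line after the opening quotes is skipped by default,
# so the header is still read from the "A,B,C" line.
df = pd.read_csv(StringIO(csv_text))
print(list(df.columns))  # ['A', 'B', 'C']
print(df.shape)          # (2, 3)

# Addition for illustration: other separators need the sep argument.
df_semicolon = pd.read_csv(StringIO("A;B\n1;2\n3;4"), sep=';')
print(df_semicolon['B'].tolist())  # [2, 4]
```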

Next

These four ways allow us to create a DataFrame in Pandas without needing an additional CSV file. With this new knowledge, we can experiment next week with the various options Pandas offers to filter data in a DataFrame.