How to construct a DataFrame in pandas

The DataFrame object

In this shot, we will learn about a fundamental structure in pandas: the DataFrame. While a Series is essentially a single labeled column, a DataFrame is a two-dimensional table made up of a collection of Series that share the same index. DataFrames allow us to store and manipulate tabular data where rows represent observations and columns represent variables.

There are several ways to create a DataFrame, most of them through the pd.DataFrame() constructor. For example, we can:

  • pass one or more Series to the DataFrame constructor; or
  • convert a dictionary to a DataFrame; or
  • import data from a file, such as a CSV file.

Let’s look at each of these methods in detail.

1. How to construct a DataFrame from a Series object

To create a DataFrame from a single Series, we can pass the Series object to the pd.DataFrame() constructor, along with the optional columns parameter, which lets us name the column:

import pandas as pd
data_s1 = pd.Series([12, 24, 33, 15],
                    index=['apples', 'bananas', 'strawberries', 'oranges'])
# 'quantity' is the name for our column
dataframe1 = pd.DataFrame(data_s1, columns=['quantity'])
print(dataframe1)
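
As a possible alternative, the same one-column table can be sketched with the Series method to_frame(), whose name argument sets the column label. The variable name dataframe1_alt is only illustrative and is not part of the original example:

```python
import pandas as pd

data_s1 = pd.Series([12, 24, 33, 15],
                    index=['apples', 'bananas', 'strawberries', 'oranges'])

# to_frame() wraps the Series in a one-column DataFrame;
# name= sets the label of that column.
dataframe1_alt = data_s1.to_frame(name='quantity')
print(dataframe1_alt)
```

The index of the Series (the fruit names) is carried over as the row index of the resulting DataFrame.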

2. How to construct a DataFrame from a dictionary

We can construct a DataFrame from a dictionary of lists, where each key becomes a column name and each list holds the values for that column. Say we have a dictionary with countries, their capitals, and some other variable (population, size of that country, number of schools, etc.):

import pandas as pd
# Avoid naming the variable dict, which would shadow Python's built-in type
country_data = {"country": ["Norway", "Sweden", "Spain", "France"],
                "capital": ["Oslo", "Stockholm", "Madrid", "Paris"],
                "SomeColumn": ["100", "200", "300", "400"]}
data = pd.DataFrame(country_data)
print(data)
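
The constructor also accepts the row-oriented form of the same data: a list of dictionaries, one per row. The following is a minimal sketch; the names records and data_rows are hypothetical and chosen only for this illustration:

```python
import pandas as pd

# One dictionary per row; the keys become the column names.
records = [
    {"country": "Norway", "capital": "Oslo"},
    {"country": "Sweden", "capital": "Stockholm"},
    {"country": "Spain", "capital": "Madrid"},
    {"country": "France", "capital": "Paris"},
]
data_rows = pd.DataFrame(records)
print(data_rows)
```

This shape tends to be convenient when the data arrives record by record, for example from an API response.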

We can also construct a DataFrame from a dictionary of Series objects. Say we have two different Series: one for the price of fruits and one for their quantity. We want to put all the fruit-related data together into a single table. We can do this like so:

import pandas as pd
quantity = pd.Series([12, 24, 33, 15],
                     index=['apples', 'bananas', 'strawberries', 'oranges'])
price = pd.Series([4, 4.5, 8, 7.5],
                  index=['apples', 'bananas', 'strawberries', 'oranges'])
df = pd.DataFrame({'quantity': quantity,
                   'price': price})
print(df)
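
When the Series do not share exactly the same labels, pandas aligns them on their index and fills the gaps with NaN. The sketch below assumes a hypothetical price Series that has no entry for oranges; the name df_partial is illustrative only:

```python
import pandas as pd

quantity = pd.Series([12, 24, 33, 15],
                     index=['apples', 'bananas', 'strawberries', 'oranges'])
# This price Series deliberately has no entry for 'oranges'.
price = pd.Series([4, 4.5, 8],
                  index=['apples', 'bananas', 'strawberries'])

# Rows are matched by index label, not by position.
df_partial = pd.DataFrame({'quantity': quantity, 'price': price})
print(df_partial)

# The missing price shows up as NaN.
print(pd.isna(df_partial.loc['oranges', 'price']))
```

Because rows are matched by label, the order of the labels in each Series does not matter.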

3. How to construct a DataFrame from data imported from a file

It’s quite simple to load data from various file formats (e.g., CSV, Excel, JSON) into a DataFrame.

We will import actual data to analyze the IMDB-movies dataset in the next lesson.

Here is what loading data from different file formats looks like in code:

import pandas as pd
# Given we have a file called data1.csv in our working directory:
df_csv = pd.read_csv('data1.csv')
# Given we have a file called data2.json in our working directory:
df_json = pd.read_json('data2.json')
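
To try these readers without any files on disk, one option is to feed them in-memory text through io.StringIO. This is only a sketch: the CSV and JSON contents below are made up for the illustration and are not the data1.csv or data2.json files mentioned above.

```python
import io
import pandas as pd

# Hypothetical CSV content; the first line is treated as the header row.
csv_text = "fruit,quantity\napples,12\nbananas,24\n"
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)

# Hypothetical JSON content: a list of records, one object per row.
json_text = '[{"fruit": "apples", "quantity": 12}, {"fruit": "bananas", "quantity": 24}]'
df_json = pd.read_json(io.StringIO(json_text))
print(df_json)
```

Both calls return an ordinary DataFrame, so everything shown in the earlier sections applies to the loaded data as well.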
