How to use pandas to inspect data

Data science is the process of analyzing data to extract useful trends and information, which helps an organization operate more efficiently.

pandas is a Python library for inspecting and manipulating data efficiently. It works mainly with two-dimensional, tabular data structures called data frames (the DataFrame class).

import pandas as pd
#define the two-dimensional data: each key is a column name, each list holds that column's values
data = {
    "p_ID": [1,2,3,4,5],
    "p_name": ['Apple','Orange','Banana','Mango','Peach'],
    "p_price": [12,10,8,15,14],
    "s_ID": [1,1,2,3,1],
    "p_number": [120,105,90,95,115]
}
#load the defined data into a DataFrame object
df = pd.DataFrame(data)
print(df)

A data frame offers many methods for inspecting data. The head() method returns the first five rows of the data frame by default, while the tail() method returns the last five. Both accept an integer argument that sets the number of rows returned.

print(df.head(3)) #return first three rows from the top
print(df.tail(2)) #return the last two rows
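
As a small sketch of how the row-count argument behaves, the snippet below rebuilds part of the sample data and compares a few calls. The variable names are illustrative only. It relies on the default of five rows when no argument is given, and on the fact that a negative argument makes head() return every row except that many from the end.

```python
import pandas as pd

# Rebuild a reduced version of the sample data so the snippet runs on its own
data = {
    "p_ID": [1, 2, 3, 4, 5],
    "p_name": ["Apple", "Orange", "Banana", "Mango", "Peach"],
    "p_price": [12, 10, 8, 15, 14],
}
df = pd.DataFrame(data)

first_default = df.head()        # no argument: up to five rows
first_three = df.head(3)         # first three rows
last_two = df.tail(2)            # last two rows
all_but_last_two = df.head(-2)   # negative n: all rows except the last two

print(len(first_default))
print(first_three["p_name"].tolist())
print(last_two["p_name"].tolist())
print(all_but_last_two["p_name"].tolist())
```

Both methods return a new data frame, so the original df is left unchanged.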

Every data frame has a columns attribute that returns its column names.

print(df.columns) #print the column names of the data frame
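
The columns attribute returns a pandas Index object rather than a plain list. The sketch below, which uses a reduced version of the sample data and an illustrative variable name, shows one way to convert it to a list and to check whether a column exists.

```python
import pandas as pd

# Reduced sample data, used only for illustration
df = pd.DataFrame({
    "p_ID": [1, 2],
    "p_name": ["Apple", "Orange"],
    "p_price": [12, 10],
})

column_names = df.columns.tolist()   # convert the Index to a plain Python list
print(column_names)

# Membership tests work directly on the Index
print("p_price" in df.columns)
print("s_ID" in df.columns)
```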

The shape attribute returns the dimensions of the data frame as a tuple. The tuple holds the number of rows as its first value and the number of columns as its second.

print(df.shape) #print the shape of the data frame
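
Because the result is an ordinary tuple, it can be unpacked into separate variables. A minimal sketch with the sample data follows; the names n_rows and n_cols are chosen here for illustration.

```python
import pandas as pd

data = {
    "p_ID": [1, 2, 3, 4, 5],
    "p_name": ["Apple", "Orange", "Banana", "Mango", "Peach"],
    "p_price": [12, 10, 8, 15, 14],
    "s_ID": [1, 1, 2, 3, 1],
    "p_number": [120, 105, 90, 95, 115],
}
df = pd.DataFrame(data)

# shape is accessed without parentheses and unpacks as (rows, columns)
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")  # prints "5 rows, 5 columns"
```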

To get summary statistics about the columns in our data frame, pandas provides the describe() method. By default, for a data frame with mixed column types such as this one, it summarizes only the numeric columns.

print(df.describe()) #get statistical summary
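
To also summarize non-numeric columns such as p_name, describe() accepts an include parameter. The sketch below, using a reduced version of the sample data and illustrative variable names, compares the default call with include="all".

```python
import pandas as pd

df = pd.DataFrame({
    "p_name": ["Apple", "Orange", "Banana", "Mango", "Peach"],
    "p_price": [12, 10, 8, 15, 14],
    "p_number": [120, 105, 90, 95, 115],
})

numeric_summary = df.describe()            # default: numeric columns only
full_summary = df.describe(include="all")  # every column, numeric or not

print(numeric_summary.columns.tolist())
print(full_summary.columns.tolist())

# Individual statistics can be looked up by row label and column name
print(numeric_summary.loc["mean", "p_price"])
```

With include="all", statistics that do not apply to a column (for example, the mean of p_name) appear as NaN.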
