The Pandas library in Python is used to work with dataframes, which structure data in rows and columns. It is widely used in data analysis and machine learning.
Indexing is used to obtain only a portion of a dataframe. It supports both ranges of rows and selections of columns: we can index by passing in row numbers or by mentioning column names.
To index a dataframe by row numbers, we pass in the starting and ending row numbers inside the [] operator.
The syntax is as follows:
dataframe[start:end]
The : is called the range operator (Python's slice notation). It defines a range between the start and end points and selects all rows within that range.
The end value is exclusive. It is not included in the subset of the dataframe.
Indexing a dataframe from 0:5 would return rows from row number 0 to row number 4.
If the start value is omitted, indexing starts from row 0 and includes every row up to, but not including, the specified end value.
If the end value is omitted, every row from the start value through the last row of the dataframe is included.
The code snippet below shows how rows can be indexed in Pandas:
import pandas as pd

# Creating a dataframe
df = pd.DataFrame({
    'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
               'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing'],
    'Player': ["Messi", "Afridi", "Chad", "Johnny", "Federer",
               "Yong", "Mark", "Phelps", "Khan"],
    'Rank': [1, 9, 7, 12, 1, 2, 11, 1, 1]
})

print("Original Dataframe")
print(df)
print('\n')

print("Indexing Dataframe")
print('\n')
print(df[2:4])  # Both ranges
print('\n')
print(df[:3])   # End value only
print('\n')
print(df[6:])   # Start value only
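The slicing rules above can also be checked programmatically. The following is a minimal sketch on a smaller, made-up dataframe (the data and variable names are illustrative assumptions, not part of the original example). It assumes the default row index 0, 1, 2, ... that Pandas assigns when none is given.

```python
import pandas as pd

# Small illustrative dataframe with the default 0..n-1 row index
df = pd.DataFrame({
    "Sports": ["Football", "Cricket", "Baseball", "Basketball", "Tennis", "Archery"],
    "Rank": [1, 9, 7, 12, 1, 11],
})

first_five = df[0:5]   # rows 0, 1, 2, 3, 4 (row 5 is excluded)
head_three = df[:3]    # start omitted: begins at row 0
tail = df[4:]          # end omitted: runs through the last row

print(len(first_five))          # 5
print(list(head_three.index))   # [0, 1, 2]
print(list(tail.index))         # [4, 5]

# With the default index, slicing with [] selects by position,
# so it matches .iloc with the same range
print(df[2:4].equals(df.iloc[2:4]))   # True
```

The comparison with .iloc is only meant to show that the numbers inside [] are row positions; the examples in this article work the same way with either form.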
We can also specify column names to get entire columns. To do so, we mention column names inside the [] operator.
The syntax is as follows:
dataframe[['column1', 'column2', 'column3']]
If there is more than one column, we must enclose the column names within an inner [], which is a Python list and indicates a collection of columns.
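Because the inner [] is an ordinary Python list, it can be built ahead of time and passed in. The sketch below uses a small made-up dataframe; the variable names wanted and subset and the missing column "Team" are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Sports": ["Football", "Cricket", "Tennis"],
    "Player": ["Messi", "Afridi", "Federer"],
    "Rank": [1, 9, 1],
})

# The list of column names can be stored in a variable first
wanted = ["Rank", "Player"]
subset = df[wanted]

# Columns come back in the order given in the list,
# not in the order they appear in the original dataframe
print(list(subset.columns))   # ['Rank', 'Player']

# Asking for a column that does not exist raises a KeyError
try:
    df[["Player", "Team"]]    # "Team" is a hypothetical, missing column
except KeyError as err:
    print("KeyError:", err)
```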
A single column can be indexed as follows:
dataframe["column1"]
However, this returns a series and not a dataframe. To get a dataframe instead, we can enclose the single column name in an additional [] operator. The syntax is as follows:
dataframe[["column1"]]
The code snippet below shows how columns can be indexed in Pandas:
import pandas as pd

# Creating a dataframe
df = pd.DataFrame({
    'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
               'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing'],
    'Player': ["Messi", "Afridi", "Chad", "Johnny", "Federer",
               "Yong", "Mark", "Phelps", "Khan"],
    'Rank': [1, 9, 7, 12, 1, 2, 11, 1, 1]
})

print("Original Dataframe")
print(df)
print('\n')

print("Indexing Dataframe")
print('\n')
print(df["Player"])            # Single column returning a series
print('\n')
print(df[["Player"]])          # Single column returning a dataframe
print('\n')
print(df[["Player", "Rank"]])  # Multiple columns
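The difference between single and double brackets is easier to see by inspecting the types and shapes of the results rather than the printed output. This is a minimal sketch on a reduced, made-up dataframe; the names as_series and as_frame are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Player": ["Messi", "Afridi", "Federer"],
    "Rank": [1, 9, 1],
})

as_series = df["Player"]     # single brackets
as_frame = df[["Player"]]    # double brackets

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame

# A series is one-dimensional; a dataframe keeps rows and columns
print(as_series.shape)   # (3,)
print(as_frame.shape)    # (3, 1)
```

Which form to use depends on what the next step expects: the double-bracket form is the safer choice when later code needs a two-dimensional object with a column label.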