Pandas is a Python library widely used in data science and machine learning. It helps manipulate and prepare tabular data before it is passed to machine learning models. Pandas provides the loc and iloc indexers to select rows and columns from a pandas DataFrame.
In this Answer, we will look at how to use both indexers to select data from a DataFrame and highlight the key differences between them.
loc vs iloc
In this section, we will compare the two indexers side by side.
Aspect | loc[] | iloc[] |
Abbreviation | location | integer location |
Definition | Selects rows and columns by label (index labels and column names). | Selects rows and columns by integer position rather than by label. |
Single value | df.loc["A"] or df.loc[1] (when 1 is an index label) | df.iloc[1] |
List | df.loc[["A", "B", "E"]] | df.iloc[[1, 2, 4]] |
Slicing | df.loc["A":"E", "W":"Z"] includes both endpoint labels (int or str): df.loc[included:included] | df.iloc[1:5, 1:3] includes the start position but excludes the stop position: df.iloc[included:excluded] |
Condition | df.loc[condition] | Accepts a boolean array or list, but not a boolean Series: df.iloc[condition.to_numpy()] |
Now that we have seen the differences side by side, let's look at an illustration of selecting a DataFrame row using both indexers.
In the illustration, we access the second row of the DataFrame using loc and iloc. With iloc, we provide the position of the row, i.e., 1, in the square brackets [], whereas with loc we provide the label of the row, i.e., "B".
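The same comparison can be sketched in code. This is a minimal example, assuming a small hypothetical DataFrame `df` whose rows are labeled "A" to "C"; the column name `value` is made up for illustration.

```python
import pandas as pd

# Hypothetical data: three rows labeled "A", "B", "C"
df = pd.DataFrame({"value": [10, 20, 30]}, index=["A", "B", "C"])

by_position = df.iloc[1]  # second row, selected by integer position
by_label = df.loc["B"]    # the same row, selected by its label

# Both selections return the same row as a Series
print(by_position.equals(by_label))
```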
Throughout the Answer, we will apply the loc and iloc accessors to example data with Name, Age, and Country columns, structured like the following (the DataFrame built in the coding examples uses different names and values):
Name | Age | Country |
John | 20 | USA |
James | 30 | Canada |
Alex | 23 | Brazil |
Sara | 13 | Argentina |
Andrew | 42 | Australia |
Albert | 12 | England |
iloc
iloc is useful when we want to access data using the numerical position of the rows and columns. Below, we will see some of the use cases for iloc:
The syntax to access rows and columns from a DataFrame is:
DataFrame.iloc[row, column]
In the syntax above, DataFrame is the DataFrame from which we want to access the rows and columns.
row: The position of the row to access. Row positions start at 0.
column: The position of the column to access. Column positions also start at 0.
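What iloc returns depends on how the positions are written. The sketch below uses a hypothetical two-column DataFrame (the names and ages are invented for illustration) to show a single position, a one-element list, and a row/column pair.

```python
import pandas as pd

# Hypothetical two-column frame used only for this sketch
df = pd.DataFrame({"Name": ["Ann", "Ben", "Cy"], "Age": [31, 25, 40]})

row = df.iloc[1]      # single row position -> Series
frame = df.iloc[[1]]  # list with one position -> one-row DataFrame
cell = df.iloc[1, 0]  # row and column position -> scalar value

print(type(row).__name__, type(frame).__name__, cell)
```

Passing a list keeps the result two-dimensional, which can help when later code expects a DataFrame.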
Below is a code example to access specific rows and columns from a DataFrame.
    import pandas as pd
    person_data = pd.DataFrame({
        "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
        "Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
        "Age": [12, 14, 20, 25, 29, 45, 32, 60]
    })

    # Print the complete DataFrame
    print(person_data)

    print("Accessing the row at position 3 (the 4th row) of the DataFrame:")
    print(person_data.iloc[3])
    print("######################################")

    print("Accessing the row at position 3 and the column at position 0:")
    print(person_data.iloc[3, 0])
    print("######################################")

    print("Accessing the rows at positions 3, 5, and 6 of the DataFrame:")
    print(person_data.iloc[[3, 5, 6]])
Line 1: We import the pandas library so that we can create a DataFrame and apply iloc to it.
Line 2: We create a DataFrame and store it in the person_data variable.
Line 9: We print the complete DataFrame.
Line 12: We access the row at position 3 (the fourth row) of the DataFrame.
Line 16: We access the cell at row position 3 and column position 0 of the DataFrame.
Line 20: We access multiple rows by passing in a list of positions, [3, 5, 6].
To access a range of rows and columns, we use the colon : operator for the rows and the columns. The syntax is given below:
DataFrame.iloc[rowstart : rowend, colstart : colend]
rowstart: The starting position of the row range we want to access.
rowend: The ending position of the row range. The row at rowend is not accessed; rather, the rows up to position rowend-1 are accessed.
colstart: The starting position of the column range we want to access.
colend: The ending position of the column range. The column at colend is not accessed; rather, the columns up to position colend-1 are accessed.
Below is a coding example to access a range of rows and columns.
    import pandas as pd
    person_data = pd.DataFrame({
        "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
        "Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
        "Age": [12, 14, 20, 25, 29, 45, 32, 60]
    })

    # Rows at positions 3 and 4; position 5 is excluded
    print("Accessing range of rows from the DataFrame:")
    print(person_data.iloc[3:5])
    print("######################################")

    print("Accessing range of rows and columns from the DataFrame:")
    print(person_data.iloc[3:5, 0:2])
Line 10: We select the rows from position 3 through position 4, i.e., rowend - 1.
Line 14: We select a range of rows and columns: the rows at positions 3 and 4, and the columns at positions 0 and 1.
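To make the exclusive end of an iloc slice concrete, here is a small sketch on a hypothetical numeric DataFrame; the column names x, y, and z are placeholders. It checks which row positions and columns survive the slice.

```python
import pandas as pd

# Hypothetical frame with 8 rows and 3 columns, default integer index
df = pd.DataFrame({"x": range(8), "y": range(8, 16), "z": range(16, 24)})

sliced = df.iloc[3:5, 0:2]  # rows 3-4 and columns 0-1; position 5 and column 2 are excluded

print(sliced.shape)
print(list(sliced.index))
print(list(sliced.columns))
```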
loc
loc is handy when accessing rows and columns based on meaningful labels rather than integer positions. Below, we will see some of the use cases for loc:
The syntax to access rows and columns is:
DataFrame.loc[rowlabel, collabel]
rowlabel: The label of the row we want to select from the DataFrame.
collabel: The label of the column we want to select from the DataFrame.
Example code for selecting specific rows and columns is given below:
    import pandas as pd
    person_data = pd.DataFrame({
        "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
        "Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
        "Age": [12, 14, 20, 25, 29, 45, 32, 60]
        },
        index=["A", "B", "C", "D", "E", "F", "G", "H"])

    print("Accessing row with label B:")
    print(person_data.loc["B"])
    print("######################################")

    print("Accessing row with label B and column with label Country:")
    print(person_data.loc["B", "Country"])
    print("######################################")

    print("Accessing A, B, and D labeled rows and columns with Name and Age labels:")
    print(person_data.loc[["A", "B", "D"], ["Name", "Age"]])
Line 7: We give custom string labels to the DataFrame rows using the index parameter.
Line 10: We access the row with the label B.
Line 14: We access the cell at the row with the label B and the column with the label Country.
Line 18: We access specific rows and columns by passing in the label names as lists.
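Row labels do not have to be strings. When they are integers, loc still treats them as labels, not positions. The sketch below assumes a hypothetical DataFrame whose integer labels are deliberately out of order so that the two indexers give different answers.

```python
import pandas as pd

# Hypothetical frame whose integer labels are deliberately out of order
df = pd.DataFrame({"city": ["Oslo", "Lima", "Cairo"]}, index=[2, 0, 1])

by_label = df.loc[1, "city"]  # the row whose label is 1 (the third row)
by_position = df.iloc[1, 0]   # the row at position 1 (the second row)

print(by_label, by_position)
```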
To select a range of rows and columns using loc, we slice the rows and columns with the colon : operator, similar to iloc. Unlike iloc, a loc slice includes both its start and end labels.
The coding example is given below:
    import pandas as pd
    person_data = pd.DataFrame({
        "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
        "Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
        "Age": [12, 14, 20, 25, 29, 45, 32, 60]
        },
        index=["A", "B", "C", "D", "E", "F", "G", "H"])

    print("Accessing range of rows from the DataFrame:")
    print(person_data.loc["B":"E"])
    print("######################################")

    print("Accessing range of rows and columns from the DataFrame:")
    print(person_data.loc["B":"E", "Name":"Country"])
Line 10: We pass a range of rows, starting from the row with the label B and ending at the row with the label E, both included.
Line 14: We pass a range of rows and a range of columns.
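The difference in slice endpoints is easy to miss, so here is a short side-by-side sketch. It assumes a hypothetical DataFrame with a single made-up column n and rows labeled "A" to "E".

```python
import pandas as pd

# Hypothetical frame with rows labeled "A" through "E"
df = pd.DataFrame({"n": [1, 2, 3, 4, 5]}, index=list("ABCDE"))

label_slice = df.loc["B":"D"]  # includes both "B" and "D"
position_slice = df.iloc[1:3]  # positions 1 and 2 only; position 3 is excluded

print(list(label_slice.index))
print(list(position_slice.index))
```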
To apply a filter to a DataFrame, we can pass a condition inside the brackets [] of loc. The coding example below shows how:
    import pandas as pd
    person_data = pd.DataFrame({
        "Name": ["Albert", "James", "Alex", "Bob", "Sara", "Bill", "Daniel", "John"],
        "Country": ["USA", "USA", "USA", "Canada", "Canada", "Germany", "Germany", "Egypt"],
        "Age": [12, 14, 20, 25, 29, 45, 32, 60]
        },
        index=["A", "B", "C", "D", "E", "F", "G", "H"])

    print(person_data)

    print("Applying filter of person's age greater than 40")
    print(person_data.loc[person_data["Age"] > 40])
Line 12: We pass the filter person_data["Age"] > 40 inside the brackets []. Each value of the column with the label Age is checked, and the rows for which the condition is true are selected.
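A condition can also be combined with a column label, and several conditions can be joined with & or |. The following is a minimal sketch under assumed data: the DataFrame, its values, and the variable names mask and names are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame(
    {
        "Name": ["Ann", "Ben", "Cy"],
        "Age": [31, 52, 47],
        "Country": ["USA", "Canada", "USA"],
    },
    index=["A", "B", "C"],
)

# Each comparison needs its own parentheses when combined with &
mask = (df["Age"] > 40) & (df["Country"] == "USA")

# Rows where the mask is True, restricted to the Name column
names = df.loc[mask, "Name"]

print(names.tolist())
```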
loc and iloc are powerful data selection tools that pandas DataFrames provide. We use loc to access data by label, whereas we use iloc to access data by position.