How to slice columns in pandas

pandas is an open-source Python library for exploratory data analysis (EDA) and for manipulating large datasets. It provides efficient tools for slicing, filtering, aggregating, and pivoting data.

Overall, pandas is an essential tool for anyone working with data, whether they are a data scientist, analyst, or researcher.

Data slicing

Data slicing selects a subset of a dataset, such as specific rows or columns, so that the analysis can focus on the relevant part. Breaking a large and complex dataset into smaller, more manageable subsets makes it easier to identify patterns and trends and to filter out noise and irrelevant data.

Setting up a DataFrame for slicing

To begin slicing data in pandas, the first step is to import the pandas library. Once imported, we will create a sample DataFrame df with four rows and four columns.

Code example

The code below imports the pandas library and creates a sample DataFrame to slice:

import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3, 4],
                   "b": [2, 3, 4, 5],
                   "c": [3, 4, 5, 6],
                   "d": [4, 5, 6, 7]})
print(df)

Code explanation

  • Line 1: We import the pandas library.

  • Lines 2–5: We create a sample DataFrame df by calling the DataFrame() constructor from pandas.

  • Line 6: We print the sample DataFrame to the console using the print() function.
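Before slicing, it can help to confirm the size and column labels of the DataFrame. The following is a minimal sketch that rebuilds the same sample data; the variable names n_rows, n_cols, and column_labels are illustrative and not part of the original example.

```python
import pandas as pd

# Same sample data as in the example above.
df = pd.DataFrame({"a": [1, 2, 3, 4],
                   "b": [2, 3, 4, 5],
                   "c": [3, 4, 5, 6],
                   "d": [4, 5, 6, 7]})

# shape is a (rows, columns) tuple; columns holds the column labels.
n_rows, n_cols = df.shape
column_labels = list(df.columns)

print(n_rows, n_cols)
print(column_labels)
```

Knowing the labels and their positions matters later, because label-based and position-based slicing refer to the same columns in different ways.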

Methods for slicing columns in pandas

After creating the sample DataFrame, we can slice its columns in several ways: the reindex method, the [] notation, and the .loc[] and .iloc[] indexers. Each has its own benefits and limitations, depending on the specific requirements of the data analysis task. We'll explore each of these techniques in detail and demonstrate how they can be used to slice columns in the DataFrame.
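As a quick orientation, the sketch below applies all four techniques to the same sample data to select columns a and b. The variable names by_reindex, by_brackets, by_loc, and by_iloc are illustrative, and the comparison with equals() is only one possible way to check that the results match.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4],
                   "b": [2, 3, 4, 5],
                   "c": [3, 4, 5, 6],
                   "d": [4, 5, 6, 7]})

by_reindex = df.reindex(columns=["a", "b"])  # by label, via a method call
by_brackets = df[["a", "b"]]                 # by label, via a list of names
by_loc = df.loc[:, "a":"b"]                  # by label slice (end included)
by_iloc = df.iloc[:, 0:2]                    # by position slice (end excluded)

# Check that every technique produced the same two-column DataFrame.
same = all(by_reindex.equals(other)
           for other in (by_brackets, by_loc, by_iloc))
print(same)
```

The results agree here, but the techniques differ in how they treat missing labels, slice endpoints, and column order, which the following sections cover.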

Slicing a column using reindex

Slicing a column using reindex can be useful when we want to select only certain columns in an order that we specify. The reindex method returns a new DataFrame that conforms to the given labels. Any requested label that does not exist in the original DataFrame appears as a column filled with NaN instead of raising an error.

Code example

The code below selects the column b from the original DataFrame df and stores it in the new DataFrame df_slice.

df_slice = df.reindex(columns = ['b'])
print(df_slice)

Code explanation

  • Line 1: We create a new variable df_slice to store the subset of the DataFrame from the original DataFrame df by using the reindex method. The columns parameter of the reindex method is set to ['b'], which means that the new DataFrame df_slice will only contain the b column from the original DataFrame.

  • Line 2: We print the new DataFrame df_slice to the console using the print() function.
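One behavior of reindex worth knowing is how it handles a label that is not in the DataFrame. The sketch below requests a hypothetical column z, which does not exist in the sample data, and contrasts the result with the [[ ]] notation. The names df_missing and bracket_raised are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4],
                   "b": [2, 3, 4, 5],
                   "c": [3, 4, 5, 6],
                   "d": [4, 5, 6, 7]})

# "z" is a hypothetical label that is not present in df.
# reindex does not raise an error; it adds a column "z" filled with NaN.
df_missing = df.reindex(columns=["b", "z"])
print(df_missing)

# The [[ ]] notation is stricter and raises a KeyError for the same request.
try:
    df[["b", "z"]]
    bracket_raised = False
except KeyError:
    bracket_raised = True
print(bracket_raised)
```

This makes reindex convenient when the result must always have a fixed set of columns, but it can also hide a misspelled column name.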

Slicing multiple columns using reindex

Slicing multiple columns using reindex can be useful when we want to extract several columns and arrange them in a specific order. The columns of the result follow the order of the list we pass, not their order in the original DataFrame.

Code example

The code below selects the columns c and a from the original DataFrame df and stores them, in that order, in the new DataFrame df_slice.

df_slice = df.reindex(columns = ['c','a'])
print(df_slice)

Code explanation

  • Line 1: We create a new variable df_slice to store the subset of the DataFrame from the original DataFrame df by using the reindex method. The columns parameter of the reindex method is set to ['c','a'], which means that the new DataFrame df_slice will contain two columns, c followed by a, from the original DataFrame.

  • Line 2: We print the new DataFrame df_slice to the console using the print() function.
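To make the ordering behavior concrete, the following sketch compares the column order of the original DataFrame with that of the reindexed result. The variable names original_order and sliced_order are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4],
                   "b": [2, 3, 4, 5],
                   "c": [3, 4, 5, 6],
                   "d": [4, 5, 6, 7]})

df_slice = df.reindex(columns=["c", "a"])

# reindex returns a new DataFrame; df itself keeps its original columns.
original_order = list(df.columns)
sliced_order = list(df_slice.columns)

print(original_order)
print(sliced_order)
```

The result lists c before a because that is the order of the list passed to reindex, while the original DataFrame is left unchanged.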

Slicing a column using the [ ] notation

With this simple method, we pass the column's name as a string. The single [ ] notation with one column name returns a one-dimensional Series, and the double [[ ]] notation with a list of column names returns a two-dimensional DataFrame.

Code example

The code below selects the columns c and d from the original DataFrame df using the [[ ]] notation and stores them in the new DataFrame df_slice.

df_slice = df[['c','d']]
print(df_slice)

Code explanation

  • Line 1: We create a new variable df_slice to store the subset of the original DataFrame df by using the [[ ]] notation. We pass the list ['c','d'], which means that the new DataFrame df_slice will contain two columns, c and d, from the original DataFrame.

  • Line 2: We print the new DataFrame df_slice to the console using the print() function.
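The difference between the single and double bracket notation shows up in the type and shape of the result. The sketch below selects column c both ways; the variable names as_series and as_frame are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4],
                   "b": [2, 3, 4, 5],
                   "c": [3, 4, 5, 6],
                   "d": [4, 5, 6, 7]})

as_series = df["c"]    # single brackets with a string: one-dimensional Series
as_frame = df[["c"]]   # double brackets with a list: two-dimensional DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The double bracket form is the one to use when later code expects a DataFrame, even if only one column is selected.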

Slicing columns using the .loc[ ] indexer with step size 2

The pandas library includes an indexer called .loc[ ] that enables label-based slicing of a DataFrame. With it, we can access a specific group of rows and columns from a DataFrame using their labels. Unlike a standard Python slice, a label slice in .loc[ ] includes both its start and end labels.

Code example

The code below creates a new DataFrame df_slice by selecting the columns a and c from the original DataFrame df, using the loc indexing syntax with a step size of 2.

df_slice = df.loc[:, 'a':'d':2]
print(df_slice)

Code explanation

  • Line 1: We create a new variable named df_slice to store a subset of a pandas DataFrame. The : on the left side of the comma specifies that we want to select all the rows of the DataFrame, and 'a':'d':2 on the right side of the comma specifies that we want to select columns with labels between, and including, a and d but only for every second column.

  • Line 2: We print the subset of the original DataFrame, which contains only the columns a and c.
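The sketch below checks which columns a label slice actually returns, with and without a step. The variable names step_two and inclusive are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4],
                   "b": [2, 3, 4, 5],
                   "c": [3, 4, 5, 6],
                   "d": [4, 5, 6, 7]})

# Every second column in the range a..d: a is kept, b is skipped,
# c is kept, and d is skipped.
step_two = df.loc[:, "a":"d":2]
print(list(step_two.columns))

# Without a step, a label slice includes both endpoints.
inclusive = df.loc[:, "b":"c"]
print(list(inclusive.columns))
```

With four columns and a step of 2, the slice starting at a returns a and c. The end label d falls inside the range but is not reached by the step.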

Slicing columns using the .iloc[] indexer with step size 1

pandas also includes an indexer called .iloc[] that allows integer-position-based slicing of a DataFrame. It is particularly helpful when the labels are not numeric or when we do not know the labels in advance, because it selects rows and columns by position rather than by name.

Code example

The code below creates a new DataFrame df_slice by selecting columns 0, 1, and 2 from the original DataFrame df, using the .iloc indexing syntax with a step size of 1.

df_slice = df.iloc[:,0:3:1]
print(df_slice)

Code explanation

  • Line 1: We create a new variable named df_slice to store a subset of a pandas DataFrame. The : on the left side of the comma specifies that we want to select all the rows of the DataFrame, and 0:3:1 on the right side of the comma specifies that we want to select columns with integer positions between 0 (inclusive) and 3 (exclusive), in steps of 1.

  • Line 2: We print the subset of the original DataFrame, which contains only the first three columns.
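Because .iloc[] follows standard Python slice rules, the usual shortcuts such as omitted bounds and negative positions also work. The following sketch shows a few variations; the variable names first_three, every_other, and last_two are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4],
                   "b": [2, 3, 4, 5],
                   "c": [3, 4, 5, 6],
                   "d": [4, 5, 6, 7]})

# A step of 1 is the default, so 0:3 is equivalent to 0:3:1.
first_three = df.iloc[:, 0:3]

# Omitting both bounds and using a step of 2 takes positions 0 and 2.
every_other = df.iloc[:, ::2]

# Negative positions count from the end: the last two columns.
last_two = df.iloc[:, -2:]

print(list(first_three.columns))
print(list(every_other.columns))
print(list(last_two.columns))
```

Unlike label slices with .loc[ ], position slices exclude the end position, so 0:3 stops before the column at position 3.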
