How to merge DataFrames using the merge() method in Python

Parameters

The merge() method takes the following parameters:

left: This is a DataFrame.
right: This is another DataFrame. The left DataFrame is merged with the right DataFrame and a new DataFrame is returned.

Note: Nothing happens to the left and right DataFrames, i.e., they do not change.

how: This parameter specifies the type of merge to be performed. The following are the types of merges available within the merge() method: left, right, outer, inner and cross.
- left type: This type of merge returns all the rows from the left DataFrame, combined with the matching rows from the right DataFrame. Where a row in the left DataFrame has no match in the right DataFrame (no row with the same key values), the columns that come from the right DataFrame are filled with NaN. The merge is performed on a common column or columns.
- right type: This type of merge returns all the rows from the right DataFrame, combined with the matching rows from the left DataFrame. Where a row in the right DataFrame has no match in the left DataFrame, the columns that come from the left DataFrame are filled with NaN. The merge is performed on a common column or columns.
- outer type: This type of merge returns all the rows from both the left and right DataFrames. Rows that match on the key are combined into a single row. Rows found in only one DataFrame are kept, with NaN in the columns that come from the other DataFrame. The merge is performed on a common column or columns.
- inner type: This is the default merge type. It returns only the rows that are common between the left and right DataFrames. If there are no common rows, an empty DataFrame is returned. The merge is performed on a common column or columns.
- cross type: This type of merge returns the Cartesian product of the rows from the left and right DataFrames, i.e., it combines and returns each row from the left DataFrame with each row from the right DataFrame. No key columns are involved, so on, left_on, and right_on must not be passed with this merge type.
on: This parameter represents the column or columns to perform the merge on. These columns should be present in both the left and right DataFrames. Its default value is None. If on is None and the merge is not performed on indexes, merge() uses the columns that the two DataFrames have in common.
left_on (None by default):
- Column(s) to merge on from the left DataFrame.
- Used together with right_on when the key column names differ between the DataFrames.
- To merge on the index of the left DataFrame, set left_index=True instead.
right_on (None by default):
- Column(s) to merge on from the right DataFrame.
- To merge on the index of the right DataFrame, set right_index=True instead.
left_index (False by default):
- If True, merge uses the index of the left DataFrame as the key.
right_index (False by default):
- If True, merge uses the index of the right DataFrame as the key.
sort (False by default):
- If True, sorts the resulting DataFrame by the merge keys.
suffixes (('_x', '_y') by default):
- A tuple of two strings that are added as suffixes to overlapping column names from the left and right DataFrames. This helps avoid column name duplication and allows clear identification of source columns. At least one value must not be None.
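copy (True by default):
- If True, the returned DataFrame always holds its own copy of the data.
- If False, pandas avoids copying data where it can. The left and right DataFrames are never modified in place, whatever the value of copy.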
indicator (False by default):
- If True, adds a _merge column to the output DataFrame, showing the source of each row. It indicates whether a row is from the left DataFrame only, right DataFrame only, or both. This column is helpful for understanding the merge outcome.
validate (None by default):
- Ensures that the merge follows the specified relationship type (e.g., "one_to_one", "one_to_many", "many_to_one", "many_to_many"). It checks if the merge keys are unique as per the specified type, providing validation on the merging logic.
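
As a rough illustration of how several of these parameters interact, the sketch below combines left_on, right_on, suffixes, indicator, and validate in one call. The students and contacts DataFrames and their column names are made up for this example and are not part of the article's data.

```python
import pandas as pd

# Hypothetical data: the key column is named differently in each DataFrame,
# and both DataFrames have a non-key column called 'score'.
students = pd.DataFrame({'student_id': [1, 2, 3], 'score': [90, 85, 70]})
contacts = pd.DataFrame({'id': [1, 2, 4], 'score': [55, 65, 75]})

merged = pd.merge(
    students,
    contacts,
    how='outer',
    left_on='student_id',          # key column in the left DataFrame
    right_on='id',                 # key column in the right DataFrame
    suffixes=('_left', '_right'),  # applied to the overlapping 'score' column
    indicator=True,                # adds a '_merge' column describing each row's source
    validate='one_to_one',         # raises an error if a key repeats on either side
)
print(merged)
```

Because the key columns have different names, both student_id and id are kept in the result, and the _merge column reports whether each row came from the left DataFrame only, the right DataFrame only, or both.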

Implementation of combining two DataFrames using the `merge()` function

import pandas as pd
left_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Leo', 'Jacob', 'James', 'Mason'],
    'age': [18, 20, 23, 19],
    'course': ['english', 'persian', 'arts', 'chemistry']
})
print('')
right_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['William', 'Lucas', 'Henry', 'Elio'],
    'age': [18, 21, 26, 25],
    'course': ['english', 'persian', 'Literature', 'Physics']
})
print('')
print('Left data frame')
print(left_dataframe)
print('')
print('Right data frame')
print(right_dataframe)
print('')

How to left merge a `pandas` DataFrame

import pandas as pd
left_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Leo', 'Jacob', 'James', 'Mason'],
    'age': [18, 20, 23, 19],
    'course': ['english', 'persian', 'arts', 'chemistry']
})
print('')
right_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['William', 'Lucas', 'Henry', 'Elio'],
    'age': [18, 21, 26, 25],
    'course': ['english', 'persian', 'Literature', 'Physics']
})
print('')
print('Left data frame')
print(left_dataframe)
print('')
print('Right data frame')
print(right_dataframe)
print('')
print('Left merge')
print(pd.merge(left_dataframe, right_dataframe, how='left', on='course'))
print('')

First, two data frames are created. The left_dataframe contains information about students, such as their id, name, age, and the course they are enrolled in. Similarly, the right_dataframe also contains student information, but it has different names, ages, and courses. The columns in both data frames that are common are id, name, age, and course.

The pd.merge() function is then used to merge these two data frames based on a common column, which is the course column. The merge operation is specified to be a 'left' merge, meaning all rows from the left data frame (left_dataframe) will be included, and matching rows from the right data frame (right_dataframe) will be added where possible. If there is no match for a particular course from the left data frame, the corresponding columns from the right data frame will contain NaN.

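The result of this merge operation is printed, which shows how the two data frames are combined by the course column. Because id, name, and age exist in both data frames but are not merge keys, pandas keeps both copies and distinguishes them with the default suffixes: id_x, name_x, and age_x hold the values from left_dataframe, while id_y, name_y, and age_y hold the values from right_dataframe. For the courses that have no match in the right data frame (arts and chemistry), the _y columns contain NaN.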
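
To make the shape of the left merge result easier to inspect, the following sketch repeats the merge and prints only the column names and a few selected columns. The variable name result is introduced here for illustration.

```python
import pandas as pd

left_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Leo', 'Jacob', 'James', 'Mason'],
    'age': [18, 20, 23, 19],
    'course': ['english', 'persian', 'arts', 'chemistry']
})
right_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['William', 'Lucas', 'Henry', 'Elio'],
    'age': [18, 21, 26, 25],
    'course': ['english', 'persian', 'Literature', 'Physics']
})

result = pd.merge(left_dataframe, right_dataframe, how='left', on='course')

# Overlapping non-key columns receive the default '_x' and '_y' suffixes.
print(result.columns.tolist())

# Every left row is kept; name_y is NaN where the course has no match on the right.
print(result[['course', 'name_x', 'name_y']])
```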

How to right merge a `pandas` DataFrame

This code demonstrates how to perform a right merge between two data frames using the pandas library in Python.

import pandas as pd
left_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Leo', 'Jacob', 'James', 'Mason'],
    'age': [18, 20, 23, 19],
    'course': ['english', 'persian', 'arts', 'chemistry']
})
print('')
right_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['William', 'Lucas', 'Henry', 'Elio'],
    'age': [18, 21, 26, 25],
    'course': ['english', 'persian', 'Literature', 'Physics']
})
print('')
print('Left data frame')
print(left_dataframe)
print('')
print('Right data frame')
print(right_dataframe)
print('')
print('Right merge')
print(pd.merge(left_dataframe, right_dataframe, how='right', on='course'))
print('')

Just like the previous example, two data frames are created. The left_dataframe contains details about students, including their id, name, age, and the course they are taking. The right_dataframe contains similar information but with different names, ages, and courses.

In this case, the pd.merge() function is used to merge the two data frames, with the how='right' argument. This specifies that the merge should be a right merge, meaning all rows from the right data frame (right_dataframe) will be included in the result. Matching rows from the left data frame (left_dataframe) will be added where possible. If no match is found for a particular course from the right data frame, the corresponding columns from the left data frame will contain NaN.

The result of the right merge operation is printed, which shows how the two data frames are combined by the course column. The columns from left_dataframe still appear first (id_x, name_x, age_x), followed by course and the columns from right_dataframe (id_y, name_y, age_y). For the courses that have no match in the left data frame (Literature and Physics), the _x columns contain NaN.

This right merge ensures that all entries from the right data frame are retained, which is useful when you want to prioritize the data in the right frame but still combine matching rows from the left.
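
One way to check which rows of a right merge had no partner is to filter on the NaN values in a left-side column. The sketch below uses trimmed copies of the article's data frames (name and course only) to keep the output small. The names right_merged and unmatched are illustrative.

```python
import pandas as pd

# Trimmed copies of the article's DataFrames (name and course columns only).
left_dataframe = pd.DataFrame({
    'name': ['Leo', 'Jacob', 'James', 'Mason'],
    'course': ['english', 'persian', 'arts', 'chemistry']
})
right_dataframe = pd.DataFrame({
    'name': ['William', 'Lucas', 'Henry', 'Elio'],
    'course': ['english', 'persian', 'Literature', 'Physics']
})

right_merged = pd.merge(left_dataframe, right_dataframe, how='right', on='course')
print(right_merged)

# Rows whose course is missing from the left DataFrame have NaN in name_x.
unmatched = right_merged[right_merged['name_x'].isna()]
print(unmatched['course'].tolist())
```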

How to inner merge a `pandas` DataFrame

This code demonstrates how to perform an inner merge between two data frames using the pandas library in Python.

import pandas as pd
left_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Leo', 'Jacob', 'James', 'Mason'],
    'age': [18, 20, 23, 19],
    'course': ['english', 'persian', 'arts', 'chemistry']
})
print('')
right_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['William', 'Lucas', 'Henry', 'Elio'],
    'age': [18, 21, 26, 25],
    'course': ['english', 'persian', 'Literature', 'Physics']
})
print('')
print('Left data frame')
print(left_dataframe)
print('')
print('Right data frame')
print(right_dataframe)
print('')
print('Inner merge')
print(pd.merge(left_dataframe, right_dataframe, how='inner', on=['age', 'course']))
print('')

First, two data frames are created: left_dataframe and right_dataframe. Each contains information about students, such as their id, name, age, and the course they are enrolled in. The courses and ages vary between the two data frames, but both data frames share the same structure.

The pd.merge() function is used to merge the two data frames with the how='inner' argument, specifying that only the rows with matching values in both the age and course columns will be included in the result. In an inner merge, only the rows where there is a match between the two data frames (based on the specified columns, which in this case are age and course) are retained. Rows that do not have a corresponding match in both data frames are excluded from the result.

The output contains only the rows whose age and course values both match. With this data, a single pair qualifies: Leo and William, who are both 18 and enrolled in english. Jacob and Lucas share the persian course but differ in age (20 and 21), so that pair is omitted.

This type of merge is useful when you want to retain only the data that has exact matches across both data frames for the specified columns, ensuring that only common entries are included.
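
The effect of adding a second key can be seen by comparing the row counts of two inner merges, one on course alone and one on both age and course. This is a small sketch that reuses the article's data. The names by_course and by_age_and_course are introduced here for illustration.

```python
import pandas as pd

left_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Leo', 'Jacob', 'James', 'Mason'],
    'age': [18, 20, 23, 19],
    'course': ['english', 'persian', 'arts', 'chemistry']
})
right_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['William', 'Lucas', 'Henry', 'Elio'],
    'age': [18, 21, 26, 25],
    'course': ['english', 'persian', 'Literature', 'Physics']
})

# One key: english and persian match.
by_course = pd.merge(left_dataframe, right_dataframe, how='inner', on='course')

# Two keys: a row must match on age AND course, so only english (age 18) remains.
by_age_and_course = pd.merge(left_dataframe, right_dataframe,
                             how='inner', on=['age', 'course'])

print(len(by_course), len(by_age_and_course))
print(by_age_and_course)
```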

How to outer merge a `pandas` DataFrame

This code demonstrates how to perform an outer merge between two data frames using the pandas library in Python.

import pandas as pd
left_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Leo', 'Jacob', 'James', 'Mason'],
    'age': [18, 20, 23, 19],
    'course': ['english', 'persian', 'arts', 'chemistry']
})
print('')
right_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['William', 'Lucas', 'Henry', 'Elio'],
    'age': [18, 21, 26, 25],
    'course': ['english', 'persian', 'Literature', 'Physics']
})
print('')
print('Left data frame')
print(left_dataframe)
print('')
print('Outer merge')
print(pd.merge(left_dataframe, right_dataframe, how='outer', on='course'))
print('')

First, two data frames, left_dataframe and right_dataframe, are created. Each contains student data with columns for id, name, age, and course. The data in these two data frames are similar but not identical, especially in the course column, which has some common values and some unique ones between the two data frames.

The pd.merge() function is used with the how='outer' argument, which specifies an outer merge. In an outer merge, all rows from both data frames are included in the result. For rows that do not have a match in one of the data frames based on the course column, NaN is used to fill the columns from the non-matching data frame. This ensures that no data is discarded, even if there is no match for a given course. The result is a combination of both data frames, where all the courses are included, and missing values are handled with NaN where necessary.
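
Pairing an outer merge with indicator=True is a convenient way to see where each row came from. The sketch below uses trimmed copies of the article's data frames (name and course only). The names outer_merged and counts are illustrative.

```python
import pandas as pd

# Trimmed copies of the article's DataFrames (name and course columns only).
left_dataframe = pd.DataFrame({
    'name': ['Leo', 'Jacob', 'James', 'Mason'],
    'course': ['english', 'persian', 'arts', 'chemistry']
})
right_dataframe = pd.DataFrame({
    'name': ['William', 'Lucas', 'Henry', 'Elio'],
    'course': ['english', 'persian', 'Literature', 'Physics']
})

outer_merged = pd.merge(left_dataframe, right_dataframe,
                        how='outer', on='course', indicator=True)
print(outer_merged)

# Count how many rows came from the left only, the right only, or both.
counts = outer_merged['_merge'].value_counts()
print(counts)
```

With this data, the six distinct courses produce six rows: two found in both data frames, two only in the left, and two only in the right.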

How to cross merge a `pandas` DataFrame

This code demonstrates how to perform a cross merge between two data frames using the pandas library in Python.

import pandas as pd
left_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Leo', 'Jacob', 'James', 'Mason'],
    'age': [18, 20, 23, 19],
    'course': ['english', 'persian', 'arts', 'chemistry']
})
print('')
right_dataframe = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['William', 'Lucas', 'Henry', 'Elio'],
    'age': [18, 21, 26, 25],
    'course': ['english', 'persian', 'Literature', 'Physics']
})
print('')
print('Left data frame')
print(left_dataframe)
print('')
print('Cross merge')
print(pd.merge(left_dataframe, right_dataframe, how='cross'))
print('')

The merge operation uses how='cross'. A cross-merge combines every row from the left data frame with every row from the right data frame. Unlike the other types of merges, there are no conditions for matching columns; every row from the left is paired with every row from the right. This results in a large number of combinations, where the course and age columns from both data frames are combined across all possible pairings. In this case, the number of rows in the result will be the product of the number of rows in the left and right data frames (4 * 4 = 16 rows).
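
The row-count arithmetic is easier to follow on a tiny example. The sketch below uses made-up sizes and colors data frames (not from the article) and assumes a pandas version that supports how='cross' (1.2 or later).

```python
import pandas as pd

# Hypothetical data: 2 sizes and 3 colors.
sizes = pd.DataFrame({'size': ['S', 'M']})
colors = pd.DataFrame({'color': ['red', 'green', 'blue']})

# Every size is paired with every color: 2 * 3 = 6 rows.
combinations = pd.merge(sizes, colors, how='cross')
print(combinations)
print(len(combinations) == len(sizes) * len(colors))
```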

These two types of merges are useful in different scenarios: an outer merge is useful when you want to retain all data, even if some values don't have corresponding matches, while a cross merge is helpful when you want to generate all possible combinations between two data sets.

Frequently asked questions


What is the difference between join() and merge() in pandas?

join(): Primarily joins DataFrames based on their indices.
merge(): More versatile, allows joining on specified columns, and supports various join types (inner, outer, left, right, cross).
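
The difference can be sketched by producing the same result both ways. The names and ages data frames below are illustrative. Note that join() performs a left join by default, so how='inner' is passed to mirror the default behavior of merge().

```python
import pandas as pd

# Illustrative data with a shared 'ID' column.
names = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
ages = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 22]})

# merge(): match rows on the shared 'ID' column (inner join by default).
via_merge = pd.merge(names, ages, on='ID')

# join(): match rows on the index, so 'ID' is moved into the index first.
via_join = (
    names.set_index('ID')
         .join(ages.set_index('ID'), how='inner')
         .reset_index()
)

print(via_merge)
print(via_join)
```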

How to combine multiple DataFrames into one in Python?

The following methods can be used to combine multiple DataFrames into one in Python:

pd.concat() method: Used to concatenate two or more DataFrames along a specified axis (rows or columns), combining them into a single DataFrame.
pd.merge() method: Used for joining DataFrames based on common columns or indices (similar to SQL joins). It’s not for simple concatenation.

What is concat() in pandas?

In Pandas, the concat() function is a powerful tool for combining multiple DataFrames or Series into a single, unified DataFrame or Series.

Syntax:

pd.concat(objs, axis=0, join='outer', ignore_index=False,
          keys=None, levels=None, names=None, verify_integrity=False, sort=False)

How do you concatenate one DataFrame to another?

You concatenate DataFrames in pandas using pd.concat([df1, df2]). This stacks them vertically by default.

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

result = pd.concat([df1, df2]) 
print(result)

This will stack the rows of df2 below the rows of df1.
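
By default, the stacked result keeps each DataFrame's original index labels, so labels can repeat. A short sketch of how ignore_index=True renumbers the rows instead (the names stacked and renumbered are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Default: the original index labels are kept, so 0 and 1 appear twice.
stacked = pd.concat([df1, df2])
print(stacked.index.tolist())

# ignore_index=True: the result gets a fresh 0..n-1 index.
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())
```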

How to merge two DataFrames in Python?

You can merge two DataFrames in Python using the merge() function from pandas. You can merge based on one or more columns, like a database join.

Example:

import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 22]})

merged_df = pd.merge(df1, df2, on='ID')
print(merged_df)

Output:

   ID     Name  Age
0   1    Alice   25
1   2      Bob   30

What does merge() do in Python?

The merge() function is used to combine two DataFrames based on a common column or index, similar to SQL joins (inner, outer, left, and right joins). It returns a new DataFrame with the combined data.

What is the difference between concat() and merge() in pandas?

concat(): Stacks DataFrames either vertically (rows) or horizontally (columns), without needing a common column. It is primarily used for appending or joining DataFrames along a specific axis.
merge(): Combines DataFrames based on a common column or index, similar to SQL joins, allowing for more complex and controlled merging (e.g., inner, left, right, outer joins).

How to merge Series into a pandas DataFrame?

You can merge Series into a DataFrame by adding them as columns, either by assigning each Series directly to a new column or by using the pd.concat() function.

Example 1: Adding Series as columns

import pandas as pd

# Creating DataFrame
df = pd.DataFrame({'A': [1, 2, 3]})

# Creating Series
s1 = pd.Series([10, 20, 30], name='B')
s2 = pd.Series([100, 200, 300], name='C')

# Merging Series into DataFrame
df['B'] = s1
df['C'] = s2

print(df)

Output:

   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300

Example 2: Using `pd.concat()`

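# Start again from the original one-column DataFrame so that B and C are not duplicated
df = pd.DataFrame({'A': [1, 2, 3]})
df = pd.concat([df, s1, s2], axis=1)
print(df)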

Output:

   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300

In both cases, the Series are added as columns to the DataFrame.
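
Both approaches align the Series with the DataFrame by index label, not by position. The examples above work because the DataFrame and the Series share the default index 0, 1, 2. The sketch below, with a made-up non-default index, shows what can happen when the labels differ and one way to assign by position instead.

```python
import pandas as pd

# Hypothetical DataFrame whose index labels are 10, 11, 12.
df = pd.DataFrame({'A': [1, 2, 3]}, index=[10, 11, 12])

# The Series has the default index 0, 1, 2, so no labels match.
s = pd.Series([10, 20, 30], name='B')

# Label-based alignment: no shared labels, so the new column is all NaN.
df['B'] = s

# Assigning the underlying array ignores the index and fills by position.
df['B_by_position'] = s.to_numpy()

print(df)
```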

How to merge pandas columns within the same DataFrame?

To merge pandas columns within the same DataFrame, you can combine or concatenate columns using simple operations like addition, string concatenation, or using the apply() function. Here is an example of using the + operator.

Using `+` for merging columns

If you want to combine two string columns by concatenating their values (numeric columns must first be converted to strings, for example with astype(str)):

import pandas as pd

df = pd.DataFrame({'A': ['Hello', 'Good', 'Nice'],
                   'B': ['World', 'Morning', 'Day']})

# Merging the two columns by concatenating their values
df['C'] = df['A'] + ' ' + df['B']

print(df)

Output:

       A        B             C
0  Hello    World   Hello World
1   Good  Morning  Good Morning
2   Nice      Day      Nice Day
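
When one of the columns is numeric, the + operator with a string separator raises an error unless the numbers are converted first. A minimal sketch, using a made-up DataFrame with a text column and an integer column:

```python
import pandas as pd

# Hypothetical data: 'city' holds strings, 'zip' holds integers.
df = pd.DataFrame({'city': ['Paris', 'Lima'], 'zip': [75001, 15001]})

# Convert the integer column to strings before concatenating.
df['label'] = df['city'] + ' ' + df['zip'].astype(str)

print(df)
```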
