How to use the merge() function for data frames in R

Overview

The merge() function in R combines two data frames.

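The key requirement is that both data frames contain a column (or set of columns) to match on, and that those columns hold comparable values. Keeping the key columns the same type in both data frames avoids surprises from implicit type conversion.

The merge() function is similar to a join operation in a Relational Database Management System (RDBMS).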

Syntax

Let’s look at the syntax of the function.

merge(x, y, by, by.x, by.y, all.x, all.y, sort = TRUE)
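
As a minimal sketch of the call shape, the snippet below merges two small data frames with and without an explicit by argument. The data frames a and b and their columns are hypothetical and exist only for illustration. When by is omitted, merge() uses the column names the two data frames share.

```r
# Hypothetical data frames used only for illustration.
a <- data.frame(id = c(1, 2, 3), score = c(10, 20, 30))
b <- data.frame(id = c(2, 3, 4), grade = c("B", "A", "C"))

# With no 'by' argument, merge() joins on the columns the two
# data frames have in common (here, only "id").
implicit <- merge(a, b)

# Naming the key column explicitly requests the same join.
explicit <- merge(a, b, by = "id")

print(implicit)
print(identical(implicit, explicit))
```

Naming the key explicitly is usually the safer habit, because a column that happens to share a name in both data frames would otherwise be treated as part of the key.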

Arguments

  • x: This is the first data frame or object to be merged.

  • y: This is the second data frame or object to be merged.

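  • by, by.x, by.y: These specify the columns used for merging. Use by when the key columns have the same name in both data frames, and by.x and by.y when the names differ.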

  • all.x, all.y: These control whether unmatched rows are kept in the output. Both default to FALSE:

    • all.x = TRUE keeps every row of x, even if it has no match in y.
    • all.y = TRUE keeps every row of y, even if it has no match in x.

    For rows without a match, the columns that come from the other data frame are filled with NA.

  • sort = TRUE/FALSE: This specifies whether the result is sorted on the by columns. The default is TRUE.
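
The by.x, by.y, and sort arguments described above can be sketched as follows. The data frames marks and cities are hypothetical; they are set up so that the key column has a different name in each one.

```r
# Hypothetical data frames: the key is called "StudentId" in one
# and "id" in the other.
marks <- data.frame(StudentId = c(3, 1, 2), Marks = c(90, 70, 84))
cities <- data.frame(id = c(2, 3, 5),
                     city = c("Karachi", "Lahore", "Multan"))

# by.x names the key column in x, and by.y names the key column in y.
# The key column in the result takes its name from x.
joined <- merge(marks, cities, by.x = "StudentId", by.y = "id")
print(joined)

# sort = FALSE skips sorting on the key column; the row order of the
# result is then unspecified.
unsorted <- merge(marks, cities, by.x = "StudentId", by.y = "id",
                  sort = FALSE)
print(unsorted)
```

Only StudentId values 2 and 3 appear in both data frames, so both results contain the same two rows and differ at most in row order.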

Code

Let’s look at the following example:

x <- data.frame(StudentId = 1:6,
                Marks = c(70, 84, 90, 93, 80, 76))

y <- data.frame(StudentId = c(2, 4, 6, 7, 8),
                city = c("Lahore", "Karachi", "Peshawar", "Quetta", "Multan"))

z <- merge(x, y, by = "StudentId")
z

Explanation

  • Lines 1-2: We define a data frame x that holds each StudentId and the corresponding Marks.

  • Lines 4-5: We define a data frame y that holds each StudentId and the city the student belongs to.

  • Line 7: We join the two data frames on StudentId and store the result in a new data frame, z. Because all.x and all.y are left at their defaults, this is an inner join.

The output has three rows because only three StudentId values (2, 4, and 6) appear in both data frames.
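
To see how the result changes when unmatched rows are kept, here is a short sketch of the outer joins on the same data. The data frames mirror the example above, with Marks stored as numbers. The names left, right, and full are hypothetical, and the all argument, which does not appear in the syntax above, is shorthand for setting both all.x and all.y.

```r
# Data modeled on the example above, repeated so the block runs on its own.
x <- data.frame(StudentId = 1:6,
                Marks = c(70, 84, 90, 93, 80, 76))
y <- data.frame(StudentId = c(2, 4, 6, 7, 8),
                city = c("Lahore", "Karachi", "Peshawar", "Quetta", "Multan"))

# Left outer join: keep every row of x; city is NA where y has no match.
left <- merge(x, y, by = "StudentId", all.x = TRUE)

# Right outer join: keep every row of y; Marks is NA where x has no match.
right <- merge(x, y, by = "StudentId", all.y = TRUE)

# Full outer join: all = TRUE sets both all.x and all.y to TRUE.
full <- merge(x, y, by = "StudentId", all = TRUE)

print(nrow(left))   # 6 rows, one per row of x
print(nrow(right))  # 5 rows, one per row of y
print(nrow(full))   # 8 rows, StudentId 1 through 8
print(left)
```

In RDBMS terms, these correspond to a left outer join, a right outer join, and a full outer join, while the default call in the example above corresponds to an inner join.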

Copyright ©2025 Educative, Inc. All rights reserved