What is the unite() function in R programming?

Tidy data

Data can be arranged in many ways, and some arrangements make analysis much easier than others. Tidy data is one such arrangement. The concept is described in Hadley Wickham’s 2014 paper, Tidy Data.

Tidy data gives a dataset a consistent structure, in which:

  • Every column represents a variable
  • Every row is an observation
  • Each cell is a single value

The figure below illustrates the rows, columns, and cells of a tidy data frame.

Rows, Columns, and Cells
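As a minimal sketch of these three rules, the data frame below stores a few of the houses from the example later on: each column (City, Area, Price) is one variable, each row is one house, and each cell holds one value. The column names and the choice to store Area and Price as numbers are illustrative assumptions, not part of the original example.

```r
# Illustrative tidy data frame (values borrowed from the housing example):
# one column per variable, one row per house, one value per cell.
houses <- data.frame(
  City  = c("New York", "Chicago", "Boston"),
  Area  = c(2000, 3500, 1300),    # square metres, stored as plain numbers
  Price = c(20000, 24000, 90888)  # dollars, stored as plain numbers
)
print(houses)

# Because each variable is its own column, it can be summarized directly
mean(houses$Area)
```

Storing Area and Price as numbers, rather than as strings such as "2000m2" or "$20000", keeps a single plain value in each cell, which is what makes a summary like mean() straightforward.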

The unite() function

The unite() function, from the tidyr package, merges two or more columns into a single column by pasting their values together row by row, with a separator between them. It returns a data frame that contains the new combined column.
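A minimal sketch of this behavior, assuming tidyr is installed; the data frame df and its columns first and second are hypothetical. No sep is passed, so unite() uses its default separator, the underscore.

```r
library(tidyr)

# Hypothetical two-column data frame
df <- data.frame(first = c("a", "b"), second = c("x", "y"))

# Paste first and second together into a new column named "combined"
out <- unite(df, col = "combined", first, second)
print(out)
# With the defaults, the values are joined with "_" and the input columns are removed
```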

Syntax


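unite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)

Parameters

  • data: A data frame
  • col: Name of the new column that holds the combined values
  • ...: Columns to unite, given as bare names, a character vector, or tidyselect helpers; if none are given, all columns are united
  • sep: Separator placed between the values; default = "_"
  • remove: If TRUE, removes the input columns from the output data frame; default = TRUE
  • na.rm: If TRUE, drops missing values before the values are pasted together; default = FALSE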
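The sketch below exercises sep, remove, and na.rm on a small hypothetical data frame that contains missing values (df and its columns are made up for illustration). Without na.rm = TRUE, base R's paste() writes a missing value as the string "NA".

```r
library(tidyr)

# Hypothetical data with one missing value in each column
df <- data.frame(
  City   = c("New York", "Chicago", NA),
  P.Code = c("NY", NA, "Bos")
)

# Custom separator; remove = FALSE keeps City and P.Code in the output
kept <- unite(df, col = "Address", City, P.Code, sep = ", ", remove = FALSE)
print(kept)

# na.rm = TRUE drops missing values before pasting, instead of writing "NA"
clean <- unite(df, col = "Address", City, P.Code, sep = ", ", na.rm = TRUE)
print(clean)
```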

Return value

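unite() returns a copy of the data frame in which the selected columns are combined into the single new column named by col. The input data frame itself is not modified.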
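A short sketch, using a hypothetical two-row data frame, confirming that unite() leaves its input untouched and returns a separate data frame. With the default remove = TRUE, the new Address column takes the place of the first united column.

```r
library(tidyr)

# Hypothetical input data
df <- data.frame(City   = c("Boston", "Washington"),
                 P.Code = c("Bos", "Was"),
                 Price  = c("$90888", "$90013"))

out <- unite(df, col = "Address", City, P.Code, sep = " ")

names(df)   # input still has "City" "P.Code" "Price"
names(out)  # result has "Address" "Price"
```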

Code

# Load tidyr, which provides unite()
library(tidyr)
# Initializing a matrix with values
Matrix <- matrix(c('2000m2', 'NY', 'New York', '$20000',
                   '3500m2', 'Chi', 'Chicago', '$24000',
                   '1300m2', 'Bos', 'Boston', '$90888',
                   '1600m2', 'Was', 'Washington', '$90013'),
                 ncol = 4, byrow = TRUE)
colnames(Matrix) <- c('House_Area', 'P.Code', 'City', 'Price')
rownames(Matrix) <- c('1', '2', '3', '4')
# Converting the matrix to a data frame, since unite() works on data frames
Housing_dataset <- as.data.frame(Matrix)
# Show the data frame named Housing_dataset
print(Housing_dataset)
# Calling unite() to merge City and P.Code into a new Address column
Housing_dataset_updated <- unite(Housing_dataset, col = 'Address',
                                 c('City', 'P.Code'), sep = " ", remove = TRUE)
print(Housing_dataset_updated)

Expected output


Explanation

The unite() call takes Housing_dataset, the name of the new column (col = 'Address'), and the two columns to merge, c('City', 'P.Code'), as arguments. With sep = " ", each Address value is the city followed by its code, for example "New York NY". Because remove = TRUE, the City and P.Code columns are dropped, so Housing_dataset_updated holds the single variable Address in their place.
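The Address values can be cross-checked against base R's paste(). This is a minimal sketch that rebuilds the same four houses directly as a data frame, assuming the code column is named P.Code as in the explanation above.

```r
library(tidyr)

# Same housing data, built directly as a data frame
Housing_dataset <- data.frame(
  House_Area = c("2000m2", "3500m2", "1300m2", "1600m2"),
  P.Code     = c("NY", "Chi", "Bos", "Was"),
  City       = c("New York", "Chicago", "Boston", "Washington"),
  Price      = c("$20000", "$24000", "$90888", "$90013")
)

updated <- unite(Housing_dataset, col = "Address", c("City", "P.Code"),
                 sep = " ", remove = TRUE)

# Each Address value equals paste(City, P.Code, sep = " ") for the same row
manual <- paste(Housing_dataset$City, Housing_dataset$P.Code, sep = " ")
stopifnot(identical(updated$Address, manual))
print(updated$Address)
```

The selection order matters: listing City before P.Code puts the city name first in each Address value.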
