What is the update() function in R?

Functions in R are reusable blocks of code that help maintain organization and prevent code repetition. Functions are either built-in (predefined functions that are available in R) or user-defined (written by the user).

The update() function is a built-in generic function that refits a model after modifying parts of its original call, such as the formula, the data, or other fitting arguments. Everything we do not change is carried over from the original call, so we can adjust an existing model without rewriting it from scratch.

Syntax

The syntax of the update() function is as follows:

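update(object, formula., ..., evaluate = TRUE)
  • object: An existing fitted model object, such as the result of lm() or glm().

  • formula.: The changes to the model formula. Note the trailing dot in the argument name. Within the new formula, a dot (.) on either side of ~ stands for the corresponding part of the original formula.

  • ...: Additional arguments to change in, or add to, the original call, such as data, subset, or weights.

  • evaluate: If TRUE (the default), the modified call is evaluated and the refitted model is returned. If FALSE, the modified call itself is returned.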
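To make the arguments above concrete, here is a minimal sketch. It assumes the built-in mtcars data set, which is not part of the original examples, and the names fit, fit_hp, and new_call are chosen only for illustration.

```r
# Assumption: mtcars (shipped with R) stands in for real data
fit <- lm(mpg ~ wt, data = mtcars)

# "." on each side of "~" keeps the original response and predictors,
# so this adds hp to the existing formula
fit_hp <- update(fit, . ~ . + hp)
formula(fit_hp)  # mpg ~ wt + hp

# evaluate = FALSE returns the modified call without refitting the model
new_call <- update(fit, . ~ . + hp, evaluate = FALSE)
print(new_call)

# Evaluating the call by hand gives the same fit as update() did
same_fit <- eval(new_call)
all.equal(coef(same_fit), coef(fit_hp))
```

Inspecting the unevaluated call can be a handy way to check exactly what update() is about to run before paying the cost of refitting a large model.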

Usage

We will explore the usage of the update() function using the following examples:

Example 1: Updating a model with a new formula

The following code shows how to update a linear model with a new formula:

# Create a vector with the values of the independent variable
x <- c(1, 2, 3, 4, 5, 6)
# Create a vector with the values of the dependent variable
y <- c(42, 43, 44, 45, 43, 47)
# Fit a linear model
model <- lm(y ~ x)
# Update the model with a new formula
new_model <- update(model, y ~ x + I(x^2))
# Print the model summary
summary(new_model)

Code explanation

  • Line 2: This creates a vector x containing the values of the independent variable: 1, 2, 3, 4, 5, and 6.

  • Line 4: This creates a vector y containing the values of the dependent variable: 42, 43, 44, 45, 43, and 47.

  • Line 6: This fits a linear model using the lm() function, where y is regressed on x. This means it estimates a straight-line relationship between x and y.

  • Line 8: This updates the model created in line 6 using the update() function. The updated model, stored in the new_model variable, includes an additional term I(x^2). The I() wrapper tells R to treat ^ as arithmetic exponentiation rather than as a formula operator, so the squared values of x enter the model as a quadratic term.

  • Line 10: This prints a summary of new_model using the summary() function. The summary includes the coefficient estimates, standard errors, t-values, p-values, and overall measures of fit such as R-squared.
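As a follow-up sketch, we can check what update() actually changed and whether the extra term is worth keeping. The code below reuses the data from Example 1. Comparing the two fits with anova() is our own suggestion here, not a step from the original example.

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(42, 43, 44, 45, 43, 47)

model <- lm(y ~ x)
# ". ~ . + I(x^2)" is shorthand for "y ~ x + I(x^2)"
new_model <- update(model, . ~ . + I(x^2))

# Confirm the updated formula and the extra coefficient
formula(new_model)
names(coef(new_model))

# F-test comparing the nested models (original vs. updated)
anova(model, new_model)
```

Because the original model is nested inside the updated one, the anova() table indicates whether the quadratic term explains a meaningful amount of additional variation.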

Example 2: Removing a variable from a model

The following code shows how to remove a variable using the update() function:

# Sample data points (z is not a multiple of x, so both coefficients can be estimated)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
z <- c(3, 1, 4, 1, 5)
# Create a data frame using the sample data points
data <- data.frame(x, y, z)
# Fit a linear regression model
model <- lm(y ~ x + z, data = data)
# Print the summary of the initial model
summary(model)
# Remove the variable `z` from the model
reduced_model <- update(model, y ~ x - z)
# Print the summary of the reduced model
summary(reduced_model)

Code explanation

  • Lines 2–4: These lines define sample data points for the variables x, y, and z.

  • Line 6: This creates a data frame named data using the data.frame() function, where x, y, and z become the columns of the data frame.

  • Line 8: This fits a linear regression model using the lm() function. The formula y ~ x + z specifies that y is the response variable and that x and z are the predictor variables, all taken from the data data frame.

  • Line 10: This line uses the summary() function to print a summary of the initial model, including its coefficients, standard errors, p-values, and goodness-of-fit measures.

  • Line 12: This line uses the update() function to create a reduced model by removing the variable z from the original model. In the updated formula, y ~ x - z, the minus sign removes the term z, so the refitted model is y ~ x.

  • Line 14: Finally, the summary() function prints the summary of the reduced model. Comparing the two summaries shows how removing a variable affects the model.
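The same removal can be written with the dot shorthand, which avoids retyping the response and the remaining predictors. The following is a small sketch with hypothetical data (the df, full, explicit, and shorthand names are ours) showing that both spellings lead to the same reduced model.

```r
# Hypothetical sample data; z is deliberately not a multiple of x
df <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1),
  z = c(3, 1, 4, 1, 5)
)
full <- lm(y ~ x + z, data = df)

# Spell out the new formula in full ...
explicit <- update(full, y ~ x - z)
# ... or keep everything from the old formula and drop only z
shorthand <- update(full, . ~ . - z)

formula(explicit)   # y ~ x
formula(shorthand)  # y ~ x
all.equal(coef(explicit), coef(shorthand))
```

The shorthand tends to be more convenient when a model has many predictors, because only the term being removed has to be named.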

Example 3: Updating a model with new data

The following code shows how to update a linear model with new data:

# Create a vector with the values of the independent variable
x <- c(1, 2, 3, 4, 5, 6)
# Create a vector with the values of the dependent variable
y <- c(42, 43, 44, 45, 43, 47)
# Fit a linear model
model <- lm(y ~ x)
# Create a new data frame with new observations
new_data <- data.frame(x = c(11, 12, 13, 14, 15, 16), y = c(52, 54, 56, 58, 60, 61))
# Update the model with the new data
updated_model <- update(model, data = new_data)
# Print the model summary
summary(updated_model)

Code explanation

  • Line 8: This creates a new data frame named new_data using the data.frame() function. The data frame has two variables: x and y. The x variable contains the values 11, 12, 13, 14, 15, and 16, and the y variable contains the values 52, 54, 56, 58, 60, and 61. Essentially, it creates a new set of observations for the independent variable x and the dependent variable y.

  • Line 10: This refits model using the update() function and stores the result in the updated_model variable. Because only the data = new_data argument is passed, the formula y ~ x is kept, and the model is fitted again using only the observations in new_data. The original six observations are no longer used.

  • Line 12: The summary() function is applied to the updated model to print its statistical information. This allows us to see how the model's coefficients and fit statistics change when the same formula is fitted to a different dataset.
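Passing data = to update() replaces the data used for fitting rather than appending to it. If the goal is to fit the model to the old and new observations together, one option is to combine the data frames first. The sketch below assumes the original observations are also stored in a data frame, here called old_data, which is a name we introduce for illustration.

```r
# Original observations, stored in a data frame (old_data is our own name)
old_data <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                       y = c(42, 43, 44, 45, 43, 47))
model <- lm(y ~ x, data = old_data)

# New observations
new_data <- data.frame(x = c(11, 12, 13, 14, 15, 16),
                       y = c(52, 54, 56, 58, 60, 61))

# Refit using only the new observations
refit_new <- update(model, data = new_data)
nobs(refit_new)  # 6

# Refit using the old and new observations together
refit_all <- update(model, data = rbind(old_data, new_data))
nobs(refit_all)  # 12
```

The nobs() function reports how many observations each fit used, which makes it easy to confirm which dataset a refitted model is based on.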
