Functions in R are reusable blocks of code that help maintain organization and prevent code repetition. Functions are either built-in (predefined functions that are available in R) or user-defined (written by the user).
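The update() function is a built-in function that modifies specific components of a fitted model, such as its formula or its data, and then refits it. It does this by changing the call that created the original model and running that call again, so everything we do not change is carried over from the original object. By using the update() function, we can easily enhance and customize an existing model without having to recreate it from scratch.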
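The syntax of the update() function is as follows:
    update(object, formula., ..., evaluate = TRUE)
object: The existing fitted model we wish to modify.
formula.: The changes to the model formula. Note the trailing dot in the argument name. Inside the formula itself, a dot stands for the corresponding part of the original formula.
...: Additional arguments to add to or change in the original model call, such as data.
evaluate: If TRUE (the default), the updated model is refitted. If FALSE, the updated call is returned without being evaluated.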
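As a minimal sketch of how these arguments behave, the snippet below uses a small made-up data frame (df, fit, and the subset condition are illustrative assumptions, not part of the examples in this article). It first asks update() for the modified call without refitting, then passes an extra argument through ... to refit the model on a subset of the rows.

```r
# Hypothetical toy data, used only for illustration
df <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                 y = c(42, 43, 44, 45, 43, 47))

# Fit the starting model
fit <- lm(y ~ x, data = df)

# evaluate = FALSE returns the modified call instead of refitting the model
new_call <- update(fit, y ~ x + I(x^2), evaluate = FALSE)
print(new_call)

# Arguments passed through ... are added to (or replace those in) the original call;
# here a subset argument restricts the fit to the rows where x > 2
fit_subset <- update(fit, subset = x > 2)
nobs(fit_subset)
```

Inspecting the call first can be a convenient way to confirm what update() is about to run before any refitting takes place.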
We will explore the usage of the update() function using the following examples:
The following code shows how to update a linear model with a new formula:
    # Create a vector of independent variables
    x <- c(1, 2, 3, 4, 5, 6)

    # Create a vector of dependent variables
    y <- c(42, 43, 44, 45, 43, 47)

    # Fit a linear model
    model <- lm(y ~ x)

    # Update the model with a new formula
    new_model <- update(model, y ~ x + I(x^2))

    # Print the model summary
    summary(new_model)
Line 2: This creates a vector x containing the values of the independent variable: 1, 2, 3, 4, 5, and 6.
Line 5: This creates a vector y containing the values of the dependent variable: 42, 43, 44, 45, 43, and 47.
Line 8: This fits a linear model using the lm() function, where y is regressed on x. This means it estimates the intercept and slope of the straight line that best describes y as a function of x.
Line 11: This updates the model created in line 8 using the update() function. The updated model, stored in the new_model variable, includes an additional term I(x^2). The I() wrapper tells R to treat ^ as ordinary arithmetic rather than as a formula operator, so the squared values of x enter the model as a second predictor.
Line 14: This prints a summary of the new_model using the summary() function. The summary provides various statistics and information about the fitted model, including coefficient estimates, standard errors, t-values, p-values, and the overall model fit.
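The formula passed to update() may also use a dot to stand for the corresponding part of the original formula. The sketch below reuses the x and y values from this example (the names fit, fit_explicit, and fit_dots are illustrative assumptions) and suggests that . ~ . + I(x^2) should produce the same model as spelling the formula out in full.

```r
# Same values as in the example above
x <- c(1, 2, 3, 4, 5, 6)
y <- c(42, 43, 44, 45, 43, 47)

# Starting model
fit <- lm(y ~ x)

# Full formula written out explicitly
fit_explicit <- update(fit, y ~ x + I(x^2))

# Dot shorthand: keep the response and the existing terms, then add I(x^2)
fit_dots <- update(fit, . ~ . + I(x^2))

# Both approaches should end up with the same formula and coefficients
formula(fit_dots)
all.equal(coef(fit_explicit), coef(fit_dots))
```

The shorthand tends to be most useful when the original formula is long, since only the change has to be typed.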
The following code shows how to remove a variable using the update() function:
    # Sample data points
    x <- c(1, 2, 3, 4, 5)
    y <- c(2, 4, 6, 8, 10)
    z <- c(3, 6, 9, 12, 15)

    # Create a data frame using the sample data points
    data <- data.frame(x, y, z)

    # Fit a linear regression model
    model <- lm(y ~ x + z, data = data)

    # Print the summary of the initial model
    summary(model)

    # Remove the variable `z` from the model
    reduced_model <- update(model, y ~ x - z)

    # Print the summary of the reduced model
    summary(reduced_model)
Lines 2–4: These lines define sample data points for the variables x, y, and z.
Line 7: This creates a data frame named data using the data.frame() function, where x, y, and z are assigned as columns of the data frame.
Line 10: This fits a linear regression model using the lm() function. The formula y ~ x + z specifies that y is the response variable and that x and z are the predictor variables, with all three taken from the columns of the data frame named data.
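Line 13: This line uses the summary() function to obtain a summary of the initial model and provide statistical information about the model's coefficients, standard errors, p-values, and goodness-of-fit measures. In this sample data, z is exactly three times x, so lm() cannot estimate a separate coefficient for z and reports it as NA. Because y is also an exact multiple of x, summary() may warn that the fit is essentially perfect.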
Line 16: This line uses the update() function to create a reduced model by removing the variable z from the original model. The updated formula, y ~ x - z, indicates that z should be excluded from the model.
Line 19: Finally, the summary() function is used to print the summary of the reduced model's statistical information. This allows us to analyze how removing a variable can affect the model.
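Because update() accepts dots in its formula, a term can also be dropped without retyping the rest of the formula. The following sketch rebuilds the sample data from this example (the name dropped_model is an assumption introduced here) and removes z with . ~ . - z, then compares the coefficients before and after.

```r
# Same sample data as in the example above
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
z <- c(3, 6, 9, 12, 15)
data <- data.frame(x, y, z)

# Initial model with both predictors
model <- lm(y ~ x + z, data = data)

# Dot shorthand: keep the response and all existing terms, then remove z
dropped_model <- update(model, . ~ . - z)

# The remaining formula no longer mentions z
formula(dropped_model)

# Compare the coefficients of the two models
coef(model)
coef(dropped_model)
```

With more predictors, this form avoids having to list every term that should stay in the model.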
The following code shows how to update a linear model with new data:
    # Create a vector of independent variables
    x <- c(1, 2, 3, 4, 5, 6)

    # Create a vector of dependent variables
    y <- c(42, 43, 44, 45, 43, 47)

    # Fit a linear model
    model <- lm(y ~ x)

    # Create a new data frame with new observations
    new_data <- data.frame(x = c(11, 12, 13, 14, 15, 16), y = c(52, 54, 56, 58, 60, 61))

    # Update the model with the new data
    updated_model <- update(model, data = new_data)

    # Print the model summary
    summary(updated_model)
Line 11: This creates a new data frame named new_data using the data.frame() function. The data frame has two variables: x and y. The x variable contains the values 11, 12, 13, 14, 15, and 16, and the y variable contains the values 52, 54, 56, 58, 60, and 61. Essentially, it creates a new set of observations for the independent variable x and the dependent variable y.
Line 14: This updates the previously created model, model, using the update() function. Specifying data = new_data makes update() re-run the original lm() call with new_data as its data source, so the updated model, stored in the updated_model variable, is fitted to the six observations in new_data only. The original x and y vectors are not combined with the new observations.
Line 17: The summary() function is applied to the updated model to obtain a summary of the model's statistical information based on the updated data. This allows us to analyze how the model's parameters and statistical measures change when fitting the model to a different dataset.
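Passing data to update() swaps the data set rather than appending to it. As a hedged sketch of the difference, the code below counts the observations used by each fit and then shows one possible way to refit on the old and new observations together (the names old_data, all_data, and combined_model are assumptions introduced for illustration).

```r
# Same values as in the example above
x <- c(1, 2, 3, 4, 5, 6)
y <- c(42, 43, 44, 45, 43, 47)
model <- lm(y ~ x)

new_data <- data.frame(x = c(11, 12, 13, 14, 15, 16),
                       y = c(52, 54, 56, 58, 60, 61))

# Refit on new_data: only the six new observations are used
updated_model <- update(model, data = new_data)
nobs(updated_model)

# To use the old and new observations together, combine them first
old_data <- data.frame(x = x, y = y)
all_data <- rbind(old_data, new_data)
combined_model <- update(model, data = all_data)
nobs(combined_model)
```

Checking nobs() after an update is a quick way to confirm which observations a refitted model was actually based on.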