A boxplot, also called a box-and-whisker plot, is a graphical representation of a dataset’s distribution. It shows the median, the quartiles, the range of the bulk of the data, and any outliers, giving a quick visual summary of central tendency and spread.
The plot consists of a box spanning the interquartile range (IQR), that is, the range between the first and third quartiles. The median is shown as a line within the box. Whiskers extend from each end of the box to the most extreme data point that lies within 1.5 times the IQR of that end, which is the default in R. Data points beyond the whiskers are considered outliers and are usually drawn as individual points.
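The whisker rule described above can be checked by hand. The following is a minimal sketch on a small made-up vector (`x` and the other variable names are illustrative, not part of any dataset). It assumes R's default behavior, in which the box edges are Tukey's hinges as returned by `fivenum()`; these can differ slightly from the quartiles that `quantile()` reports.

```r
# Illustrative data: seven ordinary values and one extreme value
x <- c(2, 4, 5, 7, 8, 9, 11, 30)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
lower_hinge <- five[2]
upper_hinge <- five[4]
box_height <- upper_hinge - lower_hinge  # hinge-based IQR

# Limits beyond which a point counts as an outlier
lower_fence <- lower_hinge - 1.5 * box_height
upper_fence <- upper_hinge + 1.5 * box_height

# Whiskers stop at the most extreme points inside the fences
inside <- x[x >= lower_fence & x <= upper_fence]
whiskers <- range(inside)
outliers <- x[x < lower_fence | x > upper_fence]

# boxplot.stats() computes the same quantities that boxplot() draws
stats <- boxplot.stats(x)
print(stats$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$out)    # points drawn individually as outliers
```

Here the value 30 falls above the upper fence, so the upper whisker stops at 11 and 30 is reported as an outlier.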
R provides the built-in boxplot() function to create boxplots. It requires at least one argument: the numeric data to plot.
Let’s create a simple boxplot using the mtcars dataset.
```r
# Load the dataset
data(mtcars)

# Create the boxplot
boxplot(mtcars$mpg)
```
Line 2: This line loads mtcars, a dataset that ships with R.
Line 4: This line creates a boxplot of the mpg variable from the mtcars dataset.
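To see the numbers behind the picture, boxplot() can be called with plot = FALSE, in which case it returns the computed statistics instead of drawing. This is a small sketch; the variable name `res` is only illustrative.

```r
# Compute the boxplot statistics for mpg without drawing anything
res <- boxplot(mtcars$mpg, plot = FALSE)

# stats holds, from top to bottom: lower whisker, lower hinge,
# median, upper hinge, upper whisker
print(res$stats)

# n is the number of observations; out lists any outliers
print(res$n)
print(res$out)
```

For mpg, res$out is empty because no value lies more than 1.5 times the IQR beyond the box, so the plot shows no individual outlier points.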
Now let’s add some more arguments to the plot.
```r
# Load the dataset
data(mtcars)

# Create the boxplot
boxplot(mpg ~ factor(cyl), data = mtcars,
        main = "Box Plot of MPG by Number of Cylinders",
        xlab = "Number of Cylinders",
        ylab = "Miles per Gallon",
        col = "pink",
        border = "black",
        lwd = 2)
```
Line 2: This line loads mtcars, a dataset that ships with R.
Line 4: The formula mpg ~ factor(cyl) specifies that mpg should be plotted against the factor variable cyl. The mpg values are grouped by the levels of cyl, and one box is drawn per level. The data argument names the dataset that contains both variables, which in our case is mtcars.
Line 5: This argument sets the main title of the plot.
Lines 6–7: These arguments set the labels for the x-axis and y-axis.
Line 8: This argument sets the fill color of the boxes.
Line 9: This argument sets the color of the box outlines, whiskers, and median lines.
Line 10: This argument sets the width of the lines in the boxplot, which is 2 in our case.
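The grouping done by the formula can also be inspected without drawing. The sketch below assumes the same formula as the example above and uses plot = FALSE; the name `grouped` is illustrative.

```r
# Group mpg by cylinder count and return the statistics without plotting
grouped <- boxplot(mpg ~ factor(cyl), data = mtcars, plot = FALSE)

# One box is produced per level of factor(cyl)
print(grouped$names)

# Number of cars that fall into each group
print(grouped$n)

# One column of five summary values per group
print(grouped$stats)
```

The three group names correspond to the three boxes in the plot, one each for cars with 4, 6, and 8 cylinders.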