For simple data analysis, the first important step after
Therefore, we will discuss two techniques to read an Excel file in R.
read_excel() from readx1The read_excel() function is used to read an Excel file, and it is only accessible after you import the readxl library in the R program.
read_excel(
path,
sheet = 1,
col_names = TRUE,
col_types = NULL,
na = "",
skip = 0
)
path: The directory of the Excel file with extension.
sheet: Default= 1. If we have more than 1 sheet in a single workbook, we can assign sheet #, i.e., 1, 2, 3, 4…
col_names: Default= TRUE.
TRUE: Read the first row as a file header.FALSE: The file does not have a header.Character Vector: Character vector that contains header names.col_types: Default= NULL.
NULL: Interpret the type of column from the spreadsheet, i.e., Date, Currency, etc.Vector: A vector that contains "blank", "numeric", "date", or "text".na: Default= "". Keeps empty cells as is, but we can specify na to fill empty cells.
skip: Default= 0. Number of rows to skip from start before reading a file.
This method returns Excel file data as a data frame.
In this example, we read an Excel file, Employee.xlsx. Below, we have the expected output of this code.
# installing librarylibrary(readxl) # Optional Stepdata <- read_excel ("Employee.xlsx")print(data)
read.xlsx() from xlsxThe read.xlsx() function is imported from the xlsx library, which is used to read an Excel file in R language.
read.xlsx(
path,
sheet,
startRow = 1,
colNames = TRUE,
rowNames = FALSE,
detectDates = FALSE,
skipEmptyRows = TRUE,
skipEmptyCols = TRUE,
rows = NULL,
cols = NULL,
check.names = FALSE,
sep.names = ".",
namedRegion = NULL,
na.strings = "NA",
fillMergedCells = FALSE
)
path: Directory of file including extension.
sheet: Index or name of the sheet to read data.
startRow*: Default= 1`. The first row to begin and check for data.
colNames: Default= TRUE. If True, the first row will be selected as column names.
rowNames: Default= FALSE. If True, the first column will be selected as row names.
detectDates: Default= FALSE. If True, this method will try to recognize Dates and perform conversions if needed.
skipEmptyRows: Default= TRUE. If True, empty rows will be skipped during loading. After the first row containing data, other empty rows will be returned as NAs.
skipEmptyCols: Default= TRUE. If True, empty columns will be skipped during loading.
rows: Default= NULL.
NULL: If rows are set to NULL, then all rows are read.
Vector: If rows are set to a vector of numeric values, then specified rows will be read.
cols: Default= NULL.
NULL: If cols is set to NULL, then all columns are read.
Vector: If cols is set to vector of numeric values, then specified columns will be read.
check.names: Default= FALSE. If True, then it will check if each variable is syntactically valid or not.
sep.names: Default= ".". Separates each variable in column names.
namedRegion: Default= NULL.
NULL: Values of rows, cols, and startRow parameters will be considered within a workbook.
Not NULL: Values of rows, cols, and startRow parameters are ignored.
na.strings: Default= "NA". Returns blank cells as NAs.
fillMergedCells: Default= FALSE. If True, then the value in a merged cell (Parent) is given to all cells within the merge (Child).
This method returns Excel file data as a data frame.
# Installing required libraryinstall.packages("xlsx") # Optional step# using read.xlsx method to read excel filesdata <-read.xlsx('Employee.xlsx')print(data)