For simple data analysis, the first important step after
Therefore, we will discuss two techniques to read an Excel file in R.
read_excel()
from readx1
The read_excel()
function is used to read an Excel file, and it is only accessible after you import the readxl
library in the R program.
read_excel(
path,
sheet = 1,
col_names = TRUE,
col_types = NULL,
na = "",
skip = 0
)
path
: The directory of the Excel file with extension.
sheet
: Default= 1
. If we have more than 1 sheet in a single workbook, we can assign sheet #, i.e., 1, 2, 3, 4…
col_names
: Default= TRUE
.
TRUE
: Read the first row as a file header.FALSE
: The file does not have a header.Character Vector
: Character vector that contains header names.col_types
: Default= NULL
.
NULL
: Interpret the type of column from the spreadsheet, i.e., Date, Currency, etc.Vector
: A vector that contains "blank"
, "numeric"
, "date"
, or "text"
.na
: Default= ""
. Keeps empty cells as is, but we can specify na
to fill empty cells.
skip
: Default= 0
. Number of rows to skip from start before reading a file.
This method returns Excel file data as a data frame.
In this example, we read an Excel file, Employee.xlsx
. Below, we have the expected output of this code.
# installing librarylibrary(readxl) # Optional Stepdata <- read_excel ("Employee.xlsx")print(data)
read.xlsx()
from xlsxThe read.xlsx()
function is imported from the xlsx
library, which is used to read an Excel file in R language.
read.xlsx(
path,
sheet,
startRow = 1,
colNames = TRUE,
rowNames = FALSE,
detectDates = FALSE,
skipEmptyRows = TRUE,
skipEmptyCols = TRUE,
rows = NULL,
cols = NULL,
check.names = FALSE,
sep.names = ".",
namedRegion = NULL,
na.strings = "NA",
fillMergedCells = FALSE
)
path
: Directory of file including extension.
sheet
: Index or name of the sheet to read data.
startRow*:
Default= 1`. The first row to begin and check for data.
colNames
: Default= TRUE
. If True
, the first row will be selected as column names.
rowNames
: Default= FALSE
. If True
, the first column will be selected as row names.
detectDates
: Default= FALSE
. If True
, this method will try to recognize Dates and perform conversions if needed.
skipEmptyRows
: Default= TRUE
. If True
, empty rows will be skipped during loading. After the first row containing data, other empty rows will be returned as NA
s.
skipEmptyCols
: Default= TRUE
. If True
, empty columns will be skipped during loading.
rows
: Default= NULL
.
NULL
: If rows
are set to NULL
, then all rows are read.
Vector
: If rows
are set to a vector of numeric values, then specified rows will be read.
cols
: Default= NULL
.
NULL
: If cols
is set to NULL
, then all columns are read.
Vector
: If cols
is set to vector
of numeric values, then specified columns will be read.
check.names
: Default= FALSE
. If True
, then it will check if each variable is syntactically valid or not.
sep.names
: Default= "."
. Separates each variable in column names.
namedRegion
: Default= NULL
.
NULL
: Values of rows
, cols
, and startRow
parameters will be considered within a workbook.
Not NULL
: Values of rows
, cols
, and startRow
parameters are ignored.
na.strings
: Default= "NA"
. Returns blank cells as NA
s.
fillMergedCells
: Default= FALSE
. If True
, then the value in a merged cell (Parent
) is given to all cells within the merge (Child
).
This method returns Excel file data as a data frame.
# Installing required libraryinstall.packages("xlsx") # Optional step# using read.xlsx method to read excel filesdata <-read.xlsx('Employee.xlsx')print(data)