How to read a CSV file as an array in Julia

In Julia, comma-separated values (CSV) files can be read into arrays using the CSV package (CSV.jl). It is not part of the standard library and must be installed separately. The package provides high-performance CSV parsing and writing functionality.

How to install the CSV package

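To use the CSV package, we first install it by running the following commands in the Julia Read-Eval-Print Loop (REPL). The examples that follow also use the DataFrames package, so it is installed here as well:

using Pkg

Pkg.add(["CSV", "DataFrames"])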

Note: We have already installed the CSV package on our platform.

The read function

Once the package is installed, we can read a CSV file using the CSV.read function. It takes the file path as a string, followed by the type to load the data into (here, DataFrame), and returns a DataFrame object:

using CSV, DataFrames

data = CSV.read("abc.csv", DataFrame)

By default, CSV.read assumes that the first row of the CSV file contains the header names. If our file doesn't have a header row, we can set the header argument to false:

data = CSV.read("abc.csv", DataFrame, header=false)
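
To illustrate the effect of the header argument, here is a minimal sketch. The sample data and the temporary file are made up for illustration and are not part of the original example.

```julia
using CSV, DataFrames

# Hypothetical sample data, written to a temporary file for illustration.
path = tempname() * ".csv"
write(path, "name,age\nAda,36\nLin,41\n")

# Default: the first line supplies the column names.
with_header = CSV.read(path, DataFrame)

# header=false: the first line is treated as an ordinary data row.
no_header = CSV.read(path, DataFrame, header=false)

println(names(with_header))
println(size(with_header))
println(names(no_header))
println(size(no_header))
```

With header=false, the file's first line is counted as data, so the result has one more row, and the columns receive automatically generated names.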

We can also specify the delimiter character using the delim argument. For example, to read a tab-separated file, we can set delim='\t' as follows:

data = CSV.read("abc.csv", DataFrame, header=false, delim='\t')
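
As a small sketch of the delim argument, the snippet below parses tab-separated text from an in-memory buffer instead of a file. The data is hypothetical and only serves to show the call.

```julia
using CSV, DataFrames

# Hypothetical tab-separated data held in a string.
tsv = "id\tcity\n1\tOslo\n2\tLima\n"

# CSV.read accepts an IO source such as an IOBuffer as well as a file path.
data = CSV.read(IOBuffer(tsv), DataFrame, delim='\t')

println(names(data))
println(data.id)
```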

To read in specific columns of the CSV file, we can use the select argument of the CSV.read function. For example, to read only the first and third columns of a file, we can do the following:

data = CSV.read("abc.csv", DataFrame, header=false, select=[1, 3])
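
The select argument can be sketched as follows. The column names a, b, and c are invented for this example. Besides positions, select also accepts column names when the file has a header row.

```julia
using CSV, DataFrames

# Hypothetical three-column data held in a string.
csv = "a,b,c\n1,2,3\n4,5,6\n"

# Select columns by position...
by_index = CSV.read(IOBuffer(csv), DataFrame, select=[1, 3])

# ...or by name.
by_name = CSV.read(IOBuffer(csv), DataFrame, select=["a", "c"])

println(names(by_index))
println(names(by_name))
```

Both calls skip column b while parsing, which can save time and memory on wide files.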

If our CSV file has missing values represented by a specific string (e.g., “NA”), we can set the missingstring argument to the corresponding value. For example, to treat “NA” as missing values, we can do the following:

data = CSV.read("abc.csv", DataFrame, missingstring="NA")

This will convert any occurrences of NA in the CSV file to the Julia missing value.
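
A minimal sketch of this behavior, using made-up data in which one score is recorded as NA:

```julia
using CSV, DataFrames

# Hypothetical data: the second score is recorded as "NA".
csv = "id,score\n1,9.5\n2,NA\n3,7.0\n"

data = CSV.read(IOBuffer(csv), DataFrame, missingstring="NA")

# The column stays numeric but now allows missing entries.
println(eltype(data.score))
println(count(ismissing, data.score))

# skipmissing ignores the missing entries in computations.
println(sum(skipmissing(data.score)))
```

Without missingstring="NA", the NA entry would not be recognized as missing, and the column would typically be read as text rather than numbers.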

The Matrix function

Once we have read the data as a DataFrame, we can convert it to an array using the Matrix function:

array_data = Matrix(data)

The resulting matrix has the same dimensions as the original DataFrame: one row per data row and one column per DataFrame column.
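
The conversion can be sketched on a small, hypothetical table with one integer column and one floating-point column:

```julia
using CSV, DataFrames

# Hypothetical data: column x holds integers, column y holds floats.
csv = "x,y\n1,2.5\n3,4.5\n"

data = CSV.read(IOBuffer(csv), DataFrame)
array_data = Matrix(data)

println(size(data))
println(size(array_data))
println(eltype(array_data))

# Ordinary array indexing works on the result, e.g. the first column:
println(array_data[:, 1])
```

Because a matrix has a single element type, columns of different types are promoted to a common type. When text and numeric columns are mixed, that common type can end up being Any, so it may be worth selecting only the numeric columns before converting.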

Code

The following script, main.jl, runs several of the commands explained above on the file abc.csv:

using CSV, DataFrames
x = " ------------------------------------";
# Reading a CSV file
data = CSV.read("abc.csv", DataFrame)
println(data)
println(x)
# Reading a CSV file,
# setting the header argument to false
data = CSV.read("abc.csv", DataFrame, header=false)
println(data)
println(x)
# Reading a CSV file using the delim argument
data = CSV.read("abc.csv", DataFrame, header=false, delim='\t')
println(data)
println(x)
# Reading a CSV file using the select argument
data = CSV.read("abc.csv", DataFrame, header=false, select=[1, 3])
println(data)
println(x)
# Reading a CSV file using the missingstring argument
data = CSV.read("abc.csv", DataFrame, header=false, missingstring="NA")
println(data)
println(x)
# Convert the dataframe into an array
# using the Matrix function
array_data = Matrix(data)
println(array_data[:,1:3])
println(x)
