What is the genfromtxt() function in NumPy?

Overview

The genfromtxt() function loads data from a text file into a NumPy array. Its parameters let us clean the data as it is read. It can also deal with missing or null values by filtering, removing, or replacing them.

Note: The genfromtxt() function from the NumPy module is well suited to loading and cleaning data in a single step.
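As a minimal sketch of the behavior described above, the snippet below loads comma-separated text that has one empty field. The sample text is hypothetical, and StringIO stands in for a file on disk. With the default float data type, the empty field is loaded as nan.

```python
from io import StringIO

import numpy as np

# Hypothetical CSV text; the second row has an empty middle field.
text = "1,2,3\n4,,6\n7,8,9"

# genfromtxt() accepts file-like objects, so StringIO replaces a real file here.
data = np.genfromtxt(StringIO(text), delimiter=",")

# The empty field becomes nan because the default dtype is float.
print(data)
```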

Syntax


# Signature according to documentation
numpy.genfromtxt(fname,
dtype= <class 'float'>,
comments= '#',
delimiter= None,
skip_header= 0,
skip_footer= 0,
converters= None,
missing_values= None,
filling_values= None,
usecols= None,
names= None,
excludelist= None,
deletechars= "!#$%&'()*+, -./:;<=>?@[\\]^{|}~",
replace_space= '_',
autostrip= False,
case_sensitive= True,
defaultfmt= 'f%i',
unpack= None,
usemask= False,
loose= True,
invalid_raise= True,
max_rows= None,
encoding= 'bytes',
*,
like= None)

Parameter values

The genfromtxt() function takes many parameters. In this shot, we'll focus only on the most common ones:

  • fname: {generator, list of strings, path-like object, file}
    • This is the file, filename, list, or generator to read the data from.
  • dtype: {data type}
    • This sets the data type of the resulting array. The default dtype is float.
  • comments='#': {string}
    • This is the character (or characters) that marks the start of a comment. Everything after it on a line is ignored.
  • delimiter=None: {sequence, string, integer}
    • This is the string that separates values. An integer or a sequence of integers is interpreted as the width of each field instead.
  • skip_header=0: {integer value}
    • This is the number of lines to skip at the beginning of the file.
  • skip_footer=0: {integer value}
    • This is the number of lines to skip at the end of the file.
  • converters=None: {variable maybe lambdas, etc.}
    • This is a set of functions, such as lambdas, that convert the data of a column into values.
  • missing_values=None: {variable maybe a string, etc.}
    • These are the strings that mark missing data in the file.
  • filling_values=None: {variable maybe a string, etc.}
    • These are the default values used in place of missing data when loading.
  • usecols=None: {sequence of integers, column name, etc.}
    • This value indicates which columns are read. Column indices start at 0.
  • replace_space='_': {char value}
    • This character replaces each whitespace character in field names. The default is '_'.
  • max_rows=None: {integer value}
    • This number indicates the maximum rows of data that need to be read.
  • encoding='bytes': {string}
    • This is the encoding used to decode the input file. It does not apply when fname is a file object.
  • like: {array_like object}
    • This is a reference object that allows the function to create arrays that are not NumPy arrays. If the object supports the __array_function__ protocol, the result is defined by that object.
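To see how several of these parameters work together, here is a short sketch. The column names, the "N/A" marker, and the sample rows are made up for illustration, and StringIO is used in place of a real file.

```python
from io import StringIO

import numpy as np

# Hypothetical input: a header row, a comment line, and "N/A" for missing data.
text = """id,score,bonus
# this line is a comment and is ignored
1,90,5
2,N/A,7
3,75,N/A
4,60,2
"""

data = np.genfromtxt(
    StringIO(text),
    delimiter=",",         # fields are separated by commas
    comments="#",          # drop everything after a '#'
    skip_header=1,         # skip the "id,score,bonus" line
    usecols=(1, 2),        # keep only the score and bonus columns
    missing_values="N/A",  # treat "N/A" as missing data
    filling_values=0,      # replace missing data with 0
    max_rows=3,            # read at most three data rows
)

print(data)
```

Only the first three data rows are read, the id column is dropped, and each "N/A" entry is loaded as 0.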

Return value

ndarray: This function returns data as an array. If usemask is set, it returns a masked array.
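The difference between the two return types can be sketched as follows. The input text is hypothetical. With usemask=True, the empty field is flagged in the array's mask.

```python
from io import StringIO

import numpy as np

# Hypothetical CSV text with one empty field in the second row.
text = "1,2,3\n4,,6"

# usemask=True asks genfromtxt() for a masked array.
masked = np.genfromtxt(StringIO(text), delimiter=",", usemask=True)

print(type(masked))  # a numpy.ma masked array type
print(masked.mask)   # True marks the missing entry
```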

Explanation

main.py
employee.txt
# load numpy library
import numpy as np
# call the genfromtxt function to read the employee.txt file
content = np.genfromtxt("employee.txt", dtype=str, encoding=None, delimiter=",")
# print file data on console
print("File data:", content)
  • Line 4: We call the genfromtxt() function to load data from the employee.txt file. Setting dtype=str reads every comma-separated field as a string, and the function returns the file's contents as an ndarray.
  • Line 6: We print the loaded array to the console.
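To try the example without the original data file, the sketch below writes a small stand-in for employee.txt to a temporary directory and then loads it the same way. The rows are invented for illustration and are not the contents of the original file.

```python
import os
import tempfile

import numpy as np

# Hypothetical file contents; the original employee.txt is not reproduced here.
sample = "John,Engineer,30\nSara,Manager,41\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "employee.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    # Same call as in main.py, pointed at the temporary file.
    content = np.genfromtxt(path, dtype=str, encoding=None, delimiter=",")

print("Shape:", content.shape)
print("File data:", content)
```

Because dtype=str is used, the numeric column is also loaded as strings. A structured dtype or the converters parameter would be needed to get mixed types.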
