pandas is an open-source Python library for manipulating and analyzing data. It is widely used in machine learning to inspect data and extract useful information, which helps in choosing an appropriate model for a dataset.
pandas provides the astype() function, which casts a pandas object to a specified dtype so that data types stay consistent.
DataFrame.astype(dtype, copy=True, errors='raise')
dtype: The target data type. It can also be a dictionary that maps column names to a data type for each column.
copy: Either True or False. astype() does not modify the original data frame in either case. If set to True, the returned object holds its own copy of the data; if set to False, the result may share data with the original, so later changes to its values can propagate. The default is True. (Optional)
errors: Either 'raise' or 'ignore'. If set to 'ignore', exceptions are suppressed and the original object is returned when the cast fails. The default is 'raise'. (Optional)
The function returns a pandas data frame object with the changed data types.
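As a minimal sketch of the parameters described above, the snippet below casts a whole data frame to a single dtype, checks that the original is left unchanged, and shows the default errors='raise' behavior on an impossible cast. The names scores, as_float, labels, and cast_failed, along with the sample values, are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
scores = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# A single dtype is applied to every column
as_float = scores.astype("float64")

print(scores.dtypes)    # the original columns keep their integer dtype
print(as_float.dtypes)  # every column of the result is float64

# With the default errors='raise', a cast that cannot succeed raises an exception
labels = pd.DataFrame({"a": ["x", "y"]})
try:
    labels.astype("int64")
    cast_failed = False
except (ValueError, TypeError):
    cast_failed = True

print(cast_failed)
```

Because astype() returns a new object, the result has to be assigned to a variable (or back to the same name) for the conversion to have any effect.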
# import library
import pandas as pd

# initialize data
data = {
    "Time": [45, 56, 49, 64, 53],
    "Steps": [134, 153, 178, 205, 186],
    "Calories": [284.5, 234.5, 291.4, 251.0, 211.7],
}

# create a data frame
df = pd.DataFrame(data)

# print data frame
print(df)
We pass a dictionary as the argument to change the data type of each column separately. The dictionary maps each column name to the dtype we want to convert it to.
# change the data types of all the columns
new_df = df.astype({"Time": "float", "Steps": "float", "Calories": "int64"})

# print new data frame
print(new_df)
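One detail worth checking after a conversion like this is what happens to the fractional part when a float column becomes an integer column. The sketch below rebuilds the same data so that it runs on its own, inspects the resulting dtypes, and shows that the cast to int64 truncates rather than rounds. The rounded variable and the round-then-cast step are an illustrative addition, not part of the original example.

```python
import pandas as pd

# Same data as in the example above, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    "Time": [45, 56, 49, 64, 53],
    "Steps": [134, 153, 178, 205, 186],
    "Calories": [284.5, 234.5, 291.4, 251.0, 211.7],
})

new_df = df.astype({"Time": "float", "Steps": "float", "Calories": "int64"})

# Inspect the dtype of each column after the cast
print(new_df.dtypes)

# Casting float to int64 drops the fractional part (211.7 becomes 211)
print(new_df["Calories"].tolist())  # [284, 234, 291, 251, 211]

# Illustrative alternative: round first if nearest-integer values are wanted
rounded = df["Calories"].round().astype("int64")
print(rounded.tolist())
```

If the fractional part matters, it may be safer to leave the column as a float or to round explicitly before casting.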