Pandas is a Python open-source library that is used for data manipulation and data cleaning. It is widely used in data sciences, data analysis, and machine learning tasks. We use it to prepare our data for analysis and pass it to machine learning models.
Pandas provides us with two data structures. The first one is the Series, which is used for one-dimensional data. The second one is the DataFrame, which is used for two-dimensional data like:
Name | Age | Country |
John | 20 | USA |
James | 30 | Canada |
Alex | 23 | Brazil |
Sara | 13 | Argentina |
Andrew | 42 | Australia |
Albert | 12 | England |
In this Answer, we will explore how we can convert pandas DataFrame into an Excel file.
To convert a DataFrame to Excel, we use the to_excel
function. The syntax is given below.
<dataframe>.to_excel(excel_writer = "" , sheet_name = "" , columns = [] , header = bool , index = bool , index_label = "" , startrow = 0 , startcol = 0, na_rep = "" , float_format = None , engine = None , merge_cells = bool , inf_rep = "inf" , freeze_panes = None , storage_option = None)
The parameters involved are enlisted below:
excel_writer
: Specify the Excel filename in string format.
sheet_name
: Represents the sheet name of the Excel file.
columns
: Represents the column names of the DataFrame that we want to add in the Excel file.
header
: Represents a bool value to specify if we want to show column headers.
index
: Represents a bool to tell if we want the index column.
index_label
: Represents the index column's header name.
startrow
: Specify the row number of the Excel file from where we want to insert the data.
startcol
: Specify the column number of the Excel file from where we want to insert the data.
na_rep
: Specify the cell value for missing data.
float_format
: Specify the string format to represent float numbers in the Excel file. For example, bringing a float number to 3 decimal places using float_format="%.3f"
.
engine
: Specify the engine we want to use. Either the openpyxl
or the xlsxwriter
engine.
merge_cells
: Specify if we want to merge the cells for muli-indexed rows. By default, the value is True
.
inf_rep
: Specify the representation of infinity values. By default, there is no native representation of infinity value in Excel.
freeze_panes
: Specify a tuple of length two e.g. (2,5). This will freeze the 2nd row and 5th column on the screen and keep it visible even if we scroll towards the right on the Excel sheet.
storage_option
: Specify it when we are dealing with different storage services like cloud storage. It takes in a dictionary value used to configure the connection settings of the storage service.
In the example below, we will create a pandas DataFrame and convert it into an Excel file. Please click the "Run" button below to create an Excel file with the name names_data.xlsx
.
import pandas as pd df = pd.DataFrame({"Name":["John" , "James" , "Alex" , "Andrew" , "Sara", "Albert"] , "Age" :["20" , "30" , "23" , "13" , '42' , "12"] , "Country": ["USA" , "Canada" , "Brazil" , "Argentina" , "Australia" , "England"]}) df.to_excel(excel_writer="names_data.xlsx", columns=['Name' , 'Age' , 'Country'], header=True, index=True, index_label='Index', startrow=0, startcol=0, )
On executing the code, the excel file is created that we can view in the terminal using ls
command.
Line 1: We import the pandas library.
Lines 3–5: We create a pandas DataFrame and add the data with Name
, Age
, and Country
as the column headers. We store the DataFrame in the df
variable.
Line 7: We use the .to_excel()
function and name our Excel file as names_data.xlsx
.
Line 8: We pass the DataFrame column headers we want to store in the Excel file.
Line 9: We set the header
value to True
, which will display the column headers in the Excel file.
Line 10: We set the index
to True
that adds an index column in the Excel file starting from 0.
Line 11: We set the index column's name to Index
.
Line 12: We set the startcol
to 0, which means that the data will start to insert from column 0 in the Excel file.
Line 13: We set the startrow
to 0, which means that the data will start to insert from row 0 in the Excel file.
Pandas provide us with the to_excel
function to convert a DataFrame into an Excel file which helps store the data locally in our system.
Free Resources