In this Answer, we discuss how to create a pandas DataFrame from an XML file using the BeautifulSoup library in Python. The BeautifulSoup library is used for pulling data out of HTML and XML files.
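We use the following commands to install the BeautifulSoup library and the other necessary libraries locally. The lxml package is included because BeautifulSoup's 'xml' parser depends on it:
# install beautifulsoup
pip install beautifulsoup4
# install pandas
pip install pandas
# install lxml (backs the 'xml' parser)
pip install lxml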
Step 1: We include some necessary libraries in the program.
from bs4 import BeautifulSoup  # including BeautifulSoup from bs4 module
import pandas as pd  # including pandas as pd
In the code snippet given above, we import the BeautifulSoup class and the pandas library.
Step 2: We read the XML file.
with open("data.xml", 'r') as fd:  # the file is closed automatically after the block
    data = fd.read()
In the code snippet above, we open the data.xml file in read mode, 'r'. The open() function returns a file object, fd. Then, we use the read() method to read the file content into data.
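The data.xml file itself is not shown in this Answer. To try the reading step in isolation, the sketch below writes a small sample file to a temporary directory and reads it back. The sample content, the `<catalog>` and `<book>` wrapper tags, and the helper name `read_xml_text` are assumptions made for illustration; only the six child tag names come from the steps in this Answer.

```python
from pathlib import Path
import tempfile

# Hypothetical sample content; the real data.xml is not shown in this Answer.
SAMPLE_XML = """<?xml version="1.0"?>
<catalog>
  <book>
    <author>Jane Doe</author>
    <title>Example Title</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>A placeholder description.</description>
  </book>
</catalog>
"""


def read_xml_text(path):
    """Return the whole file as one string; the file is closed on exit."""
    with open(path, "r", encoding="utf-8") as fd:
        return fd.read()


# Write the sample into a temporary directory so no existing file is overwritten.
xml_path = Path(tempfile.mkdtemp()) / "data.xml"
xml_path.write_text(SAMPLE_XML, encoding="utf-8")

data = read_xml_text(xml_path)
print(len(data), "characters read")
```

Passing an explicit encoding avoids depending on the platform default, which may differ from the encoding of the XML file.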
Step 3: We invoke the BeautifulSoup library.
soup = BeautifulSoup(data,'xml')
Here, we pass data and the parser name, 'xml', to the BeautifulSoup constructor.
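To see what the resulting soup object offers, here is a minimal sketch that parses a short inline XML string instead of a file. The XML content is a hypothetical sample. The 'xml' parser relies on the lxml package, so this assumes lxml is installed; without it, BeautifulSoup raises a FeatureNotFound error.

```python
from bs4 import BeautifulSoup

# Hypothetical XML used only for illustration.
xml_text = (
    "<catalog><book>"
    "<author>Jane Doe</author>"
    "<title>Example Title</title>"
    "</book></catalog>"
)

# "xml" selects the lxml-backed XML parser.
soup = BeautifulSoup(xml_text, "xml")

# find() returns the first matching tag, or None when nothing matches.
first_title = soup.find("title")
missing = soup.find("price")

print(first_title.name)
print(first_title.get_text())
print(missing)
```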
Step 4: We search the data.
authors = soup.find_all('author')
titles = soup.find_all('title')
prices = soup.find_all('price')
pubdate = soup.find_all('publish_date')
genres = soup.find_all('genre')
des = soup.find_all('description')
Step 5: We get the text data from XML.
data = []
for i in range(0, len(authors)):
    rows = [authors[i].get_text(), titles[i].get_text(), genres[i].get_text(),
            prices[i].get_text(), pubdate[i].get_text(), des[i].get_text()]
    data.append(rows)
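The index-based loop above assumes that every record contains all six elements. If one record lacks, say, a genre, the lists returned by find_all() have different lengths, so values shift into the wrong rows and the loop eventually raises an IndexError. A minimal sketch of one alternative is to walk each record and look up its fields individually. It assumes each record is wrapped in a `<book>` element; that tag name, the helper `book_rows`, and the sample XML are hypothetical, since the Answer does not show data.xml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second record has no <genre> element.
xml_text = """<catalog>
  <book>
    <author>Jane Doe</author><title>First</title><genre>Computer</genre>
    <price>44.95</price><publish_date>2000-10-01</publish_date>
    <description>Has every field.</description>
  </book>
  <book>
    <author>John Roe</author><title>Second</title>
    <price>5.95</price><publish_date>2000-12-16</publish_date>
    <description>No genre element.</description>
  </book>
</catalog>"""

FIELDS = ["author", "title", "genre", "price", "publish_date", "description"]


def book_rows(soup, record_tag="book"):
    """Build one row per record, using None for any missing field."""
    rows = []
    for record in soup.find_all(record_tag):
        row = []
        for field in FIELDS:
            node = record.find(field)
            row.append(node.get_text() if node is not None else None)
        rows.append(row)
    return rows


soup = BeautifulSoup(xml_text, "xml")  # assumes lxml is installed
rows = book_rows(soup)
for row in rows:
    print(row)
```

Each row stays aligned with its own record, and a missing element appears as None instead of pulling in a value from the next record.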
Step 6: We create and print the DataFrame.
df = pd.DataFrame(data, columns=['Author', 'Book Title', 'Genre', 'Price', 'Publish Date', 'Description'])
print(df)
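Because get_text() returns strings, columns such as Price and Publish Date hold text until they are converted. The sketch below shows one way to convert individual columns with pd.to_numeric() and pd.to_datetime(). The rows are hypothetical stand-ins for the values extracted in Step 5, and the date format is an assumption.

```python
import pandas as pd

# Hypothetical rows shaped like the output of Step 5 (all values are strings).
rows = [
    ["Jane Doe", "First", "Computer", "44.95", "2000-10-01", "Placeholder."],
    ["John Roe", "Second", "Fantasy", "5.95", "2000-12-16", "Placeholder."],
]
columns = ["Author", "Book Title", "Genre", "Price", "Publish Date", "Description"]
df = pd.DataFrame(rows, columns=columns)

# Convert only the columns that hold numbers or dates.
# errors="coerce" turns unparseable values into NaN/NaT instead of raising.
df["Price"] = pd.to_numeric(df["Price"], errors="coerce")
df["Publish Date"] = pd.to_datetime(df["Publish Date"], errors="coerce")

print(df.dtypes)
```

Converting column by column keeps the text columns untouched while making Price usable in arithmetic and Publish Date usable in date comparisons.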