Converting XML files to CSV is common in data processing and transformation. XML is a frequently used format for representing data, while CSV is a simpler format often used for data interchange.
Go's standard library package encoding/xml can be used for parsing XML files. It allows us to map XML elements and attributes to Go struct fields. We’ll use the encoding/csv package for writing CSV data.
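As a minimal sketch of how encoding/xml maps elements and attributes to struct fields, the snippet below decodes a small inline document. The Person struct, the parsePerson helper, and the sample XML are hypothetical and exist only for this illustration; they are not part of the converter built later.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Person is a hypothetical struct used only for this illustration.
type Person struct {
	XMLName xml.Name `xml:"person"`  // the root element must be <person>
	ID      string   `xml:"id,attr"` // filled from the id="..." attribute
	Name    string   `xml:"name"`    // filled from the <name> child element
	Age     int      `xml:"age"`     // element text is converted to int
}

// parsePerson decodes a single <person> element into a Person value.
func parsePerson(data string) (Person, error) {
	var p Person
	err := xml.Unmarshal([]byte(data), &p)
	return p, err
}

func main() {
	p, err := parsePerson(`<person id="7"><name>Sara</name><age>30</age></person>`)
	if err != nil {
		fmt.Println("Error decoding XML:", err)
		return
	}
	fmt.Println(p.ID, p.Name, p.Age) // prints: 7 Sara 30
}
```

A fixed struct like this works only when the tag names are known in advance, which is why the rest of this answer uses a generic node type instead.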
One challenge when converting XML to CSV is handling varying XML structures. XML files from different sources might have different tags and structures. To overcome this, we’ll implement a dynamic schema approach where the schema (headers) for the CSV file is determined dynamically based on the XML content. This is done through the following two steps:
Understanding the layout of the XML file
Identifying the pattern and repetition required in the CSV file
Here, we deal with an XML file of the following type:
<Entitys>
  <Entity>
    <name>Sara</name>
    <age>30</age>
    <adress>Adress of Sara</adress>
    <adress2>Adress line 2 of Sara</adress2>
    <email>Sara@gmail1.com</email>
  </Entity>
  <Entity>
    <name>Smith Wasten</name>
    <age>25</age>
    <adress>Adress of Smith Wasten</adress>
    <email>Smith Wasten@example.com</email>
  </Entity>
  <Entity>
    <name>Michael Cornival</name>
    <age>40</age>
    <adress>Adress of Michael Cornival</adress>
    <email>michael.Cornival@hotmail1.com</email>
  </Entity>
</Entitys>
We will use the os package to open the XML file, as shown below:
xmlFile, err := os.Open("test.xml")
if err != nil {
	fmt.Println("Error opening XML file:", err)
	return
}
defer xmlFile.Close()
We will analyze the XML data to determine the headers for the CSV file dynamically. To do so, we define a struct that collects information from the XML file, as shown below:
type Node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:"-"`
	Children []Node     `xml:",any"`
	Text     string     `xml:",chardata"`
}
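XMLName: This field stores the XML tag name.
Attrs: This field is intended for the attributes of the XML tag. Note that the xml:"-" tag tells the decoder to skip this field, so it stays empty in this example; a tag such as xml:",any,attr" would be needed to collect attributes.
Children: This field stores the child tags of the current tag recursively.
Text: This field stores the text (character data) of the tag.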
These fields can vary depending on the dataset in the XML file. The main purpose, however, is to handle XML data whose entries follow a similar pattern.
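To illustrate how such a recursive struct captures a document, here is a small sketch that decodes an inline sample and prints its outline. It is a variant of the struct above, not the article's exact code: it assumes the xml:",any,attr" tag on Attrs so that attributes are collected, and the parse and outline helpers, the sample XML, and the id attribute are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Node is a variant of the struct above: Attrs uses ",any,attr"
// (an assumption for this sketch) so attributes are collected.
type Node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:",any,attr"`
	Children []Node     `xml:",any"`
	Text     string     `xml:",chardata"`
}

// parse decodes an XML document held in a string into a Node tree.
func parse(data string) (Node, error) {
	var root Node
	err := xml.Unmarshal([]byte(data), &root)
	return root, err
}

// outline returns one indented line per element:
// tag name, attributes, and trimmed text.
func outline(n Node, depth int) string {
	line := strings.Repeat("  ", depth) + n.XMLName.Local
	for _, a := range n.Attrs {
		line += fmt.Sprintf(" %s=%q", a.Name.Local, a.Value)
	}
	if text := strings.TrimSpace(n.Text); text != "" {
		line += ": " + text
	}
	out := line + "\n"
	for _, child := range n.Children {
		out += outline(child, depth+1) // recurse into child elements
	}
	return out
}

func main() {
	const sample = `<Entitys><Entity id="1"><name>Sara</name><age>30</age></Entity></Entitys>`
	root, err := parse(sample)
	if err != nil {
		fmt.Println("Error decoding XML:", err)
		return
	}
	fmt.Print(outline(root, 0))
}
```

Printing the outline of an unfamiliar file this way is one quick way to check which level of the tree holds the repeated entries before deciding on the CSV columns.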
We will use the encoding/xml package to parse the XML data into Go structs.
decoder := xml.NewDecoder(xmlFile)
var rootNode Node
err = decoder.Decode(&rootNode)
if err != nil {
	fmt.Println("Error decoding XML:", err)
	return
}
Line 1: Create a decoder from the xmlFile object using the xml.NewDecoder function.
Line 3: Decode the XML data and store it in rootNode.
We will use the encoding/csv package to create and write a CSV file. We will write the determined headers first, then iterate through the parsed XML data again to extract values and write them into the corresponding CSV rows.
// Create the CSV file and a writer for it
csvFile, err := os.Create("output/output.csv")
if err != nil {
	fmt.Println("Error creating CSV file:", err)
	return
}
defer csvFile.Close()

writer := csv.NewWriter(csvFile)
defer writer.Flush()

// Determine the header by iterating over rootNode:
// use the child tag names of the entity that has the most children
var header []string
for _, node := range rootNode.Children {
	var csvHeader []string
	for _, child := range node.Children {
		csvHeader = append(csvHeader, child.XMLName.Local)
	}
	if len(csvHeader) > len(header) {
		header = csvHeader
	}
}

// Write the header in the CSV file first
writer.Write(header)

// Iterate over rootNode again to write the data under the header
for _, node := range rootNode.Children {
	var csvData []string
	j := 0
	for _, child := range node.Children {
		// Insert empty cells for header columns this entity lacks;
		// the bounds check avoids a panic when a tag is not in the header
		for j < len(header) && child.XMLName.Local != header[j] {
			csvData = append(csvData, "")
			j++
		}
		csvData = append(csvData, child.Text)
		j++
	}
	writer.Write(csvData)
}
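The header logic above takes the tag names of the entity with the most children, so a tag that appears only in a shorter entity is not given its own column. One possible alternative, sketched below, is to build the header as the union of all tag names and to fill each row by name. This is a swapped-in technique rather than the approach used in this answer; the unionHeader, toCSV, and xmlToCSV helpers and the inline sample are hypothetical, and the CSV is written to an in-memory buffer to keep the sketch self-contained.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/xml"
	"fmt"
)

type Node struct {
	XMLName  xml.Name
	Children []Node `xml:",any"`
	Text     string `xml:",chardata"`
}

// unionHeader collects every distinct child tag name in first-seen order.
func unionHeader(root Node) []string {
	seen := map[string]bool{}
	var header []string
	for _, entity := range root.Children {
		for _, child := range entity.Children {
			name := child.XMLName.Local
			if !seen[name] {
				seen[name] = true
				header = append(header, name)
			}
		}
	}
	return header
}

// toCSV renders the tree as CSV text, one row per child of the root.
func toCSV(root Node) (string, error) {
	header := unionHeader(root)
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(header); err != nil {
		return "", err
	}
	for _, entity := range root.Children {
		values := map[string]string{}
		for _, child := range entity.Children {
			values[child.XMLName.Local] = child.Text
		}
		row := make([]string, len(header))
		for i, name := range header {
			row[i] = values[name] // tags missing from this entity yield ""
		}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return buf.String(), w.Error()
}

// xmlToCSV decodes an XML string and converts it to CSV text.
func xmlToCSV(data string) (string, error) {
	var root Node
	if err := xml.Unmarshal([]byte(data), &root); err != nil {
		return "", err
	}
	return toCSV(root)
}

func main() {
	const sample = `<Entitys><Entity><name>Sara</name><age>30</age></Entity>` +
		`<Entity><name>Smith</name><email>smith@example.com</email></Entity></Entitys>`
	out, err := xmlToCSV(sample)
	if err != nil {
		fmt.Println("Error converting XML:", err)
		return
	}
	fmt.Print(out)
}
```

The trade-off is column order: a tag first seen in a later entity is appended at the end of the header rather than placed where it occurs in the document. If an entity repeats a tag, this sketch keeps only the last value.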
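The complete program below reads the test.xml file and converts it into the output.csv file. You can change the test.xml file to a similar layout and explore it. The output directory must already exist, because os.Create does not create missing directories.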
package main

import (
	"encoding/csv"
	"encoding/xml"
	"fmt"
	"os"
)

// Node is a generic, recursive representation of an XML element.
type Node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:"-"`
	Children []Node     `xml:",any"`
	Text     string     `xml:",chardata"`
}

func main() {
	// Open the XML file
	xmlFile, err := os.Open("test.xml")
	if err != nil {
		fmt.Println("Error opening XML file:", err)
		return
	}
	defer xmlFile.Close()

	// Decode the whole document into rootNode
	decoder := xml.NewDecoder(xmlFile)
	var rootNode Node
	err = decoder.Decode(&rootNode)
	if err != nil {
		fmt.Println("Error decoding XML:", err)
		return
	}

	// Create the CSV file and a writer for it
	csvFile, err := os.Create("output/output.csv")
	if err != nil {
		fmt.Println("Error creating CSV file:", err)
		return
	}
	defer csvFile.Close()

	writer := csv.NewWriter(csvFile)
	defer writer.Flush()

	// Determine the header from the entity that has the most children
	var header []string
	for _, node := range rootNode.Children {
		var csvHeader []string
		for _, child := range node.Children {
			csvHeader = append(csvHeader, child.XMLName.Local)
		}
		if len(csvHeader) > len(header) {
			header = csvHeader
		}
	}
	writer.Write(header)

	// Write one row per entity, leaving missing columns empty
	for _, node := range rootNode.Children {
		var csvData []string
		j := 0
		for _, child := range node.Children {
			// The bounds check avoids a panic when a tag is not in the header
			for j < len(header) && child.XMLName.Local != header[j] {
				csvData = append(csvData, "")
				j++
			}
			csvData = append(csvData, child.Text)
			j++
		}
		writer.Write(csvData)
	}
}