Thought leadership from the most innovative tech companies, all in one place.

Convert XML to CSV Using Python

A guide on converting XML to CSV using Python.

image

Unlike HTML, XML files have their own tags. Based on these tags we can select the required columns to make a CSV.

This process has two conversions: XML → DICT → CSV

Sample XML:

<employees>
   <employee>
      <name>Carolina</name>
        <role>Data Engineer</role>
        <age>24</age>
    </employee>
    <employee>
      <name>Roosaka</name>
        <role>Data Scientist</role>
        <age>27</age>
    </employee>
    <employee>
      <name>Kumar</name>
        <role>Machine Learning Engineer</role>
        <age>31</age>
    </employee>
    <employee>
      <name>Vijay</name>
        <role>Devops Engineer</role>
        <age>26</age>
    </employee>
</employees>

Step 1: Import the required packages

Install the required packages if prompted:

#1
import csv
import xmltodict

Step 2: Converting XML to DICT

#2
# Reading xml file
with open("sample.xml", 'r') as file:
    filedata = file.read()

# Converting xml to python dictionary (ordered dict)
data_dict = xmltodict.parse(filedata)

The converted dict is an ordered dictionary, so the following code converts all the ordered dict elements to dict.

#3
# creating a list of employee data
employee_data_list = [dict(x) for x in data_dict["employees"]["employee"]]

This is how employee_data_list looks, we can parse the data based on our required columns.

[{'name': 'Carolina', 'role': 'Data Engineer', 'age': '24'},
{'name': 'Roosaka', 'role': 'Data Scientist', 'age': '27'},
{'name': 'Kumar', 'role': 'Machine Learning Engineer', 'age': '31'},
{'name': 'Vijay', 'role': 'Devops Engineer', 'age': '26'} ]

Step 3: Converting Dict to CSV

Making headers and rows of employee data and writing it to the CSV file.

#3
# Selecting headers for CSV
HEADERS = ['name', 'role' ,'age']

rows = []

# Interating through each element to get row data
for employee in employee_data_list:
    name = employee["name"]
    role= employee["role"]
    age = employee["age"]

	# Adding data of each employee to row list
	    rows.append([name,role,age])

#Writing to CSV
with open('employee_data.csv', 'w',newline="") as f:
    write = csv.writer(f)
    write.writerow(HEADERS)
    write.writerows(rows)

Conclusion

The data can also be parsed by using ordered dict but for a better understanding of data, I have converted the data to a dictionary in the code snippet #3.

LinkedIn GitHub




Continue Learning