Unlike HTML, XML files have their own tags. Based on these tags we can select the required columns to make a CSV.
This process has two conversions: XML → DICT → CSV
Sample XML:
<employees>
<employee>
<name>Carolina</name>
<role>Data Engineer</role>
<age>24</age>
</employee>
<employee>
<name>Roosaka</name>
<role>Data Scientist</role>
<age>27</age>
</employee>
<employee>
<name>Kumar</name>
<role>Machine Learning Engineer</role>
<age>31</age>
</employee>
<employee>
<name>Vijay</name>
<role>Devops Engineer</role>
<age>26</age>
</employee>
</employees>
Step 1: Import the required packages
Install the required packages if prompted:
#1
import csv
import xmltodict
Step 2: Converting XML to DICT
#2
# Reading xml file
with open("sample.xml", 'r') as file:
filedata = file.read()
# Converting xml to python dictionary (ordered dict)
data_dict = xmltodict.parse(filedata)
The converted dict is an ordered dictionary, so the following code converts all the ordered dict elements to dict.
#3
# creating a list of employee data
employee_data_list = [dict(x) for x in data_dict["employees"]["employee"]]
This is how employee_data_list looks, we can parse the data based on our required columns.
[{'name': 'Carolina', 'role': 'Data Engineer', 'age': '24'},
{'name': 'Roosaka', 'role': 'Data Scientist', 'age': '27'},
{'name': 'Kumar', 'role': 'Machine Learning Engineer', 'age': '31'},
{'name': 'Vijay', 'role': 'Devops Engineer', 'age': '26'} ]
Step 3: Converting Dict to CSV
Making headers and rows of employee data and writing it to the CSV file.
#3
# Selecting headers for CSV
HEADERS = ['name', 'role' ,'age']
rows = []
# Interating through each element to get row data
for employee in employee_data_list:
name = employee["name"]
role= employee["role"]
age = employee["age"]
# Adding data of each employee to row list
rows.append([name,role,age])
#Writing to CSV
with open('employee_data.csv', 'w',newline="") as f:
write = csv.writer(f)
write.writerow(HEADERS)
write.writerows(rows)
Conclusion
The data can also be parsed by using ordered dict but for a better understanding of data, I have converted the data to a dictionary in the code snippet #3.