In this article, I convert a nested list into a single (flat) list and then convert the flattened list into a DataFrame. A nested list has a structure similar to this: [[list 1], [list 2], [list 3], ..., [list n]].
This is part of the data preprocessing used to generate an HTML map page.
This article is a part of a series:
Part 1: A simple example of scraping multiple web pages at once using BeautifulSoup
Part 2: This page
Part 3: Finding latitude and longitude of addresses using GoogleMaps API
Part 4: Using Folium to map latitude and longitude
In the previous article, I scraped a website using BeautifulSoup, and the data was retrieved in the form of a nested list. In this article, I convert that nested list into a single list.
Import nested list from a text file
You can follow the steps provided in the previous article to generate your own nested list, or download the nested list from my GitHub repository. The 'sta.txt' file contains a nested list of station names, and 'add.txt' contains a nested list of the corresponding station addresses.
import ast

# 'sta.txt' contains a nested list of stations
# ast.literal_eval parses the Python literal without executing arbitrary code (unlike eval)
with open("sta.txt", "r") as f:
    sta = ast.literal_eval(f.read())

# 'add.txt' contains a nested list of the corresponding station addresses
with open("add.txt", "r") as f:
    add = ast.literal_eval(f.read())
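The text files are assumed to hold the nested list written out as a Python literal (for example with str()). The sketch below shows that round trip under this assumption, using hypothetical station names and a temporary file instead of the real 'sta.txt'. Reading with ast.literal_eval rather than eval() means only literals such as lists and strings are parsed, so nothing in the file can be executed as code.

```python
import ast
import os
import tempfile

# Hypothetical sample data: one inner list per scraped page (not the real stations)
nested = [["Station A", "Station B"], ["Station C"]]

# Write the nested list as a Python literal, the format 'sta.txt' is assumed to use
path = os.path.join(tempfile.mkdtemp(), "sample_sta.txt")
with open(path, "w") as f:
    f.write(str(nested))

# Read it back; ast.literal_eval accepts only literals, unlike eval()
with open(path, "r") as f:
    loaded = ast.literal_eval(f.read())

print(loaded == nested)  # True
```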
To turn the nested list into a single list, a new list is built from 'sta'. The following code flattens the nested list 'sta' into a list called 'all_stations'.
# sta is a nested list [[],[],[]]
all_stations = []
for stations in sta:
    for station in stations:
        all_stations.append(station)
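The nested loop above can also be written as a list comprehension, or with itertools.chain.from_iterable from the standard library. The sketch below compares the three approaches on a hypothetical nested list in the same shape as 'sta'. All three flatten exactly one level of nesting, which matches the [[...], [...], ...] structure described at the start of the article.

```python
from itertools import chain

# Hypothetical nested list in the same [[...], [...], ...] shape as 'sta'
sta = [["Station A", "Station B"], ["Station C"], ["Station D", "Station E"]]

# 1. Explicit nested loop (as in the article)
loop_result = []
for stations in sta:
    for station in stations:
        loop_result.append(station)

# 2. Equivalent list comprehension: the outer loop comes first, the inner loop second
comp_result = [station for stations in sta for station in stations]

# 3. chain.from_iterable concatenates the inner lists one after another
chain_result = list(chain.from_iterable(sta))

print(comp_result)
# ['Station A', 'Station B', 'Station C', 'Station D', 'Station E']
```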
The same process is repeated to convert the nested list 'add' into a single list, 'all_address'. The code is given below:
# add is a nested list [[],[],[]]
all_address = []
for addresses in add:
    for address in addresses:
        all_address.append(address)
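Because the same flattening is done twice, it could be wrapped in a small helper. In the sketch below, flatten is a hypothetical name (not part of any library) and the sample data is made up. Since each station is expected to pair with the address at the same position, the sketch also checks that both flattened lists have the same length before they are used together.

```python
def flatten(nested):
    """Flatten one level of nesting: [[a, b], [c]] -> [a, b, c]."""
    return [item for sublist in nested for item in sublist]

# Hypothetical sample data in the same shape as 'sta' and 'add'
sta = [["Station A", "Station B"], ["Station C"]]
add = [["1 First St", "2 Second St"], ["3 Third St"]]

all_stations = flatten(sta)
all_address = flatten(add)

# Each station must line up with the address at the same index
if len(all_stations) != len(all_address):
    raise ValueError(
        f"{len(all_stations)} stations but {len(all_address)} addresses"
    )

for station, address in zip(all_stations, all_address):
    print(station, "->", address)
```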
Converting the lists to a DataFrame
To create a DataFrame, we first pass the newly created list 'all_stations' to pd.DataFrame and set the column name to 'Stations'. We then add a column, 'Address', that contains the station addresses. Both lines of code are given below.
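import pandas as pd

# Build a DataFrame from the flattened station list, then add the addresses as a second column
df = pd.DataFrame(all_stations, columns=['Stations'])
df['Address'] = all_address
df.head(10)  # preview the first 10 rows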
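Equivalently, the DataFrame can be built in a single call from a dictionary that maps column names to the flattened lists. The sketch below uses hypothetical values in place of 'all_stations' and 'all_address'. With either approach, pandas raises a ValueError if the two lists differ in length, so a station/address mismatch surfaces at this step.

```python
import pandas as pd

# Hypothetical flattened lists standing in for all_stations and all_address
all_stations = ["Station A", "Station B", "Station C"]
all_address = ["1 First St", "2 Second St", "3 Third St"]

# One call: the dictionary keys become the column names, in insertion order
df = pd.DataFrame({"Stations": all_stations, "Address": all_address})

print(df.head(10))
print(df.shape)  # (3, 2)
```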
The next article finds the latitude and longitude coordinates of the addresses that were extracted from the web page and stored in the DataFrame. You can read the next article here: Part 3: Finding latitude and longitude of addresses using GoogleMaps API
GitHub
Main folder: https://github.com/jabirjamal/jabirjamal.com/tree/main/FE/FE_05
Sub-folder related to this article:
https://github.com/jabirjamal/jabirjamal.com/tree/main/FE/FE_05/M_02