So, the first step is to identify the sheets you want to read and put their names in a list. For my project, I needed to collect a number of terms across multiple sheets and then use those terms to create additional columns in a DataFrame. Finally, I had to populate the data based on the content of multiple text files produced by an OCR process that extracted text from PDF documents.
In my case, these are the names of the sheets.
sheetnames = ['Art', 'Plane', 'Yacht', 'Hedge Funds']
Then, I identified the column names in the Excel sheets.
cols = ['Full Name', 'Manufacturer/Model', 'Manufacturer/Model', 'Funds']
Next, I created an empty list to hold the terms.
terms = []
Next, I iterated over the columns and loaded the sheets. I used enumerate to keep track of which sheet goes with which column. Then, I iterated over each DataFrame I read in to build the full list of terms. I ended up with a list of around 500 terms.
import pandas as pd

filename = 'd:/freelancing/projects/ej/UCC Web Scraping List.xlsx'

for i, col in enumerate(cols):
    # Read the sheet that matches this column, then collect every value in the column
    df_terms = pd.read_excel(filename, sheet_name=sheetnames[i])
    for index, row in df_terms.iterrows():
        terms.append(row[col])
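As I mentioned at the start, the next step was to turn those terms into additional DataFrame columns and populate them from the OCR'd text files. That part depends heavily on the project, but here is a minimal sketch of the idea, assuming the OCR output is a folder of plain .txt files; the folder path and the simple substring match are illustrative placeholders rather than the exact code from my project.

import glob
import pandas as pd

# Hypothetical location of the OCR output -- adjust to wherever your .txt files live
ocr_files = glob.glob('d:/freelancing/projects/ej/ocr_output/*.txt')

rows = []
for path in ocr_files:
    with open(path, encoding='utf-8') as f:
        text = f.read().lower()
    # One column per term, True if that term appears anywhere in this document
    rows.append({term: term.lower() in text for term in terms if isinstance(term, str)})

df_matches = pd.DataFrame(rows, index=ocr_files)

From there, df_matches can be filtered, aggregated, or merged with whatever metadata identifies each document.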
And there you have it. Thank you for reading.