Introduction:
This article describes how to download email attachments from an Outlook mailbox using Python. We will use the Microsoft Authentication Library (MSAL) for Python to authenticate, along with requests to call the MS Graph API and tqdm to show the download progress.
AzureAD/microsoft-authentication-library-for-python
Note: If you are using ADAL for authentication, Microsoft recommends migrating to MSAL.
Requirements:
To extract attachments from an email we need the following.
- Mailbox credentials (username and password)
- Sender email (Filter messages from a specific sender if required)
- Message id (Unique message id for an email)
- Attachment id for a Message id (Attachment id for an email that has an attachment).
The above data is required to authenticate, initialise the ClientApplication object, and construct the MS Graph API requests that download the attachments.
Install MSAL:
$ pip install msal
The Microsoft Authentication Library for Python allows you to sign in users or apps with Microsoft identities and obtain auth tokens to be used with Microsoft Graph APIs. It is built on the OAuth2 and OpenID Connect protocols.
Initialising the Client Application:
MSAL defines three application classes and keeps their initialisation clearly separate.
- ClientApplication
- PublicClientApplication
- ConfidentialClientApplication
To learn more about the OAuth client types please click here. In this article, we will be using ClientApplication to initialise the app object and reuse it throughout our application.
from msal import ClientApplication


class AttachmentDownloader:
    def __init__(self, username: str, password: str):
        self.client_id = '<your client id>'
        self.authority = 'https://login.microsoftonline.com/<tenant-name>'
        # Initialise MS ClientApplication object with your client_id and authority URL
        self.app = ClientApplication(client_id=self.client_id,
                                     authority=self.authority)
        self.username = username  # your mailbox username
        self.password = password  # your mailbox password


if __name__ == "__main__":
    downloader = AttachmentDownloader("username@outlook.com", "password")
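Hard-coding the client id and tenant is fine for a demo. As a minimal sketch, assuming you would rather read them from the environment, the settings could be loaded like this. The variable names MS_CLIENT_ID and MS_TENANT and the helper load_settings are hypothetical choices for this example, not something MSAL defines.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSettings:
    client_id: str
    authority: str


def load_settings(env=None) -> AppSettings:
    """Build the ClientApplication settings from environment variables.

    MS_CLIENT_ID and MS_TENANT are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    missing = [key for key in ("MS_CLIENT_ID", "MS_TENANT") if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return AppSettings(
        client_id=env["MS_CLIENT_ID"],
        authority=f"https://login.microsoftonline.com/{env['MS_TENANT']}",
    )
```

The resulting client_id and authority can then be passed to ClientApplication in place of the literals used in the class.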
Acquire token:
Now that we have our app object initialised, we can acquire the token. The access_token field of this token is what we will pass in the request headers.
token = self.app.acquire_token_by_username_password(
    username=self.username,
    password=self.password,
    scopes=['.default'])
print(token)
A note on the messages request used later in this article: by default it returns only the top 10 messages in the signed-in user’s mailbox. If you wish to increase the number of results returned, you could set the page size using the $top query parameter.
https://graph.microsoft.com/v1.0/me/messages?$top=20
This sets the page size to 20.
Output:
The token output looks like this.
{
"token_type":"Bearer",
"scope":"email openid profile 00000003-0000-0000-c000-000000000000/EWS.AccessAsUser.All 00000003-0000-0000-c000-000000000000/IMAP.AccessAsUser.All 00000003-0000-0000-c000-000000000000/Mail.Read-0000-c000-000000000000/Mail.Read.Shared 00000003-0000-0000-c000-000000000000/Mail.ReadWrite.Shared 00000003-0000-0000-c000-000000000000/Mail.Send 00000003-0000-0000-c000-000000000000/Mail.Send.Shared 00000003-0000-0000-c000-000000000000/POP.AccessAsUser.All 00000003-0000-0000-c0/User.Read 00000003-0000-0000-c000-000000000000/.default",
"expires_in":4914,
"ext_expires_in":4914,
"access_token":"eyJ0eXAiOiJKV1QiLCJub25jZSI6InJKaWVzUE9ERGNXTjItZlIwQTRTWVFoV2t6aVEyelFENmlMS2N1M2xycFUiLCJhbGciOiJSUzI1NiIsIng1dCI6ImpTMVhvMU9XRGpfNTJ2YndHTmd2UU8yVnpNYyIsImtpZCI6ImpTMVhvMU9XRGpfNTJ2YndHTmd2UU8yVnpNYyJ9-YQwnl0SIOht0EVcKtJuAOrMUP4xsR0uNBInGcpob9r9Pt_ZX6z_Jw412TIxdBw",
"refresh_token":"0.ARMAyjiRs.AgABAAAAAAD--DLA3VO7QrddgJg7WevrAgDs_wQA9P_9CM2vmlsFp62-YzCVROSVA-HK0F0KUqGrlLA-t-s8KOlN-elmtVBhSaVj1KvuqtxSH-lVvchKt4ZSy1aFGodMGo6M5A2a0k7E7xJgTlqeRSrS7Cq-UTekMTIzIUly7F6euyyJi1XeMLhB7Uhr-Dk_Y3pYVNn6Wy_pZOcracO-7WqlrbUQGg0bSbv-",
"id_token":"eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6ImpTMVhvMU9XRGpfNTJ2YndHTmd2UU8yVnpNYyJ9.eyJhdWQiOiIwMWZlMmVjOC01MzYzLTQ2YmUtYjEyNC01MWUwZTAxOWMwMGIiLCJpc3MiOiJodHRwczovL2xvZ2luLm1pY3Jvc29mdG9ubGluZS5jb20vYjM5MTM4Y2EtM2NlZS00YjRhLWE0ZDYtY2Q4M2Q5ZGQ",
"client_info":"eyJ1aWQiOiI4NzMwYjc5Ni1mNDRkk",
"id_token_claims":{
"aud":"01fe2ec8-5363-46be-b124-e019c00b",
"iss":"https://login.microsoftonline.com/b39138ca-3cee-/v2.0",
"iat":1648185788,
"nbf":1648185788,
"exp":1648189688,
"name":"USER",
"oid":"8730b796-f44d-4f3d-8b01-9e201055d039",
"preferred_username":"user@outlook.com",
"rh":"0.ARMAyjiRs-.",
"sub":"Z5RogClxDJWqQ",
"tid":"b39138ca-3cee-4b4a-a4d6-cd83d9dd62f0",
"uti":"AOkSTATnukSA",
"ver":"2.0"
}
}
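MSAL returns a plain dictionary from the acquire-token call. On failure, that dictionary typically carries error and error_description keys instead of access_token, so it is worth checking before building the request headers. A minimal sketch, where build_auth_headers is a hypothetical helper name:

```python
def build_auth_headers(token: dict) -> dict:
    """Turn an MSAL token result into HTTP headers for MS Graph requests."""
    access_token = token.get("access_token")
    if not access_token:
        # Failed results usually describe the problem under these keys.
        reason = token.get("error_description") or token.get("error") or "unknown error"
        raise RuntimeError(f"Could not acquire token: {reason}")
    # token_type is "Bearer" in the token output shown in this article.
    return {"Authorization": f"{token.get('token_type', 'Bearer')} {access_token}"}
```

Calling headers = build_auth_headers(token) gives the headers dictionary that the requests in the following sections expect.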
Extract emails for a username:
We will use the MS Graph API to extract all the messages. The API documentation is available here.
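import requests
from requests.exceptions import RequestException

# Pass the access token acquired in the previous step as a Bearer token
headers = {'Authorization': f"Bearer {token['access_token']}"}
try:
    response = requests.get(url='https://graph.microsoft.com/v1.0/me/messages',
                            headers=headers)
    print(response.json())
except RequestException as exc:
    # Report the failure instead of silently ignoring it
    print(f"Request exception: {exc}")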
Output:
{
"@odata.context":"https://graph.microsoft.com/v1.0/$me#users('8730b796')/messages",
"value":[
{
"@odata.etag":"W/\"CQAAABYAAAC8zwaNAGKyT72PVfyJ7\"",
"id":"AAMkAGMxNjhkZjJlLWYwOTYtNDQ1ZS1hM2U1LTk2YTRhNWI0NjExOABGAAA",
"createdDateTime":"2022-03-24T15:31:02Z",
"lastModifiedDateTime":"2022-03-24T15:32:19Z",
"changeKey":"CQAAABYAAAC8zwaNAGKyT72PVfyUAQXwAAAqUjJ7",
"categories":[
],
"receivedDateTime":"2022-03-24T15:31:03Z",
"sentDateTime":"2022-03-24T15:30:39Z",
"hasAttachments":true,
"internetMessageId":"<132.JavaMail.spc@na1-napp11>",
"subject":"report",
"bodyPreview":"Hi",
"importance":"normal",
"parentFolderId":"AAMkAGMxNjhkZjJlLWYwOTYtNDQ1ZS1hM2UAA=",
"conversationId":"AAQkAGMxNjhkZjJlLWYwO8=",
"conversationIndex":"AQHYP5QrD/Cyu7b0BUOW6LeyvZLCfw==",
"isDeliveryReceiptRequested":"None",
"isReadReceiptRequested":false,
"isRead":false,
"isDraft":false,
"webLink":"https://outlook.office365.com/owa/?ItemID=AAMkAGMxNjhkZjJlLWYwOTYtNexvsurl=1&viewmodel=ReadMessageItem",
"inferenceClassification":"focused",
"body":{
"contentType":"html",
"content":"<html><head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"></head><body><p></p><p>Thanks & Regards,</p><p</p><br><hr></body></html>"
},
"sender":{
"emailAddress":{
"name":"sender@outlook.com",
"address":"sender@outlook.com"
}
},
"from":{
"emailAddress":{
"name":"sender@outlook.com",
"address":"sender@outlook.com"
}
},
"toRecipients":[
{
"emailAddress":{
"name":"USER",
"address":"user@outlook.com"
}
}
],
"replyTo":[
{
"emailAddress":{
"name":"",
"address":""
}
}
],
"flag":{
"flagStatus":"notFlagged"
}
}
]
}
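The response above is a single page of results. When more messages are available, MS Graph includes an @odata.nextLink URL in the response, and requesting that URL returns the next page. A minimal sketch of following those links, assuming headers holds the Authorization header built from the token. The names iter_messages and fetch are hypothetical; fetch is injectable so the loop can be exercised without network access.

```python
def iter_messages(headers, url="https://graph.microsoft.com/v1.0/me/messages?$top=20",
                  fetch=None):
    """Yield every message, following @odata.nextLink until it is absent."""
    if fetch is None:
        def fetch(page_url):
            # Imported here so the paging logic can be tested without requests.
            import requests
            response = requests.get(page_url, headers=headers)
            response.raise_for_status()
            return response.json()

    while url:
        page = fetch(url)
        yield from page.get("value", [])
        # The key is missing on the last page, which ends the loop.
        url = page.get("@odata.nextLink")
```

Something like all_msgs = {"value": list(iter_messages(headers))} would then give the same shape as the single-page response, covering the whole mailbox.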
Now that we have the messages, we need to extract the id of the messages that have the flag hasAttachments set to true. Something like below:
mails_with_attachments = list()
# all_msgs is the response of all messages API
# sender is the email address we want to filter on
for msg in all_msgs.get('value'):
    if msg.get('sender').get('emailAddress').get('address') == sender and msg.get('hasAttachments'):
        mails_with_attachments.append(msg.get("id"))
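Alternatively, you could use the query options in the API. The $select option limits which properties are returned for each message, like below.
GET https://graph.microsoft.com/v1.0/me/messages?$select=sender,subject
Here, only sender and subject are returned. Note that $select does not filter messages; to return only the messages from a specific sender, use the $filter option instead.
GET https://graph.microsoft.com/v1.0/me/messages?$filter=from/emailAddress/address eq 'sender@outlook.com'
You could also combine multiple conditions in $filter with and.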
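A sketch of building such a query string in Python, assuming the $filter query option is what restricts results to one sender ($select only chooses which properties come back). The helper name build_messages_url and its parameters are hypothetical, and MS Graph may reject some filter combinations, so treat the combined filter as something to verify against your mailbox.

```python
from urllib.parse import quote, urlencode

MESSAGES_URL = "https://graph.microsoft.com/v1.0/me/messages"


def build_messages_url(sender=None, only_with_attachments=False, top=None):
    """Build a list-messages URL with optional $filter and $top options."""
    clauses = []
    if sender:
        # Single quotes inside an OData string literal are escaped by doubling.
        escaped = sender.replace("'", "''")
        clauses.append(f"from/emailAddress/address eq '{escaped}'")
    if only_with_attachments:
        clauses.append("hasAttachments eq true")

    params = {}
    if clauses:
        params["$filter"] = " and ".join(clauses)
    if top:
        params["$top"] = str(top)
    if not params:
        return MESSAGES_URL
    # Keep $, / and ' readable; spaces and @ are percent-encoded.
    return MESSAGES_URL + "?" + urlencode(params, quote_via=quote, safe="$/'")
```

Filtering on the server this way means fewer messages have to be fetched and checked on the client.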
Now, let’s assume we have all the ids of emails with attachments. We need to get all the attachment ids for an email.
Extract attachment id for a message:
We are assuming that this email has only one attachment. The API documentation is available here.
# msg_id is one of the ids collected in mails_with_attachments
response = requests.get(url=f"https://graph.microsoft.com/v1.0/me/messages/{msg_id}/attachments",
                        headers=headers)
jsonified_response = response.json()
attachment_id = jsonified_response.get('value')[0].get('id')
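The snippet above takes only the first attachment. If a message can carry several, a small sketch like the following collects all of them from the same response. The helper name list_attachments is hypothetical; it relies on the id and name properties that attachment objects in the response carry.

```python
def list_attachments(attachments_response: dict) -> list:
    """Return (id, name) pairs for every attachment in a list-attachments response."""
    return [(att.get("id"), att.get("name"))
            for att in attachments_response.get("value", [])]
```

Each pair can then be fed to the download step, one attachment at a time.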
Download attachment:
Now that we have both the attachment and message ids, we can download the attachment. I have also added a progress bar to show the download progress. The API documentation for download is available here.
import requests
# requests raises its own ConnectionError, not the built-in one
from requests.exceptions import ConnectionError, RequestException
from tqdm import tqdm


def download_attachment(msg_id, att_id, name):
    try:
        response = requests.get(
            url=f"https://graph.microsoft.com/v1.0/me/messages/{msg_id}/attachments/{att_id}/$value",
            headers=headers,
            stream=True)
        # Take the size from the Content-Length header; len(response.content)
        # would read the whole body into memory and defeat stream=True
        file_size = int(response.headers.get('Content-Length', 0)) or None
        with open(f"{name}.zip", 'wb') as f, tqdm(unit='iB', unit_scale=True,
                                                  unit_divisor=1024, total=file_size,
                                                  desc=f"Downloading {name}.zip") as pbar:
            for data in response.iter_content(chunk_size=1024):
                pbar.update(len(data))
                f.write(data)
        return response
    except ConnectionError as ce:
        print(f" Connection error: {ce}")
    except RequestException as exc:
        print(f" Request exception: {exc}")
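The function above always writes to a .zip file named after its name argument. If you would rather reuse the attachment's own name from the attachment metadata, it is safer to sanitise it first, since it comes from the email and not from you. A minimal sketch, where safe_filename is a hypothetical helper and the allowed character set is a deliberately conservative assumption:

```python
import re
from pathlib import PurePosixPath, PureWindowsPath


def safe_filename(name: str, default: str = "attachment") -> str:
    """Reduce an attachment name to a bare file name that is safe to write locally."""
    # Drop any directory components, whichever separator style was used.
    base = PureWindowsPath(PurePosixPath(name).name).name
    # Replace anything outside a conservative character set, then trim
    # leading and trailing dots or underscores.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).strip("._")
    return cleaned or default
```

The result could replace the hard-coded f"{name}.zip" path when opening the output file.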
Summary:
- Create a ClientApplication object and use it throughout the lifecycle of our application.
- Use the app object to acquire a token and fetch the messages from a specific sender.
- Filter the messages with attachments and add all the attachment ids and their corresponding message ids to a list/dict.
- Use the message id and attachment id to download the attachment.
And there you have it. Thank you for reading.