How to Download Attachments From Outlook Using Python and MSAL

Automate your Outlook attachment downloads using Python

Introduction:

This article describes how to download email attachments from an Outlook mailbox using Python. We will use the following library (MSAL for Python) for authentication, together with requests for the HTTP calls to the MS Graph API and tqdm for the progress bar.

AzureAD/microsoft-authentication-library-for-python

Note: If you are using ADAL for authentication, Microsoft recommends migrating to MSAL.

Requirements:

To extract attachments from an email we need the following.

  • Mailbox credentials (username and password)
  • Sender email (Filter messages from a specific sender if required)
  • Message id (Unique message id for an email)
  • Attachment id for a Message id (Attachment id for an email that has an attachment).

The above data is required to authenticate, initialise the ClientApplication object, and construct the MS Graph API requests that download the attachments.

Install MSAL:

$ pip install msal

The Microsoft Authentication Library (MSAL) for Python lets you sign in users or apps with Microsoft identities and obtain auth tokens for use with the Microsoft Graph APIs. It is built on the OAuth2 and OpenID Connect protocols.

Initialising the Client Application:

MSAL exposes three application classes and keeps their initialisation clearly separate. ClientApplication is the base class of the other two.

  • ClientApplication
  • PublicClientApplication
  • ConfidentialClientApplication

To learn more about the OAuth client types, see the MSAL for Python documentation. In this article, we will use ClientApplication to initialise the app object and reuse it throughout our application.

from msal import ClientApplication


class AttachmentDownloader:
    def __init__(self, username: str, password: str):
        self.client_id = '<your client id>'
        self.authority = 'https://login.microsoftonline.com/<tenant-name>'
        # Initialise MS ClientApplication object with your client_id and authority URL
        self.app = ClientApplication(client_id=self.client_id,
                                     authority=self.authority)
        self.username = username  # your mailbox username
        self.password = password  # your mailbox password


if __name__ == "__main__":
    downloader = AttachmentDownloader("username@outlook.com", "password")
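
The example above passes the mailbox credentials as string literals. As a minimal sketch, assuming the credentials are exported as environment variables, they could be read at start-up instead. The variable names OUTLOOK_USERNAME and OUTLOOK_PASSWORD and the helper load_credentials are hypothetical and not part of MSAL.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (username, password) read from the environment.

    OUTLOOK_USERNAME and OUTLOOK_PASSWORD are hypothetical variable names
    chosen for this sketch.
    """
    required = ("OUTLOOK_USERNAME", "OUTLOOK_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["OUTLOOK_USERNAME"], env["OUTLOOK_PASSWORD"]
```

The returned pair can then be passed straight to AttachmentDownloader(username, password).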

Acquire token:

Now that we have our app object initialised, we can acquire the token. The access_token field of the result is what goes into the Authorization header of our API requests.

token = self.app.acquire_token_by_username_password(username=self.username,
                                                    password=self.password,
                                                    scopes=['.default'])
print(token)

Note: the messages request used in the next section (https://graph.microsoft.com/v1.0/me/messages) returns only the first 10 messages in the signed-in user’s mailbox by default. To return more results per page, set the page size with the $top query parameter.

https://graph.microsoft.com/v1.0/me/messages?$top=20

This sets the page size to 20.

Output:

The token output looks like this.

{
    "token_type":"Bearer",
    "scope":"email openid profile 00000003-0000-0000-c000-000000000000/EWS.AccessAsUser.All 00000003-0000-0000-c000-000000000000/IMAP.AccessAsUser.All 00000003-0000-0000-c000-000000000000/Mail.Read-0000-c000-000000000000/Mail.Read.Shared 00000003-0000-0000-c000-000000000000/Mail.ReadWrite.Shared 00000003-0000-0000-c000-000000000000/Mail.Send 00000003-0000-0000-c000-000000000000/Mail.Send.Shared 00000003-0000-0000-c000-000000000000/POP.AccessAsUser.All 00000003-0000-0000-c0/User.Read 00000003-0000-0000-c000-000000000000/.default",
    "expires_in":4914,
    "ext_expires_in":4914,
    "access_token":"eyJ0eXAiOiJKV1QiLCJub25jZSI6InJKaWVzUE9ERGNXTjItZlIwQTRTWVFoV2t6aVEyelFENmlMS2N1M2xycFUiLCJhbGciOiJSUzI1NiIsIng1dCI6ImpTMVhvMU9XRGpfNTJ2YndHTmd2UU8yVnpNYyIsImtpZCI6ImpTMVhvMU9XRGpfNTJ2YndHTmd2UU8yVnpNYyJ9-YQwnl0SIOht0EVcKtJuAOrMUP4xsR0uNBInGcpob9r9Pt_ZX6z_Jw412TIxdBw",
    "refresh_token":"0.ARMAyjiRs.AgABAAAAAAD--DLA3VO7QrddgJg7WevrAgDs_wQA9P_9CM2vmlsFp62-YzCVROSVA-HK0F0KUqGrlLA-t-s8KOlN-elmtVBhSaVj1KvuqtxSH-lVvchKt4ZSy1aFGodMGo6M5A2a0k7E7xJgTlqeRSrS7Cq-UTekMTIzIUly7F6euyyJi1XeMLhB7Uhr-Dk_Y3pYVNn6Wy_pZOcracO-7WqlrbUQGg0bSbv-",
    "id_token":"eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6ImpTMVhvMU9XRGpfNTJ2YndHTmd2UU8yVnpNYyJ9.eyJhdWQiOiIwMWZlMmVjOC01MzYzLTQ2YmUtYjEyNC01MWUwZTAxOWMwMGIiLCJpc3MiOiJodHRwczovL2xvZ2luLm1pY3Jvc29mdG9ubGluZS5jb20vYjM5MTM4Y2EtM2NlZS00YjRhLWE0ZDYtY2Q4M2Q5ZGQ",
    "client_info":"eyJ1aWQiOiI4NzMwYjc5Ni1mNDRkk",
    "id_token_claims":{
        "aud":"01fe2ec8-5363-46be-b124-e019c00b",
        "iss":"https://login.microsoftonline.com/b39138ca-3cee-/v2.0",
        "iat":1648185788,
        "nbf":1648185788,
        "exp":1648189688,
        "name":"USER",
        "oid":"8730b796-f44d-4f3d-8b01-9e201055d039",
        "preferred_username":"user@outlook.com",
        "rh":"0.ARMAyjiRs-.",
        "sub":"Z5RogClxDJWqQ",
        "tid":"b39138ca-3cee-4b4a-a4d6-cd83d9dd62f0",
        "uti":"AOkSTATnukSA",
        "ver":"2.0"
    }
}
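
The requests in the following sections send the access_token from this result in an Authorization header. A minimal sketch of building that header follows. It relies on MSAL returning a dict that carries error and error_description keys instead of access_token when acquisition fails; the helper name build_headers is hypothetical.

```python
def build_headers(token: dict) -> dict:
    """Turn an MSAL token result into request headers for MS Graph."""
    if "access_token" not in token:
        # A failed result describes the problem instead of carrying a token
        reason = token.get("error_description") or token.get("error") or "unknown error"
        raise RuntimeError(f"Could not acquire token: {reason}")
    token_type = token.get("token_type", "Bearer")
    return {"Authorization": f"{token_type} {token['access_token']}"}


# Placeholder values; in the article, `token` is the dict printed above
headers = build_headers({"token_type": "Bearer", "access_token": "<access token>"})
print(headers)
```

Failing early here gives a readable message for a wrong password or a missing consent, rather than a 401 response from the first Graph call.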

Extract emails for a username:

We will use the MS Graph API to extract all the messages. The API documentation is available here.

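import requests
from requests.exceptions import RequestException

# token is the result acquired in the previous step
headers = {'Authorization': f"Bearer {token['access_token']}"}

try:
    response = requests.get(url='https://graph.microsoft.com/v1.0/me/messages',
                            headers=headers)
    # Keep the parsed response; it is filtered in the next step
    all_msgs = response.json()
    print(all_msgs)
except RequestException as re:
    print(f"Request exception: {re}")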

Output:

{
    "@odata.context":"https://graph.microsoft.com/v1.0/$me#users('8730b796')/messages",
    "value":[
        {
            "@odata.etag":"W/\"CQAAABYAAAC8zwaNAGKyT72PVfyJ7\"",
            "id":"AAMkAGMxNjhkZjJlLWYwOTYtNDQ1ZS1hM2U1LTk2YTRhNWI0NjExOABGAAA",
            "createdDateTime":"2022-03-24T15:31:02Z",
            "lastModifiedDateTime":"2022-03-24T15:32:19Z",
            "changeKey":"CQAAABYAAAC8zwaNAGKyT72PVfyUAQXwAAAqUjJ7",
            "categories":[],
            "receivedDateTime":"2022-03-24T15:31:03Z",
            "sentDateTime":"2022-03-24T15:30:39Z",
            "hasAttachments":true,
            "internetMessageId":"<132.JavaMail.spc@na1-napp11>",
            "subject":"report",
            "bodyPreview":"Hi",
            "importance":"normal",
            "parentFolderId":"AAMkAGMxNjhkZjJlLWYwOTYtNDQ1ZS1hM2UAA=",
            "conversationId":"AAQkAGMxNjhkZjJlLWYwO8=",
            "conversationIndex":"AQHYP5QrD/Cyu7b0BUOW6LeyvZLCfw==",
            "isDeliveryReceiptRequested":"None",
            "isReadReceiptRequested":false,
            "isRead":false,
            "isDraft":false,
            "webLink":"https://outlook.office365.com/owa/?ItemID=AAMkAGMxNjhkZjJlLWYwOTYtNexvsurl=1&viewmodel=ReadMessageItem",
            "inferenceClassification":"focused",
            "body":{
                "contentType":"html",
                "content":"<html><head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"></head><body><p></p><p>Thanks &amp; Regards,</p><p</p><br><hr></body></html>"
            },
            "sender":{
                "emailAddress":{
                    "name":"sender@outlook.com",
                    "address":"sender@outlook.com"
                }
            },
            "from":{
                "emailAddress":{
                    "name":"sender@outlook.com",
                    "address":"sender@outlook.com"
                }
            },
            "toRecipients":[
                {
                    "emailAddress":{
                        "name":"USER",
                        "address":"user@outlook.com"
                    }
                }
            ],
            "replyTo":[
                {
                    "emailAddress":{
                        "name":"",
                        "address":""
                    }
                }
            ],
            "flag":{
                "flagStatus":"notFlagged"
            }
        }
    ]
}
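
The response above is a single page of results. When more messages are available, Graph adds an @odata.nextLink URL to the response, and requesting that URL returns the next page. A minimal sketch of following those links is shown below. The names iter_messages and get_json are hypothetical; get_json is assumed to perform the authenticated GET (for example with requests and the headers used above) and return the decoded JSON.

```python
from typing import Callable, Iterator


def iter_messages(get_json: Callable[[str], dict],
                  url: str = "https://graph.microsoft.com/v1.0/me/messages?$top=20") -> Iterator[dict]:
    """Yield every message, following @odata.nextLink until it is absent."""
    while url:
        page = get_json(url)
        yield from page.get("value", [])
        # The last page has no next link, which ends the loop
        url = page.get("@odata.nextLink")
```

With requests, get_json could be as small as lambda u: requests.get(u, headers=headers).json(). Passing the fetch function in keeps the paging logic independent of how the request is made.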

Now that we have the messages, we need to extract the id of the messages that have the flag hasAttachments set to True. Something like below:

mails_with_attachments = []
# all_msgs is the JSON response of the all messages API
# sender is the address to match, e.g. "sender@outlook.com"
for msg in all_msgs.get('value'):
    if msg.get('sender').get('emailAddress').get('address') == sender and msg.get('hasAttachments'):
        mails_with_attachments.append(msg.get("id"))

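Alternatively, you could let the API do the filtering with the $filter query parameter, for example to return only the messages from a specific sender.

GET https://graph.microsoft.com/v1.0/me/messages?$filter=from/emailAddress/address eq 'sender@outlook.com'

Here, from/emailAddress/address is the property being filtered on. Note that $select (for example $select=sender,subject) only limits which properties are returned; it does not filter messages. You could also combine multiple conditions with and.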
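
As a minimal sketch, the query string for such a request could be assembled with the standard library so that spaces are percent-encoded. The helper name build_messages_url is hypothetical. The sketch assumes the sender is matched on from/emailAddress/address and that the page size is set with $top.

```python
from urllib.parse import quote, urlencode

MESSAGES_URL = "https://graph.microsoft.com/v1.0/me/messages"


def build_messages_url(sender: str, top: int = 10) -> str:
    """Build a list-messages URL restricted to mails from one sender."""
    # OData string literals escape a single quote by doubling it
    literal = sender.replace("'", "''")
    query = {
        "$filter": f"from/emailAddress/address eq '{literal}'",
        "$top": str(top),
    }
    # Leave "$", "/", "'" and "@" unescaped; spaces become %20
    query_string = urlencode(query, quote_via=quote, safe="$/'@")
    return f"{MESSAGES_URL}?{query_string}"


print(build_messages_url("sender@outlook.com", top=20))
```

The resulting URL can be passed to requests.get together with the same headers as before.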

Now, let’s assume we have all the ids of emails with attachments. We need to get all the attachment ids for an email.

Extract attachment id for a message:

We are assuming that this email has only one attachment. The API documentation is available here.

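# msg_id is one of the ids collected in mails_with_attachments
msg_id = mails_with_attachments[0]
response = requests.get(url=f"https://graph.microsoft.com/v1.0/me/messages/{msg_id}/attachments", headers=headers)
jsonified_response = response.json()
# Take the first (and, by assumption, only) attachment
attachment_id = jsonified_response.get('value')[0].get('id')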
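
The snippet above takes only the first entry of value. If a message carries several attachments, a sketch like the following could collect all of them. It assumes each entry exposes the id, name and @odata.type properties of the Graph attachment resource, and it keeps only file attachments. The helper name file_attachments is hypothetical.

```python
from typing import List, Tuple


def file_attachments(attachments_json: dict) -> List[Tuple[str, str]]:
    """Return (id, name) pairs for every file attachment in the response."""
    pairs = []
    for att in attachments_json.get("value", []):
        # Skip item attachments (attached Outlook items) and reference
        # attachments (links to files stored elsewhere)
        if att.get("@odata.type") == "#microsoft.graph.fileAttachment":
            pairs.append((att["id"], att.get("name", att["id"])))
    return pairs
```

Called as file_attachments(jsonified_response), it returns an empty list when the message has no file attachments, so the caller does not need to guard against an index error.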

Download attachment:

Now that we have both the attachment and message ids, we can download the attachment. I have also added a progress bar to show the download progress. The API documentation for download is available here.

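import requests
from requests.exceptions import ConnectionError, RequestException
from tqdm import tqdm

def download_attachment(msg_id, att_id, name):
    try:
        response = requests.get(
            url=f"https://graph.microsoft.com/v1.0/me/messages/{msg_id}/attachments/{att_id}/$value",
            headers=headers,
            stream=True)
        # Raise on 4xx/5xx so an error body is not saved as the attachment
        response.raise_for_status()
        # Use Content-Length (when the server sends it) instead of
        # response.content, which would load the whole file into memory
        file_size = int(response.headers.get('Content-Length', 0)) or None
        with open(f"{name}.zip", 'wb') as f, tqdm(unit='iB', unit_scale=True,
                                                   unit_divisor=1024, total=file_size,
                                                   desc=f"Downloading {name}.zip") as pbar:
            for data in response.iter_content(chunk_size=1024):
                pbar.update(len(data))
                f.write(data)
        return response
    except ConnectionError as ce:
        print(f"Connection error: {ce}")
    except RequestException as re:
        print(f"Request exception: {re}")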
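
download_attachment writes to a path built from name. If that value comes from the attachment's own name property, it is worth reducing it to a bare file name first. The following is a minimal sketch of one way to do that; safe_filename is a hypothetical helper, and the allow-list of characters is an arbitrary conservative choice.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath


def safe_filename(name: str, default: str = "attachment") -> str:
    """Reduce an attachment name to a bare file name without directory parts."""
    # Drop directory components, whichever separator style was used
    base = PureWindowsPath(PurePosixPath(name).name).name
    # Replace anything outside a conservative allow-list, then trim
    # leading and trailing dots and underscores
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).strip("._")
    return cleaned or default
```

A call such as download_attachment(msg_id, att_id, safe_filename(att_name)) would then never write outside the current directory.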

Summary:

  • Create a ClientApplication object and use it throughout the lifecycle of our application.
  • Use the app object to acquire a token, and use that token to list the messages from a specific sender.
  • Filter the messages with attachments and add all the attachment ids and their corresponding message ids to a list/dict.
  • Use the message id and attachment id to download the attachment.
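
The steps above can be sketched as one function that maps each message id to its attachment ids. This is a sketch under stated assumptions, not a definitive implementation: collect_attachment_ids and get_json are hypothetical names, get_json is assumed to perform an authenticated GET and return the decoded JSON, and paging is ignored for brevity.

```python
from typing import Callable, Dict, List

GRAPH = "https://graph.microsoft.com/v1.0/me/messages"


def collect_attachment_ids(get_json: Callable[[str], dict], sender: str) -> Dict[str, List[str]]:
    """Map message id -> attachment ids for mails from `sender` with attachments."""
    result: Dict[str, List[str]] = {}
    for msg in get_json(GRAPH).get("value", []):
        address = msg.get("sender", {}).get("emailAddress", {}).get("address")
        # Keep only mails from the requested sender that carry attachments
        if address != sender or not msg.get("hasAttachments"):
            continue
        attachments = get_json(f"{GRAPH}/{msg['id']}/attachments")
        result[msg["id"]] = [att["id"] for att in attachments.get("value", [])]
    return result
```

Each message id and attachment id pair in the returned dict can then be passed to download_attachment.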

And there you have it. Thank you for reading.
