
How to Download Attachments From Outlook Using Python and MSAL

Automate your Outlook attachment downloads using Python


Introduction:

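This article describes how to download email attachments from an Outlook mailbox using Python. We will use the following libraries to accomplish this task:

  • MSAL for Python (AzureAD/microsoft-authentication-library-for-python) for authentication
  • requests for calling the Microsoft Graph API
  • tqdm for the download progress bar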

Note: If you are using ADAL for authentication, Microsoft recommends migrating to MSAL.

Requirements:

To extract attachments from an email we need the following:

  • An Azure AD app registration (client id and tenant name)
  • Mailbox credentials (username and password)
  • Sender email (to filter messages from a specific sender, if required)
  • Message id (the unique id of an email)
  • Attachment id (the id of an attachment within that email)

The app registration and credentials are needed to authenticate and initialise the ClientApplication object. The message and attachment ids are retrieved along the way and used to build the Microsoft Graph API requests that download the attachments.

Install MSAL:

$ pip install msal requests tqdm

The Microsoft Authentication Library (MSAL) for Python lets you sign in users or apps with Microsoft identities and obtain tokens for calling the Microsoft Graph API. It is built on the OAuth 2.0 and OpenID Connect protocols.

Initialising the Client Application:

MSAL for Python provides three application classes:

  • ClientApplication (the base class of the other two)
  • PublicClientApplication
  • ConfidentialClientApplication

To learn more about the OAuth client types, see the MSAL documentation on public and confidential client applications. In this article, we use ClientApplication to initialise the app object once and reuse it throughout our application.

```python
from msal import ClientApplication


class AttachmentDownloader:

    def __init__(self, username: str, password: str):
        self.client_id = '<your client id>'
        self.authority = 'https://login.microsoftonline.com/<tenant-name>'

        # Initialise MS ClientApplication object with your client_id and authority URL
        self.app = ClientApplication(client_id=self.client_id,
                                     authority=self.authority)

        self.username = username  # your mailbox username
        self.password = password  # your mailbox password


if __name__ == "__main__":
    downloader = AttachmentDownloader("username@outlook.com", "password")
```

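Hardcoding the client id and credentials is fine for a quick test, but you may prefer to read them from the environment. The sketch below assumes hypothetical environment variable names (MSAL_CLIENT_ID, MSAL_TENANT, MAILBOX_USERNAME, MAILBOX_PASSWORD), which are not an MSAL convention; rename them to suit your setup. It only assembles the values AttachmentDownloader needs, building the authority URL in the same https://login.microsoftonline.com/<tenant-name> form used above.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class MailboxConfig:
    client_id: str
    authority: str
    username: str
    password: str = field(repr=False)  # keep the password out of printed output


def load_config(env: Optional[Mapping[str, str]] = None) -> MailboxConfig:
    """Build the settings from environment variables.

    The variable names are hypothetical placeholders chosen for this sketch.
    Raises KeyError if any of them is missing.
    """
    env = os.environ if env is None else env
    tenant = env["MSAL_TENANT"]
    return MailboxConfig(
        client_id=env["MSAL_CLIENT_ID"],
        # Same authority format as in the AttachmentDownloader example
        authority=f"https://login.microsoftonline.com/{tenant}",
        username=env["MAILBOX_USERNAME"],
        password=env["MAILBOX_PASSWORD"],
    )
```

You would then adapt AttachmentDownloader.__init__ to take config.client_id and config.authority instead of the hardcoded placeholders, and pass config.username and config.password as the credentials.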

Acquire token:

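Now that we have our app object initialised, we can acquire a token. The returned dictionary contains an access_token, which we send as a Bearer token in the Authorization header of every Microsoft Graph request. Note that this username/password flow only works for work or school accounts in an Azure AD tenant; it is not supported for personal Microsoft accounts or for accounts that require MFA.

```python
# This runs inside an AttachmentDownloader method, so self.app,
# self.username and self.password are already set
token = self.app.acquire_token_by_username_password(
    username=self.username,
    password=self.password,
    scopes=['https://graph.microsoft.com/.default'])  # Microsoft Graph default scope

print(token)

# Headers used by all the Microsoft Graph requests that follow
headers = {'Authorization': f"Bearer {token['access_token']}"}
```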

Note: once you have a token, the messages endpoint used later in this article (https://graph.microsoft.com/v1.0/me/messages) returns the top 10 messages in the signed-in user’s mailbox by default. If you wish to increase the number of results returned, set the page size using the $top query parameter:

https://graph.microsoft.com/v1.0/me/messages?$top=20

This sets the page size to 20.

Output:

Printing the token dictionary gives output like this:

```json
{
  "token_type": "Bearer",
  "scope": "email openid profile 00000003-0000-0000-c000-000000000000/EWS.AccessAsUser.All 00000003-0000-0000-c000-000000000000/IMAP.AccessAsUser.All 00000003-0000-0000-c000-000000000000/Mail.Read-0000-c000-000000000000/Mail.Read.Shared 00000003-0000-0000-c000-000000000000/Mail.ReadWrite.Shared 00000003-0000-0000-c000-000000000000/Mail.Send 00000003-0000-0000-c000-000000000000/Mail.Send.Shared 00000003-0000-0000-c000-000000000000/POP.AccessAsUser.All 00000003-0000-0000-c0/User.Read 00000003-0000-0000-c000-000000000000/.default",
  "expires_in": 4914,
  "ext_expires_in": 4914,
  "access_token": "eyJ0eXAiOiJKV1QiLCJub25jZSI6InJKaWVzUE9ERGNXTjItZlIwQTRTWVFoV2t6aVEyelFENmlMS2N1M2xycFUiLCJhbGciOiJSUzI1NiIsIng1dCI6ImpTMVhvMU9XRGpfNTJ2YndHTmd2UU8yVnpNYyIsImtpZCI6ImpTMVhvMU9XRGpfNTJ2YndHTmd2UU8yVnpNYyJ9-YQwnl0SIOht0EVcKtJuAOrMUP4xsR0uNBInGcpob9r9Pt_ZX6z_Jw412TIxdBw",
  "refresh_token": "0.ARMAyjiRs.AgABAAAAAAD--DLA3VO7QrddgJg7WevrAgDs_wQA9P_9CM2vmlsFp62-YzCVROSVA-HK0F0KUqGrlLA-t-s8KOlN-elmtVBhSaVj1KvuqtxSH-lVvchKt4ZSy1aFGodMGo6M5A2a0k7E7xJgTlqeRSrS7Cq-UTekMTIzIUly7F6euyyJi1XeMLhB7Uhr-Dk_Y3pYVNn6Wy_pZOcracO-7WqlrbUQGg0bSbv-",
  "id_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6ImpTMVhvMU9XRGpfNTJ2YndHTmd2UU8yVnpNYyJ9.eyJhdWQiOiIwMWZlMmVjOC01MzYzLTQ2YmUtYjEyNC01MWUwZTAxOWMwMGIiLCJpc3MiOiJodHRwczovL2xvZ2luLm1pY3Jvc29mdG9ubGluZS5jb20vYjM5MTM4Y2EtM2NlZS00YjRhLWE0ZDYtY2Q4M2Q5ZGQ",
  "client_info": "eyJ1aWQiOiI4NzMwYjc5Ni1mNDRkk",
  "id_token_claims": {
    "aud": "01fe2ec8-5363-46be-b124-e019c00b",
    "iss": "https://login.microsoftonline.com/b39138ca-3cee-/v2.0",
    "iat": 1648185788,
    "nbf": 1648185788,
    "exp": 1648189688,
    "name": "USER",
    "oid": "8730b796-f44d-4f3d-8b01-9e201055d039",
    "preferred_username": "user@outlook.com",
    "rh": "0.ARMAyjiRs-.",
    "sub": "Z5RogClxDJWqQ",
    "tid": "b39138ca-3cee-4b4a-a4d6-cd83d9dd62f0",
    "uti": "AOkSTATnukSA",
    "ver": "2.0"
  }
}
```

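In a longer-running script you usually don't want to send the password on every call. MSAL keeps acquired tokens in the app object's in-memory token cache, so a common pattern is to try acquire_token_silent for a cached account first and fall back to the username/password flow only when that yields no access token. The sketch below assumes `app` is the ClientApplication created earlier; the helper names get_token and build_headers are hypothetical. It also checks for the error and error_description keys that MSAL returns in place of access_token when authentication fails.

```python
from typing import Any, Dict, List, Optional

GRAPH_SCOPES = ["https://graph.microsoft.com/.default"]


def get_token(app: Any, username: str, password: str,
              scopes: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return an MSAL token result, preferring a cached token.

    `app` is expected to be an msal ClientApplication (or a subclass).
    """
    scopes = scopes or GRAPH_SCOPES
    result: Optional[Dict[str, Any]] = None
    # Accounts MSAL already holds tokens for, filtered by username
    accounts = app.get_accounts(username=username)
    if accounts:
        # None if nothing usable is cached; may refresh via a cached refresh token
        result = app.acquire_token_silent(scopes, account=accounts[0])
    if not result or "access_token" not in result:
        # Fall back to the username/password flow used in this article
        result = app.acquire_token_by_username_password(
            username=username, password=password, scopes=scopes)
    return result


def build_headers(result: Dict[str, Any]) -> Dict[str, str]:
    """Turn an MSAL token result into Microsoft Graph request headers."""
    if "access_token" not in result:
        # On failure MSAL returns error details instead of a token
        raise RuntimeError(f"{result.get('error')}: {result.get('error_description')}")
    return {"Authorization": f"Bearer {result['access_token']}"}
```

Inside AttachmentDownloader this could be used as headers = build_headers(get_token(self.app, self.username, self.password)).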

Extract emails for a username:

We will use the Microsoft Graph list messages API to retrieve the messages in the signed-in user’s mailbox. The API is documented in the Microsoft Graph reference under "List messages".

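```python
import requests
from requests.exceptions import RequestException

# headers must carry the access token: {'Authorization': 'Bearer <access_token>'}
try:
    response = requests.get(url='https://graph.microsoft.com/v1.0/me/messages',
                            headers=headers)
    response.raise_for_status()  # turn 401/403 and other HTTP errors into exceptions
    all_msgs = response.json()
    print(all_msgs)
except RequestException as err:
    print(f"Request exception: {err}")
```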


Output:

```json
{
  "@odata.context": "https://graph.microsoft.com/v1.0/$me#users('8730b796')/messages",
  "value": [
    {
      "@odata.etag": "W/\"CQAAABYAAAC8zwaNAGKyT72PVfyJ7\"",
      "id": "AAMkAGMxNjhkZjJlLWYwOTYtNDQ1ZS1hM2U1LTk2YTRhNWI0NjExOABGAAA",
      "createdDateTime": "2022-03-24T15:31:02Z",
      "lastModifiedDateTime": "2022-03-24T15:32:19Z",
      "changeKey": "CQAAABYAAAC8zwaNAGKyT72PVfyUAQXwAAAqUjJ7",
      "categories": [],
      "receivedDateTime": "2022-03-24T15:31:03Z",
      "sentDateTime": "2022-03-24T15:30:39Z",
      "hasAttachments": true,
      "internetMessageId": "<132.JavaMail.spc@na1-napp11>",
      "subject": "report",
      "bodyPreview": "Hi",
      "importance": "normal",
      "parentFolderId": "AAMkAGMxNjhkZjJlLWYwOTYtNDQ1ZS1hM2UAA=",
      "conversationId": "AAQkAGMxNjhkZjJlLWYwO8=",
      "conversationIndex": "AQHYP5QrD/Cyu7b0BUOW6LeyvZLCfw==",
      "isDeliveryReceiptRequested": "None",
      "isReadReceiptRequested": false,
      "isRead": false,
      "isDraft": false,
      "webLink": "https://outlook.office365.com/owa/?ItemID=AAMkAGMxNjhkZjJlLWYwOTYtNexvsurl=1&viewmodel=ReadMessageItem",
      "inferenceClassification": "focused",
      "body": {
        "contentType": "html",
        "content": "<html><head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"></head><body><p></p><p>Thanks &amp; Regards,</p><p</p><br><hr></body></html>"
      },
      "sender": {
        "emailAddress": {
          "name": "sender@outlook.com",
          "address": "sender@outlook.com"
        }
      },
      "from": {
        "emailAddress": {
          "name": "sender@outlook.com",
          "address": "sender@outlook.com"
        }
      },
      "toRecipients": [
        {
          "emailAddress": {
            "name": "USER",
            "address": "user@outlook.com"
          }
        }
      ],
      "replyTo": [
        {
          "emailAddress": {
            "name": "",
            "address": ""
          }
        }
      ],
      "flag": {
        "flagStatus": "notFlagged"
      }
    }
  ]
}
```

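The response above holds only one page of results. When more messages are available, Microsoft Graph adds an @odata.nextLink property containing the full URL of the next page, so you can keep requesting it until the property is absent. The sketch below takes a hypothetical get_json callable (URL in, parsed JSON out), so the paging logic doesn't depend on how you make the HTTP call; the default URL with $top=50 is also just an assumption for illustration.

```python
from typing import Any, Callable, Dict, Iterator, Optional

GetJson = Callable[[str], Dict[str, Any]]


def iter_messages(get_json: GetJson,
                  url: str = "https://graph.microsoft.com/v1.0/me/messages?$top=50",
                  max_pages: Optional[int] = None) -> Iterator[Dict[str, Any]]:
    """Yield every message, following @odata.nextLink across pages."""
    pages = 0
    next_url: Optional[str] = url
    while next_url and (max_pages is None or pages < max_pages):
        page = get_json(next_url)
        pages += 1
        # Each page carries its messages in the "value" array
        yield from page.get("value", [])
        # Absent on the last page, which ends the loop
        next_url = page.get("@odata.nextLink")
```

With requests, usage could look like all_messages = list(iter_messages(lambda u: requests.get(u, headers=headers).json())). The max_pages cap is an optional safety valve for very large mailboxes.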

Now that we have the messages, we collect the ids of the messages from our sender that have hasAttachments set to true. For example:

```python
sender = 'sender@outlook.com'  # the sender whose attachments we want
mails_with_attachments = []

# all_msgs is the parsed JSON response of the list messages API
for msg in all_msgs.get('value', []):
    address = msg.get('sender', {}).get('emailAddress', {}).get('address', '')
    # Email addresses are compared case-insensitively
    if address.lower() == sender.lower() and msg.get('hasAttachments'):
        mails_with_attachments.append(msg.get('id'))
```


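Alternatively, you can let the API do the filtering with the $filter query parameter, and use $select to return only the properties you need:

GET https://graph.microsoft.com/v1.0/me/messages?$filter=from/emailAddress/address eq 'sender@outlook.com'&$select=sender,subject

Here, $filter keeps only the messages from that sender, while $select limits the properties returned for each message. You can also combine several conditions with and, for example hasAttachments eq true and from/emailAddress/address eq 'sender@outlook.com'.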
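If you build the filtered URL in code, the spaces in the $filter expression need URL encoding, and a single quote inside an OData string literal is escaped by doubling it. The sketch below is one way to do that with the standard library; the function names and the properties chosen for $select are assumptions, and combining hasAttachments with the sender condition relies on Graph accepting that $filter combination for your mailbox.

```python
from urllib.parse import quote, urlencode

MESSAGES_URL = "https://graph.microsoft.com/v1.0/me/messages"


def odata_string(value: str) -> str:
    """Quote a value as an OData string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_messages_url(sender: str, with_attachments_only: bool = True) -> str:
    """Build a list-messages URL filtered by sender address."""
    conditions = [f"from/emailAddress/address eq {odata_string(sender)}"]
    if with_attachments_only:
        conditions.insert(0, "hasAttachments eq true")
    params = {
        "$filter": " and ".join(conditions),
        "$select": "id,subject,sender,hasAttachments",
    }
    # quote (not quote_plus) encodes spaces as %20; keep $ / ' @ , readable
    return MESSAGES_URL + "?" + urlencode(params, quote_via=quote, safe="$/'@,")
```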

Now, let’s assume we have all the ids of emails with attachments. We need to get all the attachment ids for an email.

Extract attachment id for a message:

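We assume that this email has only one attachment. The API is documented in the Microsoft Graph reference under "List attachments".

```python
# msg_id is one of the ids collected in mails_with_attachments
response = requests.get(
    url=f"https://graph.microsoft.com/v1.0/me/messages/{msg_id}/attachments",
    headers=headers)
jsonified_response = response.json()

# Assumes exactly one attachment: take the id of the first entry
attachment_id = jsonified_response.get('value')[0].get('id')
```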

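The snippet above assumes a single attachment. If an email can carry several, you can keep every file attachment from the list instead. Each entry in the attachments response has an @odata.type; file attachments are reported as #microsoft.graph.fileAttachment, while attached items and reference (link) attachments use other types. The helper below is a hypothetical sketch that collects (id, name) pairs for file attachments only.

```python
from typing import Any, Dict, List, Tuple

FILE_ATTACHMENT = "#microsoft.graph.fileAttachment"


def file_attachment_ids(attachments_json: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (attachment id, file name) for every file attachment.

    attachments_json is the parsed response of
    GET /me/messages/{msg_id}/attachments.
    """
    return [
        (att["id"], att.get("name", att["id"]))
        for att in attachments_json.get("value", [])
        # Skip item (attached email/event) and reference (link) attachments
        if att.get("@odata.type") == FILE_ATTACHMENT
    ]
```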

Download attachment:

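Now that we have both the message id and the attachment id, we can download the attachment with the $value endpoint, which returns the raw attachment content. I have also added a tqdm progress bar to show the download progress. The API is documented in the Microsoft Graph reference under "Get attachment".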

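```python
import requests
# requests' own ConnectionError (the built-in one would not catch it)
from requests.exceptions import ConnectionError, RequestException
from tqdm import tqdm


def download_attachment(msg_id, att_id, name):
    try:
        # $value returns the raw attachment bytes; stream=True avoids
        # loading the whole file into memory at once.
        # headers holds the Authorization header built from the access token.
        response = requests.get(
            url=(f"https://graph.microsoft.com/v1.0/me/messages/{msg_id}"
                 f"/attachments/{att_id}/$value"),
            headers=headers,
            stream=True)
        response.raise_for_status()

        # Take the size from Content-Length rather than response.content,
        # which would download the whole body before the loop starts
        file_size = int(response.headers.get('content-length', 0)) or None

        # The attachment is saved as <name>.zip, as in the original example
        with open(f"{name}.zip", 'wb') as f, tqdm(unit='iB', unit_scale=True,
                                                  unit_divisor=1024, total=file_size,
                                                  desc=f"Downloading {name}.zip") as pbar:
            for data in response.iter_content(chunk_size=1024):
                pbar.update(len(data))
                f.write(data)
        return response

    except ConnectionError as ce:
        print(f"Connection error: {ce}")
    except RequestException as err:
        print(f"Request exception: {err}")
```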

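For small files there is an alternative to streaming $value: fetching a single attachment as JSON (GET /me/messages/{msg_id}/attachments/{att_id}) returns a file attachment whose contentBytes property holds the file as base64. The sketch below, using the hypothetical helper name save_file_attachment, decodes that property and saves the file under the attachment's own name instead of a fixed .zip extension. It keeps only the final component of the name so that a crafted name cannot write outside the target folder. Because the whole file sits in memory, the streaming approach above remains the better fit for large attachments.

```python
import base64
from pathlib import Path, PurePosixPath, PureWindowsPath
from typing import Any, Dict


def save_file_attachment(attachment: Dict[str, Any], folder: str = ".") -> Path:
    """Decode a fileAttachment's contentBytes and write it into folder.

    attachment is the parsed JSON of a single file attachment.
    """
    # Drop any directory parts (either separator style) from the name
    raw_name = attachment.get("name") or attachment["id"]
    safe_name = PureWindowsPath(PurePosixPath(raw_name).name).name
    if safe_name in ("", ".", ".."):
        safe_name = attachment["id"]
    target = Path(folder) / safe_name
    target.parent.mkdir(parents=True, exist_ok=True)
    # contentBytes is the base64-encoded file content
    target.write_bytes(base64.b64decode(attachment["contentBytes"]))
    return target
```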

Summary:

  • Create a ClientApplication object once and reuse it throughout the lifecycle of the application.
  • Use the app object to acquire an access token, then list the messages from a specific sender.
  • Filter the messages with attachments and store their message ids, along with the corresponding attachment ids, in a list or dict.
  • Use each message id and attachment id pair to download the attachment.
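Putting the summary steps together, the sketch below collects (message id, attachment id, attachment name) triples for one sender. It is written against two hypothetical callables, list_messages and list_attachments, which stand in for the Graph requests shown earlier (for example, thin wrappers around requests.get with your headers), so the flow can be read and tested without a live mailbox.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

ListMessages = Callable[[], Iterable[Dict[str, Any]]]
ListAttachments = Callable[[str], Iterable[Dict[str, Any]]]


def plan_downloads(list_messages: ListMessages,
                   list_attachments: ListAttachments,
                   sender: str) -> List[Tuple[str, str, str]]:
    """Return (message id, attachment id, attachment name) for each attachment
    of every message from `sender` that has hasAttachments set."""
    plan: List[Tuple[str, str, str]] = []
    wanted = sender.lower()
    for msg in list_messages():
        address = msg.get("sender", {}).get("emailAddress", {}).get("address", "")
        # Keep only messages from the sender that have attachments
        if address.lower() != wanted or not msg.get("hasAttachments"):
            continue
        for att in list_attachments(msg["id"]):
            plan.append((msg["id"], att["id"], att.get("name", att["id"])))
    return plan
```

Each triple can then be handed to download_attachment; note that, as written, download_attachment appends .zip to the name you pass.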

And there you have it. Thank you for reading.



