AWS Lambda: Retrieving S3 Bucket Folders and Reading Data with Python

Harness the capabilities of AWS Lambda for seamless, event-driven data processing.

Introduction

AWS Lambda, a serverless computing service, provides a scalable and cost-effective way to run code without provisioning or managing servers. In this article, we’ll explore the practical usage of AWS Lambda by focusing on retrieving folders within an S3 bucket and reading data using Python. Whether you’re a seasoned AWS user or just starting, this guide will help you harness the capabilities of AWS Lambda for seamless, event-driven data processing.

Setting the Stage

Before diving into Lambda functions, make sure you have an AWS account and the AWS Command Line Interface (CLI) installed. Additionally, create an S3 bucket with the desired folder structure and data. To ensure smooth execution, grant necessary permissions to your Lambda function. This article assumes basic familiarity with AWS services and Python.

Creating an AWS Lambda Function

  1. Granting S3 Listing Permissions:
  • Open the AWS Identity and Access Management (IAM) Console.
  • Create a new IAM policy or edit an existing one, adding the necessary permissions to list objects in the S3 bucket. For example, you can use the managed policy AmazonS3ReadOnlyAccess.
  • Attach this policy to the IAM role associated with your Lambda function.
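The managed AmazonS3ReadOnlyAccess policy works, but it grants read access to every bucket in the account. A tighter option is an inline policy scoped to the one bucket the function reads. A minimal sketch (the bucket name is a placeholder):

```python
import json

# Hypothetical bucket name; replace with your own
BUCKET = "my-data-bucket"

# Minimal policy granting only what the function needs:
# listing the bucket and reading its objects
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Note that `s3:ListBucket` applies to the bucket ARN itself, while `s3:GetObject` applies to the objects inside it (`/*`) — mixing these up is a common cause of AccessDenied errors.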

  2. Open the AWS Lambda Console:
  • Navigate to the AWS Management Console and choose Lambda.

  3. Create a New Lambda Function:
  • Click on “Create function” and choose “Author from scratch.”
  • Give your function a name, select a Python runtime (for example, “Python 3.12”), and choose the existing role with the required permissions.

  4. Write the Python Code:
  • Scroll down to the code editor and replace the default code with the following Python script:
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Extract the bucket and key of the object that triggered the event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    print(f"Triggered by s3://{bucket}/{key}")

    # List the top-level folders in the bucket
    folders = list_folders(bucket)

    # Read every object inside each folder
    for folder in folders:
        for obj_key in list_keys(bucket, folder):
            data = read_data_from_s3(bucket, obj_key)
            print(f"Data from {obj_key}: {data}")

def list_folders(bucket):
    # With Delimiter='/', S3 groups keys by their top-level prefix and
    # returns those prefixes (the "folders") as CommonPrefixes
    response = s3.list_objects_v2(Bucket=bucket, Delimiter='/')
    return [p['Prefix'] for p in response.get('CommonPrefixes', [])]

def list_keys(bucket, prefix):
    # List object keys under a prefix, skipping zero-byte "folder" placeholders
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj['Key'] for obj in response.get('Contents', [])
            if not obj['Key'].endswith('/')]

def read_data_from_s3(bucket, key):
    # Download the object and decode its body as UTF-8 text
    response = s3.get_object(Bucket=bucket, Key=key)
    return response['Body'].read().decode('utf-8')

This script retrieves folder names within the specified S3 bucket and reads data from each folder.
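It helps to remember that S3 has no real directories: a bucket is a flat namespace, and “folders” are just a naming convention built on the `/` character in object keys. The grouping the script relies on can be sketched locally with plain string handling (the keys below are hypothetical):

```python
def top_level_folders(keys):
    """Derive 'folder' names from flat S3 keys — the same grouping
    list_objects_v2 performs when called with Delimiter='/'."""
    folders = set()
    for key in keys:
        # Only keys containing a '/' belong to a "folder"
        if "/" in key:
            folders.add(key.split("/", 1)[0] + "/")
    return sorted(folders)

# Hypothetical keys, as S3 would return them
keys = ["sales/2024-01.csv", "sales/2024-02.csv", "logs/app.log", "README.txt"]
print(top_level_folders(keys))  # -> ['logs/', 'sales/']
```

Keys without a `/` (like `README.txt` above) live at the bucket root and are not part of any folder.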

Deploying and Testing

Deploying the Function:

  • Save the Lambda function, and click on “Deploy” to update the function with the latest changes.

Testing the Lambda Function:

  • Add an S3 trigger to the function (in the Lambda console, choose “Add trigger,” select S3, and pick your bucket) so that uploads invoke it.
  • Upload a file to the S3 bucket to trigger the Lambda function.
  • Check the function’s CloudWatch logs to see the output.
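Instead of uploading a real file, you can also invoke the function from the console’s Test tab with a hand-built S3 event. A minimal sketch of the fields the handler actually reads (bucket and key are placeholders):

```python
# Minimal shape of the payload an S3 put notification delivers;
# bucket name and key are placeholders for illustration
test_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-data-bucket"},
                "object": {"key": "sales/2024-01.csv"},
            }
        }
    ]
}

# The handler extracts exactly these two fields
bucket = test_event["Records"][0]["s3"]["bucket"]["name"]
key = test_event["Records"][0]["s3"]["object"]["key"]
print(bucket, key)  # -> my-data-bucket sales/2024-01.csv
```

A real notification carries many more fields (event time, requester, object size, and so on), but the handler in this article only touches the bucket name and object key.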

Conclusion

AWS Lambda provides a powerful serverless computing environment for executing code in response to various events, such as changes in an S3 bucket. In this article, we focused on retrieving folder names within an S3 bucket and reading data from those folders using a Python-based Lambda function.
