Introduction
AWS Lambda, a serverless computing service, provides a scalable and cost-effective way to run code without provisioning or managing servers. In this article, we’ll explore the practical usage of AWS Lambda by focusing on retrieving folders within an S3 bucket and reading data using Python. Whether you’re a seasoned AWS user or just starting, this guide will help you harness the capabilities of AWS Lambda for seamless, event-driven data processing.
Setting the Stage
Before diving into Lambda functions, make sure you have an AWS account and the AWS Command Line Interface (CLI) installed. Additionally, create an S3 bucket with the desired folder structure and data. To ensure smooth execution, grant necessary permissions to your Lambda function. This article assumes basic familiarity with AWS services and Python.
Creating an AWS Lambda Function
Granting S3 Permissions:
- Open the AWS Identity and Access Management (IAM) Console.
- Create a new IAM policy or edit an existing one, adding the permissions needed to list and read objects in the S3 bucket. For example, you can use the managed policy AmazonS3ReadOnlyAccess.
- Attach this policy to the IAM role associated with your Lambda function.
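If you prefer least-privilege over the broad managed policy, a minimal inline policy only needs s3:ListBucket on the bucket and s3:GetObject on its objects. The sketch below shows such a policy document as a Python dict; "my-data-bucket" is a placeholder for your own bucket name.

```python
import json

# A minimal least-privilege alternative to AmazonS3ReadOnlyAccess.
# "my-data-bucket" is a placeholder; substitute your bucket name.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],          # needed by list_objects_v2
            "Resource": "arn:aws:s3:::my-data-bucket",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],           # needed by get_object
            "Resource": "arn:aws:s3:::my-data-bucket/*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Note that s3:ListBucket applies to the bucket ARN itself, while s3:GetObject applies to the object ARNs (the `/*` suffix) — a common source of AccessDenied errors when the two are mixed up.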
Open the AWS Lambda Console:
- Navigate to the AWS Management Console and choose Lambda.
Create a New Lambda Function:
- Click on “Create function” and choose “Author from scratch.”
- Give your function a name, select a Python runtime (for example, Python 3.12), and choose the existing role with the required permissions.
Writing Python Code:
- Scroll down to the code editor and replace the default code with the following Python script:
```python
import os
import boto3

# Create the client once, outside the handler, so it is reused across invocations
s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Extract bucket and key from the S3 event record
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    print(f"Triggered by s3://{bucket}/{key}")

    # List folders in the bucket, then read each object stored under them
    for folder in list_folders(bucket):
        for obj_key in list_keys(bucket, f"{folder}/"):
            data = read_data_from_s3(bucket, obj_key)
            print(f"Data from {obj_key}: {data}")

def list_folders(bucket):
    # List objects with ListObjectsV2 and derive unique folder names from their keys
    objects = s3.list_objects_v2(Bucket=bucket)
    folders = set()
    for obj in objects.get('Contents', []):
        folder = os.path.dirname(obj['Key'])
        if folder:  # skip objects stored at the bucket root
            folders.add(folder)
    return folders

def list_keys(bucket, prefix):
    # List the keys of all objects under the given folder prefix
    objects = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj['Key'] for obj in objects.get('Contents', [])]

def read_data_from_s3(bucket, key):
    # Read the object body and decode it as UTF-8 text
    response = s3.get_object(Bucket=bucket, Key=key)
    data = response['Body'].read().decode('utf-8')
    return data
```
This script derives the folder names within the specified S3 bucket from its object keys, then reads every object stored under each folder. Note that S3 has no real folders — a "folder" is simply a shared key prefix — which is why the code calls get_object on individual object keys rather than on the folder prefix itself.
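The folder-derivation step can be tried locally without touching AWS: os.path.dirname applied to an object key yields its folder prefix. The sample keys below are made up for illustration.

```python
import os

# Hypothetical object keys, as list_objects_v2 might return them
keys = [
    "reports/2024/jan.csv",
    "reports/2024/feb.csv",
    "logs/app.log",
    "readme.txt",  # root-level object: contributes no folder
]

# Same logic as list_folders, minus the S3 call
folders = set()
for key in keys:
    folder = os.path.dirname(key)
    if folder:
        folders.add(folder)

print(sorted(folders))  # ['logs', 'reports/2024']
```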
Deploying and Testing
Deploying the Function:
- Save the Lambda function, and click on “Deploy” to update the function with the latest changes.
Testing the Lambda Function:
- Add an S3 trigger to the function (for example, on object-created events), then upload a file to the bucket to invoke it.
- Check the CloudWatch logs for Lambda to see the output.
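You can also exercise the handler from the Lambda console's test dialog, or sanity-check the event parsing locally, using a trimmed-down S3 put event like the sketch below. The bucket name and key are placeholders, and only the fields the handler actually reads are included.

```python
# A trimmed-down S3 put event, as you might paste into the Lambda console's
# test dialog. "my-data-bucket" and the key are placeholder values.
event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-data-bucket"},
                "object": {"key": "reports/2024/jan.csv"},
            }
        }
    ]
}

# The same lookups the handler performs on a real S3 notification
bucket = event["Records"][0]["s3"]["bucket"]["name"]
key = event["Records"][0]["s3"]["object"]["key"]
print(bucket, key)  # my-data-bucket reports/2024/jan.csv
```

A real S3 notification carries many more fields (eventName, timestamps, object size, and so on), but the handler in this article only touches the two shown here.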
Conclusion
AWS Lambda provides a powerful serverless computing environment for executing code in response to various events, such as changes in an S3 bucket. In this article, we focused on retrieving folder names within an S3 bucket and reading data from those folders using a Python-based Lambda function.