Read File Data from S3 using Python AWS Lambda

Shreyas MS
7 min read · Jul 9, 2023

In this post, we will see how to automatically trigger an AWS Lambda function that reads files uploaded to an S3 bucket and displays the data using the Python pandas library.
We will use the boto3 API to read the files from the S3 bucket.

Steps followed to achieve this:
1. Create an IAM role in AWS.
2. Create an AWS S3 bucket.
3. Create the AWS Lambda function with S3 triggers enabled.
4. Update the Lambda code with a Python script to read the data.

Pre-requisites

This article assumes that the following objects are already configured:

— AWS Account.

Now let's perform the steps:

Step 1 → Create an IAM Role in AWS

In this step, we will create an IAM role with the permissions required to access S3. For more details, refer to the AWS documentation on the Lambda execution role.

— Log in to the AWS Management Console, navigate to IAM, and click Create role.
— In the role creation window, select AWS service as the trusted entity type and Lambda under the common use cases, then click Next.

— In the Add permissions window, search for the managed policies listed here, select them, and click Next.
: AmazonS3ReadOnlyAccess
: AWSLambdaBasicExecutionRole
— Enter the role name and create the role.
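
For readers who prefer scripting this step, the same role could be created with boto3 roughly as follows. This is a minimal sketch, not part of the console walkthrough: the role name lambda-s3-read-role and the helper function names are hypothetical, and the caller is assumed to have permission to manage IAM roles.

```python
import json

# AWS-managed policies selected in the console steps above
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
]


def lambda_trust_policy():
    """Trust policy that lets the Lambda service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_lambda_role(role_name="lambda-s3-read-role"):  # hypothetical name
    """Create the role, attach both managed policies, and return the role ARN."""
    import boto3  # imported lazily so the helpers above work without AWS access

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy()),
    )
    for policy_arn in MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role["Role"]["Arn"]
```

Selecting Lambda as the use case in the console generates an equivalent trust policy automatically; in a script it has to be spelled out.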

Step 2 → Create the AWS S3 Bucket.

— Log in to the AWS Management Console, navigate to S3, and click Create bucket.
— Enter a globally unique bucket name, leave the rest of the options as-is, and click Create bucket.
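
The bucket can also be created from code. The sketch below assumes boto3 and valid credentials; the function names are hypothetical. The one detail worth showing is that regions other than us-east-1 need an explicit location constraint.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build the arguments for the S3 create_bucket call.

    us-east-1 is the default location and must not be passed as a
    LocationConstraint; every other region has to be named explicitly.
    """
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket_name, region):
    """Create the bucket in the given region (requires AWS credentials)."""
    import boto3  # lazy import: only needed when actually calling AWS

    s3 = boto3.client("s3", region_name=region)
    return s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
```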

Now let's upload a CSV file with some data in it.

— Navigate to the S3 bucket, click the Upload button, select the file, and upload it.

Step 3 → Create the AWS Lambda function with S3 triggers enabled.

— Log in to the AWS Management Console and navigate to AWS Lambda.
— Open the Functions page and click Create function.
— Select Author from scratch and enter the basic information listed here.

  • Function Name: lambda-s3-trigger
  • Runtime: choose the runtime that matches your Python version (Python 3.8 in my case).
  • Architecture: x86_64
  • Change default execution role: select Use an existing role and pick the role created in Step 1.
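
As a rough scripted equivalent of these settings, the function could be created with boto3 along the following lines. This is a sketch under assumptions: the placeholder handler source and the helper names are hypothetical, role_arn is the ARN of the role from Step 1, and python3.8 simply mirrors the choice above (AWS retires older runtimes over time, so a newer identifier may be needed).

```python
import io
import zipfile

# Hypothetical stand-in for the default code the console generates
PLACEHOLDER_HANDLER = (
    "def lambda_handler(event, context):\n"
    "    return {'statusCode': 200}\n"
)


def build_deployment_zip(source_code, filename="lambda_function.py"):
    """Return the bytes of a zip archive holding a single handler file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(filename, source_code)
    return buffer.getvalue()


def create_function(role_arn, function_name="lambda-s3-trigger"):
    """Create the Lambda function with the settings listed above."""
    import boto3  # lazy import: only needed when actually calling AWS

    lambda_client = boto3.client("lambda")
    return lambda_client.create_function(
        FunctionName=function_name,
        Runtime="python3.8",
        Architectures=["x86_64"],
        Role=role_arn,
        # <file name without .py>.<function name inside the file>
        Handler="lambda_function.lambda_handler",
        Code={"ZipFile": build_deployment_zip(PLACEHOLDER_HANDLER)},
    )
```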

The Lambda function has been created. Now let's add the trigger to the function, so that it is invoked automatically whenever matching files are uploaded.

— Open the Lambda function created earlier and click the Add trigger option.

— In the Add trigger window, select S3 from the dropdown, enter the details listed here, and click Add.

  • Bucket: select the bucket created in Step 2.
  • Event types: select only PUT, since we want to trigger the Lambda function only when files are uploaded to the bucket.
  • Suffix: enter ".csv", so that the Lambda function is triggered only when CSV files are uploaded.
  • Recursive invocation: acknowledge this option. Read the description provided by AWS very carefully and avoid writing the function's output back to the same bucket, since that would re-trigger the function in a loop.
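
Behind the console form, the trigger is a bucket notification configuration. The sketch below shows what the settings above look like in that form and how they might be applied with boto3. The helper names and the statement ID are hypothetical, and note that put_bucket_notification_configuration replaces any notification configuration already on the bucket.

```python
def csv_put_notification(function_arn):
    """Notification config: invoke the function on PUT of keys ending in .csv."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:Put"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": ".csv"}]
                    }
                },
            }
        ]
    }


def attach_trigger(bucket_name, function_arn):
    """Allow S3 to invoke the function, then register the notification."""
    import boto3  # lazy import: only needed when actually calling AWS

    # S3 must be permitted to invoke the function before the
    # notification configuration is accepted.
    boto3.client("lambda").add_permission(
        FunctionName=function_arn,
        StatementId="s3-invoke-permission",  # hypothetical identifier
        Action="lambda:InvokeFunction",
        Principal="s3.amazonaws.com",
        SourceArn=f"arn:aws:s3:::{bucket_name}",
    )
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=csv_put_notification(function_arn),
    )
```

The console performs the permission step on your behalf when the trigger is added there.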

Step 4 → Update the Lambda code with a Python script to read the data.

— Open the Lambda function created earlier and navigate to the Code tab, where you will see some default code generated by Lambda.

— Replace the default code with the code that follows.

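import json
import urllib.parse
from io import StringIO

import boto3
import pandas as pd

# Create the client once, outside the handler, so warm invocations reuse it
s3_client = boto3.client('s3')

def lambda_handler(event, context):
    try:
        # Bucket and key of the object that triggered this invocation
        record = event["Records"][0]["s3"]
        s3_Bucket_Name = record["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications
        s3_File_Name = urllib.parse.unquote_plus(record["object"]["key"])

        # Download the object and parse its contents as CSV
        response = s3_client.get_object(Bucket=s3_Bucket_Name, Key=s3_File_Name)
        csv_string = response['Body'].read().decode('utf-8')
        dataframe = pd.read_csv(StringIO(csv_string))

        # Print the first three rows; Lambda sends stdout to CloudWatch Logs
        print(dataframe.head(3))

    except Exception as err:
        # Log the error instead of failing the invocation
        print(err)

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }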

Now let's test the code manually by creating a test event in the Lambda function and checking whether the code works properly.

— Click the Test button and configure the test event as follows.

  • Choose Create new event.
  • For Event name, enter a name for the test event. For example, s3-test-event.
  • For Event sharing settings, keep Private.
  • For Event template, choose s3-put.
  • In the test event JSON, replace the S3 bucket name (example-bucket) and object key (test/key) with your bucket name and test file name.

Your test event should look similar to the following:
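
A trimmed sketch of such an event, written as a Python dict so it can double as a local test fixture, might look like this. Only the fields the handler reads are kept; the full s3-put template carries more metadata. The bucket name, file name, and helper names are hypothetical placeholders.

```python
import json
import urllib.parse


def make_s3_put_event(bucket_name, object_key):
    """Trimmed-down S3 put event with only the fields the handler reads."""
    return {
        "Records": [
            {
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": bucket_name},
                    # Real notifications URL-encode the key (spaces become "+")
                    "object": {"key": urllib.parse.quote_plus(object_key, safe="/")},
                },
            }
        ]
    }


def bucket_and_key(event):
    """Extract the bucket name and the decoded object key from the event."""
    record = event["Records"][0]["s3"]
    key = urllib.parse.unquote_plus(record["object"]["key"])
    return record["bucket"]["name"], key


# Hypothetical bucket and file names, for illustration only
event = make_s3_put_event("my-example-bucket", "Employee Details.csv")
print(json.dumps(event, indent=2))
```

Printing the dict as JSON gives text that can be pasted into the console's test event editor.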

Now we are done with all the configuration steps. Let's trigger the test event and see the results.

— Click the Test button; Lambda will run the code and display the results in the Execution results window.

You will notice an error message saying "No module named 'pandas'". This is because the script uses the pandas module, which is not included in the default Lambda Python runtime.

In order to use pandas in the Lambda function, a Lambda layer that provides it needs to be attached to the function.

Now let's attach the pandas layer:

— In the Lambda function window, scroll down and click the Add a layer button.

— In the Add layer window, select a layer that provides pandas (for example, the AWS-provided AWSSDKPandas layer for your Python runtime) and then click Add.

Now test the script once again by clicking the Test button; you will see the result, which displays the data in the CSV file.
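
As an aside, when all that is needed is a quick preview of the first few rows, the standard library csv module can do the job without any layer. This is a swapped-in alternative to pandas, not what this post uses; the function name and the sample data are hypothetical.

```python
import csv
from io import StringIO
from itertools import islice


def preview_csv(csv_string, rows=3):
    """Return the first few data rows as dicts keyed by the header row."""
    reader = csv.DictReader(StringIO(csv_string))
    return [dict(row) for row in islice(reader, rows)]


# Hypothetical sample data, for illustration only
sample = "emp_id,name\n1,Asha\n2,Ravi\n3,Mei\n4,Tom\n"
for row in preview_csv(sample):
    print(row)
```

Unlike pandas, csv.DictReader keeps every value as a string, so any numeric handling has to be done by hand.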

So now our Python code is running without any errors. Let's upload a CSV file to the S3 bucket, which will trigger the Lambda function and display the data in the file.

— Navigate to the S3 console and upload a new CSV file (Employee_Details_1.csv).
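
The upload can also be done from a script. The following sketch assumes boto3 and credentials with write access to the bucket; the helper names are hypothetical. It checks the key against the trigger's suffix first, because the S3 suffix filter is case-sensitive and a file named with .CSV would not fire the trigger.

```python
import os


def object_key_for(local_path, prefix=""):
    """Derive the S3 object key from a local file path."""
    return prefix + os.path.basename(local_path)


def matches_trigger(object_key, suffix=".csv"):
    """True when the key would pass the suffix filter set on the trigger."""
    return object_key.endswith(suffix)


def upload_csv(local_path, bucket_name):
    """Upload the file and return the key it was stored under."""
    import boto3  # lazy import: only needed when actually calling AWS

    key = object_key_for(local_path)
    if not matches_trigger(key):
        raise ValueError(f"{key!r} does not end with .csv; the trigger would not fire")
    boto3.client("s3").upload_file(local_path, bucket_name, key)
    return key
```

For large files boto3 switches to a multipart upload, which S3 reports as a different event type than Put, so a trigger limited to Put would not see it.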

Upon uploading the new file, the trigger fires, the Lambda function is executed automatically, and the data is written to Amazon CloudWatch Logs.

To view the data in the logs:
— Navigate to the Lambda function, open the Monitor tab, and click the View CloudWatch logs button. In the logs window that opens, click the most recent log stream.

Now in the log stream, you can see the data from the CSV file that was uploaded to the S3 bucket.
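
The same log lines can be fetched programmatically. This is a minimal sketch assuming boto3, credentials with CloudWatch Logs read access, and the default log group naming that Lambda uses; the helper names are hypothetical, and only the first page of results is returned.

```python
import time


def log_group_for(function_name):
    """Default CloudWatch log group name for a Lambda function."""
    return f"/aws/lambda/{function_name}"


def recent_log_messages(function_name="lambda-s3-trigger", minutes=10):
    """Return log messages written by the function in the last few minutes."""
    import boto3  # lazy import: only needed when actually calling AWS

    logs = boto3.client("logs")
    # CloudWatch Logs expects timestamps in milliseconds since the epoch
    start_ms = int((time.time() - minutes * 60) * 1000)
    response = logs.filter_log_events(
        logGroupName=log_group_for(function_name),
        startTime=start_ms,
    )
    return [event["message"] for event in response["events"]]
```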

To Summarize:

  • AWS Lambda is a serverless compute service that lets you run code without having to think about servers and their maintenance. You pay only for the compute time that you use.
  • We can write code in one of the supported languages and runtimes and upload it to AWS Lambda, which executes those functions in an efficient and flexible manner.
  • Some of the use cases of Lambda functions:
    - Event-based data processing, such as reading files and performing some data transformation logic.
    - Moving data between different datastores on demand or at regular time intervals.
    - Automating business tasks that don't require a server up and running all the time.
    - Scheduled jobs to clean up the infrastructure.

Thank you for reading. I hope this blog helps you understand AWS Lambda functions, their capabilities, and their use cases.

Shreyas MS

Data Engineer by Profession | Data & Cloud Enthusiast - Snowflake | AWS | Connect - linkedin.com/in/shreyas-ms-48661533