Puppeteer is a Node library commonly used for browser automation tasks. In this shot, we will cover how to run Puppeteer on AWS Lambda, a serverless computing platform, and how to save the screenshots it takes in Amazon S3.
We will use the Serverless Framework and Lambda layers to do this.
We assume that you already have a configured environment with the AWS CLI installed, valid AWS credentials, and Serverless.
Let’s begin by setting up the directory and initializing a Node.js project:
mkdir -p puppeteer-lambda/src
cd puppeteer-lambda
npm init -y
Next, we need to install Serverless as a dev dependency:
npm install --save-dev serverless
You may be tempted to install the puppeteer package and use it directly. However, puppeteer downloads a full Chromium build during installation. That makes the package large enough to run into Lambda's deployment size limits, and the bundled Chromium may fail to launch in the Lambda environment.
A better option is chrome-aws-lambda. This module ships a Chromium binary built for the Lambda environment, which makes it suitable for our project.
Let’s install the chrome-aws-lambda module:
npm install --save-dev chrome-aws-lambda
Notice that we installed chrome-aws-lambda as a dev dependency. The Lambda layer we will use already contains this module, so we only need it locally; Serverless leaves dev dependencies out when it packages the function.
Now we need to configure the serverless.yml file. Below is a sample file with comments; most of the parameters follow the official Serverless documentation.
# serverless.yml
service: lambdaScreenshot

custom:
  # change this name to something unique
  s3Bucket: screenshot-files

provider:
  name: aws
  region: us-east-1
  versionFunctions: false
  # here we put the layers we want to use
  layers:
    # Google Chrome for AWS Lambda as a layer
    # Make sure you use the latest version depending on the region
    # https://github.com/shelfio/chrome-aws-lambda-layer
    - arn:aws:lambda:${self:provider.region}:764866452798:layer:chrome-aws-lambda:10
  # function parameters
  runtime: nodejs12.x
  memorySize: 2048 # recommended
  timeout: 30
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:PutObject
        - s3:PutObjectAcl
      Resource: arn:aws:s3:::${self:custom.s3Bucket}/*

functions:
  capture:
    handler: src/capture.handler
    environment:
      S3_REGION: ${self:provider.region}
      S3_BUCKET: ${self:custom.s3Bucket}

resources:
  Resources:
    # Bucket where the screenshots are stored
    screenshotsBucket:
      Type: AWS::S3::Bucket
      DeletionPolicy: Delete
      Properties:
        BucketName: ${self:custom.s3Bucket}
        AccessControl: Private
    # Grant public read-only access to the bucket
    screenshotsBucketPolicy:
      Type: AWS::S3::BucketPolicy
      Properties:
        PolicyDocument:
          Statement:
            - Effect: Allow
              Action:
                - s3:GetObject
              Principal: "*"
              Resource: arn:aws:s3:::${self:custom.s3Bucket}/*
        Bucket:
          Ref: screenshotsBucket
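The `environment` block under `functions.capture` is how the bucket name and region reach the code: Serverless sets them as Lambda environment variables, and the handler reads them from `process.env`. The following is a minimal sketch of failing fast when either one is missing. `readCaptureConfig` is a hypothetical helper, not part of Serverless or the AWS SDK.

```javascript
// Hypothetical helper: read the settings that serverless.yml injects
// through functions.capture.environment (S3_REGION and S3_BUCKET).
function readCaptureConfig(env = process.env) {
  const region = env.S3_REGION;
  const bucket = env.S3_BUCKET;

  // Collect every missing variable so the error names all of them at once
  const missing = [];
  if (!region) missing.push("S3_REGION");
  if (!bucket) missing.push("S3_BUCKET");
  if (missing.length > 0) {
    throw new Error(`Missing environment variable(s): ${missing.join(", ")}`);
  }

  return { region, bucket };
}

module.exports = { readCaptureConfig };
```

Calling it once at module load (for example, `const { region, bucket } = readCaptureConfig();`) turns a misconfigured deployment into a clear error instead of a less obvious failure inside the S3 upload.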
Next, we can write the function that will run on AWS Lambda.
In the code below, we use Puppeteer to spin up a headless Chromium instance, navigate to a page, and take a screenshot. We then upload the screenshot to our S3 bucket with the S3 client and return the URL of the uploaded file.
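Before looking at the real handler, it can help to see the same steps with the browser and the upload passed in as parameters, so the flow can be exercised with stand-in objects. The following is a sketch under that assumption. `captureAndStore`, `launchBrowser`, and `uploadPng` are hypothetical names, not chrome-aws-lambda or AWS SDK APIs. The page methods it calls (`newPage`, `goto`, `screenshot`, `close`) mirror the Puppeteer calls used in the handler.

```javascript
// Sketch of the capture flow with injected dependencies (hypothetical names).
// launchBrowser: async () => browser-like object
// uploadPng:     async (key, buffer) => public URL of the stored object
async function captureAndStore(url, { launchBrowser, uploadPng }) {
  // 1. spin up a headless browser
  const browser = await launchBrowser();
  try {
    // 2. open a page and navigate to the URL
    const page = await browser.newPage();
    await page.goto(url);

    // 3. take a screenshot (Puppeteer resolves this to a Buffer)
    const buffer = await page.screenshot();

    // 4. store it under a timestamp-based key, as the handler does
    const key = `${Date.now()}.png`;
    const location = await uploadPng(key, buffer);

    // 5. hand back the URL of the stored file
    return { url: location };
  } finally {
    // close the browser even if navigation or the upload fails
    await browser.close();
  }
}

module.exports = { captureAndStore };
```

In the deployed function, `launchBrowser` would wrap `chromeLambda.puppeteer.launch(...)` and `uploadPng` would wrap `s3.upload(...).promise()` and return `result.Location`.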
// src/capture.js

// this module will be provided by the layer
const chromeLambda = require("chrome-aws-lambda");
// the AWS SDK for JavaScript v2 is preinstalled in the Lambda Node.js runtimes
// up to nodejs16.x (including the nodejs12.x runtime used here)
const S3Client = require("aws-sdk/clients/s3");

// create an S3 client
const s3 = new S3Client({ region: process.env.S3_REGION });

// The function to run
exports.handler = async (event) => {
  // launch a headless browser
  const browser = await chromeLambda.puppeteer.launch({
    args: chromeLambda.args,
    defaultViewport: chromeLambda.defaultViewport,
    executablePath: await chromeLambda.executablePath
  });

  try {
    // Open a page and navigate to the URL from the event
    const page = await browser.newPage();
    await page.goto(event.url);

    // take a screenshot (returned as a Buffer)
    const buffer = await page.screenshot();

    // upload the image using the current timestamp as the file name
    const result = await s3.upload({
      Bucket: process.env.S3_BUCKET,
      Key: `${Date.now()}.png`,
      Body: buffer,
      ContentType: "image/png",
      ACL: "public-read"
    }).promise();

    // return the uploaded image URL
    return { url: result.Location };
  } finally {
    // close the browser so Chromium does not keep running in a reused container
    await browser.close();
  }
};
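The handler passes `event.url` straight to `page.goto`, so a missing or malformed value only surfaces as a Puppeteer navigation error. A small guard could run first. `parseTargetUrl` is a hypothetical helper, and limiting it to `http:` and `https:` is an assumption about which pages this function should be allowed to capture.

```javascript
// Hypothetical guard for the incoming event, run before page.goto(...).
// Uses the WHATWG URL class that is built into Node.js.
function parseTargetUrl(event) {
  if (!event || typeof event.url !== "string") {
    throw new Error('Expected an event like { "url": "https://example.com/" }');
  }

  let parsed;
  try {
    parsed = new URL(event.url);
  } catch (err) {
    // new URL() throws a TypeError for strings it cannot parse
    throw new Error(`Not a valid URL: ${event.url}`);
  }

  // Only allow web pages; reject file:, data:, and similar schemes
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }

  // Return the normalized form, e.g. "https://example.com" -> "https://example.com/"
  return parsed.href;
}

module.exports = { parseTargetUrl };
```

Inside the handler, this would replace the direct call with `await page.goto(parseTargetUrl(event));`.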
Lastly, we can deploy the function to AWS using the Serverless command:
sls deploy
To test the function, open it in the AWS Lambda console, choose “Configure test events”, and enter a test event containing the URL to capture:
{
"url": "https://example.com/"
}
Click “Create” and then “Test”. After a few seconds, the test should finish, and the response will contain the URL of your screenshot.
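The URL in the response comes from `result.Location`, which `s3.upload` returns. If you later need to rebuild a screenshot's address from the bucket name, region, and key, S3's virtual-hosted-style format is `https://<bucket>.s3.<region>.amazonaws.com/<key>`. The sketch below assumes that format and the public-read bucket policy from serverless.yml. `screenshotUrl` is a hypothetical helper.

```javascript
// Hypothetical helper: build the virtual-hosted-style S3 URL for an object.
function screenshotUrl(bucket, region, key) {
  // Encode each path segment of the key but keep "/" separators intact
  const encodedKey = key.split("/").map(encodeURIComponent).join("/");
  return `https://${bucket}.s3.${region}.amazonaws.com/${encodedKey}`;
}

module.exports = { screenshotUrl };
```

For the timestamp keys used by the handler, no characters need encoding. The per-segment encoding only matters if you switch to keys with spaces or other reserved characters.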