AWS Data API for Amazon Aurora Serverless

Introduction

At re:invent 2017 AWS introduced Aurora Serverless database MySQL compatible. For years, the primary database AWS promoted for Serverless was DynamoDB which is a super-scalable noSQL database. Still, for someone coming from SQL world it was hard to jump on board as querying the DynamoDB database is not an easy task and designers need to think of all usage patterns before they start building it. AWS Aurora as a managed SQL service was already a major success and the fastest growing AWS service in history so AWS introduced a product (Aurora Serverless) that brings us some great stuff from both worlds – SQL and Serverless. We don’t need to think about servers any more. The database will scale automatically. Even back to 0. But connecting to the database remained the same. You just use your standard library and you are fine.

In May 2019 AWS introduced Data API for Amazon Aurora Serverless. In this blog post Jeff Barr explains the reasons to introduce it, discussing the issues with different client libraries, connection timeout and pooling and so on. But I would say that the most important sentence in this blog was the one saying “Because there is no long-running server, there’s no place to store a connection identifier for reuse.”

The whole idea about the Serverless is not to manage your own servers but to use managed services that auto scale with your needs. In a traditional 3-layer app (client-appserver-database) the AppServer layer would take care of a connection pool and allow the app to have many queries but still maintaining the maximum number of connections to the database layer. AppServer layer is still there and serves us very well but in an attempt to make such applications Highly Available and Scalable we have no option but to multiply Application server instances and with them to multiply the connection pool layer. Due to unpredictable number of AppServers we can’t control the final maximum number of connections to the database anymore. With introduction of Lambda things got even worse as we can now have thousands of simultaneous events attacking our database.

So AWS provided us with a centralized proxy or connection pool that keeps our connections open! even during auto scaling cycles of Aurora Serverless database from Zero to MaxCPUs and back without us noticing it.

Aurora Servlerless scales from Zero to Max and back based on the load (CPU usage and number of connections). To be precise, there is no clear info in Aurora FAQs about warm-up or scale-up time or exact formula which triggers scale-up but if you check this video you can see AWS team talking about 5 sec to warm-up and 5 sec to scale up. In reality, cold start takes about 30-40 sec. Of, course, you can set the minimum capacity to 1 ACU (Aurora Capacity Unit) to avoid this problem and have your server warm all the time.

From the developer’s perspective Data API is an HTTPS based REST API using JSON format to map communication with backend Aurora MySQL cluster and its proxy fleet.

This means it will require you to rewrite your application. Those who use JSON in their database responses (like Node.js apps) will have much less to change.

If you have time and want your existing Java app running with no changes you can build your own JDBC-REST driver so you can make your transition to Serverless a bit easier.

Our project

Our goal is to run a Serverless Lambda SQL powered web site in production with minimum costs. We will use this opportunity to demonstrate how easy is to deploy the entire architecture using CDK. We will not add a frontend or authorization layer to the app.

Code

Before you read the rest of the post, please clone this repo

https://github.com/goranopacic/dataapi-demo.git

so you can test it by yourself. It is a CDK project with:

  • Serverless Aurora Mysql database
  • Lambda function
  • Application Load Balancer with Lambda function as a target

AWS Cloud Development Kit (CDK) is a Infrastructure-As-A-Code solution that simplifies CloudFormation writing. It is great in sooo many ways that is out of scope of this post but you will see by yourself how easy it is to create the environment for this demo.

So, let’s build the system in the next few steps:
1. define Aurora Serverless database for MySQL
2. define Lambda function and map it via Application Load Balancer
3. deploy it
4. review Lambda function and test it

Step 1: Defining a database server

dataapi-demo-stack.ts file contains two lines required to start our server.

import { AuroraServerless } from "./auroraserverless";
var aurora = new AuroraServerless(this,'aurora-serverless')

AuroraServerless is a Construct we created for this example. Now go to lib/auroraserverless.ts file and you will find the following snippet

const secret = new DatabaseSecret(this, "MasterUserSecretDemoDataApi", {         
 username: "dbroot"       
});

which defines Secret which would keep the database password.

const dbcluster = new CfnDBCluster(this, 'apidbcluster', {
 engine: 'aurora',
 engineMode: 'serverless',
 masterUsername: secret.secretValueFromJson("username").toString(),
 masterUserPassword: secret.secretValueFromJson("password").toString(),
 scalingConfiguration: {
  autoPause: true,
  minCapacity: 1,
  maxCapacity: 16,
  secondsUntilAutoPause: 300
 }
});
this.clusterarn =`arn:aws:rds:${region}:${account}:cluster:${dbcluster.ref}`;
        this.clusterid = `${dbcluster.ref}`;
  
secret.addTargetAttachment('AttachedSecret', {
        target: this
})

In these few lines of code we’ve created a MySQL Aurora Serverless database with scaling from 1 to 16 ACU and back to 0 (autoPause: true) after 300s of being idle. We’ve defined a secret that will contain username and password and attached it to the Cluster.

Step 2: Defining a Lambda Function

Now go back to dataapi-demo-stack.ts and define the function:

const demoLambda = new lambda.Function(this, 'demo', {
 runtime: lambda.Runtime.NODEJS_8_10,
 handler: 'demo.handler',
 code: lambda.Code.asset('./lambda'),
 environment: {
  DBCLUSTERARN: aurora.clusterarn,
  DBCLUSTERID: aurora.clusterid,
  SECRETARN: aurora.secretarn
 },
 timeout: Duration.seconds(60)
});
  
const statement1 = new iam.PolicyStatement();
statement1.addResources(secret.secretArn);
statement1.addActions('secretsmanager:GetSecretValue');
demoLambda.addToRolePolicy(statement1);
  
const statement2 = new iam.PolicyStatement();
statement2.addResources(aurora.clusterarn)
statement2.addActions('rds-data:ExecuteStatement', 'rds-data:BatchExecuteStatement', 'rds-data:BeginTransaction', 'rds-data:CommitTransaction', 'rds-data:RollbackTransaction');
demoLambda.addToRolePolicy(statement2);

const statement3 = new iam.PolicyStatement();
statement3.addResources(aurora.clusterarn)
statement3.addActions('rds:DescribeDBClusters');
demoLambda.addToRolePolicy(statement3);

We’ve created the lambda definition with code (asset) in lambda/demo.js file, attached three policies we need for Data API. We’ve also set an ARN for our Secret and Database Cluster to Environment Variables SECRETARN and DBCLUSTERARN. We need lambda to have permission to read password from secrets manager so we allowed this particular lambda to read this particular secret via secretsmanager:GetSecretValue action (statement1). We’ve also enabled this lambda to execute statements on this particular Database (statement2) and we allowed this lambda to execute DescribeDBClusters (statement3) on this DB Cluster so we can check the status of it. You can see how we assigned minimum permissions to exact resources that need them. You see how easy is to get in love with CDK!

In the next few lines we created an Application Load Balancer with our Lambda function as a target so we can call {HTTPENDPOINT}/?action=something and invoke our Lambda. Please make sure you have CDK 1.6.1 at least as it introduces ALB/Lambda as targets.

//ALB
const securityGroup = new ec2.SecurityGroup(this, 'websecurity', { 
 vpc, 
 allowAllOutbound: false,
});  
securityGroup.addIngressRule(ec2.Peer.anyIpv4(), ec2.Port.tcp(80));

const loadBalancer = new elbv2.ApplicationLoadBalancer(this, 'LB', {
 vpc,
 internetFacing: true,
 securityGroup: securityGroup
 }
);

const listener = loadBalancer.addListener('Listener', { port: 80 });
listener.addTargets('Targets', {
 targets: [new targets.LambdaTarget(demoLambda)]
});

Step 3: Deploy

If you have your CDK project in place and you have CDK installed you should be able to: npm run build and then cdk deploy it. If everything goes well you should see the output like this:

Outputs:
DataapiDemoStack.auroraserverlessDBClusterARND199EAB3 = arn:aws:rds:eu-west-1:408012700083:cluster:dataapidemostack-auroraserverlessapidbcluster131e-1vl96kh3bb2q8
 DataapiDemoStack.auroraserverlessDBClusterDBIdentifierFA201D74 = dataapidemostack-auroraserverlessapidbcluster131e-1vl96kh3bb2q8
 DataapiDemoStack.ALBHttpEndPoint = Dataa-LB8A1-N2SDD6X11BLF-2089537588.eu-west-1.elb.amazonaws.com
 DataapiDemoStack.auroraserverlessSecretARN9634856B = arn:aws:secretsmanager:eu-west-1:408012700083:secret:auroraserverlessMasterUserS-m8hZNs5hROEJ-3Sc36Z

You will now run this CLI command to enable Data API. Replace the {CLUSTER} with the value from the previous line (your equivalent to dataapidemostack-auroraserverlessapidbcluster131e-1vl96kh3bb2q8).

aws rds modify-db-cluster \
    --db-cluster-identifier {CLUSTER} \
    --enable-http-endpoint

Unfortunatelly, at the moment, Enabling Data API is not supported in CloudFormation but I expect that to happen soon so we can enable it when we create the cluster in CDK. Here is the CF Roadmap and Github Issue to track.

Another line from the Output section:

DataapiDemoStack.ALBHttpEndPoint = Dataa-LB8A1-N2SDD6X11BLF-2089537588.eu-west-1.elb.amazonaws.com

which represents your {HTTPENDPOINT} part of the URL. If you call {HTTPENDPOINT}/?action=hi you will be greeted with Hi response.

Please note that your database is deployed inside VPC but even if you don’t need access from Lambda to your VPC, you will be able to connect to your Serverless database! If you want access from your VPC (Ec2, ECS, …) make sure you set up the mysql port in your database security group.

In production, it is recommended to enable secret rotation. I recommend reading AWS Secret Manager Construct library and to checkout this AWS CDK issue on Creating Aurora Serverless Cluster using a RDS Construct.

Step 4: Review

So let’s review it by walking through

  • Test – check if the database is up and running
  • Warm-up – see how the database responds while still warming up
  • Init – create the database schema and fill it with initial data
  • Select – see how to query the database
  • Transaction – simple example on how to execute the transaction
  • Batch – optimize your insert/update statements by creating a batch statement

TEST

Call {HTTPENDPOINT}/?action=test and it will display current Capacity of the cluster. Capacity 0 means that there are no servers available behind the proxy at the moment. Here is the code behind it:

var params = {
 DBClusterIdentifier: process.env.DBCLUSTERID
}
var res = await RDS.describeDBClusters(params).promise()
var serverStatus = 'DOWN'
if (res.DBClusters[0].Capacity > 0) {
        serverStatus = 'UP';      
}

If the cluster is down we will not be able to execute our queries. Therefore we need to warm it up.

WARM-UP

Call {HTTPENDPOINT}/?action=warmup and lambda will try to call SELECT 1 and check if database connection is up and server is responding. If it took more than five minutes (secondsUntilAutoPause: 300) from initializing the database cluster till you called this URL your database server will be stopped (Current Capacity: 0). First Data API call to its proxy will start the initialization of the database cluster and you will have to wait for it to start for 30-60s.

In the meantime your calls may fail due to timeouts. Check the Lambda CDK definition and you’ll see that we set the execution timeout to 60s. Unfortunately, the Data API’s executeStatement which has its own timeout of ~12 seconds does not have distinct code for warm-up period so you will end up with the response similar to this:

{
     "message": "Communications link failure\n\nThe last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.",
     "code": "BadRequestException",
     "time": "2019-08-23T08:01:51.007Z",
     "requestId": "d35da7a0-f749-4219-b80e-f3f1e9be5d7b",
     "statusCode": 400,
     "retryable": false,
     "retryDelay": 59.6895016023675
}

You’ll have to look for BadRequestException code and parse the message for Communications link failure. You can respect retryDelay from this response.

Of course, 30-60s for a website to load is a nightmare. Therefore this solution is still not a good option for public web sites but it is fine for private web sites that can present you with Loading … message.

Why don’t we ask for Capacity and check it against zero before every statement or at least check it if there was an error like this? Unfortunately it is not Current Capacity but Targeted Capacity. The second you start executing statements it changes to 1 despite the fact it is not up and running yet. It may also be time consuming to have one extra REST call before every statement.

As it takes between 30 and 60 sec for the database to warm-up we can’t use API Gateway because of its 30 sec integration timeout that would cause {“message”: “Endpoint request timed out”} message. Therefore in this project we are using Application Load Balancer with Lambda targets as it doesn’t introduce any other timeouts beside Lambda execution timeout.

Of course, there is always an option to set the minimum ACU to 1 and run and pay for the database no matter if you use it or not.

So why did I spend so much time on scaling from 0 to 1 when we can always keep the minimum capacity at 1?

1 ACU running for entire month will cost you around 45$ which is almost the same cost as single-AZ t3.medium instance running all month long. Running many low traffic web sites with separate database without simple and fast scaling from 0 to 1 would not make any sense. Also, should we really consider something Serverless if it can’t easily scale from 0 to MAX and if need to pay for it even if we don’t use it?

INIT

go to lambda/demo.js and find the if (action == ‘init’) block and you’ll find CREATE DATABASE, CREATE TABLE and INSERT INTO statements.

const params1 = {
 secretArn: process.env.SECRETARN,
 resourceArn: process.env.DBCLUSTERARN,
 sql: `CREATE DATABASE demodb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci`
}
let data1 = await RDSDATA.executeStatement(params1).promise()

const params2 = {
 secretArn: process.env.SECRETARN,
 resourceArn: process.env.DBCLUSTERARN,
 sql: `CREATE table demodb.demotable(id BIGINT AUTO_INCREMENT, demoname VARCHAR(255), demodate DATETIME, PRIMARY KEY (id))`
}
let data2 = await RDSDATA.executeStatement(params2).promise()

const params3 = {
 secretArn: process.env.SECRETARN,
 resourceArn: process.env.DBCLUSTERARN,
 sql: `INSERT INTO demodb.demotable(demoname,demodate) VALUES (:name,:date)`,
 parameters: [
 {
  name: 'name',
  value: {
   stringValue : 'Welcome'
  }
 },{
  name: 'date',
  value: {
   stringValue : '2019-08-18 01:01:01'
  }
 }
 ]
}
let data3 = await RDSDATA.executeStatement(params3).promise()

you can run it by calling

{HTTPENDPOINT}/?action=init

SELECT

Now that we have our database we should start reading from it. If you call

{HTTPENDPOINT}/?action=select

lambda will execute

const params = {
 secretArn: process.env.SECRETARN,
 resourceArn: process.env.DBCLUSTERARN,
 includeResultMetadata: true,
 sql: `select * from demotable`,
 database: 'demodb'
}
 
var data1 = await RDSDATA.executeStatement(params).promise()
var tabledata = JSON.stringify(data1, null, 2);

the result will be the following:

{  
   "columnMetadata":[  
      {  
         "arrayBaseColumnType":0,
         "isAutoIncrement":true,
         "isCaseSensitive":false,
         "isCurrency":false,
         "isSigned":true,
         "label":"id",
         "name":"id",
         "nullable":0,
         "precision":20,
         "scale":0,
         "schemaName":"",
         "tableName":"demotable",
         "type":-5,
         "typeName":"BIGINT"
      },
      {  
         "arrayBaseColumnType":0,
         "isAutoIncrement":false,
         "isCaseSensitive":false,
         "isCurrency":false,
         "isSigned":false,
         "label":"demoname",
         "name":"demoname",
         "nullable":1,
         "precision":255,
         "scale":0,
         "schemaName":"",
         "tableName":"demotable",
         "type":12,
         "typeName":"VARCHAR"
      },
      {  
         "arrayBaseColumnType":0,
         "isAutoIncrement":false,
         "isCaseSensitive":false,
         "isCurrency":false,
         "isSigned":false,
         "label":"demodate",
         "name":"demodate",
         "nullable":1,
         "precision":19,
         "scale":0,
         "schemaName":"",
         "tableName":"demotable",
         "type":93,
         "typeName":"DATETIME"
      }
   ],
   "numberOfRecordsUpdated":0,
   "records":[  
      [  
         {  
            "longValue":1
         },
         {  
            "stringValue":"Welcome"
         },
         {  
            "stringValue":"2019-08-18 01:01:01"
         }
      ]
   ]
}

includeResultMetadata is optional as well as parameters.

Be careful and limit your responses because max response size is 1MB or 1,000 records and you will not receive your response if it is above any of these limits. Also, you should stay below 64KB per row in the result set.

I’ve set some logs to track the execution. The first SELECT query takes about 800ms and subsequent queries max at 200ms. If you look at the top of the Lambda you’ll see that I’ve enabled http KeepAlive for sdk requests as suggested in numerous AWS specs but I can’t say I noticed any difference.

Keep-Alive Disabled series of requests: 130ms,72ms,92ms,138ms,57ms,81ms,61ms
Keep-Alive Enabled series of requests: 123,ms,77ms,96ms,96ms,100ms,148ms,75ms

I’ll put some more pressure on the system to test it further with huge number of requests but in the meantime I’ll keep it enabled.

Wait a minute. ~100 ms per SELECT ! yes, that is too much for any intensive web site with many INSERT/UPDATE operations but it is actually enough to run an application with simple CRUD operations.

I’ve also set the config for sdk trying to modify default retry and timeout policies but as we are dealing with proxy endpoint and not the database server directly these configuration changes had no effect.

PARAMETERS

Passing parameters is simple but you have to take care of object types. You must map your data types to blobValue, booleanValue, doubleValue, longValue, stringValue and isNull value. If you don’t see your value type here you should use blobValue to map it. This is a must for all BINARY types.

If you pass UTF-8 characters using stringValue you will end up with ??? chars in the database. You should use blobValue as it automatically BASE64 encodes strings for you:

sql: `INSERT INTO demotable(demoname) VALUES (1,:name)`,
parameters: [
 {
  name: 'name',
  value: {
   blobValue : 'АБВГДЂЕЖЗ...'
  }
 }
]

TRANSACTIONS

Transaction starts with beginTransaction call. It returns transactionId that you can use to call transaction statements. When you finish you call commitTransaction() or rollbackTransaction().

If you want to run this code call: {HTTPENDPOINT}/?action=transaction

Here is the beginTransaction()

let paramsTransaction = {
 secretArn: process.env.SECRETARN,
 resourceArn: process.env.DBCLUSTERARN,
 database: 'demodb'
};
let transData = await RDSDATA.beginTransaction(paramsTransaction).promise();
      transId = transData.transactionId;

and a statement in it:

const params1 = {
 secretArn: process.env.SECRETARN,
 resourceArn: process.env.DBCLUSTERARN,
 sql: `INSERT INTO demotable(demoname) VALUES('NAME1')`,
 database: 'demodb',
 transactionId: transId
}
let data1 = await RDSDATA.executeStatement(params1).promise();

and finally

let paramsCommitTransaction = {
 secretArn: process.env.SECRETARN,
 resourceArn: process.env.DBCLUSTERARN,
 transactionId: transId
  };

 let commitData = await RDSDATA.commitTransaction(paramsCommitTransaction).promise(); 

If everything goes well you’ll receive

{ "transactionStatus": "Transaction Committed" }

Keep in mind the following limitations: A transaction times out if there are no calls that use its transaction ID in three minutes. If a transaction times out before it’s committed, it’s rolled back automatically. A transaction can run for a maximum of 24 hours. A transaction is terminated and rolled back automatically after 24 hours.

In this transaction example in demo.js file you can one find how to read autogenerated field:

const autogeneratedID = data.generatedFields[0].longValue;

BATCHES

Calling REST api for every single item in an array would cause long total time of execution. Data API provides us with an option to send an array of data together with a single SQL command.

const params3 = {
 secretArn: process.env.SECRETARN,
 resourceArn: process.env.DBCLUSTERARN,
 database: 'demodb',
 sql: `INSERT INTO demotable(demoname,demodate) VALUES (:name,:date)`,
        parameterSets: [
          [
            {
              name: 'name',
              value: {
                stringValue : 'John'
              }
            },
            {
              name: 'date',
              value: {
                stringValue : '2019-08-19'
              }
            },
          ],
          [
            {
              name: 'name',
              value: {
                stringValue : 'Peter'
              }
            },
            {
              name: 'date',
              value: {
                stringValue : '2019-08-20'
              }
            },
          ],
        ]
      }
let data4 = await RDSDATA.batchExecuteStatement(params3).promise()

To run this code call: {HTTPENDPOINT}/?action=batch

There are some limits to it as API returns all the rows it executed with corresponding execution result. Response size limit is 1 MB or 1,000 records. As doc says: “If the call returns more than 1 MB of response data or over 1,000 records, the call is terminated.”

CLI

You can execute all these statements without lambda and aws-sdk directly from AWS CLI. Here is the SELECT command (just replace ARNs from CDK output).

aws rds-data execute-statement --secret-arn {SECRETARN} --resource-arn {DBCLUSTERARN} --schema "mysql" --sql "SELECT * FROM demotable" --database "demodb"  

and it will return the following response:

{
     "records": [
         [
             {
                 "longValue": 1
             }, 
             {
                 "stringValue": "test11111"
             }
         ], 
............
     ], 
     "numberOfRecordsUpdated": 0
 }

This is great for testing or some cool scripting as you will be able to access relational database from your scripts without extra libraries or tools. The only tool you need is aws cli. Well, not the only one 🙂 Before you start, you should definitelly install jq tool so you can parse the response like this:

$ aws rds-data execute-statement --secret-arn {SECRETARN} --resource-arn {DBCLUSTERARN} --schema "mysql" --sql "SELECT * FROM demotable" --database "demodb" | jq .numberOfRecordsUpdated
0

This leads us to another question – why don’t we have Data API for ‘non-serverless’ Aurora or even standard RDS databases? It is obvious that there are two parts of the project – one is the Database to JSON API and another one is the Proxy component. Being able to use Data API JSON translation would be great even without proxy. We will need it to do local lambda testing in case we use Data API library.

There is one interesting switch you can use if you want to initialize some long running command: --continue-after-timeout and you want it to continue running even after timeout. It is recommended for DDL commands. As per spec, “When a DDL statement terminates before it is finished running, it can result in errors and possibly corrupted data structures”. AWS, can you add an option to run command asynchronously and get some id to check if it is still running? Although, I must say that it is probably not a good practice to run long-running sql statements in Aurora Serverless as it may conflict with scaling up and down. It can also stop the database maintenance as Aurora Serverless updates by itself and it won’t be able to finish the process if there is some running transaction or temp table usage.

Challenges

I’ve been using Aurora Serverless in production for some time and it is a great product with great future. The same goes for Data API. Still, there are some challenges on the road:

  • CDK would benefit from dedicated Construct for Serverless Aurora with an option to enable DATA api (waiting for CloudFormation). Being able to deploy entire stack automatically would allow us to run relational databases per project, tenant or similar.
  • Missing IAM authentication support for Serverless databases
  • Warm-up – make it faster (bellow 5 seconds or at least bellow API-GW timeout time) and we will have a production ready system capable of running low traffic web sites
  • SELECT results are too verbose. We can transform the response but why would we face size limits when we don’t have to. Please give us options to ask for even simpler responses.
  • option to disable response for batchExecuteStatement (size limits)
  • option to set Timeout for ExecuteStatement command so we can wait for the database to warm-up
  • provide Current Capacity response in Aws SDK DescribeDBClusters. Even better, provide it via Data API instead of RDS API
  • PostgreSQL support for DATA API is missing. Aurora Serverless for PostgreSQL is GA since June 2019 and we expect DATA API support very soon
  • running commands asynchronously – with sql command id to check if it is still running
  • speed up statement executions – make it close to direct mysql database calls
  • … how about SQL for DynamoDB 🙂 via Data API …

So, is it any good? Yes it is and I’m already using it in production for a low traffic web site.
It still needs some improvements on 0 to 1 scaling so we can call it Serverless.

Resources

Aurora User Guide Data API main AWS document page about Data API
Aurora serverless User Guide by AWS
Jeff Barr’s blog post introducing Data API for Amazon Aurora Serverless
Jeremy Daly’s blog post and his DataAPIClient library – great blog post about Data API
Mani’s blog post – look ma, no database connection pools 😉

If you want to keep using your MySQL libraries with Aurora Serverless and skip using Data API another great read is Jeremy Daly’s library for Connection Pool for lambda and MySQL.

If you want to connect to provisioned MySQL servers and you are ready to manage your connection pool by yourself I suggest you to read about ProxySQL for MySQL and read this great Blog post by Tulsi Garg and Diego Magalhaes about deploying it properly in AWS environment. For postgreSQL check pgBouncer and this blog post by Bob Strahan from 2015 and this blog post by Dave Clark about deploying pgBouncer in AWS ECS.

p.s. don’t forget to cdk destroy once you finish the test

Leave a Reply