Using Terraform and Serverless Framework

Image from wikicommons.

The Serverless Framework (SF) is a fantastic tool for testing and deploying Lambda functions, but its reliance on CloudFormation makes it clumsy for infrastructure like DynamoDB tables, S3 buckets or SQS queues.

For example, if your serverless.yml file had 5 lambdas, you’d be able to sls deploy all day long. But add just one S3 bucket, and you’d first have to sls remove before you could deploy again. This change in the framework’s behavior once you introduce ‘infra’ is clumsy. Sometimes I just want to deploy new functions without removing existing resources.

Terraform, though, keeps the state of your infrastructure and applies only the changes. It also has powerful commands like taint, which forces a single piece of infrastructure to be re-created on the next apply, for instance to wipe a DynamoDB table clean.

In this post, I’ll show how I got Terraform and Serverless to work together in deploying an application, using both frameworks’ strengths to complement each other.

**From here on, I’ll refer to the Serverless Framework tool as SF, to avoid confusing it with the general term serverless.

Terraform and Serverless sitting on a tree

First some principles:

  • Use SF for Lambda & API Gateway
  • Use Terraform for everything else.
  • Use a tfvars file for Terraform variables
  • Use JSON for the tfvars file
  • Terraform deploys first followed by SF
  • Terraform will not depend on any output from SF
  • SF may depend on output from terraform
  • Use SSM Parameter Store to capture Terraform outputs
  • Import inputs into Serverless from SSM Parameter Store
  • Use workspaces in Terraform to manage different environments.
  • Use stages in Serverless to manage different environments.
  • stage.name == workspace.name
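The principles above can be sketched as a small deploy helper. This is a minimal sketch under stated assumptions, not the setup used in this post: the `<stage>.tfvars.json` file name, the `myapp` prefix and the `/<app>/<stage>/<name>` parameter path are hypothetical conventions chosen for illustration. The `terraform workspace select`, `terraform apply -var-file` and `sls deploy --stage` commands are the standard CLI calls.

```python
def deploy_commands(stage):
    """Return the CLI commands for one deployment, in order.

    Terraform goes first and SF second, and the Terraform workspace
    shares its name with the SF stage.
    """
    var_file = f"{stage}.tfvars.json"  # assumed naming: one JSON tfvars file per stage
    return [
        ["terraform", "workspace", "select", stage],
        ["terraform", "apply", f"-var-file={var_file}"],
        ["sls", "deploy", "--stage", stage],
    ]


def ssm_reference(stage, name, app="myapp"):
    """Build the ${ssm:...} variable SF uses to read a Terraform output.

    The /<app>/<stage>/<name> path layout is a hypothetical convention.
    """
    return "${ssm:/" + f"{app}/{stage}/{name}" + "}"
```

Running each command in turn with `subprocess.run(cmd, check=True)` would stop at the first failure, so SF never deploys against half-applied infrastructure.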

In the end the deployment will look like this:

Securing Lambda Functions

First a definition.

A lambda function is a service provided by AWS that runs code for you without introducing the complexity of provisioning servers or managing operating systems. It belongs to a category of architectures called serverless architectures.

There’s a whole slew of folks trying to define what serverless is, but my favorite definition is this.

Serverless means No Server Ops

Joe Emison

They’re the final frontier of compute, where the idea is that developers just write code and let AWS (or Google/MSFT) take care of everything else. That includes hardware management and OS patching; even application-level maintenance like web server upgrades is no longer your problem with serverless.

Nothing runs on fairy dust though. Serverless still has servers, but in this world those servers, their operating systems, and the underlying runtime (e.g. Python, Node, JVM) are fully managed services that you pay for per use.

As a developer you write some code into a function. Upload that function to AWS — and now you can invoke this function over and over again without worrying about servers, operating systems or run-time.

But how does AWS achieve this?

Before we can understand how to secure a serverless function, we need to at least have a fair understanding of how Serverless functions (like AWS Lambda) work.

So how does a lambda function work?

The problem with Grab

As a company, Grab has done enormously well for itself, and naturally will be the target of some hate.

But I think there’s a deeper issue with Grab that needs addressing before it becomes an unsolvable problem.

Grab is a win-win

Let’s start with what makes Grab so appealing.

Grab (at least in my mind) is the highest paying hourly wage job in the country. As long as you possess a car, and a valid driving license you can be a Grab driver, earning significantly more than any other hourly wage job.

According to this WOB article (which looks suspiciously like a paid ad), the average Grab driver earns RM5,000 per month, which is crazy money for an unskilled job. And yes, driving for Grab is unskilled labour.

For unskilled work in Malaysia, earning RM5,000 per month is a godsend; after all, even graduate employees don’t earn that much. And like all hourly wage jobs, the more hours you put in, the more money you make. RM5,000 is just where it starts.

So this seems like a win-win for everyone, drivers get to earn, and at the same time provide a service that is in high demand.

And in truth, Grab is a win-win — at least for now.

Fast-forward

The problem appears when you fast-forward 10 years, or just 2 elections from now.

Most Grab drivers I’ve met aren’t doing this part-time. They’re driving as a full-time job, and they’re putting in serious hours (10-12 a day) to make serious money. That means they’ve no time to up-skill themselves, because every hour spent learning a new skill is an hour they could have been driving.

The cost of learning to them is a double-whammy: first they spend on acquiring the new skill (like everybody else), but they also lose income from not driving. This, for most, will be too high a price to pay.

You might argue that driving isn’t un-skilled. But all it takes to be a Grab driver is a driving license and a car, skills don’t factor into this. Grab doesn’t care if you’re a PhD, diploma holder or SPM drop-out, it’ll pay the same.

Grab views all of its drivers as suppliers of the one commodity it needs: cars to move passengers. The only time Grab pays drivers more is when they turn on the auto-accept feature, because that makes its algorithm more efficient. The more subservient you are to the algorithm, the better it will reward you, and that is a pretty nasty feeling.

So as more folks join the Grab bandwagon, we’re sucking skilled labour out of the job market, leaving the country as a whole worse off in terms of competitiveness. But we’re just getting started.

Android TV boxes

Android TV boxes are computers that stream content from the internet onto your TV. The difference between them and your smartphone is that they have an HDMI connector to your TV, and they usually come pre-loaded with software to illegally stream content.

While the boxes themselves are general-purpose computers running Android (the most popular OS today), the real focus of any regulation should be on the software on the device and the internet-based streaming services that support them.

Which seems to be the case…

Today, TheStar reports that the MCMC will begin blocking these unauthorized streaming services, rendering the boxes that connect to them useless.

But if the MCMC uses its usual method of DNS filtering to implement the block, it’ll be trivial for most folks to circumvent it; the boxes run Android after all. The government will very quickly find itself in a cat-and-mouse situation in trying to block them.

2018 in Review

I started the year building out govScan.info, a site that audits .gov.my websites for TLS implementation. Overall I curated a list of ~5000 Malaysian government domains through various OSINT and enumeration techniques and now use that list to scan them…

Introducing potassium-40

Over the past few weeks, I’ve been toying with lambda functions and thinking about using them for more than just APIs. I think people miss the most interesting aspect of serverless functions, namely their massively parallel capability, which can do a lot more than just run APIs or respond to events.

There are two ways AWS lets you run lambdas: either by triggering them from some event (e.g. a new file in an S3 bucket) or by invoking them directly from code. Invoking is a game-changer, because you can write code that offloads its processing to a lambda function directly. Lambda is a giant machine, with huge potential.

What could you do with a 1000-core, 3TB machine, connected to an unlimited amount of bandwidth and a large number of IP addresses?

Here’s my answer. It’s called potassium-40, and I’ll explain the name later.
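To make direct invocation concrete, here is a minimal sketch of fanning work out to a lambda from code. It is an illustration, not the code behind this post: the function name `fan_out`, the batch layout and the `{"items": [...]}` payload shape are assumptions. The `client` argument is expected to behave like `boto3.client("lambda")`, whose `invoke` call accepts `FunctionName`, `InvocationType` and `Payload`.

```python
import json


def fan_out(client, function_name, items, batch_size):
    """Invoke one Lambda per batch of items without waiting for results.

    `client` is expected to behave like boto3.client("lambda").
    Returns the number of invocations sent.
    """
    sent = 0
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        client.invoke(
            FunctionName=function_name,
            InvocationType="Event",  # asynchronous: returns once the request is queued
            Payload=json.dumps({"items": batch}).encode("utf-8"),
        )
        sent += 1
    return sent
```

Passing the client in, rather than creating it inside the function, keeps the sketch easy to exercise with a stand-in object before pointing it at a real AWS account.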

So what is potassium-40

Potassium-40 is an application-level scanner that’s built for speed. It uses parallel lambda functions to do http scans on a specific domain.

Currently it does just one thing, which is to grab the robots.txt from all domains in the Cisco Umbrella 1 Million, and store the data in a text file for download. (I only grab legitimate robots.txt files, and won’t store 404 HTML pages etc.)

This isn’t a port-scanner like nmap or masscan. It’s not just scanning the status of a port; it’s actually creating a TCP connection to the domain, and performing all the required handshakes in order to get the robots.txt file.

Scanning for the existence of ports requires just one SYN packet to be sent from your machine, and even a typical banner grab would take 3-5 round trips. But an HTTP connection is far more expensive in terms of resources and requires state to be stored, and it’s even more expensive when TLS and redirects are involved!

Which is where lambdas come in. They’re effectively parallel computers that can execute code for you, plus AWS gives you a large amount of free resources per month! So you can not only run 1000 parallel processes, but do so for free!
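One way to separate a legitimate robots.txt from a 404 HTML page is a small heuristic on the response. The sketch below is an assumption about how such a filter could look, not potassium-40's actual check: the function name and the list of directives it looks for are hypothetical.

```python
def looks_like_robots_txt(status, content_type, body):
    """Heuristic check that an HTTP response is a real robots.txt.

    Rejects non-200 responses, HTML content types, and bodies that
    start like an HTML page (a common "soft 404").
    """
    if status != 200:
        return False
    if "html" in (content_type or "").lower():
        return False
    head = body.lstrip()[:100].lower()
    if head.startswith("<!doctype") or head.startswith("<html"):
        return False
    # Require at least one directive commonly found in robots.txt files.
    directives = ("user-agent:", "disallow:", "allow:", "sitemap:")
    return any(
        line.strip().lower().startswith(directives)
        for line in body.splitlines()
    )
```

The directive check is deliberately strict, so an empty file or free-form text served at `/robots.txt` is dropped rather than stored.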

A scan of 1,000,000 websites will typically take less than 5 minutes.
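A quick back-of-the-envelope calculation shows what that target implies. It assumes the work is split evenly across 1000 concurrent lambdas (the "1000 parallel processes" figure above) and that the full 5 minutes is available for scanning; both are simplifications.

```python
sites = 1_000_000
workers = 1000            # the "1000 parallel processes" mentioned above
budget_seconds = 5 * 60   # the 5-minute target

sites_per_worker = sites // workers
seconds_per_site = budget_seconds / sites_per_worker
overall_rate = sites / budget_seconds

print(sites_per_worker)     # sites each lambda has to fetch
print(seconds_per_site)     # time budget per site if fetched one at a time
print(round(overall_rate))  # sites per second across all lambdas
```

Each lambda ends up with 1,000 sites and about 0.3 seconds per site if it fetched them one after another, which is tight for an HTTP request with TLS, so each worker presumably has to overlap its own requests as well.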

But how do we scan 1 million urls in under 5 minutes? Well here’s how.

GitHub webhooks with Serverless

Just because you have a webhook, doesn’t mean you need a webserver.

With serverless AWS Lambdas you’ve got a free (as in beer) and always-on ability to receive webhook callbacks without the need for pesky servers. In this post, I’ll set up a serverless solution to accept incoming POST requests from a GitHub webhook.
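As a rough idea of what the receiving side can look like, here is a minimal sketch of a Lambda handler behind an API Gateway proxy integration. It is not the solution built in this post. GitHub signs each payload with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header when a secret is configured on the webhook; the `WEBHOOK_SECRET` environment variable name and the response bodies are assumptions.

```python
import base64
import hashlib
import hmac
import os


def handler(event, context):
    """Lambda handler for an API Gateway proxy event carrying a GitHub webhook."""
    # Hypothetical variable name; it holds the secret configured on the webhook.
    secret = os.environ.get("WEBHOOK_SECRET", "").encode("utf-8")
    body = event.get("body") or ""
    raw = base64.b64decode(body) if event.get("isBase64Encoded") else body.encode("utf-8")

    # Header names may arrive in any case, so normalise them first.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    received = headers.get("x-hub-signature-256", "")

    expected = "sha256=" + hmac.new(secret, raw, hashlib.sha256).hexdigest()
    # Compare as bytes, in constant time, and refuse to run without a secret.
    if not secret or not hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8")):
        return {"statusCode": 401, "body": "invalid signature"}
    return {"statusCode": 200, "body": "ok"}
```

Checking the signature before doing anything else matters here because the endpoint is public: anyone who finds the URL can POST to it.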

govScan.info now has DNS records

DNS Queries on GovScan.Info

This post is a very quick brain-dump of stuff I did over the weekend, in the hopes that I don’t forget it :). Will post more in-depth material if time permits over the weekend.

govScan.info, a site I created as a hobby project to track TLS implementation across .gov.my websites, now tracks DNS records as well. For now, I’m only tracking MX, NS, SOA and TXT records (mostly to check for DMARC), but I may add more record types.
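For context on the DMARC check: a domain publishes its DMARC policy as a TXT record at `_dmarc.<domain>`, and the record starts with `v=DMARC1`. The sketch below shows one way to recognise and parse such a record once the TXT value has been fetched. It is not govScan's code, the function name is hypothetical, and it is more lenient than the specification (it does not insist that `v` is the first tag).

```python
def parse_dmarc(txt_record):
    """Return the tags of a DMARC TXT record as a dict, or None if it is not one."""
    tags = {}
    # Tags are separated by semicolons; strip any quotes left around the TXT value.
    for part in txt_record.strip().strip('"').split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        return None
    return tags
```

With the tags in a dict, the policy a domain enforces is simply `tags.get("p")`, which is `none`, `quarantine` or `reject`.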

DNS Records are queried daily at 9.05pm Malaysia Time (might be a minute or two later, depending on the domain name) and will be stored indefinitely. Historical records can be queried via the API, and documentation has been updated.