In 2015, I was invited to a variety program on Astro to talk about cybersecurity. This was just after Malaysian Airlines (MAS) had their DNS hijacked, but I was specifically told by the producer that I could NOT talk about…
Last Month, I embarked on a new project called GovTLS Audit, a simple(ish) program that would scan 1000+ government websites to check for their TLS implementation. The code would go through a list of hostnames, and scan each host for TLS implementation details like redirection properties, certificate details, http headers, even stiching together Shodan results into a single comprehensive data record. That record would inserted into a DynamoDB, and exposed via a rest endpoint.
Initially I ran the scans manually Sunday night, and then uploaded the output files to S3 Buckets, and ran the scripts to insert them into the DB.
But 2 weeks ago, I decided to Automate the Process, and the architecture of this simple project is complete(ish!). Nothing is ever complete, but this is a good checkpoint, for me to begin documenting the architecture of GovTLS Audit (sometimes called siteaudit), and for me to share.
What is GovTLS Audit
First let’s talk about what GovTLS Audit is — it’s a Python Script that scans a list of sites on the internet, and stores the results in 3 different files, a CSV file (for human consumption), a JSONL file (for insertion into DynamoDB) and a JSON file (for other programmatic access).
A different script then reads in the JSONL file and loads each row into database (DynamoDB), and then uploads the 3 files as one zip to an S3 bucket.
On the ‘server-side’ there are 3 lambda functions, all connected to an API Gateway Endpoint Resource.
- One that Queries the latest details for a site [/siteDetails]
- One that Queries the historical summaries for the site [/siteHistory]
- One that List all scan (zip files) in the S3 Bucket [/listScans]
Let’s start this post the same way I start my day — by looking at Facebook.
Facebook made $40 Billion dollars in revenue in 2017, solely from advertising to pure schmucks like you. The mantra among the more technically literate is that facebook doesn’t have users it has products that it sells to advertisers, it just so happens that all its products are homo-sapien smart-phone totting urbanites (just like you!)
The platforms meteoric rise from nobody to top-dog, is a dream-story in Silicon Valley, but underneath the veneer of wholesome innovation lies a darker secret, one that could be responsible for the polarization of entire communities, including our own. And it’s all because of their most valuable employee.
No, not Mark Zuckerberg, but the real genius behind the blue and white site. The one responsible for billions of ad revenue facebook generates yearly, and unsurprisingly she’s female.
Anna Lytica and Machine Learning
There’s probably thousands of post your facebook friends make everyday, but she decides which 3 to fit onto your smartphone screen first, and the next 3 and so forth. From the millions of videos shared every hour, she painstakingly picks the few you’d see in your timeline, she decides which ads to show you, and which advertisers to sell you too, underneath the hood in the giant ad behemoth, she lies working all day, everyday.
She isn’t a person, ‘she’ is an algorithm, a complex program that does billions of calculations a second, and for this post we’ll give her the name… Anna Lytica.
Facebook doesn’t talk about her much, she is after all a trade secret (sort of), but what she does and how she does it, might be as much a mystery to us, as it is to Mr. Zuckerberg. Machine Learning algorithms are complex things, we know how to build them, and train them, but how they actually work is sometimes beyond our understanding.
Google can train Alpha-Go to play a game, but how it makes decisions is unknown to Google and even itself — it just IS a Go player.And it is really sad, when we watch these AI algorithms make amazing discoveries, but are unable to explain their rationale to us mere humans. It’s the reason why Watson, IBMs big AI algorithm, hasn’t taken off in healthcare, there’s no point recommending a treatment for cancer, if the algorithm can’t explain why it chose the treatment in the first place.
This is hard to grasp, but AI isn’t just a ‘very powerful’ program, AI is something else entirely. We don’t even use traditional words like write or build to refer to the process of creating them (like we do regular programs), instead we use the word train.
We train an algorithm to play Go, to drive, or to treat cancer. We do this the same way we breed dogs, we pick specimens with the traits we want, and breed them till we end up with a something that matches our desires. How a dog works, and what a dog thinks is irrelevant. If we want them big, we simply breed the biggest specimens, the process is focused entirely on outcome.
Similarly, how the algorithm behaves is driven by what it was trained to do. How it works is irrelevant, all that matters is outcome. Can it play Go, can it drive, can it answer jeopardy? If you want to understand an algorithm you need to know what it was trained to do.
Anna Lytica, was trained to keep you browsing Facebook, after all the companies other endeavors like internet.org, and instant articles were built with the same intention. And while good ol’ Mark stated that he’s tweaking Anna to reduce the time people spend on Facebook, this is something new, an exception to the years Facebook tweaked her to keep you on their site.
After all the average monthly user spends 27 minutes per day in the app, and if you go by daily users, they spend about 41 minutes per day on Facebook. If that’s the end-result of tweaking Anna to ensure we spend less time on Facebook — God help us all!
And while it’s difficult to understand how Anna works, its very easy to guess how she’ll behave. If the end result of Anna’s training is to keep you browsing Facebook, then human psychology reveals a simple trait all humans share — confirmation bias.
Last week I launched a draft of the Gov.my Audit, and this week we have version 2.0
Here’s what changed:
- Added More Sites. We now scan a total of 1324 government websites, up from just 1180.
- Added Shodan Results. Results includes both the open ports and time of the Shodan scan (scary shit!)
- Added Site Title. Results now include the HTML title to give a better description of the site (hopefully!).
- Added Form Fields. If the page on the root directory has an input form, the names of the fields will appear in the results. This allows for a quick glance at which sites have forms, and (roughly!) what the form ask for (search vs. IC Numbers).
- Added Domain in the CSV. The CSV is sorted by hostname, to allow for grouping by domain names (e.g. view all sites from selangor.gov.my or perlis.gov.my)
- Added an API. Now you can query the API can get more info on the site, including the cert info and HTTP headers.
- Released the Serverless.yml files for you to build the API yourself as well 🙂
All in all, it’s a pretty bad-ass project (if I do say so myself). So let’s take all that one at a time.
I keep this blog to help me think, and over the past week, the only thing I’ve been thinking about, was sayakenahack.
I’ve declined a dozen interviews, partly because I was afraid to talk about it, and partly because my thoughts weren’t in the right place. I needed time to re-group, re-think, and ponder.
This blog post is the outcome of that ‘reflective’ period.
The PR folks tell me to strike while the iron is hot, but you know — biar lambat asal selamat.
Why I started sayakenahack?
I’m one part geek and one part engineer. I see a problem and my mind races to build a solution. Building sayakenahack, while difficult, and sometimes frustrating, was super-duper fun. I don’t regret it for a moment, regardless of the sleepless nights it has caused me.
But that’s not the only reason.
I also built it to give Malaysians a chance to check whether they’ve been breached. I believe this is your right, and no one should withhold it from you. I also know that most Malaysians have no chance of ever checking the breach data themselves because they lack the necessary skills.
I know this, because 400,000 users have visited my post on “How to change your Unifi Password“.
If they need my help to change a Wifi password, they’ve got no chance of finding the hacker forums, downloading the data, fixing the corrupted zip, and then searching for their details in file that is 10 million rows long — and no, Excel won’t fit 10mln rows.
So for at least 400,000 Malaysians, most of whom would have had their data leaked, there would have been zero chance of them ever finding out. ZERO!
The ‘normal’ world is highly tech-illiterate (I’ve even talked about it on BFM). Sayakenahack was my attempt to make this accessible to common folks. To deny them this right of checking their data is just wrong.
But why tell them at all if there’s nothing they can do about it? You can’t put the genie back in the lamp.
Why does sayakenahack have dummy data? If I enter “123456” and “112233445566” I still get results.
I was struggling with answering this question, as some folks have used it to ‘prove’ that I was a phisher. We’ll get to that later, for now I hope to answer why these ‘fake’ IC numbers exist in the sayakenahack.
Firstly, I couldn’t find a good enough way to validate IC numbers as I was inserting them into the database. Most of you think that IC numbers follow a pre-define pattern :
- 6-digit birthday (yymmdd format)
- 2-digit state code
- 4-digit personal identifier, where the last digit is odd for men, and even for women.
But, there are still folks with old IC numbers, and the army have their own format. Not to mention that the IC Number field can be populated by passport numbers (for foreigners) and Company registration IDs. So instead of cracking my head on how to validate IC numbers, I decided to pass them all in.
The only ‘transformation’ I do is to strip them of all non-AlphaNumeric characters and uppercasing any letters in the result. This would standardize the IC numbers in the database, regardless of source file format.
Had I done some validation, I might have removed these dummy entries — but fortunately I didn’t.
Upon further analyzing the data, I went back to the original source files and notice something strange, the account numbers belonged to some strange names. And then it made sense — this was Test data.
Test data in a Production Environment to be exact.
And when the Database for the telco was dumped, the telco’s didn’t remove these test accounts from their system. So what we have is a bunch of dummy accounts, with dummy IC numbers.
On the 19th of October, Lowyat.net reported that a user was selling the personal data of MILLIONS of Malaysians on their forum. Shortly after, the article was taken down on the request of the MCMC, only to put up again, a…
WordPress sites get hacked all the time, because the typical WordPress blogger install 100’s of shitty plugins and rarely updates their site. On the one hand, it’s great that WordPress has empowered so many people to begin blogging without requiring the ‘hard’ technical skills, on the other it just gives criminals a large number of potential victims.
Two years ago, when I studied the details of phishing attacks that targeted Maybank and RHB, I found that attackers use compromised WordPress sites to host their phishing content. They’d first hack into a seemingly random WordPress website, host their phishing content there, and then blast out emails to unsuspecting victims with links to pointing back to their hacked bounty. If the hack works they’d get free username and passwords, and if they were ever caught, most evidence would point to the unsuspecting WordPress site owner.
So if you have a WordPress site (like me), chances are you’re in the cross-hairs of hackers already, and securing your site is the responsible thing to do.
In general WordPress sites should be:
- Updated Automatically
- Use a minimal number of plugins
- Use plugins only from reputable publishers
- Use themes only from reputable publishers–and have only one theme in the install directory
- Employ strong passwords for the admin & user
- Have the permissions of the underlying folders set accordingly (i.e.CHMOD them all)
But even if you took all precautions to hardened your site, there’s always a possibility of it getting hacked. No security is perfect, and you should look into backups–backup often and to a separate location. That way, a compromised site can be rebuilt, even if it were defaced. The last thing you want is to lose your precious design and data, because some one installed a shitty plugin over the weekend.
Today, I’ll walk through a short bash script I wrote to backup (and restore) a WordPress installation from scratch. It took me quite a while to write this (partly because I have no experience with Bash scripts), but I thought it would be good to walkthrough the details of the script and what it does.
The full script is available on github here, and the usage instructions will be maintained there. The write-up below describes code the first production release, linked here, even though I’ve since updated the scripts to include some modifications, and as we speak I’m just about the release version 1.2.
So here we go…
The following 3 folks, were greatly influential in the writing of the script, listed in no particular order. No to mention, the wonderful folks at stackoverflow that helped tremendously as well.
Thanks to Andrea Fabrizi for the awesome DropboxUploader script
Thanks to Ben Kulbertis for the awesome Cloudflare update script
Thanks to Peteris.Rocks for inspiring me with his Unattended WordPress Installation script
As a pre-requisite to all this, I made the following decisions.
The back ups would be stored in DropBox– Dropbox has free options (up to 2GB) and has versioning by default.All your backups are versioned and kept for 30 days (not just the latest upload, which gets destroyed if you’re hit by malware). Doing this on AWS requires extra work, which I wasn’t prepared to do, and AWS has no free tier for S3 storage.
Also, I use CloudFlare to maintain the DNS. It’s optional of course, but I needed a DNS provider that had an API, and they were the logical choice. This allowed the script to update your DNS as well.
Finally, the script assumes a standard LAMP stack, i.e. Linux (specifically Ubuntu 16.04), Apache , MySql and PHP. PHP is enforced by WordPress itself so that’s fine.But the ‘trend’ these days is to have NGINX instead of Apache, and MariaDB instead of MySQL. I kept things in ‘classic’ mode for now, I may revisit in the future.
As Malaysia slowly (but surely) migrates to Chip and Pin, some banks have taken the opportunity to issue not just new Pin-enabled cards, but contactless-enabled ones as well.
To be clear, Banks are only mandated to issue new Pin cards (replacing the signature cards you had before), but are taking the opportunity to also embed contactless capabilities into them as well. After all they’re already issuing new cards to every (single!) card holder, might as well get them on the contactless bandwagon while they’re at it.
The reason for being so gung-ho about contactless is purely economical. Research suggest that the easier payment methods become, the more money people are willing to spend. People with credit cards spend more than people with just cash, and 0% interest schemes have been a godsend to retailers. Contactless payments, which don’t involve cumbersome Pins or signatures, are clearly the next evolutionary step, with one research paper suggesting they increase customer spending by nearly 10%.
Banks make money from small percentages per transactions, the more transactions at higher amounts, the more money they stand to make. So if an extra dollar worth of electronics in a contactless card increases revenue by 10%–why not?!
Pins are for security, Contactless is for convenience
But while PINs are a security feature, contactless is all about convenience. And conveniences trade-off security, so it stands to reason that contactless cards are less secure than regular ‘contact’ ones.
The question is whether that trade-off is worth the increase in convenience. After all, nothing is absolutely secure, and in today’s criminally infested internet, keeping your money under the mattress is safer than keeping it in a bank–but nobody does it because the mattress would be too inconvenient.
So what convenience are you getting with a contactless cards?
For one thing, no more waiting for a receipt printout to sign on, or bending down to an inconveniently placed pinpad to type in your PIN. Plus, for someone with gigantic fingers like me, I jump on the opportunity to avoid having to fidget with pinpads that must have been designed for dwarf children after they’ve been struck by the ray gun from Honey I shrunk the kids.
But that’s about it–the only convenience contactless cards provide is that you can do contactless payments–up to a specified amount.
The question now is what security trade offs are you making for this remarkable feature?
When I was in school, we joked about people who kept their money under the mattress, that somehow those who didn’t use banks were less intelligent than people who did.The general thinking was that smart people kept their money in the bank, where it was safe from theft, fire and flood, while still collecting interest.
In the 80’s this was a compelling argument, when interest rates were high and banks really did provide security,but is that thinking still applicable today?
In June of 2000, Maybank launched their ‘new’ internet banking platform, Maybank2u, which allowed their customers to do their banking online, outside of traditional branches or even ATMs. Few years later, it begun offering online purchases and soon after the mobile app was launched.
But while online banking platforms brought convenience, they also introduced new security threats — and it wasn’t clear whose job it was to secure against those new threats, and who would be liable for inevitable financial losses.
Was it going to be bank who assumed liability, just like they did before, or would it be the account holder, or possibly a mixture of both?
The answer depends on who gets attacked, because not all attacks are equal.
Not all attacks are equal
There’s two types of attack, one where the bank itself is attacked, and another where the account holder is targeted instead.
When someone walks into a bank with the threat of violence, and walks out with $30,000 of the banks cash, the bank absorbs all the loses. After all, that’s why your money is in their safe and not under the mattresses.
But there exist another class of attack–customer impersonation, where the attacker isn’t threatening violence or even ‘attacking’, but trying to fool the bank into believing they are the rightful account holders. In other words, the attacker is trying to impersonate you, to get to your money.
And in the digital world, customer impersonation is far more common. Consider the case of ATM fraud.
ATMs identify a user by verifying their ATM cards, and then prompting them for the PIN. More specifically, the ATM first authenticates the inserted ATM card (is this card real?) and then proceeds to ask the user for the PIN (is the person the accountholder?), once an ATM is satisfied, it then proceeds to grant the user access to the account.
Hence if an attacker managed to steal your card and knows your PIN, the ATM has no way to differentiate between you and the attacker. Anyone could take your money from your account, by just having your ATM card and PIN, in contrast robbers attacking a bank would simply be taking the bank’s cash…not yours.
Credit Card fraud is another prime example, but at least in Malaysia end customers have their liability capped at RM250 provided they report their lost cards in a ‘reasonable’ amount of time. For debit cards and ATM cards are not protected in the same way. Which is strange because the poorer sections of society who need more protection usually have debit instead of credit cards.
To summarize, customer impersonation isn’t the same as a bank robbery, when the bank issues you credentials (like PINs, passwords or ATM cards), the responsibility to secure those credentials are yours–and if those credentials are compromised, then you’ll have to shoulder some of the financial losses as well.