The Malaysian Ministry of Education Data Breach

Ok, I’ve been pretty involved in the latest data breach, so here’s my side of the story.

At around 11pm last Friday, I got a query from Zurairi at The Malay Mail, asking for a second opinion on a strange email the newsdesk received from an ‘anonymous source’. The email was  regular vulnerability disclosure, but one that was full of details, attached with an enormous amount of data.

This wasn’t a two-liner tweet, this was a detailed email with outlined sub-sections. It covered why they were sending the email, what the vulnerable system was, how to exploit the vulnerability and finally (and most importantly!) a link to a Google Drive folder containing Gigabytes of data.

The email pointed to a Ministry of Education site called SAPSNKRA, used for parents to check on their children’s exam results. Quick Google searches reveal the site had security issues in the past including one blog site advising parents to proceed past the invalid certificate warning in firefox. But let’s get back to the breach.

My first reaction was to test the vulnerability, and sure enough, the site was vulnerable to SQL Injection, in exactly the manner specified by the email. So far email looked legitimate.

Next, I verified the data in the Google Drive folder, by downloading the gigabytes of text files, and checking the IC Numbers of children I knew.

I further cross-checked a few parents IC numbers against the electoral roll. Most children have some indicator of their fathers name embedded in their own, either through a surname or the full name of the father after the bin, binti, a/l or a/p. By keying in the fathers IC number, and cross-referencing the fathers name against what was in the breach, it was easy to see that the data was the real deal.

So I called back Zurairi and confirmed to him that the data was real, and that the site should be taken offline. I also contacted a buddy of mine over at MKN, to see if he could help, and Zurairi had independently raised a ticket with MyCert (a ticket??!!) and tried to contact the Education Minister via his aide.

Obviously neither Zurairi nor myself, or any of the other journalist I kept in touch with, could report on the story. The site was still vulnerable, and we didn’t want someone else breaching it.

The next morning, I emailed the anonymous source and asked them to take down the Google Drive, explaining that the breach was confirmed, and people were working to take down the site. Hence there was no reason to continue exposing all of that personal information on the internet.

They agreed, and wiped the drive clean, and shortly after I got confirmation that the SAPSNKRA website had been taken down. So with the site down, and the Google Drive wiped cleaned, it seemed the worst was behind us.

Danger averted…at least for now.

But, since Data breaches last forever, and this was a breach, we should talk about what data was in the system. Zurairi did a good job here, but here’s my more detail take on the issue. Continue reading

3 times GovTLS helped fixed government websites

Couple months back I started GovTLSAudit. A simple service that would scan domains, and report on their implementation of TLS. But the service seems to have benefits above and beyond that, specifically around having a list of a government sites that we can use to cross-check against other intel sources like Shodan (which we already do daily) and VirusTotal.

So here’s 3 times GovTLSAudit helped secure government websites.

That time Yayasan Islam Terengganu was used a phishing website

I used virustotal’s search engine to see if they had extra domains to scan, and found a few rather suspicious looking urls including:

This was an obvious phishing campaign being run out of a domain. Digging further, I found that the IP address the malicious urls resolve to was local, and belonged to Exabytes. And while the root page was a bare apache directory, buried deep within the sites sub-directories was a redirect that pointed to a Russian IP.

I took to twitter to report my findings — I kinda like twitter for this, and the very next day Exabytes come back with a followup that they were fixing it. That’s good, because having a phishing campaign run on infrastructure isn’t exactly what you’d like.

There’s a lot more details in the tweet about how I investigated this,– click here to follow the thread. A warning though — I regularly delete my old tweets. So get it while it’s there :).

Continue reading

Look ma, Open Redirect on Astro

If you’ve come here from a link on twitter — you’d see that the address bar still says, but the site is rendering this page from my blog. If not, click this link to see what I mean. You’ll get something like this:

Somehow I’ve managed to serve content from my site on an astro domain. Rest assured, I haven’t ‘hacked’ astro servers and uploaded my page, but I’ve performed an equally sinister attack called open redirect.

While browsing online for some more info on the astro breach, I found this blogpost from Amirul Amir, detailing the open redirect vulnerability on astro’s website. The post is dated Nov 2016, yet the vulnerability still works — and even though Amirul laments that he informed Astro, they seem to have taken no action in more than a year.

You might be wondering what good is an open redirect vulnerability?

Well, an attacker might send you a phishing email, pretending to be astro asking for you update to update your info on their site, and they’ve even included a conveniently placed a link for you to click. The link looks legitimate (it has in it), so you click it to a find a legitimate looking site, with valid certificates to boot — so you enter your username and password.

But the site isn’t legitimate, it’s an attackers page (that looks exactly like Astro’s) rendered over the original website, leveraging the vulnerability (and some added javascript) — and now you’ve just given your username and password away.

The frustrating thing with open redirect, is that all the techniques we educate people to use for detecting phishing sites, don’t work in this case — because this is the ‘real’ site, that’s been compromised by open redirect.It has the correct domain, it even has the right certificates, the only way you’d know is if actually look into the embedded javascript, but 99% of folks never do that.

So for a company like Astro to be sitting on this vulnerability for more than a year is not acceptable.

Just more bad news for their already shitty response to the original data breach.

Shout out to Amirul who blogs over at,although it looks like he stopped blogging in 2016, pity because he had good content.

The Astro Data Breach

I previously wrote about how data breaches are like diamonds:

  • They’re not as rare as you think
  • They’re worth far more to you than to a thief
  • They last forever

And the recent debacle over the Astro data breach epitomizes all of these characteristics.

First off, Lowyat has already reported 3 big data breaches (at least by my count), and rest assured these won’t be the last. Data breaches will continue to happen, and just like diamonds, they’re not as rare people think they are — they happen all the time, get used to it.

Secondly, the Astro breach is reportedly being sold for 30-45 cents per record. Almost any victim would be willing to pay 100 times more to keep that data private, yet on the ‘market’ these things sell for pennies. Honestly, I’d be surprised if anyone paid the sticker price on this, because even 30 cents per record sounds high to me.

Finally, (and most importantly), data breaches are forever!

It’s like pee-ing in the pool, once you do it, there’s no reversing the process. There is no such thing as ‘un-breaching’. Astro (and Lowyat) thought that the breach was “contained” when the links were taken down — but there is no containing data breaches.

If there were a way to contain digital data on the internet, illegal downloaders wouldn’t be an issue!

Once a breach happens, we expect the authorities and data owners to try to limit the damage inflicted on victims. Part of that is trying to ‘contain’ the breach — but most of it, is simply informing the victims with specific details of what data of theirs was breached.

It gives victims visibility of what data was lost, and allows them to take at least some measures to protect themselves.

Sure, Astro lodge a police report, and roped in MCMC, but what’s the point of telling the regulator, if the regulator won’t inform the customers it’s duty bound to protect.

Here’s one thing that’s already changed post GE14

In 2015, I was invited to a variety program on Astro to talk about cybersecurity.

This was just after Malaysian Airlines (MAS) had their DNS hijacked, but I was specifically told by the producer that I could NOT talk about the MAS hack, because MAS was a government linked company, and they couldn’t talk bad about GLCs.

Then half-way through the interview they asked me about government intervention, and I said something to the effect of “Governments are part of the problem and should refrain from censoring the internet”, that sound-bite never made it to TV because it was censored.

This was some stupid variety show called VBuzz, on a Tamil TV channel (of all places), tucked away in the Astro labyrinth of channels, and even then they were absolutely piss scared of being critical of anything even remotely close to the government. My statement wasn’t even directed at the Malaysian government, it was directed at government intervention in general, but alas, they feared too much and censored it out.

To be fair, I’m 100% certain the station would not have been in any trouble if they just broadcasted all of what I said (I’ve been more critical on other mediums like the blog and radio), but the producers chose to err on the side of caution.

When I asked why it was censored, they said it was because of the ‘law’. When I pressed her to give the actual law in question, and her response (quite nonchalantly) was that it was an un-written law!

Then…GE14 happened and….

Two days ago, on Astro Awani (LIVE!!), a commentator openly criticized directly the SPR commissioner, a Government Agency….and Astro didn’t censor now. Far more critical than what I said, and Far worse, but somehow magically the media found their spine that day.

It’s now a viral clip of how press freedom actually looks like, and hopefully this refreshing change will permeate through all of Malaysian media.

Gov TLS Audit has a website!

Gov TLS Audit finally has a website to complement the API.

I used the services of a guy from fiverr to code the site, it isn’t the best design in the world, but it’s good enough for now. The site allows you to query a site and view the historical details of a particular website. The full list of .gov hostnames can be found here.

It also links to the full daily scan outputs (in csv, json and jsonl formats) if you wish to download and do more analysis. Please note, the csv output has some errors that I’ve not had time to fix, best you use the jsonl or json file, which don’t have errors, but also have much more details.

It’s hosted on, because I’m not made out of money and can’t afford another domain just for this side-project. Strangely this is blocked by most locals ISPs, which is bad, I’m going to write a letter to ISPs, since the block request from PDP was very specific and didn’t include subdomains of sayakenahack.

Hope you like it, leave comments below for feedback, or email me keith [at]

First I deleted my most popular tweet — then I deleted 2000 more.

Two weeks ago, I rage-tweeted something regarding Malaysian politics that got a lot more viral than I liked (I’ve censored out the profanity for various reasons, most notably, there are teenagers who read this blog). It was a pointless collection of 200 characters, that somehow resonated with people enough to be shared across social media. Obviously, since it was me, the tweet was filled with a small collection of profanities, and laced with just the right amount of emotive content 🙂

But then things started getting bad.

Soon after I tweeted, I received messages from folks I hadn’t met in decades, showing me screenshots of their whatsapp group that had my tweet — my wife’s chinese speaking colleagues were showing it to her at work — I checked, and nearly 2,000 people retweeted it, which isn’t typical for me, and frankly speaking pretty scary.

As much as I’d like to have my content shared, the tweet in question is nothing but couple of crude words pieced together in a ‘rage-tweet’. And I understand that it emotionally resonates with folks who are angry, but if this the level of discourse we’re having on  Malaysian social media, we should be alarmed. Completely pointless rants being viralled is not how we ubah, it is the absolute opposite of how we ubah!

Research on the virality of articles from the New York Times showed that ‘angry’ content was more viral than any other, beating out awe, surprise and even practical value. The angrier the content, the more likely it would be shared. A rage-tweet is more likely to go viral than something like fuel-saving tips, even though the latter clearly is more valuable to readers.

At this point, I’d rant about how the media has a responsibility to look beyond clicks and ads, and to think about the impact of their content on society, but since I owned the tweet, I simply deleted it. Of course, I can’t stop the screen-shots being shared across whatsapp, but we do what we can.

Deleting your tweets

That got me thinking, twitter is a cesspool of angry farts screaming at each other, and that has some value.

But while, what I tweet today, may be relevant and acceptable today, it may not be 2-3 years from now. Kinda like how Apu from the Simpsons was acceptable and non-offensive in the 90’s.

I’m ashamed to say it, but I once thought that Michael Learns to Rock was a great rock band, in context, thats acceptable for a 12 year old 2 decades ago, before even Napster or Limewire. Of course, as a adult in 2018, I’m thoroughly aware that AC/DC are the greatest rock band ever, and Michael Learns to Rock, well they’re not exactly Denmark’s best export.

And that’s the problem, twitter removes context  — it’s very easy to take a 140 character tweet from 5 years ago out of context. Nobody cares about context on a platform that limits users to 140 characters (or 280 characters since end 2017). Maybe you quoted an article from TheMalaysianInsider, which, guess what, no longer exist. Context is rather fluid on twitter, and it changes rapidly over weeks, let alone the years from your first tweet.

For example,  this tweet from Bersatu’s Wan Saiful:

Gee, I wonder who he was talking about, a simple internet search will give you the answer, but that’s not the point.

Wan Saiful changed his opinion,  and he’s explained why, people should be allowed to change their mind.Freedom to change your opinion not just perfectly fine, it’s a per-requisite for progress.If we allow our tweet history to be a ball-and-chain that ties us to our old idealogy, how could we ever progress? Everybody changes their mind — and that’s OK.

The point is twitter should not be a historical archive — it should be current. A great place to have an informed discussion of current affairs, but not a place to keep old, out-dated and out of context material floating around.

Hence, I decided to delete all my tweets that were older than 90 days old, and here’s how. Continue reading

Gov TLS Audit : Architecture

Last Month, I embarked on a new project called GovTLS Audit, a simple(ish) program that would scan 1000+ government websites to check for their TLS implementation. The code would go through a list of hostnames, and scan each host for TLS implementation details like redirection properties, certificate details, http headers, even stiching together Shodan results into a single comprehensive data record. That record would inserted into a DynamoDB, and exposed via a rest endpoint.

Initially I ran the scans manually Sunday night, and then uploaded the output files to S3 Buckets, and ran the scripts to insert them into the DB.

But 2 weeks ago, I decided to Automate the Process, and the architecture of this simple project is complete(ish!). Nothing is ever complete, but this is a good checkpoint, for me to begin documenting the architecture of GovTLS Audit (sometimes called siteaudit), and for me to share.

What is GovTLS Audit

First let’s talk about what GovTLS Audit is — it’s a Python Script that scans a list of sites on the internet, and stores the results in 3 different files, a CSV file (for human consumption), a JSONL file (for insertion into DynamoDB) and a JSON file (for other programmatic access).

A different script then reads in the JSONL file and loads each row into database (DynamoDB), and then uploads the 3 files as one zip to an S3 bucket.

On the ‘server-side’ there are 3 lambda functions, all connected to an API Gateway Endpoint Resource.

  • One that Queries the latest details for a site [/siteDetails]
  • One that Queries the historical summaries for the site [/siteHistory]
  • One that List all scan (zip files) in the S3 Bucket [/listScans]

Finally there’s a separate S3 bucket to serve the ‘website’, but that’s just a simple html file with some javascript to list all scan files available for download. In the End, it looks something like this (click to enlarge):

Continue reading

Read this before GE14

Let’s start this post the same way I start my day — by looking at Facebook.

Facebook made $40 Billion dollars in revenue in 2017, solely from advertising to pure schmucks like you. The mantra among the more technically literate is that facebook doesn’t have users it has products that it sells to advertisers, it just so happens that all its products are homo-sapien smart-phone totting urbanites (just like you!)

The platforms meteoric rise from nobody to top-dog, is a dream-story in Silicon Valley, but underneath the veneer of wholesome innovation lies a darker secret, one that could be responsible for the polarization of entire communities, including our own. And it’s all because of their most valuable employee.

No, not Mark Zuckerberg, but the real genius behind the blue and white site. The one responsible for billions of ad revenue facebook generates yearly, and unsurprisingly she’s female.

Anna Lytica and Machine Learning

There’s probably thousands of post your facebook friends make everyday, but she decides which 3 to fit onto your smartphone screen first, and the next 3 and so forth. From the millions of videos shared every hour, she painstakingly picks the few you’d see in your timeline, she decides which ads to show you, and which advertisers to sell you too, underneath the hood in the giant ad behemoth, she lies working all day, everyday.

She isn’t a person, ‘she’ is an algorithm, a complex program that does billions of calculations a second, and for this post we’ll give her the name… Anna Lytica.

Facebook doesn’t talk about her much, she is after all a trade secret (sort of), but what she does and how she does it, might be as much a mystery to us, as it is to Mr. Zuckerberg. Machine Learning algorithms are complex things, we know how to build them, and train them, but how they actually work is sometimes beyond our understanding.

Google can train Alpha-Go to play a game, but how it makes decisions is unknown to Google and even itself — it just IS a Go player.And it is really sad, when we watch these AI algorithms make amazing discoveries, but are unable to explain their rationale to us mere humans. It’s the reason why Watson, IBMs big AI algorithm, hasn’t taken off in healthcare, there’s no point recommending a treatment for cancer, if the algorithm can’t explain why it chose the treatment in the first place.

This is hard to grasp, but AI isn’t just a ‘very powerful’ program, AI is something else entirely. We don’t even use traditional words like write or build to refer to the process of creating them (like we do regular programs), instead we use the word train.

We train an algorithm to play Go, to drive, or to treat cancer. We do this the same way we breed dogs, we pick specimens with the traits we want, and breed them till we end up with a something that matches our desires. How a dog works, and what a dog thinks is irrelevant. If we want them big, we simply breed the biggest specimens, the process is focused entirely on outcome.

Similarly, how the algorithm behaves is driven by what it was trained to do. How it works is irrelevant, all that matters is outcome. Can it play Go, can it drive, can it answer jeopardy? If you want to understand an algorithm you need to know what it was trained to do.

Anna Lytica, was trained to keep you browsing Facebook, after all the companies other endeavors like, and instant articles were built with the same intention. And while good ol’ Mark stated that he’s tweaking Anna to reduce the time people spend on Facebook, this is something new, an exception to the years Facebook tweaked her to keep you on their site.

After all the average monthly user spends 27 minutes per day in the app, and if you go by daily users, they spend about 41 minutes per day on Facebook. If that’s the end-result of tweaking Anna to ensure we spend less time on Facebook — God help us all!

And while it’s difficult to understand how Anna works, its very easy to guess how she’ll behave. If the end result of Anna’s training is to keep you browsing Facebook, then human psychology reveals a simple trait all humans share — confirmation bias. Continue reading

Why we need centralized breach notification

Let’s start with the basics. Data Breaches are common — and will continue to be the norm.

How the App Economy and Big Data ruined it

As we shifted towards the ‘App-Economy’ and ‘Big-Data’ (circa 3 years ago), consumers begun sharing more data with more apps. Everyone and their granny, wanted to create a new app, and everyone was told to collect as much data as possible. Then, because storage costs were low, they were encouraged to store as much data as they could first — and figure out how to use it later.

Apps like UBER, Deliveroo, Spotify, FourSquare, GoGet, are ideas we never thought possible 5 years ago. yet today, they’re an ingrained part of how Urbanites (like me!) live our lives. And each of these is a service that has your data, not just what you gave them when you registered, but probably all transactional information as well (every order, deal or purchase you made from their service).

Today, we’re beginning to view Data not just as an asset, but also a liability. Legal regulations have placed penalties for organizations that lose customer data, but you can’t lose what you don’t have. If your company deletes customer data it no longer needs, then it doesn’t run the risk of losing it to a hack — better yet, if the company doesn’t even collect that data in the first place, the risk is completely eliminated.

But this is paradigm shift is new, and we’re still living with the consequences of the ‘App-Economy’ and ‘Big-Data’. It doesn’t mean that these ideas were bad, it’s just the implications are only now being understood, and while there was a lot of good as a result of the app economy and big data, there were some bad consequences as well.

As more apps, have more data, it’s more likely one of them loses it, and lost it bad!

“But hang on a minute! Don’t these companies have security in place”. Yes they do, but security is hard.

Every Death Star has an exhaust port

On the internet, attacking a website is a lot easier than defending it, every death star has an exhaust port, and sometimes you don’t even need Jedi to exploit them.

Let’s assume you own a site,, and you run this site on fairly standard Windows Servers. Attackers continually scan and look for vulnerabilities throughout the site in an automated fashion, because the cost of scanning is essentially zero. An attacker can try different angles, and techniques, and each failed attempt is simply shrugged off.

But a defender like yourself, has to defend against all these attacks, all the time, everytime! Screw up once, and your data is gone.

And even assuming you have the most secure-minded developers working for you, so your application itself is secure (a very big IF), your supply chain might is never pristine. Every now and then, Microsoft releases a new patch that you need to update on your servers. Forget to patch, and it’ll leave your server vulnerable to exploit.

But Patching has a risk — what if you patched and then the server crashes, or some feature doesn’t work anymore? What if you patched windows and suddenly stopped making you any money? That’s a risk you can’t afford, you’ve got kids to feed.

Hence, you build out a test environment, test out the new patches, make sure everything works, and then (and only then) do you deploy this change. Risk averted, but doing this takes time, only the most mature development teams can build, test and deploy an application in hours. Most take days, weeks, or even months — if I’m being honest, years isn’t unheard of either.

In the mean-time you run the risk that maybe you get hacked, but that’s better than definitely not making money. Which is a rational decision, but it will leave you and customers at risk.

If we put those 3 things together, the App Economy, the Big Data craze and the difficulty of defending against attacks, you have all the ingredients necessary to guarantee that data breaches will continue to happen, and we shouldn’t limit our focus to just avoiding data breaches, but rather expand it to both avoidance and response.

Responding to a breach

Yahoo Breach Notification

Yahoo Breach Notification

So if breaches are to continually occur, what can we do?

To me, first and foremost, we need to notify victims. This is just not up for debate, a victim of a crime should know if they’re a victim, and more importantly what they’re a victim off.

I’ve blogged before about the disclosure principle of the PDPA:

The PDPA states that if someone discloses your data to a 3rd-party, they have to inform you, and seek your consent prior to disclosure. Seems logical, that if they lost (instead of disclosed) that same data, to a malicious 3rd-party, those same principles apply. Obviously consent is moot when it comes to breaches, but the notice and choice principle is meant to inform the data subject, and that principle should still stand.

The responsibility of notice and choice cannot simply vanish because data was hacked, if anything the responsibility to disclose has to be intensified. Breach notification is primarily concern in helping the victim mitigate the effects of a breach on the. Breach notification is about taking the data in a breach and using it for good.

And if we agree that breaches occur, and breach notification should be important, let me now make the case for centralized notification.

Centralized Notification

Traditionally breach notification involves a bit of finger-wagging at the organization to informed its customers about the data breach, and what specific elements of data were in it.

But that only goes so far, the impact of a breach is both far-reaching in time, and cumulative.

If you haven’t changed your phone number since 2014, chances are your number is in the Nuemera breach, and since your father/mother remain the same person for your entire life, that piece of data is always considered breached thanks to the dermaorgan breach.

Breaches are also cumulative, in that a sum of data across all breaches add up. Maybe we get your name from one breach, phone number from another, and spouse information from somewhere else. Individually each breach may only whisper a bit about your identity, but cumulatively these breaches might give enough information for a full blown history of your life, enough for someone to steal your identity or scam you directly.

So if the premise is to inform victims of their risk, and that risk is the cumulative risk of individual breaches, than breach notification must also be cumulative — centralized in one place for victims to get the right information.

And centralized notification, under the government auspices (of course!) has the added benefit of ensuring consistency and quality of breach notifications. The organization that was breached still needs to inform its customers, but the victims can now view their full risk profile across multiple breaches in one place.

Finally, since the risk is also far-reaching in time, i.e. a breach from 2014 can still be used in 2018 to scam you, we need a place that will persist the data in one place. If you had 10Malaysian haveibeenpwned? mobile phone accounts across 10 different telco’s, you’d have to call each one of them to get your breach details — 5 years from now, which breaches you were a part of, calling them again can be tedious.

And since breach notification isn’t revenue generating, organizations are unlikely to process them efficiently, usually resulting in a turn-around time measured in x working days. This is not effective.

Centralized breach notification will benefit victims immensely and allow them to get a centralized view of all the breaches they’ve been in, from a single place.

What does centralized breach notification look like?

The one working example is

Services like haveibeenpwned are enormously successful, sometimes averaging millions of user per day.It’s now being recommended by law-enforcement and used by government agencies in the UK and Australia.

This is a private service, operated by an individual, dealing with stolen(?) data, and it’s used by Police as well as official government agencies, and the best (and only) example of centralized breach notification around.

Regardless of which data breach you were in (Yahoo!, Dropbox, LinkedIn, etc), you can understand which breaches you were apart of by simply typing your email into a search box and get a full understanding of your risk across those known data breaches. Simply put, haveibeenpwned is amazing.

But how would this work for Malaysia?

Malaysian haveibeenpwned?

Well, since you asked.

I propose a single website where people can type in their IC Numbers, and quickly figure out if they were part of the many(!) data breaches affecting Malaysian Personal data. This way, as new data breaches are announced, they can be added to the list, and people can slowly (but surely) understand how their risk increases with each breach.

Sounds eerily familiar to something I worked on before, but just can’t remember.