First I deleted my most popular tweet — then I deleted 2000 more.

Two weeks ago, I rage-tweeted something about Malaysian politics that went a lot more viral than I would have liked (I’ve censored out the profanity for various reasons, most notably because there are teenagers who read this blog). It was a pointless collection of 200 characters that somehow resonated with people enough to be shared across social media. Obviously, since it was me, the tweet was filled with a small collection of profanities, and laced with just the right amount of emotive content 🙂

But then things started getting bad.

Soon after I tweeted, I received messages from folks I hadn’t met in decades, showing me screenshots of their WhatsApp groups that had my tweet. My wife’s Chinese-speaking colleagues were showing it to her at work. I checked, and nearly 2,000 people had retweeted it, which isn’t typical for me, and frankly speaking is pretty scary.

As much as I’d like to have my content shared, the tweet in question is nothing but a couple of crude words pieced together in a ‘rage-tweet’. And I understand that it emotionally resonates with folks who are angry, but if this is the level of discourse we’re having on Malaysian social media, we should be alarmed. Completely pointless rants going viral is not how we ubah, it is the absolute opposite of how we ubah!

Research on the virality of articles from the New York Times showed that ‘angry’ content was more viral than any other, beating out awe, surprise and even practical value. The angrier the content, the more likely it would be shared. A rage-tweet is more likely to go viral than something like fuel-saving tips, even though the latter clearly is more valuable to readers.

At this point, I’d rant about how the media has a responsibility to look beyond clicks and ads, and to think about the impact of their content on society, but since I owned the tweet, I simply deleted it. Of course, I can’t stop the screenshots being shared across WhatsApp, but we do what we can.

Deleting your tweets

That got me thinking, twitter is a cesspool of angry farts screaming at each other, and that has some value.

But while what I tweet today may be relevant and acceptable today, it may not be 2-3 years from now. Kinda like how Apu from the Simpsons was acceptable and non-offensive in the 90’s.

I’m ashamed to say it, but I once thought that Michael Learns to Rock was a great rock band. In context, that’s acceptable for a 12 year old 2 decades ago, before even Napster or Limewire. Of course, as an adult in 2018, I’m thoroughly aware that AC/DC are the greatest rock band ever, and Michael Learns to Rock, well, they’re not exactly Denmark’s best export.

And that’s the problem: twitter removes context. It’s very easy to take a 140 character tweet from 5 years ago out of context. Nobody cares about context on a platform that limits users to 140 characters (or 280 characters since the end of 2017). Maybe you quoted an article from TheMalaysianInsider, which, guess what, no longer exists. Context is rather fluid on twitter, and it changes rapidly over weeks, let alone the years since your first tweet.

For example, this tweet from Bersatu’s Wan Saiful:

Gee, I wonder who he was talking about, a simple internet search will give you the answer, but that’s not the point.

Wan Saiful changed his opinion, and he’s explained why. People should be allowed to change their mind. Freedom to change your opinion is not just perfectly fine, it’s a prerequisite for progress. If we allow our tweet history to be a ball-and-chain that ties us to our old ideology, how could we ever progress? Everybody changes their mind, and that’s OK.

The point is twitter should not be a historical archive — it should be current. A great place to have an informed discussion of current affairs, but not a place to keep old, out-dated and out of context material floating around.

Hence, I decided to delete all my tweets that were older than 90 days, and here’s how.

Gov TLS Audit : Architecture

Last month, I embarked on a new project called GovTLS Audit, a simple(ish) program that would scan 1000+ government websites to check their TLS implementation. The code would go through a list of hostnames, and scan each host for TLS implementation details like redirection properties, certificate details and HTTP headers, even stitching in Shodan results to form a single comprehensive data record. That record would be inserted into DynamoDB, and exposed via a REST endpoint.

Initially I ran the scans manually on Sunday night, then uploaded the output files to S3 buckets, and ran the scripts to insert them into the DB.

But 2 weeks ago, I decided to automate the process, and the architecture of this simple project is now complete(ish!). Nothing is ever complete, but this is a good checkpoint for me to begin documenting the architecture of GovTLS Audit (sometimes called siteaudit), and to share it.

What is GovTLS Audit

First let’s talk about what GovTLS Audit is: it’s a Python script that scans a list of sites on the internet, and stores the results in 3 different files: a CSV file (for human consumption), a JSONL file (for insertion into DynamoDB) and a JSON file (for other programmatic access).

A different script then reads in the JSONL file and loads each row into the database (DynamoDB), and then uploads the 3 files as one zip to an S3 bucket.
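As a rough illustration of the three output formats, here is a minimal sketch using only the standard library. The file names, the `write_scan_outputs` and `read_jsonl` helpers, and the record fields (`hostname`, `httpsExists`, `headers`) are assumptions made up for this example, not the project's actual code.

```python
import csv
import json
from pathlib import Path


def write_scan_outputs(records, out_dir, csv_fields):
    """Write one scan's records as CSV, JSONL and JSON files in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # CSV: a flat subset of the fields, for reading in a spreadsheet.
    with open(out_dir / "scan.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=csv_fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)

    # JSONL: one complete record per line, convenient for loading row by row.
    with open(out_dir / "scan.jsonl", "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

    # JSON: the whole scan as a single array, for other programs.
    with open(out_dir / "scan.json", "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)


def read_jsonl(path):
    """Yield one record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

The loader script would then iterate over `read_jsonl(...)` and insert each record into the database; JSONL suits that step because every line is a complete record that can be processed on its own.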

On the ‘server-side’ there are 3 lambda functions, all connected to an API Gateway Endpoint Resource.

  • One that Queries the latest details for a site [/siteDetails]
  • One that Queries the historical summaries for the site [/siteHistory]
  • One that lists all scans (zip files) in the S3 Bucket [/listScans]
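To give a feel for what one of these functions might look like, here is a hedged sketch of a `/siteDetails`-style handler. It returns the response shape used by API Gateway's Lambda proxy integration, but the `site` query parameter, the `site_details_handler` name and the in-memory `LATEST_SCANS` dictionary (standing in for the DynamoDB query) are all assumptions for illustration, not the project's real code.

```python
import json

# Hypothetical stand-in for the DynamoDB table: hostname -> latest scan record.
LATEST_SCANS = {
    "www.example.gov.my": {"hostname": "www.example.gov.my", "httpsExists": True},
}


def json_response(status_code, payload):
    """Build a response in the shape the Lambda proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def site_details_handler(event, context, lookup=LATEST_SCANS.get):
    """Return the latest scan record for ?site=<hostname> (assumed parameter)."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    site = params.get("site")
    if not site:
        return json_response(400, {"error": "missing 'site' query parameter"})
    record = lookup(site)
    if record is None:
        return json_response(404, {"error": "no scan found for " + site})
    return json_response(200, record)
```

Passing the lookup in as a parameter keeps the handler easy to test locally; in a real deployment it would be swapped for a function that queries the table.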

Finally there’s a separate S3 bucket to serve the ‘website’, but that’s just a simple HTML file with some JavaScript to list all scan files available for download. In the end, it looks something like this:


Read this before GE14

Let’s start this post the same way I start my day — by looking at Facebook.

Facebook made $40 billion in revenue in 2017, almost entirely from advertising to pure schmucks like you. The mantra among the more technically literate is that facebook doesn’t have users, it has products that it sells to advertisers. It just so happens that all its products are smartphone-toting Homo sapiens urbanites (just like you!).

The platform’s meteoric rise from nobody to top-dog is a dream-story in Silicon Valley, but underneath the veneer of wholesome innovation lies a darker secret, one that could be responsible for the polarization of entire communities, including our own. And it’s all because of their most valuable employee.

No, not Mark Zuckerberg, but the real genius behind the blue and white site. The one responsible for the billions in ad revenue facebook generates yearly, and unsurprisingly she’s female.

Anna Lytica and Machine Learning

There are probably thousands of posts your facebook friends make every day, but she decides which 3 to fit onto your smartphone screen first, and the next 3, and so forth. From the millions of videos shared every hour, she painstakingly picks the few you’d see in your timeline. She decides which ads to show you, and which advertisers to sell you to. Under the hood of the giant ad behemoth, she lies working all day, every day.

She isn’t a person, ‘she’ is an algorithm, a complex program that does billions of calculations a second, and for this post we’ll give her the name… Anna Lytica.

Facebook doesn’t talk about her much, she is after all a trade secret (sort of), but what she does and how she does it, might be as much a mystery to us, as it is to Mr. Zuckerberg. Machine Learning algorithms are complex things, we know how to build them, and train them, but how they actually work is sometimes beyond our understanding.

Google can train AlphaGo to play a game, but how it makes decisions is unknown to Google and even to itself. It just IS a Go player. And it is really sad when we watch these AI algorithms make amazing discoveries, but they are unable to explain their rationale to us mere humans. It’s the reason why Watson, IBM’s big AI algorithm, hasn’t taken off in healthcare: there’s no point recommending a treatment for cancer if the algorithm can’t explain why it chose the treatment in the first place.

This is hard to grasp, but AI isn’t just a ‘very powerful’ program, AI is something else entirely. We don’t even use traditional words like write or build to refer to the process of creating them (like we do regular programs), instead we use the word train.

We train an algorithm to play Go, to drive, or to treat cancer. We do this the same way we breed dogs: we pick specimens with the traits we want, and breed them till we end up with something that matches our desires. How a dog works, and what a dog thinks, is irrelevant. If we want them big, we simply breed the biggest specimens. The process is focused entirely on outcome.

Similarly, how the algorithm behaves is driven by what it was trained to do. How it works is irrelevant, all that matters is outcome. Can it play Go, can it drive, can it answer Jeopardy? If you want to understand an algorithm, you need to know what it was trained to do.

Anna Lytica was trained to keep you browsing Facebook. After all, the company’s other endeavors, like internet.org and instant articles, were built with the same intention. And while good ol’ Mark stated that he’s tweaking Anna to reduce the time people spend on Facebook, this is something new, an exception to the years Facebook spent tweaking her to keep you on their site.

After all, the average monthly user spends 27 minutes per day in the app, and if you go by daily users, they spend about 41 minutes per day on Facebook. If that’s the end result of tweaking Anna to ensure we spend less time on Facebook, God help us all!

And while it’s difficult to understand how Anna works, it’s very easy to guess how she’ll behave. If the end result of Anna’s training is to keep you browsing Facebook, then human psychology reveals a simple trait all humans share: confirmation bias.

Why we need centralized breach notification

Let’s start with the basics. Data Breaches are common — and will continue to be the norm.

How the App Economy and Big Data ruined it

As we shifted towards the ‘App-Economy’ and ‘Big-Data’ (circa 3 years ago), consumers began sharing more data with more apps. Everyone and their granny wanted to create a new app, and everyone was told to collect as much data as possible. Then, because storage costs were low, they were encouraged to store as much data as they could first, and figure out how to use it later.

Apps like UBER, Deliveroo, Spotify, FourSquare and GoGet are ideas we never thought possible 5 years ago. Yet today, they’re an ingrained part of how urbanites (like me!) live our lives. And each of these is a service that has your data, not just what you gave them when you registered, but probably all transactional information as well (every order, deal or purchase you made from their service).

Today, we’re beginning to view Data not just as an asset, but also a liability. Legal regulations have placed penalties for organizations that lose customer data, but you can’t lose what you don’t have. If your company deletes customer data it no longer needs, then it doesn’t run the risk of losing it to a hack — better yet, if the company doesn’t even collect that data in the first place, the risk is completely eliminated.

But this paradigm shift is new, and we’re still living with the consequences of the ‘App-Economy’ and ‘Big-Data’. It doesn’t mean that these ideas were bad, it’s just that the implications are only now being understood, and while there was a lot of good as a result of the app economy and big data, there were some bad consequences as well.

As more apps hold more data, it becomes more likely that one of them loses it, and loses it badly!

“But hang on a minute! Don’t these companies have security in place?” Yes they do, but security is hard.

Every Death Star has an exhaust port

On the internet, attacking a website is a lot easier than defending it. Every death star has an exhaust port, and sometimes you don’t even need a Jedi to exploit it.

Let’s assume you own a site, makemoney.com, and you run this site on fairly standard Windows Servers. Attackers continually scan and look for vulnerabilities throughout the site in an automated fashion, because the cost of scanning is essentially zero. An attacker can try different angles, and techniques, and each failed attempt is simply shrugged off.

But a defender like yourself has to defend against all these attacks, all the time, every time! Screw up once, and your data is gone.

And even assuming you have the most security-minded developers working for you, so your application itself is secure (a very big IF), your supply chain is never pristine. Every now and then, Microsoft releases a new patch that you need to apply to your servers. Forget to patch, and it’ll leave your server vulnerable to exploits.

But patching has a risk: what if you patched and then the server crashes, or some feature doesn’t work anymore? What if you patched Windows and suddenly makemoney.com stopped making you any money? That’s a risk you can’t afford, you’ve got kids to feed.

Hence, you build out a test environment, test out the new patches, make sure everything works, and then (and only then) do you deploy this change. Risk averted, but doing this takes time. Only the most mature development teams can build, test and deploy an application in hours. Most take days, weeks, or even months, and if I’m being honest, years isn’t unheard of either.

In the meantime you run the risk that maybe you get hacked, but that’s better than definitely not making money. Which is a rational decision, but it will leave you and your customers at risk.

If we put those 3 things together, the App Economy, the Big Data craze and the difficulty of defending against attacks, you have all the ingredients necessary to guarantee that data breaches will continue to happen, and we shouldn’t limit our focus to just avoiding data breaches, but rather expand it to both avoidance and response.

Responding to a breach

Yahoo Breach Notification

So if breaches are to continually occur, what can we do?

To me, first and foremost, we need to notify victims. This is just not up for debate: a victim of a crime should know that they’re a victim, and more importantly what they’re a victim of.

I’ve blogged before about the disclosure principle of the PDPA:

The PDPA states that if someone discloses your data to a 3rd-party, they have to inform you, and seek your consent prior to disclosure. Seems logical, that if they lost (instead of disclosed) that same data, to a malicious 3rd-party, those same principles apply. Obviously consent is moot when it comes to breaches, but the notice and choice principle is meant to inform the data subject, and that principle should still stand.

The responsibility of notice and choice cannot simply vanish because data was hacked; if anything, the responsibility to disclose has to be intensified. Breach notification is primarily concerned with helping the victim mitigate the effects of a breach on them. Breach notification is about taking the data in a breach and using it for good.

And if we agree that breaches occur, and breach notification should be important, let me now make the case for centralized notification.

Centralized Notification

Traditionally, breach notification involves a bit of finger-wagging at the organization to inform its customers about the data breach, and what specific elements of data were in it.

But that only goes so far, the impact of a breach is both far-reaching in time, and cumulative.

If you haven’t changed your phone number since 2014, chances are your number is in the Nuemera breach, and since your father/mother remain the same person for your entire life, that piece of data is always considered breached thanks to the dermaorgan breach.

Breaches are also cumulative, in that the data across all breaches adds up. Maybe we get your name from one breach, phone number from another, and spouse information from somewhere else. Individually, each breach may only whisper a bit about your identity, but cumulatively these breaches might give enough information for a full-blown history of your life, enough for someone to steal your identity or scam you directly.
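To make the cumulative effect concrete, here is a small sketch of how separate leaks combine into one profile. Everything in it is made up for illustration: the breach names, the records, and the assumption that each leak happens to share a common key for the same person.

```python
def build_profile(breaches, person_key):
    """Merge every record for person_key across breaches into one profile.

    Returns (profile, sources): the combined fields, and for each field the
    list of breaches that exposed it.
    """
    profile = {}
    sources = {}
    for breach_name, records in breaches.items():
        record = records.get(person_key)
        if record is None:
            continue
        for field, value in record.items():
            profile.setdefault(field, value)  # keep the first value seen
            sources.setdefault(field, []).append(breach_name)
    return profile, sources


# Entirely fictional data: three leaks, each exposing only a little.
breaches = {
    "telco-leak": {"id-0001": {"name": "Ali", "phone": "012-0000000"}},
    "donor-leak": {"id-0001": {"name": "Ali", "next_of_kin": "Siti"}},
    "forum-leak": {"id-0002": {"email": "someone@example.com"}},
}

profile, sources = build_profile(breaches, "id-0001")
print(profile)
print(sources)
```

No single leak in this toy data contains both a phone number and a next of kin, but the merged profile does, which is the whole problem.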

So if the premise is to inform victims of their risk, and that risk is the cumulative risk of individual breaches, then breach notification must also be cumulative: centralized in one place for victims to get the right information.

And centralized notification, under government auspices (of course!), has the added benefit of ensuring consistency and quality of breach notifications. The organization that was breached still needs to inform its customers, but the victims can now view their full risk profile across multiple breaches in one place.

Finally, since the risk is also far-reaching in time, i.e. a breach from 2014 can still be used in 2018 to scam you, we need a place that will persist the data in one place. If you had 10 mobile phone accounts across 10 different telcos, you’d have to call each one of them to get your breach details. And 5 years from now, when you need to recall which breaches you were a part of, calling them all again can be tedious.

And since breach notification isn’t revenue generating, organizations are unlikely to process them efficiently, usually resulting in a turn-around time measured in x working days. This is not effective.

Centralized breach notification will benefit victims immensely and allow them to get a centralized view of all the breaches they’ve been in, from a single place.

What does centralized breach notification look like?

The one working example is haveibeenpwned.com

Services like haveibeenpwned are enormously successful, sometimes averaging millions of users per day. It’s now being recommended by law-enforcement and used by government agencies in the UK and Australia.

This is a private service, operated by an individual, dealing with stolen(?) data, and it’s used by Police as well as official government agencies, and the best (and only) example of centralized breach notification around.

Regardless of which data breach you were in (Yahoo!, Dropbox, LinkedIn, etc.), you can find out which breaches you were a part of by simply typing your email into a search box, and get a full understanding of your risk across those known data breaches. Simply put, haveibeenpwned is amazing.

But how would this work for Malaysia?

Malaysian haveibeenpwned?

Well, since you asked.

I propose a single website where people can type in their IC Numbers, and quickly figure out if they were part of the many(!) data breaches affecting Malaysian Personal data. This way, as new data breaches are announced, they can be added to the list, and people can slowly (but surely) understand how their risk increases with each breach.
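A minimal sketch of what the lookup behind such a site could look like is below. The `BreachIndex` class, the breach names and the sample IC number are all hypothetical. Storing SHA-256 hashes instead of raw IC numbers is an assumption of this sketch too, and since a 12-digit number is easy to brute-force, hashing alone would not be enough protection for a real service.

```python
import hashlib


def ic_key(ic_number):
    """Normalise an IC number (drop dashes and spaces) and hash it."""
    digits = "".join(ch for ch in ic_number if ch.isdigit())
    return hashlib.sha256(digits.encode("utf-8")).hexdigest()


class BreachIndex:
    """Maps hashed IC numbers to the breaches they appeared in."""

    def __init__(self):
        self._index = {}

    def add_breach(self, name, year, ic_numbers, fields):
        # One shared entry per breach: what leaked, and when.
        entry = {"breach": name, "year": year, "fields": sorted(fields)}
        for ic_number in ic_numbers:
            self._index.setdefault(ic_key(ic_number), []).append(entry)

    def lookup(self, ic_number):
        """Return every known breach containing this IC number."""
        return list(self._index.get(ic_key(ic_number), []))


# Fictional breaches and a made-up IC number, purely for illustration.
index = BreachIndex()
index.add_breach("telco-leak", 2014, ["900101-14-5678"], {"phone", "address"})
index.add_breach("donor-leak", 2016, ["900101145678"], {"next_of_kin"})

for hit in index.lookup("900101-14-5678"):
    print(hit["breach"], hit["year"], hit["fields"])
```

As new breaches are announced, each becomes one more `add_breach` call, and a person's list of hits simply grows.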

Sounds eerily familiar to something I worked on before, but just can’t remember.

Gov.My TLS audit: Version 2.0

Last week I launched a draft of the Gov.my Audit, and this week we have version 2.0

Here’s what changed:

  1. Added More Sites. We now scan a total of 1324 government websites, up from just 1180.
  2. Added Shodan Results. Results include both the open ports and the time of the Shodan scan (scary shit!)
  3. Added Site Title. Results now include the HTML title to give a better description of the site (hopefully!).
  4. Added Form Fields. If the page on the root directory has an input form, the names of the fields will appear in the results. This allows for a quick glance at which sites have forms, and (roughly!) what the form asks for (search vs. IC Numbers).
  5. Added Domain in the CSV. The CSV is sorted by hostname, to allow for grouping by domain names (e.g. view all sites from selangor.gov.my or perlis.gov.my)
  6. Added an API. Now you can query the API to get more info on the site, including the cert info and HTTP headers.
  7. Released the Serverless.yml files for you to build the API yourself as well 🙂
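The site title and form field items can be illustrated with Python's built-in HTML parser. This is only a sketch of the idea: the `FormFieldParser` class, the `summarise_page` helper and the `title`/`formFields` keys are assumptions for this example, and the actual scanner may extract them differently.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collects the page title and the names of fields inside <form> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.field_names = []
        self._in_title = False
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "form":
            self._in_form = True
        elif tag in ("input", "select", "textarea") and self._in_form:
            name = dict(attrs).get("name")
            if name:  # unnamed fields (e.g. submit buttons) are skipped
                self.field_names.append(name)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "form":
            self._in_form = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarise_page(html):
    """Return the title and form field names found in an HTML page."""
    parser = FormFieldParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "formFields": parser.field_names}
```

A field list like `["q"]` suggests a search box, while something like `["nama", "no_ic"]` suggests the page is asking for personal data.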

All in all, it’s a pretty bad-ass project (if I do say so myself). So let’s take all that one at a time.

I scanned 1000 government sites, what I found will NOT shock you

Previously, I moaned about dermaorgan.gov.my, a site that was probably hacked but was still running without basic TLS. This is unacceptable, that in 2018, we have government run websites, that ask for personal information, running without TLS.

So I decided to check just how many .gov.my sites actually implemented TLS, and how many would start being labelled ‘not secure’ by Google in July. That’s right, Google will start naming and shaming sites without TLS, so I wanted to give .gov.my sites the heads up!

Why check for TLS?

TLS doesn’t guarantee a site is secure (nothing does!), but a site without TLS signals lack of care from the administrator. The absence of TLS is an indicator of just how lightly the security of the servers has been taken.

Simply put, TLS is necessary but not sufficient for security, and since it’s the easiest thing to detect without running intrusive network scans, it seems like the best place to start.

How I checked for TLS?

But first I needed a list of .gov.my sites.

To do that, I wrote a web-crawler that started with a few .gov.my links, visited each one, and stored the links it found. It then repeated the process for the links, the links of the links… and so forth. After 3 iterations, I ended with 20,000 links from 3,000+ individual hostnames (a word I wrongly use in place of FQDN, but since the code uses hostnames, I’m sticking to it for now; please forgive me, networking nerds).
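The crawl described above is essentially a breadth-first walk with a fixed number of rounds. Here is a minimal standard-library sketch of that idea; the `crawl` and `extract_links` names are made up, and `fetch` is an assumed function you supply yourself that returns the HTML for a URL (for example a thin wrapper around `urllib.request.urlopen`), so this is not the crawler actually used.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(base_url, html):
    """Return the absolute http(s) links found in a page, without fragments."""
    parser = LinkParser()
    parser.feed(html)
    links = set()
    for href in parser.hrefs:
        url = urldefrag(urljoin(base_url, href)).url
        if urlparse(url).scheme in ("http", "https"):
            links.add(url)
    return links


def crawl(seeds, fetch, iterations=3):
    """Follow links outward from seeds for a fixed number of rounds."""
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(iterations):
        found = set()
        for url in frontier:
            try:
                html = fetch(url)
            except Exception:
                continue  # unreachable or broken pages are simply skipped
            found |= extract_links(url, html)
        frontier = found - seen  # only visit links not seen before
        seen |= frontier
    hostnames = {urlparse(url).hostname for url in seen}
    hostnames.discard(None)
    return seen, hostnames
```

Capping the number of rounds, rather than crawling until no new links turn up, keeps the run bounded while still surfacing a large set of hostnames.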

I then manually filtered the hostnames to those from a .gov.my or .mil.my domain and scanned them for a few things:

  • Does it have an https website (if it doesn’t redirect)
  • Does it redirect users from http to https
  • Does the https site have a valid certificate
    • Does it match the hostname
    • Does it have a fresh certificate (not expired)
    • Can the certificate be validated (this requires all intermediary certs to be present)
  • What is the IP of the site
  • What is the ASN of the IP
  • What are the Server & X-Powered-By headers returned by the host
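Several of the checks above can be sketched with Python's `ssl` and `socket` modules. The helper names and the result keys (`httpsExists`, `certValid`, `daysLeft`) are assumptions for this illustration, not the fields in the published CSV. The default SSL context does the heavy lifting: it rejects certificates that are expired, don't match the hostname, or can't be chained to a trusted root.

```python
import socket
import ssl
import time

GOV_SUFFIXES = (".gov.my", ".mil.my")


def is_gov_hostname(hostname):
    """True for hostnames under .gov.my or .mil.my."""
    return hostname.lower().rstrip(".").endswith(GOV_SUFFIXES)


def redirects_to_https(status, location):
    """True if a response to an http:// request redirects to an https:// URL."""
    is_redirect = status in (301, 302, 303, 307, 308)
    return is_redirect and (location or "").lower().startswith("https://")


def days_until_expiry(not_after, now=None):
    """Days left on a certificate, given its notAfter string from getpeercert()."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def check_certificate(hostname, port=443, timeout=10.0):
    """Connect over TLS and report whether the certificate validates."""
    context = ssl.create_default_context()  # verifies chain, expiry and hostname
    result = {"hostname": hostname, "httpsExists": False, "certValid": False, "error": None}
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname) as tls:
                cert = tls.getpeercert()
                result["httpsExists"] = True
                result["certValid"] = True
                result["daysLeft"] = days_until_expiry(cert["notAfter"])
    except ssl.SSLCertVerificationError as exc:
        # A TLS server answered, but its certificate failed validation.
        result["httpsExists"] = True
        result["error"] = exc.verify_message
    except OSError as exc:  # refused connections, timeouts, other TLS errors
        result["error"] = str(exc)
    return result
```

Calling `check_certificate("www.example.gov.my")` (a placeholder hostname) returns a small dictionary per host; the IP and ASN lookups need extra data sources and are left out of this sketch.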

Obviously, as I was coding this, my mind got distracted and I actually collected quite a bit more data, but those fields are in the csv for you to Excel the shit out of! The repository contains both a json and a jsonl file that have more data.

Now onto the results

Another Day, Another breach

220,000 is a lot of people. It’s the population of a small town like Taiping, and roughly twice the capacity of Bukit Jalil Stadium.

Yet today, a data breach of this size barely registers in the news-cycle. After all, the previous data breach was 200 times bigger, and occurred just 3 months ago. How could we take seriously something that occurs so frequently, and on a scale very few comprehend?

Individually, each breach is not particularly damaging, it’s a thin thread of data about victims, but they do add up. Criminals use multiple breaches to stitch together a fabric of the victim’s identity, eventually being able to forge credit card applications in their name, or to perform typical scams.

But if you’re thinking of avoiding being in a breach, that’s an impossible task. The only Malaysians that weren’t part of the telco breach, were those without mobile phones. In the organ donor leak, the victims were kind-hearted souls who were innocent bystanders in the war between attackers and defenders on the internet.

The only specific advice that would work, would be to not subscribe to mobile phone accounts and don’t pledge your organs. That is not useful advice.

I wanted this post to be about encouraging people to stop worrying about data breaches, and move on with their lives. To accept that the price of living in a hyper-connected world is that you’ll be a data breach victim every now and then. I wanted to demonstrate this by actually going out and pledging my organs, to show that we shouldn’t be afraid.

But when I went to the Malaysian organ donation website (dermaorgan.gov.my), I was greeted by the all-too-common “Connection is Not Secure” warning. Which just made my head spin!

That long post about Data breaches (you never wanted to read!)

Part 1: An intro to Data Breaches

Let’s start with some basics. What is a Data Breach?

According to Verizon, a data breach is when you’ve confirmed that data has been lost to an attacker, while a data incident is merely something that ‘may’ result in a breach.

An incident is when a laptop goes missing from your company’s office.

A breach is when the data on that laptop is published online.

Breaches can be negligent or malicious in nature. An employee accidentally sending staff details to a vendor would be negligent, but when someone breaks into your office to steal your laptops, that’s malicious. Usually breaches require both negligent and malicious elements, and rarely do we see cases of elite nation-state actors hacking into organizations.

But just because incidents happen before breaches, it doesn’t mean we learn about them in that order. Sometimes we find the lost data first, and then scramble backwards to determine the incident.

Imgur, a popular image sharing site that lost 1.7mln records, was made aware of the data breach by Troy Hunt, and had to go into damage control, presumably before knowing what incident caused it.

When the Philippines Election Commission (COMELEC) website was hacked, it was apparent what had happened, hackers defaced the entire site, hence the incident and breach happened simultaneously. For Equifax, the time between incident and breach was a few months (at least that’s what Equifax say), and for the Malaysian Telco breach, the gap was years.

I’m not here to talk about criminal investigations, but cut the Malaysian police some slack, it’s difficult to investigate a 3 year old crime. Sometimes IT pros can’t investigate bugs that occurred 3 days ago because the logs were already purged, what more 3 years.

The new normal

Data breaches are the new ‘normal’, an accepted risk of living in today’s hyper-connected world.

The price of our shiny new toys, like iPhones, Uber, and foodpanda, is the risk of data breaches. For all the hullabaloo around Uber, people are still going to use their service, because let’s face it, Uber is still better than no Uber.

All the new online services, that make our lives better, require us to share some personal data with them. And that implicitly means, that there is a risk that data is lost.

Impact to companies

But just because Data breaches are normal, doesn’t mean they don’t impact the companies that experience them.

Last year, the Journal of Cybersecurity (yes, there is such a thing) published a study suggesting that the damage from data breaches to companies was minimal:

Specifically, we find that the cost of a typical cyber incident in our sample is less than $200 000 (about the same as the firm’s annual IT security budget), and that this represents only 0.4% of their estimated annual revenues.

… much lower than retail shrinkage (1.3%), online fraud (0.9%), and overall rates of corruption, financial misstatements, and billing fraud (5%)

Still, while the initial cost is paltry, the long term consequence isn’t. Comparitech did an analysis of data breaches and the effect they had on share prices:

Breached companies tend to underperform the NASDAQ. They recover to the index’s performance level after 38 days on average, but after three years the NASDAQ ultimately outperforms them by a margin of over 40 percent. (emphasis mine)

In other words, the effect of a breach causes stock prices to immediately drop, then recover, but subsequently get pummeled.

However, almost all studies reveal that the damage from data breaches to companies is reducing over time!

From Comparitech:

The most notable result is older breaches met with a stronger initial reaction than newer breaches. One theory is that … they become more common. This causes a “breach fatigue” … in which investors are less shaken by data breaches as time goes on.

And the journal:

The litigation rate for all cyber events has been generally decreasing…the litigation rate for data breaches was around 20% in 2004, but has fallen to about 5% in 2014.

Conclusion?

Both the market, and the general population have become more accepting of breaches.

So if breaches are bad, and they’re the new normal, can we at least contain them? As we’ll see next, the answer is a resounding no.

Part 2: Data Breaches cannot be contained

Just because they’re normal doesn’t mean data breaches are harmless. Modern breaches share 3 common traits that make them uncontainable.

  • They last forever
  • They’re very common
  • They have little re-sale value

Sort of like diamonds, so let’s take these one at a time.

They last forever

Once breach data is published on the internet, there’s no containing it.

This isn’t unique to data breaches, all digital content has the same issue. It’s the reason why online piracy is a big deal: it’s impossible to contain the distribution of digital content once the internet gets a hold of it.

When a DVD-quality version of Expendables 3 was leaked online, Lionsgate was powerless to stop it. The most they could do was arrest the perpetrator, and absorb a $10mln loss. This is a powerful Hollywood studio, with millions to lose, and they couldn’t stop the dissemination of their content.

The whole point of digitizing information is to allow easier duplication and transportation of data. So when you try to ‘contain’ a data breach, you’re not so much trying to out-wit the attackers, as you’re trying to fight the nature of the internet itself.

And if the breached data are immutable (un-changeable) attributes like date of birth, MyKad numbers or blood type, breach victims remain victims — forever.

Given how pervasive the impact of a breach can be, you’d expect them to be rare. But boy, would you be wrong.

They’re common

Nobody has a definitive answer, but Gemalto estimate 918 reported breaches in the first half of 2017, while Verizon count nearly 2,000 breaches in 2016, and neither of these is an exhaustive list.

If you’re reading this, chances are you’re already part of multiple reported breaches, and possibly even more un-reported ones. Check out haveibeenpwned to see if your email has been in other breaches.

The more services we consume, the more databases our data exists in, and the more likely we are to be victims of breaches.

They have little re-sale value

But who’s carrying out these breaches? And what are they using the data for?

The breached data are often traded on internet forums. If credit card data is involved, it often goes to carding forums that specialize in that trade. Depending on the type of the card, these numbers can go for as low as $8.

And that’s credit card data, which can be monetized (relatively!) easily. Personal account data, like MyKad numbers and home addresses, require more effort to monetize, and hence are worth less.

So much less, that in cases like the Malaysian telco breach, they’ve been published online for free!

But why would attackers go through the trouble of ‘hacking’ only to publish it for free?

One word — reputation.

Just because the online forums are anonymous, doesn’t mean reputation doesn’t count. All forums support a pseudonym, and that online identity can gain reputation for uploading leaked databases. Some sites even have a point system based on how many leaked databases you uploaded.

Some users just copy the data from one forum and post it on another, gaining reputation on forums, but in the process guaranteeing the leaked data remains on the internet forever, and not dependent on a single forum or user.

You might be willing to pay thousands of dollars to protect your MyKad number, but it’s traded like Pokémon cards online.

This trifecta of factors means that trying to contain a breach is a fool’s errand, and a non-starter of a solution.

But what happens if the breach occurs, but isn’t yet on the internet? Do companies still have a chance of containing it? As Uber learnt quite recently, the answer is still NO.

Part 3: The peril of hiding data breaches

In 2016, Uber got hacked, and it was a classic case of a single careless error leading to a massive problem. Happens all the time, this I.T thing is hard man, that’s why you pay us the money!

But the controversy wasn’t about the breach, rather Uber’s response to it.

Instead of reporting this to regulators and users, Uber signed up the attackers with an NDA (yes, a legal non-disclosure contract) and reported the extortion as a bug-bounty payment of $100,000.

[Note to companies: Signing up extortionists with legal NDAs isn’t a good breach response]

The contract had specific personal details of the attackers, and once signed, put both Uber and the attacker in a Mutually Assured Destruction (MAD) scenario.

No regulator, driver or rider was informed. There’s a deep philosophical question here, if a breach doesn’t happen on the internet, is it really a breach?

Turns out, the answer is still YES. Uber seemed to have weathered this breach for about a year, until they got a new CEO.

When Dara Khosrowshahi took over Uber, he decided to release a statement informing everyone of the breach. Only problem was that it took him months to do it.

In a strongly worded letter, one US Senator made the following statement to Uber’s new CEO:

While Uber reportedly learned of the breach in November 2016 — and reports indicate you subsequently learned of the breach … in September 2017 — Uber decided not to inform passengers and drivers of the breach until last week. Even more disturbingly, Uber is reported to have shared information concerning the breach with potential investors prior to alerting regulators or affected drivers and passengers, as required under numerous state data breach laws

Joe Sullivan, Uber’s chief security officer, was fired, with some members of his team quitting shortly after.

So remember, if you’re covering something up for your company, one day you’ll get a new CEO, and things might not look so good. Case in point, the Volkswagen coder who helped them cheat on the emissions test. [Spoiler alert: he’s in jail]

Engineers can no longer use the “I just followed orders” excuse. It didn’t work for Joe Sullivan, it didn’t work for the VW coder, and it won’t work for you.

People can forgive a breach; after all, breaches happen every day.

What people find harder to forgive is a deliberate attempt to conceal the truth. That’s unethical.

And while most lawyers in Malaysia, including the Bar Council, are of the opinion that breach notification isn’t part of Malaysian law, I beg to differ. I think the PDPA does mandate breach notifications (just not explicitly), and that’s what we cover next.

Part 4: The Ethics of Breach Notifications

Few lawyers give much thought to data breaches, and hence the legal literature around them is sparse. But I did find this gem of an article from PwC:

…the question of giving of notice to regulators and individuals, the narrow legalistic view would be that the Data Protection Act does not contain an express requirement for notice…

But how does an ethical view alter the situation? The answer might be … the ethically correct thing to do is to give notice, perhaps based on the rationale that notice will reduce the risks of harms…

But surely that can’t be right? The law and ethics cannot deliver conflicting judgments on matters of fundamental importance? Surely the bigger picture requires convergence of results?…

Unlike printed words in legislation, the people who oversee us and sit in judgment over us have ethical content and context….they apply a purposive approach to the interpretation of the law when that is required to deliver just results.  Thus, they can take the view that the legally correct thing to do is the ethical one. At that point breach disclosure is seen as being part of the law, regardless of the narrow picture within the text of the Data Protection Act.

If you’re a victim of a data breach, as nearly all Malaysians are, no provision in the PDPA explicitly grants you a right to be informed. But if we define Data Breaches as “Involuntary Data Disclosures” the picture clears up.

The PDPA states that if someone discloses your data to a 3rd-party, they have to inform you, and seek your consent prior to disclosure. Seems logical, that if they lost (instead of disclosed) that same data, to a malicious 3rd-party, those same principles apply. Obviously consent is moot when it comes to breaches, but the notice and choice principle is meant to inform the data subject, and that principle should still stand.

But I’m no lawyer, so let’s ask a simple question.

Are other jurisdictions, who’ve thought about this long and hard enough, coming to a conclusion that breach victims should be informed, and is there a general trend towards breach notification?

YES and YES!

In the US, such laws exist in most states, and a Federal law is in motion.

In Europe, the GDPR will mandate notifying victims within 72 hours, on top of the existing telecoms package, which already mandates breach notification.

Even in the Philippines, the data privacy commission ordered the Election Commission (COMELEC) to inform breach victims in the wake of their 2016 data breach. The privacy commission also recommended that the chairman of the COMELEC be criminally prosecuted (Criminally!!)

When a company takes my data, they have a moral, ethical and legal obligation to inform me of who they’re giving that data to, and what that 3rd-party is doing with it.

That responsibility cannot vanish just because the data was ‘hacked’.

We cannot reduce the responsibility of data collectors when data is lost to malicious 3rd-parties. If anything, we need to increase their responsibility, to incentivize them to protect data more fervently.

I’m particularly disappointed that no one from the telcos has leaked anonymously to the press, or spoken off the record to reporters. I don’t understand how an entire industry can collectively lack a conscience on the issue. Under normal circumstances, the telcos fall over themselves to assure us that customers are the most important thing to them, and that they commit to transparency.

Customers are important to telcos? Transparency?! Really?!

Because in this instance — the customers were not important at all, and there certainly isn’t any transparency from the telcos. If customers were as important as telcos claim, they’d have informed each and every victim by now, and compensated them in some form.

So I don’t think telcos care about their customers, and I certainly don’t think they’re transparent.

But informing the victims is just the first step in handling a data breach. Next, we explore ways of mitigating the impact to Malaysians from our massive breach.

Part 5: Reducing the impact of breaches

As we discussed in previous posts, eliminating or containing breaches is impossible. Instead, the focus should be mitigation, and that needs to happen at a policy level.

Individually, there’s little you can do to tackle the breach, but the government needs to set up policies to mitigate the impact of the Telco breach, and prepare us for future breaches (and oh yes, there will be more!)

But before that, we need to tackle some flaws with our MyKad, specifically its numbers.

Having a single identifier, like the MyKad number, is a good thing. The Government needs something that uniquely identifies you. It helps ensure you don’t vote twice, and allows tax authorities to audit all your bank accounts.

But an identifier should just identify you, not reveal information.

MyKad as Identifier

Today, MyKad numbers reveal your age, gender and state of birth. Identifiers are usually public knowledge, and hence shouldn’t contain revealing data. If MyKad numbers are designed to be shared, then people should feel comfortable sharing them (like duh!) How many people are comfortable sharing their age?
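To show how little effort it takes to read those details off a number, here’s a minimal sketch. It assumes the commonly described 12-digit layout (YYMMDD, then a 2-digit place-of-birth code, then 4 digits where an odd last digit means male). That layout, the function name decode_mykad and the sample number are assumptions for illustration, not taken from an official specification.

```python
def decode_mykad(number: str) -> dict:
    """Split a MyKad-style number into the details it gives away.

    Assumed layout (not from an official spec): YYMMDD-PB-###G,
    where PB is a place-of-birth code and G is odd for male.
    """
    digits = number.replace("-", "")
    if len(digits) != 12 or not digits.isdigit():
        raise ValueError("expected 12 digits, optionally separated by dashes")
    return {
        # Two-digit year, month and day of birth, kept as strings.
        "birth_yymmdd": (digits[0:2], digits[2:4], digits[4:6]),
        # Code for the state (or country) of birth; not mapped to names here.
        "place_of_birth_code": digits[6:8],
        "gender": "male" if int(digits[11]) % 2 else "female",
    }


# A made-up number, not a real person's.
print(decode_mykad("900101-14-5678"))
```

No lookup, no database and no special access is needed, which is exactly the problem with packing information into an identifier.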

A good example is bank account numbers. Just browse Lelong, where sellers openly publish their bank account numbers, because the only thing anyone can do with them is give them money.

Let’s re-design MyKad numbers to be good identifiers by removing information from them and then let’s stop using them as authenticators.

Authenticators? WTF is that?

An identifier makes a claim of who you are, and might be public knowledge.

An authenticator proves that claim, and must be secret.

Think of identifiers as usernames, while authenticators are the passwords. Unfortunately, we’ve made MyKad numbers both — and obviously something can’t be public knowledge and secret at the same time.

Many organizations assume I’m Keith Rozario, just because I know Keith Rozario’s MyKad number. When I call my bank for information like my account balance, they will usually ‘verify’ me by asking questions like:

  • What’s my MyKad Number
  • What’s my Address
  • How do I usually pay my Credit Card Bill, etc etc.

The answers to the first 2 questions are in the Telco Breach, and freely available online for anyone. And there are an uncountable number of organizations that already have my MyKad number and address. MyKad numbers are NOT secret, they were never designed to be, let’s stop pretending that they are.

A possible alternative is to use a Phone Pin, or a voice password, but let’s move away from MyKad numbers, and onto identity freezes.

Identity Freezes

Today, if you fail to pay your Digi bill, you end up on a blacklist, and no telco will give you a post-paid line.

Systems like CTOS and CCRIS are in place to protect corporations from giving credit to the ‘wrong’ people. Let’s take those same systems, and use them to protect data breach victims as well, which at this point is every Malaysian.

I propose the concept of an identity freeze, where you can freeze your identity from being accessed by anyone. This automatically means banks won’t give you credit, but can also prevent telcos from issuing you accounts (even prepaid ones). Very similar to credit freezes in the US.

Once you freeze your identity, you’d be safely assured that no one can take out loans, credit cards, or phone lines in your name. Today, the only people with that assurance, are people on the blacklist.

The process for unlocking an identity should be painful (relatively speaking), very verbose, and temporary. The default should be frozen unless otherwise requested — and the entire system should be provided for free.
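As a rough sketch of that rule (frozen by default, unlocks are temporary), the logic could look something like this. The class IdentityRecord, the function may_open_account and the field names are all hypothetical; a real system would sit inside something like CTOS or CCRIS and would need proper verification before any unlock.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class IdentityRecord:
    id_number: str
    # None means the identity has never been unlocked, so it is frozen.
    unlocked_until: Optional[datetime] = None

    def unlock(self, now: datetime, duration: timedelta) -> None:
        """Open a temporary window; it closes by itself when it expires."""
        self.unlocked_until = now + duration

    def is_frozen(self, now: datetime) -> bool:
        return self.unlocked_until is None or now >= self.unlocked_until


def may_open_account(record: IdentityRecord, now: datetime) -> bool:
    """A bank or telco would have to pass this check before issuing anything."""
    return not record.is_frozen(now)


record = IdentityRecord("hypothetical-id")
today = datetime(2018, 1, 1)
print(may_open_account(record, today))                       # frozen by default
record.unlock(today, timedelta(days=7))
print(may_open_account(record, today + timedelta(days=1)))   # inside the window
print(may_open_account(record, today + timedelta(days=8)))   # window has expired
```

The point of the design is that forgetting to re-freeze is harmless, because the unlock expires on its own.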

This mitigates the impact of data breaches, giving victims some control of their data, and ensuring the damage of the breach is limited. It reduces the value of data breach information, because it would take a lot more to monetize the data, which in turn would probably(!) reduce the number of breaches themselves.

Though again, once that breach data hits the internet, there’s no stopping it.

But mitigating the impact to victims is one thing. Many are also interested in holding companies accountable for this breach. Data breaches are the new normal, but companies shouldn’t be let off scot-free. In the next post, we explore how to determine if the company has taken reasonable steps to protect your data.

Part 6: Practical Steps for Security

When it comes to security, the PDPA is rather light. I won’t waste space quoting it, but the crux is that companies that have personal data must take practical steps to secure the data. Fortunately, those practical steps are not defined, because technology always outpaces legislation.

The PDPA instead outlines principles, and leaves it to judges and regulators to constantly redefine what is practical (or not). It’s unfair to judge a security incident from 4 years ago using the knowledge we have today, because the risk profile and technicalities surrounding data breaches have evolved immensely.

It’s yet another reason why disclosing a breach as soon as it’s discovered is in the organization’s own interest. It means you get judged ‘in your own time’, and not 3 years down the road, when people are more paranoid, and security levels are much higher.

But that still begs the question of what are practical steps?

Instead of focusing on firewalls, and anti-virus, and block-chain (gasp!), the focus should be on simple questions that any layman can understand.

  1. What is the gross revenue and profit of the company?
  2. Who’s the highest ranking officer in the organization that looks purely at security?
  3. What are the resources (budget and people) allocated to that ranking officer?
  4. What were the recommendations of that team prior to the breach, and which of them were (or were not) followed?

If the highest ranking security officer is Freddy, who just graduated in marketing last year, then that’s not very practical. Or if the security officer’s team consists of two interns, both of whom aspire to be copy-writers, that’s not very practical either.

You get the picture.

It’s only AFTER we’ve established that the security officer had sufficient rank, and that a reasonable amount of resources and budget were allocated, that we need to go into the specifics.

And, we will rarely need specifics. (trust me!)

Equifax, a company whose revenue runs into the Billions (!), laid the blame for their massive breach on a single employee. Tech Crunch reported:

“The human error was that the individual who’s responsible for communicating in the organization to apply the patch, did not,”

The notion that just one person didn’t do their job and led to the biggest breach in history is quite an amazing claim and shows a fundamental lack of good security practices.

If a billion dollar organization left patching of critical software to Bob from accounting, we don’t need to ask about firewalls now, do we?

Part 7: Conclusion

This was a 7-part post, with a total word count exceeding 3700, and if you’ve come this far you deserve nothing less than a solid hand-shake, a pat on the back, and a freaking 5-star medal. Unfortunately, I can only afford the first two, so if you ever see me in person, come on over and claim your prize.

So let’s wrap this baby up.

Data breaches are the new normal, and people have accepted that. But it doesn’t mean they aren’t damaging, both to victims and to the organizations involved. Once a breach happens, there’s no containing it, so affected organizations should quickly move to inform the victims.

And even though current laws in Malaysia don’t explicitly require breach notifications, applying the principles in the PDPA will lead us to conclude that breach notification is indeed mandatory, and that is in line with the global trend.

To mitigate the impact of data breaches, regulators need to revamp the mechanisms and systems we have in place. Specifically, the MyKad numbering system desperately needs a revamp, and we need to implement a robust identity freezing system that will allow average citizens better control of their data.

In the end, we will eventually learn to build systems that handle breaches effectively, but until then victims will continue to suffer. Here’s hoping we reach that point sooner rather than later.

THE END. (phew! that was a long one)

Got any thoughts around breaches?

Hit me up on email keith [at] keithrozario.com, or find me on twitter @keithrozario.

As a bonus, here’s a short piece about false prepaid registrations you might want to read, to help untangle the giant hairball that is the Malaysian Telco breach.

Part 8: False prepaid registrations

Consider this a bonus piece from my long thoughts about data breaches. You might want to read the older post before reading this. So let’s dive in.

The telco breach was a giant hairball of issues, and one of the strands in the hairball is false prepaid registrations.

Immediately after I released sayakenahack, people reported that they were seeing additional numbers linked to their MyKad numbers. From The Star:

Malaysian Communications and Multimedia Commission (MCMC) network security and enforcement sector chief officer Zulkarnain Mohd Yassin said it would most likely be a case of other people using another person’s identity to register.

“We are serious about this. That’s why you see many compounds issued by the MCMC to service providers in respect of non-compliance with the guidelines of prepaid registrations,” he said.

He’s right, telcos have been issued summons for false registrations every year from 2014 to 2017, with Tune Talk chief executive officer Jason Lo telling Digital News Asia (DNA):

…although there are many systems in place to ensure registrations are as accurate as possible, with a network of thousands of dealers, it can be hard to monitor every one

The Malaysian Telco Breach was two issues. One was the chronic problem of false prepaid registration, and two, the breach itself. The former is not a trivial issue, because the Evidence Act in Malaysia states:

A person who is registered with a network service provider as a subscriber…on which any publication originates from is presumed to be the person who published …unless the contrary is proved.

Hence, if a phone number, that is registered to you, is publishing seditious statements on WhatsApp you would be deemed to have published them. And the onus is on YOU, to prove otherwise, a guilty till proven innocent kinda law.

So what do we do?

In I.T we have a saying, if you can’t prevent, at least detect.

So if we can’t prevent false registrations, we should at least allow for victims to check regularly.

But how to check?

Solutions that scale

The Malaysian Reserve quoted one expert saying that we should all call our mobile providers to find out. The expert added that it took him ‘only’ 20 minutes to do so.

Only 20 minutes? Only??!!

Malaysia has 10 different telcos. If calling one takes 20 minutes, calling all of them would take over 3 hours. That’s too high a price just to check if you’re part of the breach. No wonder nobody has bothered.

The telco breach had millions of records. If we assume that 20 million victims each made these 3-hour calls, that’s 60 million man-hours spent.

Even if the telcos collectively dedicated 3000 people, working 8-hour shifts, 24×7, it would take 7 years to inform all the victims. If those 3000 people were paid a monthly salary of RM1000, the cost for labour alone would be RM250 million.
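For anyone who wants to check the back-of-envelope numbers, here’s a small sketch of the arithmetic. The function name manual_check_cost is made up, and the inputs are simply the assumptions stated above (10 telcos, 20 minutes a call, 20 million victims, 3000 staff on 8-hour shifts at RM1000 a month), with the 200 minutes of calls rounded down to 3 hours as in the text.

```python
def manual_check_cost(victims, telcos=10, minutes_per_call=20, staff=3000,
                      shift_hours=8, monthly_salary_rm=1000):
    """Return (total man-hours, years needed, labour cost in RM)."""
    # 10 telcos x 20 minutes = 200 minutes, rounded down to 3 hours per victim.
    hours_per_victim = telcos * minutes_per_call // 60
    total_hours = victims * hours_per_victim
    # Each staff member handles calls for one 8-hour shift, every day of the year.
    days = total_hours / (staff * shift_hours)
    years = days / 365
    labour_cost_rm = staff * monthly_salary_rm * 12 * years
    return total_hours, years, labour_cost_rm


total_hours, years, cost = manual_check_cost(victims=20_000_000)
print(f"{total_hours:,} man-hours, {years:.1f} years, RM{cost / 1e6:.0f} million")
```

Run as-is, this works out to 60 million man-hours, about 6.8 years and roughly RM247 million, which is where the rounded figures of 7 years and RM250 million come from.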

Any solution that requires victims to place phone calls, will fail, because the scale of the breach cannot be solved manually. A solution that would work for 1000 victims may not work for 20 million.

The solution should be, oh, I don’t know….something like a central website, where you type in data, and get an automated response with no human intervention, and maybe it would be able to verify your phone number through a One-Time-Password if the owner had cash.

Final Disclaimer

Some have suggested my data isn’t 100% accurate, and accused me of sharing inaccurate data.

They’re right. Three things though.

One, I’ve never claimed sayakenahack was 100% accurate. I’m just claiming that I found data online, some of which has your MyKad number on it. Whether that data accurately represents what is (or was) in the telco database is not something I can guarantee.

Two, because of false prepaid registration, nobody can be absolutely sure of all the numbers registered in their name, unless they go to each and every telco physically.

Three, MCMC has promised to resolve the issue of false prepaid registration by 1st December 2017 (yes, that date is past), including requiring MyKad registration for top-ups. I’m not sure if that has happened yet.

Writing Millions of rows into DynamoDB

While designing sayakenahack, the biggest problem I faced was trying to write millions of rows efficiently into DynamoDB. I slowly worked my way up from 100 rows/second to around the 1500 rows/second range, and here’s how I got there.

Work with Batch Write Item

The first mistake I made was a data modelling error. Sayakenahack was supposed to take a single field (IC Number) and return the results of all phone numbers in the breach. So I initially modeled the phone numbers as an array within an item (what you’d call a row in regular DB speak).

Strictly speaking this is fine. DynamoDB has an update command that allows you to update/insert an existing item. Problem is that you can’t batch an update command, so each update command can only update/insert one item at a time.

Running a script that updated one row in DynamoDB (at a time) was painfully slow: around 100 items/second on my machine. Even when I copied that script to an EC2 instance in the same datacenter as the DynamoDB table, I got no more than 150 items/second.

At that rate, a 10 million row file would take over 18 hours to insert. That wasn’t very efficient.

So I destroyed the old paradigm, and re-built.

Instead of phone numbers being arrays within an item, phone numbers were the item itself. I kept IC Number as the partition key (which isn’t what Amazon recommends), which allowed me to query for an IC Number and get an array of items.

This allowed me to use DynamoDB’s batch_write_item functionality, which does up to 25 requests at once (up to a maximum of 16MB). Since my items weren’t anywhere near 16MB, I would theoretically get a 25-fold increase in speed.
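Here’s a minimal sketch of what batching looks like with the low-level boto3 client. It is not the actual sayakenahack code: the attribute names ic_number and phone_number, the use of phone_number as a sort key, and the helper names are assumptions for illustration. The client is passed in, so you’d create it yourself with boto3.client("dynamodb").

```python
from itertools import islice

BATCH_LIMIT = 25  # BatchWriteItem takes at most 25 put/delete requests per call


def chunked(rows, size=BATCH_LIMIT):
    """Yield lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def to_put_request(ic_number, phone_number):
    # Hypothetical schema: ic_number is the partition key, phone_number the sort key.
    return {"PutRequest": {"Item": {
        "ic_number": {"S": ic_number},
        "phone_number": {"S": phone_number},
    }}}


def write_rows(client, table_name, rows):
    """Write (ic_number, phone_number) pairs in batches; return the number of API calls."""
    calls = 0
    for batch in chunked(rows):
        pending = {table_name: [to_put_request(ic, phone) for ic, phone in batch]}
        while pending:
            response = client.batch_write_item(RequestItems=pending)
            calls += 1
            # Anything DynamoDB could not process comes back here and is resent.
            pending = response.get("UnprocessedItems") or {}
    return calls
```

The retry loop here is deliberately naive. A real loader would want a backoff and a retry cap, and would need to make sure the same key doesn’t appear twice in one batch, since DynamoDB rejects batches containing duplicate keys.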

In practice though, I got ‘just’ a 10-fold increase, allowing me to write 1000 items/second, instead of 100. This meant I could push through a 10 million row file in under 3 hours.
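The load times quoted in this post are simple division, and this small sketch reproduces them for a 10 million row file at the three write rates mentioned. The function name hours_to_load is just for illustration.

```python
def hours_to_load(rows: int, rows_per_second: float) -> float:
    """Hours needed to write `rows` items at a steady write rate."""
    return rows / rows_per_second / 3600


ROWS = 10_000_000
for rate in (100, 150, 1000):
    print(f"{rate:>5} rows/s -> {hours_to_load(ROWS, rate):.1f} hours")
```

That gives roughly 27.8 hours at 100 rows/second, 18.5 hours at 150, and 2.8 hours at 1000.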

First rule of thumb when trying to write lots of rows into DynamoDB: make sure the data is modeled so that you can batch insert, anything else is painfully slow.