Gov.My TLS audit: Version 2.0

Last week I launched a draft of the Gov.my Audit, and this week we have version 2.0

Here’s what changed:

  1. Added More Sites. We now scan a total of 1324 government websites, up from just 1180.
  2. Added Shodan Results. Results includes both the open ports and time of the Shodan scan (scary shit!)
  3. Added Site Title. Results now include the HTML title to give a better description of the site (hopefully!).
  4. Added Form Fields. If the page on the root directory has an input form, the names of the fields will appear in the results. This allows for a quick glance at which sites have forms, and (roughly!) what the form ask for (search vs. IC Numbers).
  5. Added Domain in the CSV. The CSV is sorted by hostname, to allow for grouping by domain names (e.g. view all sites from selangor.gov.my or perlis.gov.my)
  6. Added an API. Now you can query the API can get more info on the site, including the cert info and HTTP headers.
  7. Released the Serverless.yml files for you to build the API yourself as well 🙂

All in all, it’s a pretty bad-ass project (if I do say so myself). So let’s take all that one at a time.

Continue reading

I scanned 1000 government sites, what I found will NOT shock you

Previously, I moaned about dermaorgan.gov.my, a site that was probably hacked but was still running without basic TLS. This is unacceptable, that in 2018, we have government run websites, that ask for personal information, running without TLS.

So I decided to check just how many .gov.my sites actually implemented TLS, and how many would start being labled ‘not secure’ by Google in July. That’s right, Google will start naming and shaming sites without TLS, so I wanted to give .gov.my sites the heads up!

Why check for TLS?

TLS doesn’t guarantee a site is secure (nothing does!), but a site without TLS signals lack of care from the administrator. The absence of TLS is an indicator of just how lightly the security of the servers has been taken.

Simply put, TLS is necessary for not sufficient for security — and since it’s the easiest thing to detect for, without running intrusive network scans, it seems like the best place to start.

How I checked for TLS?

But first I needed a list of .gov.my sites.

To do that, I  wrote a web-crawler that started with a few .gov.my links, and stored the results. It then repeated the process for the links, the links of the links…and so forth. After 3 iterations, I ended with 20,000 links from 3,000+ individual hostnames (a word I wrongly use in place of FQDN, but since the code uses hostnames, I’m sticking to it for now — please forgive me networking nerds)

I then manually filtered the hostnames to those from a .gov.my or .mil.my domain and scanned them for a few things:

  • Does it have a https website ( if it doesn’t redirect)
  • Does it redirect users from http to https
  • Does the https site have a valid certificate
    • Does it match the hostname
    • Does it have a fresh certificate (not expired)
    • Can the certificate be validated — this required all intermediary certs to be present
  • What is the IP of the site
  • What is the asn of the IP
  • What is the server & X-Powered-By headers returned by the host

Obviously, as I was coding this, my mind got distracted and I actually collected quite a bit more data, but those fields are in the csv for you the Excel the shit out off! The repository contains both a json and jsonl file that has more data.

Now onto the results

Continue reading

Another Day, Another breach

220,000 is a lot of people. It’s the population of a small town like Taiping, and roughly twice the capacity of Bukit Jalil Stadium.

Yet today, a data breach of this size, barely registers in the news-cycle. After all, the previous data breach was 200 times bigger, and occurred just 3 months ago. How could we take seriously something that occurs so frequently, and on a scale very few comprehend.

Individually, each breach is not particularly damaging, it’s a thin thread of data about victims, but they do add up. Criminals use multiple breaches, and stitch together a fabric of the victims identity, eventually being able to forge credit card applications in their name, or to perform typical scams.

But if you’re thinking of avoiding being in a breach, that’s an impossible task. The only Malaysians that weren’t part of the telco breach, were those without mobile phones. In the organ donor leak, the victims were kind-hearted souls who were innocent bystanders in the war between attackers and defenders on the internet.

The only specific advice that would work, would be to not subscribe to mobile phone accounts and don’t pledge your organs. That is not useful advice.

I wanted this post to be about encouraging people to stop worrying about data breaches, and move on with their lives. To accept that the price of living in a hyper-connected world, is that you’ll be data breach victim every now and then — I wanted to demonstrate this by actually going out and pledging my organs to show that we shouldn’t be afraid.

But when I went to the Malaysian organ donation website (demarorgan.gov.my), I was greeted by all too common “Connection is Not Secure” warning. Which just made my head spin!

Continue reading

That long post about Data breaches (you never wanted to read!)

Part 1: An intro to Data Breaches

Let’s start with some basics. What is a Data Breach?

According to Verizon, a data breach is when you’ve confirmed that data has been lost to an attacker, while a data incident is merely something that ‘may’ result in a breach.

An incident is when a laptop goes missing from your company’s office.

A breach is when the data on that laptop is published online.

Breaches can be negligent or malicious in nature. An employee accidentally sending staff details to a vendor, would be negligent, but when someone breaks into your office to steal your laptops, that’s malicious. Usually breaches require both negligent and malicious elements, and rarely do we see cases of elite nation-state actors hacking into orgazations.

But just because incidents happen before breaches, it doesn’t mean we learn about them in that order. Sometimes we find the lost data first, and then scramble backwards to determine the incident.

Imgur, a popular image sharing site that lost 1.7mln records, were made aware of data breach from Troy Hunt, and had to go into damage control, presumably before knowing of what incident caused it.

When the Philippines Election Commission (COMELEC) website was hacked, it was apparent what had happened, hackers defaced the entire site, hence the incident and breach happened simultaneously. For Equifax, the time between incident and breach was a few months (at least that’s what Equifax say), and for the Malaysian Telco breach, the gap was years.

I’m not here to talk about criminal investigations, but cut the Malaysian police some slack, its difficult to investigate a 3 year old crime. Sometimes IT pros can’t investigate bugs that occurred 3 days ago, because logs were already purged, what more 3 years.

The new normal

Data breaches are the new ‘normal’, an accepted risk of living in today’s hyper-connected world.

The price of our shiny new toys, like iPhones, Uber, and foodpanda, is the risk of data breaches. For all the hullabaloo around Uber, people are still going to use their service, because let’s face it, Uber is still better than no Uber.

All the new online services, that make our lives better, require us to share some personal data with them. And that implicitly means, that there is a risk that data is lost.

Impact to companies

But just because Data breaches are normal, doesn’t mean they don’t impact the companies that experience them.

Last year, the Journal of Cybersecurity (yes, there is such a thing), published a study suggesting damage from data breaches to companies were minimal:

Specifically, we find that the cost of a typical cyber incident in our sample is less than $200 000 (about the same as the firm’s annual IT security budget), and that this represents only 0.4% of their estimated annual revenues.

… much lower than retail shrinkage (1.3%), online fraud (0.9%), and overall rates of corruption, financial misstatements, and billing fraud (5%)

Still, while the initial cost is paltry, the long term consequence isn’t.  Comparitech did an analysis of data breaches and the effect it had on share prices:

Breached companies tend to underperform the NASDAQ. They recover to the index’s performance level after 38 days on average, but after three years the NASDAQ ultimately outperforms them by a margin of over 40 percent. (emphasis mine)

In other words, the effect of a breach causes stock prices to immediately drop, then recover, but subsequently get pummeled.

However, almost all studies reveal that the damage from data breaches to companies are reducing over time!

From comparitech:

The most notable result is older breaches met with a stronger initial reaction than newer breaches. One theory is that … they become more common. This causes a “breach fatigue” … in which investors are less shaken by data breaches as time goes on.

And the journal:

The litigation rate for all cyber events has been generally decreasing…the litigation rate for data breaches was around 20% in 2004, but has fallen to about 5% in 2014.

Conclusion?

Both the market, and the general population have become more accepting of breaches.

So if breaches are bad, and they’re the new normal, can we at least contain them. As we’ll see next, the answer is a resounding no.

Part 2: Data Breaches cannot be contained

Just because they’re normal doesn’t mean data breaches are harmless, modern breaches share 3 common traits that make uncontainable.

  • They last forever
  • They’re very common
  • They have little re-sale value

Sort of like diamonds, so let’s take these one at a time.

They last forever

Once a breach data is published on the internet, there’s no containing it.

This isn’t unique to data breaches, all digital content has the same issue. It’s the reason why online piracy is a big deal — it’s impossible to contain the distribution digital content once the internet gets a hold of it.

When a DVD-quality version of Expendables 3 was leaked online, Lionsgate was powerless to stop it. The most they could do was arrest the perpretrator, and absorb a $10mln loss. This is a powerful hollywood studio, with millions to lose, and they couldn’t stop the dissemation of their content.

The whole point of digitizing information, is to allow easier duplication and transportation of data. So when you try to ‘contain’ a data breach , you’re not so much trying to out-wit the attackers, as you’re trying to fight the nature of the internet itself.

And if the breached data are immutable (un-changeable) attributes like date of birth, MyKad numbers or blood type, breach victims remain victims — forever.

Given how pervasive the impact of a breach can be, you’d expect them to be rare. But boy, would you be wrong.

They’re common

Nobody has a definitive answer, but Gemalto estimate 918 reported breaches in the first half of 2017, while Verizon count nearly 2,000 breaches in 2016, and none of these are exhaustive list.

If you’re reading this, chances are you’re already part of multiple reported breaches, and possibly even more un-reported ones. Check out haveibeenpwned to see if your email has been in other breaches.

They more services we consume, the more databases our data exist in, the more likely we are to be victims of breaches.

They have little re-sale value

But who’s carrying out these breaches? and what are they using the data for?

The breached data are often traded on internet forums. If credit card data involved, they often go to carding forums that specialize in their trade. Depending on the type of the card, these numbers can go for as low as $8.

And that’s credit card data which can be monetized (relatively!) easily. Personal account data, like MyKad Numbers and home addresses, require more effort to monetize, and hence, worth less.

So much less, that in cases like the Malaysian telco breach, they’ve been published online for free!

But why would attackers go through the trouble of ‘hacking’ only to publish it for free?

One word — reputation.

Just because the online forums are anonymous, doesn’t mean reputation doesn’t count. All forums support a pseudonym, and that online identity can gain reputation for uploading leaked database. Some sites even have a point system based on how many leaked databases you uploaded.

Some users just copy the data from one forum and post it on another, gaining reputation on forums, but in the process guaranteeing the leaked data remains on the internet forever, and not dependent on a single forum or user.

You might be willing to pay thousands of dollars to protect MyKad number, but it’s traded like pokemon cards online.

This tri-fecta of factors, means that trying to contain breach is a fools errand, and a non-starter of a solution.

But what happens if the breach occurs, but isn’t yet on the internet? Do companies have still have a chance of containing it? As Uber learnt quite recently, the answer is still NO.

Part 3: The peril hiding data breaches

In 2016, Uber got hacked, and it was a classic case of a single careless error leading to a massive problem. Happens all the time, this I.T thing is hard man, that’s why you pay us the money!

But the controversy wasn’t about the breach, rather Uber’s response to it.

Instead of reporting this to regulators and users, Uber signed up the attackers with an NDA (yes, a legal non-disclosure contract) and reported the extortion as a bug-bounty payment of $100,000.

[Note to companies: Signing up extortionist with legal NDAs isn’t a good breach response]

The contract had specific personal details of the attackers, and once signed, put both Uber and the attacker  in a Mutually Assured Destruction (MAD) scenario.

No regulator, driver or rider was informed. There’s a deep philosophical question here, if a breach doesn’t happen on the internet, is it really a breach?

Turns out, the answer is still YES. Because Uber seemed had weathered this breach, for about a year — Until they got a new CEO.

When Dara Khosrowshahi took over Uber, he decided to release a statement informing everyone of the breach. Only problem was that it took him months to do it

In a strongly worded letter, one US Senator made the following statement to Uber’s new CEO:

While Uber reportedly learned of the breach in November 2016 — and reports indicate you subsequently learned of the breach … in September 2017 — Uber decided not to inform passengers and drivers of the breach until last week. Even more disturbingly, Uber is reported to have shared information concerning the breach with potential investors prior to alerting regulators or affected drivers and passengers, as required under numerous state data breach laws

Joe Sullivan, Uber’s chief security officer was fired, with some members of his team quitting shortly after.

So remember, if you’re covering something up for your company, one day you’ll get a new CEO, and things might not look so good. Case in point, the VolksWagon coder who helped them cheat on the emissions test. [Spoiler alert: he’s in jail]

Engineers can no longer use the “I just followed orders” excuse, didn’t work for Joe Sullivan, didn’t work for the VW coder, and it won’t work for you.

People can forgive a breach, after all breaches happen everyday,

What people find harder to forgive is a deliberate attempt to conceal the truth. That’s unethical.

And while most lawyers in Malaysia, including the Bar Council are of the opinion that the breach notification isn’t part of Malaysian law, I beg to differ, I think the PDPA does mandate breach notifications (just not explicitly), and that’s what we cover next.

Part 4: The Ethics of Breach Notifications

Few lawyers give much thought to data breaches, and hence the legal literature around them is sparse. But I did find this gem of an article from PWC:

…the question of giving of notice to regulators and individuals, the narrow legalistic view would be that the Data Protection Act does not contain an express requirement for notice…

But how does an ethical view alter the situation? The answer might be … the ethically correct thing to do is to give notice, perhaps based on the rationale that notice will reduce the risks of harms…

But surely that can’t be right? The law and ethics cannot deliver conflicting judgments on matters of fundamental importance? Surely the bigger picture requires convergence of results?…

Unlike printed words in legislation, the people who oversee us and sit in judgment over us have ethical content and context….they apply a purposive approach to the interpretation of the law when that is required to deliver just results.  Thus, they can take the view that the legally correct thing to do is the ethical one. At that point breach disclosure is seen as being part of the law, regardless of the narrow picture within the text of the Data Protection Act.

If you’re a victim of a data breach, as nearly all Malaysians are, no provision in the PDPA explicitly grants you a right to be informed. But if we define Data Breaches as “Involuntary Data Disclosures” the picture clears up.

The PDPA states that if someone discloses your data to a 3rd-party, they have to inform you, and seek your consent prior to disclosure. Seems logical, that if they lost (instead of disclosed) that same data, to a malicious 3rd-party, those same principles apply. Obviously consent is moot when it comes to breaches, but the notice and choice principle is meant to inform the data subject, and that principle should still stand.

But I’m no lawyer, so let’s ask a simple question.

Are other jurisdictions, who’ve thought about this long and hard enough, coming to a conclusion that breach victims should be informed, and is there a general trend towards breach notification?

YES and YES!

In the US, such laws exists in most states, and a Fedaral law is in motion.

In Europe, the GDPR will mandate notifying victims within 72 hours, on top of the existing telecoms package, which already mandates breach notification.

Even in the Philippines the Data privacy commission ordered the Election Commission (COMELEC) inform breach victims, in the wake of their 2016 data breach. The privacy commission also recommended that the chairman of the COMELEC be criminally prosecuted (Criminally!!)

When a company takes my data, they have a moral, ethical and legal obligation to inform me of who they’re giving that data to, and what that 3rd-party is doing with it.

That responsibility cannot vanish just because the data was ‘hacked’.

We cannot reduce the responsibility of data collectors when data is lost to malicious 3rd-parties, if anything we need to increase their responsibility, to incentivize them to more fervently protect data.

I’m particularly disappointed that no one from the telcos have leaked anonymously to the press, or spoken off the record to reporters. I don’t understand how an entire industry can collective lack conscience on the issue. Under normal circumstances, the telcos fall over themselves to assure us that customers are the most important thing to them, and that they commit to transparency.

Customers are important to telcos? Transparency?! Really?!

Because in this instance — the customers were not important at all, and there certainly isn’t any transparency from the telcos. If customers were as important as telcos claim, they’d have informed each and every victim by now, and compensated them in some form.

So I don’t think telcos care about their customers, and I certainly don’t think their transparent.

But informing the victims are just the first step in handling a data breach. Next, we explore ways of mitigating the impact to Malaysians from our massive breach.

Part 5: Reducing the impact of breaches

As we discussed in previous posts, eliminating or containing breaches are impossible, instead the focus should be mitigation, and that needs to happen at a policy level.

Individually, there’s little you can do to tackle the breach, but the government needs to setup policies to mitigate the impact of the Telco breach, and prepare us for future breaches (and oh yes, there will be more!)

But before that, we need to tackle some flaws in with our MyKad, specifically their numbers.

Having a single identifier, like the MyKad number, is a good thing, the Government needs something that uniquely identifies you. It helps ensure don’t vote twice, or allow tax authorities to audit all your bank accounts.

But an identifier, should just identify your, not reveal information.

MyKad as Identified

Today, MyKad numbers reveal your age, gender and state of birth. Identifiers are usually public knowledge, and hence shouldn’t contain revealing data. If MyKad numbers are designed to be shared, then people should feel comfortable sharing them (like duh!) How many people are comfortable sharing their age?

A good example are bank account numbers, just browse Lelong, where sellers openly publish their bank account numbers — because the only thing anyone can do with them, is give them money.

Let’s re-design MyKad numbers to be good identifiers by removing information from them and then let’s stop using them as authenticators.

Autheticators? WTF is that?

An identifier makes a claim of who you are, and might be public knowledge.

An authenticator, proveJust saying!s that claim, and must be secret.

Think of identifiers as usernames, while authenticators are the passwords. Unfortunately, we’ve made MyKad numbers both — and obviously something can’t be public knowledge and secret at the same time.

Many organizations assume I’m Keith Rozario, just because I know Keith Rozario’s MyKad number. When I call my bank for information like my account balance, they will usually ‘verify’ me by asking questions like:

  • What’s my MyKad Number
  • What’s my Address
  • How do I usually pay your Credit Card Bill, etc etc.

The answers to the first 2 questions are in the Telco Breach, and freely available online for anyone. And there are  an uncountable number or organizations that already have my MyKad number and address. MyKad numbers are NOT secret, they were never designed to be, let’s stop pretending that they are.

A possible alternative is to use a Phone Pin, or a voice password but let’s move on away from MyKad numbers, and onto identity freezes.

Identity Freezes

Today, If you fail to pay your Digi bill, you end up on a blacklist, and no telco will give you a post-paid line.

Systems like CTOS and CCRIS, are in place to protect corporations from giving credit to the ‘wrong’ people.Let’s take those same systems, and use them to protect data breach victims as well, which at this point is every Malaysian.

I propose the concept of an identity freeze, where you ca FUCKING.n freeze your identity from being accessed by anyone. This automatically means banks won’t give you credit, but can also prevent telcos from issuing you accounts (even prepaid ones). Very similar to credit freezes in the US.

Once you freeze your identity, you’d be safely assured that no one can take out loans, credit cards, or phone lines in your name. Today, the only people with that assurance, are people on the blacklist.

The process for unlocking an identity should be painful (relatively speaking), very verbose, and temporary. The default should be frozen unless otherwise requested — and the entire system should be provided for free.

This mitigates the impact of data breaches, giving victims some control of their data, and ensuring the damage of the breach is limited. It reduces the value of data breach information, because it would take a lot more to monetize the data, which in turn would probably(!) reduce the number of breaches themselves.

Though again, once that breach data hits the internet, there’s no stopping it.

But mitigating impacts to victims is one thing, many are interested in holding companies accountable for this breach. Data breaches are the new normal, but companies shouldn’t be let off scot-free. In the next post, we explore how to determine if the company has taken reasonable steps to protect your data.

Part 6: Practical Steps for Security

When it comes to security, the PDPA is rather light. I won’t waste space quoting it , but the crux is that companies that have personal data must take practical steps to secure the data. Fortunately, those practical steps are not defined, because technology always outpaces legislature.

The PDPA instead outlines principles, and leaves it to judges and regulators to constantly redefine what is practical (or not). It’s unfair to judge a security incident from 4 years ago, using the knowledge we have today, the risk profile and technicalities surrounding data breaches have evolved immensely.

It’s yet, another reason, why disclosing a breach as soon as it’s discovered is in the organizations own interest. It means you get judged ‘in your own time’ , and not 3 years down the road, when people are more paranoid, and security levels are much higher.

But that still begs the question of what are practical steps?

Instead of focusing on firewalls, and anti-virus, and block-chain (gasp!), the focus should be on simple questions that any layman can understand.

  1. What is the gross revenue and profit of the company?
  2. What’s the highest ranking officer in the organization that looks purely at security?
  3. What are the resources (budget and people) allocated to to that ranking officer?
  4. What were the recommendations of that team prior to the breach, and which of them were (or were not) followed?

If the highest ranking security officer is Freddy who just graduated in marketing last year, then that’s not very practical, or If the team the security officer team consists of two interns, both of whom were aspire to be copy-writers, that’s not very practical.

You get the picture.

It’s only AFTER we’ve established that the security officer had sufficient rank, and that a reasonable amount of resources and budget were allocated, do we need go into the specifics.

And, we will rarely need specifics. (trust me!)

Equifax , a company whose revenue runs into the Billions (!) , laid the blame for their massive breach down to a single employee, Tech Crunch reported:

“The human error was that the individual who’s responsible for communicating in the organization to apply the patch, did not,”

The notion that just one person didn’t do their job and led to the biggest breach in history is quite an amazing claim and shows a fundamental lack of good security practices.

If a billion dollar organization, left patching of critical software to Bob from accounting, we don’t need to ask about firewalls now, do we?

Part 7: Conclusion

This was a 7-part post, with a total word count exceeding 3700, and if you’ve come this far you deserve nothing less than a solid hand-shake, pat on the back, and a freaking 5-star medal. Unfortunately, I can only afford first two, if you ever see me in the person, come on over and claim your prize.

So let’s wrap this baby up.

Data breaches are the new normal, and people have accepted that. But it doesn’t mean they aren’t damaging both to victims and the organizations involved. Once a breach happens, there’s no containing it, affected organizations involve should quickly move to inform the victims.

And even though current laws in Malaysia don’t explicitly require breach notifications, applying the principles in the PDPA, will lead us to conclude that breach notification is indeed mandatory and that is inline with the global trend.

To mitigate the impact of data breaches, regulators need to revamp the mechanism and systems we have in place. Specifically the MyKad numbering system which desperately needs a revamp, and implementing a robust identity freezing system, that will allow average citizens better control of their data.

In the end, we will eventually learn to build systems that handles breaches effectively, but until then victims will continue to suffer. Here’s hoping we reach that point sooner rather than later.

THE END. (phew! that was a long one)

Got any thoughts around breaches?

Hit me up on email keith [at] keithrozario.com , or find me on twitter @keithrozario.

As a bonus, here’s a short piece about false prepaid registrations you might want to read, to help untangle the giant hairball that is the Malaysian Telco breach.

Part 8: False prepaid registrations

Consider this a bonus piece from my long thoughts about data breaches. You might the older post before reading this. So let’s dive in.

The telco breach was a giant hairball of issues, and one of the strands in the hairball is false prepaid registrations.

Immediately after releasing sayakenahack, people reported that they were seeing additional numbers linked to their mykad numbers. From TheStar:

Malaysian Communications and Multi­media Commission (MCMC) network security and enforcement sector chief officer Zulkar­nain Mohd Yassin said it would most likely be a case of other people using another person’s identity to register.

“We are serious about this. That’s why you see many compounds issued by the MCMC to service providers in respect of non-compliance with the guidelines of prepaid registrations,” he said.

He’s right, telcos have been issued summons for false registrations every year from 2014 to 2017, withTune Talk chief executive officer Jason Lo telling Digital News Asia (DNA):

…although there are many systems in place to ensure registrations are as accurate as possible, with a network of thousands of dealers, it can be hard to monitor every one

The Malaysian Telco Breach was two issues. One was the chronic problem of false prepaid registration, and two, the breach itself. The former is not a trivial issue, because the Evidence Act in Malaysia states:

A person who is registered with a network service provider as a subscriber…on which any publication originates from is presumed to be the person who published …unless the contrary is proved.

Hence, if a phone number, that is registered to you, is publishing seditious statements on WhatsApp you would be deemed to have published them. And the onus is on YOU, to prove otherwise, a guilty till proven innocent kinda law.

So what do we do?

In I.T we have a saying, if you can’t prevent, at least detect.

So if we can’t prevent false registrations, we should at least allow for victims to check regularly.

But how to check?

Solutions that scale

The Malaysian Reserve, quoted one expert saying that we should all call our mobile providers to find out, the expert added that it took him ‘only’ 20 minutes to do so.

Only 20 minutes? Only??!!

Malaysia has 10 different Telcos, if calling one takes 20 minutes, calling all of them would take 3 hours. That’s too high a price just to check if you’re part of the breach. No wonder nobody has bothered.

The telco breach had millions of records, If we assume that 20 million victims made these 3 hour calls, that’s 60 million man hours spent.

Even if the telco’s collectively dedicate 3000 people, working 8 hour shifts, 24×7. It would take 7 years to inform all the victims. If those 3000 people were paid a monthly salary of RM1000, the cost for labour alone would be RM250 million.

Any solution that requires victims to place phone calls, will fail, because the scale of the breach cannot be solved manually. A solution that would work for 1000 victims may not work for 20 million.

The solution should be, oh, I don’t know….something like a central website, where you type in data, and get a automated response with no human intervention, and maybe it would be able to verify your phone number through a One-Time-Password if the owner had cash.

Final Disclaimer

Some have suggested my data isn’t 100% accurate, and accused me to sharing inaccurate data.

They’re right. Three things though.That hasn’t happened.

One, I’ve never claimed sayakenahack was 100% accurate, I’m just claiming, that I found data online, some of which has your myKad number on it. Whether that data represents accurately what is (or was) in the telco database, is not something I can guarantee.

Two, because of false prepaid registration, nobody can be absolutely sure of all the numbers registered in their name, unless they go to each and every telco physically.

Three, MCMC has promised to resolve the issue of false prepaid registration by 1st December 2017 (yes, that date is past), including requiring MyKad registration for top-ups. I’m not sure if that has happened yet.

Writing Millions of rows into DynamoDB

While designing sayakenahack, the biggest problem I faced was trying to write millions of rows efficiently into DynamoDB. I slowly worked my way up from 100 rows/second to around the 1500 rows/second range, and here’s how I got there.

Work with Batch Write Item

First mistake I did was a data modelling error. Sayakenahack was supposed to take a single field (IC Number) and return the results of all phone numbers in the breach. So I initially modeled the phone numbers as an array within an item (what you’d called a row in regular DB speak).

Strictly speaking this is fine, DynamoDB has an update command that allows you to update/insert an existing item. Problem is that you can’t batch an update command, each update command can only update/insert one item at a time.

Running a script that updated one row in DynamoDB (at a time) was painfully slow. Around 100 items/second on my machine, even if I copied that script to an EC2 instance in the same datacenter as the DynamoDB, I got no more than 150 items/second.

At that rate, a 10 million row file would take nearly 18 hours to insert. That wasn’t very efficient.

So I destroyed the old paradigm, and re-built.

Instead of phone numbers being arrays within an item, phone numbers were the item itself. I kept IC Number as the partition key (which isn’t what Amazon recommend), which allowed me to query for an IC Number and get an array of items.

This allowed me to use DynamoDB’s batch_write_item functionality, which does up to 25 request at once (up to a maximum of 16MB). Since my items weren’t anywhere 16MB,  I would theoretically get a 25 fold increase in speed.

In practice though, I got ‘just’ a 10 fold increase, allowing me to write 1000 items/second, instead of 100. This meant I could push through a 10 million row file in under 3 hours.

First rule of thumb when trying to write lots of rows into DynamoDB — make sure the data is modeled so that you can batch insert, anything else is painfully slow. Continue reading

Identity in a Post-Breach world (draft)

Posting this here first, my thoughts to follow. Random thoughts below are draft :).

Random thoughts on the matter

  1.  We still need a single identifier in Malaysia (IC Number), this is administrative necessity. LHDN needs to check your bank accounts, Election Commission needs to know you’re not double-voting..etc.
  2. But that single identifier should not be used as an authenticator. No one should ask me for my IC number as a means of authenticating myself. When I call the bank, they shouldn’t be asking me for my IC number as a means of proving my identity to them.
  3. There’s too much info in the IC number (age, state, gender). Take all of that out, and replace with a random blob of numbers — one that cannot be guessed as well. So something like 8 numbers and 4 letters large enough, so criminals can’t guess.
  4. We need ‘identity-freezes’ in Malaysia. In America you can freeze your credit, but in Malaysia we need to go a step further and put an Identity freeze, especially for internet services.
  5. Check out section 114(a) of the evidence act, wrongly registered phone numbers are a thing, and they’re bad. If someone registered a pre-paid account in your name, and posted something bad — you’d be in trouble.
  6. If you took a loan from Maybank to buy a house, and 1 year later defaulted on the loan, no other bank in the country would grant you a loan. This protects the banks from issuing credit to someone who can’t pay back. So we have credit freezes.
  7. Let’s use the same mechanism to allow people to lock their identities, so no one can open bank accounts, telco accounts, not even Astro, TNB or Indah Water as long as the identity is locked. This way, the value of a stolen identity is tremendously reduced, and we protect breached victims.
  8. Identities can be ‘un-freezed’, e.g. when you buy a house, but then re-freezed shortly after.
  9. More thoughts to come…..

Sayakenahack: Epilogue

I keep this blog to help me think, and over the past week, the only thing I’ve been thinking about, was sayakenahack.

I’ve declined a dozen interviews, partly because I was afraid to talk about it, and partly because my thoughts weren’t in the right place. I needed time to re-group, re-think, and ponder.

This blog post is the outcome of that ‘reflective’ period.

The PR folks tell me to strike while the iron is hot, but you know — biar lambat asal selamat.

Why I started sayakenahack?

I’m one part geek and one part engineer. I see a problem and my mind races to build a solution. Building sayakenahack, while difficult, and sometimes frustrating, was super-duper fun. I don’t regret it for a moment, regardless of the sleepless nights it has caused me.

But that’s not the only reason.

I also built it to give Malaysians a chance to check whether they’ve been breached. I believe this is your right, and no one should withhold it from you. I also know that most Malaysians have no chance of ever checking the breach data themselves because they lack the necessary skills.

I know this, because 400,000 users have visited my post on “How to change your Unifi Password“.

400,000!!!

If they need my help to change a Wifi password, they’ve got no chance of finding the hacker forums, downloading the data, fixing the corrupted zip, and then searching for their details in file that is 10 million rows long — and no, Excel won’t fit 10mln rows.

So for at least 400,000 Malaysians, most of whom would have had their data leaked, there would have been zero chance of them ever finding out. ZERO!

The ‘normal’ world is highly tech-illiterate (I’ve even talked about it on BFM).  Sayakenahack was my attempt to make this accessible to common folks. To deny them this right of checking their data is just wrong.

But why tell them at all if there’s nothing they can do about it? You can’t put the genie back in the lamp. Continue reading

Sayakenahack architecture

I know the picture is a bit hard to read, but I wanted to make sure I had a detailed enough picture to understand the ‘innards’ of sayakenahack. Sometimes when you’re building stuff on the fly, and bottom-up, it’s good to take a step back, and have a top-down view.

I’ll be expanding this post over time, wanted to get my thoughts down quickly on paper before I moved on.

Intro to serverless

Serverless is a new-ish buzzword. It’s about building full-blown applications without servers (not even virtuals).

No EC2 instances — at all!

Some folks thought that just because I was on AWS, I was calling it serverless. Not so, lots of people use AWS for EC2, which are virtual servers. This blog, is hosted on a virtual server, but sayakenahack ran without any servers (virtual or otherwise). Except that one spot instance at the bottom right of the picture, we’ll get to that later.

That doesn’t mean I ran it on Elven magic and sparkly fairy-dust though. At some point, there were servers involved. But I never managed or ran those servers, they were abstracted from me by AWS services such as API Gateway (which is awesome by the way) and Lambda.

The beauty of serverless is :

  • I don’t have the headache of managing a full-blown OS stack (with requires more skills than I have)
  • It can scale till kingdom come Amazon has designed their serverless offerings to scale, and they do it beautifully.
  • It’s cheaper. MUCH cheaper.

With API Gateway for example, I could focus purely on building the resources and methods, without worrying about Apache configurations.

With lambda I can write python code natively (almost magically), without building out an EC2 instance.

And with DynamoDB, I get a database that can do anywhere from 10 to 10,000 writes per seconds without worrying about clustering, mirroring , etc. And even at 37 mln rows, the DB still qualifies for the free-tier.

That’s awesome (if I do say so myself).

Now of course there is a drawback. Without full granular control of Apache/Nginx, certain edge cases you need might not be possible, and DynamoDB is somewhat limited in it’s capability (although I’m not sure if that’s DynamoDBs fault, or because all NoSQL databases are like that).

But overall, for sayakenahack, serverless was the way to go. It might not work for you, but it worked beautifully for me.

Next up, we’ll look at the Holy Trinity of Serverless (API Gateway, Lambda, DynamoDB).

–stay tuned.

 

Sayakenahack.com answering the questions

OK, this is my last post on sayakenahack.com, and I’ve got a script scheduled to run at Sunday midnight to tear down the database. So if you wanna check, you better do it now, cause in 3 days time, it’ll be gone.

*poof*

But here are my thoughts on this whole debacle — and it’s going to get emotional, so don’t say I didn’t warn you.

So let’s start with the basics.

The right to know

I believe that if you’re data was leaked online, you have a right to know.

You might choose to “not know”, but that is a right you can choose to exercise. No one should be allowed to withhold that information from you.

I believe that you have a right to know about it, in a timely manner. Authorities can’t sit on the data for weeks without letting you know on any pretense.

I believe that the correct authority to do tell you about leaks is the MCMC. But till today they have made no attempt to create such a service, not even communicated a plan to implement one. There is no evidence to suggest they have (or had) any intention to do anything about it.

If I can code sayakenahack within 4 weeks (in my sparetime, while holding a 9-to-5 job, being a father and husband) there is no logical reason why the MCMC or the telcos couldn’t do something better in a shorter time-frame.

Even if you can’t do anything about it

I believe the right to know about a breach should exists even if you can’t do anything about it.

If you have terminal un-treatable cancer — does that mean a Doctor shouldn’t tell you about it? If you’re on  plane that’s about to crash, should the pilot remain silent?

You have a right to know about the leak. Regardless of whether you can do anything about it.

Only hackers and geeks should see the data

This data is freely available for anyone to download. The only people with the skills to find it though, are people we generally refer to as ‘geeks’ or ‘hackers’.

To ban sayakenahack is to say geeks and hackers can access the data — but not the average joe. It’s emphasizing that normal people don’t deserve that knowledge while geeks and hackers do.

This is elitism, and it’s wrong.

When Lowyat published the initial report, the knew of it’s importance. But chose to remove the article when the MCMC came. They continue to side with the MCMC, in saying that “sheer amount of information made available on the site could subject it to abuse.”

They fail to mention that the ‘sheer amount of information’ is already made available, just not to common folks, but to geeks and hackers. Effectively Lowyat is saying that it’s OK for geeks and hackers to have this data, but god-forbid the average joe get a hold of it.

God-forbid the actual data subject who is actually impacted be notified, the great Gods of Lowyat think that’s too much.

Oh, and btw, when Lowyat published the article, the site took roughly 200+ concurrent users at one time. When the star published the article in the morning, the site did 2000+ concurrent users (10 times more!)

Most Malaysians are technically illiterate and don’t visit Lowyat. They shouldn’t have less information because of it.

Manipulating vs. Masking

Lowyat’s editor then goes on to tell the Malaysian insight that “It’s blocked because it’s not right to manipulate the stolen data”.

Oh, give me a fucking break!

The word ‘manipulate’ is a dishonest choice. I mask the data, not manipulate it. No IT professional would ever make confuse manipulation with masking. Manipulation carries a negative connotation, that implies I’m changing the data in some way. Masking though is the intentional removal of data, to protect its confidentiality.

I’m masking. I’m not manipulating.

I went out my way to ensure that enough data was left so that users could still identify their numbers, yet not enough for somebody else to guess.

If you buy anything a credit card, your masked credit card number will be on the printed receipt. Generally the first 6  and last 4 digits are in the clear, while the middle 10 are masked (“replaced with asterisks”). This ensures that there’s enough information left on the receipt to trace a transaction, but not enough for fraudsters to get a card number. That’s a PCI-DSS rule, the gold-standard security framework for credit card processing, and I’ve spent 10 years deploying PCI systems. Masking is an acceptable practice to do this sort of thing.

So give me a break, with your ‘manipulating’. I love Lowyat to bits, but today they failed me big time.

How can we be sure the site is secure?

How can we be sure my site is secure?

Well there’s no such thing as an un-hackable website, and that includes things like Maybank2u.com, or CIMB-Clicks. Do we tell the banks to shut-down their sites just because the might be hacked?

No. We weigh the benefits and risk, and make calculated decisions as to what to do. Just because something is hackable doesn’t mean we take it offline.

We try our best to reduce the risk, until it reaches an acceptable level.

I would argue that the benefits of sayakenahack far outweighs the risk of it getting hacked. Hence continue to believe this was the right thing to do.

What if your site gets hacked?

So there are two parts here, how do I prevent a hack, and how do I mitigate the impact of breach (if it occurs).

First-off, the data is masked at source. i.e. the database only has the masked data. Making the data less valuable to fraudster. This is in the “minimize impact” bucket.

Below is the full representation of the data in the DB (for ic number 12345):

The data is masked at source, not in transit. So even I have no way to retrieve the full phone number from the DB. That’s why I can’t provide folks their full phone numbers — it just doesn’t work that way.

True, maybe this data is still valuable — but how would you extract it?

The DynamoDB is capped at 20 reads/seconds (10 RCUs in AWS parlance). Reading the data at full capacity, would take you 3 weeks (provided you knew all the ICs). If you were guessing ICs, my rough estimate is 3 years to dump the DB.

So maybe you have the skills to query the API, throttle it to avoid attention, and patiently write it to a DB for 3 weeks. If you got those skills — you probably found the files online already.

More AWS technical site

*skip this if you’re uninterested

Now onto the prevent the hack bucket.

My AWS account is protected with a super long password, and 2FA protected with the code on just one iPhone. I work on the AWS console exclusively via my Ubuntu virtual machine, that I spun up for this purpose, and I intend to destroy that VM image on Sunday as well.

Let’s get more details.

The entire architecture of sayakenahack is serverless. The html is served from a S3 bucket on Amazon via Cloudfront — which has TLS enabled. The javascript on the html calls an API that is hosted on API Gateway (which also is via cloudfront and has TLS enabled). It took me a few iterations, but the API and website exist on the same domain (no same-origin policy violations, but I still keep it CORS enabled).

Which means that there are no servers to hack on this thing.

No RFI,  No LFI.

No un-patched Apache version, or some shitty PHP bug that a server-ed site would be vulnerable to.

No SQLi (tehcnically it’s a noSQL database), or CMS Vulnerability. No Windows server to patch, no FTP, SMTP vuln, and all the other crap that gets servers in trouble.

It doesn’t mean that this is unhackable, it just means that I reduce my attack surface tremendously by getting rid of any servers.

The API calls a Lambda function that reads from the DynamoDB with an IAM that allows for only read access from the DB. Ensuring integrity of data.

A seprate IAM that allows for full DynamoDB access, the one I use to insert rows into the DB, is now keyless because I de-activated the key for it.

Finally the HTML site is a ‘minimalist’ design and I wrote it to be easily readable, any programmer can vet the code to detect bugs (or malicious stuff, like malvertising or bitcoin mining). The full code source is available on github for anyone can vet — in fact some already have!

Oh and I don’t log any API requests (they are cached though) — but that’s not the same as logging.

That’s just the tip of the icerberg of what I have on AWS. This thing is a labour of love.

So what

Let’s compare all of the above — with this!!

That’s the election commission website, that publishes your full name, and voting location based on a simple IC entry. The site is marked as insecure by Google Chrome because it doesn’t even have TLS.

TLS!!!! In fucking 2017, your website doesn’t have TLS??

A simple thing, that a free LetsEncrypt certificate would solve in less than 5 minutes.

What that means, is that when you search for your voting information on the website, the data is transferred in clear across the internet for anyone in the middle to see. It also means that your browser is not authenticating the site, and anyone can create a fake SPR website and make it look identical.

If you’re logged onto the SPR website from a kopitiam WiFi, I can see the data you’re sending (and receiving) just by logging on the same WiFi.

Trust on the internet

Fundamentally, when you log onto the SPR website, you’re trusting all the infrastructure between you and SPR, kopitiam Wifi included.

Do you trust the SPR? How about their vendor? How about the company that supplied them the servers?

How about the guy managing their database? Or the company that host their datacenter?

Their SysAdmin? Their Web Admin? All of their guys who wrote their code? Trust all of them?

Oh, and if you’re logging on to the site from home on Unifi — you’re probably trusting your stock-standard Dlink DIR-615 router, that’s hackable from the open internet.

The internet is built on a whole load of trust, and maybe you don’t trust me, but have you ever considered the number of people who un-suspectingly trust by just visiting a simple website?

Just saying, maybe sayakenahack isn’t a problem when the Election Commission’s website is marked as insecure. Why doesn’t Lowyat complain about the ‘sheer amount of data’ on the Election Commissions website?

This is sayakenahack.

That 1 cookie is for Google Analytics, it’s the only bit of ‘data capturing’ I do, and it’s industry standard. I collect data about who visits the site, and see what their load/lag times are, to ensure the site is operational and working well. But that doesn’t capture query strings, so no IC numbers are tracked.

More importantly, the site is TLS protected. All data between you and servers is encrypted, meaning you don’t have to trust the internet providers, or WiFi connection.

So, I go through great lengths protecting the site, and definitely more effort than the SPR.

As a bonus, here’s the PTPTN website, that allows you to check your balance. To be fair, they at least have a TLS equivalent, but they don’t re-direct you to it. So it’s still possible to access their non-encrypted site. See that bit on the browser that says “Not Secure”.

 

Sayakenahack forces re-direction. There is no un-encrypted version for either the website or API.

 

Yes, but these are government websites

Yes, the government (both state and federal) are exempt from PDPA.

But the damage inflicted on victims by a breach is identical whether the data comes from the government or private companies. I’m not sure why that exemption is there.

Secondly, there exist an exemption clause in the PDPA, specifically section 45(2)(f)(ii) that states there is an exemption IF:

(f) processed only for journalistic, literary or artistic purposes shall be exempted from the General Principle, Notice and Choice Principle, Disclosure Principle, Retention Principle, Data Integrity Principle and Access Principle and other related provisions of this Act, provided that—

(ii) the data user reasonably believes that, taking into account the special importance of public interest in freedom of expression, the publication would be in the public interest;

I don’t know how you define, public interest, but the site got 100,000 visits today (even with the ban kicking in at 12pm) signals to me that there is public interest.

I know Public Interest doesn’t literally mean things that interest the public, but you can’t argue that this is something people should be aware of. And don’t give me bullshit about hackers querying this instead of ‘real-users’. Hackers would use the API, and bypass the Google Analytics, the 100,000 is purely from the Google Analytics data.

Now, the lawyer types tell me that that ‘journalistic’ may not apply to bloggers, but this might be a good test case. Let’s see.

Also, I don’t know of any journalist in Malaysia, that could trawl through hacker forums getting the data, and then stand up a site that can support querying a 37mln row database, while serving 2000 concurrent users.  Do you?

There is one more exemption I might play here — but that’s a long shot, and I’m playing my cards close to my chest on that one.

Conclusion

To be honest, I’m afraid.

Afraid that next time I land in Malaysia, I end up in handcuffs at the back of a police car.

But sometimes, you gotta do what’s right, and not just what’s ‘legally permissible’.

Post-note

To all the reporters who’ve contacted me. I’m cancelling all interviews for now. This is my last post on the matter — at least for the next week.

See you on the other side.