Here’s one thing that’s already changed post GE14

In 2015, I was invited to a variety program on Astro to talk about cybersecurity.

This was just after Malaysian Airlines (MAS) had their DNS hijacked, but I was specifically told by the producer that I could NOT talk about the MAS hack, because MAS was a government linked company, and they couldn’t talk bad about GLCs.

Then half-way through the interview they asked me about government intervention, and I said something to the effect of “Governments are part of the problem and should refrain from censoring the internet”, that sound-bite never made it to TV because it was censored.

This was some stupid variety show called VBuzz, on a Tamil TV channel (of all places), tucked away in the Astro labyrinth of channels, and even then they were absolutely piss scared of being critical of anything even remotely close to the government. My statement wasn’t even directed at the Malaysian government, it was directed at government intervention in general, but alas, they feared too much and censored it out.

To be fair, I’m 100% certain the station would not have been in any trouble if they just broadcasted all of what I said (I’ve been more critical on other mediums like the blog and radio), but the producers chose to err on the side of caution.

When I asked why it was censored, they said it was because of the ‘law’. When I pressed her to give the actual law in question, and her response (quite nonchalantly) was that it was an un-written law!

Then…GE14 happened and….

Two days ago, on Astro Awani (LIVE!!), a commentator openly criticized directly the SPR commissioner, a Government Agency….and Astro didn’t censor now. Far more critical than what I said, and Far worse, but somehow magically the media found their spine that day.

It’s now a viral clip of how press freedom actually looks like, and hopefully this refreshing change will permeate through all of Malaysian media.

Gov TLS Audit : Architecture

Last Month, I embarked on a new project called GovTLS Audit, a simple(ish) program that would scan 1000+ government websites to check for their TLS implementation. The code would go through a list of hostnames, and scan each host for TLS implementation details like redirection properties, certificate details, http headers, even stiching together Shodan results into a single comprehensive data record. That record would inserted into a DynamoDB, and exposed via a rest endpoint.

Initially I ran the scans manually Sunday night, and then uploaded the output files to S3 Buckets, and ran the scripts to insert them into the DB.

But 2 weeks ago, I decided to Automate the Process, and the architecture of this simple project is complete(ish!). Nothing is ever complete, but this is a good checkpoint, for me to begin documenting the architecture of GovTLS Audit (sometimes called siteaudit), and for me to share.

What is GovTLS Audit

First let’s talk about what GovTLS Audit is — it’s a Python Script that scans a list of sites on the internet, and stores the results in 3 different files, a CSV file (for human consumption), a JSONL file (for insertion into DynamoDB) and a JSON file (for other programmatic access).

A different script then reads in the JSONL file and loads each row into database (DynamoDB), and then uploads the 3 files as one zip to an S3 bucket.

On the ‘server-side’ there are 3 lambda functions, all connected to an API Gateway Endpoint Resource.

  • One that Queries the latest details for a site [/siteDetails]
  • One that Queries the historical summaries for the site [/siteHistory]
  • One that List all scan (zip files) in the S3 Bucket [/listScans]

Finally there’s a separate S3 bucket to serve the ‘website’, but that’s just a simple html file with some javascript to list all scan files available for download. In the End, it looks something like this (click to enlarge):


Continue reading

Read this before GE14

Let’s start this post the same way I start my day — by looking at Facebook.

Facebook made $40 Billion dollars in revenue in 2017, solely from advertising to pure schmucks like you. The mantra among the more technically literate is that facebook doesn’t have users it has products that it sells to advertisers, it just so happens that all its products are homo-sapien smart-phone totting urbanites (just like you!)

The platforms meteoric rise from nobody to top-dog, is a dream-story in Silicon Valley, but underneath the veneer of wholesome innovation lies a darker secret, one that could be responsible for the polarization of entire communities, including our own. And it’s all because of their most valuable employee.

No, not Mark Zuckerberg, but the real genius behind the blue and white site. The one responsible for billions of ad revenue facebook generates yearly, and unsurprisingly she’s female.

Anna Lytica and Machine Learning

There’s probably thousands of post your facebook friends make everyday, but she decides which 3 to fit onto your smartphone screen first, and the next 3 and so forth. From the millions of videos shared every hour, she painstakingly picks the few you’d see in your timeline, she decides which ads to show you, and which advertisers to sell you too, underneath the hood in the giant ad behemoth, she lies working all day, everyday.

She isn’t a person, ‘she’ is an algorithm, a complex program that does billions of calculations a second, and for this post we’ll give her the name… Anna Lytica.

Facebook doesn’t talk about her much, she is after all a trade secret (sort of), but what she does and how she does it, might be as much a mystery to us, as it is to Mr. Zuckerberg. Machine Learning algorithms are complex things, we know how to build them, and train them, but how they actually work is sometimes beyond our understanding.

Google can train Alpha-Go to play a game, but how it makes decisions is unknown to Google and even itself — it just IS a Go player.And it is really sad, when we watch these AI algorithms make amazing discoveries, but are unable to explain their rationale to us mere humans. It’s the reason why Watson, IBMs big AI algorithm, hasn’t taken off in healthcare, there’s no point recommending a treatment for cancer, if the algorithm can’t explain why it chose the treatment in the first place.

This is hard to grasp, but AI isn’t just a ‘very powerful’ program, AI is something else entirely. We don’t even use traditional words like write or build to refer to the process of creating them (like we do regular programs), instead we use the word train.

We train an algorithm to play Go, to drive, or to treat cancer. We do this the same way we breed dogs, we pick specimens with the traits we want, and breed them till we end up with a something that matches our desires. How a dog works, and what a dog thinks is irrelevant. If we want them big, we simply breed the biggest specimens, the process is focused entirely on outcome.

Similarly, how the algorithm behaves is driven by what it was trained to do. How it works is irrelevant, all that matters is outcome. Can it play Go, can it drive, can it answer jeopardy? If you want to understand an algorithm you need to know what it was trained to do.

Anna Lytica, was trained to keep you browsing Facebook, after all the companies other endeavors like internet.org, and instant articles were built with the same intention. And while good ol’ Mark stated that he’s tweaking Anna to reduce the time people spend on Facebook, this is something new, an exception to the years Facebook tweaked her to keep you on their site.

After all the average monthly user spends 27 minutes per day in the app, and if you go by daily users, they spend about 41 minutes per day on Facebook. If that’s the end-result of tweaking Anna to ensure we spend less time on Facebook — God help us all!

And while it’s difficult to understand how Anna works, its very easy to guess how she’ll behave. If the end result of Anna’s training is to keep you browsing Facebook, then human psychology reveals a simple trait all humans share — confirmation bias. Continue reading

Gov.My TLS audit: Version 2.0

Last week I launched a draft of the Gov.my Audit, and this week we have version 2.0

Here’s what changed:

  1. Added More Sites. We now scan a total of 1324 government websites, up from just 1180.
  2. Added Shodan Results. Results includes both the open ports and time of the Shodan scan (scary shit!)
  3. Added Site Title. Results now include the HTML title to give a better description of the site (hopefully!).
  4. Added Form Fields. If the page on the root directory has an input form, the names of the fields will appear in the results. This allows for a quick glance at which sites have forms, and (roughly!) what the form ask for (search vs. IC Numbers).
  5. Added Domain in the CSV. The CSV is sorted by hostname, to allow for grouping by domain names (e.g. view all sites from selangor.gov.my or perlis.gov.my)
  6. Added an API. Now you can query the API can get more info on the site, including the cert info and HTTP headers.
  7. Released the Serverless.yml files for you to build the API yourself as well ūüôā

All in all, it’s a pretty bad-ass project (if I do say so myself). So let’s take all that one at a time.

Continue reading

Sayakenahack: Epilogue

I keep this blog to help me think, and over the past week, the only thing I’ve been thinking about, was sayakenahack.

I’ve declined a dozen interviews, partly because I was afraid to talk about it, and partly because my thoughts weren’t in the right place. I needed time to re-group, re-think, and ponder.

This blog post is the outcome of that ‘reflective’ period.

The PR folks tell me to strike while the iron is hot, but you know — biar lambat asal selamat.

Why I started sayakenahack?

I’m one part geek and one part engineer. I see a problem and my mind races to build a solution. Building sayakenahack, while difficult, and sometimes frustrating, was super-duper fun. I don’t regret it for a moment, regardless of the sleepless nights it has caused me.

But that’s not the only reason.

I also built it to give Malaysians a chance to check whether they’ve been breached. I believe this is your right, and no one should withhold it from you. I also know that most Malaysians have no chance of ever checking the breach data themselves because they lack the necessary skills.

I know this, because 400,000 users have visited my post on “How to change your Unifi Password“.

400,000!!!

If they need my help to change a Wifi password, they’ve got no chance of finding the hacker forums, downloading the data, fixing the corrupted zip, and then searching for their details in file that is 10 million rows long — and no, Excel won’t fit 10mln rows.

So for at least 400,000 Malaysians, most of whom would have had their data leaked, there would have been zero chance of them ever finding out. ZERO!

The ‘normal’ world is highly tech-illiterate (I’ve even talked about it on BFM).¬† Sayakenahack was my attempt to make this accessible to common folks. To deny them this right of checking their data is just wrong.

But why tell them at all if there’s nothing they can do about it? You can’t put the genie back in the lamp. Continue reading

Why does SayaKenaHack have dummy data?

Why does sayakenahack have dummy data? If I enter “123456” and “112233445566” I still get results.

I was struggling with answering this question, as some folks have used it to ‘prove’ that I was a phisher. We’ll get to that later, for now I hope to answer why these ‘fake’ IC numbers exist in the sayakenahack.

Firstly, I couldn’t find a good enough way to validate IC numbers as I was inserting them into the database. Most of you think that IC numbers follow a pre-define pattern :

  • 6-digit birthday (yymmdd format)
  • 2-digit state code
  • 4-digit personal identifier, where the last digit is odd for men, and even for women.

But, there are still folks with old IC numbers, and the army have their own format. Not to mention that the IC Number field  can be populated by passport numbers (for foreigners) and Company registration IDs. So instead of cracking my head on how to validate IC numbers, I decided to pass them all in.

The only ‘transformation’ I do is to strip them of all non-AlphaNumeric characters and uppercasing any letters in the result. This would standardize the IC numbers in the database, regardless of source file format.

Had I done some validation, I might have removed these dummy entries — but fortunately I didn’t.

Upon further analyzing the data, I went back to the original source files and notice something strange, the account numbers belonged to some strange names. And then it made sense — this was Test data.

Test data in a Production Environment to be exact.

And when the Database for the telco was dumped, the telco’s didn’t remove these test accounts from their system. So what we have is a bunch of dummy accounts, with dummy IC numbers. Continue reading

SayaKenaHack.com

On the 19th of October, Lowyat.net reported that a user was selling the personal data of MILLIONS of Malaysians on their forum. Shortly after, the article was taken down on the request of the MCMC, only to put up again, a couple of days later.

Lowyat later reported that a total of 46.2 Million phone numbers were exposed,  and the data included IC numbers, Addresses, IMSI, IMEI and SIM numbers as well. In short, a lot of data from a lot of people.

So Malaysia joined the ranks of The Phillipines, Turkey and South Africa to have data on their entire population leaked on the internet. [Spoiler alert: This is not a good thing]

Where can I check?

You can head over to a site I created: sayakenahack.com to check if you’re part of the breach. So far I’ve loaded data from Maxis, Digi, Celcom and UMobile onto the site. I’ll be adding the smaller telcos later this week (stay tuned).

Medical council, etc…I’m still debating whether I should put that in. Maybe some doctors don’t want to be identified as doctors, so that data stays out for now.

Waah… That means you downloaded illegal data?

Technically yes, the data might be illegal. But any geek can find it online, it’s a google search away.

I’m just making the data available to the ‘normals’, people who don’t look around in hacker forums.

Plus all data is masked, so only the first 4 and last 2 digits of the phone number is available. Which is almost as good as the masking of credit card numbers on your printed receipts.

I also don’t publish any names or addresses. If you’re unhappy with this, you should be unhappy with the Election Commission website that publishes your name in FULL on their website upon entering just an IC number. Similar to PTPTN etc.

Did you pay for the Data?

No. Contrary to what’s being reported the data is available for FREE online. Even the ‘hacker’ who was selling it on Lowyat was basically a re-seller.

I did not pay for the data, I would never validate the business case of reselling stolen data.

If I search for my IC, will you log my data?

No.

In technical terms, I’ve switched of logging for my API Gateway, CloudFront & Lambda.

If I wanted your data — I wouldn’t need you to search for you. I already have it.

OMG I’m breached !!! What can I do?

Unfortunately, there’s little you can do.

Your IC number is a permanent fixture of your life –and can’t be changed. This is bad design, but it’s the design we have at the moment.

If you lose your Phone Number, Credit Card details or E-mail address, you’d still have some form of mitigating the damage. But if someone gets your IC number, you can’t go to the NRD and get them to issue you a new one.

To be fair IC numbers (in their modern form) are at least 25 years old, so I’m not blaming anyone — but the reality is that we should either stop using IC numbers so extensively , or find some way to make them mutable. Not and easy task, but until that happens the damage of this leak will continue… in perpetuity.

Now onto the good news!

The leak is from 2014, so the chances of you having the same phone is minuscule. I know of only one person whose phone is older than 3 years old, everybody else has changed their phone. So IMEI numbers (which are tied to your phones) from 2014 are pretty useless.

IMSI and SIM are almost the same as well. Over the past 3 years, I’m almost certain a large percentage of the victims (50-80%) would have their sim cards swapped — primarily from buying a new phone that required a micro or nano sim or from porting telcos, or just losing their phones.

What’s not so good is the fact that most people still keep their Name, Address and Phone Number. So those are the top 3 (4 if you count IC Numbers) data elements in the breach, and unfortunately their almost all there.

Where did the data come from?

Well……

The breach includes not just Telco data but Jobstreet and various other sources as well. Let’s just focus on Telco because that’s the big one.

There’s only 2 possibilities on where the telco data came from:

  • Someone hacked into individuals telcos and took it; or
  • Someone hacked a central source with all the data

Now, consider that all Telco’s are in this breach — including Altel, PLDT, Redtone, etc. Which self-respecting hacker, with the skills to hack Maxis, Digi and Celcom, is going to waste time on Altel? Really?!

Consider also, that if you downloaded the data, (which I obviously have), it’s clear as day where the leak came from. It’s so clear, Stevie Wonder can see where the data was leaked from.

I’m hoping over the next few days somebody somewhere will make an announcement.

In the mean-time stay safe Malaysia.

End notes and Special Thanks

Thanks to Bin Hong for alerting me that I had a few logs on the GitHub repository. I’ve torn down the old repo and created a new one.

Thanks to Ang YC for letting me know I gave too much info to folks.

Thanks to **rax***n for sharing the data on the *ahem* site.

Thanks to Ridhwan Daud for correcting my API spelling. (it’s case sensitive).

All data available on sayakenahack.com is available somewhere on the web. I’m just making sure that it’s not just geeks/hackers who have this data, but the average citizen can also be informed if they’re part of the leak.

I’m especially proud of the architecture underlying sayakenahack. It’s completely serverless, and I’ll make a post about it soon. But learning DynamoDB and about a gazillion AWS services to deploy this was both fun and tiring.

For now, you can build your own version of sayakenahack with the data, by using the api at:

https://sayakenahack.com/api/v1/pwn?icNum=12345

I’ve changed the API many times. I promised this version is stable for the next 3 months.

The api is CORS enabled, so you can call it with javascript on your browser. There’s only one endpoint for now, I’ll documenting the API and will publish some documentation soon.

I spent a good 40+ hours building all of this, the code is mostly available on my GIT repository. Couple of elements aren’t there (lambda function to query DynamoDB) — but I’ll upload that when time permits.

Writing a WordPress Restoration script

WordPress sites get hacked all the time, because¬†the typical WordPress blogger install 100’s of shitty plugins and rarely updates their site. On the one hand, it’s great that WordPress has empowered so many people to begin blogging without requiring the ‘hard’ technical skills, on the other it just gives criminals a large number of potential victims.

Two years ago, when I studied the details of phishing attacks that targeted Maybank and RHB,¬†I found that attackers use compromised WordPress sites to host their phishing content. They’d first hack into¬†a seemingly random WordPress¬†website, host their phishing content there, and then blast out emails to unsuspecting victims with links to pointing back to their hacked bounty. If the hack works they’d get free username and passwords, and if they were ever caught, most evidence would point to the unsuspecting WordPress site owner.

So if you have a WordPress site (like me), chances are you’re in the cross-hairs of hackers already, and securing your site is the responsible thing to do.

In general WordPress sites should be:

  • Updated Automatically
  • Use a minimal number of plugins
  • Use plugins only from reputable publishers
  • Use themes only from reputable publishers–and have only one theme in the install directory
  • Employ strong passwords for the admin & user
  • Have the permissions of the underlying folders set accordingly (i.e.CHMOD them all)

But even if you took all precautions to hardened your site, there’s always a possibility of it getting hacked. No security is perfect,¬†and you should look into backups–backup often and to a separate location. That way, a compromised site can be rebuilt,¬†even if it were defaced. The last thing you want is to lose your precious design and data, because some one installed a shitty plugin over the weekend.

Today, I’ll walk through a short bash script I wrote to backup (and restore) a WordPress installation from scratch. It took me quite a while to write this (partly because I have no experience with Bash scripts), but I thought it would be good to walkthrough the details of the script and what it does.

The full script is available on github here, and the usage instructions will be maintained there. The write-up below describes code the first production release, linked here, even though¬†I’ve since updated the scripts to include some modifications, and as we speak I’m just about the release version 1.2.

So here we go…

Special Thanks

The following 3 folks, were greatly influential in the writing of the script, listed in no particular order. No to mention, the wonderful folks at stackoverflow that helped tremendously as well.

Thanks to Andrea Fabrizi for the awesome DropboxUploader script
Thanks to Ben Kulbertis for the awesome Cloudflare update script
Thanks to Peteris.Rocks for inspiring me with his Unattended WordPress Installation script

Pre-Requisites

As a pre-requisite to all this, I made the following decisions.

The back ups would be stored in DropBox– Dropbox has free options (up to 2GB) and has versioning by default.All your backups are versioned and kept for 30 days (not just the latest upload, which gets destroyed if you’re hit by malware). Doing this on AWS requires extra work, which I wasn’t prepared to do, and AWS has no free tier for S3 storage.

Also, I use CloudFlare to maintain the DNS. It’s optional of course, but I needed a DNS provider that had an API, and they were¬†the logical choice. This allowed the script to update your DNS as well.

Finally, the script assumes a standard LAMP stack, i.e. Linux (specifically Ubuntu 16.04), Apache , MySql and PHP. PHP is enforced by WordPress itself so that’s fine.But the ‘trend’ these days is to have NGINX instead of Apache, and MariaDB instead of MySQL. I kept things in ‘classic’ mode for now, I may revisit in the future. Continue reading

Relax dear-citizen your contactless card is relatively safe—ish

As Malaysia slowly (but surely) migrates to Chip and Pin, some banks have taken the opportunity to issue not just new Pin-enabled cards, but contactless-enabled ones as well.

To be clear,¬†Banks are only mandated to issue new Pin cards (replacing the signature cards you¬†had before),¬†but are taking the opportunity to also embed contactless capabilities into them as well. After all they’re already issuing new cards to every (single!) card holder, might as well get them on the contactless bandwagon while they’re at it.

The reason for being so gung-ho about contactless is purely economical. Research suggest that the easier payment methods become, the more money people are willing to spend. People with credit cards spend more than people with just cash, and 0% interest schemes have been a godsend¬†to retailers. Contactless payments, which don’t involve cumbersome Pins or signatures, are clearly the next evolutionary step, with one research paper suggesting they increase customer spending by nearly 10%.

Banks make money from small percentages per transactions, the more transactions at higher amounts, the more money they stand to make. So if an extra dollar worth of electronics in a contactless card increases revenue by 10%–why not?!

Pins are for security, Contactless is for convenience

But while PINs are a security feature, contactless is all about convenience. And conveniences trade-off security, so it stands to reason that contactless cards are less secure than regular ‘contact’ ones.

The question is whether that trade-off is worth the increase in convenience. After all, nothing is absolutely secure, and in today’s criminally infested internet, keeping your money under the mattress is safer than keeping it in a bank–but nobody does it because the mattress would be too inconvenient.

So what convenience are you getting with a contactless cards?

For one thing, no more waiting for a receipt printout to sign on, or bending down to an inconveniently placed pinpad to type in your PIN. Plus, for someone with gigantic fingers like me, I jump on the opportunity to avoid having to fidget with pinpads that must have been designed for dwarf children after they’ve been struck by the ray gun from Honey I shrunk the kids.

But that’s about it–the only convenience contactless cards provide is that you can do contactless payments–up to a specified amount.

The question now is what security trade offs are you making for this remarkable feature? Continue reading

The safest place for your money is under the mattress

money-under-mattress

When I was in school, we joked about people who kept their money under the mattress, that somehow those who didn’t use banks were less intelligent than people who did.The general thinking was that smart people kept their money in the bank,¬†where it¬†was safe from theft, fire and flood, while still collecting¬†interest.

In the 80’s this was a compelling argument, when interest rates were high and banks really did provide security,but is that thinking still applicable today?

In June of 2000, Maybank launched their ‘new’ internet banking platform, Maybank2u, which allowed their customers to do¬†their banking online, outside of traditional branches or even ATMs. Few years later, it begun offering online purchases and soon after the mobile¬†app was launched.

But while online banking platforms¬†brought¬†convenience, they also introduced new security threats — and it wasn’t clear whose job it was to secure against those new threats, and who would be liable for inevitable financial losses.

Was it going to be bank who assumed liability, just like they did before, or would it be the account holder, or possibly a mixture of both?

The answer depends on who gets attacked, because not all attacks are equal.

Not all attacks are equal

There’s two types of attack, one where the bank itself is attacked, and another where the account holder is targeted instead.

When someone walks into a bank¬†¬†with the threat of violence, and walks out with $30,000 of the banks cash, the bank absorbs all the loses. After all, that’s why your money is in their safe and not under the mattresses.

507d7acb92f46ed8d8779be14e3f2051But there exist another class of attack–customer impersonation, where the attacker isn’t threatening violence or even ‘attacking’, but trying to fool the bank into believing they are the rightful account holders. In other words, the attacker is trying to impersonate you, to get to your money.

And in the digital world, customer impersonation is far more common. Consider the case of ATM fraud.

ATMs identify a user by verifying their ATM cards, and then prompting them for the PIN. More specifically, the ATM first authenticates the inserted ATM card (is this card real?) and then proceeds to ask the user for the PIN (is the person the accountholder?), once an ATM is satisfied, it then proceeds to grant the user access to the account.

Hence if an attacker¬†managed to steal your card and knows your PIN, the ATM has no way to differentiate between you and the attacker.¬†Anyone could take your money from your account, by just having your ATM card and PIN, in contrast robbers attacking a bank would simply be taking the bank’s cash…not yours.

Credit Card fraud is another prime example,¬†but at least in Malaysia¬†end customers have their liability capped at RM250 provided they report their lost cards in a ‘reasonable’ amount of time. For debit cards and ATM cards are not protected in the same way. Which is strange because the poorer sections of society who need more protection usually have debit instead of credit cards.

But even credit card users need to be wary, because changes in the liability model are bound to happen when we introduce Chip and Pin. (read more here)

To summarize, customer impersonation isn’t the same as a bank robbery, when the bank issues you credentials (like PINs, passwords or ATM cards), the responsibility to secure those credentials are yours–and if those credentials are compromised, then you’ll have to shoulder some of the financial losses as well.

Continue reading