All posts filed under “Misc”

Just a collection of stuff (mostly from my old blog)


First I deleted my most popular tweet — then I deleted 2000 more.

Two weeks ago, I rage-tweeted something about Malaysian politics that went a lot more viral than I’d have liked (I’ve censored out the profanity for various reasons, most notably that there are teenagers who read this blog). It was a pointless collection of 200 characters that somehow resonated with people enough to be shared across social media. Obviously, since it was me, the tweet was filled with a small collection of profanities, and laced with just the right amount of emotive content 🙂

But then things started getting bad.

Soon after I tweeted, I received messages from folks I hadn’t met in decades, showing me screenshots of their WhatsApp groups with my tweet in them. My wife’s Chinese-speaking colleagues were showing it to her at work. I checked: nearly 2,000 people had retweeted it, which isn’t typical for me, and frankly speaking, pretty scary.

As much as I’d like to have my content shared, the tweet in question was nothing but a couple of crude words pieced together in a ‘rage-tweet’. I understand that it emotionally resonates with folks who are angry, but if this is the level of discourse we’re having on Malaysian social media, we should be alarmed. Completely pointless rants going viral is not how we ubah (change), it is the absolute opposite of how we ubah!

Research on the virality of articles from the New York Times showed that ‘angry’ content was more viral than any other, beating out awe, surprise and even practical value. The angrier the content, the more likely it would be shared. A rage-tweet is more likely to go viral than something like fuel-saving tips, even though the latter clearly is more valuable to readers.

At this point, I’d rant about how the media has a responsibility to look beyond clicks and ads, and to think about the impact of their content on society. But since I owned the tweet, I simply deleted it. Of course, I can’t stop the screenshots being shared across WhatsApp, but we do what we can.

Deleting your tweets

That got me thinking: Twitter is a cesspool of angry farts screaming at each other, and that has some value.

But while what I tweet today may be relevant and acceptable today, it may not be 2-3 years from now. Kinda like how Apu from The Simpsons was acceptable and non-offensive in the ’90s.

I’m ashamed to say it, but I once thought that Michael Learns to Rock was a great rock band. In context, that’s acceptable for a 12-year-old two decades ago, before even Napster or Limewire. Of course, as an adult in 2018, I’m thoroughly aware that AC/DC are the greatest rock band ever, and Michael Learns to Rock, well, they’re not exactly Denmark’s best export.

And that’s the problem: Twitter removes context. It’s very easy to take a 140-character tweet from 5 years ago out of context. Nobody cares about context on a platform that limits users to 140 characters (or 280 characters since end-2017). Maybe you quoted an article from TheMalaysianInsider, which, guess what, no longer exists. Context is rather fluid on Twitter, and it changes rapidly over weeks, let alone the years since your first tweet.

For example, this tweet from Bersatu’s Wan Saiful:

Gee, I wonder who he was talking about. A simple internet search will give you the answer, but that’s not the point.

Wan Saiful changed his opinion, and he’s explained why. People should be allowed to change their minds. Freedom to change your opinion isn’t just perfectly fine, it’s a prerequisite for progress. If we allow our tweet history to be a ball-and-chain that ties us to our old ideology, how could we ever progress? Everybody changes their mind, and that’s OK.

The point is that Twitter should not be a historical archive; it should be current. A great place to have an informed discussion of current affairs, but not a place to keep old, outdated and out-of-context material floating around.

Hence, I decided to delete all my tweets older than 90 days, and here’s how.
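
In broad strokes, it’s a script that walks your timeline and deletes anything past the cutoff. Here’s a minimal sketch in Python using the tweepy library, with placeholder credentials; note that the standard timeline API only reaches back roughly 3,200 tweets, so anything older needs the downloadable Twitter archive instead:

```python
import tweepy
from datetime import datetime, timedelta, timezone

# Placeholder credentials -- substitute your own app's keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

cutoff = datetime.now(timezone.utc) - timedelta(days=90)

# Walk the timeline and delete anything older than the cutoff.
for status in tweepy.Cursor(api.user_timeline).items():
    created = status.created_at
    if created.tzinfo is None:  # older tweepy versions return naive UTC datetimes
        created = created.replace(tzinfo=timezone.utc)
    if created < cutoff:
        api.destroy_status(status.id)
```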


I scanned 1000 government sites, what I found will NOT shock you

Previously, I moaned about dermaorgan.gov.my, a site that was probably hacked but was still running without basic TLS. It’s unacceptable that in 2018 we have government-run websites that ask for personal information running without TLS.

So I decided to check just how many .gov.my sites actually implement TLS, and how many will start being labelled ‘Not Secure’ by Google in July. That’s right, Google will start naming and shaming sites without TLS, so I wanted to give .gov.my sites the heads-up!

Why check for TLS?

TLS doesn’t guarantee a site is secure (nothing does!), but a site without TLS signals a lack of care from the administrator. The absence of TLS is an indicator of just how lightly the security of the servers has been taken.

Simply put, TLS is necessary but not sufficient for security. And since it’s the easiest thing to detect without running intrusive network scans, it seemed like the best place to start.

How I checked for TLS

But first I needed a list of .gov.my sites.

To do that, I wrote a web crawler that started with a few .gov.my links and stored the results. It then repeated the process for the links, the links of the links… and so forth. After 3 iterations, I ended up with 20,000 links from 3,000+ individual hostnames (a word I wrongly use in place of FQDN, but since the code uses hostnames, I’m sticking to it for now; please forgive me, networking nerds).
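
The crawl loop itself is simple breadth-first link chasing. A rough sketch of the idea in Python (the seed URL here is a made-up example, and the real crawler did considerably more bookkeeping and deduplication):

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

# Hypothetical seed -- the real crawl started from a few known .gov.my portals.
frontier = {"https://www.malaysia.gov.my/"}
seen_hosts = set()

for _ in range(3):  # 3 iterations: the links, the links of the links, and so forth
    next_frontier = set()
    for url in frontier:
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue  # dead link, move on
        soup = BeautifulSoup(resp.text, "html.parser")
        for a in soup.find_all("a", href=True):
            link = urljoin(url, a["href"])
            seen_hosts.add(urlparse(link).hostname or "")
            next_frontier.add(link)
    frontier = next_frontier

# Keep only government and military hostnames for the scan.
gov_hosts = {h for h in seen_hosts if h.endswith((".gov.my", ".mil.my"))}
```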

I then manually filtered the hostnames to those from a .gov.my or .mil.my domain and scanned them for a few things (a rough sketch of these checks follows the list):

  • Does it have an https website (if it doesn’t redirect)
  • Does it redirect users from http to https
  • Does the https site have a valid certificate
    • Does it match the hostname
    • Does it have a fresh certificate (not expired)
    • Can the certificate be validated (this requires all intermediate certs to be present)
  • What is the IP of the site
  • What is the ASN of the IP
  • What are the Server & X-Powered-By headers returned by the host
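
Each of those checks maps onto a few lines of Python’s standard ssl, socket and requests machinery. A best-effort sketch per host (the ASN lookup needs an external database, so it’s omitted here):

```python
import socket
import ssl
import requests

def scan_host(host):
    """Best-effort version of the checks above for a single hostname."""
    result = {"host": host}

    # Resolve the IP (the ASN lookup needs an external data source, omitted).
    try:
        result["ip"] = socket.gethostbyname(host)
    except socket.gaierror:
        return result  # doesn't resolve, nothing more to check

    # Valid HTTPS? The default ssl context checks expiry, the chain
    # (including intermediates served by the host), and the hostname.
    ctx = ssl.create_default_context()
    try:
        with socket.create_connection((host, 443), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                result["https_valid"] = True
    except (ssl.SSLError, OSError):
        result["https_valid"] = False

    # Does plain HTTP redirect to HTTPS? Also grab the telltale headers.
    try:
        resp = requests.get(f"http://{host}/", timeout=10)
        result["redirects_to_https"] = resp.url.startswith("https://")
        result["server"] = resp.headers.get("Server")
        result["x_powered_by"] = resp.headers.get("X-Powered-By")
    except requests.RequestException:
        pass

    return result
```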

Obviously, as I was coding this, my mind got distracted and I actually collected quite a bit more data, but those fields are in the CSV for you to Excel the shit out of! The repository contains both a JSON and a JSONL file with even more data.

Now onto the results


Another Day, Another Breach

220,000 is a lot of people. It’s the population of a small town like Taiping, and roughly twice the capacity of Bukit Jalil Stadium.

Yet today, a data breach of this size barely registers in the news cycle. After all, the previous data breach was 200 times bigger, and occurred just 3 months ago. How could we take seriously something that occurs so frequently, and on a scale very few comprehend?

Individually, each breach is not particularly damaging; it’s a thin thread of data about its victims. But they do add up. Criminals use multiple breaches to stitch together a fabric of a victim’s identity, eventually being able to forge credit card applications in their name, or to perform typical scams.

But if you’re thinking of avoiding being in a breach, that’s an impossible task. The only Malaysians that weren’t part of the telco breach were those without mobile phones. In the organ donor leak, the victims were kind-hearted souls who were innocent bystanders in the war between attackers and defenders on the internet.

The only specific advice that would work would be to not subscribe to a mobile phone account and to not pledge your organs. That is not useful advice.

I wanted this post to be about encouraging people to stop worrying about data breaches and move on with their lives. To accept that the price of living in a hyper-connected world is that you’ll be a data breach victim every now and then. I wanted to demonstrate this by actually going out and pledging my organs, to show that we shouldn’t be afraid.

But when I went to the Malaysian organ donation website (dermaorgan.gov.my), I was greeted by the all-too-common “Connection is Not Secure” warning. Which just made my head spin!


Writing Millions of rows into DynamoDB

While designing sayakenahack, the biggest problem I faced was trying to write millions of rows efficiently into DynamoDB. I slowly worked my way up from 100 rows/second to around the 1,500 rows/second range, and here’s how I got there.

Work with Batch Write Item

The first mistake I made was a data modelling error. Sayakenahack was supposed to take a single field (IC number) and return all the phone numbers for it in the breach. So I initially modelled the phone numbers as an array within an item (what you’d call a row in regular DB speak).

Strictly speaking this is fine; DynamoDB has an update command that allows you to update/insert an existing item. The problem is that you can’t batch an update command: each update command can only update/insert one item at a time.

Running a script that updated one row in DynamoDB at a time was painfully slow, around 100 items/second on my machine. Even when I copied that script to an EC2 instance in the same datacenter as the DynamoDB table, I got no more than 150 items/second.
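
For context, the slow version looked something like this boto3 sketch (the table and attribute names are made up for illustration). Every row costs a full network round trip, which is what caps the throughput:

```python
import boto3

# Hypothetical table: partition key "ic_number", phone numbers in a list attribute.
table = boto3.resource("dynamodb").Table("breach")

rows = [("900101015000", "0123456789")]  # example (ic_number, phone) pairs

# One update_item call per row: correct, but one round trip each.
for ic_number, phone in rows:
    table.update_item(
        Key={"ic_number": ic_number},
        UpdateExpression="SET phones = list_append(if_not_exists(phones, :empty), :p)",
        ExpressionAttributeValues={":p": [phone], ":empty": []},
    )
```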

At that rate, a 10 million row file would take nearly 18 hours to insert. That wasn’t very efficient.

So I destroyed the old paradigm and rebuilt.

Instead of phone numbers being arrays within an item, each phone number became an item itself. I kept IC number as the partition key (which isn’t what Amazon recommends), which allowed me to query for an IC number and get back an array of items.

This allowed me to use DynamoDB’s batch_write_item functionality, which handles up to 25 requests at once (up to a maximum of 16MB). Since my items weren’t anywhere near 16MB, I would theoretically get a 25-fold increase in speed.

In practice though, I got ‘just’ a 10-fold increase, allowing me to write 1,000 items/second instead of 100. This meant I could push through a 10 million row file in under 3 hours.
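
With that remodel, the insert loop becomes a sketch like the one below (again with hypothetical names). boto3’s batch_writer wraps batch_write_item for you, buffering puts into batches of 25 and automatically retrying any unprocessed items:

```python
import boto3

# Hypothetical table: partition key "ic_number", sort key "phone",
# so there's one item per phone number, queryable by IC number.
table = boto3.resource("dynamodb").Table("breach")

rows = [("900101015000", "0123456789")]  # example (ic_number, phone) pairs

# batch_writer buffers put_item calls and flushes them via batch_write_item
# in batches of 25, resending unprocessed items automatically.
with table.batch_writer() as batch:
    for ic_number, phone in rows:
        batch.put_item(Item={"ic_number": ic_number, "phone": phone})
```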

First rule of thumb when trying to write lots of rows into DynamoDB: make sure the data is modelled so that you can batch insert; anything else is painfully slow.