Had a blast at PyConSG 2019, really cool to be in the presence of so many pythonistas. Would definitely recommend, especially since python is one of the more broadly used languages (AI, Blockchain, RPA, etc). My talk was on AWS…
I’ve been really digging into Lambda Layers lately, and once you begin using layers you’ll wonder how you got by without them.
Layers allow you to package just about anything into Lambda, but in a modular way. Elements of your code that don’t change much can be packaged into layers, keeping your actual Lambda deployment to just the code that’s changing.
It’s akin to the Docker cache, where you keep the unchanging elements higher up in your Dockerfile, separate from the code that always changes. The difference, though, is that the Docker cache speeds up builds, while layers speed up Lambda deployments.
But layers aren’t magic, and they’re still bound by the AWS size limit: your entire function (including all its layers) needs to be no larger than 250MB (unzipped).
Which is tough for something like spaCy, because its default installation size on AWS Linux is ~400MB (or 492MB based on my quick installation on lambci for python3.7). So, in order to get spaCy working on a Lambda, certain tweaks are going to be necessary.
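If you want to measure the installed footprint for yourself, a quick way is to walk the package directory and sum the file sizes. A minimal sketch (the site-packages path below is a placeholder; it will differ on your machine):

```python
import os

def dir_size_mb(path):
    """Total size of all files under `path`, in megabytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / (1024 * 1024)

# Point this at wherever pip installed spaCy, e.g. (hypothetical path):
# print(f"{dir_size_mb('/var/lang/lib/python3.7/site-packages/spacy'):.0f} MB")
```

Running this inside a lambci container gives you the number that actually counts against the 250MB limit.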
Some have tried working around this problem by installing spaCy onto the Lambda container on cold-start, i.e. pulling the data into Lambda only when you have access to the 512MB in /tmp. Cool solution, but it almost completely fills /tmp, and makes a cold-start even slower.
A more optimal solution would be to reduce the size of the spaCy installation and have it fit into a layer! Fortunately, after some googling, I found a GitHub issue that enables us to do exactly this.
It involves removing unnecessary language files, which spaCy lazy loads. If you’re only interested in one language, you can simply remove the unnecessary language files in the lang directory.
After manually removing all non-English (en) language files, I managed to reduce the size of the spaCy package to 110MB, which fits very nicely into a Lambda layer. In the end, my lang directory had only the following files:
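The manual pruning above can be scripted. Here’s a sketch that deletes every language subpackage except the ones you keep; the `KEEP` set below is my assumption of the shared modules a spaCy 2.x install needs, so adjust it to whatever your version’s lang directory actually contains:

```python
import os
import shutil

# Entries to keep: the English subpackage plus the shared modules every
# language imports (names assumed from spaCy 2.1; verify for your version).
KEEP = {"en", "__init__.py", "lex_attrs.py", "norm_exceptions.py",
        "char_classes.py", "punctuation.py", "tokenizer_exceptions.py"}

def strip_languages(lang_dir, keep=KEEP):
    """Delete everything under `lang_dir` that isn't in `keep`."""
    for entry in os.listdir(lang_dir):
        if entry in keep:
            continue
        full = os.path.join(lang_dir, entry)
        if os.path.isdir(full):
            shutil.rmtree(full)   # a non-English language subpackage
        else:
            os.remove(full)       # a stray non-shared module

# e.g. strip_languages("python/lib/python3.7/site-packages/spacy/lang")
```

Run it against the spaCy copy inside your layer build directory, never against your system install.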
As a bonus, I also included the English en_core_web_sm-2.1.0 model, to make the Lambda layer fully usable on its own.
Finally, I published it as a publicly available layer. One of the amazing things about layers is that once a layer is made, it can be shared across AWS for anyone to consume.
On the 20th of July, Singaporean authorities announced a data breach affecting SingHealth, the country’s largest healthcare group. The breach impacted 1.5 million people who had used SingHealth services over the last 3 years.
Oh boy, another data breach with 1.5 million records … **yawn**.
But Singapore has less than 6 million people, so it’s a BIG deal to this island I currently call home. Here’s what happened.
According to the official Ministry announcement, administrators discovered ‘unusual’ activity on one of their databases on 4-Jul, investigations confirmed the data breach a week later, and a public announcement was made 10 days after confirmation.
4-Jul : IHiS’ database administrators detected unusual activity on one of SingHealth’s IT databases
10-Jul : Investigations confirmed the data breach, and all relevant authorities were informed
12-Jul : A Police Report is made
20-Jul : A public announcement is made
The official report states that “data was exfiltrated from 27 June 2018 to 4 July 2018…no further illegal exfiltration has been detected”.
The point of entry was ascertained to be “that the cyber attackers accessed the SingHealth IT system through an initial breach on a particular front-end workstation. They subsequently managed to obtain privileged account credentials to gain privileged access to the database”
And finally that “SingHealth will be progressively contacting all patients…to notify them if their data had been illegally exfiltrated. All the patients, whether or not their data were compromised, will receive an SMS notification over the next five days”
Let’s start this post the same way I start my day — by looking at Facebook.
Facebook made $40 billion in revenue in 2017, solely from advertising to pure schmucks like you. The mantra among the more technically literate is that Facebook doesn’t have users; it has products that it sells to advertisers. It just so happens that all its products are homo-sapien, smartphone-toting urbanites (just like you!)
The platform’s meteoric rise from nobody to top-dog is a dream-story in Silicon Valley, but underneath the veneer of wholesome innovation lies a darker secret, one that could be responsible for the polarization of entire communities, including our own. And it’s all because of their most valuable employee.
No, not Mark Zuckerberg, but the real genius behind the blue and white site. The one responsible for the billions in ad revenue Facebook generates yearly, and unsurprisingly she’s female.
Anna Lytica and Machine Learning
There are probably thousands of posts your Facebook friends make every day, but she decides which 3 to fit onto your smartphone screen first, and the next 3, and so forth. From the millions of videos shared every hour, she painstakingly picks the few you’d see in your timeline. She decides which ads to show you, and which advertisers to sell you to. Underneath the hood of the giant ad behemoth, she lies working all day, every day.
She isn’t a person, ‘she’ is an algorithm, a complex program that does billions of calculations a second, and for this post we’ll give her the name… Anna Lytica.
Facebook doesn’t talk about her much; she is, after all, a trade secret (sort of). But what she does and how she does it might be as much a mystery to us as it is to Mr. Zuckerberg. Machine Learning algorithms are complex things: we know how to build them and train them, but how they actually work is sometimes beyond our understanding.
Google can train AlphaGo to play a game, but how it makes decisions is unknown to Google and even to itself; it just IS a Go player. And it is really sad when we watch these AI algorithms make amazing discoveries but are unable to explain their rationale to us mere humans. It’s the reason why Watson, IBM’s big AI algorithm, hasn’t taken off in healthcare: there’s no point recommending a treatment for cancer if the algorithm can’t explain why it chose the treatment in the first place.
This is hard to grasp, but AI isn’t just a ‘very powerful’ program, AI is something else entirely. We don’t even use traditional words like write or build to refer to the process of creating them (like we do regular programs), instead we use the word train.
We train an algorithm to play Go, to drive, or to treat cancer. We do this the same way we breed dogs: we pick specimens with the traits we want, and breed them till we end up with something that matches our desires. How a dog works, and what a dog thinks, is irrelevant. If we want them big, we simply breed the biggest specimens; the process is focused entirely on outcome.
Similarly, how the algorithm behaves is driven by what it was trained to do. How it works is irrelevant, all that matters is outcome. Can it play Go, can it drive, can it answer jeopardy? If you want to understand an algorithm you need to know what it was trained to do.
Anna Lytica was trained to keep you browsing Facebook; after all, the company’s other endeavors, like internet.org and Instant Articles, were built with the same intention. And while good ol’ Mark stated that he’s tweaking Anna to reduce the time people spend on Facebook, this is something new, an exception to the years Facebook tweaked her to keep you on their site.
After all, the average monthly user spends 27 minutes per day in the app, and if you go by daily users, they spend about 41 minutes per day on Facebook. If that’s the end-result of tweaking Anna to ensure we spend less time on Facebook, God help us all!
And while it’s difficult to understand how Anna works, it’s very easy to guess how she’ll behave. If the end result of Anna’s training is to keep you browsing Facebook, then human psychology reveals a simple trait all humans share: confirmation bias.
Channel News Asia posted last week that hackers could steal your info by just knowing your phone number.
Woah!! Must be some uber NSA stuff, right? But no, it was a couple of guys with Metasploit, and they required a LOT more than ‘just’ the phone number.
The post was an add-on to a current affairs show called Talking Point, which aired an episode about cybersecurity last week, and which (like most mainstream media reporting) had more than a few errors I’d like to address.
Problem 1: Cost of cybercrime — but no context
The show starts off by highlighting that cybercrime cost Singaporeans S$1.25bln, which might be true, but lacks context, or rather had the context removed.
Because the very report that estimated the cost also mentioned that society was willing to tolerate malicious activity that costs less than 2% of GDP, like narcotics (0.9%) and even pilferage (1.5%). S$1.25bln is less than 0.3% of Singapore’s GDP, and is a long way off the 2% threshold. Giving out big numbers without context gives readers the wrong impression.
So allow me to provide context on just how big that S$1.25bln is.
In 2010, Singapore’s retail sector lost S$222 mln to shrinkage, a term used to describe the losses attributed to employee theft, shoplifting, administrative error, and so on. Had we split the cost of cybercrime across different industries based on their percentage of overall GDP, the total losses from cybercrime in the retail sector in 2015 would be S$225 mln, almost identical to what the sector lost to shrinkage… 7 years ago!
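The back-of-envelope math is simple enough to sketch. The GDP figure is my own rough assumption (roughly S$420bln for 2015), and the ~18% retail share is implied by the article’s own numbers rather than an official statistic:

```python
# All figures in Singapore dollars. GDP and the retail share are rough
# assumptions for illustration, not official statistics.
cybercrime_cost = 1.25e9        # S$1.25bln estimated annual cost of cybercrime
gdp = 420e9                     # Singapore GDP, ~S$420bln (2015, assumed)
retail_share_of_gdp = 0.18      # retail's rough share of GDP (assumed)

pct_of_gdp = cybercrime_cost / gdp * 100          # ~0.3%, well under 2%
retail_cybercrime = cybercrime_cost * retail_share_of_gdp  # ~S$225 mln

print(f"Cybercrime as % of GDP: {pct_of_gdp:.2f}%")
print(f"Retail's pro-rated share: S${retail_cybercrime / 1e6:.0f} mln")
```

Whatever exact GDP figure you plug in, the cost lands nowhere near the 2% tolerance threshold the report itself cites.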
Cybercrime is a problem, but not one that is wildly out of proportion to the other issues society is facing.
As with all new shiny equipment, a newly installed router in your home requires a few things to be configured to properly secure it.
Goes without saying, that you should change your WiFi password the moment the technician leaves your home, but there are other things you’d need to configure in order to secure your router against common attacks.
Now remember: even if you follow all the advice in this post, there’s a strong chance that you’d still be hacked somewhere down the road, especially if you’re relying on a crappy consumer-grade router. But taking these precautions raises your security level above the general population, giving you an edge over everybody else, and sometimes the difference between being hacked and staying safe could be one simple configuration on a router.
For this post, I’m going to use the standard Dlink 868L router that StarHub gave me when I signed-up for their 1Gbps package. While the post is specific, the general principles still apply to any router you own.
Step 1: Logon to the router
Goes without saying, all changes have to be made on the router itself. The good news is that all general-purpose routers like the Dlink-868L come with a web interface, i.e. the router hosts a website on your network that you can use to change settings.
Fire up a browser like Chrome or Firefox (God forbid you’re on Internet Explorer), and point the address bar to http://192.168.0.1, and you ‘should’ come to the router homepage (image below). If not, try the other possible addresses, like http://192.168.1.1 or http://10.1.1.1; if none of those work, you’ll need to run ipconfig on your local Windows client to determine the ‘gateway’ IP address of your router.
Once there, you’ll see the following screen. For most StarHub customers, just logon with the admin user and leave the password field blank–as in don’t enter anything for the password.
While it’s still not explicitly clear that the StarHub DDOS was executed by Mirai, a recently released malware built specifically for DDOS, the timing and its similarity to other Mirai attacks leave little room for doubt, at least to me.
If indeed StarHub was the victim of a Mirai-based attack, it would seem extremely odd that their CTO would reference phishing emails as a vector for infection. So a few things don’t quite line up here, including the advice from the CTO to change the default username and password, when Brian Krebs has already reported that it doesn’t quite help:
Several readers have pointed out that while advising IoT users to change the password via the device’s Web interface is a nice security precaution, it may or may not address the fundamental threat. That’s because Mirai spreads via communications services called “telnet” and “SSH,” which are command-line, text-based interfaces that are typically accessed via a command prompt (e.g., in Microsoft Windows, a user could click Start, and in the search box type “cmd.exe” to launch a command prompt, and then type “telnet” <IP address> to reach a username and password prompt at the target host).
The trouble is, even if one changes the password on the device’s Web interface, the same default credentials may still allow remote users to log in to the device using telnet and/or SSH.
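If you’re curious whether a device on your own network still exposes those services, a quick TCP probe will tell you. A minimal sketch (the router address and port list are assumptions; substitute your own gateway IP):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe the services Mirai spreads over (hypothetical router address):
# for port in (23, 2323, 22):   # telnet, alternate telnet, SSH
#     print(port, port_open("192.168.0.1", port))
```

An open telnet port with default credentials is exactly the door Mirai walks through, regardless of what the web interface password is.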
If you’re more technically inclined, I strongly suggest listening to the feature interview on last week’s Risky Business podcast.
But the last piece of advice that the StarHub CTO gave, that didn’t make sense to me at all was this:
“If you were to buy a webcam from Sim Lim Square, try to get a reputable one”
Again, this may seem like good advice, but it doesn’t conform to the evidence. Brian Krebs has a list of devices that are hackable, and they include the likes of Panasonic, RealTek, Samsung and Xerox, all of which regular consumers would consider ‘reputable’.
So StarHub claims that you should change your passwords, but that doesn’t protect you from Mirai.
StarHub claims that you should buy equipment from ‘reputable’ suppliers, but even reputable suppliers produce hackable IoT devices that can’t be secured.
Finally, StarHub is going to be sending technicians out into the field to help subscribers, and while this is laudable, it’s not a sustainable solution. It only fixes a short-term problem, because as long as consumers continue to buy hackable IoT devices, the threat isn’t going to go away.
And how often can StarHub afford to send technicians on home visits before the costs start becoming unbearable?
The way to view this issue is from a legal, economic and technical perspective, and in that order.
When I was in school, we joked about people who kept their money under the mattress, as if those who didn’t use banks were less intelligent than people who did. The general thinking was that smart people kept their money in the bank, where it was safe from theft, fire and flood, while still collecting interest.
In the 80’s this was a compelling argument, when interest rates were high and banks really did provide security, but is that thinking still applicable today?
In June of 2000, Maybank launched their ‘new’ internet banking platform, Maybank2u, which allowed their customers to do their banking online, outside of traditional branches or even ATMs. A few years later, it began offering online purchases, and soon after, the mobile app was launched.
But while online banking platforms brought convenience, they also introduced new security threats — and it wasn’t clear whose job it was to secure against those new threats, and who would be liable for inevitable financial losses.
Was it going to be the bank that assumed liability, just like before, or would it be the account holder, or possibly a mixture of both?
The answer depends on who gets attacked, because not all attacks are equal.
Not all attacks are equal
There are two types of attack: one where the bank itself is attacked, and another where the account holder is targeted instead.
When someone walks into a bank with the threat of violence, and walks out with $30,000 of the bank’s cash, the bank absorbs all the losses. After all, that’s why your money is in their safe and not under the mattress.
But there exists another class of attack: customer impersonation, where the attacker isn’t threatening violence or even ‘attacking’, but trying to fool the bank into believing they are the rightful account holder. In other words, the attacker is trying to impersonate you, to get to your money.
And in the digital world, customer impersonation is far more common. Consider the case of ATM fraud.
ATMs identify a user by verifying their ATM card, and then prompting them for the PIN. More specifically, the ATM first authenticates the inserted ATM card (is this card real?) and then asks the user for the PIN (is this person the account holder?). Once the ATM is satisfied, it grants the user access to the account.
Hence, if an attacker managed to steal your card and knows your PIN, the ATM has no way to differentiate between you and the attacker. Anyone could take money from your account by just having your ATM card and PIN. In contrast, robbers attacking a bank would simply be taking the bank’s cash… not yours.
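The two-step check the ATM performs can be sketched as a toy model (this is illustrative pseudologic, not real banking code; the card number, PIN and holder name are all made up):

```python
# Toy model of the ATM's two checks: authenticate the card, then
# authenticate the user via PIN. All data below is fabricated.
ISSUED_CARDS = {"4111-0000-0000-1234": {"pin": "4821", "holder": "alice"}}

def atm_login(card_number, pin):
    card = ISSUED_CARDS.get(card_number)
    if card is None:              # step 1: is this card real?
        return None
    if card["pin"] != pin:        # step 2: does the user know the PIN?
        return None
    return card["holder"]         # both checks passed: grant access

# Note: the ATM can't tell the account holder from a thief;
# anyone presenting the right card and PIN gets in.
```

The model makes the liability point concrete: the bank’s checks authenticate credentials, not people, so whoever holds the credentials is, as far as the machine is concerned, you.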
Credit card fraud is another prime example, but at least in Malaysia end customers have their liability capped at RM250, provided they report their lost cards within a ‘reasonable’ amount of time. Debit cards and ATM cards are not protected in the same way, which is strange, because the poorer sections of society, who need more protection, usually have debit instead of credit cards.
To summarize, customer impersonation isn’t the same as a bank robbery. When the bank issues you credentials (like PINs, passwords or ATM cards), the responsibility to secure those credentials is yours, and if those credentials are compromised, then you’ll have to shoulder some of the financial losses as well.
This week, I’ll put the final touches on my move from Malaysia to Singapore.
So, I felt it would be a good idea to read through some Singaporean tech articles to see how tech events played out on the little red dot, and offer some unsolicited and completely useless advice on them.
It wasn’t easy sifting through a boat-load of gadget reviews masquerading as tech journalism (I guess some things are the same in every country), but underneath the hundreds of phone reviews and fiber broadband comparisons, I found an interesting little report on illegal downloads.
The Singapore Straits Times reports that:
A local law firm that started proceedings to go after illegal downloaders in Singapore on behalf of two Hollywood studios said it will cooperate with the local authorities to ensure no abuse of process.
It follows a rare intervention by the Attorney-General’s Chambers (AGC) in civil applications made by Samuel Seow Law Corp (SSLC) in the High Court last month.
“We will work with the local authorities to ensure that there will be no unnecessary alarm to consumers who receive the letters of demand we plan to send out,” Mr Samuel Seow, managing director of SSLC, told The Straits Times yesterday.
This is just a re-hashed version of what happened last year in Singapore, when the same law firm went after downloaders of another movie; the difference is that this time they’ll be doing it under the watchful eyes of the AGC.
There is something to be said here about copyright-trolling, the abuse of power and the bullying tactics usually involved. But, we’ll leave that discussion for another day.
Today, I want to explore a little bit about anonymity and how many people have a mistaken notion about what it is.