Previously, I moaned about dermaorgan.gov.my, a site that was probably hacked but was still running without basic TLS. It is unacceptable that in 2018 we have government-run websites that ask for personal information yet run without TLS.
So I decided to check just how many .gov.my sites actually implemented TLS, and how many would start being labelled ‘not secure’ by Google in July. That’s right, Google will start naming and shaming sites without TLS, so I wanted to give .gov.my sites the heads up!
Why check for TLS?
TLS doesn’t guarantee a site is secure (nothing does!), but a site without TLS signals a lack of care from the administrator. The absence of TLS is an indicator of just how lightly the security of the servers has been taken.
Simply put, TLS is necessary but not sufficient for security, and since it’s the easiest thing to detect without running intrusive network scans, it seems like the best place to start.
How did I check for TLS?
But first I needed a list of .gov.my sites.
To do that, I wrote a web crawler that started with a few .gov.my links, fetched each page, and stored the links it found. It then repeated the process for those links, the links of the links… and so forth. After 3 iterations, I ended up with 20,000 links from 3,000+ individual hostnames (a word I wrongly use in place of FQDN, but since the code uses hostnames, I’m sticking to it for now; please forgive me, networking nerds).
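To make the crawling idea concrete, here is a minimal sketch of a depth-limited crawler using only the Python standard library. It is not the code from the repository: the names `LinkParser`, `fetch` and `crawl`, the `max_depth` parameter, and the choice to swallow download errors are all assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from http.client import HTTPException
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url, timeout=10):
    """Download a page as text; return an empty string if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, HTTPException):
        return ""


def crawl(seeds, max_depth=3, fetch_page=fetch):
    """Breadth-first crawl: fetch pages fewer than max_depth hops from the seeds."""
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # link is recorded, but its page is not fetched
        parser = LinkParser()
        parser.feed(fetch_page(url))
        for href in parser.links:
            try:
                link = urljoin(url, href)  # resolve relative links
                parts = urlparse(link)
            except ValueError:
                continue  # malformed URL, skip it
            if parts.scheme in ("http", "https") and parts.hostname and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    hostnames = {urlparse(link).hostname for link in seen}
    return seen, hostnames
```

Calling `crawl(["http://www.example.gov.my/"])` (a made-up seed) returns the set of links and the set of hostnames they belong to. Passing the download function in as `fetch_page` keeps the crawl logic testable without touching the network.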
I then manually filtered the hostnames to those from a .gov.my or .mil.my domain and scanned them for a few things:
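I did the filtering by hand, but the same step could be automated with a simple suffix check. The sketch below is an assumption about how one might do it, and `is_government_host` is a hypothetical helper name, not something from the repository.

```python
def is_government_host(hostname, suffixes=(".gov.my", ".mil.my")):
    """Return True if hostname is a subdomain of one of the given suffixes."""
    # Normalise case and drop a trailing dot (as in a fully qualified name).
    host = hostname.lower().rstrip(".")
    # str.endswith accepts a tuple and matches any of its entries.
    return host.endswith(suffixes)


hostnames = ["www.moh.gov.my", "example.com", "gov.my.evil.com", "WWW.NAVY.MIL.MY."]
government_hosts = [h for h in hostnames if is_government_host(h)]
```

Matching on the leading dot is deliberate: it keeps out look-alikes such as `notgov.my`, and checking the end of the name rather than searching for a substring keeps out `gov.my.evil.com`.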
- Does it have an https website (if it doesn’t redirect)
- Does it redirect users from http to https
- Does the https site have a valid certificate
- Does the certificate match the hostname
- Is the certificate fresh (not expired)
- Can the certificate be validated (this requires all intermediate certs to be present)
- What is the IP of the site
- What is the ASN of the IP
- What are the Server & X-Powered-By headers returned by the host
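The certificate checks in the list above can be sketched with Python’s `ssl` module. This is a rough illustration under my own assumptions, not the scanner from the repository: `check_certificate` and the keys of the result dictionary are hypothetical names. The default SSL context verifies the chain, the expiry date and the hostname in one handshake, so a failure in any of them surfaces as a single verification error with a message saying which check failed. The ASN lookup and the header collection are not covered here.

```python
import socket
import ssl


def check_certificate(hostname, port=443, timeout=5.0):
    """Attempt a verified TLS handshake and report what happened."""
    result = {"hostname": hostname, "https": False, "cert_valid": False, "error": None}
    # The default context checks the chain, the expiry date and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            result["ip"] = sock.getpeername()[0]
            with context.wrap_socket(sock, server_hostname=hostname) as tls:
                result["https"] = True
                result["cert_valid"] = True
                result["not_after"] = tls.getpeercert().get("notAfter")
    except ssl.SSLCertVerificationError as exc:
        # The server speaks TLS, but the certificate failed verification:
        # expired, hostname mismatch, missing intermediate certs, and so on.
        result["https"] = True
        result["error"] = exc.verify_message
    except ssl.SSLError as exc:
        result["error"] = f"TLS error: {exc}"
    except OSError as exc:
        # DNS failure, refused connection or timeout.
        result["error"] = f"connection failed: {exc}"
    return result
```

A call such as `check_certificate("www.example.gov.my")` (a made-up hostname) returns a dictionary that can be written straight to a csv or jsonl row. The order of the `except` clauses matters, because `SSLCertVerificationError` is a subclass of `SSLError`, which in turn is a subclass of `OSError`.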
Obviously, as I was coding this, I got distracted and actually collected quite a bit more data, but those fields are in the csv for you to Excel the shit out of! The repository contains both a json and a jsonl file with more data.