Data is the natural by-product of every computer-mediated interaction. It stays around forever, unless it’s disposed of. It is valuable when reused, but it must be done carefully. Otherwise, its after-effects are toxic. - Bruce Schneier
As we move towards a ‘knowledge’-based society, data naturally becomes a by-product. Every digital action you make produces just a tiny bit of data, but we perform a LOT of these transactions per day, adding to the growing amount of data about us floating around.
What we used to buy with cash, we now buy with credit cards. With every swipe, digital data is created and stored: it records the amount of the transaction and where the transaction took place. And because the bank bills the customer, it can tie that record to a person, their address, their age, their income and even their preferences.
Photos we used to share only in person, we now share digitally on social networks. All those photos are stored permanently, and they’re tagged with metadata about the photo’s location and the names of the people in it. A lot more data, and a lot more public.
Where we used to pay tolls in cash, we now use Touch ‘n Go, so there is a full-blown record of where we travelled and at what time. Coupled with CCTV footage, it can even identify which vehicle you used. Tie that to your credit card and one can determine where you fuelled up before you got on the highway; add the CCTV footage from the fuel station and one knows how many people were in the vehicle.
Add to that the new things we have, like our résumés on LinkedIn, our current locations on Foursquare, and our political affiliations as revealed by the comments we leave on blogs or Malaysiakini. What used to be something you’d reserve for your close friends at the kopitiam is now public knowledge, provided someone takes the trouble to Google your name.
The list literally goes on and on, and all of it adds to the amount of our personal data stored digitally online: data that can be used to determine who you are, where you are, what you like, what your political beliefs and religious inclinations are, even your medical history and sexual orientation. I’m not kidding, there’s a story I love to link to which tells of a supermarket that knew a teenager was pregnant before her father did.
It has become imperative that we as users look towards protecting our data online, but it is just as imperative for governments to regulate the way our data can be used, even by governments themselves.
The technological limits
Corporations (like supermarket chains) have a responsibility to their shareholders to maximize profit. That’s just a fact of life. Corporations try to do more business every year, and one way to do more business is to know more about their customers and offer them targeted promotions.
The first step towards knowing more about customers is getting more data about them. In the past this was a sort of technological fairy tale, because storing that much information about customers was problematic for two reasons. Firstly, the cost of storing the data was pretty high, and secondly, the computational power required to process it was too expensive to justify. The technology of the time limited the amount of data corporations could gather, store and process. They just couldn’t do it at an economical price, and hence they only targeted rich customers. If you’ve ever flown Business Class, you’d have noticed that the stewardess actually knows your full name; it made sense to track the rich. The rich are usually public figures anyway, so there wasn’t an issue. However, as the technological limit starts to disappear, corporations can begin to track everyone.
For one thing, storage costs have quickly fallen to nearly zero. Amazon stores 1GB of data for less than $0.09/mo, and Nimbus.IO is looking to get that down to $0.06/mo. If a corporation holds an entire GB of data about you, all it has to do is increase the profit it derives from you by $0.06 per month for storing that data to be financially feasible, and quite frankly 6 cents isn’t that hard to do. If I have nearly a million data points about your preferences, locations or family members, squeezing out 6 cents of profit should be easy.
In terms of computational power, cloud providers like Amazon and tools like BigQuery have made processing this enormous amount of data cheaper and more readily available to corporations. Processing 1GB of data on BigQuery costs just 3.5 cents. That’s an absolute pittance if I can get you to upgrade your mobile phone or your car, refinance your house, or even buy just one extra product with your weekly groceries. What’s 3.5 cents anyway?
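To put the two prices quoted above side by side, here is a rough back-of-the-envelope sketch. It assumes the $0.06 per GB per month storage price and the 3.5 cents per GB processing price mentioned in the text; the function name, the constants and the idea of "processing passes per month" are my own illustration, not anything published by the providers.

```python
# Hypothetical break-even sketch using the per-GB prices quoted in the text.
STORAGE_USD_PER_GB_MONTH = 0.06  # assumed storage price per GB per month
PROCESSING_USD_PER_GB = 0.035    # assumed price to process 1GB once


def monthly_break_even(gb_per_customer: float, passes_per_month: int = 1) -> float:
    """Extra monthly profit per customer needed to cover storing and processing their data."""
    storage = gb_per_customer * STORAGE_USD_PER_GB_MONTH
    processing = gb_per_customer * PROCESSING_USD_PER_GB * passes_per_month
    return storage + processing


if __name__ == "__main__":
    # 1GB stored and processed once a month
    print(f"${monthly_break_even(1.0):.3f} per customer per month")
```

Under those assumptions, storing a full GB about one customer and crunching it once a month costs under 10 cents, which is the whole point: the bar a corporation has to clear is tiny.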
As it becomes more technologically practical to store and process data, corporations begin collecting more of it, and the vicious cycle continues: corporations collect and process more data, which allows them to collect and process even more data, to the point where they know more about us than our spouses do.
Corporations aren’t going to stop collecting data because they’ve suddenly developed a conscience and a respect for public privacy. They have a responsibility to make more money, and they always look to local legal regulations to determine how far they can go. Unless there is a legal framework in place to limit the way corporations collect data, specifically personal data, they will just push the technological limit of how much data they can collect. As the technological limits disappear, the limits on the data they can collect and process disappear with them.
The only way to guarantee that private companies play by the rules is to have the rules in the first place. Without a legal framework it’s a free-for-all jungle out there, where corporations have our most intimate details and still fully comply with the law, because there is no law.
Personal Data Protection Act
So in comes the Personal Data Protection Act 2010, which aims to regulate “the processing of personal data in commercial transactions and to provide for matters connected therewith and incidental thereto”.
The act only refers to Personal Data, which is defined as:
“personal data” means any information in respect of commercial transactions, which:
(a) is being processed wholly or partly by means of equipment operating automatically in response to instructions given for that purpose;
(b) is recorded with the intention that it should wholly or partly be processed by means of such equipment; or
(c) is recorded as part of a relevant filing system or with the intention that it should form part of a relevant filing system,
that relates directly or indirectly to a data subject, who is identified or identifiable from that information or from that and other information in the possession of a data user, including any sensitive personal data and expression of opinion about the data subject; but does not include any information that is processed for the purpose of a credit reporting business carried on by a credit reporting agency under the Credit Reporting Agencies Act 2010;
I’ve always been skeptical of complicated legal definitions, and this is definitely one of them. The law is broad in the sense that it covers any data that can identify a data subject, but it doesn’t define that in more concrete terms, so things like email addresses or telephone numbers appear to be a grey area. To me a Facebook account name, a LinkedIn ID, or even an obfuscated email address are all personal data, whether they can uniquely identify you or not.
I’m no lawyer, and of course these acts are written in such archaic language that most normal people won’t understand them, but this is my best effort to decipher the law:
Also, I’m not particularly happy that the law only applies to data processed in Malaysia. In the era of cloud technology it would be all too easy to migrate the data to the Amazon cluster in Singapore and be scot-free to do what I please with it. That’s why the EU data privacy laws restrict the export of personal data out of the EU: that way the personal data always falls under the ambit of the law, without any loopholes to bypass enforcement agencies.
However, on a high level there are 7 principles of the law that briefly are:
Principle 1: The General Principle
The act doesn’t allow anyone to process your personal data unless you give your consent. Now there are some exemptions of course, which include the administration of justice.
It also goes on to state that personal data cannot be processed unless 3 criteria are met: the processing of the data is legal, the processing is necessary to the purpose that the data subject consented to, and (my personal favorite) the personal data is adequate but not excessive in relation to that purpose.
In my mind this would mean that ‘excess’ personal data cannot be processed, and therefore can’t even be collected, if it is excessive in relation to the purpose. So if I were to sign up for Groupon or a new email service, asking for my race would technically be illegal, since it is excessive to the purpose. Does Groupon really need to know my race for any reason? Personally I wasn’t too happy that I was asked for my race when I signed up for Nuffnang.
Principle 2: Notice and Choice Principle
The act also stipulates that the data subject be given notice in written form whenever their personal data is being processed, which includes whenever the data is shared with a 3rd party.
Personally I’d like to know how banks obtained my phone number and name to offer me things like Insurance policies, cash advances and even housing loan refinancing. If you’re a bank officer hoping to call a guy named Keith Rozario, let me offer you a quick tip–unless you plan to give ME money, I’m not interested.
The principle also goes on to stipulate in very fine detail how the written notice is to be executed, right down to the language and content of the notice.
Principle 3: Disclosure Principle
The act prevents anyone from disclosing your personal data other than for the purpose it was collected for in the first place.
Also the Disclosure principle goes on to state that the Personal Data can’t be disclosed to 3rd parties unless a notice is written to the data subject (Principle 2).
Principle 4: Security Principle
The act also requires that the data user take reasonable steps to protect the personal data collected from loss or misuse. Unfortunately it is ridiculously ambiguous in the actual stipulations. It only goes so far as saying “data processor provides sufficient guarantee in respect of the technical and organisational security measures and ensure that reasonable steps are taken to ensure compliance with the security measures”.
The act itself doesn’t even mention encryption or user access, merely organisational security measures, and that falls far short of what I expected. I wasn’t really expecting PCI DSS here, but the security principle should be at the crux of the act: anyone who collects and processes personal data has a definite responsibility to secure it. Instead the act merely leaves the definition at “reasonable steps”, and one can imagine how many man-hours will be wasted explaining IT security concepts to judges.
Principle 5: Retention Principle
The act requires that personal data shall not be retained on any system for longer than is necessary for the purpose of the data. It also mandates that the data user take ‘reasonable steps’ to destroy or delete the data once it is due to be disposed of.
Now this raises some interesting questions. If mobile service operators or internet service providers require my identity card to prove my identity, do they then need to store my identity card number once its intended purpose has been fulfilled?
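For the technically inclined, the retention principle boils down to a periodic check along these lines. This is only a minimal sketch under my own assumptions: the act says no more than "no longer than necessary", so the `Record` shape, the field names and the 30-day grace period below are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Record:
    subject: str                           # who the data is about
    purpose_fulfilled_on: Optional[date]   # None means the purpose is still ongoing


def due_for_disposal(records: List[Record], today: date,
                     grace: timedelta = timedelta(days=30)) -> List[Record]:
    """Return records whose purpose ended more than `grace` ago (grace period is an assumption)."""
    return [
        r for r in records
        if r.purpose_fulfilled_on is not None
        and today - r.purpose_fulfilled_on > grace
    ]


if __name__ == "__main__":
    records = [
        Record("identity card number used at sign-up", date(2012, 1, 1)),
        Record("billing address of an active customer", None),
    ]
    for r in due_for_disposal(records, today=date(2012, 6, 1)):
        print("dispose:", r.subject)
```

The identity card number in the question above would be exactly the kind of record such a check flags: once the identity has been verified, the purpose is fulfilled and the clock starts ticking.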
Principle 6: Data Integrity Principle
Under the act, the data user (not the data subject) is responsible for keeping the data accurate, complete, not misleading and up to date.
Of course, you hear stories of people being declared bankrupt and having a difficult time trying to be declared non-bankrupt (is there such a word?). The act places the onus on the data user to ensure that the data is accurate, complete, not misleading and up to date. So an agency like CTOS, or an insurance company, needs to ensure that all the data it stores is accurate, and if it does not respond in a reasonable time it could in theory be charged under the law.
Principle 7: Access Principle
The act allows for any personal data to be readily accessible to the data subject in question. The data subject must also be allowed to correct the data in case of any inaccuracy.
I’d really like to email Maxis and Unifi to get all the information they have on me. There’s a TED Talk about just how much information your phone company has on you, and it is scary: a total of 40,000 lines of code about just 1 person for just 6 months’ worth of data. Now you understand why 1GB of data may not be enough.
Unfortunately, although the act has been passed and gazetted, it has yet to be enforced, because no commissioner has been appointed to enforce the law that protects personal data from being misused or sold for commercial purposes. So while the act is a step in the right direction, we’ve planned to take that step but haven’t really begun walking yet.
The act also falls short on many fronts: it doesn’t apply to the government or government agencies, it fails to specify security requirements, and it doesn’t mandate that personal data stay within the confines of Malaysia, and therefore within the confines of the law.
It’ll also be interesting to see whether the act mandates the amount of personal data that organizations can collect. Of course the act is far longer than just the principles, and you can check out the full text of the act here.
Picture courtesy of : http://www.flickr.com/photos/ian-s/2152798588/sizes/m/in/photostream/