Data guys versus Lawyers and Politicians
Nate Silver is currently the internet darling of the big data folks. Not only did he accurately predict the outcome in 49 states for the US presidential election, he also correctly pointed out that Florida would be a toss-up before it eventually leaned towards Obama. That’s like predicting a coin toss would end on its side. While all of that may seem remarkable, this isn’t the story of a boy genius but rather the dawn of a new age: an age driven by data.
Nate isn’t alone in this. Slate reports that two other pundits got the entire analysis spot on as well, not to mention a three-man team in North Carolina, armed only with robo-callers, who also made a spot-on prediction. This isn’t some savant ability that Nate has. This is just pure hard-core science at work: the people who use the science are the ones making the accurate predictions, while the people who ignore it are left behind.
And just who got left behind? The usual opinion writers, like Ann Coulter, who predicted Romney would win by a 273-265 margin; Newt Gingrich, who predicted an even bigger margin for Romney; and of course Jim Cramer, who predicted such an insane number that it probably isn’t even worth typing here, but I’ll type it anyway. Good ol’ Jim predicted Obama would win by a whopping 440-98 margin, off by more than 100 electoral votes... but at least he got the winner right, and I’m sure predicting the stock market isn’t anything like predicting an election, and 100 points means nothing in the stock market.
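To put numbers on those misses, here is a minimal sketch that compares the predictions quoted above against the actual Electoral College result (Obama 332, Romney 206). The dictionary and function names are made up for illustration, and Gingrich is left out because his exact figure isn't quoted here.

```python
# Actual 2012 Electoral College result: Obama 332, Romney 206.
ACTUAL_OBAMA = 332

# Predicted Obama electoral votes, taken from the margins quoted above.
predictions = {
    "Ann Coulter": 265,  # she called it Romney 273-265
    "Jim Cramer": 440,   # he called it Obama 440-98
}


def obama_error(predicted_obama: int) -> int:
    """Signed miss in Obama's electoral votes (positive = overestimated)."""
    return predicted_obama - ACTUAL_OBAMA


errors = {name: obama_error(votes) for name, votes in predictions.items()}

for name, err in errors.items():
    print(f"{name}: off by {err:+d} electoral votes")
```

Cramer picked the right winner but missed by the larger number of votes; Coulter missed by fewer votes but picked the wrong winner.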
Answering the tough questions: Watson vs. Humans
IBM has always been on the cutting edge of innovation. It has moved from being merely a computer company to what is probably the first truly all-encompassing technology company. It doesn’t just make fancy gadgets or shiny thingamajigs; it makes actual solutions for real-world problems.
In 1996, IBM introduced the world to Deep Blue. Kasparov met Deep Blue and wasn’t impressed. He had no reason to be: he defeated Deep Blue 4-2 and walked away comfortably.
However, in 1997, IBM re-introduced the world to the second version of Deep Blue (unofficially named Deeper Blue), and this time Kasparov was beaten, but not by much. Kasparov is the Tiger Woods, Pele and Michael Jordan of the chess world, and he was beaten by a supercomputer with 11.38 GFLOPS of power.
It turns out, though, that we had nothing to be afraid of. Chess is, after all, a pretty simple game when you break it down: the number of possible moves is finite, as is the number of possible scenarios to play out. It’s not an easy game to master, but as it turns out, playing chess is far easier than just plain talking.
In fact, of all the talking games, Jeopardy seems the most difficult. At the end of this post, I will make an argument to show that Jeopardy — a simple talking game — is about 6,500 times more difficult than Chess (a game we often associate with genius). Turns out Kasparov has to bow to Ken Jennings. More…
Is MAS updating its own Wikipedia page?
Continuing my series on big data and Google BigQuery, I’ve decided to share a rather interesting snippet of information regarding our very own Malaysia Airlines and its Wikipedia page.
First, just to illustrate how important Wikipedia is in general, the Malaysia Airlines Wikipedia page gets roughly 30,000 hits per month. That’s just one page of Wikipedia getting more hits than my entire website. I can’t tell you how frustrated that makes me.
Having a negative-sounding Wikipedia page is pretty bad for business, particularly if 30,000 potential customers view it every month. That’s a web page that needs some serious attention if you’re the marketing manager of Malaysia Airlines.
Unfortunately for MAS (and every other business organization there is), Wikipedia has a policy about updating your own Wikipedia page: you’re not allowed to do it. Wikipedia has to keep to its original intention of being an online repository of information that is fair, balanced and neutral. Having marketing gurus or corporate bigwigs updating their own Wikipedia entries isn’t exactly in anyone’s best interest. However, Wikipedia doesn’t strictly enforce the policy and leaves it up to the crowd.
Fortunately, the crowd has responded. Sites like WikiScanner allow users to see which IP addresses updated which Wikipedia articles. Some have gone to the extent of correlating those IP addresses to their owners and determining whether companies are updating their own Wikipedia pages against the general guidelines. Let’s see if Malaysia Airlines can join that group of companies who’ve been slapped on the wrist for changing the Wikipedia pages of their organizations. More…
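As a rough illustration of that correlation step, the sketch below checks whether the IP address behind an anonymous edit falls inside an address block registered to the organization the article is about. This is only a sketch under stated assumptions, not how WikiScanner itself is implemented: the organization names, address ranges (reserved documentation ranges, not real allocations) and edit records are all hypothetical.

```python
import ipaddress

# Hypothetical address blocks registered to organizations. These are
# reserved documentation ranges, not real allocations.
ORG_NETWORKS = {
    "Example Airline": ipaddress.ip_network("203.0.113.0/24"),
    "Example Bank": ipaddress.ip_network("198.51.100.0/24"),
}


def owner_of(ip: str):
    """Return the organization whose block contains this IP, or None."""
    addr = ipaddress.ip_address(ip)
    for org, network in ORG_NETWORKS.items():
        if addr in network:
            return org
    return None


# Hypothetical anonymous edits: (article title, editor IP address).
edits = [
    ("Example Airline", "203.0.113.42"),
    ("Example Airline", "192.0.2.7"),
    ("Example Bank", "198.51.100.9"),
]

# Keep only the edits made from inside the article subject's own network.
self_edits = [(article, ip) for article, ip in edits if owner_of(ip) == article]

for article, ip in self_edits:
    print(f"{ip} edited the page about its own organization: {article}")
```

An edit from a company's network only shows where it came from, not who made it or why, so a match is a lead to investigate rather than proof.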
Wikipedia from a Malaysian perspective
Wikipedia is quite possibly mankind’s greatest repository of information. It’s an amazing concept: all the data in Wikipedia was created, documented and moderated by a crowd of people for FREE, and that just makes the whole thing even more interesting.
For the past two days, I’ve been writing about BigQuery and big data in general. Today I want to go through the results of my searches on BigQuery and really delve into the data to get some useful information.
For instance, I was curious who Aiman abmajid was. This guy is the undisputed King of Wikipedia in Malaysia, and I was intrigued as to who he was and why he was updating so many Wikipedia entries (some more than 900 times per article).
A quick Google search brought me to Wikipedia, which led me to the following: More…
What is big data?

In 2011, an estimated 1.2 TRILLION gigabytes of data were created. That’s roughly 200GB for every man, woman and child in the world, in just one year. That’s the equivalent of every person in the world watching almost 300 feature-length films a year, and this is the average.
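The back-of-the-envelope arithmetic behind those figures can be sketched as follows. The world population (about 7 billion) and the film size (about 700 MB for a compressed feature film) are my own assumptions, so the results are ballpark figures only.

```python
TOTAL_GB = 1.2e12        # 1.2 trillion gigabytes, the 2011 estimate quoted above
WORLD_POPULATION = 7e9   # assumption: roughly 7 billion people in 2011
FILM_SIZE_GB = 0.7       # assumption: a compressed feature film of about 700 MB

# Average data created per person over the year.
gb_per_person = TOTAL_GB / WORLD_POPULATION

# The same amount expressed as feature films, per year and per day.
films_per_person_per_year = gb_per_person / FILM_SIZE_GB
films_per_person_per_day = films_per_person_per_year / 365

print(f"Data per person: {gb_per_person:.0f} GB per year")
print(f"Films per person: {films_per_person_per_year:.0f} per year")
print(f"Films per person: {films_per_person_per_day:.2f} per day")
```

With these assumptions the average lands a little under 200GB per person, or a couple of hundred films' worth of data per person per year.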
The reason is simple: we now keep digital records of our transactions (e-banking and credit cards), our running patterns, our spending habits and even our wedding photos, and that’s just commercial end-user applications.
What about corporations that track thousands of data points per second in their manufacturing plants, or supermarkets that track the purchases of their customers? We’re creating and gobbling up far more data than before, and the trend doesn’t look to be stopping. Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. More…
