Visualising Facebook

Your correspondent was shocked to learn that 34% of his Facebook friends are married. Still in his 20s, he does not want to contemplate settling down quite yet. Knowing that 64% of his online friends are male does not help either, all the more so because 57% of Facebook users are women. When he lamented these facts (on Facebook, of course) he was asked the obvious question: “Did you go through your friends list and count?”

Well, no. The number-crunching comes courtesy of Wolfram|Alpha, a sort of search engine for quantifiable facts. Begun in 2009 by Stephen Wolfram, a British scientist and entrepreneur, the online service serves up answers to queries by harnessing information from its own databases. It can compute things like the distance between the Earth and the Moon on your parents’ first Valentine dinner, for example. Its latest feature lets people analyse their Facebook account for free. Enumerating and plotting the vagaries of one’s online life is at times surprising. Your correspondent wouldn’t have thought he was many times more active in 2011 than this year, in terms of status updates, sharing links, photos, etc (chart below).

Since the service began a few weeks ago, more than 400,000 Facebook users have let Wolfram|Alpha examine their digital bits—an outpouring of interest that caught the firm by surprise, says Luc Barthelet, Wolfram|Alpha’s executive director. The company plans to expand into other “personal analytics” services. Mr Barthelet declined to be more specific, but it could well entail analysing users’ email patterns and other social media behaviour.

In February Wolfram|Alpha rolled out a Pro service. At $4.99 per month it gives people the ability to process their own data, or even download Wolfram|Alpha’s information on a query. Such information is potentially very useful as it comes from the service’s own curated databases. Thus, armed with data on homicides in African countries, for example, Wolfram|Alpha can generate various types of graphs (scatter plot, raw data plots, bivariate histograms) to help users understand their information better. It can create a heat map to visualise the data geographically. And it lets users overlay other data, such as GDP of the country, to make, in this case, a GDP-neutralised heat map.

The Wolfram|Alpha “answer engine” is based on Mathematica, a software program developed by Mr Wolfram that can perform elaborate calculations. After the site’s launch in 2009 it was criticised for being limited in what it could do: solve mathematical problems, answer some scientific questions, but nothing out of the ordinary. Since then it has expanded considerably. As it moves beyond computing the world into analysing the individual, it is providing fresh new ways to look at life.

Also published on economist.com.

Free image from here.

Online software piracy: Head in the clouds

As more people use “cloud computing” services like webmail and do word-processing via a browser, software makers fret that today’s software piracy will migrate to the cloud too. The Business Software Alliance (BSA), a trade group, this month released a survey that emphasises that 30% of users in rich countries and 45% in poor ones have a “likelihood of sharing log-in credentials for paid services.” It is “a worrisome new avenue for software licence abuse,” says the BSA’s boss, Robert Holleyman.

Yet the closer one looks at the BSA’s study, the murkier such conclusions become.

Take the dramatic figures above. It is not quite so bad. The percentages come from a question in which people were asked if they had ever shared their log-in details for paid services. Some 15% of people in rich countries and 34% in poor countries said they had for personal use. For business use, it was 30% and 45% respectively. The larger figures amplify the BSA’s point, but they are not necessarily the most accurate.

Moreover, the respondents were only those who had paid for cloud services, which was a fraction of users. Cloud services are generally based on a “freemium” model, whereby basic use costs nothing and a premium version is paid for. According to the BSA’s own data, only half of computer users tap cloud services, of whom only one-third use them for business, of whom two-thirds pay. Of the small subset that remains, a minority share log-ins.

This changes things considerably. If the BSA figures were adjusted for all this, the potential piracy figures could be as low as between 2% and 6% of users—as much as 20 times less than the group claims. (The BSA’s data is online here.)
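A back-of-the-envelope version of that adjustment can be sketched in a few lines. This is only an illustration under stated assumptions, not the BSA's method: it chains the fractions quoted above (half of computer users tap cloud services, one-third of those for business, two-thirds of those pay) and applies the business-use sharing rates of 30% and 45%. The function name and its parameters are hypothetical, and personal use is left out because the matching fractions are not given here.

```python
from fractions import Fraction

def share_of_all_users(sharing_rate,
                       cloud=Fraction(1, 2),      # assumption: half of computer users tap cloud services
                       business=Fraction(1, 3),   # assumption: one-third of those use them for business
                       paying=Fraction(2, 3)):    # assumption: two-thirds of those pay
    """Fraction of all computer users who pay for business cloud services and share log-ins."""
    return cloud * business * paying * sharing_rate

rich = share_of_all_users(Fraction(30, 100))  # business-use sharing rate, rich countries
poor = share_of_all_users(Fraction(45, 100))  # business-use sharing rate, poor countries

print(f"rich countries: {float(rich):.1%}")
print(f"poor countries: {float(poor):.1%}")
```

On these assumptions the paying, log-in-sharing business users come to roughly 3% of all computer users in rich countries and 5% in poor ones, inside the 2% to 6% range.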

Worse, the BSA and Ipsos Public Affairs, which conducted the survey, didn’t think to ask or examine whether sharing log-in details violated the terms of service. It may very well be the contrary: that the service had communal uses as a feature. Mr Holleyman bends over backwards to acknowledge as much in a blog post. Yet the overall impression that the BSA gives is that cloud users are poised to rob firms of their rightful revenue.

There are other anomalies. The BSA only considered PC use, when many people use cloud services on tablets and mobile phones, especially in poor places. And the survey, of 14,702 people in 33 countries, presumes to speak with confidence about the “developing” world, yet not a single African country is represented. That is an odd omission, since Africa is a fast-growing market.

The annual BSA piracy study released this year in May estimated losses to the PC software industry in 2011 of $63 billion. That princely sum would make software piracy the 66th largest economy in the world, worth more than Syria and Croatia. The BSA reaches that amount by multiplying the estimated number of computers containing pirated software by the retail price of the software.

It is a specious way of calculating piracy (as we explored in an article in 2005 entitled “BSA or just BS?”). Many people would not buy the product at the expensive retail price. That’s why they steal it, after all. Still, the BSA’s dubious figures influence public policy. Mr Holleyman was invited to testify at a congressional hearing on July 25th on cloud computing, where his prepared remarks specifically cited “credential sharing” as a piracy challenge.

Be it in the cloud or back down on Earth, software piracy is theft and is wrong. The crime should be prosecuted and technically prevented as much as possible. But the way we think about the extent of the problem must be grounded in reality. Anything less is wrong too.

Written with Kenn Cukier. Also published on economist.com.

Image from here.

Sense about Statistics

Sense about Science guide to statistics

Quantification in an argument is seen as a big plus. If someone quotes some statistics, their argument suddenly seems so much more convincing. Yet, statistics is a funny thing. It can be hyped and sensationalised. The same data can be analysed differently to reach conclusions that support your argument. Most people do it mistakenly but there are many cunning minds out there who do it deliberately. People need to protect themselves from this misrepresentation of numbers.

Surprisingly, it is not hard to do. The latest Sense about Science initiative on making sense of statistics gives you a quick run-through of how to question these numbers and extract their true meaning. It is a good example of how being mildly skeptical is generally a good thing.

Some interesting examples that I will quote from the guide are:

  1. Literary Digest carried out a survey before the 1936 US Presidential Election. It mailed out millions of ballot papers and got two million back; a huge sample, most of which backed the Republican candidate Alf Landon. But the addresses to which they had been sent came from a directory of car owners and from the telephone directory: a biased sample, since in 1936 only the better-off owned cars or had telephones. Franklin D Roosevelt, the Democrat, won the election in a landslide.
  2. When it was claimed that in the ten most deprived areas in the UK 54% of teenage girls were likely to fall pregnant before the age of 18, it didn’t take long for people to realise this could not be true – it would mean over half of teenage girls from these areas being pregnant. The real figure was 5.4%.
  3. If you had a room with ten teachers all earning between £20,000 – £30,000, with a mean salary of £24,900 and a median (mid-point) salary of £25,000 and then someone who earns a million pounds walked into the room, the mean would increase to £114,000 but the median would hardly change. By using the median or mode (most common value) this distortion can be reduced, providing a more representative average salary.
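
The teachers example is easy to check for yourself. Here is a minimal sketch using Python's standard `statistics` module. The ten individual salaries are made up by me (the guide does not list them); I only chose them so that they fall between £20,000 and £30,000 and give the quoted mean of £24,900 and median of £25,000.

```python
from statistics import mean, median

# Hypothetical salaries, invented to match the guide's summary figures
teachers = [20_000, 21_000, 23_000, 24_000, 25_000,
            25_000, 26_000, 27_000, 28_000, 30_000]

print("before:", mean(teachers), median(teachers))

# Someone earning a million pounds walks into the room
with_millionaire = teachers + [1_000_000]

# Round the new mean to the nearest thousand, as the guide does
print("after:", round(mean(with_millionaire), -3), median(with_millionaire))
```

With these numbers the mean jumps to about £114,000 while the median stays at £25,000, which is exactly the distortion the guide describes.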

I hope people make use of this guide to help themselves and society better understand the numbers which form such a significant part of our lives today.