Predicting behaviour from user data

Since people follow rather stable routines, it is possible to predict their behaviour (within a range of certainty) by analysing their past activities. One important research effort in this direction was carried out in the Context project at the University of Helsinki from 2002 to 2005, with a focus on which places people go to and where they meet.
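To make the idea concrete, here is a minimal sketch of routine-based prediction. It is not the method used by the Helsinki project; it simply counts which place a person visited most often at a given weekday and hour, and reports that share as a rough measure of certainty. The visit log and the names build_model and predict are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical visit log: (weekday, hour, place). All values are invented.
history = [
    ("Mon", 9, "office"), ("Mon", 12, "cafeteria"), ("Mon", 18, "gym"),
    ("Tue", 9, "office"), ("Tue", 12, "cafeteria"), ("Tue", 18, "home"),
    ("Mon", 9, "office"), ("Mon", 12, "cafeteria"), ("Mon", 18, "gym"),
    ("Tue", 9, "office"), ("Tue", 12, "restaurant"), ("Tue", 18, "home"),
]

def build_model(visits):
    """Count how often each place was visited per (weekday, hour) slot."""
    model = defaultdict(Counter)
    for weekday, hour, place in visits:
        model[(weekday, hour)][place] += 1
    return model

def predict(model, weekday, hour):
    """Return the most frequent place for the slot and its observed share."""
    counts = model.get((weekday, hour))
    if not counts:
        return None, 0.0  # no past data for this slot, so no prediction
    place, n = counts.most_common(1)[0]
    return place, n / sum(counts.values())

model = build_model(history)
print(predict(model, "Mon", 18))  # stable routine: high certainty
print(predict(model, "Tue", 12))  # mixed behaviour: lower certainty
```

The more stable the routine, the closer the reported share gets to 1.0; slots with mixed behaviour yield a weaker prediction, which is the "range of certainty" mentioned above.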

Today, tremendous amounts of behavioural data are generated through web log statistics, tracking cookies and beacons, and mobile phone positions (cell towers and GPS). New mechanisms are emerging that make this data usable, even in real time (e.g. Google’s MapReduce). This was the conclusion of a Structure Big Data conference, which promised an “inevitable, even irresistible surveillance society” (Jeff Jonas, an IBM engineer, quoted in a Computerworld article).

While the ability to “look into people’s minds” scares privacy experts, it also promises to deliver perfect filters for users who feel lost in the tremendous stream of news and information, and it offers them a personalized experience of services.

Another point of concern:

The greater the amount and variety of data collected, the more unique the data set that a single person produces. One example is the identification of website visitors through the browser footprint (more commonly called a browser fingerprint). It might look pretty generic at first glance, but since it includes the installed fonts, the version numbers of plugins, etc., very few people actually have the same browser footprint.
While the data itself is usually collected in a “non-identifying, anonymized form”, the combined data sets render anonymity an illusion.
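The effect of combining attributes can be illustrated with a small sketch. It assumes a handful of invented visitors described by three hypothetical attributes (user_agent, fonts, plugins), hashes the selected attributes into a single footprint, and measures what share of visitors ends up with a footprint nobody else has. Real footprinting uses many more attributes; the names and values here are illustrative only.

```python
import hashlib
from collections import Counter

# Hypothetical visitors; all attribute values are invented for illustration.
visitors = [
    {"user_agent": "Firefox/10.0", "fonts": "Arial,Helvetica", "plugins": "Flash 11.1"},
    {"user_agent": "Firefox/10.0", "fonts": "Arial,Helvetica", "plugins": "Flash 11.0"},
    {"user_agent": "Firefox/10.0", "fonts": "Arial,Verdana", "plugins": "Flash 11.1"},
    {"user_agent": "Chrome/17.0", "fonts": "Arial,Helvetica", "plugins": "Flash 11.1"},
    {"user_agent": "Firefox/10.0", "fonts": "Arial,Helvetica", "plugins": "Flash 11.1"},
]

def fingerprint(visitor, attributes):
    """Hash the selected attributes of one visitor into a single footprint."""
    joined = "|".join(f"{name}={visitor[name]}" for name in attributes)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def unique_share(visitors, attributes):
    """Share of visitors whose footprint is shared with no other visitor."""
    counts = Counter(fingerprint(v, attributes) for v in visitors)
    unique = sum(1 for v in visitors if counts[fingerprint(v, attributes)] == 1)
    return unique / len(visitors)

for attrs in (["user_agent"],
              ["user_agent", "fonts"],
              ["user_agent", "fonts", "plugins"]):
    print(attrs, unique_share(visitors, attrs))
```

In this toy data set, each additional attribute raises the share of uniquely identifiable visitors, even though no single attribute identifies anyone on its own.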

[update 02/2012:]

The New York Times published an extensive report on how large supermarkets collect data on their customers. Although the individual data points are rather trivial (who buys what, and when), the sheer volume of data and the largely unchanging behaviour of each customer allow the stores to infer each customer's personal needs very precisely.

The report even features a story about a pregnant teenager who was targeted with baby products before even her father knew that she was pregnant. While this is probably a rare case, it shows that large volumes of data combined with decent data mining can not only report but even predict personal needs and wishes.