Predicting behaviour from user data

Since people follow rather stable routines, it is possible to predict their behaviour (within a range of certainty) by analysing their past activities. One important piece of research in this direction was carried out in the Context project at the University of Helsinki from 2002 to 2005, with a focus on which places people go to and where they meet.
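
To make the idea a bit more tangible, here is a minimal sketch in Python. It is not the method used in the Helsinki project; the function names, the (weekday, hour, place) data layout and the sample history are all made up for illustration. It simply counts where someone has been at a given weekday and hour and reports the most frequent place together with its share of past visits as a rough "range of certainty".

```python
from collections import Counter, defaultdict


def build_routine_model(visits):
    """Count how often each place was visited per (weekday, hour) slot.

    `visits` is an iterable of (weekday, hour, place) tuples; this layout
    is an assumption made for the example.
    """
    model = defaultdict(Counter)
    for weekday, hour, place in visits:
        model[(weekday, hour)][place] += 1
    return model


def predict_place(model, weekday, hour):
    """Return (most frequent place, share of past visits) for a time slot.

    Returns (None, 0.0) if nothing was ever observed in that slot.
    """
    counts = model.get((weekday, hour))
    if not counts:
        return None, 0.0
    place, n = counts.most_common(1)[0]
    return place, n / sum(counts.values())


# Hypothetical history: four Monday mornings and one Saturday morning.
history = [
    ("Mon", 9, "office"),
    ("Mon", 9, "office"),
    ("Mon", 9, "cafe"),
    ("Mon", 9, "office"),
    ("Sat", 9, "market"),
]

model = build_routine_model(history)
place, certainty = predict_place(model, "Mon", 9)
print(place, certainty)
```

The more stable the routine, the closer the share gets to 1; for a time slot with no history the sketch refuses to guess.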

Today, tremendous amounts of behavioural data are generated through web log statistics, tracking cookies and beacons, and mobile phone positions (cell towers and GPS). New mechanisms are evolving that make this data usable as well, even in real time (e.g. Google’s MapReduce algorithm). This is the conclusion drawn at a Structure Big Data conference, which promises an “inevitable, even irresistible surveillance society” (Jeff Jonas, an IBM engineer quoted in a Computerworld article).

While the ability to “look into people’s minds” scares privacy experts, it also promises to deliver perfect filters for users who feel lost in the tremendous stream of news and information. And it offers them a personalized experience of services.

Another point of concern:

The larger the amount and variety of data collected, the more unique the data sets that a single person produces. One example is the identification of website visitors through their browser fingerprint. It might look pretty generic at first glance, but since it includes the fonts installed, version numbers of plugins, etc., very few people actually have the same browser fingerprint.
While the data itself is usually collected in a “non-identifying, anonymized form”, the combined data sets render anonymity an illusion.
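
A small sketch of why combining attributes destroys anonymity. The visitor records and attribute names below are invented for illustration and do not come from any real fingerprinting tool. The code groups visitors by a chosen set of attributes and reports what share of them ends up alone in their group, i.e. uniquely identifiable.

```python
from collections import Counter


def anonymity_sets(visitors, attributes):
    """Count how many visitors share each combination of attribute values."""
    return Counter(tuple(v[a] for a in attributes) for v in visitors)


def unique_share(visitors, attributes):
    """Share of visitors whose attribute combination occurs exactly once."""
    sets = anonymity_sets(visitors, attributes)
    unique = sum(1 for n in sets.values() if n == 1)
    return unique / len(visitors)


# Hypothetical visitors: each single attribute looks generic on its own.
visitors = [
    {"browser": "Firefox 3.6", "fonts": "A,B", "plugins": "Flash 10.0"},
    {"browser": "Firefox 3.6", "fonts": "A,B", "plugins": "Flash 10.1"},
    {"browser": "Firefox 3.6", "fonts": "A,C", "plugins": "Flash 10.0"},
    {"browser": "Firefox 3.6", "fonts": "A,C", "plugins": "Flash 10.1"},
]

for attrs in (["browser"], ["browser", "fonts"], ["browser", "fonts", "plugins"]):
    print(attrs, unique_share(visitors, attrs))
```

In this toy data set nobody is unique by browser alone or by browser plus fonts, but everybody is unique once the plugin version is added.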

[update 02/2012:]

The New York Times ran an extensive report on how thoroughly large supermarkets collect data on their customers. Although the individual pieces of data are rather trivial (who buys what, and when), the sheer volume and the fairly stable behaviour of each customer let them infer that customer’s personal needs very precisely.

The report even features a story about targeting a pregnant teenager with baby products when even the teenager’s father didn’t know (yet) that his daughter was pregnant. While this is probably a rare case, it shows that large numbers and decent data mining can not only report but even predict personal needs and wishes.


Why Google might not really love open source

In contrast to my earlier assessment of Google’s Android plans, Symbian’s Executive Director Lee Williams recently gave his sharp take on the Android (business) model on GigaOm. Obviously, he’s a competitor, but he also manages to shed an interesting light on Google’s potential plans:

The Android system is basically open, but to use it in any reasonable way (if you are not a true hacker), you need a Google Account for Mail, Maps, Market, etc. And this account isn’t just any account but a unique identifier that lets Google collect all of your information, your habits, and your device usage in one basket. This enables them to send you highly profiled and personalized ads (which can be sold at a high price, I guess).
While you personally could say, “I don’t mind”, it’s a problem for a lot of other service providers who are not able “to get through” to the customer because s/he is already tied to Google.

Additionally, the applications that enforce this strong Google Account/device connection are all proprietary, i.e. not open. Google is really serious about protecting these apps, as their series of “Cease and Desist” letters showed. And because they are so central to the Android OS, Lee Williams has a good point in claiming that Android itself is not really open, neither with regard to these central apps nor for other service providers. Hopefully, his Symbian Foundation will keep this case in mind.

And again, it looks like a “the winner takes it all” attempt, which is one of the biggest sources of unease in my mixed feelings towards Google.

thanks Fee for pointing me to this.


My Android phone: A Hero without (blue) Teeth

Exciting, exciting: the box half open

A couple of days ago I touched (ha!) my first Android phone. It’s more than just a test drive this time: I dropped my S60 Nokia for it. I “am” on an HTC Hero now. HTC is stepping out of its “just an OEM manufacturer” shadow once more with this phone (they were already building T-Mobile’s MDA and the O2 XDA).

Old and new, side by side

How does it feel?

The unboxing gave me a solid first impression, from the packaging to the metal and rubber casing of the device. Turning it on was, of course, a carefully designed pleasure, but I admit that I had all the information at hand that you are asked for during the first steps (such as all your social network accounts, Google login, etc.). For complete newbies that might be a bit overwhelming, but I guess they aren’t the target group anyway. Although it’s pretty large (esp. compared to my Nokia candybar), it’s also rather thin and therefore fits into pockets easily (the rubber makes it a little difficult to get it out again, however).

First doubts about everyday compatibility...

All you get in a box!

Two really good things

… in comparison to usual phones:

The Mail Widget (part of HTC’s own “Sense” UI) on one of the home screens puts your mail just a literal finger stroke away and even notifies you via the mailbox icon on the main screen. I had email on the Nokia, too, and it was really helpful for checking important messages in some difficult situations. But it was built like an annex to the regular SMS interface, took a long time to load and was just not that easy to use. Now it is a real option to, e.g., tidy up my inbox on the train ride home, including some short replies right away.

The second great thing, to little surprise, is the Android Market. The (Nokia) Symbian community is an active one, too, but you can’t access its fruits as easily (at the moment they are restarting anyway, with Symbian turned open source). And there are really surprising and playful apps, like the Metal Detector (by Kurt Radwanski), which makes unintended use of the built-in compass.

There are also a couple of nice aspects that are less impressive on their own but contribute to the overall experience, such as all the widgets that you can fill your many screens with, the Blackberry-like trackball, or a standardized mini USB connector for the power supply (still worth mentioning, unfortunately). A third point would be rooting the phone and discovering its Linux guts, but that’s more of a fun “because you can” thing... oh wait. You also need it for tethering (i.e. using the phone as an internet uplink for the laptop)!

There are downsides, too

(this section is relatively long because I was so surprised and disappointed that a phone of this class fails at what I would consider basic tasks):

Androids love to talk via wifi but they are almost silent on Bluetooth (you can attach Bluetooth headsets! wow!). Bluetooth, however, is an established method for exchanging data between small devices, like phone to PC and even more so phone to phone. In a recent study on young people and their phones that I did for my work, Bluetooth turned out to be the second most important function of the phone (right after texting) because it makes it so easy to swap ringtones, pictures from the last party, vcards or anything else. Every device has Bluetooth, and anyone can use it. I had to install swiFTP (a plus for the Market but not for Android) to make my computer talk to my phone. I always made fun of the oh-so-avant-garde iPhone users who were still passing phone numbers via pen & paper. I would never have believed that a phone of today could make this mistake a second time.

The more I traveled for business reasons, the more I learnt to appreciate my phone as a moving hotspot. 3G and Bluetooth drain my battery like mad, but my computer is online wherever I want (almost). The Android phone puts an end to this. No Bluetooth, no tethering. Now, most of the internet is on the Android phone already, true. But there are a couple of applications and other things that I want to start from my computer (and note that you can’t attach files from your computer to emails on your phone without swiFTP or a cable). I read about Wireless Tether for Root, which would still make it possible if I used some minor force to get root access. Which I did right away, despite a couple of warnings that it might also brick the phone (thanks Jesterz and Dayzee). Having to dig so radically means that tethering wasn’t simply forgotten but deliberately made unavailable deep inside. WHY on Earth?

Then I have this nice Address Book on my Mac. Several hundred entries with birthdays and tags in the notes and so on. Android does everything for you as soon as you go to Google. But I don’t want to put all my addresses on Google (and I guess a couple of people in my Address Book don’t want to be listed there, either). Google Contacts has no field for birthdays, either. So, how to sync? Android and iSync? No way (remember: Bluetooth doesn’t work). Android does sync via USB cable and HTC’s HTC Sync with Outlook (only), they say. I can barely remember such efforts and restrictions from my first Siemens phone 10 years ago. Can this be taken seriously, especially on the Mac and on the go?

  • Android and ActiveSync/Exchange? Granted, that’s built-in. But where would I find a trustworthy Exchange server (and for free, because I think syncing my data with my devices should be nothing I pay for regularly)?
  • I also tried vcardIO and Andook Lite (by Fezza), which would at least import address books from the SD card (i.e. no sync), but the applications failed before they completed their job (they are pretty beta and maybe my address book is too large). [update 2009-10-22: vcardIO had problems with the included images. Without them it works very nicely, except that birthdays are stored as notes.]
  • Android and SyncML? There is a Funambol client but it doesn’t seem to work with my o2 account. I never had to think about SyncML with my Nokia, it just worked (everything was set up simply via configuration SMS!)


It’s still a great phone, HTC Sense is a very welcome improvement over the regular Android interface, and it’s all worth the fight with the downsides. For a phone built more or less with an open source attitude, however, it is completely inadequate to constrain the user so heavily in basic connectivity.

If there is someone out there with a non-paid, no cable, no Google solution for me, please let me know!


Why Google loves Open Source

Marvin the android by kertong

As (one of?) the first developers of an open source operating system for mobile phones, at least on a large scale, Google has put a lot of effort into something that is available for free to anyone. CNET asked Andy Rubin, who is responsible for mobile platforms, to explain why. I found his answers so interesting that I want to wrap up some bits here:

Rubin/Google says they will profit from open access in the end (the more searching the more advertising exposure). “There’s a natural connection between open source and the advertising business model: Open source is basically a distribution strategy” with no barrier for adoption and thus maximizing outreach.

This is the definition of openness: it’s not just open source, it’s the freedom to get the information that you’re actually looking for.

This reads like something from the Hacker Manifesto! It’s worth noting that Google, by its sheer size, can be a threat to this ideal…

They think they would lose more revenue by attempting to lock up their services just for their customers than by sharing an internet that is as open as possible with their competitors:

We’re confident enough in our advertising business and our ability to help people find information that we don’t somehow demand they use Google. If somebody wants to use Android to build a Yahoo phone, great.

With Google not known for being overly philanthropic, this makes a pretty strong argument against walled gardens from a business point of view. It appears to be heavily based on Google’s dominant position in the (ad) market, however.

Android at Google's HQ by secretlondon123

Some nice side effects: Having a cross device operating system makes it easy for third party developers to get their services onto various devices–which will make Android more attractive, again. And it’s a great thing for software companies to provide a more consistent user experience (so designers should like it).

Good to know: In Asia, stylus input is often preferred over fingers because writing Asian characters is easier and more accurate this way.

thanks to fee for twittering this.


Jaiku is dead – hail to the new Jaiku?

This news is already a couple of months old, but it reached me now and struck me: Jaiku got abandoned by Google.

atmasphere is shedding a tear

I have to admit that I didn’t use Jaiku all that much, basically because I lacked a base of “followers” or, even more importantly, people to follow. Back then, I was “following” a guy I got to know at Ars Electronica, and even though we were pretty far apart and didn’t exchange that much on other channels, I had the impression of knowing a little bit of his life, some of his feelings, his overall mood. All created by those tiny, subjective, and instant status messages (he was also posting pretty frequently, which is a precondition but also comes by itself once everyone is addicted…). I didn’t get this experience out of any other channel. And it became my standard argument for why “those private and boring details of someone’s daily life” are actually pretty valuable.

When I logged in today (6 months after my last message…), I wanted to add someone’s twitter feed. Adding other channels to your stream was actually one of the big pluses of Jaiku over Twitter (Robert Gaal has 3 more)! But all the cool options were gone (example), no other feeds to read nor to add, no nothing. Just the simple message box (which, at least, is still working).

Then I checked the phone client, which was actually much more than that: It was a replacement of your phonebook, giving you quite a bit of status information about your contacts. You could even see whether the other one was using her/his phone currently, so you didn’t have to call in vain or talk to the answering machine instead.

This feature is missing as well (you could even operate Jaiku through SMS, but I guess this service is no longer supported, either…). Btw: All of this came out of a Finnish research project a couple of years ago.

On the other hand, Jaiku is now open source! And this means anyone could start a similar service. Which is great (Jaiku founder Jyri says). Unfortunately, it appears to me that the spirit of Jaiku was also based on a substantial amount of hardware and money that allowed the service to run smoothly and made it possible, e.g., to receive status updates via SMS for free. So it might be more a some- than an anyone who could create “JaiTwo”.

I’ll try to keep an eye on the great Jaiku team, as they are up to something new for sure. Meanwhile, I’ll have to turn to the twitterverse…


buddies and business

L'épicerie in Lyon

The more I get into relation detection via communication data, the more services come to my mind. But of course, I am not the first to invent this wheel (Pete Warden’s blog brought a lot of evidence to me): In an article from two years ago (already!), ZDNet UK paints a nice portrait of the emerging business of email analysis. A positive focus is put on Clearwell Systems because of their special (unique?) ranking algorithm (oha! I bet Google pays very close attention). Its software

weighs the background data and content of each email for several factors, including the name of the sender, names of recipients, how many replies the message generated, who replied, how quickly replies came, how many times it was forwarded, attachments and, of course, keywords.

Well, so do I… But in the light of a fully grown business, ranking emails moves away from a personal (autonomous) assistant that is just nice to have, handy and good for reflection. With the huge amounts of email produced every day on every topic relevant to any business process, corporate email archives contain pretty much any information a manager and, more delicately, a prosecutor could desire:

Email has come to be viewed as a source of truth. If you want to know what really happened, you look at the email.

As became clear to me during my research, too, collecting and archiving (intercepting?) all electronic conversations improves the basis for statistical analysis and heuristics, and hence the quality of the ranking, a lot. Consequently, a lot of entities (Google, security authorities) are after our data.

Pete Warden deserves an honourable mention once more because his position of “trying to generate a useful index with no human intervention” resonates with my basic motivation, too. I find his blog immensely interesting and very relevant to my thesis: for example, exploiting the time information inherent in email, which I thought of using for some kind of “contact profiling”; all the privacy issues entangled with it, especially in a business context; and drawing profit from the knowledge that often accumulates unnoticed in a company (or workgroup). And he complains about the missing Gmail API, too. All written in a very comprehensible manner.
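
To illustrate the kind of ranking described in the ZDNet quote above, here is a minimal sketch. It is certainly not Clearwell’s algorithm (which is not public as far as I know); the `Email` fields, the weights and the scoring formula are assumptions I made up, loosely following the factors named in the quote: sender, number of replies, reply speed, forwards, attachments and keywords.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Email:
    sender: str
    reply_count: int
    forward_count: int
    attachment_count: int
    hours_to_first_reply: Optional[float]  # None if nobody replied
    text: str


# Hypothetical weights, chosen by hand for this example only.
WEIGHTS = {
    "sender": 3.0,
    "replies": 1.0,
    "speed": 2.0,
    "forwards": 1.5,
    "attachments": 0.5,
    "keywords": 2.0,
}


def score(mail, key_senders, keywords):
    """Weighted sum over the factors; a higher score means more relevant."""
    s = 0.0
    if mail.sender in key_senders:
        s += WEIGHTS["sender"]
    s += WEIGHTS["replies"] * mail.reply_count
    if mail.hours_to_first_reply is not None:
        # Quick replies count more than slow ones.
        s += WEIGHTS["speed"] / (1.0 + mail.hours_to_first_reply)
    s += WEIGHTS["forwards"] * mail.forward_count
    s += WEIGHTS["attachments"] * mail.attachment_count
    words = mail.text.lower().split()
    s += WEIGHTS["keywords"] * sum(words.count(k) for k in keywords)
    return s


# Made-up sample data.
contract = Email("boss@example.com", 4, 2, 1, 1.0, "contract deadline moved")
lunch = Email("pal@example.com", 0, 0, 0, None, "lunch today?")

key_senders = {"boss@example.com"}
keywords = {"contract", "deadline"}
ranked = sorted(
    [lunch, contract],
    key=lambda m: score(m, key_senders, keywords),
    reverse=True,
)
print([m.sender for m in ranked])
```

A real system would presumably learn such weights from a large archive instead of setting them by hand, which is exactly why collecting as many conversations as possible is so attractive.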


Google takes care of you!

While looking for a film I saw at this year’s Ars Electronica, I got quite a good result via Google:

But instead of the website I got a smart piece of advice from Google, as you can see below:
google anti malware
(you can right-click “view image” for the more readable version, until I have better scripts for that)

I had my doubts with the Google warning but the ars-link was exactly the same, so I took it and found a very nice page, describing the film, trailer, and several articles linking to all the awards the film has won so far (siggraph et al.!) – as expected.
Did that site become “badware” through Google’s algorithms?
And why did Google/stopbadware not provide a “no badware” button, as we know it from spam filters and the like?

Has anyone ever experienced something similar?


best replica attractor

This is just a small and simple experiment to see if I can attract a huge amount of spam for my newly installed Gmail account. The more I succeed in doing so, the better I will perform in a newly created game from the Digital Playground class at FH Potsdam. Very interesting class, but for the bots only this link counts: my attractive replica site!


predict the tomorrow

Everyone uses Google (or search engines in general) to find something from the past: What are the soccer results from last weekend, who wrote an article about surveillance, where is that “critical update” for my web browser? Google learns the questions and needs of a lot of people (e.g. 50% of all US searches) and, with a little extrapolation, one could say: of the world. The (monthly) statistics on the psyche of the world can be inspected at the Google Zeitgeist.
The future is not random but is created by ourselves every day through actions that are driven by exactly these questions and needs that condense at the search interface of Google. Wouldn’t Google be able to predict the things to come?

While this is one of the stunning (at least to me) results of the Google and Borges class at Humboldt University, I developed a game concept that takes one step back and leaves prophecy to the players.


google gets news wrong?

google news of 2007-01-08

A subway accident described as “naturally funny” (“Von Natur aus witzig”)? In a usual newspaper one would think of a macabre mistake by the editor, but Google News usually aggregates its contents from other sources without editing them.

The offending phrase was not to be found in the original Rheinische Post article but somewhere completely different:

rp news of 07-01-08

Assuming Google and Google News are operating on algorithms, how could that happen?