Stealing Metadata

There’s been some buzz of late about enriching the data submitted into systems with tags. The social bookmarking site del.icio.us and the photo management site Flickr are some of the best examples of gathering and utilizing metadata (in other words, data about data). Both provide a view of which topics (or “tags”) are hot at any given time. This is great because it offers a glimpse into how viral a topic or link is, based on how each site’s users use the application. Unfortunately, the information-gathering aspect of this phenomenon is pretty archaic, if not badly flawed.


The task of gleaning and making sense of collected metadata, while a challenge, is more or less solved. If a particular topic is hot, it’s simply referenced a lot by people. The challenge is less about what to do with metadata and more about how to get it. The problem, as far as I can see, is that the collection of metadata is far too disruptive in its current incarnation. Sure, you can ask users to enter some tags or descriptive text about a piece of information, but from an end-user’s perspective this is little more than a nuisance. Any part of the experience that is not tied to the user’s end-goals diminishes that experience.

In my opinion, the holy grail here is to somehow steal metadata from the user’s interaction without disrupting the experience. We rely heavily on technology today, and so we’re constantly “talking” to machines. Beyond our own interactions, technology already exists that allows us to gather a lot of metadata without asking the user a single question. Consider the following examples:

  • A digital camera equipped with a GPS notices that, after some travelling, it has arrived at a relatively popular destination (e.g. an amusement park). It subsequently lights up and asks the user if, for the next few hours, they’ll be using the camera to take pictures of friends or family. It asks for some more information, but the user doesn’t have to provide it. Photos taken during that timespan are tagged with “June 26, 2005”, “Hershey Park” and “Pennsylvania”.
  • The next time you sync up your MP3 player, your PC asks if you’d like to find out some more about Wilco. It’s asking because it noticed you’ve played the latest Wilco album religiously for the last few weeks (because it’s keeping a hit count of which songs you play). It could also ask you if you’d like to volunteer this information for public polling.
  • After a nurse enters some drug and patient information into her tablet PC while making her rounds, the information is sent to a centralized repository that seeks out causal relationships between combinations of drugs and certain physical reactions. This information, in its isolated form, is relatively useless, but in aggregate it can provide an early indicator of potential dangers – all without disrupting the normal work experience.
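
To make the MP3 player example a little more concrete, here is a minimal sketch in Python of what “keeping a hit count” might look like. Everything in it is an assumption made for illustration: the `PlayLog` class, its method names, the threshold of 20 plays and the `share_publicly` opt-in flag are all hypothetical, not taken from any real player or service.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PlayLog:
    """Hypothetical in-memory log of plays; a real player would persist this."""
    share_publicly: bool = False  # opt-in, off by default
    plays: Counter = field(default_factory=Counter)

    def record_play(self, artist: str) -> None:
        # Called by the player itself on every play; the listener is never prompted.
        self.plays[artist] += 1

    def suggestion(self, threshold: int = 20) -> Optional[str]:
        """Return the most-played artist once its count reaches the threshold."""
        if not self.plays:
            return None
        artist, count = self.plays.most_common(1)[0]
        return artist if count >= threshold else None

    def public_report(self) -> Dict[str, int]:
        """Counts to send to a central service; empty unless the user opted in."""
        return dict(self.plays) if self.share_publicly else {}


log = PlayLog()
for _ in range(25):
    log.record_play("Wilco")
log.record_play("Some Other Band")

print(log.suggestion())      # artist to ask about at the next sync, if any
print(log.public_report())   # stays empty until the user opts in
```

The point of the sketch is that the metadata is a side effect of normal use. The only question the user is ever asked is whether they want the result, or want to share it.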

Of course, questions can be asked of users, and will be tolerated, so long as there is some immediate or near-immediate perceived value to the user. In addition, users may be willing to sacrifice some of their time and effort for a larger cause (in much the same way that some users install screensavers that steal CPU cycles for cancer research).

Privacy is another potential issue. Before you steal metadata, you should ask the user if it’s OK to do so, especially if that information is being taken back to a centralized place for public consumption.

It’s becoming more and more difficult to make sense of the massive oceans of information and ideas that are flowing through the Internet. One way to elevate the information worth elevating is to ask users to categorize it, which is a clear violation of user-centered design. I think we should be focusing more on devising systems that pay attention to the metadata around our experiences with them. Much can be learned if the systems simply chose to listen.
