Opera’s MAMA Project: Peering Into Web Pages

A few days ago, the kids at Opera Software launched a search engine that studies what most typical search engines ignore: what’s going on inside of a web page. It’s called MAMA for “Metadata Analysis and Mining Application.” MAMA is a “structural Web-page search engine—it trawls Web pages and returns results detailing page structures, including what HTML, CSS, and script is used on it, as well as whether the HTML validates.”

The initial results are insightful and occasionally fascinating. A few choice statistics:

  • Over 65% of pages served in China use Flash, versus 35% in the U.S. (Any theories behind this? Bizarre.)
  • A little over 4% of pages validate.
  • People still use a ton of FONT tags and embedded (inline) CSS.
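
For a feel of what this kind of structural analysis involves, here is a rough Python sketch that tallies a few of the features mentioned above for a single page. To be clear, this is not MAMA's code: the names `StructureCounter` and `summarize` are made up for illustration, "inline CSS" is assumed to mean `style` attributes, and the Flash check is a crude heuristic of my own.

```python
from collections import Counter
from html.parser import HTMLParser


class StructureCounter(HTMLParser):
    """Tallies a few structural features of one HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()      # tag name -> number of start tags seen
        self.inline_styles = 0     # elements carrying a style="..." attribute
        self.uses_flash = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        self.tags[tag] += 1
        attr_map = dict(attrs)
        if "style" in attr_map:
            self.inline_styles += 1
        # Assumed heuristic: an <object>/<embed> that names the Flash MIME type
        # or points at a .swf file counts as Flash usage.
        if tag in ("object", "embed"):
            values = [(attr_map.get(key) or "").lower() for key in ("type", "src", "data")]
            if any("shockwave-flash" in v or v.endswith(".swf") for v in values):
                self.uses_flash = True


def summarize(html):
    """Return a small structural summary of an HTML string."""
    parser = StructureCounter()
    parser.feed(html)
    parser.close()
    return {
        "font_tags": parser.tags["font"],
        "inline_styles": parser.inline_styles,
        "uses_flash": parser.uses_flash,
    }
```

Run something like `summarize` over a large crawl and aggregate the results by country, and you would get statistics of the same general shape as the ones above. Checking whether the markup validates is a separate and much bigger job, so it is left out here.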

Opera has published key findings as well as a view of what “the average page” looks like.

