A recent trending topic on the Web involves an image of India’s Prime Minister Narendra Modi. There is nothing unusual about the image itself, nor was it associated with any current news item. The hubbub arose because searching Google Images for “Top 10 Criminals in History” landed the image of Mr. Modi right alongside the likes of Al Capone and Charles Manson. Feature writers immediately seized on the result as evidence that Google had committed some colossal blunder. But of course it was no blunder at all. At the top of the search results page Google posts this calm disclaimer:
“These results don’t reflect Google’s opinion or our beliefs; our algorithms automatically matched the query to web pages with these images.”
The search results were the logical outcome of an algorithmic search through an ocean of unstructured data. To be fair, the algorithms did a stellar job of finding infamous criminals, but along with the criminals they found presidents, actors, and scientists. When you want to find a needle in a haystack and you have nothing to lose, you may as well try your luck. Your search results may contain a lot of noise, but you may very well find what you are looking for.
We are all so impressed and amazed by the power of algorithmic search that some may wonder why we bother with conventional metadata. For one thing, algorithmic search engines feed off volume: lots of data, lots of queries, lots of observed post-search behaviors, and a wide distribution of users. These generate the trends that power the search algorithms to give results that seem almost clairvoyant. But when the universe of data and users is not so massive, the algorithms starve and the results get noisier. When the volume of data is constrained and the need for accuracy is high, as is the case with most private collections, that’s when you need real metadata. By “real” metadata, I mean metadata that is structured to fit a schema and that has been moderated for accuracy.
Having structured metadata (i.e., schemas and taxonomies) gives users control over the content. It allows contextual meanings and assumptions to be assigned to the metadata that are relevant to both the content and the users, thus ensuring relevance and accuracy in the search returns. A good example is the metadata created for the Paley Center for Media to tag the U.S. broadcasts of the Olympics. Fine distinctions were made between “Athletes”, “Commentators”, “Talent”, “Celebrities”, and “Cameo” appearances across more than 40 years of Olympics broadcasts. Those are abstract distinctions that are important for the types of searches the Paley Center wants, but they would never emerge organically from an algorithmic search. That is why services like Crawford’s Engage asset management platform and Metaforce metadata writing service are so powerful for professional content management. Algorithms are certainly useful, but when it’s your content that you’re searching and monetizing, anything less than a structured approach would be, well, criminal.
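To make the idea of schema-bound metadata concrete, here is a minimal sketch of how a controlled vocabulary constrains tagging. The role names mirror the distinctions mentioned above, but the schema itself and the function names are hypothetical illustrations, not the Paley Center’s actual taxonomy or any particular product’s API.

```python
# Minimal sketch: checking asset tags against a controlled vocabulary.
# APPEARANCE_ROLES is a hypothetical schema for illustration only.

APPEARANCE_ROLES = {"Athletes", "Commentators", "Talent", "Celebrities", "Cameo"}

def validate_tags(tags):
    """Split tags into (accepted, rejected) per the controlled vocabulary.

    Free-text tags that fall outside the schema are flagged for a human
    moderator instead of being silently indexed; that moderation step is
    what keeps later searches both relevant and accurate.
    """
    accepted = [t for t in tags if t in APPEARANCE_ROLES]
    rejected = [t for t in tags if t not in APPEARANCE_ROLES]
    return accepted, rejected

accepted, rejected = validate_tags(["Athletes", "Cameo", "famous person"])
print(accepted)  # tags that conform to the schema
print(rejected)  # tags a moderator would need to review
```

The point of the sketch is the rejection path: an algorithmic crawler would happily index “famous person” as-is, while a schema forces every tag to resolve to an agreed-upon term before it enters the collection.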