ADOTAS – As I stepped onto the Metro-North train the other day, I spotted a series of ads from Google near the doorways. The one that really captured my attention is below. It left me asking why Google (and others) are releasing these ads. The short-term answer is, as Marc Rotenberg, executive director of the Electronic Privacy Information Center, says in this New York Times piece, “I think they’re made to justify certain business practices.”
The longer-term answer is: sunk costs and the path of least resistance. It’s cheaper to fight for the least amount of regulation than it is to change the Google algorithm and to make the significant hardware changes to go with it.
Yet a new Comscore report out yesterday indicates $12.4 billion in advertising is wasted every year. The biggest driver, according to the report, was ads placed next to content that is “unsafe.” As the ReadWriteWeb story states, “Among ads that were seen, 72 percent of the campaigns in the study ran alongside site content that was ‘not brand safe.’ If you’re advertising cheeseburgers, and your ad runs next to a news article about the obesity epidemic, you’re not likely to get much value out of it.” That means all the data collected about individuals’ behavior online can be for naught.
What I wanted isn’t always what I want. To correct for this problem, Google uses a keyword-plus-statistics technique that often doesn’t work. I may in fact be looking for beetles that buzz, even if I have previously looked for Beetles that beep. To Google, the word “beetle” is a token: a series of letters in a particular order, B-E-E-T-L-E.
Now add the statistics. The tokens “beetle” and “beep” were probably in the presence of the token “car” or “Volkswagen.” Likewise, the tokens “beetle” and “buzz” were probably in the presence of the token “insect.” Associations are made despite every stats professor in the nation drilling the same famous phrase into our heads: “Correlation does not prove causality.” My old Volkswagen quite often buzzes and beeps.
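The keyword-plus-statistics idea described above can be sketched in a few lines. This is a toy illustration, not how Google's systems actually work: the co-occurrence counts, the `COOCCURRENCE` table and the `guess_sense` function are all made up for the example.

```python
from collections import Counter

# Hypothetical co-occurrence counts: how often each context token was seen
# near the token "beetle" on pages about each sense. The numbers are invented.
COOCCURRENCE = {
    "car": Counter({"beep": 40, "volkswagen": 55, "buzz": 5}),
    "insect": Counter({"buzz": 45, "wings": 30, "beep": 1}),
}


def guess_sense(context_tokens):
    """Pick the sense whose co-occurrence counts best match the context tokens."""
    scores = {
        sense: sum(counts[token] for token in context_tokens)
        for sense, counts in COOCCURRENCE.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# A search for a buzzing beetle is scored as "insect" purely on counts,
# even if the searcher means an old Volkswagen that buzzes.
sense_for_buzz, _ = guess_sense(["buzz"])   # "insect"
sense_for_beep, _ = guess_sense(["beep"])   # "car"
```

The guess depends only on which tokens tend to appear together. Nothing in the table knows what a car or an insect is.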
Semantic technology provides data on the meaning and context of every word, sentence and paragraph. Two measures are used to define the efficacy of the resulting description of a webpage: precision and completeness. Precision is straightforward: Is it right?
Completeness is about reach: Does it include all that should have been included in a set of identifiable conditions — entities (people, places and things), categories and emotions?
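The two measures can be made concrete with a small calculation. The sketch below assumes that completeness corresponds to what information retrieval calls recall. That mapping, the function name `precision_and_completeness` and the sample labels are assumptions for illustration only.

```python
def precision_and_completeness(predicted, expected):
    """Return (precision, completeness) for a set of labels assigned to a page.

    precision    = share of assigned labels that are right
    completeness = share of the labels that should be there that were found
    """
    predicted, expected = set(predicted), set(expected)
    correct = predicted & expected
    precision = len(correct) / len(predicted) if predicted else 0.0
    completeness = len(correct) / len(expected) if expected else 0.0
    return precision, completeness


# Hypothetical labels for one page: entities, categories and emotions.
predicted = {"Motorola", "buttons", "phone", "radio"}
expected = {"Motorola", "buttons", "phone", "negative emotion", "positive emotion"}

precision, completeness = precision_and_completeness(predicted, expected)
# precision is 3/4 = 0.75, completeness is 3/5 = 0.6
```

A description can score well on one measure and poorly on the other, which is why both are needed.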
The diagram below shows a complex sentence with two notions of sentiment. The fly-away boxes show two assigned definitions. The first is an understanding of a major brand — Motorola. The second is the use of the word “great.” What makes this sentence tough to understand is that “lousy” is negative and “great” is positive. But which objects in the sentence are the adjectives attributed to?
The “lousy” in front of Motorola is easy — an adjective in front of a noun modifies it. But the adjective “great” is way out of position — to the right of the “buttons” it refers to. Yet semantic technology still gets it right. The SBJ (“the buttons”) at the bottom of the fly-away tells me so, indicating “great” is associated with “buttons.”
Keywords with some statistics don’t get this right. They declare the sentence “neutral” simply because a positive and a negative cancel each other out. For user-generated content (UGC), this analysis is wrong.
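The difference between the two approaches can be shown with a toy example. The sentence below is a hypothetical stand-in for the one in the diagram, the polarity lexicon is invented, and the (adjective, target) pairs are written by hand to stand in for what a semantic parser would produce. No real parser is called here.

```python
# Toy polarity lexicon (an assumption for this sketch).
POLARITY = {"lousy": -1, "great": 1}


def keyword_sentiment(tokens):
    """Keyword approach: add up word polarities, ignoring what each word modifies."""
    return sum(POLARITY.get(token, 0) for token in tokens)


def targeted_sentiment(attributions):
    """Attribute each adjective's polarity to the object it modifies."""
    result = {}
    for adjective, target in attributions:
        result[target] = result.get(target, 0) + POLARITY.get(adjective, 0)
    return result


# Hypothetical sentence with one negative and one positive adjective.
tokens = "the lousy motorola has buttons that are great".split()

# Hand-written stand-in for parser output: which adjective modifies which noun.
attributions = [("lousy", "motorola"), ("great", "buttons")]

overall = keyword_sentiment(tokens)            # 0, reported as "neutral"
per_target = targeted_sentiment(attributions)  # {"motorola": -1, "buttons": 1}
```

The keyword total of zero hides two distinct opinions. The per-target result keeps them apart.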
So accuracy in understanding the meaning of a word makes a difference. What about completeness or reach? Below you see two diagrams for the word “gas.” The first shows just the upper and lower branches of hierarchically related meanings when gas is used in the context of gasoline. Less specific forms of gas are fuel and hydrocarbon, whereas more specific forms of gas include the 21 additional concepts you see.
When the definition of gas changes to “natural gas,” the associated concepts change dramatically. Any system that does not take account of the in-context definition of the word and its associated concepts will have very little completeness or reach. In other words, keywords plus statistics have very little reach.
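A small data structure shows why the in-context definition matters for reach. In this sketch only “fuel” and “hydrocarbon” come from the description above. Every other entry, along with the sense keys and the `related_concepts` function, is a hypothetical placeholder and not the content of the diagrams.

```python
# One entry per sense of "gas". Broader terms sit one level up the hierarchy,
# narrower terms one level down. Most entries are illustrative placeholders.
HIERARCHY = {
    "gas/gasoline": {
        "broader": ["fuel", "hydrocarbon"],                      # from the article
        "narrower": ["unleaded gasoline", "premium gasoline"],   # placeholders
    },
    "gas/natural_gas": {
        "broader": ["fossil fuel"],                                        # placeholder
        "narrower": ["liquefied natural gas", "compressed natural gas"],   # placeholders
    },
}


def related_concepts(sense):
    """Return every concept one level above or below the given sense."""
    entry = HIERARCHY[sense]
    return set(entry["broader"]) | set(entry["narrower"])


gasoline_concepts = related_concepts("gas/gasoline")
natural_gas_concepts = related_concepts("gas/natural_gas")

# The two senses share the keyword "gas" but none of their related concepts.
shared = gasoline_concepts & natural_gas_concepts
```

A system that sees only the token “gas” has no way to choose between the two sets, so it either misses the related concepts or mixes them.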
These diagrams show only the hierarchical relationships between words, and only at two levels. But you can think of all kinds of other relationships between words that would change these diagrams again, and the hierarchy can extend well beyond two levels.
Semantic technology maps out all the levels and relationship types between all words. This approach tackles the completeness or reach task of content targeting in a way that keywords plus statistics never will.
Good to know may not be good for me. Everyone in the ad technology business says they have one goal: get the right ad in front of the right person at the right time. But when Comscore says $12.4 billion went right down the toilet, it screams for improvement in how our digital understanding of ads, content and people matches up, so that this waste disappears.
So in the end you could argue Google’s “good to know” campaign is a slick piece of PR to downplay the privacy issues in digital advertising. Even if you had every Google user give an explicit “OK” to turning over privacy for Google services, what really is missing is how much better those services could and should be. Good to know isn’t good for me, but it is good for Google.