Using Big Data Techniques to Explode the Keyword Targeting Myth
For my experiment, I crawled several thousand pages from The New York Times. Using The Times’ pages was a natural choice, both because lots of data experiments use this source and because I know they share the keywords they associate with each URL as a metadata field. After some de-duplication and other cleaning, I ended up with a data set of 2,142 stories in three main categories: politics, art and entertainment, and business.
I wanted to run an A/B test on the data, modeling the association between the known category of a URL and its assigned keywords as the A side of the test. Next, I modeled the association between the known category of a URL and a feature set, where the feature set is semantically generated data about that URL in the form of entities – the people, places, things and emotions present in the content. This is the B side of the test. The difference between the models will express the advantages of one form of contextual targeting over another.
By the way, the Big Data term to use here is that we are constructing a “supervised learning” model. This means we know the categorical outcome of each URL as “truth” and are testing the independent variables that will prove to be the best predictors of a category to assign new URLs later.
I used the statistical computing environment R (http://www.r-project.org/) and the user interface RStudio (http://www.rstudio.com/). This was just one of several options out there, including Orange, which is much more GUI-driven and has a good set of estimators (http://orange.biolab.si/).
For both the A and B tests I chose to use an ensemble model approach. This means that I combine multiple sampling runs and multiple estimation models in what is termed a “machine learning” approach to finding the best fit. After lots of iterative experimentation, I chose the following estimation models: Maximum Entropy, Support Vector Machine (SVM) and Random Forests (RF).
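To make the ensemble idea concrete, here is a minimal sketch in Python rather than the R used for the experiment. It shows only the voting step: each model proposes a category for a URL, and the ensemble nominates a category when enough models agree. The function name, the agreement threshold and the sample predictions are all assumptions for illustration, not the models or data from this experiment.

```python
from collections import Counter

def ensemble_vote(predictions, min_agreement=2):
    """Return the category that at least `min_agreement` models agree on,
    or None when no category reaches that threshold (hypothetical helper)."""
    label, votes = Counter(predictions).most_common(1)[0]
    return label if votes >= min_agreement else None

# Made-up outputs of three models for three URLs (illustration only).
model_outputs = [
    ["politics", "politics", "business"],                # two models agree
    ["business", "art and entertainment", "politics"],   # no agreement
    ["business", "business", "business"],                # all agree
]

nominated = [ensemble_vote(p) for p in model_outputs]
print(nominated)
```

In this sketch a URL with no agreement simply gets no category, which is the situation the recall figures below are concerned with.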
The A Test Outcome
The table below shows the core measurements of fit for the keyword A-side test. The Ensemble Recall measure is about the ability of keywords to “nominate” or identify a category at all. This means that keywords can be used about 70% of the time to nominate a category. It also means that 3 out of 10 times, keywords will fail to accurately represent what an article is actually about.
The accuracy measures per model type indicate the percentage of the time the nominated category is correct. Averaged across the three models, keywords correctly identify the category in only 1 out of 3 cases.
The B Test Outcome
Semantic processing, whose goal is to mimic the comprehension and richness of human understanding, provides a stark difference. In the B side of the test, semantic data correctly identifies a category almost every time, as shown by the 95% recall measure.
In terms of accuracy, instead of 1 out of 3 stories being correct, 2 out of 3 are correct.
Thinking about the two measures together and comparing the A and B tests puts the difference in even starker relief. It is not at all clear why anyone would use a targeting technique that fails to assign an understanding of content 30% of the time and, when it does assign one, is wrong 66% of the time. This does not seem to be a technology you can rely on to satisfy the demands of advertisers.
Semantic data, on the other hand, almost always provides an answer and only gets 1/3 of them wrong. In practice, the accuracy number is typically higher as the specific needs of advertisers are understood in terms of optimization, audience characteristics and brand safety concerns.
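As a back-of-the-envelope check on what the two measures mean together, the sketch below multiplies the rate at which a category is nominated by the accuracy once one is nominated. It uses the rounded figures quoted above and assumes the two rates can simply be multiplied, which is a simplification and not a calculation from the original experiment. The function name is hypothetical.

```python
def share_correct(nominate_rate, accuracy):
    """Share of all URLs that end up with a correct category, assuming the
    two rates can be multiplied (a simplifying assumption)."""
    return nominate_rate * accuracy

# Rounded figures from the text: keywords nominate ~70% of the time and are
# right 1 time in 3; semantic data nominates ~95% and is right 2 times in 3.
keywords = share_correct(0.70, 1 / 3)
semantic = share_correct(0.95, 2 / 3)

print(f"keywords: {keywords:.0%} of URLs correctly categorized")
print(f"semantic: {semantic:.0%} of URLs correctly categorized")
```

Under these assumptions, keywords end up with a correct category for roughly a quarter of URLs, against a little under two thirds for semantic data.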
A successful advertisement is a vital factor in growing your top line. With so many display ad impressions wasted and misplaced, we cannot immediately fix all of the problems. But technology gives us a chance to start moving in a new direction and re-establish the credibility of online ads: a fundamental step in making the economics of digital advertising work for buyer, seller and consumer.