A new front has opened up in the tracking cookie battle. A follow-up report from the same researchers (including independent privacy advocate Ashkan Soltani and privacy lawyer Chris Jay Hoofnagle) that called out Quantcast and Clearspring for re-building HTTP cookies in 2009 through the use of Adobe Flash cookies has discovered that Hulu (one of the publishers fingered last time) had been using both Flash and cache cookies to respawn cookies and create ever-persistent, hard-to-delete tracking tools.
Cache cookies? You mean ETags?
Yes — long speculated about, it seems at least one major publisher has been employing entity tags (or ETags) for tracking consumers with “persistent” cookies (Evercookies?) and respawning deleted HTTP cookies. As current Googler Dean Gaudet explained all the way back in 2003, this method “attempts to get the browser to store unique ID information in its cache in a manner which will be communicated to the server at a later date.”
ETags stay good for a long time, aren’t typically flushed with a cookie purge (but can be removed through clearing the browser cache) and can report data across multiple browsers since they’re not limited to one. According to Soltani, the code works even if cookies are blocked and private-browsing is enabled.
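To make the mechanism concrete, here is a minimal sketch of how a server could treat the ETag validator as a visitor ID and use it to restore a deleted cookie. It is an illustration only, not the code the report examined: the function name `handle_request`, its parameters, and the `uid` cookie name are all hypothetical, and no real network traffic is involved.

```python
import uuid


def handle_request(if_none_match=None, cookie_uid=None):
    """Simulate one request to a server that hides a visitor ID in the ETag.

    if_none_match: the ETag value the browser echoes back from its cache, if any.
    cookie_uid: the ID from the visitor's HTTP cookie, if it still exists.
    (Both parameters are assumptions made for this sketch.)
    """
    # The browser sends a cached resource's ETag back in If-None-Match.
    etag_uid = if_none_match.strip('"') if if_none_match else None

    # Prefer the cookie, fall back to the cached ETag, else mint a new ID.
    uid = cookie_uid or etag_uid or uuid.uuid4().hex

    # The cookie is "respawned" when it was missing but the cache still had the ID.
    respawned = cookie_uid is None and etag_uid is not None

    # 304 tells the browser its cached copy (and therefore its ETag) is still valid.
    status = 304 if etag_uid == uid else 200
    headers = {"ETag": f'"{uid}"', "Set-Cookie": f"uid={uid}"}
    return status, headers, respawned


# First visit: nothing cached, so a fresh ID is issued.
status, headers, respawned = handle_request()

# Later visit after the user deleted cookies but not the browser cache.
status2, headers2, respawned2 = handle_request(if_none_match=headers["ETag"])
print(status2, respawned2, headers2["Set-Cookie"] == headers["Set-Cookie"])
```

In this sketch the second request carries no cookie at all, yet the server recovers the same ID from the cached ETag, which is why clearing cookies alone does not help and clearing the cache does.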
Respawning is a big no-no — Violating consumer wishes? You oughta be ashamed! — but the use of ETags simply shows that, once again, companies tracking consumers have found another way to fly under the radar and dodge transparency.
At a time when privacy advocates are champing at the bit, the government is contemplating legislation and/or regulation, and the Internet-using public in general has no clue if online privacy actually exists, Hulu essentially tried to pull a fast one on its users through the technology of analytics company KISSmetrics.
What Hath Hulu Done?
Hulu was employing Flash cookies for cookie respawning (and not for the first time), but the practice was in-house this time instead of performed by a third party such as Quantcast. In addition, the company used ETags provided through KISSmetrics code to respawn cookies and track users.
Ryan Singel on Wired explains that “when a user visited Hulu.com, they would get a ‘third-party’ cookie set by KISSmetrics with a tracking ID number. KISSmetrics would pass that number to Hulu, allowing Hulu to use it for its own cookie. Then if a user visited another site that was using KISSmetrics, that site’s cookie would get the exact same number as well.”
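The ID sharing Singel describes can be sketched in a few lines. This is a simplified model under stated assumptions, not KISSmetrics code: the cookie jar is a plain dictionary keyed by domain, and `tracker.example`, `video.example`, `shop.example`, and `load_tracker_script` are hypothetical names.

```python
import uuid

TRACKER_DOMAIN = "tracker.example"  # hypothetical third-party analytics domain


def load_tracker_script(cookie_jar, site_domain):
    """Model what happens when a page on site_domain loads the tracker's script.

    cookie_jar maps a domain to that domain's cookies in one browser.
    """
    # The tracker reads its own third-party cookie, creating an ID if none exists.
    tracker_cookies = cookie_jar.setdefault(TRACKER_DOMAIN, {})
    uid = tracker_cookies.setdefault("uid", uuid.uuid4().hex)

    # The tracker passes the ID to the site, which stores it as its own cookie.
    cookie_jar.setdefault(site_domain, {})["uid"] = uid
    return uid


jar = {}
first = load_tracker_script(jar, "video.example")
second = load_tracker_script(jar, "shop.example")
print(first == second)  # both sites end up holding the same tracking ID
```

Because every participating site copies the number from the same third-party cookie, the first-party cookies on otherwise unrelated sites all carry one identifier.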
The report notes that the script used “includes other code that indicates its author is aware of tracking and the risk of data collection about the user. For instance, it includes a function to detect the collection of information that credit card companies require websites to control more carefully.”
The researchers are particularly disturbed by the use of ETags because, they claim, a user would have to flush his or her cache between website visits to avoid this practice.
In response to the report, KISSmetrics tweeted a link to this page on its website:
“As of July 30, 2011 KISSmetrics uses standard first-party cookies to generate a random identity assigned to visitors to our customers sites. This identity by itself does nothing. Consumers can clear these cookies to clear the randomly generated identity KISSmetrics generated. What information is tied to that random identity is controlled by our customers.”
In other words, We have no idea what that rogue company was doing with our technology!
Not surprisingly, Soltani called BS. The insinuation is that KISSmetrics at least explained to Hulu how the company could use its code as a tracking ETag, but there’s certainly no smoking gun; plausible deniability can be a beautiful thing.
Soltani and crew’s study uses a different methodology from recent Carnegie Mellon research that found only 20% of the 100 most traversed websites using Flash cookies, and only two of those using them to respawn cookies. While that study only examined homepages, Soltani et al. attempted to recreate browser behavior by making 10 clicks on the same domain within a session.
On the top 100 sites (including government ones), the team discovered 5,675 HTTP cookies, a significant spike from the 3,602 found in 2009, with 4,915 placed by third parties. (Before you’re all aghast about tags slowing loading time, consider technology like TagMan’s recently introduced Smart Tag Loading.) The biggest cookie depots were wikia.com (242), legacy.com (230), foxnews.com (185), bizrate.com (175), drudgereport.com (168), myspace.com and time.com (both at 151). Google had a cookie presence on 97 of the top 100 sites (89 with Google cookies and 77 with DoubleClick cookies).
The report also notes that 17 of the 100 most popular sites used HTML5 storage cookies, which the researchers believe will become a universal tracking tool. With 5 megabytes of storage capacity, HTML5 storage is much larger than the other methods (4 KB for HTTP cookies and 100 KB for Flash cookies), but what concerns Soltani and crew is that the default expiration setting is permanent. Unlike HTTP cookies, which have expiration dates, HTML5 storage cookies can remain in place until a user decides to dump the cookie bin.
Soltani and crew found 100 Flash cookies on the top 100 sites, down from 281 in 2009. The researchers’ 2009 report claimed that, for numerous major publishers including Hulu, Clearspring and Quantcast were recreating cookies after users deleted them by storing code in Flash cookies. That ended in a class action lawsuit that Quantcast and Clearspring settled for $2.4 million along with a promise not to engage in that practice again. A similar suit against Specific Media was dismissed earlier this year.
Adobe prefers the term “local shared objects” (LSOs) to Flash cookies. Designed to support web technologies such as Flash Player and browsers employing HTML5, LSOs save small pieces of information such as logins or preferences locally on a computer rather than on a site’s server. Many publishers claim to use Flash cookies for internal metrics rather than tracking purposes for advertising, taking advantage of the fact they are saved in a different location than HTTP cookies.
Since the Soltani and crew report, numerous browser add-ons have appeared for cleaning out the Flash cookie stash while Adobe worked with Mozilla and Google to build a new API for clearing LSOs from a browser as well as plugins that install the API.
However, the Soltani team makes the bold claim that the use of Flash cookies is “functionally equivalent to respawning…. Whether or not a website respawns, if it uses Flash cookies, it can uniquely and persistently track individuals even in situations where the user has taken reasonable steps to avoid online profiling.”
“Part of our point here concerns the arms race between trackers and consumers,” Hoofnagle wrote in the Wired comments section. “Although the industry has stated in principle that individuals should be able to opt out, they have defined the opt out very narrowly, and in this case, made it impossible except for weirdos like us to block it!”
Yes, the industry is trying to narrow the parameters of the debate to behavioral targeting. As stated before, respawning is bad voodoo in everybody’s book — including industry self-regulation efforts — but tracking is a more nuanced practice than the critics give credit for. Consider the big blowup with Stanford Security Lab last week over cookies.
Arguably, consumers that use Do Not Track technology will receive irrelevant ads (and the same ones over and over) that they will ignore — and banner blindness is a big enough problem as is. If enough consumers were to sign up with the Do Not Track program, it could potentially cause CPMs to drop even lower and sink publisher revenue.
Singel comments that, “if a user came to Hulu.com from an ad on Facebook, and then later, using a different browser on the same computer, visited Hulu.com from Google, and then at some point signed up for the premium service, KISSmetrics would be able to tell Hulu all about that user’s path to purchase (without knowing who that person was).”
That’s attribution gold right there. However, Singel also notes that ETags could enable data sharing (even PII) among multiple sites; not that anybody is doing it, just that it’s possible. It’s arguable that holistic government regulation could keep this in line, as it doesn’t seem industry self-regulation efforts go far enough.
But more and more, a best-of-all-worlds solution seems to be ditching the opaque cookie-dropping practices and letting consumers opt in — allowing them to trade user data for content. It could certainly solve the transparency issue.