Answers Served: Blekko’s Skrenta Discusses Social Search, Spam Clock
ADOTAS – As of press time, the just-launched Spam Clock claims that more than 153 million pages of spam have been created since the beginning of the year. The clock was introduced by recently launched social search engine Blekko; cofounder and CEO Rich Skrenta explains in a blog post that its goal is to bring attention to the ever-growing clutter gumming up the web. Organizing search verticals in engines like Blekko, he writes, will create a “disincentive for polluting the web.”
Skrenta took a few minutes to discuss the launch of Blekko and its recent improvements as well as share his thoughts on SEO and the Spam Clock.
ADOTAS: What was the impetus to launch Blekko? Was there one moment in particular in which you said, “I gotta make a better search engine”?
We saw that the web was continuing to grow at an accelerating rate, yet there were fewer and fewer search engines — in fact we’re down to just two search engines now, Google and Bing. Search is critically important to web users, but we have just two companies providing this service. We thought there was room for a different editorial approach to web search.
When Blekko publicly launched, I initially agreed with Business Insider’s Henry Blodget that Google isn’t broken, but I’m not so sure of that anymore. Is Google’s search engine in disrepair? Can it be fixed?
We’ve gone from 1 billion pages on the web in the year 2000 to over 100 billion pages today. English Wikipedia is only 3.5 million articles, and it’s done. There are 15 million local businesses. There are 70,000 titles in the Netflix catalog.
Where are all these billions and billions of new pages coming from? A majority have been created to attack and exploit search engines for monetary gain. The cost to add a new page to the web is effectively zero, and search engines will send you money if you can attract any searches. So there is a huge market that has evolved to produce more and more pages designed to catch searches.
Do you feel Google’s myriad initiatives (e.g., display advertising, mobile, social media) have caused it to drop the ball on its core competency, search?
The fundamental problem is that, since urls that attract searches have an economic value, there is a motivation to create more of them. PageRank was originally designed to measure link popularity to rank web pages. But this approach no longer works, since there are more bad links on the web than good ones. And it’s only going to get worse.
Obviously slash tags are one approach to dealing with the plethora of spam and lackluster content gumming up the web (i.e., cut through the crap) — what other methods do you see and are you employing them in Blekko?
We have standard algorithmic tools to identify and remove spam. But we’re past the point where the spam is a minority of the content on the web, and we can just run some algorithmic filters to scrub out the bad stuff. At this point there are far more spam urls than legitimate ones. So we’re taking the approach of identifying the good content rather than trying to identify all of the bad content.
An example — the top 50-100 health sites collectively have millions of pages, and can answer nearly any health question you have. If you do a health-related search, these are the sites that you want to get your information from — sites like mayoclinic.com, nhs.gov and webmd.com. These are sites where medical professionals are authoring and organizing the content.
On the other hand, there are millions of sites with health information that are coming up in health searches now where the content is not authored by health professionals. Sites like ehow.com do not employ doctors to write their medical articles. There are tasks being bid on Amazon’s Mechanical Turk to write health articles for six cents per 100 words. The people taking up these tasks have no business writing medical content for the web.
How have you enhanced Blekko since the public launch? What improvements do you hope to introduce in the near future?
We added an integration with Facebook, called /likes, that uses your social circle to enhance your search results. If you or your friends have “liked” a site, that information will be featured in the search results, and you can restrict search results to pages that have been liked by you or someone you know.
Can you explain the analytics that Blekko offers for each page?
Our goal is to open up all of the data from our web crawl, including backlinks, anchortext, duplicated text, our classifier scores, and our ranking factors.
How is the number on the Spam Clock calculated? What qualifies a page as spam?
We did an extrapolation based on the growth of the web and the fraction of spam that we’re seeing.
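Skrenta does not spell out the calculation, so the following is only a rough sketch of how an extrapolation of this kind could work, not Blekko’s actual method. It assumes a constant hourly rate of new pages and a fixed spam fraction; the function name and both input figures are hypothetical placeholders chosen for illustration, not numbers from Blekko.

```python
from datetime import datetime, timezone


def estimated_spam_pages(now, new_pages_per_hour, spam_fraction):
    """Linearly extrapolate spam pages created since January 1 of `now`'s year.

    Assumes (hypothetically) a constant page-creation rate and a constant
    share of those pages being spam.
    """
    start_of_year = datetime(now.year, 1, 1, tzinfo=now.tzinfo)
    hours_elapsed = (now - start_of_year).total_seconds() / 3600
    return int(hours_elapsed * new_pages_per_hour * spam_fraction)


# Illustrative inputs only: 2 million new pages per hour, half of them spam.
one_day_in = datetime(2011, 1, 2, tzinfo=timezone.utc)
print(estimated_spam_pages(one_day_in, 2_000_000, 0.5))
```

A real clock would need measured values for both inputs, and the result is only as good as the estimate of the spam fraction.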
Do you believe in good SEO, or is it nothing but bad apples? What constitutes good SEO (if it exists)?
Good SEO is what we call “appropriate discoverability.” Sometimes there is good content that can’t be found because it hasn’t been published to the web correctly. Newspapers have great content for restaurant reviews, movie reviews, and garage sales. But their content almost never comes up in search engines because of poor SEO. Often this would be the best content available for users, so it’s a shame that it isn’t ranking well.
But there is far more shady activity promoting spam pages. We see millions of WordPress blogs that have been hacked to add invisible spam links. There are commercial software packages you can buy to post millions of comments on forums around the web. You can hire labor pools to write fake reviews and post fake tweets. All to cheat users and deceive algorithmic search rankings.
How can search advertisers take advantage of social search services? Can SEO tactics be employed for a social search engine? Should they?
We’ve been seeing an increase in social media spamming. But it lags far behind traditional web SEO.
What an intriguing idea: search without spam. Blekko may be just what I need to do serious research. Spam is truly getting in the way of finding content.
Great article, I have seen a rise in these spam web pages but had no idea the depth.