Webmasters favour Google over other search engines: Study
Washington, November 16: Penn State researchers have found that website policy makers who use robots.txt files to specify what is open and what is off limits to web crawlers favour Google over other search engines.
The study involved more than 7,500 websites, said C. Lee Giles, the David Reese Professor of Information Sciences and Technology, whose team developed a new search engine called BotSeer for the study.
“We expected that robots.txt files would treat all search engines equally or maybe disfavour certain obnoxious bots, so we were surprised to discover a strong correlation between the robots favoured and the search engines’ market share,” said Giles of Penn State’s College of Information Sciences and Technology (IST).
Robots.txt files regulate Web crawlers, also known as “spiders” and “bots”, which mine the Web 24/7 for everything from the latest news to e-mail addresses. Web policy makers use the files, placed in a website’s root directory, to restrict crawler access to non-public information. The files are also used to reduce server load from crawler traffic, which, if left unchecked, can cause denial of service and shut down websites.
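As an illustration of the mechanism the study examined, a minimal robots.txt file might look like the following (the directory name is hypothetical; the Crawl-delay field is a widely recognised but non-standard extension):

```
# Applies to all crawlers
User-agent: *

# Keep crawlers out of non-public content
Disallow: /private/

# Ask crawlers to wait between requests, easing server load
Crawl-delay: 10
```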
The researchers have now found that some web policy makers and administrators are writing robots.txt files that do not block access uniformly. They say that such files give access to Google, Yahoo and MSN while restricting other search engines.
Although the study did not reveal why web policy makers opt to favour Google, the researchers believe the choice was made consciously.
“Robots.txt files are written by Web policy makers and administrators who have to intentionally specify Google as the favoured search engine,” Giles said.
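The kind of deliberate bias Giles describes would look something like this sketch of a robots.txt file, which names Google’s crawler explicitly while shutting out everyone else (Googlebot is Google’s actual crawler name; the overall pattern is illustrative, not taken from any site in the study):

```
# Google's crawler may index the entire site
User-agent: Googlebot
Disallow:

# Every other crawler is blocked from all pages
User-agent: *
Disallow: /
```

Because the more specific User-agent record takes precedence, Googlebot ignores the blanket block that applies to all other bots.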
The finding has been described in a paper titled ‘Determining Bias to Search Engines from Robots.txt’, given at the recent 2007 IEEE/WIC/ACM International Conference on Web Intelligence in Silicon Valley. (ANI)