01 December 2007

News Web Sites Seek More Search Control

The desire for greater control over how search engines index and display Web sites is driving an effort by leading news organizations and other publishers to revise a 13-year-old technology for restricting access.

Currently, Google, Yahoo and other top search companies voluntarily respect a Web site's wishes as declared in a text file known as "robots.txt," which a search engine's indexing software, called a crawler, knows to look for on a site. The formal rules allow a site to block indexing of individual Web pages, specific directories or the entire site, though some search engines have added their own commands beyond the standard.

The proposal, unveiled by a consortium of publishers and known as the Automated Content Access Protocol (ACAP), seeks to have those extra commands, and more, apply across the board.
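To illustrate the mechanism the article describes, here is a minimal robots.txt sketch using the standard exclusion directives (the `User-agent` and `Disallow` fields are part of the original 1994 convention; `Crawl-delay` is an example of a vendor-added command honored by some crawlers but not all; the paths and bot name are hypothetical):

```
# Rules for all crawlers
User-agent: *
# Block a specific directory
Disallow: /private/
# Block an individual page
Disallow: /drafts/story-in-progress.html

# Block one hypothetical crawler from the entire site
User-agent: ExampleBot
Disallow: /

# Vendor-specific extension (seconds between requests);
# supported by some engines, ignored by others
Crawl-delay: 10
```

A crawler fetches this file from the site's root (e.g. `http://example.com/robots.txt`) before indexing; compliance is voluntary, which is part of what the ACAP effort aims to formalize and extend.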