
Robots Exclusion Standard

Last Edit: 10/01/17

The Robots Exclusion Standard - also referred to as the 'Robots Exclusion Protocol' or, simply, the 'Robots.txt Standard / Protocol' - is a standard that enables websites to control access to their content by web robots and crawlers. Why would websites want to limit access to their content? One reason is that web crawlers can consume considerable bandwidth, and have been known to contribute to a website exceeding its bandwidth allowance and being made unavailable. Websites also like to limit access to certain sections, such as their image folder, so that search engines do not place the website's pictures into their image search database.

The Robots Exclusion Standard was initiated by Martijn Koster, who developed the first ever web search engine: ALIWEB. It has been claimed that Koster suggested the creation of the Robots Exclusion Standard after his server was made unavailable by a 'rogue', misbehaving web crawler. Due to his experience with developing an early web crawler, Koster was able to present his Robots Exclusion Standard proposal to CERN in 1994; Berners-Lee invented the World Wide Web at CERN in the early 1990s. The Robots Exclusion Standard was adopted by the prominent search engines of 1994-1995: primarily AltaVista, Yahoo!, Lycos and WebCrawler. It has continued to be adhered to by prominent search engines such as Google, Bing, Yahoo! and Yandex.

The Robots Exclusion Standard is referred to as the 'Robots.txt Protocol' because it uses a file named robots.txt. Each subdomain, port and protocol needs its own robots.txt file. The robots.txt syntax is simple: it contains an 'allow' or 'disallow' command for each web crawler, and each web crawler is identified by its user-agent. The allow and disallow syntax is shown below:
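
To make the per-origin rule and the user-agent matching concrete, here is a minimal Python sketch that uses the standard library's urllib.robotparser module. The helper robots_url, the crawler names (ExampleImageBot, OtherBot), the example.com addresses and the sample rules are all hypothetical, chosen only for illustration. Real crawlers may interpret some rules differently from this parser.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    The file sits at the root of the same protocol, host and port,
    which is why each subdomain, port and protocol needs its own copy.
    """
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# Hypothetical robots.txt content, for illustration only.
SAMPLE_RULES = """\
User-agent: ExampleImageBot
Disallow: /images/

User-agent: *
Disallow: /private/
"""

# Parse the sample rules directly instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(SAMPLE_RULES.splitlines())

print(robots_url("https://blog.example.com:8443/posts/1"))
# The image crawler is kept out of /images/ by its own rule group.
print(parser.can_fetch("ExampleImageBot", "https://www.example.com/images/logo.png"))
# Any other crawler falls back to the '*' group, which only blocks /private/.
print(parser.can_fetch("OtherBot", "https://www.example.com/images/logo.png"))
```

Under these assumptions, a well-behaved crawler would call something like robots_url for every distinct protocol, host and port it visits, then check can_fetch with its own user-agent before requesting a page. The standard is advisory, so it is the crawler that chooses to obey the answer.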