Digital Marketing - Study Notes:
Instructions for robots
The robots.txt file is a simple text file placed on your web server, which tells web crawlers whether or not they should access a file on your website. The robots.txt file controls how search engine spiders see and interact with your web pages.
If there are any areas of your site that you would rather a search engine didn’t crawl – i.e. that you don’t want it to index – you can issue a robots.txt file that disallows the URL path that you ultimately want to block.
If you disallow any page that contains a forward slash, you can block all access to the site. You can also block individual folders or image files if you wish them not to be crawled. (See slide ‘Instructions for robots’ for examples of these instructions.)
You can check if a site has a robots.txt file from any browser. The robots.txt file is always located in the same place on any website, so it is easy to determine if a site has one. Just add "/robots.txt" to the end of a domain name.
Back to TopShane Lyons
Shane Lyons is head of Search & Analytics at Mediaworks, an award-winning media and communications agency. He has been working in Digital both in Ireland and abroad for the last 8 years and now specializes in SEO and Analytics.
