View Full Version : Robots.txt file

02-15-2012, 05:49 AM
I would like to know which parts of a website an SEO expert should disallow in the robots.txt file. Which pages are better not shown to search engines?

02-15-2012, 06:59 AM
The robots.txt file is used to block pages and URLs from crawlers.

02-15-2012, 07:11 AM
Robots.txt lets you block files from Google, so it depends on which files you do not want Google to see.

02-15-2012, 07:11 AM
It is better to restrict folders such as cgi-bin, images, scripts, inc, functions, and lib, which do not need to be indexed by any search engine.
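As a sketch, a minimal robots.txt along those lines might look like this (the folder names are illustrative; use your site's actual paths):

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /scripts/
Disallow: /inc/
Disallow: /functions/
Disallow: /lib/
```

The file must live at the root of the domain, e.g. https://example.com/robots.txt, or crawlers will not find it.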

02-17-2012, 02:46 AM
I would like to disallow pages containing:
-Personal/legal information
-Admin login details
-Duplicate files
-cgi-bin pages

Venus Brown
02-17-2012, 03:55 AM
You may use robots.txt for duplicate pages, comment pages, older web pages that are no longer needed, etc.

02-18-2012, 12:29 AM
The robots.txt file is the first file checked by search engine crawlers (robots); based on it, the search engine decides which pages to index and which to skip. If you do not want search engines to access certain folders, you can use the simple directive "Disallow: /cgi-bin/" (without the quotes) and that directory will not be crawled. Some search engine optimization experts claim that bots do not follow these rules, but the major search engines do respect them. You can also declare your sitemap in the robots.txt file, without creating Google and Yahoo accounts and submitting it manually.
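As a sketch of both points, a robots.txt can combine a Disallow rule with a sitemap declaration (the sitemap URL here is a placeholder):

```
User-agent: *
Disallow: /cgi-bin/

Sitemap: https://example.com/sitemap.xml
```

Crawlers that support the Sitemap directive will discover the sitemap on their next visit, with no manual submission required.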

06-15-2012, 08:14 AM
With the Disallow directive of robots.txt, you can prevent crawlers from accessing specific web pages as well as whole directories. Sensitive information that should not appear in search engine result pages should be listed in robots.txt under a Disallow directive.

06-15-2012, 08:52 AM
It is mostly used for duplicate-content pages, payment pages, and admin login pages.

06-18-2012, 01:21 AM
A robots.txt file on a website functions as a request that specified robots ignore specified files or directories when crawling the site.

06-18-2012, 01:33 AM
The discussions are very helpful

06-18-2012, 04:27 AM
You can disallow pages or folders such as admin folders, cgi-bin, and image folders that are not relevant to search engines. Robots.txt helps tell spiders what is public and useful to include in search engine indexes and what is not.

06-19-2012, 02:40 AM
As far as I know, it is great when search engines often visit your site and index your content, but there are cases when indexing parts of your online content is not what you want.

06-19-2012, 02:49 AM
The robots.txt file is a simple plain-text file that gives crawlers information about what to crawl and cache.

06-19-2012, 04:16 AM
If we want to hide some pages from Google, we create a robots.txt file.

06-19-2012, 04:25 AM

When Google crawls a site and the site owner does not want certain pages crawled, a robots.txt file is used.

06-19-2012, 05:39 AM
Thanks for sharing the great answers.

06-19-2012, 06:03 AM
Robots.txt allows you to block files from Google.

06-19-2012, 06:25 AM
The robots.txt file is used to block pages and URLs.

06-19-2012, 07:45 AM
Robots.txt is a file used to exclude content from search engine bots; the convention it follows is also called the Robots Exclusion Protocol.

In general, we prefer that our website pages are indexed by search engines, but there may be some content we do not want crawled by search engine bots, such as a personal images folder, cgi-bin, and so on. The main idea is that we do not want them to be indexed.
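To see how well-behaved bots interpret these rules, here is a minimal sketch using Python's standard urllib.robotparser module; the Disallow rules and URLs are hypothetical examples, not taken from any real site:

```python
# Check whether URLs are allowed by a robots.txt, using the
# standard-library parser that crawlers like Googlebot emulate.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A path under /admin/ is blocked; an ordinary page is not.
print(parser.can_fetch("*", "https://example.com/admin/login"))  # False
print(parser.can_fetch("*", "https://example.com/blog/post"))    # True
```

Note that this is only a request: robots.txt does not password-protect anything, so truly sensitive pages (admin logins, personal data) should also be protected server-side.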