Where to put it
The short answer: in the top-level directory of your web server.
The longer answer:
When a robot looks for the “/robots.txt” file for a URL, it strips the path component from the URL (everything from the first single slash) and puts “/robots.txt” in its place.
For example, for “http://www.example.com/shop/index.html”, it will remove the “/shop/index.html”, replace it with “/robots.txt”, and end up with “http://www.example.com/robots.txt”.
So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site’s main “index.html” welcome page. Where exactly that is, and how to put the file there, depends on your web server software.
Remember to use all lower case for the filename: “robots.txt”, not “Robots.TXT”.
What to put in it
The “/robots.txt” file is a text file, with one or more records. Usually it contains a single record looking like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
In this example, three directories are excluded. Note that you need a separate “Disallow” line for every URL prefix you want to exclude; you cannot list several prefixes, such as “/cgi-bin/ /tmp/”, on a single line.
To exclude all robots from the entire server
User-agent: *
Disallow: /
To allow all robots complete access
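A record with a wildcard “User-agent” and an empty “Disallow” value excludes nothing, so it allows everything:

```
User-agent: *
Disallow:
```
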
(or just create an empty “/robots.txt” file, or don’t use one at all)
To exclude all robots from part of the server
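A sketch of such a record, reusing directories like those in the earlier example (the names are only illustrative):

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
```
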
To exclude a single robot
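Assuming the robot identifies itself with the hypothetical name “BadBot”, a record like this would exclude just that robot from the entire server:

```
User-agent: BadBot
Disallow: /
```
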
To allow a single robot
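One approach is a pair of records: name the robot explicitly (here “GoodBot” is a hypothetical name) with nothing disallowed, then exclude everyone else. A blank line separates the two records:

```
User-agent: GoodBot
Disallow:

User-agent: *
Disallow: /
```
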
To exclude all files except one
This is currently a bit awkward, as there is no “Allow” field. The easy way is to put all files to be disallowed into a separate directory, say “stuff”, and leave the one file in the level above this directory:
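With that layout, assuming the disallowed files have all been moved into a hypothetical “/~joe/stuff/” directory and the one public page stays at the level above it, the record could be as simple as:

```
User-agent: *
Disallow: /~joe/stuff/
```
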
Alternatively, you can explicitly disallow each page that should be hidden:
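For instance, if the pages to hide were “junk.html” and “private.html” (hypothetical names) under “/~joe/”, each would get its own “Disallow” line:

```
User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/private.html
```
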