How can I set up robots.txt
In the robots.txt file you can specify how search robots should behave on your site.
A robots.txt does not protect your content from unauthorized access. If you need that, read the relevant sections on configuring web servers, for example.
The so-called Robots Exclusion Protocol governs how you can use a robots.txt file to influence the behavior of search engine robots on your domain. It grew into a de facto standard long before it had an RFC; it was formally published as RFC 9309 only in 2022.
The use of a page by search engines can also be controlled in individual HTML files with a meta tag, but that applies only to the individual HTML file and at most to the pages reachable from it through links, not to other resources such as images. In a central robots.txt, on the other hand, you can specify rules for directories and directory trees, regardless of the file and link structure of your web project. Because the protocol evolved for years without a written specification, robots do not always interpret robots.txt and its syntax uniformly. Using meta tags in HTML files in addition is therefore recommended in case a robot does not interpret the robots.txt, or does not interpret it correctly, and indexes something you did not want indexed.
The robots.txt file (there can be at most one per (sub)domain) must be stored under exactly this name (all letters in lower case) in the root directory of the domain's web files. The URI for the domain example.org is therefore http://example.org/robots.txt. Only there can it be found by search engine robots that visit the project. This means that you can use robots.txt only if you control the root directory of a domain, not with web space offers where you only get a homepage directory on a server without access to the root directory of the domain.
The robots.txt is a plain text file and can be edited with any text editor.
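The location rule above can be sketched in a few lines of Python. This is only an illustration: the function name robots_txt_url and the sample page URLs are assumptions made for this example, not part of any protocol.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to the given page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any subdomain and port);
    # the path is always /robots.txt in the root directory.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://example.org/photos/2024/index.html?page=2"))
print(robots_txt_url("http://blog.example.org/archive/"))
```

Note that the subdomain blog.example.org gets its own robots.txt, separate from the one for example.org.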
Structure of a robots.txt 
A robots.txt consists of records, each of which basically has two parts. The first part specifies which robots (User-agent) the following statements apply to. The second part contains the instructions themselves, which forbid something (Disallow) to the previously specified robots.
Each line of a record begins with one of the two permitted keywords, User-agent or Disallow. This is followed by the relevant value, separated by a colon and a space. Records are separated from one another by a blank line.
Comments are introduced by a hash sign (#) and can also begin in the middle of a line; a file often starts with a comment line. As an example, a record that starts with User-agent: * and continues with the lines Disallow: /photos/, Disallow: /temp/ and Disallow: /fotoalbum.html forbids all robots from reading the two subdirectories /photos/ and /temp/ and from accessing the file fotoalbum.html.
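To make the record structure concrete, here is a minimal sketch that feeds a small sample file to Python's standard urllib.robotparser module. The robot names ExampleBot and OtherBot and the path /sources/ are hypothetical and chosen only for illustration; the record for all robots uses the /photos/, /temp/ and fotoalbum.html paths mentioned in this section.

```python
from urllib.robotparser import RobotFileParser

# Sample file: one comment line, then two records separated by a blank line.
# ExampleBot, OtherBot and /sources/ are made-up values for this sketch.
SAMPLE = """\
# robots.txt for http://example.org/

User-agent: ExampleBot
User-agent: OtherBot
Disallow: /sources/

User-agent: *
Disallow: /photos/
Disallow: /temp/
Disallow: /fotoalbum.html
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Ask the parsed rules whether agent may fetch path on example.org."""
    return parser.can_fetch(agent, "http://example.org" + path)


for agent, path in [
    ("SomeCrawler", "/photos/a.jpg"),
    ("SomeCrawler", "/index.html"),
    ("ExampleBot", "/sources/a.dtd"),
    ("ExampleBot", "/photos/a.jpg"),
]:
    print(agent, path, allowed(agent, path))
```

In this sketch ExampleBot may read /photos/, because only its own record is applied to it and that record forbids nothing but /sources/.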
User-agent 
Within a record, at least one line must begin with User-agent. Only one value is possible after it. If you want to address more than one particular robot, write several lines that begin with User-agent one below the other.
The value is either the wildcard * (asterisk), which means "all robots", or the name of a specific robot, which you have to know. No distinction is made between upper and lower case. More than one record for all robots is not allowed.
Disallow 
The lines that begin with Disallow are noted below the lines that begin with User-agent. The Disallow values are then taken into account by the robots that were specified in the same record. Empty lines within a record are not permitted.
After each Disallow you can note a path. Robots will not index any path on your site that begins with this path.
The statements are processed in sequence from the first to the last line. The first entry that matches the path to be checked wins.
Disallow does not limit you to entire directory or file names. Partial names are also possible: /bild matches (besides /bild itself) /bild/urlaub just as it matches /bilder or /bild123.jpg. You should therefore make sure to write a trailing slash on directory paths.
Wildcard characters in the path (for example *) are understood by only some search engines and should therefore be avoided.
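The prefix matching described above can be sketched as a plain string comparison. The helper name is_disallowed is an assumption for this example, and the /bild paths are only sample values.

```python
def is_disallowed(path: str, disallow_prefixes: list[str]) -> bool:
    """Return True if path begins with any non-empty Disallow value."""
    return any(prefix and path.startswith(prefix) for prefix in disallow_prefixes)


paths = ["/bild", "/bild/urlaub", "/bilder", "/bild123.jpg"]

# Without a trailing slash, every path that merely starts with "/bild" is blocked.
print([p for p in paths if is_disallowed(p, ["/bild"])])

# With a trailing slash, only the contents of the directory are blocked.
print([p for p in paths if is_disallowed(p, ["/bild/"])])
```

The second call shows why the trailing slash matters: /bilder and /bild123.jpg stay accessible.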
Extensions to the original protocol 
Even the original protocol is merely a recommendation. The extensions presented here have been supported by Google, Microsoft and Yahoo! since 2008.
Order of user agents 
Originally, the robots.txt was processed strictly from top to bottom. The instructions for all robots (User-agent: *) therefore had to come at the very end of the file. In addition, the name of a user agent had to be known exactly, including upper and lower case.
In the extended protocol, the User-agent value only has to match the beginning of the robot's user agent string, so a name is synonymous with the same name followed by the wildcard *.
Only one data record of the robots.txt can be applied to a robot. The robot must therefore determine the data record that applies to it by determining the data record with the most precise user agent information that still results in a match. All other records are ignored. The order of the data records is therefore not important.
Allow 
The original protocol provided no way to explicitly permit the indexing of individual files or directories.
Allow was introduced in 1996 in order to enable individual releases within otherwise blocked paths. It is not necessary to explicitly release objects that no other entry in the robots.txt matches.
Note that the entries in the robots.txt have traditionally been processed in sequence until the first matching one. Accordingly, Allow entries for paths that were already excluded by an earlier Disallow entry are ineffective. For example, if Disallow: / comes first, a following Allow: /public/ is never reached.
Google suspected that this procedure would cause paths to be excluded from indexing unintentionally, and changed the processing sequence for its own index: first all Allow entries are checked one after the other, and only then are the Disallow entries processed.
Since this deviation is documented only by Google, and the order of the entries makes no difference to Google, you should arrange the entries for line-by-line processing, with Allow entries before the Disallow entries they are meant to override, for the sake of other search engines.
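The difference between the two processing orders can be sketched like this. Both functions are simplified models of the behavior described in the text, not a reproduction of any search engine's actual implementation; the function names and the /public/ and /private/ paths are assumptions for this example.

```python
Rule = tuple[str, str]  # ("allow" or "disallow", path prefix)


def first_match(rules: list[Rule], path: str) -> bool:
    """Line-by-line processing: the first rule whose prefix matches decides."""
    for kind, prefix in rules:
        if path.startswith(prefix):
            return kind == "allow"
    return True  # no rule matches: access is permitted


def allow_first(rules: list[Rule], path: str) -> bool:
    """Model of the deviation described above: check all Allow rules first."""
    allows = [rule for rule in rules if rule[0] == "allow"]
    disallows = [rule for rule in rules if rule[0] == "disallow"]
    return first_match(allows + disallows, path)


badly_ordered = [("disallow", "/"), ("allow", "/public/")]
well_ordered = [("allow", "/public/"), ("disallow", "/")]

print(first_match(badly_ordered, "/public/a.html"))  # Allow is never reached
print(allow_first(badly_ordered, "/public/a.html"))
print(first_match(well_ordered, "/public/a.html"))
print(first_match(well_ordered, "/private/a.html"))
```

With the Allow entry written first, both models agree, which is why that ordering is the safe choice.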
Wildcards 
The extended protocol recognizes two wildcard characters for the path information:
- *: any number of characters
- $: end of line
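One way to picture the two wildcard characters is to translate a path pattern into a regular expression. The following is a minimal sketch assuming that * stands for any sequence of characters and that a $ at the end of the pattern anchors it to the end of the path; the function names and sample paths are made up for this example.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with * and $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each * match any characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """Return True if the pattern matches the path from its beginning."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.pdf$", "/docs/manual.pdf"))
print(matches("/*.pdf$", "/docs/manual.pdf?download=1"))
print(matches("/*.pdf", "/docs/manual.pdf?download=1"))
print(matches("/temp", "/temp/file.txt"))
```

Without a trailing $, a pattern still behaves as a prefix, as in the original protocol.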
Sitemap 
A sitemap contains the structure of your website in machine-readable form.
In the robots.txt you can enter the complete URI of the sitemap in a line that begins with Sitemap. Unlike the robots.txt itself, the sitemap can be stored anywhere under any name.
Although a sitemap may contain useful additional information for search engines, creating one is worthwhile only in a few cases, for example for very large or very complex sites. Make sure that your pages are linked to each other. If a human visitor can find all pages, every bot can be trusted to do so too.
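As a small illustration of how a sitemap entry in the robots.txt can be read, the following sketch collects all sitemap URIs from a file's text. The function name sitemap_urls and the sample URI are assumptions for this example, and the keyword is matched case-insensitively as a simplification.

```python
def sitemap_urls(robots_txt: str) -> list[str]:
    """Collect the URIs of all Sitemap lines in the text of a robots.txt."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


sample = """\
User-agent: *
Disallow: /temp/

Sitemap: http://example.org/maps/site-index.xml
"""
print(sitemap_urls(sample))
```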
Approach recommended by Google 
A "ban" via robots.txt is a very unreliable way to keep certain pages out of the Google index. If, for example, Googlebot reaches a page via an external link, it still picks up the page.
In order to reliably prevent pages from ending up in the Google index, the relevant page must specify <meta name="robots" content="noindex">.
In order to remove pages from the Google index, access must not be forbidden in the robots.txt, because the robot has to fetch the page in order to see the instruction, and the meta tag must be set.
However, this does not work for non-HTML resources, since a PDF file, for example, cannot contain such a meta element. In this case the X-Robots-Tag HTTP header can be used.
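For non-HTML resources, the noindex instruction can travel in an HTTP response header named X-Robots-Tag instead of a meta element. The following sketch shows one possible way to attach that header with Python's built-in http.server module. It is an illustration only: the names robots_headers, NoIndexHandler and NOINDEX_EXTENSIONS are assumptions, and a production web server would normally set the header in its own configuration.

```python
from http.server import SimpleHTTPRequestHandler
from urllib.parse import urlsplit

# Assumption for this sketch: only PDF files should be kept out of the index.
NOINDEX_EXTENSIONS = (".pdf",)


def robots_headers(request_path: str) -> dict[str, str]:
    """Return the extra response headers for the requested path."""
    path = urlsplit(request_path).path.lower()  # ignore any query string
    if path.endswith(NOINDEX_EXTENSIONS):
        return {"X-Robots-Tag": "noindex"}
    return {}


class NoIndexHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds X-Robots-Tag to matching responses."""

    def end_headers(self):
        for name, value in robots_headers(self.path).items():
            self.send_header(name, value)
        super().end_headers()


print(robots_headers("/docs/manual.pdf?download=1"))
print(robots_headers("/index.html"))
```

To try it locally, the handler class could be passed to http.server.HTTPServer; that step is left out here so the sketch does not start a server.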