The robots.txt file is a plain-text document that belongs in the root of your domain. It contains instructions for any robots that visit your site about what they are and are not allowed to index. To communicate with a crawler, you need a specific syntax that it can understand. In its most basic form, the text might look something like this:
User-agent: *
Disallow: /
These two parts of the text are essential. The first part, User-agent:, names the user agent, or crawler, you’re addressing. The asterisk (*) indicates that all crawlers are covered, but you can name a single crawler or several crawlers instead. The second part, Disallow:, tells the crawler what it is not allowed to access. The slash (/) matches every path on the site, so it means “all directories.”
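To see how a crawler might interpret these two lines, here is a minimal sketch using Python’s standard urllib.robotparser module. The crawler name AnyBot and the example.com URLs are placeholders chosen for illustration; they are not part of the original example.

```python
from urllib.robotparser import RobotFileParser

# The basic rule set from above: every crawler, every path.
RULES = """\
User-agent: *
Disallow: /
"""

def is_allowed(rules_text, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)

# "AnyBot" and example.com are placeholder names, not real crawlers or sites.
home_allowed = is_allowed(RULES, "AnyBot", "https://example.com/")
page_allowed = is_allowed(RULES, "AnyBot", "https://example.com/blog/post.html")
print(home_allowed, page_allowed)
```

Both checks should come back False, because a Disallow value of / covers every path on the site.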
When you’re writing robots.txt, remember to include the colon (:) after the User-agent indicator and after the Disallow indicator. The colon separates the indicator from the value that follows, which is the information the crawler acts on. You won’t usually want to tell all crawlers to ignore all directories. Instead, you can tell all crawlers to ignore your temporary directories by writing the text like this:
User-agent: *
Disallow: /tmp/
Or you can take it one step further and tell all crawlers to ignore multiple directories:
User-agent: *
Disallow: /temp/
Disallow: /users/
Disallow: /adm/listing.html
That piece of text tells the crawler to ignore temporary directories, private directories, and the web page (titled Listing) that contains links, so the crawler won’t be able to follow those links. One thing to keep in mind is that some crawlers read the robots.txt file from top to bottom and, as soon as they find a group of rules that applies to them, stop reading and begin crawling your site. Crawlers that follow the current Robots Exclusion Protocol standard choose the group that names them regardless of its position, but not every crawler does. So if you’re commanding multiple crawlers with your robots.txt file, you want to be careful how you write it. This is the wrong way to do it:
User-agent: *
Disallow: /tmp/
User-agent: CrawlerName
Disallow: /temp/
Disallow: /adm/listing.html
This bit of text first tells all crawlers to ignore the temporary directory, so every crawler reading the file will ignore the temporary files. But you’ve also told a specific crawler (indicated by CrawlerName) to stay out of both the temporary directories and the Listing page. A crawler that stops reading at the first group that applies to it never reaches those extra rules. If you want to command multiple crawlers, begin by naming the crawlers you want to control. Only after they’ve been named should you leave your instructions for all crawlers. Written properly, the preceding example should look like this:
User-agent: CrawlerName
Disallow: /temp/
Disallow: /links/listing.html
User-agent: *
Disallow: /temp/
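As a rough check of the corrected file, the sketch below feeds it to Python’s standard urllib.robotparser module and asks what two crawlers may fetch. OtherBot, the example.com host, and the file name cache.html are hypothetical names introduced only for this illustration.

```python
from urllib.robotparser import RobotFileParser

# The corrected rule set: the named crawler first, then everyone else.
RULES = """\
User-agent: CrawlerName
Disallow: /temp/
Disallow: /links/listing.html
User-agent: *
Disallow: /temp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

def check(user_agent, path):
    """Return True if user_agent may fetch path on the placeholder host."""
    return parser.can_fetch(user_agent, "https://example.com" + path)

# "OtherBot" stands in for any crawler that is not named in the file.
results = {
    (agent, path): check(agent, path)
    for agent in ("CrawlerName", "OtherBot")
    for path in ("/temp/cache.html", "/links/listing.html")
}

for (agent, path), allowed in results.items():
    print(f"{agent:12} {path:22} {'allowed' if allowed else 'blocked'}")
```

Under these assumptions, both crawlers are kept out of /temp/, while only CrawlerName is also kept away from the Listing page.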
Each search engine crawler identifies itself by a different name, and you can find those names in your web server logs.
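One way to pull those names out of a log is sketched below. It assumes the common “combined” access log layout, in which the user agent is the last double-quoted field on each line; your server may be configured differently. The sample lines, addresses, and bot names are invented for the example.

```python
import re
from collections import Counter

# Assumption: the user agent is the last double-quoted field on the line,
# as in the widely used "combined" access log layout.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

def count_user_agents(lines):
    """Count how often each user-agent string appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Invented sample data; ExampleBot and SampleSpider are not real crawlers.
sample_log = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [10/Oct/2023:13:55:37 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Oct/2023:14:02:11 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "SampleSpider/2.3"',
]

agent_counts = count_user_agents(sample_log)
for agent, hits in agent_counts.most_common():
    print(hits, agent)
```

The part of each name before the slash is usually what you would put after User-agent: in robots.txt, though it is worth confirming the exact token in the crawler’s own documentation.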