<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>robotstxt.ca</title>
	<atom:link href="http://www.robotstxt.ca/feed" rel="self" type="application/rss+xml" />
	<link>http://www.robotstxt.ca</link>
	<description>All About robots.txt</description>
	<lastBuildDate>Sat, 03 Dec 2011 18:52:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Robots.txt Information</title>
		<link>http://www.robotstxt.ca/archives/6</link>
		<comments>http://www.robotstxt.ca/archives/6#comments</comments>
		<pubDate>Sat, 12 Nov 2011 21:26:46 +0000</pubDate>
		<dc:creator>xpctechnology</dc:creator>
				<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://robotstxt.ca/?p=6</guid>
		<description><![CDATA[All about Robots.txt! Web indexing robots are used by many search engines such as Google, Inktomi, AltaVista and others. These web indexing robots are also known as spiders. These spiders/robots are the tools used by engines to harvest data for &#8230; <a href="http://www.robotstxt.ca/archives/6">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>All about Robots.txt!</strong><br />
Web indexing robots, also known as spiders, are used by many search engines such as Google, Inktomi, AltaVista and others to harvest data for their indexes. When you submit your website to a search engine, you are effectively asking it to send its web indexing robot to your website so that the site can be crawled and added to its database.</p>
<p><img class="alignnone size-full wp-image-7" title="how a page is spidered by search engines" src="http://robotstxt.ca/wp-content/uploads/2011/11/how-a-page-is-spidered-by-search-engines.gif" alt="" width="759" height="395" /></p>
<p><strong>So why do I need a robots.txt file?</strong><br />
You can control which parts of your site web indexing robots index by placing a simple text file called robots.txt in the root path of the server, with explicit instructions on what the spider is and is not permitted to index on your website.<br />
You can define which paths are off limits to spiders and block them off. This is useful for such things as large directories of information, personal information, and parts of the website containing large amounts of recursive links, among others. It is also possible to include indexing instructions directly in a robots meta tag on a page, and in some cases this is preferable if only one page needs to be controlled. For example, a robots meta tag with the content value "index, follow" tells the robot it is OK to index the page and follow the links it finds there. However, if you have whole directories and multiple pages whose indexing you want to control, a robots.txt file eases the burden of managing this task.</p>
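<p>As a minimal sketch, a robots.txt file that keeps all robots out of two directories might look like the following. The directory names /private/ and /archive/ are hypothetical placeholders, not paths from this site:</p>
<pre>
# Applies to every robot
User-agent: *
# Hypothetical paths; replace them with your own web-viewable paths
Disallow: /private/
Disallow: /archive/
</pre>
<p>Each Disallow line names a path prefix that the robots matched by the User-agent line are asked not to visit. A Disallow line with an empty value leaves the whole site open to them.</p>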
<p><strong>How accurate does my robots.txt file have to be?</strong><br />
The paths you list must be the correct web-viewable paths of the files or directories on the server.<br />
Example: many servers use htdocs as the web root, but the FTP root will be different. Your robots.txt entries should not include the htdocs directory in front of the file or directory, because the htdocs folder itself is not viewable on the web. The files inside htdocs are what need to be listed if you wish to control how spiders index them.</p>
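<p>To illustrate with assumed names: suppose a directory is stored on the server as htdocs/reports/ (a hypothetical path) and visitors reach it at /reports/ on your site. A sketch of the matching entry uses the path as it appears in the URL:</p>
<pre>
User-agent: *
# Path as seen in the URL (hypothetical directory name)
Disallow: /reports/
# Not this: the server-side web root is not part of the URL,
# so this line would not match /reports/
# Disallow: /htdocs/reports/
</pre>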
<p><strong>Do I have to have a robots.txt file in order to have search engines index my site?</strong><br />
The short answer is no! A web indexing robot will crawl your site unless told not to. However, let's go a little deeper than that. A well-behaved web indexing robot such as Googlebot will attempt to find your robots.txt file before it indexes your site. Good robots will also look at the meta tags in your pages and check for a robots meta tag.</p>
<p><strong>Where does the robots.txt go?</strong><br />
Your robots.txt file is placed in the root directory. What does that mean? It means it should go at the same directory level as your home page (default.htm etc). You will know you got it right if you can type the following into your browser, <a href="http://www.robotstxt.ca/robots.txt">http://www.robotstxt.ca/robots.txt</a>, and see your robots.txt file come up. Naturally, replace our URL with your URL. If you're still confused, you can use the free testing wizard at <a href="http://www.sitesubmit.ca">www.sitesubmit.ca</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.robotstxt.ca/archives/6/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

