{"id":700,"date":"2012-05-22T15:05:38","date_gmt":"2012-05-22T15:05:38","guid":{"rendered":"http:\/\/www.withinweb.com\/info\/?p=700"},"modified":"2012-05-22T15:09:38","modified_gmt":"2012-05-22T15:09:38","slug":"the-robots-txt-file","status":"publish","type":"post","link":"https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/","title":{"rendered":"The robots.txt file"},"content":{"rendered":"<p>When it comes to SEO, most people understand that a Web site must have content, \u201csearch engine friendly\u201d site architecture\/HTML, and meta data such as title tags, image alt attributes and so on.<\/p>\n<p>However, some Web sites disregard the robots.txt file entirely. When optimizing a Web site, don\u2019t disregard the power of this little text file.<\/p>\n<p><strong>What is a Robots.txt File?<\/strong><\/p>\n<p>Simply put, if you go to www.domain.com\/robots.txt, you should see a list of directories of the Web site that the site owner is asking the search engines to \u201cskip\u201d (or \u201cdisallow\u201d). However, if you\u2019re not careful when editing a robots.txt file, you could put information in it that could really hurt your business.<\/p>\n<p>There is plenty of information about the robots.txt file available at the Web Robots Pages, including the proper usage of the disallow feature, and blocking \u201cbad bots\u201d from indexing your Web site.<\/p>\n<p>The general rule of thumb is to make sure a robots.txt file exists at the root of your domain (e.g., www.domain.com\/robots.txt). 
To exclude all robots from crawling part of your Web site, your robots.txt file would look something like this:<\/p>\n<p>User-agent: *<br \/>\nDisallow: \/cgi-bin\/<br \/>\nDisallow: \/tmp\/<br \/>\nDisallow: \/junk\/<\/p>\n<p>The above syntax tells all robots not to crawl the \/cgi-bin\/, the \/tmp\/, and the \/junk\/ directories on your Web site.<\/p>\n<p>There are situations where a mistake in the robots.txt file can cause problems with your site optimisation.\u00a0 For instance, if you include a \u201cDisallow: \/\u201d line under \u201cUser-agent: *\u201d in your robots.txt file, it tells the search engines not to crawl any part of the Web site, giving you no web presence, which is not what you want.<\/p>\n<p>Another point to watch out for is modifying your robots.txt file to disallow old legacy pages and directories. You should really use a 301 permanent redirect instead, to pass the value from the old Web pages to the new Web pages.<\/p>\n<p><strong>Robots.txt Dos and Don\u2019ts<\/strong><\/p>\n<p>There are many good reasons to stop the search engines from indexing certain directories on a Web site and allowing others for SEO purposes.<\/p>\n<p>Here\u2019s what you should do with robots.txt:<\/p>\n<p>* Take a look at all of the directories in your Web site. Most likely, there are directories that you\u2019d want to disallow the search engines from indexing, including directories like \/cgi-bin\/,\u00a0 \/wp-admin\/,\u00a0 \/cart\/,\u00a0 \/scripts\/,\u00a0 and others that might include sensitive data.<br \/>\n* Stop the search engines from indexing certain directories of your site that might include duplicate content. For example, some Web sites have \u201cprint versions\u201d of Web pages and articles that allow visitors to print them easily. 
You should only allow the search engines to index one version of your content.<br \/>\n* Make sure that nothing stops the search engines from indexing the main content of your Web site.<br \/>\n* Look for certain files on your site that you might want to disallow the search engines from indexing, such as certain scripts, or files that might contain e-mail addresses, phone numbers, or other sensitive data.<\/p>\n<p><strong>Here\u2019s what you should not do with robots.txt:<\/strong><\/p>\n<p>* Don\u2019t use comments in your robots.txt file.<br \/>\n* Don\u2019t list all your files in the robots.txt file. Listing the files allows people to find files that you don\u2019t want them to find.<br \/>\n* The original robots.txt standard has no \u201cAllow\u201d command, and anything that is not disallowed is allowed by default, so there\u2019s no need to add one just to permit crawling.<\/p>\n<p>By taking a good look at your Web site\u2019s robots.txt file and making sure that the syntax is set up correctly, you\u2019ll avoid search engine ranking problems.\u00a0 By stopping the search engines from indexing duplicate content on your Web site, you can potentially overcome duplicate content issues that might hurt your search engine rankings.<\/p>\n<p><strong>Test a robots.txt file<\/strong><\/p>\n<p>Google provides a facility as part of their Webmaster Tools system that lets you test a robots.txt file.<\/p>\n<p>Test a site&#8217;s robots.txt file:<\/p>\n<p>On the Webmaster Tools Home page, click the site you want.<br \/>\nUnder Health, click Blocked URLs.<br \/>\nIf it&#8217;s not already selected, click the Test robots.txt tab.<br \/>\nCopy the content of your robots.txt file, and paste it into the first box.<br \/>\nIn the URLs box, list the site to test against.<br \/>\nIn the User-agents list, select the user-agents you want.<\/p>\n<p>Any changes you make in this tool will not be saved. 
To save any changes, you&#8217;ll need to copy the contents and paste them into your robots.txt file.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When it comes to SEO, most people understand that a Web site must have content, \u201csearch engine friendly\u201d site architecture\/HTML, and meta data such as title tags, graphic alt tag tags and so on. However, some web sites totally disregarded<span class=\"ellipsis\">&hellip;<\/span><\/p>\n<div class=\"read-more\"><a href=\"https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/\">Read more <span class=\"screen-reader-text\">The robots.txt file<\/span><span class=\"meta-nav\"> &#8250;<\/span><\/a><\/div>\n<p><!-- end of .read-more --><\/p>\n","protected":false},"author":40,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-700","post","type-post","status-publish","format-standard","hentry","category-soe"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>The robots.txt file - PHP Web Applications<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"The robots.txt file - PHP Web Applications\" \/>\n<meta property=\"og:description\" content=\"When it comes to SEO, most people understand that a Web site must have content, \u201csearch engine friendly\u201d site architecture\/HTML, and meta data such as title tags, graphic alt tag tags and so on. 
However, some web sites totally disregarded&hellip;Read more The robots.txt file &#8250;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/\" \/>\n<meta property=\"og:site_name\" content=\"PHP Web Applications\" \/>\n<meta property=\"article:published_time\" content=\"2012-05-22T15:05:38+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2012-05-22T15:09:38+00:00\" \/>\n<meta name=\"author\" content=\"paulv\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"paulv\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/the-robots-txt-file\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/the-robots-txt-file\\\/\"},\"author\":{\"name\":\"paulv\",\"@id\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/#\\\/schema\\\/person\\\/04da5531c302d55ffcd777fe81dbb93c\"},\"headline\":\"The robots.txt file\",\"datePublished\":\"2012-05-22T15:05:38+00:00\",\"dateModified\":\"2012-05-22T15:09:38+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/the-robots-txt-file\\\/\"},\"wordCount\":772,\"commentCount\":0,\"articleSection\":[\"SOE\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.withinweb.com\\\/info\\\/the-robots-txt-file\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/the-robots-txt-file\\\/\",\"url\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/the-robots-txt-file\\\/\",\"name\":\"The robots.txt file - PHP Web 
Applications\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/#website\"},\"datePublished\":\"2012-05-22T15:05:38+00:00\",\"dateModified\":\"2012-05-22T15:09:38+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/#\\\/schema\\\/person\\\/04da5531c302d55ffcd777fe81dbb93c\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/the-robots-txt-file\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.withinweb.com\\\/info\\\/the-robots-txt-file\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/the-robots-txt-file\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"The robots.txt file\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/#website\",\"url\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/\",\"name\":\"PHP Web Applications\",\"description\":\"Information and support for products of WithinWeb.com\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/#\\\/schema\\\/person\\\/04da5531c302d55ffcd777fe81dbb93c\",\"name\":\"paulv\",\"url\":\"https:\\\/\\\/www.withinweb.com\\\/info\\\/author\\\/paulv\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"The robots.txt file - PHP Web Applications","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/","og_locale":"en_US","og_type":"article","og_title":"The robots.txt file - PHP Web Applications","og_description":"When it comes to SEO, most people understand that a Web site must have content, \u201csearch engine friendly\u201d site architecture\/HTML, and meta data such as title tags, graphic alt tag tags and so on. However, some web sites totally disregarded&hellip;Read more The robots.txt file &#8250;","og_url":"https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/","og_site_name":"PHP Web Applications","article_published_time":"2012-05-22T15:05:38+00:00","article_modified_time":"2012-05-22T15:09:38+00:00","author":"paulv","twitter_card":"summary_large_image","twitter_misc":{"Written by":"paulv","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/#article","isPartOf":{"@id":"https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/"},"author":{"name":"paulv","@id":"https:\/\/www.withinweb.com\/info\/#\/schema\/person\/04da5531c302d55ffcd777fe81dbb93c"},"headline":"The robots.txt file","datePublished":"2012-05-22T15:05:38+00:00","dateModified":"2012-05-22T15:09:38+00:00","mainEntityOfPage":{"@id":"https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/"},"wordCount":772,"commentCount":0,"articleSection":["SOE"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/","url":"https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/","name":"The robots.txt file - PHP Web Applications","isPartOf":{"@id":"https:\/\/www.withinweb.com\/info\/#website"},"datePublished":"2012-05-22T15:05:38+00:00","dateModified":"2012-05-22T15:09:38+00:00","author":{"@id":"https:\/\/www.withinweb.com\/info\/#\/schema\/person\/04da5531c302d55ffcd777fe81dbb93c"},"breadcrumb":{"@id":"https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.withinweb.com\/info\/the-robots-txt-file\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.withinweb.com\/info\/"},{"@type":"ListItem","position":2,"name":"The robots.txt file"}]},{"@type":"WebSite","@id":"https:\/\/www.withinweb.com\/info\/#website","url":"https:\/\/www.withinweb.com\/info\/","name":"PHP Web Applications","description":"Information and support for products of 
WithinWeb.com","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.withinweb.com\/info\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.withinweb.com\/info\/#\/schema\/person\/04da5531c302d55ffcd777fe81dbb93c","name":"paulv","url":"https:\/\/www.withinweb.com\/info\/author\/paulv\/"}]}},"_links":{"self":[{"href":"https:\/\/www.withinweb.com\/info\/wp-json\/wp\/v2\/posts\/700","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.withinweb.com\/info\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.withinweb.com\/info\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.withinweb.com\/info\/wp-json\/wp\/v2\/users\/40"}],"replies":[{"embeddable":true,"href":"https:\/\/www.withinweb.com\/info\/wp-json\/wp\/v2\/comments?post=700"}],"version-history":[{"count":7,"href":"https:\/\/www.withinweb.com\/info\/wp-json\/wp\/v2\/posts\/700\/revisions"}],"predecessor-version":[{"id":711,"href":"https:\/\/www.withinweb.com\/info\/wp-json\/wp\/v2\/posts\/700\/revisions\/711"}],"wp:attachment":[{"href":"https:\/\/www.withinweb.com\/info\/wp-json\/wp\/v2\/media?parent=700"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.withinweb.com\/info\/wp-json\/wp\/v2\/categories?post=700"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.withinweb.com\/info\/wp-json\/wp\/v2\/tags?post=700"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}