robots.txt, nofollow, noindex, and Search Engine Behavior

Published: 06th September 2010
Search Engine Behavior



Take a quick look at the second page of search results for this site, about a week after Google began indexing it. Some results carry far more information than others!

[Screenshot: Google search results example]

Clearly, there is something different about the top two results compared with the rest. The difference is that Googlebot never visited the bottom eight links. Google knows those URLs exist, but it never crawled them because they are disallowed by the robots.txt file. So Google officially doesn't know what is on those pages, and it displays only the bare link, which it found on a separate page.
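As a rough illustration of the crawl decision described above, here is a minimal sketch using Python's standard urllib.robotparser module. The three rules and the example.com URLs are a hypothetical excerpt, not the full file listed at the end of this article, and as far as I know the standard-library parser only does plain prefix matching, so it does not model Google's `*` and `$` wildcard extensions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt in the style of the robots.txt shown later on.
ROBOTS_TXT = """\
User-agent: *
Disallow: /user/
Disallow: /search/
Disallow: /admin/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/about", "http://example.com/user/login"):
        print(url, is_crawlable(ROBOTS_TXT, "Googlebot", url))
```

A well-behaved crawler would skip any URL for which this check returns False, which is why such a page can end up listed as a bare link only.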

nofollow



Much the same behavior would be expected if the links to those pages had been given the rel="nofollow" attribute: Googlebot would not have officially visited them, and although the result could still appear in the search engine results page (SERP), it would be without any description or title.

At the bottom of this page, by the way, is the full text of my Drupal 6 robots.txt file. Remember that one of my main goals is to eliminate duplicate content. If you are serious about keeping even the links to certain pages out of Google entirely, unfortunately you can't achieve that with nofollow or robots.txt alone. You need noindex.
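To check which links on a page actually carry the attribute, a small sketch with Python's standard html.parser module can help. The class name, the function name, and the sample markup are all hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, is_nofollow) pairs for every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def classify_links(html: str):
    """Return a list of (href, is_nofollow) tuples found in html."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical sample markup: one followed link, one nofollowed link.
SAMPLE_HTML = (
    '<p><a href="/about">About</a> '
    '<a rel="nofollow" href="/user/login">Log in</a></p>'
)

if __name__ == "__main__":
    for href, is_nofollow in classify_links(SAMPLE_HTML):
        print(href, "nofollow" if is_nofollow else "followed")
```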


noindex



Ironically, to keep your pages out of Google, you need to allow Google to read them: don't block them in your robots.txt file, and leave the links to them followed (without the rel="nofollow" attribute). Then add the "noindex" robots meta tag to each page, for example <meta name="robots" content="noindex">. The result is that the page does not appear in the SERPs.
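A quick way to verify that a page really serves the tag is to parse its HTML and look for a robots meta element. The following is a minimal sketch using Python's standard html.parser; the class and function names are hypothetical, and it only inspects the meta tag, not any HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Gather the directives listed in <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, follow".
        for token in (attr.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noindex(html: str) -> bool:
    """Return True if the page's robots meta tag includes noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(page))
```

Note that a crawler can only see this tag if it is allowed to fetch the page, which is the point made above about not blocking the page in robots.txt.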

PageRank



The various search engines appear to have their own algorithms for determining the value of pages on the internet, and most make this rank depend in some way on quality links. So how do these issues affect PageRank? From my research, I believe that nofollow links do not affect the receiving page's PageRank, but followed links do, even if the receiving page is disallowed in robots.txt. Hence the links in the screenshot at the beginning of this article do have PageRank even though Google doesn't know what they contain, and that is why Google indexed them. If those links had also been nofollowed, Google probably would not have indexed them, because they would not have had any PageRank.


Drupal, nofollow, robots.txt, and PageRank



I think it would be handy for Drupal to have an option that automatically nofollows links to any pages disallowed in the robots.txt file. So far, the nofollow options for Drupal (nofollowlist, or nofollowing based on input format or user class) do not behave this way. This behavior would be desirable because it would help Googlebot (and presumably the other bots) focus their attention (whether it's called PageRank or not) on the areas of one's site that one wants public: the areas not in the robots.txt file!
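To make the idea concrete, here is a rough sketch in Python rather than as a Drupal module. It is not an existing Drupal feature: the function name, the regular expression, and the sample rules are hypothetical. It handles only double-quoted href values, leaves links that already have a rel attribute untouched, and relies on the standard-library robots.txt parser, which does plain prefix matching.

```python
import re
from urllib.robotparser import RobotFileParser

# Matches an opening <a ...> tag with a double-quoted href (a simplification).
ANCHOR_RE = re.compile(r'<a\s+([^>]*?)href="([^"]*)"([^>]*)>', re.IGNORECASE)


def nofollow_disallowed_links(html: str, robots_txt: str, user_agent: str = "*") -> str:
    """Add rel="nofollow" to links whose target is disallowed by robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def rewrite(match):
        before, href, after = match.groups()
        tag = match.group(0)
        # Leave crawlable targets alone, and do not touch an existing rel.
        if parser.can_fetch(user_agent, href) or "rel=" in tag.lower():
            return tag
        return f'<a {before}href="{href}"{after} rel="nofollow">'

    return ANCHOR_RE.sub(rewrite, html)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /user/\n"
    page = '<a href="/about">About</a> <a href="/user/login">Log in</a>'
    print(nofollow_disallowed_links(page, rules))
```

In a real site this rewriting would more naturally live in an output filter, so that every rendered link is checked against the same robots.txt rules the crawlers see.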

robots.txt



(My robots.txt file: http://palma-seo.com/robots.txt)



# $Id: robots.txt,v 1.9 2007/06/27 22:37:44 goba Exp $
#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

# Directories
User-agent: *
Disallow: /userlist/content
Disallow: /userlist/content/
Disallow: /s/
Disallow: /*/book/*
Disallow: /*/book*
Disallow: /*/book
Disallow: /*/export*
Disallow: /*/export/*
Disallow: /database/
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /sites/
Disallow: /themes/
Disallow: /scripts/
Disallow: /updates/
Disallow: /profiles/
Disallow: /profile
Disallow: /profile/
# Files
Disallow: /xmlrpc.php
Disallow: /cron.php
Disallow: /update.php
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /CHANGELOG.txt
Disallow: /MAINTAINERS.txt
Disallow: /LICENSE.txt
Disallow: /UPGRADE.txt
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /contact
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/password
Disallow: /user/login/
Disallow: /user/
# Paths (no clean URLs)
Disallow: /?q=es/
Disallow: /?q=es
Disallow: /?q
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
Disallow: /user/
Disallow: /user
Disallow: /admin
Disallow: /admin/
Disallow: /node/add
Disallow: /node/add/
Disallow: /aggregator/
Disallow: /aggregator
Disallow: /comment/
Disallow: /comment
Disallow: /contact
Disallow: /contact/
Disallow: /logout
Disallow: /logout/
Disallow: /search/
Disallow: /search
Disallow: /tribune
Disallow: /tribune/
Disallow: /calendar
Disallow: /calendar/
Disallow: /Calendar
Disallow: /Calendar/
#Disallow: /tracker
Disallow: /tracker/
Disallow: /*/track/
Disallow: /tracker?
Disallow: /*/feed$
Disallow: /*/feed*
Disallow: /*/feed/
Disallow: /blog/
Disallow: /*/track$
Disallow: /*/subscribe
Disallow: /*/subscribe/
Disallow: /*/subscribe*
# Views and Forum module problem:
Disallow: /*sort=
# Image module problem
Disallow: /*size=
#This avoids the creation of a duplicate home-page.
# The URL http://example.com/node is a duplicate of http://example.com/.
Disallow: /node$
Disallow: /print/
Disallow: /es
Disallow: /es/
Disallow: /category
Disallow: /category/
Disallow: /messages
Disallow: /messages/
Disallow: /taxonomy
Disallow: /taxonomy/
Disallow: /taxonomy_vtn
Disallow: /taxonomy_vtn/
Disallow: /aggregator
Disallow: /aggregator/
Disallow: /*/guestbook
Disallow: /node
Disallow: /node/
#This disallows the numerical forum urls (can still access at
# /forums/nicaragua etc).
Disallow: /forum/
Disallow: /image_captcha
Disallow: /image_captcha/
Disallow: /?
Disallow: /?page=*
Disallow: /?page=
Disallow: /?page=1
Disallow: /?page=2
Disallow: /?page=4
Disallow: /?page=3
Disallow: /?page=5
Disallow: /?page=6
Disallow: /?page=7
Disallow: /?page=8
Disallow: /?page=9
Disallow: /?page=10
Disallow: /?page=11
Disallow: /?page=12
Disallow: /?page=13
Disallow: /?page=14
Disallow: /?page=15
Disallow: /?page=16
Disallow: /?page=17
Disallow: /popular
Disallow: /popular/
Disallow: /node/
Disallow: /search
Disallow: /piwik
Disallow: /piwik.php
Disallow: /piwik/
Disallow: /search$
Disallow: /*?page=0,0$
Disallow: /*?page=0,1000$
Disallow: /central-america-latest-blogs
Disallow: /central-america-news
Disallow: /central-america-latest-blogs?page=*
Disallow: /central-america-news?page=*
Allow: /
Allow: /sites/*/files/



Palma SEO by SEO Hilo is your solution provider for website implementation. I specialize in Drupal installation and search engine optimization consultation. Check out the SEO Hawaii website for more useful tips.

This article is free for republishing
Source: http://peterpalma2.articlealley.com/robotstxt-nofollow-noindex-and-search-engine-behavior-1731697.html

