robots.txt for path-info [message #26997]
Wed, 24 August 2005 12:59
!alex
Messages: 23 Registered: February 2004 Location: germany
Here's a suggested robots.txt for the path-info theme that keeps Google & co. from indexing tons of pages with no real content:
User-agent: *
Disallow: /forum/index.php/r/reply_to/
Disallow: /forum/index.php/u/
Disallow: /forum/index.php/r/quote/
Disallow: /forum/index.php/mv/msg/
Disallow: /forum/index.php/m/
Disallow: /forum/index.php?
Disallow: /forum/index.php/sel/
Disallow: /forum/index.php/r/
Disallow: /forum/index.php/i/
Disallow: /forum/index.php/mn/tree/
Disallow: /forum/index.php/mv/tree/
Disallow: /forum/pdf.php
Disallow: /forum/index.php/pv/
Disallow: /forum/index.php/sp/
Disallow: /forum/index.php/rm/
Disallow: /forum/index.php/a/
I also use the same list in my htdig config file, like this:
exclude_urls: /cgi-bin/ .cgi /forum/index.php/r/reply_to/ /forum/index.php/u/ /forum/index.php/r/quote/ /forum/index.php/mv/msg/ /forum/index.php/m/ /forum/index.php? /forum/index.php/sel/ /forum/index.php/r/ /forum/index.php/i/ /forum/index.php/mn/tree/ /forum/index.php/mv/tree/ /forum/pdf.php /forum/index.php/pv/ /forum/index.php/sp/ /forum/index.php/rm/ /forum/index.php/a/
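Since the htdig list is basically the robots.txt list plus a couple of extras, the two can be kept in sync with a small script. This is only a rough sketch: the function names are made up for illustration, it ignores User-agent grouping and simply collects every Disallow path, and the extra patterns default to the /cgi-bin/ and .cgi entries from the line above.

```python
def disallowed_paths(robots_txt):
    """Collect every non-empty Disallow path from robots.txt text.

    Simplification (assumption): User-agent groups are ignored, so
    rules for all agents end up in one list.
    """
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def htdig_exclude_line(robots_txt, extra=("/cgi-bin/", ".cgi")):
    """Build an htdig exclude_urls line from the Disallow paths.

    'extra' holds patterns that exist only on the htdig side.
    Duplicates are dropped while keeping the original order.
    """
    patterns = []
    for pattern in list(extra) + disallowed_paths(robots_txt):
        if pattern not in patterns:
            patterns.append(pattern)
    return "exclude_urls: " + " ".join(patterns)


if __name__ == "__main__":
    sample = """\
User-agent: *
Disallow: /forum/index.php/u/
Disallow: /forum/pdf.php
"""
    print(htdig_exclude_line(sample))
```

Point it at the real robots.txt contents instead of the short sample and paste the result into the htdig config.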
It would be great to hear if I missed something, or if I should remove some of these directories from the disallow/exclusion list.
With htdig this works pretty well at preventing duplicate listings. For Google, I updated my robots.txt today because a big crawl was putting quite a load on my server ...
Any experiences or suggestions are welcome.
Peace,
Alex
[Updated on: Wed, 24 August 2005 13:00]
Re: robots.txt for path-info [message #31535 is a reply to message #26997]
Thu, 04 May 2006 14:27
matthieu_phpmv
Messages: 44 Registered: November 2004
Here's mine, from phpmyvisites.net/robots.txt:
User-agent: *
Disallow: /forums/pdf.php
Disallow: /forums/index.php/m
Disallow: /forums/index.php/sp/
Disallow: /forums/index.php/ef/
Disallow: /forums/index.php/mv/
Disallow: /forums/index.php/r/
Disallow: /forums/index.php/pmm/
Disallow: /forums/index.php/rm/
Disallow: /forums/index.php/sel/
Disallow: /forums/index.php/pv/
Disallow: /forums/index.php/ma/
Disallow: /forums/index.php/u/
Disallow: /forums/index.php/s/
Disallow: /forums/index.php/h/
Disallow: /forums/index.php/i/
Disallow: /forums/index.php/l/
Disallow: /forums/rdf.php
With this file, bots will ONLY spider the messages, which is what I want (no profiles, no PDF versions of messages, etc., only real, unique messages).
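To see what a rule set like this actually blocks, it can be run through Python's standard urllib.robotparser. A minimal sketch, assuming a shortened copy of the rules above; the helper name and the sample URLs (including the /t/ path) are made up for illustration and are not taken from the real site. Keep in mind that Disallow rules are prefix matches, so the /forums/index.php/m rule, which has no trailing slash, also covers paths such as /mv/ and /ma/.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules, for illustration only.
RULES = """\
User-agent: *
Disallow: /forums/pdf.php
Disallow: /forums/index.php/m
Disallow: /forums/index.php/u/
"""


def blocked(path, rules=RULES, agent="ExampleBot"):
    """Return True if 'agent' may NOT fetch 'path' under 'rules'.

    'ExampleBot' is a made-up agent name; it falls under the
    'User-agent: *' group.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical URLs, chosen only to exercise the rules.
    for path in (
        "/forums/index.php/m/31535/",
        "/forums/index.php/mv/msg/1/",
        "/forums/index.php/u/5/",
        "/forums/index.php/t/123/",
    ):
        print(path, "blocked" if blocked(path) else "allowed")
```

Running a handful of real URLs from the forum through a check like this is a cheap way to confirm that the pages meant to be indexed are still reachable.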
[Updated on: Fri, 05 May 2006 12:09]