Others robots.txt

Discussion in 'Web Development' started by narinesa, Jul 15, 2009.

  1. narinesa

    narinesa New Member

    Need some help.
    I have a separate file for keywords, named robots.txt.
    I have this file in the root directory - same as index.html

    As in this example:

    User-agent:*
    Disallow:/
    Allow:/robots.txt
    /Investigators,
    Private Detective,
    etc

    I try to test my robots.txt and I get a message:
    "missing / at start of file or folder name"
    I don't know what this means or how to fix it.

    Looks like I don't have the correct coding ?
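
    The message most likely points at the keyword lines: a value after Allow or Disallow has to be a path beginning with /, and free text such as "Private Detective," is not a directive at all. Below is a rough Python sketch of that kind of check. The names check_robots, KNOWN_FIELDS and PATH_FIELDS are made up for illustration, the list of accepted fields is an assumption, and this is not the tester used above.

```python
# Minimal robots.txt line checker (illustrative only; not the tester from the post).
# KNOWN_FIELDS is an assumed list of commonly accepted field names.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}
PATH_FIELDS = {"disallow", "allow"}


def check_robots(text):
    """Return a list of (line_number, message) for lines that look invalid."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        field, sep, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not sep or field not in KNOWN_FIELDS:
            problems.append((number, "not a 'Field: value' directive"))
        elif field in PATH_FIELDS and value and not value.startswith(("/", "*")):
            problems.append((number, "missing / at start of file or folder name"))
    return problems


# The file from the post: the first three lines parse, the keyword lines do not.
SAMPLE = """User-agent:*
Disallow:/
Allow:/robots.txt
/Investigators,
Private Detective,
etc
"""

for number, message in check_robots(SAMPLE):
    print(number, message)
```

    Only the three keyword lines are reported, which suggests the directives themselves are fine and the keywords are what the tester is objecting to.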
     
  2. ishkey

    ishkey Moderator, Logos, Sports Crests Staff Member Verified Member

    If it helps here is mine

    # /robots.txt file for http://cleandeck.net/
    # mail webmaster@cleandeck.net for constructive criticism

    User-agent: webcrawler
    Disallow:

    User-agent: lycra
    Disallow:

    User-agent: *
    Disallow: /tmp
    Disallow: /logs

    Just remember that most of the bad spider crawlers ignore robots.txt. The email address is bogus; it's used more as bait. The baddies get the email address they came for and stop crawling my site.
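
    To see how a well-behaved crawler reads the rules above, here is a small sketch using Python's standard urllib.robotparser. The helper name allowed, the agent name SomeOtherBot and the test paths are invented for the example; it only models crawlers that choose to obey the file.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post (comment lines left out).
RULES = """\
User-agent: webcrawler
Disallow:

User-agent: lycra
Disallow:

User-agent: *
Disallow: /tmp
Disallow: /logs
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent, path):
    """True if a crawler that honours robots.txt may fetch the path."""
    return parser.can_fetch(agent, path)


# An empty "Disallow:" lets the named crawler fetch everything.
print(allowed("webcrawler", "/tmp/cache.html"))
# Every other crawler falls under "*" and is kept out of /tmp and /logs.
print(allowed("SomeOtherBot", "/tmp/cache.html"))
print(allowed("SomeOtherBot", "/index.html"))
```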
     
  3. narinesa

    narinesa New Member

    OK, I understand your piece:

    User-agent: *
    Disallow: /tmp
    Disallow: /logs

    Now I want to add keywords here to be indexed from robots.txt.

    Is it possible to put keywords in robots.txt, or do I have the entire concept wrong?

    (Don't want the keywords in index.html)
     
  4. ishkey

    ishkey Moderator, Logos, Sports Crests Staff Member Verified Member

    robots.txt is not for keywords; your HTML and PHP files hold them.
    Don't know why you don't want them there???
    So don't put any in robots.txt.
    robots.txt was used in the old days to tell the spider what to index and what to leave alone. The nice ones still respect this file.
    If you want real protection, use the .htaccess file.
    Something like this:

    Options -Indexes

    <Files 403.shtml>
    order allow,deny
    allow from all
    </Files>

    ErrorDocument 404 /oops.html

    #get rid of bad bots
    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} ^BadBot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EvilScraper [OR]
    RewriteCond %{HTTP_USER_AGENT} ^FakeUser
    RewriteRule ^(.*)$ http://go.away/

    #block these addresses and ranges
    <Limit GET POST PUT>
    order deny,allow
    deny from 217.199.217.3
    deny from 98.131.11.144
    deny from 63.251.179.32
    deny from 193.46.236.151
    deny from 195.251.117.228
    deny from 190.72.184.105
    deny from 89.149.241.126
    deny from 85.140.206.177
    deny from 195.251.117.0/24
    deny from 85.140.0.0/16
    deny from 89.15.191.25
    </Limit>
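
    For anyone unsure what the RewriteCond patterns and the "deny from" entries actually match, here is a rough Python sketch of the same two tests. It is not how Apache implements them; the function name is_blocked is invented, and only a few of the addresses are copied over.

```python
import ipaddress
import re

# Same prefixes as the RewriteCond lines: "^" anchors the match to the
# start of the User-Agent string. The bot names are placeholders.
BAD_AGENT = re.compile(r"^(BadBot|EvilScraper|FakeUser)")

# A few of the "deny from" entries; single addresses and CIDR ranges alike.
DENIED = [ipaddress.ip_network(entry) for entry in (
    "217.199.217.3",
    "195.251.117.0/24",
    "85.140.0.0/16",
)]


def is_blocked(user_agent, remote_ip):
    """True if the request would be turned away by either test."""
    if BAD_AGENT.match(user_agent):
        return True
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in DENIED)


print(is_blocked("BadBot/2.0", "8.8.8.8"))
print(is_blocked("Mozilla/5.0", "85.140.206.177"))
```

    The second call also shows that 85.140.206.177 is already covered by 85.140.0.0/16 (and 195.251.117.228 by 195.251.117.0/24), so listing those single addresses separately is redundant but harmless. Note too that a user agent only matches if it starts with the bad name; a bot that sends a browser-like string slips past this test.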
     
  5. narinesa

    narinesa New Member

    I was looking to maintain a separate robots.txt file to do 2 things:
    1. to disallow all directories
    2. to use the keywords.

    I don't see that this is possible.

    So I have to use robots.txt to disallow all directories and
    put all keywords in meta tags in my index.html.

    Also, is this the correct statement for my index.html?
    <meta name="robots" content="noindex,nofollow">

    Sorry, I am a newbie. I just started this last month and have a lot to learn.
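
    One detail worth checking in a robots meta tag is the attribute name: the directives are read from the content attribute (singular), so a tag spelled with contents carries no directives. A minimal Python sketch with the standard html.parser shows the difference; the class RobotsMeta and the function robots_directives are invented for this example and only approximate what a real crawler does.

```python
from html.parser import HTMLParser


class RobotsMeta(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            # Only the "content" attribute is consulted.
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives found in the given HTML."""
    parser = RobotsMeta()
    parser.feed(html)
    return parser.directives


print(robots_directives('<meta name="robots" content="noindex,nofollow">'))
print(robots_directives('<meta name="robots" contents="noindex,nofollow">'))
```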
     
  6. ishkey

    ishkey Moderator, Logos, Sports Crests Staff Member Verified Member

    Yep, you are right, but all you are shutting out are the good ones like Google, Yahoo, and MS, just to name a few. The baddies do not give a S#%* about your meta tags or robots.txt file; they will find the weak spot.

    Don't understand your reasons for noindex or for not wanting to put keywords where they belong. If your content is written well enough, you may not need keywords. Some say they are on the way out; I say about halfway out.
     
  7. narinesa

    narinesa New Member

    My understanding of "noindex,nofollow" is that the crawler will not index my index.html page, and will not follow its links and pick up junk.
    Is my statement wrong?

    Also, by providing meta keywords in my index.html, crawlers will index the keywords. At least this is how I now have it set up.

    Correction would be appreciated.
     
  8. ishkey

    ishkey Moderator, Logos, Sports Crests Staff Member Verified Member

    The NOFOLLOW directive only applies to links on the page where it is written. It's entirely likely that a robot will find the same links on some other page without a NOFOLLOW (perhaps on some other site), and so still arrive at your undesired page.
    You got it.
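
    The point about NOFOLLOW can be pictured with a toy crawl in Python. The page names, the PAGES table and the function reachable are all hypothetical; the sketch only shows that a page hidden behind one nofollow page is still found through any other page that links to it.

```python
from collections import deque

# Hypothetical site: page -> (links on that page, page carries nofollow?)
PAGES = {
    "/index.html": (["/junk.html"], True),   # nofollow: its links are skipped
    "/about.html": (["/junk.html"], False),  # no nofollow: its links are followed
    "/junk.html": ([], False),
}


def reachable(start_pages):
    """Pages a polite crawler ends up visiting from the given entry points."""
    seen = set(start_pages)
    queue = deque(start_pages)
    while queue:
        page = queue.popleft()
        links, nofollow = PAGES[page]
        if nofollow:
            continue  # respect nofollow on this page only
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


print(sorted(reachable(["/index.html"])))
print(sorted(reachable(["/index.html", "/about.html"])))
```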
     
  9. narinesa

    narinesa New Member

  10. ishkey

    ishkey Moderator, Logos, Sports Crests Staff Member Verified Member

    No, it is not necessary to use this file, and there will not be any problems without it. But if you do use it, you are right: it has to be in the root directory.