Others robots.txt

narinesa · Jul 15, 2009

Need some help.
I have a separate file for keywords, named robots.txt.
I have this file in the root directory - same as index.html

As in this example:

User-agent:*
Disallow:/
Allow:/robots.txt
/Investigators,
Private Detective,
etc

I try to test my robots.txt and I get a message:
"missing / at start of file or folder name"
I don't know what this means or how to fix it.

Looks like I don't have the correct coding?

ishkey · Jul 15, 2009

If it helps here is mine

# /robots.txt file for http://cleandeck.net/
# mail webmaster@cleandeck.net for constructive criticism

User-agent: webcrawler
Disallow:

User-agent: lycra
Disallow:

User-agent: *
Disallow: /tmp
Disallow: /logs

Just remember that most of the bad spiders ignore robots.txt. The email address is bogus; it's used more as bait. The baddies pick up the address and stop crawling my site.
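
For anyone who wants to check how a well-behaved crawler would read the rules above, here is a minimal Python sketch using the standard library's urllib.robotparser. The helper name allowed and the agent name "SomeBot" are made up for illustration, and real crawlers may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above, copied as-is.
RULES = """\
User-agent: webcrawler
Disallow:

User-agent: lycra
Disallow:

User-agent: *
Disallow: /tmp
Disallow: /logs
"""


def allowed(agent, path, rules=RULES):
    """Return True if `agent` may fetch `path` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # An empty "Disallow:" means nothing is off limits for that agent.
    print(allowed("webcrawler", "/tmp/file.txt"))  # True
    # Every other agent falls under the "*" group.
    print(allowed("SomeBot", "/tmp/file.txt"))     # False
    print(allowed("SomeBot", "/index.html"))       # True
```

This only shows what a polite robot would do. Nothing here stops a crawler that chooses to ignore the file.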

narinesa · Jul 15, 2009

Ok I understand your piece:
User-agent: *
Disallow: /tmp
Disallow: /logs
Now I want to add keywords here to be indexed from robots.txt.

Is it possible to put keywords in robots.txt, or do I have the entire concept wrong?

(Don't want the keywords in index.html)

ishkey · Jul 15, 2009

robots.txt is not for keywords; your HTML and PHP files hold them.
I don't know why you don't want them there,
so don't put any in robots.txt.
robots.txt was used in the old days to tell spiders what to index and what to leave alone. The nice ones still respect this file.
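
The "missing / at start of file or folder name" message most likely comes from the keyword lines, since every line in robots.txt is expected to look like "Field: value". Which tester produced the message is not stated in this thread, so the checks in this rough Python sketch are my own assumption, and lint_robots and KNOWN_FIELDS are hypothetical names.

```python
# Field names a simple checker might accept (an assumption, not a full list).
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

# The file from the first post, copied as-is.
POSTED_FILE = """\
User-agent:*
Disallow:/
Allow:/robots.txt
/Investigators,
Private Detective,
etc
"""


def lint_robots(text):
    """Return (line number, line, problem) for each suspicious line."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        field, sep, value = line.partition(":")
        field = field.strip().lower()
        value = value.strip()
        if not sep or field not in KNOWN_FIELDS:
            problems.append((number, raw.strip(), "not a 'Field: value' line"))
        elif field in {"allow", "disallow"} and value and not value.startswith("/"):
            problems.append((number, raw.strip(), "path should start with /"))
    return problems


if __name__ == "__main__":
    for number, line, problem in lint_robots(POSTED_FILE):
        print(f"line {number}: {line!r}: {problem}")
```

Run against the posted file, this flags the three keyword lines (4, 5 and 6) and accepts the first three.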
If you want real protection, use the .htaccess file.
Something like this:

Options -Indexes

# let everyone see the 403 error page, even blocked visitors
<Files 403.shtml>
order allow,deny
allow from all
</Files>

ErrorDocument 404 /oops.html

#get rid of bad bots
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^BadBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EvilScraper [OR]
RewriteCond %{HTTP_USER_AGENT} ^FakeUser
RewriteRule ^(.*)$ http://go.away/
deny from 217.199.217.3
deny from 98.131.11.144
deny from 63.251.179.32
deny from 193.46.236.151
deny from 195.251.117.228
deny from 190.72.184.105
deny from 89.149.241.126
deny from 85.140.206.177
deny from 195.251.117.0/24
deny from 85.140.0.0/16
deny from 89.15.191.25
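
To see what the user-agent patterns and the deny list above actually match, here is a small Python sketch using the standard re and ipaddress modules. It only imitates the matching and is not how Apache is implemented; is_bad_agent, is_denied and covered_by_range are hypothetical helper names.

```python
import ipaddress
import re

# Same patterns as the RewriteCond lines: anchored at the start, case-sensitive.
BAD_AGENTS = re.compile(r"^(BadBot|EvilScraper|FakeUser)")

# Same entries as the "deny from" lines, in the same order.
DENY = [
    "217.199.217.3", "98.131.11.144", "63.251.179.32", "193.46.236.151",
    "195.251.117.228", "190.72.184.105", "89.149.241.126", "85.140.206.177",
    "195.251.117.0/24", "85.140.0.0/16", "89.15.191.25",
]
NETWORKS = [ipaddress.ip_network(entry) for entry in DENY]


def is_bad_agent(user_agent):
    """True if the User-Agent string starts with one of the listed names."""
    return BAD_AGENTS.match(user_agent) is not None


def is_denied(address):
    """True if the address falls inside any listed address or range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in NETWORKS)


def covered_by_range():
    """Single addresses that a wider range in the list already blocks."""
    ranges = [n for n in NETWORKS if n.prefixlen < 32]
    return [
        str(n.network_address)
        for n in NETWORKS
        if n.prefixlen == 32 and any(n.subnet_of(r) for r in ranges)
    ]


if __name__ == "__main__":
    print(is_bad_agent("BadBot/1.0"))  # True
    print(is_bad_agent("badbot/1.0"))  # False, the match is case-sensitive
    print(is_denied("85.140.1.1"))     # True, inside 85.140.0.0/16
    print(covered_by_range())          # ['195.251.117.228', '85.140.206.177']
```

The last call shows that two of the single addresses are already covered by the /24 and /16 ranges, so those lines are harmless but redundant.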

narinesa · Jul 15, 2009

I was looking to maintain a separate robots.txt file to do 2 things:
1. to disallow all directories
2. to use the keywords.

I don't see that this is possible.

So I have to use robots.txt to disallow all directories and
put all keywords in meta tags in my index.html.

Also, is this the correct statement for my index.html?
<meta name="robots" contents="noindex,nofollow">

Sorry, I am a newbie - just started this last month - a lot to learn.

ishkey · Jul 15, 2009

Yep, you are right (though the attribute is content, not contents), but all you are shutting out are the good ones like Google, Yahoo and MS, just to name a few. The baddies do not give a S#%* about your meta tags or robots.txt file; they will find the weak spot.

I don't understand your reasons for noindex, or for not wanting to put keywords where they belong. If your content is written well enough, you may not need keywords. Some say they are on the way out; I say about halfway out.

narinesa · Jul 15, 2009

My understanding of "noindex,nofollow" is that the crawler will not index my index.html page, and will not follow its links and pick up junk.
Is my statement wrong?

Also, by providing meta keywords in my index.html, crawlers will index the keywords. At least this is how I now have it set up.

Correction would be appreciated.

ishkey · Jul 16, 2009

The NOFOLLOW directive only applies to links on the page where it is written. It's entirely likely that a robot will find the same links on some other page without a NOFOLLOW (perhaps on some other site), and so still arrive at your undesired page.

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
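
As a rough check of the tag syntax, this Python sketch uses the standard html.parser module to pull the robots directives out of a page. It is a simplified reader, not any search engine's actual logic, and RobotsMetaParser and robots_directives are made-up names. It reads the content attribute, so a tag spelled with contents yields nothing.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        content = attributes.get("content", "")
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    good = '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
    typo = '<meta name="robots" contents="noindex,nofollow">'
    print(sorted(robots_directives(good)))  # ['nofollow', 'noindex']
    print(sorted(robots_directives(typo)))  # []
```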

"Also, by providing meta keywords in my index.html, crawlers will index the keywords. At least this is how I now have it set up."

you got it

narinesa · Jul 16, 2009

Thanks for your assistance. I am online.

www.ForYourEyesOnlyAgency.com

ishkey · Aug 13, 2009

No, it is not necessary to use this file, and there will not be any problems without it. But if you do use it, you are right: it has to be in the root directory.
