How To Control Search Engine Robots
Wouldn’t it be nice to be able to leave some code in your web site that tells the search engine crawlers to make your site number one? Unfortunately, a robots.txt file or robots meta tag won’t do that, but they can help the crawlers index your site better and block out the unwanted ones.
First, a few definitions:
Search Engine Spiders or Crawlers – A web crawler (also known as a web spider) is a program that browses the World Wide Web in a methodical, automated manner. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine, which indexes the downloaded pages to provide fast searches.
A web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit. As it visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, recursively browsing the Web according to a set of policies.
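The crawl loop described above can be sketched in a few lines of Python. This is only an illustration: the FAKE_WEB dictionary and the fake_get_links helper are made-up stand-ins for downloading real pages and extracting their hyperlinks, and a real crawler would add politeness delays, robots.txt checks and other policies.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
FAKE_WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


def fake_get_links(url):
    """Stand-in for fetching a page and extracting its hyperlinks."""
    return FAKE_WEB.get(url, [])


def crawl(seeds, get_links, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs."""
    to_visit = deque(seeds)   # the list of URLs still to visit
    seen = set(seeds)         # URLs already queued, so nothing is visited twice
    visited = []
    while to_visit and len(visited) < max_pages:
        url = to_visit.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                to_visit.append(link)
    return visited


order = crawl(["http://example.com/"], fake_get_links)
print(order)
```

The max_pages limit is one very simple example of a crawl policy; without some limit, a crawler following links on the real Web would never stop.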
Robots.txt – The robots exclusion standard, or robots.txt protocol, is a convention to prevent well-behaved web spiders and other web robots from accessing all or part of a website. The parts that should not be accessed are listed in a file called robots.txt in the top-level directory of the website.
The robots.txt protocol is purely advisory and relies on the cooperation of the web robot, so marking an area of your site out of bounds with robots.txt does not guarantee privacy. Many web site administrators have been caught out trying to use the robots file to make private parts of a website invisible to the rest of the world. However, the file is necessarily publicly available and is easily checked by anyone with a web browser.
The robots.txt patterns are matched by simple prefix comparison against the start of the URL path, so take care that patterns meant to match a directory end with the final ‘/’ character. Otherwise every file whose path starts with that pattern will match, rather than just the files in the intended directory.
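The trailing-slash pitfall is easy to see in a small Python sketch. The is_disallowed function below is a hypothetical helper written for this illustration; it mimics only the basic prefix rule and ignores the extensions (such as wildcards) that some crawlers support.

```python
def is_disallowed(path, disallow_patterns):
    """Return True if the path starts with any non-empty Disallow pattern."""
    # An empty Disallow value means "nothing is disallowed", so skip it.
    return any(p and path.startswith(p) for p in disallow_patterns)


# Without the trailing slash, the pattern also catches /private.html.
print(is_disallowed("/private.html", ["/private"]))         # True
print(is_disallowed("/private/notes.html", ["/private"]))   # True

# With the trailing slash, only files inside the directory match.
print(is_disallowed("/private.html", ["/private/"]))        # False
print(is_disallowed("/private/notes.html", ["/private/"]))  # True
```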
Meta Tag – Meta tags are HTML elements that provide structured data about a page, that is, data about data.
In the early 2000s, search engines veered away from reliance on Meta tags, as many web sites used inappropriate keywords, or were keyword stuffing to obtain any and all traffic possible.
Some search engines, however, still take meta tags into some consideration when delivering results. In recent years, search engines have become smarter, penalizing websites that cheat (for example, by repeating the same keyword several times to get a boost in the search ranking). Instead of going up in the rankings, these websites go down or, on some search engines, are removed from the index completely.
Index a site – The act of crawling your site and gathering information.
How can the robots.txt file and meta tag help you?
In the robots.txt file you can tell the harmful web crawlers to leave your web site alone, and give helpful hints to the ones you do want to crawl your site. Here is an example of how to stop a web crawler from crawling your site:
# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /
ia_archiver is the crawler name for the Wayback Machine that you may have heard of, and the / after Disallow tells ia_archiver not to index any of your site. The # lets you write comments to yourself so you can keep track of what you typed.
Type the above three lines into Notepad (or any plain-text editor) and save the file to the root directory of your web site as robots.txt. Well-behaved web crawlers look for this file first, before doing anything else at a web site. This helps the crawler do its job, and lets the web site owner tell the spider what to do. Say, for instance, you have some data that you don’t want the crawlers to see, such as duplicate content for other browser referrer pages.
If you would like to have the robots.txt file created for you, visit www.rietta.com/robogen. To validate your robots.txt file and make sure it works properly, you can visit www.searchengineworld.com/cgi-bin/robotcheck.cgi. To deter crawlers from indexing the ‘duplicate’ directory, type this into your robots.txt file:
User-agent: *
Disallow: /duplicate/
The * after User-agent says that this rule applies to all crawlers, and /duplicate/ after Disallow tells them to ignore this directory and not crawl it. Each group (a User-agent line together with its Disallow lines) must be separated from the next group by a blank line in order to function correctly. So this is how you would combine the above two groups in one robots.txt file:
# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /duplicate/
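If you have Python available, you can check how a well-behaved crawler would read these rules using the urllib.robotparser module from the standard library. This is only a sketch: the www.example.com URLs and the SomeOtherBot name are made up for the illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The combined robots.txt from above, with a blank line between the groups.
ROBOTS_TXT = """\
# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /duplicate/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# ia_archiver is shut out of the whole site.
print(parser.can_fetch("ia_archiver", "http://www.example.com/index.html"))

# Any other crawler (hypothetical name) may fetch ordinary pages...
print(parser.can_fetch("SomeOtherBot", "http://www.example.com/index.html"))

# ...but not pages inside the /duplicate/ directory.
print(parser.can_fetch("SomeOtherBot", "http://www.example.com/duplicate/page.html"))
```

Keep in mind that this only shows what a cooperative crawler would do; nothing in robots.txt can force a badly behaved robot to obey.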
One thing to note that is very important: anyone can access the robots.txt file of a site. So if you have a directory that you don’t want anyone to know about, don’t list it in the robots.txt file. If that directory is not linked to from anywhere on your web site, crawlers are unlikely to find it anyway, although that is no substitute for real protection such as a password.
An alternative to blocking indexing through robots.txt is to put a robots meta tag into the page. It looks like this:
<meta name="robots" content="noindex,nofollow">
You put this into the <head> section of your web page. This line tells the crawlers not to index the page and not to follow any of the hyperlinks on the page. As another example,
<meta name="robots" content="noindex,follow">
tells the crawlers not to index the page, but to follow the hyperlinks on it.
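To see what a crawler might make of such a tag, here is a rough Python sketch that reads the robots meta tag from a page using the standard html.parser module. The RobotsMetaParser class and the sample page are invented for this illustration; the sketch assumes the usual form of the tag, a meta element with name="robots" and a comma-separated content attribute, and does not reflect how any particular search engine implements it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content") or ""
        if name == "robots":
            # Split "noindex, follow" into individual lowercase directives.
            for directive in content.split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


# Hypothetical sample page used only for this example.
page = """<html><head>
<title>Sample page</title>
<meta name="robots" content="noindex, follow">
</head><body><a href="/other.html">Other page</a></body></html>"""

meta = RobotsMetaParser()
meta.feed(page)

may_index = "noindex" not in meta.directives
may_follow = "nofollow" not in meta.directives
print(may_index, may_follow)
```

For the sample page this reports that the page should not be indexed but that its links may still be followed.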
Did you know that Google has its own tag?
It looks like this:
<meta name="googlebot" content="noindex,nofollow,noarchive">
This tells the Google crawler not to index the page, not to follow any of the links, and not to store cached versions of your pages. You will want the noarchive part if you update the content on your site frequently, because it prevents the web user from seeing outdated content served from the cache.
You can use the googlebot tag to talk specifically to Google’s robots, either to avoid complications or if you are optimizing your site for Google’s search engine. This concludes this month’s article.
Until the next article, have a great day!








