Crawling Across Chaos and Time Without End
Jan 23, 2009
Obama Loosens Robots.txt for Open Government

As well as being true to his word about Guantanamo, new President Obama has kicked out the blocking mechanisms from the White House website. Apart from the fact that I’d never actually checked, I was astounded to find that the previous robots.txt file ran to over 2300 lines of blocked files and folders! Now it has just two:

    User-agent: *
    Disallow: /includes/

You can check the file for yourself on any site; the White House’s is here: http://www.whitehouse.gov/robots.txt

It’s not necessary to have the file, and no robot or spider indexing the web actually has to abide by it, but it’s useful for speeding up trawling and getting properly indexed by the “good” spiders.

The robots.txt for this site exists and can be examined here: http://strangelyperfect.tv/robots.txt. It’s bigger than the one above because I was getting spurious returns from the language translator(s) that I’ve used. To avoid confusion, and to stop the Google downgrade that happens for multiple pages of the same content, I block the “virtual” pages from the index. I will review this soon because the plugin designs have changed since I did it. Like Obama, I’m for openness.

It does make one wonder about the mentality of the previous administration, which so wanted to clamp down on the mechanisms of democratic government. Or maybe it just confirms, yet again, what we already knew. And bizarrely, they seemed unaware that there is no compulsion whatsoever for a robot spider to pay any attention to the actual file! It’s in the specs.

Links:
http://www.robotstxt.org/robotstxt.html – Robot usage, from the organisation that sets the standard
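
For anyone curious how a “good” spider reads those two lines, here is a minimal sketch using Python’s standard urllib.robotparser module. The bot name ExampleBot and the two paths being tested are made up for illustration; only the two rules themselves come from the file quoted above. Note that the check is something the crawler chooses to run, which is exactly why nothing forces a badly behaved robot to obey the file.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /includes/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
base = "http://www.whitehouse.gov"

# "ExampleBot" and both paths below are hypothetical, for illustration only.
blocked = parser.can_fetch("ExampleBot", base + "/includes/menu.js")
allowed = parser.can_fetch("ExampleBot", base + "/blog/")

print(blocked)  # False: the path falls under the Disallow: /includes/ rule
print(allowed)  # True: no rule matches, so fetching is permitted
```

To test a live site instead, the same parser object has set_url() and read() methods that download the file before can_fetch() is called.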
© 2007-2010 Strangely Perfect All Rights Reserved