blekkobot/ScoutJet web crawler
Blekkobot (formerly known as ScoutJet) is the web crawler for blekko, a Silicon Valley-based search engine created by the founders of DMOZ and Topix.
We are developing next generation search technology, and kindly request that you permit Blekkobot access to your site so that we may refine our relevance algorithms with the broadest variety of content available from the Internet.
blekkobot obeys robots.txt
You can prevent Blekkobot from indexing all or part of your
site by including lines like the following in your site's
robots.txt file (the directory names shown are examples; adjust them for your site):

# Block Blekkobot from the entire site
User-agent: Blekkobot
Disallow: /

# Allow only specific directories
User-agent: Blekkobot
Disallow: /
Allow: /public/
You can also limit the rate at which Blekkobot crawls your site using the Crawl-delay directive:

# Limit Blekkobot's crawl rate (this example allows no more than 1 page every 5 seconds)
User-agent: Blekkobot
Crawl-delay: 5
In addition, Blekkobot understands wildcards and Allow.
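For example, a wildcard rule and an Allow rule can be combined in a single group; the patterns below are purely illustrative, not recommendations:

```
# Block all .cgi URLs anywhere on the site, but keep /docs/ crawlable
User-agent: Blekkobot
Disallow: /*.cgi
Allow: /docs/
```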
For more details on how to create and edit your own
robots.txt file, consult these useful resources: the Wikipedia article on the Robots exclusion standard and Robotstxt.org.
blekkobot IP ranges
Blekkobot crawls from the following IP ranges:
199.87.248.*, 199.87.249.*, 199.87.250.*, 199.87.251.*, 199.87.252.*, 199.87.253.*, 199.87.254.*, 199.87.255.*
38.99.96.*, 38.99.97.*, 38.99.98.*, 38.99.99.*
blekkobot comes in peace
Blekkobot tries its best to crawl politely. But if you do experience a problem with Blekkobot, please let us know at crawler (at) blekko (dot) com.
do you still honor the ScoutJet user agent?
Yes. We will continue to honor the ScoutJet user-agent string in robots.txt in addition to the Blekkobot user agent. You only need to specify one of the two user-agent strings in your robots.txt, not both.
why can't I submit my site?
We don't currently accept URL submissions. We discover and add new sites to our index by following links. Just make sure that you allow Blekkobot to crawl your site, and we will discover it via links from other sites.
do bad crawlers sometimes pretend to be Blekkobot or ScoutJet?
Yes, we've had reports of this happening. You can identify fakes by checking the requesting IP address against the ranges listed above. Fakes also frequently don't use the current User-Agent string of our crawler.
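One way to perform that check is to test a request's IP against the published ranges programmatically. A minimal sketch in Python using the standard-library ipaddress module (the CIDR blocks below are just the wildcard ranges above rewritten in prefix notation):

```python
import ipaddress

# CIDR equivalents of the ranges listed above:
#   199.87.248.* through 199.87.255.*  ->  199.87.248.0/21
#   38.99.96.*  through 38.99.99.*     ->  38.99.96.0/22
BLEKKOBOT_NETS = [
    ipaddress.ip_network("199.87.248.0/21"),
    ipaddress.ip_network("38.99.96.0/22"),
]

def is_blekkobot_ip(addr: str) -> bool:
    """Return True if addr falls inside one of Blekkobot's published ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLEKKOBOT_NETS)

print(is_blekkobot_ip("199.87.250.14"))  # True: inside 199.87.248.0/21
print(is_blekkobot_ip("66.249.66.1"))    # False: not a Blekkobot range
```

A request claiming to be Blekkobot in its User-Agent but failing this check is an impostor.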