CRAWLER

General information about the Grapeshot site crawler

What is it

The Grapeshot crawler is an automated robot that visits pages to examine and analyse their content. In this sense it is somewhat similar to the robots used by the major search engine companies.

The Grapeshot crawler is identified by having one of the following user-agents:

Mozilla/5.0 (compatible; GrapeshotCrawler/2.0; +https://www.grapeshot.com/crawler/)

Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; GrapeshotCrawler/2.0; +https://www.grapeshot.com/crawler/)
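As a first-pass filter on your own logs, the two user-agent strings above can be recognised by the GrapeshotCrawler token they share. The following is a minimal Python sketch; the function name and the regular expression are illustrative assumptions, not something Grapeshot publishes. A user-agent is trivial to forge, so a match only means the request claims to be the crawler.

```python
import re

# Token shared by both user-agent strings listed above.
# The pattern and helper name are hypothetical, chosen for illustration.
_GRAPESHOT_TOKEN = re.compile(r"GrapeshotCrawler/\d+(?:\.\d+)*", re.IGNORECASE)


def claims_to_be_grapeshot(user_agent):
    """Return True if the User-Agent header contains the GrapeshotCrawler token."""
    return bool(_GRAPESHOT_TOKEN.search(user_agent or ""))


if __name__ == "__main__":
    desktop = ("Mozilla/5.0 (compatible; GrapeshotCrawler/2.0; "
               "+https://www.grapeshot.com/crawler/)")
    print(claims_to_be_grapeshot(desktop))
    print(claims_to_be_grapeshot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```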

Genuine Grapeshot crawler requests come from Grapeshot-owned IP address ranges. If you suspect that requests are being spoofed, first check the IP address of the request against the appropriate RIPE database, using a suitable whois tool or lookup service. In general, the only valid addresses you should see are in the range 89.145.95.0 to 89.145.95.255 (89.145.95.0/24). At the time of writing, the only addresses in use for Grapeshot crawlers are 89.145.95.41 to 89.145.95.46.
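The address ranges quoted above can be checked mechanically before resorting to a whois lookup. The sketch below uses Python's standard ipaddress module. The function name and the three labels it returns are assumptions made for illustration, and the "in use" addresses are only those stated at the time of writing, so they may have changed.

```python
import ipaddress

# Range published in the text above.
GRAPESHOT_NETWORK = ipaddress.ip_network("89.145.95.0/24")
# Addresses said to be in use at the time of writing (may be out of date).
IN_USE_FIRST = ipaddress.ip_address("89.145.95.41")
IN_USE_LAST = ipaddress.ip_address("89.145.95.46")


def classify_source(ip):
    """Classify a request's source address (hypothetical helper).

    Returns "in-use" for 89.145.95.41 to 89.145.95.46, "in-range" for any
    other address in 89.145.95.0/24, and "outside" for everything else.
    Raises ValueError if ip is not a valid IP address.
    """
    addr = ipaddress.ip_address(ip)
    if addr not in GRAPESHOT_NETWORK:
        return "outside"
    if IN_USE_FIRST <= addr <= IN_USE_LAST:
        return "in-use"
    return "in-range"


if __name__ == "__main__":
    for candidate in ("89.145.95.42", "89.145.95.200", "203.0.113.7"):
        print(candidate, classify_source(candidate))
```

A request that carries the Grapeshot user-agent but is classified as "outside" is a reasonable candidate for the RIPE lookup described above.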

Why is it crawling my site

Grapeshot helps advertisers place adverts on pages contextually. To do this it is necessary to examine, or crawl, the page to determine which category, or categories, it best matches.

Pages are only ever visited on demand, so if the Grapeshot crawler has visited your site, an ad was recently placed on a page for which the Grapeshot information was either not yet available or needed to be refreshed. For this reason you will often see a request from the Grapeshot crawler shortly after a user has visited a page. The crawler systems are engineered to be as friendly as possible: they limit request rates to any specific site and automatically back away if a site is down, is slow, or is repeatedly returning non-200 (OK) responses.

It is important to be aware that a significant chain of systems may be involved in causing Grapeshot to analyse your site. Grapeshot has partnered with, and provides real-time contextual information to, a number of Real Time Bidding (RTB) systems, such as Rubicon, Admeld, AppNexus and many others. These RTB systems are often used by other third-party adserver systems as part of their ad serving strategy. Even major adserver providers such as Google/DoubleClick often pass their ad inventory through such RTB systems.

Blocking with robots.txt

First, note that Grapeshot does not provide a search engine to anyone; we never make the crawled contents of your site available through any search or other system. As discussed in the previous section, we only analyse your site because an ad placed on it has caused us to be queried about the context of the page.

With a robots.txt file you may block the Grapeshot crawler from parts or all of your site, as shown in the following examples:

Block specific parts of your site:

User-agent: grapeshot
Disallow: /private/
Disallow: /messages/

Block entire site:

User-agent: grapeshot
Disallow: /

Allow Grapeshot to crawl site:

User-agent: grapeshot
Disallow:

See also the Wikipedia article on robots.txt for more details and examples of robots.txt rules.
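Before deploying rules like the ones above, it can help to check how a parser reads them. The sketch below feeds the first and third examples to Python's standard urllib.robotparser. The helper name and the "GrapeshotCrawler" agent string passed to it are assumptions for illustration. It shows how Python's parser interprets the rules, which is not necessarily identical to how the Grapeshot crawler itself matches them.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the examples above.
BLOCK_PARTS = """\
User-agent: grapeshot
Disallow: /private/
Disallow: /messages/
"""

ALLOW_ALL = """\
User-agent: grapeshot
Disallow:
"""


def is_allowed(robots_txt, path, agent="GrapeshotCrawler"):
    """Return True if `agent` may fetch `path` under the given rules.

    Hypothetical helper: Python's parser matches the "grapeshot" token
    as a case-insensitive substring of the agent name.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(is_allowed(BLOCK_PARTS, "/private/report.html"))
    print(is_allowed(BLOCK_PARTS, "/news/today.html"))
    print(is_allowed(ALLOW_ALL, "/private/report.html"))
```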

All that said, we of course take seriously any request to stop crawling a site, or parts of a site, and any other feedback on the crawler's operations, and will act on it in a prompt and appropriate manner. If this is the case for you, please don't hesitate to contact us at crawler@grapeshot.com and we will be happy to exclude your site, or otherwise investigate immediately.

More information

If you think your site is being visited in error, or the crawler is causing problems for your site, please email Grapeshot at crawler@grapeshot.com and we will investigate.