Show Reel - Epicentre Crawler

Crawlers tend to fetch a whole HTML page, which can be misleading: most web pages carry navigation links, partner links and advertisements, all of which are "noise" compared with the "meat", or core content, of the page.

This means that tools such as those underlying Google AdSense can scan a whole web page and give too much weight to words in its peripheral zones, rather than drawing contextual inference from the core content alone.

Grapeshot can isolate the most important words on a page, which are invariably at the heart of the story, and then use that "epicentre" as a starting point, working outwards to the limits of the story. This avoids picking up the extraneous words elsewhere on the page. Epicentre goes to the heart of the matter: the core part of a page, the text that matters.
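The idea of finding an epicentre and working outwards can be sketched roughly as follows. This is only an illustrative approximation, not Grapeshot's actual algorithm: it assumes the page has already been split into text blocks, treats the most frequent longer words as the "important" ones, and uses hypothetical names and thresholds (`key_words`, `epicentre_extract`, `top_n`, `min_score`) chosen purely for the example.

```python
import re
from collections import Counter


def words_of(block):
    """Lower-case a block of text and split it into words."""
    return re.findall(r"[a-z']+", block.lower())


def key_words(blocks, top_n=3, min_len=4):
    """Assumed stand-in for 'important words': the most frequent longer words."""
    counts = Counter(
        w for block in blocks for w in words_of(block) if len(w) >= min_len
    )
    return {word for word, _ in counts.most_common(top_n)}


def epicentre_extract(blocks, top_n=3, min_score=0.1):
    """Return the run of blocks around the block densest in key words."""
    if not blocks:
        return []
    keys = key_words(blocks, top_n)

    # Score each block by the fraction of its words that are key words.
    scores = []
    for block in blocks:
        ws = words_of(block)
        scores.append(sum(w in keys for w in ws) / len(ws) if ws else 0.0)

    # The epicentre is the highest-scoring block.
    centre = max(range(len(blocks)), key=scores.__getitem__)

    # Work outwards in both directions until the score drops off.
    start = end = centre
    while start > 0 and scores[start - 1] >= min_score:
        start -= 1
    while end < len(blocks) - 1 and scores[end + 1] >= min_score:
        end += 1
    return blocks[start:end + 1]


page = [
    "Home News Sport TV Showbiz",
    "Buy cheap holidays now",
    "The striker scored twice as the club won the trophy",
    "After the match the striker said the club deserved the trophy",
    "Fans of the club celebrated the trophy late into the night",
    "Partner links casino bingo loans",
]
story = epicentre_extract(page)
```

On this made-up page, `story` holds the three article sentences, while the navigation, advertisement and partner-link blocks fall outside the story limits because they contain none of the key words.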

An Epicentre example
  • This is a web page from a UK publisher, The Sun, which is riddled with navigation links and in-line advertisements
  • Web Page from The Sun newspaper

  • The Epicentre Crawler identifies the core body of the page, shaded green here, usefully bypassing the advertisements and extraneous links
  • Shaded Extract from The Sun newspaper

  • Grapeshot turns the web page into a core body of text, from which Grapeshot's WordRank can start evaluating the significance of each word, to determine Keyword Extraction and Categorisation
  • Text Extract from The Sun newspaper
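For readers who want a feel for the step from raw HTML to a core body of text, here is a minimal sketch using Python's standard `html.parser`. It swaps in a simple link-density heuristic (dropping blocks whose text is mostly inside links), which is an assumption made for illustration and not a description of how the Epicentre Crawler works. The class name `BlockCollector`, the function `core_text`, the tag list and the 0.5 threshold are all hypothetical.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries in this sketch (an assumed, partial list).
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "td"}


class BlockCollector(HTMLParser):
    """Collect (text, link_density) for each block-level chunk of a page."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._text = []        # raw text pieces of the current block
        self._chars = 0        # non-whitespace characters in the block
        self._link_chars = 0   # of those, how many sit inside <a> tags
        self._in_link = 0

    def flush_block(self):
        """Close the current block and record it if it holds any text."""
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars / self._chars))
        self._text, self._chars, self._link_chars = [], 0, 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush_block()
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush_block()
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        self._text.append(data)
        n = len("".join(data.split()))
        self._chars += n
        if self._in_link:
            self._link_chars += n


def core_text(html, max_link_density=0.5):
    """Return the text blocks that are not dominated by link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush_block()  # pick up any trailing text
    return [text for text, density in parser.blocks if density <= max_link_density]


sample = """
<div><a href="/">Home</a> <a href="/news">News</a> <a href="/sport">Sport</a></div>
<p>The striker scored twice as the club won the trophy.</p>
<p>Read more about <a href="/club">the club</a> in our match report.</p>
<li><a href="/ads">Cheap holidays</a></li>
"""
body = core_text(sample)
```

Here the navigation bar and the advertisement link are discarded because all of their text is link text, while the two paragraphs survive, including the one that merely contains a link.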

When you check the Epicentre box on the Showreel demos listed below, you are actually deploying the Epicentre Crawler, in real time, on the web page of your own choice: