Web Scraping, Data Extraction, and Web Mining

Do you need to scrape web data into your database, spreadsheet, or any other application? In just minutes, you can use Kantu to do all the web harvesting you need, automatically and without coding.

Extract Anything

Quickly turn web page content into structured data, all without coding, IT resources, or headaches. Whether it's price lists, stock information, financial data, or any other type of data, Kantu can extract it. Kantu can even extract text from videos and PDF documents. The data can be written to standard CSV text files, or you can use Kantu's API to write it directly to databases.

Kantu's screen scraping solution allows you to visually mark the data that you want to extract ("scrape"). You simply draw pink frame(s) around the data that you need. Kantu then retrieves the data directly from the HTML source or extracts it visually by using high-quality OCR (Optical Character Recognition). The OCR approach works not only for web scraping, but also for PDF scraping, images (screen scraping) and videos.
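Kantu is configured visually, so its internals are not shown here. As a rough illustration of what "retrieving the data directly from the HTML source" and writing it to a CSV file involves, here is a minimal Python sketch using only the standard library. It assumes each field can be identified by a CSS class: the hypothetical `name` and `price` classes stand in for the pink frames, and `extract` and `to_csv` are illustrative helpers, not Kantu's implementation or API. The OCR route is not covered.

```python
import csv
import io
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of each element whose class list contains target_class.

    The class name is a hypothetical stand-in for a region marked visually.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text fragments of the element being read
        self.values = []    # one cleaned-up string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested tag inside a match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace, roughly as a browser renders them.
            self.values.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract(html, target_class):
    """Hypothetical helper: return the text of every element with target_class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.values


def to_csv(columns):
    """Render {header: [values]} as CSV text; zip() stops at the shortest column."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns.keys())
    writer.writerows(zip(*columns.values()))
    return buf.getvalue()


# Made-up sample markup standing in for a supplier's price list.
page = """
<ul>
  <li><span class="name">Widget A</span> <span class="price">$9.99</span></li>
  <li><span class="name">Widget B</span> <span class="price">$14.50</span></li>
</ul>
"""

table = {field: extract(page, field) for field in ("name", "price")}
print(to_csv(table), end="")
```

Running it prints a three-line CSV (a `name,price` header followed by one row per product). Matching on a class keeps the sketch short; a real page may need a more specific selector, and pages that render text as images or video would need the OCR approach instead.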

Visual Web Scraping with Kantu in Chromium

This screenshot shows the Extraction wizard inside the Kantu Editor. Essentially, this is a tiny graphical editor that allows you to draw, move, and delete green and pink frames.


Real-World Use Cases

Some real-world examples of how Kantu is used to extract data:

  • Download data from various online banking sites, consolidate it, and upload it to Google Spreadsheets for order processing.
  • Update internal systems with the latest exchange rates and stock-market quotations.
  • Extract data from PDF invoices via OCR (receipt OCR).
  • Gather search engine rankings.
  • Monitor order status from e-commerce portals. See what orders you still need to fulfill, when they were ordered, and all applicable details.
  • Gather bookings for any type of resort or area.
  • Gather price, quantity, item name, description, etc., from a supplier’s website.
  • Check competitors' shipping rates on major shopping sites.
  • Monitor web-server availability and status.
  • Extract product images and specification documents.
  • Extract useful information from encyclopedia and journal websites.

I run hundreds of macros against hundreds of websites each week. If it wasn't for Kantu I would have to sit around all day and download data.
Tim Schwartz, USA


Why Choose Kantu for Web Scraping/Data Extraction?

Works with every website

Websites that use dialog boxes, frames, JavaScript, Flash, Flex, Java, and even AJAX can all be automated with Kantu.

Zero learning curve

Kantu integrates with every Windows scripting or programming language, so there's no need to learn a new language to work with Kantu.

You're in full control

Kantu is an application that you can run on your own machine(s), not a hosted service. You have full control over it and it never expires.

Built-in toolset

Kantu comes with sample macros, scripts and programs (with complete source code) that you can easily customize for your own needs.

Built-in OCR and PDF data extraction

Kantu is the only web scraping tool with built-in zonal OCR features, so it can extract information even from videos and PDF documents. This also works well for receipt OCR.

Custom script creation available

Our tech support can help you get started, and can even create your first data extraction scripts for you at no additional cost.

For more in-depth information on how Kantu data extraction works technically, visit the web scraping user manual.

Just a quick note to say thanks as we have now just about finished development of the application (macros) for which it was purchased. Overall a really excellent product, and fantastic support. It will undoubtedly save us a lot of time and money in the coming months, and no doubt we will find lots of new ways to make use of it.
Jon Ross, USA

Subscribe to the a9t9 automation software newsletter. We'll send you updates on new releases that we're working on.