Visual Web Scraping
Web Scraping, Data Extraction, and Web Mining
Do you need to scrape web data into your database, spreadsheet, or any other application? In just minutes, you can set up KantuX to do all the web harvesting you need, automatically and without coding.
Quickly turn web page content into structured data, all without coding, IT resources, or headaches. Whether it’s price lists, stock information, financial data, or any other type of data, KantuX can extract it. KantuX can even extract text from videos and PDF documents. The data can be written to standard CSV text files, or you can use KantuX’s API to write directly to databases.
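As an illustration of the CSV route, here is a minimal Python sketch that loads an extracted CSV file into a database table. It does not use KantuX's API; it only assumes that the scraper has produced CSV text with a header row. The table name `scraped_prices`, the sample data, and the function name `load_csv_into_sqlite` are hypothetical and chosen for the example.

```python
import csv
import io
import sqlite3

# Hypothetical sample of what an extracted CSV file might contain.
SAMPLE_CSV = "item,price\nWidget,9.99\nGadget,24.50\n"


def load_csv_into_sqlite(csv_text, conn, table="scraped_prices"):
    """Load CSV text (header row plus data rows) into a SQLite table.

    Column names are taken from the header row and are assumed to be
    trusted (they are interpolated into the SQL statement).
    Returns the number of data rows inserted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)

    # Columns are created without a declared type, so values stay as text.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')

    rows = [row for row in reader if row]  # skip blank lines
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    inserted = load_csv_into_sqlite(SAMPLE_CSV, connection)
    print(inserted, "rows inserted")
    for record in connection.execute('SELECT * FROM "scraped_prices"'):
        print(record)
```

For a real file, read its contents with `open(path, newline="", encoding="utf-8").read()` and pass the text to the same function; a production setup would likely also declare column types and validate the header.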
KantuX's screen scraping solution allows you to visually mark the data that you want to extract ("scrape"). You simply draw pink frame(s) around the data that you need. KantuX then retrieves the data directly from the HTML source or extracts it visually by using high-quality OCR (Optical Character Recognition). The OCR approach works not only for web scraping, but also for PDF scraping, images (screen scraping) and videos.
This screenshot shows the Extraction wizard inside the KantuX Editor. Essentially, this is a tiny graphical editor that allows you to draw, move, and delete green and pink frames.
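For readers curious what "retrieving data directly from the HTML source" means in general terms, the following Python sketch pulls the cell text out of an HTML table using only the standard library. It is a generic illustration, not KantuX's implementation; the class name `TableCellExtractor`, the helper `extract_rows`, and the sample markup are assumptions made for the example.

```python
from html.parser import HTMLParser

# Hypothetical sample markup standing in for a scraped price list.
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""


class TableCellExtractor(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows found in the given HTML as lists of strings."""
    parser = TableCellExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_HTML):
        print(row)
```

A visual tool spares you from writing this kind of parser by hand: the frame you draw selects the region, and the tool works out which part of the page source it corresponds to.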
Real-World Use Cases
Some real-world examples of how KantuX is used to extract data:
- Download data from various online banking sites, consolidate it, and upload it to Google Spreadsheets for order processing.
- Update internal systems with the latest exchange rates and stock-market quotations.
- Extract data from PDF invoices via OCR (receipt OCR).
- Gather search engine rankings.
- Monitor order status from e-commerce portals. See what orders you still need to fulfill, when they were ordered, and all applicable details.
- Gather bookings for any type of resort or area.
- Gather price, quantity, item name, description, etc., from a supplier’s website.
- Check competitors’ shipping rates on major shopping sites.
- Monitor web-server availability and status.
- Extract product images and specification documents.
- Extract useful information from encyclopedia and journal websites.
I run hundreds of macros against hundreds of websites each week. If it wasn't for KantuX I would have to sit around all day and download data.
Tim Schwartz, USA
Why Choose KantuX for Web Scraping/Data Extraction?
Works with every website
Zero learning curve
KantuX integrates with every Windows scripting or programming language, so there's no need to learn a new language to work with KantuX.
You're in full control
KantuX is an application that you can run on your own machine(s), not a hosted service. You have full control over it and it never expires.
KantuX comes with sample macros, scripts and programs (with complete source code) that you can easily customize for your own needs.
Built-in OCR and PDF data extraction
KantuX is the only web scraping tool with built-in zonal OCR features. Zonal OCR is a type of optical character recognition that lets the software read specific areas, or "zones", of a document, so it can extract information even from videos or PDFs. This also works well for receipt OCR.
Custom script creation available
Our tech support can help you get started, and even create the first data extraction scripts for you – at no additional cost.
For more in-depth information on how KantuX data extraction works technically, visit the web scraping user manual.
Just a quick note to say thanks as we have now just about finished development of the application (macros) for which it was purchased. Overall a really excellent product, and fantastic support. It will undoubtedly save us a lot of time and money in the coming months, and no doubt we will find lots of new ways to make use of it.
Jon Ross, USA