Screen Scraping

Data extraction (“screen scraping”) is a very important technique in data migration and integration scenarios. With its accurate OCR screen scraping features, Kantu essentially adds a “Data API” to every Windows, Mac and Linux application. This includes terminals, remote desktop (RDP) sessions, mobile phone emulators and even the new Amazon (AWS) AppStream secure application streaming service. For more information please read screen scraping with OCR.


Screen scraping: The video starts at 0:42. We use OCRExtractRelative to extract the temperature from the remote desktop display of a smart phone app.

User Manual: Screen Scraping with Kantu

The sections below describe the technical details of screen scraping with Kantu. Visual screen scraping can be used on the desktop and in the browser. For browser automation, screen scraping inside the browser is the only option if you want to extract data from a PDF, image or video. If the data is part of a regular website, you have the additional option to do web scraping with Selenium IDE commands.

Text Recognition (also called Screen Scraping, OCR)

Kantu can use OCR to search for text on the screen. Optical Character Recognition (OCR) works on screenshots of the rendered web page. Just like the automated UI test commands, it works independently of the HTML page source code and the browser's document object model (DOM). Thus, it works equally well on simple websites, highly complex websites, canvas objects, text inside images and videos, and for PDF testing.

Enable and test the text recognition on the OCR tab, and combine it with XClick.

OCRExtract | image | variable and OCRExtractRelative | image | variable

Do you need to extract values from a video, scrape text from an image or extract text from a PDF? Then the OCRExtract commands help. As the name suggests, they use OCR to get the information. There are two ways to specify the text to extract:

Option 1: OCRExtract - Define OCR area via image

This method is the easiest. Kantu looks for the image, and then extracts the text from it. But if the content of the image area changes a lot, the image is no longer found reliably. That is why we recommend using OCRExtractRelative.

Option 2: OCRExtractRelative - Define OCR area in image with green and pink boxes

This method uses the green/pink box scheme, as described in the relative clicks section. The key difference here is that the content of the pink box is not clicked but run through OCR, and the OCR text result is stored in the variable. Only the content of the pink rectangle is used as input for OCR. No other data leaves the local system.
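The green/pink scheme can be pictured as simple coordinate arithmetic. The following Python sketch is not Kantu's implementation. It assumes that the green anchor box has already been located on the screen, that the pink box position is known from the reference image, and that the screen is not scaled relative to the reference image. The Box type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixels (hypothetical helper type)."""
    left: int
    top: int
    width: int
    height: int


def pink_box_on_screen(green_ref: Box, pink_ref: Box, green_found: Box) -> Box:
    """Translate the pink box from reference-image to screen coordinates.

    green_ref / pink_ref: the boxes as drawn in the reference image.
    green_found: where the green anchor was matched on the screen.
    Assumption: no scaling between reference image and screen.
    """
    dx = pink_ref.left - green_ref.left  # horizontal offset anchor -> OCR area
    dy = pink_ref.top - green_ref.top    # vertical offset anchor -> OCR area
    return Box(green_found.left + dx, green_found.top + dy,
               pink_ref.width, pink_ref.height)


# The anchor is found at (300, 200); in the reference image the pink box
# starts 60 px to the right of the anchor's left edge.
region = pink_box_on_screen(
    green_ref=Box(10, 10, 50, 20),
    pink_ref=Box(70, 10, 40, 20),
    green_found=Box(300, 200, 50, 20),
)
print(region)
```

Only the resulting region would then be cropped from the screenshot and passed to OCR, which matches the statement that nothing outside the pink rectangle is used.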

OCRExtract explained
Only the area inside the pink box is used as input for OCR.

OCRExtractRelative for Screen Scraping
Here we read the temperature from a mobile phone app via a remote desktop connection.

How to extract text from PDF

The OCRExtractRelative command is the best solution to extract text from PDF for specific coordinates. You load the PDF into Chrome, and then use OCRExtractRelative command to find the area with the text and extract it. This is also called zonal OCR. Kantu ships with the "DemoPDFTest_with_OCR" macro that shows how to get text from any PDF.

OCRExtractRelative runs zonal OCR on the area marked with the pink box.

Option 3: Use a regular expression to extract text (available soon, contact us for early beta access)

This method uses regex=(regular expression). The regular expression is applied to the OCR result of the complete active screenshot area, and the match(es) are returned. Conceptually the OCRExtract | regex=... command works just like sourceExtract | regex=... . The key difference is that the OCRExtract regex works on the OCR text result, and the sourceExtract regex works on the HTML page source code. So the "only" difference is the input; the regular expression logic is the same.
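To illustrate the idea that only the input changes, here is a minimal Python sketch that applies a regular expression to a piece of OCR text. It does not show Kantu's own regex engine or syntax. The sample text, the pattern and the function name are made up for illustration.

```python
import re
from typing import List


def extract_with_regex(ocr_text: str, pattern: str) -> List[str]:
    """Return every full match of `pattern` found in the OCR text."""
    return [m.group(0) for m in re.finditer(pattern, ocr_text)]


# Hypothetical OCR result of a screenshot showing an invoice.
ocr_text = "Invoice 1001\nTotal: 42.50 EUR\nTax: 8.07 EUR"

# Match amounts with exactly two decimal places.
amounts = extract_with_regex(ocr_text, r"\d+\.\d{2}")
print(amounts)
```

The same function would work unchanged on HTML page source text. That is the point made above: the regular expression logic is identical and only the input differs.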

Text Recognition Commands without Extraction

These commands use OCR to find a certain text and then do something.

XClick/XMove | ocr=text to search@pos=x

Robotic Process Automation: Text recognition and XClick combined are very useful for robotic process automation (RPA). When you specify XClick with OCR text as input, Kantu searches for the text, and then clicks on it. The key difference to the "good old" Selenium IDE Click (locator) commands is that this works 100% visually. So it works on every web page, image, video or PDF. For more information see the XClick command.

Center of the OCR match

Every OCR search sets the ${!OCRX} and ${!OCRY} internal variables if a match is found. If more than one match is found, the location of the first match is used. The x/y value is the center of the bounding rectangle of the found OCR word(s). This is the value that is used with the "XClick | ocr=..." command. For image search we have the !imageX/!imageY value pair, and for OCR search the !ocrX/!ocrY value pair.
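The center of the bounding rectangle can be computed as shown in the Python sketch below. This is an illustration, not Kantu's code. It assumes each found word comes with Left, Top, Width and Height values in pixels, similar to the word overlay data that the OCR.space API returns. The function name is hypothetical.

```python
def match_center(words):
    """Center of the rectangle that encloses all given word boxes.

    Each word is a dict with Left, Top, Width and Height in pixels
    (assumed layout, modeled on OCR overlay data).
    """
    left = min(w["Left"] for w in words)
    top = min(w["Top"] for w in words)
    right = max(w["Left"] + w["Width"] for w in words)
    bottom = max(w["Top"] + w["Height"] for w in words)
    return (left + right) / 2, (top + bottom) / 2


# A two-word match such as "Log in" found on the screenshot.
words = [
    {"WordText": "Log", "Left": 100, "Top": 50, "Width": 30, "Height": 12},
    {"WordText": "in", "Left": 135, "Top": 50, "Width": 15, "Height": 12},
]
x, y = match_center(words)
print(x, y)
```

For a multi-word match the click point therefore lands in the middle of the whole phrase, not in the middle of the first word.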

OCRSearch | text to search | variable

The OCRSearch command searches for a given text (partial matches are OK) and stores the number of matches in the variable. If you want to check whether the x-th match of a text exists, you can use the @pos parameter: OCRSearch | text to search@pos=x | variable. Conceptually the OCRSearch command is similar to sourceSearch. The key difference is that OCRSearch works visually on a screenshot, and the sourceSearch command works on the HTML page source code.
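The counting logic can be sketched in a few lines of Python. This is only a model of the behavior described above. It assumes that matching is done per recognized word and is case-insensitive; how Kantu actually compares text, and what exactly it stores when @pos is used, is not specified here. Both function names are hypothetical.

```python
def count_matches(ocr_words, text):
    """Number of OCR words that contain `text` (partial matches count).

    Assumption: comparison is case-insensitive.
    """
    needle = text.lower()
    return sum(1 for word in ocr_words if needle in word.lower())


def match_exists(ocr_words, text, pos):
    """True if at least `pos` matches exist (pos is 1-based, like @pos=x)."""
    return count_matches(ocr_words, text) >= pos


# Hypothetical list of words returned by OCR for one screenshot.
ocr_words = ["Order", "orders", "Total", "Reorder"]

total = count_matches(ocr_words, "order")
third_exists = match_exists(ocr_words, "order", 3)
fourth_exists = match_exists(ocr_words, "order", 4)
print(total, third_exists, fourth_exists)
```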


OCR Engine, plans and privacy

How does Kantu generate the OCR results? By design, Kantu operates 100% locally and no data ever leaves your machine. The OCR feature is different and that is why it is disabled by default. There are 3 different settings on the Kantu OCR tab:

OCR disabled

This is the default setting. All OCR commands are blocked and no data leaves your machine.

OCR via online ocr api

When the OCR commands are enabled, Kantu takes a screenshot of the visible part of the website inside the browser and sends it to the OCR API for processing (with OCRExtract, only the part inside the pink box is sent). The OCR API returns the result, and Kantu uses it to find the right word in the right place on the screen. On a fast internet connection, the run time for the OCR process is typically less than a second. After the screenshot is processed, it is deleted from the OCR server. Absolutely nothing is stored on the server. We know this for sure, because the OCR.space OCR API is developed in-house. OCR.space has the strictest privacy policy of all OCR providers.

Since we use the OCR.space OCR engine, the OCR API documentation, the list of supported OCR languages, and the tips and tricks apply to the Kantu OCR features as well. On the OCR tab, you can define the default OCR language. And with the !OCRLanguage internal variable you can set the OCR language per macro. !OCRLanguage takes the 3-letter ISO language code as input.
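For readers who want to see what such an OCR request roughly looks like, the Python sketch below builds the form fields for a call to the public OCR.space API. The field names (apikey, language, isOverlayRequired, base64Image) and the endpoint follow the public OCR.space API documentation. How Kantu itself assembles its request is not documented here, so treat this as an assumption. The function name and the placeholder key are hypothetical, and the sketch only builds the payload without sending it.

```python
import base64

# Public OCR.space endpoint; a self-hosted server would use its own URL.
OCR_API_URL = "https://api.ocr.space/parse/image"


def build_ocr_request(png_bytes: bytes, api_key: str, language: str = "eng") -> dict:
    """Build the POST form fields for one OCR conversion.

    language: 3-letter code such as "eng" or "ger", the same kind of
    value that !OCRLanguage takes.
    """
    if len(language) != 3:
        raise ValueError("expected a 3-letter language code such as 'eng'")
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "apikey": api_key,
        "language": language,
        # Ask for word coordinates in addition to the plain text.
        "isOverlayRequired": "true",
        "base64Image": "data:image/png;base64," + encoded,
    }


# Placeholder bytes stand in for a real PNG screenshot.
payload = build_ocr_request(b"abc", "YOUR_API_KEY", language="ger")
print(sorted(payload))
```

The payload could then be posted to OCR_API_URL with any HTTP client. Requesting the overlay matters for click automation, because the word coordinates are what make it possible to locate text on the screen.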

Kantu includes 100 free OCR conversions per day. The conversion counter is automatically reset every day. More conversions can be purchased as part of our XModule PRO and Enterprise plans.

Offline OCR

We understand that some organizations cannot allow the use of any cloud services at all. In this case we recommend our on-premise Kantu OCR server installation. The Kantu OCR Server is a special version of the OCR.space Local Self-hosted, On-Premise OCR Server. It runs 100% locally and requires no Internet connection. One Kantu Offline OCR server can be used with all Kantu installations in your company, so only one license is required. After the OCR server is installed, enter the URL of the server and its API key on the Kantu OCR settings tab. The Kantu OCR server is available as a paid add-on for Kantu XModule Enterprise Edition users. For more information and to order the Kantu Offline OCR package please contact sales.


OCR-driven Robotic Process Automation (RPA)

Tips for debugging OCR automation issues:

Tip 1: Kantu always stores the last screenshot that it makes as "_lastscreenshot" on the visual tab. So you can check there if the screenshot contains the information that you need.

_lastscreenshot
The last screenshot taken as input for OCR and computer vision is stored as _lastscreenshot. So you see what Kantu sees.

Tip 2: The "Test OCR" button on the OCR tab and the "Find" button (when OCRSearch is selected as the command) both trigger an OCR conversion and display the result as an overlay in the browser. This allows you to check if the OCR conversion was accurate. If you find any problems, please report them to us.

Screen Scraping via API

Kantu contains a command-line application programming interface (API) to automate more complicated tasks and integrate with other programs or scripts for complete Robotic Process Automation (RPA).

Web Scraping vs Screen Scraping

Screen Scraping means getting information from a screenshot or video image. Web scraping means getting information from inside the web browser. If you want to extract data from inside the Firefox or Chrome browser see Web scraping with Selenium IDE.
