Visual Desktop Automation

Desktop Automation with an open-source browser extension? Yes, no problem with Kantu and its XModules.

Automate your desktop workflow the "Kantu-way" by using computer vision. Kantu is the fastest way to create stable robotic process automation (RPA) scripts with image and text recognition on Windows, Mac and Linux.

Rock-stable visual desktop automation, screen scraping and application UI testing

Kantu uses the latest image and text recognition technologies to automate applications just like a human does. Leave window titles, window handles, class names and other operating system internals to the developers. And even if you are a developer, Kantu gives you a break while it tests your app.

Even if you have the skills to automate your application by "coding" a test, wouldn’t you rather spend your time creating the application than debugging and testing the test automation scripts themselves?

Screen Scraping

Data extraction (“screen scraping”) is a very important technique in data migration and integration scenarios with surface automation. With its accurate OCR screen scraping features, Kantu essentially adds a “Data API” to every Windows, Mac and Linux application. For more information please read screen scraping with OCR.

App Scripting via API

Kantu contains a command-line application programming interface (API) to automate more complicated tasks and integrate with other programs or scripts for complete Robotic Process Automation (RPA).

This means that you can access Kantu’s web and desktop automation functionality from any programming language on Windows, Mac and Linux. Developers can use Python, PowerShell, C#, Java, SAP, VBS, Visual Basic or any other programming or scripting language to embed and control Kantu directly in their applications.
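As a rough illustration of the integration pattern, the sketch below shows how a host script might start a command-line automation run and collect its exit code and output. The text here does not list the actual Kantu command line, so the helper name `run_automation` and the stand-in command are assumptions made for this example only.

```python
import subprocess
import sys


def run_automation(command, timeout_s=60):
    """Run an external command and return (exit_code, captured_output).

    `command` is a list such as ["program", "arg1", "arg2"]. The real
    command line for starting a macro is NOT shown here; pass whatever
    the tool's documentation specifies.
    """
    completed = subprocess.run(
        command,
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        timeout=timeout_s,    # raise if the run hangs
        check=False,          # report the exit code instead of raising
    )
    return completed.returncode, completed.stdout.strip()


if __name__ == "__main__":
    # Stand-in command (assumption): a tiny Python process that only prints.
    stand_in = [sys.executable, "-c", "print('macro finished')"]
    exit_code, output = run_automation(stand_in)
    print(exit_code, output)
```

The same pattern applies in any of the languages listed above: build the command line, start the process, then inspect the result.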

Any task you can do on a computer can be scripted with Kantu and thus automated. Awesome!
Frank Zimmermann, Lufthansa IT

Desktop Automation, also known as Robotic Process Automation (RPA)
Kantu is the tool for Cross-Platform Robotic Process Automation (RPA). Kantu runs on Windows, Mac and Linux.

User Manual: Desktop Automation with Kantu

The DesktopAutomation XModule is a native app for Windows, Mac and Linux. It adds "hands" and "eyes" to the Kantu core. The XModule directly interacts with the operating system and allows Kantu to run computer vision directly on the desktop, move the mouse and simulate keystrokes. The DesktopAutomation XModule is included in the Kantu XModules Installer.

How to create desktop automation macros:

  • In the Kantu settings go to the VISION tab and select "Desktop Automation" as operating mode. This switches the Kantu eyes from the web browser to the desktop. It also switches the "Select" and "Find" buttons to operate on the desktop.

  • Visual macros are best constructed like a Lego car: add one XClick command after another, and build the macro step by step.

  • Note that the "Record" button is only for Selenium IDE type browser automation macros. Recording is not available for the XClick, XMove and XType commands.
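The step-by-step approach described above can be sketched in a few lines of Python that assemble a macro as a list of commands and print it as JSON. The JSON layout (`Name`, `Commands`, and `Command`/`Target`/`Value` per step), the helper names and the image file names are assumptions made for illustration; check an exported macro for the exact format.

```python
import json


def add_step(macro, command, target="", value=""):
    """Append one command to the macro, like adding one Lego brick."""
    macro["Commands"].append(
        {"Command": command, "Target": target, "Value": value}
    )
    return macro


def build_macro(name, steps):
    """Build a macro dict from (command, target, value) tuples."""
    macro = {"Name": name, "Commands": []}
    for command, target, value in steps:
        add_step(macro, command, target, value)
    return macro


if __name__ == "__main__":
    # Hypothetical image names; each XClick looks for one image on screen.
    steps = [
        ("XClick", "file_menu.png", ""),
        ("XClick", "save_entry.png", ""),
        ("XType", "report.txt", ""),
    ]
    print(json.dumps(build_macro("SaveReport", steps), indent=2))
```

Adding and testing one step at a time keeps it obvious which image failed to match when a run stops.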


XDesktopAutomation | true/false: You can use this command to switch the Kantu eyes between browser and desktop. For the current macro, it overrides the global Kantu eyes setting on the "Vision" settings page. Note that a switch between desktop and browser scope changes the coordinate system, too. In browser mode, XClick (0,0) is the top left point inside the browser viewport, and in desktop mode XClick (0,0) refers to the top left point on your desktop.
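To make the coordinate change concrete, here is a small sketch that maps a point given in browser viewport coordinates to desktop coordinates. It assumes you already know where the viewport's top left corner sits on the desktop; the origin value and the function name are hypothetical, and display scaling is ignored.

```python
def viewport_to_desktop(point, viewport_origin):
    """Translate a viewport-relative point into desktop coordinates.

    point:            (x, y) relative to the browser viewport's top left
    viewport_origin:  (x, y) of that top left corner on the desktop
    """
    x, y = point
    origin_x, origin_y = viewport_origin
    return (origin_x + x, origin_y + y)


if __name__ == "__main__":
    # Hypothetical origin: viewport starts 8 px right, 120 px down.
    origin = (8, 120)
    print(viewport_to_desktop((0, 0), origin))
    print(viewport_to_desktop((10, 20), origin))
```

The point (0,0) therefore lands on different pixels in the two modes unless the viewport happens to start at the desktop's top left corner.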

CaptureDesktopScreenshot | file name or full path: This command allows you to script taking desktop screenshots. If the screenshot name parameter is just a name (e.g. "LinuxScreenshot"), the screenshot is stored in the internal HTML5 storage, just like the "inside the browser" commands CaptureScreenshot and captureEntirePageScreenshot do. If the parameter is an absolute path (e.g. "c:\test\desktopscreenshot.png"), the screenshot will be stored directly on the hard drive.
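The name-versus-path rule can be illustrated with a short Python sketch that predicts where a given parameter would end up. This only mirrors the rule as described above and is not Kantu's own logic; the function name and the returned labels are made up for the example.

```python
import ntpath
import posixpath


def screenshot_destination(target):
    """Return where a screenshot with this parameter would be stored.

    An absolute Windows path (c:\\...) or POSIX path (/...) means the
    hard drive; a bare name means the internal browser storage.
    """
    if ntpath.isabs(target) or posixpath.isabs(target):
        return "hard drive"
    return "browser storage"


if __name__ == "__main__":
    for target in ("LinuxScreenshot", r"c:\test\desktopscreenshot.png"):
        print(target, "->", screenshot_destination(target))
```

Checking both path styles keeps the helper usable on Windows, Mac and Linux alike.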

Desktop Automation Demo Macros

Kantu ships with the DemoXDesktopAutomation and DemoXDesktopAutomation_OCR demo macros.

The DemoXDesktopAutomation macro uses Kantu itself to demo visual GUI automation and GUI testing. For this purpose the macro first restricts the image search range to the area under test (the clipping area), which here is the tab area. Then, one after another, the macro selects the tabs and tests the "Clear" button. Finally, on the "Vision" tab, it uses visualAssert to make sure that the test sequence was successful.

The DemoXDesktopAutomation_OCR macro automates exactly the same workflow, but instead of using images to select the tabs, it uses text recognition to read the text on the screen. For example, it uses the string "Logs" to find the "Logs" tab. Text recognition (OCR) makes the macro very robust against color and font changes. The macro is also easier to create: you do not have to create an input image for each command, you can just type the text.

Other RPA demos: Browser Extension Testing with Kantu (forum post).
