Extract and Verify the text from image using Selenium WebDriver

16 April, 2014
Currently WebDriver does not have any direct method that can extract text from an image. If we want to extract and verify text from an image, we can use OCR (Optical Character Recognition) technology.

OCR, as described in this informative article:

OCR software extracts all the information from the image into easily editable text format. Optical character recognition (OCR) is a system of converting scanned printed/handwritten image files into its machine readable text format. OCR software works by analyzing a document and comparing it with fonts stored in its database and/or by noting features typical to characters.


There are a great many free OCR software tools. If your preferred programming language is Java, you can use one of the Java OCR libraries to extract text from an image. Here I use the Asprise OCR Java library. To work with it, follow these simple steps:

1. Download the Asprise OCR libraries for the operating system you are using.

2. Unzip the downloaded folder and add the aspriseOCR jar file to your working directory. You can also download the single jar file from here.

3. On Windows, also copy the "AspriseOCR.dll" file from the unzipped folder and save it under "C:\Windows\System32".
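
If the OCR call later fails because the native library cannot be loaded, it can help to check whether the DLL from step 3 is visible to the JVM. The following is only a minimal sketch, not part of the Asprise library: the class NativeLibraryLocator and its method findOnLibraryPath are hypothetical names introduced for illustration, and the sketch assumes the DLL is looked up through the directories listed in the "java.library.path" system property (on Windows this typically includes the system directories, which is why copying to System32 works).

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper (not part of Asprise OCR): looks for a native library file
// in a list of directories such as the JVM's java.library.path.
class NativeLibraryLocator {

    /** Returns the first regular file named fileName found in the separator-delimited directory list. */
    static Optional<Path> findOnLibraryPath(String fileName, String libraryPath) {
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as a trailing separator
            }
            try {
                Path candidate = Paths.get(dir, fileName);
                if (Files.isRegularFile(candidate)) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException e) {
                // ignore entries that are not valid paths on this platform
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String libraryPath = System.getProperty("java.library.path", "");
        Optional<Path> dll = findOnLibraryPath("AspriseOCR.dll", libraryPath);
        System.out.println(dll
                .map(p -> "Found native library at " + p)
                .orElse("AspriseOCR.dll was not found on java.library.path"));
    }
}
```

Run it with the same JVM you use for the tests. If the file is not found, re-check step 3 or add the folder that contains the DLL to java.library.path.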

We shall extract text from the sample image below:


Reference: http://asprise.com/product/ocr/javadoc/index.html

The code to read the text from the above image:


import java.awt.image.BufferedImage;
import java.io.IOException;
import java.net.URL;
import javax.imageio.ImageIO;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.annotations.AfterTest;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;
import com.asprise.util.ocr.OCR;

public class ExtractTextFromImage {
    WebDriver driver;

    @BeforeTest
    public void setUpDriver() {
        driver = new FirefoxDriver();
    }

    @AfterTest
    public void tearDown() {
        // Close the browser once the test has run, even if it failed.
        driver.quit();
    }

    @Test
    public void start() throws IOException {
        driver.get("http://www.automationace.com/2014/04/extract-and-verify-text-from-images-using-selenium-webdriver.html");

        // Locate the sample image in the post and read its source URL.
        String imageUrl = driver.findElement(By.xpath("//*[@id='post-body-6308533711630672689']/div[1]/div/a/img")).getAttribute("src");
        System.out.println("Image source path : \n" + imageUrl);

        // ImageIO.read returns a BufferedImage, which already is a RenderedImage,
        // so it can be passed to the OCR engine without a cast.
        BufferedImage image = ImageIO.read(new URL(imageUrl));
        String s = new OCR().recognizeCharacters(image);
        System.out.println("Text From Image : \n" + s);
        System.out.println("Length of total text : \n" + s.length());

        /* Use the code below if you want to read the image from your hard disk
         * (this also needs: import java.io.File;)
         *
           BufferedImage image = ImageIO.read(new File("Image location"));
           String imageText = new OCR().recognizeCharacters(image);
           System.out.println("Text From Image : \n" + imageText);
           System.out.println("Length of total text : \n" + imageText.length());
         */
    }
}

The output of the above code is:


Image source path : 
http://4.bp.blogspot.com/-gQebJ3MTCsI/U1ZnAHB0PDI/AAAAAAAAAOA/Zrl3oqdu0FQ/s1600/ExtractTextFromImage.jpg
Text From Image : 
 
IF YOU THINK IT'S
EXPENSIVE
TO HIRE A
PROFESSIONAL,
WAIT UNTIL YOU HIRE 
AN AMATEUR.

Length of total text : 
43
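
The test above only prints the recognized text. To verify it as well, one option is to normalize the OCR output before comparing, because the engine returns the text with the image's line breaks and stray spaces. The following is a minimal sketch under that assumption: the class OcrTextVerifier and its methods normalize and containsPhrase are hypothetical helpers introduced here, and the ocrText string is a stand-in copied from the output above rather than a live OCR result.

```java
import java.util.Locale;

// Hypothetical helper (not part of Selenium or Asprise OCR) for comparing OCR output with expected text.
class OcrTextVerifier {

    /** Collapses every run of whitespace, including line breaks, into one space and trims the ends. */
    static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    /** Returns true when the normalized OCR text contains the normalized expected phrase, ignoring case. */
    static boolean containsPhrase(String ocrText, String expectedPhrase) {
        String haystack = normalize(ocrText).toLowerCase(Locale.ROOT);
        String needle = normalize(expectedPhrase).toLowerCase(Locale.ROOT);
        return haystack.contains(needle);
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by recognizeCharacters(...), taken from the output above.
        String ocrText = "\nIF YOU THINK IT'S\nEXPENSIVE\nTO HIRE A\nPROFESSIONAL,\nWAIT UNTIL YOU HIRE \nAN AMATEUR.\n";
        String expected = "If you think it's expensive to hire a professional, wait until you hire an amateur.";

        if (!containsPhrase(ocrText, expected)) {
            throw new AssertionError("Expected phrase not found in OCR text");
        }
        System.out.println("Verified: " + normalize(ocrText));
    }
}
```

Inside the TestNG test, the same check could be written as Assert.assertTrue(OcrTextVerifier.containsPhrase(s, expected)). OCR results are rarely perfect, so matching on a shorter phrase is usually more robust than comparing the whole text.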


