
NODE.JS: EXTRACT TEXT FROM IMAGE USING TESSERACT



In this article, we will see how to extract text from images in Node.js using Tesseract.

Let's start with a use case.

Suppose you have 300 screenshots on your phone, each containing an email address that you need for some reason, such as growing your network or email marketing.
Copying the email address from each image into a CSV or Excel file by hand would take a lot of time.
So let's see how to automate this.


First, you need to install the pre-built binary package of Tesseract OCR (an optical character recognition engine) for your OS.
I have tested it on Windows 10.
For Windows 10, you can install it from here.
For other operating systems, you may check this link.
Once you have installed Tesseract using the Windows setup, you also need to add its install folder, probably
'C:\Program Files\Tesseract-OCR', to the PATH environment variable so that you can access it from any location.
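
Before going further, it can help to confirm that Tesseract is really reachable from any location. The snippet below is a minimal sketch of one way to check this from Node.js. The helper name isOnPath is made up for this example, and the check simply assumes that running tesseract --version succeeds when the PATH variable is set correctly.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: resolves to true if the command runs successfully,
// and to false if it cannot be started or exits with an error.
function isOnPath(command, args) {
    return new Promise(function (resolve) {
        execFile(command, args, function (error) {
            resolve(!error);
        });
    });
}

isOnPath('tesseract', ['--version']).then(function (found) {
    console.log(found
        ? 'Tesseract is available on PATH.'
        : 'Tesseract was not found; check the PATH variable.');
});
```

If the second message is printed, open a new terminal after changing the PATH variable and try again, since terminals that are already open usually keep the old value.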

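Then you need to install the textract library from npm. The code below also uses the jsonexport library, so install it as well (for example, npm install textract jsonexport).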

To read the paths of these 300 images, we can select all of them and rename them to a common name.
For example, if we rename them to 'image', the files will be named image(1) to image(300),
so we can build each image path dynamically from the loop index.
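
As an alternative to building the names from a loop index, the image paths could be read from the folder itself. This is only a sketch under a few assumptions: the helper name listImages is made up for illustration, the images are .jpg files, and they sit together in one folder. It has the side benefit of not depending on the exact pattern that the rename produces (for instance, whether there is a space before the parenthesis).

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: returns the paths of all .jpg files in a folder,
// ordered by the first number that appears in each file name.
function listImages(folder) {
    const numberIn = function (name) {
        const match = name.match(/\d+/);
        return match ? Number(match[0]) : 0;
    };
    return fs.readdirSync(folder)
        .filter(function (name) { return path.extname(name).toLowerCase() === '.jpg'; })
        .sort(function (a, b) { return numberIn(a) - numberIn(b); })
        .map(function (name) { return path.join(folder, name); });
}

// '.' is the current folder; pass the folder that holds your screenshots.
console.log(listImages('.'));
```

The returned array can then be looped over in place of the index-based names, and its length gives the number of images to wait for.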

Node.js code:

const textract = require('textract');
const jsonexport = require('jsonexport');
const fs = require('fs');

const TOTAL = 300; // Number of images to process.
const emailList = []; // Stores every email that we extract.
let processed = 0; // Counts finished callbacks, including failed ones.

for (let i = 1; i <= TOTAL; i++) {
    const name = 'image(' + i + ').jpg'; // Image type is jpg.
    textract.fromFileWithPath(name, function (error, text) {
        processed++;
        if (error) {
            console.log('Could not read ' + name + ': ' + error.message);
        } else {
            console.log(text); // Extracted text.
            // The split logic depends on how the text is laid out in your images.
            const email = (text.split('Email')[1] || '').trim();
            emailList.push({ Email: email });
        }
        // Callbacks can finish in any order, so write the CSV once all are done.
        if (processed === TOTAL) {
            jsonexport(emailList, function (err, csv) {
                if (err) return console.log(err);
                fs.writeFile('EmailList.csv', csv, function (err) {
                    if (err) throw err;
                    console.log('Email list created with ' + emailList.length + ' entries');
                });
            });
        }
    });
}

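The code loops over the 300 images, extracts the text from each one, and collects the email addresses in a list.
Once every image has been processed, we use the jsonexport library to convert the email list to CSV format and then fs.writeFile to save it as a CSV file.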
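
Splitting the text on the word "Email" only works when every screenshot has that label right before the address. A regular expression is one possible alternative. The sketch below is an illustration, not part of the original code: the helper name findEmail, the pattern, and the sample text are all assumptions, and the pattern is deliberately simple, so it will not match every valid address.

```javascript
// A deliberately simple pattern; it does not cover every valid address.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

// Hypothetical helper: returns the first email-like string in the text,
// or an empty string if none is found.
function findEmail(text) {
    const match = String(text || '').match(EMAIL_PATTERN);
    return match ? match[0] : '';
}

// Made-up OCR output, used only for illustration.
const sample = 'Name John Doe Email john.doe@example.com Phone 12345';
console.log(findEmail(sample)); // john.doe@example.com
```

OCR output can contain misread characters, so it is worth checking a few extracted addresses by hand before relying on the whole list.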

I hope you like this article. If you have any questions, please let me know in the comment section.
