Skip to main content

How to Read and Analyze data from an Excel file in Node.js

Have you ever needed to get and analyze the data from an Excel (.xlsx) Spreadsheet in node.js? If not, if you ever do, here’s a solution.

In this article we will focus on,

  1. Reading the data from the spreadsheet file.
  2. Understanding how the data is returned.
  3. Getting a list (names) of worksheets in a workbook.
  4. Specifying from which sheet we want data.

Reading The File

Setup

You will of need to have node.js installed.

You can create whatever Excel file you like, a screenshot of the simple file I will use for this article is below. I named the file Data.xlsx.

This is image title

I will be using Visual Studio Code (VSCode).

  1. Open your terminal window and type, npm init. You can just accept the defaults which will make our main file index.js as seen in your package.json file.
  2. In your terminal window, type npm install-excel-file to install the package that makes it all work.
  3. Create a file named index.js and open it.

Reading The Excel File

We are now ready to read the file.

Add the following code to your index.js file.

const xlsxFile = require('read-excel-file/node');

xlsxFile('./Data.xlsx').then((rows) => {


}

Side Note: In the require statement, we specified /node. This is important as this can also be done with JavaScript in the browser by using,

import xlsxFile from 'read-excel-file'

The Code

The first line imports the package.

The second line reads the file, returning a promise. When it is done reading we get arrays.

Each array (the row parameter of the callback function) contains a row in the spreadsheet. Each of these has three arrays corresponding to the three columns of our spreadsheet.

Viewing the Raw Data

Modify your code to verify we are getting the data from the file.

const xlsxFile = require('read-excel-file/node');

xlsxFile('./Data.xlsx').then((rows) => {
 console.log(rows);
 console.table(rows);
})

Run this in debug mode, F5, and then run without debug, Ctrl+F5.

Running in both will help us visualize what we are getting back as well verify the data was read from the file.

F5

This is image title

CRTL+F5

This is image title

This gives a a better picture of the array structure.

Getting The Actual Data

Since the data is in arrays, it is easily accessible. Two way’s of getting to the data are below.

forEach Loop

Modify your code as follows to see each piece of data and it’s data type.

The data parameter contains contains the spreadsheet cell entry.

const xlsxFile = require('read-excel-file/node');

xlsxFile('./Data.xlsx').then((rows) => {
rows.forEach((col)=>{
        col.forEach((data)=>{
          console.log(data);
          console.log(typeof data);
    })
})
})

For our purpose the output is just a list, but we have access to the data and can verify its data type.

This is image title

for..in Loop

Modify your code to use the for…in loop.

const xlsxFile = require('read-excel-file/node');

xlsxFile('./Data.xlsx').then((rows) => {
for (i in rows){
       for (j in rows[i]){
           console.dir(rows[i][j]);
}
   }
})

Similar output when run but this code allows us to see the array structure better and how to access the array elements.

rows[i][j] contains the data in each spreadsheet cell.

Dealing With Multiple Sheets

What if we have two worksheet named Dev and Exec?

By default, when we read, the first sheet is read. However we can,

  • Get a list of sheet names
  • Specify the Sheet from which we want data.

Modified Excel File

I have modified my Excel file, Data.xlsx, to have two sheets.

This is image title
Dev Sheet

This is image title
Exec Sheet

Getting A List of Sheet Names

The sheet names are returned as an array of objects. Each object has the structure,

{name : ‘sheetname’}

In our case it would return this array, [{name : ‘Dev’}, {name : ‘Exec’}]

Modify your code as follows to get a list of worksheets.

const xlsxFile = require('read-excel-file/node');

xlsxFile('./Data.xlsx', { getSheets: true }).then((sheets) => {
      sheets.forEach((obj)=>{

           console.log(obj.name);
       })
   })

Here we have passed the object parameter { getSheets: true }.

Then used a forEach loop to loop through the sheets array and access each object’s name property.

This is image title

Accessing worksheet names. Dev, Exec.

Specifying The Sheet From Which To Get Data

To specify the desired worksheet we will pass the {sheet : } object parameter. This has two forms. You can specify by sheet number or the sheet name. To access sheet 2, we could use one of the following,

  • {sheet : 2 }
  • {sheet : ‘Exec’}

Modify you code as follows to access the Exec sheet’s data.

const xlsxFile = require('read-excel-file/node');

xlsxFile('./Data.xlsx', { sheet: 'Exec' }).then((rows) => {
for (i in rows){
        for (j in rows[i]){
            console.log(rows[i][j]);
        }
    }
})

This is image title
Sheet2, Exec, data

Conclusion

The npm package does all the work of reading the file. The important part is how the data is returned. It is returned as a two-dimensional array.

Knowing this gives us insight in to getting the data and working with it. Knowing we can get worksheet names allows us to process entire workbooks if needed.

Thank you for reading!

Comments

Popular posts from this blog

How to use Ngx-Charts in Angular ?

Charts helps us to visualize large amount of data in an easy to understand and interactive way. This helps businesses to grow more by taking important decisions from the data. For example, e-commerce can have charts or reports for product sales, with various categories like product type, year, etc. In angular, we have various charting libraries to create charts.  Ngx-charts  is one of them. Check out the list of  best angular chart libraries .  In this article, we will see data visualization with ngx-charts and how to use ngx-charts in angular application ? We will see, How to install ngx-charts in angular ? Create a vertical bar chart Create a pie chart, advanced pie chart and pie chart grid Introduction ngx-charts  is an open-source and declarative charting framework for angular2+. It is maintained by  Swimlane . It is using Angular to render and animate the SVG elements with all of its binding and speed goodness and uses d3 for the excellent math functio...

Understand Angular’s forRoot and forChild

  forRoot   /   forChild   is a pattern for singleton services that most of us know from routing. Routing is actually the main use case for it and as it is not commonly used outside of it, I wouldn’t be surprised if most Angular developers haven’t given it a second thought. However, as the official Angular documentation puts it: “Understanding how  forRoot()  works to make sure a service is a singleton will inform your development at a deeper level.” So let’s go. Providers & Injectors Angular comes with a dependency injection (DI) mechanism. When a component depends on a service, you don’t manually create an instance of the service. You  inject  the service and the dependency injection system takes care of providing an instance. import { Component, OnInit } from '@angular/core'; import { TestService } from 'src/app/services/test.service'; @Component({ selector: 'app-test', templateUrl: './test.component.html', styleUrls: ['./test.compon...

How to solve Puppeteer TimeoutError: Navigation timeout of 30000 ms exceeded

During the automation of multiple tasks on my job and personal projects, i decided to move on  Puppeteer  instead of the old school PhantomJS. One of the most usual problems with pages that contain a lot of content, because of the ads, images etc. is the load time, an exception is thrown (specifically the TimeoutError) after a page takes more than 30000ms (30 seconds) to load totally. To solve this problem, you will have 2 options, either to increase this timeout in the configuration or remove it at all. Personally, i prefer to remove the limit as i know that the pages that i work with will end up loading someday. In this article, i'll explain you briefly 2 ways to bypass this limitation. A. Globally on the tab The option that i prefer, as i browse multiple pages in the same tab, is to remove the timeout limit on the tab that i use to browse. For example, to remove the limit you should add: await page . setDefaultNavigationTimeout ( 0 ) ;  COPY SNIPPET The setDefaultNav...