How to generate PDFs with Node.js

  1. PhantomJS
  2. Puppeteer
  3. Html-pdf-chrome

What would we do without PDFs? They are a dependable way to store visual information with its layout intact, and they open on every major platform in the world. Dynamically creating a printer-friendly version of a webpage, a user’s invoice, or event tickets all require producing a fixed document that a user can download, email, or print. With such practical functionality, PDFs are likely to stay in use for the foreseeable future, for both end users and web application developers.

Implementing a PDF generator is surprisingly difficult. There’s no standardized tool for it, and the operation is more complex than one would assume. As much as we’d love to tell you the definitive method for generating PDFs, there simply isn’t one.

If you lack the time or patience to do it yourself, there are paid services out there, PDF Crowd and PDF Generator among them. All are hassle-free; PDF Crowd is the only one without a free option, and PDF Generator doesn’t have any paid options between free and $59/month.

On the other hand, if you’d rather implement a solution yourself, the best option is to drive a headless browser from Node.js. A headless browser is simply a browser without a GUI: it runs as a background process and loads and renders pages just as a normal browser does. There are several headless browser tools in the JavaScript ecosystem that can generate PDFs.

PhantomJS

One headless browser that’s been around for some time is PhantomJS. It’s an open-source headless browser that you script with JavaScript, and it has been in use since 2011. Unfortunately, as of 2018, it is no longer in active development, so it may not continue to be a viable option for much longer. However, its syntax is simple to learn even for beginners, so for now it remains a staple among headless browsers.

The easiest way to create a PDF using PhantomJS is the render method. This function allows the headless browser to create a file out of any webpage; not just PDFs, but JPEGs, PNGs, and more. The method for doing this is incredibly simple.

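// Run this script with the phantomjs binary, not with node.
var webPage = require('webpage');
var page = webPage.create();

page.viewportSize = { width: 1920, height: 1080 };
page.open('http://www.google.com', function start(status) {
  // Only render if the page actually loaded.
  if (status === 'success') {
    page.render('google_home.pdf', { format: 'pdf', quality: '100' });
  }
  phantom.exit();
});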

In this example, PhantomJS opens the Google homepage asynchronously in a headless browser process, and then renders the entire page as a PDF once it has finished loading. If you need to give your users a printer-friendly PDF of your page, you could copy and paste this code with minimal alteration. If you need to render something else, like an invoice or a receipt, you could fill in a string template and load the result as a web page with PhantomJS in the same manner.
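To make the string-template idea concrete, here is a minimal sketch in plain JavaScript. The invoice shape (`number`, `customer`, `items`) and the function names `escapeHtml` and `renderInvoiceHtml` are assumptions made up for illustration; they are not part of PhantomJS or any other library mentioned here.

```javascript
// Escape characters that have special meaning in HTML so that
// user-supplied values cannot break the markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build a complete HTML document from a hypothetical invoice object:
// { number, customer, items: [{ description, amount }] }
function renderInvoiceHtml(invoice) {
  const rows = invoice.items
    .map((item) =>
      `<tr><td>${escapeHtml(item.description)}</td><td>${item.amount.toFixed(2)}</td></tr>`)
    .join('\n');
  const total = invoice.items.reduce((sum, item) => sum + item.amount, 0);

  return `<!DOCTYPE html>
<html>
  <head><meta charset="utf-8"><title>Invoice ${escapeHtml(invoice.number)}</title></head>
  <body>
    <h1>Invoice ${escapeHtml(invoice.number)}</h1>
    <p>Billed to: ${escapeHtml(invoice.customer)}</p>
    <table>
${rows}
    </table>
    <p>Total: ${total.toFixed(2)}</p>
  </body>
</html>`;
}
```

The resulting string can then be handed to the headless browser instead of a URL: in PhantomJS by assigning it to `page.content`, and in Puppeteer by passing it to `page.setContent()`.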

// Puppeteer: render a local HTML file to a single-page A4 PDF.
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({ width: 1440, height: 900, deviceScaleFactor: 2 });
  await page.goto("file:///practice/resumeapp/resume.html", {
    waitUntil: "networkidle2"
  });
  await page.pdf({
    path: "resume.pdf",
    pageRanges: "1",        // first page only
    format: "A4",
    printBackground: true   // include CSS backgrounds
  });

  await browser.close();
})();
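One thing the script above does not handle is failure: if any step throws, `browser.close()` is never reached and the headless browser process keeps running. A minimal sketch of one way to guard against that is a small wrapper with `try`/`finally`. The names `withPage` and `launchBrowser` are hypothetical helpers for illustration, not part of Puppeteer.

```javascript
// Run `work(page)` against a fresh page and always close the browser,
// even when `work` throws. `launchBrowser` is any function that resolves
// to a browser object, for example () => puppeteer.launch().
async function withPage(launchBrowser, work) {
  const browser = await launchBrowser();
  try {
    const page = await browser.newPage();
    return await work(page);
  } finally {
    await browser.close();
  }
}

// Usage with Puppeteer (requires `npm install puppeteer`):
//
//   const puppeteer = require('puppeteer');
//   withPage(() => puppeteer.launch(), async (page) => {
//     await page.goto('https://example.com', { waitUntil: 'networkidle2' });
//     await page.pdf({ path: 'example.pdf', format: 'A4' });
//   }).catch(console.error);
```

Passing the launcher in as a function keeps the wrapper independent of how the browser is started, which also makes it easy to exercise with a stub.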

Puppeteer

Perhaps the most common headless browser option for Node is Puppeteer, a Node library maintained by Google’s Chrome team that drives headless Chrome or Chromium. Thanks to that Google backing, Puppeteer is extremely powerful and can, in theory, do almost anything that Chrome can do. Its advantages over Phantom are that it’s better documented, better supported, and more flexible. Creating PDFs with it is a bit more difficult for a newcomer, but there is plenty of information on how to use it on its GitHub page.

Puppeteer’s method for creating a PDF out of a webpage is much the same as Phantom’s: it makes a series of asynchronous calls against a headless browser instance. Here is the Google homepage example from the PhantomJS section, done with Puppeteer.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://google.com', {waitUntil: 'networkidle2'});
  await page.pdf({path: 'hn.pdf', format: 'A4'});
  await browser.close();
})();

As you can see, it’s nearly the same as the PhantomJS code, but a bit more complicated. The advantage of Puppeteer is its versatility: the Puppeteer API documentation lists a wide range of options for PDF generation, covering paper format, margins, page ranges, headers and footers, and more.

[Image: the PDF produced by the code above, compared with the original page]
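As a rough illustration of those options, the sketch below collects a few of them into one object and passes it to `page.pdf()`. The helper names `buildPdfOptions` and `savePdf`, and the specific margins and footer markup, are assumptions chosen for the example; the option keys themselves (`margin`, `landscape`, `printBackground`, `displayHeaderFooter`, `headerTemplate`, `footerTemplate`) come from Puppeteer’s `page.pdf()` options.

```javascript
// Build an options object for Puppeteer's page.pdf().
// The margins and footer markup are illustrative values.
function buildPdfOptions(outputPath) {
  return {
    path: outputPath,
    format: 'A4',
    landscape: false,
    printBackground: true, // include CSS backgrounds
    margin: { top: '20mm', right: '15mm', bottom: '20mm', left: '15mm' },
    displayHeaderFooter: true,
    headerTemplate: '<span></span>', // empty header
    // Puppeteer fills in elements with the classes pageNumber and totalPages.
    footerTemplate:
      '<div style="font-size:8px; width:100%; text-align:center;">' +
      'Page <span class="pageNumber"></span> of <span class="totalPages"></span>' +
      '</div>',
  };
}

// Save `page` (a Puppeteer Page, or any object with a compatible
// pdf method) to `outputPath` using the options above.
async function savePdf(page, outputPath) {
  return page.pdf(buildPdfOptions(outputPath));
}
```

In a script like the one above, `await savePdf(page, 'google_home.pdf')` would take the place of the direct `page.pdf()` call.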

Html-pdf-chrome

A third option for generating PDFs from HTML in Node is a library called html-pdf-chrome. This isn’t a separate headless browser, but a library that uses Chrome’s built-in headless mode for the sole purpose of generating PDFs. Of the conversion methods we’ve seen, this is the simplest to use. The only limitation is that you must have a recent version of Chrome or Chromium installed on your machine. Like Puppeteer and Phantom, it can convert an external page to a PDF, or load a page from a template to make something more customized. Here is an example, written in TypeScript, of how to use html-pdf-chrome to convert an external page into a PDF.

import * as htmlPdf from 'html-pdf-chrome';
const options: htmlPdf.CreateOptions = {
  port: 9222, // po
};
const url = 'https://github.com/westy92/html-pdf-chrome';
const pdf = await htmlPdf.create(url, options);

As you can see, it’s very simple to use. Unfortunately, it is not as well-documented as Puppeteer and doesn’t seem to have as many options as Puppeteer or Phantom. Html-pdf-chrome is best for simpler jobs, perhaps, while Puppeteer may be a better option for more complex tasks like creating invoices.

These are just a few of the possible solutions available right now. As long as people need PDFs from the web, the field will keep evolving, and new tools will be created. Perhaps one will become the industry standard. In the meantime, one of these options can provide you with a good place to start. If you have found other options for generating PDFs with Node, please let us know! We’re always looking for new solutions.
