Skip to main content

How to use PhantomJS with Node.js

PhantomJS is a headless WebKit scriptable with a JavaScript API multiplatform, available on major operating systems as: Windows, Mac OS X, Linux, and other Unices. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG. PhantomJS is fully rendering pages under the hood, so the results can be exported as images. This is very easy to set up, and so is a useful approach for most projects requiring the generation of many browser screenshots (if you're looking how to create only screenshots we recommend you to read instead this article).

In this article, you will learn how to use PhantomJS with Node.js easily using a module or manipulating it by yourself with Javascript.

Requirements

You will need PhantomJS (installed or a standalone distribution) accesible from the PATH (learn how to add a variable to the PATH in windows here). In case it isn't available in the path, you can specify the executable to PhantomJS in the configuration later.

You can obtain PhantomJS from the following list in every platform (Windows, Linux, MacOS etc) in the download area of the official website here.

Note

there's no installation process in most of the platforms as you'll get .zip file with two folder, examples and bin (which contains the executable of PhantomJS).

Once you know that PhantomJS is available in your machine, let's get started !

A. Using a module

If you want to use a module to work with PhantomJS in Node.js, you can use the phantom module written by @amir20. This module offers integration for PhantomJS in Node.js. Although the workflow with Javascript ain't the same that the Javascript that you use to instruct PhantomJS, it's still easy to understand.

To install the module in your project, execute the following command in the terminal:

npm install phantom --save

Once the installation of the module finishes, you will be able to access the module using require("phantom").

The workflow (of creating the page and then with the page do other things) remains similar to the scripting with plain Javascript in PhantomJS. The page object that is returned with createPage method is a proxy that sends all methods to phantom. Most method calls should be identical to PhantomJS API. You must remember that each method returns a Promise.

The following script will open the Stack Overflow website and will print the html of the homepage in the console:

var phantom = require("phantom");
var _ph, _page, _outObj;

phantom.create().then(function(ph){
    _ph = ph;
    return _ph.createPage();
}).then(function(page){
    _page = page;
    return _page.open('https://stackoverflow.com/');
}).then(function(status){
    console.log(status);
    return _page.property('content')
}).then(function(content){
    console.log(content);
    _page.close();
    _ph.exit();
}).catch(function(e){
   console.log(e); 
});

If you're using Node.js v7+, then you can use the async and await features that this version offers. 

const phantom = require('phantom');

(async function() {
    const instance = await phantom.create();
    const page = await instance.createPage();
    await page.on("onResourceRequested", function(requestData) {
        console.info('Requesting', requestData.url)
    });

    const status = await page.open('https://stackoverflow.com/');
    console.log(status);

    const content = await page.property('content');
    console.log(content);

    await instance.exit();
}());

It simplifies the code significantly and is much easier to understand than with Promises.

B. Own implementation

As you probably (should) know, you work with PhantomJS through a js file with some instructions, then the script is executed providing the path of the script as first argument in the command line (phantomjs /path.to/script-to-execute.js). To learn how you can interact with PhantomJS using Node.js create the following test script (phantom-script.js) that works with PhantomJS perfectly. If you want to test it, use the command phantomjs phantom-script.js in a terminal:

/**
 * phantom-script.js
 */
"use strict";
// Example using HTTP POST operation in PhantomJS
// This website exists and is for test purposes, dont post sensitive information
var page = require('webpage').create(),
    server = 'http://posttestserver.com/post.php?dump',
    data = 'universe=expanding&answer=42';

page.open(server, 'post', data, function (status) {
    if (status !== 'success') {
        console.log('Unable to post!');
    } else {
        console.log(page.content);
    }

    phantom.exit();
});

The previous code should simply create a POST request to a website (check obviously that you have internet access while testing it).

Now we are going to use Node.js to cast a child process, this Node script should execute the following command (the same used in the command line):

phantomjs phantom-script.js

To do it, we are going to require the child_processmodule (available by default in Node.js) and save the spawn property in a variable. The child_process.spawn() method spawns a new process using the given command (as first argument), with command line arguments in args (as second argument). If omitted, args defaults to an empty array.

Declare a variable child that has as value the returned value from the used spawn method. In this case the first argument for spawn should be the path to the executable of phantomjs (only phantomjs if it's in the path) and the second parameter should be an array with a single element, the path of the script that phantom should use. From the child variable add a data listener for the stdout (standard output) and stderr (Standard error output). The callback of those listeners will receive an Uint8Array, that you obviously can't read unless you convert it to string. To convert the Uint8Array to its string representation, we are going to use the Uint8ArrToString method (included in the script below). It's a very simple way to do it, if you require scability in your project, we recommend you to read more ways about how to convert this kind of array to a string here.Create a new script (executing-phantom.js) with the following code inside:

/**
 * executing-phantom.js
 */

var spawn = require('child_process').spawn;
var args = ["./phantom-script.js"];
// In case you want to customize the process, modify the options object
var options = {};

// If phantom is in the path use 'phantomjs', otherwise provide the path to the phantom phantomExecutable
// e.g for windows:
// var phantomExecutable = 'E:\\Programs\\PhantomJS\\bin\\phantomjs.exe';
var phantomExecutable = 'phantomjs';

/**
 * This method converts a Uint8Array to its string representation
 */
function Uint8ArrToString(myUint8Arr){
    return String.fromCharCode.apply(null, myUint8Arr);
};

var child = spawn(phantomExecutable, args, options);

// Receive output of the child process
child.stdout.on('data', function(data) {
    var textData = Uint8ArrToString(data);

    console.log(textData);
});

// Receive error output of the child process
child.stderr.on('data', function(err) {
    var textErr = Uint8ArrToString(err);
    console.log(textErr);
});

// Triggered when the process closes
child.on('close', function(code) {
    console.log('Process closed with status code: ' + code);
});

As final step execute the previous node script using:

node executing-phantom.js

And in the console you should get the following output:

<html><head></head><body>Time: Thu, 09 Feb 17 06:24:55 -0800
Source ip: xx.xxx.xxx.xxx

Headers (Some may be inserted by server)
REQUEST_URI = /post.php?dump
QUERY_STRING = dump
REQUEST_METHOD = POST
GATEWAY_INTERFACE = CGI/1.1
REMOTE_PORT = 57200
REMOTE_ADDR = 93.210.203.47
HTTP_HOST = posttestserver.com
HTTP_ACCEPT_LANGUAGE = de-DE,en,*
HTTP_ACCEPT_ENCODING = gzip, deflate
HTTP_CONNECTION = close
CONTENT_TYPE = application/x-www-form-urlencoded
CONTENT_LENGTH = 28
HTTP_ORIGIN = null
HTTP_USER_AGENT = Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1
HTTP_ACCEPT = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
UNIQUE_ID = WJx7t0BaMGUAAHxjI0EAAAAD
REQUEST_TIME_FLOAT = 1486650295.8916
REQUEST_TIME = 1486650295

Post Params:
key: 'universe' value: 'expanding'
key: 'answer' value: '42'
Empty post body.

Upload contains PUT data:
universe=expanding&amp;answer=42</body></html>

We, personally prefer the self implemented method to work with PhantomJS as the learning curve of the module is steep (at least for those that knows how to work with PhantomJS directly with scripts), besides the documentation ain't so good.

Happy coding !

Comments

Popular posts from this blog

4 Ways to Communicate Across Browser Tabs in Realtime

1. Local Storage Events You might have already used LocalStorage, which is accessible across Tabs within the same application origin. But do you know that it also supports events? You can use this feature to communicate across Browser Tabs, where other Tabs will receive the event once the storage is updated. For example, let’s say in one Tab, we execute the following JavaScript code. window.localStorage.setItem("loggedIn", "true"); The other Tabs which listen to the event will receive it, as shown below. window.addEventListener('storage', (event) => { if (event.storageArea != localStorage) return; if (event.key === 'loggedIn') { // Do something with event.newValue } }); 2. Broadcast Channel API The Broadcast Channel API allows communication between Tabs, Windows, Frames, Iframes, and  Web Workers . One Tab can create and post to a channel as follows. const channel = new BroadcastChannel('app-data'); channel.postMessage(data); And oth...

Certbot SSL configuration in ubuntu

  Introduction Let’s Encrypt is a Certificate Authority (CA) that provides an easy way to obtain and install free  TLS/SSL certificates , thereby enabling encrypted HTTPS on web servers. It simplifies the process by providing a software client, Certbot, that attempts to automate most (if not all) of the required steps. Currently, the entire process of obtaining and installing a certificate is fully automated on both Apache and Nginx. In this tutorial, you will use Certbot to obtain a free SSL certificate for Apache on Ubuntu 18.04 and set up your certificate to renew automatically. This tutorial will use a separate Apache virtual host file instead of the default configuration file.  We recommend  creating new Apache virtual host files for each domain because it helps to avoid common mistakes and maintains the default files as a fallback configuration. Prerequisites To follow this tutorial, you will need: One Ubuntu 18.04 server set up by following this  initial ...

Working with Node.js streams

  Introduction Streams are one of the major features that most Node.js applications rely on, especially when handling HTTP requests, reading/writing files, and making socket communications. Streams are very predictable since we can always expect data, error, and end events when using streams. This article will teach Node developers how to use streams to efficiently handle large amounts of data. This is a typical real-world challenge faced by Node developers when they have to deal with a large data source, and it may not be feasible to process this data all at once. This article will cover the following topics: Types of streams When to adopt Node.js streams Batching Composing streams in Node.js Transforming data with transform streams Piping streams Error handling Node.js streams Types of streams The following are four main types of streams in Node.js: Readable streams: The readable stream is responsible for reading data from a source file Writable streams: The writable stream is re...