PhantomJS is a headless, scriptable WebKit with a JavaScript API. It is multiplatform and available on the major operating systems: Windows, Mac OS X, Linux, and other Unices. It has fast and native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG. PhantomJS fully renders pages under the hood, so the results can be exported as images. It is very easy to set up, which makes it a useful approach for most projects that need to generate many browser screenshots (if you're only looking for how to create screenshots, we recommend reading this article instead).
In this article, you will learn how to use PhantomJS with Node.js, either easily through a module or by controlling it yourself with JavaScript.
Requirements
You will need PhantomJS (installed or as a standalone distribution) accessible from the PATH (learn how to add a variable to the PATH in Windows here). If it isn't available in the PATH, you can specify the path to the PhantomJS executable in the configuration later.
You can download PhantomJS for every platform (Windows, Linux, macOS, etc.) from the download area of the official website here.
Note
There's no installation process on most platforms, as you'll get a .zip file with two folders: examples and bin (which contains the PhantomJS executable).
Once you know that PhantomJS is available on your machine, let's get started!
A. Using a module
If you want to use a module to work with PhantomJS in Node.js, you can use the phantom module written by @amir20. This module integrates PhantomJS into Node.js. Although the JavaScript you write with this module isn't the same as the JavaScript you use to instruct PhantomJS directly, it's still easy to understand.
To install the module in your project, execute the following command in the terminal: npm install phantom --save
Once the installation finishes, you will be able to access the module using require("phantom").
The workflow (create the page, then use the page to do other things) remains similar to scripting PhantomJS with plain JavaScript. The page object returned by the createPage method is a proxy that forwards all method calls to PhantomJS. Most method calls should be identical to the PhantomJS API; just remember that each method returns a Promise.
The following script will open the Stack Overflow website and print the HTML of the homepage in the console:
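A minimal sketch of that script, assuming the Promise-based API of the phantom module (phantom.create(), instance.createPage(), page.open(), page.property('content') and instance.exit()). The function name printHtml is hypothetical, and the module is passed in as a parameter only so the function doesn't depend on how it was loaded; in your project you would pass require("phantom").

```javascript
// Sketch: print the HTML of a page with the "phantom" npm module (Promise style).
// `phantom` is the loaded module, e.g. require('phantom'); printHtml is a hypothetical name.
function printHtml(phantom, url) {
  let instance = null; // wrapper around the PhantomJS process
  let page = null;     // proxy to a PhantomJS WebPage

  return phantom.create()
    .then((ph) => {
      instance = ph;
      return instance.createPage();
    })
    .then((p) => {
      page = p;
      return page.open(url); // resolves with the load status, e.g. 'success'
    })
    .then((status) => {
      console.log('Status: ' + status);
      return page.property('content'); // the page's current HTML
    })
    .then((content) => {
      console.log(content);
      // Shut PhantomJS down, then hand the HTML back to the caller
      return Promise.resolve(instance.exit()).then(() => content);
    })
    .catch((error) => {
      console.error(error);
      if (instance) {
        instance.exit();
      }
    });
}
```

With the module installed, you would run it with printHtml(require('phantom'), 'https://stackoverflow.com/');.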
If you're using Node.js v7.6+, you can use the async and await features that this version offers.
They simplify the code significantly and make it much easier to follow than chained Promises.
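The same sketch rewritten with async/await, under the same assumptions about the phantom module's API. printHtmlAsync is a hypothetical name; the try/finally makes sure PhantomJS is shut down even if opening the page fails.

```javascript
// Sketch: the same flow with async/await (unflagged since Node.js 7.6).
// `phantom` is the loaded module, e.g. require('phantom'); printHtmlAsync is hypothetical.
async function printHtmlAsync(phantom, url) {
  const instance = await phantom.create();
  try {
    const page = await instance.createPage();
    const status = await page.open(url);
    console.log('Status: ' + status);

    const content = await page.property('content'); // the page's current HTML
    console.log(content);
    return content;
  } finally {
    // Runs on success and on error, so the PhantomJS process never lingers
    await instance.exit();
  }
}
```

Usage: printHtmlAsync(require('phantom'), 'https://stackoverflow.com/').catch(console.error);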
B. Own implementation
As you probably know, you work with PhantomJS through a JS file containing some instructions; the script is executed by passing its path as the first argument on the command line (phantomjs /path.to/script-to-execute.js). To learn how you can interact with PhantomJS from Node.js, create the following test script (phantom-script.js), which works with PhantomJS on its own. If you want to test it, run the command phantomjs phantom-script.js in a terminal:
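A minimal sketch of such a script, kept in ES5 style because PhantomJS 2.x doesn't understand ES2015 syntax such as arrow functions or let/const. page.open(url, method, data, callback) is the PhantomJS WebPage method for sending data with a request; the target URL (http://httpbin.org/post, a public echo service), the form body and the sendPost helper are hypothetical choices. The check on the global phantom object only lets the helper be loaded outside PhantomJS (for example, in a unit test).

```javascript
// phantom-script.js: run with "phantomjs phantom-script.js", not with node.
// Sends a POST request with a form-encoded body and prints the load status.

// Hypothetical helper: opens `url` with POST and reports the status to `done`.
function sendPost(page, url, postBody, done) {
  page.open(url, 'POST', postBody, function (status) {
    console.log('Status: ' + status);
    done(status);
  });
}

// The global `phantom` object only exists inside PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  // Hypothetical endpoint and body; replace them with your own.
  sendPost(page, 'http://httpbin.org/post', 'user=username&password=password', function () {
    phantom.exit(); // without this, PhantomJS keeps running
  });
}
```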
The previous code simply sends a POST request to a website (obviously, make sure you have internet access while testing it).
Now we are going to use Node.js to spawn a child process. This Node script should execute the following command (the same one used on the command line): phantomjs phantom-script.js
To do it, we are going to require the child_process module (available by default in Node.js) and save its spawn property in a variable. The child_process.spawn() method spawns a new process using the given command (first argument), with the command-line arguments in args (second argument). If omitted, args defaults to an empty array.
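To see the spawn(command, args) shape in isolation before involving PhantomJS, this small sketch runs a command that is guaranteed to exist: the Node.js binary that is executing the file (process.execPath) with the --version flag. It is only an illustration, not part of the final script.

```javascript
const { spawn } = require('child_process');

// First argument: the command. Second argument: its command-line arguments,
// one per array element (not a single space-separated string).
const versionCheck = spawn(process.execPath, ['--version']);

// stdout emits 'data' events with chunks of the child's output
versionCheck.stdout.on('data', (data) => {
  process.stdout.write('node reports: ' + data);
});

// 'close' fires once the process has ended and its stdio streams are closed
versionCheck.on('close', (code) => {
  console.log('process exited with code ' + code);
});
```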
Declare a variable child whose value is the value returned by spawn. In this case, the first argument for spawn should be the path to the PhantomJS executable (just phantomjs if it's in the PATH), and the second argument should be an array with a single element: the path of the script that PhantomJS should run. On the child variable, add a data listener for stdout (standard output) and stderr (standard error output). The callbacks of those listeners receive a Uint8Array (in Node.js, a Buffer, which is a subclass of Uint8Array) that you obviously can't read unless you convert it to a string. To convert the Uint8Array to its string representation, we are going to use the Uint8ArrToString method (included in the script below). It's a very simple way to do it; if your project requires scalability, we recommend reading about other ways to convert this kind of array to a string here. Create a new script (executing-phantom.js) with the following code inside:
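A minimal sketch of executing-phantom.js following the steps above. The body of Uint8ArrToString is a reconstruction (the original helper isn't shown here) using the simple String.fromCharCode approach; the executable name and script path are assumptions to adjust for your setup. An 'error' listener is added so a missing executable is reported instead of crashing Node.

```javascript
// executing-phantom.js: runs phantom-script.js through PhantomJS and prints
// whatever PhantomJS writes to stdout and stderr.
const { spawn } = require('child_process');

// 'phantomjs' if it's on the PATH, otherwise the full path to the executable.
const PHANTOMJS_EXECUTABLE = 'phantomjs';
const SCRIPT_PATH = 'phantom-script.js';

// Simple conversion, one character per byte. Fine for short ASCII output, but very
// large chunks can exceed apply()'s argument limit and multi-byte UTF-8 characters
// are not decoded; Buffer#toString('utf8') handles both cases.
function Uint8ArrToString(myUint8Arr) {
  return String.fromCharCode.apply(null, myUint8Arr);
}

const child = spawn(PHANTOMJS_EXECUTABLE, [SCRIPT_PATH]);

child.stdout.on('data', (data) => {
  console.log('stdout: ' + Uint8ArrToString(data));
});

child.stderr.on('data', (data) => {
  console.log('stderr: ' + Uint8ArrToString(data));
});

// Emitted, for example, when the executable cannot be found (ENOENT)
child.on('error', (err) => {
  console.error('Failed to start PhantomJS: ' + err.message);
});

child.on('close', (code) => {
  console.log('PhantomJS exited with code ' + code);
});
```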
As a final step, execute the previous Node script using: node executing-phantom.js
And in the console you should get the following output:
We personally prefer the self-implemented method for working with PhantomJS, as the learning curve of the module is steep (at least for those who already know how to work with PhantomJS directly through scripts), and its documentation isn't very good.
Happy coding!