PhantomJS is a headless web browser: it loads and renders web pages without a graphical user interface. It’s scriptable with JavaScript. Because no browser window is shown, we can’t watch a script run in a browser. Instead, we run the script from a command-line prompt and read its output there.
With PhantomJS, we can perform the following tasks:
Page automation: PhantomJS provides access to web pages for reading and scraping their data with the DOM API or libraries like jQuery, simplifying HTML DOM tree traversal.
Screen capture: PhantomJS can take screenshots of a web page programmatically, including content drawn with Canvas and SVG. The captured image can be stored in commonly used formats like PNG, JPEG, and PDF. It also allows us to specify the area of the page that needs to be captured.
Headless testing: PhantomJS allows functional testing of user interfaces. Moreover, to make the testing robust, it can also be used with other testing tools like CasperJS, Jasmine, QUnit, Mocha, and WebDriver.
Network monitoring: PhantomJS allows monitoring of network traffic, such as the requests made while a page loads, and exporting it as standard HAR files. It can also be combined with YSlow and Jenkins to automate the analysis of a website’s performance and behavior.
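As a rough illustration of the network monitoring item above, the sketch below records when each resource is requested and when its response finishes, using PhantomJS’s page.onResourceRequested and page.onResourceReceived callbacks. The summarize helper and the http://localhost:3000 URL are hypothetical names chosen for this example, and the output is a simplified timing list, not a full HAR export.

```javascript
// Hypothetical helper: turn recorded request/response pairs into
// simple { url, status, ms } rows. This is NOT a full HAR exporter.
function summarize(resources) {
  var rows = [];
  for (var i = 0; i < resources.length; i++) {
    var r = resources[i];
    // Skip empty slots and requests that never finished.
    if (!r || !r.request || !r.endReply) { continue; }
    rows.push({
      url: r.request.url,
      status: r.endReply.status,
      ms: new Date(r.endReply.time) - new Date(r.request.time)
    });
  }
  return rows;
}

// The PhantomJS-specific part only runs inside PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var resources = [];

  // Called once for every resource the page asks for.
  page.onResourceRequested = function (requestData) {
    resources[requestData.id] = { request: requestData, endReply: null };
  };

  // Called as responses arrive; stage 'end' marks a finished response.
  page.onResourceReceived = function (response) {
    if (response.stage === 'end' && resources[response.id]) {
      resources[response.id].endReply = response;
    }
  };

  // Assumed URL: replace with the address of the page to inspect.
  page.open('http://localhost:3000', function (status) {
    console.log(JSON.stringify(summarize(resources), null, 2));
    phantom.exit();
  });
}
```

Keeping summarize separate from the callbacks means the timing logic can be checked with plain objects, without launching PhantomJS.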
Let’s consider a simple scenario of launching a basic ReactJS application in our local environment and using PhantomJS to retrieve the content of the application’s home page through the terminal.
Follow the steps given below to extract data from a simple React app:
Click the “Run” button in the widget below. It will run a simple React app, and we will be able to see it in the “Output” tab.
Click the “+” icon to open another terminal and execute the following command to fetch the content of the React app’s web page.
cd usercode && phantomjs code.js
Note: Please wait for the React app to be live in the “Output” tab before running the command mentioned above.
var url = '{{EDUCATIVE_LIVE_VM_URL}}';
var page = require('webpage').create();
var fs = require('fs');
page.open(url, function (status) {
  if (status !== 'success') {
    console.log('Unable to load the url!');
    phantom.exit();
  } else {
    window.setTimeout(function () {
      var results = page.evaluate(function() {
        return document.documentElement.innerHTML;
      });
      console.log(results);
      phantom.exit();
    }, 18000);
  }
});
Let’s understand the above code line-by-line:
Line 4: The page.open() function takes two parameters: the URL of the page to open and a callback function to handle the page loading status.
Lines 5–7: If the page loading status is not success, we log an error message indicating that the page couldn’t be loaded and exit the PhantomJS process.
Lines 8–16: If the page loading status is success, we wait 18 seconds with window.setTimeout() to give the React app time to render, and then use page.evaluate() to execute JavaScript code in the context of the web page. We retrieve the inner HTML content of the document’s root element (<html>), log it to the console, and exit the PhantomJS process.
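The fixed 18-second delay in the script above always waits the full time, even if the app renders sooner. One possible alternative, sketched below, polls a condition and continues as soon as it holds. Here, waitFor is a hypothetical helper modeled on a polling pattern often used in PhantomJS scripts, and the root element id and the http://localhost:3000 URL are assumptions about the React app, not details taken from it.

```javascript
// Hypothetical helper: call `check` every `intervalMs` until it returns
// true (then call `onReady`) or `timeoutMs` elapses (then call `onTimeout`).
function waitFor(check, onReady, onTimeout, timeoutMs, intervalMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    if (check()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - start >= timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs);
}

// The PhantomJS-specific part only runs inside PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var url = 'http://localhost:3000'; // assumed URL of the running app

  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('Unable to load the url!');
      phantom.exit(1);
      return;
    }
    waitFor(
      function () {
        // Assumption: the app mounts into an element with id "root".
        return page.evaluate(function () {
          var root = document.getElementById('root');
          return !!root && root.children.length > 0;
        });
      },
      function () {
        console.log(page.evaluate(function () {
          return document.documentElement.innerHTML;
        }));
        phantom.exit();
      },
      function () {
        console.log('Timed out waiting for the app to render.');
        phantom.exit(1);
      },
      18000, // give up after the same 18 seconds as the original script
      250    // check four times per second
    );
  });
}
```

The trade-off is a slightly longer script in exchange for finishing as soon as the content is available, while keeping the same upper bound on waiting time.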
Warning: PhantomJS is no longer actively maintained. Its development was suspended due to a lack of active contributions.
With PhantomJS, we can extract data, capture the screen of web pages, and convert a web page to a PDF. It also provides common browser functionality, such as making HTTP requests with different methods, reloading web pages, navigating to other pages, and clearing or deleting cookies.
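To make the capture and PDF conversion mentioned above more concrete, here is a minimal sketch that saves a page as a PDF and then captures only the top part of it as a PNG, using PhantomJS’s page.viewportSize, page.paperSize, page.clipRect, and page.render(). The topRegion helper, the output file names, and the http://localhost:3000 URL are hypothetical choices for this example.

```javascript
// Hypothetical helper: build a clip rectangle covering the top
// `height` pixels of the viewport, never taller than the viewport.
function topRegion(viewport, height) {
  return {
    top: 0,
    left: 0,
    width: viewport.width,
    height: Math.min(height, viewport.height)
  };
}

// The PhantomJS-specific part only runs inside PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.viewportSize = { width: 1280, height: 800 };

  // Assumed URL: replace with the address of the page to capture.
  page.open('http://localhost:3000', function (status) {
    if (status !== 'success') {
      console.log('Unable to load the url!');
      phantom.exit(1);
      return;
    }

    // Whole page as a PDF; the format is inferred from the extension.
    page.paperSize = { format: 'A4', orientation: 'portrait', margin: '1cm' };
    page.render('page.pdf');

    // Only the top 300 pixels of the viewport as a PNG.
    page.clipRect = topRegion(page.viewportSize, 300);
    page.render('header.png');

    phantom.exit();
  });
}
```

For a page that renders asynchronously, such as a React app, the render calls would likely need to run after a delay or a readiness check rather than immediately in the page.open() callback.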