PDF Rendering Service

Technologies Used

MongoDB, Redis, Chromium (via puppeteer), Handlebars.js, npm, socket.io

Goal

Generating and mailing letters is a common task for every business, and my current employer is no exception. Many letters are created using Mail Merge or a third-party application – processes that are inaccessible to the Node.js services we develop.

The first service to require PDF rendering was one that created monthly statements for mortgage borrowers. The original implementation used the now-deprecated PhantomJS headless browser and was baked into the single-threaded service. Additionally, there was only one template for every statement.

As more services needed to generate PDFs, and as the business defined additional borrower statement templates, it became clear that an external, general-purpose service was needed. I identified several requirements for this improved service:

  • Rendering must be handled by worker processes, allowing the service to scale
  • PhantomJS must be replaced by Chromium (via puppeteer)
  • Templates must be self-contained and follow a directory structure convention so they can be uniformly stored and used by the service
  • Templates must continue to use Handlebars and support LESS compilation
  • External assets, such as images and stylesheets, must be inlined with the Handlebars markup so rendered HTML contains all required data. This allows PDFs to be rendered immediately on page load.
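
The inlining requirement can be illustrated with a minimal sketch. It assumes images are referenced as <img src="..."> with relative paths and swaps each reference for a base64 data URI. The function name inlineImages, the MIME table, and the injected readAsset callback are hypothetical and are not taken from cli-pdfrender.

```javascript
const path = require('path');

// Hypothetical MIME lookup covering a few common image types.
const MIME_TYPES = {
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.gif': 'image/gif',
  '.svg': 'image/svg+xml',
};

// Replace relative <img src="..."> references with base64 data URIs.
// readAsset(relativePath) must return a Buffer; in practice it could
// wrap fs.readFileSync rooted at the template directory.
function inlineImages(html, readAsset) {
  return html.replace(/(<img\b[^>]*\bsrc=")([^"]+)(")/g, (match, before, src, after) => {
    // Leave absolute URLs and already-inlined data URIs untouched.
    if (/^(?:[a-z]+:)?\/\//i.test(src) || src.startsWith('data:')) return match;
    const mime = MIME_TYPES[path.extname(src).toLowerCase()];
    if (!mime) return match; // unknown type: keep the original reference
    const base64 = readAsset(src).toString('base64');
    return `${before}data:${mime};base64,${base64}${after}`;
  });
}
```

Stylesheets could be handled the same way by replacing each <link> tag with a <style> block containing the compiled CSS.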

Approach

Five repositories constitute the overall PDF Rendering ecosystem:

Repository              Description
svc-pdfrender           Master process. Presents REST API and websocket interface for render status updates. Submits render jobs to workers via a Redis queue.
svc-pdfrender-worker    Worker process. Receives jobs from master via render job queue, emits status updates, and saves final results to GridFS.
common-pdfrender        Contains code shared between Master and Worker processes, including methods for working with template and render files.
cli-pdfrender           Development tool for creating and publishing templates.
lib-pdfrender           Abstraction library for services to interact with svc-pdfrender.

[Figure: svc-pdfrender-diagram]

An architectural diagram showing the relationships between each PDFRender repository.

Template Development

A final, working template is one that is tarballed so it can be stored in MongoDB’s GridFS and used by the worker processes. Templates are written using the Handlebars templating engine, which supports partials and custom helper functions. In order for the worker processes to properly handle partials and helpers, the template must conform to a specific directory structure.

.env            Defines environment variables for setting the URI of the rendering service and local dev port.
assets          Contains non-HTML assets, such as images and stylesheets, which are referenced relatively in the template and inlined by cli-pdfrender.
partials        Handlebars partials live in this directory, and partial names are registered using file names.
data.json       Sample data to feed into the template for testing purposes.
helpers.js      Node module that must export a function that receives hbs as its only argument. This function may require external node modules and is used to register helper functions.
index.hbs       Entrypoint for the Handlebars template.
package.json    npm-style manifest for the template. In addition to defining a template name and leveraging semantic versioning, this object defines a time-to-live (TTL) for the template so non-published templates are auto-purged, along with npm dependencies required by the helpers.js function.
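
As an illustration of the helpers.js contract described above, here is a minimal sketch of a module that receives the Handlebars instance and registers helpers on it. The helper names currency and upper are hypothetical examples, not helpers from the actual templates.

```javascript
// helpers.js (illustrative): exports a function that receives the
// Handlebars instance as its only argument and registers helpers on it.
function registerHelpers(hbs) {
  // Hypothetical helper: format a number as US dollars, e.g. {{currency balance}}.
  hbs.registerHelper('currency', (value) =>
    Number(value).toLocaleString('en-US', { style: 'currency', currency: 'USD' })
  );

  // Hypothetical helper: upper-case a string, e.g. {{upper borrower.name}}.
  hbs.registerHelper('upper', (value) => String(value).toUpperCase());
}

module.exports = registerHelpers;
```

A helper that depends on an external module would require it inside this file and list it under dependencies in the template's package.json.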

While developing a template, cli-pdfrender can be used to watch the template directory and render changes as they are applied. Once a template is uploaded and renders successfully, it is stored in GridFS and a websocket event is broadcast with the template_id and render_id. In development mode, cli-pdfrender launches a local Chrome instance and uses an iframe to display the rendered template once a render event is received via websocket. The CLI can also publish a template by setting its TTL to 0.

Template Usage

Dependent services use lib-pdfrender to interact with svc-pdfrender. This library abstracts REST calls and websocket events so the user only needs to specify a renderer URI and template name when constructing a PDFRender object, listen for render and error events, and call the render method with the data to be rendered.

// instantiation
const letter555Renderer = new PDFRender('https://renderer.example.com', 'letter_555');

// listen for completed events
letter555Renderer.on('completed', (renderJob) => {
    // renderJob contains the URIs where you can download the html, pdf and metadata
    console.dir(renderJob.uris);
    // do what you'd like with the uris to the files.
    // the renderJob class also contains a convenience method for downloading the pdf to a custom directory
    // (outputDir is a directory path defined by the caller)
    return renderJob.download(outputDir).then((outPath) => console.log(`pdf downloaded to ${outPath}`));
});

letter555Renderer.on('error', (err) => {
    // handle error
    console.error(err);
});

// initialize to have the templateID set up
letter555Renderer.init().then(() => {
    // ready to render
}).catch((err) => {
    // init errors will also be catchable here
});

// one data object per document to render
const data = [{}, {}];

data.forEach((d) => {
    // render jobs will be queued until the websocket connection to the renderer is established
    return letter555Renderer.render(d).then((job) => {
        // listen for job-specific events here
    });
});

On construction, the PDFRender instance uses the provided template name to look up the template_id of the template. Optionally, a specific template version may be provided. If there is a conflicting version number for any reason, an incremental build number is used.

To submit a render job, the render method issues a POST request using the template_id and sets the request body to a JSON object which is used to populate the template. The service immediately returns a render_id, and the library listens via websocket for render and error events related to that render_id. Those events are relayed through JavaScript events for consumption by the dependent service, which is expected to download the rendered PDF and/or HTML immediately. Render files are auto-purged 10 minutes after creation.
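
The submit-and-relay flow described above can be sketched roughly as follows. This is not the lib-pdfrender implementation: the class name RenderClient, the request path, and the injected postJson and socket objects are all assumptions, chosen so the example does not depend on a particular HTTP or websocket library.

```javascript
const { EventEmitter } = require('events');

// Sketch of a client that submits jobs and relays websocket events.
class RenderClient extends EventEmitter {
  constructor({ templateId, postJson, socket }) {
    super();
    this.templateId = templateId;
    this.postJson = postJson;  // async (path, body) => parsed JSON response
    this.pending = new Set();  // render_ids this client is waiting on

    // Relay only the events that belong to jobs submitted by this client.
    socket.on('render', (msg) => this.relay('render', msg));
    socket.on('error', (msg) => this.relay('error', msg));
  }

  relay(eventName, msg) {
    // Ignore events for other clients' jobs or jobs already handled.
    if (!this.pending.delete(msg.render_id)) return;
    // Note: emitting 'error' throws unless the caller attached a listener.
    this.emit(eventName, msg);
  }

  // POST the template data; the service answers with a render_id right away.
  async render(data) {
    // Hypothetical route; the real REST path is not given in this article.
    const { render_id } = await this.postJson(`/templates/${this.templateId}/render`, data);
    this.pending.add(render_id);
    return render_id;
  }
}
```

Tracking pending render_ids in a set keeps one client from reacting to events broadcast for another service's jobs.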

Worker Processes

While the CLI tool inlines assets into the Handlebars files, partials and helpers still need to be registered by the worker at render time. Registering partials is straightforward: read each partial and call the registration method, passing in the filename without its .hbs extension.
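
A minimal sketch of that registration step, assuming the partials directory is flat and hbs exposes Handlebars' registerPartial(name, source) method. The function name registerPartials is illustrative rather than the worker's actual code.

```javascript
const fs = require('fs');
const path = require('path');

// Register every .hbs file in partialsDir as a partial named after the file.
// `hbs` is a Handlebars instance (or anything exposing registerPartial).
function registerPartials(hbs, partialsDir) {
  const names = [];
  for (const file of fs.readdirSync(partialsDir)) {
    if (path.extname(file) !== '.hbs') continue; // skip non-template files
    const name = path.basename(file, '.hbs');    // "header.hbs" -> "header"
    const source = fs.readFileSync(path.join(partialsDir, file), 'utf8');
    hbs.registerPartial(name, source);
    names.push(name);
  }
  return names; // names that were registered, useful for logging
}
```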

Helpers are more complex as they are registered with the function exported by helpers.js. This module may include npm modules which need to be installed prior to registration. Each worker uses npm programmatically to read dependencies from the template directory and install them to a working directory prior to registration.
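
The dependency step might look something like the sketch below. The original workers call npm programmatically; as a stated substitution, this example shells out to the npm CLI instead, and the function names dependencySpecs and installDependencies are hypothetical.

```javascript
const { execFile } = require('child_process');

// Turn a package.json "dependencies" object into npm install specs,
// e.g. { moment: "^2.29.0" } -> ["moment@^2.29.0"].
function dependencySpecs(pkg) {
  return Object.entries(pkg.dependencies || {}).map(([name, range]) => `${name}@${range}`);
}

// Install the template's dependencies into workDir via the npm CLI.
// Resolves with the installed specs, or immediately with [] when the
// template declares no dependencies.
function installDependencies(pkg, workDir) {
  const specs = dependencySpecs(pkg);
  if (specs.length === 0) return Promise.resolve([]);
  return new Promise((resolve, reject) => {
    execFile('npm', ['install', '--prefix', workDir, '--no-save', ...specs], (err) =>
      (err ? reject(err) : resolve(specs))
    );
  });
}
```

Once installation finishes, the worker can load helpers.js and pass it the Handlebars instance to complete registration.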

Potential Improvements

This service was originally written in just one month, so there are a few points for improvement:

  • cli-pdfrender enforces a specific directory structure but does not provide an init command to start a new template
  • Helper registration runs arbitrary JavaScript. This is a significant security concern, one that was de-prioritized as the service lives on an internal network. Future versions should sandbox execution of helpers.js so malicious code cannot hijack the worker processes.
  • Additional templating engines beyond Handlebars should be supported, such as pug, as well as other style languages like SCSS

About The Author

Kyle Anderson
I'm a media and IT professional and JavaScript developer who worked most recently as an Associate Broadcast IT Engineer (Tier II) for CNN in Atlanta. One of my life-long goals is to help bridge data divides - missing connections between software systems and data stores - promoting inter-system communication and automation. Many of the projects described here reflect this goal in some way or another.