How to generate PDFs with Node.js and Puppeteer

Puppeteer remains one of the most popular ways to generate PDFs with Node.js because it drives a real Chromium browser. That means your HTML, CSS, fonts, and JavaScript render much more faithfully than older HTML-to-PDF libraries. In this guide, we'll cover a modern Puppeteer setup, how to generate PDFs from both URLs and raw HTML, the most useful PDF options, the common failure cases teams run into, and why browser-based PDF generation becomes harder once it moves into production.

What is Puppeteer?

Puppeteer is the Node.js browser automation library maintained by the Chrome team. It is widely used for testing, screenshots, scraping, and PDF generation. For PDFs, the core flow is simple: open a page, wait for the content to be ready, then call page.pdf() with the right options.

The big advantage is rendering fidelity. If your document already exists as HTML and CSS, Puppeteer lets you use the browser's real layout engine instead of maintaining a separate PDF templating stack.

Install Puppeteer

Install Puppeteer in your Node.js project:

npm install puppeteer

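The examples below use ES module syntax, including top-level await, so they need a .mjs file or "type": "module" in your package.json. If your project still uses CommonJS, replace the import statements with require() and wrap the code in an async function, because top-level await is only available in ES modules.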


Generate a PDF from a URL

If your invoice, report, or dashboard already exists at a live URL, the simplest setup is to load that page and export it as a PDF. The example below writes the result to disk, but page.pdf() also returns the PDF bytes, so in production many teams skip the file and upload the bytes to S3, attach them to an email, or return them from an API route.

import puppeteer from 'puppeteer';
import { writeFile } from 'node:fs/promises';

// Headless is the default in current Puppeteer versions; the old headless: 'new' opt-in is no longer needed.
const browser = await puppeteer.launch();

try {
  const page = await browser.newPage();

  await page.goto('https://example.com/invoice/123', {
    waitUntil: 'networkidle0',
  });

  const pdf = await page.pdf({
    format: 'A4',
    printBackground: true,
    margin: {
      top: '20mm',
      right: '12mm',
      bottom: '20mm',
      left: '12mm',
    },
  });

  await writeFile('invoice.pdf', pdf);
} finally {
  await browser.close();
}

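waitUntil: 'networkidle0' (no network connections for at least 500 ms) is a reasonable default for many server-rendered pages, but it is not magic. If your frontend fetches data after initial load, the network can go quiet before the final render, and a page that keeps polling may never go idle and time out instead. In those cases you need extra waiting logic before calling page.pdf().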
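One common pattern, sketched here under the assumption that you control the frontend, is to have the page raise a flag once its data has rendered and wait for that flag with page.waitForFunction(). The flag name window.__PDF_READY__ and the helpers waitForAppReady and exportWhenReady are hypothetical; use whatever signal fits your app.

```javascript
// Hypothetical readiness contract: the frontend sets `window.__PDF_READY__ = true`
// after its last data fetch has rendered. Pick whatever signal fits your app.
async function waitForAppReady(page, { timeout = 15000 } = {}) {
  // Polls the predicate inside the browser until it is truthy; rejects with a
  // TimeoutError after `timeout` ms, so a stuck page fails instead of hanging.
  await page.waitForFunction(() => window.__PDF_READY__ === true, { timeout });
}

// Navigate, wait for the app's own signal, then export.
async function exportWhenReady(page, url, pdfOptions = {}) {
  await page.goto(url, { waitUntil: 'networkidle0' });
  await waitForAppReady(page);
  return page.pdf({ format: 'A4', printBackground: true, ...pdfOptions });
}
```

If you cannot change the frontend, page.waitForSelector() on an element that only appears once the data is rendered (a totals row, for example) is a similar alternative.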

Generate a PDF from raw HTML

Many apps never expose a public URL for the document. Instead, they render an HTML string with a template engine such as Handlebars or EJS, or with React's server-side rendering, then pass that HTML directly to Puppeteer with page.setContent().

import puppeteer from 'puppeteer';
import { writeFile } from 'node:fs/promises';

const html = `
<!doctype html>
<html>
  <head>
    <meta charset="UTF-8" />
    <style>
      body { font-family: Arial, sans-serif; padding: 40px; }
      h1 { color: #5935dd; }
      .total { margin-top: 24px; font-weight: bold; }
    </style>
  </head>
  <body>
    <h1>Invoice #INV-001</h1>
    <p>Customer: Acme Inc.</p>
    <p class="total">Total: $1,240.00</p>
  </body>
</html>
`;

// Headless is the default in current Puppeteer versions; the old headless: 'new' opt-in is no longer needed.
const browser = await puppeteer.launch();

try {
  const page = await browser.newPage();
  await page.setContent(html, { waitUntil: 'load' });

  const pdf = await page.pdf({
    format: 'A4',
    printBackground: true,
  });

  await writeFile('invoice.pdf', pdf);
} finally {
  await browser.close();
}

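This approach is often easier to secure because it avoids a public route, but you still need to think about external assets like fonts, logos, and images. A page filled with setContent() has no URL of its own (new pages start at about:blank), so relative paths such as /logo.png will not resolve, and private assets still need some way to reach the browser, either through an authenticated absolute URL or by embedding them in the HTML.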
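One straightforward option for small private assets such as a logo is to inline them as data: URIs, so the browser needs no credentials or network access to load them. This is a minimal sketch assuming you already have the file bytes (for example from readFile) and know the MIME type; escapeHtml, toDataUri, and renderInvoiceHtml are hypothetical helpers.

```javascript
// Hypothetical helpers: embed private assets in the HTML as data: URIs so that
// Puppeteer needs no credentials or network access to load them.
const escapeHtml = (value) =>
  String(value).replace(/[&<>"']/g, (ch) => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;',
  })[ch]);

// Accepts a Buffer or Uint8Array, such as the result of fs.promises.readFile().
function toDataUri(bytes, mimeType) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`;
}

function renderInvoiceHtml({ logoBytes, customer, total }) {
  const logoSrc = toDataUri(logoBytes, 'image/png');
  return `<!doctype html>
<html>
  <head><meta charset="UTF-8" /></head>
  <body>
    <img src="${logoSrc}" alt="Company logo" style="height: 48px;" />
    <p>Customer: ${escapeHtml(customer)}</p>
    <p class="total">Total: ${escapeHtml(total)}</p>
  </body>
</html>`;
}
```

Pass the result to page.setContent() as before. Base64 makes each asset roughly a third larger, so this suits logos, icons, and web fonts (as a data: URL in an @font-face rule) better than large images.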

Useful page.pdf() options

Puppeteer exposes a lot of PDF configuration. These are the options most teams actually use:

await page.pdf({
  path: 'output.pdf',
  format: 'A4',
  printBackground: true,
  preferCSSPageSize: true,
  margin: { top: '2cm', bottom: '2cm', left: '1cm', right: '1cm' },
  displayHeaderFooter: true,
  headerTemplate: '<div style="font-size:10px;width:100%;text-align:center;">Invoice header</div>',
  footerTemplate: '<div style="font-size:10px;width:100%;text-align:center;">Page <span class="pageNumber"></span> / <span class="totalPages"></span></div>'
});
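
  • path: optional; when set, Puppeteer also writes the PDF to that file. page.pdf() returns the PDF bytes either way.
  • format: use a standard page size such as A4 or Letter.
  • printBackground: necessary if your design uses background colors or images.
  • preferCSSPageSize: when true, a page size declared with a CSS @page rule takes priority over format.
  • margin: almost always needed for printable business documents.
  • displayHeaderFooter: handy for page numbers and simple document metadata, but limited in styling. The templates can use the special classes pageNumber, totalPages, date, title, and url, and they are drawn inside the top and bottom margins, so those margins must leave room for them.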
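Because header and footer templates are styled separately from the document, a small helper that keeps the footer markup and its margins together can save some trial and error. This is a sketch; buildFooterOptions, fontSizePx, and the label text are hypothetical, and the empty header reflects how Chrome usually behaves rather than a documented guarantee.

```javascript
// Hypothetical helper: page.pdf() options with a styled footer and no header.
// Header/footer templates do not inherit the page's stylesheet, so all styling is inline.
const escapeHtml = (value) =>
  String(value).replace(/[&<>"']/g, (ch) => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;',
  })[ch]);

function buildFooterOptions(label, { fontSizePx = 9 } = {}) {
  const footerTemplate = `
    <div style="width:100%; padding:0 12mm; font-size:${fontSizePx}px; color:#555;
                display:flex; justify-content:space-between;">
      <span>${escapeHtml(label)}</span>
      <span>Page <span class="pageNumber"></span> of <span class="totalPages"></span></span>
    </div>`;
  return {
    format: 'A4',
    printBackground: true,
    displayHeaderFooter: true,
    // An explicit empty header; without it Chrome may print its default header (date and title).
    headerTemplate: '<div></div>',
    footerTemplate,
    // The footer is drawn inside the bottom margin, so the margin must leave room for it.
    margin: { top: '20mm', right: '12mm', bottom: '20mm', left: '12mm' },
  };
}
```

Usage: await page.pdf(buildFooterOptions('Invoice INV-001')). Set an explicit font size in the template, because unstyled header and footer text tends to come out very small.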

Common Puppeteer PDF issues

  • Blank or incomplete PDF: the page loaded, but your application had not finished rendering async data before page.pdf() ran.
  • Missing fonts or images: webfonts, private assets, or third-party resources may still be loading or blocked when the PDF is generated.
  • Authenticated pages exporting the login screen: if the document lives behind auth, you may need cookies, headers, or a login flow before navigating to the page.
  • Header and footer templates look broken: Puppeteer's header/footer system supports only limited HTML/CSS, does not apply the page's own stylesheet, and often surprises teams expecting full-page styling.
  • PDF looks different from the browser view: page.pdf() renders with print media styles, so @media print rules apply (call page.emulateMediaType('screen') first if you want the on-screen styles), and print layout has its own behavior around page breaks, fixed elements, and long tables.
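For the authentication case in the list above, one option is to attach credentials before navigating and then check where the browser actually ended up. This sketch assumes bearer-token auth and a login route under /login; gotoAuthenticated and loginPathPrefix are hypothetical names. Cookie-based apps can set a session cookie through Puppeteer's cookie APIs (such as page.setCookie()) instead.

```javascript
// Hypothetical helper: load a page behind auth and fail loudly if the app
// redirected to its login screen instead of rendering the document.
// Assumes bearer-token auth and a login route under /login; adjust both for your app.
async function gotoAuthenticated(page, url, token, { loginPathPrefix = '/login' } = {}) {
  // Sent with every request this page makes, including third-party subresources,
  // so only use it for documents whose assets come from origins you trust.
  await page.setExtraHTTPHeaders({ Authorization: `Bearer ${token}` });
  await page.goto(url, { waitUntil: 'networkidle0' });

  // page.url() reflects the final URL after any redirects.
  const finalPath = new URL(page.url()).pathname;
  if (finalPath.startsWith(loginPathPrefix)) {
    throw new Error(`Expected ${url} but landed on ${page.url()}; check the credentials`);
  }
}
```

Throwing here turns a silent "PDF of the login page" into a visible, retryable error.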

In practice, the most fragile part is timing. Many teams discover they need a mix of waitUntil, custom selectors, font readiness checks, or explicit delays before the document is stable enough to export.
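For the font and image side of the timing problem, a helper like the one below waits inside the page for document.fonts.ready and for any unfinished img elements, with a cap so nothing can hang forever. waitForAssets, timeoutMs, and settleMs are hypothetical names, and the sketch is meant to run after your own data-readiness wait, not instead of it.

```javascript
// Hypothetical helper: wait for web fonts and images before calling page.pdf().
async function waitForAssets(page, { timeoutMs = 10000, settleMs = 0 } = {}) {
  // This function runs inside the browser page, not in Node.js.
  await page.evaluate(async (capMs) => {
    const pendingImages = Array.from(document.images)
      .filter((img) => !img.complete)
      .map((img) => new Promise((resolve) => {
        // Resolve on error too, so one broken image does not block the export.
        img.addEventListener('load', resolve, { once: true });
        img.addEventListener('error', resolve, { once: true });
      }));
    const ready = Promise.all([document.fonts.ready, ...pendingImages]);
    // Cap the wait: offscreen loading="lazy" images, for example, may never load.
    await Promise.race([ready, new Promise((resolve) => setTimeout(resolve, capMs))]);
  }, timeoutMs);

  // Optional fixed delay as a last resort (e.g. for CSS transitions); keep it short.
  if (settleMs > 0) {
    await new Promise((resolve) => setTimeout(resolve, settleMs));
  }
}
```

A fixed settleMs delay is the least reliable of these signals, which is why it defaults to zero here.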

Why Puppeteer gets hard in production

Puppeteer is excellent for prototypes, internal tools, and low-volume document generation. The difficulty starts when PDF rendering becomes a core application feature. At that point, you are no longer just calling page.pdf(); you are operating a browser fleet.

  • Concurrency: multiple PDF jobs mean multiple Chromium processes, which can consume CPU and RAM very quickly.
  • Queues and retries: failed renders, timeouts, and retry logic become part of your application architecture.
  • Serverless constraints: cold starts, package size, and browser dependencies make Lambda, Vercel, or container deployments more complex.
  • Maintenance: Chrome versions, Linux dependencies, and rendering regressions all become your responsibility.
  • Observability: once jobs run asynchronously, you need logs, alerts, retry policies, and ways to inspect failed renders.
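The first two concerns can be sketched in a few dozen lines: one shared browser, a cap on how many pages render at once, and retries with exponential backoff. This is a minimal sketch, not a production queue; createLimiter, withRetry, and renderPdf are hypothetical names, and packages such as p-limit or a real job queue cover the same ground more completely.

```javascript
// Runs at most `max` tasks at once; extra tasks wait in FIFO order.
function createLimiter(max) {
  let active = 0;
  const waiting = [];
  const next = () => {
    if (active >= max || waiting.length === 0) return;
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next(); // a slot freed up, so start the next queued task
      });
  };
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}

// Retries `fn` with exponential backoff (baseDelayMs, 2x, 4x, ...), then rethrows.
async function withRetry(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// One page per job on a shared browser; the page is always closed afterwards.
async function renderPdf(browser, html) {
  const page = await browser.newPage();
  try {
    await page.setContent(html, { waitUntil: 'load' });
    return await page.pdf({ format: 'A4', printBackground: true });
  } finally {
    await page.close();
  }
}
```

Usage: launch the browser once at startup, create const limit = createLimiter(2), then handle each job with limit(() => withRetry(() => renderPdf(browser, html))). Tune the limit by measuring CPU and memory on your own hardware rather than picking a number up front.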

That doesn't mean Puppeteer is a bad choice. It just means self-hosting browsers becomes an operations problem as soon as your PDF volume or reliability requirements increase.


When a managed PDF API makes more sense

If your goal is to generate PDFs reliably rather than operate Chromium in production, a managed API can be a better fit. The HTML/CSS approach stays the same, but you stop maintaining browser infrastructure, queue workers, and scaling logic yourself.

That is where a service like Doppio fits: you send HTML or a URL, define your PDF options, and let the service handle browser execution, scaling, and document delivery. This is especially attractive for teams building invoices, reports, exports, certificates, or serverless workflows where browser maintenance is pure overhead.

Summary

Puppeteer is still one of the best ways to generate PDFs with Node.js when you want direct control over Chromium and are comfortable operating the browser yourself. It works well for dashboards, invoices, reports, and any HTML-based document. But the real complexity shows up in timing issues, asset loading, authentication, concurrency, and infrastructure maintenance.

If you want to keep the browser-quality rendering model without running Chromium yourself, a managed API can remove most of that operational burden. For a broader tool comparison, see Puppeteer vs Playwright for PDF generation. For modern page styling techniques, see the CSS Paged Media guide.
