Microsoft Office formats like .docx, .xlsx, and .pptx are mainstays in content creation for a reason, but that reason isn’t universal accessibility and web distribution.
These are proprietary file formats that:
- Need specific software/plugins to embed in websites and view on browsers,
- Can cause visual inconsistencies between systems. A document that looks perfect on one computer can be misaligned on another depending on OS, screen resolution, and even fonts installed.
- Can bloat file sizes because of how they store embedded metadata, images, and formatting information.
The solution? Converting these documents to PDF. It’s versatile, well-supported, ensures cross-platform compatibility, and provides a consistent viewing experience for everyone.
Libraries like Apryse, with SDKs for desktop, mobile, client and server, are making this easier than ever, offering conversion without MS Office installed or any other external dependencies.
This guide will walk you through the essential steps and best practices for transforming your Office documents into web-ready PDFs using Apryse and Node.js. It assumes you already have a client application capable of uploading a DOCX. The server-side script here accepts and processes the uploaded file, converts it, and can send a PDF back to the client.
Why convert Office documents to PDF?
That part is easy to answer — PDFs are the closest thing to a universal document format.
MS Office files require specific software, but PDFs can be opened on virtually any device — smartphones, tablets, computers, and even e-readers — with free, built-in readers. This means your document reaches more people, regardless of their technology or operating system.
But there are more arguments than just availability.
- PDFs are lightweight compared to Office files, meaning faster load times, reduced memory usage, and just an overall better user experience even in browser environments with limited resources or slow internet speeds.
- PDFs lock in your original design, fonts, layout, and graphics. Whether you created the original DOCX on a Mac or Windows PC, or whether it’s opened on desktop or mobile browsers, the PDF version of it appears identical to every viewer.
- PDFs make it easy for any search engine/document management system you implement on top of them to read and index their content, and so you get better SEO overall. Even scanned documents can be made searchable with OCR.
- PDFs can be extensively tagged and optimized for accessibility, so users with screen readers aren’t excluded from viewing your content. PDFs also support interactive elements — forms, annotations, and multimedia, making the experience more inclusive.
PDFs are just an all-round smarter choice for performance, a11y, and user engagement on the web.
Coding a DOCX-to-PDF Service
Prerequisites
Make sure you’re running Node.js 18+. Then initialize a new Node.js project.
You’ll need to install these:
- @pdftron/pdfnet-node: the core Apryse library for our conversion.
- dotenv: for managing environment variables.
- express: to set up the web server.
- multer: as file upload middleware.
- cors: to handle cross-origin requests.
Use your package manager of choice.
npm install @pdftron/pdfnet-node dotenv express cors multer
Finally, for your free Apryse API key, visit dev.apryse.com.
Put it in a .env file in your project folder. Here’s what mine looks like.
APRYSE_API_KEY = apryse_trial_api_key_here
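A missing or empty key tends to surface later as a confusing initialization error, so it can help to fail fast at startup. Here is a minimal sketch of such a check; requireEnv is a hypothetical helper name, not part of dotenv or the Apryse SDK.

```javascript
// Hypothetical helper: read a required environment variable or fail fast.
// The `env` parameter defaults to process.env and exists only to make the
// function easy to exercise without touching the real environment.
function requireEnv(name, env = process.env) {
  const value = (env[name] || "").trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage, after require("dotenv").config() has run:
// const apiKey = requireEnv("APRYSE_API_KEY");
```

Calling this once at startup means a misconfigured deployment stops immediately instead of failing on the first conversion request.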
Step 1: The Absolute Basics
First of all, a minimal example to show how Apryse makes converting a DOCX file to a PDF dead simple.
const { PDFNet } = require("@pdftron/pdfnet-node");
require("dotenv").config();
async function main() {
// perform the conversion, point to a source DOCX
const pdfdoc = await PDFNet.Convert.officeToPdfWithPath("./my-doc.docx");
// save the result to a PDF
await pdfdoc.save("./my-doc.pdf", PDFNet.SDFDoc.SaveOptions.e_linearized);
// done!
console.log("Saved!");
}
PDFNet.runWithCleanup(main, process.env.APRYSE_API_KEY)
.catch((error) =>
console.error("Apryse library failed to initialize:", error)
)
.then(function () {
PDFNet.shutdown();
});
No optional parameters, no complex setup: just pass the file path of your DOCX file to officeToPdfWithPath. Then, save it to disk. Note that we’re running this with PDFNet.runWithCleanup to handle initialization and cleanup for us.
On a server, though, you won’t have access to local file storage in many cases, especially if you’re deploying to serverless functions. Instead, you’ll typically be accepting a file uploaded from a client browser, and the client will expect you to convert the file and send back a PDF.
So here’s how we’ll use Apryse in that scenario.
Step 2: The real example
In real-world scenarios, that minimal example we started with won’t be enough. Let’s talk about the challenges you’ll face, and explore how Apryse provides the tools to address them.
Scalability for High-Traffic
Whether your webapp is intranet or public-facing, you’ll probably need to handle multiple conversion requests simultaneously. This’ll need efficient resource management and the ability to monitor and cancel tasks.
So, instead of this one-and-done, disk-based approach:
// Step 1: point to a source DOCX and perform the conversion
const pdfdoc = await PDFNet.Convert.officeToPdfWithPath("./my-doc.docx");
We’ll adopt a buffer/memory-based solution (using Filters) to process files entirely in memory. Now we can seamlessly handle uploaded files (e.g., from a Form component in a React app) without relying on temporary disk storage.
// We're now passing the uploaded DOCX file in memory as a binary buffer
async function convertDocxToPdfFromMemory(fileBuffer) {
// Step 1: Create a PDFDoc container for the destination PDF
const pdfdoc = await PDFNet.PDFDoc.create();
// Step 2: Use an in-memory filter for the source DOCX
const memoryFilter = await PDFNet.Filter.createFromMemory(fileBuffer);
// Step 3: Initialize a streaming conversion object
const conversion =
await PDFNet.Convert.streamingPdfConversionWithPdfAndFilter(
pdfdoc, // Target PDFDoc object
memoryFilter // Source memory filter
);
//...
}
Instead of loading the file from disk, we process the uploaded file directly in memory using Filter.createFromMemory.
Fortunately, creating that fileBuffer (that is, the raw byte data of the DOCX file) from our uploaded DOCX is pretty easy thanks to multer, but we’ll get to that in a second.
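Since that buffer comes straight from a client, a cheap sanity check before handing it to the converter can reject obviously wrong uploads early. A DOCX file is a ZIP package, so its first four bytes are the ZIP local file header signature. The sketch below checks only that signature; looksLikeZipContainer is a hypothetical helper name, and passing the check does not prove the file is a valid Word document.

```javascript
// ZIP local file header signature: "PK" followed by 0x03 0x04.
const ZIP_SIGNATURE = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// Hypothetical helper: returns true when the buffer starts with the ZIP
// signature. This is a coarse filter only; it cannot tell a DOCX from any
// other ZIP-based file (XLSX, PPTX, plain .zip archives, and so on).
function looksLikeZipContainer(fileBuffer) {
  return (
    Buffer.isBuffer(fileBuffer) &&
    fileBuffer.length >= ZIP_SIGNATURE.length &&
    fileBuffer.subarray(0, ZIP_SIGNATURE.length).equals(ZIP_SIGNATURE)
  );
}

// Usage in a request handler, before starting a conversion:
// if (!looksLikeZipContainer(req.file.buffer)) { /* respond with 400 */ }
```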
What’s critical here is that by creating a dedicated conversion object for each request, it’s easier to track individual conversion progress (even cancel the task, if needed) and optimize resource allocation across threads — let’s take a look at that next.
Progress Monitoring, Feedback, Error Handling
The minimal version is fast enough for smaller DOCX files with only a few pages, but legal, medical, financial etc. documents are going to be much, much larger. And long-running processes that can result in timeouts will always equal poor user experience if users are left waiting without feedback, or without errors gracefully handled.
Good news: that new streamingPdfConversionWithPdfAndFilter function we used can address exactly that issue.
So instead of this:
// Step 1: convert
const pdfdoc = await PDFNet.Convert.officeToPdfWithPath("./my-doc.docx");
// Step 2: save to PDF
await pdfdoc.save("./my-doc.pdf", PDFNet.SDFDoc.SaveOptions.e_linearized);
We’ll now do this:
// Step 1: create a PDFDoc container for the destination PDF
const pdfdoc = await PDFNet.PDFDoc.create();
// Step 2: convert file buffer to a Filter
const filter = await PDFNet.Filter.createFromMemory(fileBuffer);
console.log("Memory filter created.");
// Step 3: Initialize the streaming conversion object
const conversion = await PDFNet.Convert.streamingPdfConversionWithPdfAndFilter(
pdfdoc, // target PDFDoc object
filter // memory filter created from file buffer
);
console.log("Conversion initialized.");
// Step 4: Convert + monitor conversion progress
while (
(await conversion.getConversionStatus()) ===
PDFNet.DocumentConversion.Result.e_Incomplete
) {
await conversion.convertNextPage(); // process each page incrementally
console.log(
`Progress: ${Math.round((await conversion.getProgress()) * 100)}% - ` +
`${await conversion.getProgressLabel()}`
);
}
// Step 5: Handle success or errors
if (
(await conversion.getConversionStatus()) ===
PDFNet.DocumentConversion.Result.e_Success
) {
console.log("Conversion succeeded!");
const pdfBuffer = await pdfdoc.saveMemoryBuffer(
PDFNet.SDFDoc.SaveOptions.e_linearized
);
console.log("PDF conversion complete!");
return pdfBuffer; // Return or send the buffer as a response
} else {
const errorString = await conversion.getErrorString();
console.error("Conversion failed:", errorString);
throw new Error(`PDF Conversion Error: ${errorString}`); // can integrate with error monitoring tools
}
Using convertNextPage, we process the document incrementally. This will minimize resource spikes, making it far more suitable for handling large files or multiple simultaneous requests.
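Incremental conversion smooths out each individual request, but a busy server may still want to cap how many conversions run at the same time. What follows is a minimal sketch of such a cap in plain JavaScript. createLimiter and maxConcurrent are hypothetical names, nothing here is part of the Apryse SDK, and the right limit depends on your hardware and document sizes.

```javascript
// Hypothetical helper: returns a function that runs async tasks with at most
// `maxConcurrent` of them in flight; extra tasks wait in a FIFO queue.
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next(); // start the next queued task, if any
      });
  };

  return function limit(task) {
    return new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
  };
}

// Usage: wrap each conversion so only two run at once.
// const limit = createLimiter(2);
// const pdfBuffer = await limit(() => convertDocxToPdfFromMemory(fileBuffer));
```

Queued requests still hold their uploaded buffers in memory while they wait, so in practice you would likely pair this with an upload size limit.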
Next, streamingPdfConversionWithPdfAndFilter resolves to a DocumentConversion object which provides:
- **getProgress**(): A float value (e.g., 0.33) indicating completion percentage (33%).
- **getProgressLabel**(): A descriptive label (e.g., “Converting page 2”).
You can send these values as live updates to the client using WebSockets or Server-Sent Events (SSE) to improve user experience.
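For the SSE route, each update is just a small block of text written to the response. The sketch below formats a progress update in the standard event-stream layout (an event line, a data line, then a blank line). formatSseEvent and progressEvent are hypothetical helper names, and the "progress" event name and payload shape are assumptions you can change freely.

```javascript
// Format one Server-Sent Events message: an event name, a JSON payload,
// and the blank line that terminates the message.
function formatSseEvent(eventName, payload) {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Turn the values from getProgress() (0..1) and getProgressLabel() into a
// "progress" event. The payload shape here is an illustrative choice.
function progressEvent(progress, label) {
  return formatSseEvent("progress", {
    percent: Math.round(progress * 100),
    label,
  });
}

// Usage inside an Express handler that streams updates:
// res.set({ "Content-Type": "text/event-stream", "Cache-Control": "no-cache" });
// res.write(progressEvent(await conversion.getProgress(), await conversion.getProgressLabel()));
```

On the browser side, an EventSource listening for "progress" events can parse event.data with JSON.parse and update a progress bar.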
Finally, we now have granular error handling!
- **getErrorString**(): Get a detailed error description for failed conversions.
- **getWarningString**(): Get warnings for potential issues encountered during conversion.
You can either console.log these or seamlessly integrate your existing logging tools/services, send meaningful error responses, or even retry conversions if needed.
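If you do go the retry route, a small generic wrapper is usually enough. Here is a minimal sketch, assuming the failure is transient (for example, a temporary resource shortage). A corrupt or password-protected DOCX will fail the same way every time, so retrying it only wastes work. withRetries, attempts, and delayMs are hypothetical names.

```javascript
// Hypothetical helper: run an async task up to `attempts` times, waiting a
// little longer after each failure (delayMs, then 2 * delayMs, and so on).
async function withRetries(task, { attempts = 3, delayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError; // every attempt failed; surface the most recent error
}

// Usage:
// const pdfBuffer = await withRetries(() => convertDocxToPdfFromMemory(fileBuffer));
```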
Customization
Different use-cases can (and frequently, will) demand specific output — bookmarks, table of contents updates, or structured tags. We can specify a number of options by passing in an options object — an instance of the OfficeToPDFOptions class.
You can tweak a massive number of options for all kinds of MS Office documents this way, not just DOCX, but let’s cover some of the most common ones for our use-case:
// Step 1 : PDFDoc creation here
// Step 2 : Filter creation here
// Step 3 : Setting up conversion options
const options = new PDFNet.Convert.OfficeToPDFOptions();
/* Examples of options */
// set locale for the conversion
options.setLocale("en-US");
// preserve bookmarks in the output PDF
options.setIncludeBookmarks(true);
// preserve inline comments as annotations in the output PDF (default is e_off)
options.setDisplayComments(
PDFNet.Convert.OfficeToPDFOptions.DisplayComments.e_annotations
);
// allow incremental saving (v. useful for large documents!)
options.setIncrementalSave(true);
// password to open your source DOCX if it is encrypted
options.setPassword("your-password");
// and many more.
// Step 4: Initialize the streaming conversion object
const conversion = await PDFNet.Convert.streamingPdfConversionWithPdfAndFilter(
pdfdoc, // Target PDFDoc object
memoryFilter, // Source memory filter
options // Conversion options
);
// ... rest of the code
Preserving Fonts
If your DOCX uses custom or non-standard fonts, they’ll need to be preserved. Now, Apryse’s SDK can detect embedded fonts in your source DOCX and use them automatically, but if they aren’t present, you have two options.
First, if you know which font(s) your DOCX uses, you can self-serve them by pointing to a custom font resource path, URI (ideal) or filesystem (unlikely; but go ahead, if that’s something your production environment can handle).
Gather all font files your DOCX will require, and place them in a folder structure like this:
/your/resource/directory
├── fonts.json
├── YourFont-Regular.ttf
├── YourFont-Bold.ttf
└── other-fonts
You need to create a fonts.json file that’ll tell Apryse how to use the fonts in that folder.
{
"fontList": [
{
"coverage": "U+20-7F,U+A0-370,U+374-376,...",
"ext": ["ttf", "ttf.brotli"],
"family": "YourFont",
"id": "yourfont",
"variants": {
"500": "YourFont-Regular",
"700": "YourFont-Bold"
}
}
// ...other font entries go here (remove this line in a real file; JSON does not allow comments)
]
}
Here, **coverage** needs to be the Unicode ranges supported by the font, **ext** the accepted font file extensions, **family** the logical name of the font family (e.g., Lato), and finally, **variants** the mapping of font styles/weights to actual font files.
💡 Apryse provides a font package for your use here if you need further hints on this structure, or if you’d rather not create your own.
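A typo in fonts.json is easy to miss, so a quick pre-deploy check that every listed variant has a file on disk can save a debugging session. The sketch below assumes each file is named after the variant value plus the first listed extension, which is inferred from the folder layout above and is not a documented Apryse rule. expectedFontFiles and missingFontFiles are hypothetical helper names.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// List the font files a catalogue implies: one per variant, using the first
// listed extension (an assumption based on the example folder layout).
function expectedFontFiles(catalogue) {
  const files = [];
  for (const font of catalogue.fontList || []) {
    const extension = (font.ext || [])[0];
    for (const baseName of Object.values(font.variants || {})) {
      files.push(`${baseName}.${extension}`);
    }
  }
  return files;
}

// Return the expected files that do not exist inside resourceDir.
function missingFontFiles(catalogue, resourceDir) {
  return expectedFontFiles(catalogue).filter(
    (file) => !fs.existsSync(path.join(resourceDir, file))
  );
}

// Usage:
// const catalogue = JSON.parse(fs.readFileSync("/your/resource/directory/fonts.json", "utf8"));
// console.log(missingFontFiles(catalogue, "/your/resource/directory"));
```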
Then, before your conversion code, point to this full resource path like so:
// set the URL pointing to the hosted font resource dir
PDFNet.WebFontDownloader.setCustomWebFontURL("https://your-domain/webfonts/");
// allow PDFNet to access the network to download missing fonts when possible.
PDFNet.WebFontDownloader.enableDownloads();
// if a font isn't embedded in the source DOCX, and you don't provide a font catalogue/resource folder yourself, the Apryse library attempts to find a close match automatically.
const options = new PDFNet.Convert.OfficeToPDFOptions();
// your other options values here
options.setLocale("en-US");
// then, if you're on a platform that needs it…
options.setSmartSubstitutionPluginPath("./font-substitution-path");
// rest of conversion code here
This needs a pdftron_layout_resources.plugin file, which is built into the client/server libraries and used automatically there. (If you’re developing for mobile platforms, you’ll need to set the path to the substitution plugin explicitly in your options object.)
That will give you good-enough results, but for the best conversion quality, you should ideally be embedding fonts in your source DOCX, or a separate resource DOCX that you place in your resource directory.
Step 3: Putting It All Together
Let’s combine everything we’ve covered thus far (scalability, error handling, monitoring, performance) into a finalized version of our server code. It accepts a DOCX file upload from the client with the multer middleware, converts the DOCX to PDF, and sends back a PDF that the user can download.
require("dotenv").config();
const express = require("express");
const multer = require("multer");
const { PDFNet } = require("@pdftron/pdfnet-node");
const cors = require("cors");
const app = express();
const port = 3001;
// Step 0: Multer middleware init
const upload = multer(); // Store file in memory, no temporary disk storage
// Step 0: CORS setup here
// ...
// Conversion function for DOCX to PDF
async function convertDocxToPdfFromMemory(fileBuffer) {
return PDFNet.runWithCleanup(async () => {
// Step 1: Create PDFDoc container for final PDF
const pdfdoc = await PDFNet.PDFDoc.create();
// Step 2: Create a memory filter from the uploaded file buffer
const memoryFilter = await PDFNet.Filter.createFromMemory(fileBuffer);
// Step 3: Customize with options object as needed
const options = new PDFNet.Convert.OfficeToPDFOptions();
// Step 4: Initialize the streaming conversion object
const conversion =
await PDFNet.Convert.streamingPdfConversionWithPdfAndFilter(
pdfdoc, // Target PDFDoc
memoryFilter, // Source Filter (from file buffer)
options // Conversion options
);
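// Step 5: Convert page by page while the conversion is still incomplete
while (
(await conversion.getConversionStatus()) ===
PDFNet.DocumentConversion.Result.e_Incomplete
) {
await conversion.convertNextPage(); // Process each page
console.log(
`Progress: ${Math.round((await conversion.getProgress()) * 100)}% - ${await conversion.getProgressLabel()}`
);
}
// Step 6: On failure, throw instead of looping forever or saving a partial PDF
if (
(await conversion.getConversionStatus()) !==
PDFNet.DocumentConversion.Result.e_Success
) {
throw new Error(`PDF Conversion Error: ${await conversion.getErrorString()}`);
}
console.log("Conversion complete. Saving PDF...");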
const pdfBuffer = await pdfdoc.saveMemoryBuffer(
PDFNet.SDFDoc.SaveOptions.e_linearized
);
console.log("PDF saved to memory buffer.");
return pdfBuffer; // Return the buffer containing the converted PDF
}, process.env.APRYSE_API_KEY); // Ensure API key is passed
}
// Finally: POST route for file upload and conversion
app.post("/convert-to-pdf", upload.single("file"), async (req, res) => {
if (!req.file) {
return res.status(400).json({ error: "No file uploaded." });
}
try {
// Convert DOCX file buffer to PDF buffer
const pdfBuffer = await convertDocxToPdfFromMemory(req.file.buffer);
// Send the converted PDF back with appropriate headers
res.set({
"Content-Type": "application/pdf",
"Content-Disposition": "attachment; filename=converted.pdf",
}); // This'll trigger a PDF file download for the client
res.send(pdfBuffer); // Send the PDF buffer as response
} catch (error) {
console.error("Conversion failed:", error); // hook your logging/monitoring in here
res.status(500).json({ error: "Conversion failed." });
}
});
app.listen(port, () => {
console.log(`Server running at http://localhost:${port}`);
});
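The route above always names the download converted.pdf. If you would rather keep the user's original file name, multer exposes it as req.file.originalname, but it is client-supplied and should be sanitized before it goes into a header. Here is a minimal sketch; pdfFilenameFor is a hypothetical helper, and the allowed character set is a deliberately conservative assumption.

```javascript
const path = require("node:path");

// Hypothetical helper: derive a safe "<name>.pdf" from a client-supplied
// file name. Directory parts and the original extension are dropped, and
// anything outside a small safe character set collapses to underscores.
function pdfFilenameFor(originalName) {
  const name = originalName || "";
  const base = path.basename(name, path.extname(name));
  const safe = base
    .replace(/[^A-Za-z0-9._-]+/g, "_")
    .replace(/^_+|_+$/g, "");
  return `${safe || "converted"}.pdf`;
}

// Usage in the route handler:
// "Content-Disposition": `attachment; filename="${pdfFilenameFor(req.file.originalname)}"`
```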
When we call multer() without specifying a storage configuration:
- It uses in-memory storage by default.
- Uploaded files are stored in memory as a Buffer and made available in req.file.buffer, which, as you’ve seen before, is the raw byte representation of our uploaded DOCX file.
It then acts as middleware for our Express app, processing exactly one file. You could absolutely adapt this to handle multiple DOCX uploads by replacing upload.single('file') with upload.array('file', maxCount), where maxCount is the maximum number of files to be accepted (the uploads then arrive in req.files).
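To exercise the endpoint without a browser, you can post a file from a small Node.js script. The sketch below relies on the fetch, FormData, and Blob globals that ship with Node.js 18+. buildUploadForm and convertViaServer are hypothetical names, and the endpoint URL simply mirrors the port and route used in the server code above.

```javascript
// Build a multipart form whose "file" field matches upload.single("file").
function buildUploadForm(docxBytes, filename) {
  const form = new FormData();
  const blob = new Blob([docxBytes], {
    type: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  });
  form.append("file", blob, filename);
  return form;
}

// Post the DOCX to the conversion route and resolve with the PDF bytes.
async function convertViaServer(
  docxBytes,
  filename,
  endpoint = "http://localhost:3001/convert-to-pdf"
) {
  const response = await fetch(endpoint, {
    method: "POST",
    body: buildUploadForm(docxBytes, filename),
  });
  if (!response.ok) {
    throw new Error(`Conversion request failed with status ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}

// Usage (with the server running):
// const fs = require("node:fs");
// const pdf = await convertViaServer(fs.readFileSync("./my-doc.docx"), "my-doc.docx");
// fs.writeFileSync("./my-doc.pdf", pdf);
```

In a browser, the same form can be built from the File object of an input element, with no Buffer involved.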
Uploading Calibre’s 1.3 MB demo.docx from my client, this server gives me back a 129 KB PDF file that I can download on the client, with all the kitchen-sink formatting in this 8-page DOCX — paragraphs, graphics, dropcaps, footnotes, styled text, fonts, highlights, links, tables, lists, bookmarks, and more — preserved.
I’d call that a win!
Lessons Learnt
PDFs are great for the web, but converting Office documents to PDF can be a pain (not to mention potentially expensive), especially if you’re automating.
With Apryse, we’ve built a solution that not only handles the real-world complexities of converting MS Office DOCX files to web-ready PDFs, but also does so without requiring you to wire up an existing Office installation for jury-rigged conversions on your backend. In fact, it doesn’t require any MS Office license at all, on your end or your users’.
You can check out the Apryse documentation here. As you move forward, this foundation can be extended and adapted to suit your specific needs, whether you’re integrating it into an intranet tool, a public-facing app, or a high-traffic enterprise system. With Apryse’s capabilities, you’re well-equipped to tackle any document conversion challenge.