Extract Anything from Any PDF: Inside Foxit’s Advanced Extraction Engine

Basic PDF extraction libraries break on scanned documents, complex tables, and form fields, leaving downstream pipelines starved of clean data. Foxit’s PDF Structural Extraction API combines OCR, layout recognition, and AI parsing to return all twelve PDF element types as structured JSON, ready for RAG, BI, and CRM workflows.
Your PDF extraction pipeline passes unit tests against the sample invoices you built it on. Then production arrives and you’re looking at 47% garbled output on the Q4 contract batch because half those documents are scanned TIFFs wrapped in a PDF envelope, and your extraction library has no concept of what an image-only page actually is.
The failure modes are specific. PyMuPDF’s get_text() returns empty strings on scanned PDFs because it reads content streams directly, and image-only pages carry no text stream. pdfplumber’s table detection merges rows when column widths span non-uniform grids, which is standard in any financial statement that mixes summary and line-item rows on the same page. Embedded images containing meaningful text (stamped signatures, engineering drawing annotations, letterhead logos) get silently dropped. The library extracts coordinates for the XObject reference but does nothing with the raster data inside. Form fields built on non-standard annotation types (AcroForms using widget annotations with custom action streams) lose their values entirely when you serialize to text.
The architectural distinction that creates this problem is the difference between content serialization and semantic extraction. A PDF converter reads a content stream and writes out whatever character sequences it finds in rendering order. An extraction engine understands the spatial relationships between those character sequences: that two columns of text at x=72 and x=320 are parallel body copy, that the row at y=210 belongs to the table starting at y=180, that the text block repeating on every page is a header carrying lower retrieval weight in a RAG index. Output that lacks spatial and semantic classification looks correct on screen but breaks every downstream consumer that depends on structure.
BI dashboards require numbers tied to the right row labels. AI ingestion pipelines require heading hierarchy to chunk accurately. CRMs require form field values extracted from AcroForm widget dictionaries, delivered with field names intact. The delta between what basic extraction libraries return and what those systems can actually consume is where document pipeline engineering hours accumulate.
How Foxit’s PDF Structural Extraction Engine Works Under the Hood
Foxit exposes this capability as the PDF Structural Extraction (Trial) endpoint inside the PDF Services API (POST /pdf-services/api/documents/pdf-structural-extract). Trial status means the schema is versioned at v1.0.7 and may evolve, but the contract is stable enough to build against today, and the endpoint runs against the production base URL at developer-api.foxit.com.
The engine runs three coordinated layers. The OCR layer operates on rasterized page content, recognizing characters from image-based PDFs and scanned documents across 200+ languages. The layout recognition layer applies spatial analysis to identify column boundaries, reading order, table cell boundaries, figure regions, and header/footer zones. The AI-based parsing layer classifies extracted objects semantically, resolving ambiguous blocks (a text run that spans two layout columns, or a figure caption that reads syntactically like a section heading) into typed elements.
All three layers run inside Foxit’s core PDF engine, which powers 700 million+ users across 20+ years of production deployments. That engine has native awareness of PDF internal structures: content streams, XObject dictionaries, AcroForm field trees, and annotation layers. The OCR layer operates on the same internal page representation the rendering engine uses, so it handles annotated PDFs where text overlaps image regions, and form fields where the visual display and stored value diverge.
The same Structural Extraction endpoint is also Step 1 of Foxit’s PDF Translation (Trial) workflow, which signals that the extraction output is structured enough to serve as the backbone of a full rewrite-and-rerender pipeline.
NVIDIA’s July 2025 NeMo Retriever research on PDF extraction showed that specialized OCR-based pipelines outperform general-purpose vision-language models on retrieval recall and throughput for complex elements including tables, charts, and infographics. VLMs produce plausible-looking output on clean documents but degrade on exactly the edge cases (multi-column scans, mixed-content pages, annotated overlays) that a specialized pipeline handles systematically.
The Full Object Map: All 12 Extractable PDF Element Types
The Structural Extraction schema v1.0.7 defines twelve element types in the type enum: title, head, paragraph, table, image, headerFooter, form, hyperlink, footnote, sidebar, annotation, and formula.
The API exposes no per-object filter parameters. The only request body fields are documentId (required) and password (optional, for protected PDFs). The engine extracts the full element graph and returns everything in one asynchronous round-trip. You filter client-side on the returned JSON. The design is correct for the workload because partial extraction would require re-running layout recognition per request, costing more compute than transmitting the full element set in a single ZIP.
The result is a ZIP archive. At minimum it contains StructureInfo.json, whose top-level analyzeResult object holds version, pages, elements, and info. Documents that contain figures or tables also produce additional binary files (image renditions and table renditions) alongside the JSON, referenced from individual elements so the JSON payload stays manageable on large documents.
Each element in the document-wide flat elements array carries its own id, type, content, region (with page and an 8-point boundingBox polygon), and score confidence value. A table element adds its cell grid. A form element adds field data. An image element points to its binary file in the ZIP. Because title, head, and paragraph elements appear in document reading order in the elements array, they chunk cleanly on semantically correct boundaries, which is what a RAG index needs to return complete, coherent passages.
Each type maps directly to a downstream use case: table feeds financial reporting pipelines, form drives automated CRM data entry, image routes to computer vision workflows or document archives, annotation builds compliance audit trails, and head combined with paragraph elements in reading order feeds RAG ingestion.
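Because the engine returns every element type in one response, each downstream route above starts with a client-side filter over the flat elements array. A minimal sketch, using illustrative sample elements rather than real API output:

```python
# Client-side filtering over the flat `elements` array. The sample data below
# is illustrative; real elements also carry region, score, and richer content.
SAMPLE_ELEMENTS = [
    {"id": "t1", "type": "title", "content": {"text": "Q3 Revenue Summary"}},
    {"id": "p1", "type": "paragraph", "content": {"text": "Revenue grew 12% QoQ."}},
    {"id": "tb1", "type": "table", "content": {"body": {"rowCount": 2, "columnCount": 3}}},
    {"id": "hf1", "type": "headerFooter", "content": {"text": "Confidential"}},
]

def filter_elements(elements, wanted_types):
    """Keep only elements whose type is in wanted_types, preserving reading order."""
    return [e for e in elements if e["type"] in wanted_types]

rag_input = filter_elements(SAMPLE_ELEMENTS, {"title", "head", "paragraph"})
print([e["id"] for e in rag_input])  # ['t1', 'p1']
```

Because the array is already in reading order, the filtered subset stays in reading order too, with no re-sorting required.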
API Walkthrough: The Four-Step Async PDF Extraction Flow
There’s no synchronous path. You upload, get a task ID, poll until completion, then download the result ZIP. Every request carries two headers: `client_id` and `client_secret` (lowercase snake_case, as specified in the API spec’s security schemes). Create a free developer account at account.foxit.com/site/sign-up (no credit card required, no sales call); once you’re in, the credentials live under the default application in the Developer Portal. Copy the Client ID and Client Secret pair and treat them like any other API secret. Pass them as named HTTP headers on every request and do not use `Authorization: Bearer`.
The four-step sequence runs as follows:
Step 1: Upload the PDF to `POST /pdf-services/api/documents/upload` as `multipart/form-data` with the file under field name `file`. The 100MB ceiling is enforced with a `413` and error code `MAX_UPLOAD_SIZE_EXCEEDED`. The response body returns `{ "documentId": "doc_abc123" }`.

Step 2: Start extraction with `POST /pdf-services/api/documents/pdf-structural-extract`, passing `{ "documentId": "doc_abc123" }`. Add a `"password"` field for protected PDFs. The response is `202 Accepted` with `{ "taskId": "task_xyz789" }`.

Step 3: Poll `GET /pdf-services/api/tasks/{task-id}`. The `TaskResponse` carries `taskId`, `status`, `progress` (a 0-100 integer), `resultDocumentId`, and an optional `error` object. The `status` enum values are `PENDING`, `IN_PROGRESS`, `COMPLETED`, and `FAILED`. Portal narrative copy occasionally uses “PROCESSING,” but the schema enum value is `IN_PROGRESS`; match your code against the enum. Poll until `COMPLETED` and capture `resultDocumentId`.

Step 4: Download with `GET /pdf-services/api/documents/{resultDocumentId}/download`, which streams the ZIP archive. The optional `filename` query parameter overrides the default filename.
The complete cURL sequence for all four steps:
```shell
# Step 1: Upload
curl -X POST "https://na1.fusion.foxit.com/pdf-services/api/documents/upload" \
  -H "client_id: YOUR_CLIENT_ID" \
  -H "client_secret: YOUR_CLIENT_SECRET" \
  -F "file=@invoice_batch.pdf"
# {"documentId":"doc_abc123"}

# Step 2: Start extraction
curl -X POST "https://na1.fusion.foxit.com/pdf-services/api/documents/pdf-structural-extract" \
  -H "client_id: YOUR_CLIENT_ID" \
  -H "client_secret: YOUR_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"documentId":"doc_abc123"}'
# 202 Accepted: {"taskId":"task_xyz789"}

# Step 3: Poll task status
curl "https://na1.fusion.foxit.com/pdf-services/api/tasks/task_xyz789" \
  -H "client_id: YOUR_CLIENT_ID" \
  -H "client_secret: YOUR_CLIENT_SECRET"
# {"taskId":"task_xyz789","status":"COMPLETED","progress":100,"resultDocumentId":"result_def456"}

# Step 4: Download the result ZIP
curl "https://na1.fusion.foxit.com/pdf-services/api/documents/result_def456/download" \
  -H "client_id: YOUR_CLIENT_ID" \
  -H "client_secret: YOUR_CLIENT_SECRET" \
  -o extraction_result.zip
```

The Python version with a polling loop and ZIP parsing:
```python
import requests, json, time, zipfile

BASE_URL = "https://na1.fusion.foxit.com/pdf-services/api"
HEADERS = {"client_id": "YOUR_CLIENT_ID", "client_secret": "YOUR_CLIENT_SECRET"}

# Step 1: Upload
with open("invoice_batch.pdf", "rb") as f:
    doc_id = requests.post(
        f"{BASE_URL}/documents/upload", headers=HEADERS, files={"file": f}
    ).json()["documentId"]

# Step 2: Start extraction
task_id = requests.post(
    f"{BASE_URL}/documents/pdf-structural-extract",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={"documentId": doc_id},
).json()["taskId"]

# Step 3: Poll until COMPLETED or FAILED
while True:
    task = requests.get(f"{BASE_URL}/tasks/{task_id}", headers=HEADERS).json()
    if task["status"] == "COMPLETED":
        result_doc_id = task["resultDocumentId"]
        break
    if task["status"] == "FAILED":
        raise RuntimeError(f"Extraction failed: {task.get('error')}")
    time.sleep(2)

# Step 4: Download the result ZIP and save it locally for inspection,
# then parse StructureInfo.json from the saved file
response = requests.get(
    f"{BASE_URL}/documents/{result_doc_id}/download", headers=HEADERS
)
with open("advanced-extraction-result.zip", "wb") as f:
    f.write(response.content)

with zipfile.ZipFile("advanced-extraction-result.zip") as zf:
    json_name = next(n for n in zf.namelist() if n.endswith("StructureInfo.json"))
    result = json.loads(zf.read(json_name))["analyzeResult"]

print(f"Schema: {result['version']['schema']}, Elements: {len(result['elements'])}")
```
On a clean run you should see output like Schema: 1.0.7, Elements: 9 for a small invoice batch. You’ll also find a fresh advanced-extraction-result.zip next to your script. That ZIP holds the full API response, including StructureInfo.json and any rendered image or table binaries, so you can inspect everything the engine returned and not just the parsed JSON.
First, set up and activate a Python virtual environment in your project folder. The official venv guide covers the exact commands for macOS, Linux, and Windows.
Once the virtualenv is active, the sample only needs one third-party package. Drop this into a requirements.txt next to your script and install it with pip install -r requirements.txt:
```
requests>=2.31.0
```
If you’re on macOS, use Homebrew Python (brew install python) rather than the system Python from the Xcode command-line tools. The Xcode build is linked against LibreSSL rather than OpenSSL, and that TLS incompatibility alone is enough to make a correct sample fail.
The ZIP contains a StructureInfo.json file whose top-level object wraps everything under analyzeResult. Inside that wrapper you get a version object, a pages array, a flat elements array, and an info block with analysis metadata. Each element carries its own id, type, content, region (with page and an 8-point boundingBox polygon [x1,y1,x2,y2,x3,y3,x4,y4]), and a score confidence value:
```json
{
  "analyzeResult": {
    "version": {
      "schema": "1.0.7",
      "software": "FoxitPDFAnalyzer",
      "model": "idp-analysis"
    },
    "pages": [
      {
        "pageNumber": 1,
        "size": { "width": 612, "height": 792, "unit": "point" },
        "state": "success"
      }
    ],
    "elements": [
      {
        "id": "title1",
        "type": "title",
        "content": {
          "text": "Q3 Revenue Summary",
          "style": {
            "fontName": "Helvetica",
            "fontSize": 24.0,
            "fontWeight": 0,
            "fontItalic": false
          }
        },
        "region": {
          "page": 1,
          "boundingBox": [72, 47, 317, 47, 317, 80, 72, 80]
        },
        "score": 0.76
      }
    ],
    "info": {
      "basicInfo": {
        "softwareVersion": "1.6.0",
        "analyzedPageCount": 1,
        "elementCounts": { "title": 1 }
      },
      "extendedMetadata": {
        "pageCount": 1,
        "isEncrypted": false,
        "hasAcroform": false,
        "language": "en"
      }
    }
  }
}
```

Elements of type table, image, and form carry additional type-specific payload on top of this base shape, and any rendered image or table binary lands as a sibling file inside the ZIP, referenced from the element.
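The 8-point boundingBox polygon reduces to a conventional x/y/width/height rectangle with a few lines of arithmetic. A small sketch, exercised on the title element’s coordinates from the sample response:

```python
def polygon_to_rect(bbox):
    """Convert an 8-point [x1,y1,x2,y2,x3,y3,x4,y4] polygon to (x, y, w, h)."""
    xs, ys = bbox[0::2], bbox[1::2]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))

# The title element's polygon from the sample response:
print(polygon_to_rect([72, 47, 317, 47, 317, 80, 72, 80]))  # (72, 47, 245, 33)
```

Taking mins and maxes rather than assuming corner order keeps the conversion correct even if a polygon arrives rotated or with its points listed in a different order.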
HTTP errors return a standard error envelope:
```json
{ "code": "VALIDATION_ERROR", "message": "documentId is required" }
```

The documented error codes include VALIDATION_ERROR (400), MAX_UPLOAD_SIZE_EXCEEDED (413), DOCUMENT_NOT_FOUND (404), STORAGE_ERROR, and INTERNAL_SERVER_ERROR (500).
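One way to consume that envelope in client code is to promote it to a typed exception; a hedged sketch, where FoxitAPIError is my own naming rather than anything the API defines:

```python
class FoxitAPIError(Exception):
    """Wraps the standard error envelope: {"code": ..., "message": ...}."""
    def __init__(self, code, message, http_status=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.http_status = http_status

def raise_for_error(http_status, body):
    """Raise when a response body carries the documented error envelope."""
    if isinstance(body, dict) and "code" in body:
        raise FoxitAPIError(body["code"], body.get("message", ""), http_status)

try:
    raise_for_error(400, {"code": "VALIDATION_ERROR", "message": "documentId is required"})
except FoxitAPIError as e:
    print(e.code)  # VALIDATION_ERROR
```

Centralizing this in one helper means every call site in the four-step flow gets consistent error surfacing for free.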
Password-protected PDFs that arrive with no password parameter reach the processing stage before failing. That failure surfaces in the task status poll response after status reaches FAILED, so your error handler must inspect the task response body in addition to the HTTP status codes from the initial POST calls:
```json
{
  "taskId": "task_xyz789",
  "status": "FAILED",
  "progress": 0,
  "error": {
    "code": "INTERNAL_SERVER_ERROR",
    "message": "Document is password-protected"
  }
}
```

Wiring Extracted PDF Data Into Your Workflow
Pattern 1: AI/RAG pipeline. Filter the flat elements array to title, head, and paragraph types. Chunk by heading hierarchy, iterating over the array in the order the engine returned it (document reading order is preserved across columns and pages). Embed each chunk and index in Pinecone, pgvector, or your vector store of choice. Correct reading order, as provided by the extraction engine, is the prerequisite for accurate RAG retrieval on multi-column and paginated documents. When chunks split mid-thought because a layout detector merged two columns, retrieval recall drops and answer quality follows.
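The chunking step above can be sketched in a few lines, assuming the element shapes shown earlier (content.text carrying the text of title, head, and paragraph elements); the sample input is mocked:

```python
def chunk_by_heading(elements):
    """Group reading-order text elements into chunks that start at each heading."""
    chunks, current = [], []
    for e in elements:
        if e["type"] in ("title", "head") and current:
            chunks.append("\n".join(current))
            current = []
        if e["type"] in ("title", "head", "paragraph"):
            current.append(e["content"]["text"])
    if current:
        chunks.append("\n".join(current))
    return chunks

sample = [
    {"type": "head", "content": {"text": "1. Overview"}},
    {"type": "paragraph", "content": {"text": "Scope of the agreement."}},
    {"type": "head", "content": {"text": "2. Terms"}},
    {"type": "paragraph", "content": {"text": "Net-30 payment."}},
]
print(chunk_by_heading(sample))
# ['1. Overview\nScope of the agreement.', '2. Terms\nNet-30 payment.']
```

Each chunk then goes to your embedding model as-is; because the boundaries fall on headings, no chunk starts mid-thought.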
Pattern 2: BI reporting. Filter elements by type == "table" client-side, then convert each table’s cell structure into a pandas DataFrame:
```python
import pandas as pd

# `result` is the `analyzeResult` object loaded from StructureInfo.json
tables = [e for e in result["elements"] if e["type"] == "table"]

for i, tbl in enumerate(tables):
    # Cells live at content.body.cells[]. Each cell carries rowIndex,
    # columnIndex, and a nested paragraph whose content.text holds the value.
    body = tbl["content"]["body"]
    grid = [["" for _ in range(body["columnCount"])] for _ in range(body["rowCount"])]
    for cell in body.get("cells", []):
        text = cell.get("paragraph", {}).get("content", {}).get("text", "")
        grid[cell["rowIndex"]][cell["columnIndex"]] = text
    df = pd.DataFrame(grid[1:], columns=grid[0])  # first row as header
    print(f"Table {i}: {df.shape[0]} rows x {df.shape[1]} cols")
    # df.to_gbq("finance.q3_revenue", project_id="your-project")  # BigQuery
    # df.to_sql("q3_revenue", engine)  # Postgres / Snowflake
```

The row and column indices from the extraction schema map directly to DataFrame positions, so you get a correctly structured table with zero manual parsing.
Pattern 3: n8n automation. The four-step flow maps to a chain of HTTP Request nodes in n8n. The first node uploads to POST .../upload and passes documentId through the item. The second sends POST .../pdf-structural-extract and captures taskId. A Loop Over Items construct with an HTTP Request node calling GET .../tasks/{taskId} on a two-second interval checks status until COMPLETED, then routes to the download node. The final HTTP Request node calls GET .../documents/{resultDocumentId}/download, and a Code node using n8n’s binary data helpers unpacks the ZIP and parses the JSON for routing to a Salesforce, HubSpot, Postgres, or Airtable node. The polling requirement makes this a multi-node workflow, but you write zero custom glue code and gain n8n’s built-in error routing and retry handling.
PDF Extraction Tools Compared: Foxit vs. Adobe, Google, Amazon, and Azure
| Tool | Underlying Approach | Ecosystem Lock-in | Handles Scanned PDFs | Pricing Model | Setup Overhead | Status |
|---|---|---|---|---|---|---|
| Foxit Structural Extraction | Proprietary OCR + layout recognition + AI (integrated core engine) | Cloud-agnostic REST API | Yes (dedicated OCR layer) | Subscription, no per-page credits | Low (2 credential headers, 4 REST calls) | Trial (schema v1.0.7) |
| Adobe PDF Extract API | Adobe Sensei ML, reading order + renditions | Adobe Document Services | Yes | Contact sales | Medium (Adobe SDK + ecosystem) | GA |
| Google Document AI | Cloud ML + generative AI, Document Object Model | Google Cloud required | Yes | Per-page pay-as-you-go | Medium-high (GCP + IAM) | GA |
| Amazon Textract | Deep learning OCR, key-value and table extraction | AWS-native | Partial (strong on forms, weaker on complex layouts) | Per-page pay-as-you-go | Medium (AWS + IAM) | GA |
| Azure Document Intelligence | Prebuilt + custom ML models | Azure ecosystem | Yes (prebuilt models) | Per-page + model training costs | High for custom models | GA |
Google Document AI and Azure Document Intelligence win on ecosystem integration if you’re all-in on those clouds. Adobe wins on PDF structural fidelity for workflows already inside the Adobe Document Services ecosystem. Amazon Textract excels on standardized form documents where its pre-trained schema fits the input. These are real advantages, and the comparison is honest only when those contexts are acknowledged.
Foxit’s case is strongest when you need a cloud-agnostic REST API with zero ecosystem dependency, full object coverage across all twelve element types, and enterprise throughput (10 to 10,000+ PDFs/day) with SOC 2, GDPR, and HIPAA compliance built in. The Structural Extraction status is a real trade-off to factor in. The schema at v1.0.7 is callable and stable enough for pipeline integration today, but GA competitors carry a finalized contract. Pin your parser to the version field in the response and you’re insulated from schema evolution.
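That pinning can be a one-line guard at the top of your parser; a sketch, assuming the StructureInfo.json shape shown earlier:

```python
# Schema versions this parser has been validated against.
SUPPORTED_SCHEMAS = {"1.0.7"}

def check_schema(analyze_result):
    """Fail fast if the response schema drifts past what the parser supports."""
    schema = analyze_result.get("version", {}).get("schema")
    if schema not in SUPPORTED_SCHEMAS:
        raise ValueError(f"Unsupported StructureInfo schema: {schema!r}")
    return schema

print(check_schema({"version": {"schema": "1.0.7"}, "elements": []}))  # 1.0.7
```

Failing loudly on an unknown schema is preferable to silently mis-parsing a changed element shape downstream.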
Your First PDF Extraction API Call, Right Now
Go to developer-api.foxit.com, create a free developer account (no credit card required), and copy your Client ID and Client Secret from the default application. Use the built-in API Playground or import the Postman collection from the Developer Portal to run the four-step sequence: upload a real document (an invoice, a multi-page contract, or a scanned form), call pdf-structural-extract with the returned documentId, poll tasks/{taskId} until COMPLETED, then download via documents/{resultDocumentId}/download.
Unzip the result, open StructureInfo.json, and check three things: analyzeResult.version.schema should report 1.0.7, analyzeResult.elements[] should contain at least one table element and one form element if your source document includes those, and the ZIP root should contain the corresponding binary files for any image-type elements. That verification confirms the full extraction pipeline is wired correctly end-to-end.
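Those three checks are scriptable. A sketch that exercises them against a tiny in-memory stand-in for the result ZIP (a real run would open the downloaded archive instead):

```python
import io, json, zipfile

def verify_extraction(zf):
    """Run the three verification checks on an open ZipFile of the result."""
    json_name = next(n for n in zf.namelist() if n.endswith("StructureInfo.json"))
    result = json.loads(zf.read(json_name))["analyzeResult"]
    has_images = any(e["type"] == "image" for e in result["elements"])
    binaries = [n for n in zf.namelist() if not n.endswith(".json")]
    return (
        result["version"]["schema"] == "1.0.7",     # check 1: schema version
        len(result["elements"]) > 0,                # check 2: elements present
        (not has_images) or bool(binaries),         # check 3: image binaries exist
    )

# Build a minimal in-memory stand-in for the result ZIP to exercise the checks.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("StructureInfo.json", json.dumps({"analyzeResult": {
        "version": {"schema": "1.0.7"},
        "elements": [{"type": "paragraph"}],
    }}))
with zipfile.ZipFile(buf) as zf:
    print(verify_extraction(zf))  # (True, True, True)
```

Dropping this into a CI smoke test against a known-good sample PDF catches wiring regressions before they reach production documents.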
The same endpoint pattern scales to enterprise volumes. Increase upload and poll concurrency horizontally and the architecture stays identical, with no schema changes, no infrastructure modifications, and no per-page credit consumption to track.
The engineering gap between what basic extraction libraries return and what downstream systems actually consume is where document pipeline hours accumulate. Structural Extraction closes that gap at the API layer, so the complexity stays in the engine and out of your codebase. Get started at developer-api.foxit.com.
PDF Structural Extraction FAQ
What is PDF structural extraction?
PDF structural extraction is the process of identifying and classifying the semantic elements inside a PDF, such as titles, paragraphs, tables, forms, images, and annotations, rather than just pulling raw text. Foxit’s PDF Structural Extraction API returns twelve distinct element types as structured JSON, preserving spatial relationships, reading order, and table cell grids so downstream systems like RAG pipelines, BI dashboards, and CRMs can consume the data without manual parsing.
Can Foxit's API extract text from scanned PDFs?
Yes. Foxit’s PDF Structural Extraction engine includes a dedicated OCR layer that recognizes characters from image-based and scanned PDFs across 200+ languages. The OCR runs on the same internal page representation as the rendering engine, so it handles edge cases like text overlapping image regions, stamped signatures, and engineering drawing annotations that basic libraries like PyMuPDF silently drop.
How does Foxit's PDF extraction API differ from Adobe, Google Document AI, and Amazon Textract?
Foxit’s API is cloud-agnostic with no ecosystem lock-in, requiring just two credential headers and four REST calls. Adobe PDF Extract requires the Adobe Document Services ecosystem, Google Document AI requires GCP and IAM setup, and Amazon Textract requires AWS infrastructure. Foxit also uses subscription-based pricing without per-page credits, while Google, AWS, and Azure all charge per page.
What PDF elements can Foxit's Structural Extraction API identify?
The API identifies twelve element types: title, head, paragraph, table, image, headerFooter, form, hyperlink, footnote, sidebar, annotation, and formula. Each element returns with its content, an 8-point bounding box polygon, page location, and a confidence score. Tables include full cell grids with row and column indices, forms include field data, and images are extracted as separate binary files inside the result ZIP.
How do I call the Foxit PDF Structural Extraction API?
The API uses a four-step asynchronous flow: upload the PDF via POST /documents/upload to get a documentId, start extraction with POST /documents/pdf-structural-extract, poll GET /tasks/{taskId} every two seconds until status is COMPLETED, then download the result ZIP via GET /documents/{resultDocumentId}/download. Authentication uses two headers, client_id and client_secret, available from the default application in the Foxit Developer Portal.
Is the Foxit PDF Structural Extraction API ready for production use?
The endpoint is currently in Trial status with schema version v1.0.7, meaning the contract is stable but may evolve. It runs on the production base URL at developer-api.foxit.com and is built on Foxit’s core PDF engine, which powers 700 million+ users across 20+ years of deployments. For production pipelines, pin your parser to the version field in the response to insulate against future schema changes.
Foxit MCP Server: Give AI Agents Direct Access to 30+ PDF Tools via Model Context Protocol

Learn how the Foxit MCP Server lets AI agents handle PDF conversion, OCR, merge, signing, and document workflows.
Building a document automation agent with raw REST calls means writing the same boilerplate every time: upload a file, poll for task completion, download the result, handle errors, and manage auth tokens across multiple endpoints. For PDF operations, that loop repeats for every conversion, OCR call, or merge operation in your pipeline. The Foxit PDF API MCP Server collapses those loops into 30+ directly callable tools, with the MCP Server handling upstream REST complexity internally.
This guide covers how the server registers, what it exposes, how Foxit’s eSign and DocGen REST APIs extend the same agent session into signing and document generation workflows, and a concrete four-step workflow you can replicate against your own documents.
MCP Architecture in 90 Seconds
The MCP specification defines three roles. The Host is the LLM runtime (Claude Desktop, VS Code with GitHub Copilot, or Cursor) that manages the conversation and decides when to call tools. The Server is the capability provider, a process that advertises tools over the MCP protocol and executes them against some underlying service. Tools are the individual callable operations each server exposes, defined by a JSON schema the host uses to understand inputs and outputs.
Foxit occupies both sides of this architecture. Foxit PDF Editor ships as an MCP Host, the first PDF application to do so, connecting outward to external MCP servers like Gmail or Salesforce so its AI assistant can reach those services. The Foxit PDF API MCP Server works in the other direction, exposing Foxit’s cloud PDF Services API as 30+ tools for any MCP Host to call.
The MCP Server exposes PDF Services operations: conversion between formats, content extraction, OCR, merge, split, compress, flatten, linearize, compare, watermark, form data import/export, security, and property inspection. Foxit’s eSign API and DocGen API are separate REST services that a single agent session can also reach. The MCP tools handle PDF processing, while direct HTTP calls to eSign and DocGen handle signing and template generation.
Prerequisites and Configuration
You need three things before registering the server:
- A Foxit developer account (free plan at developer-api.foxit.com, no credit card required) to obtain a `client_id` and `client_secret`
- Python 3.11+ with the `uv` package manager (or Node.js 18+ with `pnpm` for the TypeScript version)
- An MCP-compatible host such as VS Code with GitHub Copilot, Claude Desktop, or Cursor
Clone the repo from github.com/foxitsoftware/foxit-pdf-api-mcp-server, then register it in your host’s MCP config. For VS Code with GitHub Copilot, add the following to .vscode/mcp.json:
```json
{
  "servers": {
    "foxit-pdf": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/foxit-pdf-api-mcp-server",
        "run",
        "foxit-pdf-api-mcp-server"
      ],
      "env": {
        "FOXIT_CLOUD_API_HOST": "https://na1.fusion.foxit.com/pdf-services",
        "FOXIT_CLOUD_API_CLIENT_ID": "your_client_id",
        "FOXIT_CLOUD_API_CLIENT_SECRET": "your_client_secret"
      }
    }
  }
}
```

For Claude Desktop, the same three environment variables go into the env block of your claude_desktop_config.json under the mcpServers key, with command and args matching the structure above.
Set FOXIT_CLOUD_API_CLIENT_ID and FOXIT_CLOUD_API_CLIENT_SECRET as environment variables on your system before the host process launches. Passing credentials through prompt context is a security risk your production setup should address. The client_id and client_secret from your developer portal authenticate all MCP tool calls to the PDF Services API. Adding eSign to the same agent session requires its own OAuth2 token exchange (covered in the next section), keeping the two credential scopes isolated.
Restart your MCP host after saving the config. The server advertises all tools to the host on connection, so your agent can inspect available operations before invoking any.
PDF Services MCP Tools: Full Catalog
The 30+ tools organize into seven functional categories. Most tools expect a documentId returned by a prior upload_document call, and return a resultDocumentId you pass to download_document when you want the output locally. The exception is pdf_from_url, which accepts a URL directly.
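That documentId-to-resultDocumentId contract can be sketched as a chain. Here `call_tool` is a hypothetical stand-in for whatever invocation API your MCP client exposes, and the parameter names are illustrative rather than the server’s exact schema:

```python
def convert_via_mcp(call_tool, local_path):
    """Upload -> convert -> download, threading documentId/resultDocumentId."""
    doc_id = call_tool("upload_document", {"filePath": local_path})["documentId"]
    result_id = call_tool("pdf_from_word", {"documentId": doc_id})["resultDocumentId"]
    return call_tool("download_document",
                     {"documentId": result_id, "filePath": local_path + ".pdf"})

# Exercise the chain with a fake tool runner that records call order.
calls = []
def fake_call_tool(name, args):
    calls.append(name)
    return {"documentId": "doc_1", "resultDocumentId": "res_1"}

convert_via_mcp(fake_call_tool, "contract.docx")
print(calls)  # ['upload_document', 'pdf_from_word', 'download_document']
```

In practice the host’s LLM drives this chaining itself from the advertised tool schemas; the sketch just makes the ID-threading contract explicit.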
Document Lifecycle
- `upload_document`: upload a PDF, Office file, image, HTML file, or plain text file; returns a `documentId` for subsequent operations
- `download_document`: retrieve a processed result to a local file path
- `delete_document`: clean up stored files from cloud storage
PDF Creation (file to PDF)
- `pdf_from_word`, `pdf_from_excel`, `pdf_from_ppt`: convert Office documents to PDF
- `pdf_from_text`, `pdf_from_image`, `pdf_from_html`: convert plaintext, image files, or HTML to PDF
- `pdf_from_url`: fetch a live URL and convert the rendered page to PDF
PDF Conversion (PDF to file)
- `pdf_to_word`, `pdf_to_excel`, `pdf_to_ppt`: extract editable Office formats from a PDF
- `pdf_to_text`, `pdf_to_html`, `pdf_to_image`: export text, HTML, or image representations
Manipulation
- `pdf_merge`: combine multiple PDFs into one
- `pdf_split`: split by page ranges, page count, or every page individually
- `pdf_extract`: pull a subset of pages from a PDF
- `pdf_compress`: reduce file size by 30-70% depending on content type
- `pdf_flatten`: convert form fields and annotations to static content (required for compliance archiving workflows)
- `pdf_linearize`: optimize for Fast Web View so browsers can stream PDF pages incrementally
- `pdf_watermark`: apply text or image watermarks with configurable position, opacity, and rotation
- `pdf_manipulate`: rotate, delete, or reorder pages
Analysis
- `pdf_compare`: diff two PDFs and return a color-coded annotation document showing changes
- `pdf_ocr`: convert scanned or image-based PDFs to searchable text with multi-language support
- `pdf_structural_analysis`: extract layouts, tables, images, form fields, metadata, and text as structured JSON
Security and Forms
- `pdf_protect`: add password protection with 128-bit or 256-bit AES encryption and granular permission flags
- `pdf_remove_password`: strip password protection from a document
- `export_pdf_form_data`: extract form field values as JSON
- `import_pdf_form_data`: populate form fields from a JSON payload
Properties
- `get_pdf_properties`: return page count, page dimensions, PDF version, encryption status, digital signature info, embedded files, font inventory, and document metadata
The most-used operation in production document pipelines is pdf_from_word. Your agent uploads a DOCX file, gets back a documentId, then calls pdf_from_word with that ID. The underlying PDF Services API runs the conversion asynchronously, but the MCP Server handles polling internally and delivers the final result directly to your agent.
MCP tool call:
```json
{
  "name": "pdf_from_word",
  "input": {
    "documentId": "doc_abc123"
  }
}
```

MCP tool response:

```json
{
  "success": true,
  "taskId": "task_xyz789",
  "resultDocumentId": "doc_result456",
  "message": "Word document converted to PDF successfully. Download using documentId: doc_result456"
}
```

Pass doc_result456 to download_document to write the output PDF to disk, or feed it directly into another tool call like pdf_structural_analysis or pdf_compress as the next step in a chain.
Extending to eSign: Foxit’s Signing API as a Complementary REST Layer
After PDF processing via MCP tools, your agent can dispatch a document for signature by calling Foxit’s eSign REST API directly. The eSign API lives at https://na1.foxitesign.foxit.com with regional variants for EU (eu1.foxitesign.foxit.com), Canada (na2.foxitesign.foxit.com), and Australia (au1.foxitesign.foxit.com). These are direct HTTP calls from your agent to the eSign endpoints, coordinated alongside MCP tool calls in the same session.
Authentication uses OAuth2 client_credentials. The eSign token exchange is a distinct flow from the PDF Services header auth that backs your MCP tools:
```python
import requests

resp = requests.post(
    "https://na1.foxitesign.foxit.com/api/oauth2/access_token",
    data={
        "client_id": ESIGN_CLIENT_ID,
        "client_secret": ESIGN_CLIENT_SECRET,
        "grant_type": "client_credentials",
        "scope": "read-write",
    },
)
access_token = resp.json()["access_token"]
```

The Foxit eSign API developer guide uses “folder” terminology throughout. The key endpoints in an automated signing flow are:
- `POST /folders/createfolder`: create a signing folder from one or more PDF documents; define signers, subject, and message
- `POST /folders/sendDraftFolder`: dispatch the folder to signers
- `POST /templates/createtemplate`: instantiate a folder from a saved template with pre-placed signature fields
- `GET /folders/getFolderHistory`: retrieve the full activity audit trail for a folder
- Webhook channels for status callbacks: register a callback URL to receive real-time events when signers view, sign, or decline
A createfolder call takes the PDF output from your MCP pipeline, uploaded to eSign’s document storage after download_document retrieves it, and sets up the signing workflow:
POST /api/folders/createfolder
Authorization: Bearer {access_token}
Content-Type: application/json

{
"folderName": "Acme Corp Contract - Q3 2025",
"sendNow": false,
"fileUrls": [
"https://your-storage.example.com/acme_contract_final.pdf"
],
"fileNames": [
"acme_contract_final.pdf"
],
"parties": [
{
"firstName": "John",
"lastName": "Smith",
"emailId": "[email protected]",
"permission": "FILL_FIELDS_AND_SIGN",
"sequence": 1
}
]
}
Set sendNow to false to create a draft folder, then dispatch it with a separate call to /folders/sendDraftFolder. Alternatively, set sendNow to true to create and send in a single call. For files not accessible via URL, use base64FileString instead of fileUrls.
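The draft-then-send flow can be sketched in Python. The dispatch payload below (a single folderId field) is an assumption to confirm against the eSign API reference:

```python
import requests

ESIGN_HOST = "https://na1.foxitesign.foxit.com"

def build_send_draft_request(access_token: str, folder_id: str) -> dict:
    """Assemble the sendDraftFolder request. The payload field name
    ('folderId') is an assumption -- confirm against the eSign docs."""
    return {
        "url": f"{ESIGN_HOST}/api/folders/sendDraftFolder",
        "headers": {"Authorization": f"Bearer {access_token}"},
        "json": {"folderId": folder_id},
    }

def send_draft_folder(access_token: str, folder_id: str) -> dict:
    # Dispatch a previously created draft folder to its signers
    req = build_send_draft_request(access_token, folder_id)
    resp = requests.post(req["url"], headers=req["headers"], json=req["json"])
    resp.raise_for_status()
    return resp.json()
```

Separating request assembly from dispatch keeps the payload shape testable without hitting the live endpoint.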
Foxit’s eSign API ships with HIPAA, eIDAS, ESIGN Act, UETA, 21 CFR Part 11, FERPA, and FINRA compliance built in. Audit trail records carry signer location, IP address, recipient identity, event timestamp, consent confirmation, security level, and complete folder history. For legal defensibility in regulated industries, capture and store these fields in your own data layer, because relying solely on Foxit’s folder history API for compliance record-keeping introduces a single point of failure in your audit chain.
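One way to satisfy that requirement is to pull the history immediately after dispatch and archive the raw JSON in your own storage. This is a sketch, not a Foxit SDK call; the folderId query parameter name is an assumption to verify against the eSign API reference:

```python
import json
import pathlib
import requests

ESIGN_HOST = "https://na1.foxitesign.foxit.com"

def archive_folder_history(access_token: str, folder_id: str,
                           archive_dir: str = "audit_archive") -> pathlib.Path:
    """Fetch the folder's audit trail and persist the raw JSON locally,
    so compliance records don't depend solely on Foxit's API.
    The 'folderId' query parameter name is an assumption."""
    resp = requests.get(
        f"{ESIGN_HOST}/api/folders/getFolderHistory",
        headers={"Authorization": f"Bearer {access_token}"},
        params={"folderId": folder_id},
    )
    resp.raise_for_status()
    out = pathlib.Path(archive_dir)
    out.mkdir(parents=True, exist_ok=True)
    # One file per folder keeps the archive trivially queryable by ID
    path = out / f"{folder_id}_history.json"
    path.write_text(json.dumps(resp.json(), indent=2))
    return path
```

Writing the unmodified response body (rather than selected fields) preserves every attribute Foxit includes, which matters if the defensibility question arrives years later.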
End-to-End Workflow: AI Agent Automates a Sales Contract
Your sales ops agent receives a natural language instruction: “Generate a contract for Acme Corp, $48,000 ARR, send to [email protected] for signature.” The agent handles every step autonomously. Each call is labeled as either an MCP tool invocation or a direct REST call.

Step 1 uses MCP tool calls. The agent calls upload_document with the DOCX contract template, receives documentId: "doc_abc", then calls pdf_from_word. The MCP Server handles the async conversion and returns resultDocumentId: "doc_pdf" once it completes.
Step 2 uses an MCP tool call. The agent calls pdf_structural_analysis with documentId: "doc_pdf". The tool extracts party names, deal terms, and table data as JSON. The agent validates that required fields are present before proceeding.
Step 3 is a direct REST call to the DocGen API. The agent posts to /document-generation/api/GenerateDocumentBase64 with the validated field values merged into the contract template via {{dynamic_tags}} syntax. DocGen returns the finalized PDF with Acme Corp’s name, the $48,000 ARR figure, and correct dates populated.
Step 4 uses direct REST calls to the eSign API. The agent authenticates via OAuth2, uploads the DocGen output to eSign’s document storage, creates a signing folder via /folders/createfolder with [email protected] as the signer, and dispatches it via /folders/sendDraftFolder.
The LLM selects MCP tools for PDF processing and direct HTTP calls for eSign and DocGen because your system prompt specifies the endpoint contract for each step. The agent chains outputs across both call types, with coordination logic living in the prompt rather than in custom orchestration code you maintain separately.
Production Considerations: Error Handling, Rate Limits, and Data Governance
When you call PDF Services through the MCP Server, async polling happens inside the server process. Your agent receives a final resultDocumentId only after the task completes. When you call the raw PDF Services REST API directly, every operation returns a taskId you poll manually. The pattern below applies exponential backoff with a ceiling of 10 seconds per interval and a 30-second total timeout:
import time, requests
API_HOST = "https://na1.fusion.foxit.com/pdf-services"
auth_headers = {
"client_id": "your_client_id",
"client_secret": "your_client_secret"
}
def poll_task(task_id: str, max_wait: int = 30) -> str:
delay = 1
elapsed = 0
while elapsed < max_wait:
resp = requests.get(
f"{API_HOST}/api/tasks/{task_id}",
headers=auth_headers
)
data = resp.json()
if data["status"] == "COMPLETED":
return data["resultDocumentId"]
time.sleep(delay)
elapsed += delay
delay = min(delay * 2, 10)
raise TimeoutError(f"Task {task_id} timed out after {max_wait}s")
The free developer plan at developer-api.foxit.com covers development and testing volumes. Production workloads above the free-tier threshold require a volume plan requested through the Developer Portal.
For data governance, all API traffic runs over TLS 1.2+, and documents at rest use AES-256 encryption. Foxit’s API security documentation covers SOC 2 Type II audit status, HIPAA BAA support, GDPR, CCPA, eIDAS, ESIGN Act, UETA, 21 CFR Part 11, FERPA, and FINRA requirements. Customer data runs in logically segmented environments. For healthcare, legal, or financial services pipelines, confirm your data residency requirements before connecting production document flows, because the eu1, na2, and au1 regional eSign endpoints determine where data is processed.
PDF API MCP Server FAQs
What is the Foxit PDF API MCP Server?
The Foxit PDF API MCP Server is an open-source Model Context Protocol server that exposes Foxit’s cloud PDF Services API as 30+ callable tools. Any MCP-compatible AI agent host, including Claude Desktop, VS Code with GitHub Copilot, and Cursor, can invoke these tools directly.
What PDF operations does the Foxit MCP Server support?
The server supports conversion (Word, Excel, PowerPoint, image, HTML, and URL to PDF and back), OCR, merge, split, extract, compress, flatten, linearize, watermark, compare, form data import/export, password protection, and full document property inspection across seven functional tool categories.
How does the Foxit MCP Server handle authentication?
PDF Services tools authenticate via a client_id and client_secret set as environment variables before the MCP host launches. The eSign API uses a separate OAuth2 client_credentials token exchange against https://na1.foxitesign.foxit.com/api/oauth2/access_token. The two credential scopes are isolated by design.
Does the Foxit MCP Server work with Claude Desktop and VS Code?
Yes. The server registers using a standard mcp.json config block for VS Code with GitHub Copilot or a claude_desktop_config.json block for Claude Desktop. The same config structure works for Cursor. All three hosts discover the server’s tools automatically on connection.
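For reference, a Claude Desktop entry has this general shape; the command and args values are placeholders for whatever launch command the cloned repo's README specifies, while the env var names match the ones documented above:

```json
{
  "mcpServers": {
    "foxit-pdf": {
      "command": "node",
      "args": ["/path/to/foxit-pdf-mcp/dist/index.js"],
      "env": {
        "FOXIT_CLOUD_API_HOST": "https://na1.fusion.foxit.com/pdf-services",
        "FOXIT_CLOUD_API_CLIENT_ID": "your_client_id",
        "FOXIT_CLOUD_API_CLIENT_SECRET": "your_client_secret"
      }
    }
  }
}
```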
Is the Foxit PDF API MCP Server free to use?
The Foxit developer account is free with no credit card required and covers development and testing volumes. Production workloads above the free-tier threshold require a volume plan through the Developer Portal.
Run Your First Tool Call Now
Getting a working MCP tool call takes under 15 minutes:
Create a free developer account at developer-api.foxit.com (no credit card, instant access). Copy your client_id and client_secret from the dashboard. Set the three environment variables:
export FOXIT_CLOUD_API_HOST="https://na1.fusion.foxit.com/pdf-services"
export FOXIT_CLOUD_API_CLIENT_ID="your_client_id"
export FOXIT_CLOUD_API_CLIENT_SECRET="your_client_secret"
Clone the repo, register it using the config block from the Prerequisites section, restart your MCP host, and invoke pdf_from_url with any public URL. You’ll have a confirmed PDF output in your working directory. The Developer Portal also includes a live API Playground for validating request payloads against the PDF Services API before wiring them into an agent.
For a full signing workflow, the minimum viable addition to the MCP setup is authenticating against the eSign OAuth2 endpoint and posting to /folders/createfolder with a static PDF. DocGen field population, pdf_structural_analysis validation, and webhook callbacks extend the same pattern incrementally from there.
Get your free API access at developer-api.foxit.com.
Automate Dynamic PDF Generation with the Foxit DocGen API: Word Templates, JSON Data, and Real API Calls

Skip the HTML-to-PDF headaches. Use Foxit’s DocGen API to turn Word templates and JSON data into clean, formatted PDFs with one API call.
If you’ve tried to generate a contract or invoice from HTML, you’ve probably burned hours on page-break-inside: avoid declarations that Chrome renders one way and a headless browser renders another. Headers and footers require separate print-media queries, and by the time you’ve got a repeating table header working correctly across pages, you’ve invested a full day of engineering into CSS that exists solely to trick a browser into behaving like a printer.
HTML documents reflow content into a viewport while PDF documents have fixed page geometry. Forcing one model into the other produces predictable failure modes: footnotes that collide with page footers, tables that split at the worst possible row, custom fonts that substitute silently, and signature blocks that drift off-page on longer documents.
There’s a larger practical cost too. For most teams, the authoritative source for enterprise document templates is already a Word file. Your legal team owns the NDA in .docx format. Finance owns the invoice in .docx format. Every structural change flows through Word because that’s where the tracked changes, formatting history, and review process live. Maintaining a parallel HTML version of each template doubles your maintenance surface from day one.
Foxit’s DocGen API eliminates that parallel entirely. You keep your templates as .docx files, embed data tags directly in Word, POST the base64-encoded template and a JSON payload to a single REST endpoint, and receive the rendered PDF (or DOCX) in the response body. You eliminate the browser rendering engine, the print-media CSS layer, and the overhead of a second template format.
How the Foxit DocGen API Works
The core model is a single synchronous POST to the GenerateDocumentBase64 endpoint at developer-api.foxit.com. Your request body carries three fields:
- base64FileString: your .docx template, base64-encoded
- documentValues: a JSON object containing your merge data
- outputFormat: either "pdf" or "docx"
The API processes the template, resolves every tag against your data, and returns a JSON response containing base64FileString (the rendered document) and a message field confirming success or describing a failure. The exchange is fully synchronous, so you receive the finished document in the same HTTP response with no job ID to poll and no webhook to configure.
Authentication uses two HTTP headers: client_id and client_secret. Both come from the Foxit Developer Portal when you create an account. The free Developer plan provides 500 credits per year with no credit card required, and each GenerateDocumentBase64 call consumes exactly one credit. The Startup plan ($1,750/year) provides 3,500 credits. The Business plan ($4,500/year) covers 150,000 credits for production workloads. For context, Nutrient’s API starts at $75 for 1,000 credits, and Apryse requires a sales conversation before you can access pricing at all.
The complete call flow runs from template file to PDF on disk.
You can explore every endpoint in the live API playground at developer-api.foxit.com, and the portal includes a Postman collection you can import to run authenticated requests without writing a line of code first.
Build a Word Template with DocGen Tags
Open any .docx file in Microsoft Word and type your tags as plain text directly in the document. The DocGen API uses double-brace syntax: {{field_name}}. Tags go anywhere Word accepts text: headings, body paragraphs, table cells, headers, footers, or text boxes.
Scalar field tags resolve directly to the matching key from your documentValues JSON. A document header with {{customer_name}}, {{invoice_number}}, and {{invoice_date}} pulls those three values straight from the top-level keys of your payload.
For arrays, you wrap a single table row (the data row, not the header row) with {{TableStart:array_name}} and {{TableEnd:array_name}} markers. The wrapped row acts as a template row, and the API renders one output row per item in the JSON array. An invoice line-items table in Word looks like this:
| Description | Qty | Unit Price | Total |
|---|---|---|---|
| {{TableStart:line_items}}{{description}} | {{qty}} | {{unit_price}} | {{total}}{{TableEnd:line_items}} |
Within the array row, {{ROW_NUMBER}} auto-increments with each rendered row. A SUM(ABOVE) field placed in the row directly below the {{TableEnd:line_items}} marker calculates a column total across all rendered data rows.
For nested JSON objects, use dot-notation in your tags. A shipping address block references {{shipping.street}}, {{shipping.city}}, and {{shipping.postal_code}}, mapping to properties nested inside a shipping object in your payload. The nesting can go multiple levels deep, so {{customer.address.city}} resolves against documentValues.customer.address.city.
For a working starting point, grab the downloadable invoice template from the foxit-demo-templates repo. The file is well under the 4 MB upload limit and demonstrates every pattern this article uses: scalar tags, {{TableStart:line_items}} / {{TableEnd:line_items}} with {{ROW_NUMBER}}, currency and date format switches, and subtotal / tax / total fields below the line-items table.
One sizing constraint applies while you build your own template. DocGen rejects uploads larger than 4 MB, so if you embed product photos, scanned letterhead, or full font subsets, compress the images before saving, drop embedded fonts where you can rely on system fonts, or split a large template into smaller per-section templates that you generate and merge separately.
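A pre-flight size check catches an oversize template before you waste a call. A minimal sketch:

```python
import os

MAX_TEMPLATE_BYTES = 4 * 1024 * 1024  # DocGen's 4 MB upload limit

def check_template_size(path: str) -> int:
    """Fail fast before base64-encoding a template DocGen will reject."""
    size = os.path.getsize(path)
    if size > MAX_TEMPLATE_BYTES:
        raise ValueError(
            f"{path} is {size / 1024 / 1024:.1f} MB; DocGen rejects "
            "templates over 4 MB -- compress images or split the template"
        )
    return size
```

Note the check runs against the raw file size; base64 encoding inflates the payload by roughly a third on top of that, so staying comfortably under the limit is wise.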
Make Your First API Call: Generate a PDF from JSON
Run a quick pre-flight check before the first call to catch the issues that most often derail a first run on a clean account:
- Account created and client_id / client_secret copied from the Developer Portal API Keys section
- Sample template saved locally as invoice_template.docx in the directory you’ll run the script from
- Template file size confirmed under 4 MB (ls -lh invoice_template.docx on macOS or Linux, right-click → Properties on Windows)
With those in place, confirm your credentials work with a cURL call. The Foxit Developer Portal includes a Postman collection for this, but a quick cURL request against the API catches auth issues before any code runs:
curl -X POST "https://na1.fusion.foxit.com/document-generation/api/GenerateDocumentBase64" \
-H "client_id: YOUR_CLIENT_ID" \
-H "client_secret: YOUR_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-d '{"base64FileString":"","documentValues":{},"outputFormat":"pdf"}'
A 401 here means invalid credentials. A 400 with a message about the template confirms your headers are accepted and you can proceed to the full call.
Save your .docx template as invoice_template.docx in the same directory as this script, then run the complete generation:
import requests
import base64
CLIENT_ID = "your_client_id"
CLIENT_SECRET = "your_client_secret"
API_URL = "https://na1.fusion.foxit.com/document-generation/api/GenerateDocumentBase64"
# Read and encode the template
with open("invoice_template.docx", "rb") as f:
template_b64 = base64.b64encode(f.read()).decode("utf-8")
# Build the data payload
document_values = {
"customer_name": "Acme Corporation",
"invoice_number": "INV-2025-0042",
"invoice_date": "07/15/2025",
"due_date": "08/14/2025",
"line_items": [
{
"description": "API Integration Consulting",
"qty": 8,
"unit_price": 195.00,
"total": 1560.00
},
{
"description": "Document Automation Setup",
"qty": 1,
"unit_price": 750.00,
"total": 750.00
}
],
"subtotal": 2310.00,
"tax_rate": 0.08,
"tax_amount": 184.80,
"total_due": 2494.80
}
# Construct the request body
payload = {
"base64FileString": template_b64,
"documentValues": document_values,
"outputFormat": "pdf"
}
headers = {
"client_id": CLIENT_ID,
"client_secret": CLIENT_SECRET,
"Content-Type": "application/json"
}
response = requests.post(API_URL, json=payload, headers=headers)
if response.status_code == 200:
result = response.json()
pdf_bytes = base64.b64decode(result["base64FileString"])
if pdf_bytes[:5] != b"%PDF-":
raise ValueError("Response did not contain a valid PDF")
with open("invoice_output.pdf", "wb") as out:
out.write(pdf_bytes)
print("PDF written to invoice_output.pdf")
else:
print(f"Error {response.status_code}: {response.json().get('message')}")
The success response is a JSON object with three keys: base64FileString (the rendered PDF, base64-encoded), fileExtension ("pdf"), and message ("PDF Document Generated Successfully"). Decoding and writing the bytes to disk gives you a complete, formatted PDF with every tag replaced by its corresponding data value. If you omit a key from documentValues, the API renders the corresponding tag as an empty string, producing a blank field in the output.
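Because unmatched tags fail silently, a local pre-flight comparison of template tags against payload keys is worth the few lines. This is a best-effort heuristic (Word sometimes splits a tag across XML runs, which this scan will miss), not part of any Foxit SDK:

```python
import re
import zipfile

def find_unmatched_tags(template_path: str, document_values: dict) -> set:
    """Heuristic pre-flight check: list template tags with no matching
    top-level key in documentValues. A local sanity check against
    silent blank fields, not a Foxit API."""
    with zipfile.ZipFile(template_path) as z:
        xml = z.read("word/document.xml").decode("utf-8")
    # Capture the name after each '{{', e.g. 'invoice_number' or 'shipping.city'
    tags = set(re.findall(r"\{\{\s*([A-Za-z0-9_.]+)", xml))
    # Drop loop markers and built-ins; keep only top-level names before any dot
    tags = {t.split(".")[0] for t in tags
            if not t.startswith(("TableStart", "TableEnd"))
            and t != "ROW_NUMBER"}
    # Keys inside arrays resolve per-row, so those count as known too
    known = set(document_values)
    for v in document_values.values():
        if isinstance(v, list):
            for row in v:
                if isinstance(row, dict):
                    known.update(row)
    return tags - known
```

An empty return set means every tag the scan found has a matching key; a non-empty set names the fields that would render blank.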
Advanced Data Scenarios: Arrays, Nested Objects, and Built-In Functions
The two-row invoice above works, but most production documents have more complex data shapes. Three patterns cover the majority of real-world cases.
For multi-row tables, the line_items array in the Python snippet above already shows the basic structure. To generate five rows, pass five objects in the array. The Word template row tagged with {{TableStart:line_items}} and {{TableEnd:line_items}} repeats exactly once per array item:
{
"line_items": [
{
"description": "UX Design Review",
"qty": 4,
"unit_price": 150.0,
"total": 600.0
},
{
"description": "Backend API Development",
"qty": 12,
"unit_price": 185.0,
"total": 2220.0
},
{
"description": "Database Schema Migration",
"qty": 3,
"unit_price": 200.0,
"total": 600.0
},
{
"description": "QA Testing",
"qty": 6,
"unit_price": 95.0,
"total": 570.0
},
{
"description": "Deployment and Documentation",
"qty": 2,
"unit_price": 175.0,
"total": 350.0
}
]
}
The API generates exactly five table rows. Swap in 50 items and you get 50 rows, with page breaks handled by Word’s native pagination logic.
For nested objects, the DocGen API resolves dot-notation paths against the full depth of your JSON structure. A shipping confirmation template referencing {{customer.address.city}} works against this payload without any flattening on your end:
{
"customer": {
"name": "Sarah Chen",
"email": "[email protected]",
"address": {
"street": "742 Evergreen Terrace",
"city": "Portland",
"state": "OR",
"postal_code": "97201"
}
}
}
In the Word template, {{customer.name}}, {{customer.address.city}}, and {{customer.address.postal_code}} each resolve to the correct nested value. You can reference the same nested object from multiple locations in the template, and the API populates each instance independently.
For numeric and date formatting, the DocGen API respects Word’s native field switch syntax. Adding \# Currency to a tag formats a numeric value as a currency string, so {{unit_price \# Currency}} renders 195.00 as $195.00. Date fields accept \@ "MM/dd/yyyy" to control output format, so {{invoice_date \@ "MM/dd/yyyy"}} formats an ISO date string to 07/15/2025. To auto-calculate a column total, place a SUM(ABOVE) field in the Word table row immediately below {{TableEnd:line_items}} and the API evaluates it against the rendered data rows.
Error Handling and Production Readiness
The DocGen API returns a focused set of HTTP status codes. A 200 confirms successful generation. A 401 means your client_id or client_secret headers are invalid, and the fix is to re-copy the credentials from the Developer Portal. A 400 covers three cases. The first is a malformed request body, for example a missing base64FileString or outputFormat. The second is structural issues with the template itself, such as a {{TableStart}} marker placed outside its table row. The third is an oversize template; DocGen rejects .docx uploads larger than 4 MB, and the fix is to compress embedded images, drop embedded fonts, or split the template before re-encoding. The message field in every non-200 response body gives you the specific reason, so log it rather than discarding the response object.
A production wrapper handles all three cases and adds exponential backoff for transient server errors:
import requests
import base64
import time
def generate_document(client_id, client_secret, template_path,
document_values, output_format="pdf"):
API_URL = "https://na1.fusion.foxit.com/document-generation/api/GenerateDocumentBase64"
with open(template_path, "rb") as f:
template_b64 = base64.b64encode(f.read()).decode("utf-8")
payload = {
"base64FileString": template_b64,
"documentValues": document_values,
"outputFormat": output_format
}
headers = {
"client_id": client_id,
"client_secret": client_secret,
"Content-Type": "application/json"
}
max_retries = 3
for attempt in range(max_retries):
try:
response = requests.post(API_URL, json=payload,
headers=headers, timeout=30)
if response.status_code == 200:
return base64.b64decode(response.json()["base64FileString"])
if response.status_code == 401:
raise ValueError("Authentication failed: re-check client_id and client_secret")
if response.status_code == 400:
msg = response.json().get("message", "Bad request")
raise ValueError(f"Request error: {msg}")
if response.status_code >= 500:
if attempt < max_retries - 1:
wait = 2 ** attempt
print(f"Server error ({response.status_code}), retrying in {wait}s...")
time.sleep(wait)
continue
raise RuntimeError(f"Server error after {max_retries} attempts")
except requests.exceptions.Timeout:
if attempt < max_retries - 1:
time.sleep(2 ** attempt)
continue
raise
raise RuntimeError("Max retries exceeded")
The wrapper raises immediately on 4xx responses because retrying a credential error or a malformed request produces the same result. Exponential backoff applies only to 5xx responses and timeouts, where the issue is transient.
Once generate_document() returns raw PDF bytes, routing them downstream takes three lines:
import boto3
s3 = boto3.client("s3")
pdf_bytes = generate_document(CLIENT_ID, CLIENT_SECRET, "invoice_template.docx", document_values)
s3.put_object(Bucket="my-documents-bucket", Key="invoices/INV-2025-0042.pdf", Body=pdf_bytes)
To attach the output to an email, pass pdf_bytes directly as the smtplib attachment payload. To collect a signature on the generated document, base64-encode the bytes and POST them to Foxit’s eSign API with the signer’s email address in the request body. The full eSign API reference is at docs.developer-api.foxit.com.
Common Mistakes
A short list of the issues that account for almost every failed first run.
- Smart-quote autocorrect on braces. Word’s AutoCorrect can convert the second { of {{ into a curly-quote glyph, which breaks tag parsing silently. Disable “Straight quotes with smart quotes” under AutoCorrect Options, or paste tags as plain text.
- Token case sensitivity. {{Customer_Name}} and {{customer_name}} are different keys. Match the casing in your JSON exactly.
- Split loop markers. TableStart and TableEnd must sit in the same Word table row. Splitting them across two rows, or placing either marker outside the table, leaves the loop unrendered with no error.
- Template over 4 MB. The API rejects oversize uploads with a 400. Compress embedded images, drop embedded fonts where system fonts will do, or split the template into smaller pieces.
- Missing payload key. The API renders an unmatched tag as an empty string rather than failing, so a 200 response does not guarantee every field is populated. Spot-check the rendered PDF as part of any pipeline test.
- Auth header typos. Headers are client_id and client_secret in snake_case. Client-Id, ClientId, or X-Client-Id all return 401.
Run the Full Invoice Example End-to-End Right Now
Create a free account directly at account.foxit.com/site/sign-up. This skips the pricing-page redirect you hit from the marketing site and drops you straight into the account form.
- Open account.foxit.com/site/sign-up and complete the form (no credit card required).
- After verification, sign in to the Developer Portal and the Developer plan (500 credits per year) is active by default.
- Open the API Keys section and copy your client_id and client_secret.
With credentials in hand, run the example end-to-end:
- Download invoice_full.docx from the foxit-demo-templates repo and save it locally as invoice_template.docx in your working directory. The file is well under the 4 MB upload limit and exercises every tag pattern this article covers.
- Paste your credentials into the CLIENT_ID and CLIENT_SECRET variables in the Python script from the previous section.
- Edit the document_values dictionary with your own customer name, invoice number, and line items.
- Run the script and open invoice_output.pdf.
The free Developer plan’s 500 annual credits cover this tutorial dozens of times over before you spend anything. The full API reference at docs.developer-api.foxit.com covers every endpoint parameter, the complete tag specification, all supported output formats, and the full GenerateDocumentBase64 request and response schema.
Get started with a free account (no credit card required) and generate your first dynamic PDF in under 10 minutes.
Foxit DocGen API Quickstart: Word Template to Pixel-Perfect PDF in Under 10 Minutes

Go from a Word template to a pixel-perfect PDF in under 10 minutes with the Foxit DocGen API. This guide covers template authoring, JSON payload structure, the GenerateDocumentBase64 call, and the most common errors that trip people up.
Most document generation quickstarts hand you a template with one text field, a trivial JSON payload, and no explanation of what breaks when you add a repeating table, a date format string, or a missing key. You end up in the docs trying to work backwards from a 400 error. This tutorial covers the complete Foxit DocGen API flow end-to-end: authoring a Word template with scalar fields, formatted dates, and repeating line-item rows; building the matching JSON payload; and POSTing everything to GenerateDocumentBase64 to retrieve a production-ready PDF. Working Python and cURL throughout.
Before starting, you’ll need a free Foxit developer account at developer-api.foxit.com (no credit card required; the free tier includes 500 credits/year), your client_id and client_secret from the developer dashboard, Python 3.x with requests installed (pip install requests), and Microsoft Word for template authoring.
How the Foxit DocGen API Works: One Endpoint, One Call
The GenerateDocumentBase64 endpoint accepts a single POST request and returns the rendered document in the same response body. You pass three things: your .docx template (base64-encoded), your structured JSON data, and the desired output format. The API merges template with data and returns the rendered file as a base64-encoded string.
Your .docx file defines layout, branding, and placeholder tokens. Your JSON payload carries the runtime values that populate those tokens. The API resolves every token in the template against the corresponding key in documentValues and renders the result as a PDF or DOCX.
The call is synchronous, returning the rendered file in the HTTP 200 response body with no job ID, polling loop, or webhook callback required. The request body always carries three keys: base64FileString (your .docx template, base64-encoded), documentValues (the JSON object whose keys map to template tokens), and outputFormat ("pdf" or "docx").
Authentication passes client_id and client_secret as custom HTTP headers on every request, with no OAuth 2.0 flow and no token exchange step.
Author Your Word Template with Dynamic Tags
Open Word and create a standard .docx. Place your dynamic content using double-curly-brace tokens typed directly in the document body.
For scalar string and number fields, the syntax is {{field_name}}. For a date with a specific display pattern, use {{ field_name \@ MM/dd/yyyy }}. The \@ format string controls how the API renders date values from your JSON payload.
An invoice template header section looks like this:
Invoice #: {{invoice_number}}
Date: {{ invoice_date \@ MM/dd/yyyy }}
Bill To: {{client_name}}
Address: {{billing_address}}
Repeating table rows use a pair of range markers. Place {{TableStart:line_items}} in the first cell of the row you want to repeat, and {{TableEnd:line_items}} in the last cell of that same row. The array name in both markers (line_items here) must exactly match the key name in your JSON payload. Cells within the repeating row take individual field tokens:
| {{TableStart:line_items}}{{ROW_NUMBER}} | {{description}} | {{qty}} | {{unit_price}}{{TableEnd:line_items}} |
{{ROW_NUMBER}} auto-increments across all rendered rows. Word’s built-in SUM(ABOVE) formula in a totals row below the table still works for column totals.
Two authoring mistakes account for the majority of template parsing failures. Placing tokens inside merged table cells causes a parser error because the API can’t determine which logical cell owns the token. Using Word’s smart (curly) quotes instead of straight ASCII double-braces causes an encoding mismatch that returns a 400. Before uploading your template, check Word’s autocorrect settings and run a Find & Replace search for any {{ or }} pairs that got converted to curly equivalents.
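You can automate the smart-quote check too. This sketch scans the template's document XML for curly-quote glyphs adjacent to braces; it's a local heuristic under the assumption that AutoCorrect damage leaves the converted glyph next to the surviving brace:

```python
import re
import zipfile

SUSPECT = "\u2018\u2019\u201c\u201d"  # curly single/double quote glyphs

def scan_for_curly_braces(template_path: str) -> list:
    """Flag spots where AutoCorrect may have swapped a straight brace
    or quote for a curly glyph next to a tag. Best-effort text scan."""
    with zipfile.ZipFile(template_path) as z:
        xml = z.read("word/document.xml").decode("utf-8")
    hits = []
    # Match a brace immediately next to a curly-quote glyph, either order
    for m in re.finditer(rf"[{{}}][{SUSPECT}]|[{SUSPECT}][{{}}]", xml):
        hits.append((m.start(), xml[max(0, m.start() - 20):m.end() + 20]))
    return hits
```

An empty list is a pass; each hit returns the offset plus a snippet of surrounding text so you can locate the damaged tag in Word.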
Authenticate and Prepare the Template
Your client_id and client_secret from the developer dashboard at developer-api.foxit.com pass as custom headers on every request. The base64 module ships with Python’s standard library, so the encoding step adds no new dependencies to your project:
import base64
import requests
# Load and encode the .docx template
with open("invoice_template.docx", "rb") as f:
template_b64 = base64.b64encode(f.read()).decode("utf-8")
# Authentication and content-type headers
headers = {
"client_id": "YOUR_CLIENT_ID",
"client_secret": "YOUR_CLIENT_SECRET",
"Content-Type": "application/json"
}
# Scalar fields (the line_items array is added in the next section)
document_values = {
"invoice_number": "INV-2025-0042",
"invoice_date": "2025-07-15",
"client_name": "Meridian Software Inc."
}
payload = {
"base64FileString": template_b64,
"documentValues": document_values,
"outputFormat": "pdf"
}
The cURL equivalent encodes the file on the fly and passes the same headers:
# Encode the template
TEMPLATE_B64=$(base64 < invoice_template.docx | tr -d '\n')  # strip the line wraps GNU base64 inserts, which would corrupt the JSON body
curl -s -X POST "https://na1.fusion.foxit.com/document-generation/api/GenerateDocumentBase64" \
-H "client_id: YOUR_CLIENT_ID" \
-H "client_secret: YOUR_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-d "{\"base64FileString\": \"${TEMPLATE_B64}\", \"documentValues\": {\"invoice_number\": \"INV-2025-0042\", \"invoice_date\": \"2025-07-15\", \"client_name\": \"Meridian Software Inc.\"}, \"outputFormat\": \"pdf\"}"
Build the JSON Data Payload for DocGen
The full documentValues payload maps scalar fields at the top level and places the repeating line-item array under the key that matches your {{TableStart:}} / {{TableEnd:}} marker name exactly.
{
"invoice_number": "INV-2025-0042",
"invoice_date": "2025-07-15",
"client_name": "Meridian Software Inc.",
"billing_address": "400 Pine Street, Suite 12, Seattle, WA 98101",
"line_items": [
{
"description": "API Integration Consulting",
"qty": "8",
"unit_price": "225.00"
},
{
"description": "DocGen Template Authoring",
"qty": "4",
"unit_price": "175.00"
},
{
"description": "QA and Deployment Support",
"qty": "2",
"unit_price": "150.00"
}
]
}
A few type behaviors are worth tracking. The API formats date values using the \@ format string in the template tag, so pass dates as ISO 8601 strings ("2025-07-15") and let the tag control the display format. Numeric quantities and prices work as either strings or integers. When a template token has no matching key in documentValues, the API leaves that placeholder blank in the output rather than returning an error, so missing keys produce silent blanks in your document.
Call the Generation Endpoint and Retrieve the PDF
This complete Python function adds the line_items array, posts to the endpoint, validates the response, and writes the output to disk:
import base64
import requests
with open("invoice_template.docx", "rb") as f:
template_b64 = base64.b64encode(f.read()).decode("utf-8")
headers = {
"client_id": "YOUR_CLIENT_ID",
"client_secret": "YOUR_CLIENT_SECRET",
"Content-Type": "application/json"
}
document_values = {
"invoice_number": "INV-2025-0042",
"invoice_date": "2025-07-15",
"client_name": "Meridian Software Inc.",
"billing_address": "400 Pine Street, Suite 12, Seattle, WA 98101",
"line_items": [
{"description": "API Integration Consulting", "qty": "8", "unit_price": "225.00"},
{"description": "DocGen Template Authoring", "qty": "4", "unit_price": "175.00"},
{"description": "QA and Deployment Support", "qty": "2", "unit_price": "150.00"}
]
}
payload = {
"base64FileString": template_b64,
"documentValues": document_values,
"outputFormat": "pdf"
}
response = requests.post(
"https://na1.fusion.foxit.com/document-generation/api/GenerateDocumentBase64",
headers=headers,
json=payload
)
if response.status_code == 200:
result = response.json()
pdf_bytes = base64.b64decode(result["base64FileString"])
# Confirm the response is a valid PDF before writing to disk
if pdf_bytes[:5] != b"%PDF-":
raise ValueError("Response did not contain a valid PDF")
with open("invoice_output.pdf", "wb") as f:
f.write(pdf_bytes)
print("PDF written: invoice_output.pdf")
else:
error = response.json()
print(f"Error {response.status_code}: {error.get('message', 'Unknown error')}") A successful response returns HTTP 200 with a JSON body containing three fields: base64FileString (the rendered PDF, base64-encoded), fileExtension ("pdf"), and message ("PDF Document Generated Successfully"). A failed call returns a 4xx status with a JSON body containing a message field describing the error. The API is synchronous: no job IDs, no polling, no webhooks.
The %PDF- magic bytes check catches cases where the API returned a non-PDF payload: a malformed template, an incorrect outputFormat value, or an error body that got decoded as if it were the file. Run this validation before writing to disk so failures surface immediately rather than producing a corrupt file.
The cURL equivalent uses jq to extract the base64 field and writes the decoded PDF directly to a file:
curl -s -X POST "https://na1.fusion.foxit.com/document-generation/api/GenerateDocumentBase64" \
-H "client_id: YOUR_CLIENT_ID" \
-H "client_secret: YOUR_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-d @request_body.json \
| jq -r '.base64FileString' \
| base64 --decode > invoice_output.pdf
Store the full JSON payload from the previous section in request_body.json. The jq -r flag strips JSON string escaping from the base64 field before it reaches base64 --decode. Omitting -r produces a corrupt output file.
Debugging Common Failures
Unmatched tags render as blank strings in the output PDF. The API produces an HTTP 200 and a valid PDF with the missing data absent. If your output has blank fields, compare your template token names character-by-character against your JSON keys. Token matching is case-sensitive: {{client_name}} and {{Client_Name}} are treated as different fields.
Authentication failures return a 401 with a JSON body:
{
    "message": "Unauthorized. Invalid client credentials."
}
The header names are lowercase (client_id, client_secret), and values must appear without surrounding quotes or whitespace. If you recently rotated keys, regenerate credentials from the developer dashboard and verify you’re reading the correct environment’s values.
Template parse errors return a 400:
{
    "message": "Template parsing error: invalid token format detected."
}
Two root causes produce this error. A .docx re-saved through LibreOffice alters the underlying XML structure in ways that break the parser, so re-author the template in Word. Curly (smart) quote characters in token braces cause an encoding mismatch, so use Word’s Find & Replace to swap any curly {{ and }} back to straight ASCII equivalents.
Payload size limits apply to the base64-encoded template. Check docs.developer-api.foxit.com for the current threshold. Templates exceeding the limit should have embedded images compressed through Word’s Picture Format settings before export. For templates that remain too large after compression, split the document into two .docx files and merge the generated PDFs using Foxit’s PDF Services API.
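A pre-flight size check keeps oversized templates from failing at the API boundary. The 4 MB threshold below is a placeholder assumption for illustration; substitute the current limit published at docs.developer-api.foxit.com.

```python
import base64

MAX_ENCODED_BYTES = 4 * 1024 * 1024  # placeholder; confirm the current limit in the docs

def check_template_size(template_path):
    """Raise before the request if the base64-encoded template exceeds the assumed limit."""
    with open(template_path, "rb") as f:
        encoded_len = len(base64.b64encode(f.read()))
    if encoded_len > MAX_ENCODED_BYTES:
        raise ValueError(
            f"Encoded template is {encoded_len} bytes; compress embedded images or split the document"
        )
    return encoded_len
```

Note that base64 inflates the file by roughly a third, so the on-disk .docx size understates what the API actually receives.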
Wire the Foxit DocGen API Into Your Stack: Next Steps
The integration pattern for a CRM-triggered flow covers four steps: receive the webhook event, pull the record data from your CRM’s API, POST to GenerateDocumentBase64, then upload the decoded PDF to Amazon S3 or return it as a download URL.
def handle_crm_webhook(event):
    record = crm_client.get_record(event["record_id"])
    pdf_bytes = generate_document(record)  # wraps the API call above
    s3_url = upload_to_s3(pdf_bytes, record["id"])  # store in your delivery layer
    crm_client.attach_document(record["id"], s3_url)
Once DocGen is generating your contracts, invoices, and compliance reports, adding signatures is the natural next step. Foxit’s eSign API accepts the same PDF output and adds a fully auditable, legally binding signing workflow via REST. For low-code integration with Salesforce, HubSpot, or SAP, Foxit’s 40+ pre-built connectors let you trigger document generation from a workflow automation tool without writing the HTTP call yourself. Full documentation and connector references are at docs.developer-api.foxit.com.
Create your free account at developer-api.foxit.com, grab your client_id and client_secret, import the Postman collection from the developer dashboard, and run your first generation call against your own .docx. The round trip from account creation to a rendered PDF takes under five minutes.
Frequently Asked Questions
What does the Foxit DocGen API GenerateDocumentBase64 endpoint return?
It returns a synchronous HTTP 200 response whose JSON body contains a base64FileString key holding the fully rendered PDF (or DOCX) encoded in base64. There’s no job ID, polling loop, or webhook. The rendered file arrives in the same response.
What happens when a template token has no matching key in the JSON payload?
The API silently leaves that placeholder blank in the rendered output and still returns HTTP 200. Missing keys don’t trigger a 400 error, so always validate rendered output programmatically. The %PDF- magic bytes validation shown above is a reliable first check.
Why does my Foxit DocGen template return a 400 parsing error?
The two most common causes are token braces containing Word’s smart (curly) quotes instead of straight ASCII double-braces, and a .docx re-saved through LibreOffice, which alters the underlying OOXML structure in ways the parser rejects. Re-author the template in Microsoft Word and replace any curly quote characters using Find & Replace.
Can I generate DOCX output instead of PDF with the DocGen API?
Yes. Set "outputFormat": "docx" in the request body. The same template syntax and documentValues structure apply regardless of output format.
How does authentication work with the Foxit DocGen API?
Pass your client_id and client_secret as custom HTTP request headers on every call. There’s no OAuth token exchange or session management. Credentials are validated per-request and can be generated or rotated from the Foxit developer dashboard.
DocGen QuickStart FAQs
What is the Foxit DocGen API used for?
The Foxit DocGen API is used to generate documents from structured data. Developers can populate templates with data from systems like CRMs, databases, forms, or internal applications, then output branded PDFs or DOCX files for contracts, invoices, reports, disclosures, and other document workflows.
How does the Foxit GenerateDocumentBase64 endpoint work?
The GenerateDocumentBase64 endpoint accepts a base64-encoded document template, a JSON object containing document values, and an output format such as PDF or DOCX. The API merges the template with the supplied data and returns the generated file as a base64-encoded response.
What inputs do I need to generate a PDF with the Foxit DocGen API?
To generate a PDF, you need a document template, structured JSON data that matches the template fields, Foxit API credentials, and an output format value. In the blog example, the template is a Word .docx file and the output format is set to PDF.
Can the Foxit DocGen API generate DOCX files as well as PDFs?
Yes. Set the output format to docx instead of pdf when calling the DocGen API. The same template syntax and data-mapping approach applies to both formats.
How do repeating table rows work in a Foxit DocGen template?
In the blog example, repeating table rows use TableStart and TableEnd markers around a row in the Word template. The marker name must match the array name in the JSON payload, and each object in the array supplies values for the repeated row fields.
What happens if a JSON key is missing from a Foxit DocGen template?
If a template token does not have a matching key in documentValues, the generated document leaves that placeholder blank instead of returning an error. Because the failure is silent, validate the rendered output programmatically rather than relying on an error status.
How do developers authenticate with the Foxit DocGen API?
Developers authenticate by passing client_id and client_secret as HTTP request headers on every call. Store these credentials in environment variables or a secrets manager rather than hard-coding them in source.
Why does a Foxit DocGen template return a 400 parsing error?
Common causes include invalid template token formatting, smart (curly) quote characters in template tags, and .docx files saved by tools such as LibreOffice that alter the underlying document structure. Re-author the template in Microsoft Word and replace any curly quote characters with straight ASCII equivalents.
Can I use the Foxit DocGen API with CRM data?
Yes. Foxit’s Document Generation API materials position the API for generating documents from structured data in systems such as CRMs, databases, web forms, ERP, or HR systems. Common use cases include quotes, contracts, disclosures, invoices, onboarding documents, and customer communications.
How can Foxit DocGen fit into a larger document workflow?
Foxit DocGen can generate a document from structured data, then the generated PDF can move into downstream workflows such as signing, storage, delivery, or archiving. Foxit’s broader API materials position Document Generation, eSign, PDF Services, and PDF Embed APIs as part of a full-stack document automation ecosystem.
Generate Dynamic PDFs from JSON using Foxit APIs

See how easy it is to generate PDFs from JSON using Foxit’s Document Generation API. With Word as your template engine, you can dynamically build invoices, offer letters, and agreements—no complex setup required. This tutorial walks through the full process in Python and highlights the flexibility of token-based document creation.
One of the more fascinating APIs in our library is the Document Generation API. This document generation API lets you create dynamic PDFs or Word documents using your own data as templates. That may sound simple – and the code you’re about to see is indeed simple – but the real power lies in how flexible Word can be as a template engine. This API could be used for:
- Creating invoices
- Creating offer letters
- Creating dynamic agreements (which can integrate with our eSign API)
All of this is made available via a simple API and a “token language” you’ll use within Word to create your templates. Whether you’re feeding in data from a database, a form submission, or a JSON API response, the process looks the same from your Python script. Let’s take a look at how this is done.
Credentials
Before we go any further, head over to our developer portal and grab a set of free credentials. This will include a client ID and secret values – you’ll need both to make use of the API.
Don’t want to read all of this? You can also follow along by video:
Using the API
The Document Generation API flow is a bit different from our PDF Services APIs in that the execution is synchronous. You don’t need to upload your document beforehand or download a result. You simply call the API (passing your data and template) and the result has your new PDF (or Word document). With it being this simple, let’s get into the code.
Loading Credentials
My script begins by loading in the credentials and API root host via the environment:
CLIENT_ID = os.environ.get('CLIENT_ID')
CLIENT_SECRET = os.environ.get('CLIENT_SECRET')
HOST = os.environ.get('HOST')
As always, try to avoid hard-coding credentials directly into your code.
Calling the API
The endpoint only requires you to pass the output format, your data, and a base64 version of your file. “Your data” can be almost anything you like—though it should start as an object (i.e., a dictionary in Python with key/value pairs). Beneath that, anything goes: strings, numbers, arrays of objects, and so on.
Here’s a Python wrapper showing this in action:
def docGen(doc, data, id, secret):
    headers = {
        "client_id": id,
        "client_secret": secret
    }
    body = {
        "outputFormat": "pdf",
        "documentValues": data,
        "base64FileString": doc
    }
    request = requests.post(f"{HOST}/document-generation/api/GenerateDocumentBase64", json=body, headers=headers)
    return request.json()
And here’s an example calling it:
with open('../../inputfiles/docgen_sample.docx', 'rb') as file:
    bd = file.read()
    b64 = base64.b64encode(bd).decode('utf-8')

data = {
    "name": "Raymond Camden",
    "food": "sushi",
    "favoriteMovie": "Star Wars",
    "cats": [
        {"name": "Elise", "gender": "female", "age": 14},
        {"name": "Luna", "gender": "female", "age": 13},
        {"name": "Crackers", "gender": "male", "age": 13},
        {"name": "Gracie", "gender": "female", "age": 12},
        {"name": "Pig", "gender": "female", "age": 10},
        {"name": "Zelda", "gender": "female", "age": 2},
        {"name": "Wednesday", "gender": "female", "age": 1},
    ],
}

result = docGen(b64, data, CLIENT_ID, CLIENT_SECRET)
You’ll note here that my data is hard-coded. In a real application, this would typically be dynamic—read from the file system, queried from a database, or sourced from any other location.
The result object contains a message representing the success or failure of the operation, the file extension for the result, and the base64 representation of the result. To turn that base64 string back into a file, decode it first:
b64_bytes = result["base64FileString"].encode('ascii')
binary_data = base64.b64decode(b64_bytes)
Most likely you’ll always be outputting PDFs, so here’s a simple bit of code that stores the result:
with open('../../output/docgen_sample.pdf', 'wb') as file:
    file.write(binary_data)

print('Done and stored to ../../output/docgen_sample.pdf')
There’s a bit more to the API than I’ve shown here, so be sure to check the docs. But now it’s time for the real star of this API: Word.
Using Word as a Template
I’ve probably used Microsoft Word for longer than you’ve been alive and I’ve never really thought much about it. But when you begin to think of a simple Word document as a template, all of a sudden the possibilities begin to excite you. In our Document Generation API, the template system works via simple “tokens” in your document marked by opening and closing double brackets.
Consider this block of text:
See how name is surrounded by double brackets? And food and favoriteMovie? When this template is sent to the API along with the corresponding values, those tokens are replaced dynamically. In the screenshot, notice how favoriteMovie is bolded. That’s fine. You can use any formatting, styling, or layout options you wish.
That’s one example, but you also get some built-in values as well. For example, including today as a token will insert the current date, and can be paired with date formatting to specify how the date looks:
Remember the array of cats from earlier? You can use that to create a table in Word like this:
Notice that I’ve used two new tags here, TableStart and TableEnd, both of which reference the array, cats. Then in my table cells, I refer to the values from that array. Again, the color you see here is completely arbitrary and was me making use of the entirety of my Word design skills.
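In case the screenshot doesn’t render for you, the repeating row in the Word template looks roughly like this (a reconstruction from the TableStart/TableEnd marker syntax described above, not a verbatim copy of the original template; exact marker placement within the row cells may differ):

| Name | Gender | Age |
|---|---|---|
| {{TableStart:cats}}{{name}} | {{gender}} | {{age}}{{TableEnd:cats}} |

Each object in the cats array renders one copy of that row; the markers themselves produce no visible output.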
Here’s the template as a whole to show you everything in context:
The Result
Given the code shown above with those values, and given the Word template just shared, once passed to the API, the following PDF is created:
What About Converting PDF to JSON?
So far we’ve been going one direction: JSON data in, PDF out. But what if you need to go the other way—extract structured content from a PDF and work with it in your application?
Foxit’s PDF Services API includes an Extract endpoint that handles exactly this. You upload a PDF, specify whether you want TEXT, IMAGE, or PAGE-level data, and the API returns the extracted content. The text output is particularly useful if you want to feed the result into a data pipeline, search index, or AI workflow.
Here’s a quick look at how extraction works in Python. First, upload your PDF:
def uploadDoc(path, id, secret):
    headers = {
        "client_id": id,
        "client_secret": secret
    }
    with open(path, 'rb') as f:
        files = {'file': (path, f)}
        request = requests.post(f"{HOST}/pdf-services/api/documents/upload", files=files, headers=headers)
        return request.json()

doc = uploadDoc("../../inputfiles/input.pdf", CLIENT_ID, CLIENT_SECRET)
Then call the Extract endpoint with the document ID and the type of content you want. The result comes back in a structured format you can parse, store, or pass along to other tools—including an LLM if you’re building an AI document pipeline.
You can read a full walkthrough in our PDF text extraction guide.
Ready to Try?
If this looks cool, be sure to check the docs for more information about the template language and API. Sign up for some free developer credentials and reach out on our developer forums with any questions.
If you’re building AI agents or LLM-powered workflows, Foxit also offers an MCP server that lets you connect your agents directly to Foxit PDF Services—so your AI tools can generate, extract, and process documents without any custom glue code.
Want the code? Get it on GitHub (Python).
If you are more of a Node person, check out that version. Get it on GitHub (Node.js).
Document Workflow Automation: An Architectural Guide to Building API-Driven Document Pipelines

Automate document workflows with APIs. Learn how to scale PDF generation, eSign, and processing pipelines using modern architecture.
A PDF generation script that breaks on special characters. A cron job that retries failed document conversions by rerunning the entire job. An eSign flow tracked in a shared spreadsheet where “sent” means someone sent an email. These aren’t hypothetical failure modes; they’re the actual engineering artifacts that accumulate when document workflows grow faster than the architecture beneath them.
The scale problem compounds quickly. A team processing 200 contracts a month can survive on scripts and email hand-offs. At 2,000 contracts, those same workflows are the bottleneck. At 20,000, engineers are maintaining hacks that should have been replaced two years ago: retry logic bolted onto cron jobs, signing flows with no audit trail, and PDF generation that silently drops content when a CRM field contains a Unicode character.
The global intelligent document processing market was valued at $2.3B in 2024 and is projected to reach $12.35B by 2030 at a 33.1% CAGR, not because AI is newly fashionable, but because manual document handling is a measurable operational ceiling. The organizations crossing that ceiling aren’t doing it by adopting better tools in isolation. They’re adopting an architectural model.
The problem isn’t a lack of API options for document generation, conversion, or signing. The problem is the absence of a framework for assembling those operations into a pipeline that’s resilient, auditable, and testable. This guide gives you that framework, then grounds it in working Python examples against a real REST API suite.
Anatomy of a Document Automation Pipeline: The Five Stages
Before you write a single API call, you need a model for what you’re building. Every document workflow automation pipeline, regardless of domain, decomposes into five discrete stages.
Stage 1 is intake: you receive or capture the source data that will drive the document. This might be a webhook payload from your CRM when a deal closes, a form submission, or a batch export from an ERP system. The manual failure mode here is no schema validation, no deduplication, and no observable queue depth. Documents arrive out of order, get processed twice, or disappear without trace.
Stage 2 is generation: you render a document from a template and the structured data from stage 1. Common outputs include contracts, invoices, compliance reports, and onboarding kits. The failure mode is template version drift (production runs a different template version than staging), no validation of input data against the template’s expected schema, and no idempotent retry path if the generation call fails partway through.
Stage 3 is processing: you transform, extract from, or optimize the generated document. This covers format conversion (DOCX to PDF), content extraction for downstream indexing, compression, and linearization for fast web delivery. The failure mode is processing steps chained with no error isolation, so a failed compression step blocks the entire document from reaching signing.
Stage 4 is signing: you route the document for signature, track signer status, and capture consent with a full audit trail. The failure mode is manual polling for signer status, no webhook-driven callbacks, and no programmatic access to the audit log when a compliance review is triggered.
Stage 5 is archival and distribution: you store the signed document with a retention policy and push it to downstream systems, your DMS, CRM, or data warehouse. The failure mode is no content-addressed versioning, no record of which document version was signed, and no delivery confirmation to downstream consumers.
Idempotency is a first-class requirement at every stage. Each operation should be safely retryable: the same inputs produce the same output, and a retried call doesn’t create a duplicate document, signing request, or archive record. You implement idempotency in your orchestration layer by generating a unique key per document job and checking it before re-processing. This is a design responsibility. The API doesn’t handle it for you automatically.
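A minimal sketch of that orchestration-layer check, using an in-memory dict as the key store (swap in Redis or a database table in production). It keys on the record ID plus a content hash rather than a timestamp, a variation on the approach described here, so a retried job with identical inputs maps to the same key and returns the cached result instead of re-running:

```python
import hashlib
import json

_processed = {}  # idempotency key -> cached result; use Redis or a DB table in production

def idempotency_key(record_id, payload):
    """Derive a stable key per (source record, payload content) pair."""
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return f"{record_id}:{digest}"

def run_job_once(record_id, payload, job):
    """Execute job(payload) at most once per key; retries return the cached result."""
    key = idempotency_key(record_id, payload)
    if key not in _processed:
        _processed[key] = job(payload)
    return _processed[key]
```

The sort_keys=True canonicalization matters: two payloads with the same fields in a different order must hash to the same key.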
The data flow through a well-designed document automation pipeline looks like this:

One constraint to know upfront: the three APIs in this stack don’t share a document ID namespace. Each stage boundary requires a file handoff. DocGen returns the rendered document as base64 in the response body. You decode it and either save it to disk or upload it directly to PDF Services. PDF Services returns a resultDocumentId that you download as a file, then re-upload to eSign, which runs on a different host with different authentication. The handoff pattern is a feature, not a limitation. It makes each stage independently testable and replayable.
Architectural Decision Framework: Four Axes Before You Write Code
Four decisions determine whether your document pipeline scales cleanly or becomes the thing your team rewrites in 18 months.
Axis 1: REST API vs. SDK
Use REST APIs for cloud-native, horizontally scalable pipelines where document operations are stateless HTTP calls. Use an SDK for on-premise deployments, air-gapped environments, or latency-sensitive processing where network round-trips are a constraint. Foxit offers both: REST APIs for cloud-native pipelines and PDF SDKs for on-premise or air-gapped deployments, so the axis is a real choice, not a theoretical one. If your document pipeline runs inside a regulated environment where data can’t leave the network perimeter, the SDK is the correct answer regardless of how convenient the REST API is.
Axis 2: Synchronous vs. Asynchronous Processing
This is the most consequential call you’ll make, and it varies by stage within a single pipeline.
| Factor | Synchronous | Asynchronous |
|---|---|---|
| Document size | Under ~10 pages | Large or variable-length |
| SLA requirement | Sub-second response | Variable completion time acceptable |
| Typical use case | Real-time contract preview | Batch invoice processing |
| Error handling | Inline exception handling | Dead-letter queue, retry on callback |
| Foxit API example | DocGen (returns document in response body) | PDF Services (returns taskId, poll for result); eSign (webhook callback on folder execution) |
The Foxit suite itself illustrates this split cleanly. DocGen is synchronous: POST your template and data payload, get the rendered document back immediately in the response body. No taskId, no polling. PDF Services is asynchronous: a conversion call returns a taskId, and you poll a status endpoint until the result is ready. eSign is asynchronous via webhooks: creating a folder returns immediately, and the API delivers a callback to your registered endpoint when the folder is executed (all signers complete). Design your pipeline around this reality rather than assuming a uniform execution model across all three APIs.
Axis 3: Linear Pipeline vs. Event-Driven Architecture
A linear pipeline (where stage A blocks until complete before stage B starts) works for simple three-stage flows with predictable volume and acceptable end-to-end latency. An event-driven pipeline, where each stage emits a completion event consumed by the next stage, is the correct choice when you need error isolation (a failed stage 3 doesn’t block stage 2 outputs from being replayed), partial replay (reprocess from stage 2 without regenerating the document), or parallel processing branches (send the same document to multiple downstream consumers simultaneously).
For pipelines that start as linear but need to scale, n8n is a practical bridge. You can call Foxit’s REST APIs from n8n workflows via HTTP Request nodes, which lets you wire pipeline stages without writing custom glue code while you validate the workflow logic before committing to a fully coded implementation.
Axis 4: Error Handling Strategy for Document Pipelines
Three components belong in your initial design, not bolted on afterward.
The first is idempotency keys. Generate a unique key per document job (a UUID tied to the source record ID and timestamp works well) and check it before re-processing. If a worker crashes mid-job and the job re-queues, the idempotency key prevents duplicate processing.
The second is dead-letter handling. Define what happens to a document that has failed three consecutive processing attempts. It should route to a dead-letter queue with the failure reason and enough context to replay it manually or trigger an alert.
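A sketch of that dead-letter path, with a plain list standing in for a real queue (SQS, RabbitMQ, or a database table):

```python
MAX_ATTEMPTS = 3
dead_letter_queue = []  # stand-in for a real DLQ

def process_with_dlq(job_id, payload, handler):
    """Try handler up to MAX_ATTEMPTS times, then park the job in the DLQ."""
    last_error = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return handler(payload)
        except Exception as exc:  # backoff between attempts elided for brevity
            last_error = exc
    dead_letter_queue.append({
        "job_id": job_id,
        "payload": payload,        # enough context to replay the job manually
        "reason": repr(last_error),
        "attempts": MAX_ATTEMPTS,
    })
    return None                    # caller alerts on None plus a new DLQ entry
```

The DLQ record carries the payload and failure reason, which is exactly what a manual replay or an alerting rule needs.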
The third is a circuit breaker. If PDF Services returns 5xx responses on five consecutive calls within 30 seconds, stop sending requests and return a fast failure to the calling system. This prevents a degraded upstream API from exhausting your worker pool and cascading failures downstream. The circuit breaker pattern maps cleanly onto any stateless HTTP integration.
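The breaker can be sketched as a small stateful wrapper. The thresholds (five consecutive failures within 30 seconds) come from the text above; the cooldown value and half-open probe behavior are assumptions you would tune for your stack:

```python
import time

class CircuitBreaker:
    """Trip open after `threshold` consecutive failures inside `window` seconds."""

    def __init__(self, threshold=5, window=30.0, cooldown=60.0):
        self.threshold = threshold
        self.window = window
        self.cooldown = cooldown
        self.failures = []     # monotonic timestamps of recent consecutive failures
        self.opened_at = None  # set while the circuit is open

    def allow(self):
        """Return False (fast failure) while open; after cooldown, let one probe through."""
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at = None  # half-open: allow a probe request
            self.failures.clear()
            return True
        return False

    def record_success(self):
        self.failures.clear()  # a success breaks the consecutive-failure streak

    def record_failure(self):
        now = time.monotonic()
        # keep only failures inside the rolling window
        self.failures = [t for t in self.failures if now - t <= self.window]
        self.failures.append(now)
        if len(self.failures) >= self.threshold:
            self.opened_at = now
```

Check breaker.allow() before each PDF Services call and record the outcome of every response; while the circuit is open, return a fast failure upstream instead of queuing more work.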
Building the Pipeline: Foxit APIs in Practice
We’ll use Foxit’s PDF Services, DocGen, and eSign APIs for the examples below. The patterns translate to any REST-based document API, but these are the endpoints we’ll call.
Document Generation with the DocGen API
DocGen takes a DOCX template (encoded as base64) and a JSON data payload, and returns the rendered document immediately in the response body. There’s no templateId concept; you send the template inline with every request. This means you own template versioning. Keep your templates in version control and pin the version used for each job to your event log.
One practical cap to design around: the DocGen endpoint rejects .docx uploads larger than 4 MB once base64-encoded. Compress embedded images through Word’s Picture Format settings, drop embedded fonts and OLE objects, and split very large templates into multiple files before the request leaves your service.
The request uses client_id and client_secret as HTTP headers against na1.fusion.foxit.com.
# Illustrative example - not production code
import base64
import requests
import json
def generate_contract(template_path: str, data: dict) -> bytes:
    with open(template_path, "rb") as f:
        template_b64 = base64.b64encode(f.read()).decode("utf-8")
    payload = {
        "outputFormat": "pdf",
        "documentValues": data,
        "base64FileString": template_b64
    }
    response = requests.post(
        "https://na1.fusion.foxit.com/document-generation/api/GenerateDocumentBase64",
        headers={
            "client_id": "YOUR_CLIENT_ID",
            "client_secret": "YOUR_CLIENT_SECRET",
            "Content-Type": "application/json"
        },
        json=payload
    )
    response.raise_for_status()
    result = response.json()
    return base64.b64decode(result["base64FileString"])

# Data pulled from your CRM or ERP; validate against your template schema before calling
contract_data = {
    "client_name": "Acme Corp",
    "contract_value": "48000",
    "effective_date": "2025-09-01",
    "payment_terms": "Net 30"
}

pdf_bytes = generate_contract("templates/msa_v3.docx", contract_data)
Validate your data payload against the template’s expected field schema before the API call. DocGen doesn’t catch type errors or missing fields with a clean error response. You get a malformed document instead. A Pydantic model or JSON Schema validation step before the POST saves significant debugging time.
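A dependency-free version of that validation step, mirroring the contract_data fields above (a Pydantic model or a JSON Schema document expresses the same idea more declaratively, with richer error reporting):

```python
CONTRACT_SCHEMA = {  # field name -> expected type; mirrors the contract_data example
    "client_name": str,
    "contract_value": str,
    "effective_date": str,
    "payment_terms": str,
}

def validate_payload(data, schema):
    """Raise before the POST instead of debugging a malformed PDF afterward."""
    missing = [k for k in schema if k not in data]
    wrong_type = [k for k, t in schema.items() if k in data and not isinstance(data[k], t)]
    if missing or wrong_type:
        raise ValueError(f"payload invalid: missing={missing} wrong_type={wrong_type}")
```

Call validate_payload(contract_data, CONTRACT_SCHEMA) immediately before generate_contract so a bad CRM export fails loudly in your service, not silently in the rendered document.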
PDF Processing with the PDF Services API
The most common PDF Services operation is conversion. The DOCX-to-PDF call is also the simplest entry point for teams new to the API. PDF Services uses a two-step pattern: upload the source file first to get a documentId, then call the operation endpoint with that ID. Because operations are asynchronous, the call returns a taskId that you poll until the result is available.
# Illustrative example - not production code
import time
import requests
PDF_SERVICES_HOST = "https://na1.fusion.foxit.com"
HEADERS = {
"client_id": "YOUR_CLIENT_ID",
"client_secret": "YOUR_CLIENT_SECRET"
}
def upload_document(file_bytes: bytes, filename: str) -> str:
    response = requests.post(
        f"{PDF_SERVICES_HOST}/pdf-services/api/documents/upload",
        headers=HEADERS,
        files={"file": (filename, file_bytes, "application/octet-stream")}
    )
    response.raise_for_status()
    return response.json()["documentId"]

def poll_task(task_id: str) -> str:
    while True:
        status_resp = requests.get(
            f"{PDF_SERVICES_HOST}/pdf-services/api/tasks/{task_id}",
            headers=HEADERS
        )
        status_resp.raise_for_status()
        status_data = status_resp.json()
        if status_data["status"] == "COMPLETED":
            return status_data["resultDocumentId"]
        elif status_data["status"] == "FAILED":
            raise RuntimeError(f"Task failed: {status_data}")
        time.sleep(2)

def download_document(document_id: str) -> bytes:
    response = requests.get(
        f"{PDF_SERVICES_HOST}/pdf-services/api/documents/{document_id}/download",
        headers=HEADERS
    )
    response.raise_for_status()
    return response.content

def convert_docx_to_pdf(docx_bytes: bytes) -> bytes:
    doc_id = upload_document(docx_bytes, "document.docx")
    response = requests.post(
        f"{PDF_SERVICES_HOST}/pdf-services/api/documents/create/pdf-from-word",
        headers={**HEADERS, "Content-Type": "application/json"},
        json={"documentId": doc_id}
    )
    response.raise_for_status()
    result_doc_id = poll_task(response.json()["taskId"])
    return download_document(result_doc_id)

def extract_text(pdf_bytes: bytes) -> str:
    doc_id = upload_document(pdf_bytes, "document.pdf")
    response = requests.post(
        f"{PDF_SERVICES_HOST}/pdf-services/api/documents/modify/pdf-extract",
        headers={**HEADERS, "Content-Type": "application/json"},
        json={"documentId": doc_id, "extractType": "TEXT"}
    )
    response.raise_for_status()
    result_doc_id = poll_task(response.json()["taskId"])
    return download_document(result_doc_id).decode("utf-8")
The pdf-extract endpoint pulls text from the PDF (pass extractType as TEXT, IMAGE, or PAGE depending on what you need). Both conversion and extraction follow the same upload, execute, poll, download cycle. Feed the text output to a downstream search index so the document is queryable immediately after processing.
Signature Orchestration with the eSign API
The eSign API uses OAuth2, not header-based authentication. Your first call exchanges client_id and client_secret for a Bearer token on a separate host (na1.foxitesign.foxit.com).
# Illustrative example - not production code
import json
import requests
from flask import Flask, request as flask_request

ESIGN_HOST = "https://na1.foxitesign.foxit.com"

def get_esign_token(client_id: str, client_secret: str) -> str:
    response = requests.post(
        f"{ESIGN_HOST}/api/oauth2/access_token",
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret
        }
    )
    response.raise_for_status()
    return response.json()["access_token"]

def create_signing_folder(token: str, pdf_bytes: bytes, signers: list) -> str:
    folder_payload = {
        "folderName": "MSA - Acme Corp",
        "parties": [
            {
                "firstName": s["first_name"],
                "lastName": s["last_name"],
                "emailId": s["email"],
                "permission": "FILL_FIELDS_AND_SIGN",
                "sequence": s["sequence"]
            }
            for s in signers
        ]
    }
    response = requests.post(
        f"{ESIGN_HOST}/api/folders/createfolder",
        headers={"Authorization": f"Bearer {token}"},
        files={
            "file": ("contract.pdf", pdf_bytes, "application/pdf"),
            "data": (None, json.dumps(folder_payload), "application/json")
        }
    )
    response.raise_for_status()
    return response.json()["folderId"]

# Webhook handler receives the folder-executed event
app = Flask(__name__)

@app.route("/webhooks/esign", methods=["POST"])
def esign_webhook():
    event = flask_request.json
    if event.get("event_type") == "folder_executed":
        folder_id = event["folder_id"]
        signed_doc_url = event["documents"][0]["download_url"]
        archive_signed_document(folder_id, signed_doc_url)
    return "", 200

Register your webhook endpoint in the eSign developer portal settings. When a folder is executed (all signers complete), the API POSTs the event payload to your endpoint. Extract the signed document URL from the callback and pass it to your archival stage. The eSign API also exposes a folder activity history endpoint that returns a complete audit trail: signer identity, timestamp, IP address, and authentication method for every interaction with the folder.
Chaining the Pipeline Stages with Idempotency
The file handoff between stages is explicit by design. Here’s a minimal orchestration wrapper that chains all three stages and demonstrates the idempotency pattern:
# Illustrative example - not production code
def run_document_pipeline(job_id: str, template_path: str, data: dict, signers: list):
    # The key must be deterministic per job: deriving it from job_id alone
    # means a retry of the same job produces the same key. (A random UUID
    # generated here would defeat the idempotency check entirely.)
    idempotency_key = f"doc-pipeline:{job_id}"
    if is_already_processed(idempotency_key):
        return  # Safe to retry
    # Stage 2: Generate (DocGen returns PDF bytes synchronously)
    pdf_bytes = generate_contract(template_path, data)
    log_pipeline_event(job_id, "generated", hash_document(pdf_bytes))
    # Stage 3: Process (extract text for indexing; convert if needed)
    extracted = extract_text(pdf_bytes)
    index_document(job_id, extracted)
    log_pipeline_event(job_id, "processed", hash_document(pdf_bytes))
    # Stage 4: Sign (eSign returns folder ID; completion arrives via webhook)
    token = get_esign_token("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
    folder_id = create_signing_folder(token, pdf_bytes, signers)
    log_pipeline_event(job_id, "sent_for_signature", folder_id)
    mark_processed(idempotency_key)

For async pipelines handling thousands of documents per hour, replace direct function calls with queue messages. Each stage worker pulls a job from Redis or Amazon SQS, executes the API call, ACKs on success, and publishes a completion event to the next stage’s queue. If a worker crashes mid-job, the unACKed message re-queues and the idempotency key prevents re-processing a document that has already been completed.
Auditability and Compliance by Design
GDPR, HIPAA, and SOC 2 Type II each impose specific requirements around document lifecycle traceability. Retrofitting an audit layer onto a pipeline that wasn’t designed for it takes far more work than building it in from the start.
The event sourcing pattern fits document pipelines directly. Maintain an append-only log of every document event: created, converted, sent_for_signature, signed, archived. Use a stable document_id as the primary key. This log makes replay straightforward: if signing fails, you can replay from the processing output without regenerating the document from scratch. Each event record should include the stage name, timestamp, operator identity, and a SHA-256 hash of the document bytes at that stage.
The SHA-256 hash at each stage isn’t overhead; it’s your tamper detection mechanism. If the hash of the document presented for signing doesn’t match the hash recorded at generation, you have an integrity problem that’s immediately visible. This satisfies document integrity requirements in regulated industries without any additional tooling.
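The event-log-plus-hash pattern above can be sketched in a few lines. This is a minimal illustration that uses a local JSON-lines file as the append-only store; the function names (append_event, verify_integrity) and the file-based storage are hypothetical stand-ins for whatever durable store your pipeline actually uses.

```python
import hashlib
import json
import time

def hash_document(pdf_bytes: bytes) -> str:
    """SHA-256 fingerprint of the document bytes at this stage."""
    return hashlib.sha256(pdf_bytes).hexdigest()

def append_event(log_path: str, document_id: str, stage: str,
                 operator: str, doc_hash: str) -> None:
    """Append one immutable event record; earlier lines are never rewritten."""
    record = {
        "document_id": document_id,
        "stage": stage,
        "timestamp": time.time(),
        "operator": operator,
        "sha256": doc_hash,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

def verify_integrity(log_path: str, document_id: str, current_bytes: bytes) -> bool:
    """Compare the hash recorded at generation against the bytes presented now."""
    with open(log_path) as f:
        events = [json.loads(line) for line in f]
    generated = next(
        e for e in events
        if e["document_id"] == document_id and e["stage"] == "generated"
    )
    return generated["sha256"] == hash_document(current_bytes)
```

Before the signing stage, a single verify_integrity call confirms the document handed to eSign is byte-identical to the one that was generated.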
The Foxit eSign API’s built-in audit trail captures signer identity, timestamp, IP address, and authentication method for every folder interaction. Query the folder activity history endpoint to retrieve this data and persist it in your own audit store alongside your pipeline event log. Storing it in your own system, rather than relying solely on the eSign provider’s records, gives you a complete, portable audit trail that survives a provider migration.
Scaling Document Workflow Automation Without Rebuilding It
Batch Ingestion
Place incoming document jobs on a queue (Redis list or SQS FIFO queue) and run a pool of stateless worker processes. Each worker pulls a job, executes the API call with an idempotency key, and ACKs on success. Dead-letter routing handles permanently failed documents.
This pattern processes thousands of documents per hour without hammering the API or requiring coordination between workers. Because each REST API call is stateless, workers scale horizontally without any shared state. You add capacity by adding workers, not by redesigning the pipeline.
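The worker loop behind this pattern can be sketched with in-memory deques standing in for Redis or SQS. Everything here (run_worker, the job dict shape, max_attempts) is an illustrative name, not part of any SDK; the point is the ordering of the idempotency check, the ACK, and the dead-letter routing.

```python
from collections import deque

def run_worker(jobs: deque, dead_letter: deque, processed: set,
               handler, max_attempts: int = 3) -> None:
    """Drain the queue: process each job at most once, route
    repeated failures to the dead-letter queue."""
    while jobs:
        job = jobs.popleft()
        key = job["idempotency_key"]
        if key in processed:
            continue  # completed on a previous attempt; treat as an ACK
        try:
            handler(job)        # e.g., the Foxit API call for this stage
            processed.add(key)  # ACK: record completion before moving on
        except Exception:
            job["attempts"] = job.get("attempts", 0) + 1
            if job["attempts"] >= max_attempts:
                dead_letter.append(job)  # permanently failed; needs review
            else:
                jobs.append(job)         # re-queue for another attempt
```

In production the deques become queue clients, `processed` becomes a key-value store, and the handler publishes a completion event to the next stage's queue on success.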
Credit Quota and Backoff
Foxit’s pricing model is credit-based: API calls consume credits, and calls pause when credits are exhausted until renewal or upgrade. Implement exponential backoff with jitter on 5xx responses as a general practice for any REST API integration.
# Illustrative example - not production code
import time
import random
import requests

def api_call_with_retry(url, headers, payload, max_retries=4):
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code < 500:
            return response
        # Exponential backoff with jitter before the next attempt
        wait = (2 ** attempt) + random.uniform(0, 1)
        time.sleep(wait)
    response.raise_for_status()

Log quota exhaustion as a separate metric category. Consistent credit exhaustion is a signal to upgrade your plan. It shouldn’t require digging through application logs to detect.
Observability
Instrument each pipeline stage with three metrics: processing latency (time from job enqueue to stage completion), error rate per stage, and document volume per time window. Use structured JSON logging so stage failures are queryable without parsing free-text log lines. Tools like OpenTelemetry make it straightforward to emit these metrics in a vendor-neutral format.
A document that enters the pipeline and never exits is a data integrity problem. Track in-flight documents explicitly: when a job enters signing, record it. When the eSign webhook fires, close the record. Any job that’s been in stage 4 for longer than your expected SLA without a webhook callback warrants an alert, not just a log entry.
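A minimal in-flight tracker for stage 4 might look like the sketch below. InFlightTracker is a hypothetical helper under the assumptions described above; a production version would persist its state in Redis or a database rather than hold it in process memory, so a worker restart doesn't lose the open records.

```python
import time

class InFlightTracker:
    """Track documents that entered signing but haven't produced a webhook yet."""

    def __init__(self, sla_seconds: float):
        self.sla_seconds = sla_seconds
        self.open_jobs: dict[str, float] = {}  # job_id -> time it entered signing

    def entered_signing(self, job_id: str) -> None:
        self.open_jobs[job_id] = time.time()

    def webhook_received(self, job_id: str) -> None:
        self.open_jobs.pop(job_id, None)  # close the record

    def overdue(self) -> list[str]:
        """Jobs past SLA with no webhook callback: these warrant an alert."""
        now = time.time()
        return [j for j, t in self.open_jobs.items()
                if now - t > self.sla_seconds]
```

A periodic job (cron or a scheduler thread) calls overdue() and pages when the list is non-empty.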
Ship Your First Document Pipeline Stage Today
The gap between a collection of one-off scripts and a production document pipeline isn’t as wide as it looks. It starts with one stage, not five.
Create a free account directly at account.foxit.com/site/sign-up (no credit card required; the Developer plan ships with 500 credits per year). The direct URL skips the pricing-page redirect you would otherwise hit from the developer portal, so you finish on the account form and then land in the API Keys section where credentials live. From there, make your first conversion call: POST a DOCX file from your own system to the PDF Services conversion endpoint using the Python example above and confirm you get a valid PDF back. That single round-trip validates your auth, your network path, and the basic integration pattern before you write any orchestration logic.
Once that’s working, pick one document type in your system that’s currently generated or processed manually and map it to the five-stage model from the second section of this article. Find the highest-friction bottleneck stage and start there, not at stage 1. If generation is the pain point, use the Developer Playground in the developer portal to test DocGen templates against real data payloads before writing a single line of integration code. If signing is the bottleneck, wire up the eSign folder creation and a webhook handler to close the loop.
The patterns in this guide (idempotency keys, event-sourced audit logs, async stage handoffs, circuit breakers) apply to any document API stack. A unified REST API suite covering generation, processing, and signing from a single provider cuts the number of authentication models to manage, reduces integration surface area, and gives you a consistent debugging path when something fails across stages. That’s the practical payoff of treating document workflow automation as a first-class architectural concern rather than a collection of scripts that should have been replaced two years ago.
Start building your first pipeline stage today.
Frequently Asked Questions
What is document workflow automation?
Document workflow automation replaces manual, script-driven document operations (generation, conversion, signing, and archival) with a structured API-driven pipeline. Each stage is independently testable, retryable via idempotency keys, and observable through structured event logs. At scale (thousands of documents per hour), automation eliminates the bottlenecks created by cron jobs, shared spreadsheets, and one-off scripts.
When should I use a synchronous vs. asynchronous document API?
Use synchronous APIs when you need sub-second responses for small documents, for example, real-time contract previews under approximately 10 pages. Use asynchronous APIs (polling or webhook-driven) for large or variable-length documents, batch invoice processing, or any workflow where variable completion time is acceptable. Many document API suites, including Foxit’s, mix both models across different endpoints, so design each pipeline stage around the actual execution model of the specific API call it makes.
How do I make a document pipeline idempotent?
Generate a unique key per document job (a UUID tied to the source record ID and timestamp works well) and check whether that key has already been processed before executing any stage. Store processed keys in a fast key-value store (Redis is a common choice). On retry, the idempotency check returns early without duplicating the document, signing request, or archive record. This is an orchestration-layer responsibility; the document API itself doesn’t provide it automatically.
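Here is a minimal sketch of that pattern, with a plain dict standing in for Redis; idempotency_key and claim_job are illustrative names. In production the check-and-set must be atomic, e.g. redis.set(key, value, nx=True).

```python
import hashlib

def idempotency_key(source_record_id: str, created_at: str) -> str:
    """Deterministic key: the same source record and timestamp
    always map to the same key, so retries collide on purpose."""
    return hashlib.sha256(f"{source_record_id}:{created_at}".encode()).hexdigest()

def claim_job(store: dict, key: str) -> bool:
    """SETNX-style check-and-set: returns True only for the first claim."""
    if key in store:
        return False  # already processed or in progress; retry is a no-op
    store[key] = "in_progress"
    return True
```

The caller runs the pipeline stage only when claim_job returns True; a retried message sees False and exits without duplicating work.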
What compliance requirements apply to document pipelines?
GDPR, HIPAA, and SOC 2 Type II each require document lifecycle traceability. Implement an append-only event log keyed by a stable document_id, capturing stage name, timestamp, operator identity, and a SHA-256 hash of the document at each stage. For eSign specifically, store the provider’s audit trail (signer identity, IP address, authentication method, timestamp) in your own system so the record is portable across provider migrations.
HTML to PDF API: Building Production-Grade Conversion Pipelines with Foxit PDF Services

Automate HTML to PDF conversion with Foxit’s API. Build scalable pipelines to replace Puppeteer, handle bulk processing, and ensure reliable document generation.
Your Puppeteer setup works fine at low volume. You launch a Chrome process, load the page, call page.pdf(), and write the bytes to disk. Clean enough. Then your invoice generation hits 500 documents per night, your report export feature goes live in three time zones simultaneously, and the wheels start coming off. Chrome processes time out waiting for JavaScript hydration. Memory climbs until your container OOMs. The font that renders correctly on your MacBook looks wrong on the Linux build server. You spend a Friday afternoon tuning networkidle2 timeouts per template instead of shipping features.
This is the failure mode of treating a rendering engine as a conversion service. Headless Chrome is a browser. Running it at production document volume means you’re operating a browser fleet: process pooling, memory isolation, crash recovery, rendering consistency across OS environments. That infrastructure overhead comes directly out of engineering time.
The architectural alternative is a managed REST API: POST your HTML (or a URL), let the service render the PDF, and download the result. The rendering infrastructure becomes the API provider’s problem. This guide covers how to build that conversion pipeline end-to-end using Foxit PDF Services API, from authentication through batch processing and production error handling.
The Production Problem with Headless Browser PDF Conversion
A standard Puppeteer setup looks like this:
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url, { waitUntil: "networkidle2" });
const pdf = await page.pdf({ format: "A4", printBackground: true });
await browser.close();
At five documents a day, this is fine. At five hundred concurrent conversions, each puppeteer.launch() spins up a full Chromium process (roughly 100-200MB RSS on Linux). If you’re running in a container with 2GB of memory and you get 20 concurrent requests, you’re at the memory limit before accounting for the Node.js process itself or any other application memory.
The standard solution is a Chrome process pool (libraries like puppeteer-cluster or generic-pool). Now you’re managing pool size tuning, handling pool exhaustion under burst traffic, and writing cleanup logic for crashed Chrome instances. You’ve added significant operational complexity to what started as a one-liner.
Font rendering is a separate category of pain. Chrome on macOS uses CoreText. Chrome on Linux uses FreeType with fontconfig. The same CSS font-family: 'Inter' declaration produces visibly different output depending on whether Inter is installed as a system font or loaded via a @font-face declaration, and whether the fallback stack resolves differently across environments. Teams that ship invoice PDFs to customers discover this in production, not in development.
JavaScript execution adds another dimension. If your page renders a data table via a React component that fetches data on mount, networkidle2 is not a reliable wait condition. Network activity can go idle before the DOM has finished updating. You end up tuning waitForSelector or adding arbitrary timeouts per template, and those timeouts become technical debt that breaks when the page changes.
The architectural fix isn’t a better Puppeteer wrapper. It’s offloading the entire rendering layer to a service that was built to handle it reliably: a managed REST API with consistent rendering environments, predictable behavior, and no infrastructure for your team to maintain.
How Cloud HTML-to-PDF APIs Handle Rendering
Cloud conversion APIs typically accept input in two modes: URL mode and file upload mode.
In URL mode, you pass a public URL. The API fetches the page, renders it, and returns a PDF. This works when your page is publicly accessible and all assets (fonts, images, stylesheets) load from the same domain or CDN. The tradeoff is that the API’s rendering environment must reach your server, which creates a dependency on network reachability and your server’s response time. If you’re generating PDFs from an internal dashboard behind a VPN, URL mode doesn’t work without additional networking.
In file upload mode, you construct the complete HTML file (with inlined CSS and assets where needed) and upload it to the API. The service processes the file and returns a PDF. This eliminates the external asset dependency and makes your conversion more deterministic: the same HTML file always produces the same PDF, regardless of what’s deployed on your web server at the time.
Beyond input mode, rendering fidelity depends on several factors:
- CSS @media print rules control what renders into the PDF. Navigation bars, sidebars, and hover states should be hidden via print stylesheets so they don’t appear in the output.
- Font loading strategy determines rendering consistency. Relying on system fonts produces different output across environments. Embedding fonts via @font-face with a CDN URL or base64-inlined data guarantees consistent rendering.
- Page layout properties (paper size, margins, orientation) can be controlled through CSS @page rules embedded in the HTML itself. This keeps layout configuration in the document rather than in API parameters.
- JavaScript execution matters for pages that render content dynamically. Some APIs wait for the page to stabilize before capturing; others capture immediately.
These factors are the same ones you’d manage with Puppeteer’s page.pdf() options, but with a cloud API you handle them through your HTML/CSS rather than through in-process code.
Setting Up Foxit PDF Services API: Authentication and First Conversion
Foxit PDF Services API is a cloud-hosted REST API built on Foxit’s proprietary PDF engine, backed by over 20 years of PDF technology development. Create an account at the Foxit Developer Portal (the Developer plan is free, includes 500 credits/year, and requires no credit card). Generate your API credentials (a client_id and client_secret) from the Developer Dashboard.
Understanding the Async Workflow
Unlike a simple request-response API, Foxit PDF Services uses an asynchronous task-based workflow. Every operation follows the same pattern:
- Submit the job (upload a file, or POST a URL)
- Receive a taskId in the response
- Poll the task status until it completes or fails
- Download the result using the resultDocumentId from the completed task
This design handles long-running operations gracefully. A complex HTML page might take several seconds to render; the async pattern means your client never blocks on a single HTTP request waiting for rendering to finish.
URL-to-PDF Conversion
For pages that are publicly accessible, URL-to-PDF is the simplest path. You POST the URL directly and the API fetches, renders, and converts it. Here’s the complete workflow in Python using the requests library:
import os
import requests
from time import sleep

HOST = os.environ["FOXIT_API_HOST"]  # e.g., https://na1.fusion.foxit.com
CLIENT_ID = os.environ["FOXIT_CLIENT_ID"]
CLIENT_SECRET = os.environ["FOXIT_CLIENT_SECRET"]

AUTH_HEADERS = {
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
}

def create_url_to_pdf_task(url: str) -> str:
    """Submit a URL for PDF conversion. Returns a taskId."""
    headers = {**AUTH_HEADERS, "Content-Type": "application/json"}
    response = requests.post(
        f"{HOST}/pdf-services/api/documents/create/pdf-from-url",
        json={"url": url},
        headers=headers,
    )
    response.raise_for_status()
    return response.json()["taskId"]

def poll_task(task_id: str, interval: int = 5) -> dict:
    """Poll until the task completes or fails. Returns the task status object."""
    headers = {**AUTH_HEADERS, "Content-Type": "application/json"}
    while True:
        response = requests.get(
            f"{HOST}/pdf-services/api/tasks/{task_id}",
            headers=headers,
        )
        response.raise_for_status()
        status = response.json()
        if status["status"] == "COMPLETED":
            return status
        elif status["status"] == "FAILED":
            raise RuntimeError(f"Task {task_id} failed: {status}")
        sleep(interval)

def download_document(document_id: str, output_path: str) -> None:
    """Download the resulting PDF by its document ID."""
    response = requests.get(
        f"{HOST}/pdf-services/api/documents/{document_id}/download",
        headers=AUTH_HEADERS,
        stream=True,
    )
    response.raise_for_status()
    with open(output_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)

# Full workflow: URL to PDF
task_id = create_url_to_pdf_task("https://example.com/invoice/1042")
result = poll_task(task_id)
download_document(result["resultDocumentId"], "invoice_1042.pdf")
print("PDF generated successfully.")

In this code, you define three reusable functions that map to the async workflow: create_url_to_pdf_task() submits a public URL and returns a taskId, poll_task() checks the task status in a loop until it reaches COMPLETED or FAILED, and download_document() streams the resulting PDF to disk. The final three calls wire them together into the complete conversion pipeline.
Before running: Set your FOXIT_API_HOST, FOXIT_CLIENT_ID, and FOXIT_CLIENT_SECRET environment variables with the values from your Foxit Developer Dashboard. Never commit credentials to source control; use environment variables or a secrets manager.
HTML File-to-PDF Conversion
When your content isn’t publicly accessible (internal dashboards, dynamically generated reports), you can upload an HTML file directly. This follows the standard 4-step async pattern:
def upload_document(file_path: str) -> str:
    """Upload a file to Foxit. Returns a documentId."""
    with open(file_path, "rb") as f:
        response = requests.post(
            f"{HOST}/pdf-services/api/documents/upload",
            files={"file": f},
            headers=AUTH_HEADERS,
        )
    response.raise_for_status()
    return response.json()["documentId"]

def create_html_to_pdf_task(document_id: str) -> str:
    """Create an HTML-to-PDF conversion task. Returns a taskId."""
    headers = {**AUTH_HEADERS, "Content-Type": "application/json"}
    response = requests.post(
        f"{HOST}/pdf-services/api/documents/create/pdf-from-html",
        json={"documentId": document_id},
        headers=headers,
    )
    response.raise_for_status()
    return response.json()["taskId"]

# Full workflow: HTML file to PDF
doc_id = upload_document("report.html")
task_id = create_html_to_pdf_task(doc_id)
result = poll_task(task_id)
download_document(result["resultDocumentId"], "report.pdf")
print("HTML converted to PDF successfully.")

In this code, you first upload a local .html file via upload_document(), which returns a documentId referencing the uploaded file on Foxit’s servers. Then create_html_to_pdf_task() submits that documentId for conversion. The rest of the workflow is identical: poll for completion, then download the result.
Note: Replace "report.html" with the path to your own HTML file. This code reuses the poll_task() and download_document() functions from the URL-to-PDF example above, so make sure both are defined in the same script.
The key difference: URL-to-PDF skips the upload step (you POST the URL directly), while HTML file conversion requires uploading the .html file first via the /documents/upload endpoint. Both use the same poll-and-download pattern after task creation.
Refer to the Foxit API documentation and the Postman workspace for the complete parameter reference, including any additional rendering options supported by these endpoints. The GitHub demo repository contains working examples in Python, Node.js, and PHP.
Controlling CSS and JavaScript Rendering in HTML-to-PDF Conversion
Regardless of which API you use for HTML-to-PDF conversion, the quality of the output depends on how well you prepare the HTML. The rendering parameters live in your document, not in API request fields.
The single most common rendering problem between “looks right in a browser” and “looks wrong in a PDF” is the CSS media type. By default, browsers render with screen styles, which means your navigation bar, sidebar, and hover states all appear. For PDF output, you want your @media print rules to take over.
Write your print styles explicitly:
@media print {
  nav,
  .sidebar,
  .no-print {
    display: none;
  }
  body {
    font-size: 11pt;
    font-family: "Inter", Arial, sans-serif;
    color: #000;
  }
  .invoice-table {
    page-break-inside: avoid;
  }
  .page-header {
    page-break-before: always;
  }
  @page {
    size: A4;
    margin: 20mm 15mm;
  }
}

In this stylesheet, you hide non-essential UI elements (navigation, sidebars) when printing, set a clean body font, and use page-break-inside: avoid to prevent the renderer from splitting a table across pages. The nested @page rule sets the paper size and margins at the CSS level, so layout configuration stays in the document rather than in API parameters.
For font rendering consistency, don’t rely on system fonts. Include a @font-face declaration in your HTML that loads from a CDN, or inline the font as base64:
<style>
  @font-face {
    font-family: "Inter";
    src: url("https://fonts.gstatic.com/s/inter/v13/UcCO3FwrK3iLTeHuS_fvQtMwCp50KnMw2boKoduKmMEVuLyfAZ9hiJ.woff2")
      format("woff2");
    font-weight: 400;
    font-style: normal;
  }
</style>

In this snippet, you embed the Inter font directly in the HTML using a @font-face declaration that points to Google Fonts. This guarantees Inter renders in the PDF regardless of what fonts are installed in the API’s container environment. The tradeoff is latency: the rendering engine fetches the font file during conversion. If you’re running high-volume batch jobs, consider inlining the font as a base64 data URI to eliminate that network round trip.
For JavaScript-heavy pages, make sure the content has fully rendered before the API captures it. If you’re using the URL-to-PDF endpoint, the API fetches and renders the live page, so your page’s JavaScript will execute. For the HTML file upload path, keep your HTML self-contained with all data already rendered in the markup rather than relying on client-side JavaScript to populate it after load.
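One way to keep the uploaded HTML self-contained is to render the data into the markup server-side before upload. The sketch below uses Python's string.Template; the template shape and the render_invoice_html helper are illustrative assumptions, not part of the Foxit API.

```python
from string import Template

# Hypothetical invoice template: table rows are filled in server-side,
# so the uploaded HTML needs no client-side JavaScript to populate them.
INVOICE_TEMPLATE = Template("""\
<html><body>
  <h1>Invoice $invoice_id</h1>
  <table class="invoice-table">$rows</table>
</body></html>""")

def render_invoice_html(invoice_id: str, line_items: list[dict]) -> str:
    """Bake the data into the markup before uploading for conversion."""
    rows = "".join(
        f"<tr><td>{item['description']}</td><td>{item['amount']}</td></tr>"
        for item in line_items
    )
    return INVOICE_TEMPLATE.substitute(invoice_id=invoice_id, rows=rows)
```

The resulting string is written to a file (or held in memory) and uploaded via the /documents/upload endpoint; nothing in the page depends on scripts running at render time.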
Batch HTML-to-PDF Conversion at Scale
Sequential conversion is the naive starting point:
for invoice in invoices:
    doc_id = upload_document(invoice.html_path)
    task_id = create_html_to_pdf_task(doc_id)
    result = poll_task(task_id)
    download_document(result["resultDocumentId"], f"output/{invoice.id}.pdf")

In this loop, each invoice is processed one at a time: upload, convert, poll, download, then move to the next. Each iteration blocks on the poll loop before starting the next conversion. At a few seconds per document (upload, render, poll, download), 500 invoices could take over 30 minutes.
The fix is concurrent dispatch with a semaphore to cap parallelism. Check your plan’s rate limits before setting the semaphore ceiling in production.
import asyncio
import aiohttp
import os
from pathlib import Path

HOST = os.environ["FOXIT_API_HOST"]
CLIENT_ID = os.environ["FOXIT_CLIENT_ID"]
CLIENT_SECRET = os.environ["FOXIT_CLIENT_SECRET"]
MAX_CONCURRENT = 10  # Adjust based on your plan's rate limits

async def convert_one(
    session: aiohttp.ClientSession,
    sem: asyncio.Semaphore,
    invoice_id: str,
    html_path: str,
    output_dir: Path,
) -> tuple[str, bool]:
    async with sem:
        try:
            auth = {"client_id": CLIENT_ID, "client_secret": CLIENT_SECRET}
            # Step 1: Upload the HTML file (read the bytes up front so the
            # file handle doesn't need to stay open across the request)
            form = aiohttp.FormData()
            form.add_field("file", Path(html_path).read_bytes(), filename="document.html")
            async with session.post(
                f"{HOST}/pdf-services/api/documents/upload",
                data=form,
                headers=auth,
            ) as resp:
                if resp.status != 200:
                    return invoice_id, False
                upload_result = await resp.json()
                doc_id = upload_result["documentId"]
            # Step 2: Create the conversion task
            async with session.post(
                f"{HOST}/pdf-services/api/documents/create/pdf-from-html",
                json={"documentId": doc_id},
                headers={**auth, "Content-Type": "application/json"},
            ) as resp:
                if resp.status != 200:
                    return invoice_id, False
                task_result = await resp.json()
                task_id = task_result["taskId"]
            # Step 3: Poll for completion
            while True:
                async with session.get(
                    f"{HOST}/pdf-services/api/tasks/{task_id}",
                    headers={**auth, "Content-Type": "application/json"},
                ) as resp:
                    status = await resp.json()
                if status["status"] == "COMPLETED":
                    result_doc_id = status["resultDocumentId"]
                    break
                elif status["status"] == "FAILED":
                    print(f"Task failed for {invoice_id}")
                    return invoice_id, False
                await asyncio.sleep(5)
            # Step 4: Download the result
            async with session.get(
                f"{HOST}/pdf-services/api/documents/{result_doc_id}/download",
                headers=auth,
            ) as resp:
                if resp.status == 200:
                    pdf_bytes = await resp.read()
                    (output_dir / f"{invoice_id}.pdf").write_bytes(pdf_bytes)
                    return invoice_id, True
                return invoice_id, False
        except Exception as e:
            print(f"Error converting {invoice_id}: {e}")
            return invoice_id, False

async def batch_convert(invoices: list[dict], output_dir: str = "output") -> dict:
    output_path = Path(output_dir)
    output_path.mkdir(exist_ok=True)
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    connector = aiohttp.TCPConnector(limit=MAX_CONCURRENT)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [
            convert_one(session, sem, inv["id"], inv["html_path"], output_path)
            for inv in invoices
        ]
        results = await asyncio.gather(*tasks)
    succeeded = [r[0] for r in results if r[1]]
    failed = [r[0] for r in results if not r[1]]
    return {"succeeded": len(succeeded), "failed": failed}

# Usage
invoices = [
    {"id": "inv_1042", "html_path": "templates/invoice_1042.html"},
    {"id": "inv_1043", "html_path": "templates/invoice_1043.html"},
    # ... up to thousands of entries
]
result = asyncio.run(batch_convert(invoices))
print(f"Converted {result['succeeded']} PDFs. Failed: {result['failed']}")

In this code, you use asyncio and aiohttp to process multiple HTML-to-PDF conversions concurrently. The convert_one() function runs the full 4-step workflow (upload, create task, poll, download) for a single invoice, while batch_convert() dispatches all invoices in parallel, capped by a semaphore. Results are collected via asyncio.gather() and split into succeeded and failed lists.
Before running: Set FOXIT_API_HOST, FOXIT_CLIENT_ID, and FOXIT_CLIENT_SECRET as environment variables with your credentials from the Developer Dashboard. Adjust MAX_CONCURRENT based on your plan's rate limits, and update the invoices list with your actual file paths.
With MAX_CONCURRENT = 10 and several seconds per conversion (including polling), the batch processes 10 documents at a time instead of one at a time. The semaphore prevents you from flooding the API with simultaneous requests and hitting the rate limit ceiling. Beyond aiohttp, no additional dependencies are needed since asyncio is part of Python’s standard library.
Credit consumption at scale: the Developer plan includes 500 credits/year. The Startup plan ($1,750/year) provides 3,500 credits. Each conversion typically costs 1 credit. For higher volumes, the Business plan ($4,500/year) includes 150,000 credits. Check your remaining credit balance via the Developer Dashboard before launching a large batch job.
For volumes beyond what a single process can handle efficiently, a queue-based architecture decouples submission from processing. Services like Amazon SQS or Redis Streams handle the message brokering:
App Server → Message Queue (SQS / Redis Streams) → Worker Pool (N workers)
    Worker: upload HTML → create task → poll → download PDF → store in S3/GCS
    Worker: update job status in Postgres / Redis

Each worker picks a job from the queue, runs the 4-step conversion workflow, writes the resulting PDF to S3 or GCS, and updates the job status in a database. This pattern handles burst volume naturally: jobs queue up during spikes, workers drain at the rate the API allows, and your app server is never blocked waiting for conversions to complete.
Production Deployment Patterns for HTML-to-PDF Pipelines
Error Handling and Retry Logic
Not all errors warrant a retry. Map HTTP status codes to decisions before writing any retry logic.
A 400 Bad Request means your request body is malformed. Retrying the same payload returns another 400. Fix the payload, don’t retry. A 429 Too Many Requests and a 503 Service Unavailable are transient: back off and retry. A FAILED task status means the conversion itself failed (possibly due to invalid HTML or unreachable URLs); check the task response for diagnostic details.
import time
import random
import requests
from requests.exceptions import RequestException
PERMANENT_ERRORS = {400, 401, 403, 422}
TRANSIENT_ERRORS = {429, 500, 502, 503, 504}
def post_with_retry(
url: str,
max_retries: int = 4,
base_delay: float = 1.0,
**kwargs,
) -> requests.Response:
"""POST with exponential backoff and jitter for transient errors."""
for attempt in range(max_retries + 1):
try:
response = requests.post(url, timeout=60, **kwargs)
if response.status_code in range(200, 300):
return response
if response.status_code in PERMANENT_ERRORS:
raise ValueError(
f"Permanent error {response.status_code}: {response.text}"
)
if response.status_code in TRANSIENT_ERRORS:
if attempt == max_retries:
raise RuntimeError(
f"Max retries exceeded. Last status: {response.status_code}"
)
delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
print(f"Transient error {response.status_code}. Retrying in {delay:.1f}s...")
time.sleep(delay)
except RequestException as e:
if attempt == max_retries:
raise
delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
time.sleep(delay)
raise RuntimeError("Unexpected: exhausted retries without returning or raising")
# Usage with the URL-to-PDF endpoint
auth_headers = {
"client_id": CLIENT_ID,
"client_secret": CLIENT_SECRET,
"Content-Type": "application/json",
}
response = post_with_retry(
f"{HOST}/pdf-services/api/documents/create/pdf-from-url",
json={"url": "https://example.com/invoice/1042"},
headers=auth_headers,
)
task_id = response.json()["taskId"]
In this code, you wrap every POST request in a retry loop with exponential backoff. The function distinguishes between permanent errors (like 400 or 401, which should not be retried) and transient errors (like 429 or 503, which resolve on their own). Each retry doubles the wait time and adds random jitter to avoid synchronized retry waves.
Before running: Replace CLIENT_ID, CLIENT_SECRET, and HOST with your Foxit credentials and API host, or load them from environment variables as shown in the earlier examples.
The jitter (random.uniform(0, 0.5)) prevents a thundering herd where every worker wakes up and retries simultaneously after a 429 burst. Without it, plain exponential backoff still produces synchronized retry waves when all workers hit the rate limit at the same time.
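The delay schedule is easy to inspect in isolation (same formula as the retry helper above):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, jitter: float = 0.5) -> float:
    # Deterministic exponential part, plus up to `jitter` seconds of random spread.
    return base * (2 ** attempt) + random.uniform(0, jitter)

# Attempts 0..3 wait roughly 1, 2, 4, and 8 seconds, each nudged by jitter
# so that workers hitting a 429 together don't all retry together.
schedule = [round(backoff_delay(a), 2) for a in range(4)]
```

With four retries, worst-case added latency is about 15 seconds, which is usually acceptable for an asynchronous conversion job.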
Output Optimization: Compression and Linearization
After conversion, you can chain additional PDF operations using the same async pattern. Upload the resulting PDF, call the compression or linearization endpoint, poll, and download the optimized version.
For PDFs served directly in a browser, linearization enables Fast Web View, which lets the browser display page one while the rest of the file downloads:
def compress_and_linearize(input_pdf_path: str, output_path: str) -> None:
"""Compress a PDF, then linearize it for fast web viewing."""
auth = {"client_id": CLIENT_ID, "client_secret": CLIENT_SECRET}
json_headers = {**auth, "Content-Type": "application/json"}
# Upload the PDF
doc_id = upload_document(input_pdf_path)
# Compress
resp = requests.post(
f"{HOST}/pdf-services/api/documents/modify/pdf-compress",
json={"documentId": doc_id, "compressionLevel": "MEDIUM"},
headers=json_headers,
)
resp.raise_for_status()
task = poll_task(resp.json()["taskId"])
compressed_doc_id = task["resultDocumentId"]
# Linearize the compressed result (no need to re-upload; use the resultDocumentId)
resp = requests.post(
f"{HOST}/pdf-services/api/documents/optimize/pdf-linearize",
json={"documentId": compressed_doc_id},
headers=json_headers,
)
resp.raise_for_status()
task = poll_task(resp.json()["taskId"])
# Download the final optimized PDF
    download_document(task["resultDocumentId"], output_path)
In this code, you chain two PDF operations back-to-back. First, you upload the PDF and compress it at MEDIUM level (valid options are LOW, MEDIUM, and HIGH). Once compression completes, you pass the resultDocumentId directly into the linearization step, which avoids a second upload. The final download gives you a PDF that is both smaller and optimized for progressive loading in browsers.
Note: This function reuses upload_document(), poll_task(), and download_document() from the earlier examples. Make sure those functions are defined in the same script with your credentials configured. The Foxit developer blog post on chaining PDF actions covers this pattern in detail.
Monitoring and Secret Management
Track three metrics per conversion job: latency (to detect API degradation), credit consumption per job type (to project when you’ll exhaust your plan), and failure rate by error code (to catch template regressions before they hit customers). Set an alert when remaining credits drop below 20% of your plan allocation. The Foxit Developer Dashboard exposes real-time usage data you can check before launching batch runs.
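The 20% alert threshold is a one-liner worth wiring into whatever monitoring you already run (the function name and plan figures here are illustrative):

```python
def should_alert(remaining_credits: int, plan_credits: int,
                 threshold: float = 0.20) -> bool:
    """True when remaining credits drop below the alert fraction of the plan."""
    return remaining_credits < plan_credits * threshold

# Startup plan (3,500 credits): the alert fires below 700 remaining.
print(should_alert(800, 3500))  # False
print(should_alert(600, 3500))  # True
```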
API credentials go in environment variables or a secrets manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager). Rotate credentials from the Developer Dashboard when team members leave or when you suspect a credential has been exposed. You can generate new credentials and revoke old ones without a service interruption if you update your environment first.
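A small fail-fast loader keeps missing-credential errors at startup rather than mid-batch. This sketch reads from environment variables, which is also where a secrets manager would typically inject values (the variable names follow the earlier examples):

```python
import os

REQUIRED = ("FOXIT_CLIENT_ID", "FOXIT_CLIENT_SECRET", "FOXIT_API_HOST")

def load_credentials() -> dict:
    """Read credentials from the environment; fail fast if any are missing."""
    creds = {name: os.environ.get(name) for name in REQUIRED}
    missing = [name for name, value in creds.items() if not value]
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
    return creds
```

Calling load_credentials() once at startup means a rotated or forgotten secret surfaces as one clear error instead of a string of 401s halfway through a batch.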
Run Your First HTML-to-PDF Conversion
Sign up for the Foxit Developer plan at no cost, no credit card, with 500 credits available immediately. Generate your client_id and client_secret from the Developer Dashboard. Clone the demo repository for working examples in Python, Node.js, and PHP, or copy the URL-to-PDF example from this guide and run it against a public page.
After your first conversion completes, check your credit usage in the Dashboard to validate your throughput estimate and cost projection for production volume. The Startup plan ($1,750/year for 3,500 credits) is self-serve with no sales call required if you need more capacity.
Create Custom Invoices with Word Templates and Foxit Document Generation

Invoicing is a critical part of any business. This tutorial shows how to automate the process by creating dynamic, custom PDF invoices with the Foxit Document Generation API. Learn how to design a Microsoft Word template with special tokens, prepare your data in JSON, and then use a simple Python script to generate your final invoices.
Invoicing is a critical part of any business, often involving multiple steps—gathering customer data, calculating amounts owed, and sending out invoices so your company can get paid. Foxit’s Document Generation API streamlines this process by making it easy to create well-formatted, dynamic PDF invoices. Let’s walk through an example.
Before You Start
If you want to follow along with this blog post, be sure to get your free credentials over on our developer portal. Also, read our introductory blog post, which covers the basics of working with our API.
As a reminder, the API makes use of Microsoft Word templates. These templates are essentially Word documents containing tokens wrapped in double curly brackets. When you call the API, you pass the template and your data. Our API then dynamically replaces those tokens with your data and returns a nicely formatted PDF (you can also get a Word file back).
Creating Your Custom Invoice with Word Templates
Let’s begin by designing the template in Word. An invoice typically includes things like:
- The customer receiving the invoice
- The invoice number and issue date
- The payment due date
- A detailed list of items, including name, quantity, and price for each line item, with a total at the end
The Document Generation API imposes no requirements on how you design your templates. Size, alignment, and so forth can match your corporate style and be as fancy, or as simple, as you like. Let’s consider the template below (I’ll link to where you can download this file at the end of the article):
Let's break it down from the top.
- The first token, {{ invoiceNum }}, represents the invoice number for the customer.
- The next token is special. {{ today \@ MM/dd/yyyy }} represents two different features of the Document Generation API. First, today is a special value representing the present time, or more accurately, the moment you call the API. The portion after \@ is a date mask that controls how the date value is rendered. Our docs have a list of available masks.
- {{ accountName }} is another regular token.
- The payment date, {{ paymentDueDate \@ MM/dd/yyyy }}, shows how the date mask feature can be applied to dates in your own data as well.
- Now let's look at the table. You can format tables however you like, but a common setup includes one row for the header and one row for the dynamic data. (In this example, there’s also a third row, which I'll explain shortly.) To start, you’ll use a marker tag, {{TableStart:lineItems}}, where lineItems represents an array in your data. The row ends with the matching {{TableEnd:lineItems}} tag. Between these two tags, you place additional tags for each value in the array. For example, we have a product, qty, price, and totalPrice for each item. You'll also see the special ROW_NUMBER value, which automatically counts each row starting at 1. Finally, the \# Currency format is applied to the totalPrice value to display it as a currency.
- The last row in the table uses two special features together: SUM(ABOVE), which totals the last column of the table, paired with currency formatting as shown.
Alright, now that you've seen the template, let's talk data!
The Data for Your Custom Invoices
Usually the data for an operation like this would come from a database, or perhaps an API with an ecommerce system. For this demo, the data will come from a simple JSON file. Let's take a look at it:
[
{
"invoiceNum":100,
"accountName":"Customer Alpha",
"accountNumber":1,
"paymentDueDate":"August 15, 2025",
"lineItems":[
{"product":"Product 1", "qty":5, "price":2, "totalPrice":10},
{"product":"Product 5", "qty":3, "price":9, "totalPrice":27},
{"product":"Product 4", "qty":1, "price":50, "totalPrice":50},
{"product":"Product X", "qty":2, "price":15, "totalPrice":30}
]
},
{
"invoiceNum":25,
"accountName":"Customer Beta",
"accountNumber":2,
"paymentDueDate":"August 15, 2025",
"lineItems":[
{"product":"Product 2", "qty":9, "price":2, "totalPrice":18},
{"product":"Product 4", "qty":1, "price":8, "totalPrice":8},
{"product":"Product 3", "qty":10, "price":25, "totalPrice":250},
{"product":"Product YY", "qty":3, "price":15, "totalPrice":45},
{"product":"Product AA", "qty":2, "price":100, "totalPrice":200}
]
},
{
"invoiceNum":51,
"accountName":"Customer Gamma",
"accountNumber":3,
"paymentDueDate":"August 15, 2025",
"lineItems":[
{"product":"Product 9", "qty":1, "price":2, "totalPrice":2},
{"product":"Product 23", "qty":30, "price":9, "totalPrice":270},
{"product":"Product ZZ", "qty":6, "price":15, "totalPrice":90}
]
}
]
The data consists of an array of three sets of invoice data. Each set follows the same pattern and matches what you saw above in the Word template. The only exception is the accountNumber value, which wasn't used in the template. That's fine – sometimes your data will include things not needed for the final PDF. In this case, though, we're actually going to make use of it (you'll see in a moment). Onward to code!
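Since the API fills tokens verbatim, any arithmetic mistakes in the data land directly on the customer's invoice. A small pre-flight check (my own addition, not part of the API) catches them before generation:

```python
def bad_line_items(invoice: dict) -> list:
    """Return line items whose totalPrice doesn't equal qty * price."""
    return [item for item in invoice["lineItems"]
            if item["qty"] * item["price"] != item["totalPrice"]]

invoice = {
    "invoiceNum": 100,
    "lineItems": [
        {"product": "Product 1", "qty": 5, "price": 2, "totalPrice": 10},
        {"product": "Product X", "qty": 2, "price": 15, "totalPrice": 31},  # wrong on purpose
    ],
}
print(bad_line_items(invoice))  # only the mispriced Product X row
```

Running this over each invoice before calling the API turns a silent billing error into a loud one.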
Calling the Foxit API with Our Data
Now for my favorite part – actually calling the API. The Generate Document API is incredibly simple, needing just your credentials, a base64 version of the template, and your data. The entire demo is slightly over 50 lines of Python code, so let's look at it and then break it down.
import os
import requests
import sys
import base64
import json
CLIENT_ID = os.environ.get('CLIENT_ID')
CLIENT_SECRET = os.environ.get('CLIENT_SECRET')
HOST = os.environ.get('HOST')
def docGen(doc, data, id, secret):
headers = {
"client_id":id,
"client_secret":secret
}
body = {
"outputFormat":"pdf",
"documentValues": data,
"base64FileString":doc
}
request = requests.post(f"{HOST}/document-generation/api/GenerateDocumentBase64", json=body, headers=headers)
return request.json()
with open('invoice.docx', 'rb') as file:
bd = file.read()
b64 = base64.b64encode(bd).decode('utf-8')
with open('invoicedata.json', 'r') as file:
data = json.load(file)
for invoiceData in data:
result = docGen(b64, invoiceData, CLIENT_ID, CLIENT_SECRET)
if result.get("base64FileString") is None:
print("Something went wrong.")
print(result)
sys.exit()
b64_bytes = result["base64FileString"].encode('ascii')
binary_data = base64.b64decode(b64_bytes)
filename = f"invoice_account_{invoiceData['accountNumber']}.pdf"
with open(filename, 'wb') as file:
file.write(binary_data)
print(f"Done and stored to {filename}")
After importing the necessary modules and loading credentials from the environment, we define a simple docGen method. This method takes the template, data, and credentials, then calls the API endpoint. The API responds with the rendered PDF in Base64 format, which the method returns.
The main part of the script breaks down to:
- Reading in the template and converting it to base64
- Reading in the JSON file
- Iterating over each block of invoice data and calling the API
- Remember how I said accountNumber wasn't used in the template? We actually use it here to generate a unique filename.
Technically, you don't need to store the results at all. You could take the raw binary data and email it. But having a copy of the results means you can reuse it later, such as if the customer is late to pay.
Here's an example of one of the results:
Next Steps
If you want to try this demo yourself, first grab yourself a shiny free set of credentials and then head over to our GitHub to grab the template, Python, and sample output values yourself.
Convert Office Docs to PDFs Automatically with Foxit PDF Services API

See how to build a powerful, automated workflow that converts Office documents (Word, Excel, PowerPoint) into PDFs. This step-by-step guide uses the Foxit PDF Services API, the Pipedream low-code platform, and Dropbox to create a seamless “hands-off” document processing system. We’ll walk through every step, from triggering on a new file to uploading the final PDF.
With our REST APIs, it is now possible for any developer to set up an integration and document workflow using their language of choice. But what about workflow automations? Luckily, this is even simpler (depending on the platform, of course), as you can rely on the workflow service to handle a lot of the heavy lifting for whatever automation needs you may have. In this blog post, I’m going to demonstrate a workflow making use of Pipedream. Pipedream is a low-code platform that lets you build flexible workflows by piecing together various small atomic steps. It’s been a favorite of mine for some time now, and I absolutely recommend it. But note that what I’ll be showing here today could absolutely be done on other platforms, like n8n.
Want the televised version? Catch the video below:
Our Office Document to PDF Workflow
Our workflow is based on Dropbox folders and handles automatic conversion of Office docs to PDFs. To support that, it does the following:
- Listen for new files in a Dropbox folder
- Do a quick sanity check (is it in the input subdirectory and an Office file)
- Download the file to Pipedream
- Send it to Foxit via the Upload API
- Kick off the appropriate conversion based on the Office type
- Check status via the Status API
- When done, download the result to Pipedream
- And finally, push it up to Dropbox in an output subdirectory
Here’s a nice graphical representation of this workflow:
Before we get into the code, note that workflow platforms like Pipedream are incredibly flexible. When I build workflows with platforms like this I try to make each step as atomic, and focused as possible. I could absolutely have built a shorter, more compact version of this workflow. However, having it broken out like this makes it easier to copy and modify going forward (which is exactly how this one came about, it was based on a simpler, earlier version).
Ok, let's break it down, step-by-step.
Getting Triggered
In Pipedream, workflows begin with a trigger. While there are many options for this, my workflow uses a "New File From Dropbox" trigger. I logged into Dropbox via Pipedream so it had access to my account. I then specified a top level folder, "Foxit", for the integration. Additionally, there are two more important settings:
- Recursive – this tells the trigger to fire for any new file under the root directory, "Foxit". My Dropbox Foxit folder has both an input and output directory.
- Include Link – this tells Pipedream to ensure we get a link to the new file. This is required to download it later.
Filtering the Document Flow
The next two steps are focused on filtering and stopping the workflow, if necessary. The first, end_if_output, is a built-in Pipedream step that lets me provide a condition for the workflow to end. First, I'll check the path value from the trigger (the path of the new file) and if it contains "output", this means it's a new file in the output directory and the workflow should not run.
The next filter is a code step that handles two tasks. First, it checks whether the new file is a supported Office type—.docx, .xlsx, or .pptx—using our APIs. If the extension isn’t one of these, the workflow ends programmatically.
Later in the workflow, I’ll also need that same extension to route the request to the correct endpoint. So the code handles both: validation and preservation of the extension.
import os
def handler(pd: "pipedream"):
base, extension = os.path.splitext(pd.steps['trigger']['event']['name'])
if extension == ".docx":
api = "/pdf-services/api/documents/create/pdf-from-word"
elif extension == ".xlsx":
api = "/pdf-services/api/documents/create/pdf-from-excel"
elif extension == ".pptx":
api = "/pdf-services/api/documents/create/pdf-from-ppt"
else:
        return pd.flow.exit(f"Exiting workflow due to unknown extension: {extension}.")
    return { "api":api }
As you can see, if the extension isn't valid, I'm exiting the workflow using pd.flow.exit (while also logging a proper message, which I can check later via the Pipedream UI). I also return the right endpoint if a supported extension was used. This will be useful later in the flow.
Download and Upload API Data
The next two steps are primarily about moving data from the input source (Dropbox) to our API (Foxit).
The first step, download_to_tmp, uses a simple Python script to transfer the Dropbox file into the /tmp directory for use in the workflow:
import requests
def handler(pd: "pipedream"):
download_url = pd.steps["trigger"]["event"]["link"]
file_path = f"/tmp/{pd.steps['trigger']['event']['name']}"
with requests.get(download_url, stream=True) as response:
response.raise_for_status()
with open(file_path, "wb") as file:
for chunk in response.iter_content(chunk_size=8192):
file.write(chunk)
    return file_path
Notice at the end that I return the path I used in Pipedream. This action then leads directly into the next step of uploading to Foxit via the Upload API:
import os
import requests
def handler(pd: "pipedream"):
clientid = os.environ.get('FOXIT_CLIENT_ID')
secret = os.environ.get('FOXIT_CLIENT_SECRET')
HOST = os.environ.get('FOXIT_HOST')
headers = {
"client_id":clientid,
"client_secret":secret
}
with open(pd.steps['download_to_tmp']['$return_value'], 'rb') as f:
files = {'file': (pd.steps['download_to_tmp']['$return_value'], f)}
request = requests.post(f"{HOST}/pdf-services/api/documents/upload", files=files, headers=headers)
    return request.json()
The result of this will be a documentId value that looks like so:
{
  "documentId": "<string>"
}
Pipedream lets you define environment variables, and I've made use of them for my Foxit credentials and host. Grab your own free credentials here!
Converting the Document Using the Foxit API
The next step will actually kick off the conversion. My workflow supports three different input types (Word, PowerPoint, and Excel). These map to three API endpoints. But remember that earlier we sniffed the extension of our input and set the endpoint there. Since all three APIs work the same, that's literally all we need to do – hit the endpoint and pass the document value from the previous step.
import os
import requests
def handler(pd: "pipedream"):
clientid = os.environ.get('FOXIT_CLIENT_ID')
secret = os.environ.get('FOXIT_CLIENT_SECRET')
HOST = os.environ.get('FOXIT_HOST')
headers = {
"client_id":clientid,
"client_secret":secret,
"Content-Type":"application/json"
}
body = {
"documentId": pd.steps['upload_to_foxit']['$return_value']['documentId']
}
api = pd.steps['extension_check']['$return_value']['api']
print(f"{HOST}{api}")
request = requests.post(f"{HOST}{api}", json=body, headers=headers)
    return request.json()
The response contains a task ID:
{
  "taskId": "<string>"
}
Checking Your Document API Status
The next step is one that may take a few seconds – checking the job status. Foxit's endpoint returns a value like so:
{
"taskId": "<string>",
"status": "<string>",
"progress": "<int32>",
"resultDocumentId": "<string>",
"error": {
"code": "<string>",
"message": "<string>"
}
}
import os
import requests
from time import sleep
def handler(pd: "pipedream"):
clientid = os.environ.get('FOXIT_CLIENT_ID')
secret = os.environ.get('FOXIT_CLIENT_SECRET')
HOST = os.environ.get('FOXIT_HOST')
headers = {
"client_id":clientid,
"client_secret":secret,
"Content-Type":"application/json"
}
done = False
while done is False:
request = requests.get(f"{HOST}/pdf-services/api/tasks/{pd.steps['create_conversion_job']['$return_value']['taskId']}", headers=headers)
status = request.json()
if status["status"] == "COMPLETED":
done = True
return status
elif status["status"] == "FAILED":
print("Failure. Here is the last status:")
print(status)
return pd.flow.exit("Failure in job")
else:
print(f"Current status, {status['status']}, percentage: {status['progress']}")
            sleep(5)
As shown, errors are simply logged by default—but you could enhance this by adding notifications, such as emailing an admin, sending a text message, or other alerts.
On success, the final output is passed along, including the key value we care about: resultDocumentId.
Download and Upload – Again
Ok, if the workflow has gotten this far, it's time to finish the process. The next step handles downloading the result from Foxit using the download endpoint:
import requests
import os
def handler(pd: "pipedream"):
clientid = os.environ.get('FOXIT_CLIENT_ID')
secret = os.environ.get('FOXIT_CLIENT_SECRET')
HOST = os.environ.get('FOXIT_HOST')
headers = {
"client_id":clientid,
"client_secret":secret,
}
# Given a file of input.docx, we need to use input.pdf
base_name, _ = os.path.splitext(pd.steps['trigger']['event']['name'])
path = f"/tmp/{base_name}.pdf"
print(path)
with open(path, "wb") as output:
bits = requests.get(f"{HOST}/pdf-services/api/documents/{pd.steps['check_job']['$return_value']['resultDocumentId']}/download", stream=True, headers=headers).content
output.write(bits)
return {
"filename":f"{base_name}.pdf",
"path":path
    }
Note that I'm using the base name of the input, which is basically the filename minus the extension. So, for example, input.docx becomes input, which I then slap a pdf extension on to create the filename used to store the result locally in Pipedream.
Finally, I push the file back up to Dropbox, but for this, I can use a built-in Pipedream step that uploads to Dropbox. Here's how I configured it:
- Path: Once again, Foxit
- File Name: This one's a bit more complex. I want to store the value in the output subdirectory and ensure the filename is dynamic. Pipedream lets you mix and match hard-coded values and expressions, and I used this to enable that: output/{{steps.download_result_to_tmp.$return_value.filename}}. In this expression, the portion inside the double brackets is dynamic, based on the PDF file generated previously.
- File Path: This is an expression as well, pointing to where I saved the file previously: {{steps.download_result_to_tmp.$return_value.path}}
- Mode: Finally, the mode attribute specifies what to do on a conflict. This setting will depend on your particular workflow needs, but for mine, I simply told Dropbox to overwrite the existing file.
Here's how that step looks configured in Pipedream:
Conclusion
Believe it or not, that's the entire workflow. Once enabled, it runs in the background, and I can simply place files into my Dropbox folder and my Office docs will be automatically converted. What's next? Definitely get your own free credentials and check out the docs to get started. If you run into any trouble at all, hit us up on the forums and we'll be glad to help!
Embed Secure eSignatures into Your App with Foxit API

Foxit eSign makes electronic signatures easy, but developers can take it further by automating the process. This tutorial shows how to use the Foxit eSign API to embed secure eSignatures in your apps. With Python code examples, you’ll learn to send documents for signing, dispatch reminders, and check the signing status programmatically.
Foxit eSign is an electronic signature solution that enables individuals and businesses to securely sign, send, and manage documents online. It streamlines the document signing process by allowing users to create legally binding eSignatures, prepare forms, and track document status in real time. With features like reusable templates, automated workflows, and audit trails, it enhances productivity and reduces the need for manual paperwork.
At the simplest level, a user can log into the eSign dashboard and handle 100% of their signing needs. So for example, they can upload a Microsoft Word template and then drag and drop fields that will be used during the signing process. I did this with a simple Word document. After uploading, I was given an easy to use editor to drag and drop fields:
In the screenshot above, I’ve added three fields to my document. The first is a date field. The second is for the signer’s name. The last is the actual signature spot. Each of these fields has many options, and your own documents could have far more (or heck, even fewer) fields. The point is – you’re allowed to design these forms to meet whatever need you may have. Also note that you can do all of this directly within Word as well. Our docs show how to add fields directly in Word that will become enabled in the signing process.
Once you’ve got your template in a nice place, you can initiate the signing process right from the app. The dashboard also gives you a full history and audit trail of that process (have they signed yet? when did they do it? who signed?) which is handy. But, of course, you’re here because you’re a developer, and you’re probably asking yourself – can we automate this? The answer? Of course!
If you would rather watch an introduction to the API, enjoy the video below!
eSign Via API
Before we begin digging into the APIs, be sure to take a quick look at the API Reference. As I’ve stated a few times, the signing process itself can be incredibly complex. For example, there may be 2, 3, or more people who need to sign a document, and in a particular order. Also, remember that I shared that the template process of adding fields and such can be done entirely in Word itself. This is to say that our look here is going to focus on a simple example of signing. But there’s nothing to stop the creation of more advanced, flexible workflows.
The first step in any API usage will be authentication. When you have an eSign account with API access, you’ll be given a client_id and client_secret value, both of which need to be exchanged for an access token at the appropriate endpoint. Here’s a simple example of this with Python:
import os
import requests

CLIENT_ID = os.environ.get("CLIENT_ID")
CLIENT_SECRET = os.environ.get("CLIENT_SECRET")
def getAccessToken(id, secret):
url = "https://na1.foxitesign.foxit.com/api/oauth2/access_token"
payload=f"client_id={id}&client_secret={secret}&grant_type=client_credentials&scope=read-write"
headers = {
'Content-Type': 'application/x-www-form-urlencoded'
}
response = requests.request("POST", url, headers=headers, data=payload)
token = (response.json())["access_token"]
return token
access_token = getAccessToken(CLIENT_ID, CLIENT_SECRET)
All the rest of the demos will make use of this method, and at the end of this post I'll share GitHub links for the source.
Kicking Off the Signing API Process
Now that we've authenticated with the API, our code can start doing… well, everything. The first thing we will add is a signing process with the template I showed above. From the dashboard itself I was able to make note of the template ID, 392230, but note that there are APIs for working with templates, so that could be done via code as well.
To start the signing process, we can use the Create Envelope from Template endpoint. You can think of an envelope as a set of documents a user has to sign. For our demo, it's one, but you can include multiple documents. If you look at the API reference example, you'll see a large input body. As I've said, and will probably keep saying, the electronic signing process can be quite complex. For our simple demo, however, we just need to know the name of the person we're asking to sign the document and their email address. I whipped up this simple Python utility to enable that:
def sendForSigning(template_id, first_name, last_name, email, token):
url = "https://na1.foxitesign.foxit.com/api/templates/createFolder"
body = {
"folderName":"Sending for Signing",
"templateIds":[template_id],
"parties":[
{
"permission":"FILL_FIELDS_AND_SIGN",
"firstName":first_name,
"lastName":last_name,
"emailId":email,
"sequence":1
}
],
"senderEmail":"[email protected]"
}
headers = {
'Authorization': f'Bearer {token}',
}
response = requests.request("POST", url, headers=headers, json=body)
    return response.json()
Make note that parties is an array of one person, as only one person is needed in this process. Also note that permission is required, as it defines the role that person will play in the signing.
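For a multi-signer workflow, the same parties array simply grows, with sequence controlling the signing order. A hypothetical three-signer sketch (names and emails invented; the field names match the single-party example above):

```python
# Hypothetical ordered signers; the envelope is routed by `sequence`.
parties = [
    {"permission": "FILL_FIELDS_AND_SIGN", "firstName": "Ava",
     "lastName": "Ortiz", "emailId": "ava@example.com", "sequence": 1},
    {"permission": "FILL_FIELDS_AND_SIGN", "firstName": "Ben",
     "lastName": "Lee", "emailId": "ben@example.com", "sequence": 2},
    {"permission": "FILL_FIELDS_AND_SIGN", "firstName": "Cy",
     "lastName": "Park", "emailId": "cy@example.com", "sequence": 3},
]
print([p["sequence"] for p in parties])  # [1, 2, 3]
```

Dropping a list like this into the body of sendForSigning in place of the single-element array is all the change the request needs; check the API reference for the full set of permission values.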
Calling this method is simple:
# Hard coded template id
tid = "392230"
sendForSigningResponse = sendForSigning(tid, "Raymond", "Camden", "[email protected]", access_token)
envelopeId = sendForSigningResponse['folder']['folderId']
print(f"ID of the envelope created: {envelopeId}")
Note: You'll see 'folder' referenced in the API endpoints and results, but the eSign API is migrating to the 'envelope' term instead.
A few seconds after running this code, the email showed up in my account:
Obviously at this point, I could sign it… but what if I didn't?
Sending out Electronic Reminders
One way to help ensure your important documents get signed is to remind the people who need to sign that – well, they actually need to sign the document. To enable this, we can use the Send Signature Reminder endpoint. All it needs is the ID of the envelope created earlier (and again, see my note about envelope vs folder):
def sendReminder(envelope_id, token):
url = "https://na1.foxitesign.foxit.com/api/folders/signaturereminder"
body = {
"folderId":envelope_id
}
headers = {
'Authorization': f'Bearer {token}',
}
response = requests.request("POST", url, headers=headers, json=body)
result = response.json()
return result
access_token = getAccessToken(CLIENT_ID, CLIENT_SECRET)
result = sendReminder(envelope_id, access_token)
Ok, But Did They Sign Their Document Yet??
So far, you've seen an example of sending out a document for signing as well as politely reminding the person to do their job. How can you tell if they've completed the process? For this, we can turn to the Get Envelope Details endpoint. It's also rather simple in that it just needs the envelope ID value from before. Once again, here's a wrapper to that API built in Python:
def getStatus(envelope_id, token):
url = f"https://na1.foxitesign.foxit.com/api/folders/myfolder?folderId={envelope_id}"
headers = {
'Authorization': f'Bearer {token}',
}
response = requests.request("GET", url, headers=headers)
result = response.json()
return result
result = getStatus(envelope_id, access_token)
print(f"Envelope status: {result['folder']['folderStatus']}")
While there's a lot of important information returned, we can output just the status to see at a high level what state the process is currently in.
Given the example I've shown so far, the status of the envelope is SHARED. Let's actually click the link:
In the screenshot above, notice how the date is already filled in with today's date. Also note that the name was prefilled, as eSign knows who the document was sent to. All I need to do is click and sign. Once I do, the same code above will return EXECUTED.
Next Steps
Wondering where to go next? If you are completely new to eSign, check out the main homepage for an introduction. You can also check out the Foxit eSign YouTube channel for lots of good video content on the service. If you want a copy of the code I showed in this post, you can find all three examples here on our GitHub repo. Finally, don't forget to visit our forums and bring your questions!