Python PDF form extractor example

Overview

This demo showcases how to use Trigger.dev with Python to extract structured form data from a PDF file available at a URL.

Prerequisites

A project with Trigger.dev initialized
Python installed on your local machine

Features

A Trigger.dev task to trigger the Python script
Trigger.dev Python build extension to install the dependencies and run the Python script
PyMuPDF to extract form data from PDF files
Requests to download PDF files from URLs

GitHub repo

View the project on GitHub

The full code for this project is in our examples repository on GitHub. You can fork it and use it as a starting point for your own project.

The code

Build configuration

After you’ve initialized your project with Trigger.dev, add these build settings to your trigger.config.ts file:

trigger.config.ts

import { pythonExtension } from "@trigger.dev/python/extension";
import { defineConfig } from "@trigger.dev/sdk";

export default defineConfig({
  runtime: "node",
  project: "<your-project-ref>",
  // Your other config settings...
  build: {
    extensions: [
      pythonExtension({
        // The path to your requirements.txt file
        requirementsFile: "./requirements.txt",
        // The path to your Python binary
        devPythonBinaryPath: `venv/bin/python`,
        // The paths to your Python scripts to run
        scripts: ["src/python/**/*.py"],
      }),
    ],
  },
});

Learn more about executing scripts in your Trigger.dev project in the Python build extension documentation.

Task code

This task uses the python.runScript method to run the extract-pdf-form.py script with the given PDF URL as an argument. The script prints the extracted form data as JSON to stdout; the task parses that output and returns it together with stderr and the exit code.

src/trigger/pythonPdfTask.ts

import { task } from "@trigger.dev/sdk";
import { python } from "@trigger.dev/python";

export const processPdfForm = task({
  id: "process-pdf-form",
  run: async (payload: { pdfUrl: string }) => {
    const { pdfUrl } = payload;
    const args = [pdfUrl];

    const result = await python.runScript("./src/python/extract-pdf-form.py", args);

    // Parse the JSON output from the script
    let formData;
    try {
      formData = JSON.parse(result.stdout);
    } catch (error) {
      throw new Error(`Failed to parse JSON output: ${result.stdout}`);
    }

    return {
      formData,
      stderr: result.stderr,
      exitCode: result.exitCode,
    };
  },
});
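To make the shape of formData concrete, here is a minimal sketch of the JSON the task expects on stdout: an object keyed by field name, where each entry carries a type, a value, and a 1-based page number. The normalize_value and build_entry helpers, the sample field names, and the raw values are hypothetical and exist only for illustration; the real script reads these values from PyMuPDF widgets.

```python
import json

def normalize_value(field_type, raw_value):
    """Mirror the script's rule: checkboxes become booleans, other values pass through."""
    if field_type == "CheckBox":
        return raw_value == "Yes"
    return raw_value

def build_entry(field_type, raw_value, page_index):
    """Build one entry in the same shape as the script's form_data values."""
    return {
        "type": field_type,
        "value": normalize_value(field_type, raw_value),
        "page": page_index + 1,  # pages are reported 1-based
    }

# Hypothetical sample fields: (name, type, raw value, zero-based page index)
sample_fields = [
    ("full_name", "Text", "Ada Lovelace", 0),
    ("subscribe", "CheckBox", "Yes", 0),
    ("terms", "CheckBox", "Off", 1),
]

form_data = {
    name: build_entry(field_type, raw_value, page_index)
    for name, field_type, raw_value, page_index in sample_fields
}

# This is the kind of text the task would pass to JSON.parse
print(json.dumps(form_data, indent=2))
```

Because every entry has the same three keys, code that consumes the task result can look up a field by name and read its value without inspecting the PDF again.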

Add a requirements.txt file

Add the following to your requirements.txt file. The Python build extension uses this file to install the script's dependencies.

requirements.txt

PyMuPDF==1.23.8
requests==2.31.0

The Python script

The Python script downloads the PDF with Requests and uses PyMuPDF to extract its form data. The original script is available in our examples repository.

src/python/extract-pdf-form.py

import fitz  # PyMuPDF
import requests
import os
import json
import sys
from urllib.parse import urlparse

def download_pdf(url):
    """Download PDF from URL to a temporary file"""
    # Fail instead of hanging forever if the server stops responding
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    # Get filename from URL or use default
    filename = os.path.basename(urlparse(url).path) or "downloaded.pdf"
    filepath = os.path.join("/tmp", filename)

    with open(filepath, 'wb') as f:
        f.write(response.content)
    return filepath

def extract_form_data(pdf_path):
    """Extract form data from a PDF file."""
    doc = fitz.open(pdf_path)
    form_data = {}

    for page_num, page in enumerate(doc):
        fields = page.widgets()
        for field in fields:
            field_name = field.field_name or f"unnamed_field_{page_num}_{len(form_data)}"
            field_type = field.field_type_string
            field_value = field.field_value

            # For checkboxes, convert to boolean
            if field_type == "CheckBox":
                field_value = field_value == "Yes"

            form_data[field_name] = {
                "type": field_type,
                "value": field_value,
                "page": page_num + 1
            }

    return form_data

def main():
    if len(sys.argv) < 2:
        print(json.dumps({"error": "PDF URL is required as an argument"}), file=sys.stderr)
        return 1

    url = sys.argv[1]

    try:
        pdf_path = download_pdf(url)
        form_data = extract_form_data(pdf_path)

        # Convert to JSON for structured output
        structured_output = json.dumps(form_data, indent=2)
        print(structured_output)
        return 0
    except Exception as e:
        print(json.dumps({"error": str(e)}), file=sys.stderr)
        return 1

if __name__ == "__main__":
    sys.exit(main())
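The way download_pdf picks a filename is easy to check in isolation. The sketch below reuses the same expression and shows what it returns for a URL with a query string and for a URL that ends in a slash. The filename_from_url and unique_download_path helpers and the example.com URLs are hypothetical; the second helper is only one possible variant, assuming you want each download in its own temporary directory so two PDFs with the same name cannot overwrite each other.

```python
import os
import tempfile
from urllib.parse import urlparse

def filename_from_url(url, default="downloaded.pdf"):
    """Same expression the script uses: last path segment, or a default."""
    return os.path.basename(urlparse(url).path) or default

def unique_download_path(url):
    """Hypothetical variant: a fresh temp directory per download avoids name clashes."""
    directory = tempfile.mkdtemp(prefix="pdf-form-")
    return os.path.join(directory, filename_from_url(url))

# The query string is not part of the path, so it is ignored
print(filename_from_url("https://example.com/forms/w9.pdf?version=2"))  # w9.pdf

# A trailing slash leaves an empty last segment, so the default is used
print(filename_from_url("https://example.com/forms/"))  # downloaded.pdf

# Each call returns a path inside a newly created directory
print(unique_download_path("https://example.com/forms/w9.pdf"))
```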

Testing your task

1. Create a virtual environment: python -m venv venv
2. Activate the virtual environment. On Mac/Linux: source venv/bin/activate. On Windows: venv\Scripts\activate
3. Install the Python dependencies: pip install -r requirements.txt
4. Copy the project ref from your Trigger.dev dashboard and add it to the trigger.config.ts file.
5. Run the Trigger.dev CLI dev command (it may ask you to authorize the CLI if you haven’t already).
6. Test the task in the dashboard by providing a valid PDF URL.
7. Deploy the task to production using the Trigger.dev CLI deploy command.
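Before triggering the task, it can help to check the stdout/stderr contract locally. The sketch below runs a command, treats stdout as JSON on success, and surfaces stderr on failure, which is roughly what the task does with the result of python.runScript. The run_and_parse helper is hypothetical, and the inline child script stands in for extract-pdf-form.py so that the example runs without a PDF or network access.

```python
import json
import subprocess
import sys

def run_and_parse(command):
    """Run a command; return parsed JSON from stdout, or raise with stderr on failure."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        raise RuntimeError(
            f"exit code {completed.returncode}: {completed.stderr.strip()}"
        )
    return json.loads(completed.stdout)

# Stand-in child process that prints JSON in the same shape as the extractor
child = (
    "import json; "
    'print(json.dumps({"full_name": {"type": "Text", "value": "Ada", "page": 1}}))'
)

result = run_and_parse([sys.executable, "-c", child])
print(result["full_name"]["value"])
```

To try it against the real script from the activated virtual environment, pass [sys.executable, "src/python/extract-pdf-form.py", "<pdf-url>"] as the command instead.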

Learn more about using Python with Trigger.dev

Python build extension

Learn how to use our built-in Python build extension to install dependencies and run your Python code.
