Prerequisites

Overview

There are 3 example tasks to follow on this page:

  1. Basic example
  2. Generate a PDF from a web page
  3. Scrape content from a web page

WEB SCRAPING: When web scraping, you MUST use a proxy to comply with our terms of service. Direct scraping of third-party websites without the site owner’s permission using Trigger.dev Cloud is prohibited and will result in account suspension. See this example which uses a proxy.

Build configurations

To use all examples on this page, you’ll first need to add these build settings to your trigger.config.ts file:

trigger.config.ts
import { defineConfig } from "@trigger.dev/sdk/v3";
import { puppeteer } from "@trigger.dev/build/extensions/puppeteer";

export default defineConfig({
  project: "<project ref>",
  // Your other config settings...
  build: {
    // This is required to use the Puppeteer library
    extensions: [puppeteer()],
  },
});

Learn more about build configurations, including setting default retry settings, customizing the build environment, and more.

Set an environment variable

Set the following environment variable in your Trigger.dev dashboard or using the SDK:

PUPPETEER_EXECUTABLE_PATH: "/usr/bin/google-chrome-stable"
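If the variable is missing at runtime, the browser launch can fail with an error that does not obviously point at configuration. As a minimal sketch, a task could check for it up front. The helper name `requireChromePath` and the `Env` type are hypothetical and are not part of the Trigger.dev SDK or Puppeteer.

```typescript
// Hypothetical helper (not part of Trigger.dev or Puppeteer): fail fast with a
// clear message when the Chrome path has not been configured.
type Env = Record<string, string | undefined>;

function requireChromePath(env: Env): string {
  // Treat an unset, empty, or whitespace-only value as missing
  const path = env.PUPPETEER_EXECUTABLE_PATH?.trim();
  if (!path) {
    throw new Error(
      "PUPPETEER_EXECUTABLE_PATH is not set; add it in the Trigger.dev dashboard"
    );
  }
  return path;
}
```

Inside a task this could be called as `requireChromePath(process.env)` before `puppeteer.launch()`. Puppeteer reads `PUPPETEER_EXECUTABLE_PATH` on its own, so the check only improves the error message; it does not change how the browser is located.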

Basic example

Overview

In this example we use Puppeteer to log out the title of a web page, in this case from the Trigger.dev landing page.

Task code

trigger/puppeteer-basic-example.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";

export const puppeteerTask = task({
  id: "puppeteer-log-title",
  run: async () => {
    const browser = await puppeteer.launch();

    try {
      const page = await browser.newPage();
      await page.goto("https://trigger.dev");

      // Read the page title and log it
      const title = await page.title();
      logger.info("Page title", { title });
    } finally {
      // Always close the browser, even if navigation fails
      await browser.close();
    }
  },
});

Testing your task

There’s no payload required for this task so you can just click “Run test” from the Test page in the dashboard. Learn more about testing tasks here.

Generate a PDF from a web page

Overview

In this example we use Puppeteer to generate a PDF from the Trigger.dev landing page and upload it to Cloudflare R2.

Task code

trigger/puppeteer-generate-pdf.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer";
import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";

// Initialize S3 client
const s3Client = new S3Client({
  region: "auto",
  endpoint: process.env.S3_ENDPOINT,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID ?? "",
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY ?? "",
  },
});

export const puppeteerWebpageToPDF = task({
  id: "puppeteer-webpage-to-pdf",
  run: async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    const response = await page.goto("https://trigger.dev");
    const url = response?.url() ?? "No URL found";

    // Generate PDF from the web page
    const generatePdf = await page.pdf();

    logger.info("PDF generated from URL", { url });

    await browser.close();

    // Upload to R2
    const s3Key = `pdfs/test.pdf`;
    const uploadParams = {
      Bucket: process.env.S3_BUCKET,
      Key: s3Key,
      Body: generatePdf,
      ContentType: "application/pdf",
    };

    // Log only the bucket and key; the Body holds the full PDF bytes
    logger.log("Uploading to R2", { bucket: uploadParams.Bucket, key: s3Key });

    // Upload the PDF to R2 and return the URL.
    await s3Client.send(new PutObjectCommand(uploadParams));
    // Note: this is the AWS S3 URL pattern. For an R2 bucket, substitute the
    // public or custom domain configured for that bucket.
    const s3Url = `https://${process.env.S3_BUCKET}.s3.amazonaws.com/${s3Key}`;
    logger.log("PDF uploaded to R2", { url: s3Url });
    return { pdfUrl: s3Url };
  },
});

Testing your task

There’s no payload required for this task so you can just click “Run test” from the Test page in the dashboard. Learn more about testing tasks here.
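The task above always writes to the fixed key `pdfs/test.pdf`, so each run overwrites the previous PDF. A minimal sketch of one way to derive a distinct key per page and per run follows. The helper `buildPdfKey` and its key layout are assumptions for illustration, not something the example prescribes.

```typescript
// Hypothetical helper: derive an object key from the page URL and a timestamp
// so that repeated runs do not overwrite each other.
function buildPdfKey(pageUrl: string, generatedAt: Date): string {
  const { hostname, pathname } = new URL(pageUrl);

  // Collapse every run of non-alphanumeric characters into a single hyphen
  const slug = `${hostname}${pathname}`
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");

  // ISO timestamps contain ":" and ".", which are awkward in object keys
  const stamp = generatedAt.toISOString().replace(/[:.]/g, "-");

  return `pdfs/${slug}/${stamp}.pdf`;
}
```

In the task, `const s3Key = buildPdfKey(url, new Date());` could stand in for the hard-coded key, provided `url` holds a valid absolute URL rather than the "No URL found" fallback.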

Scrape content from a web page

Overview

In this example we use Puppeteer with a BrowserBase proxy to scrape the GitHub stars count from the Trigger.dev landing page and log it out. See this list for more proxying services we recommend.

When web scraping, you MUST use the technique below which uses a proxy with Puppeteer. Direct scraping without using browserWSEndpoint is prohibited and will result in account suspension.

Task code

trigger/scrape-website.ts
import { logger, task } from "@trigger.dev/sdk/v3";
import puppeteer from "puppeteer-core";

export const puppeteerScrapeWithProxy = task({
  id: "puppeteer-scrape-with-proxy",
  run: async () => {
    const browser = await puppeteer.connect({
      browserWSEndpoint: `wss://connect.browserbase.com?apiKey=${process.env.BROWSERBASE_API_KEY}`,
    });

    const page = await browser.newPage();

    try {
      // Navigate to the target website
      await page.goto("https://trigger.dev", { waitUntil: "networkidle0" });

      // Scrape the GitHub stars count
      const starCount = await page.evaluate(() => {
        const starElement = document.querySelector(".github-star-count");
        const text = starElement?.textContent ?? "0";
        const numberText = text.replace(/[^0-9]/g, "");
        return parseInt(numberText, 10);
      });

      logger.info("GitHub star count", { starCount });

      return { starCount };
    } catch (error) {
      logger.error("Error during scraping", {
        error: error instanceof Error ? error.message : String(error),
      });
      throw error;
    } finally {
      await browser.close();
    }
  },
});

Testing your task

There’s no payload required for this task so you can just click “Run test” from the Test page in the dashboard. Learn more about testing tasks here.
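The task strips every non-digit character before parsing, which works for text such as "9,876" but would turn an abbreviated value such as "12.3k" into 123. Whether the page ever renders abbreviated counts is an assumption here. If it does, a more tolerant parser could look like the sketch below, where `parseStarCount` is a hypothetical helper name.

```typescript
// Hypothetical helper: parse star counts written as "9,876", "12.3k" or "1.2M".
// Returns null when the text contains no number at all.
function parseStarCount(text: string | null | undefined): number | null {
  // Drop thousands separators, then find a number with an optional k/m suffix
  const match = (text ?? "").replace(/,/g, "").match(/(\d+(?:\.\d+)?)\s*([km])?/i);
  if (!match) return null;

  const value = Number(match[1]);
  const suffix = match[2]?.toLowerCase();
  const multiplier = suffix === "k" ? 1_000 : suffix === "m" ? 1_000_000 : 1;

  // Round to avoid floating point residue such as 12300.000000000002
  return Math.round(value * multiplier);
}
```

Because the callback passed to `page.evaluate` runs in the browser and cannot see functions defined in the task file, the callback would return the raw `textContent` and the helper would be applied to that returned string in the task.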

Local development

To test this example task locally, install on your local machine any packages required by the build extensions you added to your trigger.config.ts file. In this case, you need to install .

Proxying

If you’re using Trigger.dev Cloud with Puppeteer, or any other tool, to scrape content from websites you don’t own, you must proxy your requests. If you don’t, you risk getting our IP address blocked, and we will ban you from our service.
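Since scraping without the proxy is not allowed, it can help to make a missing proxy credential a hard failure rather than an accidental fallback to a direct connection. The sketch below builds the BrowserBase endpoint used in the scraping example. The helper name `buildBrowserbaseEndpoint` is hypothetical, and URL-encoding the key is a defensive assumption rather than a documented requirement.

```typescript
// Hypothetical helper: build the proxy WebSocket endpoint, refusing to
// continue when no API key is configured.
function buildBrowserbaseEndpoint(apiKey: string | undefined): string {
  if (!apiKey) {
    throw new Error("BROWSERBASE_API_KEY is not set; refusing to scrape without a proxy");
  }
  // Encode the key so characters such as "+" or "&" cannot alter the query string
  return `wss://connect.browserbase.com?apiKey=${encodeURIComponent(apiKey)}`;
}
```

The result could then be passed as `browserWSEndpoint` to `puppeteer.connect`, for example `buildBrowserbaseEndpoint(process.env.BROWSERBASE_API_KEY)`.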

Here is a list of proxy services we recommend: