Background Jobs at Scale with BullMQ and Redis
4/21/2026 • RiseGravity Team
Why background jobs decide whether your product feels fast
The fastest way to make a web app feel slow is to do real work inside the request. Sending email, generating a PDF, transcoding video, calling a flaky third-party API, running an AI pipeline—if any of that happens while the user waits, your p95 latency is hostage to the slowest dependency you don't control.
The fix is to move slow, unreliable, or heavy work out of the request and into a background job queue. The user gets an instant "we're on it"; a separate worker process does the work, retries on failure, and reports progress. We lean on BullMQ (a Redis-backed queue for Node.js) for exactly this—in AI Short Studio it runs the FFmpeg + Whisper video pipeline, in DomainFlow.ai it drives AI research and multi-step email campaigns, and in ProTerminal.io it powers 100+ scheduled market-data jobs. This is the playbook.
Key takeaways
- Anything slow, heavy, flaky, or scheduled belongs in a queue, not the request.
- Jobs must be idempotent—they will run more than once.
- Configure retries with backoff and a dead-letter strategy from day one.
- Rate-limit and partition by tenant so one customer can't starve the rest.
- Treat the queue as production infrastructure: monitor depth, failures, and latency.
The core model: queues, jobs, and workers
Three pieces:
- A queue is a named list of pending work, stored in Redis.
- A job is one unit of work plus its payload (kept small—pass an id, not a megabyte).
- A worker is a process that pulls jobs and runs them, separately from your web server.
import { Queue, Worker } from "bullmq";

const connection = { host: "127.0.0.1", port: 6379 };

// Producer side (inside your API): enqueue and return immediately
const emailQueue = new Queue("email", { connection });
await emailQueue.add(
  "welcome",
  { userId }, // small payload: an id, not the whole user
  { attempts: 5, backoff: { type: "exponential", delay: 1000 } }
);

// Consumer side (a separate worker process)
new Worker(
  "email",
  async (job) => {
    const user = await users.findById(job.data.userId);
    await sendWelcomeEmail(user); // the slow/flaky part lives here
  },
  { connection, concurrency: 10 }
);
The API responded the instant the job was enqueued; the worker does the waiting. Run workers as their own deployable process so you can scale and restart them independently of the web tier.
Idempotency: the rule that prevents data corruption
Here's the truth every queue forces you to confront: a job can run more than once. A worker can crash after doing the work but before acknowledging it; a retry then re-runs it. If "run twice" means "charge the card twice" or "send two emails," you have a bug waiting to fire.
Make every job idempotent—safe to run repeatedly with the same effect as running once.
- Use a stable job id so duplicate enqueues collapse into one:
// Custom job ids must not contain ":" in BullMQ, so use "-" as the separator
await paymentQueue.add("capture", { orderId }, { jobId: `capture-${orderId}` });
- Guard the side effect with an idempotency key the downstream system understands (Stripe, your email provider, and most serious APIs support this).
- Make the work a no-op if already done: check state before acting (if (order.captured) return).
Idempotency is the single most important discipline in async systems; it's why our multi-tenant billing and marketplace payout flows can safely reprocess webhooks.
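A minimal sketch of the state check and the idempotency key working together. `OrderStore`, `PaymentGateway`, and `captureOrder` are hypothetical names standing in for your database and payment provider, not a real library API.

```typescript
// Hypothetical interfaces: stand-ins for your database and payment provider.
interface Order {
  id: string;
  captured: boolean;
}

interface OrderStore {
  find(id: string): Promise<Order | undefined>;
  markCaptured(id: string): Promise<void>;
}

interface PaymentGateway {
  // The idempotency key lets the provider deduplicate repeated calls.
  capture(orderId: string, idempotencyKey: string): Promise<void>;
}

// Safe to run any number of times for the same order.
async function captureOrder(
  orderId: string,
  store: OrderStore,
  gateway: PaymentGateway,
): Promise<"captured" | "skipped"> {
  const order = await store.find(orderId);
  if (!order) throw new Error(`order ${orderId} not found`);
  if (order.captured) return "skipped"; // already done: no-op

  // If the worker crashes after this call but before markCaptured, the retry
  // repeats the call with the same key and the provider can deduplicate it.
  await gateway.capture(orderId, `capture-${orderId}`);
  await store.markCaptured(orderId);
  return "captured";
}
```

Inside a worker, the handler would call something like `captureOrder(job.data.orderId, store, gateway)`. The state check covers the common case, and the idempotency key covers the narrow window where the side effect happened but was never recorded.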
Retries, backoff, and dead letters
Transient failures are normal—a third-party API hiccups, a network blips. Retries handle them; the trick is doing it without making things worse.
- Exponential backoff spaces retries out so you don't hammer a struggling dependency: 1s, 2s, 4s, 8s…
- Add jitter to avoid a thundering herd of synchronized retries.
- Cap attempts (say 5). After that, the job is genuinely failed—don't retry forever.
- Keep failed jobs (a dead-letter queue / removeOnFail: false) so you can inspect, fix, and replay them rather than silently losing work.
await queue.add("sync", data, {
  attempts: 5,
  backoff: { type: "exponential", delay: 2000 },
  removeOnComplete: 1000, // keep recent successes for inspection
  removeOnFail: false, // never drop failures; you'll want to replay them
});
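To make the backoff-plus-jitter idea concrete, here is a small sketch of the delay arithmetic. It assumes the "full jitter" variant (a random delay between zero and the capped exponential ceiling), which is one common choice rather than anything this article or BullMQ prescribes. The function names and the injectable `random` parameter are illustrative.

```typescript
// Exponential schedule: baseMs, 2*baseMs, 4*baseMs, ... for attempt 1, 2, 3, ...
function exponentialDelay(attempt: number, baseMs: number): number {
  return baseMs * Math.pow(2, attempt - 1);
}

// "Full jitter" (an assumed variant): pick a random delay between 0 and the
// exponential ceiling, capped at maxMs so late attempts don't wait forever.
function jitteredDelay(
  attempt: number,
  baseMs: number,
  maxMs: number,
  random: () => number = Math.random, // injectable so the function is testable
): number {
  const ceiling = Math.min(maxMs, exponentialDelay(attempt, baseMs));
  return Math.floor(random() * ceiling);
}
```

With a 1000 ms base, `exponentialDelay` yields 1000, 2000, 4000, and 8000 for attempts 1 through 4, matching the schedule described above. BullMQ workers can register a custom backoff strategy where a function like `jitteredDelay` could plug in; check the docs for your version for the exact hook.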
Distinguish retryable errors (timeouts, 503s) from permanent ones (validation failure, 404). Throwing on a permanent error just to retry it five times wastes capacity and delays the inevitable—fail fast instead.
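One way to draw that line is to classify failures by HTTP status. The sketch below treats 408, 429, and 5xx as retryable and everything else as permanent; that split is a common convention and an assumption here, so adjust it to each provider's documented behavior. `classifyHttpFailure` is a hypothetical helper.

```typescript
type FailureKind = "retryable" | "permanent";

// Assumed convention: timeouts, throttling, and server errors are worth
// retrying; other failures (validation errors, 404s) will not fix themselves.
function classifyHttpFailure(status: number): FailureKind {
  if (status === 408 || status === 429) return "retryable";
  if (status >= 500 && status <= 599) return "retryable";
  return "permanent";
}
```

In a BullMQ handler, a "permanent" result is a good place to throw the library's `UnrecoverableError`, which fails the job without using up the remaining attempts. A "retryable" result can throw an ordinary error so the configured backoff applies.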
Rate limiting and fair scheduling
Two different problems hide under "rate limiting."
Respecting external limits. If a provider allows 100 requests/minute, your workers must not exceed it no matter how many jobs are queued. BullMQ supports a rate limiter, set in the worker options and enforced across every worker on the queue:
new Worker("enrichment", handler, {
  connection,
  limiter: { max: 100, duration: 60_000 }, // 100 jobs/min across this queue
});
Fairness across tenants. In multi-tenant systems, one customer enqueuing 50,000 jobs must not starve everyone else. Options that work in practice:
- Per-tenant queues (or queue groups) processed round-robin.
- Priority for interactive jobs over bulk/batch jobs.
- Concurrency budgets per tenant so no single one monopolizes workers.
On ProTerminal, partitioning and prioritizing kept ~50,000–80,000 daily market-data calls flowing without any one job class blocking the real-time path.
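The round-robin idea can be sketched without any queue library. The function below interleaves per-tenant backlogs one job per tenant per round, so a large tenant cannot push a small one to the back of the line. `TenantBacklog` and `interleave` are hypothetical names, and a real system would apply the same ordering when deciding which tenant queue to drain next, not to in-memory arrays.

```typescript
interface TenantBacklog<T> {
  tenant: string;
  jobs: T[];
}

// Take one job from each tenant in turn until every backlog is empty.
function interleave<T>(backlogs: TenantBacklog<T>[]): T[] {
  const pending = backlogs.map((b) => b.jobs.slice()); // copy; don't mutate input
  const order: T[] = [];
  let progressed = true;
  while (progressed) {
    progressed = false;
    for (const jobs of pending) {
      if (jobs.length > 0) {
        order.push(jobs.shift() as T);
        progressed = true;
      }
    }
  }
  return order;
}
```

Given one tenant with four jobs and another with one, the small tenant's job is processed second instead of fifth. Priorities and per-tenant concurrency budgets refine the same principle.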
Scheduling recurring work
A lot of background work is on a clock: nightly reports, hourly syncs, polling for updates. BullMQ's repeatable jobs (cron-style) cover this without a separate scheduler.
// Run every 15 minutes; one definition, survives restarts
await queue.add(
  "refresh-quotes",
  {},
  { repeat: { pattern: "*/15 * * * *" }, jobId: "refresh-quotes" }
);
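Two cautions. First, keep each repeatable job's definition stable (same name, repeat options, and jobId) so deploys don't accumulate duplicate schedules, and remove the old definition whenever you change the pattern, since a changed pattern registers as a new schedule. Second, make sure a slow run can't overlap the next scheduled run (guard with a lock or check "is the previous run still going?").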
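The overlap guard can be as simple as a lock with an expiry. The sketch below is an in-memory stand-in that only works within a single process; `RunLock` is a hypothetical class. Across several workers the same logic would typically live in Redis, for example via `SET key value NX PX <ttl>`.

```typescript
// In-memory stand-in for a distributed lock (illustrative only).
class RunLock {
  private expiresAt = 0;

  // Returns true if the caller acquired the lock. The TTL ensures a crashed
  // run cannot hold the lock forever.
  tryAcquire(nowMs: number, ttlMs: number): boolean {
    if (nowMs < this.expiresAt) return false; // previous run still holds it
    this.expiresAt = nowMs + ttlMs;
    return true;
  }

  release(): void {
    this.expiresAt = 0;
  }
}
```

A scheduled handler would call `tryAcquire(Date.now(), ttl)` first, return early if it gets `false`, and call `release()` in a `finally` block when the run ends. Pick a TTL comfortably longer than the slowest expected run.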
Progress and real-time feedback
Long jobs shouldn't leave users staring at a spinner. Emit progress from the worker and stream it to the client—we use Server-Sent Events (SSE) in AI Short Studio so creators watch their video render advance through planning, cropping, captioning, and encoding in real time.
// Inside the worker
await job.updateProgress({ stage: "captioning", percent: 60 });
The frontend subscribes (SSE or WebSocket) and renders a live progress bar. It's a small touch that transforms how reliable and responsive a heavy feature feels.
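On the server side, relaying progress over SSE mostly comes down to writing correctly framed text. The helper below is a sketch of that framing. `toSseFrame` and the `JobProgress` shape are assumptions modeled on the `updateProgress` payload above, and the `progress` event name is arbitrary.

```typescript
interface JobProgress {
  stage: string;
  percent: number;
}

// Format one progress update as a Server-Sent Events frame:
// an "event:" line, a "data:" line, and a blank line that ends the frame.
function toSseFrame(jobId: string, progress: JobProgress): string {
  const payload = JSON.stringify({ jobId, ...progress });
  return `event: progress\ndata: ${payload}\n\n`;
}
```

A typical wiring is to listen for progress events from BullMQ's `QueueEvents` class and write each frame to the open HTTP responses for that job's subscribers. In the browser, `EventSource` with an `addEventListener("progress", ...)` handler reads them.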
Observability: treat the queue like production infrastructure
If you can't see your queues, you can't operate them. Track at minimum:
- Queue depth / backlog — rising depth means workers can't keep up; alert on it.
- Failure rate — a spike usually means a dependency is down.
- Job duration (p50/p95) — regressions hide here.
- Stalled/stuck jobs — workers that died mid-job.
- Dead-letter contents — what failed permanently and needs a human or a replay.
Pipe these to your metrics stack and dashboard them. A queue you don't monitor is an outage you'll discover from customers.
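As a starting point, the first two signals can be turned into alerts with a few lines of arithmetic. The sketch below assumes you already have per-state job counts (BullMQ's `queue.getJobCounts()` returns counts by state). The `QueueSnapshot` shape, the threshold names, and `queueAlerts` are hypothetical. Note that the failure rate here is computed over retained jobs only, so it depends on your `removeOnComplete` and `removeOnFail` settings.

```typescript
interface QueueSnapshot {
  waiting: number;
  completed: number;
  failed: number;
}

interface AlertThresholds {
  maxBacklog: number; // alert when more jobs than this are waiting
  maxFailureRate: number; // fraction between 0 and 1
}

function queueAlerts(s: QueueSnapshot, t: AlertThresholds): string[] {
  const alerts: string[] = [];
  if (s.waiting > t.maxBacklog) {
    alerts.push(`backlog: ${s.waiting} waiting jobs exceeds ${t.maxBacklog}`);
  }
  const finished = s.completed + s.failed;
  const failureRate = finished === 0 ? 0 : s.failed / finished; // avoid 0/0
  if (failureRate > t.maxFailureRate) {
    alerts.push(`failures: rate ${failureRate.toFixed(2)} exceeds ${t.maxFailureRate}`);
  }
  return alerts;
}
```

Running this on a timer and forwarding the returned strings to your paging or chat tool covers the basics. Duration percentiles and stalled-job counts need event data rather than point-in-time counts.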
A pragmatic checklist before you ship a queue
- Workers run as a separate process from the web tier.
- Every job is idempotent (stable job id + guarded side effects).
- Retries with exponential backoff + jitter, capped attempts.
- Failed jobs are retained for inspection and replay.
- Rate limits respect external APIs; fairness protects tenants.
- Payloads are small—pass ids, fetch data in the worker.
- Recurring jobs have stable ids and overlap protection.
- Metrics and alerts on depth, failures, and duration.
- Redis is treated as durable infrastructure (persistence + memory headroom).
Flows: orchestrating multi-step pipelines
Real features are rarely one job—they're pipelines. AI Short Studio turns a YouTube link into a publishable vertical short through a sequence of stages: download, plan clips with AI, transcribe with Whisper, smart-crop to 9:16, add captions, encode, and finalize. Modeling that as one giant job is a mistake—a failure at the encode step shouldn't re-download and re-transcribe the whole video.
BullMQ flows let you compose jobs into a dependency graph: child jobs run first, and a parent job runs only once its children complete, receiving their results. This gives you retry granularity (re-run just the failed stage), natural parallelism (independent children run concurrently), and a clean place to aggregate results.
import { FlowProducer } from "bullmq";

const flow = new FlowProducer({ connection });
await flow.add({
  name: "finalize-short",
  queueName: "video",
  data: { sourceId },
  children: [
    { name: "encode", queueName: "video", data: { sourceId } },
    { name: "transcribe", queueName: "video", data: { sourceId } },
  ],
});
// "finalize-short" runs only after encode + transcribe succeed.
A few hard-won rules for pipelines:
- Each stage is its own idempotent job with its own retry policy—stages have different failure profiles (a network download fails differently than a CPU-bound encode).
- Pass references, not payloads, between stages. Write intermediate artifacts (the downloaded file, the transcript) to durable storage and pass the location; don't shove megabytes through Redis.
- Make the pipeline observable end to end. Emit progress per stage so the UI can show "transcribing… 60%," and so you can see exactly where a stuck job is stuck.
- Design for partial failure. Decide what happens when stage 4 of 6 fails permanently—does the user retry from there, or start over? The answer should be a product decision, not an accident of your retry config.
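The partial-failure decision becomes easier to reason about once stage status is tracked explicitly. The sketch below assumes each stage records "done", "failed", or "pending" somewhere durable and computes where a retry should resume. The stage names follow the AI Short Studio pipeline described above, but `STAGES`, `StageStatus`, and `resumeFrom` are hypothetical.

```typescript
const STAGES = [
  "download",
  "plan",
  "transcribe",
  "crop",
  "caption",
  "encode",
  "finalize",
] as const;

type Stage = (typeof STAGES)[number];
type StageStatus = "done" | "failed" | "pending";

// Returns the first stage that has not completed, or null if all are done.
// Earlier "done" stages are skipped, so their stored artifacts are reused.
function resumeFrom(status: Record<Stage, StageStatus>): Stage | null {
  for (const stage of STAGES) {
    if (status[stage] !== "done") return stage;
  }
  return null;
}
```

If encoding failed permanently after the first five stages completed, `resumeFrom` returns "encode", and a user-triggered retry can start there without re-downloading or re-transcribing. Whether to expose that retry at all remains the product decision.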
Frequently asked questions
Why BullMQ instead of running tasks with setTimeout or a cron script?
In-process timers die with your server and don't survive deploys, retries, or multiple instances. BullMQ persists jobs in Redis, distributes them across workers, and gives you retries, backoff, rate limiting, scheduling, and observability for free—the things you'd otherwise reinvent badly.
How do I make a job idempotent?
Give it a stable job id so duplicate enqueues collapse, guard the side effect with an idempotency key the downstream system honors, and make the handler a no-op if the work is already done (check state before acting). Assume every job can run twice.
What happens to jobs that keep failing?
Cap attempts (e.g. 5) with exponential backoff, then route exhausted jobs to a dead-letter/failed state you retain rather than delete. Inspect them, fix the cause, and replay; don't retry forever or drop the work silently.
How do I stop one tenant from hogging all the workers?
Partition work per tenant (separate queues or queue groups), apply per-tenant concurrency budgets, and prioritize interactive jobs over bulk batches so fairness is enforced rather than hoped for.
Build async systems that don't lose work
Background jobs are where reliability is won or lost—idempotency, retries, fairness, and observability are the difference between a queue that quietly heals itself and one that drops customer work at 3 a.m. If you're building heavy or real-time features and want them to hold up under load, see our Projects, read our multi-tenant architecture guide, or reach out at contact@risegravity.com.