
What happens when step 3 of 5 fails? Building rollbackit

When side effects span multiple systems, a database transaction can't save you. That's why I built rollbackit: type-safe, zero-dependency rollback for multi-step operations in TypeScript & JavaScript.

Alexandre Marques · 8 min read

Picture a sign-up flow. You create the user row, provision a storage bucket for their files, kick off a billing subscription, and send a welcome email. A handful of steps, four different systems. Then step 3 throws. Now you've got a user in your database with no subscription, a bucket sitting around with nothing in it, and a customer who is technically half-onboarded. None of it rolls back on its own, because none of it lives inside a single database transaction.

We reach for BEGIN ... COMMIT without thinking, and it's wonderful, right up until the side effects leave the database. The moment you touch object storage, a payment provider, a third-party API, or another service, the transaction boundary you've been leaning on simply doesn't stretch that far. You're left writing the same nervous try/catch ladders by hand: undo this, then undo that, but only the things that actually ran, and in the right order, and please don't crash while cleaning up. I'd written that code more times than I'd like to admit. That's why I built rollbackit.
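For concreteness, the hand-written version tends to look something like the sketch below. Everything in it is a stand-in: `fakeDb`, `fakeStorage`, and `fakeBilling` are hypothetical in-memory fakes rather than real clients, and billing is rigged to fail so the cleanup path actually runs.

```typescript
// Hypothetical in-memory stand-ins for three separate systems.
const users = new Set<string>();
const buckets = new Set<string>();

const fakeDb = {
  createUser: async (id: string) => { users.add(id); return { id }; },
  deleteUser: async (id: string) => { users.delete(id); },
};
const fakeStorage = {
  createBucket: async (userId: string) => { buckets.add(userId); return { id: userId }; },
  deleteBucket: async (id: string) => { buckets.delete(id); },
};
const fakeBilling = {
  // Rigged to fail so the cleanup path below runs.
  subscribe: async (_userId: string): Promise<void> => {
    throw new Error("billing unavailable");
  },
};

// The manual ladder: track what actually ran, undo it newest-first,
// and never let a failing cleanup mask the original error.
async function signUpByHand(id: string): Promise<void> {
  let userCreated = false;
  let bucketCreated = false;
  try {
    await fakeDb.createUser(id);
    userCreated = true;
    await fakeStorage.createBucket(id);
    bucketCreated = true;
    await fakeBilling.subscribe(id); // step 3 throws
  } catch (error) {
    if (bucketCreated) {
      try { await fakeStorage.deleteBucket(id); } catch { /* log and keep going */ }
    }
    if (userCreated) {
      try { await fakeDb.deleteUser(id); } catch { /* log and keep going */ }
    }
    throw error;
  }
}
```

Every extra step means another flag and another nested try/catch, which is exactly the part that gets out of hand as the flow grows.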

What's rollbackit

rollbackit is a tiny TypeScript library for multi-step operations that need all-or-nothing semantics across systems a single transaction can't cover. The idea is simple: right after each step succeeds, you register how to undo it. If anything later fails, rollbackit runs those undos for you, newest-first, and re-throws the original error. No workflow engine, no orchestrator, no infrastructure to stand up. Some of the key features:

✨ Key Features:

How to use it

Installation

npm install rollbackit # or pnpm, yarn, bun

Quick Start

The fastest way in is withRollback. You get a rollback instance (rb), and after each side effect you call rb.add(description, undo). If your function returns normally, the undos are discarded. If it throws, they run in reverse order and the original error is re-thrown, so your caller still sees what actually went wrong.

import { withRollback } from "rollbackit";
 
const result = await withRollback(async (rb) => {
  const user = await db.createUser(data);
  rb.add("delete user", () => db.deleteUser(user.id));
 
  const bucket = await storage.createBucket(user.id);
  rb.add("delete bucket", () => storage.deleteBucket(bucket.id));
 
  await sendWelcomeEmail(user);
 
  return user;
});

That's the whole pattern. Read top to bottom, each step sits right next to its own undo, so there's no separate cleanup function to keep in sync as the flow grows.
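To make the mechanics concrete, here is a toy re-implementation of the idea in a few lines. It is not rollbackit's source, just a sketch of the behavior described above: `miniWithRollback` and `MiniRollback` are hypothetical names, and the real library handles cleanup failures more carefully than this does.

```typescript
// Toy model of the pattern, NOT rollbackit's actual implementation.
type Undo = () => void | Promise<void>;

interface MiniRollback {
  add(description: string, undo: Undo): void;
}

async function miniWithRollback<T>(fn: (rb: MiniRollback) => Promise<T>): Promise<T> {
  const undos: { description: string; undo: Undo }[] = [];
  const rb: MiniRollback = {
    add: (description, undo) => { undos.push({ description, undo }); },
  };
  try {
    return await fn(rb); // success: registered undos are simply dropped
  } catch (error) {
    // Failure: run undos newest-first, tolerate cleanup errors, then re-throw.
    for (const { undo } of undos.reverse()) {
      try { await undo(); } catch { /* a real library would collect this */ }
    }
    throw error;
  }
}

// Usage: record the order in which undos run when a later step fails.
async function demoOrder(): Promise<string[]> {
  const log: string[] = [];
  try {
    await miniWithRollback(async (rb) => {
      rb.add("delete user", () => { log.push("delete user"); });
      rb.add("delete bucket", () => { log.push("delete bucket"); });
      throw new Error("step 3 failed");
    });
  } catch (error) {
    log.push(`caught: ${(error as Error).message}`);
  }
  return log; // ["delete bucket", "delete user", "caught: step 3 failed"]
}
```

The stack is the whole trick: because undos are pushed only after their step succeeds, reversing it gives exactly the set of things that ran, in the order they need to be undone.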

Pairing a step with its undo: step()

add registers an undo for work you've already done, which means the ordering is on you: do the thing, then register its compensation, and never let an add slip in front of the action it reverses. step() collapses that into a single call. It runs the forward action and only registers the undo if the action resolves, so a failed step never leaves a compensation pointing at something that was never created. It also hands the action's result straight to the undo, so you don't have to thread ids through outer variables.

const user = await withRollback(async (rb) => {
  return rb.step(
    "create user",
    () => api.createUser(payload),       // forward action
    (user) => api.deleteUser(user.id),   // undo, gets the action's result
  );
});

If the action throws, nothing is registered and the error propagates, so an enclosing withRollback still unwinds the earlier steps. Same guarantee as add, just without the chance of getting the order wrong.
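A rough way to picture that guarantee is the toy below. It assumes nothing about how rollbackit is written internally; `createToyScope` is a hypothetical helper that only models the two properties described here: the undo is registered after the action resolves, and it receives the action's result.

```typescript
// Toy model of step() semantics; hypothetical names, not the library's code.
type Compensation = { description: string; run: () => Promise<void> };

function createToyScope() {
  const stack: Compensation[] = [];
  return {
    stack,
    async step<T>(
      description: string,
      action: () => Promise<T>,
      undo: (result: T) => Promise<void>,
    ): Promise<T> {
      const result = await action(); // if this throws, nothing below runs
      stack.push({ description, run: () => undo(result) }); // result is captured for the undo
      return result;
    },
  };
}

async function demoStep(): Promise<{ registered: string[]; deleted: string[] }> {
  const scope = createToyScope();
  const deleted: string[] = [];

  await scope.step(
    "create user",
    async () => ({ id: "u1" }),
    async (user) => { deleted.push(user.id); },
  );

  try {
    await scope.step(
      "create bucket",
      async (): Promise<{ id: string }> => { throw new Error("quota exceeded"); },
      async (bucket) => { deleted.push(bucket.id); },
    );
  } catch {
    // The failed step registered nothing, so only the user is unwound.
    for (const c of scope.stack.reverse()) await c.run();
  }
  return { registered: scope.stack.map((c) => c.description), deleted };
}
```

In this sketch the failed "create bucket" step never reaches the `push`, so the unwind touches only the user that was really created.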

Timeouts: don't let a hung step skip rollback

A slow call isn't only a latency problem, it's a correctness one. If your process is killed while a step hangs (a Lambda hitting its own timeout, a pod SIGKILL'd past its grace period), control never reaches your catch, so the rollback never runs and you leak everything created so far. The fix is to give the work its own deadline that fires first, a few seconds below the platform's, so the unwind happens while you're still alive.

You can bound a single step with step()'s timeout option, or the whole operation with withRollback's timeout, or both. The callback receives an AbortSignal you can thread into your slow calls so the in-flight work actually cancels:

await withRollback(
  async (rb, signal) => {
    const user = await rb.step(
      "create user",
      (stepSignal) => api.createUser(payload, { signal: stepSignal }),
      (user) => api.deleteUser(user.id),
      { timeout: 5_000 }, // per-step budget
    );
    await api.activate(user, { signal }); // thread the scope signal into slow calls
  },
  { timeout: 25_000 }, // whole-operation budget, a few seconds under the platform limit
);

Both throw a TimeoutError (a RollbackError subclass), and a timed-out withRollback still unwinds whatever was registered before re-throwing. Two things are worth keeping in mind. First, the timeout only stops you waiting; the AbortSignal is what actually cancels the request, and only if the call honors it, as fetch(url, { signal }) and most drivers and SDKs do. Second, a timed-out action is left in an unknown state: it may have created the resource on the server just before the abort landed, with no undo registered for it. So make those actions idempotent or reconcilable, and give the timeout enough margin that a genuine success isn't cut off.
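If you're curious what "a deadline that fires first" amounts to, one common way to build it is to race the work against a timer and abort a signal when the timer wins. The sketch below does that with standard APIs only (`AbortController`, `Promise.race`); `withDeadline`, `ToyTimeoutError`, and `slowCall` are hypothetical names, and this is not a claim about how rollbackit implements its timeouts.

```typescript
// Hypothetical helper, not rollbackit's implementation: race an action
// against a timer and abort the action's signal when the timer wins.
class ToyTimeoutError extends Error {
  constructor(ms: number) {
    super(`timed out after ${ms}ms`);
    this.name = "ToyTimeoutError";
  }
}

async function withDeadline<T>(
  ms: number,
  action: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      reject(new ToyTimeoutError(ms)); // stop waiting first, so the caller sees the timeout
      controller.abort(); // then ask in-flight work to cancel
    }, ms);
  });
  try {
    return await Promise.race([action(controller.signal), deadline]);
  } finally {
    clearTimeout(timer); // don't keep the process alive after a fast success
  }
}

// A slow call that honors the signal, the way fetch(url, { signal }) does.
function slowCall(signal: AbortSignal, ms: number): Promise<string> {
  return new Promise((resolve, reject) => {
    const t = setTimeout(() => resolve("done"), ms);
    signal.addEventListener("abort", () => {
      clearTimeout(t);
      reject(new Error("aborted"));
    });
  });
}

async function demoDeadline(): Promise<string[]> {
  const outcomes: string[] = [];
  outcomes.push(await withDeadline(200, (signal) => slowCall(signal, 5)));
  try {
    await withDeadline(20, (signal) => slowCall(signal, 2000));
  } catch (error) {
    outcomes.push((error as Error).name);
  }
  return outcomes; // ["done", "ToyTimeoutError"]
}
```

Note that if `slowCall` ignored the signal, the caller would still get the timeout error on schedule, but the underlying work would keep running in the background. That's the "only stops you waiting" caveat in practice.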

Manual control with createRollback

Sometimes a single scoped callback isn't enough. You're spanning multiple methods, or you want to decide when rollback happens yourself. createRollback gives you the instance directly, with an explicit rollback() you call when things go wrong.

import { createRollback } from "rollbackit";
 
const rb = createRollback();
 
try {
  const user = await db.createUser(data);
  rb.add("delete user", () => db.deleteUser(user.id));
 
  const bucket = await storage.createBucket(user.id);
  rb.add("delete bucket", () => storage.deleteBucket(bucket.id));
 
  rb.commit();
} catch (error) {
  const { failures } = await rb.rollback();
  if (failures.length) {
    logger.warn("rollback incomplete", failures);
  }
  throw error;
}

When the cleanup itself fails

Cleanup code isn't magic; it can fail too. By default rollbackit is forgiving: if one undo throws, it's collected and the remaining undos still run, so a single flaky cleanup doesn't strand everything behind it. You inspect the outcome through the RollbackResult:

{
  failures, // [{ description, error }] - undos that threw
  pending,  // operations left un-run (when a stop halted the sweep)
}

If you'd rather halt at the first failure, flip stopOnFailure on, either for a single risky operation or for the whole sweep:

// per-operation
rb.add("risky cleanup", () => cleanup(), { stopOnFailure: true });
 
// run-level
const { failures, pending } = await rb.rollback({ stopOnFailure: true });
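The difference between the forgiving default and a stop is easiest to see side by side. The toy sweep below models the behavior described here; `sweep`, `Op`, and `Failure` are hypothetical names, and representing `pending` as a list of descriptions is an assumption made for the sketch, not necessarily the shape rollbackit returns.

```typescript
// Toy sweep with hypothetical names; it models the described behavior
// rather than reproducing rollbackit's code.
type Op = { description: string; undo: () => Promise<void>; stopOnFailure?: boolean };
type Failure = { description: string; error: unknown };

async function sweep(
  ops: Op[], // in registration order
  options: { stopOnFailure?: boolean } = {},
): Promise<{ failures: Failure[]; pending: string[] }> {
  const failures: Failure[] = [];
  const queue = [...ops].reverse(); // newest-first
  while (queue.length > 0) {
    const op = queue.shift()!;
    try {
      await op.undo();
    } catch (error) {
      failures.push({ description: op.description, error });
      if (op.stopOnFailure || options.stopOnFailure) break; // halt: the rest stays pending
    }
  }
  return { failures, pending: queue.map((op) => op.description) };
}

async function demoSweep() {
  const ops: Op[] = [
    { description: "delete user", undo: async () => {} },
    { description: "delete bucket", undo: async () => { throw new Error("storage down"); } },
    { description: "cancel subscription", undo: async () => {} },
  ];
  const forgiving = await sweep(ops); // bucket fails, user is still deleted
  const strict = await sweep(ops, { stopOnFailure: true }); // bucket fails, user is left pending
  return { forgiving, strict };
}
```

In the forgiving run the failed bucket cleanup is recorded and the user is still deleted; in the strict run the sweep halts at the bucket and "delete user" comes back as pending for you to deal with.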

With withRollback, the same information surfaces through the onFailures hook, perfect for a log line or an alert:

await withRollback(
  async (rb) => {
    /* ... */
  },
  {
    onFailures: ({ failures, pending }) =>
      logger.warn("rollback incomplete", { failures, pending }),
  },
);

Progressive commit: not everything should share fate

Here's the feature I lean on most. Often a flow has natural checkpoints. Once stage one is durably done, a failure in stage two shouldn't tear it down. Calling commit() seals the current batch without finalizing the instance, so a later rollback() only unwinds what was registered after the last commit.

const rb = createRollback();
 
try {
  await stageOne(rb);  // create user + bucket, each with its undo
  rb.commit();         // stage one is safe now
 
  await stageTwo(rb);  // start a billing subscription, with its undo
  rb.commit();
} catch (error) {
  await rb.rollback(); // only unwinds operations after the last commit
  throw error;
}

It's the same idea as a savepoint, but spanning whatever systems your steps happen to touch.
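As a mental model, you can think of commit() as clearing the list of things a rollback would touch while leaving the instance open for more. The toy below sketches just that; `createToyRollback` is a hypothetical stand-in, it skips failure handling entirely, and it says nothing about rollbackit's internals.

```typescript
// Toy model of progressive commit; hypothetical names, not rollbackit's code.
function createToyRollback() {
  let uncommitted: { description: string; undo: () => Promise<void> }[] = [];
  return {
    add(description: string, undo: () => Promise<void>): void {
      uncommitted.push({ description, undo });
    },
    // Seal everything registered so far; the instance stays usable.
    commit(): void {
      uncommitted = [];
    },
    // Unwind only what was registered after the last commit, newest-first.
    async rollback(): Promise<string[]> {
      const ran: string[] = [];
      for (const op of [...uncommitted].reverse()) {
        await op.undo();
        ran.push(op.description);
      }
      uncommitted = [];
      return ran;
    },
  };
}

async function demoCommit(): Promise<string[]> {
  const rb = createToyRollback();
  rb.add("delete user", async () => {});
  rb.add("delete bucket", async () => {});
  rb.commit(); // stage one sealed
  rb.add("cancel subscription", async () => {});
  rb.add("revoke invoice", async () => {});
  return rb.rollback(); // ["revoke invoice", "cancel subscription"]
}
```

The user and bucket undos never run here, because they were sealed by the commit before stage two started.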

Real-World Use Cases

Why so small

rollbackit deliberately does one thing. It's not a saga framework, it doesn't persist state, and it won't retry your operations across process restarts. If you need durable, long-running orchestration, reach for the heavier tools. But for the very common case of "a handful of steps in one request that must succeed or unwind together," a focused, zero-dependency helper is usually all you actually want, and a lot easier to reason about.

Contributing

I'd love your help making it better:

Visit the GitHub repository to get started.

Thanks for reading! If you find rollbackit useful, please consider giving it a star on GitHub. ⭐

Happy coding! 🚀