
PixiJS was killing my video exports. So I rewrote the pipeline in Rust.

Nam Tran
Performance Comparison of PixiJS and Rust video exporter

I’m building a Mac screen recorder called TinyRec. Standard Electron stack — PixiJS for the editor preview, WebCodecs for the export. And it was slow.

A five-minute 720p clip took fifteen minutes to export. Not 4K. Not some pathological case. Plain 720p — the resolution everyone records at. Three times the source duration, just to render it out.

Worse, the UI froze the entire time. The progress bar I’d built to make the wait feel less awful couldn’t even animate.

I knew exactly why. I just spent a week pretending I didn’t.

The shape of the problem

PixiJS pins everything to the main thread. It needs DOM access for its WebGL context, so OffscreenCanvas in a worker is off the table. Every frame had to:

  1. Decode on the main thread (WebCodecs VideoDecoder)
  2. Composite on the main thread (PixiJS — wallpaper, crop, zoom, cursor, annotations, camera overlay)
  3. Encode on the main thread (WebCodecs VideoEncoder)
  4. Repeat 9000 times for a five-minute clip

Each composite stalled JS for 60–120ms. The encoder spent most of its life waiting for the next frame to arrive. The UI got whatever microseconds were left, which was none.
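A quick sanity check on those numbers, assuming the 30 fps implied by "9000 times" and a ~100 ms average stall (the midpoint of the observed 60–120 ms range):

```rust
fn main() {
    // 5-minute clip at 30 fps — matches the "9000 times" above.
    let frames = 5 * 60 * 30;
    assert_eq!(frames, 9000);

    // ~100 ms main-thread stall per composited frame.
    let stall_ms = 100;
    let total_secs = frames * stall_ms / 1000;

    // 9000 frames × 100 ms = 900 s = 15 minutes: the composite stalls
    // alone account for the entire observed export time.
    println!("{} minutes", total_secs / 60); // prints "15 minutes"
}
```

The composite stalls by themselves add up to the whole fifteen minutes, which is why nothing downstream of the compositor was worth optimizing.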

I tried everything I could without rearchitecting:

  • Lower-resolution preview during export. Helps a little, doesn’t fix the encode/decode tax.
  • Manual requestAnimationFrame chunking. The JS event loop pauses got smaller; total time barely changed.
  • Skipping PixiJS and writing a hand-rolled WebGL2 path. Got partway, then realized I was reimplementing a smaller PixiJS.
  • Moving the encoder out via MediaStreamTrackProcessor. Same main-thread compositor, same problem.

The bottleneck wasn’t the encoder. It was the compositor. And the compositor couldn’t leave the main thread.

The actual answer

I rewrote the export pipeline as a separate Rust binary the Electron app spawns when you hit Export. The renderer talks to it over stdio: a JSON plan in, frame-progress out, exit code at the end.
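The frame-progress side of that protocol is just newline-delimited JSON on stdout. A minimal sketch of what the Rust binary emits — the field names here are illustrative, not TinyRec's real message shape:

```rust
use std::io::{self, Write};

// Hypothetical progress message. The article only specifies
// "frame-progress out"; these fields are stand-ins.
fn progress_line(frame: u32, total: u32) -> String {
    format!(
        r#"{{"type":"progress","frame":{},"total":{},"pct":{:.1}}}"#,
        frame,
        total,
        frame as f64 * 100.0 / total as f64
    )
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = stdout.lock();
    let total = 9000;
    for frame in [1, 4500, 9000] {
        // One JSON object per line — trivial for the Electron side
        // to split on '\n' and JSON.parse.
        writeln!(out, "{}", progress_line(frame, total))?;
    }
    Ok(())
}
```

One-object-per-line keeps the renderer's side dumb: buffer stdout, split on newlines, parse each line, update the progress bar.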

Electron renderer (UI thread · stays responsive)
        ↕ stdio
Rust process (compositor · encoder · muxer)

The architecture inside the Rust binary:

  • ffmpeg-sys-next: decode the source MP4, read camera and audio inputs
  • Metal compositor: wallpaper · crop · zoom · cursor · annotations · camera overlay
  • IOSurface pixel buffers: GPU output stays on the GPU — no CPU readback
  • h264_videotoolbox: hardware encode — the same Apple Silicon block WebCodecs uses, minus the JS↔engine roundtrip per frame
  • ffmpeg muxer: write the .mp4
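Those stages compose into a plain pull loop. A minimal sketch with stand-in types — in the real binary, `Frame` is an IOSurface-backed pixel buffer and each stage wraps ffmpeg, Metal, or VideoToolbox; everything named here is illustrative:

```rust
// Stand-in types; the real pipeline's are backed by GPU resources.
struct Frame(u64);
struct Packet(u64);

trait Decoder { fn next(&mut self) -> Option<Frame>; }
trait Compositor { fn composite(&mut self, f: Frame) -> Frame; }
trait Encoder { fn encode(&mut self, f: Frame) -> Packet; }
trait Muxer { fn write(&mut self, p: Packet); }

// Pull frames until the decoder is exhausted. No UI thread in sight,
// so progress goes out through a callback (stdout in the real binary).
fn run_export(
    dec: &mut impl Decoder,
    comp: &mut impl Compositor,
    enc: &mut impl Encoder,
    mux: &mut impl Muxer,
    mut on_progress: impl FnMut(u64),
) -> u64 {
    let mut frames = 0;
    while let Some(frame) = dec.next() {
        let composed = comp.composite(frame);
        let packet = enc.encode(composed);
        mux.write(packet);
        frames += 1;
        on_progress(frames);
    }
    frames
}

// Trivial stubs so the loop is runnable end to end.
struct StubDec(u64);
impl Decoder for StubDec {
    fn next(&mut self) -> Option<Frame> {
        if self.0 == 0 { None } else { self.0 -= 1; Some(Frame(self.0)) }
    }
}
struct StubComp;
impl Compositor for StubComp { fn composite(&mut self, f: Frame) -> Frame { f } }
struct StubEnc;
impl Encoder for StubEnc { fn encode(&mut self, f: Frame) -> Packet { Packet(f.0) } }
struct StubMux(u64);
impl Muxer for StubMux { fn write(&mut self, _p: Packet) { self.0 += 1; } }

fn main() {
    let mut mux = StubMux(0);
    let n = run_export(&mut StubDec(3), &mut StubComp, &mut StubEnc, &mut mux, |_| {});
    assert_eq!(n, 3);
    assert_eq!(mux.0, 3);
}
```

The point of the trait split isn't abstraction for its own sake — it's that each stage owns its GPU resources, which is exactly the lifecycle problem discussed further down.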

Two things matter here that I underweighted at the start:

The encoder was never the bottleneck. Both the JS and Rust paths route the actual H.264 encoding through Apple’s VideoToolbox. The win wasn’t a faster encoder — it was eliminating the per-frame copy from a WebGL canvas to a JS Uint8Array to a VideoFrame to the encoder. With IOSurface-backed pixel buffers, the Metal output flows directly into VideoToolbox without ever touching the CPU.

The UI being responsive matters more than the wall-clock time. Even if the export still took fifteen minutes, having it run in a separate process means I can show a real progress bar, stream a thumbnail of the current frame, and let the user keep working in the editor. That alone was worth the rewrite.

The numbers

Same five-minute 720p clip on the same M-series Mac:

  • JS pipeline: 15:00
  • Rust pipeline: 1:00

5-minute 720p clip · same Mac · same input · same output bitrate

Fifteen minutes to one minute. Faster than realtime instead of three times slower.
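The ratios are easy to verify from the clip length (a quick arithmetic check, nothing more):

```rust
fn main() {
    let clip_secs = 5 * 60; // 5-minute source clip
    let js_secs = 15 * 60;  // 15:00 export
    let rust_secs = 60;     // 1:00 export

    // JS path: three times slower than realtime.
    assert_eq!(js_secs / clip_secs, 3);
    // Rust path: five times faster than realtime.
    assert_eq!(clip_secs / rust_secs, 5);
    // Overall: a 15× speedup.
    assert_eq!(js_secs / rust_secs, 15);
}
```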

The JS pipeline still ships — it’s the fallback for features the Rust path doesn’t support yet, and it runs the web editor at editor.tinyrec.io where there’s no native binary to spawn.

Why Rust specifically

Honest answer: I could’ve done this in C++ or Swift. Rust wasn’t strictly necessary.

  • C++: would’ve worked. The ffmpeg integration is more battle-tested. But the build/dep story for a single-file release binary is rough — vcpkg, CMake, dynamic-vs-static-link drama, codesigning each transitive .dylib. I shipped a Rust binary in an afternoon that took me a day in C++ the last time I tried.
  • Swift: the ergonomics are fine too. But cross-platform is a future bonus, and the day I want a Linux/Windows version of this, Swift makes that harder.
  • Rust: cargo, one binary, codesign once, easy CI. The crate ecosystem for ffmpeg and Metal interop turned out better than I expected.

The hardest part wasn’t the Rust. It was the GPU resource lifecycle between Metal and ffmpeg. Whose pixel buffer is whose. Who frees what. The good crates papered over most of it, but I still spent a non-trivial amount of time staring at IOSurfaceLock semantics.

What I’d do differently

  1. Build the Rust path first, ship the JS path as a fallback — instead of the other way around. The JS path took weeks to optimize toward acceptable; the Rust path took a long weekend to a working v1. I had the order wrong.
  2. Benchmark before optimizing. I burned a week on PixiJS micro-optimizations that the profiler told me wouldn’t matter, because I didn’t want to accept that the architecture was the problem.
  3. Decoupling the export process turned out to matter more than the speedup. Anything that takes more than five seconds and runs on the main thread is a UX bug. Move it.

If you want to try TinyRec, it’s free on Mac: tinyrec.io.