What I made
So I made this video:
Didn't open Premiere or any other timeline editor. Just chatted back and forth with Codex in the terminal, along with some CLI tools I already had wired up from other work.
It's rough and maybe cringy.
Posting it anyway because I wanted to document the process. I think it's an early indication of how, if you wrap these coding agents with the right tools, you can use them for other interesting workflows too.
Inspiration
I've been seeing a lot of these Remotion skills demo videos on X. They kept popping up in my timeline, so I wanted to try it myself.
One specific thing I wanted to test: could I take footage of me explaining something and have Codex actually understand the context of what I'm saying, create animations that fit, and overlay it all in a nice way?
(I do this professionally for my gigs and it takes time. Wanted to see how much of that Codex could handle.)
Disclaimers
Before anyone points things out:
- I recorded the video first, then asked Codex to edit it. So any jankiness in the flow is probably from that.
- I did have some structure in my head when I recorded. Not a written storyboard, more like a mental one. I knew roughly what I wanted to say and what kind of animation I might want, but I didn't know how the edit would turn out because I did not know the limitations of Codex for animation.
- I'm a professional video producer. If I had done this manually, it probably would have taken me half or a third of the time. But I can increasingly see what this could look like down the line, and I see the value in it.
- I already had CLI tools wired up because I've been doing this for a living. That definitely helped speed things up.
What I wired up
- NVIDIA Parakeet for transcription with word-level timestamps (already had a CLI for this)
- FastNet ASD for active speaker detection and face bounding boxes (already had a CLI for this too)
- Remotion for the actual render and motion (this was the skill I saw on X, just installed it for Codex with the skill installer)
After that I just opened up the IDE and everything was done through the terminal.
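To give a feel for why word-level timestamps matter, here is a minimal sketch of turning them into caption chunks with frame numbers, which is roughly the shape of data a frame-based renderer like Remotion wants. The `Word` fields, the 30 fps default, and the `from`/`to` keys are all assumptions for illustration, not Parakeet's actual output format or the schema used in this project.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds (assumed transcript field)
    end: float    # seconds (assumed transcript field)


def to_frame(seconds, fps):
    """Convert a timestamp in seconds to the nearest frame index."""
    return round(seconds * fps)


def _close(chunk, fps):
    """Turn a list of words into one caption with start/end frames."""
    return {
        "text": " ".join(w.text for w in chunk),
        "from": to_frame(chunk[0].start, fps),
        "to": to_frame(chunk[-1].end, fps),
    }


def group_captions(words, fps=30, max_words=4, max_gap=0.6):
    """Group words into caption chunks.

    A new chunk starts when the current one already holds max_words words,
    or when the pause before the next word is longer than max_gap seconds.
    """
    captions = []
    current = []
    for word in words:
        too_long = len(current) >= max_words
        paused = bool(current) and word.start - current[-1].end > max_gap
        if current and (too_long or paused):
            captions.append(_close(current, fps))
            current = []
        current.append(word)
    if current:
        captions.append(_close(current, fps))
    return captions
```

Splitting on pauses as well as word count keeps captions from straddling a breath or a scene change, which tends to look off on screen.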
Receipts
These are all the artifacts generated while chatting with Codex. I store intermediate outputs to the file system after each step so I can pick up from any point, correct things, and keep going. File systems are great for this.
| Artifact | Description |
|---|---|
| Raw recording | The original camera file. Everything starts here. |
| Transcript | Word-level timestamps. Used to sync text and timing to speech. |
| Active speaker frames | Per-frame face boxes and speaking scores for tracking. |
| Storyboard timeline | Planning timeline I used while shaping scenes and pacing. |
| 1x1 crop timeline | Crop instructions for the square preview/export. |
| Render timeline | The actual JSON that Remotion renders. This is the canonical edit. |
| Final video | The rendered output from the timeline above. |
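As an illustration of how the active speaker frames could feed the 1x1 crop timeline, here is a rough sketch. It assumes each frame carries a list of faces with a `box` as `(x1, y1, x2, y2)` in pixels and a speaking `score`; that schema and all function names are hypothetical, not the real format of the artifacts above.

```python
def active_face(faces):
    """Pick the face with the highest speaking score, or None if there are no faces."""
    return max(faces, key=lambda f: f["score"], default=None)


def square_crop(face_box, frame_w, frame_h):
    """Return (x, y, size) for the largest square crop centered on the face.

    The crop is clamped so it never leaves the frame.
    """
    x1, y1, x2, y2 = face_box
    size = min(frame_w, frame_h)
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    x = min(max(cx - size / 2, 0), frame_w - size)
    y = min(max(cy - size / 2, 0), frame_h - size)
    return int(x), int(y), size


def smooth(values, alpha=0.2):
    """Exponential moving average, so the crop does not jitter frame to frame."""
    out = []
    prev = None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out
```

Running `smooth` over the per-frame `x` values before writing the crop instructions is one simple way to avoid a shaky square export; a lower `alpha` gives a slower, steadier pan.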
If you want to reproduce this, the render timeline is the one you need. Feed it to Remotion and it should just work.
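The habit of storing each intermediate output to the file system can be sketched as a tiny helper like the one below. This is a minimal sketch assuming every artifact is JSON-serializable; `cached_step` and the step functions in the usage comment are hypothetical names, not code from this project.

```python
import json
from pathlib import Path


def cached_step(path, compute):
    """Return the JSON stored at path, or run compute() and store its result there.

    Deleting or hand-editing the file is how you redo or correct a step.
    """
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text())
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result, indent=2))
    return result


# Hypothetical usage: each stage reads the previous stage's artifact.
# transcript = cached_step("out/transcript.json", lambda: transcribe("raw.mp4"))
# timeline = cached_step("out/render_timeline.json", lambda: plan(transcript))
```

Because each step only reruns when its file is missing, you can fix a bad transcript word by hand, delete the downstream files, and pick up from there without repeating the slow steps.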
My thoughts
I'm super impressed by what Codex pulled off here. I probably could have done this better manually, and in less time too. But I'm already going to roll this into my workflows.
I've been meaning to shoot explainer videos and AI content for myself outside of client work, but kept putting it off because of time. Now I can actually imagine doing them. Once I templatize my brand aesthetic and lock in the feel I want, I can just focus on the content and delegate the editing part to the terminal.
It's kind of funny. My own line of work is partially getting decimated here. But I dunno, there's something fun about editing videos just by talking to a terminal.
Exciting times!