The first thing that got my attention about James Gerde (@gerdegotit) wasn't the work itself. It was something he said about the work.
"Contrary to popular belief about generative AI, it isn't a magic button — there are a string of variables that must be accounted for and ways to optimize your workflow and process."
He said that in the context of explaining why his stuff looked different from everyone else's.
And the more I looked at his feed — nearly two million followers on Instagram, videos that take familiar footage and push it somewhere completely unexpected — the more that quote landed.
Because the part of his process nobody talks about is the part that makes the output possible. And it's not the part people assume.
What James Gerde Actually Does
James Gerde is a Seattle filmmaker. He spent years directing music videos, commercial films, and creative projects before he found the specific AI niche that turned into a company and a million-plus following.
What he does, specifically: video-to-video style transfer.
Not text-to-video. Not image generation. Video-to-video. He takes footage — often from other creators, with credit given — and runs it through a workflow that completely reimagines the visual style. A dance video becomes a neon painting in motion. A street scene becomes an animated world with the underlying movement intact but the surface totally transformed.
It's a meaningfully different discipline from what most people are doing with AI video tools.
Text-to-video is generative — you describe something and the model invents it.
Video-to-video is interpretive. You start with something real, and then you push it through a lens until it looks like it came from a different reality.
The motion is preserved. The character of the original performance stays. But the aesthetic is entirely new.
The Workflow Behind the Look
Gerde built his early work using AnimateDiff inside ComfyUI.
If those words mean nothing to you, here's the quick version:
ComfyUI is a node-based interface for running AI image and video models locally, meaning you're not relying on a web app — you're building custom pipelines on your own hardware.
AnimateDiff is a motion module that plugs into Stable Diffusion, adding temporal layers so that stylized frames stay consistent from one to the next, which is the part that makes the motion look smooth rather than flickery.
This is not plug-and-play.
ComfyUI has a steep learning curve. Workflows are built by connecting nodes — each one doing one job — and getting the chain right to produce clean, consistent output takes real iteration. Gerde has been vocal about the fact that he's been developing and tweaking his workflow continuously since he started.
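To make the node-chain idea concrete, here's a toy sketch in Python. It isn't Gerde's actual graph; the stage names are invented for illustration. The point is the shape of the thing: each node does one job, and the final render depends on getting the whole chain right.

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Node:
        """One job in the chain, like a single ComfyUI node."""
        name: str
        fn: Callable[[list], list]

    def run_chain(frames: list, nodes: List[Node]) -> list:
        """Push the frames through each node in order, the way a wired-up graph runs."""
        payload = frames
        for node in nodes:
            payload = node.fn(payload)
            print(f"{node.name}: ok ({len(payload)} frames)")
        return payload

    # Hypothetical stages of a video-to-video graph. Real nodes would decode
    # footage, extract pose maps, run the diffusion pass, and so on; here
    # each stage is a stub so only the structure is visible.
    pipeline = [
        Node("load_video",      lambda f: f),  # decode source footage into frames
        Node("extract_pose",    lambda f: f),  # conditioning maps for ControlNet
        Node("stylize",         lambda f: f),  # Stable Diffusion + AnimateDiff pass
        Node("temporal_smooth", lambda f: f),  # keep motion consistent across frames
        Node("upscale",         lambda f: f),  # final high-resolution render
    ]

    frames = [f"frame_{i:04d}" for i in range(4)]  # stand-ins for real frames
    run_chain(frames, pipeline)

Swap, reorder, or retune any one node and the whole output changes, which is why dialing in a workflow like this takes the iteration Gerde describes.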
He's also been watching the whole landscape shift under him. When he replied to someone sharing an older video of his work on X, he said: "What's interesting is I actually don't think this is the best example of where the tech is. However it shows the change from image model based animations to video model based animations quite well. I do miss those warp fusion days tho.. a simpler time."
That's someone who has been in this long enough to feel nostalgia for earlier tools. He started with WarpFusion. He moved to AnimateDiff. He's now tracking the shift toward native video models. The tooling has shifted continuously, and he's chosen to evolve with it rather than stay locked in one technique.
The Part That Made It a Business
Gerde didn't stay a solo creator making cool content on Instagram. He turned it into Gerde Got It, a company built around video-to-video style transfer as a service and a craft.
He held a masterclass for Brandtech — one of the major holding companies in advertising — where he shared his workflow with their employees. He was selected as one of three inaugural creators for a Brandtech residency program specifically for AI creative talent. He has a Patreon where he's published tutorials for people who want to learn the process rather than just watch the output.
The Patreon is interesting because it's a tell. Sharing how you work is the move of someone who has figured out that the process is reproducible but the taste that guides it isn't. Anyone can learn his ComfyUI workflow from his tutorials. Not everyone will produce what he produces, because the workflow is only part of what's happening.
The selection of source footage matters. The decision about how far to push the style transfer matters. So does the judgment call about when an output is compelling versus when it's merely distorted. These are curatorial, directorial decisions that don't come with tutorials. They come from years of looking at a lot of output and learning which is which.
What Seven Years of Sobriety Has to Do With It
There's a personal dimension to Gerde's work that he's open about and that I think shapes the work more than most people realize.
He's been sober for seven years. And he talks about art — specifically, the creative practice he's built — as a meaningful part of that recovery. In his company's origin story, the creative work isn't separate from the personal work. The two are intertwined.
That context changes how I read the discipline in his output. Shipping consistently. Iterating. Building something with rigor even when the tools are imperfect. Those habits don't just come from creative motivation. They come from a person who has learned, through difficult experience, what it means to show up for something every day regardless of how you feel.
The question of what it actually means for a creative to adopt an AI workflow usually gets framed around skills and tools. But Gerde's version of it is partly about character. The tools give you the capability. The person running them has to bring the rest.
The Credit Question
One thing Gerde does that not everyone does: he credits the original creators when he transforms their videos.
The work is derivative by design — that's the entire premise of video-to-video — and he's explicit about the source.
That's a good-faith practice in a space that has a real problem with attribution. It also matters for the relationship between his work and the original footage. He's not pretending to have invented the performance. He's saying: here's what this looks like when it goes through my lens. The performance belongs to whoever made it. The transformation belongs to him.
It's a clean distinction and it's how the conversation about AI and creative ownership should probably work — not "who made this" as a binary, but "who contributed what."
The Longer View
Gerde made a comment about the shift from image-model-based animation to video-model-based animation.
That's a real technical transition that's been happening over the last couple of years: the field is moving from tools that work frame by frame (treating video as a series of images) to tools that model motion natively (understanding video as video).
That transition changes what's possible and what the creative decisions are. He's navigated it once already, from WarpFusion to AnimateDiff. He'll navigate it again as native video models get better. The specific tools he uses today are less important than the fact that he's built a practice of staying current and integrating new capabilities as they arrive.
That adaptability is the actual skill. Not the ability to run a specific workflow. The ability to evaluate new tools quickly, update your pipeline, and keep producing.
He's built a company around a technique that didn't exist three years ago. He'll probably have to rebuild it again. That's the job now.
James Gerde's Toolkit
Gerde has been open about his process over time. Here's the core of what he's working with, with the caveat that his stack has evolved continuously and this reflects the workflow he's best known for:
ComfyUI
The foundation. A node-based interface for running Stable Diffusion and other AI models locally. Gerde uses it to build custom workflows that chain together multiple models and processing steps. Steep learning curve, but total control over the pipeline.
AnimateDiff
The motion module that makes the video-to-video style transfer hold together. Applied within ComfyUI, it adds temporal layers to the Stable Diffusion backbone so that motion stays smooth rather than flickering frame-to-frame when the style is transformed.
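For a sense of what the technique looks like outside ComfyUI, here's a minimal sketch using Hugging Face's diffusers library, which ships an AnimateDiff video-to-video pipeline. This is a stand-in for the approach, not Gerde's workflow; the checkpoint IDs, prompt, file path, and strength value are illustrative choices.

    import imageio.v3 as iio
    import torch
    from PIL import Image
    from diffusers import AnimateDiffVideoToVideoPipeline, DDIMScheduler, MotionAdapter
    from diffusers.utils import export_to_gif

    # The motion adapter supplies the temporal layers; the Stable Diffusion
    # checkpoint underneath it supplies the look.
    adapter = MotionAdapter.from_pretrained(
        "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
    )
    pipe = AnimateDiffVideoToVideoPipeline.from_pretrained(
        "SG161222/Realistic_Vision_V5.1_noVAE",  # any SD 1.5 checkpoint works here
        motion_adapter=adapter,
        torch_dtype=torch.float16,
    ).to("cuda")
    pipe.scheduler = DDIMScheduler.from_config(
        pipe.scheduler.config,
        beta_schedule="linear",
        clip_sample=False,
        timestep_spacing="linspace",
        steps_offset=1,
    )

    # Load a short source clip as PIL frames ("dancer.mp4" is a hypothetical
    # path; reading mp4 needs an imageio ffmpeg/pyav plugin installed, and
    # 16 frames keeps GPU memory manageable).
    video = [Image.fromarray(f) for f in iio.imiter("dancer.mp4")][:16]

    # strength is the "how far to push it" dial: low values keep the source
    # footage legible, high values repaint it entirely.
    result = pipe(
        prompt="neon ink painting in motion, glowing brushstrokes",
        negative_prompt="low quality, blurry",
        video=video,
        strength=0.6,
        guidance_scale=7.5,
        num_inference_steps=25,
    )
    export_to_gif(result.frames[0], "stylized.gif")

The strength parameter is where a lot of the judgment lives: it literally encodes the decision about how far to push the style transfer away from the source.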
Stable Diffusion (various models)
The underlying image generation backbone that handles the style transformation. Different models produce different aesthetics, and Gerde has developed taste around which models to use for which inputs.
ControlNet
A critical piece of the puzzle for video-to-video work — ControlNet lets him preserve the pose and movement structure of the original footage while the style transforms around it. It's what keeps the dancer looking like they're doing the same moves even after everything else has changed.
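Here's that mechanism in miniature, again in diffusers rather than ComfyUI, and on a single still frame rather than video. The model IDs are the standard public OpenPose ControlNet checkpoints; the file paths are hypothetical.

    import torch
    from controlnet_aux import OpenposeDetector
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    # Pose estimator that turns a frame into a stick-figure skeleton.
    openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    frame = load_image("dancer_frame.png")  # one frame of source footage
    pose = openpose(frame)                  # skeleton of the dancer's pose

    # The prompt supplies the new aesthetic; the pose map pins the movement.
    out = pipe(
        "dancer made of flowing neon paint, dark background",
        image=pose,
        num_inference_steps=20,
    ).images[0]
    out.save("stylized_frame.png")

Run naively frame by frame, this would flicker; in the full workflow it's AnimateDiff's temporal layers that hold the stylized frames together while ControlNet holds the pose.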
Upscaling (likely Topaz Video AI)
His outputs show clean, high-resolution final renders. Professional video upscaling tools take the raw AI output and sharpen it to something worth posting.
Patreon (Process Documentation)
Not a production tool, but worth noting: Gerde publishes his workflows and tutorials for people who want to learn. The process is documented. The taste that makes it worth learning is the harder thing.