The Work Hidden Inside Gossip Goblin’s Worlds

I’ve been working with AI visuals long enough to feel like I understand the basics. I know how far a single tool (ChatGPT) can take you and where the seams start to show. But watching Zack London (better known as Gossip Goblin) break down his process in a handful of Instagram stories made me rethink everything.

He starts with a script. Not a prompt. Not a mood board. A script.

He says it plainly: “Every video starts with a script.” And when you see the kind of worlds he builds — from neo-feudal cyborg aristocrats to entire goblin civilizations — it makes sense. The writing is the spine and everything else hangs off it.

Then you see how deep the work actually goes.

The Patchwright - Volume 1 from Gossip Goblin's YouTube channel

For a single set of characters, he ran “probably 400 prompts / 1600 images.” And that’s just to get the faces right. That level of iteration doesn’t show up in the final video. You only feel the polish, not the mountain of attempts behind it.

When he moves into environments, the honesty gets even clearer.

He says “this part is not enjoyable” and then describes generating a “FUCKload” of background shots just to get something he could force his characters into. Nothing about this is automated. That’s craft. That’s stubbornness. That’s someone who wants the world to hold together even when the tools don’t make it easy.

The same energy shows up in animation.

Lip sync? He calls it “a colossal headache.” Camera movement? Manual. Dialogue-heavy scenes? Carefully shepherded. At one point he mentions running “about 150 generations for a 90-second scene.” You don’t do that unless you care. And you definitely don’t do that if you think this is “just prompting.”

And that’s the part that stuck with me.

Zack has a line in another interview where he says, “There is zero skill involved in generating AI images.” It sounds harsh until you see what he actually means. The art isn’t in the button press. It’s in the world-building, the selection, the rewriting, the stitching, the judgment calls you make a hundred times before something finally looks intentional. You can feel that mindset in all his work.

Seeing his process made me appreciate two different truths at the same time:

  1. The computer handles the speed.
  2. He handles everything else.

And it also showed me how early I still am.

I’ve worked inside my lane for three years now — ChatGPT visuals, simple storytelling, pieces that fit what I’m making. But watching Zack hop between half a dozen specialized tools, each doing one job well, each contributing to the final thing, made it obvious how big the landscape really is. There’s a lot I haven’t touched yet. A lot I haven’t unlocked.

What I took from his stories wasn’t so much a tutorial as a reality check. The top people in this space aren’t getting great results because AI “likes them.” They’re getting great results because they’re willing to generate, rework, discard, and rebuild until the thing feels right.

So when he ends one of the stories by saying the whole process takes “12–14 hours end-to-end” — for a single piece — it lands. The output is beautiful. But the work behind it is still laboriously human.

Gossip Goblin’s Toolkit

Since people always ask how these videos get made, here’s the simplified version of the tools Zack London uses and what he uses them for. This isn’t comprehensive — just the core pieces he mentioned in his Instagram story.

Midjourney
Where he generates most of the characters, costumes, and visual concepts. Hundreds of variations, not one.

Seedream (via Freepik)
Used to blend characters into environments and force the pieces to look like they belong together, even when the lighting fights back.

Veo 3
His go-to for dialogue-heavy scenes. Handles simple movement and built-in audio, which he uses when lip-syncing isn’t essential.

Runway (Act Two)
Used as part of his more complex lip-syncing or performance-driving workflows.

HeyGen
For generating clean lip-sync passes or creating “driving performances” that he uses to steer other models.

ElevenLabs
For voice work — to give characters a consistent, intentional voice instead of relying on whatever the model spits out.

CapCut
His editing choice. He jokes that he uses it “because I am a simpleton,” but it gets the job done for assembling a dozen moving parts into one coherent scene.