yesterday we got Mia walking. today we realized that walking isn't acting.
we have a fully textured, rigged 3D character. she can stomp, she can idle, she can do any of the 500+ preset animations in Meshy's library. but our screenplay doesn't say "Mia does preset animation #247." it says things like "Mia grabs Leo's hand and pulls him toward the minivan" and "Mia looks back at the house, uncertain, then steels herself and steps through the portal."
that's not a preset. that's acting. and figuring out how to get AI-generated characters to act is, it turns out, the next major unsolved problem in our pipeline.
The Animation Landscape in 2026
we spent the entire day researching what's actually available for AI-assisted character animation. not reading press releases and hype—actually digging into user forums, GitHub issues, pricing pages, and community reviews. here's the honest picture.
text-to-motion AI is a real thing now. you type "a person walks forward angrily" and get a 3D animation. the two leading options are HY-Motion 1.0 (free, open source, from Tencent, released December 2025) and DeepMotion's SayMotion (cloud-based, $9-83/month). both produce genuinely usable body animation for simple actions.
but here's what none of them can do:
- no facial animation. body only. for a movie about a family, this is a dealbreaker on its own. every scene needs faces.
- no hand animation. hands stay in a default pose. characters can't hold things, gesture, or interact with objects.
- no two-character interaction. you can't prompt "two characters hug" or "child grabs parent's hand." each character is animated in isolation.
- no non-humanoid characters. Jetplane the dinosaur—one of our main characters—is completely out of scope for every tool we found.
- 10-12 second maximum per clip. every scene has to be built from tiny chunks stitched together.
this is the state of the art. the best AI animation tools in the world, right now, can make a single human do simple body movements for 10 seconds at a time with no face and no hands.
it's genuinely impressive technology. and it's genuinely not enough to make a movie.
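since every scene has to be assembled from these 10-second chunks, the stitching itself becomes a pipeline step. here's a minimal sketch of what that might look like in Blender's NLA editor via Python, with placeholder rig and action names standing in for whatever we actually export:

```python
import bpy

# placeholder names: a "Mia_rig" armature and a few short actions
# already imported into the .blend file (adjust to the real exports)
rig = bpy.data.objects["Mia_rig"]
if rig.animation_data is None:
    rig.animation_data_create()

track = rig.animation_data.nla_tracks.new()
track.name = "scene_stitch"

frame = 1
for action_name in ["mia_walk_to_van", "mia_look_back", "mia_step_through"]:
    action = bpy.data.actions[action_name]
    # lay the clips end to end on one track; cross-fading between clips
    # needs a second NLA track, since strips on the same track can't overlap
    strip = track.strips.new(action_name, frame, action)
    frame = int(strip.frame_end) + 1
```

the mechanical part is easy. the hard part is hiding the seams so the character doesn't pop between clips.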
The Living Room Mocap Studio
the more interesting discovery was on the motion capture side.
you can now point a phone camera at a person, record them moving, and get 3D animation data out the other side. no suit, no markers, no special equipment. apps like Move.ai, Rokoko Vision, and DeepMotion's Animate 3D all do this, with varying quality levels and price points (free to $30/month).
which led to an idea that might be the most this-project thing we've come up with yet: what if we have the kids act out the scenes?
I have two children. the movie is about two children. the living room is right there. we could literally have them perform the scenes—walking, running, arguing, being scared, being brave—capture it with a phone, and use that motion data to drive the 3D characters in Blender.
there's something poetically right about it. an AI-generated movie about kids, performed by actual kids, captured with a phone, processed by AI, and rendered by a computer. every layer of technology in service of something fundamentally human: my children pretending to be characters in a story their dad wrote.
there are real questions to answer first. these AI pose estimation systems are trained on adult bodies, and kids have different proportions—bigger heads, shorter limbs, less predictable movement patterns. we'd need to test whether the tracking actually works on small humans before building the whole pipeline around it. the plan is to do a free test with Rokoko Vision: film each kid doing a simple walk and a dramatic gesture, see what the skeleton tracking looks like.
The Tool We're Actually Going to Try
after all the research, we're leaning toward DeepMotion as the primary platform. not because it's the best at any single thing, but because it does the most things in one place:
- SayMotion: text-to-animation for generating body motion from prompts
- Animate 3D: video-to-animation for capturing real performances
- custom character upload: bring your own FBX (our Meshy characters) and have animations generated directly on that rig
- cloud-based: no GPU required, which matters because our production server doesn't have one
the free tier gives us 3 credits and 1 download per month—enough to test the pipeline end to end before committing money. the Starter tier at $9/month (annual) or the Professional tier at $39/month gives us real production capacity.
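once a clip comes back from DeepMotion as an animated FBX, pulling it into the Blender scene should be straightforward. a sketch of that step, with a made-up file path and rig name as placeholders:

```python
import bpy

# placeholder path for a clip downloaded from DeepMotion
clip_path = "/project/clips/mia_walk_to_van.fbx"

before = set(bpy.data.actions)
bpy.ops.import_scene.fbx(filepath=clip_path)
new_actions = [a for a in bpy.data.actions if a not in before]

# the clip was generated on our uploaded Meshy rig, so bone names should match
# and the action can be assigned to the Mia armature already in the scene
mia = bpy.data.objects["Mia_rig"]
if mia.animation_data is None:
    mia.animation_data_create()
mia.animation_data.action = new_actions[0]
```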
we also looked hard at HY-Motion 1.0, Tencent's open-source text-to-motion model. it's technically superior and completely free, but it needs an NVIDIA GPU with 8-12GB of VRAM minimum. our server has no GPU at all. running it would mean renting cloud GPU time, which adds complexity and arguably costs about the same as a DeepMotion subscription anyway.
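the back-of-envelope version of that comparison, with the hourly rate and usage as rough guesses rather than quotes:

```python
# rough guesses, not quotes: entry-level cloud GPUs tend to rent for something
# like half a dollar an hour on-demand, and we'd maybe burn 20 hours a month
gpu_rate_per_hour = 0.50   # assumed $/hour
hours_per_month = 20       # assumed usage
cloud_gpu_cost = gpu_rate_per_hour * hours_per_month   # ~$10/month

deepmotion_starter = 9     # $/month on annual billing
print(cloud_gpu_cost, "vs", deepmotion_starter)
```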
What's Still Missing
even with the best available tools, we have gaps that no amount of AI can currently fill:
facial animation and lip sync. this is its own entire pipeline. none of the motion tools handle faces. we'll need to figure this out separately—probably some combination of blendshapes and a lip-sync tool.
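whichever lip-sync tool we land on, the Blender side is mostly keyframing shape key (blendshape) weights over time. a minimal sketch, assuming a face mesh with made-up shape key names and per-frame weights handed to us by the lip-sync step:

```python
import bpy

# assumed: Mia's face mesh is "Mia_face" and already has shape keys on it
face = bpy.data.objects["Mia_face"]
key_blocks = face.data.shape_keys.key_blocks

# made-up example output from a lip-sync tool: (frame, shape key, weight)
visemes = [
    (10, "jaw_open", 0.8),
    (14, "jaw_open", 0.1),
    (18, "mouth_wide", 0.6),
]

for frame, name, weight in visemes:
    kb = key_blocks[name]
    kb.value = weight
    kb.keyframe_insert(data_path="value", frame=frame)
```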
Jetplane. a color-farting dinosaur is not a standard biped. every humanoid animation tool is useless here. Jetplane will probably need to be animated by hand, or we need to find quadruped-specific tools.
character interaction. two characters touching, holding hands, passing objects—this has to be choreographed manually in Blender by positioning separately-animated characters in the same scene. it's doable but labor-intensive.
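one way to fake the contact without true two-character generation: animate each character separately, then pin one hand to the other with a constraint and keyframe how strongly it sticks. a sketch with placeholder rig and bone names:

```python
import bpy

# placeholder object and bone names; the real Meshy rigs will differ
mia = bpy.data.objects["Mia_rig"]
leo = bpy.data.objects["Leo_rig"]

# pin Leo's left hand to Mia's right hand for the "grabs his hand" beat
hand = leo.pose.bones["hand.L"]
grab = hand.constraints.new(type='COPY_LOCATION')
grab.target = mia
grab.subtarget = "hand.R"

# ramp the constraint on over a few frames so the grab reads as a motion,
# not a snap, then ramp it back off when they let go
grab.influence = 0.0
grab.keyframe_insert("influence", frame=96)
grab.influence = 1.0
grab.keyframe_insert("influence", frame=104)
```

it still means hand-positioning the characters so the pinned hands actually meet, but it keeps the rest of each performance intact.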
the last 20%. AI motion generation gets you a plausible starting point. making it look like a specific character with specific emotions in a specific moment—that's the hard part, and it's still manual work.
The Pattern
we're seventeen days into production and a pattern has emerged. every stage of this process follows the same curve:
- AI gets us 70-80% of the way, fast
- we discover the remaining 20-30% is where the actual craft lives
- we figure out which parts of that 20% to do manually and which to accept as "good enough"
concept art, 3D models, rigging, and now animation—same story every time. the tools are transformatively good at producing raw material. turning raw material into a movie is still a human job.
the 80%: Mia exists, she moves, she has textures and lighting. the 20%: making her feel like a character in a story.
tomorrow we start testing. upload Mia to DeepMotion, generate some real scene animations, and maybe point a phone at a couple of kids and see what happens.