How well does 3D facial capture work to convey human emotion?

Project 1 Roundup


What if you could video-chat with someone, but instead of your face, the other person would see a 3D avatar emoting your emotions, gesturing your gestures and talking exactly the way that you talk? What if you could do this in real time, from the comfort of your own home, or “traveling office” (if that’s what the kids call it these days…wink), and you would get to build and craft the perfect avatar for yourself?

OK, enough what-ifs. This technology is coming/happening/here.

Remember Second Life? It’s having a literal second life in the form of High Fidelity – just get yourself an avatar and start interacting with the other folks on the world wide web, replete with hand-gesture and facial-capture technology (provided that you know how to use it). And then there’s the amazing Kickstarter poster child FaceRig. These guys make a chat interface look seamless and ready to use (spoiler alert: as of this writing, it is not easy to use). So there are already options out there, and it’s only a matter of time until we can easily mask ourselves in any 3D form that we want.

Here’s the catch, though. Human emotion is easy to get wrong, especially if you are expecting a non-human form to do the emoting. And building and animating characters in 3D still has its challenges for the average Joe. That’s the gist of my findings with this first project.

Project Parameters

I used a slew of readily available prosumer-level tech to test how accessible it is for a non-techie, such as myself, to create and animate a 3D character in real time, using just a webcam. Here’s a recap so that you are up to speed.

  • Facial Motion Capture Project by Chantel Benson
    Can facial motion capture reflect human emotion?

Tech Used

  • Fuse – a modular 3D character creator
  • Face Plus – real-time facial capture using just a webcam
  • Unity 3D – the game engine where all the elements come together to work in the same scene
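
For the curious, here’s a rough mental model of how these pieces fit together each frame: the webcam image is analyzed for facial landmarks, those landmarks are converted into blendshape weights, and the weights drive the avatar’s face mesh inside the game engine. The sketch below is purely illustrative; the function and blendshape names are made up and are not the real Face Plus or Unity API.

```python
# Toy sketch of the per-frame capture loop behind tools like Face Plus.
# Everything here is hypothetical: real systems track dozens of facial
# landmarks and blendshapes, not two, and all of these names are invented.

def landmarks_to_blendshapes(landmarks: dict[str, float]) -> dict[str, float]:
    """Turn raw facial measurements into blendshape weights in [0, 1]."""
    def clamp(x: float) -> float:
        return max(0.0, min(1.0, x))
    return {
        # jaw opening, scaled against an assumed 40 px maximum lip gap
        "jawOpen": clamp(landmarks["lip_gap_px"] / 40.0),
        # brow raise, scaled against an assumed 10 px maximum travel
        "browUp": clamp(landmarks["brow_raise_px"] / 10.0),
    }

class Avatar:
    """Stand-in for a rigged 3D character with named blendshapes."""
    def __init__(self) -> None:
        self.weights: dict[str, float] = {}

    def set_blendshape(self, name: str, weight: float) -> None:
        self.weights[name] = weight

# one simulated frame: measurements a landmark tracker might report
tracked = {"lip_gap_px": 22.0, "brow_raise_px": 3.0}

avatar = Avatar()
for name, weight in landmarks_to_blendshapes(tracked).items():
    avatar.set_blendshape(name, weight)

print(avatar.weights)  # {'jawOpen': 0.55, 'browUp': 0.3}
```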

Monologue Approach

Using tech like Face Plus for video-chat applications means it needs to react to human emotion and speech with a high level of accuracy. Humans can easily sense when a non-human figure is not quite “right,” or is approaching the Uncanny Valley. I wanted to create a believable character and construct a compelling narrative in order to test whether the system could reflect that level of performance.

I chose a monologue from Gremlins, a classic tragicomedy from my childhood. Kate, the love interest of the movie’s main protagonist, is a toughened soul who reveals why she hates the Christmas holiday in one of the most gruesome and tragic dark comedic monologues I’ve ever encountered. Diving into the text was fun, and rehearsing the character and the monologue brought me back to my days as a theater actress. The level of preparation I put into portraying Kate in my video recordings was sufficient for testing real human emotion on a 3D avatar.

Tech Challenges

Creating a 3D character that looked enough like me was a challenge. Then came learning the intricacies of getting the character hooked up properly to the Face Plus system in the Unity 3D game engine. There were a lot of steps and a complicated interface to master. My overall impression of the user experience comes down to one word: frustration. Just when I thought I was gaining ground in one area, there were three more places where I had to get personal help from co-workers (certified “techies”) to figure out how to fill out certain fields or set up particular elements.
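
To give you a flavor of what “hooking up” means here (with totally made-up names, not the actual Face Plus fields): the capture system speaks in one set of expression channels, the Fuse-exported character exposes a different set of blendshape names, and you have to wire them together one field at a time. Miss one, and part of your face silently stops working:

```python
# Hypothetical illustration of the rig-hookup problem. The capture
# system outputs named expression channels; the exported character
# exposes differently named blendshapes; someone has to fill in the
# mapping field by field. All of these names are invented.

capture_channels = ["MouthOpen", "BrowRaiseLeft", "BrowRaiseRight", "Smile"]

# what the exported character happens to call the same shapes
channel_to_blendshape = {
    "MouthOpen": "blendShape.jaw_open",
    "BrowRaiseLeft": "blendShape.brow_up_L",
    "BrowRaiseRight": "blendShape.brow_up_R",
    # "Smile" was never mapped: the kind of gap that silently breaks
    # part of the performance until a certified techie spots it
}

for channel in capture_channels:
    target = channel_to_blendshape.get(channel)
    if target is None:
        print(f"WARNING: no blendshape mapped for '{channel}'")
    else:
        print(f"{channel} -> {target}")
```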

Project Outcomes

The results are fair at best. As you can see from the video below (you are getting the full monologue, lucky you!), there is no way to get intelligible lip sync from the 3D character at the speed I am speaking in the video.
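
My best guess at one culprit (speculation on my part, not a confirmed diagnosis of Face Plus): real-time capture systems smooth the tracked signal to hide jitter, and at conversational speed that smoothing can blur fast mouth shapes into mush. Here’s a toy example of the effect, with an illustrative smoothing factor:

```python
# Toy demonstration of how smoothing can destroy fast lip sync.
# A jaw snapping open and shut every frame (fast speech) is run
# through the kind of exponential moving average often used to
# suppress tracking jitter. The parameters are illustrative guesses.

raw = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # jaw-open weight per frame
alpha = 0.3  # lower alpha = smoother output, but laggier response

smoothed, prev = [], 0.0
for sample in raw:
    prev = prev + alpha * (sample - prev)  # standard EMA update
    smoothed.append(round(prev, 2))

print(raw)       # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print(smoothed)  # [0.0, 0.3, 0.21, 0.45, 0.31, 0.52]
```

The crisp open/close pattern collapses toward the middle, which lines up with how the avatar’s mouth read on screen.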

Real-time Facial Capture onto 3D Character – Kate Monologue – Gremlins, 1984 from Chantel Benson on Vimeo.

Summary

What I learned about this technology is that the capabilities are there, but bringing them to the consumer level requires a focus on user-interface optimization. Consumers today expect minimalist product design, and a calibration menu that is 30+ fields deep is a recipe for user failure.

Three main takeaways:

  • Lip sync needs to work for the tech to be usable
  • The user interface is a major hurdle
  • It didn’t work this time, but this tech is coming

My most important realization from doing this work is that no matter what the future holds for this kind of tech, the hardest part is always going to be accurately capturing real human emotion. The tech that I used in this test was not up to the challenge. I’m excited for when this kind of technology progresses enough to be accurate and usable by consumers. And then it will be more important than ever to know how to portray empathetic human characters.
