How Omniverse Wove a Real CEO — and His Toy Counterpart — Together With Stunning Demos at GTC
It could only happen in NVIDIA Omniverse — the company’s virtual world simulation and collaboration platform for 3D workflows.
And it happened during an interview with a virtual toy model of NVIDIA’s CEO, Jensen Huang.
“What are the greatest …” one of Toy Jensen’s creators asked, stumbling, then stopping before completing his scripted question.
Unfazed, the tiny Toy Jensen paused for a moment, considering the answer carefully.
“The greatest are those,” Toy Jensen replied, “who are kind to others.”
Leading-edge computer graphics, physics simulation, a live CEO, and a supporting cast of AI-powered avatars came together to make NVIDIA’s GTC keynote — delivered using Omniverse — possible.
Along the way, a little soul got into the mix, too.
The AI-driven comments, added to the keynote as a stinger, provided an unexpected peek at the depth of Omniverse’s technology.
“Omniverse is the hub in which all the various research domains converge and align and work in unison,” says Kevin Margo, a member of NVIDIA’s creative team who put the presentation together. “Omniverse facilitates the convergence of all of them.”
Toy Jensen’s ad-lib capped a presentation that seamlessly mixed a real CEO with virtual and real environments as Huang took viewers on a tour of how NVIDIA technologies are weaving AI, graphics and robotics together with humans in real and virtual worlds.
Real CEO, Digital Kitchen
While the CEO viewers saw was all real, the environment around him morphed as he spoke to support the story he was telling.
Viewers saw Huang deliver a keynote that seemed to begin, like so many during the global COVID pandemic, in Huang’s kitchen.
Then, with a flourish, Huang’s kitchen — modeled down to the screws holding its cabinets together — slid away from sight as Huang strolled toward a virtual recreation of Endeavor’s gleaming lobby.
“One of our goals is to find a way to elevate our keynote events,” Margo says. “We’re always looking for those special moments when we can do something novel and fantastical, and that showcase NVIDIA’s latest technological innovations.”
It was the start of a visual journey that would take Huang from that lobby to Shannon’s, a gathering spot inside Endeavor, through a holodeck, and a data center with stops inside a real robotics lab and the exterior of Endeavor.
Virtual environments such as Huang’s kitchen were created by a team using familiar tools connected through Omniverse, such as Autodesk Maya, 3ds Max and Adobe Substance Painter.
Omniverse served to connect them all in real time — so each team member could immediately see changes colleagues made in different tools, accelerating their work.
“That was critical,” Margo says.
The virtual and the real came together quickly once live filming began.
A small on-site video team recorded Huang’s speech in just four days, starting October 30, in a spare pair of conference rooms at NVIDIA’s Silicon Valley headquarters.
Omniverse allowed NVIDIA’s team to project the dynamic virtual environments their colleagues had created on a screen behind Huang.
As a result, the light spill onto Huang changed as the scene around him changed, better integrating him into the virtual environment.
And as Huang moved through the scene, or as the camera shifted, the environment changed around Huang.
“As the camera moves, the perspective and parallax of the world on the video wall responds accordingly,” Margo says.
And because Huang could see the environment projected on the screens around him, he was better able to navigate each scene.
At the Speed of Omniverse
All of this accelerated the work of NVIDIA’s production team, which had most of what they needed in-camera after each shot rather than adding elaborate digital sets in post-production.
As a result, the video team quickly created a presentation seamlessly blending a real CEO with virtual and real-world settings.
However, Omniverse was more than just a way to speed collaboration between creatives working with real and digital elements hustling to hit a deadline. It also served as the platform that knit the string of demos featured in the keynote together.
To help developers create intelligent, interactive agents with Omniverse that can see, speak, converse on a wide range of subjects and understand naturally spoken intent, Huang announced Omniverse Avatar.
Omniverse brings together a deep stack of technologies — from ray-tracing to recommender systems — that were mixed and matched throughout the keynote to drive a series of stunning demos.
In a demo that swiftly made headlines, Huang showed how “Project Tokkio” for Omniverse Avatar connects Metropolis computer vision, Riva speech AI, avatar animation and graphics into a real-time conversational AI robot — the Toy Jensen Omniverse Avatar.
The conversation between three of NVIDIA’s engineers and a tiny toy model of Huang was more than a technological tour de force demonstrating expert, natural Q&A.
It showed how photorealistic modeling of Toy Jensen and his environment — right down to the glint on Toy Jensen’s glasses as he moved his head — and NVIDIA’s Riva speech synthesis technology powered by the Megatron 530B large language model could support natural, fluid conversations.
To build the demo, NVIDIA’s creative team created the digital model in Maya and Substance, and Omniverse did the rest.
“None of it was manual, you just load up the animation assets and talk to it,” Huang said.
Huang also showed a second demo of Project Tokkio, a customer-service avatar in a restaurant kiosk that was able to see, converse with and understand two customers.
Rather than Megatron, however, this avatar relied on a model that integrated the restaurant’s menu, allowing it to smoothly guide customers through their options.
That same technology stack can help humans talk to one another, too. Huang showed Project Maxine’s ability to add state-of-the-art video and audio features to virtual collaboration and video content creation applications.
A demo showed a woman speaking English on a video call from a noisy cafe, yet she could be heard clearly, without background noise. As she spoke, her words were transcribed and translated in real time into French, German and Spanish.
Thanks to Omniverse, the translations were spoken by an avatar in her own voice and intonation.
These demos were all possible because Omniverse, through Omniverse Avatar, unites advanced speech AI, computer vision, natural language understanding, recommendation engines, facial animation and graphics technologies.
Omniverse Avatar’s speech recognition is based on NVIDIA Riva, a software development kit that recognizes speech across multiple languages. Riva is also used to generate human-like speech responses using text-to-speech capabilities.
Omniverse Avatar’s natural language understanding is based on the Megatron 530B large language model that can recognize, understand and generate human language.
Megatron 530B is a pretrained model that can, with little or no additional training, complete sentences and answer questions involving a broad range of subjects. It can summarize long, complex stories, translate into other languages, and handle many domains it was not specifically trained for.
Omniverse Avatar’s recommendation engine is provided by NVIDIA Merlin, a framework that allows businesses to build deep learning recommender systems capable of handling large amounts of data to make smarter suggestions.
Its perception capabilities are enabled by NVIDIA Metropolis, a computer vision framework for video analytics.
And its avatar animation is powered by NVIDIA Video2Face and Audio2Face, 2D and 3D AI-driven facial animation and rendering technologies.
All of these technologies are composed into an application and processed in real-time using the NVIDIA Unified Compute Framework.
Packaged as scalable, customizable microservices, the skills can be securely deployed, managed and orchestrated across multiple locations by NVIDIA Fleet Command.
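The way these services compose can be pictured as a simple hear-understand-decide-speak loop. The sketch below is purely illustrative: every function is a hypothetical stub standing in for the corresponding NVIDIA service (Riva for speech recognition and synthesis, Megatron for understanding, Merlin for recommendations), not a real API.

```python
# Illustrative sketch of an avatar pipeline in the spirit of Omniverse
# Avatar. All names here are hypothetical stand-ins, not NVIDIA APIs;
# a real deployment would call Riva, Megatron and Merlin microservices.

def recognize_speech(audio: bytes) -> str:
    """Stand-in for a Riva-style ASR call: audio in, transcript out."""
    return "what is on the menu"  # stubbed transcript

def understand(transcript: str) -> dict:
    """Stand-in for LLM-based natural language understanding."""
    return {"intent": "ask_menu", "text": transcript}

def recommend(intent: dict, menu: list) -> str:
    """Stand-in for a Merlin-style recommender over domain data."""
    return menu[0] if intent["intent"] == "ask_menu" else ""

def synthesize_speech(reply: str) -> bytes:
    """Stand-in for Riva text-to-speech: text in, audio out."""
    return reply.encode("utf-8")  # stubbed waveform

def drive_avatar(audio: bytes, menu: list) -> bytes:
    """Run the full loop: hear, understand, decide, speak."""
    transcript = recognize_speech(audio)
    intent = understand(transcript)
    reply = f"May I suggest the {recommend(intent, menu)}?"
    return synthesize_speech(reply)

print(drive_avatar(b"...", ["veggie burger", "fries"]).decode())
```

Each stage maps to one microservice in the stack described above, which is what lets Fleet Command deploy and scale them independently.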
Using them, Huang was able to tell a sweeping story about how NVIDIA Omniverse is changing multitrillion-dollar industries.
All of these demos were built on Omniverse. And thanks to Omniverse, everything came together — a real CEO, real and virtual environments, and a string of demos made within Omniverse as well.