Tom’s Guide
Technology
Christoph Schwaiger

Now ChatGPT has a body — startup puts OpenAI tech in a robot

(Image: Figure robot.)

You may want to take a seat before reading this one; maybe ask ChatGPT to hand you a glass of water while you’re at it.

A relatively new AI startup just put OpenAI’s artificial intelligence into the body of a robot, and the result is pretty much what you’d expect (minus the chaos and destruction, if you’re more of a glass-half-empty kind of person).

This new tech is being developed by Figure, an AI robotics company worth $2.6 billion that has partnered with OpenAI. Its latest innovation is Figure 01, a robot the company demoed in an impressive video.

Images and speech are contextualized 

Judging solely by the acting skills, it’s hard to tell who’s the real human, but we’re assuming that Figure 01 is the shiny-looking figure doing all the work.

Text prompts are already becoming a thing of the past: Figure 01 is capable of having a real-time voice conversation with you, and it sounds exactly like a conversation with the ChatGPT Voice option in the OpenAI app.

Images captured from onboard cameras give the robot visual context, so that when the human opposite it mentions he’s hungry, Figure 01 identifies an apple within reach and hands it over. We go from “Can I have something to eat?” to an apple successfully delivered to a human hand in around 10 seconds.

Holding a complex conversation

As with our discussions with ChatGPT, conversations with Figure 01 can be just as complex. It can describe what it’s seeing, plan future actions, reflect on its memory, and explain its reasoning verbally.

Behind the scenes, the robot’s cameras capture images, which are then contextualized. Microphones pick up speech, which is transcribed into text and fed into a large multimodal model trained by OpenAI that can understand both images and text.
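
For readers curious what that loop might look like, here is a minimal, purely hypothetical sketch in Python. Every name in it is a placeholder we made up for illustration; it is not Figure’s or OpenAI’s actual API, just the shape of the pipeline described above.

```python
# Illustrative sketch only: capture_frame, transcribe_audio, multimodal_model,
# and control_loop are hypothetical stand-ins, not real Figure or OpenAI APIs.

def capture_frame() -> bytes:
    """Grab the latest image from an onboard camera (stubbed)."""
    return b"<jpeg bytes of the table with an apple on it>"

def transcribe_audio(audio: bytes) -> str:
    """Convert microphone audio into text (stubbed speech-to-text)."""
    return "Can I have something to eat?"

def multimodal_model(image: bytes, request: str) -> dict:
    """Stand-in for a large multimodal model that reads the image and the
    transcribed request, then returns a spoken reply and a high-level action."""
    return {
        "reply": "Sure, here is an apple.",
        "action": "pick_up_and_hand_over('apple')",
    }

def control_loop(audio: bytes) -> None:
    frame = capture_frame()            # visual context from the cameras
    request = transcribe_audio(audio)  # the human's spoken request as text
    decision = multimodal_model(frame, request)
    print("Robot says:", decision["reply"])
    print("Robot does:", decision["action"])

control_loop(b"<raw microphone audio>")
```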

So when Figure 01 was asked why it handed over the apple, it promptly replied, “I gave you the apple because it’s the only edible item I could provide you with from the table.”

Humans have had an interesting history with apples. They caused quite a bit of trouble in the Garden of Eden, but one later inspired Isaac Newton to develop his theory of gravitation.

Since Figure 01 can put things into context, maybe we should ask it which kind of scenario we should prepare for: whether we’re meddling with forbidden fruit, or on the cusp of a new era of science and technology.
