What has two eyes, one nose, one mouth, and 25 fingers? Artificial intelligence’s idea of a person, that’s what. While text-to-image generators like DALL-E, Midjourney, and Stable Diffusion can now dream up faces that look like they could be your neighbors, they’re somehow still ridiculously bad at spitting out images of hands.
The handicap is so amusing to some that they’ve made pointed (ahem) jokes about it.
It seems AI can’t figure hands out because, although it has a general idea of what they look like, it doesn’t know how they really work.
A spokesperson for Stability AI, the company behind Stable Diffusion, tells BuzzFeed News that part of this shortcoming stems from the fact that hands don’t show up as prominently in training datasets as faces, where you can easily pick out features like the eyes, nose, and mouth.
In photos, hands are typically smaller and are often gripping something at a variety of angles. They’re far more variable than, say, a face that’s smiling right at the camera.
Sometimes a thumb is hidden from view, or only a closed fist is visible. The AI doesn’t quite understand how all those parts connect, so it dreams up an average of the hand positions it has seen, which may end up looking deformed.
2D image synthesizers may acknowledge the existence of a palm, fingers, and nails, but they don’t get the 3D nuances and geometry of a hand, Professor Peter Bentley, a computer scientist at University College London, posits in BBC Science Focus.
Artist Amelia Winger-Bearskin, who’s also an associate professor of AI and the arts at the University of Florida, tells BuzzFeed News that for generative models to finally perfect hands, they’ll need to grasp “what it is to have a human body,” how the body relates to the hands, and what hands can and can’t do.
Since these tools are fairly new, it’s likely only a matter of time before they catch up. Someday, one of them will be trained on a rich dataset of hands posed in all sorts of positions, and then everything will click into place, just… like… *snaps fingers* that.