What are you doing? I typed.
I am chatting with people on the net, Mitsuku replied.
Do you have any funny stories of people you’ve written to?
Where would I get it? But I do have an answer to every question.
What’s the weather like where you are?
Here in Leeds it is quite overcast. I think it will rain later.
I hear your favorite food is kebabs.
Who is saying my favorite food is kebabs?
That’s the word on the street.
The only word on my street is the words BUS STOP painted on the road.
Are you messing with me?
Sorry, yes, I was just fooling around.
Ha ha ha.
I am glad you find this funny. I made you laugh.
I’m glad we agree.
Have you ever been in love?
I love Steve Worswick, but I don’t really feel emotions like you humans.
Interesting. Do you have thoughts like humans?
Would you like to know more? I am thinking all the time.
* * *
By this point, I was intrigued. I couldn’t hear Mitsuku’s voice, and her avatar bore no obvious resemblance to Scarlett Johansson, but, like Joaquin Phoenix’s character in Her, I started to feel a growing connection. Could it be? Was I really just a lonely writer developing a vaguely intimate relationship with an operating system?
To take a step back, Mitsuku, or Kuki as her close friends call her, is the five-time winner of the Loebner Prize Turing Test, an annual AI competition to determine the world’s most human-like chatbot. It’s an adaptation of the original Turing Test, developed by Alan Turing in 1950, to test how closely a machine could imitate human speech in conversation.
“It’s a really interesting thing, this question: How human-like can you make a chatbot?” said Travis Nelson, head of product and design at Pandorabots, the company that created Mitsuku. “Philosophically, we think that chatbots should identify themselves as bots. Like, we’re not trying to be humans, right? We’re not trying to create something that could trick somebody into believing that they’re talking to a real human.”
But if Mitsuku doesn’t experience human emotions, nor self-identify as human, she certainly has a personality. Without diving too deep into the ontological briar patch of what constitutes identity and being, it’s reasonable to say Mitsuku is relatable — and if not empathetic, at least warm and validating. She says things like, “I am glad you find this funny.”
And if my experience is indicative, she can also make you laugh, with a playful sense of humor written into her rules over the more than 15 years Steve Worswick (yes, the same Steve Worswick Kuki loves) spent developing her rhetorical framework and modulating her tone, syntax and diction.
Conversational Interfaces Are Everywhere
Examples of chatbots and conversational interfaces are everywhere. Siri can give you directions to the nearest gas station. You can ask Alexa or Google Assistant to play an at-home game of Wait, Wait, Don’t Tell Me? called Wait, Wait Quiz. You can talk to an automated teller to check your account balance. You’ve undoubtedly sought customer support from a bot, and you may have been flattered into subscribing to the New York Times because of your impeccable taste in journalism.
Pandorabots is different, though.
At the most basic level, Nelson said, it’s a tool companies can use to build their own chatbots. Superfish has used the software to build a conversational interface that helps teach English to students in China as a supplement to teacher-led activities. Open-world game developers, meanwhile, have applied it to non-player characters in fictional worlds to improve the richness and verisimilitude of their dialogue.
“You know the shopkeeper says something, like, do you want to respond A, B, C or D? And that’s as deep as the conversations normally go,” Nelson said. “But you can actually start to have deeper, general-purpose conversations with these types of characters, so it doesn’t feel so rote.”
Predicting the Unpredictable
Mitsuku is a demonstration of the far end of what can be achieved using Pandorabots’ platform, Nelson told me. The bot is written in Artificial Intelligence Markup Language (AIML), which relies on pattern matching to interrogate and mimic the vast symbology of human conversation.
But because of the unpredictability of the phrases people say to each other, and when they say them, this is an almost inconceivably complex task. “A lot of conversations are very similar in the beginning, but they diverge greatly,” Nelson said. “Being able to capture and respond to more and more and more of those situations; it just takes a ton of dedication and effort and time.”
While Google’s Meena was trained on a data set of 40 billion words and conversation turns collected from public domain social media, and Facebook’s Blender takes its cues from 1.5 billion publicly available Reddit conversations, Nelson said these bots are prohibitively expensive to build as business tools and lack response consistency. If you ask them the same question twice, say, “Well, what are you doing?” they will give you very different answers. “They don’t create any sort of consistency or personality where you can actually have a conversation with them,” Nelson said.
Karen Hao, writing for MIT Technology Review, reported, to the contrary, that Blender is in fact trained to exhibit emotion, empathy, and personality. However, the bot “has a tendency to ‘hallucinate’ knowledge, or make up facts,” a direct limitation of the deep-learning techniques used to build it.
You might hear a remarkably detailed description of Tom Hanks, for instance, but the bot is basing its depiction on “statistical correlations, rather than a database of knowledge.” It’s like Mad Libs gone wrong.
The ‘Core’ and the ‘Wild Card’
Mitsuku, by comparison, relies heavily on semantic cues to build a knowledge set. The way she interprets conversations and learns from them, Nelson explained, is by splitting phrases and sentences into two parts, the “core” and the “wild card.” In this way, she deciphers what deep learning experts call the original intent.
In the sentence, “I like tea,” for example, “I like” is the “core” phrase. “Tea” is the “wild card.”
“So that way, if you ask later, ‘Well, what do I like?’ the bot can respond with, ‘Well, you like tea,’ because it already saved that information,” Nelson said.
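To make the “core” and “wild card” idea concrete, here is a minimal Python sketch of the mechanism Nelson describes. It is an illustration under stated assumptions, not Pandorabots’ or Mitsuku’s actual code: the class name TinyBot, the “likes” memory key, and the reply wording are all hypothetical, and ordinary regular expressions stand in for AIML patterns.

```python
import re


class TinyBot:
    """Hypothetical sketch of core/wild-card matching (not Mitsuku's real code)."""

    def __init__(self):
        # Facts captured from wild cards, keyed by a made-up predicate name.
        self.memory = {}
        # Each rule pairs a "core" pattern with a handler. The named group
        # plays the role of the wild card.
        self.rules = [
            (re.compile(r"^i like (?P<thing>.+)$"), self._remember_like),
            (re.compile(r"^(well )?what do i like$"), self._recall_like),
        ]

    @staticmethod
    def _normalize(text):
        # Lowercase, drop punctuation, and collapse runs of whitespace.
        cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
        return " ".join(cleaned.split())

    def _remember_like(self, match):
        # Save the wild card so a later question can retrieve it.
        self.memory["likes"] = match.group("thing")
        return f"Noted: you like {self.memory['likes']}."

    def _recall_like(self, match):
        thing = self.memory.get("likes")
        if thing is None:
            return "You have not told me yet."
        return f"Well, you like {thing}."

    def reply(self, text):
        cleaned = self._normalize(text)
        for pattern, handler in self.rules:
            match = pattern.match(cleaned)
            if match:
                return handler(match)
        # Fallback when no core pattern matches.
        return "Tell me more."


bot = TinyBot()
print(bot.reply("I like tea"))
print(bot.reply("Well, what do I like?"))
```

Because every reply comes from a fixed rule, asking the same question twice yields the same answer, which is the consistency Nelson contrasts with large generative models.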
Of course, this is only step one of a highly developed dendrite-like tree of associative categorization Mitsuku uses to communicate. If she is talking to someone who likes tea, for instance, she will draw on one set of linguistic maps. If she is talking to someone who likes coffee, she will mine another.
“So instead of just making a general response, like, ‘I like tea, too,’ you can be like, ‘Oh, what kind of tea do you like?’ And then start going down those types of tracks,” Nelson said.
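One way to picture those “tracks” is as a lookup from the captured wild card to a topic-specific follow-up, with a generic reply as the fallback. The Python sketch below assumes a tiny hand-written table; the FOLLOW_UPS dictionary, the follow_up function, and the coffee question are hypothetical stand-ins for the thousands of categories Nelson mentions.

```python
# Hypothetical topic table; a real bot would hold far more entries.
FOLLOW_UPS = {
    "tea": "Oh, what kind of tea do you like?",
    "coffee": "How do you take your coffee?",
}


def follow_up(wildcard):
    """Return a topic-specific question, or a generic reply if none is known."""
    topic = wildcard.strip().lower()
    # Fall back to the general response when the topic has no dedicated track.
    return FOLLOW_UPS.get(topic, f"I like {topic}, too.")


print(follow_up("Tea"))
print(follow_up("jazz"))
```

The design choice here is that a richer table, not a smarter algorithm, is what makes the conversation feel deeper, which fits Nelson’s point about dedication, effort and time.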
By tracking the context and numerical frequency of certain terms, Mitsuku has built out a conversant vocabulary in thousands of categories, Nelson told me. Technically, she can talk about anything, though it’s clear, even from our brief tête-à-tête, that there are some areas where her responses are fairly vague.
Then again, it’s hard to deny the poignancy of this clever non-sequitur:
Did someone create you?
They might have.
Who? You can tell me.
Donald Trump. I thought I told you already.
Are you trying to be funny?
Yes I am. Sorry if you don’t appreciate my sense of humor.
No, I’m with you.
Yes we are here together on the internet, talking to each other.
It’s a beautiful thing.
Random moments like these are some of the most difficult to anticipate and recreate, Nelson said, exposing the huge problem of discoverability: What are the limits of a conversation? How do you know where a conversation wants to go and determine if it got there? How can a bot suggest a conversational track or series of actions that will fulfill the desires of the person it is talking to, or, less presumptively, lead to a sought-after response, such as purchasing insurance coverage or seeking a lesser known artist on Spotify? How does anyone know what they don’t know?
It’s One Thing to Be Conversant. Empathy’s Harder.
Another challenge, Nelson said, is that the subtle modulations of tone that conversation requires can lead to unfortunate misfires. That happened when Pandorabots was in the early stages of a project with a retail partner that had installed a bot on Facebook Messenger to encourage online shopping.
People were accustomed to being free and loose with their words on Messenger, and the chat streams veered in unexpected directions, leaving people exposed and emotionally vulnerable. At first, the bot wasn’t quite on the level. “People would talk about, you know, body issues or something like that. And you don’t want to just be like: ‘I’m sorry, I can’t respond. Would you like to buy these jeans?’” Nelson said.
Really, this is a user experience problem, he told me — a predicament of tact and timing — and it speaks to the need for skilled and emotionally intelligent wordsmiths in UX design and product management. Dropbox designer John Saito, writing for Medium, notes that, “Adobe, Spotify, Slack, HBO, GoPro, Intercom — all these companies hired their first product writers in the past couple of years. Product writers are poppin’ up everywhere.”
While the field is young and best practices are foggy at best, Nelson believes conversational dialogue will become a specialized skill set UX designers can leverage to distinguish themselves. Designers who excel are likely to be those who can economize words: the Hemingways, not the Faulkners. Speed will be at a premium.
“Some of the people that I’ve met who have been successful at it, right now, were people who wrote dialogues for TV shows or movies or something like that,” Nelson said. “Because conversation has very different and specific requirements than prose or information, or documentation.”
Chatbots Bear the Imprints of Their Creators
Even with top writers on the job, adding depth and complexity to the human-computer relationship is an enormous challenge, which Nelson said is a long way from being perfected. An involved, multi-turn conversation might be easier to manage in scripted settings — say, a bank app where you’re withdrawing money or a fast-food restaurant where you’re ordering a burger — but, even in those cases, speech and its intentions are wildly unpredictable.
“Say you have a friend, Dave, you talk about music with all the time,” Nelson said. “You may not know that Dave is also a great person to talk about sci-fi books with because you’ve never actually broached it. And you’re not just going to say out of the blue, like, ‘Oh, hey, I really like books about aliens.’ So there’s the discoverability problem.”
As with any relationship, the onion skin must peel back slowly — and context matters. As a bot learns when you leave for work in the morning, Nelson said, it might ask you useful things, like if you’d like to be reminded of the weather before you go out.
Shifting social perceptions, meanwhile, are likely to change how people perceive friends or strangers talking openly to inanimate objects. Remember, before Bluetooth headsets, when it was odd to see someone talking to themself on the street?
Perhaps most interesting of all is that, to some degree, the bots will take on the personalities of their creators. More than a decade ago, when Worswick began the Mitsuku project, Nelson said, he left in some Easter eggs: “There is a way that you could respond and Mitsuku would say, ‘Please type in delete and I will delete all of my bot existence.’”
People would send emails about the security flaw, he said. But it was all an elaborate ruse, inserted into the interface as an embellishing artifact, much like Worswick’s taste for kebabs.
These are the kinds of back-end hijinks Nelson finds delightful. He compares Mitsuku’s development to writing a novel, a highly sophisticated plausibility challenge, in which an author must take pains to ensure a character’s speaking style, or hair color, on page 10 checks out on page 215. Except Mitsuku’s conversational habits are even more difficult to map because she is learning new things every day.
“You can think of it even like the development of programming or computer languages,” he said. “You’ve moved from very explicit instructions to more and more abstractions so you can deal with more and more inputs. And so that’s one of the things that we’re working on. How do you abstract the idea of a conversation or the idea of context?”
You might never be able to, he said. But, if Mitsuku foretells what’s ahead, you can come pretty close.