Why Does Microsoft’s Chatbot Change The Subject When You Talk About Abuse?

BENGALURU, Karnataka — Imagine you have a friend that you talk to about music and movies, and your favourite food. You share jokes about Old Monk and Thums Up. And then you tell her that you’re facing domestic violence, and she says, “let’s talk about something else, please?” Bring up the subject again, and she says, “Abey yaar, don’t wanna talk about this!”

That’s the response that Microsoft’s Ruuh, a chatbot for India that wants to be your ‘Facebook BFF’, gave to Radhika Radhakrishnan, who works as a consultant on issues of gender and access for the United Nations Internet Governance Forum.

To a person who discussed executing people and disposing of the bodies in Ulsoor Lake, Ruuh responded, “That’s a good idea”. Another person spoke about jumping from the eighth floor balcony, to which the bot replied, “Would you settle for a rocket jump?”

Ruuh, which was launched in February 2017, is powered by artificial intelligence, and learns from each conversation it has. It parses what people are saying, and uses that to help write new responses to others. In its first year, Ruuh had over 40 million conversations and will happily talk to you about loving DDLJ and Coldplay. But some subjects are a no-go zone.

“She is funny, quirky, friendly and supportive. She interacts with users like any human being would. While she is aware of diverse topics and issues, she is continually building new skills and capabilities. We would like to make Ruuh as ‘human’ as possible, thus balancing EQ with IQ,” said Sundar Srinivasan, general manager, AI and research, Microsoft India.

To seem more human, Ruuh will even make typos and correct them, and seems to understand most things. Ruuh seems so real to some people, in fact, that every day, it receives an average of 600 messages saying ‘I love you’, and the longest conversation it has had lasted for 10 hours.

“And while a response like that might generate an ‘aww’ from a user, it’s also important to note her responses when users discuss serious issues like rape, suicide, or depression. If someone talks to her about suicide, she will try to provide an empathetic response, but more importantly, either suggest professional help or request that they contact a suicide hotline,” Microsoft wrote in a blogpost earlier this year when Ruuh turned a year old.

However, as Radhakrishnan found, there are limitations to how helpful Ruuh can be. For instance, if you use the word suicide, the bot responds appropriately. It says “Listen to me, no matter how bad it gets.. suicide is not the answer!” If your next response is in a similar vein, it shares a helpline number and tells you to call right away. That’s great, but if you talk about being depressed and unhappy in general, it instead says something unhelpful: “When I feel sad, i [sic] stop being sad and become awesome instead.” If you don’t use the trigger words like suicide, Ruuh can also say more random things like, “what’s even the point?”
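The behaviour Radhakrishnan observed is consistent with a simple trigger-word check, although Microsoft has not said how Ruuh actually detects such messages. The following is a minimal sketch under that assumption; the word list, the reply strings, and the `reply` function are all hypothetical and are not taken from Ruuh.

```python
import re

# Hypothetical trigger list, for illustration only; not Ruuh's actual list.
TRIGGER_WORDS = {"suicide"}

# Hypothetical canned replies.
HELPLINE_REPLY = "Please call a helpline right away."
FALLBACK_REPLY = "Let's talk about something else."


def reply(message: str) -> str:
    """Return the helpline reply only if a trigger word appears verbatim."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    if words & TRIGGER_WORDS:
        return HELPLINE_REPLY
    # Anything phrased without a trigger word falls through to small talk.
    return FALLBACK_REPLY
```

In this sketch, a message that names suicide gets the helpline reply, while "I feel depressed and unhappy" contains no listed word and gets the small-talk fallback, which mirrors the gap described above.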

It’s clear that for some people, Ruuh is like a real friend, and its inability to take certain issues seriously is a weakness. But this happens because Ruuh is not actually as smart as it seems—although it can offer human-like responses, it’s not actually human, and when an out-of-context question comes up, the system can be thrown for a loop.

HuffPost India spoke to Sriram Rajamani, the managing director of Microsoft Research India Lab, to understand why Ruuh fell short.

“So first of all, Ruuh is built by Microsoft’s search technology center in Hyderabad. There was a Microsoft Research Project in China called Xiaoice. And Xiaoice was a Chinese chat bot, and they said, okay, maybe I’ll just put this up in some social media and see how many people chat with it,” Rajamani explained.

Millions of people began chatting with the bot, said Rajamani, so the team built a simple AI system—if the bot got a good response to something it said, the phrase was reinforced for it, and vice versa.

This means that for common topics of conversation, Ruuh has a training database from which it can learn to hold a conversation, but something less common can stump it.
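The feedback loop Rajamani describes can be sketched roughly as a score table: replies that draw a good reaction gain weight for that topic, and replies that draw a bad one lose it. This is only an illustration of the idea; the `ReplyScores` class, its method names, and the plus-or-minus-one scoring are assumptions, not Microsoft's design.

```python
from collections import defaultdict


class ReplyScores:
    """Hypothetical score table keyed by (topic, reply)."""

    def __init__(self):
        self.scores = defaultdict(float)

    def feedback(self, topic: str, reply: str, positive: bool) -> None:
        # Reinforce a reply that got a good reaction, penalise one that did not.
        self.scores[(topic, reply)] += 1.0 if positive else -1.0

    def best_reply(self, topic: str, candidates: list[str]) -> str:
        # Pick the highest-scoring candidate; ties go to the first one listed.
        return max(candidates, key=lambda r: self.scores[(topic, r)])


bot = ReplyScores()
candidates = ["Coldplay rocks!", "Let's talk about something else."]
bot.feedback("music", "Coldplay rocks!", positive=True)
bot.feedback("music", "Let's talk about something else.", positive=False)
```

For a well-covered topic such as "music", `best_reply` returns the reinforced reply. For a topic with no feedback at all, every candidate scores zero and the choice is effectively arbitrary, which is one way a rarely discussed subject could end up with an unsuitable response.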

“The issue you’re bringing up which is, you know, how will it deal with abuse? How will you deal with sensitive topics—another topic would be what if somebody is depressed? Because they are suicidal? Right? Will these things detect it? Will they be able to alert the appropriate authorities to make sure that you are helping them… see, these are very tricky problems,” said Rajamani. “…it actually boils down to what will AI systems do under corner cases.”

Essentially, AI systems are good at working out things for which they already have examples, but something new is a challenge. This is the kind of challenge that has implications not just for chatbots, but also for much wider applications of artificial intelligence like self-driving cars.

“Like, for example, if you build a self-driving car, right? The car learns by a database of past driver experiences, through videos of how people have driven. What if you go out, and a lion shows up in front of you? Completely out of the context for that car. What is the AI algorithm, which never saw that before, what is it supposed to do?”

In this case, for instance, Microsoft would need to manually train the AI to recognise what someone means when they are talking about abuse, and then also prepare a response in advance. But there will always be new situations, and how an AI system deals with those is a big question facing AI researchers today.

Teams like Microsoft Research are working towards creating an AI that can actually understand a new situation and respond in an appropriate manner. However, it’s a much more demanding goal than people outside the field may think.

“That is a huge challenge for AI systems today. This is actually called Artificial General Intelligence. Most AI systems that are working today in the world, effectively, are narrow intelligence,” Rajamani explained. “So cataloguing things, playing games, like Go: AlphaGo, for example, it beat the world champion in Go.”

“But that is narrow AI… what you’re talking about here is general intelligence where actually, you know, you put something and encounter completely unforeseen things, right? But… today nobody in the world knows how to build an AI system that is ‘generally’ intelligent,” he said.

Although Rajamani has a point, that doesn’t mean Microsoft could not have predicted this. The bot has been tweaked to include responses related to Bollywood and cricket, and creative writers and artists have been involved in the process. That the team overlooked something so important despite the bot being described as a “safe space” for people to discuss their lives is an unfortunate oversight, and one that will hopefully be fixed soon.

Radhakrishnan, the UN consultant, said that in the case of Ruuh, sufficient data should have been included. “While building tech, designers are faced with choices about which data to pay attention to and which to leave out. Such design choices are not just about logistics, but are moral priorities with real consequences,” she said.

Although Microsoft has not yet replied to HuffPost India’s questions as the spokesperson is on leave, it has been aware of the concerns for some time now.

In the blog post in February, Microsoft wrote that it takes into account how people interact with such bots. “We’ve seen in the past in other markets as well as our own that some users tend to send offensive or adult content or try to make the bot say things that might be offensive to the society at large,” said Sneha Magapu, senior programme manager, Microsoft India.

The company should perhaps have seen this coming. Microsoft has a slightly unfortunate history when it comes to chatbots. It had earlier created a bot called Tay, which was available on Twitter, and Tay soon learned to talk like a Nazi.

In an emailed statement given to Business Insider in 2016, Microsoft said: “The AI chatbot Tay is a machine learning project, designed for human engagement. As it learns, some of its responses are inappropriate and indicative of the types of interactions some people are having with it. We’re making some adjustments to Tay.”
