
Interview

Awkward Silences Are the Current Obsession of Salesforce's Resident Genius

Richard Socher, chief scientist at multinational software company Salesforce, thinks that speech and voice technologies are the future

Amarelle Wenkert  | 08:53, 23.12.19
Richard Socher, chief scientist at multinational software company Salesforce.com Inc., is currently preoccupied with awkward silences. Specifically, the kind that occurs between man and machine.

As far as human skills go, the ability to jump into a conversation at the appropriate moment is not hugely applauded. Still, it does save most of us from uncomfortable situations, and it cuts down on idle time. But to teach perfect conversational timing to a natural language processing artificial intelligence algorithm? That, according to Socher, is no easy task.

Salesforce's chief scientist Richard Socher. Photo: Salesforce

It is the simple things that people often take for granted that are key to making AI come across as more human, Socher explained in a recent interview. "In conversation, when there is a pause, you know when and how to interrupt me, and you would be faster to jump in if you were the CEO of the company," he said. "Depending on hierarchy, you'll wait longer or less. AI, of course, wants to be very courteous, and the programmers want to build it in such a way that people don't feel interrupted." But just how long should the AI wait before people start feeling aggrieved by the time it takes to respond? That, Socher said, is one of the problems he and his team are working to solve.
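To make the problem concrete, here is a minimal sketch of the kind of turn-taking heuristic Socher describes: deciding how long the assistant should stay silent before replying. The thresholds, the role-based patience factors, and the function names are illustrative assumptions, not Salesforce's implementation.

```python
# Illustrative turn-taking heuristic: decide whether the assistant may start
# speaking after a given stretch of silence. All values here are invented for
# illustration; they are not Salesforce's actual parameters.

BASE_PAUSE_MS = 700          # assumed "comfortable" pause before responding
ROLE_PATIENCE = {            # assumed: wait longer before interrupting higher-ranked speakers
    "ceo": 1.5,
    "manager": 1.2,
    "customer": 1.0,
}

def should_respond(silence_ms: float, speaker_role: str, utterance_complete: bool) -> bool:
    """Return True once the silence is long enough to reply without interrupting."""
    threshold = BASE_PAUSE_MS * ROLE_PATIENCE.get(speaker_role, 1.0)
    if not utterance_complete:   # e.g., the speech recognizer hears a trailing "and..."
        threshold *= 2           # be extra patient when the speaker seems mid-thought
    return silence_ms >= threshold

# Example: after a complete sentence and 800 ms of silence, the bot may answer
# a customer, but should keep waiting for a higher-ranked speaker.
print(should_respond(800, "customer", utterance_complete=True))  # True
print(should_respond(800, "ceo", utterance_complete=True))       # False
```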

At Salesforce, Socher is the resident genius. He leads a team of unspecified size—the company, which boasts 40,000 global employees, does not like to disclose how it allocates its talent—in fundamental and applied research into deep learning, speech and natural language processing, and machine vision. "We try to push state of the art AI research forward," he said. Socher has a Ph.D. in computer science from Stanford University and was previously a faculty member there. He came aboard the multinational cloud company in 2016, when Salesforce bought his startup, MetaMind.

Socher spoke to Calcalist in San Francisco, on the first day of Dreamforce, Salesforce's annual tech fair and publicity tour de force. Later that day, at a presentation of some of the futuristic technologies he and his team bring to life at their research lab, Socher explained his team's raison d'être. "We are trying to predict the future," he said. "And the best way to predict it is to actually create it."

When Socher speaks about AI, he refers to the duality of the technology as the "electricity" state and the "sci-fi" state. Like electricity, artificial intelligence capabilities are already powering technologies and services we use every day, he explained. Comprehensive, natural conversations between people and AI, fully autonomous vehicles, or an AI capable of performing the complex mix of life-saving acts a doctor performs—all of those go in the sci-fi bucket, he added. His work at Salesforce has a similar duality: his team strives to hone and refine AI capabilities for immediate applications that can add to Salesforce's catalog of products, but it also gets to think far and wide, patenting technologies and publishing research that will power a future beyond what most of us can currently imagine.

At the moment, both ends of the spectrum of Socher's work have to do with speech and voice technologies. Indeed, one of the loudest messages coming out of this year's Dreamforce was that, as far as Salesforce is concerned, voice technology is the future.

"That is just how a lot of people interact with companies," Socher explained salesforce’s focus. And when it comes to providing a service, "there is a lot of repetition in those conversations, a lot of people have similar kinds of issues with a product they have. That is why it is a perfect use case for AI."

Voice, according to Socher, is the great equalizer. It can ease daily interactions with technology for older people and for people who are not tech-savvy, or even literate. "It changes how humans and computers interact with one another," he said. "It is probably part of a larger trend, making computers more ubiquitous. They are always there, and you can always ask a computer a question and get an answer."

"We have worked with computer vision too, but I have to be honest, there are much fewer companies capable of and excited to use computer vision compared to voice, because everybody talks and everybody interacts with their customers using natural language," Socher said.

Last year at Dreamforce, Salesforce presented voice-activated assistant capabilities for its artificial intelligence engine, Einstein. This year, over countless press briefings and presentations, Salesforce demonstrated how it intends to give a voice to every app and service it offers, from voice-activation of routine customer relationship management (CRM) tasks to natural language processing and real-time conversation analysis and data mining.

"The most interesting and most challenging part of AI right now, and what we are investing most of our research into, is understanding the meaning of full conversations and of natural language in general," Socher said. For Socher and his team, conversational AI and natural speech recognition introduce a myriad of novel challenges.

Once you solve the simple back-and-forth, Socher said, you need to train the AI to recognize when a person's intent changes. Many of the systems AI developers work with operate like a series of adaptive multiple-choice questions, each choice leading to another set of questions, he explained. "Say you start the call and want to know what your order is. And once you know, you want to change the order," Socher said. "As you go through the conversation, the dialog flow has to be dynamic, and that is actually still a challenge."
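As a rough illustration of what a dynamic dialog flow means in practice, the sketch below re-classifies the caller's intent on every turn instead of locking the conversation into the branch chosen at the start of the call. The keyword-based classifier, the handler names, and the order details are hypothetical stand-ins for what would be a trained model in a real system.

```python
# Illustrative sketch (not Salesforce's system): a dialog loop that re-evaluates
# intent every turn, so the flow can jump from "check order" to "change order"
# mid-conversation rather than following a fixed script.

def classify_intent(utterance: str) -> str:
    """Hypothetical intent classifier; a production system would use a trained model."""
    text = utterance.lower()
    if "change" in text or "cancel" in text:
        return "modify_order"
    if "where" in text or "status" in text:
        return "check_order"
    return "unknown"

HANDLERS = {
    "check_order": lambda: "Your order ships tomorrow.",        # placeholder response
    "modify_order": lambda: "Sure, what would you like to change?",
    "unknown": lambda: "Could you rephrase that?",
}

def handle_turn(utterance: str) -> str:
    # The dialog state is not locked to the branch chosen at the start of the call;
    # intent is recomputed for every utterance.
    return HANDLERS[classify_intent(utterance)]()

print(handle_turn("Where is my order?"))              # check_order branch
print(handle_turn("Actually, I want to change it"))   # intent switches mid-call
```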

Generating sentiment analysis from speech is another essential aspect, and another problem Socher is trying to solve.

The problems that occupy Socher's mind leap from the banal to the political. Asked about the potential risks of AI in numerous earlier interviews, Socher has consistently maintained that the most dangerous element of AI is its ability to internalize human biases and amplify them with methodical rigor and superior computational power. In January, Socher published an article on the World Economic Forum's website, titled "AI isn't dangerous, but human bias is."

His approach reflects Salesforce's; the company puts enormous effort into nurturing a brand that comes down on the right side of every moral issue, advocating incessantly for causes such as sustainability, gender equality, and diversity. This means that Salesforce's AI, Einstein, must also toe the company line.

When you consider the human value of equality and apply it to speech recognition, one of the challenges is making AI understand different accents, Socher said. "The standard training datasets for speech recognition do not include many different accents; it is mostly nice, American English," he said. In the future, he added, his team hopes to integrate a variety of accents.

There are other ethical considerations, too. "You want to foster trust in the AI, and for that, you want to make it very natural, but you do not want to confuse anybody into thinking they are talking to a person," Socher said. "There is a fine balance to it. You want it to be awkwardly close to a person but not quite a person." So how do you do that? You can just have the bot say it is a bot, Socher said. "That is the easy way to do it."

The first product to emerge from Socher's research lab at Salesforce was the Einstein voice assistant, whose key role is to make everyday business transactions, like logging meeting notes or updating data in the CRM, as seamless as possible.

"I think the biggest impact that AI, and especially voice recognition, will have in the next year or two will be in service—automating it to some degree and enabling service during off-hours," Socher said. If you are thinking about the standard chatbot, think again. Socher is talking about a bot that can tackle complex product-related questions, make data-informed suggestions, and even help customers schedule future purchases. "Nobody does that," he said.

The key, Socher said, is profoundly understanding user context and infusing the AI with the right knowledge to give the right recommendations at the right time. By doing that, Socher believes, AI would in the end become superior to humans in some respects. "There are certain super skills that humans just don't have," he said. "Humans cannot go through a million websites in a hundred milliseconds and come back with a result."

The author was hosted by Salesforce at this year's Dreamforce event.
