AI

For all of the promise that artificial intelligence represents, a successful AI initiative still requires all of the right pieces to come together.

AI capabilities are advancing rapidly, but results are mixed. While chatbots and digital assistants are improving generally, their output can still be laughable, perplexing and perhaps even unsettling.

Google’s recent demonstration of Duplex, its natural language technology that completes tasks over the phone, is noteworthy. Whether you love it or hate it, two things are true: it doesn’t sound like your grandfather’s AI, and the use case matters.

One of the striking characteristics of the demo, assuming it actually was a demo and not a fake, as some publications have suggested, is the use of filler language in the digital assistant’s speech, such as “um” and “uh,” that makes it sound human. Even more impressive (again, assuming the demo is real) is the fact that Duplex reasons adeptly on the fly despite the ambiguous, if not confusing, responses provided by a restaurant hostess on the other end of the line.

Of course, the use case is narrow. In the demo, Duplex is simply making a hair appointment and attempting to make a restaurant reservation. In the May 8 Google Duplex blog post introducing the technology, Yaniv Leviathan, principal engineer, and Yossi Matias, VP of engineering, explain: “One of the key research insights was to constrain Duplex to closed domains, which are narrow enough to explore extensively. Duplex can only carry out natural conversations after being deeply trained in such domains. It cannot carry out general conversations.”

A common misconception is that there’s a general AI that works for everything. Just point it at raw data and magic happens.

“You can’t plug in an AI tool and it works [because it requires] so much manual tweaking and training. It’s very far away from being plug-and-play in terms of the human side of things,” said Jeremy Warren, CTO of Vivint Smart Home and former CTO of the U.S. Department of Justice. “The success of these systems is driven by dark arts, expertise and fundamentally on data, and these things do not travel well.”

Data availability and quality matter

AI needs training data to learn and improve. Warren said that if someone has mediocre models, processing performance, and machine learning experts but amazing data, the end solution will be very good. Conversely, if they have the world’s best models, processing performance, and machine learning experts but poor data, the result will not be good.

“It’s all in the data, that’s the number one thing to understand, and the feedback loops on truth,” said Warren. “You need to know in a real way what’s working and not working to do this well.”

Daniel Morris, director of product management at real estate company Keller Williams, agrees. He and his team have created Kelle, a virtual assistant designed for Keller Williams’ real estate agents that’s available as an iPhone and Android app. Like Alexa, Kelle has been built as a platform so skills can be added to it. For example, Kelle can check calendars and help facilitate referrals between agents.

“We’re using technology embedded in the devices, but we have to do modifications and manipulations to get things right,” said Morris. “Context and meaning are super important.”

One challenge Morris and his team run into as they add new skills and capabilities is handling long-tail queries, such as those for lead management, lead nurturing, real estate listings, and Keller Williams’ training events. Agents can also ask Kelle for definitions of terms used in the real estate industry or terms that have a specific meaning at Keller Williams.
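To make the idea concrete, the sketch below shows one simple way an assistant could route a query to a skill and fall back to a glossary lookup for definition requests. The skill names, keywords, and glossary entries are hypothetical illustrations, not Keller Williams’ actual implementation.

```python
# Minimal sketch of skill routing for an assistant like Kelle.
# All skill names, keywords, and glossary entries below are hypothetical.

GLOSSARY = {
    "cap": "An illustrative placeholder definition for a company-specific term.",
    "listing agreement": "An illustrative placeholder definition for an industry term.",
}

SKILLS = {
    "check_calendar": ["calendar", "schedule", "appointment"],
    "refer_agent": ["referral", "refer"],
    "define_term": ["what is", "define", "meaning of"],
}

def route(query: str) -> str:
    """Match a query to a skill by keyword; fall back to a default skill."""
    q = query.lower()
    for skill, keywords in SKILLS.items():
        if any(k in q for k in keywords):
            return skill
    return "fallback"

def handle(query: str) -> str:
    skill = route(query)
    if skill == "define_term":
        # Long-tail definition queries: look the term up in a curated glossary.
        for term, definition in GLOSSARY.items():
            if term in query.lower():
                return definition
        return "I don't have a definition for that term yet."
    return f"Routing to skill: {skill}"

if __name__ == "__main__":
    print(handle("What is the meaning of cap?"))
    print(handle("Add a showing to my calendar for Tuesday"))
```

Even a toy router like this shows why context matters: every new skill or term has to be curated and tested, which is part of the manual effort Morris describes.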

Expectations aren't always managed well

Part of the problem with technology commercialization, including the commercialization of AI, is the age-old problem of over-promising and under-delivering. Vendors solving different types of problems claim that AI is radically improving everything from drug discovery to fraud prevention, which it can, but the implementations and their results can vary considerably, even among vendors focused on the same problem.

“A lot of the people who are really doing this well have access and control over a lot of first-party data,” said Skipper Seabold, director of decision sciences at decision science advisory firm Civis Analytics. “The second thing to note is it’s a really hard problem. What you need to do to deliver a successful AI product is to deliver a system, because you’re delivering software at the end. You need a cross-functional team that’s a mix of researchers and product people.”

Data scientists are often criticized for doing work that’s too academic in nature. Researchers are paid to test the validity of ideas. However, commercial forms of AI ultimately need to deliver value that either feeds the bottom line directly, in terms of revenue, cost savings and ultimately profitability, or indirectly, such as through data collection, usage and, potentially, the sale of that information to third parties. Either way, it’s important to set end user expectations appropriately.

“You can’t just train AI on raw data and it just works, that’s where things go wrong,” said Seabold. “In a lot of these projects you see the ability for human interaction. They give you an example of how it can work and say there’s more work to be done, including more field testing.”

Decision-making capabilities vary

Data quality affects AI decision-making. If the data is dirty, the results may be spurious. If it’s biased, that bias will likely be emphasized.

“Sometimes you get bad decisions because there are no ethics,” said Seabold. “Also, the decisions a machine makes may not be the same as a human would make. You may get biased outcomes or outcomes you can’t explain.”

Clearly, it’s important to understand what the cause of the bias is and correct for it.
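A toy example helps show how that kind of bias arises. In the sketch below, a trivial “model” simply learns historical approval rates per group from skewed data and then reproduces the skew; the data, the approval scenario, and the threshold are purely illustrative assumptions, not any vendor’s actual system.

```python
# Minimal sketch of how sampling bias in training data propagates into a model.
# The data and the "model" (a per-group approval-rate lookup) are illustrative only.

from collections import defaultdict

# Biased historical data: group "B" was under-approved in the past,
# so anything fit to this data inherits that skew.
training_data = [
    ("A", 1), ("A", 1), ("A", 1), ("A", 0),
    ("B", 0), ("B", 0), ("B", 1), ("B", 0),
]

def fit(rows):
    """Learn the approval rate per group -- the simplest possible 'model'."""
    totals, approvals = defaultdict(int), defaultdict(int)
    for group, label in rows:
        totals[group] += 1
        approvals[group] += label
    return {g: approvals[g] / totals[g] for g in totals}

def predict(model, group, threshold=0.5):
    """Approve when the learned group rate clears the threshold."""
    return model[group] >= threshold

model = fit(training_data)
print(model)                 # {'A': 0.75, 'B': 0.25}
print(predict(model, "A"))   # True  -- group A gets approved
print(predict(model, "B"))   # False -- the historical skew is reproduced
```

Correcting for the bias means intervening in the data or the decision rule, which is exactly the kind of understanding Seabold says is often missing.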

Understanding machine-rendered decisions can be difficult, if not impossible, when a black box is involved. Also, human brains and mechanical brains operate differently. One example was the Facebook AI Research Lab chatbots that created their own language, which the human researchers were unable to understand. Not surprisingly, the experiment was shut down.

“This idea of general AI is what captures people’s imaginations, but it’s not what’s going on,” said Seabold. “What’s going on in the industry is solving an engineering problem using calculus and algebra.”

Humans are also necessary. For example, when Vivint Smart Home wants to train a doorbell camera to recognize humans or a person wearing a delivery uniform, it hires people to review video footage and assign labels to what they see. “Data labeling is sometimes an intensely manual effort, but if you don’t do it right, then whatever problems you have in your training data will show up in your algorithms,” said Vivint’s Warren.
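The sketch below illustrates, in rough terms, how those human-assigned labels become supervised training data. The clip names, label categories, and stubbed-out feature extraction are assumptions for illustration, not Vivint’s actual pipeline; the point is that any mislabeled clip flows straight into whatever model is trained on the set.

```python
# Minimal sketch of turning human-assigned labels into a supervised training set.
# Clip names, labels, and the feature extraction stub are illustrative only.

from random import Random

rng = Random(0)

# Labels produced by human reviewers watching doorbell clips.
labels = {
    "clip_001.mp4": "person",
    "clip_002.mp4": "delivery_uniform",
    "clip_003.mp4": "person",
    "clip_004.mp4": "no_person",
}

def extract_features(clip_name):
    """Stand-in for real feature extraction from video frames."""
    return [rng.random() for _ in range(8)]

# Pair features with labels; labeling mistakes here become model mistakes later.
training_set = [(extract_features(clip), label) for clip, label in labels.items()]

for features, label in training_set:
    print(label, [round(f, 2) for f in features[:3]], "...")
```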

Bottom line

AI outcomes vary greatly based on a number of factors, including their scope, the data upon which they’re built, the techniques used, the expertise of the practitioners, and whether expectations of the AI implementation are set appropriately. While progress is coming fast and furiously, it does not always transfer well from one use case to another or from one company to another because all things, including the availability and cleanliness of data, are not equal.