How do you create an automated, but human, voice?


How are you?

That’s a pretty simple, human way to communicate with another person. It provides several layers of positive reinforcement. It shows interest in your situation. It challenges you to provide a response that, if given, may prompt additional interactions that provide all sorts of value to you in your daily life. How many ways can you say that using different verbiage, inflections of voice, body language, all with the same intent? Pretty simple, right? How do you communicate with others? In person, email, phone, messenger, SMS, or just howling at 8:00 pm every night as we do here in Northern California in our COVID-19 infected world.

Over the last few years, technology has evolved to the point where we can mimic these types of human interactions and use that linguistic science to define our interactions with all sorts of entities around us. The technical ability to build these conversational interfaces has moved faster than our ability to understand how to design them. Natural language processors (NLPs) can now do the heavy lifting, allowing people skilled in the science of human-machine linguistics to spend time on the interaction, not the bits and bytes usually found in a typical software design process.


What we really want to hear

Why is this important in today’s world? Well, we find ourselves confronted with the reality of a post-COVID world. Our work cultures are now more remote. People find themselves folding their personal tasks in with their business tasks, within a 24×7 time frame. We have so many channels to communicate: phones, SMS, messenger, Facebook, Instagram, Twitter, and TikTok. So many channels. For companies to compete in this environment, products and services need to be available when and where people are, with support for finding and delivering what they need across these channels. Help desks, e-commerce services, banks, health services, and just about everything we do scream for a certain amount of automated “concierge”-like support. The amount of data out there about any given subject, product, or service is just too widespread for simple one-to-one human interaction to handle.

At the same time, we yearn for human contact. Why not the best of both worlds? Conversational interfaces (painfully and incorrectly referred to as chatbots) and human ecosystems need not be mutually exclusive.

First, let’s make these interfaces more human, not because we think they are, but because humans inherently respond to interfaces that are more human. Second, let’s combine those 24/7 sources of information with a human ecosystem for additional support. Voilà! You have the future, both automated and human. These approaches deliver the cost savings that many businesses yearn for, while addressing the needs of consumers to live in a more humanistic environment, whether bot or person supported.


Designing conversational interfaces

In the last few years, I’ve been able to spend some time in the field of conversational interfaces. I’ve seen some projects go brilliantly from conception to production in as little as 6 weeks, iterating over time, learning in the same way humans do. I’ve also seen projects take twice that time and not get past design, sometimes because of regulatory cycles, sometimes because we still struggle to overcome the limitations of past design and build patterns, tools, and cultures. I’ve found these projects require a new set of skills and experts, part technologist, part subject matter content expert.

Let’s start simply, with an interface that provides deep knowledge of a subject matter, deals with humanistic dialog properly, understands some level of exceptions to what is being discussed, and does so in a manner that learns over time. A few simple definitions:


| Term | Description |
| --- | --- |
| Utterance | A sentence or phrase spoken by a person or entity, e.g. “I’d like to find a ride to San Francisco.” |
| Intent | What the speaker wants to accomplish. Here: “I need assistance to get to a destination.” |
| Entity | A keyword or phrase that, when uttered in the context of an intent and other entities, will provoke a response. In this case, “ride” or “San Francisco.” |
| Response | The retort: what the interface says back. |
| Exceptions | An “expected” deviation from the core knowledge of the conversation flow. |
| Small Talk | Nonessential or unimportant utterances that humans use all the time, or that people use when they converse with non-human entities: “What’s up with that?”, etc. |
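To make the definitions above concrete, here is a minimal sketch in Python of how an utterance, an intent, its entities, and a response might relate to one another. The `Intent` class, the `extract_entities` helper, the `find_ride` intent, and the extra “Oakland” keyword are all hypothetical names made up for illustration. A real NLP platform would do this matching statistically rather than by simple keyword lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """What the speaker wants, plus the entity keywords that refine it."""
    name: str
    # Maps an entity type to the keywords that signal it (hypothetical data).
    entity_keywords: dict[str, list[str]] = field(default_factory=dict)
    # Response template with one placeholder per entity type.
    response: str = ""


def extract_entities(utterance: str, intent: Intent) -> dict[str, str]:
    """Return the first known keyword found in the utterance per entity type."""
    text = utterance.lower()
    found = {}
    for entity_type, keywords in intent.entity_keywords.items():
        for keyword in keywords:
            if keyword.lower() in text:
                found[entity_type] = keyword
                break
    return found


# Assumed example intent, modeled on the "ride to San Francisco" utterance.
find_ride = Intent(
    name="find_ride",
    entity_keywords={
        "transport": ["ride"],
        "destination": ["San Francisco", "Oakland"],
    },
    response="Looking for a {transport} to {destination}.",
)

utterance = "I'd like to find a ride to San Francisco"
entities = extract_entities(utterance, find_ride)
reply = find_ride.response.format(**entities)
print(reply)
```

The point of the sketch is the separation of concerns: the utterance is raw input, the intent names the goal, the entities fill in the specifics, and the response is assembled from both.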

A few simple principles

  • Define your core knowledge. What is the AI an expert at? It doesn’t need to be everything. Document and categorize that knowledge.
  • Define your brand and the personification of it. Is the AI an educator? An accountant? Medical? Paternal? Maternal? Friendly? Happy? Sad? Or, all the above at different times?
  • Focus on the conversation flow, not the technology stack or the visual design.
  • Define the channels you’ll be communicating across and why. They all have different needs.
  • Using your core knowledge and categories defined, build out your intents and entities and map to the responses.
  • Manage the conversation outside of raw code or a platform. Think CMS.
  • Map multiple utterances to intents. This way the AI can learn over time by simply adding the hundreds of ways a person can “utter” the same intent.
  • Create multiple ways the AI can say the same thing. This is how humans speak.
  • Build out “small talk” patterns to allow the AI to respond to certain useless human phrases.
  • Define the integration points for other applications, people, and entities.
  • Don’t try to create a detailed flow of all the permutations of a conversation. It’s not possible.
  • Don’t get tripped up by old processes and tools.
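Several of these principles (many utterances per intent, several ways to say the same response, small talk handling) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `INTENTS` content, the intent names, the fallback message, and the token-overlap scoring are all made up for the example, and a production system would lean on an NLP platform for the matching and on a CMS for the content.

```python
import random
import re
from typing import Optional

# Hypothetical conversation content. In practice this would be managed
# outside of code (think CMS), so adding a new phrasing is a content change.
INTENTS = {
    "find_ride": {
        "utterances": [
            "I'd like to find a ride to San Francisco",
            "can you get me a ride",
            "I need a lift downtown",
        ],
        "responses": [
            "Sure, where would you like to go?",
            "Happy to help. What's your destination?",
        ],
    },
    "small_talk_greeting": {
        "utterances": ["how are you", "what's up", "hello there"],
        "responses": ["Doing well, thanks for asking!", "All good here. And you?"],
    },
}
FALLBACK = "Sorry, I didn't catch that. Could you rephrase?"


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_intent(utterance: str, threshold: float = 0.5) -> Optional[str]:
    """Pick the intent whose example utterance overlaps most (Jaccard score)."""
    said = tokens(utterance)
    best_name, best_score = None, 0.0
    for name, spec in INTENTS.items():
        for example in spec["utterances"]:
            known = tokens(example)
            union = said | known
            score = len(said & known) / len(union) if union else 0.0
            if score > best_score:
                best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def respond(utterance: str) -> str:
    """Answer with one of several phrasings, or fall back when unsure."""
    name = match_intent(utterance)
    if name is None:
        return FALLBACK
    return random.choice(INTENTS[name]["responses"])


print(respond("How are you?"))
print(respond("Can you get me a ride please"))
```

Teaching this sketch a new way to “utter” an intent means appending one string to an `utterances` list, and giving it a new way to answer means appending one string to `responses`. Neither requires mapping out every permutation of the conversation.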

To deliver the optimum experience, and to keep budget in mind, we may reuse similar, white-label-like products we’ve built before for similar clients and brands. We may have to abide by a client’s internal technology stacks. We may decide to use a SaaS-like platform such as Concerto or KoreAI. The platforms we use should be chosen in parallel with the design of the conversation flow.

The time is now

To some generations, this may all seem daunting or carry negative, science fiction-like overtones. To others, it might help set them free for more profitable or spiritual lives. As our post-COVID-19 world evolves, we need to evolve as well.

We have here many opportunities to help companies streamline cost and provide better service. We can carry on conversations of value 24/7, with access to far more information than individual groups of people can provide. We can reach people across all channels, communicating where they are and when. We can augment those communications with real human interaction, but at a level where we can provide more value, not bogged down in the weeds of simple, yet dense, data. In the end, we can free ourselves up to create more profitable and meaningful projects. Those companies and people that can see that future will be those that succeed. This isn’t something to plan for down the road. It’s now.

In the end, I think we should say “How are you?” more often, even to those non-human entities we discuss, and be more human in the process.

Tom Tully
Managing Director at Capgemini

Tom has over 20 years of experience in digital creative and technology solutions. Life Science clients include GSK, McKesson, JAZZ Pharmaceuticals, Novartis, Amgen, Alphaeon, Otsuka, Dignity Health, Emergent BioSolutions, and Stanford Health. Tom currently acts as a subject matter expert on artificial intelligence and conversational interfaces.

Illustrations: Phillip Hereso