Data is the key to developing a truly conversational chatbot

June 27, 2019

The conversational chatbot is likely to play a large part in the future of digital marketing. With continued growth in messaging applications like WhatsApp, WeChat and Facebook Messenger, there is clearly consumer demand for machine-based communications. In a survey by Usabilla at the start of 2019, 54% of respondents said they would always choose a chatbot over a human customer service representative if it saved them 10 minutes. Consumers expect certain tasks not to require human intervention, with 83% saying, for example, that they would expect to check a bank balance without human interaction.

However, the challenge for businesses is that whilst chatbots fill the technology gap, 59% of consumers in a PwC survey felt that companies have lost touch with the human element of customer experience. Companies need to give customers an experience that fits their brand persona and goes beyond an efficient service. As far as the consumer is concerned, the chatbot experience needs to feel as if a real human is interacting with them.

Chatbots are evolving and becoming increasingly sophisticated in an attempt to simulate how humans converse. This is achieved by using applications of artificial intelligence (AI) such as machine learning (ML) and natural language processing (NLP). The algorithms built using these methods have the power to deliver a personalised experience by harnessing huge amounts of data from multiple sources, and thereby, uncovering behavioural patterns.

In theory, this is an amazing concept: a chatbot that uses data to know the user and presents the most applicable conversation for a personalised experience. What’s the catch? Machine learning needs data to operate, and when launching a chatbot, that data generally doesn’t exist yet. Take a practical example. A new user comes to your chatbot and says “Hi.” Your chatbot data only has the word “Hello” programmed as a greeting, so it doesn’t know how it should respond. This is also the issue with NLP: it needs to be able to comprehend what the user says before it can find the data for a response.
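The failure mode above can be sketched in a few lines. This is a deliberately minimal, hypothetical example: a bot whose only training data is the single greeting "Hello" has no way to match the unseen input "Hi".

```python
# Minimal sketch: a bot trained on only one greeting phrase.
# The data and responses here are made up for illustration.

responses = {"Hello": "Hello! How can I help you today?"}

def reply(user_input: str) -> str:
    # Exact lookup: any phrasing outside the training data falls through
    # to the fallback, no matter how obvious the intent is to a human.
    return responses.get(user_input, "Sorry, I don't understand.")

print(reply("Hello"))  # matched: "Hello! How can I help you today?"
print(reply("Hi"))     # unseen greeting -> "Sorry, I don't understand."
```

Every greeting variant the training data lacks produces the fallback, which is exactly why the base data matters so much.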

Data is key to a chatbot if you want it to be truly conversational. This article will explore how you can get that base data (aka training data) to train the chatbot, make sense of the data by efficient labelling and the broad methods to develop the chatbot.

Fundamentally, a chatbot turns raw data into a conversation. The two key bits of data a chatbot needs to process are what people are saying to it and what it needs to respond with. The easiest example to understand is a simple customer service chatbot. The bot will need an idea of the type of questions people are going to ask it, and then what answers it should respond with to those questions. To work out those answers, it will use data from previous conversations, emails, telephone transcripts, documents and so on. This is the training data.

Chatbots are only as good as the training data they are given. You can’t just launch a chatbot with no data and expect customers to start using it. In fact, of the tens of thousands of chatbots that get developed, most are poor quality because they have had little or no training. Knowing how to train them, and doing so, isn’t something that happens overnight.

The most obvious place to get training data is from your own datasets. This provides rich information that is relevant to your customer base. However, there are instances where this is not possible. For example, start-ups that do not have any data yet but want to start testing how customers interact with a chatbot. Other businesses might not have enough data but want to expand their knowledge base so the chatbot is more effective.

In these cases, companies often opt for open source training data.

The best chatbots need a massive amount of training data to be useful. Just think about the number of conversations you have every day and how each of those differs in context. In an ideal world, a chatbot would need to account for all those conversational variations. Even if you have a lot of your own data, there are a few open source datasets that are free to use, thus allowing you to add to your knowledge base.

Some examples of open source training data include the Microsoft Research WikiQA corpus, the Cornell Movie-Dialogs Corpus and the Ubuntu Dialogue Corpus.

There are hundreds of examples like these that can be incorporated into your training data to optimise it as far as possible. Some are even multilingual or industry specific to support certain use cases. As an example, in 2017, Microsoft released a dialogue dataset related to holiday bookings for public consumption, which contained over 1,000 different conversations and responses.

Whilst open source training data is a great way of adding knowledge to your chatbot program, it does come with its limitations.

The shortcomings of open source training data and overcoming them

Whilst open source data is certainly a great starting point for training a chatbot, it does have some notable shortcomings.

1. Generic

Firstly, it will be tough to find open source training data that is useful to your business. Every company is unique, and it is unlikely that your processes and features are the same as something that is available publicly.

It is important to recognise that open source data is used as a base and time needs to be spent adding variations that are more company specific. A suggestion would be to adopt the Wiki Q&A data, then tailor it to meet your needs over time.

2. Doesn’t represent your brand

Open source training data won’t always represent your brand personality, and this is often the key differentiator between a memorable bot and an inefficient one. When faced with a question that the chatbot does not understand, an open source training set often provides a fallback response such as “Sorry, I don’t understand”. Consider infusing some personality into such basic responses to make your bot more memorable.

3. Languages are usually in English

Open source text data tends to be in English. This makes adopting it challenging in regions where chatbot users are not native English speakers. One might need a chatbot catered to Bahasa Indonesia speakers, for example. The sentence structure used in Bahasa Indonesia is vastly different from English. This brings us to the next point.

4. Spelling, punctuation and grammar

As chatbot users may not be native English speakers, the expressions they enter in English may be direct translations from their native language. NLP tasks like part-of-speech tagging and named entity recognition, which rely on “proper” grammar, will tend to fail. Moreover, chatbot users may even use expressions containing a mix of different languages, short forms and slang.

It is also not surprising that many spelling mistakes will be made. For example, if somebody types “elephont”, it won’t be recognised as “elephant”. Users also chat with the bot from different devices: the spelling mistakes made on a physical keyboard may differ from those (and the autocorrections) made on the on-screen keyboard of a mobile device.
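One common mitigation for such misspellings is fuzzy string matching. Below is a rough sketch using Python's standard-library difflib; the vocabulary is a made-up illustration, and real systems would use a much larger dictionary and a tuned similarity cutoff.

```python
# Sketch of catching misspellings with fuzzy matching (stdlib only).
import difflib

# Hypothetical vocabulary the chatbot knows about.
vocabulary = ["elephant", "telephone", "element"]

def correct_spelling(word: str, vocab) -> str:
    # get_close_matches returns candidates whose similarity ratio
    # exceeds the cutoff, best match first.
    matches = difflib.get_close_matches(word, vocab, n=1, cutoff=0.8)
    return matches[0] if matches else word

print(correct_spelling("elephont", vocabulary))  # -> "elephant"
```

The cutoff is a trade-off: too low and unrelated words get "corrected", too high and genuine typos slip through to the fallback response.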

An open source dataset will not capture these nuances particular to a specific region, as open source datasets are often biased in the way the data was collected. For example, text data created or labelled through Amazon’s Mechanical Turk can only be used to good effect on chatbot users who fit the profile of the people allowed to perform the Mechanical Turk task.

Figure 1: Can you read this? Humans can understand a passage made up of words where only the first and last letters are in their correct positions. For computers, it is challenging to figure out what the correct word is, and hence to understand what was written. Source: https://www.ecenglish.com/learnenglish/lessons/can-you-read

5. Conversational threads

Open source training data may give you a set list of questions and answers. However, this does not match how real users are likely to type during a conversation. The human brain tends to jump between conversations and to be effective, your chatbot ideally needs to do the same. For example, the answer to one question might drive the customer to a totally different topic. Without being programmed to recognise uncompleted threads, the chatbot won’t know how to deal with these instances.

To overcome such shortcomings, a developer would need to ensure they design non-linear flows around the data.

Unique data is a valuable asset for a chatbot. It gives the chatbot a competitive edge and differentiates it from the competition. Here’s the conundrum, though: enough data needs to be collected from real chatbot usage to create an effective chatbot, yet the chatbot has to be effective in the first place before people actually start to use it.

Whilst open source training data is useful as a starting point, you need to ensure your chatbot learns quickly. One method is by creating your own “chatbot”. Maluuba, a Microsoft company, did exactly that. The method works by setting up two people in a chat environment, one as the user and the other acting as the computer. You might give them a list of common scenarios for your business; in the case of Maluuba, it was finding the best deal for booking a flight. Using text interactions, they created 1,369 different dialogues for questions around travel planning and formed a comprehensive training data set.

Included within this are what are called frames. In the shortcomings above, we spoke about how users switch between conversations continually. Maluuba created a new frame every time a switch in conversation was noted. The bot was able to recall a previous part of the conversation and apply that memory to a follow-up question, as opposed to getting confused, and could start accounting for error handling within the conversations.
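The frame idea can be illustrated with a toy data structure: each topic switch opens a new frame rather than overwriting the current state, so earlier frames stay recallable for follow-up questions. This is a hypothetical sketch of the concept, not Maluuba's actual implementation.

```python
# Toy sketch of conversation frames: one frame per topic switch.
# Topics and slot values below are made up for illustration.

class Conversation:
    def __init__(self):
        self.frames = []  # ordered history of topic frames

    def switch_topic(self, topic, slots):
        # A change of topic starts a new frame instead of
        # discarding what was said before.
        self.frames.append({"topic": topic, "slots": slots})

    def recall(self, topic):
        # Follow-up questions can refer back to an earlier frame.
        for frame in reversed(self.frames):
            if frame["topic"] == topic:
                return frame
        return None

chat = Conversation()
chat.switch_topic("flight", {"destination": "Paris", "budget": 500})
chat.switch_topic("hotel", {"city": "Paris"})
# User jumps back: "what was my flight budget again?"
print(chat.recall("flight")["slots"]["budget"])  # -> 500
```

Without the frame history, the hotel topic would have clobbered the flight details and the follow-up question would hit a dead end.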

It’s not only chat data that can be loaded into the bot. Explore other sources of data collected from the company’s customer service, such as emails, telephone calls, transactions or documents. These can give an added edge to any open source training data by providing business context.

In most cases, human intervention is required to create labels for chatbot user intents. For example, somebody would label “Hi, Hello, Hey, Howdy, Hallo and Good morning” as Greetings so that the chatbot can deliver an appropriate response for each of those collectively and negate gaps in the data. Labelling can be a full-time job as new words need to be added into categories once picked up by a chatbot conversation. If somebody decided to use “Good afternoon”, it would need to be manually added to the greetings label for the chatbot to recognise it as a greeting in the future.
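The labelling workflow described above can be sketched very simply: phrases are grouped under a shared label so the bot responds to the category, not the exact string, and new phrases must be added to the label by hand. The phrases and labels below are illustrative.

```python
# Minimal sketch of manual intent labelling.
# Phrases grouped under "greeting" all get the same response.

intent_labels = {
    "greeting": ["hi", "hello", "hey", "howdy", "hallo", "good morning"],
}

def classify(user_input):
    text = user_input.lower().strip()
    for label, phrases in intent_labels.items():
        if text in phrases:
            return label
    return None  # unknown: a gap in the labelled data

# "Good afternoon" is a gap until a human labels it...
print(classify("Good afternoon"))  # -> None
intent_labels["greeting"].append("good afternoon")
print(classify("Good afternoon"))  # -> "greeting"
```

This also shows why labelling can be a full-time job: every new phrasing a real user invents is a gap until someone assigns it to a category.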

Warwick Analytics developed a “human-in-the-loop” method. In this method, interactions are automatically classified and given a certainty score. When the score is low, a human is asked to clarify the label, and that clarification is added to the learning experience. Over time, the need for human intervention should be completely eradicated as the machine has accumulated enough learning to be accurate.
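The routing logic of such a human-in-the-loop step might look like the sketch below. The threshold, labels and certainty scores are hypothetical stand-ins, and `ask_human` is a placeholder for a real labelling interface; this is an illustration of the pattern, not Warwick Analytics' product.

```python
# Sketch of human-in-the-loop routing: low-certainty predictions
# go to a person, and every outcome is fed back into training data.

CERTAINTY_THRESHOLD = 0.7   # hypothetical cut-off
training_data = []          # accumulated (text, label) pairs

def ask_human(text):
    # Placeholder for a real labelling UI; always answers "complaint"
    # here purely so the example runs.
    return "complaint"

def route(text, predicted_label, certainty):
    if certainty >= CERTAINTY_THRESHOLD:
        label = predicted_label          # trust the model
    else:
        label = ask_human(text)          # escalate to a human
    training_data.append((text, label))  # learn from it either way
    return label

print(route("where is my order", "order_status", 0.92))   # auto-labelled
print(route("this is unacceptable!!", "greeting", 0.31))  # human clarifies
```

As the training data grows, fewer interactions fall below the threshold, which is the mechanism by which human intervention is gradually phased out.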

Active learning is another method similar to “human-in-the-loop”. A critical part of machine learning algorithms is having labelled data, but getting a good model can take thousands of data points. Active learning reduces the number of data points required to train a good model by prioritising the labelling work. The data point with the highest priority is assigned to a human expert for labelling. The main benefits are twofold: first, the human experts provide the most useful information instead of spending countless hours labelling trivial data points; second, the machine learning model used by the chatbot acquires greater classification power with less data.
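The most common way to prioritise the labelling work is uncertainty sampling: hand the human the example the model is least sure about. Below is a rough sketch; the utterances and predicted probabilities are hypothetical model outputs.

```python
# Sketch of uncertainty sampling for active learning.
# Each item pairs an unlabelled utterance with hypothetical
# intent probabilities from the current model.

unlabelled = [
    ("what time do you open", {"opening_hours": 0.95, "pricing": 0.05}),
    ("hiya",                  {"greeting": 0.55, "goodbye": 0.45}),
    ("how much is shipping",  {"pricing": 0.88, "opening_hours": 0.12}),
]

def most_uncertain(pool):
    # Least-confident strategy: the smaller the model's top
    # probability, the more valuable a human label is.
    return min(pool, key=lambda item: max(item[1].values()))

text, probs = most_uncertain(unlabelled)
print(text)  # -> "hiya": the expert labels this one first
```

The two confident examples are left for the model to handle on its own, which is exactly the "don't waste experts on trivial data points" benefit described above.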

The two main classes of models for developing a chatbot are retrieval-based models and generative models. In the retrieval-based model, given a user input, a response is returned from a predefined set. We would know this as natural language processing (NLP). A generative model, on the other hand, does not rely on predefined responses; it learns to respond using a machine learning methodology known as deep learning.
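A retrieval-based model can be sketched in its crudest form as scoring the user's input against stored questions and returning the best canned answer. Real systems use far better similarity measures (TF-IDF, embeddings); the Q&A pairs and word-overlap scoring here are purely illustrative.

```python
# Crude retrieval-based responder: pick the stored question with
# the largest word overlap and return its predefined answer.

qa_pairs = {
    "what are your opening times": "We are open 9am-5pm, Monday to Friday.",
    "how much does the product cost": "The product costs $20.",
}

def retrieve(user_input):
    words = {w.strip("?!.,") for w in user_input.lower().split()}

    def overlap(question):
        return len(words & set(question.split()))

    best = max(qa_pairs, key=overlap)
    if overlap(best) == 0:
        return "Sorry, I don't understand."  # nothing matched at all
    return qa_pairs[best]

print(retrieve("When are your opening times?"))
# -> "We are open 9am-5pm, Monday to Friday."
```

The response is always one of the predefined answers, which is both the strength (no nonsense output) and the weakness (templated, unnatural conversations) discussed next.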

NLP models are less likely to make mistakes and have errors because much of their workings has been defined by the developer. However, the responses are templated, and conversations can appear unnatural. A generative bot is more vulnerable to errors but can adapt on its own to the demands and questions of the customer. The choice of model is usually determined by the likely complexity of the chats.

Figure 2: Performance of machine learning methods given the amount of data (an illustration). Deep learning models tend to be less accurate when the amount of training data is small (roughly fewer than a few thousand data points). Source: https://machinelearningmastery.com/improve-deep-learning-performance/

If the chatbot application is expected to operate in a closed-domain environment, where most of your customer questions are within a certain context, e.g. about the price of a specific product or business opening times, there isn’t much value in building an intricate generative model. You will be better served replying to expected questions with templated answers.

The deep learning model is good for conversational or human-like chatbots. Because these models can learn on the fly, customers can even banter with them in a way that they couldn’t with predefined models. The drawbacks of deep learning models are, however, lower performance than NLP models when training data is scarce (Figure 2), and heavy computational resource requirements during training.

Having explored the pros and cons of retrieval-based and generative chatbots, companies will usually start with a more predefined retrieval model before they can get close to something truly conversational. Even commercial systems like Amazon Alexa still need selected skills set up, which are, in essence, predefined rules.