6 key metrics every chatbot team should be measuring on any chatbot platform
Virtual agents in the enterprise can increase revenue, save money and improve customer experience – but only when done right. Doing it right is difficult, and unfortunately rare: many virtual agents simply don't deliver a high-quality experience.
Virtual agents, like all AI, are never finished. They can always learn something new whether you launched them yesterday or 5 years ago. Knowing where a virtual agent is performing well, and not so well, is the job of a dedicated VA analytics suite.
Wysdom’s analytics (Operations Center) works with any chatbot platform to analyze millions of conversations to determine the ROI on every chatbot improvement.
Ever wonder how your chatbot compares to the industry?
Wysdom has compiled 6 key performance metrics to measure the experience and effectiveness of your bot program, based on a survey of large B2C chatbots. Below we break down these key metrics, drawn from bots that deliver a great experience and are also effective.
What to measure
The primary role of a virtual agent analytics suite is to measure the effectiveness and experience of a virtual agent and identify places to improve. Bot Experience Score (BES) and Bot Automation Score (BAS) are the 2 key metrics; they are independent of one another and need to be looked at individually:
- BES: Measuring the customer experience will tell you how your customers are experiencing the bot, which is crucial for long term success. You can deliver an effective solution, but without a good experience customers will not come back.
- BAS: Measuring the effectiveness of the bot brings into focus all of the business objectives other than customer experience. Driving revenue and savings are the ultimate output of an effective bot.
There are many other useful metrics to track, such as NLU rate, bot repetition, false positives and the positive feedback rate, which are explained in more detail below.
Experience managing bots
Wysdom manages more enterprise virtual agents than any other company, and through this experience we have found the best way to measure both experience and effectiveness. A good VA analytics suite will deliver many KPIs, but these are still the 2 most important if you want to understand the overall quality of your bot.
One key aspect of bot measurements (on any chatbot platform) is that they need to be universal and must work on any use case that your bot may tackle. Whether you are working in customer service, revenue generation, employee services or any other specialty use case, in text or voice, the key metrics must be comparable across all bots so you can benchmark many bots against each other.
Let’s look at 6 key measurements that will help you understand the performance of any bot.
Bot Experience Score (BES)
We have found customer experience to be the number one goal of a mature bot program. All businesses want to attract and retain customers and if the bot is delivering a bad experience then those goals will be put at risk.
Measuring the experience can be a challenge. Explicit feedback is rare and is always biased to the negative. After many years of testing formulas, Wysdom has settled on a standard Bot Experience Score that can be used on any bot. This is purely measuring the experience and not the effectiveness of the bot.
The score starts with all conversations in a given period of time and reduces the score for negative experience signals that are common in all virtual agents. The negative signals used in the BES are:
- Bot repetition: when the bot repeats itself for any reason during a conversation
- Customer paraphrase: when the customer uses a similar query twice or more in a conversation
- The conversation is abandoned by the customer mid-journey and did not reach a bot endpoint
- Negative sentiment is detected using an AI-based sentiment model
- Profanity is present in the conversation
- Negative explicit feedback is received in the conversation
- The customer used the word “agent” (or similar) more than once in a conversation. Note that using “agent” once and being directly escalated is not generally a bad experience.
A conversation with no negative signals receives a score of 100; with 1 negative signal, 75; with 2 negative signals, 50; and with 3 or more, 0. All conversations are scored and the average is the BES.
Using this formula across all conversations in a given period of time results in a very clear customer experience score. Providing this BES by customer contact reason makes it actionable.
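The scoring rule above can be sketched in a few lines of code. This is a hypothetical illustration of the formula as described, not Wysdom's implementation; the function names and the representation of a conversation as a count of negative signals are assumptions.

```python
# Hypothetical sketch of the BES formula described above.
# Each conversation is represented by how many negative signals
# (repetition, paraphrase, abandonment, etc.) were detected in it.

def conversation_bes(negative_signals: int) -> int:
    """Score one conversation: 100 with no negative signals,
    75 with one, 50 with two, 0 with three or more."""
    if negative_signals == 0:
        return 100
    if negative_signals == 1:
        return 75
    if negative_signals == 2:
        return 50
    return 0

def bot_experience_score(signal_counts: list[int]) -> float:
    """Average the per-conversation scores over a period."""
    scores = [conversation_bes(n) for n in signal_counts]
    return sum(scores) / len(scores)

# e.g. four conversations with 0, 1, 2 and 3 negative signals:
print(bot_experience_score([0, 1, 2, 3]))  # -> 56.25
```

Because each conversation is scored independently, the same averaging can be run over any slice of the data, such as a single contact reason.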
Bot Automation Score (BAS)
The next most important metric for any bot program is effectiveness – how often the bot can satisfy the customer’s needs without the need for human intervention. We call this the Bot Automation Score.
This is a binary metric: the conversation was either fully automated or it wasn't. We are not attempting to measure the experience here, just how effective the bot is at completing tasks.
In this case, we have found a formula using negative signals is also the most accurate. This delivers a very conservative view of bot effectiveness but it also makes improvements clear.
The score starts with all conversations in a given period of time and is reduced based on the negative signals. The negative signals used in the BAS are:
- The customer did not reach a bot endpoint (one of the final steps in a bot journey)
- The customer escalated to a live agent for any reason
- Any type of explicit negative feedback was received
- A false positive was recorded
- The customer requested an agent using any “agent”-like word, but was not escalated
- “Bad Containment”: The conversation was not escalated but the topic was one that we know the bot is not effective at automating
If a conversation has any negative signal from the list above it is considered not to be automated. Using this formula across all conversations in a given period of time results in a very clear measure of automation. This can also be used by contact reasons to deliver more actionable information.
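The binary rule above can be sketched as follows. This is a hypothetical illustration under the assumption that each conversation is tagged with a set of signal labels; the label names and function names are invented for the example and are not part of any specific platform.

```python
# Hypothetical sketch of the BAS formula: a conversation counts as
# automated only if none of the listed negative signals occurred.

BLOCKING_SIGNALS = {
    "no_endpoint",                 # never reached a bot endpoint
    "escalated",                   # handed off to a live agent
    "negative_feedback",           # explicit negative feedback
    "false_positive",              # misclassified with high confidence
    "agent_request_unescalated",   # asked for an agent, not escalated
    "bad_containment",             # topic the bot cannot automate
}

def is_automated(signals: set[str]) -> bool:
    """A conversation is automated if no blocking signal is present."""
    return not (signals & BLOCKING_SIGNALS)

def bot_automation_score(conversations: list[set[str]]) -> float:
    """Fraction of conversations that were fully automated."""
    automated = sum(is_automated(s) for s in conversations)
    return automated / len(conversations)

convs = [set(), {"escalated"}, {"no_endpoint", "negative_feedback"}, set()]
print(bot_automation_score(convs))  # -> 0.5
```

Counting any single negative signal as a failure is what makes this a deliberately conservative measure, as the text notes.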
NLU (Natural-Language Understanding) Rate
The NLU rate is a common metric in the virtual agent industry. It is simply the rate at which a classifier can match an utterance to a known intent at a given confidence level.
False Positive Rate
The false positive rate measures how often an utterance is classified incorrectly even though the model assigns it a high confidence level. This is a difficult rate to measure and relies on an independent parallel NLP model.
Bot Repetition Rate
Bot repetition is used in the BES but is also a good independent measure for all bots. A virtual agent theoretically should never repeat itself but this still happens regularly and identifying it will lead to quick improvements.
Positive Feedback Rate
In almost all deployments, negative feedback outnumbers positive feedback several times over. The positive feedback rate divides positive feedback by the total feedback received (positive, negative and neutral) to produce a more useful rate.
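The calculation is straightforward; a minimal sketch, with an assumed guard for periods with no feedback at all:

```python
def positive_feedback_rate(positive: int, negative: int, neutral: int = 0) -> float:
    """Positive feedback divided by all feedback received
    (positive + negative + neutral) in the period."""
    total = positive + negative + neutral
    return positive / total if total else 0.0

# e.g. 30 positive, 60 negative, 10 neutral responses:
print(positive_feedback_rate(30, 60, 10))  # -> 0.3
```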
BES & BAS by topic
Anyone experienced in virtual agent management knows that actionable insights are key to continuous improvement. High-level metrics don’t provide the information necessary to improve a bot.
Wysdom has developed a system to provide BES and BAS by contact reason. All conversations between customer/bot and customer/live-agent are analyzed and clustered into groups of contact reasons or topics. There are typically between 100 and 300 topics, depending on the use case of the bot.
The BES and BAS for each topic can then be determined with a trend over time to provide very clear actionable insights. A topic with a declining BES is delivering a worse experience to customers today than it did in the past. A topic with a declining BAS is delivering less revenue and/or savings than it did in the past. These metrics surface the topics that need further investigation, which can be conducted with a VA analytics tool.
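The per-topic breakdown amounts to grouping scored conversations by topic and period before averaging. A minimal sketch, assuming each record is a (topic, period, score) tuple; the field layout and function name are invented for illustration:

```python
from collections import defaultdict

# Hypothetical sketch: average a per-conversation score (BES or BAS)
# by topic and time period so trends can be compared across periods.
def scores_by_topic(records):
    buckets = defaultdict(list)
    for topic, period, score in records:
        buckets[(topic, period)].append(score)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

records = [
    ("billing", "2023-Q1", 75), ("billing", "2023-Q1", 100),
    ("billing", "2023-Q2", 50), ("billing", "2023-Q2", 0),
]
trend = scores_by_topic(records)
# The "billing" topic's score fell from 87.5 to 25.0 quarter over
# quarter, flagging it for further investigation.
print(trend[("billing", "2023-Q1")], trend[("billing", "2023-Q2")])
```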
What are the BES and BAS of your bot?
Are you wondering how your bots stack up? With these metrics, you will be able to compare your bots to others and see if they are making customers happy, driving revenue and savings, or if they need some attention.
All of the events (signals) in the BES and BAS are simple to collect on any mature chatbot platform, and within a few days you can deliver your scores, both overall and by contact reason, giving you a clear picture of your bot quality and what needs to be improved.
Wysdom has developed its own proprietary analytics suite that can extract the conversation stream and events from any virtual agent platform. We have deployed this for many enterprise VA programs driving their long-term success.
Wysdom can measure your bot scores
If you have any questions about measuring your virtual agent quality please reach out to Wysdom. All we do is make chatbots great and we’d love to help you do the same.