6 key metrics every chatbot team should be measuring
Virtual agents are never finished. Whether you launched them yesterday or five years ago, they can always learn something new. Knowing where a virtual agent is performing well, and where it isn't, is the job of a dedicated VA analytics suite.
Based on a comprehensive survey of enterprise B2C chatbots, Wysdom has compiled the 6 key performance metrics you need to know to correctly measure the effectiveness of your bot.
1. Bot Experience Score
Measuring a customer’s experience when they’re interacting with a chatbot can be a challenge. Explicit feedback is rare and typically biased toward the negative, while surveys provide a very narrow view of actual engagement. This is for a couple of reasons: participation tends to be low, so you’re often working with a small sample size, and those who do participate aren’t always authentic in their responses (if they’re incentivized, for example), thereby introducing unintended bias.
After many years of testing performance metric models, Wysdom has settled on a standard Bot Experience Score (BES) that can be used on any bot. This takes into account all customer conversations, to produce an unbiased score and a more accurate view of overall customer satisfaction. This score is purely measuring the experience and not the effectiveness of the bot.
The BES is a number that starts with a score of 100 and goes down in points every time there is a negative engagement signal within the bot. The negative signals used in the BES are:
- Bot repetition occurred: the bot repeated itself for any reason during the conversation.
- Customer paraphrase occurred: the customer used a similar query twice or more in the conversation.
- Abandonment occurred: the customer left the conversation mid-journey and did not reach a bot endpoint.
- Negative sentiment was detected, using an AI-based sentiment model.
- Negative explicit feedback was received in the conversation.
- Profanity was present in the conversation.
- The customer used the word “agent” (or similar) more than once in a conversation. Note that using “agent” once and being directly escalated is not generally a bad experience.
The BES is based on an analysis of all conversations over a given period of time and reduces the score for the negative experience signals that are common to all virtual agents. A conversation with one negative signal receives a score of 75; with two negative signals, 50; and with three or more, 0. Every conversation is scored, and the average across conversations is the BES.
Using this formula across all conversations in a given period of time results in a very clear customer experience score. Breaking the Bot Experience Score down by customer contact reason makes it actionable.
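The per-conversation scoring described above can be sketched in a few lines of Python. The function names and the input representation (a count of negative signals per conversation) are my own assumptions; the mapping itself follows the article's formula:

```python
def conversation_bes(signal_count):
    """Score a single conversation: 0 signals -> 100, 1 -> 75, 2 -> 50, 3+ -> 0."""
    if signal_count >= 3:
        return 0
    return 100 - 25 * signal_count

def bot_experience_score(signal_counts):
    """Average the per-conversation scores over a reporting period.

    signal_counts: one negative-signal count per conversation.
    """
    scores = [conversation_bes(n) for n in signal_counts]
    return sum(scores) / len(scores)
```

For example, a period with four conversations carrying 0, 1, 2, and 3 negative signals would score (100 + 75 + 50 + 0) / 4 = 56.25.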
2. Bot Automation Score
The next most important metric for any bot program is how often the bot can satisfy the customer’s needs without the need for escalation to a live agent. We call this the Bot Automation Score (BAS).
The BAS is a binary metric: each conversation was either fully automated or it wasn’t. The BAS is not a measure of the experience itself, but of how effective the bot is at completing tasks.
In our experience, having analyzed the performance of dozens of bots, the most accurate measure of automation is derived from a formula that looks at negative signals.
The score starts with all conversations in a given period of time and is reduced based on the negative signals. The negative signals used in the BAS are:
- The customer did not reach a bot endpoint, which is one of the final steps in a bot journey.
- The customer escalated to a live agent for any reason.
- The customer submitted any type of explicit negative feedback.
- The bot recorded a false positive.
- The customer requested an agent using any “agent”-like word, but wasn’t escalated to an agent.
- “Bad containment” occurred: the conversation was not escalated, but the topic was one the bot is known to be ineffective at automating.
If a conversation has any negative signal from the list above it is considered not to be automated. Using this formula across all conversations in a given period of time results in a very clear measure of automation. This can also be used by contact reasons to deliver more actionable information.
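A minimal sketch of the BAS calculation, assuming each conversation is represented as a list of signal names. The signal identifiers below are hypothetical labels for the bullets above; the binary any-negative-signal rule is the article's:

```python
# Hypothetical identifiers for the negative signals listed above.
BAS_NEGATIVE_SIGNALS = {
    "no_endpoint_reached",
    "escalated_to_agent",
    "explicit_negative_feedback",
    "false_positive",
    "unfulfilled_agent_request",
    "bad_containment",
}

def is_automated(conversation_signals):
    # A conversation counts as automated only if it has none of the negative signals.
    return not any(s in BAS_NEGATIVE_SIGNALS for s in conversation_signals)

def bot_automation_score(conversations):
    """Fraction of conversations that were fully automated in the period."""
    automated = sum(is_automated(c) for c in conversations)
    return automated / len(conversations)
```

Grouping the input conversations by contact reason before calling `bot_automation_score` yields the per-reason breakdown mentioned above.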
Not only does this negative-signal approach deliver a very conservative view of bot effectiveness, it also makes it obvious which actions can be taken to increase overall automation rates.
3. NLU (Natural-Language Understanding) Rate
The NLU rate is a common metric in the virtual agent industry. It is simply the rate at which the classifier matches an utterance to a known intent at a given confidence level.
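As a sketch, assuming the classifier returns an (intent, confidence) pair per utterance (the function name and threshold value here are illustrative, not from the article):

```python
def nlu_rate(classifications, threshold=0.7):
    """Fraction of utterances matched to a known intent at or above the threshold.

    classifications: list of (intent, confidence) pairs; intent is None
    when the classifier found no match at all.
    """
    matched = sum(
        1 for intent, confidence in classifications
        if intent is not None and confidence >= threshold
    )
    return matched / len(classifications)
```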
4. False Positive Rate
The false positive rate measures how often an utterance is classified incorrectly even though the model assigns it a high confidence level. This is a difficult rate to measure and relies on an independent, parallel NLP model; however, lower false positive rates typically mean that the natural language understanding (NLU) setup for a chatbot is of good quality.
5. Bot Repetition Rate
Bot repetition is used in the Bot Experience Score (BES) but is also a good independent measure for any bot. A virtual agent theoretically should never repeat itself, but this still happens regularly, and identifying it will lead to quick improvements.
6. Positive Feedback Rate
Negative feedback almost always outnumbers positive feedback by a multiple. The positive feedback rate divides positive feedback by the total amount of feedback (positive, negative, and neutral) to produce a more useful rate.
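The ratio above is straightforward; a small sketch (function name assumed) that also guards against the no-feedback case:

```python
def positive_feedback_rate(positive, negative, neutral=0):
    """Positive feedback as a fraction of all feedback received."""
    total = positive + negative + neutral
    # Avoid dividing by zero when no feedback was received in the period.
    return positive / total if total else 0.0
```

For example, 20 positive, 60 negative, and 20 neutral responses give a positive feedback rate of 0.2.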
One key aspect of bot measurements (on any chatbot platform) is that they need to be universal and must work on any use case that your bot may tackle. Whether you are working in customer service, revenue generation, employee services or any other special use case, in text or voice, the key metrics must be comparable across all bots so you can benchmark many bots against each other.
Getting to the heart of bot performance
A good VA analytics suite will deliver many KPIs, but BES and BAS are still the 2 most important if you want to understand the overall quality of your bot. Many bot platforms are built around bot design and development but lack the level of analytics that can provide a true measure of bot performance. That’s where bot analytics software can provide the most accurate measure of performance. A mature chatbot platform will easily expose all the events (signals) required to produce the scores, and a bot analytics solution will deliver the clearest picture of bot quality.
Curious to know how your bot stacks up?
The BES and the BAS are two simple but important metrics that let you compare your bot’s performance against others.
Wysdom can measure your bot scores
If you have any questions about measuring your virtual agent quality please reach out to Wysdom. All we do is make chatbots great and we’d love to help you do the same.