Wysdom shines in top NLU benchmarking study

At Wysdom, two things never stop – experimentation and benchmarking.

Experimentation ensures that we continually challenge the way we do things and foster a playground of trials where we constantly develop, adapt, and adopt cutting-edge techniques that will allow enterprises to serve engaging interactions with their end customers using intelligent automation.

Benchmarking introduces rigour and discipline. As a team, we push hard to meet the KPIs that an experiment seeks to serve. After all, what gets measured is what gets optimized.

We found an NLU benchmarking test using data that is way out of our wheelhouse

A benchmarking exercise led by Nguyen Trong Canh that compares the leading NLU engines in the industry recently caught our attention. The exercise uses data aggregated from open question-answer datasets in Ask Ubuntu, Stack Exchange and a German public transit chatbot to create 4 distinct corpora for testing. (The summary of the datasets can be found here.)

Given that these datasets are distinctly different from the industries Wysdom usually deals with, the Wysdom team geared up with excitement to complete the benchmarking exercise and test our own NLU engine against the data. After all, recent enhancements to our multi-stage NLU pipeline allow us to use a combination of statistical approaches, boosting, and deep learning engines, and give us the ability to automatically detect and trash garbage utterances, identify and respond to small talk, and more.

F1 scores: A measure of accuracy

Things like intent classification and entity extraction are critical components of a Natural Language Understanding (NLU) system in any bot platform, so ensuring accuracy is the most important goal.

An F-score is a measure of a test’s accuracy. It considers both the precision and the recall of the test to compute the score, where precision is the number of correct positive results divided by the number of all positive results returned by the classifier, and recall is the number of correct positive results divided by the number of all samples that should have been identified as positive. The F1 score is the harmonic mean of precision and recall, F1 = 2 × (precision × recall) / (precision + recall). It reaches its best value at 1 (perfect precision and recall) and its worst at 0.
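To make the definitions concrete, here is a minimal Python sketch that computes precision, recall and F1 for a single intent from raw counts. F1 is calculated as the harmonic mean, 2 × precision × recall / (precision + recall). The function name and the counts are hypothetical and chosen only for illustration. They are not taken from the benchmark or from Wysdom's engine.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f1) for one class from raw counts."""
    predicted_positive = true_positives + false_positives  # everything the classifier flagged
    actual_positive = true_positives + false_negatives     # everything it should have flagged

    # Guard against division by zero when a class is never predicted or never present.
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 8 utterances correctly tagged with an intent,
# 2 wrongly tagged with it, and 4 that should have been tagged but were missed.
precision, recall, f1 = precision_recall_f1(8, 2, 4)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With these counts, precision is 0.8 and recall is about 0.667, so F1 comes out at about 0.727. That is slightly below the simple average of the two (about 0.733), because the harmonic mean penalizes the gap between precision and recall.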

Wysdom shines

As you can see from the results comparison, Wysdom offers one of the best NLU classification performances in the industry.

F1-scores for intent classification for each corpus:

While a good F1 score alone does not guarantee an effective bot, a poor F1 score guarantees an ineffective one.

We should also note that the Wysdom Exchange, which provides pretrained models and data across specialized enterprise verticals such as telecommunications, banking, insurance and more, was not at play for this benchmarking exercise given the nature of the benchmarking data. The researchers compared F1 scores, a well-established metric from machine learning and information retrieval that is commonly used to measure the performance of NLU systems.

Wysdom’s NLU is among the best in the world

This exercise validated that Wysdom’s NLU is right up there with the best and, when combined with prebuilt, industry-specific knowledge from the Wysdom Exchange, outperforms even the biggest players in AI.

Interested in learning more about the Wysdom Exchange? Request a demo to see our Conversational AI in action.



© Copyright Wysdom 2021. All rights reserved.
