Artificial intelligence is a bankable buzzword, but its use pales in comparison to the most recent hype-laden term: data. Data underpins every AI implementation, and given its importance, many opinion pieces and quick-hit news stories try to simplify what exactly we mean when we say data, treating it as a monolithic object rather than the complicated and multifaceted element of AI it actually is. This does a disservice to the varying types of data different AI implementations require for success.
Mighty AI’s Matt Bencke touches on all of this in a recent VentureBeat column, where he points out that data (and more specifically in the AI world, training data) is not a commodity, and that those who conflate the two are in for a rude surprise. Bencke gives plenty of examples to back up his claim, and several are very relevant to digital care, where the same conclusions hold true.
Training data is not fungible.
“Consider the datasets we need to build autonomous navigation systems. A stoplight is a stoplight, so for a car to recognize one, you may think all you need is a series of positive and negative images to train a classifier. It might as well be the Not Hotdog app from HBO’s Silicon Valley. Except it’s not that simple. Stoplights don’t look the same in every country. Not to mention the question of how the data was captured. What type of camera did the car use? Where was it mounted? What’s the angle of the image? What’s the angle of capture, and is it (partially) obstructed? Was it a sunny day or a rainy night? Something as seemingly straightforward as labeling a stoplight is actually quite complex.”
This is true for digital care as well. Think about all the minute differences between Canadian English and UK dialects, or even the differences in language across a country as large as the United States. A user could ask a support question very differently depending on what part of the country they live in, and might have 12 different phrases to describe a service or how it isn’t working properly.
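A toy sketch makes the point concrete (the phrases and the `classify` helper here are hypothetical, not from any real system): a naive matcher built only from US-English training phrases can whiff entirely on a UK phrasing of the very same complaint.

```python
# Hypothetical training examples for one support intent,
# gathered only from US-English users.
us_training_phrases = [
    "my cell service keeps dropping",
    "my phone has no bars",
    "calls keep cutting out",
]

# Filler words we ignore when matching (illustrative only).
STOPWORDS = {"my", "keeps", "keep", "no", "has", "is"}

def classify(utterance: str, known_phrases) -> str:
    # Naive matcher: flag the intent only if the utterance shares
    # a content word with some known training phrase.
    words = set(utterance.lower().split()) - STOPWORDS
    for phrase in known_phrases:
        if words & (set(phrase.split()) - STOPWORDS):
            return "connectivity_issue"
    return "unknown"

# A US phrasing overlaps the training vocabulary and is caught:
classify("my cell signal keeps dropping", us_training_phrases)   # "connectivity_issue"

# A UK phrasing of the same problem shares no content words and slips through:
classify("my mobile reception is rubbish", us_training_phrases)  # "unknown"
```

The failure isn’t in the matching logic; it’s in the training data, which encodes one dialect’s vocabulary and nothing else — the non-fungibility Bencke describes.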
And that’s just one language; moving to other languages presents its own set of challenges. As Bencke points out in his column, a commodity, like gold for instance, is supposed to be interchangeable, but training data, especially in digital care, offers no such 1:1 interchangeability.
Is data openly traded?
Again, from Bencke:
“There is no open market for training data. I suspect there never will be because many organizations closely guard data as premium among their intellectual property. Let’s stick with our autonomous driving example. Companies in this industry are in a race to get to Level 4 autonomy, where cars drive on their own. It’s not likely the automakers will share their proprietary data in the midst of competition this fierce. Nor will banks, insurance providers, ecommerce merchants, advertisers, or, given the choice, many of the rest of us.”
Count digital care in this category as well, not only because companies in the space compete with one another, but because of the requirements of their customer base. Sophisticated technology manufacturers and international telecoms are in no way keen to have their data, which captures the intricacies of their businesses and customers, commoditized and shared. Most would pay more to ensure that doesn’t happen, or would choose a different partner based on how it handles their most sensitive information.
Bencke touches on more points that relate to these two – specifically the lack of standards in training data – but the main argument stands. Data, and training data in particular, is not the commodity it’s often presented as in AI, and perhaps at this point we can’t define what the true commodity in the space is. All we know for certain is that data is a means of creating the commodity, not the commodity itself.