Last summer, my colleague Tony Vlismas warned us about conflating AI data with commodities, highlighting that data is a means to creating the commodity, not the commodity itself. That’s not to say you don’t need “good data.”
“Artificial intelligence is a bankable buzzword, but its use pales in comparison to the most recent hype-laden phrase: data. Data is a big part of how AI works in all of its implementations, and given data’s importance, many opinion pieces and quick-hit news stories try to simplify what exactly it is we mean when we say data, treating it as an obtuse object rather than the complicated and multifaceted element of AI it actually is. This does a disservice to the varying types of data different AI implementations require for success.”
Many platforms and solutions across a myriad of industries boast about their data and how it can help you or your business, but “data” is a blanket term that often covers heaps of unqualified information.
How you qualify and use data is just as important as the data itself.
Not A Commodity
In “Data Is Not The New Oil,” authors Jocelyn Goldfein and Ivy Nguyen of Zetta Venture Partners remark on the dangers of basing investment decisions on quantity of data alone:
“In all the enthusiasm for big data, it’s easy to lose sight of the fact that all data is not created equal. Startups and large corporations alike boast about the volume of data they’ve amassed, ranging from terabytes of data to quantities surpassing all of the information contained in the Library of Congress. Quantity alone does not make a ‘data moat.’”
They go on to define what a successful data moat entails, including the accessibility of the data, the time it takes to make it useful, its cost, uniqueness, breadth, and perishability (data durability).
“A truly defensible data moat doesn’t come from just amassing the largest volume of data. The best data moats are tied to a particular problem domain, in which unique, fresh, data compounds in value as it solves problems for customers.”
Bad Data = Useless Technology
In “If Your Data Is Bad, Your Machine Learning Tools Are Useless,” Thomas C. Redman likewise warns against the perils of poor data quality.
“To properly train a predictive model, historical data must meet exceptionally broad and high quality standards. First, the data must be right: It must be correct, properly labeled, de-deduped, and so forth. But you must also have the right data — lots of unbiased data, over the entire range of inputs for which one aims to develop the predictive model. Most data quality work focuses on one criterion or the other, but for machine learning, you must work on both simultaneously.”
According to Redman, a vast amount of today’s data fails to meet the standard of quality that would make it useful in practice, not necessarily because the data is bad per se, but because it was never checked and cleansed to remove the consequences of expectations, poor calibration, and human bias and error: “Increasingly-complex problems demand not just more data, but more diverse, comprehensive data.”
Redman’s recommendations for a quality review program are comprehensive, and those curious should read the full article before implementing them in their own organizations and efforts.
None of these recommendations is a quick fix, but those using data to inform their AI and machine learning efforts need to make sure their tools are fuelled by useful data. This is especially true in customer support and cognitive care, where users often approach automated systems with negative expectations already in place. There is little room for frustration or error.
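To make Redman’s two criteria a little more concrete, here is a minimal sketch of what a first-pass check might look like. It is not Redman’s method, only an illustration: the `audit` function, the field names (`wait_minutes`, `label`), the allowed labels, and the expected input range are all hypothetical, and a real review program would go much further.

```python
from collections import Counter


def audit(records, label_field, allowed_labels, range_field, expected_range, bins=4):
    """Return a small report on duplicates, labels, and input coverage.

    All field names and thresholds are illustrative assumptions.
    """
    # "The data must be right": count exact duplicates and bad labels.
    seen = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(count - 1 for count in seen.values())
    bad_labels = sum(1 for r in records if r.get(label_field) not in allowed_labels)

    # "The right data": split the expected input range into equal bins
    # and note which bins have no examples at all.
    low, high = expected_range
    width = (high - low) / bins
    covered = set()
    for r in records:
        value = r.get(range_field)
        if value is None or not (low <= value <= high):
            continue
        covered.add(min(int((value - low) / width), bins - 1))
    empty_bins = [i for i in range(bins) if i not in covered]

    # Label counts give a crude first look at imbalance.
    label_counts = Counter(r.get(label_field) for r in records)

    return {
        "duplicates": duplicates,
        "bad_labels": bad_labels,
        "empty_bins": empty_bins,
        "label_counts": dict(label_counts),
    }


# Hypothetical support-ticket records, invented for this example.
records = [
    {"wait_minutes": 2, "label": "resolved"},
    {"wait_minutes": 2, "label": "resolved"},   # exact duplicate
    {"wait_minutes": 5, "label": "escalated"},
    {"wait_minutes": 12, "label": None},        # missing label
    {"wait_minutes": 14, "label": "resolved"},
]

report = audit(
    records,
    label_field="label",
    allowed_labels={"resolved", "escalated"},
    range_field="wait_minutes",
    expected_range=(0, 60),
)
print(report)
```

In this made-up sample, every record falls in the first quarter of the expected range, so a model trained on it would have seen nothing about long waits. That is the kind of gap that a count of rows alone will never reveal.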