Six barriers to AI adoption – and what enterprises can do about them


07.11.23

While political leaders debate Terminator-esque AI scenarios, enterprises are more concerned about their AI initiatives abruptly terminating: around half of AI initiatives fail between pilot and production. Through our conversations with practitioners and senior business buyers from some of the largest enterprises in the world, as well as a range of entrepreneurs looking to solve these issues, we identified six key challenges that enterprises encounter when driving AI initiatives, along with potential mitigants.

The TL;DR

The Problem: Enterprises encounter data quality issues at every stage of the data lifecycle, from collection and transformation to storage, tracking and monitoring. As a result, data scientists end up spending up to 80% of their time bringing data quality up to scratch.

The Solution: Address these issues as early in the data lifecycle as possible, which starts with creating and collecting high-quality data. For example, a data set describing the temperature of an engine over time is only as good as the sensor used to record it; if it later emerges that the readings were inaccurate, merely “cleaning” the data won’t make it reliable. This is where early stage companies such as Snowplow enable enterprises to create high-quality, purpose-built data sets for their AI models from the very outset.

The Problem 2.1: Enterprises are worried about LLMs inadvertently leaking confidential data, and falling afoul of GDPR.

The Problem 2.2: Enterprises are concerned about accidentally leveraging copyrighted information through GenAI solutions, and getting embroiled in lawsuits.

The Solution: Technological solutions range from Patronus AI’s EnterprisePII to help enterprises test whether their LLMs detect confidential information typically found in business documents (e.g. meeting notes, commercial contracts), to Lakera’s tool to prevent PII leakage. Non-technological solutions include working with vendors (e.g. Microsoft, Adobe) who have stated that they will assume the legal risks if their GenAI customers are sued for copyright infringement.

Tip for early stage companies: You don’t have to go down the route of offering indemnity – but it helps to ensure your models are built on data you have the legal right to use, and to give your enterprise customers that reassurance.

The Problem: LLMs are hallucinating, and this is undermining trust amongst users. Particularly in healthcare, misleading information could have life-altering consequences, making LLM adoption slower in highly regulated industries.

The Solution: Currently enterprises are relying on Retrieval Augmented Generation or RAG, which involves augmenting an LLM’s knowledge with internal company data to make it more context-aware and give relevant answers, and “Chain of Thought” prompting (which breaks down a problem into a series of intermediate steps). Additionally, researchers are experimenting with new approaches, such as Autogen (we discuss it in detail later), which we believe could address the problem of hallucination.

The Problem: Enterprises struggle to get comfortable with the reliability of evaluation metrics (“how do you evaluate the evaluation metrics?”) and are afraid of putting too much faith in them (the “moral hazard” problem).

The Solution: Focus on domain-specific evaluation metrics that look at real-world use cases – for instance, Patronus AI’s automated AI evaluation solutions can auto-generate novel adversarial testing sets at scale to find all the edge cases where an enterprise’s models fail.

The Problem: It is getting increasingly difficult for enterprises to estimate the costs of AI initiatives, especially the ballooning inference costs. It is also difficult to define benefits from revenue-generating AI initiatives (e.g. the tech division of a bank internally builds an AI model to help relationship managers identify the right financial products to cross-sell, but the head of commercial banking attributes revenues to the relationship manager “doing his job” rather than the AI tool.)

The Solution: Technological solutions range from TitanML’s Takeoff Inference Server (to reduce inference costs) to NannyML’s solution for measuring the business impact of AI models and tying model performance to monetary or business outcomes (to establish RoI). Strategic solutions include “Buy Now, Build Later”, where enterprises deploying AI in new ways first “buy” AI solutions so they can experiment with non-critical use cases (and less sensitive data), avoiding the chunky upfront investment of the “build” approach while retaining the flexibility to build once there is greater comfort with the AI solution.

The Problem: There is a widespread AI/ML skills shortage, and resistance to AI adoption given issues with safety and reliability.

The Solution: AI companies such as MindsDB are helping enterprises overcome the AI/ML skills shortage by helping software developers rapidly ship AI/ML products. By addressing problems around hallucination, evaluation and explainability, and by demonstrating real value to end users, enterprises can lower their employees’ resistance to adopting AI solutions more broadly.

Here’s more detail on what we learned:


Problem #1: Data quality issues

The Problem: Enterprises encounter data quality issues at every stage of the data lifecycle, from collection and transformation to storage, tracking and monitoring. As a result, many data scientists end up spending up to 80% of their time bringing data quality up to scratch. Even as myriad data-related solutions emerged, such as synthetic data (essentially artificially generated data to overcome a lack of data or data privacy concerns), they came with a new set of challenges. For instance, synthetic data could potentially lead to model collapse, where models forget the true underlying data distribution and give less diverse outputs. Given concerns that we may soon run out of high-quality human-generated data to train AI models, a number of senior enterprise buyers we spoke to believe that enterprises with access to human-generated data will be more likely to create high-quality models.

The Solution: Enterprises should address these issues as early in the data lifecycle as possible, which starts with creating and collecting high-quality data. For instance, Snowplow’s Behavioral Data Platform (BDP) enables enterprises to create and operationalise rich, first-party customer behavioral data to fuel advanced data-driven use cases – directly from the company’s data warehouse or data lake in real-time. In this manner, enterprises can access high-quality, accurate, consistent customer behaviour data according to enterprise definitions and in a format that is suited to their AI models.
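To make “quality at source” concrete, the sketch below validates every incoming event against an agreed schema at collection time, so malformed or implausible readings never reach the warehouse or lake. The schema, field names and thresholds are illustrative assumptions, not Snowplow’s actual tooling, which relies on its own tracker SDKs and schema registry.

```python
# A sketch of "quality at source": validate every event against an agreed
# schema at collection time, before it ever reaches the warehouse or lake.
# The schema, field names and thresholds are illustrative assumptions.
from jsonschema import ValidationError, validate

ENGINE_TEMP_SCHEMA = {
    "type": "object",
    "properties": {
        "sensor_id": {"type": "string"},
        "recorded_at": {"type": "string"},
        "temperature_c": {"type": "number", "minimum": -50, "maximum": 300},
    },
    "required": ["sensor_id", "recorded_at", "temperature_c"],
}

def collect(event: dict) -> bool:
    """Accept an event only if it conforms to the agreed definition."""
    try:
        validate(instance=event, schema=ENGINE_TEMP_SCHEMA)
        return True   # store as-is: the data is analysis-ready from the outset
    except ValidationError as err:
        print(f"Rejected at source: {err.message}")
        return False  # fix the sensor/tracker instead of "cleaning" later

collect({"sensor_id": "eng-7", "recorded_at": "2023-11-07T10:00:00Z", "temperature_c": 92.4})
collect({"sensor_id": "eng-7", "recorded_at": "2023-11-07T10:05:00Z", "temperature_c": 9999})  # implausible reading
```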


“Data is not like oil: you don’t get good data by mining bad data and then processing it into higher quality material — you have to deliberately create it, with the requisite quality, from scratch.”

YALI SASSOON, Co-Founder & CPO at snowplow.io

Snowplow illustrates the “quality at source” point for customer behavioural data, but this is a common challenge across different data types and formats (e.g. video, image). Additionally, “quality at source” should not be a one-off exercise but a continuous process. To illustrate: while most annotation tools treat data creation as a one-off activity at the beginning of each project, it is important to monitor and analyse the predictions of a model in production and continuously collect more data to improve the model over time – which is where Argilla’s data curation platform enables practitioners to iterate as much as needed. Similarly, other early stage companies such as YData are building automated data profiling, augmentation, cleaning and selection into a continuous flow to improve training data and model performance. As such, superior data quality at source, combined with continuous improvement, is the way forward for enterprises to successfully adopt AI.


Problem #2: Data security and privacy

The Problem 2.1: Enterprises are worried about LLMs inadvertently leaking confidential data, and falling afoul of GDPR.

The Solution 2.1: When using GenAI solutions, there is a risk that the LLM answers using confidential data that the user is not supposed to have access to. To address this issue, startups like Patronus AI have developed solutions such as EnterprisePII to help enterprises test whether their LLMs detect confidential information typically found in business documents like meeting notes, commercial contracts, marketing emails, performance reviews, and more. Typical PII detection models are based on Named Entity Recognition (NER), and only identify Personally Identifiable Information (PII) such as addresses, phone numbers, or information about individuals. These models fail to detect most business-sensitive information, such as revenue figures, customer accounts, salary details, project owners, and notes about strategy and commercial relationships; EnterprisePII aims to change that, in the process overcoming a business-critical risk that holds back enterprises from adopting LLMs.
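To illustrate the gap, here is a minimal sketch (assuming spaCy and its small English model are installed) of what a classic NER-based PII pass does and does not catch; the sample text is invented.

```python
# A sketch of what a classic NER-based PII pass catches - and what it misses.
# Requires spaCy and its small English model; the sample text is invented.
import spacy

nlp = spacy.load("en_core_web_sm")  # python -m spacy download en_core_web_sm

text = (
    "Meeting notes: Jane Smith confirmed the renewal on 14 March. "
    "Q3 revenue for the account was $4.2m and we plan to undercut Acme's pricing."
)

for ent in nlp(text).ents:
    print(ent.text, "->", ent.label_)  # e.g. PERSON, DATE, MONEY, ORG

# NER flags "Jane Smith" (PERSON) and likely "$4.2m" (MONEY), but it has no
# notion that the revenue figure or the pricing strategy is confidential -
# the kind of business-sensitive content a test like EnterprisePII probes for.
```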

The Problem 2.2: Enterprises are concerned about accidentally leveraging copyrighted information through GenAI solutions, and getting embroiled in lawsuits.

The Solution: Sometimes the solution doesn’t have to be a new piece of technology. There are many operational/commercial things that technology vendors can do to encourage adoption of AI solutions. For instance, Google, Microsoft, IBM, OpenAI and Adobe have agreed to assume responsibility for the potential legal risks involved if their GenAI customers are challenged on copyright grounds. Some of the enterprises that we spoke to adopted these GenAI solutions partly because of the indemnification offered, which gives us early proof points on the efficacy of this strategy.

That said, the vendors themselves need to be careful that they are not burdened by heavy financial losses from lawsuits. To offer such indemnification with confidence, vendors will need to maintain tighter control over the data used to train their GenAI models, which we view as a positive. For instance, Adobe Firefly was trained on Adobe Stock images, openly licensed content, and public domain content. Additionally, Adobe has developed a compensation model for Adobe Stock contributors whose content is used in the dataset to retrain Firefly models. Similarly, IBM has published the sources of its data in a white paper that customers can review. Early stage companies are unlikely to be in a position to offer indemnification against legal risks, but the focus should nevertheless be on maintaining tight control over the data used to train their models, which reduces the risk of copyright infringement.

Something that could potentially be a solution: There is a lot of interest in Machine Unlearning, which aims to answer the question: “how do you remove data used to train a model without reducing its accuracy and without re-training the model every time data is removed?” In the context of data security and privacy, this becomes even more important given that regulations such as GDPR and CCPA uphold the “right to be forgotten”, meaning an individual can ask for all data related to them to be completely deleted from a company’s systems – including any effects their data has had on a model. Researchers are continuing to investigate Machine Unlearning, and we’ll be eagerly following its progress.



Problem #3: Hallucination

The Problem: LLMs hallucinate – they fabricate information or invent facts in moments of uncertainty – and this is undermining trust amongst users. Particularly in healthcare, misleading information could have life-altering consequences, making LLM adoption slower in highly regulated industries.

The Solution: Currently enterprises are relying on Retrieval Augmented Generation or RAG (which involves augmenting an LLM’s knowledge with internal company data to make it more context-aware and give relevant answers) and “Chain of Thought” prompting (which breaks down a problem into a series of intermediate steps). In order to implement RAG, companies use vector embeddings. Vector embeddings are essentially lists of numbers that represent words, image pixels or any other data objects. This representation makes it easier to search and retrieve information through semantic search or “similarity search”, where concepts can be quantified by how close they are to each other as points in vector space (e.g. “walking” is to “walked” as “swimming” is to “swam”). This is where early stage companies such as Superlinked play a critical role: its solutions turn data into vectors and improve retrieval quality by helping enterprises create better, more suitable vectors aligned with the requirements of their use case, bringing in data from multiple sources rather than just a single text or image.
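As a rough illustration of the retrieval step in RAG, the sketch below embeds a toy “knowledge base” and a user query into vectors and picks the closest document by cosine similarity. The model name and documents are placeholder assumptions, and production systems would typically use a vector database rather than an in-memory array.

```python
# A sketch of the retrieval step in RAG: embed a toy "knowledge base" and a
# query, then pick the closest document by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The Q3 board meeting is scheduled for 14 November.",
    "Support tickets are triaged within four business hours.",
]
doc_vectors = model.encode(documents, normalize_embeddings=True)

query = "How long do customers have to return a product?"
query_vector = model.encode([query], normalize_embeddings=True)[0]

# With normalised vectors, cosine similarity reduces to a dot product.
scores = doc_vectors @ query_vector
best = int(np.argmax(scores))
print(documents[best])  # context to prepend to the LLM prompt alongside the query
```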


“It’s very difficult to control how a language model recalls facts that have been trained into it – this is where hallucinations come from. But there is a solution – think of the LLM as a summarisation and reasoning engine and make your data available to it through RAG, organised in a way that allows for quick, precise and high-quality recall – and that’s done by turning your data into the lingua franca of machine learning: The vector embeddings.”

DANIEL SVONAVA, CEO & Co-Founder at SUPERLINKED

Something that could potentially be a solution: On speaking to practitioners, we discovered that they are most excited by new research such as AutoGen, which we believe could address the problem of hallucination. AutoGen is an early, experimental approach proposed by researchers at Microsoft, which allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks. These AutoGen agents are customisable, and they focus on specialised, narrow tasks. This is essentially like having a small team of experts rather than one generalist, which is a useful approach given that: (1) LLMs such as ChatGPT show the ability to incorporate feedback, which means that LLM agents can converse with each other to seek or provide reasoning, validation and so on; and (2) LLMs are better at solving complex problems when they are broken into simpler tasks, so assigning each AutoGen agent a “role” (e.g. one agent who writes code, another who checks the code, another who executes the code) produces better results.

These agents are able to communicate dynamically as needed (that is, there is no pre-scripted pattern in which the communication flows) – for instance, Agent 2 may write code for Agent 3 to execute, and Agent 3 may respond that a particular package is not installed, in which case Agent 2 comes back with revised code and instructions to install that package. The agents will communicate as needed to get the job done. Furthermore, AutoGen lets a human participate in agent conversations via human-backed agents, which can solicit human input at certain stages of the conversation. We believe the structure of this solution could enable agents to serve as checks and balances for each other (e.g. one agent could act as a virtual adversarial checker to other agents), which could potentially reduce hallucinations and improve the quality of output.
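A minimal sketch of this pattern, based on the pyautogen package, is below: an assistant agent writes code, a user-proxy agent executes it and feeds any errors back, and the loop continues until the task succeeds. The model name, API key placeholder and working directory are illustrative, and configuration options may differ across AutoGen versions.

```python
# A sketch of a two-agent AutoGen loop: the "coder" writes code, the
# "executor" runs it and returns any errors, prompting revised code.
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="coder",
    llm_config={"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]},
)

executor = UserProxyAgent(
    name="executor",
    human_input_mode="NEVER",  # set to "ALWAYS" to keep a human in the loop
    code_execution_config={"work_dir": "scratch", "use_docker": False},
)

# The executor runs whatever code the coder proposes; an error such as a
# missing package is sent back as the next message in the conversation.
executor.initiate_chat(
    assistant,
    message="Write and run Python that prints the 10 largest files in the current directory.",
)
```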


Problem #4: AI model evaluation and explainability

The Problem: Given the lack of standardisation around benchmarks and evaluation metrics, enterprises on their AI journeys are asking themselves what we call “meta questions” such as “how do you evaluate the evaluation metrics?” or “how do you benchmark the benchmarks?” On the one hand, enterprises are questioning the reliability of evaluation metrics and explainability solutions; on the other, they are concerned about “moral hazard”. Moral hazard is a concept in economics where protections in place encourage risky behaviour – e.g. when wearing a seat-belt became mandatory, the rate of accidents increased. In a similar vein, trusting an AI model on the basis of good performance on evaluation metrics could lead to unfavourable outcomes. As Anthropic illustrated in a blog post:

“BBQ scores bias on a range of -1 to 1, where 1 means significant stereotypical bias, 0 means no bias, and -1 means significant anti-stereotypical bias. After implementing BBQ, our results showed that some of our models were achieving a bias score of 0, which made us feel optimistic that we had made progress on reducing biased model outputs. When we shared our results internally, one of the main BBQ developers (who works at Anthropic) asked if we had checked a simple control to verify whether our models were answering questions at all. We found that they weren’t — our results were technically unbiased, but they were also completely useless. All evaluations are subject to the failure mode where you overinterpret the quantitative score and delude yourself into thinking that you have made progress when you haven’t.”
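The toy sketch below (not Anthropic’s actual evaluation code; score_bias is an assumed helper) illustrates the control described in the quote: report the answer rate alongside the headline bias score, so a “perfect” score from a model that refuses to answer is caught rather than celebrated.

```python
# A toy illustration of the control described above: report the answer rate
# alongside the headline bias score. score_bias() is an assumed helper that
# returns +1 (stereotypical), -1 (anti-stereotypical) or 0 (neutral).

REFUSALS = {"", "I can't answer that."}

def evaluate(responses: list[str]) -> dict:
    answered = [r for r in responses if r not in REFUSALS]
    bias_scores = [score_bias(r) for r in answered]
    return {
        "bias": sum(bias_scores) / len(bias_scores) if bias_scores else 0.0,
        "answer_rate": len(answered) / len(responses),  # the control metric
    }

# A model that refuses every question scores bias == 0.0 - "technically
# unbiased, but completely useless" - which only the answer_rate reveals.
```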

The Solution: While enterprises are currently using general benchmarks such as HELM or the HuggingFace Leaderboard, we believe the focus will likely shift towards domain-specific evaluation metrics that look at real-world use cases – for instance, Patronus AI’s automated AI evaluation solutions can auto-generate novel adversarial testing sets at scale to find all the edge cases where an enterprise’s models fail.


“So far, companies have tried to use academic benchmarks to assess language model performance, but academic benchmarks don’t capture the long right tail of diverse real world use cases. LLMs might score highly on grade school history questions and the LSAT, but enterprise leaders care about business use cases like financial document Q&A and customer service. This is why domain-specific evaluation is so important.”

ANAND KANNAPPAN, CEO & Co-Founder at PATRONUS AI


Problem #5: Costs and uncertain RoI

The Problem: The enterprises we spoke to also talked about difficulties in ascertaining the costs and RoI of AI initiatives. Firstly, enterprises often do not clearly define the business metrics that the AI solution is intended to improve, or how they would measure them. This determines which data points the enterprise should be collecting to measure progress, and these should be collected over the course of the AI initiative rather than reconstructed in hindsight towards the end (which, unfortunately, many enterprises do). Secondly, outside of productivity benefits, enterprises struggle to outline additional benefits (e.g. revenue outcomes) from implementing an AI solution. The calculation of RoI is not just a mathematical computation; it is often a political one as well. To illustrate: the technology division within a bank may build an AI model to help relationship managers identify the right financial products to cross-sell to its customers, but business leaders (e.g. the head of commercial banking) may attribute the revenues to the relationship manager “doing his job” rather than the tool.
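As a back-of-the-envelope illustration of why inference costs balloon and why agreed attribution matters, the sketch below estimates a monthly inference bill from query volume and token prices, then computes RoI under an assumed revenue-attribution share. Every figure is an illustrative assumption, not a benchmark or quoted price.

```python
# A back-of-the-envelope sketch of inference costs and RoI under an agreed
# attribution share. All figures are illustrative assumptions.
queries_per_month = 500_000
tokens_per_query = 1_500          # prompt + completion
price_per_1k_tokens = 0.01        # USD, assumed blended rate

monthly_inference = queries_per_month * tokens_per_query / 1_000 * price_per_1k_tokens
annual_cost = 12 * monthly_inference + 200_000   # plus assumed build-and-run cost

cross_sell_revenue = 3_000_000    # revenue touched by the cross-sell tool
attribution_share = 0.20          # share the business agrees to credit to the AI

roi = (cross_sell_revenue * attribution_share - annual_cost) / annual_cost
print(f"Monthly inference: ${monthly_inference:,.0f}  |  RoI: {roi:.0%}")
# Double the traffic or the prompt length and the inference bill doubles with
# it, which is why usage assumptions belong in the business case from day one.
```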

The Solution: Technological solutions range from TitanML’s Takeoff Inference Server (to reduce inference costs) to NannyML’s solution for measuring the business impact of AI models and tying the performance of the model to monetary or business outcomes (to establish RoI). Working collaboratively with the business heads and agreeing upon attribution when defining RoI of revenue-generating AI use cases would likely help overcome some of the political issues around RoI calculation. There are also other strategic solutions, such as “Buy Now, Build Later.”


“In practice, many enterprises miscalculate how difficult the execution for technology projects might be. This is why our preference for now would be to buy rather than build.”

CIO at a large European bank

When looking to deploy AI in new ways, enterprises focus on careful experimentation, using commercial AI solutions to further their own understanding of these solutions and applying them to non-critical use cases (with less sensitive data) to limit risk. “Buying” the solution requires lower upfront investment, and the enterprise retains the freedom to revisit its build-vs-buy decision once there is greater familiarity with the new AI solutions. Given we are in the hyper evolution phase of the AI cycle, and many enterprises are still in learning mode, enterprises are reluctant (as of now) to invest time and resources in building their own model from scratch. That said, for enterprises that are further along their AI adoption journeys and operate in heavily regulated industries, there is greater openness to working with open source models (using their own data, hosted in their secure environment) for sensitive use cases – albeit tempered by concerns around the security and licensing of open source models.

“In the rapidly evolving landscape of artificial intelligence, committing to a specific architecture like transformers for long-term development carries the inherent risk of obsolescence. The breakneck pace of innovation in this field could render today’s cutting-edge solutions outdated within a mere 12–18 months. To remain at the forefront of AI advancement, we must embrace adaptability such as investing in modular and composable architectures.”

ARUN NANDI, Sr. Director & Head of Data & Analytics at UNILEVER

Something that could potentially be a solution: Mounting inference costs (e.g. the cost of ChatGPT answering your queries) have prompted research into alternative model architectures, such as Retentive Networks (RetNet), which could potentially replace the Transformer architecture that underpins all the major models today, from GPT-4 to Midjourney. A recent research paper by Microsoft and Tsinghua University titled “Retentive Network: A Successor to Transformer for Large Language Models” introduced RetNet, which enables parallelism in training (making its performance comparable with Transformers) and recurrent representation in inference (which lowers inference costs and latency significantly). Put simply, it combines the best of both worlds: Transformers and RNNs. There have been other proposed alternatives to Transformers (such as Hyena), and only time will tell whether RetNet becomes a dominant architecture. Nevertheless, we are keeping an eye out for changes in model architecture and other novel solutions to reduce inference costs.


Problem #6: Talent and culture

The Problem: There is a widespread AI/ML skills shortage, and resistance to AI adoption given issues with safety and reliability.

The Solution: Beyond stepping up recruitment efforts, some of the ways in which enterprises are addressing the skills gap are increased training and development, reducing manual and repetitive tasks, and low-code/no-code tools. This is where we believe providers such as MindsDB are addressing a critical pain point for enterprises, as its platform equips virtually any developer to rapidly ship AI and machine learning applications – effectively transforming them into AI/ML engineers. Tools that augment the capabilities of in-house talent are therefore likely to find rapid adoption within large enterprises. Additionally, enterprises can lower their employees’ resistance to adopting AI solutions more broadly by: (1) assuaging fears around job losses by re-training and re-skilling employees, and re-designing their roles so that AI fits into their workflows as a tool that augments their capabilities rather than replacing them; and (2) building trust in AI solutions amongst business users by addressing the problems around hallucination, evaluation and explainability that we discussed earlier.


“Today, there are close to 30 million software developers around the world, but only less than five percent are proficient AI/ML engineers. However, the world is facing a new transformation where most software that you know today will need to be upgraded with an AI-centric approach. To accomplish this, every developer worldwide, regardless of their AI knowledge, should be capable of producing, managing and plugging AI models to existing software infrastructure.”

JORGE TORRES, Co-Founder & CEO of MINDSDB


Final Thoughts

The AI world remains in flux; given the hyper evolution phase that we are in, enterprises are prioritising flexibility, adaptability and rapid time to value when it comes to choosing AI solutions. Although the current paradigm creates challenges for enterprises, we see tremendous opportunities for entrepreneurs building the next generation of AI companies – the key is to become a trusted partner during these turbulent times, by focusing on domain-specificity, ensuring safety and demonstrating value.

If you’re an enterprise struggling with any of these challenges, or a founder building a product that addresses these issues…

Get in touch with Advika

MMC was the first early-stage investor in Europe to publish unique research on AI in 2016. We have since spent eight years understanding, mapping and investing in AI companies across multiple sectors as AI has developed from frontier to the early mainstream and new techniques have emerged. This research-led approach has enabled us to build one of the largest AI portfolios in Europe.

Note: Snowplow, Superlinked and MindsDB are MMC portfolio companies.