The Hidden Risks in Your AI Supply Chain
Pre-trained models, open-source libraries, and cloud APIs create an AI supply chain that most organisations barely understand.
Software supply chain security has rightly become a board-level concern in recent years. The SolarWinds and Log4j incidents demonstrated how deeply organisations depend on components they neither built nor fully understand. But AI introduces an entirely new dimension to supply chain risk, one that most organisations have barely begun to map.
Traditional software supply chains involve code libraries, frameworks, and infrastructure components. AI supply chains add pre-trained models, training datasets, annotation services, feature stores, and cloud-hosted inference APIs. Each of these introduces dependencies that carry risk, and the standard approaches to vendor management and third-party assurance are not equipped to address them.
The Anatomy of an AI Supply Chain
Before addressing risks, it is worth understanding just how extensive the AI supply chain has become.
Pre-Trained Models
Very few organisations train their AI models entirely from scratch. The cost, expertise, and data requirements make this impractical for most use cases. Instead, teams download pre-trained models from repositories like Hugging Face, use foundation models through provider APIs, or fine-tune open-source models for specific tasks.
Each pre-trained model carries the assumptions, biases, and potential vulnerabilities of its training process. If the original training data was poisoned, or if the model weights have been tampered with before distribution, every downstream user inherits that compromise.
Open-Source ML Libraries
The AI ecosystem runs on open-source software. PyTorch, TensorFlow, scikit-learn, LangChain, and hundreds of smaller libraries form the foundation of most AI development workflows. These libraries are maintained by communities of varying size, funding, and security maturity.
A vulnerability in a widely used ML library can affect thousands of organisations simultaneously. And unlike most traditional software vulnerabilities, which tend to surface as crashes or exploitable behaviour, some ML library flaws can silently compromise model integrity in ways that standard testing is unlikely to catch.
Cloud AI Services and APIs
Many organisations consume AI capabilities through cloud provider APIs: speech recognition, natural language processing, computer vision, and increasingly, large language model inference. These services abstract away enormous complexity, but they also create dependencies on the provider’s security practices, data handling, and model governance.
When an organisation sends sensitive data to a third-party AI API for processing, questions of data residency, retention, and secondary use become critical supply chain concerns.
Training Data Sources
Training data is a supply chain component that has no direct equivalent in traditional software. Organisations acquire training data from internal sources, public datasets, commercial data providers, web scraping, and synthetic generation. Each source carries risks related to quality, provenance, licensing, and potential manipulation.
The Risks Most Organisations Miss
Model Provenance and Integrity
When a development team downloads a pre-trained model, they are trusting that the model is what it claims to be. But model repositories have limited verification mechanisms compared to software package registries. Tampered models can contain backdoors that activate only under specific conditions, making them extremely difficult to detect through standard evaluation.
The question every organisation should ask is straightforward: for each AI model in production, can you trace its origin, verify its integrity, and confirm that it has not been modified since acquisition? Most organisations cannot.
Transitive Dependencies
AI systems often have deep dependency chains. A model might depend on a specific version of a framework, which depends on numerical computing libraries, which depend on hardware-specific optimisation code. A vulnerability or compromise at any level propagates upward.
This problem is compounded by the practice of “model stacking,” where one model’s output feeds into another model’s input. A compromise in an upstream model can affect downstream systems in ways that are difficult to predict or detect.
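Even a quick look at an installed framework's dependency tree makes the depth of these chains concrete. The sketch below, using only Python's standard library, walks the transitive dependencies of whatever installed package you give it; the choice of torch as the starting point is purely illustrative.

```python
# A minimal sketch: print the transitive dependency tree of an installed
# Python distribution, using only the standard library.
import re
from importlib import metadata

def print_deps(package, seen=None, depth=0):
    """Recursively print the dependency tree of an installed distribution."""
    seen = set() if seen is None else seen
    try:
        requirements = metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return  # optional extra or distribution not installed in this environment
    for req in requirements:
        # Keep only the bare distribution name; drop versions, extras, markers.
        name = re.split(r"[\s;\[<>=!~]", req, maxsplit=1)[0]
        if name.lower() in seen:
            continue
        seen.add(name.lower())
        print("  " * depth + name)
        print_deps(name, seen, depth + 1)

print_deps("torch")  # substitute any framework in your own pipeline
```

Running this against a typical ML environment usually produces a far longer list than most teams expect, which is exactly the point.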
Data Supply Chain Attacks
Training data poisoning is one of the most concerning AI-specific supply chain risks. An attacker who can influence the data used to train or fine-tune a model can embed biases, backdoors, or targeted misclassifications without ever accessing the model itself.
Public datasets are particularly vulnerable. Research has demonstrated that attackers can manipulate web-scraped datasets by modifying content at source URLs, affecting any model trained on subsequent crawls. Commercial data providers may have their own supply chains that introduce additional risk.
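One partial mitigation is to pin the content of each source URL when a dataset is first assembled, so that a later crawl can flag pages that have silently changed before they reach a retraining run. The sketch below is a minimal illustration using Python's standard library; the manifest file name and the idea of a flat URL list are assumptions, not features of any particular dataset tooling.

```python
# A minimal sketch, not a production crawler: record a SHA-256 digest of
# each source URL's content at collection time, then re-verify later.
import hashlib
import json
import urllib.request

def fetch_digest(url):
    with urllib.request.urlopen(url, timeout=30) as response:
        return hashlib.sha256(response.read()).hexdigest()

def build_manifest(urls, path="data_manifest.json"):  # illustrative file name
    manifest = {url: fetch_digest(url) for url in urls}
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)

def verify_manifest(path="data_manifest.json"):
    with open(path) as f:
        manifest = json.load(f)
    # URLs whose content no longer matches the pinned digest.
    return [url for url, digest in manifest.items() if fetch_digest(url) != digest]
```

In practice many pages change for entirely legitimate reasons, so a digest mismatch is a prompt for human review before retraining, not evidence of poisoning on its own.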
Vendor Lock-In as a Security Risk
Heavy dependence on a single AI provider creates concentration risk. If that provider experiences a security incident, changes their data handling practices, or discontinues a service, dependent organisations may have limited alternatives. The AI security roadmap highlights the importance of building resilience into AI deployments, and supply chain diversification is a key element of that resilience.
The AI Bill of Materials
Just as the software industry has embraced the Software Bill of Materials (SBOM) to improve supply chain transparency, the AI community needs an equivalent: an AI Bill of Materials (AI BOM).
An AI BOM should document, at minimum:
- Model components. Every pre-trained model, foundation model, and fine-tuned variant in use, including version, source, and acquisition date.
- Training data sources. The provenance, licensing, and quality assurance status of all training data, including any data acquired from third parties.
- Software dependencies. All ML libraries, frameworks, and tools in the AI development and deployment pipeline, extending the traditional SBOM to cover AI-specific components.
- Cloud services and APIs. Every external AI service consumed, including data flow details, contractual terms, and security assurance evidence.
- Human services. Any third-party annotation, labelling, or data preparation services, including their own security and quality practices.
Creating an AI BOM is not a one-time exercise. It must be maintained as a living document, updated whenever components change, and integrated into the organisation’s broader supply chain risk management processes.
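What a single entry might look like is easier to show than to describe. The sketch below is a minimal, illustrative record for one model; the field names and values are assumptions rather than a formal schema, and a real AI BOM would be considerably richer.

```python
# A minimal sketch of one AI BOM entry, assuming a simple JSON-backed
# inventory; field names and example values are illustrative only.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ModelComponent:
    name: str
    version: str
    source: str                # repository or provider the weights came from
    acquired: str              # date of acquisition
    sha256: str                # digest recorded at download time
    training_data: list = field(default_factory=list)   # dataset provenance
    dependencies: list = field(default_factory=list)    # ML libraries and versions
    external_apis: list = field(default_factory=list)   # cloud AI services it calls

entry = ModelComponent(
    name="support-ticket-classifier",                 # hypothetical model
    version="2.1.0",
    source="https://huggingface.co/<org>/<model>",    # placeholder, not a real repo
    acquired="2024-05-14",
    sha256="<digest recorded at download>",
    training_data=["internal-tickets-2023", "public-faq-corpus (CC-BY)"],
    dependencies=["torch==2.2.1", "transformers==4.38.2"],
    external_apis=[],
)

print(json.dumps(asdict(entry), indent=2))
```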
Practical Steps for Managing AI Supply Chain Risk
Establish Model Governance
Before any pre-trained model enters the development pipeline, it should pass through a governance process that evaluates its provenance, assesses known risks, and documents the decision to adopt it. This does not need to be bureaucratic. A lightweight review template that captures source, intended use, known limitations, and risk assessment can add significant value without slowing teams down.
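As an illustration of how lightweight this can be, the sketch below treats the review as a plain record kept alongside the model, with a single check that refuses to mark a model as adopted until the template is complete. The field names are assumptions mirroring the template described above.

```python
# A minimal sketch of a lightweight adoption review; field names are
# illustrative and should be adapted to the organisation's own template.
REQUIRED_FIELDS = {
    "model_name", "source", "intended_use",
    "known_limitations", "risk_assessment", "reviewer", "decision",
}

def missing_fields(review):
    """Return the review fields that are absent or left blank."""
    return sorted(f for f in REQUIRED_FIELDS if not review.get(f))

review = {
    "model_name": "support-ticket-classifier",          # hypothetical model
    "source": "https://huggingface.co/<org>/<model>",   # placeholder
    "intended_use": "Routing inbound support tickets",
    "known_limitations": "English-only; untested on legal queries",
    "risk_assessment": "Low: no customer data leaves the internal network",
    "reviewer": "ml-platform-team",
    "decision": "approved",
}

gaps = missing_fields(review)
if gaps:
    raise ValueError(f"Review incomplete, missing: {gaps}")
```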
Implement Integrity Verification
Where possible, verify the integrity of downloaded models and libraries using cryptographic hashes and signatures. Monitor model repositories for reports of tampering or malicious uploads. For critical models, consider independent evaluation against benchmark datasets to detect unexpected behaviour.
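For file-based artefacts, the verification step can be as simple as comparing a SHA-256 digest against the one the provider publishes. The sketch below assumes such a published digest exists; the file name and expected value are placeholders.

```python
# A minimal sketch: verify a downloaded model file against a published
# SHA-256 digest before it is allowed anywhere near the pipeline.
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

EXPECTED = "<digest published by the model provider>"  # placeholder
actual = sha256_of("model.safetensors")                 # placeholder file name

if actual != EXPECTED:
    raise RuntimeError("Model file does not match the published digest; do not load it.")
```

A matching hash only confirms the file is the one the publisher advertised; it says nothing about whether that published artefact is itself trustworthy, which is why independent evaluation against benchmark datasets still matters for critical models.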
Diversify Strategically
Avoid single points of failure in the AI supply chain. This might mean qualifying multiple model providers for critical capabilities, maintaining the ability to retrain models using alternative data sources, or ensuring that cloud AI dependencies can be migrated if necessary.
Extend Vendor Assessments
Traditional third-party risk assessments focus on information security controls, business continuity, and regulatory compliance. For AI suppliers, extend these assessments to cover model training practices, data governance, bias testing, and incident response capabilities specific to AI systems.
Monitor Continuously
Supply chain risk is not static. New vulnerabilities are discovered in ML libraries regularly. Model repositories face ongoing threats. Data sources can be compromised at any time. Continuous monitoring of the AI supply chain, supported by threat intelligence specific to AI ecosystems, is essential.
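Continuous monitoring does not have to begin with specialist tooling. A useful first step is simply detecting drift: the sketch below snapshots the installed distributions in an AI environment and reports anything added, removed, or changed since a reviewed baseline. The baseline file name is an assumption, and this detects change only, not known vulnerabilities, so it complements rather than replaces a dedicated scanner.

```python
# A minimal sketch, stdlib only: compare the currently installed
# distributions against a stored baseline and report any drift.
import json
from importlib import metadata

BASELINE_PATH = "dependency_baseline.json"  # illustrative path

def snapshot():
    return {dist.metadata["Name"].lower(): dist.version
            for dist in metadata.distributions()}

def save_baseline(path=BASELINE_PATH):
    with open(path, "w") as f:
        json.dump(snapshot(), f, indent=2, sort_keys=True)

def drift(path=BASELINE_PATH):
    with open(path) as f:
        baseline = json.load(f)
    current = snapshot()
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(name for name in current.keys() & baseline.keys()
                          if current[name] != baseline[name]),
    }
```

Any non-empty result is a signal that the environment has changed without a corresponding review, which is exactly the kind of quiet drift that supply chain incidents tend to exploit.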
Visibility Is the Foundation
The hidden nature of AI supply chain risk is itself the primary danger. Organisations that cannot see their AI dependencies cannot secure them. Building visibility through AI BOMs, model governance, and extended vendor assessments is the necessary first step.
The supply chain principles that have served cybersecurity well for decades (know your suppliers, verify integrity, diversify dependencies, monitor continuously) apply directly to AI. The challenge is extending them to cover components and risks that are genuinely new. Organisations that begin this work now, rather than waiting for the first major AI supply chain incident to force action, will be far better positioned to operate securely as AI adoption accelerates.