Defence in Depth for Machine Learning Pipelines
Applying the classic defence-in-depth principle to ML pipelines, from data ingestion through training to deployment and inference.
Defence in depth is one of the oldest principles in security: layer multiple independent controls so that no single point of failure compromises the whole system. It predates computing entirely, with roots in military fortification strategy. Applied to machine learning pipelines, it offers a structured approach to securing systems that are often treated as monolithic black boxes.
The challenge is that ML pipelines are not single applications. They are complex workflows spanning data collection, preprocessing, feature engineering, model training, evaluation, deployment, and inference. Each stage presents distinct attack surfaces. Each stage needs its own controls. And the connections between stages need protection too.
Understanding the ML Pipeline as an Attack Surface
Before applying defences, security teams need a clear picture of what they are defending. A typical ML pipeline includes:
- Data ingestion from internal databases, external APIs, web scraping, or user-submitted content.
- Data storage and management in data lakes, feature stores, and labelling platforms.
- Training infrastructure comprising compute clusters, experiment tracking systems, and hyperparameter tuning services.
- Model registry and versioning where trained models are stored, tagged, and promoted.
- Deployment infrastructure including serving platforms, API gateways, and edge deployment systems.
- Inference and monitoring covering real-time prediction serving, logging, and performance tracking.
Each stage involves different technologies, different access patterns, and different threat actors. A comprehensive security approach must address all of them, not just the model itself.
The AI Security Fundamentals pillar page establishes the foundational principles that underpin this layered approach, mapping classical security concepts to the specific realities of AI systems.
Layer 1: Data Ingestion and Preparation
Data is the foundation of every ML system, which makes it the first place adversaries look to cause harm. Data poisoning, where an attacker corrupts training data to influence model behaviour, is a well-documented threat that can be extraordinarily difficult to detect after the fact.
Controls at this layer
Provenance tracking. Every dataset should have a documented lineage: where it came from, when it was collected, what transformations it underwent, and who approved its use. This is not merely good practice; it is essential for both security and regulatory compliance.
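As a rough illustration of what such a lineage record might contain, the sketch below uses a frozen Python dataclass; the field names, source URI, and approver are hypothetical, and most teams would hold this in a data catalogue or metadata service rather than in application code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class DatasetLineage:
    """Immutable provenance record stored alongside (or referenced by) every dataset version."""
    dataset_id: str
    source: str                       # e.g. upstream system, API, or scrape job (illustrative)
    collected_at: datetime
    transformations: tuple[str, ...]  # ordered, human-readable description of each step
    approved_by: str
    content_sha256: str               # ties the record to the exact bytes it describes

# Hypothetical example record.
record = DatasetLineage(
    dataset_id="customer-churn/v3",
    source="crm-postgres://prod/customers",
    collected_at=datetime(2024, 6, 1, tzinfo=timezone.utc),
    transformations=("drop_pii", "impute_missing_tenure", "one_hot_region"),
    approved_by="data-governance@example.com",
    content_sha256="ab12...",  # placeholder digest
)
print(record.dataset_id, record.approved_by)
```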
Input validation and anomaly detection. Automated checks should flag statistical anomalies in incoming data, such as sudden distribution shifts, unexpected feature values, or unusual volume changes. These may indicate poisoning attempts or data quality issues that would degrade model performance.
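The sketch below shows one way such a check might look, using a two-sample Kolmogorov–Smirnov test from SciPy to compare each feature of an incoming batch against a reference sample. The p-value threshold and the synthetic data are illustrative assumptions, not recommendations.

```python
import numpy as np
from scipy import stats

# Illustrative threshold: flag a feature if the KS p-value falls below it.
P_VALUE_THRESHOLD = 0.01

def detect_distribution_shift(reference: np.ndarray, incoming: np.ndarray) -> dict:
    """Compare each feature column of an incoming batch against a reference sample.

    Returns a mapping of column index -> KS p-value for columns that look shifted.
    """
    flagged = {}
    for col in range(reference.shape[1]):
        statistic, p_value = stats.ks_2samp(reference[:, col], incoming[:, col])
        if p_value < P_VALUE_THRESHOLD:
            flagged[col] = p_value
    return flagged

# Example with synthetic data: the second feature of the incoming batch is shifted.
rng = np.random.default_rng(seed=0)
reference = rng.normal(0.0, 1.0, size=(5000, 2))
incoming = np.column_stack([
    rng.normal(0.0, 1.0, size=1000),
    rng.normal(0.8, 1.0, size=1000),  # simulated drift or poisoning
])
print(detect_distribution_shift(reference, incoming))  # expect column 1 to be flagged
```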
Access controls on data sources. Restrict who and what can write to training data repositories. If an attacker can modify training data, they can influence every model trained on that data. Write access should be tightly controlled and audited.
Data integrity verification. Use cryptographic hashes or checksums to verify that datasets have not been tampered with between collection and training. Any modification to a dataset should trigger re-verification before use.
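A minimal sketch of this control, assuming dataset files on disk and a manifest of SHA-256 digests recorded at collection time; the file names and manifest format are hypothetical.

```python
import hashlib
from pathlib import Path

def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large datasets never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(manifest: dict[str, str], data_dir: Path) -> list[str]:
    """Return the names of files whose current hash does not match the recorded one."""
    return [
        name for name, expected in manifest.items()
        if sha256_of_file(data_dir / name) != expected
    ]

# Hypothetical manifest recorded when the dataset was collected:
# manifest = {"train.parquet": "ab12...", "labels.csv": "cd34..."}
# tampered = verify_manifest(manifest, Path("/data/datasets/v3"))
# if tampered:
#     raise RuntimeError(f"Integrity check failed for: {tampered}")
```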
Layer 2: Training Infrastructure
The training phase is where raw data becomes a functional model. Compromising this stage allows an attacker to produce a model that appears to work correctly in testing but contains embedded vulnerabilities or backdoors.
Controls at this layer
Isolated training environments. Training workloads should run in segregated environments with no direct access to production systems. Network segmentation, dedicated compute resources, and strict egress controls prevent both accidental and malicious data leakage.
Experiment reproducibility. Every training run should be fully reproducible: fixed random seeds, version-pinned dependencies, recorded hyperparameters, and immutable snapshots of training data. If a model cannot be reproduced, it cannot be audited.
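A minimal sketch of the bookkeeping this implies: fixing the standard-library and NumPy seeds and capturing interpreter, package, and dataset metadata in a run record. The exact fields are assumptions, and frameworks such as PyTorch or TensorFlow maintain their own seeds that would also need fixing.

```python
import json
import random
import sys
from importlib import metadata

import numpy as np

def make_run_record(seed: int, hyperparameters: dict, dataset_hash: str) -> dict:
    """Fix the random seeds and capture enough metadata to re-run the experiment later."""
    random.seed(seed)
    np.random.seed(seed)
    return {
        "seed": seed,
        "python": sys.version,
        "packages": {dist.metadata["Name"]: dist.version for dist in metadata.distributions()},
        "hyperparameters": hyperparameters,
        "dataset_sha256": dataset_hash,  # ties the run to an immutable data snapshot
    }

# Illustrative values; the hash placeholder stands in for a real dataset digest.
record = make_run_record(seed=42, hyperparameters={"lr": 3e-4, "epochs": 10}, dataset_hash="ab12...")
print(json.dumps({k: record[k] for k in ("seed", "hyperparameters", "dataset_sha256")}, indent=2))
```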
Dependency management. ML frameworks, libraries, and pre-trained models all represent supply chain risks. Pin specific versions, verify checksums, and monitor for known vulnerabilities in all dependencies. The same rigour applied to software supply chain security applies here.
Compute access controls. Training clusters represent significant computational resources that attackers may target for cryptomining or other abuse. Multi-factor authentication, role-based access, and session monitoring should be standard.
Layer 3: Model Registry and Versioning
The model registry serves as the bridge between training and deployment. Compromising it allows an attacker to substitute a malicious model for a legitimate one, potentially without detection.
Controls at this layer
Model signing. Cryptographically sign trained models before storing them in the registry. Verify signatures before any deployment. This ensures that only models produced by authorised training pipelines can reach production.
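As an illustration, the sketch below signs and verifies a model artefact with Ed25519 keys from the third-party cryptography package; in practice the private key would live in the training pipeline's key management service, and the registry or deployment tooling would perform the verification.

```python
from pathlib import Path

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

def sign_model(private_key: Ed25519PrivateKey, model_path: Path) -> bytes:
    """Sign the raw bytes of a model artefact; the signature is stored alongside it in the registry."""
    return private_key.sign(model_path.read_bytes())

def verify_model(public_key: Ed25519PublicKey, model_path: Path, signature: bytes) -> bool:
    """Return True only if the artefact matches a signature produced by the authorised pipeline."""
    try:
        public_key.verify(signature, model_path.read_bytes())
        return True
    except InvalidSignature:
        return False

# Illustrative flow (file name hypothetical): in practice the private key stays in a KMS/HSM
# and only the public key is distributed to deployment systems.
# private_key = Ed25519PrivateKey.generate()
# signature = sign_model(private_key, Path("model-v3.onnx"))
# assert verify_model(private_key.public_key(), Path("model-v3.onnx"), signature)
```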
Immutable versioning. Once a model version is registered, it should not be modifiable. Any change produces a new version. This creates an audit trail and prevents silent replacement of models.
Approval workflows. Promotion from staging to production should require explicit approval from designated personnel, supported by documented evaluation results and security review outcomes.
Vulnerability scanning of model artefacts. Model files can contain serialised code (particularly in formats like Python pickle). Scan model artefacts for embedded code execution risks before deployment.
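A crude sketch of the idea, using the standard-library pickletools module to list opcodes that can trigger imports or calls when a pickle is loaded. Almost every pickled model contains some of these opcodes, so real scanners allowlist known-safe globals rather than flagging them all; the opcode set and file name here are illustrative, and avoiding pickle-based formats altogether is often the stronger control.

```python
import pickletools
from pathlib import Path

# Opcodes that can cause imports or object construction/calls during unpickling.
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}

def scan_pickle(path: Path) -> list[str]:
    """Return the suspicious opcodes found in a pickle file, without ever unpickling it."""
    found = []
    with path.open("rb") as handle:
        for opcode, arg, _pos in pickletools.genops(handle):
            if opcode.name in SUSPICIOUS_OPCODES:
                found.append(f"{opcode.name}({arg!r})")
    return found

# Hypothetical usage; a real policy would compare the findings against an allowlist.
# findings = scan_pickle(Path("model-v3.pkl"))
# if findings:
#     raise RuntimeError(f"Review before deployment, executable constructs found: {findings}")
```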
Layer 4: Deployment and Serving
Deployment is where the model meets the real world. The serving infrastructure must protect against both direct attacks on the model and exploitation of the surrounding systems.
Controls at this layer
API security. Model endpoints should be protected by authentication, rate limiting, and input validation. Treat model APIs with the same security rigour as any other production API. Monitor for unusual query patterns that might indicate model extraction attempts.
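As one small piece of this, the sketch below implements an in-memory sliding-window rate limiter keyed by client identifier; the limits are illustrative, and a production deployment would usually enforce this at the API gateway with shared state rather than per process.

```python
import time
from collections import defaultdict, deque

class SlidingWindowRateLimiter:
    """Track request timestamps per client and reject bursts that exceed the window limit."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history: dict[str, deque] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        window = self._history[client_id]
        # Drop timestamps that have fallen out of the window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False  # also a natural place to alert on extraction-style probing
        window.append(now)
        return True

# Illustrative limits: 100 requests per client per minute.
# limiter = SlidingWindowRateLimiter(max_requests=100, window_seconds=60.0)
# if not limiter.allow(api_key):
#     ...  # return HTTP 429 from whatever framework fronts the model
```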
Network segmentation. Model serving infrastructure should sit behind appropriate network controls. Internal models should not be accessible from the public internet without explicit justification and compensating controls.
Resource limits. Set memory, CPU, and request-size limits on model serving infrastructure. This prevents denial-of-service attacks through adversarially crafted inputs designed to maximise computational cost.
Canary deployments and rollback. Deploy new model versions gradually, monitoring for behavioural anomalies. Maintain the ability to roll back to a previous version quickly if issues are detected. Automated rollback triggers based on performance thresholds provide a further safety net, as sketched below.
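A minimal sketch of such a trigger, comparing canary metrics against the baseline version; the metric names and thresholds are illustrative assumptions that would in practice come from the service's own objectives.

```python
def should_roll_back(canary_metrics: dict, baseline_metrics: dict,
                     max_error_rate_increase: float = 0.02,
                     max_latency_ratio: float = 1.5) -> bool:
    """Decide whether the canary's behaviour has degraded enough to trigger rollback.

    Thresholds here are illustrative; real values depend on the service's SLOs.
    """
    error_rate_degraded = (
        canary_metrics["error_rate"] - baseline_metrics["error_rate"] > max_error_rate_increase
    )
    latency_degraded = (
        canary_metrics["p99_latency_ms"] > baseline_metrics["p99_latency_ms"] * max_latency_ratio
    )
    return error_rate_degraded or latency_degraded

# Example: the canary's error rate has jumped well past the baseline.
print(should_roll_back(
    {"error_rate": 0.07, "p99_latency_ms": 180.0},
    {"error_rate": 0.01, "p99_latency_ms": 150.0},
))  # True -> shift traffic back to the previous model version
```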
Layer 5: Inference and Monitoring
Once a model is live, continuous monitoring is the final defensive layer. It serves as both a detective control and an early warning system.
Controls at this layer
Output monitoring. Track the distribution of model outputs over time. Significant shifts may indicate data drift, model degradation, or adversarial manipulation. Set alerting thresholds and investigate anomalies promptly.
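One common way to quantify such shifts is the population stability index; the sketch below computes it over binned model scores, with synthetic data and a rule-of-thumb threshold standing in for real baselines.

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Measure how far a recent window of model outputs has drifted from a reference window."""
    # Bin edges come from the reference distribution so both windows are bucketed identically.
    edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(np.clip(actual, edges[0], edges[-1]), bins=edges)[0] / len(actual)
    # Floor empty buckets so the logarithm is defined.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Illustrative data: the recent score distribution has shifted markedly from the baseline.
rng = np.random.default_rng(seed=1)
baseline_scores = rng.beta(2, 5, size=10_000)   # e.g. last month's prediction scores
recent_scores = rng.beta(5, 2, size=2_000)      # today's scores
print(population_stability_index(baseline_scores, recent_scores))
# A common rule of thumb treats PSI above roughly 0.25 as a shift worth investigating.
```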
Input logging and analysis. Log inference requests (with appropriate privacy controls) to enable forensic analysis and detection of adversarial probing. Look for patterns such as systematic exploration of decision boundaries.
Performance monitoring. Track accuracy, latency, and error rates continuously. Degradation in these metrics may indicate issues ranging from infrastructure problems to active attacks.
Feedback loop security. If the model learns from production data or user feedback, those feedback channels become attack vectors. Apply the same data integrity controls from Layer 1 to any production feedback mechanisms.
Cross-Cutting Concerns
Several security requirements span all layers and deserve explicit attention.
Identity and access management. Every component in the pipeline should enforce authentication and authorisation. Service-to-service communication should use mutual TLS or equivalent mechanisms. Human access should follow the principle of least privilege with regular access reviews.
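As a small illustration of the service-to-service side, the sketch below builds a server TLS context with the Python standard library that requires and verifies client certificates; the certificate paths are placeholders, and in practice a service mesh or internal CA would issue and rotate these automatically.

```python
import ssl

def make_mutual_tls_server_context(cert_file: str, key_file: str, client_ca_file: str) -> ssl.SSLContext:
    """Build a server-side TLS context that also requires and verifies a client certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)  # this service's own identity
    context.load_verify_locations(cafile=client_ca_file)           # CA that issues client certificates
    context.verify_mode = ssl.CERT_REQUIRED                        # reject callers without a valid certificate
    return context

# Placeholder file names; the context is then passed to whatever server fronts the pipeline component.
# context = make_mutual_tls_server_context("serving.crt", "serving.key", "internal-ca.pem")
```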
Encryption. Data should be encrypted at rest and in transit throughout the pipeline. This includes training data, model weights, inference inputs, and outputs. Key management should follow established practice: use a dedicated key management service, rotate keys on a defined schedule, and separate key administration from data access.
Audit logging. Every significant action across the pipeline should be logged to a centralised, tamper-evident logging system. Logs should capture who did what, when, and from where.
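To make "tamper-evident" concrete, the sketch below chains each log entry to the previous one with a hash, so rewriting history breaks verification; a real system would also ship entries to external write-once storage, and the field names here are illustrative.

```python
import hashlib
import json
import time

class HashChainedAuditLog:
    """Append-only audit log where each entry commits to the previous one via a hash chain."""

    def __init__(self):
        self.entries: list[dict] = []
        self._last_hash = "0" * 64  # genesis value

    def append(self, actor: str, action: str, resource: str) -> dict:
        entry = {
            "timestamp": time.time(),
            "actor": actor,
            "action": action,
            "resource": resource,
            "previous_hash": self._last_hash,
        }
        # Hash the entry body (which includes the previous hash) to form the chain link.
        entry["entry_hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self._last_hash = entry["entry_hash"]
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any rewritten entry breaks every subsequent link."""
        previous = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if entry["previous_hash"] != previous:
                return False
            if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != entry["entry_hash"]:
                return False
            previous = entry["entry_hash"]
        return True

# Hypothetical actors and resources.
log = HashChainedAuditLog()
log.append("alice@example.com", "promote_model", "model-registry/churn-model/v7")
log.append("ci-pipeline", "register_model", "model-registry/churn-model/v8")
print(log.verify())  # True; editing any earlier entry would make this False
```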
Incident response. The organisation’s incident response plan should include scenarios specific to ML pipeline compromise: data poisoning detection, model replacement, inference manipulation, and training infrastructure breach. The AI Security Roadmap outlines how to build these capabilities systematically.
Getting Started
Implementing defence in depth across an ML pipeline is not an all-or-nothing proposition. A pragmatic approach starts with the highest-risk areas and builds outward.
- Map the pipeline. Document every stage, component, data flow, and access point. Visibility comes first.
- Identify crown jewels. Which models, datasets, and systems would cause the most damage if compromised? Prioritise these.
- Assess current controls. For each pipeline stage, catalogue existing security measures and identify gaps.
- Address the most critical gaps. Focus on controls that protect against the most likely and impactful attack scenarios.
- Automate where possible. Manual security processes do not scale with ML pipeline velocity. Invest in automated scanning, monitoring, and enforcement.
Defence in depth works because it assumes failure. No individual control is perfect, and no pipeline stage is immune to attack. By layering independent defences across every stage, organisations build resilience that degrades gracefully rather than failing catastrophically. The principle is centuries old. Its application to machine learning is new, but the logic is exactly the same.