Explainer

SOC 2 for AI companies: data, models, and new expectations

AI vendors face sharper data-handling scrutiny than conventional SaaS, and SOC 2 alone no longer answers every question buyers ask. Here is how to scope it and where ISO 42001 fits in.

Why AI vendors draw extra scrutiny

When your product trains on or processes data with models, buyers worry about flows that are harder to trace than in a conventional app: where data lands, whether it improves your models, and which third parties see it along the way. Security reviews for AI vendors now routinely add questions about prompt retention, training-data isolation, and third-party model usage that sit outside standard SOC 2 controls. The concern is grounded in real risk; research in 2025 highlighted how poisoning a tiny fraction of training data can compromise a model's behavior. A clean SOC 2 report establishes baseline hygiene, but AI buyers expect you to go further on data governance.

Handling training data versus customer data

The single most important distinction to get right is whether customer data is used to train or fine-tune your models, and your controls and contracts should state this unambiguously. Most enterprise AI buyers want a default of no training on their data, with strong logical isolation between tenants and clear, contractually guaranteed deletion mechanisms. Within SOC 2 scope you can evidence access controls, encryption, retention, and tenant separation for the data stores and pipelines that feed inference. Where training pipelines are involved, document the data lineage explicitly so an auditor and a buyer can both follow how information moves and where it stops.
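
One way to make the no-training default concrete is a gate in front of the training pipeline that excludes customer data unless a tenant has contractually opted in, and that logs every decision as lineage evidence. The sketch below is illustrative only: the `Record` fields, the source labels, and `select_training_records` are hypothetical names, not part of any standard or library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical labels for sources that never contain customer data.
# Anything not listed here is treated as customer data (fail closed).
NON_CUSTOMER_SOURCES = {"licensed_corpus", "public_dataset"}


@dataclass(frozen=True)
class Record:
    record_id: str
    tenant_id: str
    source: str


@dataclass
class LineageLog:
    """Append-only list of include/exclude decisions for later review."""
    entries: list = field(default_factory=list)

    def note(self, record: Record, decision: str, reason: str) -> None:
        self.entries.append({
            "record_id": record.record_id,
            "tenant_id": record.tenant_id,
            "decision": decision,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })


def select_training_records(records, opted_in_tenants, log):
    """Return records allowed into training; customer data is excluded by default."""
    allowed = []
    for r in records:
        if r.source in NON_CUSTOMER_SOURCES:
            log.note(r, "include", "non-customer source")
            allowed.append(r)
        elif r.tenant_id in opted_in_tenants:
            log.note(r, "include", "tenant opted in by contract")
            allowed.append(r)
        else:
            log.note(r, "exclude", "customer data without opt-in")
    return allowed
```

The resulting log gives an auditor and a buyer the same per-record trail: what was considered, what was kept out, and why.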

Model and inference infrastructure

AI products add infrastructure that a traditional SaaS SOC 2 wasn't designed around: inference endpoints, vector stores, embedding indexes, and the GPU or hosted-model layer that serves predictions. Each is in scope for Security controls covering access management, change control, logging, and monitoring, and each is a place where one tenant's data could leak into another's results if isolation is weak. Buyers increasingly ask for proof of per-tenant segregation at the index or database level rather than a verbal assurance. Treat the retrieval and inference path as production infrastructure with the same rigor you apply to your primary application.
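
Per-tenant segregation at the index level can be sketched as a retrieval layer that keeps one partition per tenant and refuses any query without a tenant identifier. This is a toy in-memory example under assumed names (`TenantScopedIndex`, `add`, `search`); a production vector store would enforce the same boundary through its own namespaces, collections, or database-level controls.

```python
import math
from collections import defaultdict


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TenantScopedIndex:
    """Toy vector index with one isolated partition per tenant."""

    def __init__(self):
        # tenant_id -> list of (doc_id, vector)
        self._partitions = defaultdict(list)

    def add(self, tenant_id, doc_id, vector):
        if not tenant_id:
            raise ValueError("tenant_id is required for every write")
        self._partitions[tenant_id].append((doc_id, list(vector)))

    def search(self, tenant_id, query, top_k=3):
        if not tenant_id:
            raise ValueError("tenant_id is required for every query")
        # Only the caller's partition is ever scanned, so another
        # tenant's documents cannot appear in the results.
        candidates = self._partitions.get(tenant_id, [])
        scored = [(_cosine(query, vec), doc_id) for doc_id, vec in candidates]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:top_k]]
```

A test that queries with one tenant's identifier and asserts that no other tenant's document IDs come back is the kind of evidence that can stand in for a verbal assurance.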

Sub-processors and LLM API providers

If you call a hosted model, for example through a major LLM provider's API, to serve your product, that provider is a sub-processor and belongs in your vendor inventory, your sub-processor list, and your buyers' due diligence. Configure those integrations to opt out of data retention and training where the provider offers it, and keep the contractual terms on file because reviewers will ask for them. Your own SOC 2 program should include vendor risk management over these providers, since their controls become part of your effective security posture. Be ready to explain which model providers touch customer data and under what terms, because this is now a standard line of questioning.
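
A sub-processor register can be kept as structured data so that gaps surface before a reviewer finds them. The sketch below assumes a hypothetical `SubProcessor` record and `review_findings` helper; the field names are illustrative, and the provider names in any real register would come from your own vendor inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubProcessor:
    name: str
    touches_customer_data: bool
    retention_opt_out: bool   # retention opt-out confirmed in configuration
    training_opt_out: bool    # training opt-out confirmed in configuration
    contract_on_file: bool    # signed terms stored where reviewers can get them


def review_findings(register):
    """List open items for every provider that touches customer data."""
    findings = []
    for sp in register:
        if not sp.touches_customer_data:
            continue  # out of scope for these particular checks
        if not sp.retention_opt_out:
            findings.append(f"{sp.name}: retention opt-out not confirmed")
        if not sp.training_opt_out:
            findings.append(f"{sp.name}: training opt-out not confirmed")
        if not sp.contract_on_file:
            findings.append(f"{sp.name}: contractual terms not on file")
    return findings
```

Where a provider does not offer an opt-out at all, the finding still belongs in the list, since that is exactly the term a buyer will ask about.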

ISO 42001 alongside SOC 2

ISO/IEC 42001, published in late 2023 as the first international AI management system standard, has moved from novelty to a recognizable buyer signal, with certification bodies now issuing certificates and notable vendors certifying through 2025. It addresses what SOC 2 deliberately does not: AI-specific governance such as model risk management, responsible-use policies, and lifecycle oversight of AI systems. The two are complementary rather than competing: SOC 2 attests to security and data-handling controls, while ISO 42001 demonstrates a governed approach to building and operating AI. Many AI companies will pursue SOC 2 first for procurement velocity, then layer on ISO 42001 as enterprise and non-U.S. buyers begin asking specifically about AI governance.

How to scope your first AI-company audit

Start with SOC 2 Security as the foundation, since it is the one category every SOC 2 report must include and it answers the bulk of baseline questions, then add Confidentiality or Availability if contracts demand them. Resist scoping AI-governance promises into a SOC 2 report where they don't belong; instead, document your AI data practices in a trust center and a dedicated AI questionnaire response so buyers get clear answers fast. Keep the control set honest about your real architecture, including inference and retrieval components and every model sub-processor. Then plan the ISO 42001 conversation for the moment your sales pipeline shows buyers asking about AI governance specifically, rather than committing to two frameworks before either is needed.