Intelligence Is Contextual: Designing the Enterprise Context Layer

The context window is not the context layer. In the enterprise, the layer is documents and data – governed, versioned, permissioned, audited.

May 26, 2026

Enterprise agents work on documents and data. Documents are often the source of truth; systems of record are representations. The context layer is the infrastructure that lets the agent reach that truth. The model’s window is only what the layer produced for this turn.

Figure: The window and the layer. The model receives a small per-turn context window assembled from the durable context layer (documents and data: contracts, policies, claim files, CRM records, ERP entries), with structure, versioning, governance, and audit intact.

TL;DR

The industry talks about context engineering, context retrieval, and context platforms mostly without saying what that context is made of.

In the enterprise, the answer is concrete: documents and data. But documents are special. Contracts, claims, policies, reports, regulatory filings – these are often the legal, contractual, audited truth, with structure, version, signature, identity. Core systems like CRM and ERP remain essential: they hold operational state and expose actions agents can use. But many line-of-business records are representations of document truth. When those representations disagree with the signed or operative document, the document is what the auditor, regulator, or court will ask to see.

The context layer for an enterprise agent is the layer that holds those documents and produces, per turn, what enters the model. The window the model sees is the output, not the layer itself. Optimizing the window without owning the layer is engineering the wrong thing.

Two definitions that get confused

There is a category mistake at the center of most current discussion of context for agents.

The context window is what the model sees on a given turn – the bytes that get serialized into its input. It is a per-turn artifact. It is bounded, it is transient, it is constructed each time the model is invoked.

The context layer is the infrastructure that produces that window. It is durable. It holds the data the agent can ever draw from. It applies the rules under which the data can be used. It assembles, ranks, scopes, and audits the slice that flows to the model.

Most public discourse uses “context” to mean the window. “Context engineering” usually means crafting the window. “Context retrieval” means filling the window with the right pieces. “Context platforms” mean tooling that helps you do this.

This is engineering one of two things and calling it the whole. The window is the output. The layer is the work. The window can only be as good as what the layer makes available.

In the enterprise, the context is documents and data

The other unspoken assumption in most “context” discourse is that the content is uniform – a sea of vectors, a corpus of paragraphs, a knowledge graph. That assumption holds for some agent products. It does not hold for the enterprise.

In the enterprise, the agent works on two kinds of material: documents and structured data.

The documents are the typed artifacts: contracts, insurance policies, claim files, regulatory filings, loan agreements, statements of work, vendor agreements, annual reports, audit trails, medical records, employment contracts, procurement orders, litigation memos, compliance attestations. These have structure (sections, clauses, schedules), version (drafts, executed copies, amendments), signature (who signed, when, with what authority), identity (which counterparty, which jurisdiction, which entity), and lifecycle (signed, in-force, expired, superseded).

The data is everything else the agent has to reason against: customer records in the CRM, line items in the ERP, transactions, calendar entries, support tickets, telemetry, account state. Structured rows in named systems, with their own schemas, their own access patterns, their own lifecycles.

The two are connected. The contract specifies the terms; the CRM record reflects the operational state under those terms. The signed policy is the legal artifact; the underwriting database is the running view of it. An enterprise agent rarely works on one and not the other – it reads the contract for what is allowed and the data for what is happening.

Both are typed, both are governed, both belong inside the context layer. Neither is a sea of vectors.
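
To make "typed" concrete, here is a minimal sketch of what the two kinds of material might look like as records. Every name in it (Document, Signature, SystemRecord, Lifecycle, the field names, the sample values) is an assumption made for illustration, not a schema taken from any particular product.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Lifecycle(Enum):
    SIGNED = "signed"
    IN_FORCE = "in-force"
    EXPIRED = "expired"
    SUPERSEDED = "superseded"


@dataclass(frozen=True)
class Signature:
    signer: str
    signed_on: date
    authority: str                      # the role under which the signer acted


@dataclass
class Document:
    doc_id: str
    doc_type: str                       # "contract", "policy", "claim_file", ...
    sections: dict[str, str]            # structure: address -> text
    version: str                        # "draft-2", "executed", "amendment-1", ...
    signatures: list[Signature]         # who signed, when, with what authority
    counterparty: str                   # identity
    jurisdiction: str                   # identity
    lifecycle: Lifecycle


@dataclass
class SystemRecord:
    """A structured row in a named system, linked back to its document."""
    system: str                         # "CRM", "ERP", ...
    fields: dict[str, object]
    source_doc_id: Optional[str] = None  # the document this row represents


# Illustrative values only.
contract = Document(
    doc_id="msa-001",
    doc_type="contract",
    sections={"4.2": "Indemnification ...", "schedule-A": "Fees ..."},
    version="executed",
    signatures=[Signature("A. Signer", date(2025, 3, 1), "authorized officer")],
    counterparty="Counterparty X",
    jurisdiction="California",
    lifecycle=Lifecycle.IN_FORCE,
)
crm_row = SystemRecord("CRM", {"account_status": "active"}, source_doc_id=contract.doc_id)
```

The link from the row back to its document (source_doc_id here) is the part a flat corpus tends to drop, and it is what lets an agent move from the running view to the legal artifact.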

Documents are the source of truth

Here is the part that is easy to miss if you have only built consumer agents.

In the enterprise, the document is not a description of the truth. It is the truth. The system of record is a representation of the document, made for operational convenience, and it is routinely wrong.

The signed insurance policy beats the underwriting system row. If the field in the database says one thing and the executed policy says another, the policy wins.

The signed contract beats the CRM contract record. The signed contract is what a court will read.

The filed claim form beats the claims management system entry. The form is what the regulator will audit against.

The loan agreement beats the loan management system. The annual report beats the data extract. The signed medical consent beats the EHR checkbox.

None of this is a hot take. Anybody who has worked in a regulated industry will tell you the same thing, often with a sigh, often with a specific story about an audit that turned on exactly this point. The document is the law. The system is bookkeeping. When they disagree, the document wins, every time.

This has consequences for agents that go beyond what most current frameworks even try to address. The enterprise context layer ends up holding two things at once: the company’s knowledge of how it operates, and the documents that define its obligations. The knowledge and the law, basically. The engineering point is the simpler one: the window is only the output, the layer is the work.

The agent must look at the document, not its representation

If the document is the source of truth, an agent that consults the system of record without checking the document is operating on a derivative.

This is fine when the derivative is correct. It is a compliance event when it is not.
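
A sketch of the check that separates the two cases, assuming the terms of the executed document have already been extracted into a dictionary. The function name find_drift and all sample values are hypothetical.

```python
def find_drift(document_terms: dict, record_fields: dict) -> dict:
    """Return fields where the system row disagrees with the document.

    The result maps field name -> (document value, record value). The
    document value is treated as authoritative, as the text argues.
    """
    drift = {}
    for key, doc_value in document_terms.items():
        rec_value = record_fields.get(key)
        if rec_value != doc_value:
            drift[key] = (doc_value, rec_value)
    return drift


# Illustrative values: the row overstates the coverage limit.
policy_terms = {"coverage_limit": 250_000, "deductible": 1_000, "status": "in-force"}
underwriting_row = {"coverage_limit": 500_000, "deductible": 1_000, "status": "in-force"}

drift = find_drift(policy_terms, underwriting_row)
```

An empty result means the derivative can be used as is. A non-empty one is the moment to stop and go back to the document.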

A real production agent in insurance, banking, healthcare, legal, or any regulated sector has to be able to read the document itself – not a summary of it, not a row extracted from it, not a vector chunk indexed off it. The document, with its structure intact, its signatures visible, its version known, its identity preserved.

That is a much harder requirement than “search the vector store”. It implies that the context layer must give the agent access to documents as documents, not as text-of-documents.

Take a claims agent.

The claims system says the policy is active. The underwriting database says the customer has a certain coverage level. The claim file says the loss happened on a specific date. All of that is useful, and all of it is structured data the agent can read in milliseconds.

But the coverage decision still depends on the executed policy, the endorsements, the exclusions, the effective dates, and sometimes an amendment attached three pages further down. If the agent only reads the claims system row, it is operating on the representation. If the row is wrong, the agent is wrong with confidence.

The context layer has to let the agent ask a different kind of question. Open the operative policy version, find the exclusion that applies to this loss type, check the endorsement history, and compare it to the claim date. That is not a top-k retrieval problem. It is document navigation under governance.
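
That sequence can be sketched in a few lines. This is an illustration under loud assumptions: each policy version is assumed to carry its full, consolidated set of exclusions, and PolicyVersion, operative_version, and applicable_exclusion are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class PolicyVersion:
    label: str
    effective_from: date
    # loss type -> address of the exclusion clause (assumed consolidated)
    exclusions: dict[str, str] = field(default_factory=dict)


def operative_version(versions: list[PolicyVersion], on: date) -> Optional[PolicyVersion]:
    """Pick the latest version already in effect on the given date."""
    in_effect = [v for v in versions if v.effective_from <= on]
    return max(in_effect, key=lambda v: v.effective_from, default=None)


def applicable_exclusion(
    versions: list[PolicyVersion], loss_type: str, loss_date: date
) -> Optional[str]:
    """Address of the exclusion that applied to this loss type on the loss date."""
    version = operative_version(versions, loss_date)
    if version is None:
        return None
    return version.exclusions.get(loss_type)


# Illustrative endorsement history.
versions = [
    PolicyVersion("executed policy", date(2024, 1, 1), {"flood": "Section 7.3"}),
    PolicyVersion(
        "endorsement 2",
        date(2024, 9, 1),
        {"flood": "Section 7.3", "mold": "Endorsement 2, clause 1"},
    ),
]

before = applicable_exclusion(versions, "mold", date(2024, 6, 15))
after = applicable_exclusion(versions, "mold", date(2024, 10, 1))
```

The same loss type gets a different answer depending on the claim date, because the endorsement was not yet in effect for the earlier loss. A similarity search over chunks has no way to express that.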

Why retrieval-as-context misses

Most of what is sold today as “the context layer” is some flavour of retrieval-augmented generation: a vector store, an embedding model, a top-k search, and a prompt template that pastes the results into the window.

That stack works, in narrow cases. It does not work as an enterprise context layer.

It loses structure. A PDF contract is not a flat sequence of paragraphs. It has parties, recitals, defined terms, sections, schedules, exhibits, signature blocks. Embed it into vectors and you get fragments. The structure that made the document legally meaningful disappears.

It loses version. The same contract may exist as several drafts and one executed version, with amendments. A vector index does not natively understand which one is the operative document right now.

It loses identity. Whose document is this? Signed by whom? Against which counterparty? In which jurisdiction? Embeddings collapse that into similarity.

It loses governance. Who is allowed to see this document? At what redaction level? With what audit obligation? A bare top-k retrieval does not apply access control; it returns chunks.

These are not minor problems. They are the reasons “we built RAG over our document corpus” rarely turns into a defensible production system. The retrieval layer is a useful component inside a context layer. It is not the layer itself.

Documents must be prepared

A raw PDF is not a usable input for a model. A 200-slide deck is not a query. Ten M&A contracts will not fit in a context window. An Excel file with three sheets and forty pivots is not a paragraph.

The context layer cannot operate on raw content. The artifacts have to be prepared – extracted, parsed, structured, indexed, and stored in a form that supports retrieval by meaning rather than by byte offset. The context layer is, basically, the enterprise’s content made structured, addressable, and governable for agents. That preparation is significant work, and it happens ahead of time, not on the agent’s turn. You cannot transform a PDF into text on every call.

For each piece of content – a PDF contract, a PowerPoint deck, an Excel model, a scanned form, an image, a Word document – the layer needs:

Extraction and parsing. Word and PDF documents need layout-aware parsing. Scanned forms need OCR. Decks need slide-by-slide extraction with the speaker notes and table content preserved. Spreadsheets need sheet, range, and formula awareness, not just a CSV dump. Images need captioning and visual analysis. Tables, footnotes, headers, and signatures are kept as structural features, not flattened.
Section structure. The agent does not want “chunk 47 of 312”. It wants the indemnification section, Schedule A, the signature block, clause 4.2. The layer has to extract that structure and make it addressable.
Metadata extraction. Parties, dates, jurisdictions, amounts, defined terms – these are queryable properties, not bag-of-words content. The layer extracts them on ingest, not at runtime.
Embeddings, as one index among several. Where similarity matters – and it does, for some queries – the layer computes embeddings. But as one index alongside structural and metadata ones, not as the entire retrieval model.
Cross-references. This amendment modifies that contract. This claim references this policy. The layer captures these as edges, not as text mentions buried inside the body.

Once a document is prepared, the agent can address it semantically. “Open the indemnification clause of the operative version of the agreement with Counterparty X” becomes a structured query, not a vector search.
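
As a sketch of what such a structured query could look like once preparation has run: the shape of PreparedDocument and the open_section helper below are assumptions for illustration, with sections, metadata, and cross-references stored as separate addressable parts.

```python
from dataclasses import dataclass, field


@dataclass
class PreparedDocument:
    doc_id: str
    counterparty: str
    operative: bool                               # is this the version in force?
    sections: dict[str, str]                      # structural address -> text
    metadata: dict[str, str] = field(default_factory=dict)
    # cross-references as edges: (relation, target doc id)
    references: list[tuple[str, str]] = field(default_factory=list)


def open_section(repo: list[PreparedDocument], counterparty: str, address: str) -> str:
    """Resolve a structural address against the operative version only."""
    matches = [d for d in repo if d.counterparty == counterparty and d.operative]
    if len(matches) != 1:
        raise LookupError(f"expected one operative document, found {len(matches)}")
    return matches[0].sections[address]


# Illustrative repository: one draft and one executed version.
repo = [
    PreparedDocument(
        "msa-001-draft", "Counterparty X", False,
        {"indemnification": "Draft wording, never signed."},
    ),
    PreparedDocument(
        "msa-001", "Counterparty X", True,
        {"indemnification": "Executed wording."},
        metadata={"jurisdiction": "California"},
        references=[("amended_by", "msa-001-amendment-1")],
    ),
]

clause = open_section(repo, "Counterparty X", "indemnification")
```

The draft never reaches the caller, because the query names the operative version explicitly instead of hoping the right chunk ranks first.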

This matters because of the window budget. Even with everything prepared, you cannot fit a 200-slide pitch deck or ten executed M&A contracts whole into a model’s context window. What you can fit is the part that matters for the turn – if the documents have been prepared, indexed, summarized at section level, and made surfaceable by structural address. The layer is what makes that possible. Without preparation, the agent is staring at a corpus it cannot enter.

What the context layer actually has to do

If you take seriously that the context layer is an infrastructure layer, not a crafting discipline, it has to do specific work.

Navigate. The agent has to be able to move through the document space – open a contract, follow a clause to its referenced schedule, jump to an amendment, walk back to the original. This is closer to a filesystem or a hypertext than a search engine.
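
A minimal sketch of navigation as graph traversal, assuming cross-references were captured as edges at ingest. The edge table and the address format ("contract-1#clause-4.2") are hypothetical.

```python
from collections import deque

# Cross-reference edges captured at ingest: source address -> target addresses.
EDGES = {
    "contract-1#clause-4.2": ["contract-1#schedule-A"],
    "contract-1": ["amendment-1"],
    "amendment-1": ["contract-1"],   # the amendment points back at the original
}


def reachable(start: str, edges: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk over cross-reference edges, in visit order."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in edges.get(node, []):
            if target not in seen:   # guards against reference cycles
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


from_clause = reachable("contract-1#clause-4.2", EDGES)
from_amendment = reachable("amendment-1", EDGES)
```

The cycle between the contract and its amendment is the normal case, not an edge case, which is why the walk tracks what it has already visited.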

Identify. Given a question, the layer has to surface which documents are relevant, not just which fragments. The unit of relevance is the document.

Search across modes. Full-text search for exact phrases. Structured search across the extracted metadata – party, date, amount, jurisdiction, status. Semantic search for similar passages where similarity actually matters. The agent should be able to compose these. “All loan agreements signed in 2025 in California mentioning indemnification by the borrower” combines all three. A single retrieval mode does not cover the question space.
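
Here is one way the composition might look for that example query. It is a sketch with loud assumptions: the Doc fields are hypothetical, and the similarity function is a toy word-overlap score standing in for an embedding model.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    doc_type: str
    signed_year: int
    jurisdiction: str
    text: str


def metadata_filter(docs: list[Doc], **criteria) -> list[Doc]:
    """Structured search: exact match on extracted metadata fields."""
    return [d for d in docs if all(getattr(d, k) == v for k, v in criteria.items())]


def full_text(docs: list[Doc], phrase: str) -> list[Doc]:
    """Full-text search: case-insensitive phrase match."""
    return [d for d in docs if phrase.lower() in d.text.lower()]


def similarity(query: str, text: str) -> float:
    """Toy stand-in for embeddings: Jaccard overlap of lowercase word sets."""
    a, b = set(query.lower().split()), set(text.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def semantic_rank(docs: list[Doc], query: str) -> list[Doc]:
    return sorted(docs, key=lambda d: similarity(query, d.text), reverse=True)


docs = [
    Doc("la-1", "loan_agreement", 2025, "CA",
        "Indemnification by the borrower covers all lender losses."),
    Doc("la-2", "loan_agreement", 2025, "CA",
        "Indemnification by the lender is limited to direct damages."),
    Doc("la-3", "loan_agreement", 2024, "CA",
        "Indemnification by the borrower covers all lender losses."),
    Doc("sow-1", "statement_of_work", 2025, "CA",
        "Deliverables are listed in the attached schedule."),
]

# Compose the three modes: metadata first, then phrase, then similarity ranking.
scoped = metadata_filter(docs, doc_type="loan_agreement", signed_year=2025, jurisdiction="CA")
mentioning = full_text(scoped, "indemnification")
hits = semantic_rank(mentioning, "indemnification by the borrower")
```

Metadata narrows by type, year, and jurisdiction; the phrase match guarantees the term is present; similarity only orders what survived. No single mode would have excluded the 2024 agreement and still ranked the borrower clause first.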

Rank and assemble. From the candidate documents (and parts of documents), the layer assembles what should enter the window for this turn, given how much room there is, what the agent has already seen, and what it is trying to do.
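
A sketch of that assembly step as a greedy selection. The scoring is assumed to come from elsewhere, the token cost is a crude whitespace word count, and Candidate and assemble are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    address: str      # structural address of the section
    text: str
    score: float      # relevance to the current task, assumed computed upstream


def assemble(candidates: list[Candidate], budget_tokens: int, already_seen: set) -> list[Candidate]:
    """Highest score first, skipping what the agent has seen, within budget."""
    window, used = [], 0
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        cost = len(c.text.split())   # crude token estimate
        if c.address in already_seen or used + cost > budget_tokens:
            continue
        window.append(c)
        used += cost
    return window


candidates = [
    Candidate("msa#indemnification", "Supplier indemnifies Customer against third-party claims.", 0.9),
    Candidate("msa#recitals", "The parties wish to cooperate.", 0.8),
    Candidate("msa#schedule-A", "Fees listed per service tier.", 0.7),
    Candidate("msa#signature-block", "Signed by both parties.", 0.4),
]

window = assemble(candidates, budget_tokens=10, already_seen={"msa#recitals"})
```

With a budget of ten, the recitals are skipped because the agent already has them, and Schedule A is skipped because it no longer fits, so a lower-scored but smaller section gets in. A production layer would use a real tokenizer and a smarter packing rule. The point is only that assembly is a decision made by the layer, per turn.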

Scope and enforce. Every retrieval respects the user’s permissions, the project’s data boundaries, the regulatory zone, the retention rules. Access control is enforced at the read boundary, not applied as a hint to the model. Nothing flows into the window that the calling principal is not entitled to see. The identity model that makes this possible is its own architectural piece. I will cover it next, in No Secrets for Agents.
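
Enforcement at the read boundary can be sketched as a filter the model never gets to argue with. The Principal and StoredDocument shapes below, including the group and zone fields, are assumptions for illustration and not the identity model the next essay describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    user: str
    groups: frozenset
    zone: str                 # regulatory zone the caller operates in


@dataclass(frozen=True)
class StoredDocument:
    doc_id: str
    allowed_groups: frozenset
    zone: str
    text: str


def can_read(principal: Principal, doc: StoredDocument) -> bool:
    """Same zone, and at least one group in common."""
    return doc.zone == principal.zone and bool(doc.allowed_groups & principal.groups)


def scoped_read(principal: Principal, docs: list[StoredDocument]) -> list[StoredDocument]:
    """Only documents the calling principal is entitled to see leave the layer."""
    return [d for d in docs if can_read(principal, d)]


adjuster = Principal("adjuster-1", frozenset({"claims"}), "EU")
docs = [
    StoredDocument("policy-9", frozenset({"claims", "underwriting"}), "EU", "policy text"),
    StoredDocument("hr-2", frozenset({"hr"}), "EU", "employment contract"),
    StoredDocument("policy-7", frozenset({"claims"}), "US", "policy text"),
]

visible = scoped_read(adjuster, docs)
```

Anything filtered out here never becomes a candidate for the window, so there is nothing for a prompt instruction to protect.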

Govern. Sensitive fields are redacted, classified content is gated, retention windows are enforced. The layer is the place where these rules live, and the window inherits them.
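
Field-level redaction, as a minimal sketch. The function name govern, the clearance model, and the sample record are hypothetical.

```python
REDACTED = "[REDACTED]"


def govern(record: dict, sensitive: set, clearances: set) -> dict:
    """Return a copy with sensitive fields masked unless the caller is cleared."""
    return {
        key: (value if key not in sensitive or key in clearances else REDACTED)
        for key, value in record.items()
    }


# Illustrative record and rules.
claim = {
    "claim_id": "C-1001",
    "loss_date": "2025-02-14",
    "diagnosis": "fracture",
    "bank_account": "XX00 0000 0000",
}
sensitive = {"diagnosis", "bank_account"}

view = govern(claim, sensitive, clearances={"diagnosis"})
```

The stored record is untouched. What changes is the view handed to the window, which is how the window inherits the rule instead of re-implementing it.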

Audit. Every access – read, modification, derived view – is recorded, attributed, and queryable. When the audit team asks which documents this agent touched, on whose behalf, when, with what authority – the answer is a structured trace, not a guess. Writes count too: a document edited by an agent carries the agent run, the principal, the timestamp, and the prior version, preserved.
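
What a structured trace might contain, sketched with an in-memory log. AuditEvent, AuditLog, and their fields are assumptions; a real layer would write to durable, append-only storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    at: datetime
    agent_run: str
    principal: str                  # on whose behalf the agent acted
    action: str                     # "read", "write", "derive"
    doc_id: str
    prior_version: Optional[str]    # kept for writes so the edit can be traced back


class AuditLog:
    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, agent_run: str, principal: str, action: str,
               doc_id: str, prior_version: Optional[str] = None) -> AuditEvent:
        event = AuditEvent(datetime.now(timezone.utc), agent_run, principal,
                           action, doc_id, prior_version)
        self._events.append(event)
        return event

    def touched_by(self, agent_run: str) -> list[str]:
        """Which documents did this run touch?"""
        return sorted({e.doc_id for e in self._events if e.agent_run == agent_run})

    def writes(self, doc_id: str) -> list[AuditEvent]:
        return [e for e in self._events if e.doc_id == doc_id and e.action == "write"]


log = AuditLog()
log.record("run-1", "adjuster-1", "read", "policy-9")
log.record("run-1", "adjuster-1", "write", "contract-1", prior_version="v3")
log.record("run-2", "analyst-4", "read", "policy-9")

touched = log.touched_by("run-1")
edits = log.writes("contract-1")
```

The audit team's question becomes a query over events, each one carrying the run, the principal, the time, and, for writes, the version that was replaced.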

Expose as tools. The layer is not a one-shot fetch. It exposes search, navigate, get-document, get-section, list-versions, follow-reference as tools the model itself can call mid-turn. The agent constructs its context incrementally – the way a human researcher does: start broad, identify what is relevant, drill in, follow citations, accumulate what it needs. The window grows as the agent works, within its budget, each step a structured tool call against the layer rather than a pre-baked retrieval. The runtime that dispatches those tool calls, manages retries, and audits each access is the agent harness – which gets its own essay after the identity piece.
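
A sketch of the tool surface as a registry plus a dispatcher. The tool names follow the ones listed above; the call shape (a dict with "name" and "arguments") and the in-memory repository are assumptions, not any vendor's tool-calling format.

```python
from typing import Callable

# Illustrative in-memory repository.
REPO = {
    "contract-1": {
        "versions": ["draft-1", "executed"],
        "sections": {"4.2": "Indemnification clause text."},
    }
}


def get_section(doc_id: str, address: str) -> str:
    return REPO[doc_id]["sections"][address]


def list_versions(doc_id: str) -> list[str]:
    return list(REPO[doc_id]["versions"])


# Each tool is a named, structured entry point into the layer.
TOOLS: dict[str, Callable] = {
    "get-section": get_section,
    "list-versions": list_versions,
}


def dispatch(call: dict):
    """Route one model-issued tool call to the layer."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# The agent builds context step by step: find the versions, then open a section.
versions = dispatch({"name": "list-versions", "arguments": {"doc_id": "contract-1"}})
clause = dispatch({"name": "get-section", "arguments": {"doc_id": "contract-1", "address": "4.2"}})
```

Each call is a natural place to apply the scope, governance, and audit steps described above, which is why the tool surface belongs to the layer and not to the prompt.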

A retrieval API gives you almost none of these on its own. A context layer must.

The repository becomes the context layer

We have written elsewhere about the design of a content repository for the agent era – a repository the agent can navigate, identify relevant documents in, and dynamically build context from. That essay defined the shape. This one names what the shape is for: being the context layer.

A content repository, in our sense, is not a folder of files plus a search box. It is a structured, governed, versioned, identity-aware store of documents and the data derived from them, with a query and traversal model the agent can use programmatically, with redaction and access control applied at the read boundary, and with every access audited.

The honest news is that most of this already exists. The enterprise content stack has held source-of-truth documents for decades, in SharePoint, in Documentum, in Box, in Alfresco, in OpenText, in homegrown repositories. The artifacts are there. The lifecycle rules are there. The access controls are there. The audit obligations are there. What is not there, yet, is the agent-readable surface.

Look at the properties this piece described – preparation, structured addressing, search across modes, access control, governance, audit, version, exposure as tools. None of them are new. They are the basics of any half-decent ECM platform, accumulated over thirty years of regulated industries working out what a serious document store has to do. The work of the agent era is not to invent these properties. It is to expose them, cleanly, as a surface the agent can call.

ECM is also not just a place where documents sit. It is where the enterprise keeps its memory of what it has done, its knowledge of how it operates, and its record of what is true. The contracts, the policies, the filings, the meeting minutes, the procedures, the audited statements – this is the institutional substance of the company. An agent that draws from the ECM repository is drawing from the enterprise’s context. They are the same thing, named differently.

That is the modernization work. Not throwing out the repositories and replacing them with a vector store. Not pretending that “the new context layer” is an embedding model with a fancy retrieval API. The modernization is taking the existing document repositories – with all their structure, governance, and identity – and exposing them as the context layer the agent can draw from. The retrieval API is a tool inside the layer, not a substitute for it.

An enterprise context layer can also replace an existing content management system that has reached its limits, or enrich one that already works – exposing the same artifacts through the new agent-readable surface, without forcing a rip-and-replace.

At Vertesia, that is what we have been building. The content repository is the context layer in the sense this piece is arguing for: navigable, governed, identity-scoped, version-aware, document-native, audited. The agent does not draw from a vector store; it draws from the repository. The window the model sees is what the repository assembled for this turn, under this principal, for this task.

This is not glamorous infrastructure. It is, however, the part that decides whether the agent’s answers are defensible.

The workaround was the representation layer

Core systems are not going away.

ERP, CRM, HRIS, ServiceNow, core banking, payment rails – these remain. They hold operational state, execute transactions, enforce business rules, expose APIs, and provide capabilities agents can use. Salesforce is not a line-of-business workaround in this sense. It is a core customer system.

The part that changes is different.

For decades, a lot of line-of-business applications had to build structured representations of document truth because software could not read the documents directly. Policy systems, claims systems, contract lifecycle platforms, matter management databases, lending portals – each one extracted a slice of the business into searchable fields, status values, reports, and workflows.

There was a reason for that. The truth lived in documents – contracts, policies, claim files, regulatory filings, signed memos. Computers could not read documents. So we extracted what we could into structured fields, accepted that documents and records would drift over time, and built business processes around the representations.

That was rational engineering, given the constraint. It was also a workaround. The documents stayed the source of truth; the structured systems became operational views of that truth; every reconciliation cycle made the gap visible.

The constraint is now moving.

We can read documents. Not perfectly, not without preparation, but well enough that the truth itself becomes addressable. That does not eliminate the ERP or the CRM. It changes how agents use them.

The agent can read the signed contract, identify the indemnification clause, check it against the operative policy, compare it to the CRM account state, and propose action. The CRM is still useful. The ERP is still useful. They become part of the context and action layer, not the only place where the truth is represented.

If you push this a step further, the shape of some enterprise software starts to shift. Some line-of-business applications become thinner – more like view-apps and coordination surfaces: interfaces that render what the work produced, gather human input where needed, route approvals, and call core systems when an action has to be taken.

The current sprawl was partly built for a constraint that no longer fully holds. Plausibly, a lot of it gets rebuilt closer to the truth.

The window is the output. The layer is the work.

People who confuse the two end up engineering the wrong thing. They tune the prompt. They tweak the retrieval. They argue about chunk size. Meanwhile, the layer underneath – the document store, the navigation model, the access controls, the audit pipeline – is a folder of PDFs and a vector index, and no amount of window-tuning will rescue what was lost when the documents were turned into chunks.

The window is the output. The layer is the work.

Optimize the window all you want; it will only ever be as good as what the layer made available. Get the layer right – documents as documents, source of truth preserved, governance enforced, access scoped, audit captured – and the engineering on top is just operational discipline.

That is the architectural move. The context layer is not a feature on top of an agent platform. It is the layer the platform stands on, and in the enterprise, it is a document and data layer – with documents as the source of truth, core systems as operational surfaces, and agents finally able to work across both.

Eric Barroca
