Durable Execution Is The Floor, Not The Building: A Note On Mistral Workflows

We agree with Mistral on the hardest infrastructure question. We disagree on what an agent system is above that floor.

May 16, 2026

Mistral launched Workflows this week. I read the docs, downloaded the Python package, looked through the SDK, and compared the design to what we have built at Vertesia.

There is a lot to like. There is also a clean architectural split.

Mistral and Vertesia agree on the floor: production AI work needs durable execution. We disagree on what an agent system is above that floor.

That distinction matters because “durable execution” is becoming a category slogan. It should be. But it is not the whole product.

What Mistral Got Right

The most important thing in Mistral Workflows is not the decorator syntax. It is the execution model.

We’ve been building our agent system on Temporal for two years.

They picked Temporal this week.

That is not a coincidence. Long-running AI work does not fit the old pattern of one process holding everything in memory while it waits, retries, calls tools, waits for humans, and hopes nothing dies.

If your agent dies after three hours of work and cannot resume, your platform is a demo.

If a provider throttles you for two hours and hundreds of in-flight jobs sit burning compute in retry loops, your platform is a demo.

If a worker disappears mid-run and the execution history cannot be replayed from a durable boundary, your platform is a demo.

If a workflow pauses for a human approval overnight and the system has to keep an active process alive for that wait state, your platform is a demo.

Durable execution changes the failure model. State survives worker death. Waiting does not burn active compute. Signals and timers become first-class. Long-running work can resume from the next durable step.
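To make that failure model concrete, here is a minimal sketch of replay from a durable history. It is plain Python, not the Temporal or Mistral SDK: the `DurableRun` class, the JSON string standing in for durable storage, and the step names are all assumptions made for illustration.

```python
import json
from typing import Any, Callable


class DurableRun:
    """Toy durable executor (hypothetical, not a real SDK class).

    Every completed step is appended to a history that outlives the worker.
    A JSON string stands in for the database a real engine would use.
    """

    def __init__(self, persisted_history: str = "[]") -> None:
        self.history: list[dict[str, Any]] = json.loads(persisted_history)
        self._cursor = 0  # index of the next step to replay or execute

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        # Replay: the step already completed, so return its recorded result.
        if self._cursor < len(self.history):
            event = self.history[self._cursor]
            if event["step"] != name:
                raise RuntimeError(
                    f"non-deterministic replay: expected {event['step']!r}, got {name!r}"
                )
            self._cursor += 1
            return event["result"]
        # First execution: run the step and record the result.
        result = fn()
        self.history.append({"step": name, "result": result})
        self._cursor += 1
        return result

    def persist(self) -> str:
        return json.dumps(self.history)


calls: list[str] = []


def fetch() -> str:
    calls.append("fetch")
    return "raw document"


def summarize() -> str:
    calls.append("summarize")
    return "summary"


# The first worker completes one step, then "dies".
first = DurableRun()
first.step("fetch", fetch)
saved = first.persist()

# A new worker loads the history, replays it, and resumes at the next step.
second = DurableRun(saved)
doc = second.step("fetch", fetch)              # replayed, fetch() does not run again
summary = second.step("summarize", summarize)  # executed for the first time
```

The point of the sketch is the cursor: completed work is read back from history rather than repeated, which is why the second worker only pays for the step that had not finished. A production engine adds timers, signals, and a real store on top of the same idea.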

That is the floor for production agent systems.

The other thing Mistral implicitly gets right is that single-threaded isolated sandboxes are not a substitute for durable execution.

Sandboxes are useful. We use them too. They are the right primitive for code execution, file manipulation, package installation, and risky tool work. But an isolated sandbox is not an execution substrate. Snapshotting a container is not the same as orchestrating a durable workflow. It does not give you reliable signals, human wait states, child workflow coordination, queryable execution history, or replay semantics across worker failure and provider outages.

A single-threaded agentic loop in a code sandbox is to durability what a Java app deployed on EC2 was to cloud in 2011.

So yes: durable execution matters. Temporal matters. Sandboxes are a tool, not the runtime.

On that floor, Mistral is directionally right.

The disagreement starts with the next question.

Figure: The floor and the building. Durable execution is the floor; the building is everything above.

What Is An Agent System?

Mistral’s answer is roughly:

An agent system is a Python codebase you write on top of a durable workflow SDK that calls a stateful Agent API and a set of MCP-registered tools. They host the orchestrator. You host the workers.

That is a coherent product. Clean, opinionated, developer-first. And honest about what it is: a durable execution SDK with the Mistral Agent API as the recommended orchestration engine.

Our answer is different.

For us, an agent system is a product, not an SDK. It is:

a durable runtime (yes, Temporal — same floor)
an autonomous agent loop that already knows how to reason, call tools, manage its context, recover, branch into subagents, and stream
a process engine sitting above that loop, so you can compose, supervise, and govern agents inside structured business processes
a tool system with progressive disclosure via skills, so agents don’t drown in their own tool catalog
a content layer that gives agents properly prepared documents and assets they can actually reason over
an interaction layer that turns prompts into versioned, parameterized, server-rendered first-class artifacts
a cloud platform so the customer does not have to run anything — but can elect to

Mistral Workflows gives developers a durable Python SDK. Vertesia gives teams an agent platform.

Both are useful. They are not the same abstraction.

The Autonomous Agent Loop, Pre-Built

A reasoning loop is “just a while loop.” I have written that explicitly:

while the agent needs more work, it reasons, it picks a tool, the tool runs, the result goes back in.

That is the kernel.
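As a rough sketch of that kernel, assuming a stubbed model and a single toy tool (`fake_model`, `TOOLS`, and the `name:arg` payload format are hypothetical stand-ins, not any provider's API):

```python
from typing import Callable

# Hypothetical tool table. Real tools would call out to external systems.
TOOLS: dict[str, Callable[[str], str]] = {
    "add_one": lambda arg: str(int(arg) + 1),
}


def fake_model(transcript: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM call.

    Returns ("tool", "<name>:<arg>") to request a tool,
    or ("final", "<answer>") when it considers the task done.
    """
    last = transcript[-1]
    if last.startswith("tool_result:"):
        return ("final", last.removeprefix("tool_result:"))
    return ("tool", f"add_one:{last}")


def run_agent(task: str, max_steps: int = 10) -> str:
    transcript = [task]
    for _ in range(max_steps):
        kind, payload = fake_model(transcript)   # the model reasons
        if kind == "final":
            return payload                       # no more work needed
        name, arg = payload.split(":", 1)        # the model picked a tool
        result = TOOLS[name](arg)                # the tool runs
        transcript.append("tool_result:" + result)  # the result goes back in
    raise RuntimeError("step budget exhausted")


answer = run_agent("41")
```

Even this toy version needs a step budget to terminate safely, which hints at how quickly the surrounding concerns pile up.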

The kernel is the easy part.

The hard part is everything around it, and Mistral leaves nearly all of it to you.

Token accounting normalized across providers. Tool-call wire formats reconciled across Anthropic, OpenAI, and Gemini. Context-window-aware checkpointing that knows when to compress, when to roll forward, when to summarize. Cache control that actually maintains hit rates across tool surface changes. Reasoning-block preservation so models do not silently degrade between turns. Stop-condition handling that distinguishes “done” from “stuck in a tool-call loop.” Coherent failure recovery when a tool times out mid-reasoning.
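To give a feel for just one of those items, here is a hedged sketch of a "stuck in a tool-call loop" check. The heuristic (the same tool called with identical arguments several times within a recent window) and the `window` and `max_repeats` values are assumptions for illustration, not a description of how any product implements it.

```python
import json
from collections import Counter


def is_stuck(
    tool_calls: list[tuple[str, dict]],
    window: int = 6,
    max_repeats: int = 3,
) -> bool:
    """Return True if an identical (tool, arguments) call appears
    at least `max_repeats` times within the last `window` calls."""
    recent = tool_calls[-window:]
    # Serialize arguments with sorted keys so equal dicts compare equal.
    counts = Counter(
        (name, json.dumps(args, sort_keys=True)) for name, args in recent
    )
    return any(n >= max_repeats for n in counts.values())


progressing = [
    ("search", {"q": "invoice 1042"}),
    ("search", {"q": "invoice 1042 vendor"}),
    ("fetch", {"id": 7}),
]
looping = [("search", {"q": "invoice 1042"})] * 3
```

A real loop would also have to decide what to do once the check fires: nudge the model, drop a tool, or escalate to a human.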

In Vertesia all of this lives in the agent loop. It runs as a Temporal workflow with continueAsNew checkpointing, signals for UserInput / Stop / TriggerCheckpoint, queries for active workstreams, sub-100ms streaming back to the UI, child workflows for subagents with isolated contexts, and capacity-aware flow control in front of the model providers (the deeper argument on that last one is in Provider Limits Are an Architecture Problem). We built it once, against a multi-provider abstraction, and every customer benefits from every fix.

In Mistral Workflows you can build this. They give you the substrate. But you are still writing the loop. And the subagent system. And the multi-provider routing. And the cloud platform that makes any of it operable.

And loops are hard. Multi-model loops are very hard.

There is also an obvious downstream consequence: Mistral’s first-class agent path uses Mistral models. You can call other APIs from activities, but the integrated agent surface is theirs. If you want Claude Opus 4.7 or GPT-5 or Gemini 3 driving the loop, you are back to writing it yourself on top of their SDK. If you want to switch models per interaction, A/B providers, route by cost, or follow the frontier next quarter, that is your problem. And from both experience and benchmarks, Mistral’s models are not currently at the frontier for advanced reasoning and orchestration.

Vertesia routes through Llumiverse. Same loop, any provider, switchable per interaction — and per subagent within a single workflow. You can put Claude Opus 4.7 on the orchestrator, Sonnet on the planner, Haiku on cheap extraction, all in the same agent run.
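The shape of per-role routing can be sketched in a few lines. This is not the Llumiverse API: `ModelRef`, `ROUTES`, `resolve_model`, and the provider and model names below are placeholders invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRef:
    provider: str
    model: str


# Placeholder identifiers, not real model names.
DEFAULT = ModelRef("provider-a", "large-model")
ROUTES = {
    "orchestrator": ModelRef("provider-a", "large-model"),
    "planner": ModelRef("provider-a", "medium-model"),
    "extraction": ModelRef("provider-b", "small-model"),
}


def resolve_model(role: str, override: Optional[ModelRef] = None) -> ModelRef:
    """A per-call override wins, then the role's route, then the default."""
    if override is not None:
        return override
    return ROUTES.get(role, DEFAULT)


pinned = ModelRef("provider-c", "experimental-model")
```

Because the loop only ever asks for a role, swapping the model behind a role is a configuration change rather than a code change.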

That is not a small detail.

The frontier moves every six months. Coupling your agent platform to one model family is a strategic mistake.

The Process Engine: The Outer Control Plane

The other architectural piece Mistral does not have is the one I think matters most for real enterprise work.

A reasoning loop is the right shape for open-ended investigation. It is the wrong shape for a regulated business process. Most production work that enterprises actually want to ship is neither pure workflow automation nor pure conversational agent. It is a hybrid process: deterministic state, typed context, guarded transitions, human approval gates, retries, audit trail — with bounded agent reasoning at specific steps.

We built the Process Engine for that. It is a separate Temporal workflow that runs a node graph: branch, foreach, interaction, agent, supervisor. Transitions guarded by JSON Logic against typed context. Optional human supervisor signals with timeouts. A visual designer for non-developers. And — this is the part that matters — agents are nodes inside the process, not the entire system.
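A guarded transition can be sketched as follows. The evaluator covers only a small subset of JSON Logic (`var`, comparisons, `and`, `or`, `!`) and simplifies it: it uses Python equality rather than JavaScript-style loose equality and returns plain booleans. The node names and the invoice context are hypothetical, not taken from the Process Engine.

```python
import operator
from typing import Any

_BINARY = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def evaluate(rule: Any, ctx: dict[str, Any]) -> Any:
    """Evaluate a small JSON Logic subset against a context dict."""
    if not isinstance(rule, dict):
        return rule  # literals evaluate to themselves
    (op, args), = rule.items()  # a rule is a single-key dict: {op: args}
    if op == "var":
        value: Any = ctx
        for key in str(args).split("."):  # dotted path lookup
            value = value[key]
        return value
    if not isinstance(args, list):
        args = [args]
    if op == "and":
        return all(evaluate(a, ctx) for a in args)
    if op == "or":
        return any(evaluate(a, ctx) for a in args)
    if op == "!":
        return not evaluate(args[0], ctx)
    if op in _BINARY:
        left, right = (evaluate(a, ctx) for a in args)
        return _BINARY[op](left, right)
    raise ValueError(f"unsupported operator: {op}")


# Hypothetical transitions out of one node, checked in order.
TRANSITIONS = [
    {"to": "human_approval", "when": {">": [{"var": "invoice.amount"}, 10000]}},
    {"to": "auto_approve", "when": {"and": [
        {"<=": [{"var": "invoice.amount"}, 10000]},
        {"==": [{"var": "invoice.vendor_verified"}, True]},
    ]}},
    {"to": "reject", "when": True},  # fallback
]


def next_node(ctx: dict[str, Any]) -> str:
    for transition in TRANSITIONS:
        if evaluate(transition["when"], ctx):
            return transition["to"]
    raise RuntimeError("no transition matched")
```

Because the guards are data rather than code, they can be stored, audited, and edited in a visual designer without redeploying the workflow.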

The process owns determinism, routing, validation, auditability. The agent nodes own the open-ended reasoning at the points where reasoning is what you want.

Two complementary control planes, one above the other, on the same durable substrate. (The full design — typed context, declared writes, the supervisor seat, MCP under the same governance — is in The Engine Is the Contract.)

Tools, Skills, And Progressive Disclosure

Mistral’s tool story is MCP Connectors, currently in beta. Register an MCP server URL, the model auto-discovers the tools, you can intercept calls for human-in-the-loop. That is a clean bet on an open standard, and MCP is, in my view, the right long-term answer for cross-vendor tool exchange.

We expose an MCP server too. We also consume MCP tools.

But MCP alone is not a tool system.

A serious agent platform also has to deal with tool-surface explosion. The moment you give an agent forty tools, the loop quality degrades. Selection mistakes go up. Hallucinated tool calls go up. Tokens spent on tool descriptions go up. This is real, it is measurable, and it is the reason the Vertesia tool layer is shaped around skills with progressive disclosure:

A skill is a named area of work — web_search, process_designer, studio_assistant, process_definitions, and so on.
An agent calls learn_<skill> when it decides it needs that area.
The skill returns instructions plus unlocks its tools dynamically.
Until then, the tools are not in the model’s tool list at all.

The agent starts narrow, expands on purpose, and never has to choose from forty options up front. The current Studio Assistant runs this way and the difference in tool-call quality is not subtle.
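The mechanics can be sketched in a few lines. The `SkillRegistry` class and the tool names are hypothetical; only the `learn_<skill>` convention and the skill names come from the description above.

```python
class SkillRegistry:
    """Toy registry: tools stay hidden until their skill is learned."""

    def __init__(self, skills: dict[str, dict]) -> None:
        self._skills = skills
        self._unlocked: set[str] = set()

    def visible_tools(self) -> list[str]:
        """The tool list the model would see on its next turn."""
        names: list[str] = []
        for skill, spec in self._skills.items():
            if skill in self._unlocked:
                names.extend(spec["tools"])      # unlocked: expose real tools
            else:
                names.append(f"learn_{skill}")   # locked: expose only the entry point
        return sorted(names)

    def learn(self, skill: str) -> str:
        """Unlock a skill's tools and return its instructions."""
        spec = self._skills[skill]  # raises KeyError for unknown skills
        self._unlocked.add(skill)
        return spec["instructions"]


registry = SkillRegistry({
    "web_search": {
        "instructions": "Search first, then fetch only the pages you need.",
        "tools": ["search_web", "fetch_page"],  # hypothetical tool names
    },
    "process_designer": {
        "instructions": "Validate every process before saving it.",
        "tools": ["create_process", "validate_process"],  # hypothetical
    },
})

before = registry.visible_tools()
instructions = registry.learn("web_search")
after = registry.visible_tools()
```

In this sketch the model starts with two entry points instead of four tools, and the gap widens as the catalog grows.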

Vertesia also ships a substantial library of built-in tools — content operations, code execution, web search, structured extraction, document processing, artifact handling, and more. You don’t start from zero.

Mistral has function calling and MCP. They do not have skills, progressive disclosure, a built-in tool library, or the context discipline that comes with all three. You could rebuild any of it on their SDK. Most teams will not.

The Content Layer Underneath

Here is the part that gets glossed over in any side-by-side that focuses only on the runtime.

Agents reason over content. That content is almost never clean. PDFs come in with broken tables, lost layouts, flattened columns, missing headers. Images come in without semantic structure. Documents come in without versioning, embeddings, or retrievable history. If your platform treats document quality as someone else’s problem, your agent is reasoning over garbage and you are blaming the model for poor results.

Vertesia ships the layer underneath:

a content store with versioning and access control
intake and processing pipelines that extract structured content from PDFs, images, audio, video
embeddings and indexing for semantic search and retrieval
assets as first-class citizens, not strings stuffed into a prompt
collections that an agent can search, filter, fetch, and cite
artifact storage scoped per agent run for intermediate work

That whole layer is the difference between an agent that handles real enterprise documents and one that needs them pre-cleaned.

Mistral Workflows does not address this. It is not in scope. Their pitch is the runtime. (The full thesis on why the repository itself has to change is in The Repository Reads Itself.)

Interactions: Prompts As Versioned Artifacts

One more piece, because it shows up everywhere in real deployments.

A prompt is not a string in your code. It is a configuration artifact. It needs to be parameterized, validated against a schema, versioned, A/B tested, swapped per environment, observed in production, and editable by people who are not engineers.

Vertesia’s interactions are exactly that: server-rendered Handlebars templates with typed parameters, model and provider routing per interaction, prompt-cache aware structure, and full execution history. Product managers can change a prompt without a deploy. Engineers can pin versions in workflows. Both can see what ran, with what parameters, on what model, with what cost.
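A minimal sketch of the idea, with Python's `string.Template` standing in for Handlebars. The `Interaction` class, the in-memory `REGISTRY`, and the `summarize` prompt are assumptions for illustration, not Vertesia's API.

```python
from dataclasses import dataclass
from string import Template
from typing import Optional


@dataclass(frozen=True)
class Interaction:
    name: str
    version: int
    template: Template
    params: dict  # declared parameter names mapped to expected types

    def render(self, **values: object) -> str:
        # Reject calls that do not match the declared parameters.
        missing = self.params.keys() - values.keys()
        extra = values.keys() - self.params.keys()
        if missing or extra:
            raise ValueError(f"missing={sorted(missing)} unexpected={sorted(extra)}")
        for key, expected in self.params.items():
            if not isinstance(values[key], expected):
                raise TypeError(f"{key} must be {expected.__name__}")
        return self.template.substitute(values)


# Hypothetical registry keyed by (name, version).
REGISTRY = {
    ("summarize", 1): Interaction(
        "summarize", 1, Template("Summarize: $text"), {"text": str}
    ),
    ("summarize", 2): Interaction(
        "summarize", 2,
        Template("Summarize in $max_words words: $text"),
        {"text": str, "max_words": int},
    ),
}


def get_interaction(name: str, version: Optional[int] = None) -> Interaction:
    """Return a pinned version, or the latest one when no version is given."""
    if version is None:
        version = max(v for (n, v) in REGISTRY if n == name)
    return REGISTRY[(name, version)]


pinned = get_interaction("summarize", 1).render(text="hello")
latest = get_interaction("summarize").render(text="hello", max_words=20)
```

Pinning by version is what lets a workflow keep running against a known prompt while someone else iterates on the next one.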

Mistral Workflows treats prompts as your code’s responsibility. That is a defensible decision for an SDK. It is also a layer of operational maturity that any enterprise eventually needs and that you are now on the hook to build.

So How Should You Choose?

Figure: Two ways to think about the work. Mistral as a linear function with LLM calls; Vertesia as agents and AI apps, with the process engine wrapping the agent loop and platform capabilities under one governance.

I will be honest about where each fits.

Mistral Workflows is the right choice when…

You are thinking in terms of a linear function with some LLM calls.

You are a Python-first team that wants to write workflows in code, not adopt a product.
You are already using Mistral models and you are happy to stay there forever.
You want a durable execution SDK and you intend to build the agent loop, the tool system, and the surrounding maturity yourself.
You want to run the workers in your own infrastructure and you have the team to operate them.

That is a real audience and Mistral is serving them honestly.

Vertesia is the right choice when…

You are thinking in terms of agents and AI apps.

You want an agent loop that already knows how to reason, recover, stream, and manage context across model providers — without writing it yourself.
You want to use the best available models, not be tied to one vendor.
You need both autonomous agents and structured processes that orchestrate them, with human-in-the-loop, audit, and supervision.
You need the content layer: documents, images, intake, embeddings, search, artifacts.
You need interactions, skills, progressive tool disclosure, and observability as first-class platform features.
You do not want to be required to run workers, infrastructure, or a Temporal cluster — but you can, if you want. Cloud is the default; on-prem is there for regulated customers who need it.

These are not the same product.

They overlap on the substrate and diverge above it.

Is Mistral Workflows Just Temporal Repackaged?

Let me address the question I have been dancing around.

The honest answer is: Mistral Workflows is Temporal with a Python SDK, a managed control plane, an opinionated AI feature set bolted on (streaming, payload offloading, encryption, RBAC), and a clean integration with the Mistral Agent API.

That is more than “just Temporal.” Streaming, offloading of payloads larger than 2 MB to customer-owned blob storage, SDK-layer encryption, multi-tenancy, and observability are real engineering on top of the open-source engine. The deployment topology is genuinely useful for customers who want managed orchestration but customer-controlled compute.
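Payload offloading is a good example of why this counts as real engineering. A rough sketch of the pattern, assuming an in-memory stand-in for blob storage: `InMemoryBlobStore`, the envelope format, and the function names are hypothetical, and only the 2 MB threshold comes from the feature as described.

```python
import hashlib
import json
from typing import Any

MAX_INLINE_BYTES = 2 * 1024 * 1024  # the 2 MB threshold mentioned above


class InMemoryBlobStore:
    """Stand-in for customer-owned blob storage (hypothetical)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # content-addressed key
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def encode_payload(value: Any, store: InMemoryBlobStore,
                   max_inline: int = MAX_INLINE_BYTES) -> dict:
    """Keep small payloads inline; replace large ones with a reference."""
    data = json.dumps(value).encode("utf-8")
    if len(data) > max_inline:
        return {"kind": "ref", "key": store.put(data), "size": len(data)}
    return {"kind": "inline", "data": data.decode("utf-8")}


def decode_payload(envelope: dict, store: InMemoryBlobStore) -> Any:
    """Resolve an envelope back to the original value."""
    if envelope["kind"] == "ref":
        return json.loads(store.get(envelope["key"]))
    return json.loads(envelope["data"])


store = InMemoryBlobStore()
small = {"status": "ok"}
large = {"text": "x" * (3 * 1024 * 1024)}

small_env = encode_payload(small, store)
large_env = encode_payload(large, store)
```

The orchestrator's history then carries only the small envelope, while the bulky data stays in storage the customer controls.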

But the customer still runs the workers. The customer still writes the workflow code. The customer still builds the agent loop. The customer still builds anything that looks like a process engine, a content layer, an interaction system, or a skill catalog. The customer still gets one model family as the first-class agent path.

So: more than just Temporal. Less than an agent platform.

It is best understood as a managed, AI-flavored Temporal SDK, and that framing should help you decide whether to use it.

Where I Land

Durable execution is the floor. Mistral and Vertesia agree on that, and anyone shipping an agent platform without it in 2026 is selling a demo.

The floor is not the building.

The building is the agent loop you don’t have to write, the subagent system that comes with it, the process engine that wraps it, the tool library and skill system that keep it focused, the content layer that gives it something to reason over, the interactions that make prompts into versioned product artifacts, the multi-provider abstraction that lets you keep using the best model — including mixing models across subagents within a single run — and the cloud platform that hides all of it from your ops team.

Pick the floor if that’s what you need.

Pick the building when you want something you don’t have to construct yourself.

Eric Barroca
