Three mornings, soon
A parent in Park City opens her assistant on a Tuesday and types: "Notify me when swim lessons open for my 5-year-old." That's it. No app to download, no account to check weekly, no tab left open. Three weeks later the assistant pings her. Inside the chat, not linked from it, a small card grid appears: four swim sessions, ages 4 to 6, with times and prices. She taps a filter chip for Saturdays, the grid updates, and she registers. She never visited a website.
Across town, a sales operations lead types: "When a deal closes in Salesforce, kick off the onboarding project in Asana and announce it in Slack." No Zapier recipe. No per-tool automation builder to learn. All three products are connected to her company's agent, the agent knows how to call each one, and the rule just runs.
And in our own codebase, the third scene already shipped. Ask Claude or ChatGPT about kids' activities in Park City and our server doesn't reply with a wall of text. It hands the host an interactive component: branded cards, live filter chips, a fullscreen mode. The interface renders inside the conversation, and the model watches what you do with it.
The agent becomes the operating system. Your product becomes a capability it can summon. And the interface does not vanish: it moves inside the conversation.
The shift, in one line
For forty years, software was a place. You went to it: a desktop app, then a website, then a phone icon. Each product owned a destination, and your day was a tour of destinations.
That model is inverting. ChatGPT and Claude are becoming the layer you live in, the way Windows and macOS once were: a single runtime where you state intent and connected software executes underneath. In that world a SaaS product stops being a destination you log into and becomes a bundle of capabilities the agent can summon: search this, create that, watch for this and tell me.
Here's the bold version: the app, as a place you go, dissolves. Here's the version we will actually defend by the end of this post: the app's interface survives, and even thrives, but it relocates. It renders inside the agent, side by side with the conversation, with the model reading along. The destination dies. The interface moves in.
Why now? Because every SaaS vendor is shipping its own "agentic platform": its own agents to configure, its own console to manage them, its own rules and quirks to learn. Each one is, in effect, another automation layer. One or two of those is manageable. Forty of them, one per product an organization owns, is not. Enterprises are not exhausted yet, but they will be, and the pressure will consolidate toward a single runtime. The platforms with the users (OpenAI and Anthropic chief among them) are the candidates. For everyone else, the strategic question is not "should we build our own agent?" It is "when the runtime war resolves, will our capabilities already be native there?"
What changed: from text to interactive UI
Some vocabulary first, because the rest of this post depends on it.
MCP (the Model Context Protocol, the open standard for connecting AI models to external tools and data) solved the doing problem. A model could finally reach into your product and act. But every tool result came back the same way: text and structured data, flattened into prose by the model. Search returns 40 activities? You get a paragraph summarizing eight of them. Want to see only weekend options? Type another message. Want to sort by price? Another message. Every click you would have made in a real interface became a round trip through the model.
That gap closed on January 26, 2026, when MCP Apps shipped as the first official MCP extension. It is a protocol-level open standard, not a marketplace and not any vendor's distribution program. It was co-developed in the open by the MCP community alongside OpenAI's Apps SDK team and the MCP-UI project, and it is supported today in Claude (web and desktop), ChatGPT, VS Code (Insiders), and Goose, with more hosts announced.
What it does is simple to state: a tool can now hand back a real, interactive interface that renders directly in the conversation. Not a screenshot. Not a link. A live component the user can click, filter, and refine, while the model stays in the loop.
Our production widget works exactly this way: each chip click triggers a real tool call back to our server, and the model is told what changed. We'll get to that.
How it works, in plain language
Picture the host (Claude or ChatGPT) as a department store, and Family Bugle as a brand renting a counter inside it, the way Sephora runs its own counters inside Kohl's. The counter is ours: our design, our products, our experience. The store sets the rules: where the counter sits, what it can touch, and that anything we need goes through the store's intercom. MCP Apps is simply that lease agreement, standardized, so any brand can open a counter in any store that honors it. The shopper never leaves the store, but at our counter they get the full Family Bugle experience.
Mechanically, it is four steps:
- The tool announces that it comes with a screen. When our server introduces its tools to the host, any tool that has an interface says so up front, in a metadata field called _meta.ui.resourceUri that points to a ui:// address. At that address sits the actual screen: a bundle of HTML and JavaScript we built ahead of time. In counter terms, the design is filed with the store before opening day, so the host can inspect every inch of it before a shopper ever sees it.
- You ask, the model acts. Nothing new happens here. You type "find swim lessons," the model picks the right tool and calls it, and our server returns the matching data, exactly like plain MCP.
- The host puts the screen up, inside a locked box. The box is an iframe: a webpage embedded inside another webpage. "Sandboxed" means the box is locked down to rules declared in advance, which the host enforces through a Content Security Policy (browser rules restricting what a page may load or talk to). What that buys everyone: our brand renders inside ChatGPT or Claude with our design and our interactions, and it physically cannot read your conversation, touch other apps, or phone home anywhere we did not declare. Our counter cannot wander the store.
- The screen and the conversation stay in sync. The interface and the host pass structured notes back and forth using JSON-RPC (a simple format for sending named function calls as messages) over postMessage (the browser's built-in way for two webpages to message each other safely). In practice that means: when the model finds new results, the host hands them to the screen to draw. When you tap a filter, the screen asks the host to run the search again. When you pick something, the model finds out. That is the intercom, and it is why a click in the widget and a sentence in the chat end up in one shared brain. This step is where the user-facing magic lives, and we come back to it.
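To make the intercom concrete, here is a minimal Python sketch of the JSON-RPC 2.0 messages that cross the iframe boundary. tools/call is the core MCP method for invoking a tool, and ui/notifications/tool-result is the notification named later in this post; the param shapes and the arguments (age, days) are simplified, hypothetical stand-ins. In a browser these dicts would travel through postMessage rather than json.dumps.

```python
import itertools
import json

# JSON-RPC 2.0: requests carry an id and expect a response; notifications do not.
_ids = itertools.count(1)


def request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request (the reply will carry the same id)."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


def notification(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 notification (fire-and-forget, no id)."""
    return {"jsonrpc": "2.0", "method": method, "params": params}


# Host -> widget: "the model's tool call returned; draw this."
# (Param shape is a simplified assumption.)
draw = notification(
    "ui/notifications/tool-result",
    {"structuredContent": {"activities": [{"id": "swim-101", "title": "Guppies"}]}},
)

# Widget -> host: "the parent tapped Saturdays; please re-run the search."
# (Argument names are hypothetical.)
refine = request(
    "tools/call",
    {"name": "list_activities", "arguments": {"age": 5, "days": ["sat"]}},
)

# On the wire each message is plain structured data the host can log and gate.
wire = json.dumps(refine)
```

The request/notification split is the useful part: a tap that needs fresh data is a request the host can answer (or refuse), while "here are results" is a one-way notice.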
Note what the security model buys everyone. The interface is pre-built and pre-declared, so the host can audit it before rendering. The sandbox and CSP confine it. Every message between the UI and the host is a structured JSON-RPC call the host can log and gate, including requiring user consent before a UI-initiated tool call. This is what makes "third-party interfaces inside the chat" tolerable to platform owners at all.
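The host's side of that bargain can be sketched the same way. Below is a hypothetical gatekeeper, not any real host's code: it logs every message from the widget, rejects methods that are not on an allowlist, and asks the user before forwarding a UI-initiated tools/call. The allowlist contents, the consent callback, and the -32001 error code are illustrative assumptions; -32601 is JSON-RPC's standard "method not found" code.

```python
from typing import Callable, Optional

# Illustrative allowlist: ui/initialize and tools/call are the two methods
# named in this post; a real host defines its own list.
ALLOWED_METHODS = {"ui/initialize", "tools/call"}


class Gatekeeper:
    """Hypothetical host-side filter for messages arriving from a widget iframe."""

    def __init__(self, ask_user: Callable[[str, dict], bool]):
        self.ask_user = ask_user  # consent prompt; returns True if approved
        self.log: list[dict] = []  # audit trail of every message seen

    def handle(self, msg: dict) -> Optional[dict]:
        """Return a JSON-RPC error response, or None if the message may proceed."""
        self.log.append(msg)
        method = msg.get("method")
        if method not in ALLOWED_METHODS:
            return self._error(msg, -32601, f"method not allowed: {method}")
        if method == "tools/call":
            params = msg.get("params", {})
            if not self.ask_user(params.get("name", ""), params.get("arguments", {})):
                # -32001 sits in JSON-RPC's implementation-defined server range.
                return self._error(msg, -32001, "user declined tool call")
        return None  # forward to the MCP server

    @staticmethod
    def _error(msg: dict, code: int, text: str) -> dict:
        return {"jsonrpc": "2.0", "id": msg.get("id"), "error": {"code": code, "message": text}}
```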
The three generations of agent UI
"The tool hands back an interface" leaves one big question open: who builds the interface, and when? The industry has converged on a useful taxonomy with three answers, laid out well by Ruben Casas of Postman and by CopilotKit's generative UI spectrum. They trade control for flexibility. One more word to pin down before the tour: a "component" here means the actual thing on your screen (the card grid, the chart, the form with a button), as opposed to the data behind it.
Generation 1: static components
The developer designs and builds the interface ahead of time, ships it as a ui:// resource, and the host renders it in a sandboxed iframe. One naming trap to avoid: "static" refers to who built it and when, not to interactivity. A static component can have filters, pagination, animations, and live tool calls. It is "static" the way a mobile app is static: pre-built, reviewed, and owned by its developer. This generation gives maximum control, security, and predictability, and it is the current production standard.
Generation 2: declarative UI
Instead of shipping finished code, the model emits a structured description of an interface (JSON or YAML), and a rendering engine maps that description onto a predefined catalog of components the host or developer maintains. The easiest way to picture it: the model sends a food order, not a finished dish. It writes a short description ("a bar chart, these five numbers, these labels"), and the host's kitchen, its own library of pre-approved components, cooks it in the house style. If Family Bugle took this route, our server would return the activity data, the model would emit a descriptor saying "render these as a filterable list," and ChatGPT or Claude would draw it using their components, not ours. Because the model writes a description rather than code, the result stays consistent and on-design-system while getting more dynamic and personal. Google's A2UI and Open-JSON-UI are the visible efforts here. Both are pre-1.0 and still changing, which matters when you are deciding what to bet a product on.
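The food-order idea fits in a few lines. Assume the host keeps a catalog of approved components and the props each one accepts, and checks the model's descriptor against it before anything renders. The catalog and the descriptor format below are invented for illustration; they are not A2UI or Open-JSON-UI syntax.

```python
# Hypothetical host catalog: component name -> props it accepts.
CATALOG = {
    "bar_chart": {"values", "labels", "title"},
    "filterable_list": {"items", "filters"},
    "text": {"body"},
}


class DescriptorError(ValueError):
    """Raised when the model orders something that is not on the menu."""


def validate(descriptor: dict) -> dict:
    """Check a model-emitted UI descriptor (and its children) against CATALOG."""
    kind = descriptor.get("component")
    if kind not in CATALOG:
        raise DescriptorError(f"unknown component: {kind!r}")
    props = descriptor.get("props", {})
    extra = set(props) - CATALOG[kind]
    if extra:
        raise DescriptorError(f"{kind} does not accept props: {sorted(extra)}")
    for child in descriptor.get("children", []):
        validate(child)
    return descriptor
```

The rejection path is the constraint described above: anything the catalog does not list simply cannot be ordered, however useful it would be.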
Generation 3: generative components
The model writes the HTML, CSS, and JavaScript itself, at runtime, per request. With Family Bugle data, it would look like this: our server hands over the raw activity results, and the model authors a brand-new interface for them on the spot. One parent might get a week-view calendar, another a price comparison table, each invented for that exact question, code that did not exist a second earlier. That is the magic, and also the catch: because the interface is written fresh every time, nobody (including us) reviewed it before it rendered. Lowest determinism, least consistency, largest security surface. The emerging containment pattern is a nested iframe: an outer iframe isolates the app from the host, an inner one isolates the generated code from the app. This generation is experimental, not a production default.
The point that ties this section together: MCP Apps is agnostic about which generation you use. It is the delivery and containment layer (the resource scheme, the sandbox, the message channel) for all three. Choosing a generation is a product decision you make on top of it. Which brings us to ours.
What we built: Family Bugle
Family Bugle is an AI-powered guide to Park City family life: activities, camps, childcare, school calendars, tuition. The data layer is Supabase (hosted PostgreSQL). The question we set out to answer: what does this product look like when the agent is the operating system?
The answer is an MCP server with a real interface attached. Here is the actual implementation, from the code.
The server
The MCP server lives at backend/src/mcp/server.py, built on FastMCP and mounted at /mcp inside the same FastAPI application that serves our public API (backend/src/main.py). It speaks streamable HTTP, so Claude and ChatGPT connect to it as a remote server. Read-only catalog tools are open; account tools (a parent's saved children, digest preferences) sit behind OAuth 2.1 backed by Supabase Auth, defined in backend/src/mcp/auth.py.
The 23 tools split into two families. Catalog tools include list_activities, get_activity, search, fetch, list_providers, get_provider, list_school_breaks, and list_tuition, plus a program, season, and team hierarchy for club sports. Account tools include get_my_profile, list_my_children, add_child, and update_digest_settings. That last family is what makes the consumer vignette real: "my 5-year-old" resolves because the agent can read the family's saved child profiles, with the parent's consent, through the same protocol.
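The split between open catalog tools and consent-gated account tools can be sketched as a dispatch table. This is a standard-library illustration, not our FastMCP code: the requires_auth flag, the "account:read" scope, the Principal type, and the placeholder return values are all hypothetical, and real OAuth 2.1 token validation (Supabase Auth, in our server) is reduced here to an already-validated principal.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Principal:
    """A caller whose OAuth token has already been validated (hypothetical)."""
    user_id: str
    scopes: set[str] = field(default_factory=set)


@dataclass
class Tool:
    fn: Callable[..., object]
    requires_auth: bool = False  # account tools set this; catalog tools do not


TOOLS: dict[str, Tool] = {}


def tool(name: str, requires_auth: bool = False):
    """Register a function under a tool name."""
    def register(fn):
        TOOLS[name] = Tool(fn, requires_auth)
        return fn
    return register


@tool("list_school_breaks")
def list_school_breaks(district: str) -> list[str]:
    return [f"{district}: spring break"]  # placeholder data


@tool("list_my_children", requires_auth=True)
def list_my_children(principal: Principal) -> list[str]:
    return [f"children of {principal.user_id}"]  # placeholder data


def call(name: str, args: dict, principal: Optional[Principal] = None):
    """Run a tool, refusing account tools without a signed-in, consenting parent."""
    t = TOOLS[name]
    if t.requires_auth:
        if principal is None or "account:read" not in principal.scopes:
            raise PermissionError(f"{name} requires a signed-in parent")
        return t.fn(principal, **args)
    return t.fn(**args)
```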
We also publish a second, minimal server (mcp/src/familybugle_mcp/server.py) that runs locally over stdio with six tools (search, get, ask, school_breaks, tuition, provider_hierarchy) and calls our public API. Same capabilities, different transport, zero UI. Keep that one in mind; it proves a point in a moment.
The interface
Three tools declare a widget: list_activities, get_activity, and get_provider. The widgets are React components living in frontend/widgets/ (an activities grid, an activity detail view, and a provider profile), compiled by Vite via frontend/widgets/build.mjs into fully self-contained JavaScript bundles: no runtime fetches, no external fonts, even the Family Bugle logo is inline SVG. Each bundle is inlined into an HTML shell and registered as an MCP resource by backend/src/mcp/widgets/registry.py.
In plain terms: we designed and built three small screens ourselves, the same way we would build pages of our own website, and we hand each one to the host as a single sealed package. Sealed matters. The package carries everything it needs (code, styles, even the logo), so when it renders inside ChatGPT or Claude it works instantly, looks exactly as we shipped it, and makes no calls back to us that the host did not broker.
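Sealing a bundle is mostly string assembly. A minimal sketch, assuming the compiled JavaScript is already in hand; the shell markup, the function names, and the exact resource dict layout are illustrative rather than copied from registry.py, while the ui:// scheme, the text/html;profile=mcp-app MIME type, and the csp and prefersBorder fields are the ones this post describes.

```python
import html

MCP_APP_MIME = "text/html;profile=mcp-app"


def seal_bundle(name: str, js: str, css: str = "") -> str:
    """Inline a compiled bundle into one self-contained HTML document."""
    # Common guard: a literal "</script" inside the bundle would close the tag early.
    safe_js = js.replace("</script", "<\\/script")
    return (
        "<!doctype html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(name)}</title><style>{css}</style></head>"
        f"<body><div id='root'></div><script type='module'>{safe_js}</script>"
        "</body></html>"
    )


def as_resource(uri: str, document: str) -> dict:
    """Describe the sealed page as an MCP resource (field layout is a sketch)."""
    return {
        "uri": uri,
        "mimeType": MCP_APP_MIME,
        "text": document,
        "_meta": {
            "ui": {
                "csp": {"connectDomains": []},  # the bundle makes zero runtime fetches
                "prefersBorder": True,
            }
        },
    }
```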
The tool declaration is exactly the spec's shape. Trimmed from the registry:
# backend/src/mcp/widgets/registry.py (abridged)
"_meta": {
    # MCP Apps standard (authoritative)
    "ui": { "resourceUri": "ui://widget/activities-list-BCEDG6U3.html" },
    # ChatGPT Apps SDK aliases (compatibility)
    "openai/outputTemplate": "ui://widget/activities-list-BCEDG6U3.html",
    "openai/toolInvocation/invoking": "Finding activities…",
    "openai/widgetAccessible": True,
}
That BCEDG6U3 in the URI is a content hash: a fingerprint of the bundle's bytes. Rebuild the widget and the URI changes, so hosts that cache templates reload the new code instead of serving a stale one. Old hashes stay registered (a manifest in backend/src/mcp/widgets/dist/manifest.json tracks current plus history), so a conversation from last month still renders with the exact bundle it was born with. For the non-engineers: think of it as a batch number on the package. New recipe, new batch number, and the old batches stay on the shelf for the conversations that used them.
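The batch-number scheme takes only a few lines. This sketch assumes an 8-character, uppercase base32 slice of a SHA-256 digest, which matches the look of BCEDG6U3 but is not necessarily what build.mjs does (Vite has its own hashing). The manifest shape, with current and history keys per widget, and the function names are likewise assumptions; the old bundle files themselves are assumed to stay on disk.

```python
import base64
import hashlib


def content_hash(bundle: bytes, length: int = 8) -> str:
    """Fingerprint bundle bytes: same bytes, same hash; any change, new hash."""
    digest = hashlib.sha256(bundle).digest()
    return base64.b32encode(digest).decode("ascii")[:length]


def widget_uri(name: str, bundle: bytes) -> str:
    return f"ui://widget/{name}-{content_hash(bundle)}.html"


def publish(manifest: dict, name: str, bundle: bytes) -> dict:
    """Make the new URI current while keeping every older URI resolvable."""
    entry = manifest.setdefault(name, {"current": None, "history": []})
    uri = widget_uri(name, bundle)
    if entry["current"] and entry["current"] != uri:
        entry["history"].append(entry["current"])  # old batch stays on the shelf
    entry["current"] = uri
    return manifest
```

Republishing identical bytes is a no-op, which is what lets hosts cache aggressively: a URI only ever names one exact bundle.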
Inside the widget, the activities grid renders branded cards with interactive filter chips for age, time of day, days of the week, activity type, and provider. Tap a chip and the widget calls list_activities again, through the host, with the new filters. Tap a card and the widget sends a natural-language follow-up into the conversation so the model narrates the detail view. There is a fullscreen mode for browsing more than the eight inline cards, and a link out to familybugle.com for the long tail. Every one of those interactions replaces a sentence the parent would otherwise have had to type, and that arithmetic (a tap instead of a prompt, ten times a session) is the whole user-experience case for MCP Apps.
Three ways to consume one server
Because the UI is declared per tool (3 of our 23 tools carry a widget; the other 20 are plain), the same server supports three consumption modes without any branching logic on our side:
- Full MCP App. Claude or ChatGPT renders the widget. This is the swim-lesson scene from the cold open: a parent asks one question, gets our branded card grid inside the chat, and refines it with her thumb instead of typing follow-ups.
- Bare tool calls. A host with no UI support (or our stdio server, which has no widgets at all) gets clean structured data and the spec's graceful degradation to text. Concretely: a voice assistant answering "is there a school break next week?" out loud, or an agent quietly checking activity availability in the middle of planning an entire birthday party. No visuals needed, same capabilities underneath.
- The hybrid. The widget shows the grid; the chat layer wraps it. "Which of these works around a Tuesday piano lesson?" is a sentence, not a filter chip we ever built, and the model answers it about the data on screen. Interface for what interfaces are good at, language for what language is good at.
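What makes the three modes come for free is that one tool result carries every layer at once. A sketch, assuming MCP's content (text blocks) and structuredContent result fields plus our _meta.fullData convention; the helper name, the activity fields, and the summary wording are hypothetical.

```python
INLINE_LIMIT = 8  # cards shown inline before fullscreen


def activities_result(activities: list[dict]) -> dict:
    """Build one tool result that serves UI hosts, text-only hosts, and the model."""
    shown = activities[:INLINE_LIMIT]
    summary = f"Found {len(activities)} activities. " + "; ".join(
        f"{a['title']} ({a['day']} {a['time']})" for a in shown
    )
    return {
        # Text-only hosts and voice agents read this.
        "content": [{"type": "text", "text": summary}],
        # Model-facing structured summary to reason over.
        "structuredContent": {"count": len(activities), "activities": shown},
        # Full render payload for the widget (our convention, not the spec's).
        "_meta": {"fullData": {"activities": activities}},
    }
```

A host without UI support ignores _meta and reads the text; a UI host hands the whole result to the widget; the model sees the summary either way.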
How this maps to the spec
- ✓ Tool declares UI via _meta.ui.resourceUri using the spec's nested form, not the deprecated flat key. Verified in backend/src/mcp/widgets/registry.py.
- ✓ UI resources served under ui:// with the spec's required MIME type, text/html;profile=mcp-app, as the default. A WIDGET_HOST_TARGET env switch can emit ChatGPT's legacy text/html+skybridge as an escape hatch.
- ✓ Sandboxed iframe rendering. Sandboxing is the host's job under the spec; our side declares the inputs it needs: ui.csp (empty connectDomains, because the bundles make zero runtime fetches), ui.domain, and ui.prefersBorder. A unit test asserts no widget embeds external iframes.
- ✓ JSON-RPC over postMessage via the official @modelcontextprotocol/ext-apps App client in frontend/widgets/lib/host.ts, including the ui/initialize handshake.
- ✓ App-side methods match the SDK. Tool results arrive via the ui/notifications/tool-result notification; the widget calls callServerTool (aliased to callTool in our bridge) and updateModelContext.
- ✓ UI is opt-in at the tool level. Only 3 of 23 tools attach a widget; the server is fully consumable with no visual layer at all.
- + Beyond the spec, by choice: we dual-emit ChatGPT's openai/* alias keys next to the standard ui.* keys, pass the widget's render payload through a _meta.fullData convention alongside the model-facing summary, and keep historical bundle hashes registered so old conversations resolve. All additive; removing them would not break spec conformance.
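The dual-emit and the MIME switch both reduce to deriving every value from a single source. A sketch under stated assumptions: WIDGET_HOST_TARGET is the env switch named in the checklist, but its "chatgpt-legacy" value and both function names are hypothetical; the key names and MIME types are the ones listed above.

```python
import os
from typing import Mapping, Optional

SPEC_MIME = "text/html;profile=mcp-app"
LEGACY_MIME = "text/html+skybridge"


def tool_meta(uri: str, invoking: str) -> dict:
    """Standard ui.* key first; ChatGPT openai/* aliases point at the same URI."""
    return {
        "ui": {"resourceUri": uri},
        "openai/outputTemplate": uri,
        "openai/toolInvocation/invoking": invoking,
        "openai/widgetAccessible": True,
    }


def resource_mime(env: Optional[Mapping[str, str]] = None) -> str:
    """Spec MIME type by default; the legacy Skybridge type only when asked for."""
    env = os.environ if env is None else env
    # "chatgpt-legacy" is a hypothetical value for the switch.
    return LEGACY_MIME if env.get("WIDGET_HOST_TARGET") == "chatgpt-legacy" else SPEC_MIME
```

Because the aliases are computed from the same URI, they can never drift from the standard key, and dropping them later means deleting three dict entries.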
Why pre-built components, and not declarative or generative yet
This was the central engineering decision, so here is the reasoning in full.
Our requirements were specific. We needed filter chips with our exact semantics (age bands, time of day, days of the week, provider). We needed tap-to-refine that re-queries live data. We needed the Family Bugle brand rendered correctly every time, in light mode and dark. And we are a consumer product serving families, which means the rendering must be deterministic: the same query produces the same interface, reviewable in advance, every single time.
Hold the three generations up against that list honestly and the choice makes itself. Our whole product is continual refinement: filter, look, narrow again, every tap re-querying live data. The declarative route cannot promise that experience today, because the model can only order from the host's fixed menu of components, and our filter-and-refine loop is not on that menu yet (the menu formats themselves, A2UI and Open-JSON-UI, are still pre-1.0 and changing). The generative route could build it, but it would rebuild it differently every time: an interface nobody reviewed before it rendered is not something we will put in front of parents booking childcare in 2026. Pre-built components gave us everything on the list, at the cost of flexibility we did not need.
So this is not the path we settled for. It is the mature path, chosen on purpose, and the industry's current production standard besides. The calculus will change: when declarative catalogs can express our interactions and the formats stabilize, the personalization upside gets interesting, and because MCP Apps is the delivery layer for all three generations, migrating is an implementation swap, not a re-platform. We revisit the decision; we did not foreclose it.
"Static" was never the limitation it sounds like. The component is pre-built. The experience is alive.
What you click, the model knows
Here is the payoff of all that plumbing: the screen and the conversation share one memory. Click something in the widget, and the model knows. When the model learns something new, the screen shows it. (Engineers call this the bidirectional loop.) This is what separates an MCP App from a webpage pasted into a chat window, and it is worth being precise about what ships today versus what is coming.
Live today, in our production code, three channels run between the interface and the model:
- Results flow onto the screen automatically. When the model calls list_activities, the host hands the result straight to the widget to draw (the spec calls this the ui/notifications/tool-result notification). The widget never makes a network request of its own, which is part of why hosts can trust it.
- A tap is a query. Tap a filter chip and the widget asks the host to re-run our search tool via callServerTool, where the request is auditable and can be gated on user consent. Our bridge in frontend/widgets/lib/host.ts wraps this, and even guards rapid taps with a generation counter so a fast-clicking parent never sees a stale grid win a race. The outcome for the user: refining results costs a tap, not a typed sentence.
- The model sees what you did. Through updateModelContext, the widget tells the model which filters are active and what is on screen. So when the parent follows up with "what time is the second one?", the model knows exactly what "the second one" means. Clicks become part of the conversation's memory, and nobody has to repeat themselves.
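The two app-side behaviors above, discarding stale responses and telling the model what is on screen, can be sketched in Python even though our real bridge is TypeScript in frontend/widgets/lib/host.ts. The class, its methods, and the filter names are hypothetical: tap() returns the arguments a callServerTool call would carry, and model_context() returns the kind of text updateModelContext might receive.

```python
class ActivitiesGrid:
    """Hypothetical widget state: the latest tap wins, and the model can read the screen."""

    def __init__(self) -> None:
        self.generation = 0  # bumped on every tap
        self.filters: dict[str, list[str]] = {}
        self.cards: list[str] = []

    def tap(self, chip: str, value: str) -> tuple[int, dict]:
        """Toggle a chip; return (generation, tool arguments) for the re-query."""
        values = self.filters.setdefault(chip, [])
        if value in values:
            values.remove(value)
        else:
            values.append(value)
        self.generation += 1
        return self.generation, {k: list(v) for k, v in self.filters.items() if v}

    def on_result(self, generation: int, cards: list[str]) -> bool:
        """Apply a tool result only if it answers the latest tap."""
        if generation != self.generation:
            return False  # an older, slower response lost the race
        self.cards = cards
        return True

    def model_context(self) -> str:
        """Numbered summary so "the second one" has an unambiguous referent."""
        active = ", ".join(f"{k}={'/'.join(v)}" for k, v in self.filters.items() if v)
        lines = [f"Active filters: {active or 'none'}"]
        lines += [f"{i}. {title}" for i, title in enumerate(self.cards, start=1)]
        return "\n".join(lines)
```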
Forward-looking: One frontier remains, and we want to be straight about its status. The agent fully driving a rendered component (clicking its filters, operating it as a user would) rather than observing it is the emerging direction, not a shipped, standardized capability. The spec's plumbing points that way (hosts already stream tool inputs into views), but do not build a roadmap slide that claims it works everywhere today. What works today is already remarkable: an interface and a model sharing one context, each handling the half of the interaction it is best at.
Why this matters for the business
Return to the two vignettes, because they are the business case.
The consumer side is a stickiness story. The swim-lesson parent never installed our app, yet Family Bugle renders, branded, inside the assistant she already uses every day. It is worth being precise about how she got there. Today, connecting is a one-time, lightweight step: she enables Family Bugle from her assistant's app directory, a couple of taps, no download, and she signs in only if she wants the personalized tools. Forward-looking: The ecosystem is pushing for even that step to fade, with agents discovering and connecting to services on their own, so that asking about Park City activities is enough for the assistant to find and offer Family Bugle. That is not how it works today, and we will not pretend otherwise.
Once connected, the stickiness compounds, and the real aha moment is not the card grid, nice as that is; the grid is the first impression. The aha comes a week later. The parent once mentioned that Rose loves swimming, Rose's age already lives in our account tools, and the assistant comes back on its own: "swim lessons just opened for Rose's age group." Nobody searched. The agent remembered, checked our data on a schedule, and showed up with the answer. Forward-looking: The natural next line is "want me to register her?", and that part is still ahead of us: our tools today show and notify but do not yet book. Even without it, that moment converts a search into a relationship. In the old model, distribution meant winning a download. In this one, it means being the connected capability the agent reaches for, and the one the household never bothers to disconnect.
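The scheduled check behind that moment is simple to sketch: compare today's open sessions with the last snapshot and keep only the new ones that fit a saved child's age and interests. Everything here (the Session and Child fields, the function name, the sample data) is hypothetical, and how the assistant schedules the check and delivers the ping is up to the host.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    id: str
    title: str
    category: str
    min_age: int
    max_age: int


@dataclass(frozen=True)
class Child:
    name: str
    age: int
    interests: frozenset


def new_matches(previous: set[str], current: list[Session], child: Child) -> list[Session]:
    """Sessions that just opened and fit the child's age and interests."""
    return [
        s for s in current
        if s.id not in previous  # new since the last check
        and s.min_age <= child.age <= s.max_age
        and s.category in child.interests
    ]
```

An empty result means no ping; only a non-empty one turns into "swim lessons just opened for Rose's age group."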
The enterprise side is a leverage story. Recall the sales lead from the opening, typing "when a deal closes in Salesforce, kick off the onboarding project in Asana and announce it in Slack." That rule took one sentence because all three products were connected capabilities inside the same assistant, Claude or ChatGPT, instead of three products with three logins and three automation tools. Multiply that by every pairing of tools an organization owns and you see what disappears: the per-tool automation layer, the integration middleware, the training for each vendor's bespoke agent. One assistant, every connected capability, compounding. The organizations that get that leverage first will get it through the vendors that showed up as MCP-native capabilities first.
Two honest caveats belong here. First, expect resistance: no vendor is eager to become an invisible capability inside someone else's platform, which is exactly why so many are shipping their own agent layers, and why the exhaustion is coming. Second, the deepest enterprise products do not reduce to chat. A complex suite with years of configuration depth keeps its full interface for the deep work; what moves into the agent first is the cross-product, routine, glue work nobody loved doing anyway. The embedded surface plus a chat layer sits alongside those products, not instead of them: the agent becomes the front door for the routine work, and the deep work keeps its deep interface.
Which is the strategic move in one sentence: ship your capabilities as MCP-native now, with real UI attached, and own your spot in the foundation before the platform war resolves. The protocol is open and host-agnostic. Our one server serves Claude and ChatGPT alike, which means we did not have to bet on the winner. We bet on the standard, and the standard is the one thing every contender supports.
Where this goes
Forward-looking: Everything in this section is direction, not shipped fact. Treat it accordingly.
The destination is not "chat replaces software." It is embedded app surfaces with a chat layer that augments them. Screens are better at some jobs: showing forty options at a glance, comparing them side by side, narrowing with a tap. Words are better at others: saying what you actually want, handling the exception, asking the question no button was built for. The product that wins lets each do its job, with the two sharing that one memory. For a family, that looks like one assistant that knows the kids' ages and can show as well as tell. For a company, it looks like forty products behaving like one, with each product's full interface still there when the work gets deep.
Expect the connection itself to become the moat: switching costs move from "learn a new interface" to "rewire my agent's capabilities." Expect declarative UI to mature into the personalization layer (the same data, composed differently for a parent of toddlers than for a parent of teens), and generative UI to earn narrow, sandboxed production roles. Expect the host platforms to keep absorbing OS-like responsibilities: permissions, identity, payment, and the app review function that pre-declared MCP App templates already hint at.
And expect the defensible version of this post's title to hold up better than the provocative one. OpenAI and Claude are becoming the operating system, yes. But operating systems never killed applications; they gave them somewhere to run. The interface and the agent, together. That is the future we built for.
Your product's next surface is not a website or an app store listing. It is a tool with an interface attached, summoned by name, inside the conversation your customer already lives in.
We shipped ours: 23 tools, 3 widgets, one open standard, every major host.
Ask your agent about Family Bugle →