Paula Silva | Software Global Black Belt
Open Horizons, the Agentic DevOps Platform

The open-source accelerator for platform engineering
and Agentic AI at enterprise scale.

Backstage on Azure, six layers of governance, the Horizons Phases, and a working Agent IDP in 90 days.

Author: Paula Silva
Role: Software Global Black Belt
Duration: 60 to 90 minutes
Date: 2026-05-22
Agenda

Four acts. Fifteen parts. Two hours end to end.

ACT 1
I-IV · 25 min
Diagnostic
  • I · The problem
  • II · What Open Horizons is
  • III · Platform engineering 101
  • IV · The rise of Agentic DevOps
ACT 2
V-VIII · 30 min
The stack
  • V · MS Agentic DevOps stack
  • VI · Day in the life
  • VII · Live tour, ohorizons.ai
  • VIII · Six-layer Context Platform
ACT 3
IX-XII · 35 min
Adoption + ops
  • IX · Horizons Phases, H1/H2/H3
  • X · GitHub + ADO integration
  • XI · Golden Paths + Agent Catalog
  • XII · Security + governance
ACT 4
XIII-XV · 30 min
Business + action
  • XIII · Business value + ROI
  • XIV · Getting started, 5 steps
  • XV · Partner ecosystem
  • Close + Discovery CTA

Short on time? Watch Act 2 alone, V to VIII. That covers the demo and the architecture. Then jump to Act 4 for the Discovery sign-off.

Who is speaking

Paula Silva, Software Global Black Belt.

Building the future of software development with AI and Agentic DevOps.

I work with enterprise customers across the Americas on Agentic AI, platform engineering, and software modernization. This deck distills the patterns I see repeated across dozens of programs: pilots that work in a notebook and stall on a production cluster. Open Horizons is the accelerator that closes that gap by construction, on Day 1, not on Day 180.

PART
I

The problem.

Why most enterprise AI pilots stall before production, and why more tools will not fix it.

The agent cemetery
95%

is the GenAI pilot failure rate measured by MIT NANDA in 2025. Failures concentrated in context, integration, and governance gaps. Not model quality.

Source: MIT NANDA, The GenAI Divide, State of AI in Business 2025.

The cancellation curve
40%

of Agentic AI projects will be cancelled by end of 2027, according to Gartner. Cost overruns, unclear business value, inadequate risk controls.

Source: Gartner press release, June 2025. In the same period, Gartner also forecast that 40 percent of enterprise apps will feature task-specific AI agents by 2026, up from under 5 percent in 2025.

The inverted perception problem

Developers estimated AI tools would speed them up by 20 percent.
An RCT measured that they were 19 percent slower.

Estimated speedup · +20% · developer self-report of expected productivity gain from AI tools
Measured speedup · -19% · actual change in throughput, 16 experienced open-source developers
Inversion · 39 pp · gap between the estimate and the measured outcome

Source: METR, RCT on AI-augmented development, 2025. arXiv:2507.09089. When the tool makes you feel capable, you cannot detect it has made you less effective. Trust measured trajectories, not satisfaction surveys.

The Triple Debt

Three forms of debt accumulate in AI-native development.
Each is invisible until it is expensive.

Familiar
Technical debt
CMU 2026, AI agents drive +18 percent static-analysis warnings and +39 percent cognitive complexity. Liu et al. 2026, AI code adds significantly more requirement and test debt across 304,362 commits.
New
Cognitive debt
Storey 2026 names it cognitive surrender. Anthropic Fellows 2026, AI use during learning reduces library-specific skill acquisition by 17 percent. The codebase becomes orphaned knowledge.
Worst
Intent debt
Objectives, constraints, and decision rationale never captured. The Klarna 2025 case, comprehensive context, no codified intent. The agent optimizes for the wrong metric.

Open Horizons exists to eliminate the Triple Debt by construction. SDD (Spec-Driven Development) anchors intent, Backstage anchors knowledge, scope guards anchor scope.

The four costs of the status quo

Without a platform, every team pays four taxes.

Cognitive
Six jobs in one role
Kubernetes, Terraform, Actions, observability, security scanners, and now agent frameworks. Burnout, shallow expertise, slow delivery.
Inconsistency
N teams, N paved roads
N pipelines, N base images, N security postures. Zero leverage when a CVE drops or compliance changes.
Compliance
Annual fire drill
SOC 2, ISO 27001, HIPAA, PCI-DSS enforced manually, retrospectively, and incompletely. Audits become projects, not checkpoints.
AI adoption
Shadow agents
Devs pasting code into ChatGPT, unsanctioned prototypes touching prod, no per-agent cost, no governance for which models, which tools, which data.
The wrong reflex

The instinct is to buy another tool.
That makes the problem worse.

The reflex · stack another tool
CI/CD vN+1
New IaC
AI gateway
Obs vendor
Sec scanner
Agent SaaS
MCP hub
FinOps SaaS
+ N more
Every purchase adds surface area, integrations to maintain, and one more decision the developer has to make before writing code. The cognitive tax compounds.
→ platform shift
The answer · one platform, opinions baked in
Paved roads: Golden Paths replace per-team improvisation.
Opinions encoded: no rebuilding the wheel on each repo.
Governance baked in: security, FinOps, audit by construction.
Agents first-class: identity, RBAC, cost ceiling, replay.
A platform multiplies leverage. That is what Open Horizons delivers.

The solution is not another vendor. The solution is a platform with opinions, paved roads, and governance baked in. Tools serve the platform; the platform serves developers and agents.

What good looks like

An enterprise where the platform multiplies leverage, not headcount.

01 · A new service is scaffolded, deployed, monitored, and compliant in under 30 minutes.
02 · Every team uses the same paved road, but can step off it explicitly when needed.
03 · AI agents are first-class citizens with identity, RBAC, cost ceilings, and trajectory replay.
04 · Audits are continuous, not annual. Every change is traceable to a spec, a PR, and a run log.
05 · The platform team ships a product. Developers and agents are the customers.
The road from here

The problem is named. The path forward is mapped.
Here is how the next 110 slides take you from the cemetery to a working platform.

II → V · Concept
What Open Horizons is, why platform engineering, what Agentic DevOps means, how Microsoft ships the stack
VI + VII · Demo
Day in the life of a developer and an agent. A live tour through ohorizons.ai screens you can open today
VIII → XII · Mechanics
The six-layer architecture, Horizons Phases, GitHub + ADO integration, Golden Paths, Agent Catalog, security and compliance
XIII → XV · Action
Business value frame, the five-step engagement model, partners, references, and how to start a Discovery

If you are short on time, jump to the live tour at VII (10 slides), then the Horizons Phases at IX (6 slides), then Getting Started at XIV (6 slides). That is the 22-slide path.

The body of work behind this deck

Three artifacts, one continuum.
The deck teaches the model. The playbook documents the patterns. Open Horizons ships them.

01 · Deck
Context Platform Stack

The diagnostic deck. Four questions, four layers, the cemetery numbers, the cost of failure. 50 slides, executive audience. The conceptual model in compressed form.

Audience: CTO, CIO, CFO, board
02 · Playbook
Open Horizons Playbook

25 chapters. Part I tells the story. Part II is the receipts: peer-reviewed research, the CNCF crosswalk, every layer deep, every claim cited. The reference architecture as a published guide.

Audience: architects, platform leads, security, FinOps
03 · Accelerator
Open Horizons (executable)

The deck, materialized. A working Backstage + Azure + Foundry deployment in your tenant in under 3 hours. 22 Golden Paths, 19 agents, 12 MCP servers, all governed and observable on Day 1.

Audience: every team in the org · ohorizons.ai

This deck pulls from all three. Numbers and diagnostics come from the deck, mechanics come from the playbook, screenshots and live demos come from the accelerator.

The four questions

Four questions most enterprises fail to answer with precision.
Open Horizons answers each one in code, not in slides.

Q1 · Cloud and infrastructure
"Where do agents run, and at what real cost?"
Compute, GPU, Kubernetes, decision observability, tool choice, inference tokens.
OH answer: L1 Terraform modules + AKS + ACR + Key Vault + Foundry + L1/L6 FinOps roll-up dashboards.
Q2 · Platform engineering
"What can agents access, and who governs it?"
IDP, Golden Paths, guardrails, per-agent RBAC, quotas, auditor agent.
OH answer: L2 Backstage + 22 Golden Paths + RBAC plugin + DORA + OPA Gatekeeper + scoped Workload Identity per agent.
Q3 · Context engineering
"What can agents know, when, and at what token cost?"
ACE pattern, skills, three memory tiers (hot, warm, cold), MCP servers.
OH answer: L3 Foundry Toolbox (12 MCP + 4 built-ins) + 3-tier prompt cache + enterprise_memory on pgvector + Shared Context Store.
Q4 · Intent engineering
"What should agents optimize for, in what hierarchy of trade-offs?"
CONSTITUTION.md, SDD with EARS, intent debt, specification engineering.
OH answer: L4 Specky 10-phase pipeline + 103 EARS reqs + scope-guard hooks + .github/model-routing.yaml + intent-drift measurement.
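The 103 EARS requirements in Q4 are checkable by machine because EARS (Easy Approach to Requirements Syntax) restricts every requirement to a handful of sentence templates. Below is a minimal sketch of a classifier. It assumes simplified regular expressions for the five classic templates plus the combined While/When form. `classify_ears` is a hypothetical helper and not part of Specky.

```python
import re
from typing import Optional

# Simplified regexes for the classic EARS templates. Order matters: the
# complex form ("While ..., when ...") must be tested before the plain
# state-driven form, which would otherwise match it too.
EARS_PATTERNS = [
    ("complex", re.compile(r"While .+, when .+, the .+ shall .+", re.IGNORECASE)),
    ("unwanted", re.compile(r"If .+, then the .+ shall .+", re.IGNORECASE)),
    ("event-driven", re.compile(r"When .+, the .+ shall .+", re.IGNORECASE)),
    ("state-driven", re.compile(r"While .+, the .+ shall .+", re.IGNORECASE)),
    ("optional", re.compile(r"Where .+, the .+ shall .+", re.IGNORECASE)),
    ("ubiquitous", re.compile(r"The .+ shall .+", re.IGNORECASE)),
]


def classify_ears(requirement: str) -> Optional[str]:
    """Return the EARS pattern a requirement follows, or None if it follows none."""
    text = requirement.strip()
    for name, pattern in EARS_PATTERNS:
        if pattern.match(text):  # re.match anchors at the start of the string
            return name
    return None


if __name__ == "__main__":
    for req in [
        "When a PR is opened, the reviewer agent shall post a review.",
        "If the token budget is exceeded, then the harness shall stop the run.",
        "The agents should be fast.",
    ]:
        print(classify_ears(req), "|", req)
```

A requirement that returns `None`, such as the last example, is exactly the kind of vague intent an intent-drift check would flag before any agent executes it.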
From the original four to the production six

The Context Platform Stack started as four layers.
Production added two more: Integration and Harness.

2025 · Original 4-layer model
L4 Intent · what should agents optimize for
L3 Context · what can agents know
L2 Platform · what can agents access
L1 Cloud · where do agents run
→ production
2026 · OH production 6-layer model
L6 Harness · wraps every model call · NEW
L5 Integration · GitHub + ADO + Argo + MCP · NEW
L4 Intent · what should agents optimize for
L3 Context · what can agents know
L2 Platform · what can agents access
L1 Cloud · where do agents run

A five-layer variant collapses either Integration into Platform or Harness into Context. Both collapses produced unmaintainable bundles in real deployments. The two extra layers are what survived production contact.

Defending the model

Two recurring pushbacks. Both have crisp answers from the field.

Pushback 1 · "Three is enough"
Merging context and intent creates drift.

If you merge "what the agent knows" with "what the agent wants," both become ambiguous. You see it when the same skill change alters expected behavior and nobody can tell if it is a bug or a feature. Separating context (facts) from intent (values) makes drift detectable. That is why L3 and L4 are different layers in Open Horizons, with different artifacts, different review processes, different change cadences.

Pushback 2 · "Six is too many"
Integration and Harness have distinct owners.

Integration (L5) is owned by the platform integrators handling GitHub + ADO + Argo + MCP coexistence. Harness (L6) is owned by SRE + FinOps + Security teams. If you collapse them into Platform or Context you put the wrong owner on a contract. Separation maps cleanly to who is on-call when each layer fails. Production survives the audit only when ownership is unambiguous.

The 4-layer model is the teaching frame. The 6-layer model is the operating frame. Both are correct. They serve different rooms.

PART
II

What Open Horizons is.

An accelerator, not a SaaS product. A platform, not a stack of tools. Two personas, one portal.

In one sentence

Open Horizons is an open-source Agentic DevOps Platform.
An Azure-native accelerator that gives enterprises a production-grade Internal Developer Platform and an AI Agent Platform in one coherent stack.

Delivered by Microsoft and the certified partner network. Deployed in your tenant, your subscription, your data. No SaaS, no lock-in, no per-seat pricing.

What it is not

Four things Open Horizons is deliberately not.

Not SaaS
Your tenant
Open Horizons runs in your Azure subscription, on your AKS cluster, with your identity provider. No data leaves your tenant.
Not a fork
Upstream Backstage
Open Horizons consumes upstream Backstage and contributes back. You stay on the community release train, you keep the ecosystem.
Not lock-in
Open standards
Every layer is open source or open standard. AKS, Terraform, Argo CD, OpenTelemetry, MCP. You can take the code with you.
Not templates
Working infra
The accelerator includes live infrastructure, a working agent runtime, observability, Golden Paths, agents, skills, prompts, policies, runbooks.
Two personas, one portal

Both personas share identity, catalog, RBAC, and observability.
An agent is just another component that needs to be managed, secured, and measured.

Developer IDP
For app engineers
Self-service scaffolding via Golden Paths. One-click environments. TechDocs per service. Integrated CI/CD, secrets, observability, cost. A catalog of components, APIs, resources, teams.
Agent IDP
For AI engineers
Catalog of agents with identity, ownership, governance. Trajectory logs, replayable and auditable. Per-agent cost dashboards. Skill, prompt, instruction registry with PR-based approvals.
The core components

Twelve concerns. Four families. One coherent platform.

Portal
Backstage OSS
Single pane of glass for devs and agents.
Catalog
Software Catalog
Source of truth, services + APIs + agents.
Scaffolder
Ohorizons/ohorizons-golden-paths
22 templates, sibling repo, versioned apart.
GitOps
Argo CD
Declarative deploys, no kubectl from laptops.
Runtime
Azure Kubernetes Service
Private API, autoscaler, Workload Identity.
IaC
Terraform, 18 modules
Tags mandatory, reproducible Azure infra.
Observability
Prometheus + Grafana + Loki
7 per-layer dashboards out of the box.
Identity
Entra + Workload Identity
Zero secrets in pods, per-agent identity.
Secrets
Azure Key Vault + CSI
Private endpoint, secrets as files.
AI Runtime
Microsoft AI Foundry
gpt-5-1, gpt-5-4-pro, Agent Framework.
Agent layer
.github/ · H1+H2+H3 integrators
19 agents that ship + integrate the Horizons.
Policy
OPA + Gatekeeper + Trivy + tfsec
Continuous compliance, admission control.
Families: Developer experience · Infrastructure · AI runtime + agents · Governance
Anatomy of the accelerator

Two repositories under github.com/Ohorizons.
The agents in .github/ are what makes this an accelerator, not a template.

Ohorizons / ohorizons public · main
ohorizons/  # the platform (this repo deploys the accelerator)
  terraform/    18 reusable Azure modules
  backstage/    Backstage app + plugins + FastAPI agent APIs
  argocd/       GitOps app-of-apps
  .github/      ★ 19 agents · 27 skills · 16 prompts · 13 instructions
                the implementer + integrator team
  mcp-servers/  Model Context Protocol tool servers
  policies/     OPA, Gatekeeper, tfsec rules
  prometheus/   Recording + alerting rules
  grafana/      Pre-built dashboards (7 per layer)
  docs/         Architecture, runbooks, TechDocs
  scripts/      End-to-end deploy + validation automation
Ohorizons / ohorizons-golden-paths public · main
ohorizons-golden-paths/  # versioned independently
  h1-foundation/
    landing-zone/
    azure-module/
  h2-enhancement/
    microservice-{nodejs,python,dotnet,go}/
    frontend-react/, api-openapi-first/
    database-postgres/, event-worker/
    techdocs-site/, library-typescript/
  h3-innovation/
    agent-maf/, agent-sk/
    mcp-server/, eval-job/, skill/
    rag-application/
    multi-agent-system/
    foundry-agent/

# Consumed by Backstage Scaffolder via
# location entries in the platform repo
★ Why this is an accelerator
The agents in .github/ are the implementer + integrator team that wires H1, H2, H3 together. They provision infrastructure, register Golden Paths from the sibling repo, onboard agents, configure dashboards, and verify each phase. Without these agents, this would be a template. With them, it is an accelerator.
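The listing notes that the platform repo registers the sibling Golden Paths repo through location entries. The sketch below expands the brace shorthand from the tree and emits entries in the shape of Backstage's `catalog.locations` config (`type: url`, `target`, `rules` allowing `Template`). It assumes, hypothetically, that each template directory holds a `template.yaml` on the `main` branch. The real file layout of ohorizons-golden-paths may differ, and the sketch covers only the directories visible in the listing.

```python
import json
import re

# Assumption: templates are referenced from the main branch of the sibling repo.
REPO = "https://github.com/Ohorizons/ohorizons-golden-paths/blob/main"

# Template directories as listed in the golden-paths tree, brace shorthand included.
TREE = {
    "h1-foundation": ["landing-zone", "azure-module"],
    "h2-enhancement": [
        "microservice-{nodejs,python,dotnet,go}",
        "frontend-react", "api-openapi-first",
        "database-postgres", "event-worker",
        "techdocs-site", "library-typescript",
    ],
    "h3-innovation": [
        "agent-maf", "agent-sk", "mcp-server", "eval-job", "skill",
        "rag-application", "multi-agent-system", "foundry-agent",
    ],
}


def expand_braces(name: str) -> list[str]:
    """Expand shell-style {a,b} groups: 'svc-{go,py}' -> ['svc-go', 'svc-py']."""
    match = re.search(r"\{([^{}]+)\}", name)
    if not match:
        return [name]
    head, tail = name[: match.start()], name[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))  # handles further groups
    return expanded


def catalog_locations(tree: dict[str, list[str]]) -> list[dict]:
    """One Backstage catalog.locations entry per Golden Path template."""
    entries = []
    for horizon, dirs in tree.items():
        for raw in dirs:
            for directory in expand_braces(raw):
                entries.append({
                    "type": "url",
                    # Hypothetical layout: <horizon>/<dir>/template.yaml
                    "target": f"{REPO}/{horizon}/{directory}/template.yaml",
                    "rules": [{"allow": ["Template"]}],  # only accept Template entities
                })
    return entries


if __name__ == "__main__":
    # JSON is valid YAML 1.2, so this output can be merged into app-config.yaml.
    print(json.dumps({"catalog": {"locations": catalog_locations(TREE)}}, indent=2))
```

Generating the entries from the directory tree, rather than hand-editing them, keeps the platform repo in step when the Golden Paths repo adds a template on its own release train.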
PART
III

Platform engineering 101.

The discipline of building internal products for internal developers. Not DevOps rebranded.

Definitions that matter

Four terms the executive team must agree on before signing a platform charter.

01 · Platform engineering. The discipline of designing and building toolchains and workflows that enable self-service for software engineering in the cloud-native era.
02 · Internal Developer Platform (IDP). The product that platform engineers build: one paved road that abstracts complexity for developers, the way Azure plus GitHub plus Backstage does for the enterprise.
03 · Golden Path. A pre-defined, opinionated way to do a common task. How a platform expresses "here is how we do it here."
04 · DevEx. The measurable quality of being a developer in your org. DORA, SPACE, time-to-first-PR. What we instrument.
The five pillars of a modern IDP

Every IDP, including Open Horizons, must deliver these.

01
Service Catalog
Single source of truth for what exists. In OH, Backstage Software Catalog.
02
Software Templates
Scaffolding new things in minutes. Golden Paths organized by H1, H2, H3.
03
TechDocs
Docs as code next to the code, published via the portal. No more wiki rot.
04
Observability
Dashboards, alerts, logs, traces in the developer context, not a separate tool.
05
Governance
Policy as code, RBAC, secrets, supply chain by default, not as an afterthought.
The product mindset

The single biggest mistake is treating platform as an infra project.
It is a product. Developers and agents are both customers.

Infrastructure mindset
  • "We deployed Kubernetes."
  • "The cluster is up."
  • "Devs should read the docs."
  • Annual roadmap.
  • Success equals uptime.
Product mindset
  • "12 teams onboarded, 8 deploying daily. 14 agents in catalog, 9 running production traffic."
  • "Time-to-first-PR is 3 days. Time-to-first-agent-trajectory is 90 minutes."
  • "We watched 5 devs and 3 agent authors onboard, here is where they got stuck."
  • Monthly product reviews with engineering and agent-author customers.
  • Success equals adoption + DevEx + AgentX metrics.
DORA + SPACE, the metrics that matter

You cannot improve what you do not measure. Two frameworks, shipped out of the box.

DORA · the four keys
DELIVERY
01 · Velocity
Deployment frequency
Elite: multiple times per day.
02 · Velocity
Lead time for changes
Elite: under one day from commit to prod.
03 · Stability
Change failure rate
Elite: under 15 percent.
04 · Stability
Mean time to recovery
Elite: under one hour.

OH dashboards: Backstage DORA Four Keys plugin · Grafana L2 board · alert rules on the four keys.

SPACE · the human side
DEVEX
S · Satisfaction and well-being. Burnout signal, sentiment, retention.
P · Performance. Quality of output, defect escape rate.
A · Activity. Volume of work, PRs, reviews, commits.
C · Communication and collaboration. Cross-team flow.
E · Efficiency and flow. Focus time, context switches, time-to-merge.

DORA measures the system. SPACE measures the people running it. Both surface in the same Grafana board, both gate platform releases.
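The four keys reduce to arithmetic over deployment and incident records. The sketch below assumes a hypothetical `Deployment` record with a commit time, a deploy time, a failure flag, and an optional restore time, and it uses the median for lead time. It does not reproduce the logic of the Backstage DORA plugin. Real inputs would come from GitHub Actions, Argo CD, and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record; real data would come from CI/CD and incident events."""
    commit_at: datetime                     # first commit of the change
    deployed_at: datetime                   # when the change reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_keys(deploys: list[Deployment], window_days: int) -> dict:
    """Compute the four DORA keys over a window of deployments."""
    if not deploys:
        raise ValueError("need at least one deployment in the window")
    failures = [d for d in deploys if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(d.deployed_at - d.commit_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


def elite_checks(keys: dict) -> dict:
    """Compare against the Elite thresholds listed for the four keys."""
    mttr = keys["mean_time_to_recovery"]
    return {
        "deployment_frequency": keys["deploys_per_day"] > 1,            # multiple per day
        "lead_time": keys["median_lead_time"] < timedelta(days=1),      # under one day
        "change_failure_rate": keys["change_failure_rate"] < 0.15,      # under 15 percent
        "time_to_recovery": mttr is None or mttr < timedelta(hours=1),  # under one hour
    }


if __name__ == "__main__":
    t0 = datetime(2026, 5, 1, 9, 0)
    sample = [
        Deployment(t0, t0 + timedelta(hours=2)),
        Deployment(t0, t0 + timedelta(hours=4), failed=True,
                   restored_at=t0 + timedelta(hours=4, minutes=30)),
        Deployment(t0, t0 + timedelta(hours=6)),
        Deployment(t0, t0 + timedelta(hours=8)),
    ]
    keys = four_keys(sample, window_days=2)
    print(keys)
    print(elite_checks(keys))
```

In the sample, three keys clear the Elite bar and the change failure rate (25 percent) does not. That single failing key is the kind of signal that can gate a platform release.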

Anti-patterns to avoid

Five mistakes that kill platforms. All preventable.

01 · If we build it, they will come. They will not. Adoption is a product motion: sell internally, train, iterate.
02 · One Golden Path to rule them all. Different stacks need different paths. Open Horizons ships multiple paths per Horizon.
03 · Security is someone else's problem. Bolt it on later and you never catch up. Defaults are the policy.
04 · Two SREs in a corner. Platforms need PMs, designers, and engineers. Target ratio: one platform team member per 15 to 25 app engineers.
05 · We will add AI later. Retrofitting agent governance is harder than building it in. Treat agents as first-class on Day 1.
PART
IV

The rise of Agentic DevOps.

From code completion to chat to production agents. The enterprise governance problem just exploded.

A three-year arc

From autocomplete to autonomy in three release cycles.

2022
Code completion
GitHub Copilot makes autocomplete intelligent. Humans still drive every step.
2023, 2024
Chat assistants
GitHub Copilot Chat and ChatGPT enter the IDE. Each interaction is one-shot. No memory, no tools, no autonomy.
2025, 2026
Agentic systems
Agents plan, call tools, persist memory, run for minutes or hours, and ship work. The governance problem just exploded.
What makes a system agentic

Four properties of a minimum-viable agent.

01
Goal-directed
It decomposes a request into a plan and reasons over it.
02
Tool use
It can call APIs, search code, read files, invoke deployments.
03
Memory
It remembers across turns, sessions, runs. The context platform.
04
Governed autonomy
It can act, but within identity, RBAC, and policy constraints.
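A toy loop can show all four properties together. This is a minimal sketch in which every name is hypothetical. The "planner" is a fixed rule standing in for a model, the tools are plain Python functions, memory is a list carried across runs, and governance is an allowlist checked before each tool call. It shows the shape of an agent only. It does not describe Foundry Agent Service or any real framework.

```python
from dataclasses import dataclass, field
from typing import Callable

# Tool use: hypothetical tools the agent can call by name.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_code": lambda query: f"3 matches for '{query}'",
    "read_file": lambda path: f"contents of {path}",
    "deploy": lambda env: f"deployed to {env}",
}


@dataclass
class Agent:
    name: str
    allowed_tools: set[str]                          # governed autonomy: per-agent allowlist
    memory: list[str] = field(default_factory=list)  # memory: survives across runs

    def plan(self, goal: str) -> list[tuple[str, str]]:
        """Goal-directed: turn a request into (tool, argument) steps.
        A real agent would ask a model; this fixed rule is a stand-in."""
        steps = [("search_code", goal), ("read_file", "README.md")]
        if "deploy" in goal:
            steps.append(("deploy", "staging"))
        return steps

    def run(self, goal: str) -> list[str]:
        results = []
        for tool, arg in self.plan(goal):
            if tool not in self.allowed_tools:
                results.append(f"DENIED {tool}")  # policy check before every action
                continue
            results.append(TOOLS[tool](arg))
        self.memory.append(f"{goal}: {len(results)} steps")
        return results


if __name__ == "__main__":
    reviewer = Agent("@reviewer", allowed_tools={"search_code", "read_file"})
    print(reviewer.run("fix and deploy pricing"))
    print(reviewer.memory)
```

In this design the policy check sits inside the loop, not in the planner. The agent can plan anything, but it can only act within its allowlist.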
Why "just use GitHub Copilot" is not enough

GitHub Copilot ships the primitives. Open Horizons ships the enterprise control plane that unifies them.

Enterprise need | GitHub Copilot 2026 | What GitHub Copilot ships, and the gap | OH Agent IDP
Per-team cost visibility | Partial | Team-level metrics API + Cost Centers (May 2026). Min 5 active users/day, no hierarchical chargeback. [1] | Yes
Trajectory logging and replay | Partial | Agent-Logs-Url trailer + OTel export (Mar 2026). Coding-agent only, not unified across IDE/CLI surfaces. [2] | Yes
RBAC: which agent touches which data | Partial | Content Exclusion + Agent Control Plane GA Feb 2026 + MCP allowlist. Not yet enforced on Cloud Agent, CLI, Agent Mode. [3] | Yes
Approved skill, prompt, instruction registry | Partial | Org custom instructions GA (Apr 2026) + BYOR MCP + custom agents repo. Federation of files, no single registry UI. [4] | Yes
Compliance-grade audit logs | Partial | SOC 2 Type 1 + ISO 27001 in scope. 180-day retention only; long-term needs Splunk/Event Hubs streaming. [5] | Yes
Custom long-running business agents | Limited | Custom Agents (Markdown profiles) bounded to dev workflows. Business-domain agents need Copilot Studio / Foundry. [6] | Yes
Integration: catalog, observability, policy | Limited | OpenTelemetry native. Backstage plugin is community-maintained. No first-party OPA integration. [7] | Yes

Sources (GitHub Docs & Changelog, May 2026):
[1] Team-level usage metrics API + Cost centers.
[2] Agent-Logs-Url trailer + GitHub Copilot SDK OTel.
[3] Enterprise AI Controls GA + Content Exclusion.
[4] Org instructions GA + Custom agents.
[5] SOC 2 + ISO 27001 + Agentic audit events.
[6] Custom agents docs.
[7] VS Code GitHub Copilot OTel + Backstage community plugin.

GitHub Copilot ships the primitives. Open Horizons unifies them with your catalog, observability stack, policy engine, and existing IDP.

The four pillars of Agentic DevOps

Open Horizons is built around these four. Every customer inherits them.

Pillar 1
Identity for agents
Every agent has a service principal in Entra ID, a catalog entry, scoped RBAC, a cost center tag. "What is this agent allowed to do?" has an audited answer.
Pillar 2
Context engineering
The six-layer Context Platform Stack. MCP servers, three-tier memory, RAG, prompt cache, scope guards. Grounded in 25+ peer-reviewed papers.
Pillar 3
Trajectories and cost
Every agent run produces a structured, replayable, evaluable trajectory. Cost per agent, per team, per task. The black box recorder.
Pillar 4
Spec-driven dev
Specky, 10-phase pipeline. Init, Discover, Specify, Clarify, Design, Tasks, Analyze, Implement, Verify, Release. Intent stays anchored.
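Pillar 3 in miniature: each step of a run is appended to a structured trajectory, and cost rolls up per agent and per team from the same records. This is a minimal sketch. The field names, the per-token price, and the JSON Lines log format are assumptions for illustration, not the Open Horizons schema.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass

PRICE_PER_1K_TOKENS = 0.01  # hypothetical blended price in USD


@dataclass
class Step:
    run_id: str
    agent: str   # catalog name, e.g. "@deploy"
    team: str    # owning team / cost center
    action: str  # tool call or model call performed at this step
    tokens: int  # tokens consumed by this step


def to_jsonl(steps: list[Step]) -> str:
    """Serialize a trajectory as JSON Lines, one step per line, in execution order."""
    return "\n".join(json.dumps(asdict(step)) for step in steps)


def replay(log: str) -> list[Step]:
    """Rebuild the trajectory from its log so the run can be inspected step by step."""
    return [Step(**json.loads(line)) for line in log.splitlines() if line.strip()]


def cost_by(steps: list[Step], key: str) -> dict[str, float]:
    """Roll up cost per 'agent' or per 'team' from the same step records."""
    totals: dict[str, float] = defaultdict(float)
    for step in steps:
        totals[getattr(step, key)] += step.tokens / 1000 * PRICE_PER_1K_TOKENS
    return dict(totals)


if __name__ == "__main__":
    run = [
        Step("r1", "@deploy", "platform", "terraform plan", 2000),
        Step("r1", "@deploy", "platform", "aks smoke test", 1000),
        Step("r2", "@reviewer", "payments", "review PR", 500),
    ]
    log = to_jsonl(run)
    assert replay(log) == run  # the log is lossless, so the run can be replayed
    print(cost_by(run, "agent"), cost_by(run, "team"))
```

Deriving both replay and cost from one record type means the black-box recorder and the FinOps dashboard cannot disagree about what a run did.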
Production agents today

What enterprises are doing with Open Horizons agents right now.

@deploy
End-to-end deploy
Orchestrates the full path from Terraform plan to AKS smoke tests. Picks up where a human SRE would. Always records a trajectory.
@reviewer
Code review
SOLID, security, naming, refactoring opportunities. Posts PR comments.
@pipeline
CI diagnosis
Diagnoses GitHub Actions failures using real workflow run data.
@sentinel
Quality gates
Analyzes CI checks, coverage gaps, and quality gates on PRs.
@security
Compliance audit
Runs OWASP, RBAC, secrets, compliance audits across the repo.
@sre
Incident triage
Generates runbooks, configures SLOs, triages incidents end-to-end.
@compass
Epic decomposition
Decomposes epics into INVEST user stories and creates GitHub Issues.
@docs
Documentation
Keeps documentation, runbooks, and ADRs current.

These are not demos. They are production agents running in customer environments with identity, governance, and cost controls.

What goes wrong without governance

Every failure mode below is a governance failure, not a model failure.

01 · Shadow agents running on personal credentials, leaking IP into public model endpoints.
02 · Unattributed spend. Surprise OpenAI bills with no per-team breakdown.
03 · Hallucinated changes reaching production without traceability.
04 · Prompt injection attacks through agents that fetch untrusted content.
05 · Compliance failures. Auditors cannot answer "who changed this and why?"

Open Horizons solves the governance failure. The model gets to do its job.

From category to one concrete stack

Agentic DevOps is the category. GitHub + Azure + Foundry is the stack. Open Horizons is the opinionated assembly.

01 · Category
DISCIPLINE
Agentic DevOps
vendor-neutral · 2025+

A discipline, not a product. Agents are first-class developers with identity, RBAC, cost ceilings, and trajectory replay. Vendor neutral.

→ choose products
02 · Stack
PRODUCTS
GitHub + Azure + Foundry
Microsoft building blocks
AKS · GHAS · GitHub Copilot · AI Foundry · Entra ID · Sentinel · Purview
→ opinionate + wire
DAY 1 READY
03 · Assembly
OSS
Open Horizons
paved-road · governed · reproducible

Working Backstage + Azure + Foundry in your tenant in under 3 hours. 22 Golden Paths, 19 agents, 12 MCP servers, governed and observable on Day 1.

Next part unpacks the middle box. We name every component, show how it fits, and end with the loops that make GitHub + Azure better together than either alone.

PART
V

The Microsoft Agentic DevOps stack.

GitHub, Azure, AI Foundry. The three pieces Open Horizons orchestrates into one platform.

Evolving DevOps

Twenty years, three definitions. Each one adds, none removes.

Timeline · 2007 DevOps → 2016 DevSecOps → 2025 Agentic DevOps
DevOps
"Union of people, process, and technology to enable continuous delivery of value to end users."
The original. Broke down silos between Dev and Ops. Made deploys daily, not quarterly.
DevSecOps
"Union of people, process, and technology with security as a shared responsibility to enable continuous delivery of value to end users."
Security shifted left. Scans, policies, identity baked into the pipeline.
Agentic DevOps
AI-powered agents operating as members of your dev and ops teams, automating, optimizing, and accelerating every stage of the software lifecycle.
Agents shifted in. Code, review, deploy, operate alongside humans, governed.

Source: Microsoft. Each layer survives. Agentic DevOps requires DevSecOps, which requires DevOps.

Agentic DevOps defined

Autonomous and semi-autonomous agents work alongside developers and operators
across every stage of the software lifecycle.

Agents solve routine and complex tasks together, bringing apps to market faster, increasing code quality and security, removing repetitive development work, reducing technical debt, and reframing the economics of operating, maintaining, and modernizing apps in production.

Through Agentic DevOps, developers orchestrate a series of agentic services with the freedom to focus on higher-value creative work, while operators proactively identify, mitigate, and resolve issues in production.

Developer
Orchestrates agents. Owns intent.
Agent
Executes routine + complex tasks.
Operator
Identifies, mitigates, resolves.
Microsoft + GitHub, the platform for AI innovation

One platform across the full software lifecycle. Policy and governance wrap every stage.

Policy and governance
Plan
  • Planning Agent
  • GitHub Issues
  • Spaces
Code
  • Agent Mode
  • Coding Agent
  • Spark Workbench
Verify
  • Autofix
  • Code Review
  • Playwright
  • Pull Requests
Deploy
  • GitHub Actions
  • Spark Runtime
  • AI Workflows
Operate
  • Metrics
  • Models
  • SRE Agent
Integrations + MCP  ·  Anthropic · OpenAI · Atlassian · Docker · VS Code

Source: Microsoft. Open Horizons consumes this stack and wires it into Backstage. Every named asset above appears in the catalog with an owner.

Agentic DevOps for Azure and GitHub

Three workflows. Code, Collaborate, Operate. Each one is the agent + the human working the same surface.

Code
  • VS Family: VS Code · Visual Studio + Agent Mode
  • GitHub Copilot in IDE + GitHub Copilot for Azure: Azure-native suggestions
Collaborate
  • MCP Server
  • Coding Agent: GitHub Issues via MCP Server
  • GitHub Repos: source of truth
  • GitHub Actions: CI/CD + scans
  • One repo · one PR · one audit trail. Dev writes intent. Agent drafts code. Both review through the same PR.
Operate
  • SRE Agent: incident triage + runbooks
  • Azure: AKS · App Service · Functions
  • Azure AI Foundry: Models + Agent Service + Observability
GitHub and Azure work better together

Six concrete loops where GitHub + Azure remove a class of integration work.

GitHub Copilot Modernization
Build apps
GitHub Copilot App Modernization builds applications and refactors legacy code into AKS and Azure App Service.
Foundry Apps
Manage AI services
Customers adopting AI and Agents build AI Apps and use Azure AI Foundry to manage their AI services in one place.
PGSQL preference
AI-app data layer
Developers building AI Apps prefer PostgreSQL with pgvector, leading to use of Azure Database for PostgreSQL.
Dev with GitHub Copilot
Code to prod
Developers using AI and Agents with GitHub Copilot write the code that ships to production and integrates across Azure services.
GHAS + Defender
Secure the loop
GitHub Advanced Security integrates with Microsoft Defender for Cloud to identify vulnerabilities and remediate with agents through GitHub Copilot.
SRE Agent
Incident response
SRE Agent automates incident response and monitoring, creating GitHub Issues and collaborating with GitHub Copilot's coding agent to fix problems.
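The pgvector data layer can be shown working without a database. pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity), so `ORDER BY embedding <=> query LIMIT k` returns the k nearest stored items. This minimal pure-Python sketch computes the same ranking. The `enterprise_memory` table and `embedding` column in the SQL string are assumptions, and the 3-dimensional toy vectors stand in for real embeddings.

```python
import math

# Hypothetical query against a table with a pgvector `embedding` column.
NEAREST_SQL = "SELECT id FROM enterprise_memory ORDER BY embedding <=> %s::vector LIMIT %s"


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Same quantity pgvector's <=> operator returns: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm


def nearest(query: list[float], rows: dict[str, list[float]], k: int) -> list[str]:
    """Pure-Python equivalent of the ORDER BY ... LIMIT k query in NEAREST_SQL."""
    return sorted(rows, key=lambda row_id: cosine_distance(query, rows[row_id]))[:k]


if __name__ == "__main__":
    memories = {
        "runbook-aks": [1.0, 0.0, 0.0],
        "adr-pricing": [0.0, 1.0, 0.0],
        "runbook-argo": [0.9, 0.1, 0.0],
    }
    print(nearest([1.0, 0.0, 0.0], memories, k=2))
```

In production the database does this ranking, usually with an approximate index, so the application never loads every vector the way this sketch does.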
Azure AI Foundry

App platform for a multi-model world. The model + agent + observability surface Open Horizons consumes.

Dev experience
  • Copilot Studio · Visual Studio · GitHub · Foundry SDK
Azure AI Foundry
  • Foundry Models · Foundry Agent Service · Azure AI Search · Azure AI Services · Azure Machine Learning · AI Content Safety
  • Foundry Observability · Security and governance
Cloud + Edge
  • Azure · Azure Arc · Foundry Local
  • AKS · App Service · Container Apps · Functions

Open Horizons consumes Foundry Models for inference, Foundry Agent Service for orchestration, and Foundry Observability as the data source for the L6 harness telemetry.

Two protocols, two scopes
MCP is about Agent → Tool interactions.
A2A is about Agent → Agent interactions.

MCP, the Model Context Protocol from Anthropic, is the standard for how an agent calls a tool or retrieves context. A2A v1.0 is the standard for how one agent hands off to another, propagating state and trace context. Open Horizons ships 12 MCP servers and uses A2A v1.0 in the L6 harness so multi-agent workflows are observable end to end.
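The two scopes differ in what crosses the wire. MCP messages are JSON-RPC 2.0, and a tool invocation is a `tools/call` request that carries the tool `name` and its `arguments`. For the agent-to-agent side, this sketch uses a hypothetical handoff envelope, not the A2A wire format. The envelope carries a W3C `traceparent` so both hops share one trace ID. The tool name (`search_code`), the agent names, and the envelope fields are assumptions.

```python
import json
import secrets
from typing import Optional


def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Agent -> Tool: an MCP tools/call request, framed as JSON-RPC 2.0."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


def new_traceparent(trace_id: Optional[str] = None) -> str:
    """W3C Trace Context header: version-traceid(32 hex)-parentid(16 hex)-flags."""
    trace_id = trace_id or secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-01"  # flags 01 = sampled


def handoff(sender: str, receiver: str, state: dict, traceparent: str) -> dict:
    """Agent -> Agent: hypothetical handoff envelope (not the A2A wire format).
    Keeps the caller's trace ID and mints a new span ID for the next hop."""
    trace_id = traceparent.split("-")[1]
    return {
        "from": sender,
        "to": receiver,
        "state": state,
        "traceparent": new_traceparent(trace_id),
    }


if __name__ == "__main__":
    print(mcp_tool_call(1, "search_code", {"query": "pricing"}))
    parent = new_traceparent()
    print(handoff("@compass", "@implementer", {"spec": "042-pricing"}, parent))
```

Reusing the trace ID across the handoff is what lets a tracing backend stitch a multi-agent workflow into a single end-to-end view.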

Spec-Driven Development

Four phases that turn a vague idea into a verifiable spec, ready for agent execution.

01 · Specify
What and why
Define user stories, goals, success criteria. Not tech specs. Source of truth for the change.
02 · Plan
How
Tech stack, architecture, data models, integration with legacy systems, compliance constraints.
03 · Task
Break it down
Reviewable, testable, specific. Implementable and verifiable in isolation. TDD-friendly.
04 · Implement
Execute
Each task implemented individually. Review, test, approve. Continuous validation against the spec.
.specs/042-pricing/
CONSTITUTION.md      # non-negotiables
SPECIFICATION.md     # EARS reqs, user stories
PLAN.md              # architecture, data models
TASKS.md             # 22 reviewable units
DIAGRAMS/            # 4 mermaid + 2 svg
ADRs/                # decision records

@implementer picks T-001..T-022 in sequence
@reviewer   gates each PR against SPEC
scope-guard blocks file edits outside .specs/042-*

Source: spec-kit pattern, used by Microsoft Specky and the Open Horizons SDD pipeline. The spec is the contract between human intent and agent execution.
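The scope-guard line can be read as a pre-commit check: every edited path must match one of the globs the active spec allows. This is a minimal sketch using the standard-library `fnmatch` module. The allowlist format and the function names are hypothetical, and the real Open Horizons hook may be implemented differently.

```python
import sys
from fnmatch import fnmatchcase


def out_of_scope(changed_paths: list[str], allowed_globs: list[str]) -> list[str]:
    """Paths that match none of the allowed globs.
    In fnmatch, '*' also matches '/', so '.specs/042-*' covers the whole spec folder."""
    return [
        path for path in changed_paths
        if not any(fnmatchcase(path, glob) for glob in allowed_globs)
    ]


def scope_guard(changed_paths: list[str], allowed_globs: list[str]) -> int:
    """Hook-style exit code: 0 lets the edit through, 1 blocks it."""
    violations = out_of_scope(changed_paths, allowed_globs)
    for path in violations:
        print(f"scope-guard: blocked edit outside spec scope: {path}")
    return 1 if violations else 0


if __name__ == "__main__":
    # Hypothetical usage: pass edited paths as arguments, e.g. from `git diff --name-only`.
    sys.exit(scope_guard(sys.argv[1:], [".specs/042-*"]))
```

A non-zero exit code is all a Git hook or CI step needs to stop the change. The printed paths tell the agent, or the reviewer, exactly which edit left the spec's scope.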

GitHub Copilot, ten foundational use cases

The starter menu. What every team uses on day one, before the first agent ships.

01
Code Completion
Suggests lines or blocks based on context.
02
Refactoring
Improves efficiency, readability, maintainability.
03
Documentation
Generates doc files and inline comments.
04
Test Generation
Writes test cases, reduces TDD friction.
05
Bug Fixing
Identifies and fixes errors quickly.
06
Code Conversion
Translates between programming languages.
07
IaC + Automation
YAML, Docker, Terraform, Bicep, ARM.
08
GitHub Copilot CLI
Bash, Git, GitHub scripts on demand.
09
SQL Optimization
Writes and improves SQL across engines.
10
Learning
Acts as a mentor for best practices and concepts.

Source: Microsoft GitHub Copilot fundamentals. Open Horizons treats these as the on-ramp. Once teams adopt them, the next step is custom agents through Foundry Agent Service.

GitHub Copilot, use cases by persona

Five roles. Different prompts, same plumbing, one platform.

Developer
  • Code completion
  • Boilerplate generation
  • Multi-language support
  • Refactoring
  • Debugging
  • API integration
  • Unit test generation
QA Engineer
  • Test case generation
  • Edge case suggestions
  • Test data generation
  • Bug reproduction
  • Mocking + stubbing
  • Regression tests
  • Coverage improvement
DBA
  • SQL optimization
  • Schema design
  • Stored procedures
  • Indexing
  • Data migration
  • Backup + recovery
  • Query debugging
DevOps
  • IaC (Terraform, Bicep)
  • CI/CD pipeline YAML
  • Log parsing
  • Containerization
  • System monitoring
  • Incident response
  • Shell scripting
Security
  • Secure coding
  • Threat modeling
  • Policy enforcement
  • Pen-testing scripts
  • Log analysis
  • IAM config
  • Crypto guidance
✓ Accelerates workflows
Suggestions, boilerplate, automation.
✓ Reduces cognitive load
Less syntax memorization.
✓ Enhances learning
Real-time examples and best practices.
✓ Improves efficiency
Less context switching, faster cycles.
Better together for AIOps

Four loops where GitHub + Azure AI Foundry produce outcomes neither delivers alone.

Loop 1
Code-first AI dev
Application code, model configurations, prompt engineering all in one repo. AI Toolkit + AI Foundry extension lets devs work locally before deploying.
Loop 2
Build, test, deploy
Automate AI model deployment and agent workflows. Deploy prompt flows, evaluations, and monitoring to Azure AI Foundry.
Loop 3
Enterprise security
Detect vulnerabilities early with CodeQL + Dependabot. Add observability, safety filters, evaluation frameworks for responsible AI.
Loop 4
End-to-end orchestration
Automates deployment of validated models and agents directly. Unifies AI orchestration across models and tools with chaining, memory, planning.
Faster innovation
Robust security
Enterprise governance
Scalable AI adoption
Where Open Horizons fits

Microsoft ships the building blocks.
Open Horizons assembles them into one governed platform on your tenant.

GitHub gives you GitHub Copilot, Actions, Issues, Advanced Security. Azure gives you AKS, Entra, Key Vault, App Insights, Defender. Azure AI Foundry gives you Foundry Models, Foundry Agent Service, Foundry Observability. Each piece is excellent on its own and well documented. The work that takes 9 to 18 months in most enterprises is gluing them together into a single, governed, opinionated, reproducible platform with paved roads and a catalog. That is the work Open Horizons has already done. You consume the result.

What writing a spec looks like

VS Code, Specky extension, the .specs/ folder open.
Intent first, then plan, then tasks, then code. In that order.

042-pricing-engine · ohorizons · main
EXPLORER
▾ .specs/
▾ 042-pricing-engine/
📋 CONSTITUTION.md
📄 SPECIFICATION.md
🏗 PLAN.md
✓ TASKS.md
📊 DIAGRAMS/
📝 ADRs/
▸ src/
▸ tests/
▸ helm/
📜 catalog-info.yaml
SPECIFICATION.md
CONSTITUTION.md
# Pricing Engine v2 · SPECIFICATION

## Goal
Reprice 1.2M SKUs in ≤4h, respecting margin floor.

## EARS Requirements
REQ-001 WHEN a SKU has margin < 8%,
 THE SYSTEM SHALL hold the prior price.
REQ-002 WHEN upstream cost increases >15%,
 THE SYSTEM SHALL route to @reviewer first.

## Out of scope
· Promo engine (separate spec 043)
· Currency conversion (handled upstream)
GitHub Copilot · Specky
Developer: REQ-002 mentions @reviewer. Should that fire on every price change, or only when the cost change exceeds 15%?
Copilot: The spec uses the EARS WHEN ... SHALL form, so the guard fires only at the threshold. I would tighten the wording:
REQ-002 WHEN
  cost_delta > 0.15
  THE SYSTEM SHALL
  route("@reviewer")
✓ Spec syntax valid. Adds to TASKS.md as T-009.

The dev writes intent in natural language. Specky validates EARS, suggests refinements, and queues implementation tasks. Code does not start until the spec compiles.
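At its simplest, "the spec compiles" can mean that every requirement matches one of the five basic EARS templates: ubiquitous, event-driven (WHEN), state-driven (WHILE), unwanted behavior (IF ... THEN), and optional feature (WHERE). This is a simplified sketch, not Specky's validator; it assumes upper-case keywords and fixes the system name to THE SYSTEM, as in the example spec.

```python
import re
from typing import Optional

# Simplified EARS templates with the system name fixed to "THE SYSTEM".
EARS_PATTERNS = {
    "event-driven": re.compile(r"^WHEN\s+.+?,?\s+THE SYSTEM SHALL\s+.+$"),
    "state-driven": re.compile(r"^WHILE\s+.+?,?\s+THE SYSTEM SHALL\s+.+$"),
    "unwanted-behavior": re.compile(r"^IF\s+.+?,?\s+THEN THE SYSTEM SHALL\s+.+$"),
    "optional-feature": re.compile(r"^WHERE\s+.+?,?\s+THE SYSTEM SHALL\s+.+$"),
    "ubiquitous": re.compile(r"^THE SYSTEM SHALL\s+.+$"),
}

def classify_ears(requirement: str) -> Optional[str]:
    """Return the EARS template a requirement follows, or None if it fits none."""
    text = " ".join(requirement.split())     # join wrapped lines, collapse indentation
    text = re.sub(r"^REQ-\d+\s+", "", text)  # drop a leading requirement ID
    for template, pattern in EARS_PATTERNS.items():
        if pattern.match(text):
            return template
    return None
```

A requirement that returns `None`, such as "WHEN cost rises THEN route to reviewer", would fail the check and be sent back for rewording.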

Part VI · Day in the life
VI

How developers and agents actually use the portal.

A walkthrough with mockups, flows, and one collaboration diagram. So you can picture it before you build it.

The developer's portal, simulated

What a developer sees on a Tuesday morning. One pane. Everything wired.

backstage.ohorizons.ai/catalog
Open Horizons
  • Catalog
  • Create new
  • Golden Paths
  • TechDocs
  • AI Agents
  • FinOps
  • Dashboards
Software Catalog
46 entities · owned by 5 teams · all healthy
Name · Kind · Owner · DORA
formulation-service · Service · team-rnd · ● Elite
batch-tracking · Service · team-quality · ● Elite
@deploy · Agent · team-platform · ● High
@reviewer · Agent · team-platform · ● High
storefront · Service · team-commerce · ● Medium

Services and agents in one catalog. Same owner, same lifecycle, same DORA scoring. An agent is not a special creature, it is a component with a model.
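To make "a component with a model" concrete, here is a sketch that builds a Backstage catalog entity for an agent. It assumes the agent is registered as a `Component` with `spec.type: agent` and carries its model in a hypothetical `ohorizons.ai/model` annotation; the catalog view above may use a custom `Agent` kind instead. Backstage entity names cannot contain `@`, so `@reviewer` becomes `reviewer`. The output is JSON, which YAML parsers also accept.

```python
import json
import re

# Backstage entity names: 1 to 63 chars, alphanumeric runs separated by one of - _ .
_NAME_RE = re.compile(r"^[A-Za-z0-9]+(?:[-_.][A-Za-z0-9]+)*$")

def agent_entity(handle: str, owner: str, model: str, lifecycle: str = "production") -> dict:
    """Build a catalog entity for an agent. 'agent' type and model annotation are assumptions."""
    name = handle.lstrip("@")  # '@reviewer' -> 'reviewer'
    if len(name) > 63 or not _NAME_RE.match(name):
        raise ValueError(f"invalid Backstage entity name: {name!r}")
    return {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {
            "name": name,
            "annotations": {"ohorizons.ai/model": model},  # hypothetical annotation key
        },
        "spec": {"type": "agent", "lifecycle": lifecycle, "owner": owner},
    }

if __name__ == "__main__":
    print(json.dumps(agent_entity("@reviewer", "team-platform", "gpt-5-1"), indent=2))
```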

How a developer uses the portal

From "I need a new microservice" to "it is live in production." Seven steps, fully self-service.

Developer flow through the portal: Dev Portal → Scaffolder → GitHub → Argo CD → AKS
1 · Open: "Create new service"
2 · Pick path: microservice-python, fill params
3 · Generate: repo + workflow + helm + IaC + catalog-info.yaml
4 · CI: runs, scans, build
5 · Merge: manifest pushed to env repo
6 · Sync: declarative deploy to cluster
7 · Notify: "your service is live," with portal link

Total developer effort: ~5 minutes of clicks. Total wall-clock to production: ~12 minutes. No tickets opened.

The agent's portal, simulated

What an agent author sees. Same UI, different tab. Trajectory + tokens + verdict in plain view.

backstage.ohorizons.ai/agents/@reviewer/trajectories/trj-9f2a
@reviewer · trj-9f2a-b81e
PR #2418 · pricing-engine · 2026-05-22 09:14:08Z
● SUCCESS
REPLAYABLE
Model: gpt-5-1
Tokens: 8,412 in · 1,902 out
Cost: $0.041
Latency: 14.2s
Safety: ● clean
Trajectory timeline
+0.0s  plan → break PR into 3 review passes
+0.4s  tool → github.get_pr_files (12 files, 412 LoC changed)
+2.1s  tool → mcp.codemap.lookup ("pricing-engine")
+3.8s  cache hit → SOLID rubric (saved $0.018, 1.4s)
+9.0s  model → review_pass_security (gpt-5-1, 2,104 tok)
+13.6s  tool → github.post_review_comment (×3)
+14.2s  verdict → SUCCESS (3 suggestions, 0 blockers)

This is the black box recorder. Click any step to inspect the prompt, the tool args, the model response. Replay deterministically to debug.
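Deterministic replay can be sketched as re-running the agent logic while every tool and model call is answered from the recording instead of live systems; any call that differs from the recording is reported as a divergence. `Step`, `Trajectory`, and `replay` are hypothetical names, and this is not the platform's recorder.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Step:
    offset_s: float        # seconds since the run started
    kind: str              # e.g. "plan", "tool", "model", "verdict"
    name: str              # tool or model identifier
    args: dict = field(default_factory=dict)
    result: Any = None     # recorded output, served back on replay

@dataclass
class Trajectory:
    trajectory_id: str
    steps: list = field(default_factory=list)

    def duration_s(self) -> float:
        return self.steps[-1].offset_s if self.steps else 0.0

CallFn = Callable[[str, dict], Any]

def replay(traj: Trajectory, agent_logic: Callable[[CallFn], Any]) -> Any:
    """Re-run agent logic with each tool/model call answered from the recording."""
    recorded = [s for s in traj.steps if s.kind in ("tool", "model")]
    position = 0

    def recorded_call(name: str, args: dict) -> Any:
        nonlocal position
        if position >= len(recorded):
            raise RuntimeError(f"divergence: extra call to {name}")
        step = recorded[position]
        if step.name != name or step.args != args:
            raise RuntimeError(f"divergence at call {position}: expected {step.name}, got {name}")
        position += 1
        return step.result

    return agent_logic(recorded_call)
```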

How an agent uses the portal

An agent is invoked by an event. The harness wraps the run. The trajectory lands in the catalog.

Agent flow through the platform: Event → Harness → Agent → MCP tools → Foundry → Trajectory (L6)
1 · Webhook: PR opened on repo
2 · Harness: WIF + budget check
3 · Invoke: @reviewer with scoped RBAC token
4 · Tool calls: github.get_pr · mcp.codemap.lookup
5 · Model call: routed via .github/model-routing.yaml
6 · Trajectory: llm.call.completed · 21 fields · App Insights

Every step is recorded. The agent never bypasses the harness. The trajectory is what the auditor reads.

How developers and agents interact

A single PR. Two collaborators. Same catalog, same governance. The developer writes intent, the agent writes the boring part.

DEV LANE
Paula
Software Eng
Owns
Intent · spec · judgement call
SHARED · One PR · One spec · One catalog
1 · DEV WRITES
.specs/042-pricing
CONSTITUTION + EARS
2 · AGENT CODES
@impl drafts PR
scope-guard enforced
3 · AGENT REVIEWS
@reviewer + @security
comments on PR
4 · DEV DECIDES
accepts 2/3 fixes
rejects 1 with reason
5 · MERGE
@deploy
ArgoCD → AKS
Same PR · Same run log · One audit trail
spec-id 042-pricing · PR #2418 · trajectory trj-9f2a-b81e · 3 reviewers (1 human, 2 agents) · evidence → SOC 2 · ISO 27001 · NIST AI RMF
AGENT LANE
@reviewer
+ @security · @impl
Owns
Code · review · the boring part

The platform refuses to record an agent action without a PR + spec link, and refuses to record a human action without a PR + reviewer. Same governance, both sides.
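The recording rule above can be sketched as one check applied to every action event: agents need a PR and a spec link, humans need a PR and a reviewer. `ActionEvent` and `missing_links` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ActionEvent:
    actor: str                      # e.g. "@impl" or "paula"
    actor_type: str                 # "agent" or "human"
    pr: Optional[int] = None
    spec_id: Optional[str] = None
    reviewer: Optional[str] = None

def missing_links(event: ActionEvent) -> list:
    """Fields that block recording; an empty list means the event may be recorded."""
    if event.actor_type == "agent":
        required = {"pr": event.pr, "spec_id": event.spec_id}
    elif event.actor_type == "human":
        required = {"pr": event.pr, "reviewer": event.reviewer}
    else:
        return ["actor_type"]
    return [name for name, value in required.items() if value is None]
```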

Portal surfaces, one click away

Six entry points covering the full developer + agent workflow.

Foundry Control
Live agents and models. The Toolbox: 12 MCP servers, 4 built-ins. 3-tier cache. Model routing.
AI Agents
Every agent: what it consumes, its model tier, its cache hit-rate, its trajectories.
FinOps & Tokens
L6 dashboards: calls/min, $/day by agent, token budget, hook actions, Purview audit.
Software Catalog
Every Component, API, Resource, System, Agent registered in the portal. One source of truth.
Golden Paths
The 22 Scaffolder templates. rag-application, multi-agent-system, foundry-agent, and more.
All Dashboards
Grafana folder tagged context-platform. L1 to L6, DORA, FinOps, agent fleet, eval scores.
The unifying principle

One catalog. One identity model. One audit trail.
Two kinds of users.

A developer scaffolds a service via a Golden Path. An agent is invoked by an event, calls tools through the harness, and writes a trajectory. Both go through the same portal, the same RBAC, the same observability stack. The agent is not a sidecar or a chatbot; it is a first-class platform citizen registered in the catalog with an owner, a cost center, a runbook, and an SLO. Treat it that way from Day 1 and the 95 percent pilot failure rate becomes a 5 percent problem.

From concept to pixels

You have seen the diagrams.
Now look at the actual product.

The next ten slides are mockups of the real ohorizons.ai screens. Same chrome, same data shapes, same agents. You can open the URL on your laptop and recognize every panel. If a screen looks unfamiliar, that is the gap. If a screen makes you say "we need that," that is the opening for a Discovery.

→ Landing page
→ Maturity framework
→ Command dashboard
→ Create templates
→ AI Chat
→ AI Impact
→ Foundry Control
→ 17 Platform Agents
Part VII · Live tour
VII

Inside the platform.

Eight UI surfaces, simulated from the real ohorizons.ai showcase environment.

ohorizons.ai · landing

The public showcase. Same brand chrome devs see inside the portal.

ohorizons.ai
Open Horizons
Platform · Differentiators · Architecture · FAQ
● Agentic DevOps Platform · Open Horizons

The platform that accelerates the Agentic SDLC

AI-powered developer portal with Golden Paths, intelligent agents, and full observability, built on Backstage, Azure, and GitHub.

Sign in with GitHub
Explore Platform →
22 Golden Paths · 17 AI Agents · 15 MCP Servers · AI Insights
GitHub Copilot CLI · @deploy agent
$ @deploy platform --env prod
Initializing deployment agent...
@deploy → Provisioning AKS cluster
Applying 18 Terraform modules...
Key Vault, Networking, Defender...
✓ H1 Foundation, 32m 14s
@deploy → H2 Enhancement
ArgoCD, Backstage, Prometheus
Loading 22 Golden Path templates
✓ H2 Enhancement, 28m 47s
@deploy → H3 Innovation
AI Agents, MCP Servers, RAG
✓ H3 Innovation, 18m 22s
ohorizons.ai · AI Maturity Framework

28 capabilities across 3 pillars, measured from Traditional (L0) to Agentic (L4).

P1
Developer Productivity
8 capabilities
  • AI Coding Assistants + GitHub Copilot
  • Dev Environment Standardization
  • Code Review Automation
  • Testing + Test Generation
  • Knowledge Management
  • Developer Experience
  • Onboarding + Time to Productivity
  • Code Quality + Technical Debt
P2
DevOps Lifecycle
10 capabilities
  • CI/CD Pipeline Automation
  • Test Automation + Quality Gates
  • Security Scanning + Compliance
  • Release Management + Deployment
  • Observability + Monitoring
  • Incident Response + Management
  • IaC + GitOps
  • DORA Metrics + Performance
  • Agentic DevOps + SRE Automation
  • Deployment Frequency + Velocity
P3
Application Platform
10 capabilities
  • Cloud Architecture + Infrastructure
  • Platform Engineering + IDP
  • AI/ML Operations (MLOps)
  • Model Evaluation + AI Safety
  • RAG Systems + Knowledge Retrieval
  • AI Agents + Orchestration
  • Data Management + Governance
  • API Management + Service Mesh
  • Disaster Recovery + BC
  • Cost Optimization + FinOps
L0 · Traditional: manual, ad-hoc, no AI tools
L1 · AI-Assisted: piloting AI tools, limited standardization
L2 · AI-Enhanced: significant AI integration, managed processes
L3 · AI-Optimized: AI agents, predictive capabilities
L4 · Agentic: multi-agent systems, self-healing, autonomous
ohorizons.ai · Command Dashboard

Agentic DevOps Command. The hub a platform lead opens every morning.

Open Horizons
Dashboard · Catalog · Docs
🔍 Search resources...
PS
Paula Silva
GitHub User
Home
Catalog
APIs
Docs
Create
Graph
Cost Insights
Validation
Platform
GitHub Copilot Metrics
DORA Metrics
Security + Quality
Tech Debt
Platform Status
Intelligence
AI Chat
AI Impact
● Platform Overview
Agentic DevOps Command
Your central Agentic DevOps hub for the entire SDLC. Monitor platform health, CI/CD pipelines, and KPIs across Azure, GitHub, and Azure DevOps.
View All Metrics
Quick Actions
14 Active projects · 124 Deployments · 42 Team members · 94% Health score
Active deployments: 1,284 (↑ +12.5% from last week)
Success rate: 99.4% (↑ +0.2% from last week)
Avg lead time: 42m (↓ -5.4% from last week)
Open incidents: 3 (↓ -2 from last week)
ohorizons.ai · Create

Software Templates. Every Golden Path is one click away.

H1 Foundation
H2 Enhancement
H3 Innovation
All 22
gitops-config
H2 · Create GitOps Deployment Configuration
Create GitOps deployment configs. Generates ArgoCD Application manifests, Kustomize overlays, and multi-env pipelines.
Tags: h2-enhancement · gitops · argocd
👤 Platform Engineering · CHOOSE
ai-agent
H3 · Create AI Foundry Agent
Create an autonomous AI agent powered by Azure AI Foundry. Tool definitions, RAG integration, safety controls, Agent Service deploy.
Tags: h3-innovation · foundry · agentic
👤 Platform Engineering · CHOOSE
multi-agent-system
H3 · Create Multi-Agent System
Production multi-agent AI system. Orchestration, collaboration patterns, human-in-the-loop. AutoGen, Semantic Kernel, custom frameworks.
Tags: h3-innovation · multi-agent · semantic-kernel
👤 Platform Engineering · CHOOSE
ohorizons.ai · AI Chat

One conversation. Six specialized agents. Click a suggestion to see the orchestrator route.

@pipeline
@sentinel
@compass
@guardian
@lighthouse
@forge
OH
● Orchestrator
Hello, I'm the Open Horizons Assistant. I coordinate 6 agents. Try a suggestion below or @mention an agent directly.
Try one
@pipeline check the build on ohorizons
@sentinel show test status on main
@compass decompose epic: user auth with SSO
@guardian scan security on ohorizons
@lighthouse show error rate on prod
@forge describe deploy-orchestrator pod
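The @mention convention can be sketched as a prefix parser: a leading mention of one of the six agents goes straight to that agent; anything else is handed to the orchestrator. The parsing rule and `route` function are assumptions for illustration; how the orchestrator then picks an agent is not modeled.

```python
import re

AGENTS = {"pipeline", "sentinel", "compass", "guardian", "lighthouse", "forge"}
_MENTION = re.compile(r"^@(?P<agent>[a-z]+)\b\s*(?P<request>.*)$", re.DOTALL)

def route(message: str) -> tuple:
    """Return (agent, request); unknown or missing mentions go to the orchestrator."""
    text = message.strip()
    match = _MENTION.match(text)
    if match and match.group("agent") in AGENTS:
        return match.group("agent"), match.group("request").strip()
    return "orchestrator", text
```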
ohorizons.ai · AI Impact

Measure the real impact of AI and Agentic DevOps. Adoption, productivity, velocity, quality, in one place.

● AI Impact
AI Impact Dashboard
Measure the real impact of AI and Agentic DevOps on your SDLC. Powered by GitHub APIs, KPI engine, and Claude Opus.
Run AI Analysis
Refresh Data
45/100 Impact score · 3.33/day Deploy freq · 42 Contributors · 12 Insights
📈 Key Performance Indicators
Adoption: GitHub Copilot seat utilization, 68% adoption rate
Productivity: GitHub Copilot effectiveness, 42% acceptance rate
Velocity: development speed, 3.33 deploys/day (Elite)
Quality: reliability + security, 2.5% change failure rate (Low)

Powered by GitHub APIs + KPI engine + Claude Opus. Ask the AI Impact Analyzer in natural language.
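The four KPIs are simple ratios. This sketch assumes the usual definitions: deploys per day, failed deploys over total deploys, accepted over shown Copilot suggestions, and active over assigned seats. The counts in the usage line are hypothetical, picked only to reproduce the 3.33/day and 2.5% readings; DORA tier labels are left out because their thresholds differ between report years.

```python
def deploy_frequency(deploys: int, days: int) -> float:
    """Deployments per day over the window."""
    if days <= 0:
        raise ValueError("window must be at least one day")
    return deploys / days

def change_failure_rate(failed_deploys: int, deploys: int) -> float:
    """Share of deployments that caused a failure in production."""
    return failed_deploys / deploys if deploys else 0.0

def acceptance_rate(accepted: int, shown: int) -> float:
    """Share of Copilot suggestions shown that were accepted."""
    return accepted / shown if shown else 0.0

def seat_utilization(active_seats: int, assigned_seats: int) -> float:
    """Share of assigned Copilot seats that were active in the window."""
    return active_seats / assigned_seats if assigned_seats else 0.0

if __name__ == "__main__":
    # Hypothetical counts: 200 deploys in 60 days, 5 of them failed.
    print(f"{deploy_frequency(200, 60):.2f}/day, CFR {change_failure_rate(5, 200):.1%}")
```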

ohorizons.ai · Foundry Control

The agents-service gateway. Live readings from the L3 + L6 runtime.

Context Platform · L3 Context Engineering · L6 Harness
Foundry Control · the agents-service gateway

The foundry-agents service in namespace ai-services is the runtime that fronts Azure AI Foundry: serves agents, fronts the model router, aggregates the MCP Toolbox, runs the 3-tier semantic prompt cache, applies pre/postToolUse hooks, emits 21-field telemetry, writes Purview audit.

Gateway /healthz: HEALTHY (LIVE), live probe via in-cluster proxy
Agents registered: 4 (LIVE), from /v1/agents
Toolbox tools: 15 (LIVE), 11 MCP + 4 built-in
Prompt cache hit-rate: 94% (LIVE), semantic, threshold 0.93
MCP Toolbox · 11 MCP + 4 built-ins · /v1/toolbox/*
Tool · Description
mcp.github · MCP server 'github', PR + Issues + Actions
mcp.azure · MCP server 'azure', resources + cost
mcp.terraform · MCP server 'terraform', plan + apply
mcp.foundry · MCP server 'foundry', model deploy + agent service
mcp.aks · MCP server 'aks', cluster + pods + services
mcp.backstage · MCP server 'backstage', catalog + scaffolder
builtin.web_search · Foundry built-in web search, Bing-grounded
builtin.azure_ai_search · Foundry built-in Azure AI Search retrieval
a2a.platform · Agent-to-Agent connection to 'platform' (A2A v1.0)
ohorizons.ai · Platform Agents

The 17 GitHub Copilot domain agents that run the platform. Each one owned, tiered, cached.

Agent · Owner · Model (tier) · Calls/24h · Cache · What it does
router · three-horizons-platform · gpt-4o-mini (CHEAP) · 2,410 · 97% · Routes each request to cost-optimal tier; A2A fan-out
doc-writer · three-horizons-platform · gpt-4o-mini (CHEAP) · 780 · 93% · Generates ADRs / RFCs / READMEs / TechDocs
code-reviewer · three-horizons-platform · gpt-4o (WORKHORSE) · 660 · 90% · Reviews PRs against repo standards + security-insights
test-engineer · three-horizons-platform · gpt-4o-mini (CHEAP) · 240 · 88% · Characterization / contract / equivalence tests; coverage
incident-responder · three-horizons-platform · gpt-4o (WORKHORSE) · 210 · 71% · Triages incidents from logs/metrics/traces; runs SRE runbooks
infra-architect · three-horizons-platform · gpt-4o (WORKHORSE) · 180 · 74% · Azure Well-Architected; Terraform module design
security-auditor · three-horizons-platform · gpt-4o (WORKHORSE) · 160 · 69% · OWASP/CWE scanning, deps CVEs, secrets, hardening
terraform-agent · three-horizons-platform · gpt-4o-mini (CHEAP) · 150 · 86% · Authors/refactors Terraform; plan/apply triage; drift detection
devops-agent · three-horizons-platform · gpt-4o-mini (CHEAP) · 190 · 87% · CI/CD pipelines, k8s orchestration, GH Actions / Tekton
sre-agent · three-horizons-platform · gpt-4o-mini (CHEAP) · 130 · 83% · SLOs, observability, error budgets; wires Prometheus/Grafana/Loki
github-integration · three-horizons-platform · gpt-4o-mini (CHEAP) · 95 · 85% · Configures GitHub Apps, GHAS, Actions, Packages
ado-integration · three-horizons-platform · gpt-4o-mini (CHEAP) · 70 · 84% · Azure DevOps PAT, repos, pipelines, Boards
hybrid-scenarios · three-horizons-platform · gpt-4o (WORKHORSE) · 60 · 76% · Designs GitHub + ADO coexistence (scenarios A/B/C)
template-engineer · three-horizons-platform · gpt-4o-mini (CHEAP) · 110 · 88% · Creates Golden Path templates; converts repos to templates
context-architect · three-horizons-platform · gpt-4o (WORKHORSE) · 140 · 80% · Plans coordinated multi-file changes; maps context + dependencies
onboarding-agent · three-horizons-platform · gpt-4o-mini (CHEAP) · 85 · 90% · Walks new users through prerequisites, config, first deployment
docs-agent · three-horizons-platform · gpt-4o-mini (CHEAP) · 100 · 92% · Technical writing + knowledge management across platform docs

Tiered model routing keeps 76% of calls on the cheap tier. Workhorse only when the task demands it. Cache hits drive total spend down by an order of magnitude.
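The 76 percent figure can be checked against the table itself: 4,360 of the 5,770 calls in the last 24 hours go to cheap-tier agents, about 75.6 percent. The sketch below copies the per-agent call counts and tiers from the table and computes the share.

```python
# (agent, tier, calls in the last 24h), copied from the Platform Agents table.
AGENT_CALLS = [
    ("router", "cheap", 2410), ("doc-writer", "cheap", 780),
    ("code-reviewer", "workhorse", 660), ("test-engineer", "cheap", 240),
    ("incident-responder", "workhorse", 210), ("infra-architect", "workhorse", 180),
    ("security-auditor", "workhorse", 160), ("terraform-agent", "cheap", 150),
    ("devops-agent", "cheap", 190), ("sre-agent", "cheap", 130),
    ("github-integration", "cheap", 95), ("ado-integration", "cheap", 70),
    ("hybrid-scenarios", "workhorse", 60), ("template-engineer", "cheap", 110),
    ("context-architect", "workhorse", 140), ("onboarding-agent", "cheap", 85),
    ("docs-agent", "cheap", 100),
]

def tier_share(rows: list, tier: str) -> float:
    """Fraction of all calls handled by agents on the given tier."""
    total = sum(calls for _, _, calls in rows)
    in_tier = sum(calls for _, t, calls in rows if t == tier)
    return in_tier / total

if __name__ == "__main__":
    print(f"cheap-tier share of calls: {tier_share(AGENT_CALLS, 'cheap'):.1%}")
```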

Try it live

The whole platform is at ohorizons.ai

Sign in with GitHub, explore the Catalog, click into Foundry Control, run an AI Impact analysis, or talk to the six agents. The showcase is the platform, deployed on a public Azure subscription, with the same code you would deploy on yours.

URL: ohorizons.ai, public showcase environment
Sign-in: GitHub OAuth, read-only access to the showcase
Source: github.com/Ohorizons, same code as your deployment
Maturity radar · current state

Where most LATAM enterprises score today across the four engineering layers.
Pre-platform vs post-H2 Open Horizons.

Pre-platform · typical client baseline: Intent 1 · Context 2 · Platform 2 · Infra 3
Post-H2 Open Horizons · 12 weeks in: Intent 4 · Context 4 · Platform 4 · Infra 4

Maturity is decided by the weakest layer. The radar makes that visible. H1 lifts Infra, H2 lifts Platform, H3 lifts Context and Intent together.
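"Decided by the weakest layer" is a minimum over the four scores. A sketch, using the radar's baseline and post-H2 readings as examples; the function also reports which layer or layers hold the score down.

```python
LAYERS = ("intent", "context", "platform", "infra")

def overall_maturity(scores: dict) -> tuple:
    """Overall maturity = lowest layer score; also return the layer(s) at that floor."""
    missing = [layer for layer in LAYERS if layer not in scores]
    if missing:
        raise KeyError(f"missing layer scores: {missing}")
    floor = min(scores[layer] for layer in LAYERS)
    return floor, [layer for layer in LAYERS if scores[layer] == floor]

if __name__ == "__main__":
    baseline = {"intent": 1, "context": 2, "platform": 2, "infra": 3}
    print(overall_maturity(baseline))
```

A client scoring 3 on Infra but 1 on Intent is a level-1 platform, which is why the radar leads with Intent.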

What L4 and L6 actually measure

Two dials every platform team should read weekly. Intent debt and token spend distribution.

L4 · Intent debt index · MONITOR
Current reading: 34 on a 0 to 100 scale
Bands: 0-40 OK · 40-60 MONITOR · 60+ ACTION

Distance between spec baseline and live agent behavior. Above 60 means the spec is stale and agents drifted.
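A sketch of the dial logic, under two stated assumptions. First, band boundaries are treated as lower-inclusive (40 reads MONITOR, 60 reads ACTION), since the bands as printed overlap at their edges. Second, `intent_debt_index` is a hypothetical stand-in for the "distance": the percentage of spec-baseline checks the live agent now fails. It is not what scripts/measure-intent-drift.sh computes.

```python
def intent_debt_index(baseline_checks: dict) -> float:
    """Hypothetical metric: percent of spec-baseline checks (by requirement ID) now failing."""
    if not baseline_checks:
        return 0.0
    failed = sum(1 for passed in baseline_checks.values() if not passed)
    return 100.0 * failed / len(baseline_checks)

def intent_debt_band(index: float) -> str:
    """Map a 0-100 index to the dashboard band (lower bounds inclusive)."""
    if not 0 <= index <= 100:
        raise ValueError("index must be within 0..100")
    if index < 40:
        return "OK"
    if index < 60:
        return "MONITOR"
    return "ACTION"
```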

L6 · Token spend · last 30d
5× SAVED · $4.2k 30-day spend
Cheap tier 76% · Workhorse 18% · Premium 5% · Cache hit 1%

Model routing keeps 76 percent of calls on cheap tier. Without routing, the same workload would cost ~5× more on workhorse default.

L4 measures whether agents are still doing the right thing. L6 measures how much it costs. Both numbers belong in the platform team's Monday review.

PART
VIII

The Context Platform Stack.

Six layers, integrated. Cloud, platform, context, intent, integration, harness.

The stack at a glance

Six layers, top to bottom. Intent flows down, telemetry flows up.

telemetry
L6 · Harness
SRE+FinOps+Sec
App Insights · pre/postToolUse hooks · A2A v1.0 · 21-field llm.call.completed · FinOps · Purview · Sentinel · model routing · Entra WIF.
Grafana L6 FinOps board
L5 · Integration
Platform integrators
Hybrid scenarios A/B/C · GitHub + Azure DevOps + Argo CD + MCP coexistence · API Center · catalog cross-links · integration agents.
Grafana L5 ArgoCD ↗
L4 · Intent
Spec authors
SDD + Specky pipeline · 103 EARS requirements · ADRs · pipeline-guard.yml LGTM gates · model-routing.yaml · scope guards.
Grafana L4 .specs/
L3 · Context
AI engineers
Foundry Toolbox · 12 MCP servers + 4 built-ins · 3-tier semantic prompt cache · enterprise_memory on pgvector · rag-application path.
Grafana L3 MCP APIs
L2 · Platform
Platform team
Upstream Backstage on AKS · 22 Golden Paths · RBAC plugin · DORA Four Keys · OPA Gatekeeper · the scaffolder.
Grafana L2 Catalog
L1 · Cloud
Cloud platform team
Terraform 18 modules · AKS · networking · Key Vault · ACR · PostgreSQL Flexible Server (pgvector) · Azure AI Foundry.
Grafana L1 terraform/
intent

Intent flows from L4 down into Golden Paths in L2 and agent behavior in L3. The harness in L6 wraps every model call. Integration in L5 is how GitHub, Azure DevOps, Argo CD, and MCP coexist. Everything sits on the Terraform-managed Azure foundation.

Layer 1 · Cloud and Infrastructure

The compute substrate. Eighteen Terraform modules, declared and reproducible.

AKS
Kubernetes 1.34
Private API server, autoscaler min=1, max=4, Workload Identity enabled.
ACR
Container registry
Admin disabled, managed-identity pull, signed images via cosign.
Key Vault
Secrets
Private endpoint, RBAC, CSI driver projects to pods as files.
PostgreSQL
Flexible Server
Private VNet, 30-day backup, pgvector enabled for memory and RAG.
Log Analytics
Telemetry
90-day retention, Container Insights, the App Insights sink for L6.
AI Foundry
Models
gpt-5-1, gpt-5-4-pro deployed by name. Routing handled at L4.
Ingress
NGINX + cert-manager
Let's Encrypt, 4 TLS ingresses Ready by default.

All resources tagged with customer_name, environment, cost_center for L1 + L6 FinOps roll-up.
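The tagging rule lends itself to two small checks: flag any resource missing one of the three required tags, and roll cost up by cost_center while keeping untagged spend visible rather than silently dropping it. The resource dictionaries (with a `monthly_cost_usd` field) are a hypothetical shape, not the Azure Cost API's response format.

```python
from collections import defaultdict

REQUIRED_TAGS = ("customer_name", "environment", "cost_center")

def missing_tags(resource: dict) -> list:
    """Required tags that are absent or empty on a resource."""
    tags = resource.get("tags") or {}
    return [tag for tag in REQUIRED_TAGS if not tags.get(tag)]

def spend_by_cost_center(resources: list) -> dict:
    """Sum monthly cost per cost_center; untagged spend lands in 'UNTAGGED'."""
    totals = defaultdict(float)
    for resource in resources:
        cost_center = (resource.get("tags") or {}).get("cost_center") or "UNTAGGED"
        totals[cost_center] += resource["monthly_cost_usd"]
    return dict(totals)
```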

Layer 2 · Platform Engineering

The developer experience layer. Upstream Backstage on AKS, 22 Golden Paths, full observability.

01 Upstream Backstage OSS: portal + Software Catalog + Scaffolder + TechDocs.
02 Argo CD app-of-apps for declarative GitOps deployments.
03 22 Golden Paths covering H1, H2, H3 use cases, including rag-application, multi-agent-system, foundry-agent.
04 RBAC plugin + DORA Four Keys plugin for governance and metrics in one UI.
05 OPA + Gatekeeper admission control on every workload.
06 Prometheus + Grafana + Loki + Alertmanager, the observability stack agents and humans share.

This is the layer that makes the developer productive. Agents consume L3 on top of it.

Layer 3 · Context Engineering

Give agents the right context, at the right time, in the right format.

Toolbox
12 MCP + 4 built-ins
Agents-service exposes curated functionality. Discoverable, governed, rate-limited.
Prompt cache
3-tier RediSearch HNSW
Collapses repeated reasoning into cache hits. Hit ratio per tier surfaced in Grafana L3.
enterprise_memory
pgvector long-term
Long-term agent memory on PostgreSQL + pgvector. Indexed for retrieval.
3-tier memory
user, repo, session
Scoped memory with explicit lifetimes. Cross-agent sharing via the Shared Context Store.
RAG index
rag-application path
Customer-specific. The Lumen demo indexes INCI, regulations, formulas.
CODEMAP.md
Program skeleton
A curated map the agents read first. Cuts cold-start token spend dramatically.
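The core of a semantic prompt cache is a similarity lookup: a new prompt embedding hits when its closest cached neighbor scores at or above a threshold (0.93, per the Foundry Control reading). This sketch is deliberately simpler than the platform's: one tier instead of three, a linear cosine scan instead of a RediSearch HNSW index, and embeddings assumed to come from an embedding model the caller provides.

```python
import math
from typing import Optional

def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    """Single-tier, brute-force semantic cache; a sketch, not the production cache."""

    def __init__(self, threshold: float = 0.93):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached response)

    def get(self, embedding: list) -> Optional[str]:
        best_score, best_response = -1.0, None
        for cached_embedding, response in self.entries:
            score = cosine(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        # Hit only if the nearest neighbor is similar enough.
        return best_response if best_score >= self.threshold else None

    def put(self, embedding: list, response: str) -> None:
        self.entries.append((list(embedding), response))
```

The threshold trades hit rate against the risk of serving an answer to a question that was only superficially similar; raising it lowers both.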
Layer 4 · Intent Engineering

Translate human intent into specifications the system can execute reliably.

01 SDD / Specky 10-phase pipeline, artifacts in .specs/NNN-*/, from Init through Release.
02 103 EARS requirements, 144 tasks, 18 diagrams, and ADRs in the reference repo.
03 pipeline-guard.yml: LGTM gates between phases. No skipping.
04 .github/model-routing.yaml: declarative routing, cheap model for cheap tasks, premium for design.
05 scope-guard.sh + preToolUse hooks: block out-of-scope file edits.
06 scripts/measure-intent-drift.sh: distance between current behavior and spec baselines.
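Declarative routing can be sketched as a lookup table from task category to tier and model. The schema and task categories below are hypothetical, not the format of .github/model-routing.yaml; the model names come from the Platform Agents table and the L1 slide, and the mapping of gpt-5-4-pro to the premium tier is an assumption. Unknown tasks fall back to the cheap tier, consistent with keeping most calls there.

```python
# Hypothetical routing table in the spirit of .github/model-routing.yaml.
ROUTES = {
    "cheap": {"model": "gpt-4o-mini", "tasks": {"docs", "tests", "triage", "routing"}},
    "workhorse": {"model": "gpt-4o", "tasks": {"code-review", "security", "incident"}},
    "premium": {"model": "gpt-5-4-pro", "tasks": {"architecture", "design"}},
}
DEFAULT_TIER = "cheap"

def route_model(task: str) -> tuple:
    """Return (tier, model) for a task category; unknown tasks use the default tier."""
    for tier, rule in ROUTES.items():
        if task in rule["tasks"]:
            return tier, rule["model"]
    return DEFAULT_TIER, ROUTES[DEFAULT_TIER]["model"]
```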
Layer 5 · Integration

Make GitHub, Azure DevOps, Argo CD and MCP coexist under one platform.

Scenarios
A, B, C
GitHub-only, ADO-only, Hybrid. Customer chooses, platform abstracts.
GitHub App
Discovery + scaffolder
Catalog discovery + scaffolder publish. App ID 3010479 in the reference deployment.
ADO connection
Workload Identity Fed
No PAT, no SP secrets. Federated credential.
Argo CD
Git-source agnostic
Consumes manifests from either source. The deployment authority.
API Center
Unified inventory
Single API inventory across GitHub + ADO repos.
Cross-links
Catalog ↔ everything
Backstage entities link to GitHub Issues, ADO Boards, Argo apps, Grafana dashboards.
Layer 6 · Harness Engineering

The runtime that wraps every model call. With L6, the agent becomes a governed production system.

How the harness wraps a model call
L6 harness: agent request (deploy-orchestrator) → preToolUse hooks (scope guard · safety · route) → model call (gpt-5-1 · Foundry) → postToolUse hooks (telemetry · audit · cache) → llm.call.completed, 21 fields (model · tokens · USD · latency · agent · team · CC · spec · trajectory · cache · route · retries · safety · …)
Sinks: App Insights (live cost · trajectories) · Sentinel SIEM/SOAR (high-risk events) · Purview audit (sensitive retrievals)
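The wrapping order can be sketched as a function that runs pre hooks, makes the call, runs post hooks, and emits one telemetry event. `harnessed_call`, `Blocked`, and the hook signatures are hypothetical; the event carries only a subset of the 21 fields. A blocked call never reaches the model and emits nothing here; the slide routes scope-guard breaches to Sentinel, a path this sketch does not model.

```python
import time
from typing import Callable

class Blocked(Exception):
    """Raised by a pre hook to refuse a call (scope guard, safety, budget)."""

def harnessed_call(request: dict,
                   model_call: Callable[[dict], dict],
                   pre_hooks: list,
                   post_hooks: list,
                   emit: Callable[[dict], None]) -> dict:
    """Wrap one model call: pre hooks, the call, post hooks, then one telemetry event."""
    for hook in pre_hooks:
        request = hook(request)  # returns the (possibly rewritten) request or raises Blocked
    started = time.perf_counter()
    response = model_call(request)
    latency_s = time.perf_counter() - started
    for hook in post_hooks:
        hook(request, response)  # e.g. audit write, cache write
    emit({  # subset of the llm.call.completed fields
        "event": "llm.call.completed",
        "agent": request["agent"],
        "model": request["model"],
        "tokens_in": response["tokens_in"],
        "tokens_out": response["tokens_out"],
        "latency_s": round(latency_s, 3),
    })
    return response
```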
Observability · 3 hooks
Gateway pre/postToolUse hooks intercept every call. A2A v1.0 context with correlation IDs, spans, trace. 21-field llm.call.completed streamed to App Insights.
FinOps · budget enforcement
Per-agent, per-team, per-CC budgets enforced at 50/80/100 percent. 100 percent hard-stops the agent.
Security & compliance · 2 sinks
Microsoft Purview audits every retrieval against sensitive sources. Sentinel SIEM/SOAR receives prompt-injection, safety violations, scope-guard breaches.
Identity · per agent
Entra Workload Identity Federation per agent. Kubernetes ServiceAccount, Azure AD identity, scoped RBAC.
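The 50/80/100 policy can be sketched as a threshold check applied to every budget scope an agent belongs to (agent, team, cost center). The slide specifies only that 100 percent hard-stops the agent; treating 50 and 80 percent as alerts is an assumption drawn from the "Budget Alerts 50/80/100" dashboard. Scope keys such as `team:platform` are hypothetical.

```python
def budget_action(spent_usd: float, budget_usd: float) -> str:
    """Map spend against one budget to the 50/80/100 percent policy."""
    if budget_usd <= 0:
        raise ValueError("budget must be positive")
    ratio = spent_usd / budget_usd
    if ratio >= 1.0:
        return "hard-stop"  # 100 percent: the agent stops
    if ratio >= 0.8:
        return "alert-80"
    if ratio >= 0.5:
        return "alert-50"
    return "ok"

def allowed_to_run(spend: dict, budgets: dict, scopes: list) -> bool:
    """An agent may run only if none of its scopes has reached 100 percent."""
    return all(budget_action(spend.get(s, 0.0), budgets[s]) != "hard-stop" for s in scopes)
```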
Per-layer Grafana dashboards

Six layers, a dedicated set of dashboards for each. Data sources: Prometheus, Loki, App Insights, Azure Cost API, PostgreSQL.

Layer: highlight dashboards
L1, Cloud: AKS Cluster Health · AKS Resource Utilization · Azure Resource Inventory · Azure FinOps Spend by Service · FinOps Anomaly · Key Vault Health · PostgreSQL
L2, Platform: Backstage Service Health · Catalog Coverage · Argo CD Sync · Golden Path Adoption · DORA Four Keys · Ingress + TLS · OPA Violations
L3, Context: MCP Server Health · Tool Call Distribution · Prompt Cache Hit Ratio · Shared Context Store · 3-Tier Memory · Skill Load Heatmap · RAG Index Health
L4, Intent: SDD Pipeline Status · Model Routing Decisions · Routing Cost Savings · Scope Guard Activity · Intent Drift
L5, Integration: GitHub Actions Health · Azure DevOps Pipeline Health · Argo CD App-of-Apps · API Center Inventory · Catalog Cross-Link Coverage
L6, Harness: Agent Fleet Overview · Trajectory Volume · Token Consumption Live · Cost Live USD · Budget vs Actual · Budget Alerts 50/80/100 · Eval Scores · Content Safety · Purview Access · llm.call.completed Stream

Every dashboard has Prometheus alerting rules, routed through Alertmanager, in prometheus/alerting-rules.yaml. The platform refuses to operate without observability: CI gates enforce the presence of both dashboards and alerts.

Cross-layer FinOps

The CFO view. L1 cloud spend + L6 AI spend, one board.
Closes the "AI is unaffordable" objection before it leaves the room.

Total spend
Cloud + AI roll-up
L1 cloud and L6 AI in one number, with month-over-month deltas, broken down by cost center, team, environment.
Forecasting
Budget vs actual
Forecast vs budget for the current month. Anomalies and breaches in the last 30 days. Top 5 cost drivers across cloud + AI.
Per agent
Cost-per-trajectory
Efficiency metric, USD per successful trajectory, per agent. Drives prompt-cache investment, model routing, and retirement decisions.
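Cost-per-trajectory is total agent spend divided by successful trajectories. This sketch assumes failed runs still count toward cost, so an agent that fails often looks expensive, which is the signal that should drive caching, routing, or retirement. An agent with no successes is reported as infinite. The run-record shape is hypothetical.

```python
from collections import defaultdict

def cost_per_successful_trajectory(runs: list) -> dict:
    """USD per successful trajectory, per agent (failed runs count toward cost)."""
    cost = defaultdict(float)
    successes = defaultdict(int)
    for run in runs:
        cost[run["agent"]] += run["cost_usd"]
        if run["verdict"] == "SUCCESS":
            successes[run["agent"]] += 1
    return {
        agent: total / successes[agent] if successes[agent] else float("inf")
        for agent, total in cost.items()
    }
```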
Why six layers and not five or seven

Five collapses. Seven over-specifies. Six survived production contact.

5
COLLAPSES
Five layers loses contracts.

Five collapses Integration into Platform (the GitHub + ADO + Argo + MCP coexistence problem becomes a Backstage problem) and Harness into Execution (telemetry + FinOps + hooks + audit become the agent author's problem). Both produced unmaintainable bundles in real deployments.

SHIPPED
6
SURVIVES
Six layers, clean ownership.

L5 Integration owned by platform integrators, not by Backstage maintainers. L6 Harness owned by SRE + FinOps + Security, not by the agent author. Each layer maps cleanly to who is on-call when it fails. Survives audits.

7
OVER-SPECIFIES
Seven adds noise.

Splitting Identity out of Cloud, or carving Telemetry out of Harness, doubles the contracts without doubling the ownership. The boundary stops matching how teams actually operate. Layers should be the minimum that survive a real audit, not the maximum a diagram can hold.

The six-layer model is not a theory. It is the residue of trying five and seven first. Clean contracts, each one with a named owner.

PART
IX

The Horizons Phases.

H1 Foundation, H2 Enhancement, H3 Innovation. A staged adoption model.

Three phases, three outcomes

Each phase builds on the previous. No customer is asked to commit to H3 before H1 works.

H1
Foundation, 4 to 8 weeks
Cloud + Platform + Portal. A working IDP on Azure with Backstage, GitOps, observability. Pilot team usable within 30 days.
H2
Enhancement, 8 to 12 weeks
Golden Paths + Governance + Self-service. Application teams ship via paved roads. Continuous compliance enabled.
H3
Innovation, 12 to 24 weeks
Agentic DevOps + Context Platform. Agents first-class platform citizens. Trajectory, cost, governance in production.

Naming note. These are the Open Horizons Phases. They are not McKinsey's "Three Horizons of Growth" strategy framework.

H1 · Foundation

Stand up a production-grade Azure environment with Backstage, GitOps, and observability.

What gets deployed
  • Azure Landing Zone (RG, networking, identity)
  • AKS cluster (private, autoscaled)
  • Azure Container Registry
  • Azure Key Vault
  • Azure Database for PostgreSQL
  • Log Analytics + Container Insights
  • Entra Workload Identity
  • Backstage OSS portal
  • Argo CD GitOps
  • Prometheus + Grafana + Loki + Alertmanager
  • NGINX Ingress + cert-manager + Let's Encrypt
Success criteria
  • Developer logs into Backstage via GitHub OAuth or Entra
  • Pilot service scaffolded, deployed, visible in catalog
  • Argo CD shows green sync for every workload
  • Grafana shows cluster, ingress, application metrics
  • All secrets in Key Vault, no plaintext anywhere
  • TLS certificates issued and renewed automatically by cert-manager
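The "green sync for every workload" criterion can be checked mechanically rather than by eye. A minimal sketch, assuming the Application list comes from `kubectl get applications -n argocd -o json` (`argocd` is the usual install namespace, not something this deck specifies) and reading the `status.sync.status` and `status.health.status` fields of the Argo CD Application resource.

```python
import json
import subprocess


def unhealthy_apps(app_list: dict) -> list[str]:
    """Return one line per Argo CD Application that is not both Synced and Healthy."""
    failing = []
    for app in app_list.get("items", []):
        name = app.get("metadata", {}).get("name", "<unnamed>")
        status = app.get("status", {})
        sync = status.get("sync", {}).get("status", "Unknown")
        health = status.get("health", {}).get("status", "Unknown")
        if sync != "Synced" or health != "Healthy":
            failing.append(f"{name}: sync={sync}, health={health}")
    return failing


def fetch_app_list(namespace: str = "argocd") -> dict:
    """Read the Application resources with kubectl; needs cluster credentials."""
    completed = subprocess.run(
        ["kubectl", "get", "applications", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)
```

In a pipeline, `unhealthy_apps(fetch_app_list())` returning an empty list is the success criterion. Anything else is the list of workloads to fix.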
H2 · Enhancement

Turn the foundation into a product application teams adopt voluntarily.

Add
Golden Paths
12+ Software Templates. Repo + CI/CD + IaC + Helm + catalog + TechDocs + Argo app, wired together.
Add
Continuous compliance
OPA Gatekeeper, tfsec, Trivy, gitleaks, and Defender for Cloud findings surfaced in Backstage. Compliance evidence exported on a schedule.
Add
FinOps starts
Cost tags on every resource. FinOps dashboards. Per-team showback or chargeback.

Success criteria, 80 percent of new services created via a Golden Path. Every PR gated by tfsec, Trivy, and OPA. A FinOps dashboard with attributed spend.
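Showback reduces to grouping tagged cost records by team and surfacing whatever is untagged, because untagged spend cannot be attributed to anyone. A minimal sketch. The record shape and the `team` tag key are assumptions for illustration, not the schema of any specific Azure cost export.

```python
from collections import defaultdict
from decimal import Decimal


def showback(records: list[dict], tag_key: str = "team") -> dict[str, Decimal]:
    """Sum cost per value of `tag_key`; spend without the tag goes to '(untagged)'."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for rec in records:
        owner = rec.get("tags", {}).get(tag_key) or "(untagged)"
        # Decimal via str() avoids binary float drift when summing currency.
        totals[owner] += Decimal(str(rec["cost_usd"]))
    return dict(totals)


def attributed_share(totals: dict[str, Decimal]) -> Decimal:
    """Fraction of total spend that carries an owning tag."""
    total = sum(totals.values(), Decimal(0))
    if total == 0:
        return Decimal(1)
    return (total - totals.get("(untagged)", Decimal(0))) / total
```

Tracking `attributed_share` over time is one way to make "attributed spend" measurable. The untagged bucket is the backlog for the tagging policy.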

H3 · Innovation

Make agents first-class platform citizens. Move from AI assistants to production agentic systems.

01 Agent IDP, the second persona. Agent catalog in Backstage with owner, RBAC, cost center, version. Trajectory logs flow into Loki and Backstage. Per-agent cost dashboards in Grafana.
02 Context Platform (L3). MCP servers + Shared Context Store + three-tier memory + prompt cache.
03 Intent Platform (L4). Specky + model routing + scope guards + intent drift measurement.
04 Harness (L6). The full telemetry + FinOps + Purview + Sentinel stack goes live.

Success criteria, at least five production agents, each with more than 100 trajectories per week. Per-agent cost attribution is exact and auditable. A failed run replays deterministically.

Visualizing the journey

Day 0 to Day 180+, three stages, three outcomes.

Horizons Phases timeline
DAY 0 · Start
DAY 30 · H1 Foundation · Working portal, GitOps, observability. Pilot team productive.
DAY 90 · H2 Enhancement · Paved roads, continuous compliance. Application teams self-serve.
DAY 180+ · H3 Innovation · Agentic DevOps in production. Agents governed, observed, costed.

Each phase delivers value on its own. Each one multiplies the value of the next.

PART
X

GitHub and Azure DevOps.

The foundation of everything. Three scenarios, one platform.

Source control is the center of gravity

If it is not in Git, it is not real.

Every workflow in Open Horizons, every deploy, every spec, every agent invocation, every audit event, begins or ends in a Git repository. GitHub and Azure DevOps are not afterthoughts. They are the foundation that makes the rest of the platform work.

Three integration scenarios

Customer chooses. Platform abstracts.

Scenario A
GitHub-only
Cloud-native, modern enterprises, OSS-friendly. GitHub App + Actions + Packages + Advanced Security + OAuth.
Scenario B
Azure DevOps-only
Microsoft-shop enterprises with existing ADO investment. Repos + Pipelines + Boards + Artifacts.
Scenario C
Hybrid coexistence
Migrations in progress, divisional preferences, M&A scenarios. Single Backstage catalog, dual auth, Argo CD agnostic.
Scenario A · GitHub end to end

Seven steps from "Create new service" to live in production.

01 Developer opens Backstage, picks "Create new service" from the scaffolder.
02 Backstage uses the GitHub App to create a new repo with the chosen Golden Path.
03 Repo registered in the Backstage catalog automatically.
04 GitHub Actions runs tests, scans, and the image build on every push, then pushes the image to GHCR or ACR.
05 Argo CD picks up the manifest change and syncs the new version to AKS.
06 GitHub Advanced Security results stream into Backstage. tfsec, Trivy, and gitleaks results gate the PR.
07 The @reviewer, @sentinel, and @security agents post comments on the PR.
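Step 06 amounts to merging several scanner reports into one pass/fail decision. A minimal sketch. The report shapes assumed here (Trivy's `Results[].Vulnerabilities[].Severity`, tfsec's `results[].severity`, gitleaks writing a JSON array of findings with `RuleID`) should be checked against the scanner versions in use, and the blocking severities are an illustrative policy, not an Open Horizons default.

```python
BLOCKING = {"CRITICAL", "HIGH"}  # illustrative policy; tune per organization


def trivy_blockers(report: dict) -> list[str]:
    found = []
    for result in report.get("Results", []) or []:
        # Trivy may emit "Vulnerabilities": null for clean targets.
        for vuln in result.get("Vulnerabilities", []) or []:
            if vuln.get("Severity", "").upper() in BLOCKING:
                found.append(f"trivy:{vuln.get('VulnerabilityID', '?')}")
    return found


def tfsec_blockers(report: dict) -> list[str]:
    return [
        f"tfsec:{r.get('rule_id', '?')}"
        for r in report.get("results", []) or []
        if r.get("severity", "").upper() in BLOCKING
    ]


def gitleaks_blockers(findings: list) -> list[str]:
    # Any leaked secret blocks the PR, whatever its rule.
    return [f"gitleaks:{f.get('RuleID', '?')}" for f in findings or []]


def gate(trivy: dict, tfsec: dict, gitleaks: list) -> tuple[bool, list[str]]:
    """Return (passed, reasons). The PR passes only with zero blockers."""
    reasons = trivy_blockers(trivy) + tfsec_blockers(tfsec) + gitleaks_blockers(gitleaks)
    return (not reasons, reasons)
```

Returning the reasons, not just a boolean, lets the same result feed both the required status check and the comment posted back to the PR.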
Scenario B · Azure DevOps end to end

Six steps. Same outcome, ADO-native plumbing.

01 Developer opens Backstage, picks a Golden Path.
02 Backstage uses the ADO REST API (via a Service Connection) to create the repo and pipeline.
03 Repo registered in Backstage, discovered as a catalog entity.
04 Azure Pipelines runs on every push, with the same scans as the GitHub equivalent.
05 Argo CD consumes the rendered manifests and syncs to AKS. Argo CD is Git-source agnostic.
06 Boards work items link to PRs and surface in Backstage via the ADO plugin.
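Step 02 is a single REST call per repository. A minimal sketch that builds, but does not send, the Azure DevOps "Repositories - Create" request (`POST https://dev.azure.com/{organization}/_apis/git/repositories`, body with `name` and `project.id`). The pinned `api-version` and the use of an Entra bearer token for the Service Connection's identity are assumptions to check against your organization's setup.

```python
import json
import urllib.request
from urllib.parse import quote

API_VERSION = "7.1"  # assumption: pin to a version your organization supports


def create_repo_request(org: str, project_id: str, repo_name: str, token: str) -> urllib.request.Request:
    """Build an Azure DevOps 'Repositories - Create' call without sending it."""
    url = (
        f"https://dev.azure.com/{quote(org)}/_apis/git/repositories"
        f"?api-version={API_VERSION}"
    )
    body = json.dumps({"name": repo_name, "project": {"id": project_id}}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: an Entra access token issued to the Service Connection's identity.
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending it is `urllib.request.urlopen(req)`. In practice the Backstage scaffolder makes this call, so treat the sketch as a picture of what happens on the wire.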
Scenario C · Hybrid coexistence

A single Backstage catalog discovering entities from both sources. Dual auth. Single agent plumbing.

Catalog
Unified
A single Backstage catalog that discovers entities from both GitHub and ADO.
Auth
Dual
Users sign in via either provider. SSO mapping in Entra.
CI/CD
Native + converged
Each repo uses its native CI (Actions or Pipelines) but converges on the same Helm + Argo CD pattern.
Agent identity
Common
Workload identity in Azure is the common substrate. Agents authenticate the same way regardless of source repo.
Migration
Stepping stone
Hybrid is often a stepping stone to consolidation. The platform does not force a choice.
What makes the foundation strong

Five non-negotiables across A, B, and C.

01 Everything is in Git. Code, IaC, policies, specs, prompts, instructions, all version-controlled.
02 PRs are the unit of change. No deploy without a PR. No agent action without a trajectory tied to a PR or spec.
03 CODEOWNERS flows into the catalog. Ownership is never lost.
04 Branch protection is enforced. No direct pushes to protected branches. No bypass without audit.
05 Argo CD is the deployment authority. No kubectl apply from laptops.

The DevSecOps tenet "everything as code," made operational by Open Horizons defaults.
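Non-negotiable 03 can be wired with a small step that reads CODEOWNERS and derives the catalog `spec.owner`. A minimal sketch, assuming the catalog owner is taken from the repo-wide `*` rule and that team handles map one-to-one to catalog group names. Because GitHub applies the last matching rule, the last `*` line wins. Path-specific rules and escaped `#` characters are deliberately ignored.

```python
from typing import Optional


def catalog_owner(codeowners_text: str) -> Optional[str]:
    """Return the owner of the last repo-wide '*' rule, normalized for the catalog."""
    owner = None
    for raw in codeowners_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        pattern, *owners = line.split()
        if pattern == "*" and owners:
            owner = owners[0]  # later '*' rules override earlier ones
    if owner is None:
        return None
    # '@contoso/team-platform' -> 'team-platform'; '@alice' -> 'alice'
    return owner.lstrip("@").split("/")[-1]
```

A repo with no `*` rule returns `None`, which is a useful CI failure in its own right. It means nobody owns the repository by default.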

PART
XI

Golden Paths and the Agent Catalog.

Paved roads for developers. First-class governance for agents.

A Golden Path in one line

The opinionated, paved, well-lit road for the most common developer journeys.
Fully scaffolded, fully wired, ready to ship.

Term coined at Spotify. Operationalized in Open Horizons as Backstage Software Templates that produce a working repo, pipeline, infrastructure, observability, and catalog entry, in a single click.

What a Golden Path produces, all at once

Production-ready on Day 1. Twelve artifacts wired together, four families.

Repo foundations
01 Git repository in the chosen org.
02 App skeleton in the chosen language.
11 CODEOWNERS, branch protection, Dependabot, GHAS.
Build, ship, deploy
03 CI/CD pipeline with tests, scans, image push.
04 Helm chart with limits, probes, NetworkPolicy.
06 Argo CD Application in the app-of-apps pattern.
Infra & DevEx
05 Terraform for dedicated infra: DB, queue, storage.
07 Backstage catalog entry with owner and lifecycle.
10 Devcontainer for Codespaces or local VS Code.
Specify & observe
08 TechDocs scaffolding ready to publish.
09 .specs/ folder seeded with Constitution + Spec templates.
12 Pre-baked observability with default Grafana dashboard.

One Backstage Software Template, one click, twelve artifacts ready to ship. The platform did the boilerplate so the developer never has to.
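A template is only as good as its output, so a smoke test that scaffolds a repo and checks the artifacts are present catches template drift early. A minimal sketch. The file and folder names below are a hypothetical layout chosen for illustration; substitute the paths your templates actually emit.

```python
from pathlib import Path

# Hypothetical layout; adjust to the paths your Golden Path templates produce.
EXPECTED_ARTIFACTS = {
    "catalog-info.yaml": "Backstage catalog entry",
    ".github/CODEOWNERS": "CODEOWNERS",
    ".devcontainer/devcontainer.json": "Devcontainer",
    "mkdocs.yml": "TechDocs scaffolding",
    ".specs": "Constitution + Spec templates",
    "charts": "Helm chart",
    "terraform": "Terraform for dedicated infra",
}


def missing_artifacts(repo_root: Path) -> list[str]:
    """List the expected Golden Path artifacts absent from a scaffolded repo."""
    return [
        f"{rel} ({label})"
        for rel, label in EXPECTED_ARTIFACTS.items()
        if not (repo_root / rel).exists()
    ]
```

Running it against a freshly scaffolded repo in the template's own CI keeps the "twelve artifacts, one click" promise testable.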

The Golden Path catalog

22 paths today, organized by Horizons Phase. Twenty are listed below.

H1, Foundation
  • landing-zone
  • azure-module
H2, Enhancement
  • microservice-nodejs
  • microservice-python
  • microservice-dotnet
  • microservice-go
  • frontend-react
  • api-openapi-first
  • library-typescript
  • database-postgres
  • event-worker
  • techdocs-site
H3, Innovation
  • agent-maf (Microsoft Agent Framework)
  • agent-sk (Semantic Kernel)
  • mcp-server
  • eval-job
  • skill
  • rag-application
  • multi-agent-system
  • foundry-agent

Stepping off the path is supported. The platform never punishes deviation, only requires it to be explicit.

The Agent Catalog in Backstage

If a service needs a catalog entry, an owner, RBAC, observability, cost, and a runbook, so does every agent.

catalog-info.yaml · committed to repo
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name:        deploy-orchestrator
  tags:        [agent, llm, production]
spec:
  type:        agent
  owner:       team-platform
  lifecycle:   production
  system:      agent-idp
  model:       gpt-5-1
  cost_center: cc-platform
  entrypoint:  https://agents.ohorizons.ai/api/agents/deploy
  runbook:     backstage.ohorizons.ai/docs/.../deploy-runbook
  policies:    [policy-net-egress, policy-budget-50usd]
What this gives the agent in Backstage
Identity & RBAC
Entra Workload Identity federation per agent, with scoped Entra ID permissions.
Observability
Auto-wired to Grafana L6 dashboards, trajectory replays.
Cost attribution
Per-agent USD by cost_center, FinOps roll-up.
Policy enforcement
OPA Gatekeeper applies policies on every deploy.
Catalog graph
Linked to APIs, systems, owners, runbooks.
Runbook + TechDocs
On-call procedures linked to the same record.

Every agent is discoverable, attributable, and auditable through the same UI developers already use for services.
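A CI check can keep agent entries governable before they merge. A minimal sketch that takes the entity as an already-parsed dict (parsing the YAML itself would need a third-party library such as PyYAML). It treats the fields in the `catalog-info.yaml` above as required, uses Backstage's conventional lifecycle values, and assumes the `policy-budget-` prefix marks a budget policy, which is a naming convention inferred from the example rather than a documented rule.

```python
REQUIRED_SPEC = ("type", "owner", "lifecycle", "system", "model", "cost_center", "policies")
LIFECYCLES = {"experimental", "production", "deprecated"}


def governance_gaps(entity: dict) -> list[str]:
    """Return reasons an agent catalog entry is not governable; empty means OK."""
    spec = entity.get("spec", {})
    gaps = [f"missing spec.{key}" for key in REQUIRED_SPEC if not spec.get(key)]
    if spec.get("type") and spec["type"] != "agent":
        gaps.append("spec.type must be 'agent'")
    if spec.get("lifecycle") and spec["lifecycle"] not in LIFECYCLES:
        gaps.append(f"unknown lifecycle {spec['lifecycle']!r}")
    policies = spec.get("policies", []) or []
    if not any(p.startswith("policy-budget-") for p in policies):
        gaps.append("no budget policy attached")
    return gaps
```

An agent without a cost center or a budget policy fails the PR, which is the same "no surprise bill" rule enforced one step earlier than the runtime ceiling.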

The agent lifecycle

Same stages as any component, with AI-specific checkpoints.

01 Proposal. A spec is opened in .specs/NNN-agent-name/ describing goal, scope, tools, budget.
02 Specification. EARS requirements + acceptance tests (Specky Phase 3).
03 Design. Chosen model, prompt structure, tool list, memory scopes (Phase 4).
04 Implement. Built via the agent-maf or agent-sk Golden Path (Phase 7).
05 Evaluate. Runs against the golden set, scored on correctness, safety, faithfulness, and cost per task.
06 Pilot. Limited rollout, one team, low-stakes use case.
07 Production. Promoted in the catalog, subject to full governance.
08 Deprecate. Retired, with trajectories archived.
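The eight stages read naturally as a linear state machine with a hard gate. A minimal sketch. Modelling the Evaluate checkpoint as the only blocking gate is an assumption for illustration, since the deck does not specify gate mechanics for the other stages.

```python
from enum import Enum


class Stage(Enum):
    PROPOSAL = 1
    SPECIFICATION = 2
    DESIGN = 3
    IMPLEMENT = 4
    EVALUATE = 5
    PILOT = 6
    PRODUCTION = 7
    DEPRECATE = 8


def next_stage(current: Stage, eval_passed: bool = False) -> Stage:
    """Advance one stage. Leaving Evaluate requires a passing golden-set run."""
    if current is Stage.DEPRECATE:
        raise ValueError("deprecated agents are terminal")
    if current is Stage.EVALUATE and not eval_passed:
        raise ValueError("evaluation gate not passed; fix and re-run the golden set")
    return Stage(current.value + 1)
```

Keeping the stage as a catalog field and advancing it only through a function like this means "promoted in the catalog" always implies the evaluation gate was passed.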
Identity for agents

No agent runs on a human's credentials. Period.

Azure
Entra Workload Identity
Federated to a Kubernetes ServiceAccount. No client secrets, no service principal secret rotation.
Kubernetes
ServiceAccount per agent
Mapped to a Pod Security Standard. RBAC at the namespace and resource level.
Tool call
OAuth tokens
Issued per-agent, scoped to specific tools. No "agent can call anything."
Portal
Backstage roles
Backstage role plugin controls who can view, edit, or invoke each agent.
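A per-agent ServiceAccount federated to an Entra identity is mostly naming discipline. A minimal sketch that generates the pieces AKS workload identity expects: the `azure.workload.identity/client-id` ServiceAccount annotation, the `azure.workload.identity/use: "true"` pod label, and a federated credential subject of the form `system:serviceaccount:<namespace>:<name>` with audience `api://AzureADTokenExchange`. The `agent-<name>` naming scheme and the client ID in the test are hypothetical placeholders.

```python
def agent_identity(agent: str, namespace: str, client_id: str) -> dict:
    """ServiceAccount manifest plus federated credential fields for one agent."""
    sa_name = f"agent-{agent}"  # hypothetical naming convention
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": sa_name,
            "namespace": namespace,
            "annotations": {"azure.workload.identity/client-id": client_id},
        },
    }
    return {
        "service_account": service_account,
        # The agent's pods must also carry this label to receive a projected token.
        "pod_labels": {"azure.workload.identity/use": "true"},
        # Used when creating the federated credential on the Entra identity.
        "federated_subject": f"system:serviceaccount:{namespace}:{sa_name}",
        "audience": "api://AzureADTokenExchange",
    }
```

One ServiceAccount per agent keeps the federated subject unique, so revoking one agent's identity never touches another's.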
Trajectories, the black box recorder

Every agent invocation produces a structured, replayable record.

01 The original user prompt or trigger event
02 The plan the agent generated
03 Every tool call, with name, arguments, latency, result, and error
04 Every model call, with model name, tokens in/out, cost, and latency
05 Memory reads and writes
06 The final output and a verdict: success, failure, or escalation

Trajectories are stored in PostgreSQL + Loki, indexed, replayable, exportable to OpenTelemetry, and surfaced in Backstage. Without them, you have no debug story.
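Deterministic replay follows from the record if every tool and model call is stored with its inputs and output. On replay, the agent code runs again, but each call is served from the record instead of hitting live systems, and any divergence surfaces immediately. A minimal sketch. The `Step`, `Trajectory`, and `Replayer` shapes are hypothetical and omit latency, cost, and memory fields for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Step:
    kind: str                     # "tool" or "model"
    name: str                     # tool name or model name
    inputs: dict                  # arguments or prompt payload
    output: Any = None
    error: Optional[str] = None


@dataclass
class Trajectory:
    trigger: str
    plan: list[str]
    steps: list[Step] = field(default_factory=list)
    verdict: str = "unknown"      # success | failure | escalation


class Replayer:
    """Serves recorded outputs in order instead of calling live tools or models."""

    def __init__(self, trajectory: Trajectory):
        self._steps = list(trajectory.steps)
        self._pos = 0

    def call(self, kind: str, name: str, inputs: dict) -> Any:
        if self._pos >= len(self._steps):
            raise RuntimeError("replay diverged: more calls than were recorded")
        recorded = self._steps[self._pos]
        self._pos += 1
        if (recorded.kind, recorded.name, recorded.inputs) != (kind, name, inputs):
            raise RuntimeError(
                f"replay diverged at step {self._pos}: expected {recorded.kind} {recorded.name}"
            )
        if recorded.error is not None:
            raise RuntimeError(recorded.error)  # reproduce the recorded failure
        return recorded.output

    def finished(self) -> bool:
        return self._pos == len(self._steps)
```

The agent receives the `Replayer` in place of its live tool and model client, so the code path under debug is the production code path.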

Cost governance, FinOps for AI

The single fastest way to lose executive trust in AI is a surprise bill.
Open Horizons prevents it with six controls.

Ledger
Per-agent
Middleware logs every model call with agent_id, team, cost_center, model, tokens, USD.
Dashboards
Per-team and per-CC
Grafana panels with filters and forecasts. Drill down from the CFO view to a single trajectory.
Alerts
50, 80, 100 percent
Alertmanager fires as an agent or team crosses each threshold.
Ceilings
Hard
Per-agent monthly budgets enforced by the runtime. An agent over budget refuses to run.
Routing
.github/model-routing.yaml
Declares which model handles which task. Cheaper models for cheaper tasks.
Eval budget
Separate
Continuous evaluation cost is tracked separately from production traffic.
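The ledger, alert, and ceiling controls compose into one small runtime object per agent. A minimal sketch, assuming the middleware already knows the USD cost of each model call (pricing lookup is out of scope) and keeping the ledger in memory rather than in the database a real deployment would use. `AgentBudget` and `BudgetExceeded` are hypothetical names.

```python
from dataclasses import dataclass, field

ALERT_PERCENTS = (50, 80, 100)


class BudgetExceeded(RuntimeError):
    """Raised when an agent tries to run past its hard monthly ceiling."""


@dataclass
class AgentBudget:
    agent_id: str
    team: str
    cost_center: str
    monthly_limit_usd: float
    spent_usd: float = 0.0
    ledger: list = field(default_factory=list)
    fired: set = field(default_factory=set)  # thresholds already alerted this month

    def check_can_run(self) -> None:
        """Hard ceiling: refuse to start once the monthly budget is used up."""
        if self.spent_usd >= self.monthly_limit_usd:
            raise BudgetExceeded(f"{self.agent_id} reached {self.monthly_limit_usd} USD")

    def record(self, model: str, tokens_in: int, tokens_out: int, usd: float) -> list[str]:
        """Append one ledger row and return any alert thresholds it crossed."""
        self.ledger.append({
            "agent_id": self.agent_id, "team": self.team,
            "cost_center": self.cost_center, "model": model,
            "tokens_in": tokens_in, "tokens_out": tokens_out, "usd": usd,
        })
        self.spent_usd += usd
        alerts = []
        for pct in ALERT_PERCENTS:
            # Integer percentages avoid float rounding in the threshold comparison.
            if pct not in self.fired and self.spent_usd * 100 >= pct * self.monthly_limit_usd:
                self.fired.add(pct)
                alerts.append(f"{self.agent_id} ({self.cost_center}) at {pct}%")
        return alerts
```

`check_can_run()` goes before each invocation and `record()` after each model call. The returned strings are what would be forwarded to Alertmanager.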
PART
XII

Security, compliance, governance.

Security is not a feature. It is a property of every layer. The secure way is also the easy way.

The seven security domains

Each maps to NIST CSF, ISO 27001, CIS, SOC 2.

Identity
Zero passwords
Workload Identity for every workload. Zero standing SSH. Just-in-time elevation via Entra PIM. RBAC at every layer.
Secrets
Key Vault SoT
Single source of truth, private endpoint, RBAC. Pods consume secrets via the Secrets Store CSI driver. gitleaks blocks PRs that contain secrets.
Network
Private by default
Private endpoints for every PaaS that supports them. Default-deny NetworkPolicies inside the cluster. HSTS on ingress.
Workload
PSS restricted
Pod Security Standards "restricted" profile enforced. Read-only root filesystem. Non-root users. Mandatory resource limits.
Supply chain
Sign + scan
cosign on every image. SBOM for every image. Dependabot. GitHub Advanced Security. tfsec on every Terraform PR.
Data
Encryption everywhere
At rest, in transit. 30-day backup defaults. Egress restricted to private endpoints.
AI specific
OWASP LLM Top 10
Tool-call vetting, content sanitization, scope guards, rate limits, model pinning. See next slide.
OWASP LLM Top 10 (v1.1) to Open Horizons controls

Every risk has a structural mitigation.

OWASP LLM risk · Open Horizons control
LLM01 Prompt Injection · Tool-call vetting, content sanitization, scoped tool RBAC
LLM02 Insecure Output Handling · Output filters, content safety, structured-output schemas
LLM03 Training Data Poisoning · No customer-data training without explicit pipeline + approval
LLM04 Model DoS · Per-agent rate limits + budget ceilings
LLM05 Supply Chain · Model pinning by name + version. No untrusted MCP servers.
LLM06 Sensitive Info Disclosure · DLP on outputs. Redaction in trajectories.
LLM07 Insecure Plugin Design · MCP server review, scope guards, allow-listed tools
LLM08 Excessive Agency · Scoped Workload Identity, tool RBAC, human-in-loop gates for high-risk actions
LLM09 Overreliance · Evaluation jobs + human review for production agents
LLM10 Model Theft · Egress restrictions, no model export, audit on weight access
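Several rows (LLM04, LLM07, LLM08) reduce to one check in front of every tool call. A minimal sketch. The `ToolPolicy` shape, the default per-run call limit, and the `approved_by` approval hook are hypothetical, and content sanitization and output filtering are out of scope here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolPolicy:
    allowed: frozenset[str]       # allow-listed tools for this agent (LLM07)
    high_risk: frozenset[str]     # tools that need a human approval (LLM08)
    max_calls_per_run: int = 20   # per-run rate limit (LLM04), illustrative value


def vet(policy: ToolPolicy, tool: str, calls_so_far: int,
        approved_by: Optional[str] = None) -> tuple[bool, str]:
    """Decide whether one tool call may proceed; returns (allowed, reason)."""
    if tool not in policy.allowed:
        return False, f"{tool} is not allow-listed"
    if calls_so_far >= policy.max_calls_per_run:
        return False, "per-run tool-call limit reached"
    if tool in policy.high_risk and approved_by is None:
        return False, f"{tool} needs human approval"
    return True, "ok"
```

Denials carry a reason so they can be written into the trajectory, which turns every blocked action into audit evidence rather than a silent failure.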
Continuous compliance, not annual

Nine frameworks mapped; seven are shown here. Evidence exported to a tamper-evident store on a schedule.

SOC 2 Type 2
PR history + RBAC + scans
Continuous evidence, PR history, deploy logs, RBAC change logs, scan reports, runbook executions.
ISO 27001
Asset + access + incident
Asset inventory in Backstage catalog. RBAC export. Runbook + on-call as incident management.
NIST CSF 2.0
Identify, Protect, Detect
Catalog + RBAC + Defender + alerts + runbooks + backups. Mapped end to end.
CIS Kubernetes
PSS restricted
CIS-aligned AKS configuration via Terraform. Enforced through OPA.
Azure WAF
Five pillars
Reliability, Security, Cost Optimization, Operational Excellence, Performance Efficiency, with explicit checklists per service.
NIST AI RMF
Map, Measure, Manage, Govern
Implemented across L3, L4, L5. SDD provides the Manage and Govern receipts.
ISO 42001
AI Management System
Lifecycle, risk, transparency, monitoring, supported by SDD pipeline and trajectory infrastructure.
Audit posture

"How did this change reach production?" Answered in seconds, end to end.

01 The spec in .specs/NNN-feature/ describes the intent.
02 The PR in GitHub or ADO shows the change, the reviews, the scans.
03 The CI run shows the tests and the security gates.
04 The Argo CD sync shows the actual deployment to the cluster.
05 The Grafana dashboard shows the post-deploy behavior.
06 The trajectory, if an agent was involved, shows the autonomous steps.

Every link in this chain is immutable, timestamped, and signed.
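Tamper evidence for the spec, PR, CI run, sync, and trajectory chain can be as simple as hash-chaining the evidence records, so that altering any earlier record breaks every later hash. A minimal sketch with SHA-256. It provides tamper evidence only. The signatures and trusted timestamps the chain also needs would come from a signing service, which this sketch does not model.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(prev: str, record: dict) -> str:
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev.encode() + payload).hexdigest()


def append(chain: list[dict], record: dict) -> list[dict]:
    """Return a new chain with `record` linked to the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    return chain + [{"record": record, "prev": prev, "hash": _digest(prev, record)}]


def verify(chain: list[dict]) -> bool:
    """True only if every link still matches its record and its predecessor."""
    prev = GENESIS
    for link in chain:
        if link["prev"] != prev or link["hash"] != _digest(prev, link["record"]):
            return False
        prev = link["hash"]
    return True
```

Publishing only the latest hash somewhere outside the platform is enough to detect rewriting of the whole history later.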

PART
XIII

Business value and ROI.

Three buckets. Customer's own numbers. Publicly verifiable benchmarks.

The three buckets of value

A complete business case quantifies all three.
Plug your own numbers in. That is the only credible business case.

Bucket 1
Productivity
Faster scaffolding. Reduced cognitive load. AI-assisted code, review, docs. Faster troubleshooting. Measured via DORA + GitHub Copilot research ranges.
Bucket 2
Risk reduction
Continuous scanning. Always-on observability. Per-agent identity + audit. Spec-driven dev. Measured via DORA CFR/MTTR, Verizon DBIR, IBM Ponemon.
Bucket 3
Cost optimization
Cost attribution by tag. AKS autoscaler. Scheduled start/stop dev envs. Per-agent budgets. Model routing. Tool consolidation. Measured via FinOps Foundation, Flexera.
TCO frame

Compare against what you would build yourself, or a closed SaaS IDP.

Status quo
Build it yourself
6 to 18 months to a working IDP. 4 to 8 platform engineers full-time during build. Custom integrations everywhere. Custom agent runtime. Maintained internally, forever.
Alternative
Closed SaaS IDP
Faster to start, but your data and code go somewhere else. Per-seat pricing. Limited customization. Often no agent IDP at all.
Recommended
Open Horizons
4 to 12 weeks to a working H1. Microsoft and certified partners deliver the heavy lift. Customer owns the code, cluster, data. Open source license, no per-seat pricing.

The accelerator typically pays for itself inside the H2 window through productivity and consolidation savings alone.
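The pays-for-itself claim is a payback-period calculation, and it only holds with your own inputs. A minimal sketch. The three inputs stand for your build or partner costs, platform run costs, and the monthly savings you attribute across the three buckets. None of the figures in the usage line below are benchmarks.

```python
from typing import Optional


def payback_month(one_time_cost: float, monthly_run_cost: float,
                  monthly_savings: float, horizon_months: int = 36) -> Optional[int]:
    """First month in which cumulative net savings cover the up-front cost, else None."""
    cumulative = -one_time_cost
    for month in range(1, horizon_months + 1):
        cumulative += monthly_savings - monthly_run_cost
        if cumulative >= 0:
            return month
    return None
```

With purely hypothetical inputs, `payback_month(300_000, 20_000, 80_000)` returns 5, because the net 60,000 per month covers the 300,000 up-front cost in month five. If savings never exceed run costs, the function returns `None`, which is the honest answer for that business case.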

Three scales of speed

Same scope of platform. Three timeframes. Customer's choice.

9 to 18 months
Industry baseline
A platform of this scope, built from scratch. The cost of going alone. Forrester Wave Q1 2026 baseline for comparable IDPs.
90 to 180 days
Horizons Phases, end to end
H1 Foundation + H2 Enhancement + H3 Innovation integrated. First production AI workload. Observability complete. The accelerator path.
2h 30m
git clone to production H1
Backstage on AKS reachable via HTTPS, Let's Encrypt TLS, GitHub OAuth, Argo CD syncing, Grafana live, first Golden Path scaffoldable. This is the install-wizard path, around 3h when an agent drives it.

The compression is the entire business case. Same scope, three speeds, customer's choice.

From numbers to next steps

The business case is yours to build.
The first step is small. Two weeks. Fixed scope. Walk-away clause.

Everything from here is mechanics. The Discovery is a paid, scoped, fixed-deliverable engagement that produces an H1 plan your CFO can sign off on or reject in writing. You leave with the architecture diagram, the risk register, the cost estimate, and a partner short-list whether or not you proceed. No build-trap. No retainer. The only commitment is two weeks of your platform lead's time.

Step 1
Discovery, 1 to 2 weeks
Step 2
Pilot, 2 to 4 weeks
Step 3
H1 Foundation, 4 to 8 weeks
Steps 4 + 5
H2 + H3, 20 to 36 weeks
PART
XIV

Getting started.

From discovery to innovation, five steps with decision gates at every step.

The five-step engagement model

Each step has a decision gate. Customers can stop, pause, or scale at any gate.

Engagement model, five steps
01 Discovery · 1 to 2 weeks
02 Pilot · 2 to 4 weeks
03 H1 Foundation · 4 to 8 weeks
04 H2 Enhancement · 8 to 12 weeks
05 H3 Innovation · 12 to 24 weeks
What you need on Day 1

To start a Discovery, the customer needs six things. Everything else is provided.

01 An Azure subscription (or willingness to create one)
02 A GitHub organization or Azure DevOps organization
03 A Microsoft Entra ID tenant, typically the same one used for M365 or Azure
04 An executive sponsor, typically CTO, head of platform, or chief architect
05 A named platform lead on the customer side
06 One to three pilot teams willing to be early adopters
What you get from Day 1

Five deliverables before the first sprint starts.

Repo access
ohorizons
Access to the source template repository.
Discovery report
Tailored
Current-state architecture diagram, target-state H1 scope proposal, risk register.
Partner match
Certified network
A short-list of certified partners matched to your industry, region, and AI ambition.
Success plan
Milestones + gates
A 90-day plan with success criteria and decision gates.
Microsoft backstop
Architectural
Microsoft is engaged on architectural escalations across the engagement.
Common pitfalls

Six failure modes, all preventable with the staged model.

01 Trying to do H1+H2+H3 at once. Stage it. Each phase delivers value on its own.
02 Building before piloting. Spend the 2 to 4 weeks on the pilot. It pays for itself.
03 Skipping platform-as-a-product. Treat developers as customers. Survey them, iterate.
04 Underfunding the platform team. A platform without a team becomes a graveyard. Budget for 2 to 4 dedicated engineers.
05 Letting agents bypass governance. Use SDD, trajectories, and cost ceilings from Day 1.
06 No exit story from a partner. Insist on knowledge transfer milestones in every SOW.
What "done" looks like

You will know Open Horizons is working when five things become true.

01 A new service goes from idea to production in hours, not weeks.
02 An application developer can answer "where does this metric come from?" in the portal, without asking the platform team.
03 An auditor can trace any production change to a spec, a PR, and a trajectory in under a minute.
04 Every agent invocation has a known cost, owner, and SLO, like any other service.
05 The platform team is shipping a product, not fighting fires.
PART
XV

Partner ecosystem.

You buy the accelerator once. You customize it forever, with partners who know the stack.

Certified partners do four things

Often in combination. Always with customer ownership intact.

Service 1
Deploy + onboard
Stand up the platform in the customer's Azure tenant. H1 in 4 to 8 weeks.
Service 2
Customize
New Golden Paths, plugins, agents, MCP servers, bespoke compliance mappings, custom eval pipelines.
Service 3
Operate Day-2
Team augmentation, upgrades, on-call, FinOps reviews, prompt iteration.
Service 4
Train + enable
Platform team certification, developer onboarding curricula, "train the trainer" for large enterprises.
Certification tiers

Three tiers, based on demonstrated outcomes, not paid status. Renewed annually.

Registered
Trained + signed
Code of conduct signed, disclosure rules agreed. Can deliver onboarding and basic customization.
Certified
At least 3 successful H1+H2 deliveries
Passed technical assessment. Can deliver Day-2, complex customization, agent work.
Strategic
H3 track record
Published reference architectures, named technical leads. Enterprise scale, regulated industries, multi-region.
What partners do not do

Four boundaries that protect the customer.

01 Partners do not own customer code or data.
02 Partners do not lock customers into a fork of Open Horizons.
03 Partners do not bypass governance, security, or audit controls.
04 Partners do not hold exclusive rights to a customer. Customers can engage several partners, switch, or insource.
How to engage a partner

A five-step procurement pattern that keeps incentives aligned.

01 Talk to the Microsoft field team. Size the engagement, match partners to your context.
02 Request two or three partner proposals. Compare approaches, references, pricing models.
03 Run a paid Discovery, typically 1 to 2 weeks. Produces a fixed-scope H1 plan.
04 Sign an SOW tied to outcomes, not hours.
05 Insist on knowledge transfer milestones. Every engagement should reduce, not increase, dependence on the partner.
The field-friendly takeaway

"Pilots fail at 95 percent because teams build agents without the four-layer foundation.
Open Horizons gives you that foundation on Day 1.
The 95 percent becomes a 5 percent problem instead."

Paula Silva
Software Global Black Belt

The data is in the references at the end of the playbook. The accelerator is in the repository. The conversation starts with a Discovery.

References

The research-grounded backbone of this deck. Sixteen open citations, four families.

Every claim in this deck traces to one of these sources. The playbook has the full bibliography with annotations.

Thank you

Let's talk.

If your enterprise is stuck between pilots and production, the conversation starts with a Discovery. One to two weeks. Fixed scope. A 90-day H1 plan you can fund or walk away from.

Contact
Paula Silva
Software Global Black Belt
paulasilva@microsoft.com
Deck reference
v1.1.0
Published 2026-05-23
Next step
Open Horizons Discovery
1 to 2 weeks, fixed scope