Open Horizons, the Agentic DevOps Platform

The open-source accelerator for platform engineering
and Agentic AI at enterprise scale.

Backstage on Azure, six layers of governance, the Horizons Phases, and a working Agent IDP in 90 days.

Author: Paula Silva

Role: AI-Native Software Engineer

Duration: 60 to 90 minutes

Date: 2026-05-22

Agenda

Four acts. Thirteen parts. Two hours end to end.

ACT 1

I-IV · 25 min

Diagnostic

I · The problem
II · What Open Horizons is
III · Platform engineering 101
IV · The rise of Agentic DevOps

ACT 2

V-VI · 30 min

The stack

V · MS Agentic DevOps stack
VI · Day in the life
VII · Live tour, ohorizons.ai
VI · Six-layer Context Platform

ACT 3

VII-X · 35 min

Adoption + ops

VII · Horizons Phases, H1/H2/H3
VIII · GitHub + ADO integration
IX · Golden Paths + Agent Catalog
X · Security + governance

ACT 4

XI-XIII · 30 min

Business + action

XI · Business value + ROI
XII · Getting started, 5 steps
XIII · Partner ecosystem
→ Close + Discovery CTA

Short on time? Watch Act 2 alone, V to VI. That covers the demo and the architecture. Then jump to Act 4 for Discovery sign-off.

Who is speaking

Paula Silva, AI-Native Software Engineer.

Building the future of software development with AI and Agentic DevOps.

I work with enterprise customers across the Americas on Agentic AI, platform engineering, and software modernization. This deck distills the pattern I see repeated across dozens of programs: pilots that work in a notebook and stall on a production cluster. Open Horizons is the accelerator that closes that gap by construction, on Day 1, not on Day 180.

PART I

The problem.

Why most enterprise AI pilots stall before production, and why more tools will not fix it.

The agent cemetery

95%

is the GenAI pilot failure rate measured by MIT NANDA in 2025. Failures concentrated in context, integration, and governance gaps. Not model quality.

Source: MIT NANDA, The GenAI Divide, State of AI in Business 2025.

The cancellation curve

40%

of Agentic AI projects will be cancelled by end of 2027, according to Gartner. Cost overruns, unclear business value, inadequate risk controls.

Source: Gartner press release, June 2025. In the same period, Gartner also forecast that 40 percent of enterprise apps will feature task-specific AI agents by 2026, up from under 5 percent in 2025.

The inverted perception problem

Developers estimated AI tools would speed them up by 20 percent.
An RCT measured they were 19 percent slower.

Estimated speedup

+20%

developer self-report of expected productivity gain from AI tools

Inversion

39 pp

vs measured outcome

Measured speedup

-19%

actual change in throughput, 16 experienced open-source developers

Source: METR, RCT on AI-augmented development, 2025. arXiv:2507.09089. When the tool makes you feel capable, you cannot detect it has made you less effective. Trust measured trajectories, not satisfaction surveys.
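The 39 pp figure on this slide is simple arithmetic: the gap between the expected speedup and the measured one. A minimal sketch of that calculation follows; the function and variable names are illustrative and do not come from the METR paper.

```python
def perception_gap_pp(expected_pct: float, measured_pct: float) -> float:
    """Gap, in percentage points, between expected and measured speedup."""
    return expected_pct - measured_pct


expected = 20.0   # developers' estimated speedup, in percent
measured = -19.0  # measured change in the RCT, in percent (negative means slower)

gap = perception_gap_pp(expected, measured)
print(f"perception gap: {gap:.0f} pp")
```

The same subtraction can be run per team once measured throughput exists, which is the point of trusting trajectories over surveys.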

The Triple Debt

Three forms of debt accumulate in AI-native development.
Each is invisible until it is expensive.

Familiar

Technical debt

CMU 2026, AI agents drive +18 percent static-analysis warnings and +39 percent cognitive complexity. Liu et al. 2026, AI code adds significantly more requirement and test debt across 304,362 commits.

New

Cognitive debt

Storey 2026 names it cognitive surrender. Anthropic Fellows 2026, AI use during learning reduces library-specific skill acquisition by 17 percent. The codebase becomes orphaned knowledge.

Worst

Intent debt

Objectives, constraints, and decision rationale never captured. The Klarna 2025 case, comprehensive context, no codified intent. The agent optimizes for the wrong metric.

Open Horizons exists to eliminate the Triple Debt by construction. SDD anchors intent, Backstage anchors knowledge, scope guards anchor scope.

The four costs of the status quo

Without a platform, every team pays four taxes.

Cognitive

Six jobs in one role

Kubernetes, Terraform, Actions, observability, security scanners, and now agent frameworks. Burnout, shallow expertise, slow delivery.

Inconsistency

N teams, N paved roads

N pipelines, N base images, N security postures. Zero leverage when a CVE drops or compliance changes.

Compliance

Annual fire drill

SOC 2, ISO 27001, HIPAA, PCI-DSS enforced manually, retrospectively, and incompletely. Audits become projects, not checkpoints.

AI adoption

Shadow agents

Devs pasting code into ChatGPT, unsanctioned prototypes touching prod, no per-agent cost, no governance for which models, which tools, which data.

The wrong reflex

The instinct is to buy another tool.
That makes the problem worse.

The reflex · stack another tool

CI/CD vN+1

New IaC

AI gateway

Obs vendor

Sec scanner

Agent SaaS

MCP hub

FinOps SaaS

+ N more

Every purchase adds surface area, integrations to maintain, and one more decision the developer has to make before writing code. The cognitive tax compounds.

platform shift

The answer · one platform, opinions baked in

Paved roads, Golden Paths replace per-team improvisation.

Opinions encoded, no rebuilding the wheel on each repo.

Governance baked in, security, FinOps, audit by construction.

Agents first-class, identity, RBAC, cost ceiling, replay.

A platform multiplies leverage. That is what Open Horizons delivers.

The solution is not another vendor. The solution is a platform with opinions, paved roads, and governance baked in. Tools serve the platform; the platform serves developers and agents.

What good looks like

An enterprise where the platform multiplies leverage, not headcount.

01 · A new service is scaffolded, deployed, monitored, and compliant in under 30 minutes.

02 · Every team uses the same paved road, but can step off it explicitly when needed.

03 · AI agents are first-class citizens with identity, RBAC, cost ceilings, and trajectory replay.

04 · Audits are continuous, not annual. Every change is traceable to a spec, a PR, and a run log.

05 · The platform team ships a product. Developers and agents are the customers.

The road from here

The problem is named. The path forward is mapped.
Here is how the next 110 slides take you from the cemetery to a working platform.

Act II → V · Concept

What Open Horizons is, why platform engineering, what Agentic DevOps means, how Microsoft ships the stack

VI + VII · Demo

Day-in-the-life of a developer and an agent. Live tour through ohorizons.ai screens you can open today

VI → X · Mechanics

The six-layer architecture, Horizons Phases, GitHub + ADO integration, Golden Paths, Agent Catalog, security and compliance

XI → XIII · Action

Business value frame, the five-step engagement model, partners, references, and how to start a Discovery

If short on time, jump to the live tour at VII (10 slides), then the Horizons Phases at IX (6 slides), then Getting Started at XIV (6 slides). That is the 22-slide path.

The body of work behind this deck

Three artifacts, one continuum.
The deck teaches the model. The playbook documents the patterns. Open Horizons ships them.

01 · Deck

Context Platform Stack

The diagnostic deck. Four questions, four layers, the cemetery numbers, the cost of failure. 50 slides, executive audience. The conceptual model in compressed form.

Audience: CTO, CIO, CFO, board

02 · Playbook

Open Horizons Playbook

25 chapters. Part I tells the story. Part II is the receipts: peer-reviewed research, the CNCF crosswalk, every layer deep, every claim cited. The reference architecture as a published guide.

Audience: architects, platform leads, security, FinOps

03 · Accelerator

Open Horizons (executable)

The deck, materialized. A working Backstage + Azure + Foundry deployment in your tenant in under 3 hours. 22 Golden Paths, 19 agents, 12 MCP servers, all governed and observable on Day 1.

Audience: every team in the org · ohorizons.ai

This deck pulls from all three. Numbers and diagnostics come from the deck, mechanics come from the playbook, screenshots and live demos come from the accelerator.

The four questions

Four questions most enterprises fail to answer with precision.
Open Horizons answers each one in code, not in slides.

Q1 · Cloud and infrastructure

"Where do agents run, and at what real cost?"

Compute, GPU, Kubernetes, decision observability, tool choice, inference tokens.

OH answer: L1 Terraform modules + AKS + ACR + Key Vault + Foundry + L1/L6 FinOps roll-up dashboards.

Q2 · Platform engineering

"What can agents access, and who governs it?"

IDP, Golden Paths, guardrails, per-agent RBAC, quotas, auditor agent.

OH answer: L2 Backstage + 22 Golden Paths + RBAC plugin + DORA + OPA Gatekeeper + scoped Workload Identity per agent.

Q3 · Context engineering

"What can agents know, when, and at what token cost?"

ACE pattern, skills, three memory tiers (hot, warm, cold), MCP servers.

OH answer: L3 Foundry Toolbox (12 MCP + 4 built-ins) + 3-tier prompt cache + enterprise_memory on pgvector + Shared Context Store.

Q4 · Intent engineering

"What should agents optimize for, in what hierarchy of trade-offs?"

CONSTITUTION.md, SDD with EARS, intent debt, specification engineering.

OH answer: L4 Specky 10-phase pipeline + 103 EARS reqs + scope-guard hooks + .github/model-routing.yaml + intent-drift measurement.
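Q4 leans on EARS, the Easy Approach to Requirements Syntax, whose value is that each requirement follows one of a few fixed sentence templates. As a rough sketch of what checking that looks like, the snippet below classifies a sentence against the five basic EARS templates with regular expressions. This is an illustrative check written for this deck, not Specky's implementation, and the sample requirements are hypothetical.

```python
import re
from typing import Optional

# The five basic EARS templates, most specific first.
EARS_PATTERNS = [
    ("event-driven", re.compile(r"^When\b.+,\s*the\s+.+\bshall\b", re.I)),
    ("state-driven", re.compile(r"^While\b.+,\s*the\s+.+\bshall\b", re.I)),
    ("unwanted-behaviour", re.compile(r"^If\b.+,\s*then\s+the\s+.+\bshall\b", re.I)),
    ("optional-feature", re.compile(r"^Where\b.+,\s*the\s+.+\bshall\b", re.I)),
    ("ubiquitous", re.compile(r"^The\s+.+\bshall\b", re.I)),
]


def classify_ears(requirement: str) -> Optional[str]:
    """Return the EARS template name a requirement matches, or None."""
    text = requirement.strip()
    for name, pattern in EARS_PATTERNS:
        if pattern.search(text):
            return name
    return None


# Hypothetical requirements, for illustration only.
samples = [
    "When a pull request is opened, the reviewer agent shall post findings.",
    "If the cost ceiling is exceeded, then the harness shall stop the run.",
    "The platform shall tag every Azure resource with an owner.",
    "The agent should try to be fast.",
]
for sample in samples:
    print(classify_ears(sample), "<-", sample)
```

A sentence that returns None, like the last sample, is the kind of vague wording that becomes intent debt if it reaches an agent unreviewed.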

From the original four to the production six

The Context Platform Stack started as four layers.
Production added two more: Integration and Harness.

2025 · Original 4-layer model

L4 Intent · what should agents optimize for

L3 Context · what can agents know

L2 Platform · what can agents access

L1 Cloud · where do agents run

prod

2026 · OH production 6-layer model

L6 Harness · wraps every model call · NEW

L5 Integration · GitHub + ADO + Argo + MCP · NEW

L4 Intent · what should agents optimize for

L3 Context · what can agents know

L2 Platform · what can agents access

L1 Cloud · where do agents run

A four-layer model collapses Integration into Platform and Harness into Context. Both collapses produced unmaintainable bundles in real deployments. The two extra layers are what survived production contact.

Defending the model

Two recurring pushbacks. Both have crisp answers from the field.

Pushback 1 · "Three is enough"

Merging context and intent creates drift.

If you merge "what the agent knows" with "what the agent wants," both become ambiguous. You see it when the same skill change alters expected behavior and nobody can tell if it is a bug or a feature. Separating context (facts) from intent (values) makes drift detectable. That is why L3 and L4 are different layers in Open Horizons, with different artifacts, different review processes, different change cadences.

Pushback 2 · "Six is too many"

Integration and Harness have distinct owners.

Integration (L5) is owned by the platform integrators handling GitHub + ADO + Argo + MCP coexistence. Harness (L6) is owned by SRE + FinOps + Security teams. If you collapse them into Platform or Context you put the wrong owner on a contract. Separation maps cleanly to who is on-call when each layer fails. Production survives the audit only when ownership is unambiguous.

The 4-layer model is the teaching frame. The 6-layer model is the operating frame. Both are correct. They serve different rooms.

PART II

What Open Horizons is.

An accelerator, not a SaaS product. A platform, not a stack of tools. Two personas, one portal.

In one sentence

Open Horizons is an open-source Agentic DevOps Platform.
An Azure-native accelerator that gives enterprises a production-grade Internal Developer Platform and an AI Agent Platform in one coherent stack.

Delivered by Microsoft and the certified partner network. Deployed in your tenant, your subscription, your data. No SaaS, no lock-in, no per-seat pricing.

What it is not

Four things Open Horizons is deliberately not.

Not SaaS

Your tenant

Open Horizons runs in your Azure subscription, on your AKS cluster, with your identity provider. No data leaves your tenant.

Not a fork

Upstream Backstage

Open Horizons consumes upstream Backstage and contributes back. You stay on the community release train, you keep the ecosystem.

Not lock-in

Open standards

Every layer is open source or open standard. AKS, Terraform, Argo CD, OpenTelemetry, MCP. You can take the code with you.

Not templates

Working infra

The accelerator includes live infrastructure, a working agent runtime, observability, Golden Paths, agents, skills, prompts, policies, runbooks.

Two personas, one portal

Both personas share identity, catalog, RBAC, and observability.
An agent is just another component that needs to be managed, secured, and measured.

Developer IDP

For app engineers

Self-service scaffolding via Golden Paths. One-click environments. TechDocs per service. Integrated CI/CD, secrets, observability, cost. A catalog of components, APIs, resources, teams.

Agent IDP

For AI engineers

Catalog of agents with identity, ownership, governance. Trajectory logs, replayable and auditable. Per-agent cost dashboards. Skill, prompt, instruction registry with PR-based approvals.

The core components

Twelve concerns. Four families. One coherent platform.

Portal

Backstage OSS

Single pane of glass for devs and agents.

Catalog

Software Catalog

Source of truth, services + APIs + agents.

Scaffolder

Ohorizons/ohorizons-golden-paths

22 templates, sibling repo, versioned apart.

GitOps

Argo CD

Declarative deploys, no kubectl from laptops.

Runtime

Azure Kubernetes Service

Private API, autoscaler, Workload Identity.

IaC

Terraform, 18 modules

Tags mandatory, reproducible Azure infra.

Observability

Prometheus + Grafana + Loki

7 per-layer dashboards out of the box.

Identity

Entra + Workload Identity

Zero secrets in pods, per-agent identity.

Secrets

Azure Key Vault + CSI

Private endpoint, secrets as files.

AI Runtime

Microsoft AI Foundry

gpt-5-1, gpt-5-4-pro, Agent Framework.

Agent layer

.github/ · H1+H2+H3 integrators

19 agents that ship + integrate the Horizons.

Policy

OPA + Gatekeeper + Trivy + tfsec

Continuous compliance, admission control.

Developer experience

Infrastructure

AI runtime + agents

Governance
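The "Tags mandatory" and "admission control" entries above both reduce to the same idea: a resource is rejected unless it carries the metadata the platform requires. In Open Horizons that job belongs to Terraform modules, OPA, and Gatekeeper; the sketch below only illustrates the rule in Python. The tag names and the resource shape are assumptions, not the accelerator's actual policy.

```python
from typing import Dict, List

# Hypothetical mandatory tags; the real list lives in the platform's policies.
REQUIRED_TAGS = {"owner", "cost-center", "environment"}


def missing_tags(resource: dict) -> List[str]:
    """Return the mandatory tags a resource lacks, sorted for stable output."""
    return sorted(REQUIRED_TAGS - set(resource.get("tags", {})))


def violations(resources: List[dict]) -> Dict[str, List[str]]:
    """Map each non-compliant resource name to its missing tags."""
    report = {}
    for resource in resources:
        missing = missing_tags(resource)
        if missing:
            report[resource["name"]] = missing
    return report


# Hypothetical resources, for illustration only.
plan = [
    {"name": "aks-main", "tags": {"owner": "platform", "cost-center": "cc-100", "environment": "prod"}},
    {"name": "kv-agents", "tags": {"owner": "platform"}},
]
report = violations(plan)
print(report)
```

An empty report admits the change; anything else blocks it before it reaches the cluster or the subscription.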

Anatomy of the accelerator

Two repositories under `github.com/Ohorizons`.
The agents in .github/ are what makes this an accelerator, not a template.

Ohorizons / ohorizons public · main

ohorizons/  # the platform (this repo deploys the accelerator)
  terraform/    18 reusable Azure modules
  backstage/    Backstage app + plugins + FastAPI agent APIs
  argocd/       GitOps app-of-apps
  .github/      ★ 19 agents · 27 skills · 16 prompts · 13 instructions
                the implementer + integrator team
  mcp-servers/  Model Context Protocol tool servers
  policies/     OPA, Gatekeeper, tfsec rules
  prometheus/   Recording + alerting rules
  grafana/      Pre-built dashboards (7 per layer)
  docs/         Architecture, runbooks, TechDocs
  scripts/      End-to-end deploy + validation automation

Ohorizons / ohorizons-golden-paths public · main

ohorizons-golden-paths/  # versioned independently
  h1-foundation/
    landing-zone/
    azure-module/
  h2-enhancement/
    microservice-{nodejs,python,dotnet,go}/
    frontend-react/, api-openapi-first/
    database-postgres/, event-worker/
    techdocs-site/, library-typescript/
  h3-innovation/
    agent-maf/, agent-sk/
    mcp-server/, eval-job/, skill/
    rag-application/
    multi-agent-system/
    foundry-agent/

# Consumed by Backstage Scaffolder via
# location entries in the platform repo

★ Why this is an accelerator

The agents in .github/ are the implementer + integrator team that wires H1, H2, H3 together. They provision infrastructure, register Golden Paths from the sibling repo, onboard agents, configure dashboards, and verify each phase. Without these agents, this would be a template. With them, it is an accelerator.

PART III

Platform engineering 101.

The discipline of building internal products for internal developers. Not DevOps rebranded.

Definitions that matter

Four terms the executive team must agree on before signing a platform charter.

01 · Platform engineering. The discipline of designing and building toolchains and workflows that enable self-service for software engineering in the cloud-native era.

02 · Internal Developer Platform, IDP. The product that platform engineers build. It is to developers what Azure plus GitHub plus Backstage is to an enterprise: one paved road that abstracts complexity.

03 · Golden Path. A pre-defined, opinionated way to do a common task. How a platform expresses "here is how we do it here."

04 · DevEx. The measurable quality of being a developer in your org. DORA, SPACE, time-to-first-PR. What we instrument.

The five pillars of a modern IDP

Every IDP, including Open Horizons, must deliver these.

Service Catalog

Single source of truth for what exists. In OH, Backstage Software Catalog.

Software Templates

Scaffolding new things in minutes. Golden Paths organized by H1, H2, H3.

TechDocs

Docs as code next to the code, published via the portal. No more wiki rot.

Observability

Dashboards, alerts, logs, traces in the developer context, not a separate tool.

Governance

Policy as code, RBAC, secrets, supply chain by default, not as an afterthought.

The product mindset

The single biggest mistake is treating platform as an infra project.
It is a product. Developers and agents are both customers.

Infrastructure mindset

"We deployed Kubernetes."
"The cluster is up."
"Devs should read the docs."
Annual roadmap.
Success equals uptime.

Product mindset

"12 teams onboarded, 8 deploying daily. 14 agents in catalog, 9 running production traffic."
"Time-to-first-PR is 3 days. Time-to-first-agent-trajectory is 90 minutes."
"We watched 5 devs and 3 agent authors onboard, here is where they got stuck."
Monthly product reviews with engineering and agent-author customers.
Success equals adoption + DevEx + AgentX metrics.

DORA + SPACE, the metrics that matter

You cannot improve what you do not measure. Two frameworks, shipped out of the box.

DORA · the four keys

DELIVERY

01 · Velocity

Deployment frequency

Elite: multiple times per day.

02 · Velocity

Lead time for changes

Elite: under one day commit to prod.

03 · Stability

Change failure rate

Elite: under 15 percent.

04 · Stability

Mean time to recovery

Elite: under one hour.

OH dashboards: Backstage DORA Four Keys plugin · Grafana L2 board · alert rules on the four keys.
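To make the four keys concrete, here is a minimal sketch that computes them from a list of deployment records and compares the result with the elite thresholds quoted above. The record shape and field names are assumptions for illustration; they are not the schema used by the Backstage DORA plugin or the Grafana boards.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed deploy is recovered


def four_keys(deploys: List[Deployment], window_days: int) -> dict:
    """Compute the four DORA keys over a reporting window."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    mttr = sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_recovery": mttr,
    }


def meets_elite(keys: dict) -> bool:
    """Elite thresholds as stated on the slide."""
    mttr = keys["mean_time_to_recovery"]
    return (
        keys["deploys_per_day"] > 1
        and keys["median_lead_time"] < timedelta(days=1)
        and keys["change_failure_rate"] < 0.15
        and (mttr is None or mttr < timedelta(hours=1))
    )


t0 = datetime(2026, 5, 1, 9, 0)
h = timedelta(hours=1)
deploys = [
    Deployment(t0, t0 + 2 * h),
    Deployment(t0 + 1 * h, t0 + 4 * h),
    Deployment(t0 + 24 * h, t0 + 28 * h, failed=True, restored_at=t0 + 28.5 * h),
    Deployment(t0 + 26 * h, t0 + 31 * h),
]
keys = four_keys(deploys, window_days=2)
print(keys, meets_elite(keys))
```

In this sample the team deploys twice a day and recovers in 30 minutes, but one failure in four deploys keeps it out of the elite band.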

SPACE · the human side

DEVEX

Satisfaction and well-being. Burnout signal, sentiment, retention.

Performance. Quality of output, defect escape rate.

Activity. Volume of work, PRs, reviews, commits.

Communication and collaboration. Cross-team flow.

Efficiency and flow. Focus time, context switches, time-to-merge.

DORA measures the system. SPACE measures the people running it. Both surface in the same Grafana board, both gate platform releases.

Anti-patterns to avoid

Five mistakes that kill platforms. All preventable.

01 · If we build it, they will come. They will not. Adoption is a product motion. Sell internally, train, iterate.

02 · One Golden Path to rule them all. Different stacks need different paths. Open Horizons ships multiple per Horizon.

03 · Security is someone else's problem. Bolt it on later and you never catch up. Defaults are the policy.

04 · Two SREs in a corner. Platforms need PMs, designers, engineers. Target ratio: 1 platform team member per 15 to 25 app engineers.

05 · We will add AI later. Retrofitting agent governance is harder than building it in. Treat agents first-class on Day 1.

PART IV

The rise of Agentic DevOps.

From code completion to chat to production agents. The enterprise governance problem just exploded.

A three-year arc

From autocomplete to autonomy in three release cycles.

2022

Code completion

GitHub Copilot makes autocomplete intelligent. Humans still drive every step.

2023, 2024

Chat assistants

GitHub Copilot Chat, ChatGPT enter the IDE. Each interaction one-shot. No memory, no tools, no autonomy.

2025, 2026

Agentic systems

Agents plan, call tools, persist memory, run for minutes or hours, and ship work. The governance problem just exploded.

What makes a system agentic

Four properties of a minimum-viable agent.

Goal-directed

It decomposes a request into a plan and reasons over it.

Tool use

It can call APIs, search code, read files, invoke deployments.

Memory

It remembers across turns, sessions, runs. The context platform.

Governed autonomy

It can act, but within identity, RBAC, and policy constraints.
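The four properties can be sketched in a few lines. The class below is a toy, not the Open Horizons runtime: the plan is handed in rather than generated by a model, memory is a plain list, and the policy is a per-agent tool allowlist. All names, including the identity string and the two tools, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


class PolicyViolation(Exception):
    """Raised when an agent requests a tool outside its allowlist."""


@dataclass
class GovernedAgent:
    identity: str                         # hypothetical agent identity label
    allowed_tools: Set[str]               # policy: tools this identity may call
    tools: Dict[str, Callable[..., str]]  # tool registry
    memory: List[str] = field(default_factory=list)  # persists across steps

    def call_tool(self, name: str, **kwargs) -> str:
        # Governed autonomy: check policy before acting, and record denials.
        if name not in self.allowed_tools:
            self.memory.append(f"DENIED {name}")
            raise PolicyViolation(f"{self.identity} may not call {name}")
        result = self.tools[name](**kwargs)
        self.memory.append(f"{name} -> {result}")
        return result

    def run(self, plan: List[Tuple[str, dict]]) -> List[str]:
        # Goal-directed: execute a plan step by step (planning is out of scope here).
        return [self.call_tool(name, **kwargs) for name, kwargs in plan]


def search_code(query: str) -> str:
    return f"3 matches for {query!r}"


def deploy(target: str) -> str:
    return f"deployed to {target}"


agent = GovernedAgent(
    identity="agent-reviewer",
    allowed_tools={"search_code"},
    tools={"search_code": search_code, "deploy": deploy},
)
results = agent.run([("search_code", {"query": "TODO"})])
print(results)
```

Calling `agent.call_tool("deploy", target="prod")` raises `PolicyViolation` and leaves a denial entry in memory, which is the behavior the fourth property asks for: the agent can act, but only inside its policy.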

Why "just use GitHub Copilot" is not enough

GitHub Copilot ships the primitives. Open Horizons ships the enterprise control plane that unifies them.

Enterprise need	GitHub Copilot 2026	What GitHub Copilot ships · gap	OH Agent IDP
Per-team cost visibility	Partial	Team-level metrics API + Cost Centers (May 2026). Min 5 active users/day, no hierarchical chargeback.¹	Yes
Trajectory logging and replay	Partial	Agent-Logs-Url trailer + OTel export (Mar 2026). Coding-agent only, not unified across IDE/CLI surfaces.²	Yes
RBAC: which agent touches which data	Partial	Content Exclusion + Agent Control Plane GA Feb 2026 + MCP allowlist. Not enforced on Cloud Agent, CLI, Agent Mode yet.³	Yes
Approved skill, prompt, instruction registry	Partial	Org custom instructions GA (Apr 2026) + BYOR MCP + custom agents repo. Federation of files, no single registry UI.⁴	Yes
Compliance-grade audit logs	Partial	SOC 2 Type 1 + ISO 27001 in scope. 180-day retention only; long-term needs Splunk/Event Hubs streaming.⁵	Yes
Custom long-running business agents	Limited	Custom Agents (Markdown profiles) bounded to dev workflows. Business-domain agents need Copilot Studio / Foundry.⁶	Yes
Integration: catalog, observability, policy	Limited	OpenTelemetry native. Backstage plugin is community-maintained. No first-party OPA integration.⁷	Yes

Sources (GitHub Docs & Changelog, May 2026): ¹ Team-level usage metrics API + Cost centers. ² Agent-Logs-Url trailer + GitHub Copilot SDK OTel. ³ Enterprise AI Controls GA + Content Exclusion. ⁴ Org instructions GA + Custom agents. ⁵ SOC 2 + ISO 27001 + Agentic audit events. ⁶ Custom agents docs. ⁷ VS Code GitHub Copilot OTel + Backstage community plugin.

GitHub Copilot ships the primitives. Open Horizons unifies them with your catalog, observability stack, policy engine, and existing IDP.

The four pillars of Agentic DevOps

Open Horizons is built around these four. Every customer inherits them.

Pillar 1

Identity for agents

Every agent has a service principal in Entra ID, a catalog entry, scoped RBAC, a cost center tag. "What is this agent allowed to do?" has an audited answer.

Pillar 2

Context engineering

The six-layer Context Platform Stack. MCP servers, three-tier memory, RAG, prompt cache, scope guards. Grounded in 25+ peer-reviewed papers.

Pillar 3

Trajectories and cost

Every agent run produces a structured, replayable, evaluable trajectory. Cost per agent, per team, per task. The black box recorder.

Pillar 4

Spec-driven dev

Specky, 10-phase pipeline. Init, Discover, Specify, Clarify, Design, Tasks, Analyze, Implement, Verify, Release. Intent stays anchored.
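Pillar 3 asks for trajectories that are structured, replayable, and attributable to an agent, a team, and a task. A minimal sketch of that data shape follows, assuming a flat step record serialized as JSON lines. The field names, team names, and costs are illustrative and not the accelerator's trajectory schema.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    agent: str
    team: str
    task: str
    action: str
    cost_usd: float


def to_jsonl(steps: List[Step]) -> str:
    """Serialize a trajectory as one JSON object per line."""
    return "\n".join(json.dumps(asdict(s), sort_keys=True) for s in steps)


def from_jsonl(text: str) -> List[Step]:
    """Rebuild the trajectory from its log, which is what makes replay possible."""
    return [Step(**json.loads(line)) for line in text.splitlines() if line]


def cost_rollup(steps: List[Step], key: str) -> Dict[str, float]:
    """Total cost grouped by 'agent', 'team', or 'task'."""
    totals: Dict[str, float] = defaultdict(float)
    for step in steps:
        totals[getattr(step, key)] += step.cost_usd
    return dict(totals)


# Hypothetical trajectory, for illustration only.
trajectory = [
    Step("@deploy", "platform", "T-001", "terraform plan", 0.25),
    Step("@deploy", "platform", "T-001", "smoke test", 0.5),
    Step("@reviewer", "payments", "T-002", "review PR", 0.125),
]
replayed = from_jsonl(to_jsonl(trajectory))
print(cost_rollup(replayed, "agent"))
print(cost_rollup(replayed, "team"))
```

Because the log round-trips without loss, the same records serve audit, replay, and chargeback.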

Production agents today

What enterprises are doing with Open Horizons agents right now.

@deploy

End-to-end deploy

Orchestrates Terraform plan to AKS smoke tests. Picks up where a human SRE would. Always with trajectory.

@reviewer

Code review

SOLID, security, naming, refactoring opportunities. Posts PR comments.

@pipeline

CI diagnosis

Diagnoses GitHub Actions failures using real workflow run data.

@sentinel

Quality gates

Analyzes CI checks, coverage gaps, and quality gates on PRs.

@security

Compliance audit

Runs OWASP, RBAC, secrets, compliance audits across the repo.

@sre

Incident triage

Generates runbooks, configures SLOs, triages incidents end-to-end.

@compass

Epic decomposition

Decomposes epics into INVEST user stories and creates GitHub Issues.

@docs

Documentation

Keeps documentation, runbooks, and ADRs current.

These are not demos. They are production agents running in customer environments with identity, governance, and cost controls.

What goes wrong without governance

Every failure mode below is a governance failure, not a model failure.

01 · Shadow agents running on personal credentials, leaking IP into public model endpoints.

02 · Unattributed spend. Surprise OpenAI bills with no per-team breakdown.

03 · Hallucinated changes reaching production without traceability.

04 · Prompt injection attacks through agents that fetch untrusted content.

05 · Compliance failures. Auditors cannot answer "who changed this and why?"

Open Horizons solves the governance failure. The model gets to do its job.

From category to one concrete stack

Agentic DevOps is the category. GitHub + Azure + Foundry is the stack. Open Horizons is the opinionated assembly.

01 · Category

DISCIPLINE

Agentic DevOps

vendor-neutral · 2025+

A discipline, not a product. Agents are first-class developers with identity, RBAC, cost ceilings, and trajectory replay. Vendor neutral.

choose products

02 · Stack

PRODUCTS

GitHub + Azure + Foundry

Microsoft building blocks

AKS · GHAS · GitHub Copilot · AI Foundry · Entra ID · Sentinel · Purview

opinionate + wire

DAY 1 READY

03 · Assembly

OSS

Open Horizons

paved-road · governed · reproducible

Working Backstage + Azure + Foundry in your tenant under 3 hours. 22 Golden Paths, 19 agents, 12 MCP servers, governed and observable on Day 1.

The next part unpacks the middle box. We name every component, show how it fits, and end with the loops that make GitHub + Azure better together than either alone.

PART V

The Microsoft Agentic DevOps stack.

GitHub, Azure, AI Foundry. The three pieces Open Horizons orchestrates into one platform.

Evolving DevOps

Twenty years, three definitions. Each one adds, none removes.

DevOps

"Union of people, process, and technology to enable continuous delivery of value to end users."

The original. Broke down silos between Dev and Ops. Made deploys daily, not quarterly.

DevSecOps

"Union of people, process, and technology with security as a shared responsibility to enable continuous delivery of value to end users."

Security shifted left. Scans, policies, identity baked into the pipeline.

Agentic DevOps

AI-powered agents operating as members of your dev and ops teams, automating, optimizing, and accelerating every stage of the software lifecycle.

Agents shifted in. Code, review, deploy, operate alongside humans, governed.

Source: Microsoft. Each layer survives. Agentic DevOps requires DevSecOps requires DevOps.

Agentic DevOps defined

Autonomous and semi-autonomous agents work alongside developers and operators
across every stage of the software lifecycle.

Agents solve routine and complex tasks together, bringing apps to market faster, increasing code quality and security, removing repetitive development work, reducing technical debt, and reframing the economics of operating, maintaining, and modernizing apps in production.

Through Agentic DevOps, developers orchestrate a series of agentic services with the freedom to focus on higher-value creative work, while operators proactively identify, mitigate, and resolve issues in production.

Developer

Orchestrates agents. Owns intent.

Agent

Executes routine + complex tasks.

Operator

Identifies, mitigates, resolves.

Microsoft + GitHub, the platform for AI innovation

One platform across the full software lifecycle. Policy and governance wrap every stage.

Policy and governance

Plan

Planning Agent
GitHub Issues
Spaces

Code

Agent Mode
Coding Agent
Spark Workbench

Verify

Autofix
Code Review
Playwright
Pull Requests

Deploy

GitHub Actions
Spark Runtime
AI Workflows

Operate

Metrics
Models
SRE Agent

Integrations + MCP · Anthropic · OpenAI · Atlassian · Docker · VS Code

Source: Microsoft. Open Horizons consumes this stack and wires it into Backstage. Every named asset above appears in the catalog with an owner.

Agentic DevOps for Azure and GitHub

Three workflows. Code, Collaborate, Operate. Each one is the agent + the human working the same surface.

GitHub and Azure work better together

Six concrete loops where GitHub + Azure remove a class of integration work.

GitHub Copilot Modernization

Build apps

GitHub Copilot App Modernization builds applications and refactors legacy code into AKS and App Services.

Foundry Apps

Manage AI services

Customers adopting AI and Agents build AI Apps and use Azure AI Foundry to manage their AI services in one place.

PGSQL preference

AI-app data layer

Developers building AI Apps prefer PostgreSQL with pgvector, leading to use of Azure Database for PostgreSQL.

Dev with GitHub Copilot

Code to prod

Developers using AI and Agents with GitHub Copilot write the code that ships to production and integrates across Azure services.

GHAS + Defender

Secure the loop

GitHub Advanced Security integrates with Microsoft Defender for Cloud to identify vulnerabilities and remediate with agents through GitHub Copilot.

SRE Agent

Incident response

SRE Agent automates incident response and monitoring, creating GitHub Issues and collaborating with GitHub Copilot's coding agent to fix problems.

Azure AI Foundry

App platform for a multi-model world. The model + agent + observability surface Open Horizons consumes.

Open Horizons consumes Foundry Models for inference, Foundry Agent Service for orchestration, and Foundry Observability as the data source for the L6 harness telemetry.

Two protocols, two scopes

MCP ♥ A2A

MCP is about Agent → Tool interactions

A2A is about Agent → Agent interactions

MCP, the Model Context Protocol from Anthropic, is the standard for how an agent calls a tool or retrieves context. A2A v1.0 is the standard for how one agent hands off to another, propagating state and trace context. Open Horizons ships 12 MCP servers and uses A2A v1.0 in the L6 harness so multi-agent workflows are observable end to end.
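On the wire, an MCP tool invocation is a JSON-RPC 2.0 request whose method is `tools/call`, carrying the tool name and its arguments. The sketch below builds such a request in Python. The tool name `search_catalog` and its arguments are hypothetical, and no transport or server is involved. A2A messages are not shown because the deck does not specify their shape.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need an id to pair with the response


def mcp_tool_call(tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 'tools/call' request as used by MCP."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool and arguments, for illustration only.
payload = mcp_tool_call("search_catalog", {"kind": "Component", "owner": "payments"})
print(payload)
```

Because every call has this shape, a harness can log, meter, and allowlist tool use in one place regardless of which MCP server handles the request.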

Spec-Driven Development

Four phases that turn a vague idea into a verifiable spec, ready for agent execution.

01 · Specify

What and why

Define user stories, goals, success criteria. Not tech specs. Source of truth for the change.

02 · Plan

How

Tech stack, architecture, data models, integration with legacy systems, compliance constraints.

03 · Task

Break it down

Reviewable, testable, specific. Implementable and verifiable in isolation. TDD-friendly.

04 · Implement

Execute

Each task implemented individually. Review, test, approve. Continuous validation against the spec.

.specs/042-pricing/

CONSTITUTION.md      # non-negotiables
SPECIFICATION.md     # EARS reqs, user stories
PLAN.md              # architecture, data models
TASKS.md             # 22 reviewable units
DIAGRAMS/            # 4 mermaid + 2 svg
ADRs/                # decision records

@implementer picks T-001..T-022 in sequence
@reviewer   gates each PR against SPEC
scope-guard blocks file edits outside .specs/042-*

Source: spec-kit pattern, used by Microsoft Specky and Open Horizons SDD pipeline. The spec is the contract between human intent and agent execution.
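The scope-guard line in the listing above can be read as a path check: an edit is allowed only if the target file sits under a directory matching `.specs/042-*`. A minimal sketch of such a check follows, assuming repository-relative POSIX paths. The function name and the rejection of `..` segments are choices made for this illustration, not a description of the actual hook.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_in_scope(path: str, allowed_glob: str = ".specs/042-*") -> bool:
    """Return True if a repo-relative path falls under the allowed spec directory."""
    parts = PurePosixPath(path).parts
    if ".." in parts or len(parts) < 2:
        return False  # refuse traversal tricks and paths that are too shallow
    # Compare only the first two segments, so '*' cannot match across directories.
    return fnmatchcase("/".join(parts[:2]), allowed_glob)


for candidate in [
    ".specs/042-pricing/TASKS.md",
    ".specs/041-billing/PLAN.md",
    "src/app.py",
    ".specs/042-pricing/../../src/app.py",
]:
    print(candidate, "->", "allow" if is_in_scope(candidate) else "block")
```

Run before each file write, a check like this turns an out-of-scope edit into an explicit block rather than a silent change.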

GitHub Copilot, ten foundational use cases

The starter menu. What every team uses on day one, before the first agent ships.

Code Completion

Suggests lines or blocks based on context.

Refactoring

Improves efficiency, readability, maintainability.

Documentation

Generates doc files and inline comments.

Test Generation

Writes test cases, reduces TDD friction.

Bug Fixing

Identifies and fixes errors quickly.

Code Conversion

Translates between programming languages.

IaC + Automation

YAML, Docker, Terraform, Bicep, ARM.

GitHub Copilot CLI

Bash, Git, GitHub scripts on demand.

SQL Optimization

Writes and improves SQL across engines.

Learning

Acts as a mentor for best practices and concepts.

Source: Microsoft GitHub Copilot fundamentals. Open Horizons treats these as the on-ramp. Once teams adopt them, the next step is custom agents through Foundry Agent Service.

GitHub Copilot, use cases by persona

Five roles. Different prompts, same plumbing. The platform is one.

Developer

Code completion
Boilerplate generation
Multi-language support
Refactoring
Debugging
API integration
Unit test generation

QA Engineer

Test case generation
Edge case suggestions
Test data generation
Bug reproduction
Mock + stubbing
Regression tests
Coverage improvement

DBA

SQL optimization
Schema design
Stored procedures
Indexing
Data migration
Backup + recovery
Query debugging

DevOps

IaC (Terraform, Bicep)
CI/CD pipeline YAML
Log parsing
Containerization
System monitoring
Incident response
Shell scripting

Security

Secure coding
Threat modeling
Policy enforcement
Pen-testing scripts
Log analysis
IAM config
Crypto guidance

✓ Accelerates workflows

Suggestions, boilerplate, automation.

✓ Reduces cognitive load

Less syntax memorization.

✓ Enhances learning

Real-time examples and best practices.

✓ Improves efficiency

Less context switching, faster cycles.

Better together for AIOps

Four loops where GitHub + Azure AI Foundry produce outcomes neither delivers alone.

Loop 1

Code-first AI dev

Application code, model configurations, and prompt engineering all in one repo. The AI Toolkit and AI Foundry extensions let devs work locally before deploying.

Loop 2

Build, test, deploy

Automate AI model deployment and agent workflows. Deploy prompt flows, evaluations, and monitoring to Azure AI Foundry.

Loop 3

Enterprise security

Detect vulnerabilities early with CodeQL + Dependabot. Add observability, safety filters, evaluation frameworks for responsible AI.

Loop 4

End-to-end orchestration

Automates deployment of validated models and agents directly. Unifies AI orchestration across models and tools with chaining, memory, planning.

Faster innovation

Robust security

Enterprise governance

Scalable AI adoption

Where Open Horizons fits

Microsoft ships the building blocks.
Open Horizons assembles them into one governed platform on your tenant.

GitHub gives you GitHub Copilot, Actions, Issues, Advanced Security. Azure gives you AKS, Entra, Key Vault, App Insights, Defender. Azure AI Foundry gives you Foundry Models, Foundry Agent Service, Foundry Observability. Each piece is excellent on its own and well documented. The work that takes 9 to 18 months in most enterprises is gluing them together into a single, governed, opinionated, reproducible platform with paved roads and a catalog. That is the work Open Horizons has already done. You consume the result.

What writing a spec looks like

VS Code, Specky extension, the .specs/ folder open.
Intent first, then plan, then tasks, then code. In that order.

042-pricing-engine · ohorizons · main

EXPLORER

▾ .specs/

▾ 042-pricing-engine/

📋 CONSTITUTION.md

📄 SPECIFICATION.md

🏗 PLAN.md

✓ TASKS.md

📊 DIAGRAMS/

📝 ADRs/

▸ src/

▸ tests/

▸ helm/

📜 catalog-info.yaml

SPECIFICATION.md

CONSTITUTION.md

# Pricing Engine v2 · SPECIFICATION

## Goal
Reprice 1.2M SKUs in ≤4h, respecting margin floor.

## EARS Requirements
REQ-001 WHEN a SKU has margin < 8%,
  THE SYSTEM SHALL hold the prior price.
REQ-002 WHEN upstream cost increases >15%,
  THE SYSTEM SHALL route to @reviewer first.

## Out of scope
· Promo engine (separate spec 043)
· Currency conversion (handled upstream)

GitHub Copilot · Specky

REQ-002 mentions @reviewer. Should that fire on every price change or only when cost change > 15%?

Spec uses WHEN ... THE SYSTEM SHALL, so the guard fires only when the condition holds, that is, above the threshold. I would tighten the wording:

REQ-002 WHEN
  cost_delta > 0.15
  THE SYSTEM SHALL
  route("@reviewer")

✓ Spec syntax valid. Adds to TASKS.md as T-009.

The dev writes intent in natural language. Specky validates EARS, suggests refinements, and queues implementation tasks. Code does not start until the spec compiles.
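To make "the spec compiles" concrete, here is a small sketch of what checking one EARS event-driven requirement could look like. It assumes the `REQ-NNN WHEN <trigger>, THE SYSTEM SHALL <response>` shape used in the mockup. The regex and the `parse_requirement` helper are illustrative assumptions, not Specky's actual validator.

```python
import re
from typing import Optional

# Event-driven EARS shape as it appears in the mockup (an assumption about the
# accepted grammar, not the tool's real rules).
EARS_EVENT = re.compile(
    r"^(?P<id>REQ-\d{3})\s+WHEN\s+(?P<trigger>.+?),?\s+"
    r"THE SYSTEM SHALL\s+(?P<response>.+?)\.?$"
)


def parse_requirement(text: str) -> Optional[dict]:
    """Return the id, trigger, and response of a requirement, or None if malformed."""
    normalized = " ".join(text.split())  # join wrapped lines into one
    match = EARS_EVENT.match(normalized)
    return match.groupdict() if match else None


if __name__ == "__main__":
    req = "REQ-001 WHEN a SKU has margin < 8%,\n  THE SYSTEM SHALL hold the prior price."
    print(parse_requirement(req))
    print(parse_requirement("REQ-003 The system should be fast."))
```

A requirement that parses yields a trigger and a response that can be turned into a task; one that does not parse is rejected before any code is written.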

Part VI · Day in the life

How developers and agents actually use the portal.

A walkthrough with mockups, flows, and one collaboration diagram. So you can picture it before you build it.

The developer's portal, simulated

What a developer sees on a Tuesday morning. One pane. Everything wired.

backstage.ohorizons.ai/catalog

Open Horizons

Catalog
Create new
Golden Paths
TechDocs
AI Agents
FinOps
Dashboards

Software Catalog

46 entities · owned by 5 teams · all healthy

Name	Kind	Owner	DORA
formulation-service	Service	team-rnd	● Elite
batch-tracking	Service	team-quality	● Elite
@deploy	Agent	team-platform	● High
@reviewer	Agent	team-platform	● High
storefront	Service	team-commerce	● Medium

Services and agents in one catalog. Same owner, same lifecycle, same DORA scoring. An agent is not a special creature, it is a component with a model.

How a developer uses the portal

From "I need a new microservice" to "it is live in production." Seven steps, fully self-service.

Total developer effort: ~5 minutes of clicks. Total wall-clock to production: ~12 minutes. No tickets opened.

The agent's portal, simulated

What an agent author sees. Same UI, different tab. Trajectory + tokens + verdict in plain view.

backstage.ohorizons.ai/agents/@reviewer/trajectories/trj-9f2a

@reviewer · trj-9f2a-b81e

PR #2418 · pricing-engine · 2026-05-22 09:14:08Z

● SUCCESS

REPLAYABLE

Model

gpt-5-1

Tokens

8,412 in · 1,902 out

Cost

$0.041

Latency

14.2s

Safety

● clean

Trajectory timeline

+0.0s

plan → break PR into 3 review passes

+0.4s

tool → github.get_pr_files (12 files, 412 LoC changed)

+2.1s

tool → mcp.codemap.lookup ("pricing-engine")

+3.8s

cache hit → SOLID rubric (saved $0.018, 1.4s)

+9.0s

model → review_pass_security (gpt-5-1, 2,104 tok)

+13.6s

tool → github.post_review_comment (×3)

+14.2s

verdict → SUCCESS (3 suggestions, 0 blockers)

This is the black box recorder. Click any step to inspect the prompt, the tool args, the model response. Replay deterministically to debug.
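As a rough picture of what the recorder stores, the timeline above can be modelled as a list of timestamped steps. This is a sketch only: the `Step` dataclass and the helper names are assumptions, and the slide does not say whether each offset marks the start or the end of a step, so the code only reports the gap between consecutive timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    offset_s: float  # seconds since the trajectory started
    kind: str        # plan, tool, cache hit, model, verdict
    detail: str


# Offsets and labels copied from the timeline on the slide.
TIMELINE = [
    Step(0.0, "plan", "break PR into 3 review passes"),
    Step(0.4, "tool", "github.get_pr_files"),
    Step(2.1, "tool", "mcp.codemap.lookup"),
    Step(3.8, "cache hit", "SOLID rubric"),
    Step(9.0, "model", "review_pass_security"),
    Step(13.6, "tool", "github.post_review_comment"),
    Step(14.2, "verdict", "SUCCESS"),
]


def gaps(steps: list[Step]) -> list[tuple[str, float]]:
    """Seconds between each recorded step and the next one."""
    return [
        (cur.detail, round(nxt.offset_s - cur.offset_s, 1))
        for cur, nxt in zip(steps, steps[1:])
    ]


def longest_gap(steps: list[Step]) -> tuple[str, float]:
    """The step followed by the longest wait, a first place to look when debugging."""
    return max(gaps(steps), key=lambda item: item[1])


if __name__ == "__main__":
    for detail, seconds in gaps(TIMELINE):
        print(f"{seconds:>4}s after {detail}")
```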

How an agent uses the portal

An agent is invoked by an event. The harness wraps the run. The trajectory lands in the catalog.

Every step is recorded. The agent never bypasses the harness. The trajectory is what the auditor reads.

How developers and agents interact

A single PR. Two collaborators. Same catalog, same governance. The developer writes intent, the agent writes the boring part.

DEV LANE

Paula

Software Eng

Owns

Intent · spec · judgement call

SHARED · One PR · One spec · One catalog

1 · DEV WRITES

.specs/042-pricing

CONSTITUTION + EARS

2 · AGENT CODES

@impl drafts PR

scope-guard enforced

3 · AGENT REVIEWS

@reviewer + @security

comments on PR

4 · DEV DECIDES

accepts 2/3 fixes

rejects 1 with reason

5 · MERGE

@deploy

ArgoCD → AKS

Same PR · Same run log · One audit trail

spec-id 042-pricing · PR #2418 · trajectory trj-9f2a-b81e · 3 reviewers (1 human, 2 agents) · evidence → SOC 2 · ISO 27001 · NIST AI RMF

AGENT LANE

@reviewer

+ @security · @impl

Owns

Code · review · the boring part

The platform refuses to record an agent action without a PR + spec link, and refuses to record a human action without a PR + reviewer. Same governance, both sides.
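The recording rule stated above is symmetrical, and a few lines of code make the symmetry explicit. This is a sketch under assumptions: the `Action` record and `can_record` function are hypothetical names, and the real platform may check more fields than these.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    actor_kind: str                 # "agent" or "human"
    pr: Optional[str] = None        # for example "#2418"
    spec_id: Optional[str] = None   # for example "042-pricing"
    reviewer: Optional[str] = None  # who reviewed a human change


def can_record(action: Action) -> bool:
    """Both sides need a PR; agents also need a spec link, humans a reviewer."""
    if action.pr is None:
        return False
    if action.actor_kind == "agent":
        return action.spec_id is not None
    if action.actor_kind == "human":
        return action.reviewer is not None
    return False  # unknown actor kinds are never recorded


if __name__ == "__main__":
    print(can_record(Action("agent", pr="#2418", spec_id="042-pricing")))  # accepted
    print(can_record(Action("human", pr="#2418")))                         # refused
```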

Portal surfaces, one click away

Six entry points covering the full developer + agent workflow.

Foundry Control

Live agents and models. The Toolbox: 12 MCP servers, 4 built-ins. 3-tier cache. Model routing.

AI Agents

Every agent: what it consumes, its model tier, its cache hit-rate, its trajectories.

FinOps & Tokens

L6 dashboards: calls/min, $/day by agent, token budget, hook actions, Purview audit.

Software Catalog

Every Component, API, Resource, System, Agent registered in the portal. One source of truth.

Golden Paths

The 22 Scaffolder templates. rag-application, multi-agent-system, foundry-agent, and more.

All Dashboards

Grafana folder tagged context-platform. L1 to L6, DORA, FinOps, agent fleet, eval scores.

The unifying principle

One catalog. One identity model. One audit trail.
Two kinds of users.

A developer scaffolds a service via a Golden Path. An agent is invoked by an event, calls tools through the harness, writes a trajectory. Both go through the same portal, the same RBAC, the same observability stack. The agent is not a side-car or a chatbot, it is a first-class platform citizen registered in the catalog with an owner, a cost center, a runbook, and an SLO. Treat it that way from Day 1 and the 95 percent pilot failure rate becomes a 5 percent problem.

From concept to pixels

You have seen the diagrams.
Now look at the actual product.

The next ten slides are mockups of the real ohorizons.ai screens. Same chrome, same data shapes, same agents. You can open the URL on your laptop and recognize every panel. If a screen looks unfamiliar, that is the gap. If a screen makes you say "we need that," that is the opening for a Discovery.

→ Landing page

→ Maturity framework

→ Command dashboard

→ Create templates

→ AI Chat

→ AI Impact

→ Foundry Control

→ 17 Platform Agents

Part VII · Live tour

VII

Inside the platform.

Eight UI surfaces, simulated from the real ohorizons.ai showcase environment.

ohorizons.ai · landing

The public showcase. Same brand chrome devs see inside the portal.

ohorizons.ai

Open Horizons

Platform · Differentiators · Architecture · FAQ

● Agentic DevOps Platform · Open Horizons

The platform that accelerates the Agentic SDLC

AI-powered developer portal with Golden Paths, intelligent agents, and full observability, built on Backstage, Azure, and GitHub.

Explore Platform →

Golden Paths

AI Agents

MCP Servers

Insights

GitHub Copilot CLI · @deploy agent

$ @deploy platform --env prod
Initializing deployment agent...
@deploy → Provisioning AKS cluster
Applying 18 Terraform modules...
Key Vault, Networking, Defender...
✓ H1 Foundation, 32m 14s
@deploy → H2 Enhancement
ArgoCD, Backstage, Prometheus
Loading 22 Golden Path templates
✓ H2 Enhancement, 28m 47s
@deploy → H3 Innovation
AI Agents, MCP Servers, RAG
✓ H3 Innovation, 18m 22s

ohorizons.ai · AI Maturity Framework

28 capabilities across 3 pillars, measured from Traditional (L0) to Agentic (L4).

Developer Productivity

8 capabilities

AI Coding Assistants + GitHub Copilot
Dev Environment Standardization
Code Review Automation
Testing + Test Generation
Knowledge Management
Developer Experience
Onboarding + Time to Productivity
Code Quality + Technical Debt

DevOps Lifecycle

10 capabilities

CI/CD Pipeline Automation
Test Automation + Quality Gates
Security Scanning + Compliance
Release Management + Deployment
Observability + Monitoring
Incident Response + Management
IaC + GitOps
DORA Metrics + Performance
Agentic DevOps + SRE Automation
Deployment Frequency + Velocity

Application Platform

10 capabilities

Cloud Architecture + Infrastructure
Platform Engineering + IDP
AI/ML Operations (MLOps)
Model Evaluation + AI Safety
RAG Systems + Knowledge Retrieval
AI Agents + Orchestration
Data Management + Governance
API Management + Service Mesh
Disaster Recovery + BC
Cost Optimization + FinOps

Traditional · Manual, ad-hoc, no AI tools

AI-Assisted · Piloting AI tools, limited standardization

AI-Enhanced · Significant AI integration, managed processes

AI-Optimized · AI agents, predictive capabilities

Agentic · Multi-agent systems, self-healing, autonomous

ohorizons.ai · Command Dashboard

Agentic DevOps Command. The hub a platform lead opens every morning.

Open Horizons

Dashboard · Catalog · Docs

🔍 Search resources...

Paula Silva

GitHub User

Home

Catalog

APIs

Docs

Create

Graph

Cost Insights

Validation

Platform

GitHub Copilot Metrics

DORA Metrics

Security + Quality

Tech Debt

Platform Status

Intelligence

AI Chat

AI Impact

● Platform Overview

Agentic DevOps Command

Your central Agentic DevOps hub for the entire SDLC. Monitor platform health, CI/CD pipelines, and KPIs across Azure, GitHub, and Azure DevOps.

View All Metrics

Quick Actions

Active projects

124

Deployments

Team members

94%

Health score

Active deployments

1,284

↑ +12.5% from last week

Success rate

99.4%

↑ +0.2% from last week

Avg lead time

42m

↓ -5.4% from last week

Open incidents

↓ -2 from last week

ohorizons.ai · Create

Software Templates. Every Golden Path is one click away.

H1 Foundation

H2 Enhancement

H3 Innovation

All 22

gitops-config

H2 · Create GitOps Deployment Configuration

Create GitOps deployment configs. Generates ArgoCD Application manifests, Kustomize overlays, and multi-env pipelines.

h2-enhancement · gitops · argocd

👤 Platform Engineering · CHOOSE

ai-agent

H3 · Create AI Foundry Agent

Create an autonomous AI agent powered by Azure AI Foundry. Tool definitions, RAG integration, safety controls, Agent Service deploy.

h3-innovation · foundry · agentic

👤 Platform Engineering · CHOOSE

multi-agent-system

H3 · Create Multi-Agent System

Production multi-agent AI system. Orchestration, collaboration patterns, human-in-the-loop. AutoGen, Semantic Kernel, custom frameworks.

h3-innovation · multi-agent · semantic-kernel

👤 Platform Engineering · CHOOSE

ohorizons.ai · AI Chat

One conversation. Six specialised agents. Click a suggestion to see the orchestrator route.

@pipeline

@sentinel

@compass

@guardian

@lighthouse

@forge

● Orchestrator

Hello, I'm the Open Horizons Assistant. I coordinate 6 agents. Try a suggestion below or @mention an agent directly.

Try one

@pipeline check the build on ohorizons

@sentinel show test status on main

@compass decompose epic: user auth with SSO

@guardian scan security on ohorizons

@lighthouse show error rate on prod

@forge describe deploy-orchestrator pod

ohorizons.ai · AI Impact

Measure the real impact of AI and Agentic DevOps. Adoption, productivity, velocity, quality, in one place.

● AI Impact

AI Impact Dashboard

Measure the real impact of AI and Agentic DevOps on your SDLC. Powered by GitHub APIs, KPI engine, and Claude Opus.

Run AI Analysis

Refresh Data

45/100

Impact score

3.33/day

Deploy freq

Contributors

Insights

📈 Key Performance Indicators

Adoption

GitHub Copilot seat utilization

68%

Adoption rate

Productivity

GitHub Copilot effectiveness

42%

Acceptance rate

Velocity

Development speed

3.33

Deploys/day · elite

Quality

Reliability + security

2.5%

Change failure rate · low

ohorizons.ai · Foundry Control

The agents-service gateway. Live readings from the L3 + L6 runtime.

Context Platform · L3 Context Engineering · L6 Harness

Foundry Control · the agents-service gateway

The foundry-agents service in namespace ai-services is the runtime that fronts Azure AI Foundry: serves agents, fronts the model router, aggregates the MCP Toolbox, runs the 3-tier semantic prompt cache, applies pre/postToolUse hooks, emits 21-field telemetry, writes Purview audit.

Gateway

HEALTHY LIVE

Gateway /healthz

live probe via in-cluster proxy

4 LIVE

Agents registered

from /v1/agents

15 LIVE

Toolbox tools

11 MCP + 4 built-in

94% LIVE

Prompt cache hit-rate

semantic, threshold 0.93

MCP Toolbox · 11 MCP + 4 built-ins · /v1/toolbox/*

Tool	Description
mcp.github	MCP server 'github', PR + Issues + Actions
mcp.azure	MCP server 'azure', resources + cost
mcp.terraform	MCP server 'terraform', plan + apply
mcp.foundry	MCP server 'foundry', model deploy + agent service
mcp.aks	MCP server 'aks', cluster + pods + services
mcp.backstage	MCP server 'backstage', catalog + scaffolder
builtin.web_search	Foundry built-in web search, Bing-grounded
builtin.azure_ai_search	Foundry built-in Azure AI Search retrieval
a2a.platform	Agent-to-Agent connection to 'platform' (A2A v1.0)

ohorizons.ai · Platform Agents

The 17 GitHub Copilot domain agents that run the platform. Each one owned, tiered, cached.

Agent	Owner	Model · tier	Calls/24h	Cache	What it does
router	three-horizons-platform	gpt-4o-mini CHEAP	2,410	97%	Routes each request to cost-optimal tier; A2A fan-out
doc-writer	three-horizons-platform	gpt-4o-mini CHEAP	780	93%	Generates ADRs / RFCs / READMEs / TechDocs
code-reviewer	three-horizons-platform	gpt-4o WORKHORSE	660	90%	Reviews PRs against repo standards + security-insights
test-engineer	three-horizons-platform	gpt-4o-mini CHEAP	240	88%	Characterization / contract / equivalence tests; coverage
incident-responder	three-horizons-platform	gpt-4o WORKHORSE	210	71%	Triages incidents from logs/metrics/traces; runs SRE runbooks
infra-architect	three-horizons-platform	gpt-4o WORKHORSE	180	74%	Azure Well-Architected; Terraform module design
security-auditor	three-horizons-platform	gpt-4o WORKHORSE	160	69%	OWASP/CWE scanning, deps CVEs, secrets, hardening
terraform-agent	three-horizons-platform	gpt-4o-mini CHEAP	150	86%	Authors/refactors Terraform; plan/apply triage; drift detection
devops-agent	three-horizons-platform	gpt-4o-mini CHEAP	190	87%	CI/CD pipelines, k8s orchestration, GH Actions / Tekton
sre-agent	three-horizons-platform	gpt-4o-mini CHEAP	130	83%	SLOs, observability, error budgets; wires Prometheus/Grafana/Loki
github-integration	three-horizons-platform	gpt-4o-mini CHEAP		85%	Configures GitHub Apps, GHAS, Actions, Packages
ado-integration	three-horizons-platform	gpt-4o-mini CHEAP		84%	Azure DevOps PAT, repos, pipelines, Boards
hybrid-scenarios	three-horizons-platform	gpt-4o WORKHORSE		76%	Designs GitHub + ADO coexistence (scenarios A/B/C)
template-engineer	three-horizons-platform	gpt-4o-mini CHEAP	110	88%	Creates Golden Path templates; converts repos to templates
context-architect	three-horizons-platform	gpt-4o WORKHORSE	140	80%	Plans coordinated multi-file changes; maps context + dependencies
onboarding-agent	three-horizons-platform	gpt-4o-mini CHEAP		90%	Walks new users through prerequisites, config, first deployment
docs-agent	three-horizons-platform	gpt-4o-mini CHEAP	100	92%	Technical writing + knowledge management across platform docs

Tiered model routing keeps 76% of calls on the cheap tier. Workhorse only when the task demands it. Cache hits drive total spend down by an order of magnitude.

Try it live

The whole platform is at ohorizons.ai

Sign in with GitHub, explore the Catalog, click into Foundry Control, run an AI Impact analysis, or talk to the six agents. The showcase is the platform, deployed on a public Azure subscription, with the same code you would deploy on yours.

URL

ohorizons.ai

Public showcase environment

Sign-in

GitHub OAuth

Read-only access to the showcase

Source

github.com/Ohorizons

Same code as your deployment

Maturity radar · current state

Where most LATAM enterprises score today across the four engineering layers.
Pre-platform vs post-H2 Open Horizons.

Pre-platform · typical client baseline

Intent 1 · Context 2 · Platform 2 · Infra 3

Post-H2 Open Horizons · 12 weeks in

Intent 4 · Context 4 · Platform 4 · Infra 4

Maturity is decided by the weakest layer. The radar makes that visible. H1 lifts Infra, H2 lifts Platform, H3 lifts Context and Intent together.
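The weakest-layer rule is simple enough to state in code. A minimal sketch using the two score sets from the radar; the function name `maturity` is an assumption made for illustration.

```python
# Scores copied from the radar on this slide.
PRE_PLATFORM = {"Intent": 1, "Context": 2, "Platform": 2, "Infra": 3}
POST_H2 = {"Intent": 4, "Context": 4, "Platform": 4, "Infra": 4}


def maturity(scores: dict[str, int]) -> tuple[int, list[str]]:
    """Overall maturity is the lowest layer score; also name the layers holding it there."""
    floor = min(scores.values())
    weakest = [layer for layer, score in scores.items() if score == floor]
    return floor, weakest


if __name__ == "__main__":
    print(maturity(PRE_PLATFORM))
    print(maturity(POST_H2))
```

For the pre-platform baseline the limiting layer is Intent, even though Infra already scores 3, which is the point the radar is making.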

What L4 and L6 actually measure

Two dials every platform team should read weekly. Intent debt and token spend distribution.

L4 · Intent debt index

MONITOR

0-40 · 40-60 MONITOR · 60+ ACTION

Distance between spec baseline and live agent behavior. Above 60 means the spec is stale and agents drifted.
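The gauge bands can be read as a small classifier. This is a sketch with two assumptions that the slide does not settle: the lowest band has no label on the gauge, so it is called "OK" here, and a value sitting exactly on 40 or 60 is placed in the higher band.

```python
def intent_debt_status(index: float) -> str:
    """Map an intent debt index onto the gauge bands 0-40, 40-60, 60+."""
    if index < 0:
        raise ValueError("intent debt index cannot be negative")
    if index >= 60:
        return "ACTION"   # spec is stale, agents have drifted
    if index >= 40:
        return "MONITOR"
    return "OK"           # assumed label; the gauge shows no name for this band


if __name__ == "__main__":
    for value in (12, 47, 63):
        print(value, intent_debt_status(value))
```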

L6 · Token spend · last 30d

5× SAVED

Cheap tier 76%

Workhorse 18%

Premium 5%

Cache hit 1%

Model routing keeps 76 percent of calls on cheap tier. Without routing, the same workload would cost ~5× more on workhorse default.

L4 measures whether agents are still doing the right thing. L6 measures how much it costs. Both numbers belong in the platform team's Monday review.

PART

VIII

The Context Platform Stack.

Six layers, integrated. Cloud, platform, context, intent, integration, harness.

The stack at a glance

Six layers, top to bottom. Intent flows down, telemetry flows up.

telemetry

L6 · Harness

SRE+FinOps+Sec

App Insights · pre/postToolUse hooks · A2A v1.0 · 21-field llm.call.completed · FinOps · Purview · Sentinel · model routing · Entra WIF.

Grafana L6 FinOps board

L5 · Integration

Platform integrators

Hybrid scenarios A/B/C · GitHub + Azure DevOps + Argo CD + MCP coexistence · API Center · catalog cross-links · integration agents.

Grafana L5 ArgoCD ↗

L4 · Intent

Spec authors

SDD + Specky pipeline · 103 EARS requirements · ADRs · pipeline-guard.yml LGTM gates · model-routing.yaml · scope guards.

Grafana L4 .specs/

L3 · Context

AI engineers

Foundry Toolbox · 12 MCP servers + 4 built-ins · 3-tier semantic prompt cache · enterprise_memory on pgvector · rag-application path.

Grafana L3 MCP APIs

L2 · Platform

Platform team

Upstream Backstage on AKS · 22 Golden Paths · RBAC plugin · DORA Four Keys · OPA Gatekeeper · the scaffolder.

Grafana L2 Catalog

L1 · Cloud

Cloud platform team

Terraform 18 modules · AKS · networking · Key Vault · ACR · PostgreSQL Flexible Server (pgvector) · Azure AI Foundry.

Grafana L1 terraform/

intent

Intent flows from L4 down into Golden Paths in L2 and agent behavior in L3. The harness in L6 wraps every model call. Integration in L5 is how GitHub, Azure DevOps, Argo CD, and MCP coexist. Everything sits on the Terraform-managed Azure foundation.

Layer 1 · Cloud and Infrastructure

The compute substrate. Eighteen Terraform modules, declared and reproducible.

AKS

Kubernetes 1.34

Private API server, autoscaler min=1, max=4, Workload Identity enabled.

ACR

Container registry

Admin disabled, managed-identity pull, signed images via cosign.

Key Vault

Secrets

Private endpoint, RBAC, CSI driver projects to pods as files.

PostgreSQL

Flexible Server

Private VNet, 30-day backup, pgvector enabled for memory and RAG.

Log Analytics

Telemetry

90-day retention, Container Insights, the App Insights sink for L6.

AI Foundry

Models

gpt-5-1, gpt-5-4-pro deployed by name. Routing handled at L4.

Ingress

NGINX + cert-manager

Let's Encrypt, 4 TLS ingresses Ready by default.

All resources tagged with customer_name, environment, cost_center for L1 + L6 FinOps roll-up.

Layer 2 · Platform Engineering

The developer experience layer. Upstream Backstage on AKS, 22 Golden Paths, full observability.

01 · Upstream Backstage OSS, portal + Software Catalog + Scaffolder + TechDocs.

02 · Argo CD app-of-apps for declarative GitOps deployments.

03 · 22 Golden Paths covering H1, H2, H3 use cases, including rag-application, multi-agent-system, foundry-agent.

04 · RBAC plugin + DORA Four Keys plugin for governance and metrics in one UI.

05 · OPA + Gatekeeper admission control on every workload.

06 · Prometheus + Grafana + Loki + Alertmanager, the observability stack agents and humans share.

This is the layer that makes the developer productive. Agents consume L3 on top of it.

Layer 3 · Context Engineering

Give agents the right context, at the right time, in the right format.

Toolbox

12 MCP + 4 built-ins

Agents-service exposes curated functionality. Discoverable, governed, rate-limited.

Prompt cache

3-tier RediSearch HNSW

Collapses repeated reasoning into cache hits. Hit ratio per tier surfaced in Grafana L3.

enterprise_memory

pgvector long-term

Long-term agent memory on PostgreSQL + pgvector. Indexed for retrieval.

3-tier memory

user, repo, session

Scoped memory with explicit lifetimes. Cross-agent sharing via the Shared Context Store.

RAG index

rag-application path

Customer-specific. The Lumen demo indexes INCI, regulations, formulas.

CODEMAP.md

Program skeleton

A curated map the agents read first. Cuts cold-start token spend dramatically.
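The prompt cache idea, serving a stored answer when a new prompt is close enough to an earlier one, can be sketched in a few lines. The 0.93 threshold comes from the Foundry Control slide. Everything else is an assumption: the sketch treats the threshold as cosine similarity, uses a plain linear scan in place of the RediSearch HNSW index the slide names, and takes embeddings as ready-made lists of floats.

```python
import math
from typing import Optional

SIMILARITY_THRESHOLD = 0.93  # "semantic, threshold 0.93" on the Foundry Control slide


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup(
    query: list[float],
    cache: list[tuple[list[float], str]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> Optional[str]:
    """Return the cached response most similar to the query, or None on a miss."""
    best_score, best_response = 0.0, None
    for embedding, response in cache:
        score = cosine(query, embedding)
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= threshold else None


if __name__ == "__main__":
    cache = [([1.0, 0.0], "cached SOLID rubric"), ([0.0, 1.0], "cached deploy checklist")]
    print(lookup([0.99, 0.05], cache))  # near the first entry, so a hit
    print(lookup([0.7, 0.7], cache))    # between both entries, so a miss
```

A miss falls through to the model call, and the new response is what would then be written back to the cache.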

Layer 4 · Intent Engineering

Translate human intent into specifications the system can execute reliably.

01 · SDD / Specky 10-phase pipeline: artifacts in .specs/NNN-*/, from Init through Release.

02 · 103 EARS requirements, 144 tasks, 18 diagrams, ADRs in the reference repo.

03 · pipeline-guard.yml: LGTM gates between phases. No skipping.

04 · .github/model-routing.yaml: declarative routing, cheap model for cheap task, premium for design.

05 · scope-guard.sh + preToolUse hooks: block out-of-scope file edits.

06 · scripts/measure-intent-drift.sh: distance between current behavior and spec baselines.
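Declarative routing, item 04 above, boils down to a lookup table from task type to model tier. The sketch below mirrors that idea in Python. The tier names CHEAP and WORKHORSE appear in the Platform Agents table and the slides mention a premium tier for design work. The task type names, the default tier, and both function names are assumptions, and the real model-routing.yaml schema is not shown in the deck.

```python
# Hypothetical routing rules; only the tier names come from the slides.
ROUTING_RULES = {
    "docs": "CHEAP",
    "test-generation": "CHEAP",
    "code-review": "WORKHORSE",
    "incident-triage": "WORKHORSE",
    "architecture-design": "PREMIUM",
}
DEFAULT_TIER = "CHEAP"  # assumption: unknown task types fall back to the cheapest tier


def route(task_type: str) -> str:
    """Pick the model tier for a task type."""
    return ROUTING_RULES.get(task_type, DEFAULT_TIER)


def tier_share(task_types: list[str]) -> dict[str, float]:
    """Fraction of calls landing on each tier, the number a routing dashboard plots."""
    counts: dict[str, int] = {}
    for task_type in task_types:
        tier = route(task_type)
        counts[tier] = counts.get(tier, 0) + 1
    return {tier: n / len(task_types) for tier, n in counts.items()}


if __name__ == "__main__":
    calls = ["docs", "docs", "code-review", "architecture-design"]
    print(tier_share(calls))
```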

Layer 5 · Integration

Make GitHub, Azure DevOps, Argo CD and MCP coexist under one platform.

Scenarios

A, B, C

GitHub-only, ADO-only, Hybrid. Customer chooses, platform abstracts.

GitHub App

Discovery + scaffolder

Catalog discovery + scaffolder publish. App ID 3010479 in the reference deployment.

ADO connection

Workload Identity Fed

No PAT, no SP secrets. Federated credential.

Argo CD

Git-source agnostic

Consumes manifests from either source. The deployment authority.

API Center

Unified inventory

Single API inventory across GitHub + ADO repos.

Cross-links

Catalog ↔ everything

Backstage entities link to GitHub Issues, ADO Boards, Argo apps, Grafana dashboards.

Layer 6 · Harness Engineering

The runtime that wraps every model call. With L6, the agent becomes a governed production system.

How the harness wraps a model call

Observability · 3 hooks

Gateway pre/postToolUse hooks intercept every call. A2A v1.0 context with correlation IDs, spans, trace. 21-field llm.call.completed streamed to App Insights.

FinOps · budget enforcement

Per-agent, per-team, per-CC budgets enforced at 50/80/100 percent. 100 percent hard-stops the agent.

Security & compliance · 2 sinks

Microsoft Purview audits every retrieval against sensitive sources. Sentinel SIEM/SOAR receives prompt-injection, safety violations, scope-guard breaches.

Identity · per agent

Entra Workload Identity Federation per agent. Kubernetes ServiceAccount, Microsoft Entra ID identity, scoped RBAC.
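The 50/80/100 budget rule from the FinOps card can be sketched as a threshold check the harness runs before each model call. This is an illustration under assumptions: the slide confirms only that 100 percent hard-stops the agent, so treating 50 and 80 as alert levels is a guess, and `budget_state` is a hypothetical name.

```python
THRESHOLDS = (50, 80, 100)  # percent of budget, from the FinOps card


def budget_state(spent_usd: float, budget_usd: float) -> dict:
    """Report how much of the budget is used and which thresholds were crossed."""
    if budget_usd <= 0:
        raise ValueError("budget must be positive")
    percent = spent_usd * 100 / budget_usd
    return {
        "percent": round(percent, 1),
        "alerts": [t for t in THRESHOLDS if percent >= t],  # assumed to raise alerts
        "hard_stop": percent >= 100,  # the agent is stopped at 100 percent
    }


if __name__ == "__main__":
    for spent in (40, 85, 100):
        print(spent, budget_state(spent, 100))
```

The same check would apply per agent, per team, and per cost center, each with its own budget figure.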

Per-layer Grafana dashboards

Six layers, dedicated dashboards each. Data sources: Prometheus, Loki, App Insights, Azure Cost API, PostgreSQL.

Layer	Highlight dashboards
L1, Cloud	AKS Cluster Health · AKS Resource Utilization · Azure Resource Inventory · Azure FinOps Spend by Service · FinOps Anomaly · Key Vault Health · PostgreSQL
L2, Platform	Backstage Service Health · Catalog Coverage · Argo CD Sync · Golden Path Adoption · DORA Four Keys · Ingress + TLS · OPA Violations
L3, Context	MCP Server Health · Tool Call Distribution · Prompt Cache Hit Ratio · Shared Context Store · 3-Tier Memory · Skill Load Heatmap · RAG Index Health
L4, Intent	SDD Pipeline Status · Model Routing Decisions · Routing Cost Savings · Scope Guard Activity · Intent Drift
L5, Integration	GitHub Actions Health · Azure DevOps Pipeline Health · Argo CD App-of-Apps · API Center Inventory · Catalog Cross-Link Coverage
L6, Harness	Agent Fleet Overview · Trajectory Volume · Token Consumption Live · Cost Live USD · Budget vs Actual · Budget Alerts 50/80/100 · Eval Scores · Content Safety · Purview Access · llm.call.completed Stream

Every dashboard has Alertmanager rules in prometheus/alerting-rules.yaml. The platform refuses to operate without observability: CI gates enforce dashboard + alert presence.

Cross-layer FinOps

The CFO view. L1 cloud spend + L6 AI spend, one board.
Closes the "AI is unaffordable" objection before it leaves the room.

Total spend

Cloud + AI roll-up

L1 cloud and L6 AI in one number, with month-over-month deltas, broken down by cost center, team, environment.

Forecasting

Budget vs actual

Forecast vs budget for the current month. Anomalies and breaches in the last 30 days. Top 5 cost drivers across cloud + AI.

Per agent

Cost-per-trajectory

Efficiency metric, USD per successful trajectory, per agent. Drives prompt-cache investment, model routing, and retirement decisions.
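Cost-per-trajectory is a ratio, and the choice of numerator matters. The sketch below assumes the spend of failed runs still counts against the agent, so an agent that fails often looks expensive. The slide does not specify this, and the run data and function name are made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, Optional


def cost_per_successful_trajectory(
    runs: Iterable[tuple[str, float, bool]],
) -> dict[str, Optional[float]]:
    """runs holds (agent, cost_usd, succeeded) tuples; result is USD per success per agent."""
    spend: dict[str, float] = defaultdict(float)
    successes: dict[str, int] = defaultdict(int)
    for agent, cost_usd, succeeded in runs:
        spend[agent] += cost_usd  # assumption: failed runs still cost money
        if succeeded:
            successes[agent] += 1
    # None marks an agent that spent money without a single successful run.
    return {
        agent: (spend[agent] / successes[agent] if successes[agent] else None)
        for agent in spend
    }


if __name__ == "__main__":
    sample_runs = [  # hypothetical figures
        ("@reviewer", 0.50, True),
        ("@reviewer", 0.25, False),
        ("@reviewer", 0.75, True),
        ("@deploy", 1.00, False),
    ]
    print(cost_per_successful_trajectory(sample_runs))
```

An agent whose ratio keeps rising is a candidate for more caching or a cheaper tier; one that returns None is a candidate for retirement.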

Why six layers and not five or seven

Five collapses. Seven over-specifies. Six survived production contact.

COLLAPSES

Five layers lose contracts.

Five collapses Integration into Platform (the GitHub + ADO + Argo + MCP coexistence problem becomes a Backstage problem) and Harness into Execution (telemetry + FinOps + hooks + audit become the agent author's problem). Both produced unmaintainable bundles in real deployments.

SHIPPED

SURVIVES

Six layers, clean ownership.

L5 Integration owned by platform integrators, not by Backstage maintainers. L6 Harness owned by SRE + FinOps + Security, not by the agent author. Each layer maps cleanly to who is on-call when it fails. Survives audits.

OVER-SPECIFIES

Seven adds noise.

Splitting Identity out of Cloud, or carving Telemetry out of Harness, doubles the contracts without doubling the ownership. The boundary stops matching how teams actually operate. Layers should be the minimum that survive a real audit, not the maximum a diagram can hold.

The six-layer model is not a theory. It is the residue of trying five and seven first. Clean contracts, each one with a named owner.

PART

The Horizons Phases.

H1 Foundation, H2 Enhancement, H3 Innovation. A staged adoption model.

Three phases, three outcomes

Each phase builds on the previous. No customer is asked to commit to H3 before H1 works.

Foundation, 4 to 8 weeks

Cloud + Platform + Portal. A working IDP on Azure with Backstage, GitOps, observability. Pilot team usable within 30 days.

Enhancement, 8 to 12 weeks

Golden Paths + Governance + Self-service. Application teams ship via paved roads. Continuous compliance enabled.

Innovation, 12 to 24 weeks

Agentic DevOps + Context Platform. Agents first-class platform citizens. Trajectory, cost, governance in production.

Naming note. These are the Open Horizons Phases. They are not McKinsey's "Three Horizons of Growth" strategy framework.

H1 · Foundation

Stand up a production-grade Azure environment with Backstage, GitOps, and observability.

What gets deployed

Azure Landing Zone (RG, networking, identity)
AKS cluster (private, autoscaled)
Azure Container Registry
Azure Key Vault
Azure Database for PostgreSQL
Log Analytics + Container Insights
Entra Workload Identity
Backstage OSS portal
Argo CD GitOps
Prometheus + Grafana + Loki + Alertmanager
NGINX Ingress + cert-manager + Let's Encrypt

Success criteria

Developer logs into Backstage via GitHub OAuth or Entra
Pilot service scaffolded, deployed, visible in catalog
Argo CD shows green sync for every workload
Grafana shows cluster, ingress, application metrics
All secrets in Key Vault, no plaintext anywhere
TLS automatic and renewed by cert-manager

H2 · Enhancement

Turn the foundation into a product application teams adopt voluntarily.

Add

Golden Paths

12+ Software Templates. Repo + CI/CD + IaC + Helm + catalog + TechDocs + Argo app, wired together.

Add

Continuous compliance

OPA + Gatekeeper, tfsec, Trivy, gitleaks, Defender for Cloud surfaced in Backstage. Evidence on a schedule.

Add

FinOps starts

Cost tags on every resource. FinOps dashboards. Per-team showback or chargeback.

Success criteria: 80 percent of new services via Golden Path. Every PR gated by tfsec, Trivy, OPA. FinOps dashboard with attributed spend.

H3 · Innovation

Make agents first-class platform citizens. Move from AI assistants to production agentic systems.

01 · Agent IDP, the second persona. Agent catalog in Backstage with owner, RBAC, cost center, version. Trajectory logs into Loki + Backstage. Per-agent cost dashboards in Grafana.

02 · Context Platform (L3). MCP servers + Shared Context Store + three-tier memory + prompt cache.

03 · Intent Platform (L4). Specky + model routing + scope guards + intent drift measurement.

04 · Harness (L6). The full telemetry + FinOps + Purview + Sentinel stack goes live.

Success criteria: at least 5 production agents, each with >100 trajectories/week. Per-agent cost attribution is exact and auditable. A failed run replays deterministically.

Visualizing the journey

Day 0 to Day 180+, three stages, three outcomes.

Each phase delivers value on its own. Each one multiplies the value of the next.

PART

GitHub and Azure DevOps.

The foundation of everything. Three scenarios, one platform.

Source control is the center of gravity

If it is not in Git, it is not real.

Every workflow in Open Horizons, every deploy, every spec, every agent invocation, every audit event, begins or ends in a Git repository. GitHub and Azure DevOps are not afterthoughts. They are the foundation that makes the rest of the platform work.

Three integration scenarios

Customer chooses. Platform abstracts.

Scenario A

GitHub-only

Cloud-native, modern enterprises, OSS-friendly. GitHub App + Actions + Packages + Advanced Security + OAuth.

Scenario B

Azure DevOps-only

Microsoft-shop enterprises with existing ADO investment. Repos + Pipelines + Boards + Artifacts.

Scenario C

Hybrid coexistence

Migrations in progress, divisional preferences, M&A scenarios. Single Backstage catalog, dual auth, Argo CD agnostic.

Scenario A · GitHub end to end

Seven steps from "Create new service" to live in production.

01 Developer opens Backstage, picks "Create new service" from the scaffolder.

02 Backstage uses the GitHub App to create a new repo with the chosen Golden Path.

03 Repo registered in the Backstage catalog automatically.

04 GitHub Actions runs on every push: tests, scans, image build, push to GHCR or ACR.

05 Argo CD picks up the manifest change, syncs the new version to AKS.

06 GitHub Advanced Security results stream into Backstage. tfsec, Trivy, gitleaks results gate the PR.

07 @reviewer, @sentinel, @security agents post comments on the PR.
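
The PR gate in step 06 can be sketched as a fail-closed check over scanner results. This is an illustration under stated assumptions, not the platform's actual gate: the severity scale, the `medium` threshold, and the input shape (scanner name mapped to its worst finding) are hypothetical.

```python
# Ordered from least to most severe; the scale itself is an assumption.
SEVERITY_ORDER = ["none", "low", "medium", "high", "critical"]
REQUIRED_SCANNERS = ("tfsec", "trivy", "gitleaks")


def pr_gate(scan_results, max_allowed="medium"):
    """Return (allowed, reasons). A missing scanner result blocks the PR."""
    limit = SEVERITY_ORDER.index(max_allowed)
    reasons = []
    for scanner in REQUIRED_SCANNERS:
        worst = scan_results.get(scanner)
        if worst is None:
            # Fail closed: no report is treated the same as a failing report.
            reasons.append(f"{scanner}: no result reported")
        elif SEVERITY_ORDER.index(worst) > limit:
            reasons.append(f"{scanner}: {worst} finding exceeds {max_allowed}")
    return (not reasons, reasons)


clean = pr_gate({"tfsec": "low", "trivy": "none", "gitleaks": "none"})
blocked = pr_gate({"tfsec": "low", "trivy": "critical"})
```

Treating a missing result as a failure keeps a skipped or crashed scan job from quietly opening the gate.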

Scenario B · Azure DevOps end to end

Six steps. Same outcome, ADO-native plumbing.

01 Developer opens Backstage, picks a Golden Path.

02 Backstage uses the ADO REST API (via Service Connection) to create the repo and pipeline.

03 Repo registered in Backstage, discovered as a catalog entity.

04 Azure Pipelines runs on every push, same scans as the GitHub equivalent.

05 Argo CD consumes the rendered manifests, syncs to AKS. Argo is Git-source agnostic.

06 Boards work items link to PRs and surface in Backstage via the ADO plugin.

Scenario C · Hybrid coexistence

A single Backstage catalog discovering entities from both sources. Dual auth. Single agent plumbing.

Catalog

Unified

A single Backstage catalog that discovers entities from both GitHub and ADO.

Auth

Dual

Users sign in via either provider. SSO mapping in Entra.

CI/CD

Native + converged

Each repo uses its native CI (Actions or Pipelines) but converges on the same Helm + Argo CD pattern.

Agent identity

Common

Workload identity in Azure is the common substrate. Agents authenticate the same way regardless of source repo.

Migration

Stepping stone

Hybrid is often a stepping stone to consolidation. The platform does not force a choice.

What makes the foundation strong

Five non-negotiables across A, B, and C.

01 Everything is in Git. Code, IaC, policies, specs, prompts, instructions, all version-controlled.

02 PRs are the unit of change. No deploy without a PR. No agent action without a trajectory tied to a PR or spec.

03 CODEOWNERS flows into the catalog. Ownership is never lost.

04 Branch protection is enforced. No direct pushes to protected branches. No bypass without audit.

05 Argo CD is the deployment authority. No kubectl apply from laptops.

DevSecOps tenet "everything as code" made operational, by Open Horizons defaults.
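
One way to picture "CODEOWNERS flows into the catalog" is a small parser that derives a repo-level owner from the catch-all `*` rule. This is a sketch only: the function names, the `unowned` fallback, and the idea of using the first owner of the `*` rule are assumptions, and real CODEOWNERS matching (path patterns, multiple owners) is richer than this.

```python
def parse_codeowners(text):
    """Return (pattern, owners) pairs, skipping blank and comment lines."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *owners = line.split()
        if owners:
            rules.append((pattern, owners))
    return rules


def catalog_owner(text, fallback="unowned"):
    """Pick the first owner of the last '*' rule, since later rules win."""
    owner = fallback
    for pattern, owners in parse_codeowners(text):
        if pattern == "*":
            owner = owners[0].lstrip("@")
    return owner


sample = "# default owners\n* @acme/platform\n/infra/ @acme/sre\n"
owner = catalog_owner(sample)
```

The explicit fallback value makes a repo without a default owner show up as a catalog finding instead of an empty field.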

PART

Golden Paths and the Agent Catalog.

Paved roads for developers. First-class governance for agents.

A Golden Path in one line

The opinionated, paved, well-lit road for the most common developer journeys.
Fully scaffolded, fully wired, ready to ship.

Term coined at Spotify. Operationalized in Open Horizons as Backstage Software Templates that produce a working repo, pipeline, infrastructure, observability, and catalog entry, in a single click.

What a Golden Path produces, all at once

Production-ready on Day 1. Twelve artifacts wired together, four families.

Repo foundations

01 Git repository in the chosen org.

02 App skeleton in the chosen language.

11 CODEOWNERS, branch protection, Dependabot, GHAS.

Build, ship, deploy

03 CI/CD pipeline with tests, scans, image push.

04 Helm chart with limits, probes, NetworkPolicy.

06 Argo CD Application in the app-of-apps pattern.

Infra & DevEx

05 Terraform for dedicated infra: DB, queue, storage.

07 Backstage catalog entry with owner and lifecycle.

10 Devcontainer for Codespaces or local VS Code.

Specify & observe

08 TechDocs scaffolding ready to publish.

09 .specs/ folder seeded with Constitution + Spec templates.

12 Pre-baked observability with default Grafana dashboard.

One Backstage Software Template, one click, twelve artifacts ready to ship. The platform did the boilerplate so the developer never has to.

The Golden Path catalog

22 paths today, organized by Horizons Phase.

H1, Foundation

landing-zone
azure-module

H2, Enhancement

microservice-nodejs
microservice-python
microservice-dotnet
microservice-go
frontend-react
api-openapi-first
library-typescript
database-postgres
event-worker
techdocs-site

H3, Innovation

agent-maf (Microsoft Agent Framework)
agent-sk (Semantic Kernel)
mcp-server
eval-job
skill
rag-application
multi-agent-system
foundry-agent

Stepping off the path is supported. The platform never punishes deviation, only requires it to be explicit.

The Agent Catalog in Backstage

If a service needs a catalog entry, an owner, RBAC, observability, cost, and a runbook, so does every agent.

catalog-info.yaml · committed to repo

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name:        deploy-orchestrator
  tags:        [agent, llm, production]
spec:
  type:        agent
  owner:       team-platform
  lifecycle:   production
  system:      agent-idp
  model:       gpt-5-1
  cost_center: cc-platform
  entrypoint:  https://agents.ohorizons.ai/api/agents/deploy
  runbook:     backstage.ohorizons.ai/docs/.../deploy-runbook
  policies:    [policy-net-egress, policy-budget-50usd]
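
A catalog entry like the one above only helps governance if the agent-specific fields are actually filled in. The sketch below checks an already-parsed entity (a plain dict, so no YAML library is needed) for those fields. The list of required fields and the function name are assumptions for illustration, not a Backstage or Open Horizons schema.

```python
# Fields this sketch treats as mandatory for type "agent" (an assumption).
REQUIRED_SPEC_FIELDS = (
    "owner", "lifecycle", "model", "cost_center", "runbook", "policies",
)


def missing_agent_fields(entity):
    """Return the required spec fields that are absent or empty."""
    spec = entity.get("spec", {})
    if spec.get("type") != "agent":
        return []  # Non-agent components are out of scope for this check.
    return [name for name in REQUIRED_SPEC_FIELDS if not spec.get(name)]


# Same values as the slide, expressed as the dict a YAML parser would yield.
entity = {
    "apiVersion": "backstage.io/v1alpha1",
    "kind": "Component",
    "metadata": {"name": "deploy-orchestrator", "tags": ["agent", "llm", "production"]},
    "spec": {
        "type": "agent",
        "owner": "team-platform",
        "lifecycle": "production",
        "system": "agent-idp",
        "model": "gpt-5-1",
        "cost_center": "cc-platform",
        "runbook": "backstage.ohorizons.ai/docs/.../deploy-runbook",
        "policies": ["policy-net-egress", "policy-budget-50usd"],
    },
}
gaps = missing_agent_fields(entity)
```

A check of this kind could run in CI on every PR that touches `catalog-info.yaml`, so an agent without an owner or cost center never reaches the catalog.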

What this gives the agent in Backstage

Identity & RBAC

Entra workload identity federation (WIF) per agent, scoped Microsoft Entra ID permissions.

Observability

Auto-wired to Grafana L6 dashboards, trajectory replays.

Cost attribution

Per-agent USD by cost_center, FinOps roll-up.

Policy enforcement

OPA Gatekeeper applies policies on every deploy.

Catalog graph

Linked to APIs, systems, owners, runbooks.

Runbook + TechDocs

On-call procedures linked to the same record.

Every agent is discoverable, attributable, and auditable through the same UI developers already use for services.

The agent lifecycle

Same stages as any component, with AI-specific checkpoints.

01 Proposal. A spec is opened in .specs/NNN-agent-name/ describing goal, scope, tools, budget.

02 Specification. EARS requirements + acceptance tests (Specky Phase 3).

03 Design. Chosen model, prompt structure, tool list, memory scopes (Phase 4).

04 Implement. Built via agent-maf or agent-sk Golden Path (Phase 7).

05 Evaluate. Runs against the golden set: correctness, safety, faithfulness, cost-per-task.

06 Pilot. Limited rollout, one team, low-stakes use case.

07 Production. Promoted in the catalog, subject to full governance.

08 Deprecate. Retired with archived trajectories.
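
The eight stages above behave like a simple state machine. The sketch below encodes them with two assumed rules that the slide does not state explicitly: an agent moves one stage at a time, and it cannot enter Pilot unless evaluation has passed. Function names are hypothetical.

```python
STAGES = [
    "proposal", "specification", "design", "implement",
    "evaluate", "pilot", "production", "deprecate",
]


def next_stage(current):
    """Return the stage after `current`, or None at the end of the lifecycle."""
    index = STAGES.index(current)
    return STAGES[index + 1] if index + 1 < len(STAGES) else None


def promote(current, target, eval_passed=False):
    """Move to `target` if it is the next stage and its checkpoint is met."""
    if next_stage(current) != target:
        raise ValueError(f"cannot move from {current} to {target}")
    if target == "pilot" and not eval_passed:
        raise ValueError("evaluation must pass before pilot")
    return target


stage = promote("evaluate", "pilot", eval_passed=True)
```

Rejecting skipped stages in code is what turns the lifecycle from a diagram into a gate.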

Identity for agents

No agent runs on a human's credentials. Period.

Azure

Entra Workload Identity

Federated to Kubernetes ServiceAccount. No client secrets, no SP rotations.

Kubernetes

ServiceAccount per agent

Mapped to a Pod Security Standard. RBAC at the namespace and resource level.

Tool call

OAuth tokens

Issued per-agent, scoped to specific tools. No "agent can call anything."

Portal

Backstage roles

Backstage role plugin controls who can view, edit, or invoke each agent.
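
The "scoped to specific tools" rule can be illustrated with a tiny authorization check. This is a conceptual sketch, not an OAuth implementation: the `AgentToken` type, its fields, and the tool names are hypothetical, and a real system would validate signed tokens instead of trusting an in-memory object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentToken:
    """Stand-in for a per-agent token carrying an explicit tool allow-list."""
    agent_id: str
    allowed_tools: frozenset


def authorize_tool_call(token, agent_id, tool):
    """Allow the call only for the token's own agent and an allow-listed tool."""
    return token.agent_id == agent_id and tool in token.allowed_tools


token = AgentToken("deploy-orchestrator", frozenset({"helm_lint", "argo_sync"}))
allowed = authorize_tool_call(token, "deploy-orchestrator", "argo_sync")
denied = authorize_tool_call(token, "deploy-orchestrator", "delete_cluster")
```

The default is deny: a tool that is not named in the token is refused, which is the opposite of "agent can call anything."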

Trajectories, the black box recorder

Every agent invocation produces a structured, replayable record.

01 The original user prompt or trigger event

02 The plan the agent generated

03 Every tool call: name, arguments, latency, result, error

04 Every model call: model name, tokens in/out, cost, latency

05 Memory reads and writes

06 The final output and a verdict: success, failure, escalation

Trajectories are stored in PostgreSQL + Loki, indexed, replayable, exportable to OpenTelemetry, and surfaced in Backstage. Without them, you have no debug story.
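
The six elements of a trajectory map naturally onto a structured record. The sketch below is one possible shape, assuming hypothetical class and field names. It is not the Open Horizons schema, and the sample values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field

VERDICTS = {"success", "failure", "escalation"}


@dataclass
class ToolCall:
    name: str
    arguments: dict
    latency_ms: float
    result: str = ""
    error: str = ""


@dataclass
class ModelCall:
    model: str
    tokens_in: int
    tokens_out: int
    cost_usd: float
    latency_ms: float


@dataclass
class Trajectory:
    trigger: str                                     # 01 prompt or trigger event
    plan: list                                       # 02 plan the agent generated
    tool_calls: list = field(default_factory=list)   # 03
    model_calls: list = field(default_factory=list)  # 04
    memory_ops: list = field(default_factory=list)   # 05 reads and writes
    output: str = ""                                 # 06 final output
    verdict: str = "success"                         # 06 verdict

    def __post_init__(self):
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict}")

    def total_cost_usd(self):
        """Sum the cost of every model call in this run."""
        return sum(call.cost_usd for call in self.model_calls)

    def to_json(self):
        """Serialize the whole record, nested calls included."""
        return json.dumps(asdict(self), sort_keys=True)


run = Trajectory(
    trigger="PR opened",
    plan=["lint chart", "summarize findings"],
    tool_calls=[ToolCall("helm_lint", {"chart": "orders"}, 420.0, result="ok")],
    model_calls=[
        ModelCall("gpt-5-1", 1200, 300, 0.02, 900.0),
        ModelCall("gpt-5-1", 800, 200, 0.03, 700.0),
    ],
    output="No blocking findings.",
)
```

Because the record serializes to plain JSON, the same structure can be written to a database row, shipped as a log line, or compared between an original run and a replay.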

Cost governance, FinOps for AI

The single fastest way to lose executive trust in AI is a surprise bill.
Open Horizons prevents it with six controls.

Ledger

Per-agent

Middleware logs every model call with agent_id, team, cost_center, model, tokens, USD.

Dashboards

Per-team and per-CC

Grafana panels with filters, forecasts. Drill down from CFO view to a single trajectory.

Alerts

50, 80, 100 percent

Alertmanager fires when an agent or team exceeds threshold.

Ceilings

Hard

Per-agent monthly budgets enforced by the runtime. Refuse to run if exceeded.

Routing

.github/model-routing.yaml

Declares which model handles which task. Cheaper models for cheaper tasks.

Eval budget

Separate

Continuous evaluation cost is tracked separately from production traffic.
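
Three of the six controls (ledger, alerts, hard ceiling) fit in one small class. The sketch below assumes a hypothetical `AgentBudget` API and an in-memory ledger. In a real runtime the ceiling check would happen before the model call and the ledger write after it, and alerts would go to Alertmanager instead of being returned.

```python
ALERT_THRESHOLDS = (0.5, 0.8, 1.0)  # 50, 80, 100 percent of the monthly limit


class BudgetExceeded(RuntimeError):
    """Raised when an agent is already at or over its monthly ceiling."""


class AgentBudget:
    def __init__(self, agent_id, team, cost_center, monthly_limit_usd):
        self.agent_id = agent_id
        self.team = team
        self.cost_center = cost_center
        self.monthly_limit_usd = monthly_limit_usd
        self.ledger = []

    def spent(self):
        return sum(entry["usd"] for entry in self.ledger)

    def record(self, model, tokens, usd):
        """Log one model call and return the thresholds it crossed."""
        before = self.spent()
        if before >= self.monthly_limit_usd:
            # Hard ceiling: refuse to run once the budget is used up.
            raise BudgetExceeded(f"{self.agent_id} exhausted its budget")
        self.ledger.append({
            "agent_id": self.agent_id, "team": self.team,
            "cost_center": self.cost_center, "model": model,
            "tokens": tokens, "usd": usd,
        })
        after = before + usd
        limit = self.monthly_limit_usd
        return [t for t in ALERT_THRESHOLDS if before < t * limit <= after]


budget = AgentBudget("deploy-orchestrator", "team-platform", "cc-platform", 50.0)
first_alerts = budget.record("gpt-5-1", 1000, 20.0)
```

Returning only the thresholds crossed by the current call means each alert level fires once per budget period instead of on every subsequent call.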

PART

XII

Security, compliance, governance.

Security is not a feature. It is a property of every layer. The secure way is also the easy way.

The seven security domains

Each maps to NIST CSF, ISO 27001, CIS, SOC 2.

Identity

Zero passwords

Workload Identity for every workload. Zero standing SSH. Just-in-time elevation via Entra PIM. RBAC at every layer.

Secrets

Key Vault SoT

Single source of truth, private endpoint, RBAC. Pods consume via CSI driver. gitleaks blocks PRs.

Network

Private by default

Private endpoints for every PaaS that supports them. Default-deny NetworkPolicies inside the cluster. HSTS on ingress.

Workload

PSS restricted

Pod Security Standards restricted enforced. Read-only root FS. Non-root users. Mandatory limits.

Supply chain

Sign + scan

cosign on every image. SBOM for every image. Dependabot. GitHub Advanced Security. tfsec on every Terraform PR.

Data

Encryption everywhere

At rest, in transit. 30-day backup defaults. Egress restricted to private endpoints.

AI specific

OWASP LLM Top 10

Tool-call vetting, content sanitization, scope guards, rate limits, model pinning. See next slide.

OWASP LLM Top 10 to Open Horizons controls

Every risk has a structural mitigation.

OWASP LLM risk	Open Horizons control
LLM01 Prompt Injection	Tool-call vetting, content sanitization, scoped tool RBAC
LLM02 Insecure Output Handling	Output filters, content safety, structured-output schemas
LLM03 Training Data Poisoning	No customer-data training without explicit pipeline + approval
LLM04 Model DoS	Per-agent rate limits + budget ceilings
LLM05 Supply Chain	Model pinning by name + version. No untrusted MCP servers.
LLM06 Sensitive Info Disclosure	DLP on outputs. Redaction in trajectories.
LLM07 Insecure Plugin Design	MCP server review, scope guards, allow-listed tools
LLM08 Excessive Agency	Scoped Workload Identity, tool RBAC, human-in-loop gates for high-risk actions
LLM09 Overreliance	Evaluation jobs + human review for production agents
LLM10 Model Theft	Egress restrictions, no model export, audit on weight access

Continuous compliance, not annual

Nine frameworks mapped. Evidence exported to a tamper-evident store on a schedule.

SOC 2 Type 2

PR history + RBAC + scans

Continuous evidence, PR history, deploy logs, RBAC change logs, scan reports, runbook executions.

ISO 27001

Asset + access + incident

Asset inventory in Backstage catalog. RBAC export. Runbook + on-call as incident management.

NIST CSF 2.0

Identify, Protect, Detect

Catalog + RBAC + Defender + alerts + runbooks + backups. Mapped end to end.

CIS Kubernetes

PSS restricted

CIS-aligned AKS configuration via Terraform. Enforced through OPA.

Azure WAF

Five pillars

Reliability, Security, Cost, Performance, Operational Excellence with explicit checklists per service.

NIST AI RMF

Map, Measure, Manage, Govern

Implemented across L3, L4, L5. SDD provides the Manage and Govern receipts.

ISO 42001

AI Management System

Lifecycle, risk, transparency, monitoring, supported by SDD pipeline and trajectory infrastructure.

Audit posture

"How did this change reach production?" Answered in seconds, end to end.

01 The spec in .specs/NNN-feature/ describes the intent.

02 The PR in GitHub or ADO shows the change, the reviews, the scans.

03 The CI run shows the tests and the security gates.

04 The Argo CD sync shows the actual deployment to the cluster.

05 The Grafana dashboard shows the post-deploy behavior.

06 The trajectory, if an agent was involved, shows the autonomous steps.

Every link in this chain is immutable, timestamped, and signed.
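
One common way to make a chain of evidence tamper-evident is to hash each record together with the hash of the record before it. The sketch below shows that technique only. It is an assumption about how such a chain could work, not a description of the Open Horizons evidence store, and it covers integrity hashing, not cryptographic signing. Record contents are placeholders.

```python
import hashlib
import json

GENESIS = "0" * 64  # Placeholder "previous hash" for the first record.


def link_hash(prev_hash, record):
    """Hash a record together with the hash of its predecessor."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def build_chain(records):
    chain, prev = [], GENESIS
    for record in records:
        digest = link_hash(prev, record)
        chain.append({"record": record, "prev": prev, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain):
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or link_hash(prev, entry["record"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


evidence = build_chain([
    {"kind": "spec", "ref": ".specs/NNN-feature/"},
    {"kind": "pr", "ref": "placeholder-pr"},
    {"kind": "ci", "ref": "placeholder-run"},
    {"kind": "argo_sync", "ref": "placeholder-sync"},
])
intact = verify_chain(evidence)
```

Because each hash depends on everything before it, changing one early record invalidates every later link, which is what lets an auditor trust the sequence and not just the individual entries.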

PART

XIII

Business value and ROI.

Three buckets. Customer's own numbers. Publicly verifiable benchmarks.

The three buckets of value

A complete business case quantifies all three.
Plug your own numbers in. That is the only credible business case.

Bucket 1

Productivity

Faster scaffolding. Reduced cognitive load. AI-assisted code, review, docs. Faster troubleshooting. Measured via DORA + GitHub Copilot research ranges.

Bucket 2

Risk reduction

Continuous scanning. Always-on observability. Per-agent identity + audit. Spec-driven dev. Measured via DORA CFR/MTTR, Verizon DBIR, IBM Ponemon.

Bucket 3

Cost optimization

Cost attribution by tag. AKS autoscaler. Scheduled start/stop dev envs. Per-agent budgets. Model routing. Tool consolidation. Measured via FinOps Foundation, Flexera.

TCO frame

Compare against what you would build yourself, or a closed SaaS IDP.

Status quo

Build it yourself

6 to 18 months to a working IDP. 4 to 8 platform engineers full-time during build. Custom integrations everywhere. Custom agent runtime. Maintained internally, forever.

Alternative

Closed SaaS IDP

Faster to start, but your data and code go somewhere else. Per-seat pricing. Limited customization. Often no agent IDP at all.

Recommended

Open Horizons

4 to 12 weeks to a working H1. Microsoft and certified partners deliver the heavy lift. Customer owns the code, cluster, data. Open source license, no per-seat pricing.

The accelerator typically pays for itself inside the H2 window through productivity and consolidation savings alone.
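
The payback claim can be checked with simple arithmetic once a customer plugs in their own figures. The sketch below is a generic payback-period calculation. The function name and every number in the example are placeholders, not benchmarks from this deck or its sources.

```python
import math


def payback_weeks(one_time_cost, weekly_run_cost, weekly_savings):
    """Weeks until cumulative net savings cover the one-time cost.

    Returns None when weekly savings do not exceed weekly run cost,
    because the investment then never pays back.
    """
    net_weekly = weekly_savings - weekly_run_cost
    if net_weekly <= 0:
        return None
    return math.ceil(one_time_cost / net_weekly)


# Placeholder inputs: replace with the customer's own three buckets.
productivity = 6_000
risk_reduction = 2_000
cost_optimization = 4_000
weeks = payback_weeks(
    one_time_cost=100_000,
    weekly_run_cost=2_000,
    weekly_savings=productivity + risk_reduction + cost_optimization,
)
```

Summing the three buckets into one weekly figure keeps the model honest: if any bucket is removed, the payback date moves, and the sensitivity is visible immediately.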

Three scales of speed

Same scope of platform. Three timeframes. Customer's choice.

9 to 18 months

Industry baseline

A platform of this scope, built from scratch. The cost of going alone. Forrester Wave Q1 2026 baseline for comparable IDPs.

90 to 180 days

Three Horizons end-to-end

H1 Foundation + H2 Enhancement + H3 Innovation integrated. First production AI workload. Observability complete. The accelerator path.

2h 30m

git clone to production H1

Backstage on AKS reachable via HTTPS, Let's Encrypt TLS, GitHub OAuth, Argo CD syncing, Grafana live, first Golden Path scaffoldable. The install-wizard path, around 3h when an agent drives it.

The compression is the entire business case. Same scope, three speeds, customer's choice.

From numbers to next steps

The business case is yours to build.
The first step is small. Two weeks. Fixed scope. Walk-away clause.

Everything from here is mechanics. The Discovery is a paid, scoped, fixed-deliverable engagement that produces an H1 plan your CFO can sign off on or reject in writing. You leave with the architecture diagram, the risk register, the cost estimate, and a partner short-list whether or not you proceed. No build-trap. No retainer. The only commitment is two weeks of your platform lead's time.

Step 1

Discovery, 1 to 2 weeks

Step 2

Pilot, 2 to 4 weeks

Step 3

H1 Foundation, 4 to 8 weeks

Steps 4 + 5

H2 + H3, 20 to 36 weeks

PART

XIV

Getting started.

From discovery to innovation, five steps with decision gates at every step.

The five-step engagement model

Each step has a decision gate. Customers can stop, pause, or scale at any gate.

What you need on Day 1

To start a Discovery, the customer needs six things. Everything else is provided.

01 An Azure subscription (or willingness to create one)

02 A GitHub organization or Azure DevOps organization

03 A Microsoft Entra ID tenant, typically the same one used for M365 or Azure

04 An executive sponsor, typically CTO, head of platform, or chief architect

05 A named platform lead on the customer side

06 One to three pilot teams willing to be early adopters

What you get from Day 1

Five deliverables before the first sprint starts.

Repo access

ohorizons

Access to the source template repository.

Discovery report

Tailored

Current-state architecture diagram, target-state H1 scope proposal, risk register.

Partner match

Certified network

A short-list of certified partners matched to your industry, region, and AI ambition.

Success plan

Milestones + gates

A 90-day plan with success criteria and decision gates.

Microsoft backstop

Architectural

Microsoft is engaged on architectural escalations across the engagement.

Common pitfalls

Six failure modes, all preventable with the staged model.

01 Trying to do H1+H2+H3 at once. Stage. Each horizon delivers value alone.

02 Building before piloting. Spend the 2 to 4 weeks on the pilot. It pays for itself.

03 Skipping platform-as-a-product. Treat developers as customers. Survey them, iterate.

04 Underfunding the platform team. A platform without a team becomes a graveyard. Budget for 2 to 4 dedicated engineers.

05 Letting agents bypass governance. Use SDD, trajectories, and cost ceilings from Day 1.

06 No exit story from a partner. Insist on knowledge transfer milestones in every SOW.

What "done" looks like

You will know Open Horizons is working when five things become true.

01 A new service goes from idea to production in hours, not weeks.

02 An application developer can answer "where does this metric come from" in the portal, without asking the platform team.

03 An auditor can trace any production change to a spec, a PR, and a trajectory in under a minute.

04 An agent invocation has a known cost, a known owner, a known SLO. Like any other service.

05 The platform team is shipping a product, not fighting fires.

PART

Partner ecosystem.

You buy the accelerator once. You customize it forever, with partners who know the stack.

Certified partners do four things

Often in combination. Always with customer ownership intact.

Service 1

Deploy + onboard

Stand up the platform in the customer's Azure tenant. H1 in 4 to 8 weeks.

Service 2

Customize

New Golden Paths, plugins, agents, MCP servers, bespoke compliance mappings, custom eval pipelines.

Service 3

Operate Day-2

Team augmentation, upgrades, on-call, FinOps reviews, prompt iteration.

Service 4

Train + enable

Platform team certification, developer onboarding curricula, "train the trainer" for large enterprises.

Certification tiers

Three tiers, based on demonstrated outcomes, not paid status. Renewed annually.

Registered

Trained + signed

Code of conduct signed, disclosure rules agreed. Can deliver onboarding and basic customization.

Certified

>=3 successful H1+H2

Passed technical assessment. Can deliver Day-2, complex customization, agent work.

Strategic

H3 track record

Published reference architectures, named technical leads. Enterprise scale, regulated industries, multi-region.

What partners do not do

Four boundaries that protect the customer.

01 Partners do not own customer code or data.

02 Partners do not lock customers into a fork of Open Horizons.

03 Partners do not bypass governance, security, or audit controls.

04 Partners do not hold exclusive rights to a customer. Customers can engage multiple partners, switch, or insource.

How to engage a partner

A five-step procurement pattern that keeps incentives aligned.

01 Talk to the Microsoft field team. Size the engagement, match partners to your context.

02 Request two or three partner proposals. Compare approaches, references, pricing models.

03 Run a paid Discovery, typically 1 to 2 weeks. Produces a fixed-scope H1 plan.

04 Sign an SOW tied to outcomes, not hours.

05 Insist on knowledge transfer milestones. Every engagement should reduce, not increase, dependence on the partner.

The field-friendly takeaway

"Pilots fail at 95 percent because teams build agents without the four-layer foundation.
Open Horizons gives you that foundation on Day 1.
The 95 percent becomes a 5 percent problem instead."

Paula Silva

AI-Native Software Engineer

The data is in the references at the end of the playbook. The accelerator is in the repository. The conversation starts with a Discovery.

References

The research-grounded backbone of this deck. Sixteen open citations, four families.

Industry data

GARTNER · 2025

40% of enterprise apps to feature task-specific AI agents by 2026 ↗

GARTNER · 2025

Over 40% of Agentic AI projects cancelled by 2027 ↗

MIT NANDA · 2025

The GenAI Divide, State of AI in Business ↗

DORA · 2025

State of DevOps Report ↗

Academic research

METR · 2025

AI-augmented development impact study ↗

STOREY, M.A. · 2026

Cognitive debt and intent debt in AI-native dev ↗

LIU ET AL. · 2026

AI-generated code quality across 304,362 commits ↗

SHEN + TAMKIN · ANTHROPIC FELLOWS

How AI impacts skill formation ↗

Standards & frameworks

OWASP

Top 10 for LLM Applications ↗

NIST

AI Risk Management Framework 1.0 ↗

CNCF

Platforms White Paper ↗

FINOPS FOUNDATION

State of FinOps ↗

Implementation tech

MICROSOFT

Microsoft Agent Framework ↗

ANTHROPIC

Model Context Protocol ↗

BACKSTAGE

Architecture Overview ↗

MICROSOFT

Azure Well-Architected Framework ↗

Every claim in this deck traces to one of these sources. The playbook has the full bibliography with annotations.

Thank you

Let's talk.

If your enterprise is stuck between pilots and production, the conversation starts with a Discovery. One to two weeks. Fixed scope. A 90-day H1 plan you can fund or walk away from.

Contact

Paula Silva

AI-Native Software Engineer

linkedin.com/in/paulanunes

Deck reference

v1.1.0

Published 2026-05-23

Next step

Open Horizons Discovery

1 to 2 weeks, fixed scope
