Azure AI Foundry (formerly Azure AI Studio) is Microsoft’s platform for building enterprise-grade AI applications. But “enterprise-grade” means different things at the prototype stage vs. in production. This post covers the architecture patterns that make the difference—specifically for industrial agentic apps where reliability, traceability, and integration with existing enterprise systems are non-negotiable.
The Core Architecture Pattern
Most production agentic apps on Foundry follow a five-layer architecture:
1. Intent Router
Classifies incoming requests into workflow types (e.g., “quoting”, “inspection”, “document lookup”). Routes to the appropriate agent or tool chain. Uses a fast, cheap model (e.g., GPT-4o-mini) to reduce latency and cost on the routing step.
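A minimal router sketch, with the classification call stubbed out — the workflow labels, prompt, and `classify` heuristic are illustrative assumptions; in production `classify` would be a call to a small, cheap deployment (e.g., GPT-4o-mini) via the Azure OpenAI SDK:

```python
# Intent-router sketch. Workflow names and the classify() stub are
# illustrative; swap the stub for a cheap-model call in production.

WORKFLOWS = {"quoting", "inspection", "document_lookup"}

ROUTER_PROMPT = (
    "Classify the request into exactly one of: quoting, inspection, "
    "document_lookup. Reply with the label only.\n\nRequest: {request}"
)

def classify(request: str) -> str:
    """Stand-in for the cheap-model call; here a naive keyword heuristic."""
    text = request.lower()
    if "quote" in text or "estimate" in text:
        return "quoting"
    if "inspect" in text or "defect" in text:
        return "inspection"
    return "document_lookup"

def route(request: str) -> str:
    label = classify(request)
    # Guard against off-label model output: fall back to a safe default.
    return label if label in WORKFLOWS else "document_lookup"
```

The guard at the end matters: LLM classifiers occasionally emit labels outside the allowed set, and falling back to a safe default keeps the pipeline deterministic.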
2. LLM Planner
Given the intent and context, generates a step-by-step execution plan. For manufacturing workflows, plans might be: “1. Extract BOM from drawing. 2. Look up supplier prices. 3. Check lead times. 4. Generate estimate.” The planner uses a capable model (GPT-4o or similar) and outputs a structured JSON plan.
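One way to sketch that structured output — the step schema (id, tool name, dependency list) is an assumption for illustration, not a Foundry contract, but validating the plan before execution catches malformed planner output early:

```python
# Sketch of a structured JSON plan and its validation. The schema is an
# assumption; the point is to reject bad plans before the executor runs.
import json
from dataclasses import dataclass

@dataclass
class PlanStep:
    id: int
    tool: str
    depends_on: list  # ids of steps whose output this step consumes

def parse_plan(raw: str) -> list:
    """Validate the planner's JSON output before handing it to the executor."""
    steps = [PlanStep(**s) for s in json.loads(raw)]
    ids = {s.id for s in steps}
    for s in steps:
        missing = [d for d in s.depends_on if d not in ids]
        if missing:
            raise ValueError(f"step {s.id} depends on unknown steps {missing}")
    return steps

# The quoting plan from the text, expressed as structured steps:
raw = json.dumps([
    {"id": 1, "tool": "extract_bom", "depends_on": []},
    {"id": 2, "tool": "supplier_prices", "depends_on": [1]},
    {"id": 3, "tool": "lead_times", "depends_on": [1]},
    {"id": 4, "tool": "generate_estimate", "depends_on": [2, 3]},
])
plan = parse_plan(raw)
```

Note how the dependency lists make the parallelism explicit: steps 2 and 3 both depend only on step 1, so the executor can run them concurrently.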
3. Tool Executor (Parallel Where Possible)
Executes plan steps using registered tools—APS extraction, ERP lookups, web search, database queries, email dispatch. Steps with no data dependencies run in parallel to minimize latency. The tool registry is a dynamic lookup (not hard-coded) so new tools can be added without modifying core agent logic.
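A dependency-aware executor can be sketched in a few lines — the tool names and registry decorator here are hypothetical stand-ins for real APS/ERP/search wrappers, but the scheduling loop (run every step whose dependencies are satisfied, concurrently) is the core idea:

```python
# Parallel executor sketch. Tools register themselves in a dynamic lookup;
# each round, all steps whose dependencies are met run via asyncio.gather.
import asyncio

TOOL_REGISTRY = {}  # dynamic lookup: name -> async callable

def tool(name):
    def register(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return register

@tool("extract_bom")
async def extract_bom(ctx):
    return ["part-a", "part-b"]  # placeholder for APS drawing extraction

@tool("supplier_prices")
async def supplier_prices(ctx):
    return {p: 12.40 for p in ctx["extract_bom"]}  # placeholder ERP lookup

async def run_plan(steps):
    """steps: list of (tool_name, [dependency tool names])."""
    results, pending = {}, list(steps)
    while pending:
        ready = [(n, d) for n, d in pending if all(x in results for x in d)]
        if not ready:
            raise RuntimeError("cyclic or unsatisfiable dependencies")
        outs = await asyncio.gather(*(TOOL_REGISTRY[n](results) for n, _ in ready))
        results.update({n: o for (n, _), o in zip(ready, outs)})
        pending = [s for s in pending if s not in ready]
    return results

results = asyncio.run(run_plan([
    ("extract_bom", []),
    ("supplier_prices", ["extract_bom"]),
]))
```

Because registration is a decorator over a plain dict, adding a new tool is a one-line change with no edits to the scheduling loop.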
4. RAG Layer
Many steps need to look up domain knowledge: SOPs, manuals, product specs, historical decisions. Azure AI Search (hybrid vector + keyword) provides retrieval. The agent constructs a focused query, retrieves relevant chunks, and injects them into the LLM context before generation.
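The injection step can be sketched as follows — `retrieve` here is a hard-coded stand-in for the Azure AI Search hybrid query, and the chunk fields and document names are invented for illustration; what matters is tagging each chunk with its source so the summarizer can cite it:

```python
# Context-injection sketch for the RAG step. retrieve() stands in for a
# hybrid (vector + keyword) Azure AI Search call; chunk fields are assumed.

def retrieve(query: str) -> list:
    """Stand-in for the hybrid search call; returns scored chunks."""
    return [
        {"source": "SOP-114", "text": "Anodize per MIL-A-8625 Type II."},
        {"source": "spec-7075", "text": "7075-T6 sheet, 2.0 mm nominal."},
    ]

def build_context(query: str, max_chunks: int = 5) -> str:
    chunks = retrieve(query)[:max_chunks]
    # Tag each chunk with its source id so citations survive into the answer.
    return "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)

prompt = (
    "Answer using only the context below; cite sources in brackets.\n\n"
    f"Context:\n{build_context('anodizing requirements for 7075 sheet')}\n\n"
    "Question: What anodize spec applies?"
)
```

Capping `max_chunks` keeps the injected context within budget; the focused query comes from the agent, not the raw user request.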
5. Summarizer + Evidence Builder
Takes all tool outputs and generates the final response with explicit citations. For industrial workflows, every AI recommendation needs a traceable source: “Supplier lead time is 14 days [from ERP record #SR-1042]. Material cost is $12.40/kg [from price list updated 2025-01-15].”
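A minimal evidence-builder sketch, assuming each upstream tool output arrives as a (statement, source) pair — the record ids below are invented for illustration, but the fail-closed check is the point: an uncited claim should be an error, not a silent omission:

```python
# Evidence-builder sketch: pair every claim with the tool output that backs
# it, and refuse to emit an answer containing uncited claims.

def build_answer(claims):
    """claims: list of (statement, source_id) tuples from tool outputs."""
    unsupported = [text for text, source in claims if not source]
    if unsupported:
        raise ValueError(f"uncited claims: {unsupported}")
    return " ".join(f"{text} [from {source}]" for text, source in claims)

answer = build_answer([
    ("Supplier lead time is 14 days.", "ERP record #SR-1042"),
    ("Material cost is $12.40/kg.", "price list updated 2025-01-15"),
])
```

Raising on missing sources forces the traceability requirement into the code path rather than leaving it to prompt instructions alone.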
Foundry’s Role in This Architecture
Azure AI Foundry provides: (1) the Agent Service for stateful, multi-turn conversations with tool use; (2) the model catalog for selecting and deploying the right models at each layer; (3) Prompt Flow for visual orchestration of complex chains; (4) the evaluation SDK for measuring agent performance on your domain tasks.
Evaluation: The Step Most Teams Skip
Foundry’s evaluation SDK lets you define ground-truth test cases and measure agent outputs against them before deploying to production. For an industrial quoting agent, you’d test: “Given drawing XYZ, does the agent produce a BOM that matches the hand-verified reference?” You can evaluate accuracy, completeness, latency, and cost per run. Run evaluations in CI/CD so regressions are caught before deployment.
# Foundry evaluation with the Azure AI Evaluation SDK
import os

from azure.ai.evaluation import evaluate, QAEvaluator

evaluator = QAEvaluator(model_config={
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "azure_deployment": "gpt-4o",
    "api_version": "2024-02-01",
})

result = evaluate(
    data="test_cases.jsonl",  # ground truth: {"query": ..., "response": ..., "ground_truth": ...}
    evaluators={"qa": evaluator},
    evaluator_config={"qa": {"column_mapping": {
        "query": "${data.query}",
        "response": "${data.response}",
        "ground_truth": "${data.ground_truth}",
    }}},
    output_path="eval_results.json",
)

print(result["metrics"])  # e.g. {"qa.f1_score": 0.87, "qa.gpt_groundedness": 4.2, ...}
Building an agentic app on Azure AI Foundry?
Kamna designs and deploys Foundry-based agents for industrial teams. See our Microsoft Foundry services or book a call.
Book a Discovery Call →