Massimo Bonanni (@massimobonanni.bsky.social)

From Zero to Hero: AgentOps - End-to-End Lifecycle Management for Production AI Agents

The shift from proof-of-concept AI agents to production-ready systems isn't just about better models; it's about building robust infrastructure that can develop, deploy, and maintain intelligent agents at enterprise scale. As organizations move beyond simple chatbots to agentic systems that plan, reason, and act autonomously, the need for comprehensive Agent LLMOps becomes critical.

This guide walks through the complete lifecycle for building production AI agents, from development through deployment to monitoring, with special focus on leveraging Azure AI Foundry's hosted agents infrastructure.

The Evolution: From Single-Turn Prompts to Agentic Workflows

Traditional AI applications operated on a simple request-response pattern. Modern AI agents, however, are fundamentally different. They maintain state across multiple interactions, orchestrate complex multi-step workflows, and dynamically adapt their approach based on intermediate results.

Agentic workflows are systems in which language models and tools are orchestrated through a combination of predefined logic and dynamic decision-making. Unlike monolithic systems where a single model attempts everything, production agents break down complex tasks into specialized components that collaborate effectively.

The difference is profound. A simple customer service chatbot might answer questions from a knowledge base. An agentic customer service system, however, can search multiple data sources, escalate to specialized sub-agents for technical issues, draft response emails, schedule follow-up tasks, and learn from each interaction to improve future responses.

Stage 1: Development with any agentic framework

Why LangGraph for Agent Development?

LangGraph has emerged as a leading framework for building stateful, multi-agent applications.
Unlike traditional chain-based approaches, LangGraph uses a graph-based architecture where each node represents a unit of work and edges define the workflow paths between them. The key advantages include:

- Explicit State Management: LangGraph maintains persistent state across nodes, making it straightforward to track conversation history, intermediate results, and decision points. This is critical for debugging complex agent behaviors.
- Visual Workflow Design: The graph structure provides an intuitive way to visualize and understand agent logic. When an agent misbehaves, you can trace execution through the graph to identify where things went wrong.
- Flexible Control Flows: LangGraph supports diverse orchestration patterns (single agent, multi-agent, hierarchical, sequential), all within one framework. You can start simple and evolve as requirements grow.
- Built-in Memory: Agents automatically store conversation histories and maintain context over time, enabling rich personalized interactions across sessions.

Core LangGraph Components

- Nodes: Individual units of logic or action. A node might call an AI model, query a database, invoke an external API, or perform data transformation. Each node is a Python function that receives the current state and returns updates.
- Edges: Define the workflow paths between nodes. These can be conditional (routing based on the node's output) or unconditional (always proceeding to the next step).
- State: The data structure passed between nodes and updated through reducers. Proper state design is crucial: it should contain all information needed for decision-making while remaining manageable in size.
- Checkpoints: LangGraph's checkpointing mechanism saves state at each node, enabling features like human-in-the-loop approval, retry logic, and debugging.
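To make the node, edge, state, and checkpoint roles more concrete, here is a minimal sketch in plain Python with no LangGraph import. It only illustrates the idea of a node returning a partial update that a reducer merges into shared state; the names `REDUCERS`, `merge_update`, `greet_node`, and `route`, as well as the sample state keys, are hypothetical and chosen for illustration. LangGraph's real internals differ, and in LangGraph itself reducers are declared through `Annotated` type hints on the state schema.

```python
import operator
from typing import Any, Callable, Dict

# Hypothetical reducer table: keys listed here are combined with the old
# value; any other key is simply overwritten by the node's update.
REDUCERS: Dict[str, Callable[[Any, Any], Any]] = {
    "messages": operator.add,  # append new messages to the history
}


def merge_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new state with a node's partial update applied."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)
        else:
            merged[key] = value
    return merged


def greet_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """A node: reads the current state and returns only the keys it changes."""
    return {"messages": [f"Hello, {state['user']}!"], "step": state["step"] + 1}


def route(state: Dict[str, Any]) -> str:
    """A conditional edge: pick the name of the next node from the state."""
    return "finish" if state["step"] >= 1 else "greet"


state = {"user": "Ada", "messages": ["(session started)"], "step": 0}
checkpoints = [state]  # naive checkpointing: keep a snapshot after every node

state = merge_update(state, greet_node(state))
checkpoints.append(state)

print(state["messages"])  # ['(session started)', 'Hello, Ada!']
print(route(state))       # finish
```

Because `merge_update` returns a new dictionary instead of mutating the old one, each entry in `checkpoints` stays a stable snapshot. That property is what makes replaying or resuming from an earlier point possible.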
Implementing the Agentic Workflow Pattern

A robust production agent typically follows a cyclical pattern of planning, execution, reflection, and adaptation:

- Planning Phase: The agent analyzes the user's request and creates a structured plan, breaking complex problems into manageable steps.
- Execution Phase: The agent carries out planned actions using appropriate tools: search engines, calculators, code interpreters, database queries, or API calls.
- Reflection Phase: After each action, the agent evaluates results against expected outcomes. This critical thinking step determines whether to proceed, retry with a different approach, or seek additional information.
- Decision Phase: Based on reflection, the agent decides the next course of action: continue to the next step, loop back to refine the approach, or conclude with a final response.

This pattern handles real-world complexity far better than simple linear workflows. When an agent encounters unexpected results, the reflection phase enables adaptive responses rather than brittle failure.

Example: Building a Research Agent with LangGraph

    from typing import List, TypedDict

    from langchain_openai import ChatOpenAI
    from langgraph.graph import StateGraph, END


    class AgentState(TypedDict):
        query: str
        plan: List[str]
        current_step: int
        research_results: List[dict]
        final_answer: str
        next: str  # routing decision written by reflection_node


    def perform_research(step: str) -> dict:
        # Placeholder: replace with a real web search, database query, etc.
        return {"step": step, "findings": f"(results for: {step})"}


    def planning_node(state: AgentState):
        # Agent creates a research plan, one step per line
        llm = ChatOpenAI(model="gpt-4")
        response = llm.invoke(
            f"Create a research plan, one step per line, for: {state['query']}"
        )
        # llm.invoke returns a message object; split its text into plan steps
        plan = [line.strip() for line in response.content.splitlines() if line.strip()]
        return {"plan": plan, "current_step": 0, "research_results": []}


    def research_node(state: AgentState):
        # Execute current research step
        step = state['plan'][state['current_step']]
        # Perform web search, database query, etc.
        results = perform_research(step)
        return {"research_results": state['research_results'] + [results]}


    def reflection_node(state: AgentState):
        # Evaluate if we have enough information
        if len(state['research_results']) >= len(state['plan']):
            return {"next": "synthesize"}
        return {"next": "research", "current_step": state['current_step'] + 1}


    def synthesize_node(state: AgentState):
        # Generate final answer from all research
        llm = ChatOpenAI(model="gpt-4")
        answer = llm.invoke(f"Synthesize research: {state['research_results']}")
        return {"final_answer": answer.content}


    # Build the graph
    workflow = StateGraph(AgentState)
    workflow.add_node("planning", planning_node)
    workflow.add_node("research", research_node)
    workflow.add_node("reflection", reflection_node)
    workflow.add_node("synthesize", synthesize_node)

    # The graph needs an explicit entry point before it can be compiled
    workflow.set_entry_point("planning")
    workflow.add_edge("planning", "research")
    workflow.add_edge("research", "reflection")
    workflow.add_conditional_edges(
        "reflection",
        lambda s: s["next"],
        {"research": "research", "synthesize": "synthesize"}
    )
    workflow.add_edge("synthesize", END)

    agent = workflow.compile()

This pattern scales from simple workflows to complex multi-agent systems with dozens of specialized nodes.

Stage 2: CI/CD Pipeline for AI Agents

Traditional software CI/CD focuses on code quality, security, and deployment automation. Agent CI/CD must additionally handle model versioning, evaluation against behavioral benchmarks, and non-deterministic behavior.

Build Phase: Packaging Agent Dependencies

Unlike traditional applications, AI agents have unique packaging requirements:

- Model artifacts: Fine-tuned models, embeddings, or model configurations
- Vector databases: Pre-computed embeddings for knowledge retrieval
- Tool configurations: API credentials, endpoint URLs, rate limits
- Prompt templates: Versioned prompt engineering assets
- Evaluation datasets: Test cases for agent behavior validation

Best practice is to containerize everything.
Docker provides reproducibility across environments and simplifies dependency management:

    FROM python:3.11-slim

    WORKDIR /app
    COPY . user_agent/
    WORKDIR /app/user_agent

    RUN if [ -f requirements.txt ]; then \
            pip install -r requirements.txt; \
        else \
            echo "No requirements.txt found"; \
        fi

    EXPOSE 8088
    CMD ["python", "main.py"]

Register Phase: Version Control Beyond Git

Code versioning is necessary but insufficient for AI agents. You need comprehensive artifact versioning:

- Container Registry: Azure Container Registry stores Docker images with semantic versioning. Each agent version becomes an immutable artifact that can be deployed or rolled back at any time.
- Prompt Registry: Version control your prompts separately from code. Prompt changes can dramatically impact agent behavior, so treating them as first-class artifacts enables A/B testing and rapid iteration.
- Configuration Management: Store agent configurations (model selection, temperature, token limits, tool permissions) in version-controlled files. This ensures reproducibility and enables easy rollback.

Evaluate Phase: Testing Non-Deterministic Behavior

The biggest challenge in agent CI/CD is evaluation. Unlike traditional software where unit tests verify exact outputs, agents produce variable responses that must be evaluated holistically.

Behavioral Testing: Define test cases that specify desired agent behaviors rather than exact outputs. For example, "When asked about product pricing, the agent should query the pricing API, handle rate limits gracefully, and present information in a structured format."

Evaluation Metrics: Track multiple dimensions:

- Task completion rate: Did the agent accomplish the goal?
- Tool usage accuracy: Did it call the right tools with correct parameters?
- Response quality: Measured via LLM-as-judge or human evaluation
- Latency: Time to first token and total response time
- Cost: Token usage and API call expenses

Adversarial Testing: Intentionally test edge cases such as ambiguous requests, tool failures, rate limiting, and conflicting information. Production agents will encounter these scenarios.

Comprehensive instrumentation from day one is essential. Track every input, output, API call, token usage, and decision point. After accumulating production data, patterns emerge showing which metrics actually predict failures and which are noise.

Deploy Phase: Safe Production Rollouts

Never deploy agents directly to production. Implement progressive delivery:

- Staging Environment: Deploy to a staging environment that mirrors production. Run automated tests and manual QA against real data (appropriately anonymized).
- Canary Deployment: Route a small percentage of traffic (5-10%) to the new version. Monitor error rates, latency, user satisfaction, and cost metrics. Automatically roll back if any metric degrades beyond thresholds.
- Blue-Green Deployment: Maintain two production environments. Deploy to the inactive environment, verify it's healthy, then switch traffic. This enables instant rollback by switching back.
- Feature Flags: Deploy new agent capabilities behind feature flags. Gradually enable them for specific user segments, gather feedback, and iterate before full rollout.

Now that we know how to create an agent using LangGraph, the next step is to understand how to deploy this LangGraph agent to Azure AI Foundry.

Stage 3: Azure AI Foundry Hosted Agents

Hosted agents are containerized agentic AI applications that run on Microsoft's Foundry Agent Service. They represent a paradigm shift from traditional prompt-based agents to fully code-driven, production-ready AI systems.
When to Use Hosted Agents:

✅ Complex agentic workflows - Multi-step reasoning, branching logic, conditional execution
✅ Custom tool integration - External APIs, databases, internal systems
✅ Framework-specific features - LangGraph graphs, multi-agent orchestration
✅ Production scale - Enterprise deployments requiring autoscaling
✅ Auth - Identity and authentication, security and compliance controls
✅ CI/CD integration - Automated testing and deployment pipelines

Why Hosted Agents Matter

Hosted agents bridge the gap between experimental AI prototypes and production systems:

For Developers:

- Full control over agent logic via code
- Use familiar frameworks and tools
- Local testing before deployment
- Version control for agent code

For Enterprises:

- No infrastructure management overhead
- Built-in security and compliance
- Scalable pay-as-you-go pricing
- Integration with existing Azure ecosystem

For AI Systems:

- Complex reasoning patterns beyond prompts
- Stateful conversations with persistence
- Custom tool integration and orchestration
- Multi-agent collaboration

Before you get started with Foundry, deploy a Foundry project from the starter template using the Azure Developer CLI (azd).

    # Initialize a new agent project
    azd init -t https://github.com/Azure-Samples/azd-ai-starter-basic

    # The template automatically provisions:
    # - Foundry resource and project
    # - Azure Container Registry
    # - Application Insights for monitoring
    # - Managed identities and RBAC

    # Deploy
    azd up

The extension significantly reduces the operational burden. What previously required extensive Azure knowledge and infrastructure-as-code expertise now works with a few CLI commands.

Local Development to Production Workflow

A streamlined workflow bridges development and production:

- Develop Locally: Build and test your LangGraph agent on your machine.
Use the Foundry SDK to ensure compatibility with production APIs.
- Validate Locally: Run the agent locally against the Foundry Responses API to verify it works with production authentication and conversation management.
- Containerize: Package your agent in a Docker container with all dependencies.
- Deploy to Staging: Use azd deploy to push to a staging Foundry project. Run automated tests.
- Deploy to Production: Once validated, deploy to production. Foundry handles versioning, so you can maintain multiple agent versions and route traffic accordingly.
- Monitor and Iterate: Use Application Insights to monitor agent performance, identify issues, and plan improvements.

The Azure AI Toolkit for Visual Studio offers a convenient place to test your hosted agent. You can also test it using REST.

Once you can run and test the agent in the local playground, the next step is registering, evaluating, and deploying the agent in Microsoft AI Foundry.

CI/CD with GitHub Actions

This repository includes a GitHub Actions workflow (`.github/workflows/mslearnagent-AutoDeployTrigger.yml`) that automatically builds and deploys the agent to Azure when changes are pushed to the main branch.

1. Set Up Service Principal

    # Create service principal
    az ad sp create-for-rbac \
      --name "github-actions-agent-deploy" \
      --role contributor \
      --scopes /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP

    # Output will include:
    # - appId (AZURE_CLIENT_ID)
    # - tenant (AZURE_TENANT_ID)

2. Configure Federated Credentials

    # For GitHub Actions OIDC
    az ad app federated-credential create \
      --id $APP_ID \
      --parameters '{
        "name": "github-actions-deploy",
        "issuer": "https://token.actions.githubusercontent.com",
        "subject": "repo:YOUR_ORG/YOUR_REPO:ref:refs/heads/main",
        "audiences": ["api://AzureADTokenExchange"]
      }'

3. Set Required Permissions

Critical: The service principal needs the Azure AI User role on the AI Services resource:

    # Get AI Services resource ID
    AI_SERVICES_ID=$(az cognitiveservices account show \
      --name $AI_SERVICES_NAME \
      --resource-group $RESOURCE_GROUP \
      --query id -o tsv)

    # Assign Azure AI User role
    az role assignment create \
      --assignee $SERVICE_PRINCIPAL_ID \
      --role "Azure AI User" \
      --scope $AI_SERVICES_ID

4. Configure GitHub Secrets

Go to GitHub repository → Settings → Secrets and variables → Actions and add the following secrets:

    AZURE_CLIENT_ID=<from-service-principal>
    AZURE_TENANT_ID=<from-service-principal>
    AZURE_SUBSCRIPTION_ID=<your-subscription-id>
    AZURE_AI_PROJECT_ENDPOINT=<your-project-endpoint>
    ACR_NAME=<your-acr-name>
    IMAGE_NAME=calculator-agent
    AGENT_NAME=CalculatorAgent

5. Create GitHub Actions Workflow

Create .github/workflows/deploy-agent.yml:

    name: Deploy Agent to Azure AI Foundry

    on:
      push:
        branches:
          - main
        paths:
          - 'main.py'
          - 'custom_state_converter.py'
          - 'requirements.txt'
          - 'Dockerfile'
      workflow_dispatch:
        inputs:
          version_tag:
            description: 'Version tag (leave empty for auto-increment)'
            required: false
            type: string

    permissions:
      id-token: write
      contents: read

    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - name: Checkout code
            uses: actions/checkout@v4

          - name: Generate version tag
            id: version
            run: |
              if [ -n "${{ github.event.inputs.version_tag }}" ]; then
                echo "VERSION=${{ github.event.inputs.version_tag }}" >> $GITHUB_OUTPUT
              else
                # No tag supplied: generate a timestamp-based version
                VERSION="v$(date +%Y%m%d-%H%M%S)"
                echo "VERSION=$VERSION" >> $GITHUB_OUTPUT
              fi

          - name: Azure Login (OIDC)
            uses: azure/login@v1
            with:
              client-id: ${{ secrets.AZURE_CLIENT_ID }}
              tenant-id: ${{ secrets.AZURE_TENANT_ID }}
              subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

          - name: Set up Python
            uses: actions/setup-python@v4
            with:
              python-version: '3.11'

          - name: Install Azure AI SDK
            run: |
              pip install azure-ai-projects azure-identity

          - name: Build and push Docker image
            run: |
              az acr build \
                --registry ${{ secrets.ACR_NAME }} \
                --image ${{ secrets.IMAGE_NAME }}:${{ steps.version.outputs.VERSION }} \
                --image ${{ secrets.IMAGE_NAME }}:latest \
                --file Dockerfile \
                .
          - name: Register agent version
            env:
              AZURE_AI_PROJECT_ENDPOINT: ${{ secrets.AZURE_AI_PROJECT_ENDPOINT }}
              ACR_NAME: ${{ secrets.ACR_NAME }}
              IMAGE_NAME: ${{ secrets.IMAGE_NAME }}
              AGENT_NAME: ${{ secrets.AGENT_NAME }}
              VERSION: ${{ steps.version.outputs.VERSION }}
            run: |
              python - <<EOF
              import os
              from azure.ai.projects import AIProjectClient
              from azure.identity import DefaultAzureCredential
              from azure.ai.projects.models import ImageBasedHostedAgentDefinition

              project_endpoint = os.environ["AZURE_AI_PROJECT_ENDPOINT"]
              credential = DefaultAzureCredential()

              project_client = AIProjectClient.from_connection_string(
                  credential=credential,
                  conn_str=project_endpoint,
              )

              agent_name = os.environ["AGENT_NAME"]
              version = os.environ["VERSION"]
              image_uri = f"{os.environ['ACR_NAME']}.azurecr.io/{os.environ['IMAGE_NAME']}:{version}"

              agent_definition = ImageBasedHostedAgentDefinition(
                  image=image_uri,
                  cpu=1.0,
                  memory="2Gi",
              )

              agent = project_client.agents.create_or_update(
                  agent_id=agent_name,
                  body=agent_definition
              )

              print(f"Agent version registered: {agent.version}")
              EOF

          - name: Start agent
            run: |
              echo "Agent deployed successfully with version ${{ steps.version.outputs.VERSION }}"

          - name: Deployment summary
            run: |
              echo "### Deployment Summary" >> $GITHUB_STEP_SUMMARY
              echo "- **Agent Name**: ${{ secrets.AGENT_NAME }}" >> $GITHUB_STEP_SUMMARY
              echo "- **Version**: ${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
              echo "- **Image**: ${{ secrets.ACR_NAME }}.azurecr.io/${{ secrets.IMAGE_NAME }}:${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
              echo "- **Status**: Deployed" >> $GITHUB_STEP_SUMMARY

6. Add Container Status Verification

To ensure deployments are truly successful, add a script to verify container startup before marking the pipeline as complete.

Create wait_for_container.py:

    """
    Wait for agent container to be ready.

    This script polls the agent container status until it's running
    successfully or times out.
    Designed for use in CI/CD pipelines to verify deployment.
    """

    import os
    import sys
    import time
    import requests
    from typing import Optional, Dict, Any
    from azure.identity import DefaultAzureCredential


    class ContainerStatusWaiter:
        """Polls agent container status until ready or timeout."""

        def __init__(
            self,
            project_endpoint: str,
            agent_name: str,
            agent_version: str,
            timeout_seconds: int = 600,
            poll_interval: int = 10,
        ):
            """
            Initialize the container status waiter.

            Args:
                project_endpoint: Azure AI Foundry project endpoint
                agent_name: Name of the agent
                agent_version: Version of the agent
                timeout_seconds: Maximum time to wait (default: 10 minutes)
                poll_interval: Seconds between status checks (default: 10s)
            """
            self.project_endpoint = project_endpoint.rstrip("/")
            self.agent_name = agent_name
            self.agent_version = agent_version
            self.timeout_seconds = timeout_seconds
            self.poll_interval = poll_interval
            self.api_version = "2025-11-15-preview"

            # Get Azure AD token
            credential = DefaultAzureCredential()
            token = credential.get_token("https://ml.azure.com/.default")
            self.headers = {
                "Authorization": f"Bearer {token.token}",
                "Content-Type": "application/json",
            }

        def _get_container_url(self) -> str:
            """Build the container status URL."""
            return (
                f"{self.project_endpoint}/agents/{self.agent_name}"
                f"/versions/{self.agent_version}/containers/default"
            )

        def get_container_status(self) -> Optional[Dict[str, Any]]:
            """Get current container status."""
            url = f"{self._get_container_url()}?api-version={self.api_version}"
            try:
                response = requests.get(url, headers=self.headers, timeout=30)
                if response.status_code == 200:
                    return response.json()
                elif response.status_code == 404:
                    return None
                else:
                    print(f"⚠️ Warning: GET container returned {response.status_code}")
                    return None
            except Exception as e:
                print(f"⚠️ Warning: Error getting container status: {e}")
                return None

        def wait_for_container_running(self) -> bool:
            """
            Wait for container to reach running state.

            Returns:
                True if container is running, False if timeout or error
            """
            print(f"\n🔍 Checking container status for {self.agent_name} v{self.agent_version}")
            print(f"⏱️ Timeout: {self.timeout_seconds}s | Poll interval: {self.poll_interval}s")
            print("-" * 70)

            start_time = time.time()
            iteration = 0

            while time.time() - start_time < self.timeout_seconds:
                iteration += 1
                elapsed = int(time.time() - start_time)

                container = self.get_container_status()
                if not container:
                    print(f"[{iteration}] ({elapsed}s) ⏳ Container not found yet, waiting...")
                    time.sleep(self.poll_interval)
                    continue

                # Extract status information
                status = (
                    container.get("status")
                    or container.get("state")
                    or container.get("provisioningState")
                    or "Unknown"
                )

                # Check for replicas information
                replicas = container.get("replicas", {})
                ready_replicas = replicas.get("ready", 0)
                desired_replicas = replicas.get("desired", 0)

                print(f"[{iteration}] ({elapsed}s) 📊 Status: {status}")
                if replicas:
                    print(f"    🔢 Replicas: {ready_replicas}/{desired_replicas} ready")

                # Check if container is running and ready
                if status.lower() in ["running", "succeeded", "ready"]:
                    if desired_replicas == 0 or ready_replicas >= desired_replicas:
                        print("\n" + "=" * 70)
                        print("✅ Container is running and ready!")
                        print("=" * 70)
                        return True
                elif status.lower() in ["failed", "error", "cancelled"]:
                    print("\n" + "=" * 70)
                    print(f"❌ Container failed to start: {status}")
                    print("=" * 70)
                    return False

                time.sleep(self.poll_interval)

            # Timeout reached
            print("\n" + "=" * 70)
            print(f"⏱️ Timeout reached after {self.timeout_seconds}s")
            print("=" * 70)
            return False


    def main():
        """Main entry point for CLI usage."""
        project_endpoint = os.getenv("AZURE_AI_PROJECT_ENDPOINT")
        agent_name = os.getenv("AGENT_NAME")
        agent_version = os.getenv("AGENT_VERSION")
        timeout = int(os.getenv("TIMEOUT_SECONDS", "600"))
        poll_interval = int(os.getenv("POLL_INTERVAL_SECONDS", "10"))

        if not all([project_endpoint, agent_name, agent_version]):
            print("❌ Error: Missing required environment variables")
            sys.exit(1)

        waiter = ContainerStatusWaiter(
            project_endpoint=project_endpoint,
            agent_name=agent_name,
            agent_version=agent_version,
            timeout_seconds=timeout,
            poll_interval=poll_interval,
        )

        success = waiter.wait_for_container_running()
        sys.exit(0 if success else 1)


    if __name__ == "__main__":
        main()

Key Features:

- REST API Polling: Uses Azure AI Foundry REST API to check container status
- Timeout Handling: Configurable timeout (default 10 minutes)
- Progress Tracking: Shows iteration count and elapsed time
- Replica Checking: Verifies all desired replicas are ready
- Clear Output: Emoji-enhanced status messages for easy reading
- Exit Codes: Returns 0 for success, 1 for failure (CI/CD friendly)

Update the workflow to include verification. Add these steps after the agent version is registered:

          # Assumes an earlier step with id "foundry_details" that outputs
          # FOUNDRY_ACCOUNT and PROJECT_NAME (not shown here)
          - name: Start the new agent version
            id: start_agent
            env:
              FOUNDRY_ACCOUNT: ${{ steps.foundry_details.outputs.FOUNDRY_ACCOUNT }}
              PROJECT_NAME: ${{ steps.foundry_details.outputs.PROJECT_NAME }}
              AGENT_NAME: ${{ secrets.AGENT_NAME }}
            run: |
              LATEST_VERSION=$(az cognitiveservices agent show \
                --account-name "$FOUNDRY_ACCOUNT" \
                --project-name "$PROJECT_NAME" \
                --name "$AGENT_NAME" \
                --query "versions.latest.version" -o tsv)
              echo "AGENT_VERSION=$LATEST_VERSION" >> $GITHUB_OUTPUT

              az cognitiveservices agent start \
                --account-name "$FOUNDRY_ACCOUNT" \
                --project-name "$PROJECT_NAME" \
                --name "$AGENT_NAME" \
                --agent-version $LATEST_VERSION

          - name: Wait for container to be ready
            env:
              AZURE_AI_PROJECT_ENDPOINT: ${{ secrets.AZURE_AI_PROJECT_ENDPOINT }}
              AGENT_NAME: ${{ secrets.AGENT_NAME }}
              AGENT_VERSION: ${{ steps.start_agent.outputs.AGENT_VERSION }}
              TIMEOUT_SECONDS: 600
              POLL_INTERVAL_SECONDS: 15
            run: |
              echo "⏳ Waiting for container to be ready..."
              python wait_for_container.py

          - name: Deployment Summary
            if: success()
            run: |
              echo "## Deployment Complete! 🚀" >> $GITHUB_STEP_SUMMARY
              echo "" >> $GITHUB_STEP_SUMMARY
              echo "- **Agent**: ${{ secrets.AGENT_NAME }}" >> $GITHUB_STEP_SUMMARY
              echo "- **Version**: ${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
              echo "- **Status**: ✅ Container running and ready" >> $GITHUB_STEP_SUMMARY
              echo "" >> $GITHUB_STEP_SUMMARY
              echo "### Deployment Timeline" >> $GITHUB_STEP_SUMMARY
              echo "1. ✅ Image built and pushed to ACR" >> $GITHUB_STEP_SUMMARY
              echo "2. ✅ Agent version registered" >> $GITHUB_STEP_SUMMARY
              echo "3. ✅ Container started" >> $GITHUB_STEP_SUMMARY
              echo "4. ✅ Container verified as running" >> $GITHUB_STEP_SUMMARY

          - name: Deployment Failed Summary
            if: failure()
            run: |
              echo "## Deployment Failed ❌" >> $GITHUB_STEP_SUMMARY
              echo "" >> $GITHUB_STEP_SUMMARY
              echo "Please check the logs above for error details." >> $GITHUB_STEP_SUMMARY

Benefits of Container Status Verification:

- Deployment Confidence: Know for certain that the container started successfully
- Early Failure Detection: Catch startup errors before users are affected
- CI/CD Gate: Pipeline only succeeds when container is actually ready
- Debugging Aid: Clear logs show container startup progress
- Timeout Protection: Prevents infinite waits with configurable timeout

REST API Endpoints Used:

    GET {endpoint}/agents/{agent_name}/versions/{agent_version}/containers/default?api-version=2025-11-15-preview

Response includes:

- status or state: Container state (Running, Failed, etc.)
- replicas.ready: Number of ready replicas
- replicas.desired: Target number of replicas
- error: Error details if failed

Container States:

- Running/Ready: Container is operational
- InProgress: Container is starting up
- Failed/Error: Container failed to start
- Stopped: Container was stopped

7. Trigger Deployment

    # Automatic trigger - push to main
    git add .
    git commit -m "Update agent implementation"
    git push origin main

    # Manual trigger - via GitHub UI
    # Go to Actions → Deploy Agent to Azure AI Foundry → Run workflow

The workflow now runs as soon as you check in the implementation code. You can try out the agent in the Foundry UI. Evaluation is now part of the workflow, and you can also visualize the evaluation results in AI Foundry.

Best Practices for Production Agent LLMOps

1. Start with Simple Workflows, Add Complexity Gradually

Don't build a complex multi-agent system on day one. Start with a single agent that does one task well. Once that's stable in production, add additional capabilities:

- Single agent with basic tool calling
- Add memory/state for multi-turn conversations
- Introduce specialized sub-agents for complex tasks
- Implement multi-agent collaboration

This incremental approach reduces risk and enables learning from real usage before investing in advanced features.

2. Instrument Everything from Day One

The worst time to add observability is after you have a production incident. Comprehensive instrumentation should be part of your initial development:

- Log every LLM call with inputs, outputs, token usage
- Track all tool invocations
- Record decision points in agent reasoning
- Capture timing metrics for every operation
- Log errors with full context

After accumulating production data, you'll identify which metrics matter most. But you can't retroactively add logging for incidents that already occurred.

3. Build Evaluation into the Development Process

Don't wait until deployment to evaluate agent quality. Integrate evaluation throughout development:

- Maintain a growing set of test conversations
- Run evaluations on every code change
- Track metrics over time to identify regressions
- Include diverse scenarios: happy path, edge cases, adversarial inputs

Use LLM-as-judge for scalable automated evaluation, supplemented with periodic human review of sample outputs.

4. Embrace Non-Determinism, But Set Boundaries

Agents are inherently non-deterministic, but that doesn't mean anything goes:

- Set acceptable ranges for variability in testing
- Use temperature and sampling controls to manage randomness
- Implement retry logic with exponential backoff
- Add fallback behaviors for when primary approaches fail
- Use assertions to verify critical invariants (e.g., "agent must never perform destructive actions without confirmation")

5. Prioritize Security and Governance from Day One

Security shouldn't be an afterthought:

- Use managed identities and RBAC for all resource access
- Implement least-privilege principles so that agents get only necessary permissions
- Add content filtering for inputs and outputs
- Monitor for prompt injection and jailbreak attempts
- Maintain audit logs for compliance
- Regularly review and update security policies

6. Design for Failure

Your agents will fail. Design systems that degrade gracefully:

- Implement retry logic for transient failures
- Provide clear error messages to users
- Include fallback behaviors (e.g., escalate to human support)
- Never leave users stuck; always provide a path forward
- Log failures with full context for post-incident analysis

7. Balance Automation with Human Oversight

Fully autonomous agents are powerful but risky. Consider human-in-the-loop workflows for high-stakes decisions:

- Draft responses that require approval before sending
- Request confirmation before executing destructive actions
- Escalate ambiguous situations to human operators
- Provide clear audit trails of agent actions

8. Manage Costs Proactively

LLM API costs can escalate quickly at scale:

- Monitor token usage per conversation
- Set per-conversation token limits
- Use caching for repeated queries
- Choose appropriate models (not always the largest)
- Consider local models for suitable use cases
- Alert on cost anomalies that indicate runaway loops

9. Plan for Continuous Learning

Agents should improve over time:

- Collect feedback on agent responses (thumbs up/down)
- Analyze conversations that required escalation
- Identify common failure patterns
- Fine-tune models on production interaction data (with appropriate consent)
- Iterate on prompts based on real usage
- Share learnings across the team

10. Document Everything

Comprehensive documentation is critical as teams scale:

- Agent architecture and design decisions
- Tool configurations and API contracts
- Deployment procedures and runbooks
- Incident response procedures
- Version migration guides
- Evaluation methodologies

Conclusion

You now have a complete, production-ready AI agent deployed to Azure AI Foundry with:

- LangGraph-based agent orchestration
- Tool-calling capabilities
- Multi-turn conversation support
- Containerized deployment
- CI/CD automation
- Evaluation framework
- Multiple client implementations

Key Takeaways

- LangGraph provides flexible agent orchestration with state management
- Azure AI Agent Server SDK simplifies deployment to Azure AI Foundry
- A custom state converter is critical for production deployments with tool calls
- CI/CD automation enables rapid iteration and deployment
- An evaluation framework ensures agent quality and performance

Resources

- Azure AI Foundry Documentation
- LangGraph Documentation
- Azure AI Agent Server SDK
- OpenAI Responses API

Thanks

Manoranjan Rajguru
https://www.linkedin.com/in/manoranjan-rajguru/

https://techcommunity.microsoft.com/t5/microsoft-foundry-blog/from-zero-to-hero-agentops-end-to-end-lifecycle-management-for/ba-p/4484922