The term “agentic AI” is everywhere right now. Every platform vendor, every consultancy, every demo deck uses it. But if you’re running a 50–500 person manufacturing operation—stamping, machining, assembly, process control—the label matters far less than what the system actually does on your floor.
At Kamna, we built a manufacturing intelligence platform using digital twins and LangChain/LangGraph that coordinates multiple AI agents across planning, production, maintenance, and supply chain. That experience taught us exactly what to look for—and what to ignore—when evaluating agentic AI for SME manufacturing.
Here are the four filters we use. They work whether you’re talking to a platform vendor, a systems integrator, or building in-house.
Filter 1: Direct Manufacturing References
This sounds obvious, but most “agentic AI” demos are built for knowledge workers—summarizing emails, writing reports, searching documents. Manufacturing is a different animal. You need systems that understand time-series telemetry, recipe parameters, equipment fault codes, and process deviations.
What to ask for:
- Case studies in predictive maintenance for plants your size—not a Fortune 100 semiconductor fab, but a 200-person shop with real constraints on IT staff and budget.
- Demonstrations of quality control workflows that go beyond dashboards: agents that detect a deviation, cross-reference historical patterns, and recommend (or execute) a parameter adjustment.
- Evidence of scheduling and dispatch optimization that accounts for WIP status, tool availability, and bottleneck detection—not just Gantt chart generation.
- Supply chain coordination that connects to your actual MES/ERP systems, not a hypothetical API.
When we built ViaConnect, our manufacturing intelligence platform, we started with the data layer: InfluxDB for time-series telemetry, PostgreSQL for structured asset data (using DTDL—Digital Twin Definition Language), Neo4j for the knowledge graph that maps relationships between equipment, processes, and documentation, and Typesense for full-text search across SOPs and manuals. Every agent in the system queries real manufacturing data, not a generic vector store.
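To make that concrete, here is a minimal sketch of the kind of telemetry query an agent runs against the time-series layer. It assumes the official @influxdata/influxdb-client package; the bucket, measurement, and tag names are illustrative placeholders, not our actual schema.

```typescript
import { InfluxDB } from "@influxdata/influxdb-client";

// Hypothetical connection details; substitute your own.
const queryApi = new InfluxDB({
  url: "http://localhost:8086",
  token: process.env.INFLUX_TOKEN!,
}).getQueryApi("my-org");

// Last hour of chamber-pressure readings for one tool. Bucket,
// measurement, and tag names are illustrative, not our schema.
const flux = `
  from(bucket: "telemetry")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "process_params")
    |> filter(fn: (r) => r.tool_id == "etch-01" and r._field == "chamber_pressure")
`;

interface Reading { time: string; value: number }

async function fetchChamberPressure(): Promise<Reading[]> {
  const rows = await queryApi.collectRows<{ _time: string; _value: number }>(flux);
  return rows.map((r) => ({ time: r._time, value: r._value }));
}
```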
A concrete example: process deviation detection
One of our earliest use cases was detecting recipe parameter deviations in real time. The agent monitors process parameters against the recipe setpoints, classifies deviations using historical correlation data, and either auto-adjusts within predefined bounds or escalates to an operator. This is a five-agent workflow: signal collection → deviation classification → failure prediction → supervisor review → orchestration. Each agent has a specific role, specific data sources, and specific authority limits.
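Compressed into a TypeScript sketch, the decision logic looks roughly like this. The types, thresholds, and helper functions are illustrative stand-ins, not our production code:

```typescript
// Illustrative types and thresholds, not ViaConnect's implementation.
interface Deviation {
  parameter: string;
  setpoint: number;
  actual: number;
  withinBand: boolean;
}

interface Recipe {
  setpoints: Record<string, number>;
  autoAdjustBand: number; // fractional band, e.g. 0.02 for ±2%
  noiseFloor: number;     // ignore deviations below this fraction
}

// Agent 1: signal collection (stubbed; would query InfluxDB in practice).
declare function collectSignals(toolId: string): Promise<Record<string, number>>;
declare function autoAdjust(toolId: string, parameter: string, target: number): Promise<void>;
declare function escalateToOperator(toolId: string, d: Deviation): Promise<void>;

// Agent 2: classify deviations against recipe setpoints.
function classify(signals: Record<string, number>, recipe: Recipe): Deviation[] {
  return Object.entries(recipe.setpoints).flatMap(([parameter, setpoint]) => {
    const actual = signals[parameter];
    if (actual === undefined) return [];
    const pct = Math.abs(actual - setpoint) / setpoint;
    if (pct < recipe.noiseFloor) return []; // normal process noise
    return [{ parameter, setpoint, actual, withinBand: pct <= recipe.autoAdjustBand }];
  });
}

// Agents 3-5 compressed into one decision point: auto-adjust within
// predefined bounds, otherwise escalate to an operator.
async function handleDeviations(toolId: string, recipe: Recipe): Promise<void> {
  for (const d of classify(await collectSignals(toolId), recipe)) {
    if (d.withinBand) await autoAdjust(toolId, d.parameter, d.setpoint);
    else await escalateToOperator(toolId, d);
  }
}
```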
If a vendor can’t describe their manufacturing workflows at this level of specificity, they’re selling you a chatbot with a manufacturing skin.
Filter 2: Multi-Agent Workflows
A single AI assistant that answers questions about your production data is useful. But it’s not agentic. The defining characteristic of agentic AI is autonomous coordination across multiple specialized agents, each with its own tools, data sources, and decision authority.
What to look for:
- Agent specialization: separate agents for diagnostics, planning, execution, and monitoring—not one monolithic prompt.
- State management: the ability to pass context between agents as a workflow progresses, maintaining a shared understanding of what’s happened and what needs to happen next.
- Dynamic routing: a supervisory layer that decides which agent handles which task based on intent classification, not hard-coded rules.
- Parallel execution: agents that can work simultaneously where dependencies allow, rather than sequential hand-offs.
In our platform, we use LangGraph (the workflow orchestration framework from the LangChain ecosystem) to compile workflow definitions into executable state graphs. Each workflow is defined as a JSON specification with typed nodes—triage, analyst, supervisor, tool, retrieval, answer, and human-in-the-loop gates. The GraphCompiler service takes these definitions and produces a LangGraph StateGraph that manages state transitions, error handling, and context accumulation across agents.
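To give a feel for what the compiler produces, here is a stripped-down three-node graph built directly with the @langchain/langgraph JS API. The state fields and stub agents are illustrative; the real GraphCompiler assembles graphs like this from JSON definitions rather than by hand:

```typescript
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

// Shared state accumulated as the workflow progresses. Field names
// are illustrative, not the actual workflow spec.
const WorkflowState = Annotation.Root({
  findings: Annotation<string[]>({ reducer: (a, b) => a.concat(b), default: () => [] }),
  decision: Annotation<string>,
});

type State = typeof WorkflowState.State;

// Stub agents; in the real system each wraps an LLM call plus tools.
const triage = async (_s: State) => ({ findings: ["triage: maintenance intent"] });
const analyst = async (_s: State) => ({ findings: ["analyst: RF power drift correlates with pressure"] });
const supervisor = async (_s: State) => ({ decision: "create_work_order" });

const graph = new StateGraph(WorkflowState)
  .addNode("triage", triage)
  .addNode("analyst", analyst)
  .addNode("supervisor", supervisor)
  .addEdge(START, "triage")
  .addEdge("triage", "analyst")
  .addEdge("analyst", "supervisor")
  .addEdge("supervisor", END)
  .compile();

// Each step's partial output is merged into shared state by the reducer,
// which is what gives agents an accumulating, shared context.
const result = await graph.invoke({ findings: [] });
console.log(result.decision); // "create_work_order"
```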
Here’s what the architecture looks like in practice:
Our CopilotRouter acts as the central nervous system. When a query or event comes in, it classifies the intent and routes to the appropriate mode: direct LLM response for simple questions, RAG for document retrieval, or a full multi-agent workflow for complex operations (a routing sketch follows the list below). For a predictive maintenance scenario, that means:
- Signal Collector Agent pulls telemetry from InfluxDB—RF power, chamber pressure, temperature, gas flow rates.
- Deviation Classifier Agent compares current readings against baseline models and historical patterns.
- Failure Prediction Analyst runs multivariate cross-correlation analysis (e.g., RF Power vs. Pressure, Temperature vs. Flow) to predict failures 48+ hours in advance.
- Supervisor Agent evaluates the prediction confidence and determines whether to auto-create a work order or escalate.
- Orchestration Agent executes the decision—creating work orders in the CMMS, adjusting throughput recommendations, and notifying maintenance teams.
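The routing sketch promised above: a rough TypeScript outline of how the CopilotRouter's mode decision could look. The heuristic classifier and the helper functions are stand-ins; in practice the classification is an LLM call:

```typescript
// Illustrative intent router in the spirit of CopilotRouter.
type Mode = "direct" | "rag" | "workflow";

function classifyIntent(query: string): Mode {
  // Stub heuristic for the sketch; the real router uses an LLM classifier.
  if (/work order|maintenance|deviation|schedule/i.test(query)) return "workflow";
  if (/manual|sop|procedure|spec/i.test(query)) return "rag";
  return "direct";
}

async function route(query: string): Promise<string> {
  switch (classifyIntent(query)) {
    case "direct":   return answerDirectly(query);        // single LLM call
    case "rag":      return retrieveAndAnswer(query);     // document retrieval
    case "workflow": return runMultiAgentWorkflow(query); // compiled state graph
  }
}

// Stand-in handlers for each mode.
declare function answerDirectly(q: string): Promise<string>;
declare function retrieveAndAnswer(q: string): Promise<string>;
declare function runMultiAgentWorkflow(q: string): Promise<string>;
```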
Each agent has access to different tools via MCP (Model Context Protocol), can query different databases, and operates within defined authority boundaries. That’s what multi-agent coordination looks like. If what you’re being shown is a single LLM with different system prompts, that’s prompt engineering—not agent architecture.
Filter 3: SME-Ready Delivery
Enterprise agentic AI platforms are designed for companies with dedicated AI teams, Kubernetes clusters, and six-figure annual budgets. That’s not you. SME-ready means:
- Cloud deployment with minimal infrastructure: You shouldn’t need to provision GPU clusters or manage container orchestration. Look for platforms that deploy on managed services—Azure Container Apps over AKS, for example—where scaling is automatic and you pay for what you use.
- Modular pricing: Start with one use case (predictive maintenance, say), prove ROI, then expand. Avoid platforms that require you to buy the full stack upfront.
- Low integration overhead: The platform should connect to your existing MES, ERP, and CMMS systems through standard connectors or lightweight APIs. If integration requires a six-month SI engagement, the total cost will kill your business case.
- Local model support: For latency-sensitive or data-sensitive use cases, the ability to run models locally (via Ollama or similar) alongside cloud models gives you flexibility without vendor lock-in.
We designed ViaConnect with this reality in mind. The system runs on a standard Node.js/React stack with PostgreSQL, InfluxDB, and Neo4j—all of which can be deployed as managed services. Model routing supports OpenAI, Anthropic, and Ollama (local), so you can choose the right model for each task based on cost, latency, and data sensitivity. The digital twin layer uses DTDL (an open standard from Azure Digital Twins) for asset modeling, which means your twin definitions are portable—not locked into a proprietary schema.
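As a sketch of what that model routing could look like with LangChain's JS packages (model names and the policy itself are illustrative, not our production routing table):

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { ChatAnthropic } from "@langchain/anthropic";
import { ChatOllama } from "@langchain/ollama";

// Route each task to a model based on data sensitivity and latency.
interface Task { prompt: string; sensitive: boolean; latencyBudgetMs?: number }

function pickModel(task: Task) {
  if (task.sensitive) {
    // Data never leaves the plant: local model via Ollama.
    return new ChatOllama({ model: "llama3.1", baseUrl: "http://localhost:11434" });
  }
  if (task.latencyBudgetMs && task.latencyBudgetMs < 2000) {
    return new ChatOpenAI({ model: "gpt-4o-mini" }); // cheap, fast cloud model
  }
  return new ChatAnthropic({ model: "claude-3-5-sonnet-latest" }); // deeper reasoning
}

async function run(task: Task): Promise<string> {
  const res = await pickModel(task).invoke(task.prompt);
  return typeof res.content === "string" ? res.content : JSON.stringify(res.content);
}
```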
Filter 4: Human-in-the-Loop and Governance
This is the filter that separates production-ready platforms from demos. In manufacturing, an AI agent that autonomously adjusts a recipe parameter, creates a maintenance work order, or changes a production schedule without appropriate oversight is a liability, not an asset.
What to require:
- Escalation gates: explicit points in the workflow where the system pauses and requires human approval before executing a consequential action.
- Auditability: a complete decision trace—what data the agent observed, what reasoning it applied, what action it recommended, who approved it, and when.
- Authority boundaries: configurable limits on what each agent can do autonomously vs. what requires human sign-off. These should be granular (per-agent, per-action-type), not binary (all-or-nothing); a config sketch follows this list.
- Exception handling: a clear mechanism for handling edge cases—out-of-spec conditions, conflicting recommendations, or situations the agent hasn’t encountered before.
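Here is one way such an authority policy could be expressed. The shape is illustrative, not a schema any particular platform ships with:

```typescript
// Illustrative per-agent, per-action-type authority configuration.
type ActionType = "adjust_parameter" | "create_work_order" | "change_schedule";

interface AuthorityPolicy {
  autonomous: ActionType[];       // agent may execute directly
  requiresApproval: ActionType[]; // must pause at a HITL gate
  bounds?: Partial<Record<ActionType, { maxDeltaPct: number }>>; // quantitative limits
}

const policies: Record<string, AuthorityPolicy> = {
  "deviation-classifier": {
    autonomous: ["adjust_parameter"],
    requiresApproval: ["create_work_order"],
    bounds: { adjust_parameter: { maxDeltaPct: 2 } }, // ±2% without sign-off
  },
  "orchestration-agent": {
    autonomous: ["create_work_order"],
    requiresApproval: ["change_schedule"],
  },
};
```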
In our system, we implemented this through HITL (Human-in-the-Loop) gates in the LangGraph workflow. Certain agents are classified as “sensitive”—the Enrichment Agent that writes data back to production systems, for instance. When a workflow reaches a HITL gate involving a sensitive agent, execution pauses. The system presents the proposed action, the supporting data, and the agent’s reasoning to an authorized operator via the approval API. The operator can approve, reject, or modify the action. Every decision is recorded in a decision trace table with full context: trigger type, entity affected, context gathered, action taken, approval status, and exception justification if applicable.
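A minimal sketch of those pause-and-resume mechanics, using LangGraph's interruptBefore option with a checkpointer. Node names, state fields, and the trace record mirror the description above but are illustrative:

```typescript
import { StateGraph, Annotation, START, END, MemorySaver } from "@langchain/langgraph";

const S = Annotation.Root({
  proposedAction: Annotation<string>,
  approved: Annotation<boolean>,
});

const analyst = async () => ({ proposedAction: "write enriched data back to MES" });
const enrichment = async (s: typeof S.State) => {
  if (!s.approved) throw new Error("enrichment reached without approval");
  return {}; // the actual write-back would happen here
};

const graph = new StateGraph(S)
  .addNode("analyst", analyst)
  .addNode("enrichment", enrichment)
  .addEdge(START, "analyst")
  .addEdge("analyst", "enrichment")
  .addEdge("enrichment", END)
  // Pause before the sensitive node; a checkpointer persists the paused run.
  .compile({ checkpointer: new MemorySaver(), interruptBefore: ["enrichment"] });

const cfg = { configurable: { thread_id: "wf-001" } };
await graph.invoke({ approved: false }, cfg); // runs analyst, then pauses

// The operator approves via the approval API; the decision is recorded,
// then the paused workflow resumes from its checkpoint.
await graph.updateState(cfg, { approved: true });
await graph.invoke(null, cfg);

// Decision trace record mirroring the fields described above.
interface DecisionTrace {
  triggerType: string;
  entityAffected: string;
  contextGathered: string[];
  actionTaken: string;
  approvalStatus: "approved" | "rejected" | "modified";
  exceptionJustification?: string;
  decidedBy: string;
  decidedAt: string;
}
```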
This isn’t optional. ISO 9001, IATF 16949, FDA 21 CFR Part 11—your quality management system requires traceability. If the AI platform can’t provide it, it can’t go into production.
How to Run the Evaluation
Here’s the practical playbook we recommend to our clients:
1. Pick one workflow. Not the most complex one—pick the one with the clearest ROI and the most accessible data. Predictive maintenance on a critical bottleneck machine is usually the right starting point.
2. Define the success metric. Unplanned downtime reduction, mean time to detect, first-pass yield improvement—something measurable within 30 days.
3. Run the four filters. For each vendor or platform you’re evaluating, ask them to demonstrate that specific workflow against the four criteria above. If they can’t show you multi-agent coordination on a manufacturing use case with human-in-the-loop governance at an SME price point, move on.
4. Demand a pilot. Two to four weeks, on your data, in your environment. Not a sandbox demo with synthetic data—a working system that touches your actual telemetry, your actual work orders, your actual operators.
The Bottom Line
Agentic AI is real, and it’s genuinely useful for manufacturing SMEs—but only when it’s built on actual manufacturing data models, coordinates specialized agents across real workflows, deploys without enterprise overhead, and maintains the governance your quality system requires.
Stop evaluating the label. Start evaluating the engineering.
Kiran Karunakaran is the founder of Kamna Ventures, an AI Innovation Studio that builds and deploys agentic AI systems for manufacturing and industrial teams. He previously built ViaConnect, a manufacturing intelligence platform using digital twins and LangGraph-based multi-agent orchestration for semiconductor and industrial operations.
