
The AI Agent Workforce Is Coming. Here’s How to Scale It Safely

Haziqa Sajid

Data Scientist and Content Writer

12 min read
Friday, December 12, 2025

We’re moving past the days of one-off chatbots to a time when a full-fledged AI agent workforce is embedded across an organization’s marketing, operations, finance, and customer service departments. Instead of relying on a standalone assistant to simply answer questions, organizations are now deploying networks of agents that actively monitor data signals, trigger workflows, and take action within business systems with little human involvement.

This change creates tension for leaders in data and AI technology. While executives want quick wins and expect these agents to deliver fast results, security, legal, and IT teams warn about compliance exposure, data misuse, and hidden automation.

Using generative AI can drive ROI, but many find that most enterprise pilots fail to move the profit and loss (P&L) statement. Leaders are being pulled between the urgency to act and the need for caution, feeling optimistic yet wary of the risks when deploying AI without strict controls.

Effective governance makes the difference between an AI agent that adds value and one that creates liability. Good governance brings granular permissions, full observability, and the ability to recover from issues. A scalable AI workforce should have these safety measures in place from the start.

In this article, we’ll go over what an AI agent workforce is and the risks of scaling without proper oversight so you can operate agents safely. We’ll also see how to use Domo to set up guardrails and monitor an AI workforce.

Understanding the AI agent workforce

To govern an agent workforce, we must first define what it is.

Intelligent virtual agents are autonomous (or semi-autonomous) systems that reason over data, take actions via tools, and learn or adapt over time within a defined set of guardrails.

Unlike a standard Large Language Model (LLM) interface that only responds to prompts, an agent chains together tools, calls internal systems, and pursues goals such as reconciling a report, segmenting customers, or orchestrating a workflow end-to-end.
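To make that distinction concrete, here is a minimal Python sketch of the chaining behavior. The tool names, the stub planner, and the goal string are all hypothetical rather than any particular vendor's API; a real agent would replace `plan_next_step` with an LLM call.

```python
# Hypothetical sketch: instead of only answering a prompt, the agent plans tool
# calls toward a goal inside a fixed tool allowlist. The planner is a stub here.
from dataclasses import dataclass


@dataclass
class Step:
    tool: str
    args: dict


def query_warehouse(sql: str) -> list[dict]:
    """Stand-in for a real data-warehouse call."""
    return [{"region": "EMEA", "revenue": 1_200_000}]


def post_summary(text: str) -> str:
    """Stand-in for posting a summary to a chat channel."""
    return f"posted: {text}"


TOOLS = {"query_warehouse": query_warehouse, "post_summary": post_summary}


def plan_next_step(goal: str, history: list) -> Step | None:
    """Stub planner; a real agent would ask an LLM to choose the next tool."""
    if not history:
        return Step("query_warehouse", {"sql": "SELECT region, revenue FROM sales"})
    if len(history) == 1:
        return Step("post_summary", {"text": f"{goal}: {history[0]}"})
    return None  # goal reached


def run_agent(goal: str, max_steps: int = 5) -> list:
    history = []
    for _ in range(max_steps):
        step = plan_next_step(goal, history)
        if step is None:
            break
        if step.tool not in TOOLS:  # guardrail: unknown tools are refused
            history.append({"error": f"tool {step.tool!r} not allowed"})
            break
        history.append(TOOLS[step.tool](**step.args))
    return history


print(run_agent("Summarize weekly revenue by region"))
```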

Jake Heaps, marketing operations manager at Domo, shares his thoughts on the distinction between a passive model and an active agent. He says, “A custom GPT pulls from resources, whereas an AI agent can actually go and take action based on that information. AI agents represent the next step forward.”

Enterprises are now assembling multiple specialized AI agents for marketing, finance, support, and analytics to build an AI agent workforce. These agents coordinate with each other and with people, much like a digital version of cross-functional teams.

A scalable artificial intelligence workforce can consist of agents with distinct architectures and permission scopes. For example:

  • The analyst agent: It connects to structured data like SQL databases and data warehouses. An analyst agent’s goal is to provide insight by translating natural-language questions into queries, executing those queries, interpreting the results, and visualizing the data.
    • Key risk: Data privacy and potential misinterpretation of schema.
  • The action/ops agent: This agent stays in the background and executes tasks. It triggers workflows across applications based on logical criteria, such as “If lead score > 80, update Salesforce status to 'Qualified' and alert the account executive via Teams” (see the sketch after this list).
    • Key risk: Potential for runaway loops and unintended consequences in production systems, like flooding the CRM with duplicate records or auto‑advancing deals to the wrong stage.
  • The retrieval/support agent: The retrieval agent goes through unstructured data (PDFs, wikis, support tickets) to find answers. It uses Retrieval-Augmented Generation (RAG) to ground answers in company policy.
    • Key risk: Hallucinations and citing outdated documents.
  • The creative agent: This agent generates content variations based on performance data feeds. It also analyzes which copy performed best last week and iterates on it.
    • Key risk: Brand voice deviation and copyright issues.
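To ground the action/ops example above, here is a hedged sketch of what that lead-qualification rule could look like with two simple guardrails against the runaway-loop risk: an idempotency check and a per-run action cap. The function and field names are invented for illustration, not a real CRM or Teams API.

```python
# Hypothetical action/ops agent rule: "if lead score > 80, mark the lead
# Qualified and alert the owner," bounded by a cap and an idempotent check.
MAX_ACTIONS_PER_RUN = 50


def update_crm_status(lead_id: str, status: str) -> None:
    print(f"CRM: lead {lead_id} -> {status}")  # stand-in for a CRM call


def notify_owner(lead_id: str, owner: str) -> None:
    print(f"Teams: alert {owner} about lead {lead_id}")  # stand-in for a chat alert


def process_leads(leads: list[dict]) -> int:
    actions = 0
    for lead in leads:
        if actions >= MAX_ACTIONS_PER_RUN:  # guardrail: bounded blast radius per run
            break
        if lead["score"] > 80 and lead["status"] != "Qualified":  # idempotent: skip already-qualified
            update_crm_status(lead["id"], "Qualified")
            notify_owner(lead["id"], lead["owner"])
            actions += 1
    return actions


process_leads([
    {"id": "L-1", "score": 92, "status": "New", "owner": "ae.jones"},
    {"id": "L-2", "score": 92, "status": "Qualified", "owner": "ae.jones"},  # skipped (already done)
    {"id": "L-3", "score": 55, "status": "New", "owner": "ae.smith"},        # skipped (below threshold)
])
```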

The impact on business operations

The AI agent workforce is reshaping automation in the workplace, turning periodic, manual processes into continuous, event-driven operations with fewer manual handoffs.

In fact, Wharton reports that 82 percent of enterprise decision-makers now use generative AI weekly, up from just 37 percent in 2023. And three-quarters of leaders already report positive returns on their AI investments while systematically tracking ROI metrics like productivity, throughput, and profitability.  

However, as we move from individual tasks to enterprise-wide systems, the data tells a different story. MIT’s NANDA report indicates that roughly 95 percent of corporate generative AI pilots fail to deliver the anticipated financial impact. Though 80 percent of companies are piloting these systems, only 5 percent successfully scale them.  

This generative AI divide separates organizations that focus on a few high‑value, well‑integrated use cases from those that run many shallow pilots with little governance.  

Thoughtful leaders must acknowledge that while agents offer significant efficiency gains, success requires disciplined design and governance to prevent stalled initiatives and eroded trust. That means facing the hard truth that most AI pilots still fail today, not because of model limitations, but because data, workflows, and guardrails aren’t designed for scalability.

The risk of ungoverned scaling: shadow AI and hallucinations

When organizations scale agents without a safety framework, their risk profile changes across several dimensions.  

  • At the technical level, poorly bounded agents amplify known LLM issues, such as hallucinations, data leakage, and brittle tool use, into operational failures. These failures can include incorrect financial figures in board materials, miscalculated pricing recommendations, or misclassified customers and segments that cascade into misdirected campaigns.
    • For instance, Deloitte was required to repay the Australian government part of its fee after using generative AI to create a $440,000 report on the welfare system. The report contained numerous errors, false references, and incorrect footnotes.
  • At the organizational level, unclear ownership and inadequate monitoring make it hard to know who is accountable when an agent acts outside expectations or when a minor error propagates into an incident.

A key driver of these failures is data governance. Agents that operate on incomplete, outdated, or low-quality knowledge bases are more likely to hallucinate or produce inconsistent results, especially when teams over-trust outputs.  

“You have to make sure that you have a good knowledge base,” Heaps warns, “or else you’re going to start getting a lot of hallucinations.”

Shadow AI sprawl compounds the problem; the MIT study identified it as one of the most significant risks, precisely because it is often what IT doesn’t see.

When individual teams create their own agents, connect sensitive data, or automate actions via unofficial channels, organizations lose visibility over critical decisions and system control.

The result is a patchwork of ungoverned agents acting on production data, exactly the opposite of a safe, scalable artificial intelligence workforce.

Designing a secure and scalable AI agent workforce

Mitigating these risks requires treating safety and governance as first-class design constraints for every agent, on par with security protocols.

A practical way to think about secure design is in three layers: instructions, data, and maintenance, each of which can either reduce or amplify risk depending on how it’s implemented.

Governance as the operating system (instructions)

An agent’s system instructions function like its operating system, defining its role, capabilities, boundaries, and escalation rules. Well-structured instructions specify what the agent is allowed to do, which data and tools it can access, how it should handle ambiguity, and when it must hand off to a human.  

Heaps draws a clear line between foundational instructions and specific requests, stating, “I like to think about system instructions as my operating system. My prompt is what I actually want to achieve in the moment.”

For example, a finance agent’s system instructions might explicitly prohibit posting entries directly to the general ledger. Instead, they restrict the process to drafting proposed entries that require human approval.

Effective design requires separating system instructions (non-negotiable constraints like policies, safety, and brand rules) from task-level prompts (specific requests like “prepare a variance analysis for Q3 revenue”).
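As a minimal sketch of that separation, using the common system/user message convention (the instruction text itself is illustrative, not Domo’s format), the fixed constraints and the in-the-moment request can live in distinct layers:

```python
# Hedged sketch: non-negotiable system instructions stay fixed, while the
# task-level prompt changes with each request.
SYSTEM_INSTRUCTIONS = """
You are a finance assistant.
- You may draft journal entries, but you must NEVER post directly to the general ledger.
- Any proposed entry must be returned for human approval.
- If a request is ambiguous or outside finance, escalate to a human instead of guessing.
""".strip()


def build_messages(task_prompt: str) -> list[dict]:
    """Combine the fixed 'operating system' with the in-the-moment request."""
    return [
        {"role": "system", "content": SYSTEM_INSTRUCTIONS},  # constraints: fixed
        {"role": "user", "content": task_prompt},            # task: varies per request
    ]


messages = build_messages("Prepare a variance analysis for Q3 revenue.")
print(messages)
```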

Data as the fuel and guardrail (hallucination prevention)

Data defines what an agent can know and how trustworthy its outputs can be. Agents grounded in curated, governed enterprise data are far less likely to hallucinate or contradict official metrics.  

“The data that you feed it is going to make or break whether this is a successful use of AI,” Heaps notes. “When we feed it good, custom data, that’s when we’re really going to be able to get [reliable] results from AI.”

Governed data platforms like Snowflake, or well-managed data lakes, ensure agents read from a single source of truth with consistent semantics and access controls. High-performing implementations treat the knowledge layer as a product:

  • Consolidating brand guidelines, templates, and examples into structured repositories for content-oriented agents.
  • Exposing metric definitions, lineage, and historical context via semantic models or data catalogs for analytics agents.  
  • Continuously refreshing and validating knowledge bases so that stale or deprecated information is pruned, reducing the risk of agents using outdated policies, pricing, or product details.

Put simply, design agents to reference specific, versioned knowledge sets and to surface citations in responses. To clarify what constitutes a “critical output,” the agent’s governance layer must enforce authorization thresholds.  

Outputs are considered critical if they trigger actions with high financial, compliance, or irreversible data risks. The system automatically flags these outputs for Human-in-the-Loop (HITL) verification and provides the individual with both the proposed action and the underlying citations to quickly audit the logic and authorize the execution.
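One way to express such an authorization threshold, assuming hypothetical fields for amount, target system, and reversibility, is a small routing check that sends critical outputs to a human along with their citations:

```python
# Hypothetical criticality check: thresholds, system names, and fields are
# assumptions for illustration, not a specific product's schema.
FINANCIAL_LIMIT = 10_000
SENSITIVE_SYSTEMS = {"general_ledger", "payroll", "customer_pii"}


def is_critical(action: dict) -> bool:
    return (
        action.get("amount", 0) > FINANCIAL_LIMIT
        or action.get("target_system") in SENSITIVE_SYSTEMS
        or action.get("irreversible", False)
    )


def route(action: dict) -> dict:
    if is_critical(action):
        # Human-in-the-loop: surface the proposed action plus the citations it rests on.
        return {"status": "needs_approval", "action": action,
                "citations": action.get("citations", [])}
    return {"status": "auto_approved", "action": action}


print(route({"type": "journal_entry", "amount": 42_000,
             "target_system": "general_ledger",
             "citations": ["policy-fin-012 v3"], "irreversible": True}))
```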

Maintenance, monitoring, and memory management

Even well-designed agents drift over time as business conditions, data, and workflows change. Without proactive maintenance, their understanding of the current reality diverges from actual conditions, increasing the risk of subtle, compounding errors.  

Effective AI workforces treat agents as digital team members: they get performance reviews, updated responsibilities, and sometimes retirement. Crucially, the agentic system provides the raw performance logs and error metrics, but the strategic review and updating of responsibilities require human supervision.

Heaps advises auditing this memory regularly, explaining, “I go through systems probably once a month to ensure it's being trained on my most accurate, current information.”

At a minimum, this implies:

  • Regularly reviewing agent behavior logs and key metrics (error rates, escalation frequency, user satisfaction, business impact) to detect regressions or misuse.  
  • Periodically updating system instructions to reflect new policies, products, or risk thresholds, and validating that those updates actually change behavior as intended.  
  • Managing memory explicitly, removing outdated customer, client, or project-specific context that should no longer be used, and ensuring that new priorities (such as revised ICP definitions) are consistently reflected in agent behavior.  

That continuous governance loop turns static pilots into live systems that stay aligned with the organization’s changing objectives and risk appetite.
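As a small illustration of that loop, here is a minimal sketch, assuming agent interactions are logged as simple records, of a monthly review that computes error and escalation rates and prunes memory older than a retention window. The log fields and the 90-day window are assumptions, not any platform’s schema.

```python
# Hypothetical maintenance pass: review metrics from behavior logs and prune
# stale memory so the agent stops relying on outdated context.
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)


def review_metrics(logs: list[dict]) -> dict:
    total = len(logs) or 1
    return {
        "error_rate": sum(entry["error"] for entry in logs) / total,
        "escalation_rate": sum(entry["escalated"] for entry in logs) / total,
    }


def prune_memory(memory: list[dict], now: datetime) -> list[dict]:
    """Drop context entries that have aged past the retention window."""
    return [m for m in memory if now - m["updated_at"] <= RETENTION]


now = datetime.now(timezone.utc)
logs = [{"error": False, "escalated": True}, {"error": True, "escalated": False}]
memory = [
    {"fact": "old ICP definition", "updated_at": now - timedelta(days=200)},
    {"fact": "current pricing tiers", "updated_at": now - timedelta(days=10)},
]

print(review_metrics(logs))
print(prune_memory(memory, now))
```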

Cost efficiency and resource allocation: Humans + Agents

Once the safety layer is in place, the discussion can move to ROI. Many people mistakenly believe that AI agents replace headcount, but a more accurate and sustainable perspective is to see them as augmenting existing teams.

Agents take on repetitive, structured work, so humans can focus on judgment tasks like interpreting results, negotiating trade-offs, and communicating decisions. For example, a reporting agent might assemble, validate, and narrate a dashboard. An analyst decides which recommendations actually fit the business context, weighs trade-offs for different stakeholders, and presents the options to leadership.

Resource allocation among agents also matters. Overloading one agent with too many tasks often leads to vague instructions, inconsistent behavior, and a higher risk of error. Creating separate agents for individual channels or domains, on the other hand, allows for better control and clearer evaluation.

As Heaps puts it, if you want a custom GPT that “follows my brand voice and follows these specific instructions for Google Ads,” it’s often better to “pull it out and do a separate one rather than putting them all together,” especially when each channel needs deeper, more opinionated guidance.

But this specialization introduces a new layer of complexity. As the number of agents increases, so do coordination and configuration costs, along with computational and maintenance overhead. Human oversight is therefore essential to manage the overall fleet, ensure clean handoffs between agents, and optimize the total cost of ownership (TCO).

How Domo’s Agent Catalyst enables a governed workforce

Most agent platforms start from the model and work outward. They expect you to move or copy data into a new environment, reconfigure permissions, and trust that nothing will drift. That architecture adds latency, creates new attack surfaces, and often results in a parallel, less-governed shadow copy of your core data.  

Domo’s approach with Agent Catalyst is the opposite. It assumes your governed data already has gravity. Instead of moving data to the agent, Domo brings the agent to where your governed data, metrics, and access controls already live.  

Agent Catalyst is built on top of Domo’s AI and data platform and uses the same semantic layer, data permissions, and logging that already underpin dashboards and workflows. That means every agent you create inherits the same governance model you use for Business Intelligence (BI) and automation, rather than inventing a new one just for AI.  

Inherited permissions: stopping leaks by design

Agents execute in the context of Domo’s existing role-based and object-level security permissions, so they can only see and act on data that a given individual or group is allowed to access.

If a sales manager builds an agent to analyze pipeline health, that agent automatically operates with the manager’s effective permissions. It can’t see around those constraints to access executive compensation data, HR tables, or restricted financial metrics.  

The permission inheritance model reduces the risk of data leakage scenarios in which an AI layer accidentally stitches together information from data sets that were never meant to be joined.
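The idea generalizes beyond any one product. Here is a generic sketch (not Domo’s API) of permission inheritance, where the agent holds no credentials of its own and every read is evaluated against the permissions of the person it acts for. Dataset names and the permission table are invented for illustration.

```python
# Hypothetical permission-inheritance check: the agent never gets broader
# access than the user it acts on behalf of.
USER_DATASET_ACCESS = {
    "sales_manager": {"pipeline", "accounts"},
    "cfo": {"pipeline", "accounts", "exec_comp", "hr"},
}


def agent_read(dataset: str, acting_for: str) -> str:
    allowed = USER_DATASET_ACCESS.get(acting_for, set())
    if dataset not in allowed:
        # The agent inherits the user's scope; it cannot "see around" it.
        raise PermissionError(f"{acting_for} may not read {dataset}")
    return f"rows from {dataset} visible to {acting_for}"


print(agent_read("pipeline", acting_for="sales_manager"))   # allowed
# agent_read("exec_comp", acting_for="sales_manager")       # would raise PermissionError
```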

Contextual awareness and guardrails

Agent Catalyst connects agents to Domo’s semantic understanding by using consistent definitions for terms like “revenue” via data sets and metrics, reducing “financial hallucinations.”  

On top of that, the framework enforces guardrails through explicit instructions and tool assignments. Agent builders define system instructions (allowed behaviors and escalation rules) and assign only the tools the agent needs.  

High-risk actions may require human approval or step-up authentication, in line with least-privilege principles and AI security guidance.

Performance visibility and auditability: turning the black box into a glass box

Agent Catalyst also addresses the shadow AI problem through observability. Every interaction is logged and tied back to the user or system context that authorized it, creating a full audit trail of what each agent did, when it did it, and on whose behalf.

Because agents run inside Domo’s platform, you can use the same analytics capabilities you already have to monitor the AI workforce: dashboards that show which agents are used most, which data sets they touch, typical response patterns, and anomalies in behavior or usage. This allows data and IT leaders to:

  • Detect underperforming or misconfigured agents before they cause large-scale issues.
  • Identify high-value agent use cases to invest in and scale across teams.
  • Demonstrate compliance to auditors by showing end-to-end logs of agent activity and access.
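For illustration, here is a hedged sketch (generic Python, not Domo’s logging schema) of the kind of audit trail described above, where every agent action records who it ran for, what it touched, and when, so simple queries can answer dashboard-style questions:

```python
# Hypothetical audit log: each entry ties an action back to the agent, the
# user it acted for, and the data it touched.
from collections import Counter
from datetime import datetime, timezone

audit_log: list[dict] = []


def record(agent: str, user: str, dataset: str, action: str) -> None:
    audit_log.append({
        "agent": agent, "on_behalf_of": user, "dataset": dataset,
        "action": action, "at": datetime.now(timezone.utc).isoformat(),
    })


def busiest_agents(top_n: int = 3) -> list[tuple[str, int]]:
    """One dashboard-style question: which agents run most often?"""
    return Counter(entry["agent"] for entry in audit_log).most_common(top_n)


record("pipeline-analyst", "sales_manager", "pipeline", "query")
record("pipeline-analyst", "sales_manager", "accounts", "query")
print(busiest_agents())
```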

In short, Agent Catalyst turns opaque AI agents into visible, governable digital workers. You can scale an AI agent workforce without creating a parallel, uncontrolled automation layer because they inherit permissions, operate on governed data, and are fully observable.

When you’re ready to start building, you can catch up on the Blue Yeti session: Scaling Safely and Governing AI Agents.

Tags: Automation, AI