What Can AI Agents Actually Do?
Mar 12, 2026 | 5 min read
TL;DR
- AI agents are not chatbots. They take a goal, break it into steps, and use tools to get it done — without being told every move.
- They can browse websites, read documents, write code, send emails, and coordinate across software.
- Traditional automation (like Zapier) follows fixed rules. Agents can handle situations those rules never planned for.
- They’re not perfect. They make things up, fail on complex tasks, and still need humans in the loop.
- The teams winning with AI agents start narrow, stay specific, and keep a human on the final call.
So What Actually Is an AI Agent?
An AI agent is software that works toward a goal — not just a prompt.
You give it a destination. It figures out the route.
That’s the key difference from a chatbot. When you ask ChatGPT a question, it answers. When an AI agent gets a task, it decides what steps to take, uses tools to take them, and adjusts based on what happens. No hand-holding required.
Craig Taylor, who builds agentic systems for enterprise clients, puts it simply:
You’re giving the agentic agent a goal to achieve. You’re saying, hey, this is where I want to get — and you don’t have to tell it every single step to get there.
That matters because most business work isn’t step-by-step. It’s messy. Judgment calls come up. Data doesn’t fit neatly into a form. An agent can handle that. A traditional automation can’t.
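The goal-then-steps pattern described above can be sketched as a simple loop: plan, act, observe, repeat. This is a minimal illustration, not any vendor's actual implementation; `plan_next_step` and `run_tool` are hypothetical stand-ins for a real LLM call and real tool integrations.

```python
# Minimal sketch of an agent loop: plan -> act -> observe -> repeat.
# `plan_next_step` and `run_tool` are hypothetical stand-ins for a
# real model and real tools (browsing, PDFs, email, etc.).

def plan_next_step(goal, history):
    # A real agent asks an LLM to choose the next action from the goal
    # and everything observed so far. Here: a canned two-step plan.
    if not history:
        return {"tool": "search", "args": {"query": goal}}
    if history[-1]["tool"] == "search":
        return {"tool": "summarize", "args": {"text": history[-1]["result"]}}
    return {"tool": "done", "args": {}}

def run_tool(step):
    # Stand-in tools; real ones would hit websites, parse documents, etc.
    tools = {
        "search": lambda a: f"results for '{a['query']}'",
        "summarize": lambda a: f"summary of {a['text']}",
    }
    return tools[step["tool"]](step["args"])

def run_agent(goal, max_steps=10):
    history = []
    for _ in range(max_steps):          # cap steps so it can't loop forever
        step = plan_next_step(goal, history)
        if step["tool"] == "done":
            break
        step["result"] = run_tool(step)  # act, then feed the result back in
        history.append(step)
    return history

trace = run_agent("find payer coverage for Drug X")
print([s["tool"] for s in trace])  # -> ['search', 'summarize']
```

The point is the shape, not the contents: the agent decides each step from what just happened, rather than following a route written in advance.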
How Is This Different From Automation We Already Use?
Think about how a tool like Zapier works. A new payment comes in → a row gets added to a spreadsheet → a Slack message goes out. That’s powerful. But it’s also locked. If something unexpected happens — a field is blank, the format changed, the logic needs a judgment call — the workflow breaks.
AI agents don’t break the same way. They can reason through unexpected situations.
As Craig describes it:
Automation is executing a plan that’s already been put in before it, whereas an agent kind of makes that plan.
So if you’re moving structured data between systems on a schedule, traditional automation is still the right tool — fast, cheap, reliable. But if the work involves unstructured information — emails, PDFs, research, open-ended decisions — that’s where agents start to shine.
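The brittleness of fixed-rule automation is easy to see in code. This toy pipeline (hypothetical stand-ins, not Zapier's actual internals) hard-codes the payment-to-Slack flow; one blank field and the whole workflow stops.

```python
# Sketch of why fixed-rule automation is brittle: every step assumes
# the data looks exactly as expected. These are hypothetical stand-ins
# for "add a spreadsheet row" and "post to Slack".

def fixed_pipeline(payment):
    # Step 1: build the spreadsheet row. Raises KeyError if a field is missing.
    row = [payment["customer"], payment["amount"]]
    # Step 2: format the Slack message.
    return f"New payment: {row[0]} paid ${row[1]}"

print(fixed_pipeline({"customer": "Acme", "amount": 120}))

# A record with a missing field breaks the whole workflow:
try:
    fixed_pipeline({"amount": 120})
except KeyError as err:
    print("workflow broke on missing field:", err)
```

An agent given the same record could notice the gap, look the customer up, or flag it for review, because it plans around the data it actually sees.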
Wondering if AI agents are the right fit for your team? Let’s talk →
What Can They Actually Do? (Real Examples)
Here’s what’s working in production today, not in demos:
Reading and Summarizing Documents
Agents can open PDFs, scan hundreds of pages, and pull out exactly what you’re looking for. Craig’s team built an agent for a pharma client that opens healthcare payer websites, parses insurance coverage PDFs for specific drugs, identifies competitor references, and compiles the findings — all without a human touching it. His estimate for the monthly cost of running it? Around $150.
Customer Support Triage
Klarna’s AI agent handled 2.3 million conversations in its first month — equivalent to 700 full-time employees — cutting resolution time from 11 minutes to under 2 minutes. Reddit cut average response times by 84% using Salesforce’s Agentforce. These aren’t bots reading from a script. They’re agents looking up account data, understanding context, and resolving tickets on their own.
Writing and Reviewing Legal Documents
Law firms are deploying agents that review contracts, flag unusual clauses, and surface relevant case history. Harvey AI — used by 42% of the top 100 U.S. law firms — reports that lawyers save an average of 118 hours per year on routine document work.
Writing and Debugging Code
GitHub Copilot’s agent can take a bug report, write a fix, run the tests, and open a pull request — without a developer doing any of it manually. Developers using it complete tasks 55% faster on average. The catch: developers accept only about 30% of what the agent suggests. The human still decides what ships.
Finance and Operations
Ramp’s AI agents for finance teams caught 15 times more policy violations than human reviewers and processed invoices with 7 times fewer clicks. JPMorgan automated 360,000 staff hours with AI across its operations. In marketing, JPMorgan’s AI-generated ad copy lifted click-through rates by up to 450% compared to human-written versions.
What Do Multiple Agents Look Like Working Together?
You don’t have to stop at one.
Teams are now deploying groups of specialized agents — each with a specific job — that hand work off to each other. One agent checks for fraud. Another verifies the merchant. A third flags geographic anomalies. Together, they do something no single agent could do alone.
Craig sees this as one of the clearest advantages over hiring:
You can have six AI agents running at once doing content operations and they’re not gonna step on each other’s toes. They’re not gonna edit each other’s files. They’re just going to do the thing they need to do and move on.
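The fraud example above can be sketched as a handoff chain. Each "agent" here is a plain function standing in for an LLM-backed checker (hypothetical names and thresholds); a real system would run them against live transaction data.

```python
# Sketch of specialized agents handing work off in sequence.
# Each function does one narrow job and passes the transaction along.

def fraud_check(txn):
    txn.setdefault("flags", [])
    if txn["amount"] > 10_000:               # toy threshold
        txn["flags"].append("possible fraud")
    return txn

def merchant_check(txn):
    known = {"Acme Corp", "Globex"}           # toy merchant registry
    if txn["merchant"] not in known:
        txn["flags"].append("unknown merchant")
    return txn

def geo_check(txn):
    if txn["country"] != txn["card_country"]:
        txn["flags"].append("geo anomaly")
    return txn

def review(txn, pipeline=(fraud_check, merchant_check, geo_check)):
    # Each agent only touches its own concern, so they never
    # step on each other's work.
    for agent in pipeline:
        txn = agent(txn)
    return txn["flags"]

flags = review({"amount": 25_000, "merchant": "Shady LLC",
                "country": "US", "card_country": "FR"})
print(flags)  # -> ['possible fraud', 'unknown merchant', 'geo anomaly']
```

Because each checker owns a single concern, you can add, remove, or upgrade one without retraining or rewriting the others.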
Want to see how this fits into a bigger operations strategy? Read The Rise of Agentic Operations →
What Can’t They Do?
Here’s where honesty matters.
AI agents still hallucinate. They make up information that sounds true. The best models do it less than 1% of the time on controlled benchmarks — but on open-ended tasks, the error rates climb quickly. OpenAI’s o3 model hallucinated 33% of the time on factual questions about people.
Craig flagged this directly:
An AI will do some hallucination, where it’ll kind of give you false results sometimes, or it’s trying so hard to get to that goal — because it knows that’s what we want — that it’ll make things up.
Complex, multi-step workflows are also harder than they look. If an agent is 85% accurate on each step, a 10-step task succeeds only about 20% of the time (0.85¹⁰ ≈ 0.2). Errors compound fast. Gartner predicts more than 40% of agentic AI projects will be canceled by 2027 — not because the technology is fake, but because companies overshot what it could handle and skipped the human oversight layer.
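The compounding-error arithmetic is worth checking for yourself: with per-step accuracy p over n independent steps, end-to-end success is roughly p to the power n.

```python
# Per-step accuracy p over n independent steps compounds to p**n.

def chain_success(p, n):
    return p ** n

print(round(chain_success(0.85, 10), 2))  # -> 0.2  (about 20%)
print(round(chain_success(0.99, 10), 2))  # -> 0.9  (even 99% per step leaks)
```

That gap between 85% and 99% per step is why narrow, short workflows succeed where sprawling ones stall.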
So Where Should a Human Stay in the Loop?
This is the question Craig hears most, and his answer is practical:
I would like to do it before it actually publishes information. If it wrote a new blog, or if it changed an image, or whatever it did — let’s have a human check it before it goes out.
The same principle extends further: before it deletes a record, before it sends a mass email, before it executes something irreversible. Agents are fast and tireless — but the judgment call at the end still belongs to a person.
The Honest ROI Picture
Workers using AI agents are about 33% more productive during the hours they’re using them. Microsoft 365 Copilot delivered 116% ROI over three years in a Forrester study. Support agents handle 14% more tickets per hour.
But 70–85% of AI projects still fail to deliver expected value. The gap between “we tried AI” and “we got results from AI” usually comes down to one thing: scope.
Teams that win start with a single, specific, bounded workflow — not “automate our entire marketing operation.” Something like: “Every time a new lead comes in, research the company, check if they fit our ICP, and draft a personalized first email.” That’s specific enough to build, test, and measure.
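That bounded lead workflow is concrete enough to sketch end to end. The helpers here (`research_company`, `fits_icp`, `draft_email`) are hypothetical stand-ins for LLM and data-enrichment calls; the structure is the point: one trigger, three steps, one reviewable output.

```python
# Sketch of a single bounded workflow: new lead in, drafted email out.
# All helpers are hypothetical stand-ins for real LLM/tool calls.

def research_company(lead):
    # Real version: browse the company site, pull firmographics, etc.
    return {"name": lead["company"],
            "industry": lead.get("industry", "unknown"),
            "size": lead.get("employees", 0)}

def fits_icp(profile, min_size=50, industries=("software", "fintech")):
    # Toy ideal-customer-profile check with made-up thresholds.
    return profile["size"] >= min_size and profile["industry"] in industries

def draft_email(lead, profile):
    return (f"Hi {lead['contact']}, saw that {profile['name']} "
            f"works in {profile['industry']}, thought this might help.")

def handle_new_lead(lead):
    profile = research_company(lead)
    if not fits_icp(profile):
        return None  # out of scope: no draft, no send
    return draft_email(lead, profile)  # a human reviews before sending

draft = handle_new_lead({"company": "Acme", "contact": "Dana",
                         "industry": "fintech", "employees": 120})
```

Note the shape: the workflow ends at a draft, not a sent email, which keeps the human on the final call exactly where the irreversible step happens.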
Craig’s advice is to resist the urge to go big first:
It’s not gonna be a cure-all from day one. It’s gonna be a process. It takes some time to teach the models to get the output that you’re expecting — and to slowly release the controls.
FAQ
What’s the difference between an AI agent and a chatbot?
A chatbot responds to questions. An AI agent takes action — it uses tools, makes decisions across multiple steps, and works toward a goal without needing input at every turn.
Do AI agents work without human oversight?
They can run without it, but they shouldn’t — especially for anything public-facing or irreversible. Most production deployments keep a human in the loop before final actions.
How much does it cost to run an AI agent?
It depends on the task. Craig’s pharma document agent runs for about $150/month. Enterprise-grade systems with compliance requirements can run $3,000–$13,000/month or more after integration costs.
What’s the biggest risk with AI agents?
Hallucination — the agent confidently stating something that isn’t true. The second biggest is scope creep: trying to automate too much at once before the system is reliable.
Which teams benefit most from AI agents?
Every team has something to gain. Legal can automate contract review. Dev teams can automate pull requests. Marketing can automate research and personalization. Finance can automate invoice processing. The right question isn’t which team — it’s which specific task takes the most repetitive human hours.
Ready to find out what AI agents can actually do for your team? Let’s talk →