Media hype these days is all about Artificial Intelligence (AI). ChatGPT, Gemini, Copilot, Claude, and others are names you might already recognize. Along with AI comes a drive to build new data centres all around the country (because the cloud has to live somewhere). It is a new economic bubble reminiscent of the 1990s “dot-com” bubble, as companies pivot to AI (including companies that weren’t tech companies before).
But can we trust AI? It’s not an easy question. Maybe a better question would be, “Is AI ‘reliable’?”
What does it mean to be reliable? When we ask if a co-worker is reliable, we usually mean four things:
- Do they get it right consistently? Not just occasionally, but almost always. In other words, do they finish their work with acceptable quality, on time, 90-95% of the time? We call this Consistency.
- Do they keep a level head through crises, or do they lose their cool or call in sick when things aren’t perfect? We call this Robustness.
- Do they tell us when they’re unsure about something or when a task exceeds their competencies or training? We can call this Calibration.
- When they do mess up, do they manage the risk so that the mistakes are more likely to be fixable than catastrophic? We call this Safety.
AI agents are amazing in many ways. I use them to create presentation graphics, to write blocks of text, and even to generate software or web page code. They are certainly very capable, able to do many things faster, and more of them at once, than I can on my own. If capability is all we care about, then AI is a wonderful tool.
But I’m not sure I’m ready to trust them without reviewing their work. I’m not sure I can trust them to be reliable. So far, the evidence supports my reluctance.
A recent paper by Sayash Kapoor, Arvind Narayanan, and others from Princeton University finds that the current crop of AI agents is growing in capability but doesn’t really measure up to our usual standards of reliability. What failure rate are we prepared to accept to get this capability?
Right now, AI agents are very good at “Augmentation.” They can clean up, annotate, and review our work very well. Asking an AI agent to review a human-drafted contract proposal and flag ambiguities and inconsistencies can easily produce more value than it costs.
Of course, weighing an acceptable fault tolerance against the power of an AI agent isn’t easy. Getting a bit too much salt in a food recipe may not be a critical error; getting too much of a drug in an IV line can be fatal. Food manufacturers might have a higher risk tolerance for AI agents than hospitals do.
AI supporters often minimize these risks with euphemisms. A confidently delivered but fabricated result (often wildly inappropriate, and sometimes humorous) is called a “hallucination.” A more minor failure is referred to as a “glitch.”
This brings us full circle to our expectations of reliability from our human colleagues. If a human is ill or has suffered a personal tragedy, they might tell us that they can’t perform well today (self-reporting their loss of robustness), or we may ask them if they’re OK and tell them to step away from the production line for a while. AI agents often show no signs or symptoms until they actually fail, sometimes catastrophically (a lack of safety).
AI agents are not our friends. They will be an important part of future business; they’re already an increasingly important part of our current business environments. But we need to remember that they are a service and a product invented for our use. We are still responsible for putting up guardrails and for evaluating their work and their risks.