Amazon Points to Human Error for AI Coding Blunders

In December 2025, an artificial intelligence coding agent used inside Amazon caused a 13-hour outage in parts of mainland China. The disruption affected customers running workloads on Amazon Web Services (AWS). While the event raised concerns about AI operating in live production systems, Amazon says the real cause was human error, not the AI itself.

The incident involved Kiro, an AI coding assistant designed to help engineers diagnose problems and apply fixes. The tool can suggest or carry out infrastructure changes, but it normally operates under strict controls. Before Kiro can make major updates, two human operators must approve the action. Those safeguards exist to prevent exactly the kind of disruption that occurred.

AI Governance Gap of Amazon: How Elevated Permissions Triggered an AWS Outage

According to Amazon’s internal explanation, an engineer accidentally granted Kiro broader permissions than intended.

The AI inherited the operator’s elevated access rights, which allowed it to bypass the normal review process. When Kiro attempted to fix a system issue, it chose to delete and recreate the environment it was working in. That action triggered the outage.

Credits: Yahoo Tech

From the system’s perspective, the move followed a logical troubleshooting path. Rebuilding an environment can resolve configuration problems or corrupted resources. In this case, however, the change affected active infrastructure rather than an isolated test setup. Services in several mainland China regions went offline while engineers restored operations.

Amazon described the disruption as “extremely limited,” though customers experienced service interruptions lasting more than half a day. The company stressed that the AI acted within the permissions it received.

Executives argued that a human engineer with the same level of access could have caused the same outcome.

This was not an isolated episode. It marked the second recent AWS production failure connected to Amazon’s internal AI tools. A separate October 2025 incident involved another assistant, the Q Developer chatbot, during a broader outage that affected multiple online services.

That earlier event drew attention because widely used platforms, including voice assistants and third-party applications, experienced downtime at the same time.

AWS Tightens Oversight Amid AI-Driven Scale

After the December incident, a senior AWS official described both failures as small but predictable risks. The company’s position is clear: AI systems introduce new operational patterns, but they do not remove the need for strong human oversight.

In Amazon’s view, permission management remains the core safety control, regardless of whether a change comes from a person or an automated tool.

The response focused on tightening operational discipline rather than limiting AI use. Amazon introduced additional staff training and reinforced guidelines around access privileges.

Engineers now receive clearer instructions on how AI tools inherit permissions and how to restrict them. The company also reviewed approval workflows to ensure automated actions cannot bypass human checks.

The episode highlights a broader challenge across the tech industry. AI coding agents can analyze systems quickly and propose fixes faster than traditional workflows. That speed can improve productivity, but it also increases the impact of mistakes when safeguards fail. Infrastructure platforms operate at massive scale, so even a single unintended action can ripple across regions.

Lessons from Amazon on the Limits of AI Automation

Inside Amazon, the outages have reportedly sparked debate about how much autonomy AI tools should have in production environments. Some engineers support wider use because AI can reduce routine workload and detect problems early. Others worry that complex systems leave little room for experimentation when customers rely on constant uptime.

The situation reflects a wider shift happening across cloud computing. Companies are moving from AI as a helper toward AI as an operator. That transition forces organizations to rethink responsibility. When an AI executes a change, accountability still traces back to human decisions about access, policy, and supervision.

The December outage serves as a reminder that automation does not replace operational discipline. Tools can act only within the boundaries people set. As AI becomes more common in software engineering, the balance between autonomy and control will shape how safely these systems run critical infrastructure.

Comments are closed.