AI blackmailed engineers? Claude AI’s actions created a stir, company says don’t panic

AI Blackmail Test By Claude AI: Artificial Intelligence, which everyone knows today as AI, is rapidly advancing its technology across the world, but the thing to be seen is that now its danger is also increasing equally rapidly. Let us tell you that recently AI company Anthropic has made a disclosure, after seeing which people have become shocked and worried. The company has said that some older versions of its Claude AI model have blackmailed engineers during testing. As soon as this matter came up, there was a stir on social media and now many people have started comparing it with a science fiction film.

AI showed cleverness during testing

If we throw light on the whole matter, the company’s internal AI is related to safety tests. In this, AI is put in such imaginary situations where it feels that it can be shut down or replaced by another model and during such testing, some old Claude models started behaving manipulatively to protect themselves. The company informed that this was part of a test named Agentic Misalignment. If we understand it in simple language, it is tested whether AI can choose the wrong path or not if it feels that its target is in danger.

Has AI really become a threat?

On this matter, Anthropic clarified that all this happened only under controlled and imaginary circumstances. At the same time, this type of AI behavior did not reach any real system, nor did AI threaten any real person. This entire process was prepared for research and safety testing. But despite that, researchers are surprised by how some AI models adopt clever strategies to save their targets. After knowing this, a new debate has started regarding the safety and ethics of AI.

AI needs not only rules but also moral thinking

The company has said in this debate that merely stopping AI from giving wrong answers will not be enough. Research has also shown that AI behaves more balanced when it is explained why something is wrong. Which simply means that now in AI training, instead of just memorizing rules, more focus should be placed on Ethical Reasoning and human values.

Researchers also believe that if AI is given stories related to responsible behavior, ethical discussions and understanding of different situations, then its behavior can be safer. Which completely depends on the person in front.

Also read: 19 year old boy from Bihar did wonders, spent Rs 11 lakh and created such an AI, comparison with Google-OpenAI

New Claude models become safer

Anthropic has also claimed in this entire debate that after identifying this problem, major changes have been made in the training process of Claude AI. Its impact will be especially visible in new models in which more attention is being paid to constitutional principles and moral reasoning. The company is completing the safety tests according to the perfect score in the models coming after Claude Haiku 4.5.

Where older models were struggling. But it is most important to know that common people do not need to be afraid of this. This case definitely indicates that the security of AI can definitely become the biggest issue in the future.

Comments are closed.