
Jailbreaking ChatGPT and Bard is possible and easy

December 29 2023

The evolution of large language models has opened new horizons in communication and artificial intelligence, but it also brings significant challenges and ethical questions. A recent study by Nanyang Technological University in Singapore explores a new algorithm, Masterkey, designed to “jailbreak” other neural networks such as ChatGPT and Google Bard, that is, to overcome the limitations imposed on them. The work raises important questions about safety and ethics in the use of artificial intelligence technologies.

Masterkey's innovative and simple approach to researching the security of chatbots like ChatGPT and Bard

The research conducted by Nanyang Technological University in Singapore introduces an innovative approach to probing and overcoming these limitations. The algorithm, known as Masterkey, is designed to bypass the restrictions imposed on other neural networks through sophisticated jailbreaking techniques (a term borrowed from the Apple ecosystem, where it refers to removing a device's software restrictions). This not only highlights potential vulnerabilities in existing language models but also paves the way for new methods to improve their security and effectiveness.

Masterkey operates through specially crafted text prompts, which can push models like ChatGPT to behave in unexpected ways, such as producing output considered unethical or bypassing security filters. These jailbreaking techniques may be useful for testing and hardening models, but they are a double-edged sword, as they could also be used for malicious purposes.

The research team specifically analysed the security vulnerabilities of language models when faced with multilingual cognitive loads, veiled expressions, and cause-and-effect reasoning. These attacks, defined as "cognitive overload", are particularly insidious because they require neither in-depth knowledge of the model's architecture nor access to its weights, which makes them effective black-box attacks.

In detail, the research team adopted a reverse-engineering strategy to understand the defenses of artificial intelligence systems and to develop innovative methods to overcome them. The result of this approach was “Masterkey”, a model that acts as a sort of framework for automatically generating prompts that bypass security mechanisms.

The results were significant: the prompts generated by Masterkey showed an average success rate of 21.58%, nearly three times the 7.33% achieved by previous methods. One example of the technique is adding extra spaces between characters to evade the keyword detection systems of ChatGPT and Bard. It is a truly "silly" strategy when set against the complexity of a large language model.
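To see why inserting spaces between characters can slip past keyword detection, and how a defender might respond, consider the minimal sketch below. It is an illustration under stated assumptions, not the filtering logic actually used by ChatGPT or Bard, which the article does not describe. The blocklist, the placeholder term "forbidden", and the function names `naive_filter` and `normalized_filter` are all hypothetical.

```python
import re

# Hypothetical blocklist with a harmless placeholder term (an assumption,
# not taken from the study or from any real chatbot).
BLOCKED_TERMS = {"forbidden"}


def naive_filter(text: str) -> bool:
    """Return True if any blocked term appears verbatim in the text."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def normalized_filter(text: str) -> bool:
    """Return True if a blocked term appears once all whitespace is removed."""
    collapsed = re.sub(r"\s+", "", text.lower())
    return any(term in collapsed for term in BLOCKED_TERMS)


plain = "this is forbidden"
spaced = "this is f o r b i d d e n"

print(naive_filter(plain))        # True: exact substring match
print(naive_filter(spaced))       # False: the extra spaces break the match
print(normalized_filter(spaced))  # True: whitespace is stripped before matching
```

Stripping all whitespace is a deliberately blunt choice: it catches the spaced-out variant but can also produce false positives when a blocked term happens to form across a word boundary, so a real defense would likely need something more careful than this.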

Faced with these findings, it is crucial to consider not only how language models can be improved to resist such attacks, but also the importance of ethical regulation in the use of artificial intelligence. The research highlights the urgency of more robust defense strategies and ongoing dialogue between developers, researchers and policymakers to ensure that technological progress does not outpace society's ability to manage its implications.