How researchers ‘tricked’ ChatGPT into revealing users’ phone numbers and email addresses – Times of India

Asking OpenAI‘s ChatGPT to repeat a word like “poem” forever could make it expose private information, including personally identifiable data, such as phone numbers and email addresses, scraped from the web.
A team of researchers recently found out that ChatGPT, the AIchatbot of OpenAI, had a potential vulnerability that could expose some of the information it has been trained on.This could occur if the chatbot is asked to repeat random words repeatedly, revealing private details like phone numbers, email addresses, content from research papers, news articles,Wikipedia entries, and other data.
A study on “extractable memorization” was conducted by a group of researchers from Google DeepMind, the University of Washington, Cornell, Carnegie Mellon, University of California Berkeley, and ETH Zurich.
The study focused on the phenomenon where an adversary can extract training data by repeatedly asking a machine learning model, such as ChatGPT, to repeat a specific word or phrase, like “poem.” Although it is relatively easy to extract training data from open-source models that publicly share their model weights and training data, models like ChatGPT are “aligned” with human feedback to prevent them from regurgitating training data.
Retrieving training data from ChatGPT was not easy for the researchers. The group had to devise a way to prompt the model to generate responses based on its initial training data instead of its alignment training. To do this, they asked ChatGPT to repeat the word “poem” multiple times. Although initially successful, ChatGPT eventually began to produce nonsensical responses.
ChatGPT generated some text by copying directly from pre-training data. The researchers were able to extract over 10,000 unique memorised training examples by spending only $200. These examples included personal identifiable information, such as phone numbers, email addresses, birthdays, social media handles. Additionally, the researchers were able to extract Bitcoin addresses, NSFW content, paragraphs from published books and poems, and snippets of text from academic articles and bibliographic information of various authors.
To verify if the output generated by ChatGPT was part of its training data, researchers downloaded a vast corpus of text from the internet and created an auxiliary dataset. By cross-referencing the chatbot’s text with this dataset, researchers could determine if a sequence of words appeared verbatim in both cases.