What if the biggest threat to generative artificial intelligence (AI) is not the harm it could do to humanity, but the danger it poses to itself? According to a recent study, the errors that accumulate when AI models are trained on data generated by other AI systems can lead to a phenomenon called “model collapse.”
Generative AIs are designed to create content, whether text, images, or sound, in response to prompts. To do this, they must be trained on properly contextualized and representative datasets. These datasets are initially produced by humans and are often drawn from the vast amount of information available on the internet, including social media posts, e-commerce reviews, and news articles.
However, the study reveals that when AIs are trained on their own output, rare occurrences in the data can vanish over successive generations. This happens because generative AIs are statistical prediction systems whose probabilities are estimated from real-world data. As an AI system recursively generates content, it tends to overestimate frequent events and underestimate rare ones. As with copies of copies, details of the original document are gradually lost.
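To make the mechanism concrete, consider a minimal sketch (a hypothetical illustration, not code from the study): a toy “model” that estimates event probabilities from a finite sample, then generates the next generation's training data from those estimates. The vocabulary size, sample size, and probabilities below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy world: 3 common events plus a long tail of 97 rare ones.
probs = np.array([0.4, 0.3, 0.2] + [0.1 / 97] * 97)
vocab = len(probs)

for generation in range(1, 11):
    # "Train" the next model: estimate probabilities from a finite
    # sample drawn from the previous model's distribution.
    sample = rng.choice(vocab, size=2_000, p=probs)
    counts = np.bincount(sample, minlength=vocab)
    probs = counts / counts.sum()
    # Any rare event that happens not to be sampled gets probability
    # zero and can never reappear in later generations.
    alive = np.count_nonzero(probs[3:])
    print(f"generation {generation}: {alive}/97 rare events survive")
```

Once a rare event's estimated probability reaches zero, no later generation can ever produce it again; this is the toy analogue of the tails of the real distribution disappearing.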
Dr. Nicolas Papernot, a researcher at the University of Toronto and co-author of the study, offers the analogy of a photocopier. The first copy retains the quality of the original, but subsequent copies lose important details. Similarly, as generative AIs keep producing content based on their own generated data, the original source becomes diluted, leading to model collapse.
The study sheds light on a lesser-known risk of generative AI and underscores the importance of carefully curating and diversifying datasets during training. It highlights the need to balance reliance on existing data with the introduction of new, real-world information to avoid model collapse.
While the debate over the risks and benefits of AI continues, understanding and addressing this self-destructive tendency is crucial to the long-term viability and reliability of AI models.
Frequently Asked Questions (FAQ)
Q: What is generative artificial intelligence (AI)?
A: Generative AI refers to a type of artificial intelligence that is designed to create content, such as images, sound, or text, based on given input.
Q: What is model collapse in generative AI?
A: Model collapse refers to a phenomenon in which generative AI models accumulate errors and degrade because they are trained on data that was itself produced by AI systems.
Q: How are generative AI models trained?
A: Generative AI models are trained using datasets that provide contextualized and representative information. These datasets are often generated by humans and sourced from various online platforms.
Q: Why do generative AI models collapse?
A: Generative AI models can collapse when trained on their own output because they tend to overestimate frequent events and underestimate rare ones, gradually losing important details over successive generations.
Q: How can model collapse be prevented?
A: Careful curation and diversification of training datasets is essential. Introducing new, real-world data alongside existing data can help preserve the viability and reliability of AI models.
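As a hypothetical extension of the sketch above (the 50/50 mixing ratio is an arbitrary assumption, not a recommendation from the study), blending fresh human-generated data into every generation's training set keeps rare events from being permanently sampled out:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth distribution of "real" human data (same toy world).
real_probs = np.array([0.4, 0.3, 0.2] + [0.1 / 97] * 97)
vocab = len(real_probs)
probs = real_probs.copy()

for generation in range(1, 11):
    # Half the training set is the model's own output, half is fresh
    # real-world data; the fresh half can resurrect lost rare events.
    synthetic = rng.choice(vocab, size=1_000, p=probs)
    fresh = rng.choice(vocab, size=1_000, p=real_probs)
    counts = np.bincount(np.concatenate([synthetic, fresh]), minlength=vocab)
    probs = counts / counts.sum()
    alive = np.count_nonzero(probs[3:])
    print(f"generation {generation}: {alive}/97 rare events survive")
```

Because the fresh half is drawn from the true distribution, a rare event that disappears in one generation can reappear in the next, so the tail never collapses to zero.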