A new study reveals that AI's portrayal of Neanderthals is rooted in outdated and incorrect information. Over the past 40 years, digital technology has grown into a vast library offering instant answers, but the accuracy of AI's responses remains a challenge. Researchers Matthew Magnani of the University of Maine and Jon Clindaniel of the University of Chicago conducted a study, published in the journal Advances in Archaeological Practice, to assess whether AI reflects modern scientific understanding or perpetuates outdated ideas when depicting ancient life.
Neanderthals, scientifically known as Homo neanderthalensis, have been a subject of debate for over a century. Early scientists depicted them as hunched and primitive, while more recent research highlights their cultural sophistication, social complexity, and physical diversity. This evolution in understanding made Neanderthals an ideal test case for evaluating AI's grasp of changing scientific knowledge.
The researchers employed two popular AI systems: DALL-E 3 for image generation and ChatGPT, using the GPT-3.5 model, for text. They crafted four prompts: two with no scientific-accuracy requirements and two informed by expert knowledge. Each prompt was tested 100 times, resulting in 400 images and 200 one-paragraph descriptions.
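The sampling design described above can be sketched in a few lines of code. This is a minimal illustration of the study's prompt-times-repetition structure only; the prompt wording, prompt names, and the split between image and text prompts are assumptions for illustration, not the authors' actual protocol.

```python
# Illustrative sketch of the study's sampling design.
# Prompt wording and the image/text split are assumptions, not the
# researchers' exact prompts.
from itertools import product

# Two "naive" prompts and two expert-informed prompts (placeholder wording).
IMAGE_PROMPTS = {
    "naive_1": "A Neanderthal",
    "naive_2": "A group of Neanderthals",
    "expert_1": "A Neanderthal, consistent with current archaeological evidence",
    "expert_2": "A Neanderthal group, consistent with current archaeological evidence",
}
TEXT_PROMPTS = {
    "naive": "Describe a Neanderthal in one paragraph",
    "expert": "Describe a Neanderthal in one paragraph, based on current archaeology",
}
RUNS_PER_PROMPT = 100  # each prompt tested 100 times

image_tasks = list(product(IMAGE_PROMPTS, range(RUNS_PER_PROMPT)))
text_tasks = list(product(TEXT_PROMPTS, range(RUNS_PER_PROMPT)))

print(len(image_tasks), len(text_tasks))  # 400 200
```

Enumerating every (prompt, run) pair up front makes the totals auditable before any API calls are made, which matches the article's reported counts of 400 images and 200 paragraphs.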
The findings were striking: AI outputs often relied on outdated scientific concepts. Images frequently depicted Neanderthals with pronounced hunched postures, thick body hair, and ape-like features, echoing 19th-century ideas. Women and children were absent, and scenes predominantly featured muscular adult males. Text descriptions also fell short: half did not align with modern scholarly understanding, and for one prompt more than 80% of paragraphs missed the mark, downplaying Neanderthal cultural diversity and skill.
Both images and text mixed timelines, pairing primitive-looking bodies with technologies Neanderthals are not known to have possessed, such as basketry, ladders, glass, metal tools, and thatched roofs. By comparing AI outputs with decades of archaeological writing, the researchers estimated which era of scholarship each system's outputs most resembled: ChatGPT's text matched early-1960s scholarship, while DALL-E 3's images resembled work from the late 1980s and early 1990s.
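One part of this kind of comparison, flagging anachronistic technologies in generated text, can be screened for automatically. The following is a minimal sketch of such a keyword screen; the term list and sample description are illustrative assumptions, not the study's actual coding scheme.

```python
# Minimal anachronism screen: flag AI-generated descriptions that mention
# technologies Neanderthals are not known to have had.
# The keyword list and sample text are illustrative assumptions.
ANACHRONISMS = ["basketry", "ladder", "glass", "metal tool", "thatched roof"]

def flag_anachronisms(description: str) -> list[str]:
    """Return the anachronistic terms found in a generated description."""
    text = description.lower()
    return [term for term in ANACHRONISMS if term in text]

sample = ("The Neanderthals gather under a thatched roof, "
          "storing food beside a wooden ladder.")
print(flag_anachronisms(sample))  # ['ladder', 'thatched roof']
```

A real coding scheme would need expert-defined categories and manual review; simple substring matching only catches the most obvious cases, but it shows how the gap between outputs and current scholarship can be quantified at scale.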
The study's findings highlight a critical issue: AI's tendency to rely on older, more accessible data rather than current research. This is partly due to the limited availability of recent scientific publications, which often remain behind paywalls. The researchers emphasize the importance of making anthropological datasets and scholarly articles AI-accessible to improve accuracy.
The implications of this study extend beyond archaeology and anthropology. Generative AI influences how we create and trust images, writing, and sound, empowering individuals without formal training to explore history and science. However, it can also inadvertently spread old stereotypes and errors on a massive scale. In archaeology and anthropology, where public understanding relies heavily on visual and textual representations, inaccurate depictions can solidify misconceptions.
The study provides a template for researchers to assess the gap between scholarship and AI-generated content. Magnani suggests that teaching students to approach generative AI cautiously will lead to a more technically literate and critical society. The research underscores the need for careful use of AI tools, especially in education and science communication, and emphasizes the importance of open access research to ensure AI reflects current knowledge.
Additionally, the study introduces a method for testing AI accuracy that can be adapted to other fields, helping ensure that AI supports learning without distorting it. The findings also underscore how intertwined human and Neanderthal history is: new discoveries suggest the two groups interbred 100,000 years earlier than previously thought and shared a far more complex relationship than once imagined.