Why AI Fails to Capture Authentic Historical Language
Recent research shows that AI models like ChatGPT struggle to generate authentic early 20th-century language, with fine-tuning improving style but not fully eliminating modern biases.
Challenges in Mimicking Historical Language with AI
A joint study by researchers from the United States and Canada finds that large language models (LLMs) like ChatGPT struggle to accurately replicate historical idioms and prose without extensive, costly pretraining. This limitation makes projects such as using AI to finish Charles Dickens’s unfinished novel unlikely to succeed with current technology.
Experimental Approaches to Emulating Early 20th Century Prose
The team experimented with several methods for generating text that sounds historically accurate. They began by prompting ChatGPT-4o with early 1900s texts, then fine-tuned a smaller GPT-2-based model on a small set of books from that period. They compared both with GPT-1914, a model trained solely on literature published between 1880 and 1914.
When asked to continue authentic historical passages, ChatGPT-4o often reverted to a modern blogging style, failing to maintain the original idiomatic expression. Conversely, the fine-tuned GPT-2 model better captured the period style but was less accurate in other dimensions.
Evaluation Using Machine and Human Judgments
To assess stylistic accuracy, the researchers used a RoBERTa classifier trained to estimate publication dates. GPT-1914’s outputs clustered closely with early 20th-century styles, while ChatGPT-4o’s outputs skewed towards 21st-century language even when prompted with multiple historical examples.
Fine-tuning GPT-4o-mini on historical texts significantly improved stylistic alignment, as measured by Jensen-Shannon divergence, but the gain was limited to surface style and did not remove deeper anachronisms.
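For readers unfamiliar with the metric, the following is a minimal sketch of how Jensen-Shannon divergence could compare two sets of classifier-predicted publication dates. It is not the study's code: the decade bins, the sample predictions, and the function names are all hypothetical, and the article does not specify exactly which distributions the researchers compared. With base-2 logarithms the score runs from 0 (identical distributions) to 1 (no overlap).

```python
import math
from collections import Counter


def year_distribution(predicted_years, bins):
    """Convert predicted publication years into a probability distribution over bins."""
    counts = Counter(predicted_years)
    total = len(predicted_years)
    return [counts.get(b, 0) / total for b in bins]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms where p is zero contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence between p and q, measured against their midpoint mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical decade bins and hypothetical classifier predictions (assumptions).
bins = [1900, 1910, 2000, 2010]
authentic = year_distribution([1900, 1910, 1910, 1910], bins)
modern_sounding = year_distribution([2000, 2010, 2010, 2010], bins)
period_sounding = year_distribution([1900, 1900, 1910, 1910], bins)

print(jensen_shannon_divergence(authentic, modern_sounding))
print(jensen_shannon_divergence(authentic, period_sounding))
```

A lower score for a model's outputs against authentic passages would indicate that the classifier dates them similarly. It says nothing about whether the content is free of anachronism, which is the limitation the study points to.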
Human evaluators found it challenging to distinguish plausible historical language from modern influences or to assess cultural biases embedded in the text. Fine-tuned GPT-4o-mini was rated most plausible, yet still fell short of authentic period writing.
Inherent Limitations and Anachronism
The researchers conclude that there is no economical shortcut to perfectly emulating historical idiom. Some level of anachronism may be unavoidable because of the inherent tension between authenticity and conversational fluency. AI models trained primarily on contemporary data reflect modern perspectives, and often include disclaimers or other signals that reveal their present-day standpoint.
Implications for AI in Historical Research
While fine-tuning commercial models on historical passages can produce stylistically convincing text at a lower cost, it cannot fully eliminate traces of modern bias. Pretraining on period material avoids anachronism but demands vast resources and results in less fluent output.
The study highlights the fundamental challenge of reconstructing vanished cultural perspectives using AI. There is no true "ground truth" for historical voices, only an interpretation that necessarily blends past and present views. Further research is needed to better balance authenticity with coherence in AI-generated historical language.