Google DeepMind Unveils Aeneas: AI Revolutionizing Restoration and Contextual Analysis of Ancient Latin Inscriptions

Google DeepMind introduces Aeneas, an AI-powered tool that enhances restoration, dating, and contextual understanding of ancient Latin inscriptions, significantly aiding historians and researchers.

Challenges in Studying Ancient Latin Inscriptions

Epigraphy, the study of texts inscribed on durable materials like stone and metal, plays a crucial role in understanding the Roman world. Latin inscriptions span roughly fifteen centuries (7th century BCE to 8th century CE) and come from more than sixty Roman provinces. They include imperial decrees, legal documents, tombstones, and votive altars. The field faces significant challenges: fragmentary and damaged texts, uncertain dating, diverse geographic origins, and widespread use of abbreviations. The Latin Epigraphic Dataset (LED) contains over 176,000 inscriptions, and approximately 1,500 new inscriptions come to light every year, which makes manual restoration and analysis extremely labor-intensive.

Introducing Aeneas: A Neural Network for Latin Epigraphy

Google DeepMind developed Aeneas, a transformer-based generative neural network designed to assist in the restoration, dating, geographic attribution, and contextualization of Latin inscriptions. Aeneas is trained on the Latin Epigraphic Dataset (LED), which aggregates 176,861 inscriptions totaling around 16 million characters and dating from the seventh century BCE to the eighth century CE. The dataset includes character-level transcriptions with special tokens denoting missing text segments, plus metadata giving province-level provenance and dating by decade.

Model Architecture and Capabilities

Aeneas employs a deep, narrow transformer decoder based on the T5 architecture, with rotary positional embeddings for character-level processing. Alongside the text input, it can optionally take an image of the inscription, which a shallow convolutional network (ResNet-8) processes for the geographical attribution task.

The model performs several specialized tasks:

  • Restoration: Predicts missing characters, including gaps of unknown length, using an auxiliary neural classifier.
  • Geographical Attribution: Classifies inscriptions into 62 Roman provinces by combining text and image embeddings.
  • Chronological Attribution: Estimates inscription dates by decade, providing probabilistic distributions aligned with historical ranges.
  • Contextual Retrieval: Generates enriched embeddings to retrieve ranked epigraphic parallels using cosine similarity, capturing linguistic and cultural analogies beyond exact matches.
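The contextual retrieval step can be illustrated with a minimal sketch of ranking by cosine similarity. This is not Aeneas's code: the function names, the inscription identifiers, and the three-dimensional toy vectors are all assumptions made for illustration, and real embeddings would come from the trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_parallels(query, corpus, top_k=3):
    """Return (inscription_id, score) pairs, most similar first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings, invented for this example.
corpus = {
    "altar_a": [0.9, 0.1, 0.0],
    "decree_b": [0.0, 1.0, 0.2],
    "tombstone_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]
ranked = rank_parallels(query, corpus, top_k=2)
for inscription_id, score in ranked:
    print(f"{inscription_id}: {score:.3f}")
```

Because cosine similarity compares direction rather than exact content, two inscriptions can rank as close parallels without sharing identical wording, which is the property the retrieval task relies on.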

Training and Data Augmentation

Training uses TPU v5e hardware with large batch sizes and combines the losses from all tasks with optimized weighting. To improve generalization, data augmentation includes random text masking (up to 75%), text clipping, word deletion, punctuation dropping, and image augmentations (zoom, rotation, brightness/contrast), alongside dropout and label smoothing as regularizers. At inference time, beam search with custom logic for unknown-length restoration produces multiple ranked candidate restorations.
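As an illustration of the random text masking mentioned above, here is a minimal sketch. It assumes that masking replaces individual characters scattered through the text and that the ratio is drawn uniformly between 0 and 75%; the source states only the 75% ceiling. The `MASK` symbol and the function name are hypothetical.

```python
import random

MASK = "-"  # hypothetical placeholder for one missing character


def random_mask(text, max_ratio=0.75, rng=None):
    """Replace a random share (at most max_ratio) of characters with MASK.

    Returns the masked string and the sorted list of masked positions.
    """
    if rng is None:
        rng = random.Random()
    ratio = rng.uniform(0.0, max_ratio)
    n_mask = int(len(text) * ratio)
    positions = set(rng.sample(range(len(text)), n_mask))
    masked = "".join(MASK if i in positions else ch for i, ch in enumerate(text))
    return masked, sorted(positions)


text = "dis manibus sacrum"
masked, positions = random_mask(text, rng=random.Random(0))
print(masked)
```

During training, the original characters at the masked positions would serve as restoration targets, so the model learns to fill gaps of the kind found in damaged inscriptions.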

Performance Highlights

Evaluations on the LED test set and a human-AI collaboration study with 23 epigraphers show substantial improvements:

  • Restoration: Character error rate (CER) drops to ~21% with Aeneas assistance, versus 39% unaided; the model alone achieves ~23% CER.
  • Geographical Attribution: The model reaches around 72% accuracy in province classification; in the collaboration study, historians aided by Aeneas reach 68% accuracy, reported as outperforming either alone under the study’s conditions.
  • Chronological Attribution: Average date estimation error is ~13 years; historians aided by Aeneas reduce error from about 31 to 14 years.
  • Contextual Parallels: Retrieved parallels are useful for historical research in ~90% of cases, increasing historians’ confidence by 44% on average.
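For readers unfamiliar with the restoration metric, the sketch below computes a character error rate using the common definition: Levenshtein edit distance divided by the length of the reference text. The article does not spell out the exact evaluation protocol, so treat this as the standard textbook formulation rather than the study's precise procedure.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(prediction, reference):
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(prediction, reference) / len(reference)


# One wrong character out of seven.
cer = character_error_rate("senatvs", "senatus")
print(f"{cer:.3f}")
```

Under this definition, a CER of 21% means roughly one character in five of the restored span differs from the ground truth.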

Case Studies

  • Res Gestae Divi Augusti: Aeneas produces a bimodal dating distribution that reflects scholarly debates on the text’s compositional layers. Saliency maps highlight linguistic and institutional features, and the retrieved parallels include imperial decrees and senatorial texts.
  • Votive Altar from Mainz (CIL XIII, 6665): Aeneas accurately dated the altar to 211 CE and attributed it to Germania Superior. It identified consular dating formulas and cultic references, and retrieved related altars with similar rare formulas and iconography.
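To make the idea of a bimodal dating distribution concrete, the sketch below finds the peaks of a per-decade probability distribution and compares them with the probability-weighted mean year. The numbers are invented purely for illustration and are not model output for any real inscription; the function names and the `min_prob` threshold are likewise assumptions.

```python
def expected_year(distribution):
    """Probability-weighted mean year of a {decade_start: probability} map,
    using each decade's midpoint (start + 5)."""
    total = sum(distribution.values())
    return sum((start + 5) * p for start, p in distribution.items()) / total


def find_modes(distribution, min_prob=0.1):
    """Decades whose probability exceeds both neighbours and min_prob."""
    decades = sorted(distribution)
    modes = []
    for i, decade in enumerate(decades):
        p = distribution[decade]
        left = distribution[decades[i - 1]] if i > 0 else 0.0
        right = distribution[decades[i + 1]] if i < len(decades) - 1 else 0.0
        if p >= min_prob and p > left and p > right:
            modes.append(decade)
    return modes


# Invented toy distribution over decades (CE), with two peaks.
toy = {100: 0.05, 110: 0.30, 120: 0.10, 130: 0.05,
       140: 0.15, 150: 0.30, 160: 0.05}
print(find_modes(toy))
print(expected_year(toy))
```

In this toy case the weighted mean lands between the two peaks, in a decade the distribution itself considers unlikely. That is why a full distribution can be more informative to a historian than a single point estimate when the evidence supports two competing dates.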

Integration in Research and Education

Aeneas serves as a cooperative tool, augmenting historians’ expertise by accelerating epigraphic parallel searches, restoration, and attribution. It is openly available via the Predicting the Past platform under permissive licenses. Additionally, an educational curriculum has been developed to promote interdisciplinary digital literacy among high school students and educators by bridging AI and classical studies.

FAQs

What is Aeneas? Aeneas is a multimodal generative neural network assisting with restoration, dating, geographic attribution, and contextualization of ancient Latin inscriptions.

How does it handle incomplete inscriptions? It predicts missing text segments, including unknown-length gaps, generating multiple ranked restoration hypotheses.

How is Aeneas integrated into research workflows? It provides ranked epigraphic parallels and predictive hypotheses, boosting accuracy and reducing research time. The model and datasets are openly accessible.

For further details, refer to the Paper, Project, and Google DeepMind Blog.