TableRAG: Revolutionizing Multi-Hop QA with Hybrid SQL and Text Retrieval

Challenges in Multi-Modal Question Answering

Answering questions that span natural-language text and structured tables is critical for developing advanced AI systems. Many real-world documents, such as business reports and research papers, mix prose with numerical tables. An effective system must reason across both the textual explanations and the tabular data, a task far more complex than traditional text-only question answering.

Limitations of Current Models

Existing language models struggle to interpret tables that have been flattened into plain text, because flattening discards the explicit relationships between rows and columns. The result is errors in computation, aggregation, and multi-fact reasoning, which makes these models unreliable for multi-hop questions that draw on both text and tables.

Previous Approaches and Their Shortcomings

Retrieval-Augmented Generation (RAG) techniques retrieve relevant text segments and pass them to a language model. However, they fall short on compositional or global reasoning over large tables. Methods such as NaiveRAG, which linearizes tables into Markdown, and TableGPT2, which generates Python code for execution, still do not preserve the original table structure needed for accurate interpretation.
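
To make the structural loss concrete, here is a minimal sketch that flattens a small table into Markdown, as a text-only pipeline would, and then loads the same rows into SQLite. The table contents, the `sales` schema, and the `to_markdown` helper are assumptions made up for illustration; they do not come from the paper or from any of the systems named above.

```python
import sqlite3

# Toy table; column names and values are invented for illustration.
header = ["region", "quarter", "revenue"]
rows = [
    ("North", "Q1", 120),
    ("North", "Q2", 135),
    ("South", "Q1", 90),
    ("South", "Q2", 110),
]

def to_markdown(header, rows):
    """Linearize a table into a Markdown string (hypothetical helper)."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(str(v) for v in row) + " |" for row in rows]
    return "\n".join(lines)

flattened = to_markdown(header, rows)
# After flattening, the table is a single string: the numbers are just
# characters, and a model must re-derive which cell belongs to which column.
print(flattened)

# Keeping the table relational lets the database do the arithmetic exactly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
totals = conn.execute(
    "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('North', 255), ('South', 200)]
```

With four rows a model can probably still add the numbers from the Markdown string, but the same question over thousands of rows depends on the explicit column types and grouping that only the relational form retains.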

Introducing TableRAG: A Hybrid Framework

Researchers at Huawei Cloud BU proposed TableRAG, a hybrid system that alternates between text retrieval and structured SQL execution. It preserves table layout and treats each table query as a single reasoning unit, respecting the relational organization of rows and columns. To evaluate the approach, the team also built HeteQA, a benchmark for multi-step reasoning across multiple domains.

How TableRAG Works

TableRAG operates in two phases:

Offline stage: Parses heterogeneous documents, extracting tables into relational databases and textual content into chunked knowledge bases stored separately.
Online stage: Processes each user question through four iterative steps: query decomposition, text retrieval, SQL programming and execution, and intermediate answer generation. At each iteration the system decides whether tabular or textual reasoning is needed and combines the results accordingly. Executing SQL gives precise symbolic computation, which improves numerical and logical accuracy.
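
The two stages can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the `films` table, the text chunks, and the functions `decompose`, `retrieve_text`, and `answer` are all hypothetical. In particular, `decompose` returns a hard-coded plan where a real system would call a language model, and `retrieve_text` uses keyword matching where a real system would use a proper retriever.

```python
import sqlite3

# Offline stage (toy version): tables go into a relational database,
# text goes into a separate list of chunks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, director TEXT, gross_musd REAL)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [
        ("Alpha", "Dana Reyes", 310.0),
        ("Beta", "Kim Osei", 540.0),
        ("Gamma", "Dana Reyes", 120.0),
    ],
)
text_chunks = [
    "Kim Osei was born in Accra and studied film in London.",
    "Dana Reyes began her career directing documentaries.",
]

def decompose(question, history):
    """Stand-in for LLM query decomposition: next sub-query, or None when done."""
    plan = [
        ("table", "SELECT director FROM films ORDER BY gross_musd DESC LIMIT 1"),
        ("text", "born"),
    ]
    return plan[len(history)] if len(history) < len(plan) else None

def retrieve_text(keyword, entity):
    """Stand-in for text retrieval: plain keyword and entity matching."""
    return [c for c in text_chunks if keyword in c and entity in c]

def answer(question, max_steps=5):
    """Online stage: iterate until the plan is exhausted or max_steps is hit."""
    history = []  # one intermediate answer per iteration
    for _ in range(max_steps):
        step = decompose(question, history)
        if step is None:
            break
        kind, payload = step
        if kind == "table":
            # SQL programming and execution: exact, symbolic computation.
            result = conn.execute(payload).fetchone()[0]
        else:
            # Text retrieval conditioned on the previous intermediate answer.
            hits = retrieve_text(payload, history[-1])
            result = hits[0] if hits else ""
        history.append(result)
    return history

steps = answer("Where was the director of the highest-grossing film born?")
print(steps)
# ['Kim Osei', 'Kim Osei was born in Accra and studied film in London.']
```

The first hop is answered by SQL over the table and the second by text retrieval keyed on the first hop's result, which is the alternation the online stage describes. A final answer-generation step would then read the last retrieved chunk to produce "Accra".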

Performance and Benchmarking

TableRAG was evaluated on HybridQA, WikiTableQuestions, and the new HeteQA dataset, which contains 304 complex questions spanning nine domains, 136 unique tables, and over 5,300 Wikipedia-derived entities. Its questions involve filtering, aggregation, grouping, calculation, and sorting. TableRAG outperformed baselines such as NaiveRAG, ReAct, and TableGPT2. The experiments allowed up to five iterative reasoning steps and used backbone models such as Claude-3.5-Sonnet and Qwen-2.5-72B.
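
As a rough illustration of why these operation types suit SQL, the sketch below combines filtering, aggregation, grouping, calculation, and sorting in a single query. The `medals` table and its values are invented for this example and are not taken from HeteQA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE medals (country TEXT, year INTEGER, gold INTEGER, silver INTEGER)"
)
conn.executemany(
    "INSERT INTO medals VALUES (?, ?, ?, ?)",
    [
        ("A", 2016, 10, 5),
        ("A", 2020, 12, 7),
        ("B", 2016, 8, 9),
        ("B", 2020, 6, 4),
        ("C", 2020, 3, 11),
    ],
)

query = """
SELECT country,
       SUM(gold) AS total_gold,             -- aggregation
       SUM(gold + silver) AS total_medals   -- calculation
FROM medals
WHERE year = 2020                           -- filtering
GROUP BY country                            -- grouping
ORDER BY total_medals DESC, country         -- sorting
"""
result = conn.execute(query).fetchall()
print(result)  # [('A', 12, 19), ('C', 3, 14), ('B', 6, 10)]
```

Each clause is executed exactly by the database engine, so the answer does not depend on a language model performing arithmetic over a flattened table.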

Impact and Future Directions

By keeping tables structurally intact and using SQL for structured operations, TableRAG handles multi-hop question answering on mixed-format documents more reliably than approaches that flatten tables. The approach points toward more accurate, scalable, and interpretable document understanding for AI systems that work with heterogeneous data sources.

For more details, check the original paper and GitHub repository.