Accuracy of LLMs to retrieve numeric data for meta-analysis in dentistry

Article
Publication date:
2026
Abstract:
Objectives: Evidence-based dentistry relies heavily on systematic reviews and meta-analyses (SRMA), considered the most robust forms of evidence. Still, conducting SRMA is time- and resource-intensive, with high error rates in data extraction. Artificial intelligence (AI) and large language models (LLMs) offer the potential to automate and accelerate SRMA processes such as data extraction. However, assessing the reliability and accuracy of these new AI-based technologies for SRMA is crucial. This study evaluated the accuracy of four LLMs (DeepSeek v3 R1, Claude 3.5 Sonnet, ChatGPT-4o, and Gemini 2.0-flash) in extracting primary numeric outcome data across various dental topics.
Methods: LLMs were queried via APIs using default settings and a SMART-format prompt. Descriptive analysis was conducted at sub-outcome, outcome, and study levels. Errors were classified as hallucinations, missed, or omitted data.
Results: Overall extraction accuracy was exceptionally high at the sub-outcome level, with only 3 hallucinations (from Gemini 2.0-flash). Total errors increased at the outcome level and study level. Gemini 2.0-flash generally performed significantly worse than others (p < 0.01). Claude 3.5 Sonnet and DeepSeek v3 R1 generally exhibited superior accuracy and lower omission rates in full-text extraction compared to Gemini 2.0-flash and ChatGPT-4o.
Conclusions: This first comparative evaluation of multiple LLMs for data extraction in dental research from full-text PDFs highlights their significant potential but also limitations. Performance varied notably between models, with cost not directly correlating with superior performance. While single data point extraction was highly accurate, errors increased at higher aggregation levels. Standardized outcome reporting in studies could benefit future LLM extraction, and we offer a solid benchmark for future performance comparisons.
Clinical Significance: This study demonstrates that LLMs can achieve high accuracy in extracting single numeric outcomes, but omission errors in full-text analyses limit their independent use in SRMA. Improving outcome reporting standards and leveraging accurate, lower-cost models may enhance evidence synthesis efficiency in dentistry and beyond.
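The abstract reports accuracy at three levels (sub-outcome, outcome, study) and notes that errors increase at higher aggregation levels. The following is a minimal sketch of why that can happen, not the authors' analysis code. It assumes that an outcome counts as correct only if every one of its sub-outcomes was extracted correctly, and likewise for a study; that rule, the record layout, the status labels, and all sample values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical extraction results: (study, outcome, sub_outcome, status).
# The status labels are assumptions, loosely modelled on the error
# categories named in the abstract.
records = [
    ("study_A", "pain_score", "mean", "correct"),
    ("study_A", "pain_score", "sd", "correct"),
    ("study_A", "pain_score", "n", "omitted"),
    ("study_A", "healing_time", "mean", "correct"),
    ("study_A", "healing_time", "sd", "correct"),
    ("study_B", "pain_score", "mean", "correct"),
    ("study_B", "pain_score", "sd", "correct"),
    ("study_B", "pain_score", "n", "correct"),
]


def accuracy_by_level(records):
    """Return the fraction of fully correct units at each aggregation level.

    Assumed rule: an outcome (or study) is correct only if all of the
    sub-outcomes belonging to it are correct.
    """
    outcome_ok = defaultdict(lambda: True)
    study_ok = defaultdict(lambda: True)
    n_correct = 0
    for study, outcome, _sub_outcome, status in records:
        ok = status == "correct"
        n_correct += ok
        # A single error marks the whole outcome and the whole study as wrong.
        outcome_ok[(study, outcome)] &= ok
        study_ok[study] &= ok
    return {
        "sub_outcome": n_correct / len(records),
        "outcome": sum(outcome_ok.values()) / len(outcome_ok),
        "study": sum(study_ok.values()) / len(study_ok),
    }


levels = accuracy_by_level(records)
for level, accuracy in levels.items():
    print(f"{level}: {accuracy:.3f}")
```

Under this assumed rule, one omitted value lowers accuracy only slightly at the sub-outcome level but invalidates the entire outcome and study that contain it, which is consistent with the pattern the abstract describes.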
CRIS type:
1.1 Journal article
Keywords:
Artificial intelligence; Data extraction; Dentistry; Large language model; Meta-analysis; Systematic review
Authors:
Caponio, V. C. A.; Lorenzo-Pouso, A. I.; Magalhaes, M.; Ali, A.; Adamo, D.; Cirillo, N.; Lopez-Pintor, R. M.; Musella, G.
University authors:
ADAMO DANIELA
CAPONIO VITO CARLO ALBERTO
Link to full record:
https://iris.unilink.it/handle/20.500.14085/54624
Published in:
JOURNAL OF DENTISTRY
Journal