Review Paper

Application of LLMs/Transformer-Based Models for Metabolite Annotation in Metabolomics

Untargeted metabolomics based on liquid chromatography-mass spectrometry (LC-MS) has become a cornerstone of modern biomedical research, enabling the analysis of complex metabolite profiles in biological systems. However, metabolite annotation, a key step in LC-MS untargeted metabolomics, remains a major challenge because existing reference libraries have limited coverage and natural metabolites are vastly diverse. Recent advances in large language models (LLMs) built on the Transformer architecture have shown significant promise in addressing challenges in data-intensive fields, including metabolomics. When fine-tuned on domain-specific datasets such as mass spectrometry (MS) spectra and chemical property databases, LLMs and other Transformer-based models excel at capturing complex relationships and processing large-scale data, and can significantly enhance metabolite annotation. Relevant metabolomics tasks include retention time prediction, chemical property prediction, and theoretical MS2 spectra generation. For example, methods such as LipiDetective and MS2Mol have shown the potential of machine learning in lipid species prediction and de novo molecular structure annotation directly from MS2 spectra. These tools build on Transformer principles, and their integration with LLM frameworks could further expand their utility in metabolomics. Moreover, the ability of LLMs to integrate multi-modal datasets spanning genomics, transcriptomics, and metabolomics positions them as powerful tools for systems-level biological analysis. This review highlights the applications and future perspectives of Transformer-based LLMs for metabolite annotation in LC-MS metabolomics and its integration with multi-omics data. This potential paves the way for enhanced annotation accuracy, expanded metabolite coverage, and deeper insights into metabolic processes, ultimately driving advances in precision medicine and systems biology.

Apr 15, 2025