APA

Sahu, G. (2025). LitLLMs, LLMs for Literature Review: Are We There Yet? Perimeter Institute. https://pirsa.org/25040076

MLA

Sahu, Gaurav. LitLLMs, LLMs for Literature Review: Are We There Yet? Perimeter Institute, 8 Apr. 2025, https://pirsa.org/25040076

BibTeX

@misc{ pirsa_PIRSA:25040076,
  doi = {10.48660/25040076},
  url = {https://pirsa.org/25040076},
  author = {Sahu, Gaurav},
  keywords = {},
  language = {en},
  title = {LitLLMs, LLMs for Literature Review: Are We There Yet?},
  publisher = {Perimeter Institute},
  year = {2025},
  month = {apr},
  note = {PIRSA:25040076, see \url{https://pirsa.org}}
}

Abstract

Literature reviews are an essential component of scientific research, but they remain time-intensive and challenging to write, especially given the recent influx of research papers. In this talk, we will explore the zero-shot abilities of recent Large Language Models (LLMs) in assisting with the writing of literature reviews based on an abstract. We will decompose the task into two components: (1) retrieving related works given a query abstract, and (2) writing a literature review based on the retrieved results. We will then analyze how effective LLMs are for both components. For retrieval, we will discuss a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper and then retrieves potentially relevant papers by querying an external knowledge base. Additionally, we will study a prompting-based re-ranking mechanism with attribution and show that re-ranking doubles the normalized recall compared to naive search methods, while providing insights into the LLM's decision-making process. We will then discuss a two-step generation phase that first outlines a plan for the review and then executes the steps in that plan to generate the actual review. To evaluate different LLM-based literature review methods, we create test sets from arXiv papers using a protocol designed for rolling use with newly released LLMs, which avoids test set contamination in zero-shot evaluations. We will also see a quick demo of LitLLM in action towards the end.
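
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the LitLLM implementation: the function names (`retrieve_and_rerank`, `plan_then_write`), the `Paper` record shape, and the four callables (`extract_keywords`, `search`, `score`, and the `make_plan`/`write_step` pair) are hypothetical placeholders for whatever LLM prompts and external knowledge base one would actually use. The sketch only shows how the two retrieval steps, the re-ranking with attribution, and the plan-then-write generation could fit together.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical paper record, e.g. {"title": ..., "abstract": ...}
Paper = Dict[str, str]


def retrieve_and_rerank(
    query_abstract: str,
    extract_keywords: Callable[[str], List[str]],      # assumed LLM call
    search: Callable[[str], List[Paper]],              # assumed knowledge-base query
    score: Callable[[str, Paper], Tuple[float, str]],  # assumed LLM re-ranker
    top_k: int = 5,
) -> List[Tuple[Paper, str]]:
    """Keyword extraction, then search, then re-ranking with a stated reason."""
    # Step 1: ask the LLM for keywords that summarize the query abstract.
    keywords = extract_keywords(query_abstract)

    # Step 2: query the knowledge base per keyword, de-duplicating by title.
    candidates: Dict[str, Paper] = {}
    for keyword in keywords:
        for paper in search(keyword):
            candidates.setdefault(paper["title"], paper)

    # Re-rank: each candidate gets a relevance score plus a short reason,
    # which plays the role of the attribution mentioned in the abstract.
    ranked = []
    for paper in candidates.values():
        relevance, reason = score(query_abstract, paper)
        ranked.append((relevance, paper, reason))
    ranked.sort(key=lambda item: item[0], reverse=True)

    return [(paper, reason) for _, paper, reason in ranked[:top_k]]


def plan_then_write(
    query_abstract: str,
    papers: List[Paper],
    make_plan: Callable[[str, List[Paper]], List[str]],  # assumed LLM call
    write_step: Callable[[str, List[Paper]], str],       # assumed LLM call
) -> str:
    """Outline a plan first, then generate the review one plan step at a time."""
    plan = make_plan(query_abstract, papers)
    return " ".join(write_step(step, papers) for step in plan)
```

Passing the LLM and search backends in as callables keeps the sketch independent of any particular model or search API, so each stage can be swapped or tested in isolation.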