Studies of chatbot health advice (CHA) driven by large language models are increasing rapidly, yet their reporting is heterogeneous and often incomplete, which limits the scientific credibility and reproducibility of their findings. To promote the dissemination and application of the newly released Chatbot Assessment Reporting Tool (CHART) statement, this paper provides a systematic interpretation and example-based analysis of the guideline. It examines the 12 main items and 39 sub-items of the CHART checklist item by item and explains the methodological rationale behind each reporting requirement. Particular attention is given to requirements tailored to the characteristics of generative AI, such as the transparent disclosure of prompt engineering, query strategies, and dialogue safety. To bridge the gap between theory and practice, a high-quality published CHA study is used as an exemplar to demonstrate how each reporting item applies in practice. This interpretation aims to serve as a clear and practical handbook for researchers, journal reviewers, and editors, with the goal of fostering standardized, high-quality CHA research and promoting the safe and effective application of AI in healthcare.
The rapidly growing application of large language models (LLMs) in healthcare shows considerable potential, yet it also poses new challenges for the standardization of research reporting. To enhance the transparency and reliability of medical LLM research, an international expert group published the TRIPOD-LLM reporting guideline in Nature Medicine in January 2025. As an extension of the TRIPOD+AI guideline, TRIPOD-LLM provides detailed reporting items tailored to the characteristics of LLMs, covering both general-purpose foundation models (e.g., GPT-4) and domain-specific fine-tuned models (e.g., Med-PaLM 2). It addresses critical aspects such as prompt engineering, inference parameters, evaluation of generated output, and fairness considerations. Notably, the guideline introduces a modular design and a "living guideline" mechanism. This paper provides a systematic, item-by-item interpretation and example-based analysis of the TRIPOD-LLM guideline. It is intended to serve as a clear and practical handbook for researchers in this field, as well as for journal reviewers and editors responsible for assessing the quality of such studies, thereby fostering the high-quality development of medical LLM research in China.