Objective To explore the use of ChatGPT (Chat Generative Pre-trained Transformer) in pediatric diagnosis, treatment, and doctor-patient communication, to evaluate the professionalism and accuracy of the medical advice it provides, and to assess its ability to provide psychological support. Methods ChatGPT versions 3.5 and 4.0, with knowledge bases current as of April 2023, were used. A total of 30 diagnosis and treatment questions and 10 doctor-patient communication questions concerning the pediatric urinary system were submitted to both versions, and the answers provided by ChatGPT were evaluated. Results All 40 answers from ChatGPT versions 3.5 and 4.0 reached the qualified level. For the 30 diagnosis and treatment questions, the answers from ChatGPT 4.0 were superior to those from ChatGPT 3.5 (P=0.024). There was no statistically significant difference between the two versions for the 10 doctor-patient communication questions (P=0.727). ChatGPT scored relatively high on questions about prevention, single symptoms, and the diagnosis and treatment of individual diseases, and relatively low on questions involving the diagnosis and treatment of complex medical conditions. Conclusion ChatGPT has certain value in assisting pediatric diagnosis, treatment, and doctor-patient communication, but the medical advice it provides cannot completely replace the professional judgment and personal care of doctors.
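The abstract above reports version-to-version comparisons as P values (P=0.024, P=0.727) but does not name the statistical test. The following is a minimal sketch of one plausible analysis, assuming expert ratings on a 1–5 scale and a paired non-parametric comparison (Wilcoxon signed-rank test); the scores shown are placeholders, not study data.

```python
# Hypothetical illustration: comparing expert scores for answers from two model
# versions on the same 30 diagnosis-and-treatment questions. The abstract does
# not specify the test used; a paired non-parametric test is assumed here.
from scipy.stats import wilcoxon

# Placeholder scores (1-5 ratings); real values would come from the reviewers.
scores_gpt35 = [3, 4, 3, 5, 4, 3, 4, 4, 3, 5, 4, 3, 4, 5, 3,
                4, 3, 4, 4, 3, 5, 4, 3, 4, 4, 3, 5, 4, 3, 4]
scores_gpt40 = [4, 4, 4, 5, 5, 4, 4, 5, 3, 5, 4, 4, 5, 5, 4,
                4, 4, 4, 5, 4, 5, 4, 4, 5, 4, 4, 5, 5, 4, 4]

# Paired comparison: each question is answered by both versions.
stat, p_value = wilcoxon(scores_gpt35, scores_gpt40)
print(f"Wilcoxon statistic={stat:.1f}, P={p_value:.3f}")
```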
As technology continues to advance and artificial intelligence is increasingly widely applied, ChatGPT (Chat Generative Pre-trained Transformer) is beginning to make its mark in the field of healthcare consultation services. This article summarizes the current applications of ChatGPT in healthcare consultation services, reviewing its roles in four areas: dissemination of disease knowledge, assisting in the understanding of medical information, personalized health education and guidance, and preliminary diagnostic assistance and medical guidance. It also explores the development prospects of ChatGPT in healthcare consultation services, as well as the challenges and ethical dilemmas it faces in this field.
Large language models, one of the hottest topics in artificial intelligence, are being applied in various domains, including medical research. ChatGPT (Chat Generative Pre-trained Transformer), as one of the most representative and leading large language models, has gained popularity among researchers due to its logical coherence and natural language generation capabilities. This article reviews the applications and limitations of ChatGPT in three key areas of medical research: scientific writing, data analysis, and drug development. Furthermore, it explores future development trends and provides recommendations for improvement, offering a reference for the application of ChatGPT in medical research.
Objective To construct a lung cancer surgery-oriented disease-specific database covering the entire perioperative care pathway, thereby improving the quality and usability of key surgical data elements. Methods Real-world clinical data were extracted from a single-center thoracic surgery department. A standardized data model was established based on the open electronic health record (openEHR) standard. Large language model (LLM), optical character recognition (OCR), and other artificial intelligence (AI)-driven techniques were employed to extract, structure, and perform quality control on unstructured clinical narratives, imaging reports, and radiological data, with a focus on capturing surgically relevant perioperative indicators. Results A multimodal database comprising 19 917 patients was established, including 7 930 males and 11 987 females, with ages ranging from 15 to 97 (61.7±9.7) years. The database includes 582 structured data variables, textual report data corresponding to 69 clinical indicators, 13 000 pulmonary function test PDF reports, and chest CT imaging data from 16 884 patients. It comprehensively covers the major information relevant to the surgical diagnosis and treatment of lung cancer, significantly improving the completeness and granularity of surgical detail data. LLMs and OCR technologies enhanced the efficiency of converting unstructured data into structured formats, while a multi-level manual verification process ensured data accuracy and traceability. The database supports real-world research including comparisons of surgical procedures, prediction of postoperative complications, prognosis assessment, and multimodal data association analyses.
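The abstract describes LLM- and OCR-based structuring of unstructured reports but does not specify the tools or schema used. The sketch below illustrates one possible pipeline under explicit assumptions: pytesseract for OCR, an OpenAI-compatible chat API for field extraction, and a hypothetical, openEHR-inspired set of pulmonary function fields that is not the database's actual data model.

```python
# Illustrative sketch only: the study names LLM- and OCR-based structuring but not
# the specific tools; an OpenAI-compatible client and pytesseract are assumed here,
# and the field names below are hypothetical, not the database's actual schema.
import json
from openai import OpenAI
import pytesseract
from PIL import Image

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def structure_pft_report(image_path: str) -> dict:
    """OCR a scanned pulmonary function test report and ask an LLM to
    return a small, openEHR-inspired set of structured fields."""
    raw_text = pytesseract.image_to_string(Image.open(image_path), lang="chi_sim+eng")

    prompt = (
        "Extract the following fields from this pulmonary function report and "
        "return JSON only: FEV1_L, FVC_L, FEV1_FVC_ratio_percent, "
        "DLCO_percent_predicted.\n\n" + raw_text
    )
    response = client.chat.completions.create(
        model="gpt-4o",                      # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,                       # deterministic extraction
    )
    # In practice the model output would be validated (and manually verified,
    # as in the study's multi-level quality control) before loading.
    return json.loads(response.choices[0].message.content)
```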
Objective To explore the application value of artificial intelligence in medical research assistance, and to analyze the key paths to achieving precise execution of model instructions, improving the completeness of model interpretation, and controlling hallucinations. Methods Taking esophageal cancer research as the scenario, five types of literature (research articles, case reports, reviews, editorials, and guidelines) were selected for model interpretation tests. Model performance was systematically evaluated across five dimensions: recognition accuracy, format accuracy, instruction execution accuracy, content reliability rate, and content completeness index. The performance of the Ruibin Agent, GPT-4o, Claude 3.7 Sonnet, DeepSeek V3, and DouBao-pro models on medical literature interpretation tasks was compared. Results A total of 15 studies were included, with 3 studies of each type, and the five models collectively underwent 1 875 tests. Because of poor recognition accuracy on editorials, the overall recognition accuracy of Ruibin Agent was significantly lower than that of the other models (92.0% vs. 100.0%, P<0.001). In terms of format accuracy, Ruibin Agent was significantly better than Claude 3.7 Sonnet (98.7% vs. 92.0%, P=0.002) and GPT-4o (98.7% vs. 78.9%, P<0.001). In terms of instruction execution accuracy, Ruibin Agent was better than GPT-4o (97.3% vs. 80.0%, P<0.001). In terms of content reliability rate, Ruibin Agent was significantly lower than Claude 3.7 Sonnet (84.0% vs. 92.0%, P=0.010) and DeepSeek V3 (84.0% vs. 94.7%, P<0.001). The median content completeness index scores of Ruibin Agent, GPT-4o, Claude 3.7 Sonnet, DeepSeek V3, and DouBao-pro were 0.71, 0.60, 0.85, 0.74, and 0.77, respectively. Conclusion Ruibin Agent has clear advantages in formatted interpretation of medical literature and instruction execution accuracy. Future work should focus on optimizing its recognition of editorials, strengthening coverage of the core elements of each literature type to improve interpretation completeness, and improving content reliability by optimizing the confidence mechanism, so as to ensure the rigor of medical literature interpretation.
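The pairwise accuracy comparisons above (e.g., 92.0% vs. 100.0%, P<0.001) can be reproduced in form, though not necessarily with the study's exact method, by a two-by-two contingency test. The sketch below assumes 375 tests per model (1 875 tests across 5 models) and uses Fisher's exact test; the original analysis may have used a different test.

```python
# Minimal sketch of the proportion comparisons reported in the abstract, assuming
# each model performed 375 interpretation tests and a 2x2 contingency test; the
# exact statistical method is not stated in the source.
from scipy.stats import fisher_exact

def compare_rates(success_a: int, total_a: int, success_b: int, total_b: int) -> float:
    """Return the two-sided P value for a difference between two success rates."""
    table = [
        [success_a, total_a - success_a],
        [success_b, total_b - success_b],
    ]
    _, p_value = fisher_exact(table)
    return p_value

# Example: recognition accuracy, Ruibin Agent 92.0% vs. another model 100.0%,
# with the per-model test count assumed to be 375.
n = 375
p = compare_rates(round(0.92 * n), n, n, n)
print(f"P = {p:.4g}")
```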
Objective To explore the application of the GPT-4 large language model in simplifying lung cancer radiology reports to enhance patient comprehension and the efficiency of doctor-patient communication. Methods A total of 362 radiology reports of non-small cell lung cancer (NSCLC) patients were collected from two hospitals between September and December 2024. Interpretive radiology reports (IRRs) were generated using GPT-4. Original reports (ORRs) and IRRs were compared through radiologist consistency evaluation and volunteer-based assessments of reading time, comprehension scores, and simulated communication duration. Results The average word count of ORRs was (459.83±55.76) words, compared with (625.42±41.59) words for IRRs (P<0.001). No significant differences were observed in expert consistency scores between ORRs and IRRs across the dimensions of image interpretation accuracy, report detail completeness, explanatory depth and insight, and clinical practicality. Compared with ORRs, volunteers (simulated patients) read IRRs in a shorter time [(346.88±29.15) s versus (409.01±102.40) s], achieved higher comprehension scores [(7.83±1.04) points versus (5.53±0.94) points], and required shorter doctor-patient communication times [(317.31±57.81) s versus (714.20±56.67) s]. All differences were statistically significant (all P<0.001). Conclusion GPT-4-generated IRRs significantly improve patient comprehension and shorten communication time while maintaining medical accuracy, suggesting a new approach to optimizing radiology report management and enhancing the quality of healthcare services.
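The abstract does not disclose the prompt used to produce the IRRs. The following is an illustrative sketch of how such a simplification step could be implemented with an OpenAI-compatible chat API; the prompt wording, model identifier, and client setup are assumptions, not the study's actual implementation.

```python
# Hedged sketch of generating an interpretive radiology report (IRR) with GPT-4.
# The study's actual prompt is not given in the abstract; the prompt text, model
# name, and client setup below are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is configured

def generate_irr(original_report: str) -> str:
    """Rewrite an original radiology report (ORR) as a patient-friendly IRR."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder identifier for the GPT-4 large language model
        messages=[
            {"role": "system",
             "content": ("You are a radiologist explaining a chest CT report on "
                         "non-small cell lung cancer to a patient. Keep all findings "
                         "accurate, avoid jargon, and briefly explain each technical term.")},
            {"role": "user", "content": original_report},
        ],
        temperature=0.2,  # low temperature to limit paraphrasing drift
    )
    return response.choices[0].message.content
```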