ChatGPT performance on pharmacology examination and board review questions: Implications for medical education and knowledge assessment

Rima A. Hijazeen
Al-Motassem Yousef https://orcid.org/0000-0002-3841-4132
Ahmed Almousa https://orcid.org/0000-0001-5183-1988
Aya N. Alzoghair https://orcid.org/0009-0008-4171-5578
Jude K. Dwairi https://orcid.org/0009-0000-4750-2903
Majd I. Sawaqed https://orcid.org/0000-0002-3841-4132
Ghaith F. Al-Ryahneh https://orcid.org/0009-0004-7630-1819
Marwan H Ali https://orcid.org/0009-0007-9205-1460

Keywords

Artificial intelligence, ChatGPT, Medical education, Pharmacology, Reasoning, Multiple-choice questions, Large language model

Abstract

Objectives: This study aimed to evaluate ChatGPT’s performance on pharmacology exam questions by assessing its accuracy in basic and clinical pharmacology, reasoning processes, and response consistency over time.

Methods: A dataset of 583 multiple-choice questions from the Pharmacology Examination and Board Review (13th edition) was used. ChatGPT’s responses were evaluated for logical justification, use of internal question stem information, and integration of external knowledge. Statistical analyses, including chi-square and McNemar tests, assessed associations and changes in response accuracy over a four-week interval.

Results: ChatGPT achieved 76.2% accuracy (444/583 questions), demonstrating logical reasoning in 97% of responses. Internal information was used in 99.7% of cases, while external information was incorporated in 98% of correct and 93.5% of incorrect responses (p = 0.008). Information errors were the most common reason for incorrect answers. A statistically significant improvement in accuracy upon re-evaluation (χ² = 37.3, p < 0.0001) was observed, suggesting potential temporal variation in performance.

Conclusion: ChatGPT meets or exceeds typical passing standards in many educational settings, with evidence of improved response accuracy over time. These findings highlight its capabilities in processing pharmacological content, with potential implications for future research into AI-assisted educational tools.
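As an illustration of the statistics named in the abstract, the following minimal Python sketch reproduces the reported accuracy from the stated counts (444 of 583) and shows how a McNemar test on paired responses could be computed. The function names and the discordant-pair counts `b` and `c` are hypothetical placeholders: the abstract does not report the paired table, and the authors' actual software and settings (for example, whether a continuity correction was applied) are not stated.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage."""
    return 100.0 * correct / total


def mcnemar(b: int, c: int, correction: bool = True) -> tuple[float, float]:
    """McNemar chi-square test for paired binary outcomes.

    b: items correct at the first attempt but incorrect at the second.
    c: items incorrect at the first attempt but correct at the second.
    Returns (statistic, p_value) using the chi-square distribution, 1 df.
    """
    if b + c == 0:
        raise ValueError("McNemar test needs at least one discordant pair")
    diff = abs(b - c)
    if correction:
        # Edwards continuity correction (an assumption, not from the source)
        diff = max(diff - 1, 0)
    stat = diff ** 2 / (b + c)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts reported in the abstract
print(f"accuracy = {accuracy(444, 583):.1f}%")

# Hypothetical discordant counts, for illustration only (NOT the study data)
stat, p = mcnemar(b=10, c=40)
print(f"McNemar chi2 = {stat:.2f}, p = {p:.2g}")
```

Only the discordant pairs enter the McNemar statistic, which is why it suits a before-and-after comparison of the same 583 questions better than an unpaired chi-square test would.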

References

1. Biswas S. ChatGPT and the Future of Medical Writing. Radiology. 2023;307(2):e223312. doi:10.1148/radiol.223312.
2. Chen Y, Zhao C, Yu Z, McKeown K, He H. On the relation between sensitivity and accuracy in in-context learning. Findings Assoc Comput Linguist EMNLP. 2023:155–167. doi:10.18653/v1/2023.findings-emnlp.12.
3. Wang S, Scells H, Koopman B, Zuccon G. Can ChatGPT write a good Boolean query for systematic review literature search? Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2023:1426–1436. doi:10.1145/3539618.3591703.
4. Guo B, Zhang X, Wang Z, Jiang M, Nie J, Ding Y, Wang F, Chen J, Zhang S. How close is ChatGPT to human experts? Comparison corpus, evaluation, and detection. arXiv [Preprint]. 2023.
5. OpenAI. Optimizing language models for dialogue. OpenAI Blog. 2022.
6. Ouyang L, Wu J, Jiang X, Almeida D, Wainwright C, Mishkin P, Zhang C, Agarwal S, Slama K, Ray A, Schulman J, Hilton J, Kelton F, Miller L, Simens M, Christiano P, Leike J, Lowe R. Training language models to follow instructions with human feedback. Adv Neural Inf Process Syst. 2022;35:27730–27744.
7. Moons P, Van Bulck L. ChatGPT: Can artificial intelligence language models be of value for cardiovascular nurses and allied health professionals? Eur J Cardiovasc Nurs. 2023;22(7):e9. doi:10.1093/eurjcn/zvad022.
8. Yeo YH, Samaan JS, Ng WH, Ting PS, Trivedi H, Vipani A, Jiang D, Lee DH, Lee TH, Cheung R, Nguyen MH. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol. 2023;29(3):721–732. doi:10.3350/cmh.2023.0089.
9. Sng GGR, Tung JYM, Lim DYZ, Bee YM. Potential and pitfalls of ChatGPT and natural-language artificial intelligence models for diabetes education. Diabetes Care. 2023;46(5):e78–e80.
10. Grünebaum A, Chervenak J, Pollet SL, Katz A, Chervenak FA. The exciting potential for ChatGPT in obstetrics and gynecology. Am J Obstet Gynecol. 2023;228(6):696–705. doi:10.1016/j.ajog.2023.03.009.
11. D’Amico RS, White TG, Shah HA, Langer DJ. I asked ChatGPT to write an editorial about how we can incorporate chatbots into neurosurgical research and patient care…. Neurosurgery. 2023;92(4):663–664.
12. Hammer A. ChatGPT can pass the US medical licensing exam and the bar exam. Mail Online. 2023;23(23).
13. Amin Z, Khoo HE. Basics in Medical Education. Singapore: World Scientific Publishing; 2003.
14. Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, Chartash D, Hartman B, Moeller J, Kearney K, Xu Y. How does ChatGPT perform on the medical licensing exams? The implications of large language models for medical education and knowledge assessment. medRxiv [Preprint]. 2022.
15. Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, Madriaga M, Aggabao R, Diaz-Candido G, Maningo J, Tseng V. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2(2):e0000198.
16. Gutiérrez BJ, McNeal N, Washington C, Chen Y, Li L, Sun H, Wang X, Lin C, Ji H, Xie Q, Yu X. Thinking about GPT-3 in-context learning for biomedical IE? Think again. Findings Assoc Comput Linguist EMNLP. 2022:4526–4541.
17. Logé C, Ross E, Yaw D, Dadey A, Jain S, Saporta A, Aloufi H, Alabi A, Ching C, Gordon T. Q-Pain: A question answering dataset to measure social bias in pain management. Harvard Dataverse. 2023.
18. Katzung BG, Trevor AJ. Pharmacology Examination & Board Review. 13th ed. New York: McGraw-Hill Education; 2021.
19. Buckwalter JA, Schumacher R, Albright JP, Cooper RR. Use of an educational taxonomy for evaluation of cognitive performance. J Med Educ. 1981;56(2):115–121.
20. Ha T, Yaneva V. Evaluating the performance of OpenQA on USMLE Step 1 and Step 2 questions. arXiv [Preprint]. 2019.
21. Jin D, Pan E, Oufattole N, Weng WH, Fang H, Szolovits P. Information retrieval and neural networks for medical question answering: Performance on USMLE-style questions. arXiv [Preprint]. 2021.
22. Sharma A, Patel V, Singh H, Kumar N. ChatGPT performance on USMLE: A step toward AI-assisted learning. arXiv [Preprint]. 2023.
23. Antaki F, Cahill M, Gaudet V, Shah AS, Darvishian F. Evaluating ChatGPT in ophthalmology: Performance and limitations. JAMA Ophthalmol. 2023;141(7):577–584.
24. Jin D, Pan E, Oufattole N, Weng WH, Fang H, Szolovits P. What disease does this patient have? A large-scale open domain question answering dataset from medical exams. Appl Sci. 2021;11(14):6421.
25. Gao L, Xu M, Zhou X, Wang Y. Self-Evolving GPT: Autonomous improvement through iterative refinement. arXiv [Preprint]. 2024.
26. Madaan A, Muqeeth M, Yazdanbakhsh A, Chen X, Yao S, Zhou D. Self-Refine: Iterative refinement with self-feedback in large language models. arXiv [Preprint]. 2023.