ChatGPT performance on pharmacology examination and board review questions: Implications for medical education and knowledge assessment

Rima A. Hijazeen; Al-Motassem Yousef; Ahmed Almousa; Aya N. Alzoghair; Jude K. Dwairi; Majd I. Sawaqed; Ghaith F. Al-Ryahneh; Marwan H Ali

doi:10.18549/

Rima A. Hijazeen

B.Sc. Pharmacy, M.Sc. Clinical Pharmacy, PhD Clinical Pharmacy Practice, Associate Professor in Clinical Pharmacy Practice, The University of Jordan, Faculty of Pharmacy, Department of Biopharmaceutics and Clinical Pharmacy, Amman 11942, Jordan.

Al-Motassem Yousef

B.Sc. Pharmacy, PhD in Pharmacology and Therapeutics, Professor of Pharmacology and Therapeutics, The University of Jordan, Faculty of Pharmacy, Department of Biopharmaceutics and Clinical Pharmacy, Amman 11942, Jordan.

https://orcid.org/0000-0002-3841-4132
Ahmed Almousa

RPh, MSc, PhD, Assistant Professor of Clinical Pharmacy, Department of Biopharmaceutics and Clinical Pharmacy, The Faculty of Pharmacy, University of Jordan, Amman, Jordan.

https://orcid.org/0000-0001-5183-1988
Aya N. Alzoghair

Undergraduate Pharmacy student, The University of Jordan Faculty of Pharmacy, Amman 11942, Jordan.

https://orcid.org/0009-0008-4171-5578
Jude K. Dwairi

Undergraduate Pharmacy student, The University of Jordan Faculty of Pharmacy, Amman 11942, Jordan.

https://orcid.org/0009-0000-4750-2903
Majd I. Sawaqed

Undergraduate Pharmacy student, The University of Jordan Faculty of Pharmacy, Amman 11942, Jordan.

https://orcid.org/0000-0002-3841-4132
Ghaith F. Al-Ryahneh

Undergraduate Pharmacy student, The University of Jordan Faculty of Pharmacy, Amman 11942, Jordan.

https://orcid.org/0009-0004-7630-1819
Marwan H Ali

Undergraduate Pharmacy student, The University of Jordan Faculty of Pharmacy, Amman 11942, Jordan.

https://orcid.org/0009-0007-9205-1460

Keywords

Artificial intelligence, ChatGPT, Medical education, Pharmacology, Reasoning, Multiple-choice questions, Large language model

Abstract

Objectives: This study aimed to evaluate ChatGPT’s performance on pharmacology exam questions by assessing its accuracy in basic and clinical pharmacology, reasoning processes, and response consistency over time.

Methods: A dataset of 583 multiple-choice questions from the Pharmacology Examination and Board Review (13th edition) was used. ChatGPT’s responses were evaluated for logical justification, use of internal question stem information, and integration of external knowledge. Statistical analyses, including chi-square and McNemar tests, assessed associations and changes in response accuracy over a four-week interval.

Results: ChatGPT achieved 76.2% accuracy (444/583 questions), demonstrating logical reasoning in 97% of responses. Internal information was used in 99.7% of cases, while external information was incorporated in 98% of correct and 93.5% of incorrect responses (p = 0.008). Information errors were the most common reason for incorrect answers. A statistically significant improvement in accuracy upon re-evaluation (χ² = 37.3, p < 0.0001) was observed, suggesting potential temporal variation in performance.

Conclusion: ChatGPT meets or exceeds typical passing standards in many educational settings, with evidence of improved response accuracy over time. These findings highlight its capabilities in processing pharmacological content, with potential implications for future research into AI-assisted educational tools.
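
For readers unfamiliar with the paired comparison named in the Methods, the following is a minimal Python sketch of how an accuracy figure and a McNemar test on two rounds of answers to the same questions might be computed. It is an illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, the discordant-pair counts in the usage lines are invented because the abstract does not report them, and the uncorrected form of the statistic is assumed since the abstract does not say whether a continuity correction was applied.

```python
import math


def accuracy_percent(correct: int, total: int) -> float:
    """Share of correctly answered questions, as a percentage."""
    return 100.0 * correct / total


def mcnemar_test(b: int, c: int) -> tuple[float, float]:
    """Uncorrected McNemar test for paired binary outcomes.

    b: questions answered correctly in round 1 but incorrectly in round 2
    c: questions answered incorrectly in round 1 but correctly in round 2
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Figures taken from the abstract: 444 correct out of 583 questions.
    print(round(accuracy_percent(444, 583), 1))  # 76.2

    # HYPOTHETICAL discordant counts, for illustration only.
    stat, p_value = mcnemar_test(b=10, c=30)
    print(stat, p_value)
```

Only the discordant pairs enter the statistic, which is why McNemar's test suits a before-and-after comparison on the same question set: questions answered identically in both rounds carry no information about change.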




Published

Jun 20, 2026

How to Cite

1.

ChatGPT performance on pharmacology examination and board review questions: Implications for medical education and knowledge assessment. Pharm Pract (Granada) [Internet]. 2026 Jun. 20 [cited 2026 Jun. 21];24(2):1-10. Available from: https://www.pharmacypractice.org/index.php/pp/article/view/3488

Issue

Vol. 24 No. 2 (2026): Apr-Jun

Section

Original Research

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
