Accuracy and consistency of ChatGPT responses as an educational tool in the vestibular system
Gülce Kirazlı¹, Sümeyye Kapusızoğlu²
¹Department of Audiology, Faculty of Health Sciences, Ege University, İzmir, Türkiye
²Department of Audiology, Institute of Health Sciences, Ege University, İzmir, Türkiye
Keywords: Audiology, ChatGPT, vestibular system.
Abstract
OBJECTIVE: The aim of this study was to compare the accuracy and consistency of ChatGPT’s responses in Turkish and English, specifically within the vestibular field.
METHODS: Based on a review of the current literature, a total of 42 questions were created in three subcategories: vestibular system anatomy and physiology, vestibular diagnosis and tests, and vestibular rehabilitation and treatment. These questions were presented to ChatGPT 3.5 in both Turkish and English. The accuracy of the responses was evaluated by nine experts using a five-point Likert scale. One week later, the same questions were presented in a different order, and the consistency of the responses was assessed by the authors using the same five-point Likert scale.
RESULTS: In the first (vestibular system anatomy and physiology) and third (vestibular rehabilitation and treatment) subcategories, English responses (3.57±0.34 and 3.97±0.47, respectively) were significantly more accurate than Turkish responses (3.33±0.27 and 3.64±0.36, respectively). Both inter-rater reliability and test-retest reliability were high for both languages.
CONCLUSION: Our findings suggest that ChatGPT may serve as a complementary educational tool; however, the reliability and accuracy of its content in specialized areas such as the vestibular system require further validation. Additionally, the language used (Turkish versus English) may influence performance in certain subcategories.