Is ChatGPT ready for orthopedic patients?


Three studies presented at the American Academy of Orthopaedic Surgeons' 2024 conference examined how accurately artificial intelligence chatbots, such as ChatGPT, provide information to musculoskeletal patients. 

The studies determined that while chatbots can provide accurate summaries for a wide variety of orthopedic conditions, they all have limited accuracy in certain categories, and surgeons remain the best source of patient information, according to a summary of the studies sent to Becker's on Feb. 12. 

The first study asked chatbots 45 orthopedic questions spanning the categories of "bone physiology," "referring physician" and "patient query," and rated the answers for accuracy. 
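
For readers curious how such an evaluation is typically organized, the sketch below shows one way to structure it in Python. The questions, the ask_chatbot helper and the grade function are hypothetical placeholders; the study's actual materials and rubric are not described beyond the summary above.

```python
# Hypothetical sketch of the study's setup: questions across three
# categories, each answer graded for accuracy. The questions and the
# ask/grade functions are placeholders, not the study's actual materials.
QUESTIONS = {
    "bone physiology": ["How do osteoblasts and osteoclasts remodel bone?"],
    "referring physician": ["What is the initial workup for suspected septic arthritis?"],
    "patient query": ["How long is recovery after a total knee replacement?"],
}

def ask_chatbot(chatbot: str, question: str) -> str:
    """Placeholder for querying a chatbot (ChatGPT, Bard or BingAI)."""
    raise NotImplementedError

def grade(answer: str) -> bool:
    """Placeholder for an expert judging whether the answer covers the critical points."""
    raise NotImplementedError

def accuracy(chatbot: str) -> float:
    """Percent of answers judged correct across all categories."""
    results = [
        grade(ask_chatbot(chatbot, q))
        for questions in QUESTIONS.values()
        for q in questions
    ]
    return 100 * sum(results) / len(results)
```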

The researchers found that when prompted with orthopedic questions, OpenAI's ChatGPT, Google Bard and BingAI provided correct answers covering the most critical points in 76.7%, 33% and 16.7% of cases, respectively. When offering clinical management suggestions, the chatbots displayed significant limitations: they deviated from the standard of care, for example by ordering antibiotics before cultures, and omitted critical steps, such as key studies in the diagnostic workup. When asked simpler patient queries, ChatGPT and Google Bard provided mostly accurate responses but often failed to elicit critical medical history from patients. 

The second study posed a list of 80 commonly asked patient questions about knee and hip replacements to ChatGPT 4.0. Each question was asked twice: first as written, then with a prompt instructing ChatGPT to answer the patient questions "as an orthopedic surgeon." Orthopedic surgeons rated the answers on a scale of one to four.
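
The role-prompt comparison amounts to prepending an instruction to each question. Here is a minimal sketch of that two-pass prompting, assuming the OpenAI Python client; the study does not say which interface the researchers actually used.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(question: str, as_surgeon: bool) -> str:
    """Ask the same patient question with or without the role prompt."""
    messages = []
    if as_surgeon:
        # The prompt variant tested in the study.
        messages.append({
            "role": "system",
            "content": "Answer the following patient question as an orthopedic surgeon.",
        })
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content

question = "How long after a hip replacement can I drive?"
unprompted = ask(question, as_surgeon=False)
prompted = ask(question, as_surgeon=True)  # both would then be graded 1-4 by surgeons
```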

The researchers found that 26% of ChatGPT's responses had an average score of three or less when the questions were asked without a prompt, while 8% had an average score below three when the prompt was included. In other words, ChatGPT performed with 92% accuracy when prompted to answer patient questions "as an orthopedic surgeon."

The third study assessed ChatGPT 4.0's ability to provide medical information about the Latarjet procedure for patients with anterior shoulder instability.

Researchers conducted a Google search using the query "Latarjet" to extract the top 10 frequently asked questions, and their associated sources, concerning the procedure. They then prompted ChatGPT to produce the same list of FAQs, noting the questions and sources the chatbot provided. 
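
A minimal sketch of the chatbot side of that comparison, again assuming the OpenAI Python client; the researchers' exact wording is not given, so the prompt below is illustrative.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative prompt: ask for the top 10 FAQs about the Latarjet procedure,
# each paired with the source the model attributes its answer to.
prompt = (
    "List the 10 most frequently asked questions about the Latarjet "
    "procedure for anterior shoulder instability. For each question, "
    "give a brief answer and cite the source of the information."
)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```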

Researchers found that ChatGPT provided a broad range of clinically relevant questions and answers and drew its information from academic sources 100% of the time, while the Google search engine did so only a small percentage of the time. 

The most common question category for both ChatGPT and Google was technical details; however, ChatGPT also presented information concerning risks/complications, recovery timeline and evaluation of surgery. 
