Artificial intelligence struggles to diagnose genetic disorders from patient-written descriptions, according to a new study by researchers at the US National Institutes of Health (NIH). The researchers examined a variety of large language models (LLMs), including the most recent versions of OpenAI’s ChatGPT and Google’s Bard. The finding suggests that significant work remains before these LLMs can be applied in clinical settings.
“These technologies are already rolling out in clinical settings [and] the biggest questions are no longer about whether clinicians will use AI, but where and how clinicians should use AI, and where should we not use AI to take the best possible care of our patients,” said Ben Solomon, clinical director at the NIH’s National Human Genome Research Institute (NHGRI) and senior author of the study.
Study insights
Drawing on medical publications, textbooks, and other reference material, the researchers developed questions covering 63 distinct genetic conditions. These ranged from well-known disorders such as sickle cell disease, Marfan syndrome, and cystic fibrosis to far rarer and more obscure ones. They then chose three to five symptoms for each condition and built queries in a standard pattern: “I have X, Y, and Z symptoms. Which genetic condition is most likely?” A minimal sketch of this templating step appears below.
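To make the query construction concrete, here is a minimal Python sketch of that templating step. The condition names, symptom lists, and the `ask_model` placeholder are illustrative assumptions, not material from the study.

```python
# Hypothetical sketch of the study's query format: symptoms are slotted
# into a fixed template and sent to a chat-style LLM. The conditions,
# symptoms, and ask_model() helper below are illustrative assumptions.

TEMPLATE = "I have {symptoms}. Which genetic condition is most likely?"

# Three to five hallmark symptoms per condition (illustrative examples).
CONDITIONS = {
    "Marfan syndrome": ["tall stature", "long limbs", "a dislocated lens"],
    "cystic fibrosis": ["a chronic cough", "salty skin", "frequent lung infections"],
    "sickle cell disease": ["episodes of severe pain", "anemia", "swollen hands and feet"],
}

def build_query(symptoms: list[str]) -> str:
    """Join symptoms into the standardized question pattern."""
    joined = ", ".join(symptoms[:-1]) + ", and " + symptoms[-1]
    return TEMPLATE.format(symptoms=joined)

def ask_model(prompt: str) -> str:
    """Placeholder for a call to whichever LLM API is being evaluated."""
    raise NotImplementedError("wire this to the model under test")

if __name__ == "__main__":
    for condition, symptoms in CONDITIONS.items():
        print(f"[gold: {condition}] {build_query(symptoms)}")
```

Running the script just prints one templated question per condition; plugging a real model call into `ask_model` would turn it into a tiny benchmark harness.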
Intriguing, right? There’s more!
The researchers found that the LLMs varied widely in their ability to point to the correct genetic diagnosis when medical-textbook phrasing was used. Initial accuracies ranged from 21% to 90%, and success rates tracked with the amount of data each model was initially trained on. GPT-4, one of the most recent models behind ChatGPT, performed best.
However, the researchers report that seven of the ten models tested still outperformed Google searches when everyday English was used. They also found that rewriting patient descriptions in a standardized format improved the models’ accuracy.
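For a sense of how accuracy figures like those above can be computed, here is a minimal scoring sketch under the assumption that each model answer is checked for the gold diagnosis. The example answers are made up for illustration, not results from the study.

```python
# Minimal accuracy-scoring sketch (not the study's actual pipeline):
# each record pairs a gold diagnosis with a model's answer, and accuracy
# is the fraction of answers that name the gold condition.

def accuracy(results: list[tuple[str, str]]) -> float:
    """results: (gold_diagnosis, model_answer) pairs."""
    hits = sum(gold.lower() in answer.lower() for gold, answer in results)
    return hits / len(results)

# Illustrative made-up outputs, not data from the study.
results = [
    ("Marfan syndrome", "The most likely diagnosis is Marfan syndrome."),
    ("cystic fibrosis", "These symptoms suggest cystic fibrosis."),
    ("sickle cell disease", "This could be asthma."),  # a miss
]

print(f"accuracy: {accuracy(results):.0%}")  # -> accuracy: 67%
```

Substring matching is a crude proxy for grading a diagnosis, and the study’s own scoring would be more careful, but the sketch shows the shape of the computation.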
You can find more details here.