Can I trust ChatGPT for arthrofibrosis advice?
- kayleyusher
- Mar 25
- 6 min read
Updated: Apr 2
A friend asked this question after he’d used ChatGPT-Pro to generate a report about arthrofibrosis therapies. The professional-sounding report was very convincing, but it contained a mix of accurate science, such as the potential use of metformin (see blog Metformin - A helping hand), and outdated, dangerous opinion, such as the advice to push through pain. Other important information regarding potential harm was missing (see more on this below). So I was interested to know: is ChatGPT advice helpful, or a problem, in healthcare? There is a lot of excitement around the applications of AI in healthcare, and ChatGPT is evolving, with some anticipating that it will become a useful tool for assisting in the discovery of treatment targets and analysing medical images in the future [1].

ChatGPT (Chat Generative Pre-trained Transformer) receives over one billion queries per day and is appealing to people looking for quick answers to complex health challenges [2]. However, ChatGPT was not designed for generating health advice [3], and we need to find a balance between misplaced trust and missed opportunities to improve lives. Incorrect health information from ChatGPT can lead to irreversible harm to patients [1], so it’s really important to understand the limitations of this technology [2]. Advice for managing a disease needs to consider a person’s history, symptoms and test results, but ChatGPT’s advice is general: it cannot identify the specific indicators that determine which treatment option is appropriate for an individual [1]. This need to thoroughly evaluate each patient and adapt their therapy is particularly true of arthrofibrosis.
Arthrofibrosis is a highly complex and variable condition requiring an approach that is tailored to the history of the joint, the symptoms, and the location of the pathology within it.
You may have heard about ChatGPT “hallucinating”, where it presents credible-sounding answers that are entirely made up [1], generating incorrect advice that sounds legitimate but can be dangerous [3, 4]. Unless you’re a specialist in the area, it’s not possible to spot this. ChatGPT even makes up references: more than one in every four references provided by GPT-4o were fake, and this “research” can skew the perception of researchers [1]. And it turns out that ChatGPT is clever in the way it fabricates, using realistic titles and the real names of researchers as “authors” of the fictional papers [1].
So, blindly trusting ChatGPT is not wise. Another issue is that ChatGPT generates reports from many sources and doesn’t distinguish between factual and non-factual information, or between real and faked information, leading to answers that can be highly misleading [3, 4]. Nor can ChatGPT reflect on conflicts and gaps in knowledge, and it can miss key findings [2] - all issues that are particularly prevalent in arthrofibrosis research. In some instances ChatGPT’s answers disagreed with the latest empirical evidence or went against the latest health advice [2].
We can see that following health advice from ChatGPT can be dangerous, but we all have a built-in vulnerability. As humans we instinctively trust and follow confident people [5], and health reports generated by ChatGPT are confident, authoritative and sound human - so much so that people can find its responses to medical questions more convincing than reports written by doctors [1, 6]. Even doctors are not immune: they are more likely to adopt ChatGPT-generated impressions of radiology reports, even when these might be false and harmful, because the language reads better than the reports written by radiologists [3].
As AI experts said “ChatGPT is incredibly limited, but good enough at some things to create a misleading impression of greatness” [4].
Nonetheless, growing familiarity with ChatGPT makes us more trusting, and research indicates that people increasingly view the program as having human characteristics [2]. Some see the program as a trusted friend or teacher, and around 4% of US respondents thought of ChatGPT as a genie, suggesting they felt it had mystical or magical abilities [2]. But, in addition to not being able to distinguish between factual and non-factual information, ChatGPT doesn’t understand what quality research is. And, as they say of any analysis, “rubbish in, rubbish out.” Historically, a lot of arthrofibrosis papers are of poor quality and have a high risk of bias, particularly in the areas of knee surgery and physiotherapy. Recommendations from unreliable, poor-quality experiments, which are in the majority in arthrofibrosis research, are likely to dominate ChatGPT reports. I’ll explain what I mean.
To prevent bias from influencing the results, good-quality research uses “blinded” assessors, who don’t know which treatment a patient received when they measure treatment outcomes. Double blinding, where the treatment allocation is also concealed from patients, is used where possible, for example by giving the control group a sham surgical procedure. But knee surgery and physiotherapy papers have a poor record for quality, with very few using concealed treatment allocation, so they likely report unrealistically positive results for their preferred approach. In addition, small patient numbers and a lack of randomised control groups mean that the results may not be meaningful, because people usually improve over time regardless of the therapy.
Then we have the problem of cell culture (in vitro) and animal experiments - almost all the papers listed in the ChatGPT-Pro arthrofibrosis report were these types of experiments. These experimental systems are essential first steps before human clinical trials begin, but they do not accurately replicate human biology or complex diseases [7]. For example, the biological response of rodents to chronic inflammation is so different from that of humans that rodent research has poor predictive value [7]. In addition, rodent experiments frequently have flawed designs, such as introducing the therapy before disease onset and failing to consider the progressive nature of the condition [8]. More than 80% of studies using animal models in arthrofibrosis research started the therapy before, or at the same time as, fibrosis was established, and the overall quality of studies was poor, with a high risk of bias [9].
Perhaps it's not surprising that the vast majority of therapies that look good in animal experiments are not effective and do not make it into clinical use [7].
The ChatGPT-Pro report also failed to mention important papers demonstrating adverse effects of certain therapies. For example, it stated that collagenase injections are an appealing option and cited a rodent study supporting their use for treating arthrofibrosis. However, Fitzpatrick et al. [10] found that collagenase caused adverse events, including internal bleeding, in all shoulder arthrofibrosis patients, together with pain and swelling. Their double-blinded, randomised controlled study did not find a significant benefit from the treatment, and the severe bruising over an extensive area of the body in most patients led the authors to conclude that the drug should not be recommended for treating arthrofibrosis. The ChatGPT-Pro report stated that other therapies, such as PRP (platelet-rich plasma), are associated with swelling and pain, but concluded that they are low risk. These symptoms of inflammation increase the risk of a fibrotic reaction and could promote arthrofibrosis, but ChatGPT fails to consider the different biological context of fibrosis. There were other examples of this problem in the report.
Perhaps it will soon be possible for the average person to constrain ChatGPT to double-blinded, randomised controlled studies in humans with appropriate statistical analysis, and that would be interesting. In the meantime, ChatGPT is incredibly powerful at jobs that people are poor at, such as analysing massive datasets to find previously unknown correlations and associations, and it has potential for many areas of research and medicine. But I feel it will be some time before you and I can safely use it for health advice.
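For readers comfortable with a little code, here is a minimal sketch of how one might already try to nudge ChatGPT in that direction today, using OpenAI’s Python client and a system prompt that asks the model to restrict itself to double-blinded, randomised controlled human trials. The model name and prompt wording are my own assumptions for illustration, and an instruction like this reduces, but does not remove, the risk of fabricated or low-quality references.

```python
# Sketch only: asking ChatGPT (via the OpenAI Python client) to limit its
# answer to evidence from double-blinded, randomised controlled human trials.
# Model name and prompt wording are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

SYSTEM_PROMPT = (
    "You are helping summarise medical literature. Only cite double-blinded, "
    "randomised controlled trials in humans that report appropriate statistical "
    "analysis. If no such trial exists for a therapy, say so explicitly rather "
    "than citing animal or in vitro studies, and never invent references."
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model name; substitute whichever model you use
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": "Summarise the clinical-trial evidence for arthrofibrosis therapies.",
        },
    ],
)

print(response.choices[0].message.content)
```

Even with a prompt like this, the output still needs to be checked against the original papers, because the model can ignore the instruction or invent citations that merely look compliant.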
If you have used ChatGPT for health advice, we would love to hear about your experience in the Comments below.
References
1. Tan, S., Xin, X. & Wu, D. ChatGPT in medicine: prospects and challenges: a review article. Int J Surg 110, 3701-3706 (2024). https://doi.org/10.1097/JS9.0000000000001312
2. Cheng, M., Lee, A. Y., Rapuano, K., Niederhoffer, K., Liebscher, A. & Hancock, J. From tools to thieves: Measuring and understanding public perceptions of AI through crowdsourced metaphors. (2025).
3. Li, J., Dada, A., Puladi, B., Kleesiek, J. & Egger, J. ChatGPT in healthcare: A taxonomy and systematic review. Comput Methods Programs Biomed 245, 108013 (2024). https://doi.org/10.1016/j.cmpb.2024.108013
4. Oviedo-Trespalacios, O. et al. The risks of using ChatGPT to obtain common safety-related information and advice. Safety Science 167 (2023). https://doi.org/10.1016/j.ssci.2023.106244
5. Moore, D. A. Perfectly Confident Leadership. California Management Review 63, 58-69 (2021). https://doi.org/10.1177/0008125621992173
6. Cavalier, J. S. et al. Ethics in Patient Preferences for Artificial Intelligence-Drafted Responses to Electronic Messages. JAMA Netw Open 8, e250449 (2025). https://doi.org/10.1001/jamanetworkopen.2025.0449