AI healthcare warning

The integration of large language models (LLMs), commonly referred to as generative artificial intelligence (AI), into healthcare systems is increasingly common. The hope is that LLMs will reduce the workload on clinicians, streamline administrative tasks, and improve access to medical information. However, a new analysis of LLM output, published in the International Journal of Electronic Healthcare, urges caution in their use. The work highlights the need for a nuanced understanding of what these tools actually do and, more importantly, what they do not.

LLMs are statistical algorithms trained on large amounts of text and used to generate plausible sequences of words based on patterns learned from that text. They may appear to be intelligent, but they do not understand the content of their training data nor their own output. Indeed, their output is shaped not by reasoning but by mathematical correlations between words and phrases in the data on which they were trained, so, in a sense, LLMs are really just very sophisticated autocomplete systems.
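To make the "sophisticated autocomplete" point concrete, the sketch below (a toy Python example, not drawn from the paper, using an invented corpus) builds a tiny bigram model and extends a prompt by sampling statistically likely next words. Real LLMs operate at vastly greater scale, but the underlying principle, that output follows learned word co-occurrence patterns rather than understanding, is the same.

```python
import random
from collections import defaultdict, Counter

# Toy bigram "language model": it only records how often one word
# follows another, then samples continuations from those counts.
# Nothing in this code represents meaning, facts, or patients.

corpus = (
    "the patient reports mild chest pain . "
    "the patient reports no chest pain . "
    "the clinician records the patient history ."
).split()

# Count word-to-next-word co-occurrences.
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def generate(start: str, length: int = 8) -> str:
    """Extend a prompt by repeatedly sampling a likely next word."""
    words = [start]
    for _ in range(length):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        choices, weights = zip(*candidates.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))
# Possible output: "the patient reports mild chest pain . the"
```

Note how the generated text can read as fluent clinical prose even though the program has no notion of a patient, a symptom, or a fact; that gap between fluency and understanding is precisely the risk flagged in the analysis.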

The distinction between the received wisdom concerning AI and the actual workings of LLMs is critical, especially in the context of healthcare. Uncritical reliance on seemingly fluent and authoritative output from an LLM could be a matter of life or death when that output contains errors or, worse, wholly fabricated “hallucinations”.

There are also concerns regarding inherent bias in LLMs. Because they are built on human-generated data, that data may carry embedded social, cultural, and institutional biases. These can manifest in outputs that inadvertently reinforce health disparities, particularly for patients from underrepresented or marginalised groups. For instance, if a model is trained primarily on data from Western clinical populations, its diagnostic suggestions or educational content may be less relevant, or even misleading, for patients with different backgrounds or needs.

Healthcare is not a static field; it is what scholars call a complex adaptive system, meaning that small changes can produce wide-reaching effects. The introduction of LLMs could subtly reshape how care is delivered, how records are kept, or even how professional expertise is perceived. Over time, increased reliance on AI-generated documentation or advice could deskill clinicians, shifting responsibility from trained professionals to automated systems. Meanwhile, patients may begin to see AI as a substitute for human attention, altering expectations around trust, empathy, and accountability in care.

In light of these risks, the research advocates a model of reflexive governance: a flexible and responsive framework for monitoring and regulating AI in healthcare. This approach would replace rigid, one-time assessments with ongoing oversight that can adapt to new uses, emerging harms, and shifting ethical considerations. Crucially, such governance could and should go beyond technical safety to include values such as equity, transparency, and patient autonomy.

Salzmann-Erikson, M., Göras, C., Lindberg, M., Arakelian, E. and Olsson, A. (2025) ‘ChatGPT in complex adaptive healthcare systems: embrace with caution’, Int. J. Electronic Healthcare, Vol. 14, No. 6, pp.1–17.