Dr. Scott Gottlieb is a CNBC contributor and is a member of the boards of Pfizer, genetic testing startup Tempus, health-care tech company Aetion Inc. and biotech company Illumina. He also serves as co-chair of Norwegian Cruise Line Holdings’ and Royal Caribbean’s “Healthy Sail Panel.”
Researchers at Harvard presented a study demonstrating an achievement that would challenge any medical student. ChatGPT, a large language model, passed the U.S. Medical Licensing Exam, outperforming the roughly 10 percent of medical students who fail the test annually.
The inevitable question isn’t so much if, but when, these artificial intelligence devices can step into the shoes of doctors. For some tasks, this medical future is closer than we think.
To grasp the potential of these tools to revolutionize the practice of medicine, it pays to start with a taxonomy of the different technologies and how they’re being used in medical care.
The AI tools being applied to healthcare can generally be divided into two main categories. The first is machine learning, which uses algorithms to enable computers to learn patterns from data and make predictions. These algorithms can be trained on a variety of data types, including images.
The second category encompasses natural language processing, which is designed to understand and generate human language. These tools enable a computer to transform human language and unstructured text into machine-readable, organized data. They learn from a multitude of human trial-and-error decisions and emulate a person’s responses.
A key difference between the two approaches resides in their functionality. While machine learning models can be trained to perform specific tasks, large language models can understand and generate text, making them especially useful for replicating interactions with providers.
In medicine, the use of these technologies generally follows one of four paths. The first encompasses large language models that are applied to administrative functions like processing medical claims or creating and analyzing medical records. Amazon’s HealthScribe is a programmable interface that transcribes conversations between doctors and patients and can extract medical information, allowing providers to create structured records of encounters.
The second bucket involves the use of supervised machine learning to enhance the interpretation of clinical data. Specialties such as radiology, pathology and cardiology are already using AI for image analysis, to read MRIs, evaluate pathology slides or interpret electrocardiograms. In fact, up to 30 percent of radiology practices have already adopted AI tools. So have other specialties. Google Brain AI has developed software that analyzes images from the back of the eye to diagnose diabetic macular edema and diabetic retinopathy, two common causes of blindness.
Since these tools offer diagnoses and can directly impact patient care, the FDA often categorizes them as medical devices, subjecting them to regulation to verify their accuracy. However, the fact that these tools are trained on closed data sets, where the findings in data or imaging have been rigorously confirmed, gives the FDA increased confidence when assessing these devices’ integrity.
The third broad category comprises AI tools that rely on large language models to extract clinical information from patient-specific data and interpret it, prompting providers with diagnoses or treatments to consider. Generally known as clinical decision support software, it evokes a picture of a brainy assistant designed to aid, not to supplant, a doctor’s judgment. IBM’s “Watson for Oncology” uses AI to help oncologists make more informed decisions about cancer treatments, while Google Health is developing DeepMind Health to create similar tools.
As long as the doctor remains involved and exercises independent judgment, the FDA doesn’t always regulate this kind of tool. The FDA focuses more on whether it’s meant to make a definitive clinical decision, as opposed to providing information to help doctors with their assessments.
The fourth and final grouping represents the holy grail for AI: large language models that operate fully autonomously, parsing the entirety of a patient’s medical record to diagnose conditions and prescribe treatments directly to the patient, without a physician in the loop.
Right now, there are only a few clinical language models, and even the largest ones possess a relatively small number of parameters. However, the strength of the models and the datasets available for their training might not be the most significant obstacles to these fully autonomous systems. The biggest hurdle may well be establishing a suitable regulatory path. Regulators are hesitant, fearing that the models are prone to errors and that the clinical datasets on which they’re trained contain wrong decisions, leading AI models to replicate these medical mistakes.
Overcoming the hurdles in bringing these fully autonomous systems to patient care holds significant promise, not only for improving outcomes but also for addressing financial challenges.
Healthcare is often cited as a field burdened by Baumol’s cost disease, a theory developed by economist William J. Baumol that explains why costs in labor-intensive industries tend to rise more rapidly than in other sectors. In fields like medicine, it’s less likely that technological inputs will provide major offsets to labor costs, as each patient encounter still requires the intervention of a provider. The labor itself is the product.
To compensate for these challenges, medicine has incorporated more non-physician providers to lower costs. This strategy reduces, but doesn’t eliminate, the central economic dilemma. When the technology becomes the doctor, however, it can be a cure for Baumol’s cost disease.
As the quality and scope of clinical data available for training these large language models continue to grow, so will their capabilities. Even if the current stage of development isn’t quite ready to completely remove doctors from the decision-making loop, these tools will increasingly enhance the productivity of providers and, in many cases, begin to substitute for them.