
What’s So Special About Medical AI, Anyway?

Learn how medical artificial intelligence (AI) differs from traditional software, including its safety‑critical complexity and regulatory challenges.


By Anura S. Fernando, Principal Adviser and Global Head of Medical Cybersecurity, UL Solutions

Artificial intelligence (AI) has become a household term around the world, but the notion of AI is not so new. People have long transferred aspects of human cognition into tools and software, beginning with writing and calculation as a way to reduce the mental burden of memorization and recall. What is new is the growing use of AI in safety‑critical medical technologies, where small differences in design, performance or learning behavior can result in disproportionate consequences.

From traditional health software to learning‑based AI

Narrowly applied AI has progressed into what we now call generative artificial intelligence (gen AI), introducing new considerations for safety, performance and oversight — even though AI in its more primitive forms has supported lifesaving applications for decades. As these systems become more complex, manufacturers and regulators must address how learning-based behavior is designed, controlled and monitored over time.

The most advanced medical AI systems used in clinical settings today evolved from what is often called narrow AI. At their core is deep learning, a technology that grew out of machine learning (ML), neural networks and earlier forms of digital logic that underpin modern health software. While these systems can adapt through training or updates, their behavior remains bounded by architectures, data controls and change processes defined by manufacturers.

Health software exists in a few different forms, each with different regulatory expectations and risk profiles:

  • Software in a medical device (SiMD) has been described using many different terms, such as embedded systems, embedded software, firmware, etc. It refers to software embedded within a medical device that controls or supports its operation. Because this software is integrated with the device’s hardware, it has traditionally been regulated as part of the medical device itself.
  • Software as a medical device (SaMD) has also been in use for a couple of decades now. It has been described in a variety of ways, including medical application software, mobile apps, cloud-based services and clinical decision support software. Regardless of the terminology, SaMD refers to software that performs a medical function while running on a general-purpose computing platform, such as a personal computer or a server.
  • Non-medical-device software that runs on specialized medical device hardware is another category of health software. An example of this would be an operating system (OS) that runs on hardware regulated as a medical device. In most regulatory jurisdictions and medical device implementations, the OS manufacturer would not be considered the legal manufacturer of record of a medical device. Rather, the medical device manufacturer (MDM) would be the developer of the fully integrated product that includes the app, which would run on the OS.
  • Non-medical-device software that runs on general-purpose computing hardware is the final category of health software. This is the category for all other software that can connect to, or is otherwise related to, a medical device without having either hardware or software that falls within the purview of the regulator. This category also varies based on jurisdictional definitions of the term “medical device,” in addition to differing regulatory policies around the world.
     
    An example of such a variation would be electronic health record (EHR) systems. EHR systems often accept data from medical devices and can provide data to medical devices. In some cases, they can even serve as the integration point for clinical workflows. In some jurisdictions, like the U.S., they are not regulated as medical devices, while in others, like the European Union, they are.

Taken together, these categories show how broadly the term “health software” is used. With such a wide definition, it is reasonable to pause and ask what, if anything, makes medical software with AI different.

What’s so special about medical AI?

The answer lies in how learning systems challenge longstanding assumptions about software control and predictability. To see why, it helps to examine how advances in software technology have progressed alongside regulatory science. This discussion focuses on U.S. medical device regulation, which established early frameworks for overseeing software-based medical technologies.

The U.S. Food and Drug Administration (FDA) has been reviewing medical devices containing software since the 1970s. In the late 1990s, it established a regulatory science framework and refined software policies to create the first FDA software guidance. It even qualified companies such as UL Solutions to collaborate with the agency under its 510(k) Third Party Review Program in assessing software, among other considerations.

The FDA’s focus, then as now, has been on supporting the safety and effectiveness of software-dependent medical devices through proper verification and validation. In regulatory terms, verification asks whether the device was built according to its specified requirements — often summarized by MDMs as, “Did I build the product right?” Validation, by contrast, examines whether the device meets its intended clinical need, captured by the question, “Did I build the right product?”
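To make the distinction concrete, consider how it might surface in a test suite. The sketch below uses a hypothetical infusion-rate calculator; the function, values and limits are invented for illustration, not drawn from any real device.

```python
# A minimal sketch of verification vs. validation, using a hypothetical
# infusion-rate calculator. All names, values and limits are invented.

def infusion_rate_ml_per_hr(dose_mg_per_kg_hr: float, weight_kg: float,
                            concentration_mg_per_ml: float) -> float:
    """Compute a pump rate from dose, patient weight and drug concentration."""
    return dose_mg_per_kg_hr * weight_kg / concentration_mg_per_ml

def test_verification():
    # Verification: "Did I build the product right?"
    # The output is checked against the written specification.
    assert infusion_rate_ml_per_hr(2.0, 70.0, 10.0) == 14.0

def test_validation():
    # Validation: "Did I build the right product?"
    # The behavior is checked against the intended clinical need, e.g.,
    # rates a clinician would consider plausible for the intended population.
    rate = infusion_rate_ml_per_hr(2.0, 70.0, 10.0)
    assert 0.1 <= rate <= 999.0  # hypothetical clinically meaningful range
```

In practice, validation rests on clinical evaluation, human factors work and real-world evidence rather than a single assertion, but the division between the two questions is the same.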

Although these two questions appear simple, they introduced enough nuance that the FDA built an expanding body of guidance around them. Over time, this led to dedicated guidance for off-the-shelf software used in medical devices, SaMD, mobile medical applications, medical device data systems, clinical decision support software and other digital health technologies. Those frameworks, however, were developed for software whose behavior does not change on its own.

As health software became more interconnected and widespread, additional guidance followed to address cross-cutting risks, including cybersecurity, quality systems considerations and the content of premarket submissions, along with expectations for post-market cybersecurity management.

Why medical AI challenges traditional validation  

There are now dozens of guidance documents that touch on software in some way, but there are some attributes that make medical AI a unique form of software, necessitating its own dedicated regulatory frameworks. Unlike traditional software, learning‑based systems may change performance characteristics over time within predefined limits, rather than remaining fixed after release.

The FDA began taking a step-wise approach, starting by addressing how more sophisticated AI tools can support regulatory submissions, as seen in the January 2025 draft guidance, “Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products,” as well as device-level guidance in “Artificial Intelligence-Enabled Device Software Functions: Life Cycle Management and Marketing Submission Recommendations.”

Because gen AI relies heavily on learning‑based adaptation, traditional approaches to configuration management (CM) no longer fully apply. Unlike conventional software, these systems can change performance characteristics through learning processes that are defined, constrained and authorized in advance, introducing new forms of uncertainty that must be managed through life cycle controls.

The FDA addresses this challenge through its guidance on Predetermined Change Control Plans for Medical Devices. This requires manufacturers to define, in advance, the types of changes a device is expected to undergo over time and the methods used to verify and validate those changes against prespecified acceptance criteria. The approach also requires a benefit-risk assessment that identifies risk mitigations and clearly defines any acceptable residual risks.
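A Predetermined Change Control Plan is a regulatory document, not code, but its central idea, prespecified change types and acceptance criteria that gate every update, can be sketched as a simple update gate. In the illustration below, the change types, metric names and thresholds are all hypothetical.

```python
# Hypothetical sketch of a PCCP-style update gate: a model update is accepted
# only if it is a prespecified change type and meets every prespecified
# acceptance criterion on a locked validation set. All values are invented.

PRESPECIFIED_CRITERIA = {
    "sensitivity": 0.92,  # minimum acceptable value
    "specificity": 0.90,
    "auc": 0.95,
}
ALLOWED_CHANGE_TYPES = {"retrain_same_architecture", "threshold_tuning"}

def evaluate_update(change_type: str, validation_metrics: dict) -> bool:
    """Return True only if the update stays within the predefined plan."""
    if change_type not in ALLOWED_CHANGE_TYPES:
        return False  # outside the plan: a new regulatory submission is needed
    return all(validation_metrics.get(name, 0.0) >= floor
               for name, floor in PRESPECIFIED_CRITERIA.items())

# A retrain that meets the criteria is deployable under the plan.
print(evaluate_update("retrain_same_architecture",
                      {"sensitivity": 0.94, "specificity": 0.91, "auc": 0.96}))
# An architecture change falls outside the plan regardless of its metrics.
print(evaluate_update("new_architecture",
                      {"sensitivity": 0.99, "specificity": 0.99, "auc": 0.99}))
```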

Importantly, these learning‑based systems are not autonomous. Any adaptation occurs within boundaries established by the manufacturer and reviewed by regulators, rather than through independent goal‑setting or self‑directed control.

The FDA has evaluated AI — more precisely, the machine learning aspect of AI — since the late 1990s in applications such as image processing for mammography. What is emerging today as gen AI, however, introduces additional layers of technical complexity. These systems rely on advanced mathematical structures, such as tensors, to model aspects of physical reality through neural networks.
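For readers unfamiliar with the term, a tensor is simply a multi-dimensional numeric array. The short NumPy sketch below shows how a batch of grayscale image slices might be represented and pushed through one layer of tensor arithmetic; the shapes and values are illustrative, not taken from any real system.

```python
import numpy as np

# A tensor is a multi-dimensional numeric array. A batch of grayscale image
# slices might be laid out as (batch, channels, height, width).
batch = np.random.rand(8, 1, 256, 256).astype(np.float32)

# A neural network layer is, at its core, tensor arithmetic: here each image
# is flattened and projected through a weight matrix.
weights = np.random.rand(256 * 256, 10).astype(np.float32)
activations = batch.reshape(8, -1) @ weights  # shape: (8, 10)
print(activations.shape)
```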

This added complexity dramatically increases the volume of data these systems process. Managing that data efficiently is essential, not only to optimize memory and performance, but also to control energy use and heat generation as large datasets move through high‑speed computing systems.

One common technique used to manage this complexity is quantization, which reduces the volume and precision of data to lower computational load, energy consumption and heat output. That efficiency, however, comes with a trade‑off: a modest loss of data fidelity and accuracy compared to the original dataset.
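A minimal sketch of that trade-off, assuming a simple uniform int8 scheme (all values here are illustrative): mapping 32-bit floating-point values to 8-bit integers shrinks the data fourfold while introducing a small, measurable reconstruction error.

```python
import numpy as np

# Simulated 32-bit weights or activations.
x = np.random.randn(1_000_000).astype(np.float32)

# Uniform symmetric quantization to signed 8-bit integers.
scale = np.abs(x).max() / 127.0  # step size of the int8 grid
q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)

# Dequantize and measure the fidelity lost.
x_hat = q.astype(np.float32) * scale
print("memory: %d -> %d bytes" % (x.nbytes, q.nbytes))  # 4x reduction
print("mean absolute error:", np.abs(x - x_hat).mean())
```

Production schemes (per-channel scales, calibration data, quantization-aware training) reduce this error but cannot eliminate it; the fidelity loss is inherent to the technique.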

In many non‑safety‑critical consumer AI applications, this trade‑off is widely accepted because it rarely affects overall performance. In medical applications, however, even small losses in accuracy can have meaningful clinical consequences — a distinction we examine next.

Another attribute that distinguishes medical AI is the context in which it is used. In medicine, the benefit-risk framework underlying patient safety explicitly allows for technologies that can cause harm when the therapeutic benefit justifies the risk. This is evident in treatments such as chemotherapy and tumor ablation, as well as in routine clinical procedures, such as diagnostic X‑ray imaging.

When gen AI is introduced into these applications, it must account for edge conditions — subtle anatomical or physiological variations — that can determine whether an intervention is effective or harmful. In complex biological systems such as blood or soft tissue, this includes accurately targeting specific cell types, such as cancer cells, without damaging surrounding structures. Failure to do so can lead to outcomes ranging from relatively low‑risk effects, like anemia, to catastrophic consequences, including patient death.

As discussed earlier, common optimization techniques used in gen AI, such as quantization, can introduce small losses in data fidelity. In clinical contexts, these trade‑offs take on greater significance: even minor reductions in accuracy — particularly if the system is not optimized for medical edge conditions — can have disproportionately severe consequences.

How regulators monitor medical AI after deployment

These are just a few of the considerations that make medical AI special. In response, regulators worldwide are collaborating to advance regulatory science and align expectations for AI-enabled medical technologies. They also work with private-sector standards development organizations, such as UL Standards & Engagement, to help MDMs apply consistent approaches to software verification and validation, often in collaboration with testing, inspection and certification (TIC) organizations like UL Solutions.

Beyond premarket review, regulators like the U.S. FDA also conduct ongoing market surveillance for established and emerging technologies. This post-market oversight plays a critical role in identifying real-world performance issues that may not be fully apparent during development or initial approval.

History underscores why this matters.

As software technologies advanced toward increasingly autonomous behavior, failures such as those involving the Therac-25 therapeutic linear accelerator revealed how software defects can lead to serious injuries and loss of life.1 Working to prevent incidents like these has helped shape modern expectations for post-market vigilance and corrective action.

While AI is fundamentally software, its ability to learn and adapt within defined constraints introduces characteristics that are still being understood. Without robust post-market surveillance tools — such as recalls, field actions and corrective action programs — emerging vulnerabilities in medical AI may go undetected, increasing the risk of patient harm and strain on healthcare systems.

Continued oversight is essential to advancing innovation safely.

How UL Solutions helps advance safer medical AI

Developing and overseeing medical AI safely requires managing challenges that traditional regulatory guidance and life cycle control were not originally developed to address.

Drawing on decades of technical, regulatory and clinical expertise, our experts help manufacturers navigate medical device safety, cybersecurity testing, regulatory review and post‑market oversight — so AI‑enabled technologies can advance in step with the regulatory frameworks, quality systems and oversight mechanisms designed to support patient safety*.

Learn more about how we support the medical industry.

 

References

1. Leveson, N. G., Turner, C. S., “An Investigation of the Therac-25 Accidents,” Computer, Vol. 26, No. 7, pp. 18–41, July 1993. https://web.stanford.edu/class/archive/cs/cs240/cs240.1236/old//sp2014/readings/therac-25.pdf

 

* Within UL Solutions, we provide a broad portfolio of offerings to the medical device industries, including certification, Approved/Notified Body and consultancy services. To prevent any conflict of interest, or the perception of one, and to protect both our brand and our customers’ brands, we have processes in place to identify and manage potential conflicts of interest and maintain impartiality. UL Solutions is unable to provide consultancy services to EU MDD, MDR, IVDD or IVDR Notified Body, UKCA MD Approved Body or MDSAP customers.
