This story was originally published in our July/August 2022 issue as "Ghosts in the Machine." Subscribe to read more stories like this.
If a heart attack isn’t documented, did it really happen? For an artificial intelligence program, the answer may well be “no.” An estimated 170,000 people in the United States have asymptomatic – or “silent” – heart attacks each year. During these events, patients are typically unaware that a blockage is cutting off blood flow or that vital tissue is dying. They will not experience chest pain, dizziness, or difficulty breathing. They will not turn pale or collapse. Instead, they may just feel a little tired, or have no symptoms at all. But while the patient may not realize what happened, the underlying damage can be severe and long-lasting: people who have silent heart attacks are at higher risk of coronary artery disease and stroke, and are more likely to die within the following 10 years.
But if a doctor doesn’t diagnose the attack, it won’t appear in the patient’s electronic health record. That omission can have dangerous consequences. AI systems are trained on health records, sifting through databases to examine how doctors treated past patients and making predictions that can inform decisions about future care. “That makes a lot of medical AI very challenging,” says Ziad Obermeyer, an associate professor at the University of California, Berkeley, who studies machine learning, medicine and health policy. “We almost never observe the thing we really care about.”
The problem is in the data—or rather, what’s not in the data. Electronic health records only show what doctors and nurses notice. If they can’t see a problem, even one as serious as a heart attack, then the AI can’t see it either. Likewise, physicians may unknowingly inject their own racial, gender, or socioeconomic biases into the system. This can lead to algorithms that prioritize certain demographics over others, perpetuating inequalities and failing to deliver on the promise that AI can help deliver better care.
One such problem is that medical records only capture information about patients who have access to the medical system and can afford to see a doctor. “Data sets that do not adequately represent certain groups — whether it’s racial groups, gender for certain diseases, or rare diseases themselves — can produce algorithms that are biased against those groups,” says Curtis Langlotz, a radiologist and director of the Center for Artificial Intelligence in Medicine and Imaging at Stanford University.
In addition, diagnoses can reflect a doctor’s prejudices and assumptions — for example, about what might be behind a patient’s chronic pain — as much as the reality of what is happening. “The dirty secret of many artificial intelligence tools is that a lot of the things that look like biological variables we’re predicting are actually just someone’s opinion,” says Obermeyer. This means that instead of helping doctors make better decisions, these tools often perpetuate the very inequalities they were designed to help avoid.
When scientists train algorithms to drive a car, they know what’s happening out on the road. There’s no debate about whether there’s a stop sign, a school zone, or a pedestrian in front of you. But in medicine, truth is often measured by what the doctor says, not by what is actually going on. A chest X-ray can indicate pneumonia because a doctor diagnosed it and recorded it in the medical record, not because it is necessarily the correct diagnosis. “These proxies are often skewed by financial things, racial things and gender things and all sorts of other things of a social nature,” says Obermeyer.
In a 2019 study, Obermeyer and colleagues examined an algorithm developed by the healthcare company Optum. Hospitals use similar algorithms to predict which patients need the most care, estimating the needs of over 200 million people annually. But there’s no simple variable that identifies who is sickest. Rather than predicting specific health needs, Optum’s algorithm predicted which patients were likely to cost more, on the logic that sicker people require more care and are therefore more expensive to treat. For a variety of reasons, including income, access to health care, and poor treatment by doctors, Black people spend less on health care, on average, than their white counterparts. As a result, the study authors found, using cost as a proxy for health caused the algorithm to consistently underestimate the health needs of Black patients.
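The proxy failure described here can be reproduced in a toy simulation (a minimal sketch with invented numbers, not data from the study): if one group spends less at the same level of illness, ranking patients by cost flags fewer of them for extra care, even though the groups are equally sick on average.

```python
import random

random.seed(0)

# Toy model: each patient has a true illness level; observed cost tracks
# illness, but group B spends less at the same illness level because of
# access barriers. All numbers are invented for illustration.
def make_patients(n, group, spend_factor):
    patients = []
    for _ in range(n):
        illness = random.uniform(0, 10)                    # true health need
        cost = illness * spend_factor + random.gauss(0, 1) # observed spending
        patients.append({"group": group, "illness": illness, "cost": cost})
    return patients

patients = make_patients(5000, "A", 1.0) + make_patients(5000, "B", 0.7)

# A system that targets extra care to the top 20% by the cost proxy,
# rather than by true need.
patients.sort(key=lambda p: p["cost"], reverse=True)
flagged = patients[: len(patients) // 5]

# Group B is half the population, but far less than half of those flagged —
# the proxy quietly converts a spending gap into a care gap.
share_b = sum(p["group"] == "B" for p in flagged) / len(flagged)
print(f"share of group B among flagged patients: {share_b:.2f}")
```

The point of the sketch is that nothing in the code mentions race or group membership when ranking; the bias enters entirely through the choice of label.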
Rather than reflecting reality, the algorithm mimicked racial bias and further embedded it in the healthcare system. “How do we get algorithms to be better than us?” asks Obermeyer. “And not only reflect our prejudices and our mistakes?”
It’s also not always easy to determine the truth of a situation — whether a doctor made a mistake because of poor judgment, racism, or sexism, or whether a doctor simply got lucky — says Rayid Ghani, a professor in the machine learning department at Carnegie Mellon University. If a doctor runs a test and determines that a patient has diabetes, has the doctor done a good job? Yes, they diagnosed the disease. But maybe they should have tested the patient earlier, or treated their rising blood sugar months ago, before the diabetes developed.
If the same test was negative, the calculation becomes even more difficult. Should the doctor have ordered this test at all or was it just a waste of resources? “You can only measure late diagnosis if no early diagnosis has been made,” says Ghani. Decisions about what tests to do (or what patient complaints to take seriously) often reflect physician bias rather than best medical care. But when medical records encode those biases as facts, those biases are replicated in the AI systems that learn from them, no matter how good the technology.
“If the AI uses the same data to train itself, it’s going to have some of these inherent biases,” Ghani adds — not because of the technology itself, but because, unfortunately, that’s how humans are.
But if this flaw in AI is turned around intentionally, it could become a powerful tool, says Kadija Ferryman, an anthropologist at Johns Hopkins University who studies bias in medicine. She points to a 2020 study that used AI as a resource for assessing what the data show: a kind of diagnostic for bias. If an algorithm is less accurate for women and for patients with public health insurance, for example, that is an indication that care is not being provided fairly. “Rather than AI being the end, AI is almost sort of a starting point to help us really understand the biases in clinical spaces,” she says.
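The audit Ferryman describes can be as simple as comparing a model's error rate across patient subgroups. A hypothetical sketch (the records and predictions below are invented; real audits run over held-out clinical data):

```python
from collections import defaultdict

# Invented records: each row holds a subgroup attribute, the true label,
# and the model's prediction.
records = [
    {"sex": "F", "label": 1, "pred": 0},
    {"sex": "F", "label": 0, "pred": 0},
    {"sex": "F", "label": 1, "pred": 1},
    {"sex": "M", "label": 1, "pred": 1},
    {"sex": "M", "label": 0, "pred": 0},
    {"sex": "M", "label": 1, "pred": 1},
]

def accuracy_by_group(rows, key):
    """Fraction of correct predictions within each value of `key`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in rows:
        totals[r[key]] += 1
        hits[r[key]] += int(r["label"] == r["pred"])
    return {g: hits[g] / totals[g] for g in totals}

acc = accuracy_by_group(records, "sex")
# A large gap between groups is a signal to investigate the underlying
# data and care patterns, not just the model.
gap = max(acc.values()) - min(acc.values())
print(acc, f"gap={gap:.2f}")
```

Here the model misses a positive case only for the "F" group, so the accuracy gap itself becomes the finding: the starting point for asking where the training data came from and whose diagnoses it encoded.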
In a 2021 Nature Medicine study, researchers described an algorithm they developed to examine racial bias in the diagnosis of arthritic knee pain. Historically, Black and low-income patients have been significantly less likely to be recommended for surgery, even though they often report much more pain than white patients. Doctors tended to attribute this to psychological factors, such as stress or social isolation, rather than physiological causes. Instead of relying on radiologists’ diagnoses to predict the severity of a patient’s knee pain, the researchers trained the AI on a dataset that paired knee X-rays with patients’ own descriptions of their condition.
The AI not only predicted who was in pain more accurately than the doctors did, but also showed that Black patients’ pain was not psychosomatic. Rather, the problem lay in how radiologists are trained to read diseased knees. Because our understanding of arthritis is based on research conducted almost exclusively on white populations, physicians may not recognize features of diseased knees that are more common in Black patients.
It’s much more difficult to develop AI systems like the knee pain algorithm that can correct or check doctors’ biases, rather than simply mimicking them — and it requires a lot more oversight and testing than currently exists. However, Obermeyer notes that in some ways, fixing the biases in AI can be done much faster than fixing the biases in our systems — and in ourselves — that helped cause these problems in the first place.
And building AIs that root out bias could be a promising way to address larger systemic problems. After all, changing the way a machine works takes just a few keystrokes; it takes a lot more to change the way people think.
An early prototype of Watson, seen here in 2011, was originally the size of a master bedroom. (Source: Clockready/Wikimedia Commons)
IBM’s failed revolution
In 2011, IBM’s Watson computer crushed its human competitors on the trivia show Jeopardy!. Ken Jennings, the highest-earning contestant of all time, lost by over $50,000. “I, for one, welcome our new computer overlords,” he wrote on his answer card during the final round.
But Watson’s reign was short-lived. One of the earliest — and most famous — attempts to apply artificial intelligence to healthcare, Watson now stands as one of medical AI’s greatest failures. IBM spent billions building a vast archive of patient information, insurance claims and medical images. Watson Health could (allegedly) mine this database to suggest new treatments, match patients to clinical trials, and discover new drugs.
Despite Watson’s impressive database and all the hype from IBM, doctors complained that it seldom produced useful recommendations. The AI did not account for regional differences in patient populations, access to care, or treatment protocols. Because its cancer data came exclusively from a single hospital, for example, Watson for Oncology simply reflected the preferences and biases of the doctors who practiced there.
In January 2022, IBM finally dismantled Watson Health, selling its most valuable data and analytics assets to the investment firm Francisco Partners. That decline hasn’t stopped other data giants like Google and Amazon from hyping their own AIs, promising systems that can do everything from transcribing notes to predicting kidney failure. For the big tech companies experimenting with medical AI, the machine-powered doctor is still very much “in.”