It also obscures the origin of some data sets. This can mean that researchers miss important features that skew the training of their models. Many unwittingly used a data set containing chest scans of children who did not have covid as their examples of what non-covid cases looked like. As a result, the AIs learned to identify children, not covid.
Driggs’ group trained its own model using a data set containing a mix of scans taken when patients were lying down and standing up. Because patients scanned while lying down were more likely to be seriously ill, the AI wrongly learned to predict serious covid risk from a person’s position.
In yet other cases, some AIs were found to be picking up on the text font that certain hospitals used to label their scans. As a result, fonts from hospitals with more serious caseloads became predictors of covid risk.
Mistakes like these seem obvious in hindsight. They can also be fixed by adjusting the models, if researchers are aware of them. It is possible to acknowledge the shortcomings and release a less accurate but less misleading model. But many tools were developed either by AI researchers who lacked the medical expertise to spot flaws in data, or by medical researchers who lacked the mathematical skills to compensate for those flaws.
A more subtle problem that Driggs highlights is incorporation bias, or the bias introduced at the point a data set is labeled. For example, many medical scans were labeled according to whether the radiologists who created them said they showed covid. But that embeds, or incorporates, any biases of that particular doctor into the ground truth of a data set. It would be much better to label a medical scan with the result of a PCR test rather than one doctor’s opinion, Driggs says. But there isn’t always time for statistical niceties in busy hospitals.
That hasn’t stopped some of these tools from being rushed into clinical practice. Wynants says it’s unclear which ones are being used or how. Hospitals will sometimes say that they are using a tool only for research purposes, which makes it difficult to gauge how much doctors rely on them. “There is a lot of secrecy,” she says.
Wynants asked one company that markets deep-learning algorithms to share information about its approach, but received no response. She later found several published models from researchers tied to this company, all of them at high risk of bias. “We don’t really know what the company implemented,” she says.
According to Wynants, some hospitals are even signing nondisclosure agreements with medical AI vendors. When she asked doctors what algorithms or software they were using, they sometimes told her they weren’t allowed to say.
How to fix it
What is the solution? Better data would help, but in times of crisis that’s a big ask. It’s more important to make the most of the data sets we have. The simplest move would be for AI teams to collaborate more with clinicians, Driggs says. Researchers should also share their models and disclose how they were trained so that others can test and build on them. “These are two things we could do today,” he says. “And they would solve maybe 50% of the problems that we identified.”
Obtaining data would also be easier if the formats were standardized, says Bilal Mateen, a doctor who leads clinical technology research at the Wellcome Trust, a London-based global health research charity.