The N3C, on the other hand, can be audited by thousands of researchers at hundreds of participating institutions and is accountable to them, with a strong emphasis on transparency and replicability. The interface runs on a Palantir platform hosted in GovCloud, and everything users do in it is carefully preserved, so that anyone with access can retrace their steps.
“It’s not rocket science, and it’s not really new. It’s just hard work. It’s tedious, it has to be done carefully and we have to validate every step,” says Christopher Chute, a professor of medicine at Johns Hopkins who also co-directs the N3C. “The worst thing we can do is methodically turn data into garbage that gives us the wrong answers.”
Handel points out that these efforts have not been easy. “The diversity of expertise it took to make this happen, the persistence, the dedication and, frankly, the brute force, are simply unparalleled,” she says.
This brute force has come from many different fields, many of which are not traditionally part of medical research.
“Having people from every area of science on board has really helped. During covid, people were much more willing to collaborate,” says Mary Boland, a professor of computer science at the University of Pennsylvania. “You could have engineers, computer scientists, physicists, all those people who don’t normally take part in public health research.”
Boland is part of a group using N3C data to determine whether covid increases irregular bleeding in women with polycystic ovary syndrome. Outside of covid research, she says, most researchers have to rely on insurance claims data to get a database large enough for population-level analyses.
Claims data can answer some questions, for example about how well drugs work in the real world. But these databases are missing huge amounts of information, including lab results, patient-reported symptoms and even whether patients die.
Collection and cleaning
Apart from insurance claims databases, most health data collaborations in the United States use a federated model. Participants all agree to convert their own datasets into a common format and then run shared queries across the collective, such as the proportion of severe covid cases in each age group. Several international covid research groups, including Observational Health Data Sciences and Informatics (OHDSI, pronounced “Odyssey”), work this way, which sidesteps the legal and policy problems of moving patient data across borders.
OHDSI, founded in 2014, includes researchers from 30 countries who together hold the records of 600 million patients.
“This allows each institution to keep its data behind its own firewall, with its own data protections in place. There is no need to move patient data back and forth,” explains Boland. “That’s reassuring for a lot of places, especially with all the hacking that’s been going on lately.”
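To make the federated idea concrete, here is a minimal Python sketch, not any consortium's actual software. It assumes a hypothetical harmonized record format with `age_group` and `severe` fields; the functions `local_counts` and `federated_proportion` are illustrative names. Each site computes only aggregate counts inside its own firewall, and a coordinator combines those counts to get the proportion of severe cases per age group, so no patient-level rows ever leave a site.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical common record format: after local harmonization, every site
# stores patients as dicts with the same keys ("age_group", "severe").
Record = Dict[str, object]


def local_counts(records: List[Record]) -> Dict[str, Tuple[int, int]]:
    """Runs inside one site's firewall. Returns only aggregates,
    (severe cases, total cases) per age group, never patient rows."""
    totals: Counter = Counter()
    severe: Counter = Counter()
    for r in records:
        group = str(r["age_group"])
        totals[group] += 1
        if r["severe"]:
            severe[group] += 1
    return {g: (severe[g], totals[g]) for g in totals}


def federated_proportion(
    site_results: List[Dict[str, Tuple[int, int]]]
) -> Dict[str, float]:
    """Coordinator step: sums the per-site counts and computes the
    proportion of severe cases in each age group across all sites."""
    severe: Counter = Counter()
    totals: Counter = Counter()
    for result in site_results:
        for group, (s, t) in result.items():
            severe[group] += s
            totals[group] += t
    return {g: severe[g] / totals[g] for g in totals}


if __name__ == "__main__":
    # Toy, made-up data for two sites; each site runs local_counts itself.
    site_a = [
        {"age_group": "18-49", "severe": False},
        {"age_group": "18-49", "severe": True},
        {"age_group": "65+", "severe": True},
    ]
    site_b = [
        {"age_group": "18-49", "severe": False},
        {"age_group": "65+", "severe": False},
    ]
    results = [local_counts(site_a), local_counts(site_b)]
    print(federated_proportion(results))
```

In this sketch only the small count dictionaries returned by `local_counts` would cross institutional boundaries, which is the property Boland describes. Real federated networks add much more on top, such as agreed data models, query review and small-cell suppression.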