Can we get some Privacy? Hopes & Fears for Healthcare Data Sharing

Jul 20, 2022

For the past few months, I have been working on an interesting project for Gramener on enabling health data sharing while meeting privacy requirements. This led me down several rabbit holes: healthcare data sharing, privacy regulations and innovation in this space. Below are some threads that I found interesting.

✨Peeling the onion of healthcare data sharing

Healthcare data sharing can come in various hues and forms. At the most basic level, it means a patient sharing their health-related data with a doctor or healthcare provider. In most cases, such sharing is required to provide healthcare services to the patient, and hence it is termed primary use of data. On the other hand, researchers, innovators, regulators and public health experts may also be interested in this data and want it for secondary use.

Another way to look at healthcare data sharing is individual versus aggregate data. Individual data sharing is relatively less complicated, although challenges such as access, consent and interoperability keep it an active area of interest. In most cases of secondary use, on the other hand, aggregating data and providing access to it have been major focus areas for researchers and innovators. And this is where concepts of data privacy clash with data access and usage.

✨A case study from the biopharma world

One of the clearest real-life examples of healthcare data sharing comes from the pharmaceutical world. In a bid to make pharmaceutical research data more accessible, regulators globally have started asking for more transparency around drug R&D data.

Foremost among these regulations is EMA’s Policy 0070, ‘Publication of clinical data for medicinal products for human use’. It requires sponsors to demonstrate careful consideration of data utility within the anonymised clinical study reports (CSRs) published under the policy. According to EMA, proactively publishing clinical data helps avoid duplication of clinical trials, foster innovation, encourage the development of new medicines, build public trust and confidence in EMA’s scientific and decision-making processes, and help academics and researchers re-assess clinical data.

Similar regulations were implemented by Health Canada to make anonymized clinical information in drug submissions and medical device applications publicly available for non-commercial purposes following the completion of Health Canada’s regulatory review process, while adhering to Canada’s Privacy Act.

The US FDA launched a pilot to assess similar requirements, but concluded it without imposing any actual requirements for clinical data sharing.

Both the EMA and Health Canada regulations rely heavily on balancing the need for transparency of research data against the need to protect patient privacy and confidential business information. Neither policy currently demands public release of individual patient records. To balance the conflicting needs of transparency versus privacy, both EMA and Health Canada allow a combination of redaction and anonymization of the data contained in clinical documents.

✨Redaction, Anonymization and possible risks

Both regulatory agencies allow redaction of confidential business information as well as personal data of investigators and sponsor staff. They also require that clinical information be adequately anonymized before public disclosure, to avoid the serious possibility of identifying individual clinical trial patients; this requires applying an objective, systematic and documented anonymization process. As researchers have pointed out, both EMA and Health Canada provide detailed guidance on what anonymization must achieve, but no specific instructions on how to carry it out, apart from literature references.

However, redaction and anonymization can be tricky and sometimes ineffective. A team of researchers was able to ‘re-identify’ six patients from a clinical document submitted to EMA; of the sources they searched, only death records and social media turned up suspected matches. Questions have also been raised about the actual utility of the data shared through these mechanisms, which at best amounts to secondary use of structured and unstructured data already collected in clinical trials. This kind of data use is, of course, very different from using the patient data generated in healthcare systems every day.
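To illustrate why removing names alone is not enough, here is a minimal Python sketch of a linkage attack. All records and field names are invented for illustration (they are not from the study above), and the attack is deliberately simplified: it treats age, sex and trial site as quasi-identifiers and computes the table's k-anonymity (the size of the smallest group of rows sharing the same quasi-identifier values). It then flags public records that match exactly one trial row.

```python
from collections import Counter

# Invented "anonymized" trial extract: names removed, quasi-identifiers kept.
trial_rows = [
    {"subject": "S-001", "age": 67, "sex": "F", "site": "Lyon", "outcome": "died"},
    {"subject": "S-002", "age": 54, "sex": "M", "site": "Lyon", "outcome": "recovered"},
    {"subject": "S-003", "age": 54, "sex": "M", "site": "Lyon", "outcome": "recovered"},
    {"subject": "S-004", "age": 71, "sex": "M", "site": "Porto", "outcome": "died"},
]

# Invented public records (think obituaries) that carry names.
public_records = [
    {"name": "Jane Roe", "age": 67, "sex": "F", "city": "Lyon"},
    {"name": "John Doe", "age": 71, "sex": "M", "city": "Porto"},
]

QUASI_IDENTIFIERS = ("age", "sex", "site")


def k_anonymity(rows, keys=QUASI_IDENTIFIERS):
    """Size of the smallest group of rows sharing identical quasi-identifier values."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return min(counts.values())


def link(rows, records):
    """Return (subject, name) pairs where a public record matches exactly one trial row."""
    matches = []
    for rec in records:
        candidates = [
            r for r in rows
            if (r["age"], r["sex"], r["site"]) == (rec["age"], rec["sex"], rec["city"])
        ]
        if len(candidates) == 1:  # a unique match is a suspected re-identification
            matches.append((candidates[0]["subject"], rec["name"]))
    return matches


print(k_anonymity(trial_rows))            # 1
print(link(trial_rows, public_records))   # [('S-001', 'Jane Roe'), ('S-004', 'John Doe')]
```

A k of 1 means at least one participant is unique on those attributes. Any outside source that carries the same attributes alongside a name can therefore single them out. Techniques such as generalizing ages into bands or suppressing rare values aim to raise k, at some cost to data utility.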

✨Data sharing in the real world

For a long time, health data has been fragmented and siloed. Even with the increasing adoption of electronic health records, it has not been easy to access data for primary or secondary use, and regulations have not made such sharing any easier. As Tim Hulsen points out, the dilemma of patient data use versus privacy rights has received much attention since the EU General Data Protection Regulation (GDPR) was implemented in 2018 (followed by the California Consumer Privacy Act (CCPA) in 2020), initiating an international debate on sharing big data in the healthcare domain. Earlier laws such as the US Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule and Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA) already gave patients more rights over their data, but the GDPR and CCPA have taken this to another level.

However, the GDPR and similar laws say little about data ownership: the GDPR is framed around data controllers and data processors rather than data owners. In countries outside the European Union, where the GDPR does not apply, there is also little agreement on data ownership, which makes it all the more justifiable to always ask for the patient’s consent.

Of the various mechanisms proposed to overcome these challenges, a recent review for the UK’s NHS stands out.

Goldacre Review for UK NHS Data

The UK NHS recently released its data-sharing strategy, ‘Data Saves Lives’. Much of this strategy is based on a review led by Dr Ben Goldacre, titled ‘Better, broader, safer: using health data for research and analysis’. As pointed out by this team, the review focuses on Privacy and Security (Trusted Research Environments), Information Governance, Engagement, and Ethics and Open Working.

The review, and its subsequent adoption into NHS strategy, has specific things to say about privacy. It notes that pseudonymising data and disseminating it to multiple (potentially unknown) endpoints has several limitations:

It does little to protect the privacy of patients as GP data is incredibly disclosive and people are, therefore, readily re-identifiable even if basic demographic information has been removed.

We explain that recognition of this limitation has led to a false belief that the NHS can either provide broad access to its data for research and analytics purposes, or it can preserve patients’ privacy.

As a solution, the review suggests building a small number of secure analytics platforms, shared ‘Trusted Research Environments’ (TREs), and making these the norm for all analysis of NHS patient records data by academics, NHS analysts and innovators wherever there is any privacy risk to patients, unless those patients have consented to their data flowing elsewhere. The review also cautions that every new TRE brings a risk of duplicated effort, duplicated information governance, duplicated privacy risks, monopolies on access or task, and obstructive divergence around data curation and similar activity. For this reason, there should be as few TREs as possible, with a strong culture of openness and re-use around all code and platforms. The NHS accepted this recommendation and plans to make TREs a standard way to access NHS data.
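The core TRE idea is that analysis code travels to the data and only vetted results travel out. The sketch below is a toy model under stated assumptions. The class and method names are hypothetical and the records are invented. The threshold of 10 for suppressing small counts is an illustrative choice, not a figure from the Goldacre review.

```python
from collections import Counter
from typing import Callable, Dict, List, Union

# Hypothetical small-cell threshold; real TREs set their own output-checking rules.
MIN_CELL_SIZE = 10


class TrustedResearchEnvironment:
    """Toy TRE: keeps row-level records inside and releases only checked aggregates."""

    def __init__(self, records: List[dict]) -> None:
        # Row-level data stays here. Python cannot truly hide it; real TREs
        # enforce this isolation at the infrastructure level.
        self._records = records

    def count_by(self, key: Callable[[dict], str]) -> Dict[str, Union[int, str]]:
        """Run an aggregate query, then suppress counts below MIN_CELL_SIZE."""
        counts = Counter(key(r) for r in self._records)
        return {
            group: (n if n >= MIN_CELL_SIZE else f"<{MIN_CELL_SIZE}")
            for group, n in counts.items()
        }


# Invented records for illustration only.
records = [{"diagnosis": "type 2 diabetes"}] * 25 + [{"diagnosis": "rare condition"}] * 3
tre = TrustedResearchEnvironment(records)
print(tre.count_by(lambda r: r["diagnosis"]))
# {'type 2 diabetes': 25, 'rare condition': '<10'}
```

Suppressing small cells is only one form of output checking. In practice, results leaving a TRE are often also reviewed by people before release.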

✨India story

India began its health digitization journey with the National Digital Health Mission (NDHM), now known as the Ayushman Bharat Digital Mission (ABDM), which aims to develop the backbone needed to support the country’s integrated digital health infrastructure.

The mission has four key building blocks, namely the Ayushman Bharat Health Account (ABHA), Healthcare Professionals Registry, Health Facility Registry, and Health Information Exchange and Consent Manager (HIE-CM). These blocks are designed to identify healthcare providers, professionals, and patients, as well as enable the exchange of health data with prior patient consent.

A key shortcoming of this framework is the current absence of a data protection law. A Data Empowerment and Protection Architecture (DEPA), still in draft, is intended to govern access to such data by public and private agencies. DEPA relies on ‘consent managers’ that will act as intermediaries between individuals and the agencies seeking access to their data. Consent managers will not have access to the data itself; they merely facilitate sharing subject to the individual’s consent. The DEPA draft is more closely aligned with the financial sector, where rural individuals or small and medium enterprises need to seek loans or access insurance services. For ABDM, DEPA means that if the individual/patient provides consent, their data can be shared with the agency requesting access.
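Below is a minimal sketch of the consent-manager pattern described above. The class, method and field names are hypothetical and do not come from the DEPA or ABDM specification or APIs. The consent manager only records consents and answers permission checks. The health data stays with the record holder, which releases a record only when an active consent matches the requester, purpose, data type and date.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ConsentArtefact:
    """Hypothetical consent record: who may see which data, why, and until when."""
    patient_id: str
    requester: str
    purpose: str
    data_types: frozenset[str]
    expires: date


class ConsentManager:
    """Stores consents and answers permission checks; holds no health records."""

    def __init__(self) -> None:
        self._consents: list[ConsentArtefact] = []

    def grant(self, consent: ConsentArtefact) -> None:
        self._consents.append(consent)

    def revoke(self, patient_id: str, requester: str) -> None:
        self._consents = [
            c for c in self._consents
            if not (c.patient_id == patient_id and c.requester == requester)
        ]

    def is_permitted(self, patient_id: str, requester: str, purpose: str,
                     data_type: str, today: date) -> bool:
        # Every field of the request must match an unexpired consent.
        return any(
            c.patient_id == patient_id
            and c.requester == requester
            and c.purpose == purpose
            and data_type in c.data_types
            and today <= c.expires
            for c in self._consents
        )


class HealthRecordHolder:
    """Hypothetical provider that holds records and releases them only with consent."""

    def __init__(self, records: dict[tuple[str, str], str], cm: ConsentManager) -> None:
        self._records = records  # keyed by (patient_id, data_type)
        self._cm = cm

    def fetch(self, patient_id: str, requester: str, purpose: str,
              data_type: str, today: date) -> str:
        if not self._cm.is_permitted(patient_id, requester, purpose, data_type, today):
            raise PermissionError("no active consent covers this request")
        return self._records[(patient_id, data_type)]


cm = ConsentManager()
holder = HealthRecordHolder({("P1", "lab_report"): "<P1 lab report>"}, cm)
cm.grant(ConsentArtefact("P1", "clinic-b", "treatment",
                         frozenset({"lab_report"}), date(2030, 1, 1)))
print(holder.fetch("P1", "clinic-b", "treatment", "lab_report", date(2025, 1, 1)))

cm.revoke("P1", "clinic-b")
try:
    holder.fetch("P1", "clinic-b", "treatment", "lab_report", date(2025, 1, 1))
except PermissionError as err:
    print("denied:", err)
```

In this sketch, revoking consent blocks further requests but cannot recall data that has already been shared.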

Some experts point out that privacy self-management, where individuals decide on each data request themselves, addresses privacy as a series of isolated transactions guided by particular individuals. Privacy costs and benefits, however, are more appropriately assessed cumulatively and holistically, not merely at the individual level. Another criticism is that ABDM is being ‘marketed’ as a service provider that will redefine how Indians access healthcare. In its current form, ABDM places little emphasis on the use of this health data by the public health research community.

India has only just begun its journey towards digital health. Through an open and consultative approach, ABDM is trying to bring together diverse opinions, which will hopefully lead to a sound data-sharing strategy.

✨Health data definitions continue to expand…so do the challenges

Traditionally, the term “health data” has referred to information produced and stored by healthcare provider organizations. Yet vast amounts of health-relevant data are collected from individuals and entities elsewhere, both passively and actively, for example by wearables and period-tracking apps. Most of these less obvious data elements fall outside the scope of privacy and data-sharing laws around the world.

As researchers point out, nontraditional health-relevant data, often equally revealing of health status, are in widespread commercial use and, in the hands of commercial companies, largely unregulated. Yet these data are often less accessible to providers, patients and public health for improving individual and population health.

The US Supreme Court’s recent decision to overturn Roe v. Wade in Dobbs v. Jackson Women’s Health Organization has raised many questions about potential efforts by law enforcement agencies to obtain data from healthcare and other service providers to detect the performance of a possibly unlawful abortion. For example, data collected by period-tracking apps, patients’ self-reported symptoms, or diagnostic-testing results might be used to establish the timeframe in which an individual became pregnant, and then demonstrate that a pregnancy was terminated, as part of investigative or enforcement efforts against individuals or organizations allegedly involved in such termination.

As this article rightly points out, consumer education and transparency about information collection could help users make informed decisions about data-sharing settings on their devices. Even so, patients and app users will continue to look to providers and app developers to answer their questions and concerns about protecting reproductive health information. Healthcare providers and app developers should therefore consider updating their online privacy policies or posting information about their reproductive health information privacy practices to address these concerns. They should also be careful not to overstate the protections that HIPAA and other privacy laws provide against disclosure of health information to law enforcement.

As McGraw and Mandl point out in this excellent npj Digital Medicine article:

what is needed is a multi-pronged approach that implements strong privacy protections but also includes accountability even for uses of so-called “de-identified” or anonymized data and addresses the potential for harm to individuals and populations.

Liked what you read? Subscribe & Share!

I would love to hear your feedback and thoughts. You can also connect with me via Twitter and LinkedIn!