Mapping and Evaluating National Data Flows: Transparency, Privacy, and Guiding Infrastructural Transformation
Joe Zhang, BMBCh (1,2); Jess Morley, MS (3); Jack Gallifant, MSc (4,5); Chris Oddy, MBBS (6); Prof James T Teo, PhD (2,7); Prof Hutan Ashrafian, PhD (1,8); Prof A Darzi, PhD (1)
1. Institute of Global Health Innovation, Imperial College London, London, UK
2. Department of Critical Care, Guy's and St Thomas' NHS Foundation Trust, London, UK
3. Oxford Internet Institute, University of Oxford, Oxford, UK
4. Department of Intensive Care, Imperial College Healthcare NHS Trust, London, UK
5. Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA
6. Department of Anaesthesia, Critical Care and Pain, St George's Healthcare NHS Trust, London, UK
7. Department of Neurology, King's College Hospital NHS Foundation Trust, London, UK
8. Leeds University Business School, Leeds, UK
DOI: 10.1016/S2589-7500(23)00157-7
joe.zhang@imperial.ac.uk
This study examines how electronic health record data from the UK National Health Service (NHS) are shared, and finds significant challenges. It maps data flows to over 460 entities across the academic, commercial, and public sectors. The findings show that multistage data flow chains reduce transparency, which jeopardizes public trust. Moreover, most data interactions fail to meet best practices for secure access, raising privacy concerns. The existing infrastructure also produces duplicate data, diminishing the diversity and value of the data. The study offers recommendations for infrastructure transformation and introduces a new website, DataInsights.uk, to enhance transparency and showcase NHS data assets.
Data Flow Patterns in NHS England
NHS England, comprising 216 hospital trusts and 6,544 primary care providers, manages healthcare interactions for a population of about 56 million. Figure 2 illustrates the national data flows, highlighting four primary models of data extraction: 1) structured clinical codes from primary care EHRs, 2) administrative data from secondary care by NHS Digital, 3) data aggregation within regional shared care record data warehouses, and 4) proprietary secondary care data pipelines.
Figure 2. Electronic patient data flows in NHS England. Data flows go upwards and are coloured by destination. For data sources and extractors, node size is proportional to population catchment (eg, NHS Digital=55 million). For data consumers, node size is proportional to the number of projects (eg, University of Oxford=178). NHS=National Health Service.
These models vary in the resolution and type of data extracted, ranging from standard clinical codes to high-resolution data from secondary care. Through its flow directions and node sizes, Figure 2 gives an overview of the data sources, the extractors, and their reach.
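The idea of a multistage data flow chain can be made concrete with a small directed graph. The following is a minimal sketch, not the study's method: the node names and edges in `FLOWS` are hypothetical placeholders rather than real NHS flows, and `build_graph` and `longest_chain` are illustrative helpers. It counts how many hops separate a data source from its furthest downstream consumer; longer chains are the ones the study associates with reduced transparency.

```python
from collections import defaultdict

# Hypothetical (source, destination) edges. Placeholder names only,
# not flows reported in the study.
FLOWS = [
    ("primary_care_ehr", "extractor_a"),
    ("secondary_care_admin", "extractor_a"),
    ("extractor_a", "consumer_university"),
    ("extractor_a", "extractor_b"),
    ("extractor_b", "consumer_company"),
]


def build_graph(edges):
    """Return an adjacency list mapping each node to its downstream nodes."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    return graph


def longest_chain(graph, node, seen=()):
    """Number of hops on the longest downstream path starting at `node`."""
    if node in seen:
        # Stop if a cycle brings us back to a node already on this path.
        return 0
    downstream = graph.get(node, [])
    if not downstream:
        # A node with no outgoing flows is a final consumer.
        return 0
    return 1 + max(longest_chain(graph, n, seen + (node,)) for n in downstream)


graph = build_graph(FLOWS)
for source in ("primary_care_ehr", "extractor_b"):
    print(source, longest_chain(graph, source))
```

In this toy graph, data leaving `primary_care_ehr` can pass through two extractors before reaching a consumer, so the patient-facing source is three hops away from the final use.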
Secondary Use Ecosystem and Top Data Consumers
Figure 3. Voronoi chart showing eight top consumers for NHS data across each of six categories, by number of discovered projects during the study period.
NHS data, as shown in Figure 2, feed a diverse and extensive ecosystem of secondary uses involving over 460 non-NHS organizations. These entities, which have accessed, maintained, or used NHS data since April 2021, span sectors such as academia, pharmaceuticals, life sciences, and non-profits. Prominent among them are 216 universities, 143 companies in life sciences and data analytics, and 44 non-profit organizations. Figure 3 shows the eight top consumers in each of six categories and the dominant forms of data use, which span research studies, publications, audits, and various forms of partnership. Together, these figures show how far the reach of NHS data extends beyond its immediate healthcare context.
Balance and Diversity of NHS Data Assets
Figure 4. Individual data assets per extractor type, showing volume of data types and linkages
The data extractors within the NHS vary significantly in the type and volume of data they maintain, and act as multipliers in the data distribution network. Figure 4 highlights this diversity, showing that primary care data are the most prevalent type maintained. Whole-population primary care data are accessible for COVID-19 research, including through platforms like OpenSAFELY. The figure also reveals overlap in data extraction: some primary care practices report extraction by multiple databases, indicating substantial duplication. The resulting landscape of data assets, from primary care records to shared care and regional systems, forms a vast and intricate web of data flows.
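The duplication described above can be quantified once each practice is mapped to the databases that extract its data. The sketch below is an illustration under stated assumptions: `EXTRACTIONS` holds made-up placeholder practices and databases, and `duplication_summary` is a hypothetical helper, not something taken from the study.

```python
# Hypothetical mapping: practice identifier -> databases extracting its data.
# Placeholder values only, not figures from the study.
EXTRACTIONS = {
    "practice_1": {"db_a"},
    "practice_2": {"db_a", "db_b"},
    "practice_3": {"db_a", "db_b", "db_c"},
    "practice_4": {"db_c"},
}


def duplication_summary(extractions):
    """Count practices whose data are extracted by more than one database."""
    total = len(extractions)
    multi = sum(1 for dbs in extractions.values() if len(dbs) > 1)
    share = multi / total if total else 0.0
    return {"practices": total, "multi_extracted": multi, "share_multi": share}


print(duplication_summary(EXTRACTIONS))
```

A high share of multiply extracted practices would suggest that several databases hold overlapping records, which is the loss of diversity the section describes.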
NHS Data Transformation Recommendations
Our work builds upon insights in other work that has examined robustness of models and metrics among subpopulations:
Notes: This work proposes a framework that states the main risks associated with data sharing, systematically presents risk mitigation strategies, and provides examples through a healthcare lens. To move towards Open Data, mechanisms for incentivisation must be created, beginning with recentring data sharing on patient benefits.
For academic referencing, please cite this work as follows.
Joe Zhang, Jess Morley, Jack Gallifant, Chris Oddy, James T Teo, Hutan Ashrafian, Brendan Delaney, Ara Darzi, "Mapping and evaluating national data flows: transparency, privacy, and guiding infrastructural transformation," The Lancet Digital Health, Volume 5, Issue 10, 2023, Pages e737-e748, ISSN 2589-7500, [https://doi.org/10.1016/S2589-7500(23)00157-7](https://www.sciencedirect.com/science/article/pii/S2589750023001577).
@article{zhang2023mapping,
  title={Mapping and evaluating national data flows: transparency, privacy, and guiding infrastructural transformation},
  author={Zhang, Joe and Morley, Jess and Gallifant, Jack and Oddy, Chris and Teo, James T and Ashrafian, Hutan and Delaney, Brendan and Darzi, Ara},
  journal={The Lancet Digital Health},
  volume={5},
  number={10},
  pages={e737-e748},
  year={2023},
  publisher={Elsevier},
  doi={10.1016/S2589-7500(23)00157-7},
  url={https://www.sciencedirect.com/science/article/pii/S2589750023001577}
}