New developments in medical IT systems are helping link clinical data with biospecimens, creating new potential for biobanks and clinical research. Because biospecimens are only as valuable as the data associated with them, it would be advantageous to integrate relevant data on these samples into biobank systems. When data is integrated in this way, the quantity and quality of clinical data on patients from whom these samples derive add considerably to their value (1).
Advantages of Clinical and Biobanking Data Integration
In an article by Genetic Engineering & Biotechnology News, Dr. Barnholtz-Sloan, professor and associate director for bioinformatics/translational informatics at the Case Western Reserve University School of Medicine, explained why this form of data integration is so important. “If you have a biobank, and all you know is the diagnosis of the patient and the age of the patient, your options for doing something with that information are very limited,” Dr. Barnholtz-Sloan says. “But, if you also know what treatment the patient received, how long they survived, and/or whether or not they had other diseases, these factors make the biobank data more valuable” (1). By integrating health datasets from clinical research, biobanks can raise the value of their data while opening new paths and potential for research.
Challenges for Biobanking
Despite these advantages, the information biobanks hold on their samples is still rather limited. One reason for this is the early history of biobanks. Maureen Lane, Ph.D., a clinical development scientist at ExecuPharm, explains: “Initially, when biobanks started, samples were completely de-identified” (2). This anonymization step was taken to protect patient privacy and out of concern that the data could fall into the wrong hands, such as those of big insurance companies. For biobanks, however, this created a high level of data ambiguity. While one might have known what part of the body a tumor came from, other valuable information, such as tumor type or previous treatments, would have been unavailable. Even now, with massive numbers of biospecimens available in hospitals and so much data being collected in clinical research, biobanks still find clinical data on their samples hard to come by (2). Sek Won Kong, M.D., an assistant professor at Boston Children’s Hospital, explains that searching through protected patient information requires approval and that this can be difficult to obtain, posing problems for research (2). The responsibility of supporting medical research while simultaneously protecting patient data has limited the efficiency of biobanking in the past. But new tools are now changing things.
Some biobanks, hospitals, and laboratories are now finding ways around these issues. Thanks to developments in medical IT, new systems are being built to not only manage biobanking or clinical data but to support researchers in combining them, while ensuring patient data protection. These complex, integrated software solutions are designed in ways that protect patients while providing comprehensive information, opening vast amounts of valuable data to researchers and biobanks. The most nuanced of these systems include built-in and customizable patient consent forms, user-based viewing rights to adhere to strict data protection rules, internal de-identification tools, and the ability to hook up third-party de-identification tools (2).
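To make the idea of an internal de-identification step more concrete, the following is a minimal sketch in Python. It is not taken from any of the systems discussed here. The field names, the list of direct identifiers, and the key handling are all assumptions made for illustration; a real project would derive them from its own data protection policy.

```python
import hashlib
import hmac

# Hypothetical set of directly identifying fields (an assumption for this sketch).
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address", "insurance_number"}


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a patient ID with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def de_identify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record without direct identifiers.

    The patient ID is replaced by a pseudonym, so samples from the same
    patient can still be linked without exposing who the patient is.
    """
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    cleaned["pseudonym"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned


# Illustrative record; every value here is invented.
SECRET_KEY = b"example-key-kept-outside-the-research-database"
record = {
    "patient_id": "P-000123",
    "name": "Jane Doe",
    "date_of_birth": "1970-01-01",
    "diagnosis": "glioblastoma",
    "treatment": "radiotherapy",
    "sample_id": "S-42",
}
cleaned = de_identify(record, SECRET_KEY)
print(cleaned)
```

In this sketch the clinical annotations (diagnosis, treatment) stay attached to the sample, while the directly identifying fields are dropped. Because the pseudonym is derived with a secret key, only a party holding that key can reproduce the mapping from patient ID to pseudonym.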
In a 2018 study on the feasibility of a nationwide clinical research network, the IBCB Project evaluated two IT systems and their ability to annotate samples with clinical information, integrate biobank data, and interface legacy data. The study found that with these technologies “biobank data and clinical data can be integrated and leveraged between hospitals” with little effort (3). According to the study, a multi-site platform needs certain features to provide clinical information for biobanks in a secure and efficient way. These include interfaces to other systems and integration of legacy data, comparable vocabularies to reconcile semantic deviations between sites, powerful and detailed query tools, and a data security system that supports multiple users.
The study found not only that biobank data can be combined with clinical data and shared with the research community on a national level using today’s technologies, but also that new research efforts in this direction will leverage existing technologies and develop them further in support of these needs (3). This would require further collaboration between clinical research, biobanking, and healthcare IT. As a result, new specialized software systems are now being developed that support not only biobanking and clinical trials but also interdepartmental collaboration and data integration.
Biobanks in Collaboration
In addition to integrating datasets from clinical research and biobanks, new technologies are being designed to facilitate another growing trend in biobanking: collaborative biobanking. This concept of setting up shared biobanking facilities for industry, academia, and government would provide storage for researchers and firms that are otherwise unable or unwilling to invest in developing a biobank of their own (4).
Another development in research is the increasing collaboration between different biobanks to supply larger and more diverse datasets in combination with biospecimens. For this to happen, biobank data must be harmonized and made accessible across biobank information systems. The 2013 white paper “Creating a global alliance to enable responsible sharing of genomic and clinical data” outlines the main challenges for this kind of biomedical data integration (5). These include harmonization and data security, as well as homogenizing data structures and access policies.
A 2016 study investigated this subject under the title “Harmonising and linking biomedical and clinical data across disparate data archives to enable integrative cross-biobank research.” The study examines a broader set of technical obstacles to biobanking collaboration and finds that the lack of harmonized access policies, data heterogeneity, disparate information systems, and inconsistent vocabulary between biobanks all make it difficult to search for specimens across biobanks and integrate data. According to the study, “Addressing the many challenges requires that researchers have proper knowledge and resources to help them easily, but formally and explicitly, achieve data harmonisation and integration processes that are scientifically valid and replicable, and that lead to implementation of adequate software solutions” (5). The study reports that current open source biobank information systems typically catalogue available samples, which can be queried based on a general description of sample content. These systems do not support more complex queries such as, “For how many DNA samples in which cohorts are there Type 2 Diabetes status records, as well as fasting glucose concentration and body-mass index?” (5). Without powerful and expansive query tools, such systems make it difficult to find suitable, available samples across multiple biobank databases.
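To illustrate what the complex query quoted above asks of a system, here is a minimal sketch in Python. It assumes the sample records from several biobanks have already been harmonized into one common structure, which is exactly the hard part the study describes. The records, field names, and cohort labels are invented for illustration and do not reflect any real biobank schema.

```python
from collections import defaultdict

# Hypothetical, already harmonized sample records pooled from several biobanks.
SAMPLES = [
    {"cohort": "A", "material": "DNA", "t2d_status": "yes", "fasting_glucose": 7.1, "bmi": 31.2},
    {"cohort": "A", "material": "DNA", "t2d_status": "no", "fasting_glucose": None, "bmi": 24.0},
    {"cohort": "B", "material": "DNA", "t2d_status": "no", "fasting_glucose": 5.0, "bmi": 22.5},
    {"cohort": "B", "material": "serum", "t2d_status": "yes", "fasting_glucose": 6.8, "bmi": 29.9},
    {"cohort": "C", "material": "DNA", "t2d_status": None, "fasting_glucose": 5.4, "bmi": 26.1},
]

# Annotations that must all be present for a sample to count.
REQUIRED_FIELDS = ("t2d_status", "fasting_glucose", "bmi")


def count_annotated_dna_samples(samples, required=REQUIRED_FIELDS):
    """Count, per cohort, the DNA samples that carry every required annotation."""
    counts = defaultdict(int)
    for sample in samples:
        if sample.get("material") != "DNA":
            continue  # only DNA samples are of interest
        if all(sample.get(field) is not None for field in required):
            counts[sample["cohort"]] += 1
    return dict(counts)


print(count_annotated_dna_samples(SAMPLES))  # prints {'A': 1, 'B': 1}
```

A catalogue that stores only a general description of sample content cannot answer this, because the answer depends on which clinical annotations exist for each individual sample.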
Another challenge for harmonizing biobanks is the slow process of obtaining data access rights across biobanks. “At the moment, online data resources serving biobank information typically only offer a binary choice between a data access scheme that is ‘open to all’ and one that is ‘highly restricted,’” the study reports. This poses challenges for research, interrupting the workflow of large meta-studies with the processing of data access applications, which “often takes longer than the data analysis itself” (5). The importance of data security and access rights in biobanking has complicated integrative cross-biobank research. However, new solutions now exist.
The Software Solution for Biobanks
Many of the challenges for cross-biobank research are the same as those faced by biobanks integrating clinical data (e.g., heterogeneous data structures, binary access restrictions, inconsistent vocabulary, and interfacing legacy data). Most interoperable solutions that can integrate clinical data in biobanks would also support both intra-biobank and inter-biobank collaboration.
Using complex and expansive query tools, systems like CentraXX by Kairos can capture sample data, as well as information on sample availability, across enormous databases. CentraXX can also integrate data from a wide range of medical and research data formats (e.g., HL7, FHIR, CDISC, and XML), allowing biobanks to process heterogeneous data and interface legacy data. Additionally, new methods of allocating access rights across sites have been developed that, in combination with built-in consent forms, allow data to be requested and accessed in a streamlined fashion. In CentraXX, administrators can define user roles and thereby assign user-specific access rights. For managing inconsistent vocabulary between sites, the CentraXX MDR (Meta Data Repository) becomes a valuable tool, enabling users to set standard vocabulary and define semantic equivalences for the system.
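The two ideas in this paragraph, semantic equivalences between site vocabularies and role-based access rights, can be pictured with a small Python sketch. This is a generic illustration and not the CentraXX API or data model. The term mappings, role names, and record fields are hypothetical.

```python
# Hypothetical mapping of site-specific terms to one standard vocabulary.
SEMANTIC_EQUIVALENCES = {
    "t2d": "type 2 diabetes",
    "type ii diabetes": "type 2 diabetes",
    "diabetes mellitus type 2": "type 2 diabetes",
    "nsclc": "non-small cell lung cancer",
}

# Hypothetical roles and the record fields each role may view.
ROLE_VIEWS = {
    "biobank_admin": {"sample_id", "material", "diagnosis", "treatment", "consent"},
    "external_researcher": {"sample_id", "material", "diagnosis"},
}


def harmonize_term(term: str) -> str:
    """Map a site-specific term to the standard term, if an equivalence is defined."""
    key = " ".join(term.lower().split())  # normalize case and whitespace
    return SEMANTIC_EQUIVALENCES.get(key, key)


def view_for_role(record: dict, role: str) -> dict:
    """Return only the fields the role may see, with the diagnosis harmonized."""
    allowed = ROLE_VIEWS.get(role, set())  # unknown roles see nothing
    visible = {key: value for key, value in record.items() if key in allowed}
    if "diagnosis" in visible:
        visible["diagnosis"] = harmonize_term(visible["diagnosis"])
    return visible


# Illustrative record as one site might store it.
site_record = {
    "sample_id": "S-42",
    "material": "DNA",
    "diagnosis": "Type II  Diabetes",
    "treatment": "metformin",
    "consent": "broad",
}
print(view_for_role(site_record, "external_researcher"))
```

With a shared vocabulary in place, a search for "type 2 diabetes" finds samples from every site regardless of local spelling, while the role definitions decide how much of each record a given user sees.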
When it comes to biobank collaboration and integrating biobank and clinical data, the future is wide open. Adequate software is no longer the limiting factor. With systems like CentraXX by Kairos, it is now the solution.
By PD Dr. rer. nat. Christian Stephan/Amir Sohn Firestone, Kairos GmbH
- https://www.ncbi.nlm.nih.gov/pubmed/29677914 2018