Data Quality Needs to Move More Into the Focus of Science

The workshop, led by Prof. Dr. Carsten Oliver Schmidt, extensively addressed the topic of data quality. © TMF
Inconsistent data standards, data errors, and opaque methods of data processing and presentation are significant obstacles in the health and life sciences. To define fields of action for handling data quality more systematically and transparently, around 140 experts from professional societies, data initiatives, and associations gathered in Berlin on November 17-18, 2022, for the first joint hybrid workshop on this topic in Germany. "Data quality needs to move more into the focus of scientific work," urges the workshop's organizer, Prof. Dr. Carsten Oliver Schmidt of the University of Greifswald. "We need more systematic and transparent handling of data quality and initial data analyses to make health and life sciences more efficient and transparent."
Data Quality Assessments Must Become a Transparent Part of Scientific Work
At the workshop, data scientists emphasized that structured data quality assessments should be a significant part of every scientific study. The results of these assessments should be made available and transparent for reuse. The event was hosted by the German Society for Medical Informatics, Biometry and Epidemiology (GMDS), the Technology and Methods Platform for Networked Medical Research (TMF), the German Society for Epidemiology (DGEpi), the German Region of the International Biometric Society (IBS-DR), the German Society for Social Medicine and Prevention (DGSMP), the international initiative STRengthening Analytical Thinking for Observational Studies (STRATOS), and the Consortium National Research Data Infrastructure for Personal Health Data (NFDI4Health), which supported the workshop with funds from the German Research Foundation.
Even Funders and Scientific Journals Do Not Demand Transparency Regarding Data Quality
A striking paradox characterizes the handling of data quality in the health sciences. On the one hand, every reliable scientific work and statement on pressing issues regarding health, disease, prevention, therapy, and disease consequences depends on high data quality. On the other hand, data quality receives too little attention across the health and life sciences.
Insufficient data management, failure to use standards, and insufficiently informative descriptions of data sets are significant reasons why many data scientists spend a considerable amount of their time creating analyzable data sets. "This unnecessarily wastes resources and creates the potential for errors," notes Dr. Nicole Rübsamen, spokesperson for the Workgroup Epidemiological Methods of the DGEpi. Part of the problem is that no consensus exists on methods for capturing and describing data quality problems. Neither funders nor scientific journals demand a systematic and transparent description of data quality.
"A systematic reporting on activities for reviewing and processing data prior to actual statistical analyses is missing or often not comprehensible," adds Prof. Marianne Hübner from Michigan State University and co-spokesperson of the STRATOS Initiative. Based on the guidelines for reporting studies in health sciences coordinated by the EQUATOR Network in Oxford, criteria for describing data quality should be developed. For verified data sets, this could generally be done in the form of structured reports to allow for comprehensible insights, as agreed on by Rübsamen, Hübner, and Schmidt.
Options for Action Exist but Are Inadequately Utilized
Research and teaching still make too little use of the options for action that already exist. These range from data standards and data quality concepts to software that facilitates data quality assessments. Against this background, the workshop provided a forum for discussing how processes can be designed more efficiently and transparently. It also highlighted the need for further coordination to achieve better-harmonized practices in research. "The Data Quality and Transparency Workgroup of the TMF has therefore set itself the goal of further developing recommendations, standards, and tools for quality assurance and data evaluations, in cooperation with other national and international societies and networks," says Carsten Oliver Schmidt, who leads the workgroup.
Scientific Contact
Prof. Dr. Carsten Oliver Schmidt (University Medicine Greifswald)
Phone: +49 3834 867713
E-mail: carsten.schmidt@uni-greifswald.de
Press Contact
Wiebke Lesch
Phone: +49 30 2200 24731
Mobile: +49 177 2663257
E-mail: presse@tmf-ev.de
Twitter: @tmf_eV