Data Management

Sharing results simply and securely

Data management in LiSyM improves handling and comprehensibility

LiSyM manages the network’s research data according to the FAIR principles. These stipulate 15 guidelines for the content, form, storage and accessibility of data. “These assure that scientific results are easier to find, reproducible and exchangeable,” Associate Professor Dr. Wolfgang Müller from the Heidelberg Institute for Theoretical Studies (HITS) explains. His research group builds the central platform LiSyM-SEEK. Through this platform, LiSyM scientists can control data sharing simply and securely, store data packages systematically and publish results. LiSyM-SEEK also offers a variety of options for presenting projects and furthering collaboration.


Standardizing form, content and storage

Surveys reveal that, too often, scientific results cannot be reproduced by other research groups. Sometimes details of the methodology are missing; in other cases the original data cannot be found or the format is unusable. Consequently, most institutions that fund and promote research now stipulate criteria for data management to ensure that data remain available for a long time. Data content, form and storage must become more uniform. The intention behind this requirement is to make scientific results more reproducible and to make the supporting data easier to reuse, for example through data exchange or when a dataset is supplemented with data from other sources.

FAIRDOM, a transnational association to which Müller belongs, pursues the same aims. Müller, also Associate Professor at the Faculty of Information Systems and Applied Computer Sciences, Bamberg University, implements these regulations for LiSyM. HITS, where Müller leads the Scientific Databases and Visualization Group, is a private, non-profit research institute. Together with his colleagues, he is active in a number of international organizations for the standardization of scientific data. For LiSyM, Müller applies the principles of FAIRDOM and, in the process, develops them further. At the same time he is developing extensions and modifications specific to LiSyM’s needs: “This way we can respond to specific needs quickly.”

FAIRDOM, including partners from LiSyM, developed SEEK, the FAIRDOM software platform that supports collaboration, and FAIRDOMHub, the data catalogue and repository. The system allows users to store, organize and combine datasets or link them with other metadatabases. Through it, data can be published for open access, or access can be specifically restricted, either internally for collaborative projects or externally. Furthermore, FAIRDOM members have teamed up with others to formulate the FAIR principles. FAIR stands for findable, accessible, interoperable and reusable.


The meaning of FAIR for data in practice

In practice, “findable” means that every dataset which remains consistent over time is issued a permanent identifier, for example a digital object identifier (DOI). This ensures that datasets can be found, and thus cited, for at least ten years. In addition, all datasets are provided with detailed supplementary metadata, including information on the creators and the methodology used as well as additional background information. Furthermore, SEEK analyzes the data, making them easy to find. Data must be accessible through documented protocols such as the Hypertext Transfer Protocol (HTTP). If additional datasets are needed to ensure the information is complete, for example when the research builds on another study, then references must indicate this. To this end, SEEK facilitates the linking of data. The platform can also keep the data available for long periods by ensuring long-term data storage.
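The requirements in this paragraph can be read as a checklist applied to each dataset. The following is a minimal sketch of that idea in Python, not a description of how SEEK works: the class `DatasetRecord`, its field names and the function `fair_gaps` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the fields are illustrative, not SEEK's schema."""
    title: str
    doi: Optional[str] = None                 # permanent identifier, e.g. a DOI
    creators: List[str] = field(default_factory=list)
    methodology: str = ""                     # how the data were produced
    access_url: str = ""                      # where the data can be fetched
    related_dois: List[str] = field(default_factory=list)  # studies this one builds on


def fair_gaps(record: DatasetRecord) -> List[str]:
    """Return the checks from the paragraph above that the record does not meet."""
    gaps = []
    if not record.doi:
        gaps.append("no permanent identifier")
    if not record.creators:
        gaps.append("no creators listed")
    if not record.methodology.strip():
        gaps.append("no methodology described")
    # "Accessible through documented protocols" is approximated here as an HTTP(S) URL.
    if not record.access_url.startswith(("http://", "https://")):
        gaps.append("no HTTP(S) access URL")
    return gaps
```

A record that lists creators and methodology but has no identifier would be reported with the single gap "no permanent identifier". Links to related studies are stored but not checked, since whether they are needed depends on the research itself.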

Data should also be interoperable, and thus easy to combine and to exchange between programs. “That is the most challenging feature,” Müller says. To achieve this goal, the data must correspond both technically and in terms of content. This requires the standardization of formats for the arrangement of data, results and terminologies. Usability relies on the completeness of the information: often small omissions, obscurities and discrepancies mean that results cannot be compared or reproduced, for example when details are missing, information is imprecise or different terms are used for the same symptoms or substances. Thus, Martin Golebiewski and other colleagues in Müller’s group are very active in committees responsible for the standardization of biological data.

Another key concern is security. Together with Müller and his colleagues, LiSyM researchers, who work as experimental, computer modelling and clinical scientists, have developed guidelines for sharing data within the network. The creators of data or those responsible for the databases define the restrictions. They determine who can access the data, ranging from only a few close cooperation partners to divisions of LiSyM, the entire network or the whole world. One can also regulate who can view data, who can use it, and for what purpose. Sometimes LiSyM scientists want to share their data with only a few colleagues within the network, for example, or with external project partners. “For such cases we have special protective measures such as ‘secret links’,” Müller explains. Clinical patient information is legally subject to special protection. Müller is working with LiSyM on a system that facilitates the sharing of data whilst upholding patient rights. “Scientists are granted access to summarized data to maintain patient anonymity.” These data packages provide the desired information, but not the complete records and not enough to identify individual patients.
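The tiered sharing described above, with circles of access that widen from a few partners to the whole world and separate rights to view or use data, can be sketched as a small model. This is an illustrative sketch only and does not reflect the actual implementation in SEEK or LiSyM-SEEK: the names `Audience`, `Permission`, `Person`, `SharingPolicy` and `granted` are assumptions, and purpose restrictions and secret links are left out.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Set


class Audience(IntEnum):
    """Circles of access, from narrowest to widest (hypothetical names)."""
    SELECTED_PARTNERS = 1
    DIVISION = 2
    NETWORK = 3
    PUBLIC = 4


class Permission(IntEnum):
    NONE = 0
    VIEW = 1
    USE = 2


@dataclass(frozen=True)
class Person:
    name: str
    division: str = ""        # empty for people outside the network
    in_network: bool = False


@dataclass
class SharingPolicy:
    """Set by the data creator: the widest circle allowed and what it may do."""
    audience: Audience
    permission: Permission
    owner_division: str = ""
    partners: Set[str] = field(default_factory=set)  # named internal or external partners


def circle_of(policy: SharingPolicy, person: Person) -> Audience:
    """Find the narrowest circle the person belongs to for this dataset."""
    if person.name in policy.partners:
        return Audience.SELECTED_PARTNERS
    if person.in_network and person.division == policy.owner_division:
        return Audience.DIVISION
    if person.in_network:
        return Audience.NETWORK
    return Audience.PUBLIC


def granted(policy: SharingPolicy, person: Person) -> Permission:
    """Grant the policy's permission only if the person's circle is within reach."""
    if circle_of(policy, person) <= policy.audience:
        return policy.permission
    return Permission.NONE
```

With a policy limited to the owner's division, a colleague from another division is refused, whereas a named external partner is still admitted. This mirrors the case in which data are shared with only a few project partners.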


User opportunities increase

Müller describes the network-specific LiSyM-SEEK as “a kind of facade”. Through this platform, members specify internal and external access rights. Internal contributions land in LiSyM-SEEK, and also in FAIRDOMHub. The two separate servers allow for more flexibility when new tools are being developed or implemented. LiSyM-SEEK was developed during the project, Müller explains: “This means we can slowly expand the datasets, while periodically separating out small snapshots.” These snapshots are given their own DOI, making them fixed, findable and citable. Later, many of them can be compiled for presentations. An Application Programming Interface (API) has recently been completed. This interface makes it possible to transfer large datasets through programs rather than step by step by mouse click. Thus, the API simplifies the process of using, exchanging and transferring data via SEEK, LiSyM-SEEK and FAIRDOMHub.

Müller and his colleagues support LiSyM by responding to telephone and email inquiries. They advise members and conduct tutorials to explain the various options available to users for managing their data. “Initially users want to know how best to conceal data for security reasons,” he says. In his experience, the emphasis quickly shifts, Müller adds: “Later they want to know how to disseminate data as widely as possible and maintain it in the long term.”