Time-Efficient Curation and Quality Assurance of Clinical and Phenotype Big Data for More Effective Analysis and Insights [by Medisapiens]

Abstract/Technology Overview

Two challenges have long burdened bioinformaticians, biobanks and researchers: the tedious, manual and time-consuming work of cleaning up datasets, and mapping them to the ontologies of choice, before the actual analysis and discovery phase can begin.

Our technology is a browser-based solution developed in Finland that combines data curation and automated ontology mapping functions to efficiently curate clinical and phenotypic datasets and apply ontologies. Using artificial intelligence and machine learning-based proprietary algorithms, it automatically maps clinical terms to selected ontologies or controlled vocabularies.

Our technology is an intuitive, easy-to-use application that can be hosted on any preferred server to guarantee data privacy. It allows you to map with any ontology of choice. It provides a framework for organizational curation rules and workflows, ensuring standard practices in the curation process across teams in different locations, and providing a complete audit trail of all actions performed.

Whatever the size of the user's dataset, our technology saves time and resources in curation and ontology mapping, reducing the time spent on this work by up to 80% from the moment it is put to use. The quality and value of users' datasets increase, both now and for future projects.

We are looking for

  • Companies and research institutions that face the challenge of “dirty” data from numerous sources to use our solution
  • Regional representatives for the company and its solutions
  • Investors in the company

Technology Features, Specifications and Advantages

Heterogeneity between datasets makes integrated analysis difficult. Cleaning and harmonizing the data, and mapping data values to ontologies, are therefore of great importance for data-driven research. By providing a common vocabulary, ontologies make it possible to integrate heterogeneous data sources and to apply cross-person, cross-team and organization-wide rules.
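To illustrate how a common vocabulary makes heterogeneous sources comparable, here is a minimal sketch in Python. It is not our product's implementation: the site datasets, the local labels and the `EX:` concept identifiers are all hypothetical and invented for this example.

```python
from collections import Counter

# Hypothetical lookup tables: each site uses its own local labels,
# but both map to the same (made-up) shared concept identifiers.
SITE_A_TO_CONCEPT = {"T2DM": "EX:1001", "HTN": "EX:1002"}
SITE_B_TO_CONCEPT = {"diabetes type 2": "EX:1001", "high blood pressure": "EX:1002"}

# Hypothetical records from two sources with different terminology.
site_a = [
    {"patient": "A1", "diagnosis": "T2DM"},
    {"patient": "A2", "diagnosis": "HTN"},
]
site_b = [
    {"patient": "B1", "diagnosis": "diabetes type 2"},
    {"patient": "B2", "diagnosis": "unknown rash"},
]


def harmonize(records, mapping):
    """Attach a shared concept ID to each record; None marks an unmapped value."""
    return [{**record, "concept": mapping.get(record["diagnosis"])} for record in records]


# Once both sources carry the shared concept ID, they can be combined and counted together.
combined = harmonize(site_a, SITE_A_TO_CONCEPT) + harmonize(site_b, SITE_B_TO_CONCEPT)
counts = Counter(r["concept"] for r in combined if r["concept"] is not None)
unmapped = [r for r in combined if r["concept"] is None]
```

In this sketch, records that cannot be mapped are kept and flagged rather than dropped, so that a curator can review them.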

Our curation and ontology mapping solution reduces the time spent on curation by up to 80%, increases consistency in curation and reduces human error. This makes large amounts of data fully compatible and makes discoveries and insights easier to reach. It increases the quality and value of users' data, both now and for future projects.

Other solutions struggle to deliver comparable time savings and do not combine these functions in a single, easy-to-use application.

Our technology uses artificial intelligence and machine learning-based proprietary algorithms to automatically map clinical terms to selected ontologies or controlled vocabularies. The AI and machine-learning technology not only increases the time savings as use progresses, but also keeps terminology consistent across all datasets.
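As a rough illustration of what mapping a free-text clinical term to a controlled vocabulary involves, the sketch below uses simple string similarity from Python's standard `difflib` module. This is a stand-in technique, not our proprietary algorithm; the vocabulary, the `EX:` identifiers, the `map_term` function and the 0.8 cutoff are all assumptions made for the example.

```python
import difflib

# Hypothetical mini-vocabulary: identifiers and labels are made up for illustration.
VOCABULARY = {
    "EX:0001": ["fever", "pyrexia", "elevated temperature"],
    "EX:0002": ["headache", "cephalalgia"],
    "EX:0003": ["cough"],
}


def normalize(term):
    """Lowercase the term and collapse surrounding and repeated whitespace."""
    return " ".join(term.lower().split())


# Flatten the vocabulary into a label -> concept ID lookup.
LABEL_TO_ID = {
    normalize(label): concept_id
    for concept_id, labels in VOCABULARY.items()
    for label in labels
}


def map_term(raw_term, cutoff=0.8):
    """Return (concept_id, score) for the best match, or (None, 0.0) if none passes the cutoff."""
    term = normalize(raw_term)
    if term in LABEL_TO_ID:
        return LABEL_TO_ID[term], 1.0  # exact match after normalization
    # Fuzzy fallback: closest known label by character-level similarity.
    matches = difflib.get_close_matches(term, list(LABEL_TO_ID), n=1, cutoff=cutoff)
    if not matches:
        return None, 0.0
    best = matches[0]
    score = difflib.SequenceMatcher(None, term, best).ratio()
    return LABEL_TO_ID[best], score
```

For example, `map_term("  Fever ")` resolves to the hypothetical concept `EX:0001` through an exact match, while a misspelling such as `"headach"` falls back to the fuzzy step. Terms with no sufficiently close label return `None` and would be left for manual curation.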

Our intuitive, browser-based solution can be hosted on any server of choice, which supports data security. It works with any controlled vocabulary or ontology and supports teams working in different locations.

Potential Applications

The technology's main application is to curate, clean and ontologize clinical and phenotype data, enabling easier and faster discoveries.

The technology can be used “off-the-shelf” as well as applied to dedicated mass-conversion projects.

Users include bioinformaticians, pharmaceutical companies, biobanks, academic research groups, hospital groups and data professionals working with clinical and phenotype data.

Customer Benefit

Our technology enables bioinformaticians, data specialists and researchers to spend more time on discovery and up to 80% less time on cleaning up data, while increasing the quality and value of users' datasets.

Technology Owner

Hans Garritzen


Embassy of Finland in Singapore

Technology Category
  • Healthcare ICT
  • Natural Language Processing & Semantic Technology
Technology Status
  • Available for Licensing
Technology Readiness Level
  • TRL 9

Keywords: curation, ontology, data, clinical data, phenotype, data curation