We have developed a unique algorithm and a suite of software tools that enable the use of Artificial Intelligence (AI), specifically deep learning, on the sparse or noisy data typically found in experimental research and development environments.
Our core software encapsulates a new deep learning algorithm and architecture that allows models to be trained on very sparse data. Conventional deep learning requires large amounts of high-quality training data to produce accurate models, and this requirement currently prevents the use of AI in areas where it could add the most value.
This new approach is being applied in a number of domains, such as:
- Advanced materials, where there is a large amount of historical materials data (composition, properties and processing) but not all datapoints have been recorded and new properties are constantly being defined, for instance in the field of additive manufacturing;
- Drug discovery, where the large matrix of compound/protein interactions has only been partially verified by experiment (0.05% complete); and
- Healthcare, where patient profiles can be considered incomplete data: every patient has a varying amount of medical history or known information.
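To make "incomplete data" concrete, the completeness of a dataset can be expressed as the fraction of cells that actually hold a value. The following is a minimal sketch for illustration only and is not part of the tool itself. The function name `completeness`, the `id_columns` parameter, the example table and the convention that an empty CSV cell means "not recorded" are all assumptions.

```python
import csv
import io


def completeness(csv_text: str, id_columns: int = 1) -> float:
    """Return the fraction of measurement cells that are non-empty.

    The header row and the first `id_columns` columns (identifiers) are skipped.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    cells = [cell for row in rows[1:] for cell in row[id_columns:]]
    if not cells:
        return 0.0
    filled = sum(1 for cell in cells if cell.strip())
    return filled / len(cells)


# Hypothetical compound/protein table: an empty cell means "not yet measured".
example = """compound,protein_1,protein_2,protein_3
c1,0.8,,
c2,,,0.1
c3,,,
"""

# 2 of the 9 measurement cells are filled.
print(f"{completeness(example):.1%} complete")
```

On this scale, the drug discovery matrix described above would score 0.0005.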
Technology Features, Specifications and Advantages
This new algorithm for deep learning can be used easily by scientific users without detailed knowledge of how deep learning works. The core application consists of three key stages:
1. Upload data: Simple comma-separated values (CSV) files are used to upload data
2. Training of models: A few simple configuration options indicate how accurate the models should be or how long they should be trained for
3. Use of generated models: Once the models have been generated, there are interfaces to:
i) Predict new values: predicting the value of a target variable
ii) Identify errors in data
iii) Design new elements/materials: for instance, the user can specify the desired physical properties of a material (e.g. good thermal conductivity) and the model will predict the values required for the remaining parameters in order to achieve the desired property
iv) Visualise data for analysis: helps the user understand the data and any correlations between the given datapoints
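The shape of this three-stage workflow can be sketched in a few lines. The sketch below is not the deep learning algorithm described here: it substitutes plain column-mean imputation and a z-score check as stand-ins, purely to show how upload, training and use fit together. All function names, the column names, the example values and the threshold of 2.0 are hypothetical, and the sketch assumes every column is numeric and has at least one known value.

```python
import csv
import io
import statistics


def load_csv(csv_text):
    """Stage 1: parse CSV text into rows of {column: float or None}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: float(value) if value and value.strip() else None
         for name, value in row.items()}
        for row in reader
    ]


def train(rows):
    """Stage 2 (stand-in): mean and standard deviation of each column's known values."""
    model = {}
    for name in rows[0]:
        known = [row[name] for row in rows if row[name] is not None]
        model[name] = (statistics.mean(known), statistics.pstdev(known))
    return model


def predict_missing(rows, model):
    """Stage 3 i) (stand-in): fill each missing value with its column mean."""
    return [
        {name: model[name][0] if value is None else value
         for name, value in row.items()}
        for row in rows
    ]


def flag_errors(rows, model, threshold=2.0):
    """Stage 3 ii) (stand-in): list (row index, column) pairs far from the column mean."""
    flagged = []
    for index, row in enumerate(rows):
        for name, value in row.items():
            mean, std = model[name]
            if value is not None and std > 0 and abs(value - mean) > threshold * std:
                flagged.append((index, name))
    return flagged


# Hypothetical materials data: one missing conductivity, one suspicious density.
data = """density,conductivity
1.0,200
1.1,
0.9,210
1.0,190
1.0,200
50.0,200
"""

rows = load_csv(data)
model = train(rows)
filled = predict_missing(rows, model)
flagged = flag_errors(rows, model)
print(filled[1]["conductivity"], flagged)
```

A mean-based stand-in ignores correlations between columns, which is exactly the information a trained model would be expected to exploit. The design and visualisation interfaces are not covered by this sketch.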
Additionally, during client engagements we produce reports in which each estimate is qualified by robust and meaningful quality metrics that indicate its uncertainty (e.g. accuracy tests, Area Under the Curve (AUC), pairwise comparisons). We may also include external data that should, a priori, correlate with the target outcome in order to improve overall predictive power.
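As background on one of these metrics, the AUC of a binary classifier can itself be read as a pairwise comparison: it equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below illustrates that standard definition; the function name `pairwise_auc` and the example values are hypothetical, and the source does not state how the reports actually compute the metric.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for positive and 0 for negative examples; ties count as 0.5.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Four hypothetical predictions: three of the four positive/negative pairs
# are ordered correctly, so the AUC is 0.75.
auc = pairwise_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
print(auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.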
The approach has proven applications to the following types of problems:
- Pharma (P), MedTech (M), Healthcare (H): We can process large amounts of incomplete, anonymised, numerical data that other AI tools would consider unsuitable input
- P, M, H: Estimation of values previously only accessible by expensive, empirical experimentation
- H: Ability to estimate the endpoints in complex, multistage processes
- P: Ability to estimate target variables arising from interactions between complex molecular conformations
- P, M, H: Qualification of estimates by robust and meaningful quality metrics that indicate uncertainty (e.g. accuracy tests, Area Under the Curve (AUC), pairwise comparisons)
- P, M, H: Ability to identify and correct outlier data and to suggest empirical experiments that will reduce the overall uncertainty of the model
- P, M, H: Computationally efficient and scalable from small matrices to big data
- P, M, H: Numerical data combined with models or graph functions
Our tool helps users gain and maximize insights from available data, even when such data is sparse. The outputs from the tool deliver knowledge about the correlations between variables and highly accurate predictions of target variables. Users define the target condition and our model predicts the rest.
During our engagement with healthcare clinics, we were given datasets containing confidential information about patients' medical background, treatment type, lifestyle habits, etc. The clinics then specified the target treatment outcome (e.g. Outcome A) that doctors wanted to achieve. A deep learning model was then generated that optimized the target variable (Outcome A), estimated the range of values the remaining parameters should take, and suggested optimum patient pathways to achieve this outcome.
During our engagement with drug discovery companies, our model predicted which compounds were likely to interact with certain target proteins, thus reducing the need for expensive experimentation.
The tool can be used to support professionals in each of these spaces by helping to guide actions, reduce the chance of errors and ultimately save costs (e.g. when a new material is designed in 3 cycles instead of 10, or when a drug treats an illness more effectively).