We have developed a unique algorithm and a suite of software tools that enable the use of Artificial Intelligence (AI), specifically deep learning, on the sparse or noisy data typically found in experimental research and development environments.
Our core software encapsulates a new algorithm and architecture for deep learning that allows models to be trained on very sparse data. Deep learning currently requires large amounts of high-quality data to train accurate models, a requirement that is preventing the use of AI in the areas where it could add the most value.
This new approach is being applied in a number of domains such as:
- Advanced materials, where there is a wealth of historical materials data (composition, properties and processing) but not all datapoints have been recorded and new properties are constantly being defined, for instance in the field of additive manufacturing;
- Drug discovery, where the large matrix of compound/protein interactions has only been partially experimentally verified (0.05% complete), and
- Healthcare, where patient profiles can be considered incomplete data: every patient will have varying amounts of medical history or known information.
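The common thread across these domains is a data matrix with far more gaps than measurements. As a minimal sketch (the compound/protein numbers below are invented for illustration, using Python with NumPy), such sparsity can be represented and quantified like this:

```python
import numpy as np

# Hypothetical interaction matrix: rows are compounds, columns are proteins.
# NaN marks a pairing that has never been experimentally verified.
interactions = np.full((1000, 200), np.nan)

# Suppose only 100 of the 200,000 entries have measured values.
rng = np.random.default_rng(seed=0)
observed = rng.choice(interactions.size, size=100, replace=False)
interactions.flat[observed] = rng.uniform(0.0, 1.0, size=100)

# Fraction of the matrix that is actually known.
completeness = np.count_nonzero(~np.isnan(interactions)) / interactions.size
print(f"{completeness:.2%} complete")  # 0.05% complete with these numbers
```

Conventional deep learning cannot train directly on a matrix like this; our algorithm is designed for exactly this regime.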
Technology Features, Specifications and Advantages
This new algorithm for deep learning can be used easily by scientists with no detailed knowledge of how deep learning works. The core application consists of three key stages:
1. Upload data: data is uploaded as simple comma-separated values (CSV) files
2. Training of models: a few simple configuration options indicate how accurate the models should be and how long they should be trained for
3. Use of generated models: once the models have been generated, there are interfaces to:
i) Predict new values: predicting the value of a target variable
ii) Identify errors in data
iii) Design new elements/materials: for instance, the user can specify desired physical properties of a material (e.g. good thermal conductivity) and the model will predict the required values for the remaining parameters in order to achieve the desired property
iv) Data visualisation for analysis: helps the user understand the data and any correlations between the given datapoints
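To make the three stages concrete, here is a rough sketch of the same workflow using standard open-source tools (pandas and scikit-learn's IterativeImputer). The file contents, column names and imputer choice are illustrative assumptions; they stand in for, and are not, the proprietary algorithm:

```python
from io import StringIO

import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Stage 1 - upload data: a small CSV with missing entries (values invented).
csv = StringIO(
    "cu_fraction,anneal_temp,conductivity\n"
    "0.10,300,120\n"
    "0.20,300,\n"
    "0.20,400,170\n"
    "0.30,400,195\n"
    "0.30,500,\n"
    "0.40,500,240\n"
)
data = pd.read_csv(csv)  # blank fields are parsed as NaN

# Stage 2 - train a model: IterativeImputer learns each column as a
# function of the others, a generic stand-in for sparse-data training.
model = IterativeImputer(max_iter=20, random_state=0)
completed = model.fit_transform(data)

# Stage 3 - use the model: the gaps are now filled with predicted values,
# and the fitted model can also impute new, partially known rows.
filled = pd.DataFrame(completed, columns=data.columns)
print(filled)
```

The same fitted model supports the prediction and error-identification interfaces above: any cell it fills is a prediction, and a large gap between a predicted and a recorded value flags a possible data error.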
Additionally, reports produced during client engagements qualify each estimate with an indication of its uncertainty, using robust and meaningful quality metrics (e.g. accuracy tests, Area Under the Curve (AUC), pairwise comparisons). We may also include external data that, a priori, should correlate with the target outcome, in order to improve overall predictive power.
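As an illustration of one such metric (computed here with scikit-learn rather than the product's own reporting code, and with invented outcomes and scores), AUC measures how well a model's scores rank positive outcomes above negative ones:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical binary outcomes (e.g. whether a compound activates a protein)
# and the model's predicted score for each case.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.2f}")  # AUC = 0.89; 1.0 is a perfect ranking, 0.5 is random
```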
Proven applications cover the following types of problems:
- Estimation of values previously only accessible by expensive, empirical experimentation
- Ability to estimate the endpoints in complex, multistage, multi-ingredient processes
- Qualification of estimates indicative of uncertainty by robust and meaningful quality metrics
- Ability to identify and correct outlier data and to suggest empirical experiments that will reduce overall uncertainty of the model
- Computationally efficient and scalable from small matrices to big data
- Handles large amounts of incomplete, anonymised numerical data
- Handles numerical data combined with models or graph functions
Our tool helps maximise the value of, and gain insights from, all of the available data, even when much of it is unknown or noisy. The outputs from the tool suggest many new data points; these could be:
- Optimum composition and treatment processes for advanced materials to achieve certain properties,
- Likely compounds (drugs) that may activate certain proteins, or
- Optimum patient pathways to achieve desired outcomes
The tool can be used to support professionals in each of these spaces to help guide actions, reduce the chance of errors and ultimately save costs (e.g. when a new material is designed in 3 cycles instead of 10, or a drug treats an illness more effectively).