Many healthcare applications require classifying data. Examples include determining whether a test result is positive or negative, or using accelerometer data to determine whether a patient is sitting, sleeping, or walking.
Most of these applications use ‘supervised learning’, which requires a set of ‘labeled’ data so that the ML system can learn to associate each example with its desired outcome. As we will see, a Machine Learning model may struggle, or fail entirely, to find a correlation between the data provided and the labels assigned if we provide too few examples or if the data is limited or restricted in some fashion. For this reason, not all use cases are good candidates for AI/ML. I hope the examples in this series will help readers develop a ‘horse sense’ for when AI may be appropriate and when a traditional algorithm is the better choice.
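To make ‘labeled data’ concrete, here is a minimal sketch of supervised learning. It assumes a nearest-centroid classifier, which is a deliberately simple stand-in for the models a real product would use, and two hypothetical accelerometer features: the variability of acceleration magnitude and the share of gravity along the body's vertical axis. The training values are invented for illustration, not real sensor data. The model "learns" only what the labeled examples show it, so a posture that never appears in the training set can never be predicted.

```python
from math import dist
from statistics import fmean

# Hypothetical labeled training set (invented values, not real sensor data).
# Features: (std. dev. of acceleration magnitude in g,
#            gravity component along the body's vertical axis in g)
TRAINING_DATA = [
    ((0.02, 0.95), "sitting"),
    ((0.03, 0.90), "sitting"),
    ((0.01, 0.05), "sleeping"),
    ((0.02, 0.10), "sleeping"),
    ((0.30, 0.95), "walking"),
    ((0.40, 0.90), "walking"),
]

def train_centroids(examples):
    """Learn one mean feature vector (centroid) per label from labeled examples."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }

def classify(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

centroids = train_centroids(TRAINING_DATA)
print(classify(centroids, (0.35, 0.92)))  # prints: walking
```

Every prediction depends entirely on the labeled examples; with too few or unrepresentative examples, the centroids, and therefore the classifications, will be wrong.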
Dr. Eric Topol shared the following example in his excellent book “Deep Medicine” to illustrate the challenges of building and validating a Machine Learning model for a healthcare application. AliveCor set out to train an ML model that could detect elevated potassium levels from ECG data.
Although AliveCor was given more than 1 million ECG T-waves with lab results, the ML algorithm initially could not find a reliable indicator in the data linking T-waves to potassium levels. When AliveCor then asked its hospital partner to provide full ECGs and full lab results for all patients and re-ran its ML models, voilà: the model found a reliable correlation between the complete ECG and elevated potassium levels.
So why did the first attempt fail? It seems that the first data set covered outpatients only, so the lab results were likely taken much further apart in time from the ECGs. Outpatients also tend to be healthier and less likely to have serious kidney failure, the condition of interest. In addition, the first request was limited to T-wave data rather than the full ECG, and that assumption further blinded the ML algorithm. The lesson here is to be careful when pre-sorting or pre-filtering data: you may lose important information by assuming that only the requested subset of data is significant for your results. (Dr. Eric Topol – Deep Medicine)
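The time-gap problem can be made concrete with a small pairing step. This is a minimal sketch, not AliveCor's actual pipeline. It assumes a single patient's timestamped ECGs and potassium results, a hypothetical four-hour matching window, and invented example records. ECGs with no lab draw inside the window are dropped rather than labeled with a stale value.

```python
from datetime import datetime, timedelta

# Hypothetical matching window: accept a lab result only if it was drawn
# within this interval of the ECG. A real study would choose its own window.
MAX_GAP = timedelta(hours=4)

def pair_ecgs_with_labs(ecgs, labs, max_gap=MAX_GAP):
    """Pair each ECG with the closest-in-time potassium result for one patient.

    ecgs: list of (ecg_id, datetime)
    labs: list of (datetime, potassium_mmol_per_l)
    Returns (ecg_id, potassium) pairs; ECGs with no lab within max_gap are dropped.
    """
    pairs = []
    if not labs:
        return pairs
    for ecg_id, ecg_time in ecgs:
        # Find the lab draw nearest in time to this ECG.
        lab_time, potassium = min(labs, key=lambda lab: abs(lab[0] - ecg_time))
        if abs(lab_time - ecg_time) <= max_gap:
            pairs.append((ecg_id, potassium))
    return pairs

# Hypothetical records: only the first ECG has a lab drawn close to it.
ecgs = [("ecg-1", datetime(2024, 1, 1, 9, 0)), ("ecg-2", datetime(2024, 1, 3, 9, 0))]
labs = [(datetime(2024, 1, 1, 10, 30), 5.9), (datetime(2024, 1, 10, 8, 0), 4.1)]
print(pair_ecgs_with_labs(ecgs, labs))  # prints: [('ecg-1', 5.9)]
```

Widening `max_gap` keeps more ECGs but labels some of them with potassium values that may no longer reflect the patient's state at the time of the recording, which is the kind of mismatch the outpatient data likely suffered from.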
In the AliveCor example, AI/ML was an excellent choice for sifting through data to find a correlation, but it succeeded only once it was given full ECGs and lab results from admitted patients, whose labs were drawn close to the time of the ECG. As a result, AliveCor's ECG product can be used to detect high-potassium conditions in the patients who use it.
AliveCor was able to show that its ML model produced accurate results on a large sample of patients, which enabled the company to receive FDA clearance to market the product with a claim that it detects elevated potassium levels. Because this ML model functions as a classifier, it was trained on a large labeled data set and then shown to perform reliably on “test data” that was kept separate from the “training data” used to train it.
The AliveCor example illustrates why supervised learning with ML classifiers can be a great fit for a variety of medical product applications. To this end, a number of software tools make it possible to build Machine Learning models in the cloud and deploy them on embedded devices, often with only minimal loss in performance.
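Keeping test data separate from training data can be sketched as follows. This is an illustration under stated assumptions, not AliveCor's validation protocol: records are hypothetical `(patient_id, features, label)` tuples, and the split is made by patient so that no patient contributes to both sets. Splitting by patient is a design choice; recordings from the same person can otherwise make test accuracy look better than it will be on new patients.

```python
import random

def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split records into training and held-out test sets by patient ID.

    records: list of (patient_id, features, label) tuples.
    Every record for a given patient lands in exactly one set, so the test set
    measures performance on patients the model has never seen.
    """
    patient_ids = sorted({patient_id for patient_id, _, _ in records})
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test

def accuracy(predict, test_records):
    """Fraction of held-out records whose predicted label matches the true label."""
    correct = sum(predict(features) == label for _, features, label in test_records)
    return correct / len(test_records)

# Hypothetical toy records: ten patients, two recordings each, one feature each.
records = [
    (f"patient-{i}", (float(i),), "elevated" if i >= 5 else "normal")
    for i in range(10)
    for _ in range(2)
]
train, test = split_by_patient(records)
print(len(train), len(test))  # prints: 16 4
```

The model is fit only on `train`; `accuracy` is then reported only on `test`, which the model never saw during training.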
While ML can be a useful tool, we have learned that it is not ideal in many situations, since it adds complexity and requires a good amount of data for training and testing the model. It may also be inappropriate where an algorithm can perform a sequence of well-defined steps, such as applying a filter, an envelope, and a threshold to detect heart rate from a PPG or ECG signal. Detecting heart rate with an ML model would likely require significantly more ECG data than the algorithmic approach, and it would add considerable complexity to an otherwise well-defined problem.
I learned long ago that the best musicians first master their instrument and then learn when NOT to play. So it is with planning system architecture: the skill lies in knowing which tools are best suited to the application. It is therefore best to investigate alternative approaches before deciding whether AI/ML is right for your application.
Our team is always here to help. In our next blog post, we will compare the latency of an ML implementation with that of an algorithm performing the same function in a tech-enabled version of a familiar everyday medical product.
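The filter, envelope, and threshold pipeline described above can be sketched in a few lines. This is a minimal sketch, not a clinical-grade detector. It assumes a clean synthetic signal, uses a first difference as the filter and a squared, moving-average envelope (loosely in the spirit of the Pan-Tompkins QRS detector), and the threshold fraction, refractory period, window length, and sampling rate are hypothetical values chosen for the example.

```python
import math

def moving_average(values, window):
    """Trailing moving average over `window` samples (zero-padded at the start)."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / window)
    return out

def detect_heart_rate(signal, fs, threshold_fraction=0.45, refractory_s=0.25):
    """Return (beat_sample_indices, beats_per_minute or None)."""
    # 1. Filter: a first difference acts as a crude high-pass filter,
    #    suppressing slow baseline wander while emphasizing steep QRS edges.
    filtered = [0.0] + [signal[i] - signal[i - 1] for i in range(1, len(signal))]
    # 2. Envelope: square each sample, then smooth with a ~100 ms moving average.
    envelope = moving_average([x * x for x in filtered], max(1, int(0.1 * fs)))
    # 3. Threshold: a beat starts where the envelope rises through a fixed
    #    fraction of its maximum; a refractory period blocks double counting.
    threshold = threshold_fraction * max(envelope)
    refractory = int(refractory_s * fs)
    beats = []
    for i in range(1, len(envelope)):
        rising = envelope[i - 1] < threshold <= envelope[i]
        if rising and (not beats or i - beats[-1] > refractory):
            beats.append(i)
    if len(beats) < 2:
        return beats, None
    mean_rr = (beats[-1] - beats[0]) / (len(beats) - 1)  # samples between beats
    return beats, 60.0 * fs / mean_rr

# Synthetic test signal: a triangular "QRS-like" spike every 0.8 s (75 bpm)
# riding on a slow sinusoidal baseline drift.
fs = 250  # Hz, hypothetical sampling rate
beat_period = int(0.8 * fs)
signal = []
for n in range(12 * beat_period):
    baseline = 0.3 * math.sin(2 * math.pi * 0.2 * n / fs)
    offset = (n - beat_period // 2) % beat_period
    distance = min(offset, beat_period - offset)
    signal.append(baseline + max(0.0, 1.0 - distance / 5))

beats, bpm = detect_heart_rate(signal, fs)
print(len(beats), round(bpm))  # prints: 12 75
```

No training data is involved: every parameter can be inspected and tuned directly, and the behavior of each step can be explained and tested on its own.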
Frequently Asked Questions About AI/ML
When are traditional algorithms a better choice than Machine Learning (ML)?
Many healthcare applications involve data classification tasks, such as interpreting sensor data to determine patient activity (sitting, walking, sleeping) or analyzing test results. Traditionally, these tasks are handled by pre-programmed algorithms. However, ML offers an alternative approach.
The key factor influencing this choice is the availability of labeled data. Supervised learning, the most common ML approach, requires a large dataset where each data point is labeled with the desired outcome. For instance, an ML model designed to detect heart rhythm abnormalities from ECG data needs a vast collection of ECG recordings, each labeled as normal or abnormal. If such labeled data is limited or unavailable, a traditional, rule-based algorithm might be a more suitable solution.
Real-world example: Why did initial attempts at AliveCor’s ML model fail?
Dr. Eric Topol, in his book “Deep Medicine,” presents a case study from AliveCor’s development of an AI-powered ECG device for detecting high potassium levels. Initial attempts using a limited dataset of ECG T-waves and outpatient lab results proved unsuccessful: the ML model couldn’t find a reliable correlation between the T-waves and potassium levels.
However, when AliveCor expanded the data to include full ECGs and lab results from hospitalized patients (collected closer to the ECG readings), the model identified a clear correlation between complete ECG data and elevated potassium. This highlights the importance of using comprehensive, high-quality data for training ML models.
The success story: How did AliveCor leverage ML to create a valuable product?
By providing the ML model with a more comprehensive dataset (full ECGs and near-time lab results), AliveCor successfully trained a model to detect high potassium levels. This model was then validated using a separate dataset, demonstrating its accuracy and reliability. On the strength of this validation, AliveCor’s supervised learning classifier received FDA clearance to be marketed for detecting elevated potassium.
Key takeaways: When is supervised learning with ML classifiers a good choice?
The AliveCor example illustrates the potential of supervised learning with ML classifiers in a variety of medical device applications. Several cloud-based software tools simplify the creation of ML models that can be deployed on embedded devices, often with minimal performance loss.
When might a traditional algorithm be preferable to an ML model?
While ML offers a powerful toolset, it’s not always the optimal solution. Here are some factors to consider:
- Complexity: ML models introduce additional complexity to the system design.
- Data Requirements: Training and testing ML models often necessitate significant data volumes.
- Well-defined tasks: For well-defined tasks with clear steps (e.g., heart rate detection from ECG signals), traditional algorithms might be more efficient and require less data compared to an ML model.
Just as a skilled musician understands when to play and when to hold back, effectively designing medical devices involves choosing the right tools for the job. Carefully evaluate your specific needs and consider traditional algorithms before jumping to an ML approach.