For the second year in a row, the Machine Learning Week Europe conference has been held online. Unfortunately, that meant no mingling in the corridors, no flagging down waiters with trays of hors d’oeuvres, and no clinking glasses with fellow ML enthusiasts and practitioners. Fortunately, however, the conference did succeed in bringing together some fantastic talent for some world-class talks.
Here’s a quick summary of some that stood out for me.
30 Golden Rules of Deep Learning Performance
Siddha Ganju – NVIDIA
Siddha provided an extremely accessible and comprehensive list of speed-ups for modern Deep Learning projects (with a bit of cheeky humour sprinkled in). The talk comes on the heels of her new book Practical Deep Learning for Cloud, Mobile & Edge. Also very cool: everything from the talk is available if you send an e-mail to PracticalDLBook@gmail.com with the subject line “wget School of AI presentation”.
Explainable AI in Deep Brain Medicine
Afsaneh Asaei & Ozgur Polat – UnternehmerTUM and the Schoen Clinic of Neurology
In a phenomenally creative use of Machine Learning, the voice patterns of Parkinson’s patients can be used to identify whether their condition is “flaring up” (entering an “off” period). Audio recordings are converted into spectrogram images, which are then fed into a neural network (1 CNN layer, 2 RNN layers, 1 FC layer). The network outputs a binary value corresponding to the state of the patient’s condition. The full article is available at Jain et al. 2021.
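The talk doesn’t spell out the exact preprocessing, but the audio-to-spectrogram step can be sketched with a plain NumPy short-time Fourier transform. All parameters below (frame length, hop, sample rate, tone frequency) are illustrative, not taken from the paper:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Convert a 1-D audio signal into a magnitude spectrogram
    via a short-time Fourier transform over Hann-windowed frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft yields frame_len // 2 + 1 frequency bins per frame
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq_bins, n_frames)

# A 1-second, 8 kHz recording stand-in: a pure 440 Hz tone
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (129, 61) -- the "image" handed to the CNN/RNN stack
```

From here, the spectrogram is treated exactly like any other image input to a convolutional network.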
Data Science at The New York Times
Chris Wiggins – New York Times
When you think “New York Times”, you likely picture unfurling a canvas of crisp typeface onto your dining room table, coffee in hand. You probably do not think of innovative data science projects. The reality is that the New York Times operates a large, and wonderfully creative, data science department, headed by Chris Wiggins. Using a wealth of data generated by online readership and subscriptions, the team plays – and “plays” is really the correct word here – with different imaginative projects.
As an example, the team once wondered: could we figure out how articles make people feel? Through questionnaires for online readers, they built an NLP + clustering system that can identify, for a given article, the emotions it is likely to trigger in readers. Through this, advertisers could opt to show their ads alongside articles likely to evoke a specific emotion. The result was a huge success, with better advertisement revenue and a more focused understanding of the impact of published stories.
A summary of this specific project can be found here.
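The Times’ actual system is far richer, but the core idea – map an article’s text onto emotion labels learned from reader questionnaires – can be sketched with bag-of-words vectors and cosine similarity. The snippets, emotion labels, and `predict_emotion` helper below are invented for illustration:

```python
from collections import Counter
import numpy as np

def vectorize(texts, vocab):
    """Bag-of-words count vectors over a fixed vocabulary."""
    return np.array([[Counter(t.lower().split())[w] for w in vocab]
                     for t in texts], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Hypothetical labeled snippets distilled from reader questionnaires
labeled = {
    "hope":    "inspiring recovery community rebuilding hope future",
    "sadness": "tragic loss grief mourning sadness victims",
}
vocab = sorted({w for t in labeled.values() for w in t.split()})
prototypes = {emo: vectorize([txt], vocab)[0] for emo, txt in labeled.items()}

def predict_emotion(article):
    """Assign the article to its most similar emotion prototype."""
    v = vectorize([article], vocab)[0]
    return max(prototypes, key=lambda e: cosine(v, prototypes[e]))

print(predict_emotion("a story of hope and community rebuilding after the storm"))
# -> hope
```

A production system would swap the count vectors for learned embeddings, but the matching logic is the same shape.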
Marcus Gross – INWT Statistics
What is the best way to approach a predictive maintenance use case? Although there is no one “right” way of doing things, Marcus suggests a few general tips from recent research in the domain:
- Recently, there has been some research into combining the Cox Proportional Hazards model with Deep Learning. This can be a good solution, especially with very sparse datasets containing few instances of failure/breakdown. A worked example can be found here.
- To understand which features (or feature combinations) are the strongest indicators of failure, use Shapley values, or (more specifically) SHAP values.
- A Weibull Distribution can be a useful model for helping the industry decide when to service machinery. A worked example in Keras can be found here.
- If you have a wealth of data with many examples of failure, then a simple answer may work best: a custom search algorithm that identifies parts or features that contribute to high relative failure.
- Finally, XGBoost is the weapon of choice when training tree-based models to predict failure. Don’t skip it!
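As a worked example of the Weibull point above: given an assumed shape β and characteristic life η, the reliability function R(t) = exp(−(t/η)^β) can be inverted to suggest a service interval. The parameter values here are hypothetical, purely to show the arithmetic:

```python
import math

def service_time(beta, eta, reliability=0.90):
    """Time at which the Weibull reliability R(t) = exp(-(t/eta)**beta)
    drops to the given level -- a candidate service interval."""
    return eta * (-math.log(reliability)) ** (1.0 / beta)

# Hypothetical wear-out failure mode: shape beta=2 (increasing hazard rate),
# characteristic life eta=1000 hours
t_service = service_time(beta=2.0, eta=1000.0, reliability=0.90)
print(round(t_service, 1))  # ~324.6 hours
```

In other words: if 90% survival is the acceptable threshold, service would be scheduled roughly every 325 hours under these (assumed) parameters.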
Leveraging Unstructured Data in Insurance
Raymond von Es – Milliman
Does your company have a huge whack of customer call recordings? How do you extract value from them?
Some points from an expert:
- It is imperative that customer calls are recorded on 2 audio channels (1 for customers, 1 for call centre employees). Without doing this, it can become a huge headache to split them later.
- The GCP speech-to-text API (batch processing) works great.
- Technical or domain-specific words often have to be treated separately.
- On German data, GCP is good enough for Sentiment Analysis & Topic Identification.
- On English data, it is also sometimes possible to create automated summaries.
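The two-channel point is easy to demonstrate: with speakers on separate channels, splitting them is a one-liner on the interleaved samples. A stdlib-plus-NumPy sketch, using a synthetic stereo recording in place of a real call (the channel assignment of customer vs. agent is an assumption):

```python
import io
import wave
import numpy as np

def split_channels(wav_bytes):
    """Split a 16-bit stereo WAV into two mono sample arrays
    (e.g. customer on the left channel, agent on the right)."""
    with wave.open(io.BytesIO(wav_bytes)) as w:
        assert w.getnchannels() == 2 and w.getsampwidth() == 2
        frames = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    return frames[0::2], frames[1::2]  # samples are interleaved L, R, L, R, ...

# Build a tiny synthetic stereo "call" for demonstration
sr, n = 8000, 8000
left = (1000 * np.sin(2 * np.pi * 300 * np.arange(n) / sr)).astype(np.int16)
right = (1000 * np.sin(2 * np.pi * 600 * np.arange(n) / sr)).astype(np.int16)
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(sr)
    w.writeframes(np.column_stack([left, right]).astype(np.int16).tobytes())

customer, agent = split_channels(buf.getvalue())
print(np.array_equal(customer, left), np.array_equal(agent, right))  # True True
```

Recovering the same separation from a mono mix, by contrast, means speaker diarization – exactly the headache the bullet above warns about.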
The Role of TensorFlow, PyTorch, and Friends in a Medical Device – the Regulatory Perspective
Oliver Haase – Validate ML
If I’m looking to get an AI system certified by TÜV, can I use ML packages and libraries, or should everything be coded “from scratch”?
Oliver Haase, of Validate ML, provides the answer to this question: it will depend on the industry.
For general industrial (plant) deployments, packages can be used without worry. Certification will principally depend on factors such as (among others):
- Human operators being present to oversee the AI system
- The AI system being stoppable at any given moment
- The AI system being impenetrable to hackers (air-gapped)
However, for medical devices, where human lives may be at stake, the requirements are stricter: packages do need to be validated before use. For an example of why this is necessary, consider this situation, in which a subtle bug in a data augmentation pipeline combining NumPy with PyTorch leads to unrealistic accuracy estimates.
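As I understand the linked bug, the mechanism is that forked DataLoader workers each inherit an identical copy of NumPy’s global RNG state, so every worker applies the same “random” augmentations. A minimal sketch of that failure mode, simulated here without PyTorch:

```python
import numpy as np

# Simulate forked DataLoader workers: each child process inherits an
# identical copy of NumPy's global RNG state at fork time.
parent_state = np.random.get_state()

worker_draws = []
for worker_id in range(4):
    np.random.set_state(parent_state)  # the state copied into each worker
    worker_draws.append(np.random.randint(0, 10**6))

print(len(set(worker_draws)))  # 1 -- all "workers" draw the same augmentation

# The fix: give each worker its own seed (what PyTorch's worker_init_fn
# hook is for), so the streams diverge.
fixed = [np.random.RandomState(seed=worker_id).randint(0, 10**6)
         for worker_id in range(4)]
print(len(set(fixed)))  # 4 distinct draws
```

The unsettling part for certification purposes: the training loop runs without errors, and the problem only shows up as quietly reduced augmentation diversity.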
Discovering Key Topics from Real-World Medical Inquiries via Natural Language Processing at Bayer
Angelo Ziletti – Bayer
Pharmaceutical companies receive countless medical inquiries, often in the form of short text blurbs. So how do you tackle them? Ziletti’s team used various NLP and clustering methods to sort and group by topics. The task was complicated by the fact that inquiries would sometimes be a single sentence or only a few words.
The resulting collage of topics and patient concerns provided an actionable roadmap for which areas should be focused on in patient communication, as well as for general improvements in the medical context. The full paper is available here.
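Ziletti’s team’s actual pipeline is considerably more sophisticated, but the basic challenge of grouping very short texts can be illustrated with a toy greedy clustering on token overlap (Jaccard similarity). The inquiries, threshold, and helper names below are all invented:

```python
def jaccard(a, b):
    """Token-overlap similarity between two short texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def group_inquiries(inquiries, threshold=0.2):
    """Greedy single-pass clustering: attach each inquiry to the first
    group whose representative is similar enough, else start a new group."""
    groups = []  # list of (representative, members)
    for q in inquiries:
        for rep, members in groups:
            if jaccard(q, rep) >= threshold:
                members.append(q)
                break
        else:
            groups.append((q, [q]))
    return groups

inquiries = [
    "dosage for elderly patients",
    "recommended dosage elderly",
    "drug interaction with ibuprofen",
    "ibuprofen interaction risk",
]
for rep, members in group_inquiries(inquiries):
    print(rep, "->", len(members))  # two topics, two inquiries each
```

Word overlap breaks down quickly on one-word inquiries and synonyms, which is precisely why richer NLP representations are needed at Bayer’s scale.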