dc.description |
Data insufficiency and heterogeneity challenge representation learning in medical machine learning, owing to the diversity of medical data and the expense of data collection and annotation. To learn generalizable representations from such limited and heterogeneous medical data, we draw on a range of learning paradigms. In this dissertation, we systematically explore machine learning frameworks for limited data, data imbalance, and heterogeneous data, using cross-domain learning, self-supervised learning, contrastive learning, meta-learning, multitask learning, and robust learning. We present studies with different medical applications, such as clinical language translation, ultrasound image classification and segmentation, medical image retrieval, skin disease classification, pathology metadata prediction, and lung pathology prediction.
We first focus on the limited data problem, which is common in medical domains. Working with limited and unpaired medical language corpora, we learn cross-domain representations for clinical language translation: unsupervised embedding space alignment with identical anchors handles word translation, and statistical language modeling handles sentence translation. Measured by clinical correctness and readability, the developed method outperforms a dictionary-based algorithm in both word- and sentence-level translation. To learn better representations from a limited number of ultrasound images, we then adopt self-supervised learning and integrate the corresponding metadata as a multimodal resource to introduce inductive biases. We find that the representations learned by this approach yield better performance on downstream tasks, such as ultrasound image quality classification and organ segmentation, than standard transfer learning methods.
Next, we turn to the data imbalance problem. We explore the utility of contrastive learning, specifically the Siamese network, to learn representations from an imbalanced fundoscopic imaging dataset for diabetic retinopathy image retrieval. Compared with the standard supervised learning setup, we obtain comparable but interpretable results using the representations learned from the Siamese network. We also use meta-learning for skin disease classification with an extremely imbalanced, long-tailed skin image dataset. We find that an ensemble of meta-learning models and models trained with conventional class imbalance techniques yields better prediction performance, especially for rare skin diseases.
Finally, for heterogeneous medical data, we develop a multimodal multitask learning framework to learn a shared representation for pathology metadata prediction. We use multimodal fusion to integrate the slide image, free text, and structured metadata, and adopt a multitask objective loss to introduce an inductive bias during learning. This yields better predictive performance than the standard single-modal, single-task training setup. We also apply robust training techniques to learn representations that can tolerate a distributional shift across two chest X-ray datasets. Compared with standard training, we find that robust training is more tolerant of the shift when it is present, and learns a robust representation for lung pathology prediction.
The investigation in this dissertation is not exhaustive, but it offers a broad understanding of how machine learning can support clinical decision making when medical data are limited and heterogeneous. We also provide insights and caveats to motivate future research on machine learning with low-resource and high-dimensional medical data, and we hope this work makes a positive real-world clinical impact. |
|