Multi-task topic analysis framework for hallmarks of cancer with weak supervision. Batbaatar, E., Pham, V. H., & Ryu, K. H. Applied Sciences (Switzerland), 2020.
The hallmarks of cancer represent an essential concept for discovering novel knowledge about cancer and for capturing its complexity. Due to the lack of topic analysis frameworks optimized specifically for cancer data, topic modeling in cancer research remains challenging. Recently, deep learning (DL) based approaches have been successfully employed to learn semantic and contextual information from scientific documents, using word embeddings, according to the hallmarks of cancer (HoC). However, these approaches are applicable only to labeled data. Comparatively few documents are labeled by experts, whereas a massive number of unlabeled documents are available online. In this paper, we present a multi-task topic analysis (MTTA) framework to analyze cancer hallmark-specific topics from documents. The MTTA framework consists of three main subtasks: (1) cancer hallmark learning (CHL), used to learn cancer hallmarks from existing labeled documents; (2) weak label propagation (WLP), used to classify a large number of unlabeled documents with the model pre-trained in the CHL task; and (3) topic modeling (ToM), used to discover topics for each hallmark category. In the CHL task, we employed a convolutional neural network (CNN) with pre-trained word embeddings that represent semantic meanings obtained from a large unlabeled corpus. In the ToM task, we employed latent topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) to capture the semantic information learned by the CNN model for topic analysis. To evaluate the MTTA framework, we collected a large number of documents related to lung cancer in a case study. We also conducted a comprehensive performance evaluation of the MTTA framework, comparing it with several approaches.
@article{Pham2020,
	title = {Multi-task topic analysis framework for hallmarks of cancer with weak supervision},
	volume = {10},
	issn = {2076-3417},
	doi = {10.3390/app10030834},
	abstract = {The hallmarks of cancer represent an essential concept for discovering novel knowledge about cancer and for capturing its complexity. Due to the lack of topic analysis frameworks optimized specifically for cancer data, topic modeling in cancer research remains challenging. Recently, deep learning (DL) based approaches have been successfully employed to learn semantic and contextual information from scientific documents, using word embeddings, according to the hallmarks of cancer (HoC). However, these approaches are applicable only to labeled data. Comparatively few documents are labeled by experts, whereas a massive number of unlabeled documents are available online. In this paper, we present a multi-task topic analysis (MTTA) framework to analyze cancer hallmark-specific topics from documents. The MTTA framework consists of three main subtasks: (1) cancer hallmark learning (CHL), used to learn cancer hallmarks from existing labeled documents; (2) weak label propagation (WLP), used to classify a large number of unlabeled documents with the model pre-trained in the CHL task; and (3) topic modeling (ToM), used to discover topics for each hallmark category. In the CHL task, we employed a convolutional neural network (CNN) with pre-trained word embeddings that represent semantic meanings obtained from a large unlabeled corpus. In the ToM task, we employed latent topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) to capture the semantic information learned by the CNN model for topic analysis. To evaluate the MTTA framework, we collected a large number of documents related to lung cancer in a case study. We also conducted a comprehensive performance evaluation of the MTTA framework, comparing it with several approaches.},
	number = {3},
	journal = {Applied Sciences (Switzerland)},
	author = {Batbaatar, Erdenebileg and Pham, Van Huy and Ryu, Keun Ho},
	year = {2020},
	keywords = {Biomedical domain, Cancer hallmark, Convolutional neural network, Latent semantic learning, Lung cancer, Multi-task learning, Semantic learning, Topic analysis},
}
