4.7 Article

Cross-Domain Lithology Identification Using Active Learning and Source Reweighting

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/LGRS.2020.3041960

Keywords

Uncertainty; Training; Automation; Probability distribution; Prediction algorithms; Machine learning; Testing; Active learning (AL); domain adaptation (DA); lithology identification

Funding

  1. National Key Research and Development Project of China [2018AAA0100800, 2018YFE0106800]
  2. National Natural Science Foundation of China [61903353, 61725304, 61673361]
  3. Major Science and Technology Project of Anhui Province [912198698036]
  4. Fundamental Research Funds for the Central Universities [WK2100000013]
  5. China Petroleum & Chemical Corporation (SINOPEC) Programmes for Science and Technology Development [PE19008-8]


Cross-domain lithology identification is a challenging problem that aims to predict the lithology of an uninterpreted well using logging data from an interpreted well. In this study, we propose a novel framework that combines active learning and domain adaptation to address the issues of data distribution shift and expensive label acquisition. Experimental results demonstrate that our method effectively suppresses performance degradation caused by data distribution shift and requires fewer target label queries.
Cross-domain lithology identification (CDLI) is a common setting in lithology identification: a machine learning model is trained on the logging data of an interpreted well and then used to predict the lithology of another, uninterpreted well. CDLI is more challenging than the general lithology identification problem for two reasons: the data distribution shifts between the wells, and labels for the uninterpreted well are expensive to acquire. To tackle these issues, we propose a novel framework that embeds active learning (AL) and domain adaptation (DA) into lithology identification. The framework has two components: an AL algorithm that selects the most uncertain and diverse target samples and queries their real labels, and a source reweighting method that uses the queried target labels to reduce the discrepancy between the source and target data distributions. Experimental results on two real-world data sets demonstrate that the proposed method suppresses the performance degradation caused by the data distribution shift more effectively than the baselines, with fewer target label queries.
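The two components described above can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual selection criterion or weighting formula, so everything here is an assumption made for illustration: uncertainty is measured by prediction entropy, diversity by a greedy farthest-point pick among the most uncertain candidates, and source weights by a class-conditional Gaussian kernel similarity to the queried target samples. The function names (`select_queries`, `reweight_source`) and parameters (`pool_factor`, `bandwidth`) are hypothetical.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy of each row of predicted class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_queries(probs, feats, budget, pool_factor=5):
    """Pick up to `budget` target indices that are uncertain and spread out.

    Assumed heuristic: keep the budget * pool_factor most uncertain samples,
    then greedily add the candidate farthest from those already chosen.
    """
    if budget <= 0 or len(probs) == 0:
        return np.array([], dtype=int)
    unc = entropy(probs)
    pool_size = min(len(unc), budget * pool_factor)
    pool = np.argsort(-unc)[:pool_size]          # most uncertain candidates
    chosen = [pool[0]]                           # start from the most uncertain
    dmin = np.linalg.norm(feats[pool] - feats[pool[0]], axis=1)
    dmin[0] = -np.inf                            # never pick a sample twice
    while len(chosen) < min(budget, pool_size):
        nxt = int(np.argmax(dmin))               # farthest from the chosen set
        chosen.append(pool[nxt])
        dist = np.linalg.norm(feats[pool] - feats[pool[nxt]], axis=1)
        dmin = np.minimum(dmin, dist)
        dmin[nxt] = -np.inf
    return np.array(chosen)

def reweight_source(src_X, src_y, tgt_X, tgt_y, bandwidth=1.0):
    """Weight source samples by similarity to queried target samples.

    Assumed scheme: mean Gaussian kernel similarity to labeled target
    samples of the same class; weights are rescaled to have mean 1.
    """
    w = np.ones(len(src_X))
    for c in np.unique(src_y):
        t = tgt_X[tgt_y == c]
        if len(t) == 0:
            continue                             # no target labels for this class
        s_idx = np.where(src_y == c)[0]
        d2 = ((src_X[s_idx, None, :] - t[None, :, :]) ** 2).sum(axis=2)
        w[s_idx] = np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)
    return w * len(w) / w.sum()
```

In a loop of this kind, the returned weights would typically be passed as per-sample weights when retraining the classifier on the source well together with the queried target samples, after which the updated predictions drive the next round of queries.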
