Article

A Comparative Study on the Influence of Undersampling and Oversampling Techniques for the Classification of Physical Activities Using an Imbalanced Accelerometer Dataset

Journal

HEALTHCARE
Volume 10, Issue 7

Publisher

MDPI
DOI: 10.3390/healthcare10071255

Keywords

physical activity; accelerometer; ensemble method; random forest; bootstrap aggregating (bagging); adaptive boosting; undersampling; oversampling

Funding

  1. National Research Foundation of Korea Grant - Ministry of Science and ICT [NRF-2021R1G1A1094236]
  2. Catholic University of Korea

This study aims to classify physical activities in daily life using machine learning methods. By extracting features and applying sampling methods, the authors addressed the data imbalance issue. Ensemble methods such as random forest and adaptive boosting performed well in physical activity (PA) classification.

Accelerometer data collected from wearable devices have recently been used to monitor physical activities (PAs) in daily life. While the intensity of PAs can be distinguished with a cut-off approach, estimating energy expenditure also requires discriminating between behaviors that produce similar accelerometry patterns. We aim to overcome the data imbalance problem that degrades machine learning (ML)-based PA classification by extracting well-defined features and applying undersampling and oversampling methods. We extracted various temporal, spectral, and nonlinear features from wrist-, hip-, and ankle-worn accelerometer data. Then, the influences of undersampling and oversampling were compared using various ML and deep learning (DL) approaches. Among these models, ensemble methods, including random forest (RF) and adaptive boosting (AdaBoost), performed well in differentiating among sedentary behavior (driving) and three walking types (walking on level ground, ascending stairs, and descending stairs), even in a cross-subject paradigm. The undersampling approach, which has a low computational cost, produced classification results that were not biased toward the majority class. In addition, by examining the importance of each node in multiple decision trees (DTs), we found that RF could automatically select relevant features for PA classification depending on the sensor location. This study proposes that ensemble learning using well-defined feature sets combined with the undersampling approach is robust for imbalanced datasets in PA classification. This approach will be useful for PA classification in free-living situations, where data imbalance between classes is common.
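
The abstract does not say which undersampling and oversampling algorithms were used. As a minimal, hedged sketch, the code below applies the two simplest variants to per-window feature vectors: random undersampling (drop samples from larger classes without replacement until every class matches the smallest) and random oversampling (duplicate samples from smaller classes with replacement until every class matches the largest). The function names, class labels, and class counts are hypothetical; the paper may have used other sampling variants.

```python
import random
from collections import Counter, defaultdict


def group_by_label(labels):
    """Map each class label to the list of sample indices carrying it."""
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    return groups


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class size."""
    rng = random.Random(seed)
    groups = group_by_label(labels)
    n_min = min(len(idxs) for idxs in groups.values())
    keep = []
    for idxs in groups.values():
        keep.extend(rng.sample(idxs, n_min))  # without replacement
    keep.sort()
    return [samples[i] for i in keep], [labels[i] for i in keep]


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate samples so every class matches the largest class size."""
    rng = random.Random(seed)
    groups = group_by_label(labels)
    n_max = max(len(idxs) for idxs in groups.values())
    keep = []
    for idxs in groups.values():
        extra = [rng.choice(idxs) for _ in range(n_max - len(idxs))]  # with replacement
        keep.extend(idxs + extra)
    keep.sort()
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Hypothetical imbalanced toy set: "driving" is the majority class.
labels = ["driving"] * 8 + ["walking"] * 4 + ["stairs_up"] * 2 + ["stairs_down"] * 2
samples = [[float(i)] for i in range(len(labels))]  # one dummy feature per window

_, y_under = random_undersample(samples, labels)
_, y_over = random_oversample(samples, labels)
print(Counter(y_under))  # every class reduced to 2 samples (8 in total)
print(Counter(y_over))   # every class raised to 8 samples (32 in total)
```

On this toy set, undersampling shrinks the training data from 16 to 8 samples while oversampling grows it to 32, which illustrates why undersampling carries the lower computational cost mentioned above. Resampling of this kind is applied only to the training data, never to the held-out test data.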

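The abstract names temporal, spectral, and nonlinear feature families without listing the individual features. As a hedged illustration only, the sketch below computes one or two common examples of each family for a single window of one accelerometer axis: mean, standard deviation, and range (temporal); dominant frequency (spectral); and sample entropy (nonlinear). These particular features, the 32 Hz sampling rate, the 2 s window, and the sample-entropy parameters `m=2` and `r_factor=0.2` are assumptions, not the paper's settings.

```python
import cmath
import math
import statistics


def temporal_features(x):
    """Simple time-domain statistics of one accelerometer window."""
    return {
        "mean": statistics.fmean(x),
        "std": statistics.pstdev(x),
        "range": max(x) - min(x),
    }


def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest non-DC DFT bin, via a naive O(n^2) DFT."""
    n = len(x)
    mu = statistics.fmean(x)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coef = sum((x[t] - mu) * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coef) > best_mag:
            best_k, best_mag = k, abs(coef)
    return best_k * fs / n


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy (SampEn), a common nonlinear regularity measure."""
    r = r_factor * statistics.pstdev(x)
    n = len(x)

    def count_matches(length):
        # Count template pairs (i < j) whose Chebyshev distance is <= r,
        # using the same n - m starting points for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    return -math.log(a / b) if a > 0 and b > 0 else float("inf")


# Hypothetical window: a 2 Hz sine sampled at 32 Hz for 2 s (64 samples).
fs = 32
window = [math.sin(2 * math.pi * 2.0 * t / fs) for t in range(64)]
features = {
    **temporal_features(window),
    "dom_freq_hz": dominant_frequency(window, fs),
    "sampen": sample_entropy(window),
}
print(features)  # dom_freq_hz is 2.0 for this synthetic window
```

The naive DFT loop keeps the sketch dependency-free; a practical pipeline would typically use an FFT such as `numpy.fft.rfft` instead. Feature vectors like this one, computed per window and per sensor location, are what the sampling methods and classifiers described above operate on.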