Article

Sparse reproducible machine learning for near infrared hyperspectral imaging: Estimating the tetrahydrocannabinolic acid concentration in Cannabis sativa L.

Journal

INDUSTRIAL CROPS AND PRODUCTS
Volume 192

Publisher

ELSEVIER
DOI: 10.1016/j.indcrop.2022.116137

Keywords

Cannabis; THCA; NIR; HSI; Regression; PLS; CCA; Sparsity; Reproducibility


This paper presents a reliable real-time technique using proximal near infrared hyperspectral imaging to measure the concentration of tetrahydrocannabinolic acid (THCA) in hemp. The study compares different regression algorithms and finds that regularized partial least squares (RPLS) achieves the best performance. A variation of RPLS with feature selection (PLSFS) is introduced to improve model interpretability and reproducibility.
The concentrations of cannabinoids in hemp remain tightly controlled in New Zealand and around the world, and crops that exceed the legal limit are prohibited from cultivation. There is therefore a need for high-throughput methods that accurately assess cannabinoid content and evaluate compliance and harvest readiness in the field. This paper reports a reliable real-time technique for measuring the tetrahydrocannabinolic acid (THCA) concentration of Cannabis sativa L. using proximal near infrared (NIR) hyperspectral imaging (HSI). At implementation, scalability can be achieved by introducing sparsity into the model. Sparsity also improves model interpretability and makes the model more robust against fitting noise in HSI data. Model reproducibility was used to assess the quality of the model fit. This work uses linear regression to map NIR HSI images to THCA concentrations measured with high performance liquid chromatography (HPLC). Four regression algorithms covering different regression strategies were compared: Canonical Correlation Analysis (CCA), Ensemble CCA (EnCCA), Partial Least Squares Regression (PLS), and Regularized PLS (RPLS). The RPLS algorithm achieved the best performance but uses all spectral wavelengths for regression. A variation of RPLS with feature selection (PLSFS) was therefore introduced to improve model interpretability. The proposed PLSFS method yields reproducible models while maintaining small feature sets. To our knowledge, this is the first published research to use HSI to estimate THCA concentration.
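The abstract's idea of a sparse, reproducible linear model can be illustrated with a small sketch. This is not the authors' PLSFS algorithm, whose details are not given here. It is a hypothetical one-component PLS regression that keeps only the spectral bands with the largest weights, fitted on synthetic data standing in for mean NIR spectra and HPLC-measured THCA. The function names (`fit_sparse_pls1`, `predict`, `jaccard`), the parameter `n_keep`, and the use of Jaccard overlap between selected band sets as a reproducibility score are all assumptions made for illustration.

```python
import numpy as np

def fit_sparse_pls1(X, y, n_keep):
    """One-component PLS1 restricted to the n_keep bands with largest |weight|."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc                              # covariance of each band with the target
    keep = np.argsort(np.abs(w))[-n_keep:]     # indices of the strongest bands
    w_sparse = np.zeros_like(w)
    w_sparse[keep] = w[keep]                   # zero out every other band
    w_sparse /= np.linalg.norm(w_sparse)
    t = Xc @ w_sparse                          # latent scores
    b = (t @ yc) / (t @ t)                     # regress the target on the scores
    return b * w_sparse, x_mean, y_mean, np.sort(keep)

def predict(model, X):
    coef, x_mean, y_mean, _ = model
    return (X - x_mean) @ coef + y_mean

def jaccard(a, b):
    """Overlap between two selected band sets (1.0 means identical)."""
    a, b = set(a.tolist()), set(b.tolist())
    return len(a & b) / len(a | b)

# Synthetic stand-in data (assumption): 600 samples, 50 bands,
# only bands 10, 11 and 12 carry signal about the target.
rng = np.random.default_rng(0)
n_samples, n_bands = 600, 50
informative = np.array([10, 11, 12])
X = rng.normal(size=(n_samples, n_bands))
y = 2.0 * X[:, informative].sum(axis=1) + rng.normal(scale=0.1, size=n_samples)

# Fit on two disjoint halves and compare the selected bands.
model_a = fit_sparse_pls1(X[:300], y[:300], n_keep=3)
model_b = fit_sparse_pls1(X[300:], y[300:], n_keep=3)
reproducibility = jaccard(model_a[3], model_b[3])

# Error of the first model on the half it was not fitted on.
rmse = float(np.sqrt(np.mean((predict(model_a, X[300:]) - y[300:]) ** 2)))
print("bands A:", model_a[3], "bands B:", model_b[3])
print("reproducibility:", reproducibility, "held-out RMSE:", rmse)
```

Fitting on disjoint halves and comparing the selected band sets is one simple way to check whether a small feature set is stable across data splits. Real HSI data typically has far fewer samples than bands and strongly correlated neighbouring wavelengths, so a practical model would need more components and regularization than this toy example.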
