
Sparse reproducible machine learning for near infrared hyperspectral imaging: Estimating the tetrahydrocannabinolic acid concentration in Cannabis sativa L.

INDUSTRIAL CROPS AND PRODUCTS (2023)

Journal

INDUSTRIAL CROPS AND PRODUCTS

Volume 192

Publisher

ELSEVIER

DOI: 10.1016/j.indcrop.2022.116137

Keywords

Cannabis; THCA; NIR; HSI; Regression; PLS; CCA; Sparsity; Reproducibility

Automated Summary

This paper presents a reliable real-time technique that uses proximal near infrared hyperspectral imaging to measure the concentration of tetrahydrocannabinolic acid (THCA) in hemp. The study compares different regression algorithms and finds that regularized partial least squares (RPLS) achieves the best performance. A variation of RPLS with feature selection (PLSFS) is introduced to improve model interpretability and reproducibility.

Abstract

The concentrations of cannabinoids in hemp are still tightly controlled in New Zealand and around the world, with crops exceeding the legal limit being prohibited from cultivation. Thus, there is a need for high-throughput methods to accurately assess the cannabinoid content and to evaluate compliance and harvest readiness in the field. This paper reports a reliable real-time technique to measure the tetrahydrocannabinolic acid (THCA) concentration of Cannabis sativa L. using proximal near infrared (NIR) hyperspectral imaging (HSI). At implementation, scalability can be achieved by introducing sparsity to the model. Sparsity also improves model interpretability and adds robustness when fitting noisy HSI data. Model reproducibility was used to assess the quality of the model fit. This work uses linear regression to map NIR HSI images to THCA concentrations measured with high-performance liquid chromatography (HPLC). Four regression algorithms that cover different regression strategies were compared: Canonical Correlation Analysis (CCA), Ensemble CCA (EnCCA), Partial Least Squares Regression (PLS), and Regularized PLS (RPLS). The RPLS algorithm achieved the best performance but uses all spectral wavelengths for regression. Thus, a variation of RPLS with feature selection (PLSFS) was introduced to improve model interpretability. The proposed PLSFS method leads to reproducible models while maintaining small feature sets. To our knowledge, this publication reports the first research that has used HSI to estimate THCA concentration.
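The abstract does not spell out how PLSFS selects wavelengths or how reproducibility is scored, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a single-component PLS weight vector as the ranking criterion, a fixed number `k` of retained features, and the mean pairwise Jaccard overlap across bootstrap resamples as a stand-in reproducibility measure. All function names and the synthetic data are hypothetical.

```python
import random


def pls1_weights(X, y):
    """First PLS weight vector: normalized covariance of each centered feature with y."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    w = [
        sum((X[i][j] - x_mean[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = sum(v * v for v in w) ** 0.5
    return [v / norm for v in w]


def select_features(X, y, k):
    """Indices of the k features with the largest absolute PLS weight (assumed criterion)."""
    w = pls1_weights(X, y)
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Overlap of two feature sets: 1.0 means identical, 0.0 means disjoint."""
    return len(a & b) / len(a | b)


def selection_reproducibility(X, y, k, n_resamples=20, seed=0):
    """Mean pairwise Jaccard overlap of the feature sets chosen on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(X)
    sets = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        sets.append(select_features([X[i] for i in idx], [y[i] for i in idx], k))
    pairs = [(a, b) for i, a in enumerate(sets) for b in sets[i + 1:]]
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def make_synthetic(n=200, p=30, informative=(5, 12), noise=0.1, seed=1):
    """Hypothetical stand-in for spectra: only the 'informative' bands drive the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(row[j] for j in informative) + rng.gauss(0, noise) for row in X]
    return X, y


X, y = make_synthetic()
selected = select_features(X, y, k=2)
score = selection_reproducibility(X, y, k=2)
print(sorted(selected), round(score, 3))
```

On this synthetic data the two informative bands dominate the weight vector, so the selected set should be stable across resamples. Real HSI spectra are strongly collinear across neighbouring wavelengths, which is exactly the situation where a reproducibility check of this kind becomes informative.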
