Article

Machine learning predicts and provides insights into milk acidification rates of Lactococcus lactis

Journal

PLOS ONE
Volume 16, Issue 3

Publisher

Public Library of Science (PLOS)
DOI: 10.1371/journal.pone.0246287

Funding

  1. Innovation Fund Denmark (IFD) [703800250B]

Lactococcus lactis strains are important for cheese manufacturing, and their strain-dependent properties can be predicted using machine learning models trained on different genomic representations. The models predicted the maximum hourly acidification rate with high correlation to the measured values and provided insights into lactose metabolism, casein degradation, and pH stress response. Each model also identified unique genetic features not found by the others.
Lactococcus lactis strains are important components of industrial starter cultures for cheese manufacturing. They have many strain-dependent properties that affect the final product. Here, we explored the use of machine learning to create systematic, high-throughput screening methods for these properties. Fast acidification of milk is one such strain-dependent property. To predict the maximum hourly acidification rate (V-max), we trained Random Forest (RF) models on four different genomic representations: presence/absence of gene families, counts of Pfam domains, the 8-nucleotide-long subsequences of the strains' DNA (8-mers), and the 9-nucleotide-long subsequences (9-mers). V-max was measured at different temperatures and volumes, and in the presence or absence of yeast extract. These conditions were added as features in each RF model. The four models were trained on 257 strains, and the correlation between measured and predicted V-max was evaluated with the Pearson correlation coefficient (PC) on a separate dataset of 85 strains. All models had high PC scores: 0.83 (gene presence/absence model), 0.84 (Pfam domain model), 0.76 (8-mer model), and 0.85 (9-mer model). The models all based their predictions on relevant genetic features and showed consensus on systems for lactose metabolism, casein degradation, and pH stress response. Each model also identified a set of features not found by the other models.
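The abstract does not say exactly how each genomic representation or the measurement conditions were encoded. What follows is a minimal sketch under stated assumptions: k-mers are counted as overlapping windows on a single strand (whether the authors used canonical k-mers is not stated), gene-family presence/absence is a 0/1 vector in a fixed order, and the conditions are appended as temperature, volume, and a 0/1 yeast-extract flag. The function names, family names, and condition units are hypothetical. The resulting vectors would then be passed to a Random Forest regressor (for example scikit-learn's `RandomForestRegressor`) trained on the 257 strains, and `pearson()` would compare measured and predicted V-max on the 85 held-out strains.

```python
from collections import Counter
import math


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on one strand; windows containing non-ACGT bases are skipped."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def presence_absence(strain_families: set[str], all_families: list[str]) -> list[float]:
    """1.0 if the strain carries a gene family, else 0.0, in a fixed family order."""
    return [1.0 if fam in strain_families else 0.0 for fam in all_families]


def add_conditions(genomic: list[float], temperature_c: float,
                   volume_ml: float, yeast_extract: bool) -> list[float]:
    """Append a hypothetical encoding of the measurement conditions to a genomic feature vector."""
    return genomic + [temperature_c, volume_ml, 1.0 if yeast_extract else 0.0]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient, e.g. between measured and predicted V-max."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical vocabulary; k = 4 here for brevity instead of 8 or 9.
    vocab = ["ACGT", "TTTT"]
    counts = kmer_counts("ACGTACGT", 4)
    features = add_conditions([float(counts[km]) for km in vocab], 30.0, 200.0, True)
    print(features)  # [2.0, 0.0, 30.0, 200.0, 1.0]
    print(presence_absence({"fam_A"}, ["fam_A", "fam_B"]))  # [1.0, 0.0]
```

With k = 8 or 9, the vocabulary would normally be built from the k-mers observed across the training strains rather than enumerating all 4^8 or 4^9 possibilities, which keeps the feature matrix smaller.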