HMM-BiMM: Hidden Markov Model-based word segmentation via improved Bi-directional Maximal Matching algorithm

COMPUTERS & ELECTRICAL ENGINEERING (2021)

Journal

COMPUTERS & ELECTRICAL ENGINEERING

Volume 94

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.compeleceng.2021.107354

Keywords

Bidirectional Maximal Matching; Hidden Markov model; Medical text segmentation; Dictionary dynamic update

Funding

National Natural Science Foundation of China [71974069]

Automated Summary

The HMM-BiMM algorithm combines the Hidden Markov Model with the Bi-directional Maximal Matching algorithm to achieve fast and accurate Chinese word segmentation. Dynamically updating the dictionary further improves segmentation accuracy and efficiency.

Abstract

This paper presents HMM-BiMM, a new word segmentation algorithm that combines the Hidden Markov Model (HMM) with the Bi-directional Maximal Matching (BiMM) algorithm. Sub-dictionary matching enables fast word segmentation. After the text has been segmented by BiMM, the leftover text, formed by joining the remaining single words, is segmented again using the HMM. The words newly segmented by the HMM are used to update the dictionary dynamically, which in turn improves segmentation accuracy. In a comparison with five representative algorithms on real-world clinical (symptom) text, HMM-BiMM achieves the highest efficiency and accuracy for symptom text segmentation. Specifically, it improves on BiMM by around 3% in precision and around 70% in running time.
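The dictionary-matching stage described in the abstract can be sketched as follows. This is a minimal illustration of plain bi-directional maximal matching, not the authors' improved variant: the tie-breaking rule (fewer words, then fewer single-character words), the function names, the `max_len` parameter, and the toy dictionary are all assumptions made for this example. The HMM re-segmentation step and the sub-dictionary lookup are not implemented here; `single_runs` only collects the leftover spans that such a step would receive.

```python
def forward_mm(text, dictionary, max_len):
    """Greedy longest match scanning left to right."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def backward_mm(text, dictionary, max_len):
    """Greedy longest match scanning right to left."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    return words[::-1]


def bimm(text, dictionary, max_len=3):
    """Pick between the two scans (assumed heuristic: fewer words,
    then fewer single-character words; ties go to the backward scan)."""
    fwd = forward_mm(text, dictionary, max_len)
    bwd = backward_mm(text, dictionary, max_len)

    def cost(seg):
        return (len(seg), sum(len(w) == 1 for w in seg))

    return fwd if cost(fwd) < cost(bwd) else bwd


def single_runs(words):
    """Join runs of two or more adjacent single-character tokens.
    These are the spans a second-stage model (an HMM in the paper)
    would re-segment; any new words found could then be added to
    the dictionary."""
    runs, current = [], []
    for w in words + [""]:  # empty sentinel flushes the last run
        if len(w) == 1:
            current.append(w)
        else:
            if len(current) >= 2:
                runs.append("".join(current))
            current = []
    return runs


# Toy dictionary, hypothetical and for illustration only.
demo_dict = {"研究", "研究生", "生命", "起源"}
demo_text = "研究生命的起源"
demo_result = bimm(demo_text, demo_dict, max_len=3)
```

On this toy input the forward scan commits to the longer word "研究生" and leaves two adjacent single characters behind, while the backward scan leaves only one, so the sketch selects the backward result. Spans returned by `single_runs` are where a statistical second pass can recover words missing from the dictionary.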
