Article

Transitivity of transformation matrices to bridge word vector spaces over 1000 years

Journal

JOURNAL OF SUPERCOMPUTING
Volume 77, Issue 9, Pages 9848-9878

Publisher

SPRINGER
DOI: 10.1007/s11227-020-03584-5

Keywords

Synonym search; Analogy; Transformation matrix; Transitivity

Funding

  1. JST CREST, Japan [JPMJCR1402]
  2. JSPS KAKENHI [JP16H02906, JP17H00762, JP18H03494, JP18H03243]


The study proposes a synonym search method that uses transformation matrices between word embedding models to solve analogy problems across time periods, and evaluates it experimentally with the nDCG and MRR metrics.
We propose a synonym search method that solves the "(A, B) is similar to (C, D)" problem over time: given a query by example in a known domain, it retrieves the corresponding information in an unknown domain. For example, it seems natural that Bush in the 2000s is similar to Reagan in the 1980s, because each was the president of the USA in the respective decade. In the abstract, A in B is similar to C in D. We solve this problem by using transformation matrices between word vectors of different periods trained with the Skip-gram model. For instance, given the sentence "Bush in the 2000s is similar to X in the 1980s", we search for the appropriate entity X. To do so, we focus on the transitivity of transformation matrices. Our approach converts the vector representation of A in the word embedding model of B into the vector representation of X in the word embedding model of D by obtaining the transformation matrix between the two word embedding models. We discuss the parameters used in previous work and improve the selection of the words used to build the transformation matrix by using co-occurrence clusters. Our aim is to search for synonyms separated by more than 1000 years. However, only a few common words with stable meanings are shared between the 2000s and the 1000s, and in this situation co-occurrence clusters are difficult to use because the clusters of common words number more than 100,000. We therefore use the transitive relation: (2000s, X) is similar to (1500s, Y) and (1500s, Y) is similar to (1000s, Z), hence (2000s, X) is similar to (1000s, Z). In the abstract, if x is similar to y and y is similar to z, then x is similar to z. This lets us solve the (A, B) similar to (C, D) problem over 1000 years of separation. We conducted experiments to demonstrate the method on this problem and evaluated the results with nDCG and MRR.
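
The transitive idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each transformation matrix is fitted by ordinary least squares on shared anchor words, and it uses synthetic vectors in place of real Skip-gram models. The names fit_transform, nearest, mean_reciprocal_rank, and the space_* arrays are hypothetical. Row i of every synthetic space stands for the same underlying entity, so a correct mapping should return row i again; in the real task the counterpart would be a different word, such as Reagan for Bush.

```python
import numpy as np

def fit_transform(src, dst):
    """Least-squares matrix W with src @ W ~= dst (rows are anchor-word vectors)."""
    W, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return W

def nearest(vec, space, k=3):
    """Indices of the k rows of `space` most cosine-similar to `vec`."""
    sims = (space @ vec) / (np.linalg.norm(space, axis=1) * np.linalg.norm(vec))
    return np.argsort(-sims)[:k]

def mean_reciprocal_rank(ranked_lists, gold):
    """MRR: average of 1 / rank of the gold item in each ranked list."""
    rr = []
    for ranked, g in zip(ranked_lists, gold):
        pos = np.where(ranked == g)[0]
        rr.append(1.0 / (pos[0] + 1) if pos.size else 0.0)
    return float(np.mean(rr))

# Synthetic stand-ins for three embedding spaces (assumption: real ones
# would come from Skip-gram models trained on each period's corpus).
rng = np.random.default_rng(0)
dim, n_words = 8, 50
R1 = np.linalg.qr(rng.normal(size=(dim, dim)))[0]  # hidden 2000s -> 1500s map
R2 = np.linalg.qr(rng.normal(size=(dim, dim)))[0]  # hidden 1500s -> 1000s map
space_2000s = rng.normal(size=(n_words, dim))
space_1500s = space_2000s @ R1
space_1000s = space_1500s @ R2

# Adjacent periods share anchor words; the two distant periods need not.
anchors_ab = np.arange(0, 30)   # words common to the 2000s and 1500s
anchors_bc = np.arange(20, 50)  # words common to the 1500s and 1000s
W_ab = fit_transform(space_2000s[anchors_ab], space_1500s[anchors_ab])
W_bc = fit_transform(space_1500s[anchors_bc], space_1000s[anchors_bc])

# Transitivity: compose the two maps to bridge 2000s -> 1000s directly.
W_ac = W_ab @ W_bc

queries = np.arange(0, 10)
ranked = [nearest(space_2000s[q] @ W_ac, space_1000s, k=n_words) for q in queries]
mrr = mean_reciprocal_rank(ranked, queries)
print("MRR on synthetic data:", mrr)
```

Because the synthetic spaces are exact linear images of one another, the composed matrix recovers the hidden mapping almost perfectly; real embedding spaces are only approximately linearly related, so each hop would add error.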
