Measuring textual similarity

Semantic similarity in natural language processing, also known as semantic proximity (some authors also use the term semantic relatedness), is an assessment of how close two terms or concepts are in meaning. It can be estimated by defining a topological similarity, for example by using dictionaries to define the distance between the terms or concepts they contain. A naïve metric for comparing concepts that are ordered in a partially ordered set and represented as nodes of a directed acyclic graph (for example, a taxonomy) is the length of the shortest path joining the two concept nodes. Many methods for estimating this similarity exist today. The problem arises when there is no dictionary from which to count how many nodes separate one term from another. In that case, we have to rely on other kinds of measures.
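As an illustration of the naïve shortest-path metric, the sketch below stores a small taxonomy as child-to-parent "is-a" links, treats each link as traversable in both directions, and uses breadth-first search to count the edges between two concepts. The taxonomy contents are hypothetical, and converting a distance d into a score with 1 / (1 + d) is an assumed convention (it is the one used by NLTK's WordNet `path_similarity`), not something prescribed by the text above. A term missing from the taxonomy gets a score of 0.0, which is exactly the "no dictionary" situation described earlier.

```python
from collections import deque

# Hypothetical toy taxonomy: each child concept maps to its parent (is-a link).
TAXONOMY = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "animal": "entity",
}


def build_undirected(edges):
    """Build an adjacency map in which every is-a edge can be walked both ways."""
    graph = {}
    for child, parent in edges.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search: number of edges on the shortest path, or None."""
    if source not in graph or target not in graph:
        return None  # at least one term is not in the taxonomy
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # both terms exist but are not connected


def path_similarity(graph, a, b):
    """Assumed convention: map distance d to 1 / (1 + d); 0.0 if no path."""
    d = shortest_path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


graph = build_undirected(TAXONOMY)
print(shortest_path_length(graph, "dog", "cat"))      # 2 (dog -> mammal -> cat)
print(shortest_path_length(graph, "dog", "sparrow"))  # 4
print(path_similarity(graph, "dog", "car"))           # 0.0, "car" is unknown
```

Sibling concepts under the same parent end up closer (higher score) than concepts that only meet at a more general ancestor, which is the intuition behind path-based measures. The last call shows the limitation: without an entry in the taxonomy, the metric has nothing to measure.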

For more information, see: Jorge Martinez-Gil, José Francisco Aldana-Montes: Semantic similarity measurement using historical Google search patterns. Information Systems Frontiers 15(3): 399-410 (2013).