**Similarity Measure Tool:** Text similarity determines how ‘close’ the text or keywords of two documents are. To calculate the similarity of two documents, similarity measure functions are used.

A similarity measure is a function that assigns a real number between 0 and 1 to a pair of documents. A value of 0 means the documents are completely dissimilar, whereas 1 indicates that they are practically identical.

**Cosine Similarity**

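- Cosine similarity measures how similar two vectors $A$ and $B$ are by the cosine of the angle $\theta$ between them:

  $$\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$

  where $A_i$ and $B_i$ are the weights of term $i$ in the two vectors. Because term-frequency vectors have no negative entries, the result for them lies between 0 and 1.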

- To use cosine similarity, sentences must first be converted into vectors. One way to do that is a bag of words weighted with either TF (term frequency) or TF-IDF (term frequency-inverse document frequency). The choice between TF and TF-IDF depends on the application and is immaterial to how cosine similarity is actually computed, since the measure only needs vectors. TF works well for general text similarity, while TF-IDF is better suited to search query relevance because it down-weights terms that appear in many documents.
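The cosine formula can be written directly in Python. This is a minimal sketch rather than a reference implementation: `cosine_similarity` is a hypothetical helper name, the example vectors are made up, and returning 0 for an all-zero vector is an assumption (the cosine is undefined in that case).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an all-zero vector is treated as dissimilar to everything
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical term-count vectors over a shared 3-word vocabulary
print(cosine_similarity([1, 2, 0], [2, 4, 0]))  # parallel vectors: approximately 1.0
print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # no shared terms: 0.0
```

Because the function only sees numbers, the same code applies whether the entries are raw TF counts or TF-IDF weights.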

**Steps to Calculate Cosine Similarity**

**Step 1:** Calculate term frequencies using a bag of words, so that each document becomes a vector of word counts over the vocabulary shared by both documents.
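Step 1 might look like the following minimal sketch. It assumes a simple lowercase, whitespace-based tokenizer with no stemming or stop-word removal, and `term_frequency_vectors` is a hypothetical helper name.

```python
from collections import Counter

def term_frequency_vectors(doc_a, doc_b):
    """Build bag-of-words term-count vectors for two documents over a shared vocabulary."""
    # Assumption: lowercase + whitespace tokenization only
    tokens_a = doc_a.lower().split()
    tokens_b = doc_b.lower().split()
    vocabulary = sorted(set(tokens_a) | set(tokens_b))
    counts_a = Counter(tokens_a)
    counts_b = Counter(tokens_b)
    # Counter returns 0 for missing keys, so terms absent from a document get a count of 0
    vec_a = [counts_a[term] for term in vocabulary]
    vec_b = [counts_b[term] for term in vocabulary]
    return vocabulary, vec_a, vec_b

vocab, tf_a, tf_b = term_frequency_vectors("the cat sat on the mat", "the dog sat")
print(vocab)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(tf_a)   # [1, 0, 1, 1, 1, 2]
print(tf_b)   # [0, 1, 0, 0, 1, 1]
```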

**Step 2:** The main issue with raw term-frequency counts is that they favor longer documents or sentences. One way to solve this is to **normalize** each term-frequency vector by its magnitude: sum the squares of the frequencies, take the square root, and divide every frequency by that value. The result is a vector of length 1.
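A sketch of the normalization in Step 2, using a hypothetical `normalize` helper and made-up counts. Doubling every count, as a longer document repeating the same content would, leaves the normalized vector unchanged, which is the length bias this step removes. Leaving an all-zero vector unchanged is an assumption.

```python
import math

def normalize(vector):
    """Divide each term frequency by the vector's magnitude (square root of the sum of squares)."""
    magnitude = math.sqrt(sum(x * x for x in vector))
    if magnitude == 0:
        return list(vector)  # assumption: an all-zero vector is left as-is
    return [x / magnitude for x in vector]

tf = [3, 4, 0]            # hypothetical term counts; magnitude is sqrt(9 + 16) = 5
unit = normalize(tf)
print(unit)               # [0.6, 0.8, 0.0]
print(normalize([6, 8, 0]) == unit)  # True: doubling the counts does not change the result
print(math.sqrt(sum(x * x for x in unit)))  # length is now approximately 1
```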

**Step 3:** Since both vectors have already been normalized to length 1, the product of their magnitudes is 1, so the cosine similarity is simply their dot product.
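Step 3 then reduces to a plain dot product. The sketch below uses made-up term counts for the sentences "the cat sat on the mat" and "the dog sat"; `normalize` and `dot` are hypothetical helper names.

```python
import math

def normalize(vector):
    """Scale a vector to length 1 (all-zero vectors are returned unchanged)."""
    magnitude = math.sqrt(sum(x * x for x in vector))
    return [x / magnitude for x in vector] if magnitude else list(vector)

def dot(a, b):
    """Dot product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

# Term counts over the vocabulary [cat, dog, mat, on, sat, the]
tf_a = [1, 0, 1, 1, 1, 2]  # "the cat sat on the mat"
tf_b = [0, 1, 0, 0, 1, 1]  # "the dog sat"
similarity = dot(normalize(tf_a), normalize(tf_b))
print(round(similarity, 4))  # 0.6124, i.e. 3 / sqrt(8 * 3)
```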

**Flow of the Cosine Similarity Measure**
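The whole flow, from raw text to a score, can be condensed into one function. This is a minimal sketch under stated assumptions: lowercase whitespace tokenization, raw TF counts rather than TF-IDF, an empty document scoring 0, and a hypothetical function name `document_similarity`.

```python
import math
from collections import Counter

def document_similarity(doc_a, doc_b):
    """Text -> bag-of-words TF vectors -> normalized vectors -> dot product."""
    # Assumption: lowercase whitespace tokenization, raw term counts (TF, not TF-IDF)
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())
    vocabulary = sorted(set(counts_a) | set(counts_b))
    # Step 1: term-frequency vectors over the shared vocabulary
    vec_a = [counts_a[t] for t in vocabulary]
    vec_b = [counts_b[t] for t in vocabulary]
    # Step 2: normalize each vector to length 1
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(x * x for x in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty document is treated as dissimilar
    unit_a = [x / norm_a for x in vec_a]
    unit_b = [x / norm_b for x in vec_b]
    # Step 3: cosine similarity is the dot product of the unit vectors
    return sum(x * y for x, y in zip(unit_a, unit_b))

print(document_similarity("the cat sat", "the cat sat"))   # approximately 1.0
print(document_similarity("the cat sat", "a dog barked"))  # 0.0, no shared terms
```

The two calls show the ends of the 0-to-1 range: identical documents score (approximately) 1, and documents with no words in common score 0.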