13 Security Lab

Learn about TF-IDF

Maj0r Tom 2021. 2. 20. 20:51

What is TF-IDF?

TF-IDF (Term Frequency-Inverse Document Frequency) is a weighting scheme used in information retrieval and text mining. Given a collection of documents, it is a statistic that indicates how important a word is to one specific document in that collection.

TF (term frequency) indicates how often a specific word appears in a document; the higher this value, the more important the word is likely to be within that document. However, a word that is used frequently across the whole document collection is simply a common word. The number of documents in which a word appears is called DF (document frequency), and the inverse of this value is called IDF (inverse document frequency); in practice the inverse is usually log-scaled. TF-IDF is the product of TF and IDF.
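
The definitions above can be written out as formulas. This is one common formulation and not the only one: the symbols are assumptions introduced here for illustration, where f_{t,d} is the raw count of term t in document d, N is the number of documents in the collection, and df(t) is the number of documents containing t. The logarithm is a widespread convention for damping the plain reciprocal described in the text.

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}
\qquad
\mathrm{idf}(t) = \log \frac{N}{\mathrm{df}(t)}
\qquad
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
```

With this form, a term that occurs in every document has df(t) = N, so its IDF is log 1 = 0 and its TF-IDF score is 0 regardless of how often it appears.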

Term Frequency

How often the word appears in the document.

TF (term frequency) is a value that indicates how often a specific word appears in a document. The higher this value, the more important the word is likely to be in that document. However, TF alone cannot tell a distinctive word from one that is frequent everywhere: if a word also appears often in many other documents, its importance as a descriptor of this document decreases.
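
As a minimal sketch of term frequency, the snippet below counts words in one document and normalizes by document length. The function name `term_frequency` and the whitespace tokenization are assumptions made for illustration; real pipelines typically normalize case and punctuation first, and other TF variants (raw counts, log-scaled counts) also exist.

```python
from collections import Counter


def term_frequency(tokens):
    """Return each term's share of the tokens in a single document."""
    total = len(tokens)
    if total == 0:
        return {}  # an empty document has no term frequencies
    counts = Counter(tokens)
    # Relative frequency: occurrences of the term / total tokens in the document.
    return {term: count / total for term, count in counts.items()}


# Hypothetical example document, split on whitespace.
doc = "the cat sat on the mat".split()
tf = term_frequency(doc)
print(tf["the"], tf["cat"])
```

Here "the" occurs twice among six tokens and "cat" once, so "the" gets the higher TF even though it says little about the document, which is the gap IDF is meant to close.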

Inverse Document Frequency

Down-weighting words that appear in many documents (such as "and", "the", and "or", which are stop words and are typically disregarded) and prioritizing distinctive words that appear in only a few documents across the collection.

The number of documents in the collection that contain a word is called DF (document frequency), and the inverse of this value is called IDF (inverse document frequency).
TF-IDF is the product of TF and IDF. A higher score means the word appears frequently in this document while appearing rarely in other documents.
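
The following sketch puts DF, IDF, and the final product together on a tiny made-up corpus. It assumes the common log-scaled form idf = log(N / df) in place of a plain reciprocal, and the names `document_frequency` and `tf_idf` are hypothetical helpers, not part of any library. Libraries such as scikit-learn use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so each document counts a term once
    return df


def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    df = document_frequency(docs)
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical three-document corpus, tokenized on whitespace.
corpus = [
    "the firewall blocked the malware".split(),
    "the analyst read the report".split(),
    "the malware encrypted the disk".split(),
]
scores = tf_idf(corpus)
for term, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.4f}")
```

In the first document, "the" appears in all three documents and scores 0, "malware" appears in two and scores low, and "firewall" appears only here and scores highest, matching the description of a high TF-IDF word.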
