KLASTERISASI DATA UNSUPERVISED MENGGUNAKAN METODE K-MEANS (Unsupervised Data Clustering Using the K-Means Method)

Date

2020-04

Abstract

The number of student theses grows every year, and many share the same or similar topics, so thesis documents can be grouped (clustered) by the similarity of their titles. Before clustering, each thesis title is preprocessed with text mining and its terms are weighted using Term Frequency-Inverse Document Frequency (TF-IDF). Grouping uses K-Means, an unsupervised clustering technique, with Cosine Similarity as the similarity measure. Initial cluster centroids are selected with Improved K-Means, which combines distance and density optimization. Clustering 73 student thesis titles produced seven clusters, and the titles within each cluster are highly similar to one another.
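The pipeline described in the abstract (TF-IDF weighting of titles, cosine similarity, K-Means grouping) can be illustrated with a minimal Python sketch. It is not the author's implementation. The function names (`tfidf_vectors`, `cosine`, `mean_vector`, `kmeans_cosine`) and the four example titles are hypothetical, tokenization is plain whitespace splitting, and the smoothed IDF formula is an assumption. The initial centroids are chosen with a simple farthest-first rule, which is only a stand-in for the distance-and-density seeding of Improved K-Means, whose details the abstract does not give.

```python
import math
from collections import Counter


def tfidf_vectors(titles):
    """Turn title strings into L2-normalised sparse TF-IDF vectors (dicts)."""
    docs = [t.lower().split() for t in titles]  # assumption: whitespace tokens
    n = len(docs)
    # Document frequency: in how many titles each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    """Component-wise mean of sparse vectors, used as the cluster centroid."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans_cosine(vectors, k, max_iter=100):
    """K-Means that assigns each vector to its most cosine-similar centroid."""
    # Stand-in seeding (NOT Improved K-Means): start from the first vector,
    # then repeatedly add the vector least similar to the chosen centroids.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_vector(members)
    return labels


# Hypothetical titles, invented for illustration only.
titles = [
    "clustering thesis titles with k-means",
    "k-means clustering of thesis documents",
    "web based library information system",
    "information system for library management",
]
vecs = tfidf_vectors(titles)
labels = kmeans_cosine(vecs, k=2)
print(labels)  # → [0, 0, 1, 1]
```

In this toy run the two clustering-related titles share a cluster and the two library-system titles share the other, because the two groups have no terms in common. For the 73 titles and seven clusters reported in the abstract, the same calls would apply with `k=7`, although the result would depend on preprocessing and seeding choices that this sketch only approximates.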

Keywords

Clustering, Cosine Similarity, Improved K-Means, K-Means, TF-IDF
