Title

DENSITY-BASED AND PARAMETERLESS CLUSTERING OF EMBEDDED DATA STREAMS

Abstract

With the accelerating digitalization of the world, the volume of data produced at high speed is increasing rapidly, and it is difficult to record such a data stream and process it as a whole. This creates the need to process the data as soon as it arrives, without recording the stream. In most cases, there is no prior information about the data. Additionally, the characteristics of a data stream may change over time; this phenomenon is called concept drift. Since clustering works without ground-truth labels, it is well suited to data streams. Clustering algorithms for data streams should read the data only once, work in real time, and adapt to concept drift. With the Density-Based and Parameterless Clustering of Embedded Data Streams (DBPCES) algorithm developed in this study, data streams are embedded into two dimensions and clustered with a parameterless density-based clustering algorithm. To embed the data stream into two dimensions, the UMAP algorithm was adapted to handle data streams and concept drift. For clustering, the DBSCAN algorithm was applied to the embedded data points. The DBSCAN parameters were estimated with a heuristic, so that the data stream can be clustered without requiring any parameters from the user. DBPCES was run on artificial and real data streams that differ in actual cluster count, number of dimensions, and concept drift rate. Its performance was compared with that of DenStream and the implementation of Zubaroğlu and Atalay. The adjusted Rand index, purity, and silhouette coefficient were used as evaluation metrics, and execution times were also compared. Although DBPCES was not as fast as DenStream, it achieved results similar to those of the other algorithms.
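
To make the clustering step concrete, the following is a minimal sketch of DBSCAN run with an automatically estimated radius. It is not the heuristic used in the thesis, which the abstract does not specify. As an assumption, it takes eps to be the 90th percentile of each point's distance to its k-th nearest neighbour and sets min_pts to k + 1. The points are assumed to be already embedded in two dimensions, so the UMAP step is omitted. The function names estimate_eps and dbscan and the toy data are hypothetical.

```python
import math

NOISE = -1

def estimate_eps(points, k):
    """Assumed heuristic: 90th percentile of k-th nearest-neighbour distances."""
    kth = sorted(
        sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[k - 1]
        for i, p in enumerate(points)
    )
    return kth[int(0.9 * len(kth))]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; a point counts itself among its eps-neighbours."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels

# Toy 2-D "embedded" data: two 5x5 grids far apart, plus one isolated point.
grid_a = [(x, y) for x in range(5) for y in range(5)]
grid_b = [(x + 50, y + 50) for x in range(5) for y in range(5)]
points = grid_a + grid_b + [(25, 25)]

k = 4
eps = estimate_eps(points, k)
labels = dbscan(points, eps, min_pts=k + 1)
n_clusters = len({label for label in labels if label != NOISE})
print(eps, n_clusters, labels.count(NOISE))
```

On this toy input the user supplies neither eps nor a cluster count; only k is fixed in advance, and a streaming variant would have to re-estimate eps as new points arrive.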

Zoom Link:
https://zoom.us/j/96223649218?pwd=THBaTThYRTNib1gzRW5wSk1CZVlvUT09

Supervisor(s)

OZLEM POYRAZ

Date and Location

2021-09-09 11:00:00

Category

MSc_Thesis