
Jul 20, 2021

K-means clustering and its real use case in the security domain

K-means clustering is a well-known and powerful unsupervised machine learning algorithm. It is used to solve many complex unsupervised learning problems. Before we start, let's take a look at the points we are going to cover.

First, a short note on supervised and unsupervised learning.

Supervised learning: we train the model knowing the final target value.
Unsupervised learning: we train the model without knowing the final target value.
Let's look at unsupervised learning.

Why would we train a model without knowing the final target value? What kinds of problems can unsupervised learning solve, and what exactly are its use cases? The k-means clustering algorithm helps answer all of these questions.

Introduction
The k-means clustering algorithm tries to group similar items into clusters. The number of groups is represented by K.

Let us understand the k-means clustering algorithm with a simple example.

Suppose you go to a vegetable shop to buy some vegetables. There you will see different kinds of vegetables, and one thing you will notice is that they are arranged in groups by type: all the carrots are kept in one place, the potatoes are kept with their own kind, and so on. In other words, the vegetables form groups, or clusters, where each vegetable is kept with others of its kind.

Now we will understand this with the help of a figure.

Now, look at the two figures above. What do you observe?

•Let us talk about the first figure. It shows the data before applying the k-means clustering algorithm. Here all three categories are mixed together. When you see such data in the real world, you will not be able to tell the different categories apart.
•Now, look at the second figure (fig 2). It shows the data after applying the k-means clustering algorithm. You can see that the three different items are now separated into three groups of data, which are called clusters. Here, one group of data = one cluster.
•Here k = 3.
•k is the number of clusters, that is, how many different groups the data is divided into. The larger k is, the more finely the data is divided.
How Does the K-means clustering algorithm work?
K-means clustering tries to group similar kinds of items into clusters. It finds the similarity between the items and groups them accordingly. The k-means clustering algorithm works in three steps. Let's see what these three steps are.

1. Select the value of k.
2. Initialize the centroids.
3. Assign each point to a group and find the average of each group.
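The three steps above can be sketched in plain Python. This is a minimal illustration rather than a production implementation: the function name `k_means`, the toy `data` points, and the choice to initialize centroids by sampling k data points are all assumptions made for this example. Step 3 is repeated until the groups stop changing.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into `k` groups."""
    rng = random.Random(seed)
    # Step 2: initialize the centroids by picking k distinct data points.
    centroids = rng.sample(points, k)
    labels = []

    for _ in range(iterations):
        # Step 3a: assign every point to the group of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels

        # Step 3b: move each centroid to the average of its group.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a group is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )

    return centroids, labels


# Step 1: select k. The toy data has two obvious groups, so k = 2.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(data, k=2)
print(centroids)
print(labels)
```

The first three points end up in one cluster and the last three in the other. Which cluster gets label 0 depends on the random initialization.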

Let us understand the steps above with the help of a figure, because a good picture is worth a thousand words.

•Figure 1 shows data for two different items. The first item is shown in blue and the second item is shown in red. Here I am choosing the value of K randomly as
There are different methods for choosing the right value of k. In practice, the value of k depends on how deeply you want to go into the data. Going deeper means analyzing the records more closely and spending more time on the data, and it gives you more groups of data. This is what k-means clustering lets you control.

NOTE: K-means clustering uses the Euclidean distance to measure the distance between points.
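As a small illustration of the note above, the Euclidean distance between two points is the square root of the sum of squared differences of their coordinates. The sketch below computes it by hand for one point and two centroids. The coordinates and the names `point`, `centroid_a`, and `centroid_b` are made up for the example.

```python
import math


def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


point = (2.0, 3.0)
centroid_a = (1.0, 1.0)
centroid_b = (5.0, 7.0)

dist_a = euclidean(point, centroid_a)  # sqrt(1 + 4) = sqrt(5)
dist_b = euclidean(point, centroid_b)  # sqrt(9 + 16) = 5.0

# The point is assigned to whichever centroid is closer.
nearest = "a" if dist_a < dist_b else "b"
print(dist_a, dist_b, nearest)
```

Python's standard library offers the same calculation as `math.dist(p, q)` (Python 3.8 and later).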

K-Means Clustering Algorithm in the Security Domain
In cybersecurity, attackers try to hit servers with many different kinds of patterns, and we do not know in advance which patterns they will use. So we collect information from log files and group it with the help of clustering. If the same kind of pattern keeps coming up, it might be a potential threat. The security team can then analyze the pattern and raise alerts.
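Before log data can be clustered, each log line has to be turned into numbers. The sketch below shows one possible way to do that. The log format, the IP addresses, and the three features (request count, error count, number of distinct paths) are all hypothetical choices for this illustration, not a standard. Real logs will need their own parsing and feature selection.

```python
from collections import defaultdict

# Hypothetical access-log lines: timestamp, source IP, method, path, status.
LOG_LINES = [
    "2021-07-20T10:00:01 198.51.100.7 GET /index.html 200",
    "2021-07-20T10:00:02 203.0.113.5 POST /login 401",
    "2021-07-20T10:00:02 203.0.113.5 POST /login 401",
    "2021-07-20T10:00:03 203.0.113.5 POST /login 401",
    "2021-07-20T10:00:04 198.51.100.7 GET /about.html 200",
    "2021-07-20T10:00:05 203.0.113.5 GET /admin 403",
]


def features_per_ip(lines):
    """Return {ip: (request_count, error_count, distinct_paths)}."""
    requests = defaultdict(int)
    errors = defaultdict(int)
    paths = defaultdict(set)
    for line in lines:
        _timestamp, ip, _method, path, status = line.split()
        requests[ip] += 1
        if int(status) >= 400:  # count 4xx/5xx responses as errors
            errors[ip] += 1
        paths[ip].add(path)
    return {ip: (requests[ip], errors[ip], len(paths[ip])) for ip in requests}


features = features_per_ip(LOG_LINES)
for ip, vector in features.items():
    print(ip, vector)
```

Each source IP now has a numeric vector that a clustering algorithm can group. Because k-means relies on Euclidean distance, features measured on very different scales are usually normalized first so that one feature does not dominate the distance.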

Identifying crime localities
When data on crimes in specific localities of a city is available, the category of crime, the area of the crime, and the association between the two can give quality insight into the crime-prone areas within a city or locality.

Insurance fraud detection
Machine learning has a critical role to play in fraud detection and has numerous applications in automobile, healthcare, and insurance fraud detection. Using historical data on fraudulent claims, it is possible to isolate new claims based on their proximity to clusters that indicate fraudulent patterns. Since insurance fraud can potentially have a multi-million dollar impact on a company, the ability to detect fraud is crucial.
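The idea of judging a new claim by its proximity to known clusters can be sketched as follows. Everything here is hypothetical: the two centroids, their names, and the two features (claim amount in thousands, days since the policy started) are invented for illustration. K-means itself does not say which cluster is fraudulent, so the "fraud_like" tag is assumed to come from historical claims already known to be fraud.

```python
import math

# Hypothetical centroids learned from historical claims, described as
# (claim_amount_in_thousands, days_since_policy_start).
CENTROIDS = {
    "typical": (4.0, 400.0),
    "fraud_like": (45.0, 20.0),
}


def nearest_cluster(claim):
    """Return the name of the centroid closest to `claim`."""
    return min(CENTROIDS, key=lambda name: math.dist(claim, CENTROIDS[name]))


new_claims = [(5.0, 380.0), (50.0, 15.0)]
for claim in new_claims:
    print(claim, "->", nearest_cluster(claim))
```

A claim that lands near the fraud-like cluster is only a candidate for review, not proof of fraud.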

Cyber-profiling criminals
Cyber-profiling is the process of collecting data from individuals and groups to identify significant correlations. The idea of cyber-profiling is derived from criminal profiles, which give the investigation division information for classifying the types of criminals who were at the crime scene.

Call detail record analysis
A call detail record (CDR) is the information captured by telecom companies during the call, SMS, and internet activity of a customer. This information provides greater insight into the customer's needs when used together with customer demographics.

Automatic clustering of IT alerts
The components of a large enterprise IT infrastructure, such as network, storage, or database, generate large volumes of alert messages. Because alert messages potentially point to operational issues, they must be manually screened and prioritized for downstream processes.

Let's see how k-means clustering helps in pattern detection.