
Machine learning:- K-means

Why do we need Data Preprocessing?


Real-world data generally contains noise and missing values, and may be in a format that cannot be used directly by machine learning models. Data preprocessing is the set of tasks that cleans the data and makes it suitable for a machine learning model, which also improves the accuracy and efficiency of the model.


It involves the following steps:
  • Getting the dataset
  • Importing libraries
  • Importing datasets
  • Finding and handling missing data
  • Encoding categorical data
  • Splitting the dataset into training and test sets
  • Feature scaling



import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Load the dataset (adjust the path to wherever Data.csv is stored)
data_set = pd.read_csv('d:/Data.csv')

# Independent variables x: every column except the last one
x = data_set.iloc[:, :-1].values

# Fit the imputer on the numeric columns 1 and 2 of x,
# then replace missing data with the calculated mean value
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
x[:, 1:3] = imputer.fit_transform(x[:, 1:3])
print(x)

# One-hot encode the categorical column 0. OneHotEncoder accepts the string
# values directly, so a separate LabelEncoder pass on x is not needed.
# Note: sparse_output needs scikit-learn 1.2+; older versions use sparse=False.
transformer = ColumnTransformer(
    [('Country', OneHotEncoder(sparse_output=False), [0])], remainder='passthrough'
)
x = transformer.fit_transform(x)
print(x)

# Dependent variable y: column 3, encoded as integer labels
y = data_set.iloc[:, 3].values
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
print(y)

# Keep 20% of the rows for testing; random_state makes the split repeatable
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

#print("Training Data Set are ",x_train,y_train)
#print("Testing Data Set are",x_test,y_test)

# Fit the scaler on the training set only, then apply the same scaling to the test set
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)

print(x_train)
print(x_test)
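
To see what the mean imputation and feature scaling steps compute, here is a minimal numpy-only sketch. The array values and the variable names (features, imputed, standardized) are made up for illustration and are not taken from Data.csv. Unlike the code above, it scales the whole array at once instead of fitting on a training set.

```python
import numpy as np

# Hypothetical numeric features (age, salary); np.nan marks a missing value
features = np.array([
    [44.0, 72000.0],
    [27.0, 48000.0],
    [30.0, np.nan],
    [np.nan, 61000.0],
])

# Mean imputation: replace each missing entry with the mean of its column,
# computed only over the values that are present
column_means = np.nanmean(features, axis=0)
imputed = np.where(np.isnan(features), column_means, features)

# Standardization: subtract the column mean and divide by the column
# standard deviation (population standard deviation, ddof=0)
standardized = (imputed - imputed.mean(axis=0)) / imputed.std(axis=0)

print(imputed)
print(standardized)
```

After this step every column has mean 0 and standard deviation 1, so a large-valued feature such as salary no longer dominates a small-valued one such as age.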

Machine learning:- K-means


K-means clustering is a method for finding clusters and cluster centers in a set of unlabelled data.

A cluster is a group of similar data items, and each cluster can be identified by a label. For example, we can create three sub-groups labelled red, green and blue to organize related data; an item whose color is red then becomes part of the red cluster.
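
The red, green and blue example can be sketched as a nearest-center lookup. This is only an illustration: the RGB values for the centers and the items are hypothetical, and Euclidean distance is assumed as the measure of similarity.

```python
import numpy as np

# Hypothetical cluster centers in RGB space, one per label
labels = ["red", "green", "blue"]
centers = np.array([
    [255.0, 0.0, 0.0],
    [0.0, 255.0, 0.0],
    [0.0, 0.0, 255.0],
])

# Hypothetical items to place into clusters, also given as RGB values
items = np.array([
    [200.0, 30.0, 40.0],
    [10.0, 220.0, 60.0],
    [25.0, 40.0, 180.0],
])

# Euclidean distance from every item to every center, shape (3 items, 3 centers)
distances = np.linalg.norm(items[:, None, :] - centers[None, :, :], axis=2)

# Each item joins the cluster whose center is nearest
membership = [labels[i] for i in distances.argmin(axis=1)]
print(membership)  # ['red', 'green', 'blue']
```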

Steps for clustering:-

1) Prepare the data, either by loading it from a repository or by building an array with numpy.
2) To rescale the data, use whiten() from scipy.cluster.vq, which divides each feature by its standard deviation.
3) Calculate the centroids from the data, based on the number of clusters.
4) Match each data point to its nearest centroid; the result is the index (0, 1, 2, ...) of the centroid at the minimum distance from that point.
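
Steps 3 and 4 can be sketched with a textbook k-means loop in plain numpy. This is a teaching sketch under stated assumptions, not the code a library runs internally: the function name simple_kmeans, its parameters, and the sample points are all hypothetical.

```python
import numpy as np

def nearest_centroid(points, centroids):
    """Return, for every point, the index of the centroid at minimum distance."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def simple_kmeans(points, k, iterations=10, seed=0):
    """Textbook k-means loop; a teaching sketch, not a library replacement."""
    rng = np.random.default_rng(seed)
    # Start from k distinct points chosen at random as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        # Assignment step: label every point with its nearest centroid
        labels = nearest_centroid(points, centroids)
        # Update step: move each centroid to the mean of its assigned points
        # (a centroid with no assigned points keeps its previous position)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    # Final labels, consistent with the centroids that are returned
    return centroids, nearest_centroid(points, centroids)

# Two hypothetical, well-separated groups of 2-D points
points = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                   [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
centroids, labels = simple_kmeans(points, k=2)
print(centroids)
print(labels)
```

The loop alternates between assigning points to centroids and recomputing the centroids, which is why the result depends on the number of clusters chosen up front.
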
Complete example of the K-means clustering algorithm, which is widely used in ML:-
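from numpy import array, vstack
from numpy.random import rand
from scipy.cluster.vq import kmeans, vq

# Build 20 points in 3 dimensions: 10 shifted by (1, 1, 1) and 10 in the unit cube
data = vstack((rand(10, 3) + array([1, 1, 1]), rand(10, 3)))

# Compute 5 centroids; the second return value (the distortion) is ignored
centroids, _ = kmeans(data, 5)
print(centroids)

# Match each point to the index of its nearest centroid
clx, _ = vq(data, centroids)
print(clx)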
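
The example above skips the optional rescaling step. A possible variant that includes whiten() is sketched below; the two groups of points, their scales, and the choice of 2 clusters are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Hypothetical data: two groups of 10 points whose two features
# have very different scales (roughly 0-6 and 0-6000)
rng = np.random.default_rng(0)
group_a = rng.random((10, 2)) * [1.0, 1000.0]
group_b = rng.random((10, 2)) * [1.0, 1000.0] + [5.0, 5000.0]
data = np.vstack((group_a, group_b))

# Rescale every feature (column) to unit variance before clustering
scaled = whiten(data)

# Find 2 centroids in the rescaled space, then label each point
# with the index of its nearest centroid
centroids, distortion = kmeans(scaled, 2)
labels, _ = vq(scaled, centroids)

print(scaled.std(axis=0))
print(labels)
```

Because kmeans() was run on the rescaled data, vq() must be given the rescaled data as well; mixing raw points with centroids from the whitened space would give meaningless distances.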
                       
