Why do we need Data Preprocessing?

Real-world data generally contains noise and missing values, and it may be in a format that cannot be used directly by machine learning models. Data preprocessing is the set of tasks required to clean the data and make it suitable for a machine learning model; done well, it can also improve the accuracy and efficiency of the model.

It involves the following steps:
Getting the dataset
Importing libraries
Importing datasets
Finding Missing Data
Encoding Categorical Data
Splitting dataset into training and test set
Feature scaling
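
Before imputing anything, it can help to see where the gaps are. The following is a minimal sketch of the "Finding Missing Data" step, assuming a small hypothetical DataFrame built in memory (the column names and values are illustrative and do not come from Data.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; np.nan marks a missing entry.
toy = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Age": [44.0, np.nan, 30.0],
    "Salary": [72000.0, 48000.0, np.nan],
    "Purchased": ["No", "Yes", "No"],
})

# isna() flags each missing cell; sum() counts the flags per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

Columns with a non-zero count are the ones that need imputation or removal before training.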

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

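# Load the dataset (adjust the path to wherever Data.csv is stored)
data_set = pd.read_csv('d:/Data.csv')

# Independent variables x: every column except the last
x = data_set.iloc[:, :-1].values

# Fit the imputer to columns 1 and 2 of x, which must be numeric for strategy='mean'
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(x[:, 1:3])

# Replace missing data with the calculated mean of each column
x[:, 1:3] = imputer.transform(x[:, 1:3])
print(x)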
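
To make the effect of mean imputation concrete, here is a small sketch on a hypothetical two-column array (the values are made up for illustration). Each missing entry is replaced by the mean of the observed values in its own column:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy matrix: column 0 is an age, column 1 a salary.
toy = np.array([[30.0, 50000.0],
                [np.nan, 60000.0],
                [40.0, np.nan]])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
filled = imputer.fit_transform(toy)

# statistics_ holds the per-column means learned during fit.
print(imputer.statistics_)  # column means: 35.0 and 55000.0
print(filled)
```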

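# Encode the categorical column 0 as integers
# (optional: OneHotEncoder also accepts the raw strings directly)
label_encoder_x_1 = LabelEncoder()
x[:, 0] = label_encoder_x_1.fit_transform(x[:, 0])

# One-hot encode column 0 and pass the remaining columns through unchanged.
# sparse_output=False needs scikit-learn 1.2 or later; older releases used sparse=False.
transformer = ColumnTransformer(
    [('Country', OneHotEncoder(sparse_output=False), [0])],
    remainder='passthrough'
)
x = transformer.fit_transform(x)
print(x)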
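
The sketch below shows what one-hot encoding produces on a hypothetical four-row array (country names and ages are made up). It uses sparse_threshold=0 on the ColumnTransformer to force a dense result, which is an illustrative choice that sidesteps the sparse/sparse_output naming difference between scikit-learn versions:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: column 0 is categorical, column 1 is numeric.
toy = np.array([["France", 44.0],
                ["Spain", 27.0],
                ["Germany", 30.0],
                ["Spain", 38.0]], dtype=object)

transformer = ColumnTransformer(
    [("Country", OneHotEncoder(), [0])],
    remainder="passthrough",   # keep the numeric column as it is
    sparse_threshold=0,        # always return a dense array
)
encoded = transformer.fit_transform(toy)

# Categories are sorted, so the new columns are France, Germany, Spain.
categories = transformer.named_transformers_["Country"].categories_[0]
print(list(categories))
print(encoded)
```

Each country becomes its own 0/1 column, so the model does not read an artificial ordering into the integer codes.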

# Dependent variable y: column index 3
y = data_set.iloc[:, 3].values

# Encode the class labels in y as integers
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
print(y)

# Hold out 20% of the rows as the test set; random_state makes the shuffle reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
# print("Training data set:", x_train, y_train)
# print("Testing data set:", x_test, y_test)
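
As a quick check of what the split returns, this sketch uses hypothetical arrays of ten rows (the names features and labels are illustrative). With test_size=0.2, two rows go to the test set and eight stay in the training set, and each feature row keeps its own label:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: row i of features is [2*i, 2*i + 1], and its label is i % 2.
features = np.arange(20).reshape(10, 2)
labels = np.array([0, 1] * 5)

f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

print(f_train.shape, f_test.shape)  # (8, 2) (2, 2)
print(l_train.shape, l_test.shape)  # (8,) (2,)
```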

# Fit the scaler on the training set only, then apply the same scaling to the test set
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
print(x_train)
print(x_test)
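
The following sketch, on hypothetical numbers, illustrates why the scaler is fitted on the training set and only applied to the test set: the mean and standard deviation come from the training rows, so the scaled training columns have mean 0 and standard deviation 1, while test values are expressed relative to those same statistics:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training and test rows with very different column ranges.
train = np.array([[1.0, 100.0],
                  [2.0, 200.0],
                  [3.0, 300.0]])
test = np.array([[2.0, 400.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the training statistics

print(train_scaled.mean(axis=0))  # approximately 0 for each column
print(train_scaled.std(axis=0))   # approximately 1 for each column
print(test_scaled)
```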

Machine learning:- K-means

K-means clustering is a method for finding clusters and cluster centers in a set of unlabelled data.

A cluster is a group of similar data points, and each cluster can be identified by its own label. For example, we can create three sub-groups for red, green and blue to organise related data; an item that is closest to red then becomes part of the red cluster.
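
The colour example can be sketched with vq() from scipy.cluster.vq, which assigns each observation to its nearest centre. The centres and points below are hypothetical RGB values chosen for illustration:

```python
import numpy as np
from scipy.cluster.vq import vq

# Hypothetical cluster centres, one per colour.
names = ["red", "green", "blue"]
centers = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])

# Hypothetical items to sort into the three clusters.
points = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.8, 0.1],
                   [0.1, 0.0, 0.7]])

# codes[i] is the index of the centre closest to points[i].
codes, distances = vq(points, centers)
assigned = [names[c] for c in codes]
print(assigned)  # ['red', 'green', 'blue']
```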

Steps for clustering:-

1) Prepare the data, either by loading it from a repository or by building an array with NumPy.
2) If the data needs to be rescaled, use whiten().
3) Calculate the centroids from the data based on the chosen number of clusters.
4) Match each observation to its closest centroid; the result is the index of that centroid (0, 1, 2, ...) together with the distance to it.

Complete example of the K-means clustering algorithm, which is widely used in ML:-

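from numpy import array, vstack
from numpy.random import rand
from scipy.cluster.vq import kmeans, vq

# Two groups of 10 points in 3-D: one shifted by +1 on every axis, one unshifted
data = vstack((rand(10, 3) + array([1, 1, 1]), rand(10, 3)))

# Compute 5 centroids; the second return value (the distortion) is ignored
centroids, _ = kmeans(data, 5)
print(centroids)

# Assign each observation to its nearest centroid; clx holds the centroid indices
clx, _ = vq(data, centroids)
print(clx)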
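
Step 2 of the list above mentions whiten(), which the complete example does not use. The sketch below adds it, under a few assumptions that are not in the original: the two groups are shifted by 5 instead of 1 so that they are clearly separated, the number of clusters is 2, and the random generator is seeded so the data is reproducible:

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

np.random.seed(0)                    # kmeans draws its starting points at random
rng = np.random.default_rng(0)       # seeded generator for the toy data

# Two hypothetical groups of 10 points in 3-D, far apart from each other.
group_a = rng.random((10, 3)) + 5.0
group_b = rng.random((10, 3))
data = np.vstack((group_a, group_b))

# whiten() divides each column by its standard deviation.
scaled = whiten(data)

# Find 2 centroids, then label every observation with its nearest centroid.
centroids, distortion = kmeans(scaled, 2)
labels, _ = vq(scaled, centroids)

print(centroids)
print(labels)
```

Because the groups are well separated, the first ten labels should agree with each other and differ from the last ten; which group receives label 0 depends on the random start.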

Posted by Shiva Gautam
