Machine Learning Life Cycle


Machine learning gives computer systems the ability to learn automatically from data without being explicitly programmed. But how does a machine learning system work? It can be described using the machine learning life cycle, a cyclic process for building an efficient machine learning project. The main purpose of the life cycle is to find a solution to the problem the project addresses.

The machine learning life cycle involves seven major steps:

Gathering Data
Data preparation
Data Wrangling
Analyse Data
Train the model
Test the model
Deployment


The most important thing in the complete process is to understand the problem and know its purpose. Therefore, before starting the life cycle, we need to understand the problem, because a good result depends on a good understanding of the problem.

In the complete life cycle, to solve a problem we create a machine learning system called a "model", and this model is created through "training". But to train a model we need data, so the life cycle starts with collecting data.

1. Gathering Data:

Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify and obtain all of the data needed for the problem.

In this step, we need to identify the different data sources, as data can be collected from various sources such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle, because the quantity and quality of the collected data determine the efficiency of the output. In general, the more relevant, good-quality data we have, the more accurate the predictions tend to be.

This step includes the below tasks:

Identify various data sources
Collect data
Integrate the data obtained from different sources

By performing the above tasks, we get a coherent set of data, also called a dataset, which is used in the further steps.
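As a minimal sketch of the integration task, assume two hypothetical sources that share a `customer_id` key: one exported as a CSV file and one returned as records from a database query or web API. pandas can merge them into a single dataset (the data below is toy data, inlined so the example runs on its own):

```python
import io

import pandas as pd

# Hypothetical source 1: a CSV export (inlined here so the example is self-contained)
csv_source = io.StringIO(
    "customer_id,Country,Age\n"
    "1,India,38\n"
    "2,France,43\n"
    "3,Germany,30\n"
)
customers = pd.read_csv(csv_source)

# Hypothetical source 2: records returned by a database query or web API
purchases = pd.DataFrame(
    [
        {"customer_id": 1, "Salary": 48000, "Purchased": "No"},
        {"customer_id": 2, "Salary": 45000, "Purchased": "Yes"},
        {"customer_id": 3, "Salary": 54000, "Purchased": "No"},
    ]
)

# Integrate both sources into one coherent dataset using the shared key
dataset = customers.merge(purchases, on="customer_id", how="inner")
print(dataset)
```

An inner join keeps only customers present in both sources; `how="left"` would keep every row of the first source instead.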

2. Data preparation

After collecting the data, we need to prepare it for the next steps. Data preparation is the step where we put our data into a suitable place and prepare it for use in machine learning training.

In this step, we first put all the data together and then randomize its ordering, so that the order in which records were collected does not bias training.
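A minimal sketch of the "put together, then randomize" step with pandas, assuming the data is already in a DataFrame (the values are toy data). `random_state` is fixed only so that the shuffle is reproducible:

```python
import pandas as pd

# Toy data standing in for the combined dataset
data = pd.DataFrame({
    "Age": [38, 43, 30, 48, 40],
    "Purchased": ["No", "Yes", "No", "Yes", "Yes"],
})

# frac=1 samples every row exactly once, i.e. returns a shuffled copy;
# reset_index renumbers the rows 0..n-1 in their new order
shuffled = data.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```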

This step can be further divided into two processes:

Data exploration:
It is used to understand the nature of data that we have to work with. We need to understand the characteristics, format, and quality of data.
A better understanding of the data leads to a more effective outcome. In this process, we look for correlations, general trends, and outliers.
Data pre-processing:
Next, the data is pre-processed so that it is ready for analysis.
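As a rough sketch of data exploration, assuming a small toy DataFrame, pandas can summarize each column, compute correlations, and flag outliers. The 1.5 × IQR rule used here is one common convention for spotting outliers, not the only one:

```python
import pandas as pd

# Toy data; the last salary is deliberately extreme
data = pd.DataFrame({
    "Age": [38, 43, 30, 48, 40, 35, 27],
    "Salary": [48000, 45000, 54000, 61000, 58000, 52000, 250000],
})

# General characteristics: count, mean, std, min, quartiles, max per column
print(data.describe())

# Pairwise correlation between the numeric columns
print(data.corr())

# Flag values outside 1.5 * IQR of their column as potential outliers
q1 = data.quantile(0.25)
q3 = data.quantile(0.75)
iqr = q3 - q1
outlier_mask = (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)
outliers = data[outlier_mask.any(axis=1)]
print(outliers)
```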

3. Data Wrangling

Data wrangling is the process of cleaning and converting raw data into a usable format. It involves cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. It is one of the most important steps of the complete process, because cleaning is what addresses the quality issues in the data.

Not all of the data we collect is necessarily useful. In real-world applications, collected data may have various issues, including:

Missing Values
Duplicate data
Invalid data
Noise

We therefore use various filtering techniques to clean the data.

It is necessary to detect and handle these issues (by removing or correcting the affected records) because they can negatively affect the quality of the outcome.
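A minimal pandas sketch of the cleaning step, assuming a toy dataset with one duplicate row, one missing value, and one invalid (negative) age. It covers duplicates, invalid data, and missing values only. Which values count as "invalid", and whether to drop or fill missing values, depends on the project; the choices below are illustrative:

```python
import numpy as np
import pandas as pd

# Toy raw data: row 2 duplicates row 1, row 3 has an invalid age, row 4 a missing age
raw = pd.DataFrame({
    "Country": ["India", "France", "France", "Germany", "Spain"],
    "Age": [38, 43, 43, -5, np.nan],
    "Salary": [48000, 45000, 45000, 54000, 52000],
})

# Duplicate data: drop rows that are exact copies of an earlier row
clean = raw.drop_duplicates()

# Invalid data: treat a negative age as impossible and mark it as missing
clean = clean.assign(Age=clean["Age"].where(clean["Age"] >= 0))

# Missing values: fill with the column mean (dropping the rows is another option)
clean = clean.fillna({"Age": clean["Age"].mean()})
print(clean)
```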

4. Data Analysis

Now the cleaned and prepared data is passed on to the analysis step. This step involves:

Selecting analytical techniques
Building models
Reviewing the results

The aim of this step is to build a machine learning model that analyzes the data using various analytical techniques, and then to review the outcome. It starts with determining the type of problem, which guides the choice of machine learning technique, such as classification, regression, cluster analysis, or association. We then build the model using the prepared data and evaluate it.

Hence, in this step, we take the data and use machine learning algorithms to build the model.
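As a hedged illustration of "determine the type of problem, then select a technique", the hypothetical helper `choose_model` below maps a problem type to a scikit-learn estimator. The specific estimators are example choices for each category, not the only options; association analysis is left out because scikit-learn does not provide it:

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression


def choose_model(problem_type):
    """Hypothetical helper: return an untrained estimator for a problem type."""
    if problem_type == "classification":
        # Predict a discrete label, e.g. Purchased = Yes/No
        return LogisticRegression()
    if problem_type == "regression":
        # Predict a continuous value, e.g. a house price
        return LinearRegression()
    if problem_type == "clustering":
        # Group unlabeled rows into a fixed number of clusters
        return KMeans(n_clusters=3, n_init=10, random_state=0)
    raise ValueError(f"Unsupported problem type: {problem_type}")


model = choose_model("classification")
print(type(model).__name__)
```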

5. Train Model

Now the next step is to train the model. In this step, we train the model to improve its performance so that it produces a better solution to the problem.

We use datasets to train the model with various machine learning algorithms. Training is required so that the model can learn the various patterns, rules, and features in the data.
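A minimal sketch of what training does, assuming scikit-learn and a toy dataset generated from the rule y = 2x + 1. Fitting a linear regression lets the model recover that rule, in the form of its learned parameters, from the examples alone:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data that follows the pattern y = 2x + 1
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = 2 * X_train.ravel() + 1

model = LinearRegression()
model.fit(X_train, y_train)  # training: estimate parameters from the data

# The learned slope and intercept approximate the pattern hidden in the data
print(model.coef_[0], model.intercept_)  # close to 2.0 and 1.0
```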

6. Test Model

Once our machine learning model has been trained on a given dataset, we test it. In this step, we check the accuracy of the model by providing a test dataset to it.

Testing the model determines its percentage accuracy, which we compare against the requirements of the project or problem.
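Accuracy is the fraction of test examples whose predicted label matches the true label, the same quantity scikit-learn's `accuracy_score` computes. A plain-Python sketch, with made-up labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up true labels and predictions for five test examples
y_true = ["Yes", "No", "Yes", "Yes", "No"]
y_pred = ["Yes", "No", "No", "Yes", "No"]
print(f"{accuracy(y_true, y_pred):.0%}")  # 4 of 5 correct -> 80%
```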

7. Deployment

The last step of machine learning life cycle is deployment, where we deploy the model in the real-world system.

If the prepared model produces accurate results that meet our requirements at an acceptable speed, we deploy it in the real system. Before deploying the project, however, we check whether the model's performance improves with the available data. The deployment phase is similar to making the final report for a project.
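One common first step toward deployment is saving the trained model to a file so that a separate serving process can load it. A minimal sketch using Python's standard `pickle` module with a scikit-learn model trained on toy data; the file name `model.pkl` is an arbitrary assumption, and pickle files should only be loaded from trusted sources:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Train a small model on toy data (stands in for the model from the earlier steps)
X = np.array([[20.0], [25.0], [45.0], [50.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# Save the trained model to disk
path = os.path.join(tempfile.gettempdir(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later, e.g. inside the production service: load the model and predict
with open(path, "rb") as f:
    loaded_model = pickle.load(f)

print(loaded_model.predict([[48.0]]))
```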

How to get datasets for Machine Learning

The key to success in machine learning, or to becoming a great data scientist, is to practice with different types of datasets. But finding a suitable dataset for each kind of machine learning project is a difficult task. So, in this topic, we describe sources from which you can easily get a dataset suited to your project.

Before looking at the sources of machine learning datasets, let's discuss what a dataset is.

What is a dataset?

A dataset is a collection of data in which the data is arranged in some order. A dataset can contain anything from a simple array to a database table. The table below shows an example of a dataset:

Country	Age	Salary	Purchased
India	38	48000	No
France	43	45000	Yes
Germany	30	54000	No

A tabular dataset can be understood as a database table or matrix, where each column corresponds to a particular variable and each row corresponds to one record (observation) in the dataset. The most widely supported file type for a tabular dataset is "Comma-Separated Values," or CSV. To store "tree-like" (nested) data, however, JSON is a better fit.
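A small sketch contrasting the two formats with pandas. The CSV text holds the rows of the table above as a flat table; the JSON text (hypothetical nested records) shows tree-like data, which `pandas.json_normalize` flattens into columns:

```python
import io
import json

import pandas as pd

# Tabular data as CSV: one line per row, commas between columns
csv_text = """Country,Age,Salary,Purchased
India,38,48000,No
France,43,45000,Yes
Germany,30,54000,No
"""
table = pd.read_csv(io.StringIO(csv_text))
print(table)

# Tree-like data as JSON: each record has a nested "job" object (hypothetical)
json_text = """[
  {"name": "A", "job": {"title": "Engineer", "salary": 48000}},
  {"name": "B", "job": {"title": "Analyst", "salary": 45000}}
]"""
records = json.loads(json_text)

# Flatten the nested fields into columns such as "job.title"
flat = pd.json_normalize(records)
print(flat)
```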

Types of data in datasets

Numerical data: such as house price, temperature, etc.
Categorical data: such as Yes/No, True/False, Blue/Green, etc.
Ordinal data: similar to categorical data, but the categories have a natural order and can be compared (e.g., low < medium < high).
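To make the difference concrete, here is a sketch with pandas, assuming a hypothetical `size` column whose categories have a natural order (small < medium < large). Declaring the categorical with `ordered=True` enables comparisons and sorting by that order, which plain categorical data such as colors does not have:

```python
import pandas as pd

# Ordinal data: categories with a meaningful order
sizes = pd.Series(
    pd.Categorical(
        ["medium", "small", "large", "medium"],
        categories=["small", "medium", "large"],
        ordered=True,
    )
)

# Comparisons respect the declared order, not alphabetical order
print(sizes > "small")  # True for the medium and large entries

# Integer codes preserve the ranking: small=0, medium=1, large=2
print(sizes.cat.codes.tolist())  # [1, 0, 2, 1]
```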

Note: Real-world datasets are often very large, which makes them difficult to manage and process when you are starting out. Therefore, to practice machine learning algorithms, we can use a small dummy dataset.

Need for a Dataset

To work on machine learning projects, we need a large amount of data, because without data one cannot train ML/AI models. Collecting and preparing the dataset is one of the most crucial parts of creating an ML/AI project.

No ML project can work properly if its dataset is not well prepared and pre-processed.

During the development of an ML project, developers rely heavily on datasets. When building ML applications, the dataset is divided into two parts:

Training dataset: used to fit the model
Test dataset: held back to evaluate the trained model on data it has not seen

Note: Many of these datasets are large, so a fast internet connection helps when downloading them.

Popular sources for Machine Learning datasets

Below is a list of dataset sources that are freely available to the public:

1. Kaggle Datasets

2. UCI Machine Learning Repository

3. Google Dataset Search

Google Dataset Search is a search engine launched by Google on September 5, 2018. It helps researchers find datasets that are freely available online.

The link for Google Dataset Search is https://toolbox.google.com/datasetsearch.

Complete Practical Example, Step by Step

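1) Data Gathering

The dataset used in this example is available at:

https://github.com/shivaconceptsolution/machine-learning/blob/main/Data.csv

import numpy as np
import matplotlib.pyplot as mpt
import pandas as pd

# Load the dataset; assumes Data.csv has been downloaded into the working directory
data_set = pd.read_csv('Data.csv')

# Display the dataset (in a notebook, the last expression is shown automatically)
data_set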

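2) Data Preparation

# Independent variables (features): every column except the last (Country, Age, Salary)
x = data_set.iloc[:, :-1].values

# Dependent variable (target): the fourth column (Purchased)
y = data_set.iloc[:, 3].values

print(x)
print(y)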

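3) Data Wrangling

Handling missing values

from sklearn.impute import SimpleImputer

# Replace missing values in the numeric columns (Age, Salary) with the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(x[:, 1:3])
x[:, 1:3] = imputer.transform(x[:, 1:3])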

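Encoding categorical data (Country)

from sklearn.preprocessing import LabelEncoder

# Convert country names into integer labels, assigned in alphabetical order
label_encoder_x = LabelEncoder()
x[:, 0] = label_encoder_x.fit_transform(x[:, 0])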

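One-hot encoding the Country column

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

# Integer labels would imply an order between countries, so replace column 0 with
# one binary (dummy) column per country and pass the other columns through unchanged
ct = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder='passthrough')
x = ct.fit_transform(x)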

Encoding the target variable (Purchased)

# Convert the No/Yes labels into 0/1
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)

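Splitting the dataset into training and test sets

from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

print(x_train, y_train)
print(x_test, y_test)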

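Feature scaling

from sklearn.preprocessing import StandardScaler

# Standardize features to zero mean and unit variance; fit on the training set only
# and reuse the same scaling for the test set so no test information leaks into training
st_x = StandardScaler()
X_train = st_x.fit_transform(x_train)
X_test = st_x.transform(x_test)

print(X_train)
print(X_test)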

Data Analysis

Choose Logistic Regression (the target, Purchased, is a Yes/No class label, so this is a classification problem)

Create and Train the Model

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Initialize the model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

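Test the model:

# New input in the same raw format as the original data: Country, Age, Salary (example values)
new_raw = np.array([['France', 40, 60000]], dtype=object)

# Apply the same transformations that were fitted on the training data
new_raw[:, 0] = label_encoder_x.transform(new_raw[:, 0])
new_data = st_x.transform(ct.transform(new_raw))

# Predict the class, then map 0/1 back to the original No/Yes label
predicted_class = model.predict(new_data)
predicted_label = labelencoder_y.inverse_transform(predicted_class)

# Predict the probability of each class
predicted_probability = model.predict_proba(new_data)

print(f"Predicted Class: {predicted_label[0]}")
print(f"Prediction Probabilities: {predicted_probability[0]}")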
