Pandas Introduction, Data Selection using pandas


pandas is a library built on top of NumPy. It provides data structures and functions for operating on, selecting, and organising data, much like running SQL queries against a database.

How to install pandas on your machine?

pip install pandas

conda install pandas

pandas provides two main types of objects:-

1)  Series:- A Series is a one-dimensional array that holds labeled data; every value has an index label.

It supports numerical operations much like a NumPy array, and each column of a DataFrame is a Series, so these operations are also how calculations on DataFrame columns are done.
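As a rough illustration of these NumPy-style operations, the sketch below applies arithmetic and aggregation to a small series. The values and the names fees, discounted, total, and average are made up for this example.

```python
import pandas as pd

# Hypothetical fee values, used only for illustration
fees = pd.Series([100, 200, 300], index=['a', 'b', 'c'])

# Arithmetic is applied element by element, as with a NumPy array
discounted = fees * 0.5

# Aggregations reduce the whole series to a single value
total = fees.sum()
average = fees.mean()

print(discounted)
print(total, average)
```

The index labels are carried over to the result, so discounted still has the labels a, b, and c.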

Example 1:-

Create a script to display a series without labels

import pandas as pd

import numpy as np

arr = np.array([12,23,34,56,89,11])

data = pd.Series(arr)

print(data)

Create a script to display a series with labels

import pandas as pd

import numpy as np

arr = np.array([12,23,34,56,89,11])

data = pd.Series(arr,index=['P','q','r','s','t','u'])

print(data)
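Once a series has labels, its values can be read by label as well as by position. A minimal sketch reusing the labels from the example above; the variable names by_label, by_position, and subset are only illustrative.

```python
import pandas as pd

data = pd.Series([12, 23, 34, 56, 89, 11],
                 index=['P', 'q', 'r', 's', 't', 'u'])

# Access a single value by its label
by_label = data['q']

# Access a single value by its integer position with iloc
by_position = data.iloc[0]

# Passing a list of labels returns a smaller series
subset = data[['P', 'u']]

print(by_label, by_position)
print(subset)
```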

Create a series from a list?

import pandas as pd

import numpy as np

arr = [12,23,34,56,89,11]

data = pd.Series(arr,index=['P','q','r','s','t','u'])

print(data)

Create a series from a dictionary?

A dictionary provides data as key => value pairs, so we can create a labeled series from a dictionary object; the keys become the index labels.

import pandas as pd

import numpy as np

arr = {'rno':1001,'sname':'jay kumar','branch':'cs','fees':45000}

data = pd.Series(arr)

print(data)

Create a series from scalar data:-

Scalar data means a single value such as an integer, a float, or a string. The scalar is repeated for every label in the index.

import pandas as pd

import numpy as np

data = pd.Series('SCS',index=['A','B','C','D','E'])

print(data)

Create a series object using NumPy predefined functions:-

import pandas as pd

import numpy as np

data = pd.Series(np.linspace(3,100,5))

print(data)

2)  DataFrame:-

It is a two-dimensional, table-like structure that holds data in rows and columns. It is mostly used to bring a dataset into an application.

Datasets are commonly distributed as CSV or Excel files, and once such a file is loaded it can be easily manipulated as a DataFrame object.
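To show what loading a CSV looks like without needing a file on disk, the sketch below reads CSV text from memory. The io.StringIO object stands in for a real file path, and the column names and values are invented for this example.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real script would pass a file path instead
csv_text = "Name,Team,Age\nA,X,25\nB,Y,30\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.dtypes)  # column types inferred from the text
```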

How to create Data Frame Objects:-

import pandas as pd

 # list of strings

lst = ['SCS', 'For', 'SCS', 'is', 'portal', 'for', 'SCS']

 # Calling DataFrame constructor on list

df = pd.DataFrame(lst)

print(df)

Creating DataFrame objects from a dictionary of lists, where each key becomes a column name and each list holds that column's values:-

import pandas as pd

 # initialise data of lists.

data = {'Name':['C', 'C++', 'DS', 'JAVA'],

        'Age':[20, 21, 19, 18]}

 # Create DataFrame

df = pd.DataFrame(data)

# Print the output.

print(df)

This part of the article explains the viewing functions of pandas.

head():-  this function returns the first n rows of a data frame or a series.

Syntax: Dataframe.head(n=5)

Parameters:
n: integer value, number of rows to be returned

Return type: Dataframe with top n rows

If we do not pass any parameter to head(), it returns the top 5 records.

import pandas as pd

# making data frame

data = pd.read_csv("nba.csv")

# calling head() method

# storing in new variable

data_top = data.head()

# display

data_top
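The example above depends on an nba.csv file. For readers who do not have that file, here is a small self-contained sketch of the same head() behaviour on a data frame built in memory; the id and score columns are invented for this example.

```python
import pandas as pd

# Hypothetical ten-row data frame
df = pd.DataFrame({'id': range(1, 11),
                   'score': range(10, 110, 10)})

# Without an argument, head() returns the first 5 rows
default_top = df.head()

# With n=3, only the first 3 rows are returned
top_three = df.head(3)

print(default_top)
print(top_three)
```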

How to get a series from a data frame column?

import pandas as pd

# making data frame

data = pd.read_csv("nba.csv")

# number of rows to return

n = 9

# creating series

series = data["Name"]

# returning top n rows

top = series.head(n = n)

# display

print(top)
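Whether a selection returns a series or a data frame depends on the brackets used. The sketch below, on a made-up data frame, shows the difference; as_series and as_frame are illustrative names.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [25, 30, 35]})

# Single brackets with one column name return a Series
as_series = df['Name']

# Double brackets (a list of column names) return a DataFrame
as_frame = df[['Name']]

print(type(as_series).__name__)
print(type(as_frame).__name__)
```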

Display rows from the bottom?

pandas provides tail() to display rows from the bottom. Like head(), it returns 5 rows by default.

# importing pandas module

import pandas as pd

# making data frame

data = pd.read_csv("nba.csv")

# calling tail() method

# storing in new variable

data_bottom = data.tail()

# display

data_bottom

# importing pandas module

import pandas as pd

# making data frame

data = pd.read_csv("nba.csv")

# number of rows to return

n = 2

# creating series

series = data["Name"]

# returning bottom n rows

bottom = series.tail(n = n)

# display

print(bottom)


Statistical operations using pandas?

If we want summary statistics such as count, mean, std, min, and max, we can use describe() in pandas.

Syntax: DataFrame.describe(percentiles=None, include=None, exclude=None)

Parameters:
percentiles: list-like of numbers between 0 and 1, the percentiles to include in the output
include: List of data types to be included while describing data frame. Default is None
exclude: List of data types to be Excluded while describing data frame. Default is None

Return type: Statistical summary of the data frame.
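A short sketch of describe() on a made-up data frame may help here. The column names and values are assumptions for this example, not taken from any real dataset.

```python
import pandas as pd

# Hypothetical data: one text column and one numeric column
df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                   'Age': [20, 30, 40, 50]})

# By default only numeric columns are summarised
summary = df.describe()

# Ask for the 10th and 90th percentiles instead of the default quartiles
custom = df.describe(percentiles=[0.1, 0.9])

# include='all' also summarises non-numeric columns
everything = df.describe(include='all')

print(summary)
print(custom)
print(everything)
```

The result is itself a data frame, so a single statistic can be read with, for example, summary.loc['mean', 'Age'].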


Projection or showing particular columns:-

It is used to show only particular columns of a data frame.

# Import pandas package

import pandas as pd

# Define a dictionary containing employee data

data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],

'Age':[27, 24, 22, 32],

'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],

'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame

df = pd.DataFrame(data)

# select two columns

print(df[['Name', 'Qualification']])

How to add a new column to the data frame?

# Import pandas package

import pandas as pd

# Define a dictionary containing Students data

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],

'Height': [5.1, 6.2, 5.1, 5.2],

'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into DataFrame

df = pd.DataFrame(data)

# Declare a list that is to be converted into a column

address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']

# Using 'Address' as the column name

# and equating it to the list

df['Address'] = address

# Observe the result

print(df)
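A new column does not have to come from a list. As a possible variation, the sketch below computes one column from existing columns and fills another with a single repeated value; the Item, Price, Qty, Total, and Currency names are invented for this example.

```python
import pandas as pd

# Hypothetical order data
df = pd.DataFrame({'Item': ['pen', 'book'],
                   'Price': [10, 50],
                   'Qty': [3, 2]})

# A column computed row by row from two existing columns
df['Total'] = df['Price'] * df['Qty']

# A scalar is repeated for every row
df['Currency'] = 'INR'

print(df)
```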

Select data based on row labels?

# importing pandas package

import pandas as pd

# making data frame from csv file

data = pd.read_csv("nba.csv", index_col ="Name")

# retrieving row by loc method

first = data.loc["Avery Bradley"]

second = data.loc["R.J. Hunter"]

print(first, "\n\n\n", second)
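The same kind of row selection can be tried without nba.csv. This sketch uses a small made-up data frame and also shows iloc, which selects by position rather than by label; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical data with Name used as the row index
df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav'],
                   'Age': [27, 24, 22]}).set_index('Name')

# One label returns that row as a Series
row = df.loc['Princi']

# A list of labels returns a DataFrame with those rows
two_rows = df.loc[['Jai', 'Gaurav']]

# iloc selects by integer position instead of label
first_row = df.iloc[0]

print(row)
print(two_rows)
print(first_row)
```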

How to add (concatenate) rows in pandas?

import pandas as pd
# making data frame (Name is kept as a regular column so it lines up with new_row)
df = pd.read_csv("nba.csv")

df.head(10)
# the new row is built as a one-row data frame
new_row = pd.DataFrame({'Name':'Geeks', 'Team':'Boston', 'Number':3,
                        'Position':'PG', 'Age':33, 'Height':'6-2',
                        'Weight':189, 'College':'MIT', 'Salary':99999},
                                                            index =[0])
# simply concatenate both dataframes
df = pd.concat([new_row, df]).reset_index(drop = True)
df.head(5)
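For a version that runs without the CSV file, here is a minimal sketch of the same idea on made-up data. It uses ignore_index=True, which renumbers the rows during the concatenation instead of calling reset_index afterwards.

```python
import pandas as pd

# Hypothetical existing data
df = pd.DataFrame({'Name': ['Jai', 'Princi'], 'Age': [27, 24]})

# The row to add, built as a one-row data frame with the same columns
new_row = pd.DataFrame({'Name': ['Anuj'], 'Age': [32]})

# Put the new row first and renumber the index from 0
combined = pd.concat([new_row, df], ignore_index=True)

print(combined)
```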


Deletion of rows in pandas:-

import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv", index_col ="Name" )
# dropping the rows with the passed index labels
data.drop(["Avery Bradley", "John Holland", "R.J. Hunter"], inplace = True)
  
# display
data
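The sketch below shows the same deletion on a small made-up data frame. It leaves out inplace, so drop() returns a new data frame and the original is unchanged, and it uses errors='ignore' to skip labels that do not exist.

```python
import pandas as pd

# Hypothetical data with Name used as the row index
df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav'],
                   'Age': [27, 24, 22]}).set_index('Name')

# Without inplace=True, drop() returns a new data frame
remaining = df.drop(['Princi'])

# errors='ignore' avoids a KeyError for a label that is not present
safe = df.drop(['Nobody'], errors='ignore')

print(remaining)
print(len(df), len(safe))
```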



