SCIPY in Data Science



SciPy is a Python library for advanced mathematical and scientific computing; it provides a large collection of predefined functions.

It is built on top of NumPy: NumPy provides the array type and the basic mathematical operations, while SciPy mostly focuses on linear algebra and other advanced mathematical functions.

SciPy is pronounced "Sigh Pie".

How do we install scipy in the machine?

If you want to install SciPy for a core Python installation or the Python shell, run either of the following commands in a terminal:

python -m pip install scipy

pip install scipy

Q) Create a script that writes a file in MATLAB format and loads it back using the scipy library.

MATLAB is a programming language specialized for scientific computing. If we want to convert application data to MATLAB format, or read MATLAB-format data into the application, we can use the io module of the scipy library, which provides the read and write operations.

1) savemat():-  It is used to write application data to a MATLAB-format file.

2) loadmat():-  It is used to read data from a MATLAB-format file into the application.

import numpy as np

from scipy import io as scs

array = np.zeros((4, 4))

scs.savemat('exm.mat', {'key': array}) 

data = scs.loadmat('exm.mat', struct_as_record=True)

print(data['key'])
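As a small sketch of the round trip described above, the following writes to a temporary file rather than the working directory (the temporary path and the variable name "key" are illustrative choices, not part of the original script). It also shows that loadmat() returns a dictionary that holds metadata entries such as __header__ next to the saved variables.

```python
import os
import tempfile

import numpy as np
from scipy import io as sio

array = np.zeros((4, 4))

# Write to a temporary directory so the example leaves no file behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "exm.mat")
    sio.savemat(path, {"key": array})
    data = sio.loadmat(path)
    # whosmat() lists the variables stored in the file without loading them.
    names = [name for name, shape, dtype in sio.whosmat(path)]

# loadmat() returns a dict: the saved variables plus metadata keys.
print(sorted(data.keys()))
print(names)
print(data["key"].shape)
```

Filtering out the keys that start with a double underscore is a common way to get only the user variables.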

Using scipy.io.wavfile we can also write data to and read data from WAV files:

import numpy as np

from scipy.io.wavfile import write

samplerate = 44100; fs = 100

t = np.linspace(0., 1., samplerate)

amplitude = np.iinfo(np.int16).max

data = amplitude * np.sin(2. * np.pi * fs * t)

write("d:\\example.wav", samplerate, data.astype(np.int16))

If you want to play and record audio files, visit this URL:-

https://xn--llions-yua.jutge.org/upc-python-cookbook/signal-processing/audio-image.html

How to read a WAV file

import numpy as np

from scipy.io.wavfile import read

d = read("d:\\example.wav")

print(d)
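read() returns a pair: the sample rate and the sample array. A minimal sketch of a write/read round trip, assuming a temporary file in place of the d:\ path used above (the names tone, rate and samples are illustrative):

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

samplerate = 44100  # samples per second
freq = 100          # tone frequency in Hz

# One second of a 100 Hz sine wave scaled to the int16 range.
t = np.linspace(0.0, 1.0, samplerate, endpoint=False)
amplitude = np.iinfo(np.int16).max
tone = (amplitude * np.sin(2.0 * np.pi * freq * t)).astype(np.int16)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.wav")
    wavfile.write(path, samplerate, tone)
    # Unpack the tuple instead of printing it as one object.
    rate, samples = wavfile.read(path)

print(rate, samples.dtype, samples.shape)
```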

How to read text files?

SciPy supports the ARFF file format, a plain-text format that describes a dataset together with its attributes.

from scipy.io import arff

from io import StringIO

data, meta = arff.loadarff("d://hello.txt")

print(data,meta)
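If no ARFF file is at hand, loadarff() also accepts a file-like object. The sketch below feeds it an in-memory string through StringIO; the relation name and the attributes (width, height, color) are made-up sample data.

```python
from io import StringIO

from scipy.io import arff

# A tiny ARFF document: a header with attribute declarations, then the data rows.
content = """
@relation foo
@attribute width  numeric
@attribute height numeric
@attribute color  {red,green,blue,yellow,black}
@data
5.0,3.25,blue
4.5,3.75,green
3.0,4.00,red
"""

data, meta = arff.loadarff(StringIO(content))

# data is a NumPy structured array; columns are accessed by attribute name.
print(data["width"])
print(meta.names())
```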

Note:- NumPy's text I/O functions handle CSV and TXT data, but SciPy can also write data in MATLAB format, which is useful for exchanging data with MATLAB-based systems.

SciPy's I/O package has a wide range of functions for working with different file formats, including MATLAB, ARFF, WAV, Matrix Market, IDL, NetCDF, TXT, CSV, and binary formats.

Assignment:-

Using scipy, read and write both a text file and a CSV file.

Special Function package:-

The scipy.special package provides many predefined functions for mathematical operations.

SciPy's special package includes cube root, exponential, log-sum-exponential, Lambert, permutations and combinations, gamma, Bessel, hypergeometric, Kelvin, beta, parabolic cylinder, relative error exponential, etc.

Cube root function in Python:-

scipy.special.cbrt(x)

Script to calculate a cube root using Python:-

from scipy.special import cbrt

#Find cubic root of 27 & 64 using cbrt() function

cb = cbrt([27, 64])

#print value of cb

print(cb)

Exponential function:-

from scipy.special import exp10

#call exp10() to compute 10**x for each value passed to it

exp = exp10([1,10])

print(exp)

Permutations & Combinations:

SciPy also gives functionality to calculate Permutations and Combinations.

Combinations - scipy.special.comb(N,k)

Combination with repetition:-

from scipy.special import comb

#find combinations of 5, 2 values using comb(N, k)

com = comb(5, 2, exact = False, repetition=True)

print(com) 

With repetition=True the count is C(n+r-1, r) = C(5+2-1, 2) = C(6, 2) = 15.

Combination without repetition:-

from scipy.special import comb

#find combinations of 5, 2 values using comb(N, k)

com = comb(5, 2, exact = False, repetition=False)

print(com)

If repetition is True, comb(N, k) counts combinations with repetition, which equals C(N+k-1, k); for N=5 and k=2 this gives 15 instead of 10.
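A quick check of the two variants, sketched with exact=True so that the results are plain integers (the variable names are illustrative):

```python
from scipy.special import comb

n, k = 5, 2

# Ordinary combinations: C(5, 2)
without_rep = comb(n, k, exact=True)

# Combinations with repetition: C(n + k - 1, k) = C(6, 2)
with_rep = comb(n, k, exact=True, repetition=True)
via_formula = comb(n + k - 1, k, exact=True)

print(without_rep, with_rep, via_formula)
```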

Permutations –

scipy.special.perm(N,k)

from scipy.special import perm

#find permutation of 5, 2 using perm (N, k) function

per = perm(5, 2, exact = True)

print(per)

Log Sum Exponential Function

Log Sum Exponential computes the log of the sum of the exponentials of the input elements.

Syntax :

scipy.special.logsumexp(x) 

from scipy.special import logsumexp

#call logsumexp() and pass the values to it

exp = logsumexp([2,3,4])

print(exp)
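The main reason to use logsumexp() over the direct formula is numerical stability. The sketch below compares it with a naive log(sum(exp(x))) and shows the naive version overflowing for large inputs; the input values are arbitrary illustrations.

```python
import numpy as np
from scipy.special import logsumexp

x = np.array([2.0, 3.0, 4.0])

# For small inputs the naive formula and logsumexp() agree.
naive = np.log(np.sum(np.exp(x)))
stable = logsumexp(x)

# For large inputs exp() overflows, so the naive formula returns inf,
# while logsumexp() still returns 1000 + log(2).
big = np.array([1000.0, 1000.0])
with np.errstate(over="ignore"):
    naive_big = np.log(np.sum(np.exp(big)))
stable_big = logsumexp(big)

print(naive, stable)
print(naive_big, stable_big)
```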

Bessel Function

jn(n, x) evaluates the Bessel function of the first kind of integer order n at x.

Syntax :

scipy.special.jn(n, x)

from scipy.special import jn

#evaluate the Bessel function of the first kind of order 2 at x = 3

exp = jn(2,3)

print(exp)

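The scipy.special module provides various predefined functions for Bessel functions of different kinds and orders:

Math      SciPy
Jν        jv
Yν        yv
Iν        iv
Kν        kv
Hν(1)     hankel1
Hν(2)     hankel2
jν        spherical_jn
yν        spherical_yn

(Older SciPy releases named the last two sph_jn and sph_yn; they were removed in SciPy 1.0 in favour of spherical_jn and spherical_yn.)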
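As a sanity check on these functions, the sketch below evaluates jv at a sample point and verifies the standard recurrence J(n-1, x) + J(n+1, x) = (2n/x) J(n, x). The order and the argument are arbitrary illustrative values.

```python
import numpy as np
from scipy.special import jn, jv

n, x = 2, 3.0

# jn is an alias of jv, so both give the same value.
same = np.isclose(jn(n, x), jv(n, x))

# Recurrence relation for Bessel functions of the first kind.
lhs = jv(n - 1, x) + jv(n + 1, x)
rhs = (2 * n / x) * jv(n, x)

print(same, lhs, rhs)
```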

...............................................................................................................

Linear Algebra with SciPy:-

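The linear algebra module of SciPy (scipy.linalg) is built on the BLAS and LAPACK libraries (for example, the ATLAS implementation).

Its routines are fast because the heavy computation is delegated to these optimized BLAS and LAPACK implementations.

Most linear algebra routines accept a two-dimensional array object, and the output is usually an array as well (det() returns a scalar).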

from scipy import linalg

import numpy as np

#define square matrix

two_d_array = np.array([ [4,5], [3,2] ])

#pass values to det() function and print the determinant

print(linalg.det(two_d_array))

How to solve a system of linear equations

SciPy provides solve() to solve a linear system of the form A x = B.

import numpy as np

from scipy import linalg

A = np.array([[1,3,5], [2,5,1],[2,3,8]])

B = np.array([[10], [8],[3]])

res=linalg.solve(A,B)

print(res)
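One way to confirm the result is to substitute it back into the system. The following sketch checks that A @ x reproduces B and compares solve() with the slower inverse-based approach; x, residual and x_inv are illustrative names.

```python
import numpy as np
from scipy import linalg

A = np.array([[1, 3, 5], [2, 5, 1], [2, 3, 8]])
B = np.array([[10], [8], [3]])

x = linalg.solve(A, B)

# Substituting x back should reproduce B up to rounding error.
residual = A @ x - B
ok = np.allclose(A @ x, B)

# Same answer via the explicit inverse; solve() is usually preferred
# because it avoids forming the inverse.
x_inv = linalg.inv(A) @ B

print(x.ravel())
print(ok, np.abs(residual).max())
```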

Discrete Fourier Transform – scipy.fftpack

The DFT is a mathematical technique that converts time-domain (or spatial) data into frequency-domain data.

FFT (Fast Fourier Transform) is an algorithm for computing the DFT.

The FFT can also be applied to multidimensional arrays.

Frequency is the number of cycles of a signal in a given time period.

Example: Take a wave and plot it using the Matplotlib library. We take a simple periodic function, sin(5 × 2πt), sampled at 50 Hz.

%matplotlib inline

from matplotlib import pyplot as plt

import numpy as np 

#Frequency in terms of Hertz

fre  = 5 

#Sample rate

fre_samp = 50

t = np.linspace(0, 2, 2 * fre_samp, endpoint = False )

a = np.sin(fre  * 2 * np.pi * t)

figure, axis = plt.subplots()

axis.plot(t, a)

axis.set_xlabel ('Time (s)')

axis.set_ylabel ('Signal amplitude')

plt.show()

As you can see, the frequency is 5 Hz, so the signal repeats every 1/5 second; this interval is called the period.

Now let us apply the DFT to this sinusoidal wave.

%matplotlib inline

from matplotlib import pyplot as plt

from scipy import fftpack

A = fftpack.fft(a)

frequency = fftpack.fftfreq(len(a)) * fre_samp

figure, axis = plt.subplots()

axis.stem(frequency, np.abs(A))

axis.set_xlabel('Frequency in Hz')

axis.set_ylabel('Frequency Spectrum Magnitude')

axis.set_xlim(-fre_samp / 2, fre_samp/ 2)

axis.set_ylim(-5, 110)

plt.show()
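The spectrum can also be inspected numerically rather than only in a plot. This sketch rebuilds the same 5 Hz signal and picks the positive frequency with the largest magnitude; the names positive and peak are illustrative.

```python
import numpy as np
from scipy import fftpack

fre = 5        # signal frequency in Hz
fre_samp = 50  # sample rate in Hz

t = np.linspace(0, 2, 2 * fre_samp, endpoint=False)
a = np.sin(fre * 2 * np.pi * t)

A = fftpack.fft(a)
# Passing the sample spacing d gives the frequencies directly in Hz.
frequency = fftpack.fftfreq(len(a), d=1.0 / fre_samp)

# Look only at positive frequencies and find the strongest component.
positive = frequency > 0
peak = frequency[positive][np.argmax(np.abs(A[positive]))]

print(peak)
```

The spectrum of a real signal is symmetric, which is why the plot above shows a second spike at the matching negative frequency.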

What is Sparse Data

Sparse data is data that has mostly unused elements (elements that don't carry any information ).

It can be an array like this one:

[1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 0]

Sparse Data: a data set where most of the item values are zero.

Dense Array: the opposite of a sparse array; most of the values are not zero.

In scientific computing, sparse data often arises, for example when dealing with partial derivatives in linear algebra.

How to Work With Sparse Data

SciPy has a module, scipy.sparse that provides functions to deal with sparse data.

There are primarily two types of sparse matrices that we use:

CSC - Compressed Sparse Column. For efficient arithmetic, fast column slicing.

CSR - Compressed Sparse Row. For fast row slicing, faster matrix-vector products

We will use the CSR matrix in this tutorial.

CSR Matrix

We can create a CSR matrix by passing an array into the function scipy.sparse.csr_matrix().

Example

Create a CSR matrix from an array:

import numpy as np

from scipy.sparse import csr_matrix

arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])

print(csr_matrix(arr))

The example above returns:

  (0, 5) 1

  (0, 6) 1

  (0, 8) 2

From the result, we can see that there are 3 items with a value.

The 1st item is in row 0, position 5, and has the value 1.

The 2nd item is in row 0, position 6, and has the value 1.

The 3rd item is in row 0, position 8, and has the value 2.
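The printed (row, column) value triples come from three internal arrays of the CSR format. A minimal sketch that inspects them for the same input array:

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])
mat = csr_matrix(arr)

# A 1-D input becomes a matrix with a single row.
print(mat.shape)

# data: the stored (non-zero) values
# indices: the column index of each stored value
# indptr: where each row starts and ends inside data/indices
print(mat.data)
print(mat.indices)
print(mat.indptr)
```

Here indptr is [0, 3], meaning row 0 owns the stored entries 0 to 2.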

Sparse Matrix Methods

Viewing stored data (not the zero items) with the data property:

Example

import numpy as np

from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

print(csr_matrix(arr).data)

Counting nonzeros with the count_nonzero() method:

Example

import numpy as np

from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

print(csr_matrix(arr).count_nonzero())

Removing zero-entries from the matrix with the eliminate_zeros() method:

Example

import numpy as np

from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

mat = csr_matrix(arr)

mat.eliminate_zeros()

print(mat)

Another example of removing zero-entries with eliminate_zeros():-

import numpy as np

from scipy.sparse import csr_matrix

M = csr_matrix(np.ones([2, 2],dtype=np.int32))

print(M)

print(M.data.shape)

for i in range(np.shape(M)[0]):

    for j in range(np.shape(M)[1]):

        if i==j:

            M[i,j] = 0

print(M)

print(M.data)

M.eliminate_zeros()

print(M.data)

Eliminating duplicate entries with the sum_duplicates() method:

Example

Eliminating duplicates by adding them:

import numpy as np

from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

mat = csr_matrix(arr)

mat.sum_duplicates()

print(mat)
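A matrix built from a dense array has no duplicate entries, so the effect is easier to see on a matrix constructed from coordinates. The sketch below uses the COO format with a repeated (row, column) pair; the coordinates and values are made up for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Position (0, 0) appears twice, with the values 1 and 2.
row = np.array([0, 0, 1])
col = np.array([0, 0, 2])
val = np.array([1, 2, 3])

mat = coo_matrix((val, (row, col)), shape=(2, 3))
stored_before = mat.data.size

# Merge entries that share the same coordinates by adding their values.
mat.sum_duplicates()
stored_after = mat.data.size

dense = mat.toarray()
print(stored_before, stored_after)
print(dense)
```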

Converting from csr to csc with the tocsc() method:

Example

import numpy as np

from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

newarr = csr_matrix(arr).tocsc()

print(newarr)

SciPy Graphs:-

Working with Graphs

Graphs are an essential non-linear data structure.

SciPy provides us with the module scipy.sparse.csgraph for working with such data structures.

Adjacency Matrix

An adjacency matrix is an n×n matrix where n is the number of elements in a graph.

And the values represent the connection between the elements.

Example:

For a graph like this, with elements A, B and C, the connections are:

A & B are connected with weight 1.

A & C are connected with weight 2.

C & B are not connected.

The adjacency matrix would look like this:

      A B C

   A:[0 1 2]  

   B:[1 0 0]

   C:[2 0 0]

Below are some of the most used methods for working with adjacency matrices.

Connected Components

Find all of the connected components with the connected_components() method.

Example

import numpy as np

from scipy.sparse.csgraph import connected_components

from scipy.sparse import csr_matrix

arr = np.array([

  [0, 1, 2],

  [1, 0, 0],

  [2, 0, 0]

])

newarr = csr_matrix(arr)

print(connected_components(newarr))
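connected_components() returns the number of components and an array with one label per element. The graph above is fully connected, so the sketch below uses a hypothetical four-element graph with two separate pairs to make the labels more informative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Elements 0 and 1 are connected, elements 2 and 3 are connected,
# and there is no edge between the two pairs.
arr = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 3],
    [0, 0, 3, 0],
])

n_components, labels = connected_components(csr_matrix(arr), directed=False)

print(n_components)
print(labels)
```

Elements that share a label belong to the same component.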

Dijkstra

Use the dijkstra() method to find the shortest path in a graph from one element to another.

It takes the following arguments:

return_predecessors: boolean (True to return the whole path of traversal otherwise False).

indices: index of the element to return all paths from that element only.

limit: max weight of path.

Example

Find the shortest path from element 1 to 2:

import numpy as np

from scipy.sparse.csgraph import dijkstra

from scipy.sparse import csr_matrix

arr = np.array([

  [0, 1, 2],

  [1, 0, 0],

  [2, 0, 0]

])

newarr = csr_matrix(arr)

print(dijkstra(newarr, return_predecessors=True, indices=0))
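The predecessor array can be walked backwards to recover the actual route. A minimal sketch, assuming zero-based indices 1 and 2 for the source and target (B and C in the adjacency matrix above); the variable names are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

arr = np.array([
    [0, 1, 2],
    [1, 0, 0],
    [2, 0, 0],
])
graph = csr_matrix(arr)

source, target = 1, 2
dist, pred = dijkstra(graph, return_predecessors=True, indices=source)

# Walk from the target back to the source through the predecessors.
# Unreachable targets have an infinite distance, so guard against them.
path = []
if np.isfinite(dist[target]):
    path = [target]
    while path[-1] != source:
        path.append(int(pred[path[-1]]))
    path.reverse()

print(dist[target], path)
```

Since B and C are not directly connected, the route goes through element 0 with a total weight of 1 + 2 = 3.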

Floyd Warshall

Use the floyd_warshall() method to find shortest path between all pairs of elements.

Example

Find the shortest path between all pairs of elements:

import numpy as np

from scipy.sparse.csgraph import floyd_warshall

from scipy.sparse import csr_matrix

arr = np.array([

  [0, 1, 2],

  [1, 0, 0],

  [2, 0, 0]

])

newarr = csr_matrix(arr)

print(floyd_warshall(newarr, return_predecessors=True))

Bellman Ford

The bellman_ford() method can also find the shortest path between all pairs of elements, but this method can handle negative weights as well.

Example

Find the shortest path from element 1 to 2 in a graph with a negative weight:

import numpy as np

from scipy.sparse.csgraph import bellman_ford

from scipy.sparse import csr_matrix

arr = np.array([

  [0, -1, 2],

  [1, 0, 0],

  [2, 0, 0]

])

newarr = csr_matrix(arr)

print(bellman_ford(newarr, return_predecessors=True, indices=0))
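Negative weights are fine, but a cycle whose total weight is negative has no shortest path, and bellman_ford() reports this by raising NegativeCycleError. The two-element graph below is a made-up example of such a cycle.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import NegativeCycleError, bellman_ford

# 0 -> 1 costs -1 and 1 -> 0 costs -1, so looping 0 -> 1 -> 0
# lowers the total cost by 2 on every round.
arr = np.array([
    [0, -1],
    [-1, 0],
])
graph = csr_matrix(arr)

try:
    bellman_ford(graph, indices=0)
    cycle_found = False
except NegativeCycleError:
    cycle_found = True

print(cycle_found)
```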

Depth First Order

The depth_first_order() method returns a depth first traversal from a node.

This function takes the following arguments:

the graph.

the starting element to traverse graph from.

Example

Traverse the graph depth first for given adjacency matrix:

import numpy as np

from scipy.sparse.csgraph import depth_first_order

from scipy.sparse import csr_matrix

arr = np.array([

  [0, 1, 0, 1],

  [1, 1, 1, 1],

  [2, 1, 1, 0],

  [0, 1, 0, 1]

])

newarr = csr_matrix(arr)

print(depth_first_order(newarr, 1))

Breadth First Order

The breadth_first_order() method returns a breadth first traversal from a node.

This function takes the following arguments:

the graph.

the starting element to traverse graph from.

Example

Traverse the graph breadth first for given adjacency matrix:

import numpy as np

from scipy.sparse.csgraph import breadth_first_order

from scipy.sparse import csr_matrix

arr = np.array([

  [0, 1, 0, 1],

  [1, 1, 1, 1],

  [2, 1, 1, 0],

  [0, 1, 0, 1]

])

newarr = csr_matrix(arr)

print(breadth_first_order(newarr, 1))
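Both traversal functions return two arrays: the elements in visiting order and the predecessor of each element. The sketch below runs both on a small hypothetical tree (edges 0-1, 0-2 and 1-3) so the two orders can be compared side by side.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import breadth_first_order, depth_first_order

# Symmetric adjacency matrix of the tree 0-1, 0-2, 1-3.
arr = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
])
graph = csr_matrix(arr)

# Each call returns (visiting order, predecessors).
bfs_nodes, bfs_pred = breadth_first_order(graph, 0)
dfs_nodes, dfs_pred = depth_first_order(graph, 0)

print("breadth first:", bfs_nodes, bfs_pred)
print("depth first:  ", dfs_nodes, dfs_pred)
```

Breadth first visits both neighbours of the start element before going deeper, while depth first follows one branch to its end before backtracking.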

Test Yourself With Exercises

Exercise:

Insert the missing method to find all the connected components:

import numpy as np

from scipy.sparse.csgraph import connected_components

from scipy.sparse import csr_matrix

arr = np.array([

  [0, 1, 2],

  [1, 0, 0],

  [2, 0, 0]

])

newarr = csr_matrix(arr)

print(____________(newarr))

Working with Spatial Data

Spatial data refers to data that is represented in a geometric space.

E.g. points on a coordinate system.

We deal with spatial data problems on many tasks.

E.g. finding if a point is inside a boundary or not.

SciPy provides us with the module scipy.spatial, which has functions for working with spatial data.

Triangulation

A triangulation of a polygon divides the polygon into multiple triangles, from which we can compute the area of the polygon.

A triangulation of points means creating a surface composed of triangles in which every given point is a vertex of at least one triangle.

One method to generate such a triangulation through points is the Delaunay triangulation, available as Delaunay().

Example

Create a triangulation from following points:

import numpy as np

from scipy.spatial import Delaunay

import matplotlib.pyplot as plt

points = np.array([

  [2, 4],

  [3, 4],

  [3, 0],

  [2, 2],

  [4, 1]

])

simplices = Delaunay(points).simplices

plt.triplot(points[:, 0], points[:, 1], simplices)

plt.scatter(points[:, 0], points[:, 1], color='r')

plt.show()
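The triangulation object can also answer the "is this point inside the boundary" question mentioned earlier. A minimal sketch using find_simplex(), which returns -1 for points outside the triangulated region; the two query points are arbitrary illustrations.

```python
import numpy as np
from scipy.spatial import Delaunay

points = np.array([
    [2, 4],
    [3, 4],
    [3, 0],
    [2, 2],
    [4, 1],
])
tri = Delaunay(points)

# One query point inside the triangulated region and one far outside it.
queries = np.array([[2.8, 2.0], [10.0, 10.0]])

# find_simplex() gives the index of the containing triangle, or -1 if none.
simplex_index = tri.find_simplex(queries)
inside = simplex_index >= 0

print(simplex_index)
print(inside)
```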


6 Comments


  1. # DATA SCIENCE ( 7 to 8 PM )
    # Program to Solve a System of Linear Equation
    # A matrix is usually shown by a capital letter.

    import numpy as np
    from scipy import linalg as la # import scipy.linalg as la

    A = np.array([[1,2,3],[0,4,5],[1,0,6]])
    B = np.array([4,-11,1])

    print("Solution for Linear Equation :-",la.solve(A,B))

  2. # DATA SCIENCE ( 7 to 8 PM )
    # Program for Dot product and matrix multiplication.
    import numpy as np
    import scipy.linalg as la # from scipy import linalg as la

    A = np.array([[1,2,3],[0,4,5],[1,0,6]])
    B = np.array([4,-11,1])

    Ainv = la.inv(A)
    Dot_AB = np.dot(Ainv,B)

    print("Dot product of A and B :- ",Dot_AB)

  3. # DATA SCIENCE ( 7 to 8 PM)

    # Program to Find λ(Eigenvalue) and v(Eigenvector).
    # Av = λv
    # Here A is square matrix ,v is Eigenvector and λ is Eigenvalue ,which make this(Av = λv) equation true:

    import numpy as np
    import scipy.linalg as la # from scipy import linalg as la

    A = np.array([[1,2,3],[0,4,5],[1,0,6]])

    λ,v = la.eig(A)
    print("λ is Eigenvalue :-",λ,"\n")
    print("v is Eigenvector :-\n",v)


  4. # DATA SCIENCE ( 7 to 8 PM )

    # Cayley-Hamilton Theorem (2x2)
    # The Cayley-Hamilton Theorem states that any square matrix satisfies its characteristic polynomial.
    # This means that for a matrix A = [[a, b], [c, d]], A^2 - (a + d)A + (ad - bc)I = O is true.
    # I: Identity matrix , O: Zero matrix

    import numpy as np
    import scipy.linalg as la # from scipy import linalg as la


    A = np.array([[1 ,2],[3, 4]])

    trace_A = np.trace(A) # Matrix Trace, the sum of the diagonal elements.
    det_A = la.det(A) # Determinant of a Matrix ((1*4)-(2*3) = -2)
    I = np.eye(2) # Identity Matrix , n×n square matrix with ones on the main diagonal and zeros elsewhere.

    print(A @ A - trace_A * A + det_A * I) # @ is an operator. It is named as __matmul__, designed to do matrix multiplication


  5. # Least squares is a standard approach to problems with more equations than unknowns,
    # also known as overdetermined systems.

    from scipy import linalg as la
    import numpy as np

    #Declaring the numpy arrays
    A = np.array([[3,2],[1,-1,],[4,-5]])
    B = np.array([[2,2,3,1],[4,5,6,7],[8,9,7,5]])

    #Passing the values to the lstsq function
    C,residuals,rank,s = la.lstsq(A,B)

    print("C is the solution :-\n",C)
    print("\nResiduals :-",residuals,"\nRank :-",rank,"\nSingular values :-",s)
    #Residuals: the sums of squared residuals; rank: the matrix rank of input A; s: the singular values of A.

  6. import numpy as np
    from scipy.sparse import csr_matrix
    x=np.array([[1,0,0],
    [2,4,0],
    [0,0,1]])
    Sparse_matrix=csr_matrix(x)
    print(Sparse_matrix)

    OUTPUT
    (0, 0) 1
    (1, 0) 2
    (1, 1) 4
    (2, 2) 1


    # Convert sparse matrix To dense matrix


    # Row to column transpose


    dense_matrix=Sparse_matrix.todense()
    print(dense_matrix) # dense matrix
    print( )
    print(dense_matrix.getT()) # column to row transpose


    OUTPUT
    [[1 0 0]
    [2 4 0]
    [0 0 1]]

    [[1 2 0]
    [0 4 0]
    [0 0 1]]
