# Notebook Instructions

1. All the <u>code and data files</u> used in this course are available in the downloadable unit of the <u>last section of this course</u>.
2. You can run the notebook document sequentially (one cell at a time) by pressing **shift + enter**. 
3. While a cell is running, a [*] is shown on the left. After the cell is run, the output will appear on the next line.

This course is based on specific versions of Python packages. You can find the details of the packages in <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank" >this manual</a>.

# NumPy

NumPy is short for "Numeric Python" or "Numerical Python".

NumPy is the fundamental package for scientific computing with Python. It is an open-source extension module for Python that provides, among other things:

1. A powerful N-dimensional array object
2. Sophisticated (broadcasting) functions
3. Useful linear algebra, Fourier transform, and random number capabilities
4. Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data 
5. Arbitrary data types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases

Source: numpy.org

# Notebook Contents

##### <span style="color:green">1. A simple numpy array example</span>
##### <span style="color:green">2. Functions to create an array</span>
##### <span style="color:green">3. Dimensionality of an array</span>
##### <span style="color:green">4. The shape of an array</span>
##### <span style="color:green">5. Just for fun</span>

## A simple numpy array example

We will create two arrays, 'SV' and 'S_V':
- 'SV' from a list
- 'S_V' from a tuple

In [1]:
# We will first import the 'numpy' module
import numpy as np

In [2]:
stock_values = [20.3, 25.3, 22.7, 19.0, 18.5,
                21.2, 24.5, 26.6, 23.2, 21.2]  # This is a list

In [3]:
# Converting a list into an array

SV = np.array(stock_values)

print(SV)

[20.3 25.3 22.7 19.  18.5 21.2 24.5 26.6 23.2 21.2]


In [4]:
type(SV)  # Checking the object type of 'SV'

numpy.ndarray

In [5]:
stockvalues = (20.3, 25.3, 22.7, 19.0, 18.5, 21.2, 24.5,
               26.6, 23.2, 21.2)  # This is a tuple

# Converting tuple into an array
S_V = np.array(stockvalues)
print(S_V)

[20.3 25.3 22.7 19.  18.5 21.2 24.5 26.6 23.2 21.2]


In [6]:
type(S_V)  # Checking the object type of 'S_V'

numpy.ndarray

## Functions to create arrays quickly 

The above-discussed methods to create arrays require us to manually input the data points. To automatically create data points for an array we use these functions: 
- **arange**
- **linspace**

Both functions create evenly spaced data points between a start value and a stop value. For example, we can create 50 data points lying between 1 and 10.


### arange

'numpy.arange' returns an array of evenly spaced values, using a step (interval) given by the user.

Syntax:
#### arange([start,] stop[, step], dtype=None)

'start' and 'stop' determine the range of the array; 'start' is included and 'stop' is excluded. 'step' determines the spacing between two adjacent values. The data type of the output array can be set with the 'dtype' parameter.

In [7]:
# If the start parameter is not given, it will be set to 0

# '10' is the stop parameter

# The default interval for a step is '1'

# If the 'dtype' is not given, then it will be automatically inferred from the other input arguments

a = np.arange(10)  # Equivalent to np.arange(0, 10, 1, None)
print(a)

[0 1 2 3 4 5 6 7 8 9]


In [8]:
# Here the range is '1 to 15'. It will include 1 and exclude 15

b = np.arange(1, 15)
print(b)

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14]


In [9]:
# We have changed the 'step' or spacing between two adjacent values, from a default 1, to a user given value of 2

c = np.arange(0, 21, 2)
print(c)

[ 0  2  4  6  8 10 12 14 16 18 20]


In [10]:
# Even though our input arguments are floats, the result is an 'int' array,
# because we have set the 'dtype' parameter to 'int'

d = np.arange(1.3, 23.3, 2.1, int)
print(d)

[ 1  3  5  7  9 11 13 15 17 19 21]
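
A word of caution on mixing float arguments with an integer 'dtype': the NumPy documentation warns that the resulting values can be surprising. The sketch below shows one more predictable alternative, assuming that truncating each float toward zero is what you want: build the float array first and convert it afterwards with 'astype'. The variable names are illustrative only.

```python
import numpy as np

# Build the float array first; a step of 0.5 is exactly representable
float_values = np.arange(1.0, 6.0, 0.5)

# Convert afterwards: astype(int) truncates each value toward zero
truncated = float_values.astype(int)

print(float_values)
print(truncated)

# For comparison: passing dtype=int directly (compare this with 'truncated')
direct = np.arange(1.0, 6.0, 0.5, dtype=int)
print(direct)
```

Comparing the last two printed arrays shows whether the two approaches agree on your NumPy version.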


### Try on your own

In [11]:
# You may now be able to understand this example, all by yourself

e = np.arange(1.4, 23.6, 1, float)
print(e)

[ 1.4  2.4  3.4  4.4  5.4  6.4  7.4  8.4  9.4 10.4 11.4 12.4 13.4 14.4
 15.4 16.4 17.4 18.4 19.4 20.4 21.4 22.4 23.4]


### linspace

'numpy.linspace' also returns an evenly spaced array, but it takes the number of array elements as an input from the user and computes the spacing automatically.

Syntax:
#### linspace(start, stop, num=50, endpoint=True, retstep=False)

'start' and 'stop' determine the range of the array. 'num' determines the number of elements in the array. If 'endpoint' is True, the array includes the stop value; if it is False, the array excludes the stop value.

If the optional parameter 'retstep' is set to True, the function returns a tuple containing the array and the spacing between adjacent values.

In [12]:
# By default, since the 'num' is not given, it will divide the range into 50 individual array elements

# By default, it even includes the 'endpoint' of the range, since it is set to True by default

a = np.linspace(1, 10)
print(a)

[ 1.          1.18367347  1.36734694  1.55102041  1.73469388  1.91836735
  2.10204082  2.28571429  2.46938776  2.65306122  2.83673469  3.02040816
  3.20408163  3.3877551   3.57142857  3.75510204  3.93877551  4.12244898
  4.30612245  4.48979592  4.67346939  4.85714286  5.04081633  5.2244898
  5.40816327  5.59183673  5.7755102   5.95918367  6.14285714  6.32653061
  6.51020408  6.69387755  6.87755102  7.06122449  7.24489796  7.42857143
  7.6122449   7.79591837  7.97959184  8.16326531  8.34693878  8.53061224
  8.71428571  8.89795918  9.08163265  9.26530612  9.44897959  9.63265306
  9.81632653 10.        ]


In [13]:
# This time around, we have specified that we want the range of 1 - 10 to be divided into 8 individual array elements

b = np.linspace(1, 10, 8)
print(b)

[ 1.          2.28571429  3.57142857  4.85714286  6.14285714  7.42857143
  8.71428571 10.        ]


In [14]:
# In this line, we have specified not to include the end point of the range

c = np.linspace(1, 10, 8, endpoint=False)
print(c)

[1.    2.125 3.25  4.375 5.5   6.625 7.75  8.875]


In [15]:
# Here we set 'retstep' to True, so the function returns a tuple: the array and the spacing between adjacent values

d = np.linspace(1, 10, 8, endpoint=True, retstep=True)
print(d)

(array([ 1.        ,  2.28571429,  3.57142857,  4.85714286,  6.14285714,
        7.42857143,  8.71428571, 10.        ]), 1.2857142857142858)


### Try on your own

In [16]:
# This line should be self-explanatory

e = np.linspace(1, 10, 10, endpoint=True, retstep=True)
print(e)

(array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]), 1.0)
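
Because 'retstep=True' makes linspace return a tuple, it is often convenient to unpack the array and the step into separate variables. The following is a minimal sketch of that pattern; the variable names are illustrative, not part of the course material.

```python
import numpy as np

# With retstep=True, linspace returns (array, step); unpack both
values, step = np.linspace(1, 10, 8, retstep=True)
print(values)
print("step:", step)

# With endpoint=False, the same range is split into 8 steps instead of 7,
# so the spacing is smaller and the stop value is left out
values_open, step_open = np.linspace(1, 10, 8, endpoint=False, retstep=True)
print(values_open)
print("step without endpoint:", step_open)
```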


## Dimensionality of arrays

### Zero dimensional arrays or scalars

What we encountered in the above examples are all 'one-dimensional arrays', also known as 'vectors'. 'Scalars' are zero-dimensional arrays, which hold exactly one element.

In [17]:
# Creating a 'scalar'

a = np.array(50)  # A zero-dimensional array holds exactly one element

print("a:", a)

a: 50


In [18]:
# To print the number of dimensions of any array, we use the 'np.ndim' function

print("The dimension of array 'a' is", np.ndim(a))

The dimension of array 'a' is 0


In [19]:
# To know the datatype of the array

print("The datatype of array 'a' is", a.dtype)

The datatype of array 'a' is int32


In [20]:
# Combining it all together

scalar_array = np.array("one_element")
print(scalar_array, np.ndim(scalar_array), scalar_array.dtype)

one_element 0 <U11
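
A zero-dimensional array is still an array object, not a plain Python number. As a small sketch, the attributes 'ndim', 'size' and 'shape' describe it, and the 'item' method pulls out the single element as an ordinary Python value. The variable names here are illustrative.

```python
import numpy as np

scalar = np.array(50)

print(scalar.ndim)    # number of dimensions
print(scalar.size)    # total number of elements
print(scalar.shape)   # empty tuple for a zero-dimensional array

# item() extracts the single element as a plain Python object
value = scalar.item()
print(type(value), value)
```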


## One-dimensional arrays

A one-dimensional array stores its elements in a single row. It can hold any number of elements.

In [21]:
one_d_array = np.array(["one_element", "second_element"])

print(one_d_array, np.ndim(one_d_array), one_d_array.dtype)

['one_element' 'second_element'] 1 <U14


In [22]:
# We have already worked with one-dimensional arrays. Let us revise what we did so far!

a = np.array([1, 1, 2, 3, 5, 8, 13, 21])  # Fibonacci series
b = np.array([4.4, 6.6, 8.8, 10.1, 12.12])

print("a: ", a)
print("b: ", b)

print("Type of 'a': ", a.dtype)
print("Type of 'b': ", b.dtype)

print("Dimension of 'a':", np.ndim(a))
print("Dimension of 'b':", np.ndim(b))

a:  [ 1  1  2  3  5  8 13 21]
b:  [ 4.4   6.6   8.8  10.1  12.12]
Type of 'a':  int32
Type of 'b':  float64
Dimension of 'a': 1
Dimension of 'b': 1
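
What makes an array zero-dimensional or one-dimensional is whether the input is wrapped in brackets, not how many elements it holds. The sketch below, with illustrative variable names, compares a bare value, a one-element list and an empty list.

```python
import numpy as np

zero_d = np.array(50)        # no brackets: zero-dimensional
one_element = np.array([50]) # one level of brackets: one-dimensional
empty = np.array([])         # an empty list still gives a one-dimensional array

for name, arr in [("zero_d", zero_d), ("one_element", one_element), ("empty", empty)]:
    print(name, "ndim:", arr.ndim, "shape:", arr.shape, "size:", arr.size)
```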


## Two-dimensional arrays

Two-dimensional arrays organise their elements into rows and columns.

In [23]:
# The elements of the 2D arrays are stored as 'rows' and 'columns'
two_d_array = np.array([["row1col1", "row1col2", "row1col3"],
                        ["row2col1", "row2col2", "row2col3"]])

print(two_d_array)

print("Dimension of 'two_d_array' :", np.ndim(two_d_array))

[['row1col1' 'row1col2' 'row1col3']
 ['row2col1' 'row2col2' 'row2col3']]
Dimension of 'two_d_array' : 2


In [24]:
# Another example of a data table!
# You can see how working with numpy arrays will help us work with dataframes later on!
studentdata = np.array([["Name", "Year", "Marks"],
                        ["Bela", 2014, 78.2],
                        ["Joe", 1987, 59.1],
                        ["Sugar", 1990, 70]])

print(studentdata)

print("Dimension of 'studentdata' :", np.ndim(studentdata))

[['Name' 'Year' 'Marks']
 ['Bela' '2014' '78.2']
 ['Joe' '1987' '59.1']
 ['Sugar' '1990' '70']]
Dimension of 'studentdata' : 2


Even though Year and Marks were entered as numbers, they are stored as strings here. A NumPy array holds elements of a single data type, so when numbers and strings are mixed, every element is converted to a string. As a result, we can't perform any mathematical operations on these values directly. In order to perform any calculations, we need to convert the data into integer or float type data.

That is where the dataframe, which we will study in the next section, is very useful. It is a powerful 2-D data structure that lets each column keep its own data type and helps us perform various operations.

For example:

In [25]:
# Example when we save this data as a dataframe and not as a numpy array.
import numpy as np
import pandas as pd

studentdata1 = {
    "Name": ["Bela", "Joe", "Sugar"],
    "Year": [2014, 1987, 1990],
    "Marks": [78.2, 59.1, 70]
}

studentdata1_df = pd.DataFrame(studentdata1)
print(studentdata1_df)
print(np.mean(studentdata1_df.Marks))

# Now we are able to find the average of the Marks of these three students.

    Name  Year  Marks
0   Bela  2014   78.2
1    Joe  1987   59.1
2  Sugar  1990   70.0
69.10000000000001
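
For comparison, the same calculation can be done in NumPy alone by converting the relevant column by hand. This is only a sketch: it assumes the table is already stored as strings (as in the printed 'studentdata' array), and it uses slicing, which is covered in a later notebook, to skip the header row.

```python
import numpy as np

# The table as NumPy stores it: every entry is a string
studentdata = np.array([["Name", "Year", "Marks"],
                        ["Bela", "2014", "78.2"],
                        ["Joe", "1987", "59.1"],
                        ["Sugar", "1990", "70"]])

# Skip the header row (row 0), pick a column, then convert its type
years = studentdata[1:, 1].astype(int)
marks = studentdata[1:, 2].astype(float)

print(years, years.dtype)
print(marks, marks.dtype)
print("Average marks:", marks.mean())
```

The manual conversion works, but it has to be repeated for every column, which is the bookkeeping a dataframe takes care of.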


In [26]:
# The elements of the 2D arrays are stored as 'rows' and 'columns'
a = np.array([[1.8, 2.4, 5.3, 8.2],
              [7.8, 5.1, 9.2, 17.13],
              [6.1, -2.13, -6.3, -9.1]])
print(a)
print("Dimension of 'a' :", np.ndim(a))

# In this array we have 3 rows and 4 columns

[[ 1.8   2.4   5.3   8.2 ]
 [ 7.8   5.1   9.2  17.13]
 [ 6.1  -2.13 -6.3  -9.1 ]]
Dimension of 'a' : 2


In [27]:
# A 3D array is an 'array of arrays'. Have a quick look at it
b = np.array([[[111, 222], [333, 444]],
              [[121, 212], [221, 222]],
              [[555, 560], [565, 570]]])

print(b)
print("Dimension of 'b' :", np.ndim(b))

# In this array, there are three 2-D arrays

[[[111 222]
  [333 444]]

 [[121 212]
  [221 222]]

 [[555 560]
  [565 570]]]
Dimension of 'b' : 3


## The shape of an array

**What it is:** The shape of an array is a tuple that gives the number of elements along each axis. For a 2-D array, that is the number of rows (axis = 0) and the number of columns (axis = 1).

**Why is it important to understand:** It tells you the number of rows and columns in an array.

**How is it different from Dimensions:** The dimension ('ndim') is the number of axes, whereas the shape gives the length of each axis. The length of the shape tuple is equal to the number of dimensions.
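
To make the relationship between shape and dimension concrete, here is a short sketch using a 3-D array like the one from the previous section. It also prints 'size', the total number of elements, which is an attribute not discussed above.

```python
import numpy as np

b = np.array([[[111, 222], [333, 444]],
              [[121, 212], [221, 222]],
              [[555, 560], [565, 570]]])

print("shape:", b.shape)   # length of each axis
print("ndim:", b.ndim)     # number of axes
print("size:", b.size)     # total number of elements

# ndim equals the length of the shape tuple,
# and size equals the product of the axis lengths
print(b.ndim == len(b.shape))
print(b.size == int(np.prod(b.shape)))
```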

In [28]:
a = np.array([[11, 22, 33],
              [12, 24, 36],
              [13, 26, 39],
              [14, 28, 42],
              [15, 30, 45],
              [16, 32, 48]])

print(a)

[[11 22 33]
 [12 24 36]
 [13 26 39]
 [14 28 42]
 [15 30 45]
 [16 32 48]]


In [29]:
print(a.shape)

(6, 3)


We can even change the shape of the array. 

In [30]:
a.shape = (9, 2)
print(a)

[[11 22]
 [33 12]
 [24 36]
 [13 26]
 [39 14]
 [28 42]
 [15 30]
 [45 16]
 [32 48]]


You might have guessed by now that the new shape must correspond to the number of elements of the array, i.e. the total size of the new array must be the same as the old one. NumPy will raise an exception (a 'ValueError') if this is not the case.
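
The size rule can be checked with a short sketch. It uses the 'reshape' method rather than assigning to 'shape'; this is a substitution on our part, chosen because 'reshape' leaves the original array's shape untouched. The array contents are illustrative.

```python
import numpy as np

a = np.arange(18).reshape(6, 3)   # 18 elements arranged as 6 rows x 3 columns

# reshape returns a reshaped array (a view where possible);
# the shape of 'a' itself is unchanged
b = a.reshape(9, 2)
print(a.shape, b.shape)

# -1 asks NumPy to infer that axis length from the total size
c = a.reshape(-1, 6)
print(c.shape)

# 5 x 4 = 20 elements does not match 18, so NumPy raises a ValueError
try:
    a.reshape(5, 4)
except ValueError as err:
    print("ValueError:", err)
```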

In [31]:
# Shape of a zero-dimensional array (scalar)
a = np.array(165416113)
print(np.shape(a))

()


In the upcoming IPython notebook, we will continue understanding arrays and learn about array indexing, array slicing, and arrays of zeros and ones. But before that, let us answer some multiple-choice questions.