# Notebook Instructions

1. All the <u>code and data files</u> used in this course are available in the downloadable unit of the <u>last section of this course</u>.
2. You can run the notebook document sequentially (one cell at a time) by pressing **shift + enter**. 
3. While a cell is running, a [*] is shown on the left. After the cell is run, the output will appear on the next line.

This course is based on specific versions of Python packages. You can find the details of the packages in <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank" >this manual</a>.

# Notebook Contents

##### <span style="color:green">1. Why are we studying series?</span>
##### <span style="color:green">2. Series data structure</span>
##### <span style="color:green">3. Methods or functions</span>
##### <span style="color:green">4. pandas.Series.apply()</span>

# Why are we studying series?

In Python, understanding Series is a natural predecessor to understanding dataframes.<br>
<br>
A series is an indexed data frame with only one data column. It is easier to understand series first before moving on to more complex data frames.


# Series 

A series is a one-dimensional, labelled, 'array-like' object. The labels are simply the index of the data. <br>
Or <br>
A series can be viewed as a special case of a two-dimensional table with only two columns: one column holds the index and the other holds the data.

In [1]:
import pandas as pd

# Series created using a list
My_Series_int = pd.Series([10, 20, 30, 40, 50, 60])

print(My_Series_int)

0    10
1    20
2    30
3    40
4    50
5    60
dtype: int64


The constructor for Series data structure is <font color=red>pandas.Series (data=None, index=None, dtype=None, name=None)</font>. If you are using 'pd' as alias, then it would be <font color=red>pd.Series()</font>
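
As a minimal sketch of the remaining constructor arguments, the example below passes `data`, `index`, `dtype` and `name` explicitly. The prices, labels and series name are hypothetical values chosen only for illustration.

```python
import pandas as pd

# Hypothetical prices, labels and name, used only to illustrate the arguments
prices = pd.Series(
    data=[100, 250, 300],
    index=['Alphabet', 'IBM', 'Tesla'],  # custom labels instead of 0, 1, 2
    dtype='float64',                     # store the integers as floats
    name='price',                        # a name attached to the series
)

print(prices)
print(prices.name)
```

Any argument that is left out keeps its default, which is why the earlier example needed only the list of values.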

In [2]:
import pandas

# Series created using a list
My_Series_flt = pandas.Series([10.1, 20.2, 30.4, 40.4, 50.5, 60.6])

print(My_Series_flt)

0    10.1
1    20.2
2    30.4
3    40.4
4    50.5
5    60.6
dtype: float64


You can see that it returns an indexed column and the data type of that column, which is 'float64' in this case (the first example returned 'int64').

A series can hold any data type, for example integers, floats, strings and so on. A single series can also contain multiple data types.

In [3]:
# Series created using a list
My_Series_mixed = pd.Series([10.1, 20, 'jay', 40.4])

print(My_Series_mixed)

0    10.1
1      20
2     jay
3    40.4
dtype: object


The series above has the 'object' data type because its values do not share a single numeric type, so pandas stores each value as a generic Python object.
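
One way to see this, sketched below with the same values as the example above, is to compare the dtype of the whole series with the Python type of each element. The variable names `mixed` and `element_types` are introduced here only for illustration.

```python
import pandas as pd

mixed = pd.Series([10.1, 20, 'jay', 40.4])

# The series as a whole reports the generic 'object' dtype
print(mixed.dtype)

# Each element keeps its own Python type inside the series
element_types = [type(value).__name__ for value in mixed]
print(element_types)
```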

Let us have a look at a few other ways of creating series objects.

In [4]:
# Defining series objects with individual indices

countries = ['India', 'USA', 'Japan', 'Russia', 'China']
leaders = ['Narendra Modi', 'Donald Trump',
           'Shinzo Abe', 'Vladimir Putin', 'Xi Jinping']

S = pd.Series(leaders, index=countries)  # Index is explicitly defined here
S

India      Narendra Modi
USA         Donald Trump
Japan         Shinzo Abe
Russia    Vladimir Putin
China         Xi Jinping
dtype: object
dtype: object

In [5]:
# Have a look at the series S1

stocks_set1 = ['Alphabet', 'IBM', 'Tesla', 'Infosys']

# The data is passed to the Series constructor as a list, and the index is passed as the pre-defined list above
S1 = pd.Series([100, 250, 300, 500], index=stocks_set1)

print(S1)
print("\n")

# Now, have a look at the series S2

stocks_set2 = ['Alphabet', 'IBM', 'Tesla', 'Infosys']

# Again, the data is passed as a list and the index as the pre-defined list above

S2 = pd.Series([500, 400, 110, 700], index=stocks_set2)

print(S2)
print("\n")

# We will add Series S1 and S2

print(S1 + S2)

Alphabet    100
IBM         250
Tesla       300
Infosys     500
dtype: int64


Alphabet    500
IBM         400
Tesla       110
Infosys     700
dtype: int64


Alphabet     600
IBM          650
Tesla        410
Infosys     1200
dtype: int64


In [6]:
# Adding series whose indexes differ creates 'NaN' values for the labels that do not match

stocks_set1 = ['Alphabet', 'IBM', 'Tesla', 'Infosys']
stocks_set2 = ['Alphabet', 'Facebook', 'Tesla', 'Infosys']

S3 = pd.Series([100, 250, 300, 500], index=stocks_set1)
S4 = pd.Series([500, 700, 110, 700], index=stocks_set2)


print(S3)
print("\n")

print(S4)
print("\n")

print(S3+S4)

Alphabet    100
IBM         250
Tesla       300
Infosys     500
dtype: int64


Alphabet    500
Facebook    700
Tesla       110
Infosys     700
dtype: int64


Alphabet     600.0
Facebook       NaN
IBM            NaN
Infosys     1200.0
Tesla        410.0
dtype: float64


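'NaN' is short for 'Not a Number'. It marks missing or corrupt data. In the output above, 'Facebook' and 'IBM' each appear in only one of the two series, so their sum is 'NaN'. The result is also converted to 'float64' because 'NaN' is a floating-point value.<br>
It is important to understand how to deal with NaN values because when you import actual time series data, you are bound to find some missing or corrupted data.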
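
If a label missing from one series should count as zero rather than produce 'NaN', one option beyond what this notebook covers is `Series.add()` with its `fill_value` argument. The sketch below reuses the S3 and S4 values from the example above; treating a missing stock as zero is an assumption made only for illustration.

```python
import pandas as pd

S3 = pd.Series([100, 250, 300, 500],
               index=['Alphabet', 'IBM', 'Tesla', 'Infosys'])
S4 = pd.Series([500, 700, 110, 700],
               index=['Alphabet', 'Facebook', 'Tesla', 'Infosys'])

# A label found in only one series is treated as 0 in the other,
# so the sum contains no 'NaN' values
total = S3.add(S4, fill_value=0)

print(total)
```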

## Methods or functions

We will have a look at a few important attributes and methods of a Series.

##### <span style="color:black">Series.index</span>
It is useful to know the range of the index when the series is large.

In [7]:
My_Series = pd.Series([10, 20, 30, 40, 50])

print(My_Series.index)

RangeIndex(start=0, stop=5, step=1)


##### <span style="color:black">Series.values</span>
It returns the values of the series.

In [8]:
My_Series = pd.Series([10, 20, 30, 40, 50])

print(My_Series.values)

[10 20 30 40 50]


##### <span style="color:black">Series.isnull()</span>
We can check for missing values with this method.

In [9]:
# Remember the (S3 + S4) series? You may have a look at it

print(S3 + S4)

Alphabet     600.0
Facebook       NaN
IBM            NaN
Infosys     1200.0
Tesla        410.0
dtype: float64


In [10]:
# Returns whether each value is null. If it is 'True', then the value for that index is a 'NaN' value

(S3 + S4).isnull()

Alphabet    False
Facebook     True
IBM          True
Infosys     False
Tesla       False
dtype: bool
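
Because `isnull()` returns a boolean series, it can also be used to count or locate the missing values. The sketch below assumes the same S3 and S4 as above; the names `total`, `mask`, `missing_count` and `missing_labels` are introduced here only for illustration.

```python
import pandas as pd

S3 = pd.Series([100, 250, 300, 500],
               index=['Alphabet', 'IBM', 'Tesla', 'Infosys'])
S4 = pd.Series([500, 700, 110, 700],
               index=['Alphabet', 'Facebook', 'Tesla', 'Infosys'])

total = S3 + S4
mask = total.isnull()

# True counts as 1, so the sum is the number of missing values
missing_count = int(mask.sum())

# Boolean indexing keeps only the rows where the mask is True
missing_labels = total[mask].index.tolist()

print(missing_count)
print(missing_labels)
```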

##### <span style="color:black">Series.dropna()</span>
One way to deal with the 'NaN' values is to drop them completely from the series. This method filters out missing data.

In [11]:
print((S3 + S4).dropna())

Alphabet     600.0
Infosys     1200.0
Tesla        410.0
dtype: float64


The output above shows the sum (S3 + S4) with the 'NaN' entries dropped, so only the labels present in both series remain.

##### <span style="color:black">Series.fillna(1)</span>
Another way to deal with the 'NaN' values is to fill a custom value of your choice. Here, we are filling the 'NaN' values with the value '1'. 

In [12]:
print((S3 + S4).fillna(1))  # The output is self-explanatory in this case

Alphabet     600.0
Facebook       1.0
IBM            1.0
Infosys     1200.0
Tesla        410.0
dtype: float64
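
The fill value does not have to be a constant. As a hedged sketch, the example below fills the gaps with the mean of the values that are present (about 736.67 here). Whether the mean is a sensible replacement depends on the data, so treat this as an illustration rather than a recommendation.

```python
import pandas as pd

S3 = pd.Series([100, 250, 300, 500],
               index=['Alphabet', 'IBM', 'Tesla', 'Infosys'])
S4 = pd.Series([500, 700, 110, 700],
               index=['Alphabet', 'Facebook', 'Tesla', 'Infosys'])

total = S3 + S4

# mean() skips 'NaN' values by default, so it averages 600, 1200 and 410
filled = total.fillna(total.mean())

print(filled)
```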


## pandas.Series.apply()

To apply a function to every value of a series, for example to compute the sine of each value, use the apply() method.
<br>
<b>Series.apply (func)</b>
<br>
func = A Python function that will be applied to every single value of the series.

In [13]:
import numpy as np

# Create a new series named My_Series
My_Series = pd.Series([10, 20, 30, 40, 50, 60])

print(My_Series)

0    10
1    20
2    30
3    40
4    50
5    60
dtype: int64


In [14]:
My_Series.apply(np.sin)  # Find 'sine' of each value in the series (values are treated as radians)

0   -0.544021
1    0.912945
2   -0.988032
3    0.745113
4   -0.262375
5   -0.304811
dtype: float64

In [15]:
My_Series.apply(np.tan)  # Finding 'tan' of each value in the series

0    0.648361
1    2.237161
2   -6.405331
3   -1.117215
4   -0.271901
5    0.320040
dtype: float64
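
`apply()` is not limited to NumPy functions. As a minimal sketch, the example below passes a user-defined function and a lambda. The `classify` helper and its threshold of 30 are hypothetical, chosen only for illustration.

```python
import pandas as pd

My_Series = pd.Series([10, 20, 30, 40, 50, 60])


def classify(value):
    # Hypothetical rule: label values above 30 as 'high', the rest as 'low'
    return 'high' if value > 30 else 'low'


# A named function is called once for each value in the series
labels = My_Series.apply(classify)

# A lambda works the same way for short, one-off operations
squares = My_Series.apply(lambda x: x ** 2)

print(labels)
print(squares)
```

In both cases the result is a new series with the same index as the original.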

In the upcoming Jupyter notebook, we will understand <b>DataFrames</b>, but before that, let us solve some quizzes and a couple of exercises.<br><br>