Introduction

Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool built on top of Python. It is best known for the extensive set of features it provides:
  • Selecting and filtering rows and columns
  • Proper handling of missing data
  • Applying operations across rows and columns
  • Merging data sets together
  • Grouping and applying aggregation functions
Pandas offers two main classes, Series and DataFrame, which are 1D and 2D labeled arrays, respectively, capable of holding data of any type (strings, floats, Python objects, etc.). A pandas Series can be thought of as 1-dimensional data in which each value is associated with an index (label). The DataFrame can be thought of as the extension of Series to 2D, where each value has a corresponding column name and index. Simply put, a DataFrame is a table similar to an Excel spreadsheet.
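
As a rough illustration of the two classes described above, the sketch below builds one Series and one DataFrame from made-up values. The variable names, labels, and data are all hypothetical and chosen only for this example.

```python
import pandas as pd

# A Series: one label per value. Mixing Python objects gives dtype "object".
mixed_series = pd.Series(['text', 1.5, {'key': 'value'}], index=['x', 'y', 'z'])

# A DataFrame: every value has both a row label (index) and a column name.
small_table = pd.DataFrame({'name': ['Ann', 'Bob'], 'score': [9.5, 7.0]},
                           index=['row1', 'row2'])

print(mixed_series.dtype)
print(small_table.shape)
print(small_table.loc['row2', 'score'])  # look up a cell by row label and column name
```

The lookup with loc is what makes the spreadsheet analogy useful: a cell is addressed by its row label and column name rather than by position.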

Series

import numpy as np
import pandas as pd
tempArr = np.arange(0, 6)
tempSeries = pd.Series(tempArr)
tempSeries
0    0
1    1
2    2
3    3
4    4
5    5
dtype: int64

The indices of a pandas Series can be specified when defining the Series:

tempIndices = list('abcdef')
tempSeries = pd.Series(tempArr, index=tempIndices)
tempSeries.head(3)
a    0
b    1
c    2
dtype: int64
tempSeries.tail()
b    1
c    2
d    3
e    4
f    5
dtype: int64

Note: head(n) shows the first n rows of a Series or DataFrame, where the default value of n is 5. Similarly, tail(n) shows the last n rows, again defaulting to 5.
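
The effect of the n argument can be checked with a small sketch. It rebuilds a six-element Series like the one above under the illustrative name demo_series, which is not part of the original example.

```python
import numpy as np
import pandas as pd

# Six values labelled 'a' to 'f'; demo_series is a name used only in this sketch.
demo_series = pd.Series(np.arange(0, 6), index=list('abcdef'))

first_two = demo_series.head(2)   # rows labelled 'a' and 'b'
last_two = demo_series.tail(2)    # rows labelled 'e' and 'f'
default_head = demo_series.head() # n defaults to 5, so one row is left out

print(first_two.index.tolist())
print(last_two.index.tolist())
print(len(default_head))
```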

The indices can also be assigned by modifying the index attribute:

tempSeries.index = list('fedcba')
tempSeries
f    0
e    1
d    2
c    3
b    4
a    5
dtype: int64
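
Reassigning the index changes only the labels; the values stay in their original positions. The sketch below illustrates this, and also shows that pandas rejects a new index whose length does not match the data. The name demo_series and the labels 'x', 'y', 'z' are assumptions made for this example.

```python
import numpy as np
import pandas as pd

demo_series = pd.Series(np.arange(0, 6), index=list('abcdef'))
value_before = demo_series['a']    # label 'a' points at the first value, 0

# Relabel in place: the values keep their order, only the labels change.
demo_series.index = list('fedcba')
value_after = demo_series['a']     # label 'a' now points at the last value, 5

# A new index must have exactly one label per value.
try:
    demo_series.index = list('xyz')
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(value_before, value_after, mismatch_rejected)
```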

DataFrame

tempArrEven = np.arange(0, 10, 2)
tempArrOdd = np.arange(1, 11, 2)
tempDataFrame = pd.DataFrame(np.array([tempArrEven, tempArrOdd]).T)
tempDataFrame
   0  1
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

Note: A DataFrame is always 2-dimensional, even when it holds a single column, whereas a Series is always 1-dimensional. Compare the shapes in the following example:

print(pd.DataFrame(tempArr).shape)
print(pd.Series(tempArr).shape)
(6, 1)
(6,)
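
The same distinction shows up when moving between the two classes. As a minimal sketch (the names single_column_frame, column, and round_trip are illustrative), selecting one column of a DataFrame yields a Series, and to_frame() turns a Series back into a one-column DataFrame:

```python
import numpy as np
import pandas as pd

single_column_frame = pd.DataFrame(np.arange(0, 6))

# Selecting the only column (named 0 by default) drops the second dimension.
column = single_column_frame[0]

# to_frame() restores a 2-dimensional object with a single column.
round_trip = column.to_frame()

print(single_column_frame.ndim, single_column_frame.shape)
print(column.ndim, column.shape)
print(round_trip.ndim, round_trip.shape)
```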

Column names and indices can be assigned to DataFrames in the same way as for Series:

tempDataFrame = pd.DataFrame(data=np.array([tempArrEven, tempArrOdd]).T, 
                             columns=['even_numbers', 'odd_numbers'], 
                             index=list('abcde'))
tempDataFrame
   even_numbers  odd_numbers
a             0            1
b             2            3
c             4            5
d             6            7
e             8            9
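
As with a Series, the labels can also be set after construction by assigning to the columns and index attributes. The following is a sketch under that assumption; demo_frame is an illustrative name, and the data match the even and odd arrays used above.

```python
import numpy as np
import pandas as pd

# Build the frame with default integer labels first.
demo_frame = pd.DataFrame(np.array([np.arange(0, 10, 2), np.arange(1, 11, 2)]).T)

# Then relabel both axes by assigning to the attributes.
demo_frame.columns = ['even_numbers', 'odd_numbers']
demo_frame.index = list('abcde')

# Cells can now be addressed by row label and column name.
cell = demo_frame.loc['c', 'odd_numbers']
print(demo_frame.columns.tolist())
print(cell)
```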