Pandas

Pandas is a newer package built on top of NumPy, and provides an efficient implementation of a DataFrame. DataFrames are essentially multidimensional arrays with attached row and column labels, and often with heterogeneous types and/or missing data. As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.

At the very basic level, Pandas objects can be thought of as enhanced versions of NumPy structured arrays in which the rows and columns are identified with labels rather than simple integer indices. As we will see during the course of this chapter, Pandas provides a host of useful tools, methods, and functionality on top of the basic data structures, but nearly everything that follows will require an understanding of what these structures are. Thus, before we go any further, let’s introduce these three fundamental Pandas data structures: the Series, DataFrame, and Index.

Series

The first main data type we will learn about for pandas is the Series data type. Let’s import Pandas and explore the Series object. A Series is very similar to a NumPy array (in fact it is built on top of the NumPy array object). What differentiates the NumPy array from a Series, is that a Series can have axis labels, meaning it can be indexed by a label, instead of just a number location. It also doesn’t need to hold numeric data, it can hold any arbitrary Python Object.

Let’s explore this concept through some examples:

import numpy as np
import pandas as pd

You can convert a list,numpy array, or dictionary to a Series:

labels = ['a','b','c']
my_list = [10,20,30]
arr = np.array([10,20,30])
d = {'a':10,'b':20,'c':30}

** Using Lists**

pd.Series(data=my_list)
0    10
1    20
2    30
dtype: int64
pd.Series(data=my_list,index=labels)
a    10
b    20
c    30
dtype: int64
pd.Series(my_list,labels)
a    10
b    20
c    30
dtype: int64

** NumPy Arrays **

pd.Series(arr)
0    10
1    20
2    30
dtype: int64
pd.Series(arr,labels)
a    10
b    20
c    30
dtype: int64

** Dictionary**

pd.Series(d)
a    10
b    20
c    30
dtype: int64

Data in a Series

A pandas Series can hold a variety of object types:

pd.Series(data=labels)
0    a
1    b
2    c
dtype: object
# Even functions (although unlikely that you will use this)
pd.Series([sum,print,len])
0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object

Using an Index

The key to using a Series is understanding its index. Pandas makes use of these index names or numbers by allowing for fast look ups of information (works like a hash table or dictionary).

Let’s see some examples of how to grab information from a Series. Let us create two sereis, ser1 and ser2:

ser1 = pd.Series([1,2,3,4],index = ['USA', 'Germany','USSR', 'Japan'])                                   
ser1
USA        1
Germany    2
USSR       3
Japan      4
dtype: int64
ser2 = pd.Series([1,2,5,4],index = ['USA', 'Germany','Italy', 'Japan'])                                   
ser2
USA        1
Germany    2
Italy      5
Japan      4
dtype: int64
ser1['USA']
1

Operations are then also done based off of index:

ser1 + ser2
Germany    4.0
Italy      NaN
Japan      8.0
USA        2.0
USSR       NaN
dtype: float64

Let’s stop here for now and move on to DataFrames, which will expand on the concept of Series!

DataFrames

DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same index. Let’s use pandas to explore this topic!

import pandas as pd
import numpy as np
from numpy.random import randn
np.random.seed(101)
df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
df
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
C-2.0181680.7401220.528813-0.589001
D0.188695-0.758872-0.9332370.955057
E0.1907941.9787572.6059670.683509

Selection and Indexing

Let’s learn the various methods to grab data from a DataFrame

df['W']
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64
# Pass a list of column names
df[['W','Z']]
WZ
A2.7068500.503826
B0.6511180.605965
C-2.018168-0.589001
D0.1886950.955057
E0.1907940.683509
# SQL Syntax (NOT RECOMMENDED!)
df.W
A    2.706850
B    0.651118
C   -2.018168
D    0.188695
E    0.190794
Name: W, dtype: float64

DataFrame Columns are just Series

type(df['W'])
pandas.core.series.Series

Creating a new column

df['new'] = df['W'] + df['Y']
df
WXYZnew
A2.7068500.6281330.9079690.5038263.614819
B0.651118-0.319318-0.8480770.605965-0.196959
C-2.0181680.7401220.528813-0.589001-1.489355
D0.188695-0.758872-0.9332370.955057-0.744542
E0.1907941.9787572.6059670.6835092.796762

Removing Columns

df.drop('new',axis=1)
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
C-2.0181680.7401220.528813-0.589001
D0.188695-0.758872-0.9332370.955057
E0.1907941.9787572.6059670.683509
# Not inplace unless specified!
df
WXYZnew
A2.7068500.6281330.9079690.5038263.614819
B0.651118-0.319318-0.8480770.605965-0.196959
C-2.0181680.7401220.528813-0.589001-1.489355
D0.188695-0.758872-0.9332370.955057-0.744542
E0.1907941.9787572.6059670.6835092.796762
df.drop('new',axis=1,inplace=True)
df
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
C-2.0181680.7401220.528813-0.589001
D0.188695-0.758872-0.9332370.955057
E0.1907941.9787572.6059670.683509

Can also drop rows this way:

df.drop('E',axis=0)
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
C-2.0181680.7401220.528813-0.589001
D0.188695-0.758872-0.9332370.955057

Selecting Rows

df.loc['A']
W    2.706850
X    0.628133
Y    0.907969
Z    0.503826
Name: A, dtype: float64

Or select based off of position instead of label

df.iloc[2]
W   -2.018168
X    0.740122
Y    0.528813
Z   -0.589001
Name: C, dtype: float64

Selecting subset of rows and columns

df.loc['B','Y']
-0.84807698340363147
df.loc[['A','B'],['W','Y']]
WY
A2.7068500.907969
B0.651118-0.848077

Conditional Selection

An important feature of pandas is conditional selection using bracket notation, very similar to numpy:

df
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
C-2.0181680.7401220.528813-0.589001
D0.188695-0.758872-0.9332370.955057
E0.1907941.9787572.6059670.683509
df>0
WXYZ
ATrueTrueTrueTrue
BTrueFalseFalseTrue
CFalseTrueTrueFalse
DTrueFalseFalseTrue
ETrueTrueTrueTrue
df[df>0]
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118NaNNaN0.605965
CNaN0.7401220.528813NaN
D0.188695NaNNaN0.955057
E0.1907941.9787572.6059670.683509
df[df['W']>0]
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
D0.188695-0.758872-0.9332370.955057
E0.1907941.9787572.6059670.683509
df[df['W']>0]['Y']
A    0.907969
B   -0.848077
D   -0.933237
E    2.605967
Name: Y, dtype: float64
df[df['W']>0][['Y','X']]
YX
A0.9079690.628133
B-0.848077-0.319318
D-0.933237-0.758872
E2.6059671.978757

For two conditions you can use | and & with parenthesis:

df[(df['W']>0) & (df['Y'] > 1)]
WXYZ
E0.1907941.9787572.6059670.683509

More Index Details

Let’s discuss some more features of indexing, including resetting the index or setting it something else. We’ll also talk about index hierarchy!

df
WXYZ
A2.7068500.6281330.9079690.503826
B0.651118-0.319318-0.8480770.605965
C-2.0181680.7401220.528813-0.589001
D0.188695-0.758872-0.9332370.955057
E0.1907941.9787572.6059670.683509
# Reset to default 0,1...n index
df.reset_index()
indexWXYZ
0A2.7068500.6281330.9079690.503826
1B0.651118-0.319318-0.8480770.605965
2C-2.0181680.7401220.528813-0.589001
3D0.188695-0.758872-0.9332370.955057
4E0.1907941.9787572.6059670.683509
newind = 'CA NY WY OR CO'.split()
df['States'] = newind
df
WXYZStates
A2.7068500.6281330.9079690.503826CA
B0.651118-0.319318-0.8480770.605965NY
C-2.0181680.7401220.528813-0.589001WY
D0.188695-0.758872-0.9332370.955057OR
E0.1907941.9787572.6059670.683509CO
df.set_index('States')
WXYZ
States
CA2.7068500.6281330.9079690.503826
NY0.651118-0.319318-0.8480770.605965
WY-2.0181680.7401220.528813-0.589001
OR0.188695-0.758872-0.9332370.955057
CO0.1907941.9787572.6059670.683509
df
WXYZStates
A2.7068500.6281330.9079690.503826CA
B0.651118-0.319318-0.8480770.605965NY
C-2.0181680.7401220.528813-0.589001WY
D0.188695-0.758872-0.9332370.955057OR
E0.1907941.9787572.6059670.683509CO
df.set_index('States',inplace=True)
df
WXYZ
States
CA2.7068500.6281330.9079690.503826
NY0.651118-0.319318-0.8480770.605965
WY-2.0181680.7401220.528813-0.589001
OR0.188695-0.758872-0.9332370.955057
CO0.1907941.9787572.6059670.683509

Multi-Index and Index Hierarchy

Let us go over how to work with Multi-Index, first we’ll create a quick example of what a Multi-Indexed DataFrame would look like:

# Index Levels
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
hier_index = pd.MultiIndex.from_tuples(hier_index)
hier_index
MultiIndex(levels=[['G1', 'G2'], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])
df = pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])
df
AB
G110.1536610.167638
2-0.7659300.962299
30.902826-0.537909
G21-1.5496710.435253
21.259904-0.447898
30.2662070.412580

Now let’s show how to index this! For index hierarchy we use df.loc[], if this was on the columns axis, you would just use normal bracket notation df[]. Calling one level of the index returns the sub-dataframe:

df.loc['G1']
AB
10.1536610.167638
2-0.7659300.962299
30.902826-0.537909
df.loc['G1'].loc[1]
A    0.153661
B    0.167638
Name: 1, dtype: float64
df.index.names
FrozenList([None, None])
df.index.names = ['Group','Num']
df
AB
GroupNum
G110.1536610.167638
2-0.7659300.962299
30.902826-0.537909
G21-1.5496710.435253
21.259904-0.447898
30.2662070.412580
df.xs('G1')
AB
Num
10.1536610.167638
2-0.7659300.962299
30.902826-0.537909
df.xs(['G1',1])
A    0.153661
B    0.167638
Name: (G1, 1), dtype: float64
df.xs(1,level='Num')
AB
Group
G10.1536610.167638
G2-1.5496710.435253

Missing Data

Let’s show a few convenient methods to deal with Missing Data in pandas:

import numpy as np
import pandas as pd
df = pd.DataFrame({'A':[1,2,np.nan],
                  'B':[5,np.nan,np.nan],
                  'C':[1,2,3]})
df
ABC
01.05.01
12.0NaN2
2NaNNaN3
df.dropna()
ABC
01.05.01
df.dropna(axis=1)
C
01
12
23
df.dropna(thresh=2)
ABC
01.05.01
12.0NaN2
df.fillna(value='FILL VALUE')
ABC
0151
12FILL VALUE2
2FILL VALUEFILL VALUE3
df['A'].fillna(value=df['A'].mean())
0    1.0
1    2.0
2    1.5
Name: A, dtype: float64

Groupby

The groupby method allows you to group rows of data together and call aggregate functions

import pandas as pd
# Create dataframe
data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
       'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
       'Sales':[200,120,340,124,243,350]}
df = pd.DataFrame(data)
df
CompanyPersonSales
0GOOGSam200
1GOOGCharlie120
2MSFTAmy340
3MSFTVanessa124
4FBCarl243
5FBSarah350

Now you can use the .groupby() method to group rows together based off of a column name. For instance let’s group based off of Company. This will create a DataFrameGroupBy object:

df.groupby('Company')

You can save this object as a new variable:

by_comp = df.groupby("Company")

And then call aggregate methods off the object:

by_comp.mean()
Sales
Company
FB296.5
GOOG160.0
MSFT232.0
df.groupby('Company').mean()
Sales
Company
FB296.5
GOOG160.0
MSFT232.0

More examples of aggregate methods:

by_comp.std()
Sales
Company
FB75.660426
GOOG56.568542
MSFT152.735065
by_comp.min()
PersonSales
Company
FBCarl243
GOOGCharlie120
MSFTAmy124
by_comp.max()
PersonSales
Company
FBSarah350
GOOGSam200
MSFTVanessa340
by_comp.count()
PersonSales
Company
FB22
GOOG22
MSFT22
by_comp.describe()
Sales
Company
FBcount2.000000
mean296.500000
std75.660426
min243.000000
25%269.750000
50%296.500000
75%323.250000
max350.000000
GOOGcount2.000000
mean160.000000
std56.568542
min120.000000
25%140.000000
50%160.000000
75%180.000000
max200.000000
MSFTcount2.000000
mean232.000000
std152.735065
min124.000000
25%178.000000
50%232.000000
75%286.000000
max340.000000
by_comp.describe().transpose()
CompanyFBGOOGMSFT
countmeanstdmin25%50%75%maxcountmean...75%maxcountmeanstdmin25%50%75%max
Sales2.0296.575.660426243.0269.75296.5323.25350.02.0160.0...180.0200.02.0232.0152.735065124.0178.0232.0286.0340.0

1 rows × 24 columns

by_comp.describe().transpose()['GOOG']
countmeanstdmin25%50%75%max
Sales2.0160.056.568542120.0140.0160.0180.0200.0

Merging, Joining, and Concatenating

There are 3 main ways of combining DataFrames together: Merging, Joining and Concatenating. In this lecture we will discuss these 3 methods with examples.

Example DataFrames

import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3'],
                        'C': ['C0', 'C1', 'C2', 'C3'],
                        'D': ['D0', 'D1', 'D2', 'D3']},
                        index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                        'B': ['B4', 'B5', 'B6', 'B7'],
                        'C': ['C4', 'C5', 'C6', 'C7'],
                        'D': ['D4', 'D5', 'D6', 'D7']},
                         index=[4, 5, 6, 7]) 
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                        'B': ['B8', 'B9', 'B10', 'B11'],
                        'C': ['C8', 'C9', 'C10', 'C11'],
                        'D': ['D8', 'D9', 'D10', 'D11']},
                        index=[8, 9, 10, 11])
df1
ABCD
0A0B0C0D0
1A1B1C1D1
2A2B2C2D2
3A3B3C3D3
df2
ABCD
4A4B4C4D4
5A5B5C5D5
6A6B6C6D6
7A7B7C7D7
df3
ABCD
8A8B8C8D8
9A9B9C9D9
10A10B10C10D10
11A11B11C11D11

Concatenation

Concatenation basically glues together DataFrames. Keep in mind that dimensions should match along the axis you are concatenating on. You can use pd.concat and pass in a list of DataFrames to concatenate together:

pd.concat([df1,df2,df3])
ABCD
0A0B0C0D0
1A1B1C1D1
2A2B2C2D2
3A3B3C3D3
4A4B4C4D4
5A5B5C5D5
6A6B6C6D6
7A7B7C7D7
8A8B8C8D8
9A9B9C9D9
10A10B10C10D10
11A11B11C11D11
pd.concat([df1,df2,df3],axis=1)
ABCDABCDABCD
0A0B0C0D0NaNNaNNaNNaNNaNNaNNaNNaN
1A1B1C1D1NaNNaNNaNNaNNaNNaNNaNNaN
2A2B2C2D2NaNNaNNaNNaNNaNNaNNaNNaN
3A3B3C3D3NaNNaNNaNNaNNaNNaNNaNNaN
4NaNNaNNaNNaNA4B4C4D4NaNNaNNaNNaN
5NaNNaNNaNNaNA5B5C5D5NaNNaNNaNNaN
6NaNNaNNaNNaNA6B6C6D6NaNNaNNaNNaN
7NaNNaNNaNNaNA7B7C7D7NaNNaNNaNNaN
8NaNNaNNaNNaNNaNNaNNaNNaNA8B8C8D8
9NaNNaNNaNNaNNaNNaNNaNNaNA9B9C9D9
10NaNNaNNaNNaNNaNNaNNaNNaNA10B10C10D10
11NaNNaNNaNNaNNaNNaNNaNNaNA11B11C11D11

Example DataFrames

left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
   
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                          'C': ['C0', 'C1', 'C2', 'C3'],
                          'D': ['D0', 'D1', 'D2', 'D3']})    
left
ABkey
0A0B0K0
1A1B1K1
2A2B2K2
3A3B3K3
right
CDkey
0C0D0K0
1C1D1K1
2C2D2K2
3C3D3K3

Merging

The merge function allows you to merge DataFrames together using a similar logic as merging SQL Tables together. For example:

pd.merge(left,right,how='inner',on='key')
ABkeyCD
0A0B0K0C0D0
1A1B1K1C1D1
2A2B2K2C2D2
3A3B3K3C3D3

Or to show a more complicated example:

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                        'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3']})
    
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                               'key2': ['K0', 'K0', 'K0', 'K0'],
                                  'C': ['C0', 'C1', 'C2', 'C3'],
                                  'D': ['D0', 'D1', 'D2', 'D3']})
pd.merge(left, right, on=['key1', 'key2'])
ABkey1key2CD
0A0B0K0K0C0D0
1A2B2K1K0C1D1
2A2B2K1K0C2D2
pd.merge(left, right, how='outer', on=['key1', 'key2'])
ABkey1key2CD
0A0B0K0K0C0D0
1A1B1K0K1NaNNaN
2A2B2K1K0C1D1
3A2B2K1K0C2D2
4A3B3K2K1NaNNaN
5NaNNaNK2K0C3D3
pd.merge(left, right, how='right', on=['key1', 'key2'])
ABkey1key2CD
0A0B0K0K0C0D0
1A2B2K1K0C1D1
2A2B2K1K0C2D2
3NaNNaNK2K0C3D3
pd.merge(left, right, how='left', on=['key1', 'key2'])
ABkey1key2CD
0A0B0K0K0C0D0
1A1B1K0K1NaNNaN
2A2B2K1K0C1D1
3A2B2K1K0C2D2
4A3B3K2K1NaNNaN

Joining

Joining is a convenient method for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame.

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                      index=['K0', 'K1', 'K2']) 

right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                    'D': ['D0', 'D2', 'D3']},
                      index=['K0', 'K2', 'K3'])
left.join(right)
ABCD
K0A0B0C0D0
K1A1B1NaNNaN
K2A2B2C2D2
left.join(right, how='outer')
ABCD
K0A0B0C0D0
K1A1B1NaNNaN
K2A2B2C2D2
K3NaNNaNC3D3

Operations

There are lots of operations with pandas that will be really useful to you, but don’t fall into any distinct category. Let’s show them here in this lecture:

import pandas as pd
df = pd.DataFrame({'col1':[1,2,3,4],'col2':[444,555,666,444],'col3':['abc','def','ghi','xyz']})
df.head()
col1col2col3
01444abc
12555def
23666ghi
34444xyz

Info on Unique Values

df['col2'].unique()
array([444, 555, 666])
df['col2'].nunique()
3
df['col2'].value_counts()
444    2
555    1
666    1
Name: col2, dtype: int64

Selecting Data

#Select from DataFrame using criteria from multiple columns
newdf = df[(df['col1']>2) & (df['col2']==444)]
newdf
col1col2col3
34444xyz

Applying Functions

def times2(x):
    return x*2
df['col1'].apply(times2)
0    2
1    4
2    6
3    8
Name: col1, dtype: int64
df['col3'].apply(len)
0    3
1    3
2    3
3    3
Name: col3, dtype: int64
df['col1'].sum()
10

Permanently Removing a Column

del df['col1']
df
col2col3
0444abc
1555def
2666ghi
3444xyz

Get column and index names

df.columns
Index(['col2', 'col3'], dtype='object')
df.index
RangeIndex(start=0, stop=4, step=1)

Sorting and Ordering a DataFrame

df
col2col3
0444abc
1555def
2666ghi
3444xyz
df.sort_values(by='col2') #inplace=False by default
col2col3
0444abc
3444xyz
1555def
2666ghi

Find Null Values or Check for Null Values

df.isnull()
col2col3
0FalseFalse
1FalseFalse
2FalseFalse
3FalseFalse
# Drop rows with NaN Values
df.dropna()
col2col3
0444abc
1555def
2666ghi
3444xyz

Filling in NaN values with something else

import numpy as np
df = pd.DataFrame({'col1':[1,2,3,np.nan],
                   'col2':[np.nan,555,666,444],
                   'col3':['abc','def','ghi','xyz']})
df.head()
col1col2col3
01.0NaNabc
12.0555.0def
23.0666.0ghi
3NaN444.0xyz
df.fillna('FILL')
col1col2col3
01FILLabc
12555def
23666ghi
3FILL444xyz
data = {'A':['foo','foo','foo','bar','bar','bar'],
     'B':['one','one','two','two','one','one'],
       'C':['x','y','x','y','x','y'],
       'D':[1,3,2,5,4,1]}

df = pd.DataFrame(data)
df
ABCD
0fooonex1
1foooney3
2footwox2
3bartwoy5
4baronex4
5baroney1
df.pivot_table(values='D',index=['A', 'B'],columns=['C'])
Cxy
AB
barone4.01.0
twoNaN5.0
fooone1.03.0
two2.0NaN

CSV Data

CSV Input

df = pd.read_csv('example')
df
abcd
00123
14567
2891011
312131415

CSV Output

df.to_csv('example',index=False)

Excel Data

Pandas can read and write excel files, keep in mind, this only imports data. Not formulas or images, having images or macros may cause this read_excel method to crash.

Excel Input

pd.read_excel('Excel_Sample.xlsx',sheetname='Sheet1')
abcd
00123
14567
2891011
312131415

Excel Output

df.to_excel('Excel_Sample.xlsx',sheet_name='Sheet1')

HTML Data

You may need to install htmllib5,lxml, and BeautifulSoup4. In your terminal/command prompt run:

conda install lxml
conda install html5lib
conda install BeautifulSoup4

Then restart Jupyter Notebook. (or use pip install if you aren’t using the Anaconda Distribution)

Pandas can read table tabs off of html. For example:

HTML Input

Pandas read_html function will read tables off of a webpage and return a list of DataFrame objects:

df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')
df[0]
Bank NameCitySTCERTAcquiring InstitutionClosing DateUpdated DateLoss Share TypeAgreement TerminatedTermination Date
0First CornerStone BankKing of PrussiaPA35312First-Citizens Bank & Trust CompanyMay 6, 2016July 12, 2016noneNaNNaN
1Trust Company BankMemphisTN9956The Bank of Fayette CountyApril 29, 2016August 4, 2016noneNaNNaN
2North Milwaukee State BankMilwaukeeWI20364First-Citizens Bank & Trust CompanyMarch 11, 2016June 16, 2016noneNaNNaN
3Hometown National BankLongviewWA35156Twin City BankOctober 2, 2015April 13, 2016noneNaNNaN
4The Bank of GeorgiaPeachtree CityGA35259Fidelity BankOctober 2, 2015April 13, 2016noneNaNNaN
5Premier BankDenverCO34112United Fidelity Bank, fsbJuly 10, 2015July 12, 2016noneNaNNaN
6Edgebrook BankChicagoIL57772Republic Bank of ChicagoMay 8, 2015July 12, 2016noneNaNNaN
7Doral BankEn EspanolSan JuanPR32102Banco Popular de Puerto RicoFebruary 27, 2015May 13, 2015noneNaNNaN
8Capitol City Bank & Trust CompanyAtlantaGA33938First-Citizens Bank & Trust CompanyFebruary 13, 2015April 21, 2015noneNaNNaN
9Highland Community BankChicagoIL20290United Fidelity Bank, fsbJanuary 23, 2015April 21, 2015noneNaNNaN
10First National Bank of CrestviewCrestviewFL17557First NBC BankJanuary 16, 2015January 15, 2016noneNaNNaN
11Northern Star BankMankatoMN34983BankVistaDecember 19, 2014January 6, 2016noneNaNNaN
12Frontier Bank, FSB D/B/A El Paseo BankPalm DesertCA34738Bank of Southern California, N.A.November 7, 2014January 6, 2016noneNaNNaN
13The National Republic Bank of ChicagoChicagoIL916State Bank of TexasOctober 24, 2014January 6, 2016noneNaNNaN
14NBRS FinancialRising SunMD4862Howard BankOctober 17, 2014March 26, 2015noneNaNNaN
15GreenChoice Bank, fsbChicagoIL28462Providence Bank, LLCJuly 25, 2014July 28, 2015noneNaNNaN
16Eastside Commercial BankConyersGA58125Community & Southern BankJuly 18, 2014July 11, 2016noneNaNNaN
17The Freedom State BankFreedomOK12483Alva State Bank & Trust CompanyJune 27, 2014March 25, 2016noneNaNNaN
18Valley BankFort LauderdaleFL21793Landmark Bank, National AssociationJune 20, 2014June 29, 2015noneNaNNaN
19Valley BankMolineIL10450Great Southern BankJune 20, 2014June 26, 2015noneNaNNaN
20Slavie Federal Savings BankBel AirMD32368Bay Bank, FSBMay 30, 2014June 15, 2015noneNaNNaN
21Columbia Savings BankCincinnatiOH32284United Fidelity Bank, fsbMay 23, 2014May 28, 2015noneNaNNaN
22AztecAmerica Bank En EspanolBerwynIL57866Republic Bank of ChicagoMay 16, 2014July 18, 2014noneNaNNaN
23Allendale County BankFairfaxSC15062Palmetto State BankApril 25, 2014July 18, 2014noneNaNNaN
24Vantage Point BankHorshamPA58531First Choice BankFebruary 28, 2014March 3, 2015noneNaNNaN
25Millennium Bank, National AssociationSterlingVA35096WashingtonFirst BankFebruary 28, 2014March 03, 2015noneNaNNaN
26Syringa BankBoiseID34296Sunwest BankJanuary 31, 2014April 12, 2016noneNaNNaN
27The Bank of UnionEl RenoOK17967BancFirstJanuary 24, 2014March 25, 2016noneNaNNaN
28DuPage National BankWest ChicagoIL5732Republic Bank of ChicagoJanuary 17, 2014February 19, 2015noneNaNNaN
29Texas Community Bank, National AssociationThe WoodlandsTX57431Spirit of Texas Bank, SSBDecember 13, 2013December 29, 2014noneNaNNaN
.................................
515ANB Financial, NABentonvilleAR33901Pulaski Bank and Trust CompanyMay 9, 2008August 28, 2012noneNaNNaN
516Hume BankHumeMO1971Security BankMarch 7, 2008August 28, 2012noneNaNNaN
517Douglass National BankKansas CityMO24660Liberty Bank and Trust CompanyJanuary 25, 2008October 26, 2012noneNaNNaN
518Miami Valley BankLakeviewOH16848The Citizens Banking CompanyOctober 4, 2007August 28, 2012noneNaNNaN
519NetBankAlpharettaGA32575ING DIRECTSeptember 28, 2007August 28, 2012noneNaNNaN
520Metropolitan Savings BankPittsburghPA35353Allegheny Valley Bank of PittsburghFebruary 2, 2007October 27, 2010noneNaNNaN
521Bank of EphraimEphraimUT1249Far West BankJune 25, 2004April 9, 2008noneNaNNaN
522Reliance BankWhite PlainsNY26778Union State BankMarch 19, 2004April 9, 2008noneNaNNaN
523Guaranty National Bank of TallahasseeTallahasseeFL26838Hancock Bank of FloridaMarch 12, 2004June 5, 2012noneNaNNaN
524Dollar Savings BankNewarkNJ31330No AcquirerFebruary 14, 2004April 9, 2008noneNaNNaN
525Pulaski Savings BankPhiladelphiaPA27203Earthstar BankNovember 14, 2003July 22, 2005noneNaNNaN
526First National Bank of BlanchardvilleBlanchardvilleWI11639The Park BankMay 9, 2003June 5, 2012noneNaNNaN
527Southern Pacific BankTorranceCA27094Beal BankFebruary 7, 2003October 20, 2008noneNaNNaN
528Farmers Bank of CheneyvilleCheneyvilleLA16445Sabine State Bank & TrustDecember 17, 2002October 20, 2004noneNaNNaN
529Bank of AlamoAlamoTN9961No AcquirerNovember 8, 2002March 18, 2005noneNaNNaN
530AmTrade International BankEn EspanolAtlantaGA33784No AcquirerSeptember 30, 2002September 11, 2006noneNaNNaN
531Universal Federal Savings BankChicagoIL29355Chicago Community BankJune 27, 2002April 9, 2008noneNaNNaN
532Connecticut Bank of CommerceStamfordCT19183Hudson United BankJune 26, 2002February 14, 2012noneNaNNaN
533New Century BankShelby TownshipMI34979No AcquirerMarch 28, 2002March 18, 2005noneNaNNaN
534Net 1st National BankBoca RatonFL26652Bank Leumi USAMarch 1, 2002April 9, 2008noneNaNNaN
535NextBank, NAPhoenixAZ22314No AcquirerFebruary 7, 2002February 5, 2015noneNaNNaN
536Oakwood Deposit Bank Co.OakwoodOH8966The State Bank & Trust CompanyFebruary 1, 2002October 25, 2012noneNaNNaN
537Bank of Sierra BlancaSierra BlancaTX22002The Security State Bank of PecosJanuary 18, 2002November 6, 2003noneNaNNaN
538Hamilton Bank, NAEn EspanolMiamiFL24382Israel Discount Bank of New YorkJanuary 11, 2002September 21, 2015noneNaNNaN
539Sinclair National BankGravetteAR34248Delta Trust & BankSeptember 7, 2001February 10, 2004noneNaNNaN
540Superior Bank, FSBHinsdaleIL32646Superior Federal, FSBJuly 27, 2001August 19, 2014noneNaNNaN
541Malta National BankMaltaOH6629North Valley BankMay 3, 2001November 18, 2002noneNaNNaN
542First Alliance Bank & Trust Co.ManchesterNH34264Southern New Hampshire Bank & TrustFebruary 2, 2001February 18, 2003noneNaNNaN
543National State Bank of MetropolisMetropolisIL3815Banterra Bank of MarionDecember 14, 2000March 17, 2005noneNaNNaN
544Bank of HonoluluHonoluluHI21029Bank of the OrientOctober 13, 2000March 17, 2005noneNaNNaN

545 rows × 10 columns

SQL Data

  • Note: If you are completely unfamiliar with SQL you can check out my other course: “Complete SQL Bootcamp” to learn SQL.

The pandas.io.sql module provides a collection of query wrappers to both facilitate data retrieval and to reduce dependency on DB-specific API. Database abstraction is provided by SQLAlchemy if installed. In addition you will need a driver library for your database. Examples of such drivers are psycopg2 for PostgreSQL or pymysql for MySQL. For SQLite this is included in Python’s standard library by default. You can find an overview of supported drivers for each SQL dialect in the SQLAlchemy docs.

If SQLAlchemy is not installed, a fallback is only provided for sqlite (and for mysql for backwards compatibility, but this is deprecated and will be removed in a future version). This mode requires a Python database adapter which respect the Python DB-API.

See also some cookbook examples for some advanced strategies.

The key functions are:

  • read_sql_table(table_name, con[, schema, …])
    • Read SQL database table into a DataFrame.
  • read_sql_query(sql, con[, index_col, …])
    • Read SQL query into a DataFrame.
  • read_sql(sql, con[, index_col, …])
    • Read SQL query or database table into a DataFrame.
  • DataFrame.to_sql(name, con[, flavor, …])
    • Write records stored in a DataFrame to a SQL database.
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
df.to_sql('data', engine)
sql_df = pd.read_sql('data',con=engine)
sql_df
indexabcd
000123
114567
22891011
3312131415
Greydon Gilmore
Greydon Gilmore
Ph.D. Candidate in Biomedical Engineering

My research interests include deep brain stimulation, machine learning and signal processing.

Previous
Next