Pandas

Understand python group by

From Tutorials Point on groupby


# import the pandas library
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
   'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
   'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
   'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
   'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

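print(df.groupby('Team'))         # a DataFrameGroupBy object
print(df.groupby('Team').groups)  # dict: group key -> index labels of its rows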

1. hash table {}
2. {key: value, key:value}
3. [a,b,c,d]: Tuple or list of values
4. {key:[a,b,c,d], key2:[x,y,z,a]}
5. A table of columns!

{'Kings': Int64Index([4, 6, 7],      dtype='int64'),
'Devils': Int64Index([2, 3],         dtype='int64'),
'Riders': Int64Index([0, 1, 8, 11],  dtype='int64'),
'Royals': Int64Index([9, 10],        dtype='int64'),
'kings' : Int64Index([5],            dtype='int64')}

Lists: [a,b,c] #Changeable
Tuples: (a,b,c) #Unchangeable
Sets: {a,b,c} #unordered, unindexed
Dictionaries: {a:b, c:d} #insertion-ordered since Python 3.7, changeable, indexed by key

Group -> list of rows from the original data frame
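
To connect the `groups` dictionary above with the idea of a group as a set of rows, here is a minimal sketch. The small frame is a stand-in in the style of `ipl_data`, and the variable names `labels` and `rows` are only illustrative.

```python
import pandas as pd

# Small stand-in frame in the style of ipl_data.
df = pd.DataFrame({
    "Team": ["Riders", "Devils", "Riders", "Kings"],
    "Points": [876, 863, 789, 741],
})

grouped = df.groupby("Team")

# .groups maps each key to the index labels of its rows.
labels = grouped.groups["Riders"]

# Looking those labels up in the original frame recovers the group.
rows = df.loc[labels]
print(rows)
```

The same rows come back from `grouped.get_group("Riders")`, so the dictionary is best read as a lookup from key to row labels.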

grouped = df.groupby('Year')

for name, group in grouped:
    print(name)
    print(group)

2014
   Points  Rank     Team   Year
0     876     1   Riders   2014
2     863     2   Devils   2014
4     741     3   Kings    2014
9     701     4   Royals   2014

2015
   Points  Rank     Team   Year
1     789     2   Riders   2015
3     673     3   Devils   2015
5     812     4    kings   2015
10    804     1   Royals   2015

2016
   Points  Rank     Team   Year
6     756     1    Kings   2016
8     694     2   Riders   2016

2017
   Points  Rank    Team   Year
7     788     1   Kings   2017
11    690     2  Riders   2017

name: the group key (here, the year)
group: the DataFrame holding the rows for that key

grouped = df.groupby('Year')
print(grouped.get_group(2014))

will print

   Points  Rank     Team    Year
0     876     1   Riders    2014
2     863     2   Devils    2014
4     741     3   Kings     2014
9     701     4   Royals    2014

Can you override array operators on an object in python
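
Yes. Python routes `obj[key]` and `obj[key] = value` through the special methods `__getitem__` and `__setitem__`, which is how pandas makes `df['col']` work. A minimal sketch follows; `Grid` is a hypothetical class made up for illustration, not anything from pandas.

```python
class Grid:
    """A tiny 2-D container that supports grid[row, col] syntax."""

    def __init__(self, rows, cols, fill=0):
        self._cols = cols
        self._cells = [fill] * (rows * cols)

    def __getitem__(self, key):
        # Called for grid[row, col]; key arrives as the tuple (row, col).
        row, col = key
        return self._cells[row * self._cols + col]

    def __setitem__(self, key, value):
        # Called for grid[row, col] = value.
        row, col = key
        self._cells[row * self._cols + col] = value

    def __len__(self):
        # Called for len(grid).
        return len(self._cells)


grid = Grid(2, 3)
grid[1, 2] = 42
print(grid[1, 2])
print(len(grid))
```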

How can I convert Series to a Dataframe in pandas?
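
A short sketch of two common ways, assuming a named Series; the sample values are made up for illustration.

```python
import pandas as pd

points = pd.Series([876, 789, 863], name="Points")

# to_frame() wraps the Series in a one-column DataFrame,
# using the Series name as the column label.
df_points = points.to_frame()

# reset_index() also yields a DataFrame, turning the index into a column.
df_with_index = points.reset_index()

print(type(df_points).__name__)
print(list(df_points.columns))
print(list(df_with_index.columns))
```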

How do I list columns in a dataframe

Most commonly used functions on a dataframe


print(df)
col_list = list(df)                    # column names as a list
col_list = df.columns.values.tolist()  # same result via the columns index
print(col_list)

This is a very good list: Most used functions in DS

Here is another one

Another one

What are double brackets after a group by in pandas

Understanding double brackets
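
A minimal sketch of the difference, using a few made-up rows in the style of the IPL data: single brackets select one column and give a Series, while double brackets pass a list of column names and give a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Devils"],
    "Points": [876, 789, 863, 673],
    "Rank": [1, 2, 2, 3],
})

# Single brackets: select one column, aggregate to a Series.
as_series = df.groupby("Team")["Points"].sum()

# Double brackets: the inner brackets are a list of column names,
# so the result is a DataFrame even with only one name in the list.
as_frame = df.groupby("Team")[["Points"]].sum()

print(type(as_series).__name__)
print(type(as_frame).__name__)
```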

Broader picture: A full example with matplotlib, data science, pandas

What does reset_index() do in pandas?
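
In short, it replaces the current index with the default 0, 1, 2, ... numbering and, unless told otherwise, keeps the old index as a regular column. A small sketch with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils"],
    "Points": [876, 789, 863],
})

totals = df.groupby("Team")["Points"].sum()  # index is the Team label
flat = totals.reset_index()                  # Team becomes a regular column

# drop=True discards the old index instead of keeping it as a column.
renumbered = df[df["Points"] > 800].reset_index(drop=True)

print(flat)
print(renumbered)
```

This is why `reset_index()` so often follows a groupby: it turns the grouped result back into a flat table.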

How do you filter using pandas data frames

Filtering data using data frames
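
A minimal sketch of the usual filtering pattern, a boolean mask inside the index operator. The rows are made up in the style of the IPL data.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Riders", "Devils", "Kings", "Royals"],
    "Year": [2014, 2014, 2014, 2014],
    "Points": [876, 863, 741, 701],
})

# A comparison on a column yields a boolean Series (the mask).
mask = df["Points"] > 800

# Indexing with the mask keeps only the rows where it is True.
high_scorers = df[mask]

# Combine conditions with & and |, wrapping each one in parentheses.
riders_high = df[(df["Points"] > 800) & (df["Team"] == "Riders")]

# isin() filters on membership in a list of values.
selected = df[df["Team"].isin(["Kings", "Royals"])]

print(high_scorers)
```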

Data normalization with Pandas and Scikit-Learn

Identify Outliers With Pandas, Statsmodels, and Seaborn


# obtain a Series object by passing in a string to the indexing operator
df_employees['salary']

# obtain a DataFrame object by passing a list with a single item to the indexing operator
df_employees[['salary']]

# The inner ['salary'] is a list (because of the brackets), so the result is a DataFrame

# Selection
df.column_name                       # attribute access; only for names that are valid identifiers
df['column-name']
df[['col1', 'col2']]
df.select_dtypes(include=np.number)  # requires: import numpy as np
df.select_dtypes(include='number')
df.info()
df.dtypes

#row index
df.index
df.loc['index-name']  # row by index label (square brackets, not parentheses)
df.iloc[0]            # row by integer position (row number)


#Replace all nan values
df.fillna(value)

#Or fill nan values by interpolation (linear by default)
df.interpolate()

#Drop all rows that have nan values
df.dropna()

#in a specific column
df["col"].fillna(value)

#That returns a new Series and does not change the dataset
#to do that, assign the result back
df["col"] = df["col"].fillna(value)

#interpolate works the same
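
The calls above can be tried on a tiny frame. This is only a sketch: the column names `col` and `other` and the fill values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [1.0, np.nan, 3.0], "other": [np.nan, 5.0, 6.0]})

filled = df.fillna(0)                   # every NaN replaced by 0
interpolated = df["col"].interpolate()  # linear by default: the NaN becomes 2.0
dropped = df.dropna()                   # only rows with no NaN remain

# None of the calls above modified df; assign back to keep a result.
df["col"] = df["col"].fillna(df["col"].mean())

print(df)
```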

Understanding the where clause in pandas and its differences with the index operator


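df.col.isna()
df.col.notna()
df.col.isnull()    # alias of isna()
df.col.notnull()   # alias of notna()

# Each returns a boolean Series that can be passed to where()
# or used as a mask in the index operator, e.g. df[df.col.notna()]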

is there an option on the where() method of df to just return the matching rows?
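
As far as I can tell, where() has no such option: it always keeps the original shape. The sketch below, on made-up rows, contrasts it with the index operator, which does return only the matching rows.

```python
import pandas as pd

df = pd.DataFrame({"Team": ["Riders", "Devils", "Kings"],
                   "Points": [876, 673, 741]})

cond = df["Points"] > 700

# where() keeps the original shape; non-matching rows become NaN.
masked = df.where(cond)

# Boolean indexing returns only the matching rows.
matching = df[cond]

# Dropping the all-NaN rows afterwards selects the same rows, though
# numeric columns may have been upcast to float along the way.
via_where = df.where(cond).dropna(how="all")

print(masked)
print(matching)
```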

Pandas dataframe query syntax
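
A brief sketch of `DataFrame.query`, which takes the filter as an expression string. The rows and the `threshold` variable are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Riders", "Devils", "Kings", "Royals"],
    "Year": [2014, 2015, 2014, 2015],
    "Points": [876, 673, 741, 804],
})

# Column names are used directly inside the expression string.
recent_high = df.query("Year == 2015 and Points > 700")

# Prefix a local Python variable with @ to reference it.
threshold = 800
above = df.query("Points > @threshold")

# String literals are quoted inside the expression.
kings = df.query("Team == 'Kings'")

print(recent_high)
```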

Pandas cookbook

1. The indexing operators ([] and attribute access) are conveniences; the recommendation for large data sets is to use the access methods (.loc, .iloc, .at, .iat) instead

2. Operations may return a reference (views) or copies

3. loc and the index operator are similar

4. Either may return a view or a copy depending on the selection, so do not rely on which one you get

5. where() is different: it keeps the original shape and replaces non-matching entries with NaN instead of dropping them

How do I update columns in a pandas dataframe where certain rows match?


df.loc[condition, [list-of-columns]] = [list-of-values]
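
A concrete version of that pattern, as a sketch on made-up rows. The lowercase 'kings' entry mirrors the inconsistency in the IPL data; the replacement values are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Riders", "kings", "Kings", "Royals", "Devils"],
    "Rank": [1, 4, 1, 4, 4],
    "Points": [876, 812, 756, 701, 673],
})

# Fix the inconsistent capitalisation in the matching rows only.
df.loc[df["Team"] == "kings", "Team"] = "Kings"

# Update several columns at once for the rows that match a condition;
# each matching row receives Rank=5 and Points=0.
df.loc[df["Rank"] == 4, ["Rank", "Points"]] = [5, 0]

print(df)
```

Note the square brackets: `loc` is an indexer, so the condition and column list go inside `[...]` and the assignment happens in a single statement.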