Pandas

satya - 12/13/2022, 11:37:51 AM

Understand python group by

satya - 12/13/2022, 11:38:19 AM

From Tutorialspoint on groupby

satya - 12/13/2022, 11:40:09 AM

Creating a dataframe from memory


# import the pandas library
import pandas as pd

# Note: 'kings' (lowercase) is a different key from 'Kings'
ipl_data = {
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
             'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
    'Year': [2014, 2015, 2014, 2015, 2014, 2015,
             2016, 2017, 2016, 2014, 2015, 2017],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690],
}
df = pd.DataFrame(ipl_data)

# Python 3: print is a function. This prints the GroupBy object itself.
print(df.groupby('Team'))

satya - 12/13/2022, 11:42:59 AM

Understanding


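1. Dictionary (hash table): {}
2. {key: value, key: value}
3. [a,b,c,d]: List of values (a tuple is written (a,b,c,d))
4. {key:[a,b,c,d], key2:[x,y,z,a]}: Dictionary of lists
5. Each key is a column name and each list holds that column's values: a table of columns!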

satya - 12/13/2022, 11:44:01 AM

How groups are represented (the .groups attribute)


{'Kings': Int64Index([4, 6, 7],      dtype='int64'),
'Devils': Int64Index([2, 3],         dtype='int64'),
'Riders': Int64Index([0, 1, 8, 11],  dtype='int64'),
'Royals': Int64Index([9, 10],        dtype='int64'),
'kings' : Int64Index([5],            dtype='int64')}
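
A minimal sketch of how a mapping like the one above can be produced, assuming the same sample data as the earlier entry. The name row_labels is my own; newer pandas versions print Index rather than Int64Index, so the exact repr may differ.

```python
import pandas as pd

# Same sample data as in the earlier entry
ipl_data = {
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
             'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
    'Year': [2014, 2015, 2014, 2015, 2014, 2015,
             2016, 2017, 2016, 2014, 2015, 2017],
    'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690],
}
df = pd.DataFrame(ipl_data)

# .groups maps each group key to the index labels of its rows
groups = df.groupby('Team').groups

# Convert the index objects to plain lists for easier reading
row_labels = {team: list(labels) for team, labels in groups.items()}
print(row_labels)
```

Because 'kings' and 'Kings' are different strings, they end up as two separate groups.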

satya - 12/13/2022, 11:45:44 AM

Python collection symbols


Lists: [a,b,c] #Changeable, ordered
Tuples: (a,b,c) #Unchangeable, ordered
Sets: {a,b,c} #Unordered, unindexed, no duplicates
Dictionaries: {a:b, c:d} #Changeable, indexed by key, keep insertion order (Python 3.7+)

satya - 12/13/2022, 11:47:21 AM

Understanding


Group -> the rows from the original data frame that share a key (each group is itself a DataFrame)

satya - 12/13/2022, 11:49:40 AM

Group structure


grouped = df.groupby('Year')

# Iterating a GroupBy yields (name, group) pairs
for name, group in grouped:
    print(name)
    print(group)

satya - 12/13/2022, 11:51:41 AM

An example


2014
   Points  Rank     Team   Year
0     876     1   Riders   2014
2     863     2   Devils   2014
4     741     3   Kings    2014
9     701     4   Royals   2014

2015
   Points  Rank     Team   Year
1     789     2   Riders   2015
3     673     3   Devils   2015
5     812     4    kings   2015
10    804     1   Royals   2015

2016
   Points  Rank     Team   Year
6     756     1    Kings   2016
8     694     2   Riders   2016

2017
   Points  Rank    Team   Year
7     788     1   Kings   2017
11    690     2  Riders   2017

satya - 12/13/2022, 11:52:13 AM

The two parts appear to be


name
group
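
A small sketch to check what the two parts are, using four rows taken from the sample data. The names group_sizes and group_types are my own.

```python
import pandas as pd

# Four rows taken from the sample data
df = pd.DataFrame({
    'Team': ['Riders', 'Devils', 'kings', 'Riders'],
    'Year': [2014, 2014, 2015, 2015],
    'Points': [876, 863, 812, 789],
})

group_sizes = {}
group_types = set()
for name, group in df.groupby('Year'):
    # name is the group key; group holds the rows for that key
    group_sizes[int(name)] = len(group)
    group_types.add(type(group).__name__)

print(group_sizes)
print(group_types)
```

So name is the key value (here the year) and group is a DataFrame with that key's rows.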

satya - 12/13/2022, 11:54:09 AM

Getting one group


grouped = df.groupby('Year')
print(grouped.get_group(2014))

This will print

   Points  Rank     Team    Year
0     876     1   Riders    2014
2     863     2   Devils    2014
4     741     3   Kings     2014
9     701     4   Royals    2014
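
As far as I can tell, get_group picks out the same rows as a plain boolean filter on the grouping column. A quick sketch to compare the two, using four rows from the sample data (by_group and by_mask are my own names):

```python
import pandas as pd

# Four rows taken from the sample data
df = pd.DataFrame({
    'Team': ['Riders', 'Riders', 'Devils', 'Devils'],
    'Year': [2014, 2015, 2014, 2015],
    'Points': [876, 789, 863, 673],
})

# One group via the GroupBy object
by_group = df.groupby('Year').get_group(2014)

# The same rows via a boolean mask
by_mask = df[df['Year'] == 2014]

print(list(by_group.index))
print(list(by_mask.index))
```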

satya - 12/13/2022, 11:56:08 AM

Can you override array operators on an object in python
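
Short answer appears to be yes: a class can define __getitem__ and __setitem__ to handle obj[key]. A minimal sketch with a made-up Scores class (not a pandas class) that accepts either a single key or a list of keys:

```python
class Scores:
    """Hypothetical container that supports obj[key] syntax."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        # A list key returns several values; any other key returns one value
        if isinstance(key, list):
            return [self._data[k] for k in key]
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value


scores = Scores()
scores['Riders'] = 876                    # calls __setitem__
scores['Devils'] = 863
single = scores['Riders']                 # calls __getitem__ with a string
several = scores[['Riders', 'Devils']]    # calls __getitem__ with a list
print(single, several)
```

This is the same mechanism that lets a data frame treat df['a'] and df[['a', 'b']] differently.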

satya - 12/13/2022, 12:35:09 PM

How can I convert Series to a Dataframe in pandas?
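
Two ways that should work, sketched with a made-up Series: to_frame() keeps the index as is, while reset_index() turns the index into an ordinary column.

```python
import pandas as pd

# Made-up Series: team names as the index, points as the values
points = pd.Series([876, 863], index=['Riders', 'Devils'], name='Points')

# Keep the index; the Series name becomes the single column name
as_frame = points.to_frame()

# Name the index, then move it into a regular column
with_index_column = points.rename_axis('Team').reset_index()

print(as_frame)
print(with_index_column)
```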

satya - 12/13/2022, 12:45:57 PM

How do I list columns in a dataframe

satya - 12/13/2022, 12:46:07 PM

Most commonly used functions on a dataframe

satya - 12/13/2022, 12:50:52 PM

Couple of ways


print(df)
col_list = list(df)                      # iterating a DataFrame yields its column labels
col_list = df.columns.values.tolist()    # same result via the columns index
print(col_list)

satya - 12/13/2022, 12:51:55 PM

This is a very good list: Most used functions in DS

satya - 12/13/2022, 12:53:53 PM

Here is another one

satya - 12/13/2022, 12:54:45 PM

Another one

satya - 12/13/2022, 1:04:15 PM

What are double brackets after a group by in pandas

satya - 12/13/2022, 1:06:08 PM

Understanding double brackets

satya - 12/13/2022, 3:09:34 PM

Broader picture: A full example with matplotlib, data science, pandas

satya - 12/13/2022, 3:14:07 PM

What does reset_index() do in pandas?
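
A sketch of the common case after a group by, with three rows from the sample data: the group keys become the index of the result, and reset_index() moves them back into a regular column with a fresh 0..n-1 index. The names totals and flat are my own.

```python
import pandas as pd

# Three rows taken from the sample data
df = pd.DataFrame({
    'Team': ['Riders', 'Riders', 'Devils'],
    'Points': [876, 789, 863],
})

# Result is a Series whose index is the Team values
totals = df.groupby('Team')['Points'].sum()

# Team becomes an ordinary column again; the index is renumbered from 0
flat = totals.reset_index()

print(totals)
print(flat)
```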

satya - 12/14/2022, 8:14:35 AM

How do you filter using pandas data frames?
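
A few filtering patterns that seem to cover most needs, sketched on five rows from the sample data. Each condition is a boolean Series passed to the index operator; combine conditions with & and | and wrap each one in parentheses.

```python
import pandas as pd

# Five rows taken from the sample data
df = pd.DataFrame({
    'Team': ['Riders', 'Riders', 'Devils', 'Kings', 'Royals'],
    'Year': [2014, 2015, 2014, 2014, 2014],
    'Points': [876, 789, 863, 741, 701],
})

# Single condition
high = df[df['Points'] > 800]

# Two conditions combined with & (each one in parentheses)
riders_2014 = df[(df['Team'] == 'Riders') & (df['Year'] == 2014)]

# Membership test against a list of values
selected = df[df['Team'].isin(['Kings', 'Royals'])]

print(high)
print(riders_2014)
print(selected)
```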

satya - 12/14/2022, 8:15:28 AM

Filtering data using data frames

satya - 12/14/2022, 8:17:47 AM

Data normalization with Pandas and Scikit-Learn

satya - 12/14/2022, 8:18:39 AM

Identify Outliers With Pandas, Statsmodels, and Seaborn

satya - 12/14/2022, 11:18:37 AM

The mystery of the double [] on data frame


# obtain a Series object by passing in a string to the indexing operator
df_employees['salary']

# obtain a DataFrame object by passing a list with a single item to the indexing operator
df_employees[['salary']]

# The inner ['abc'] is a Python list, so pandas treats it as a list of columns and returns a DataFrame
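
A quick check of the types, with a made-up df_employees. The same rule seems to hold after a group by: single brackets give a Series, double brackets give a DataFrame.

```python
import pandas as pd

# Made-up employee data for illustration
df_employees = pd.DataFrame({
    'dept': ['A', 'A', 'B'],
    'name': ['Ann', 'Bob', 'Cid'],
    'salary': [100, 200, 300],
})

one_bracket = df_employees['salary']      # string key
two_brackets = df_employees[['salary']]   # list key with one item

grouped_series = df_employees.groupby('dept')['salary'].sum()
grouped_frame = df_employees.groupby('dept')[['salary']].sum()

for obj in (one_bracket, two_brackets, grouped_series, grouped_frame):
    print(type(obj).__name__)
```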

satya - 12/14/2022, 11:34:27 AM

Some examples


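import numpy as np

# Selection
df.column_name              # attribute access; only works for valid Python identifiers
df['column-name']           # one column as a Series
df[['col1', 'col2']]        # several columns as a DataFrame
df.select_dtypes(include=np.number)
df.select_dtypes(include='number')
df.info()
df.dtypes

# Row index
df.index
df.loc['index-name']        # row by index label (square brackets, not parentheses)
df.iloc[0]                  # row by position (row number)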

satya - 12/14/2022, 11:34:37 AM

Pandas cheatsheet

satya - 12/29/2022, 8:06:48 AM

How do I replace nan values in a df?


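# Replace all NaN values with a given value
df.fillna(value)

# Or fill the gaps by interpolation (takes options such as method=)
df.interpolate()

# Drop all rows that have NaN values
df.dropna()

# In a specific column
df["col"].fillna(value)

# That returns a new Series and does not change the dataset;
# to change it, assign the result back
df["col"] = df["col"].fillna(value)

# interpolate works the same way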
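
A runnable sketch of the calls above on a tiny made-up frame. Filling with 0 and with the column mean are just example choices.

```python
import numpy as np
import pandas as pd

# Made-up frame with one gap in each column
df = pd.DataFrame({
    'col': [1.0, np.nan, 3.0],
    'other': [np.nan, 5.0, 6.0],
})

filled = df.fillna(0)                    # every NaN replaced by 0
interpolated = df['col'].interpolate()   # linear by default: gap becomes 2.0
dropped = df.dropna()                    # only rows without any NaN remain

# None of the calls above changed df; assign back to change it
df['col'] = df['col'].fillna(df['col'].mean())

print(filled)
print(interpolated)
print(dropped)
print(df)
```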

satya - 12/29/2022, 8:32:03 AM

Understanding the where clause in pandas and its differences with the index operator

satya - 12/29/2022, 10:11:15 AM

Couple of things


df.col.isna()
df.col.notna()
df.col.isnull()     # alias of isna()
df.col.notnull()    # alias of notna()

# Each returns a boolean Series, so you can use it
# in a where() call or in the index operator
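
A sketch of the difference between the two, on three Points values from the sample data: where() keeps the original length and puts NaN where the condition fails, while the index operator returns only the matching rows. Chaining dropna() or filtering with notna() gets from one to the other.

```python
import pandas as pd

# Three Points values taken from the sample data
points = pd.Series([876, 673, 812], name='Points')
cond = points > 700

masked = points.where(cond)           # same length; non-matching entries become NaN
matching = points[cond]               # only the matching rows
via_dropna = masked.dropna()          # matching rows recovered from where()
via_notna = masked[masked.notna()]    # same thing using notna() in the index operator

print(masked)
print(matching)
```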

satya - 12/29/2022, 10:11:30 AM

is there an option on the where() method of df to just return the matching rows?

satya - 12/29/2022, 10:14:36 AM

Pandas dataframe query syntax
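
A minimal sketch of DataFrame.query on four rows from the sample data. The expression is a string that refers to column names directly, and @ pulls in a local Python variable (min_points is my own name).

```python
import pandas as pd

# Four rows taken from the sample data
df = pd.DataFrame({
    'Team': ['Riders', 'Riders', 'Devils', 'Kings'],
    'Year': [2014, 2015, 2014, 2014],
    'Points': [876, 789, 863, 741],
})

# Column names are used directly inside the expression string
result = df.query('Points > 800 and Year == 2014')

# @name refers to a local variable
min_points = 780
with_var = df.query('Points > @min_points')

print(result)
print(with_var)
```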

satya - 12/29/2022, 11:44:51 AM

Pandas cookbook

satya - 12/29/2022, 12:08:13 PM

More on selection

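1. The indexing operators ([] and attribute access) are conveniences; for production code the pandas documentation recommends the optimized access methods (.loc, .iloc, .at, .iat) instead

2. Operations may return references (views) or copies

3. loc and the index operator are similar

4. Either may return a view or a copy, so do not rely on which one you get

5. where() is a bit different: it keeps the original shape and replaces non-matching entries with NaN instead of dropping them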

satya - 12/29/2022, 12:45:00 PM

How do I update columns in a pandas dataframe where certain rows match?

satya - 12/29/2022, 12:47:45 PM

This is done through the loc indexer (square brackets, not a method call)


df.loc[condition, [list-of-columns]] = [list-of-values]
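
A runnable sketch of that pattern on three rows from the sample data. The new values 2 and 800 are made up for illustration.

```python
import pandas as pd

# Three rows taken from the sample data
df = pd.DataFrame({
    'Team': ['Riders', 'Devils', 'Kings'],
    'Rank': [1, 2, 3],
    'Points': [876, 863, 741],
})

# Rows where the condition holds get the new values, one per listed column
df.loc[df['Team'] == 'Kings', ['Rank', 'Points']] = [2, 800]

print(df)
```

Rows that do not match the condition are left untouched.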