Pandas
satya - 12/13/2022, 11:37:51 AM
Understand python group by
satya - 12/13/2022, 11:40:09 AM
Creating a dataframe from memory
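# import the pandas library
import pandas as pd
# a dict of column name -> list of values becomes a table of columns
# note: 'kings' (lowercase) is a different key from 'Kings'
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
# prints a DataFrameGroupBy object, not the grouped rows
print(df.groupby('Team'))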
satya - 12/13/2022, 11:42:59 AM
Understanding
1. hash table {}
2. {key: value, key:value}
3. [a,b,c,d]: list of values (a tuple would be written (a,b,c,d))
4. {key:[a,b,c,d], key2:[x,y,z,a]}
5. A table of columns!
satya - 12/13/2022, 11:44:01 AM
How groups are represented
{'Kings': Int64Index([4, 6, 7], dtype='int64'),
'Devils': Int64Index([2, 3], dtype='int64'),
'Riders': Int64Index([0, 1, 8, 11], dtype='int64'),
'Royals': Int64Index([9, 10], dtype='int64'),
'kings' : Int64Index([5], dtype='int64')}
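The mapping above is what the .groups attribute of a groupby object holds: each key is a group name and each value is the index of row labels in that group. A minimal sketch, reusing the ipl_data from the earlier note; the index class that gets printed (Int64Index or plain Index) depends on the pandas version, so the rows are converted to plain lists here.

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# .groups maps each group name to the row labels that belong to it
groups = df.groupby('Team').groups

# turn each index into a plain list so the printout is version independent
group_rows = {team: list(rows) for team, rows in groups.items()}
print(group_rows)
```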
satya - 12/13/2022, 11:45:44 AM
Python collection symbols
Lists: [a,b,c] #Changeable
Tuples: (a,b,c) #Unchangeable
Sets: {a,b,c} #unordered, unindexed
Dictionaries: {a:b, c:d} #keeps insertion order (Python 3.7+), changeable, indexed by key
satya - 12/13/2022, 11:47:21 AM
Understanding
Group -> list of rows from the original data frame
satya - 12/13/2022, 11:49:40 AM
Group structure
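grouped = df.groupby('Year')
# iterating a groupby yields (name, group) pairs
for name, group in grouped:
    print(name)   # the group key, here the year
    print(group)  # a DataFrame with the rows of that group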
satya - 12/13/2022, 11:51:41 AM
An example
2014
Points Rank Team Year
0 876 1 Riders 2014
2 863 2 Devils 2014
4 741 3 Kings 2014
9 701 4 Royals 2014
2015
Points Rank Team Year
1 789 2 Riders 2015
3 673 3 Devils 2015
5 812 4 kings 2015
10 804 1 Royals 2015
2016
Points Rank Team Year
6 756 1 Kings 2016
8 694 2 Riders 2016
2017
Points Rank Team Year
7 788 1 Kings 2017
11 690 2 Riders 2017
satya - 12/13/2022, 11:52:13 AM
The two parts appear to be
name
group
satya - 12/13/2022, 11:54:09 AM
Getting one group
grouped = df.groupby('Year')
print(grouped.get_group(2014))
will print
Points Rank Team Year
0 876 1 Riders 2014
2 863 2 Devils 2014
4 741 3 Kings 2014
9 701 4 Royals 2014
satya - 12/13/2022, 11:56:08 AM
Can you override array operators on an object in python
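Short answer: yes, through the __getitem__ and __setitem__ special methods. A minimal sketch with a hypothetical Table class (a made-up toy, not part of pandas) that treats a string key and a list key differently, which is the same idea behind single and double brackets on a data frame.

```python
class Table:
    """Hypothetical toy container of named columns, for illustration only."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, key):
        # obj[['a', 'b']] passes a list: return a smaller Table
        if isinstance(key, list):
            return Table({k: self._columns[k] for k in key})
        # obj['a'] passes a string: return the raw column
        return self._columns[key]

    def __setitem__(self, key, values):
        # obj['c'] = ... lands here
        self._columns[key] = list(values)


t = Table({'a': [1, 2], 'b': [3, 4]})
single = t['a']      # a plain list
subset = t[['a']]    # another Table
t['c'] = (5, 6)      # stored as a list
```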
satya - 12/13/2022, 12:35:09 PM
How can I convert Series to a Dataframe in pandas?
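One way is Series.to_frame(). A minimal sketch with a made-up Points series; the series name becomes the column name unless another name is passed.

```python
import pandas as pd

s = pd.Series([876, 789, 863], name='Points')

# the Series name is used as the column name
frame = s.to_frame()

# or pick a different column name
renamed = s.to_frame(name='Score')

print(type(frame))
print(list(renamed.columns))
```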
satya - 12/13/2022, 12:45:57 PM
How do I list columns in a dataframe
satya - 12/13/2022, 12:46:07 PM
Most commonly used functions on a dataframe
satya - 12/13/2022, 12:50:52 PM
Couple of ways
print (df)
# col-list is not a valid Python name (hyphen), so use col_list
col_list = list(df)
col_list = df.columns.values.tolist()
print(col_list)
satya - 12/13/2022, 12:51:55 PM
This is a very good list: Most used functions in DS
satya - 12/13/2022, 1:04:15 PM
What are double brackets after a group by in pandas
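As far as I can tell, the brackets after a groupby select columns the same way they do on a data frame: a string gives a Series result, a list gives a DataFrame result. A small sketch with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({'Team': ['Riders', 'Riders', 'Devils'],
                   'Points': [876, 789, 863],
                   'Rank': [1, 2, 2]})

# single brackets: the aggregate comes back as a Series
as_series = df.groupby('Team')['Points'].sum()

# double brackets: a list of columns, so the aggregate is a DataFrame
as_frame = df.groupby('Team')[['Points']].sum()

print(type(as_series))
print(type(as_frame))
```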
satya - 12/13/2022, 3:09:34 PM
Broader picture: A full example with matlib, data science, pandas
satya - 12/13/2022, 3:14:07 PM
What does reset_index() do in pandas?
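A minimal sketch of reset_index() after a group by, using made-up rows: the group keys sit in the index after the aggregate, and reset_index() moves them back into an ordinary column and restores the default 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['Riders', 'Riders', 'Devils'],
                   'Points': [876, 789, 863]})

# after the aggregate, Team is the index, not a column
totals = df.groupby('Team')['Points'].sum()

# reset_index() turns Team back into a column
flat = totals.reset_index()

print(flat)
```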
satya - 12/14/2022, 8:14:35 AM
How do you filter using pandas data frames
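The usual way is a boolean mask inside the index operator. A small sketch with made-up rows; conditions are combined with & and |, and each condition needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['Riders', 'Devils', 'Kings', 'Royals'],
                   'Year': [2014, 2014, 2014, 2015],
                   'Points': [876, 863, 741, 804]})

# one condition
high = df[df['Points'] > 800]

# two conditions, each wrapped in parentheses
high_2014 = df[(df['Points'] > 800) & (df['Year'] == 2014)]

# membership test
picked = df[df['Team'].isin(['Kings', 'Royals'])]

print(high_2014)
```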
satya - 12/14/2022, 8:17:47 AM
Data normalization with Pandas and Scikit-Learn
satya - 12/14/2022, 8:18:39 AM
Identify Outliers With Pandas, Statsmodels, and Seaborn
satya - 12/14/2022, 11:18:37 AM
The mystery of the double [] on data frame
# obtain a Series object by passing in a string to the indexing operator
df_employees['salary']
# obtain a DataFrame object by passing a list with a single item to the indexing operator
df_employees[['salary']]
# because the inner ['salary'] is a list, owing to the brackets
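A quick check of the types, assuming a small made-up df_employees frame:

```python
import pandas as pd

# hypothetical data, only to show the two return types
df_employees = pd.DataFrame({'name': ['ann', 'bob'],
                             'salary': [100, 200]})

as_series = df_employees['salary']    # string key
as_frame = df_employees[['salary']]   # list key

print(type(as_series).__name__)
print(type(as_frame).__name__)
```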
satya - 12/14/2022, 11:34:27 AM
Some examples
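# Selection
import numpy as np
df.column_name #attribute access, only for names that are valid identifiers
df['column-name']
df[['col1', 'col2']]
df.select_dtypes(include=np.number)
df.select_dtypes(include='number')
df.info()
df.dtypes
#row index
df.index
df.loc['index-name'] #row by index label (square brackets, not parentheses)
df.iloc[0] #row by position (row number)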
satya - 12/29/2022, 8:06:48 AM
How do I replace nan values in a df?
#Replace all nan values
df.fillna(value)
#Or fill by interpolation (several methods available)
df.interpolate()
#Drop all rows that have nan values
df.dropna()
#in a specific column
df["col"].fillna(value)
#That returns a new Series and will not change the dataset
#to do that, assign the result back
df["col"] = df["col"].fillna(value)
#interpolate works the same
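A runnable version of the calls above, on a tiny made-up frame with one gap in each column. The fill value (0, or the column mean) is only an example choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [1.0, np.nan, 3.0],
                   'other': [np.nan, 5.0, 6.0]})

filled = df.fillna(0)                   # new frame, every NaN replaced by 0
interpolated = df['col'].interpolate()  # linear by default: the gap becomes 2.0
dropped = df.dropna()                   # keeps only rows without any NaN

# none of the calls above changed df; assign back to keep a result
df['col'] = df['col'].fillna(df['col'].mean())

print(df)
```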
satya - 12/29/2022, 8:32:03 AM
Understanding the where clause in pandas and its differences with the index operator
satya - 12/29/2022, 10:11:15 AM
Couple of things
df.col.isna()
df.col.notna()
df.col.isnull() #alias of isna()
df.col.notnull() #alias of notna()
#each returns a boolean Series
#you can use these in a where clause
#Or in an index operator
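A small sketch of both uses, on a made-up column with one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [1.0, np.nan, 3.0]})

# index operator: keeps only the rows where the mask is True
missing_rows = df[df['col'].isna()]
present_rows = df[df['col'].notna()]

# where(): keeps every row and puts the replacement where the mask is False
replaced = df['col'].where(df['col'].notna(), -1.0)

print(present_rows)
```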
satya - 12/29/2022, 10:11:30 AM
is there an option on the where() method of df to just return the matching rows?
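I am not aware of such an option: where() keeps the shape of the frame and puts NaN in the rows that do not match. To get only the matching rows, use the index operator with the same condition, or drop the NaN rows afterwards. A sketch on made-up rows:

```python
import pandas as pd

df = pd.DataFrame({'Points': [876, 673, 812],
                   'Rank': [1, 3, 4]})

cond = df['Points'] > 800

masked = df.where(cond)             # same 3 rows, row 1 is all NaN
matching = df[cond]                 # only the matching rows
also_matching = masked.dropna()     # same rows, but values became floats

print(masked)
print(matching)
```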
satya - 12/29/2022, 10:14:36 AM
Pandas dataframe query syntax
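A minimal sketch of DataFrame.query() on made-up rows: the condition is written as a string, column names are used directly, and a Python variable is referenced with @.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['Riders', 'Devils', 'Kings', 'Royals'],
                   'Year': [2014, 2014, 2014, 2015],
                   'Points': [876, 863, 741, 804]})

# column names appear bare inside the expression
by_expr = df.query('Points > 800 and Year == 2014')

# @ pulls in a variable from the surrounding Python code
min_points = 800
by_var = df.query('Points > @min_points')

print(by_expr)
```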
satya - 12/29/2022, 12:08:13 PM
More on selection
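1. The indexing operators are convenience shortcuts; the recommendation for large data sets is to use the access methods (such as loc) instead
2. Operations may return references (views) or copies
3. loc and the index operator are similar
4. either one may return a view or a copy depending on the selection, so do not rely on getting a view
5. where() is a bit different: it keeps the original shape and replaces non-matching values with NaN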
satya - 12/29/2022, 12:45:00 PM
How do I update columns in a pandas dataframe where certain rows match?
satya - 12/29/2022, 12:47:45 PM
This is done through the loc indexer (square brackets, not a method call)
df.loc[condition, [list-of-columns]] = [list-of-values]
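A worked sketch of the loc update on made-up rows. The condition picks the rows, the list picks the columns, and the values are given one per column.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['Riders', 'Devils', 'Kings', 'Royals'],
                   'Points': [876, 673, 741, 701],
                   'Rank': [1, 3, 3, 4]})

# rows with fewer than 800 points get Points=800 and Rank=0
low = df['Points'] < 800
df.loc[low, ['Points', 'Rank']] = [800, 0]

print(df)
```

The update happens in place on df, so no assignment of the result is needed.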