Pandas cheat sheet


In my learning so far, I have observed that NumPy, pandas and Matplotlib are the core Python libraries for data analysis.
Here are some pandas code snippets that I tested myself in a Jupyter notebook.

==> to figure out the number of duplicate rows (compared on selected columns)
df1['is_duplicated'] = df1.duplicated(['col1', 'col2'])  # list every column that should be compared
print(df1['is_duplicated'].sum())
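
A minimal sketch of the duplicate count on a toy DataFrame. The column names and values are made up for illustration only.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 2 match on both 'col1' and 'col2'.
df1 = pd.DataFrame({
    "col1": ["a", "b", "a", "c"],
    "col2": [1, 2, 1, 3],
    "col3": [10, 20, 30, 40],
})

# duplicated() flags each repeat of an earlier row as True
# (the first occurrence stays False, because keep='first' is the default).
df1["is_duplicated"] = df1.duplicated(subset=["col1", "col2"])
n_duplicates = int(df1["is_duplicated"].sum())
print(n_duplicates)
```

Summing works here because True counts as 1 and False as 0.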

==> to figure out the number of rows with missing values
df1.isnull().any(axis=1).sum()
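
To see what this count is based on, here is a small sketch with invented data. It builds a per-row flag for "at least one missing value", counts the flagged rows, and then selects them.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the first row is complete.
df1 = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", "y", None, None],
})

# One boolean per row: True if any column in that row is missing.
rows_with_missing = df1.isnull().any(axis=1)
n_rows_with_missing = int(rows_with_missing.sum())

# The same mask can be used to look at the affected rows.
incomplete_rows = df1[rows_with_missing]
print(n_rows_with_missing)
```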

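==> to figure out the unique values of a column
np.unique(df1['column name'])  # needs "import numpy as np"; returns the values sorted
df1['column name'].unique()  # pandas-only alternative; keeps order of first appearance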
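
A short sketch, on made-up values, of how NumPy's np.unique differs from the pandas Series.unique method: the former sorts its result, the latter keeps the order in which values first appear.

```python
import numpy as np
import pandas as pd

# Hypothetical column with repeated values in unsorted order.
df1 = pd.DataFrame({"column name": [3, 1, 3, 2]})

uniques_sorted = np.unique(df1["column name"])   # sorted unique values
uniques_seen = df1["column name"].unique()       # order of first appearance
print(uniques_sorted, uniques_seen)
```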

==> to drop a column
df.drop(['col_name'], axis=1, inplace=True)

==> to clean up all column names at once (here on a DataFrame named df_08)
df_08.rename(columns=lambda x: x.strip().lower().replace(" ", "_"), inplace=True)

==> to rename a column
df = df.rename(columns={'old name': 'new name'})


==> to replace spaces with underscores in column names (also trims whitespace and lowercases)
df.rename(columns=lambda x: x.strip().lower().replace(" ", "_"), inplace=True)
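
A quick sketch of what the lambda does to each column label. The labels below are hypothetical and chosen to show the stripping, lowercasing and underscore replacement.

```python
import pandas as pd

# Hypothetical labels with stray spaces and mixed case.
df = pd.DataFrame(
    [[2008, "Gasoline", 21]],
    columns=[" Model Year", "Fuel Type ", "City MPG"],
)

# rename() calls the function once per column label.
df.rename(columns=lambda x: x.strip().lower().replace(" ", "_"), inplace=True)
print(list(df.columns))
```

Note that strip() runs first, so leading and trailing spaces are removed rather than turned into underscores.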

==> to filter rows by a column value
df.loc[df['col_name'] == 'value']
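
A minimal sketch of the filter on invented data. The 'score' column is an assumption added only to show how a second condition can be combined with the first.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "col_name": ["value", "other", "value"],
    "score": [1, 2, 3],
})

# The comparison yields a boolean Series; .loc keeps the rows where it is True.
mask = df["col_name"] == "value"
matches = df.loc[mask]

# Conditions are combined with & (and) or | (or), each wrapped in parentheses.
high_matches = df.loc[mask & (df["score"] > 1)]
print(matches)
```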

==> to figure out the number of null values in a column
df['col_name'].isnull().sum()

==> to drop rows where any column value is missing
df.dropna(inplace=True)

==> to check whether any column contains a missing value
df.isnull().sum().any()
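
The missing-value snippets above can be tried together on a small example. The data is invented, and dropna() is used without inplace=True here so that the original DataFrame stays available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value in 'col_name', none in 'other'.
df = pd.DataFrame({
    "col_name": [1.0, np.nan, 3.0],
    "other": ["a", "b", "c"],
})

nulls_in_column = int(df["col_name"].isnull().sum())  # nulls in one column
nulls_per_column = df.isnull().sum()                  # nulls in every column
has_missing = bool(df.isnull().sum().any())           # True if any count is non-zero
df_clean = df.dropna()                                # copy without incomplete rows
print(nulls_per_column)
```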

==> to know the number of duplicate records
print(df.duplicated().sum())

==> to drop duplicate records
df.drop_duplicates(inplace=True)
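
A small sketch, on made-up data, of two details that are easy to miss: on the result of duplicated(), count() returns the total number of rows while sum() returns the number of duplicates, and drop_duplicates() returns a new DataFrame unless inplace=True is passed.

```python
import pandas as pd

# Hypothetical data: 5 rows, only 2 distinct records.
df = pd.DataFrame({
    "a": [1, 1, 2, 2, 2],
    "b": ["x", "x", "y", "y", "y"],
})

flags = df.duplicated()          # True for every repeat of an earlier row
n_rows = int(flags.count())      # counts all entries, True or False
n_duplicates = int(flags.sum())  # counts only the True entries

deduped = df.drop_duplicates()   # new DataFrame; df itself is unchanged
print(n_rows, n_duplicates, len(deduped))
```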
