In my learning so far, I have observed that NumPy, Pandas, and Matplotlib are core Python libraries for data analysis.
Here are some Pandas code snippets that I tested myself in a Jupyter notebook.
==> to figure out the number of duplicate rows
df1['is_duplicated'] = df1.duplicated(['col1', 'col2', ..., 'coln'])
print(df1['is_duplicated'].sum())
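A quick sketch of this on a toy DataFrame (the data and column names here are made up for illustration):

```python
import pandas as pd

# toy DataFrame where one (col1, col2) pair repeats
df1 = pd.DataFrame({'col1': [1, 1, 2], 'col2': ['a', 'a', 'b']})

# mark each row that repeats an earlier (col1, col2) combination
df1['is_duplicated'] = df1.duplicated(['col1', 'col2'])

# only the second (1, 'a') row counts as a duplicate
print(df1['is_duplicated'].sum())  # 1
```

Note that `duplicated()` flags only the second and later occurrences, so the first copy of each row stays `False`.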
==> to figure out the number of rows with missing values
df1.isnull().any(axis=1).sum()
==> to figure out unique values for a column
df1['column name'].unique()
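Here is how the last two snippets behave on a small made-up DataFrame (column names invented for illustration):

```python
import pandas as pd

# toy DataFrame with a NaN in each of two different rows
df1 = pd.DataFrame({'col1': [1, None, 2, 2],
                    'col2': ['x', 'y', None, 'x']})

# rows that contain at least one missing value (rows 1 and 2 here)
n_missing_rows = df1.isnull().any(axis=1).sum()
print(n_missing_rows)  # 2

# distinct values of one column (missing values are included as-is)
print(df1['col2'].unique())
```

Using the pandas `.unique()` method instead of `np.unique` avoids sorting problems when the column mixes values with `None`/`NaN`.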
==>to drop a column
df.drop(['col_name'], axis=1, inplace=True)
==> to rename a column
df = df.rename(columns={'old name': 'new name'})
==> to clean up column names (strip whitespace, lowercase, replace spaces with underscores)
df.rename(columns=lambda x: x.strip().lower().replace(" ", "_"), inplace=True)
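As a small demonstration of the column-name cleanup (the messy column names are invented for illustration):

```python
import pandas as pd

# toy DataFrame with untidy column names
df = pd.DataFrame(columns=[' Engine Size ', 'Fuel Type'])

# strip whitespace, lowercase, and underscore-separate every column name
df.rename(columns=lambda x: x.strip().lower().replace(" ", "_"), inplace=True)

print(list(df.columns))  # ['engine_size', 'fuel_type']
```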
==> to filter rows by a column value
df.loc[df['col_name'] == 'value']
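A minimal filtering example (the `fuel` column and its values are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'fuel': ['gas', 'diesel', 'gas'],
                   'mpg': [30, 40, 25]})

# keep only the rows whose fuel column equals 'gas'
gas_rows = df.loc[df['fuel'] == 'gas']
print(len(gas_rows))  # 2
```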
==> to figure out the number of null values in a column
df['col_name'].isnull().sum()
==> to drop rows where any column value is missing
df.dropna(inplace=True)
==> to check whether any column contains missing values
df.isnull().sum().any()
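The three missing-value snippets above can be tried together on a toy DataFrame (data invented for illustration):

```python
import pandas as pd

# toy DataFrame: row 1 is missing 'a', row 2 is missing 'b'
df = pd.DataFrame({'a': [1, None, 3], 'b': ['x', 'y', None]})

print(df['a'].isnull().sum())   # 1 -> one null in column 'a'
print(df.isnull().sum().any())  # True -> at least one column has nulls

# keep only complete rows
df.dropna(inplace=True)
print(len(df))  # 1
```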
==> to know the number of duplicate records
print(df.duplicated().sum())
==> to drop duplicate records
df = df.drop_duplicates()
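Putting the last two snippets together on a made-up DataFrame:

```python
import pandas as pd

# toy DataFrame where the first two rows are identical
df = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'x', 'y']})

# duplicated().sum() counts the repeated rows;
# count() would just count every row, which is a common mistake
print(df.duplicated().sum())  # 1

# drop_duplicates returns a new DataFrame unless inplace=True is used,
# so the result must be assigned back
df = df.drop_duplicates()
print(len(df))  # 2
```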