In my learning so far, I have observed that NumPy, Pandas, and Matplotlib are core Python libraries for data analysis.
Here are some Pandas code snippets that I tested myself in a Jupyter notebook.
==> to figure out the number of duplicate rows
df1['is_duplicated'] = df1.duplicated(['col1', 'col2', ..., 'coln'])
print(df1['is_duplicated'].sum())
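A quick sketch of this on a toy DataFrame (the data and column names here are made up for illustration):

```python
import pandas as pd

# toy DataFrame where one (col1, col2) pair repeats
df1 = pd.DataFrame({'col1': [1, 1, 2], 'col2': ['a', 'a', 'b']})

# mark each row that repeats an earlier (col1, col2) combination
df1['is_duplicated'] = df1.duplicated(['col1', 'col2'])

# only the second (1, 'a') row counts as a duplicate
print(df1['is_duplicated'].sum())  # 1
```

Note that `duplicated()` flags only the second and later occurrences, so the first copy of each row stays `False`.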
==> to figure out the number of rows with missing values
df1.isnull().any(axis=1).sum()
==> to figure out unique values for a column
df1['column name'].unique()
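Here is how the last two snippets behave on a small made-up DataFrame (column names invented for illustration):

```python
import pandas as pd

# toy DataFrame with a NaN in each of two different rows
df1 = pd.DataFrame({'col1': [1, None, 2, 2],
                    'col2': ['x', 'y', None, 'x']})

# rows that contain at least one missing value (rows 1 and 2 here)
n_missing_rows = df1.isnull().any(axis=1).sum()
print(n_missing_rows)  # 2

# distinct values of one column (missing values are included as-is)
print(df1['col2'].unique())
```

Using the pandas `.unique()` method instead of `np.unique` avoids sorting problems when the column mixes values with `None`/`NaN`.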
==>to drop a column
df.drop(['col_name'], axis=1, inplace=True)
==> to rename a column
df = df.rename(columns={'old name': 'new name'})
==> to clean up column names (strip whitespace, lowercase, replace spaces with underscores)
df.rename(columns=lambda x: x.strip().lower().replace(" ", "_"), inplace=True)
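As a small demonstration of the column-name cleanup (the messy column names are invented for illustration):

```python
import pandas as pd

# toy DataFrame with untidy column names
df = pd.DataFrame(columns=[' Engine Size ', 'Fuel Type'])

# strip whitespace, lowercase, and underscore-separate every column name
df.rename(columns=lambda x: x.strip().lower().replace(" ", "_"), inplace=True)

print(list(df.columns))  # ['engine_size', 'fuel_type']
```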
==> to filter rows by a column value
df.loc[df['col_name'] == 'value']
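A minimal filtering example (the `fuel` column and its values are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'fuel': ['gas', 'diesel', 'gas'],
                   'mpg': [30, 40, 25]})

# keep only the rows whose fuel column equals 'gas'
gas_rows = df.loc[df['fuel'] == 'gas']
print(len(gas_rows))  # 2
```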
==> to figure out the number of null values in a column
df['col_name'].isnull().sum()
==> to drop rows where any column value is missing
df.dropna(inplace=True)
==> to check whether any column contains missing values
df.isnull().sum().any()
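The three missing-value snippets above can be tried together on a toy DataFrame (data invented for illustration):

```python
import pandas as pd

# toy DataFrame: row 1 is missing 'a', row 2 is missing 'b'
df = pd.DataFrame({'a': [1, None, 3], 'b': ['x', 'y', None]})

print(df['a'].isnull().sum())   # 1 -> one null in column 'a'
print(df.isnull().sum().any())  # True -> at least one column has nulls

# keep only complete rows
df.dropna(inplace=True)
print(len(df))  # 1
```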
==> to know the number of duplicate records
print(df.duplicated().sum())
==> to drop duplicate records
df = df.drop_duplicates()
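Putting the last two snippets together on a made-up DataFrame:

```python
import pandas as pd

# toy DataFrame where the first two rows are identical
df = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'x', 'y']})

# duplicated().sum() counts the repeated rows;
# count() would just count every row, which is a common mistake
print(df.duplicated().sum())  # 1

# drop_duplicates returns a new DataFrame unless inplace=True is used,
# so the result must be assigned back
df = df.drop_duplicates()
print(len(df))  # 2
```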