Basics of Data Science


Pandas cheat sheet

In my learning so far, I have observed that NumPy, Pandas and matplotlib are core Python libraries for data analysis. Here are some of the Pandas code snippets which I tested myself in a Jupyter notebook. They assume import numpy as np and import pandas as pd, and that df, df1 and df_08 are DataFrames.

==> to count duplicate rows (duplicated() marks every repeat after the first occurrence as True)
df1['is_duplicated'] = df1.duplicated(['col1', 'col2', ..., 'coln'])
print(df1['is_duplicated'].sum())

==> to count rows with at least one missing value
sum(df1.apply(lambda x: sum(x.isnull().values), axis=1) > 0)

==> to find the unique values in a column
np.unique(df1['column name'])

==> to drop a column
df.drop(['col_name'], axis=1, inplace=True)

==> to clean up all column names (strip whitespace, lowercase, replace spaces with underscores)
df_08.rename(columns=lambda x: x.strip().lower().replace(" ", "_"), inplace=True)

==> to rename a column
df = df.rename(columns={'old name': 'new name'})

==> replace spaces with underscore
df.rename(columns=lamb...
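To check the snippets above end to end, here is a minimal sketch on a small made-up DataFrame (the column names and values are hypothetical, chosen only for illustration). It also swaps in two more idiomatic pandas forms: isnull().any(axis=1) in place of the row-wise apply, and Series.unique() in place of np.unique. Note that Series.unique() keeps first-appearance order, whereas np.unique sorts its result.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, made up purely for illustration
df1 = pd.DataFrame({
    "Col A ": [1, 1, 2, np.nan],
    "Col B": ["x", "x", "y", "z"],
})

# Duplicate rows: duplicated() flags every repeat after the first occurrence
dup_count = df1.duplicated(["Col A ", "Col B"]).sum()

# Rows with at least one missing value (vectorised alternative to the
# row-wise apply: True for any row containing a NaN, then count the Trues)
missing_rows = df1.isnull().any(axis=1).sum()

# Unique values in one column, in order of first appearance
uniques = df1["Col B"].unique()

# Clean up all column names at once: strip, lowercase, spaces -> underscores
df1 = df1.rename(columns=lambda x: x.strip().lower().replace(" ", "_"))

# Rename a single column with an old -> new mapping
df1 = df1.rename(columns={"col_b": "label"})

# Drop a column (axis=1 means columns); returns a new DataFrame
df1 = df1.drop(["col_a"], axis=1)

print(dup_count, missing_rows, list(uniques), list(df1.columns))
# prints: 1 1 ['x', 'y', 'z'] ['label']
```

Reassigning the result (df1 = df1.rename(...)) instead of passing inplace=True keeps each step explicit; both styles work for rename and drop.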

The journey of a thousand miles...

"The journey of a thousand miles begins with one step." This famous quote by Lao Tzu describes my current state of mind very well, and the journey I am embarking on is the field of data science. This blog, whose sole purpose is to share my growth from infancy to maturity, will be a testimony to that journey: my ups and downs and all the relevant experiences along the way. Wish me good luck!