
Visualization for the WeRateDogs Data Wrangling Project

Visualization section begins here.
In [70]:
# Load the clean dataset (pd, plt and sns were imported earlier in the notebook)
df_master = pd.read_csv('twitter_archive_master.csv')
df_master.info()
df_master
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 27 columns):
tweet_id                    2356 non-null int64
timestamp                   2356 non-null object
source                      2356 non-null object
text                        2356 non-null object
retweeted_status_user_id    181 non-null float64
expanded_urls               2297 non-null object
rating_out_of_10            2356 non-null int64
name                        2356 non-null object
doggo                       2356 non-null object
floofer                     2356 non-null object
pupper                      2356 non-null object
puppo                       2356 non-null object
jpg_url                     2075 non-null object
img_num                     2075 non-null float64
category1                   2075 non-null object
cat1_conf                   2075 non-null float64
cat1_dog                    2075 non-null object
category2                   2075 non-null object
cat2_conf                   2075 non-null float64
cat2_dog                    2075 non-null object
category3                   2075 non-null object
cat3_conf                   2075 non-null float64
cat3_dog                    2075 non-null object
favorites                   2345 non-null float64
retweets                    2345 non-null float64
user_followers              2345 non-null float64
user_favourites             2345 non-null float64
dtypes: float64(9), int64(2), object(16)
memory usage: 497.0+ KB
tweet_id timestamp source text retweeted_status_user_id expanded_urls rating_out_of_10 name doggo floofer ... category2 cat2_conf cat2_dog category3 cat3_conf cat3_dog favorites retweets user_followers user_favourites
0 892420643555336193 2017-08-01 16:23:56 +0000 <a href="" r... This is Phineas. He's a mystical boy. Only eve... NaN 13 Phineas None None ... bagel 0.085851 False banana 0.076110 False 38693.0 8558.0 6990257.0 134514.0
1 892177421306343426 2017-08-01 00:17:27 +0000 <a href="" r... This is Tilly. She's just checking pup on you.... NaN 13 Tilly None None ... Pekinese 0.090647 True papillon 0.068957 True 33169.0 6294.0 6990257.0 134514.0
2 891815181378084864 2017-07-31 00:18:03 +0000 <a href="" r... This is Archie. He is a rare Norwegian Pouncin... NaN 12 Archie None None ... malamute 0.078253 True kelpie 0.031379 True 24965.0 4176.0 6990257.0 134514.0
3 891689557279858688 2017-07-30 15:58:51 +0000 <a href="" r... This is Darla. She commenced a snooze mid meal... NaN 13 Darla None None ... Labrador_retriever 0.168086 True spatula 0.040836 False 42080.0 8691.0 6990257.0 134514.0
4 891327558926688256 2017-07-29 16:00:24 +0000 <a href="" r... This is Franklin. He would like you to stop ca... NaN 12 Franklin None None ... English_springer 0.225770 True German_short-haired_pointer 0.175219 True 40226.0 9452.0 6990927.0 134515.0
5 891087950875897856 2017-07-29 00:08:17 +0000 <a href="" r... Here we have a majestic great white breaching ... NaN 13 None None None ... Irish_terrier 0.116317 True Indian_elephant 0.076902 False 20173.0 3130.0 6990257.0 134514.0
6 890971913173991426 2017-07-28 16:27:12 +0000 <a href="" r... Meet Jax. He enjoys ice cream so much he gets ... NaN,ht... 13 Jax None None ... Border_collie 0.199287 True ice_lolly 0.193548 False 11820.0 2083.0 6990257.0 134514.0
7 890729181411237888 2017-07-28 00:22:40 +0000 <a href="" r... When you watch your owner call another dog a g... NaN 13 None None None ... Eskimo_dog 0.178406 True Pembroke 0.076507 True 65368.0 18984.0 6990257.0 134514.0
8 890609185150312448 2017-07-27 16:25:51 +0000 <a href="" r... This is Zoey. She doesn't want to be one of th... NaN 13 Zoey None None ... Irish_setter 0.193054 True Chesapeake_Bay_retriever 0.118184 True 27726.0 4282.0 6990257.0 134514.0
9 890240255349198849 2017-07-26 15:59:51 +0000 <a href="" r... This is Cassie. She is a college pup. Studying... NaN 14 Cassie doggo None ... Cardigan 0.451038 True Chihuahua 0.029248 True 31871.0 7453.0 6990257.0 134514.0
10 890006608113172480 2017-07-26 00:31:25 +0000 <a href="" r... This is Koda. He is a South Australian decksha... NaN 13 Koda None None ... Pomeranian 0.013884 True chow 0.008167 True 30586.0 7367.0 6990257.0 134514.0
11 889880896479866881 2017-07-25 16:11:53 +0000 <a href="" r... This is Bruno. He is a service shark. Only get... NaN 13 Bruno None None ... Labrador_retriever 0.151317 True muzzle 0.082981 False 27728.0 4993.0 6990257.0 134514.0
12 889665388333682689 2017-07-25 01:55:32 +0000 <a href="" r... Here's a puppo that seems to be on the fence a... NaN 13 None None None ... Cardigan 0.027356 True basenji 0.004633 True 48017.0 10111.0 6990257.0 134514.0
13 889638837579907072 2017-07-25 00:10:02 +0000 <a href="" r... This is Ted. He does his best. Sometimes that'... NaN 12 Ted None None ... boxer 0.002129 True Staffordshire_bullterrier 0.001498 True 27116.0 4567.0 6990927.0 134515.0
14 889531135344209921 2017-07-24 17:02:04 +0000 <a href="" r... This is Stuart. He's sporting his favorite fan... NaN 13 Stuart None None ... Labrador_retriever 0.013834 True redbone 0.007958 True 15066.0 2243.0 6990257.0 134514.0
15 889278841981685760 2017-07-24 00:19:32 +0000 <a href="" r... This is Oliver. You're witnessing one of his m... NaN 13 Oliver None None ... borzoi 0.194742 True Saluki 0.027351 True 25247.0 5452.0 6990257.0 134514.0
16 888917238123831296 2017-07-23 00:22:39 +0000 <a href="" r... This is Jim. He found a fren. Taught him how t... NaN 12 Jim None None ... Tibetan_mastiff 0.120184 True Labrador_retriever 0.105506 True 29014.0 4518.0 6990257.0 134514.0
17 888804989199671297 2017-07-22 16:56:37 +0000 <a href="" r... This is Zeke. He has a new stick. Very proud o... NaN 13 Zeke None None ... Labrador_retriever 0.184172 True English_setter 0.073482 True 25536.0 4364.0 6990257.0 134514.0
18 888554962724278272 2017-07-22 00:23:06 +0000 <a href="" r... This is Ralphus. He's powering up. Attempting ... NaN 13 Ralphus None None ... Eskimo_dog 0.166511 True malamute 0.111411 True 19859.0 3597.0 6990257.0 134514.0
19 888202515573088257 2017-07-21 01:02:36 +0000 <a href="" r... RT @dog_rates: This is Canela. She attempted s... 4.196984e+09 13 Canela None None ... Rhodesian_ridgeback 0.054950 True beagle 0.038915 True NaN NaN NaN NaN
20 888078434458587136 2017-07-20 16:49:33 +0000 <a href="" r... This is Gerald. He was just told he didn't get... NaN 12 Gerald None None ... pug 0.000932 True bull_mastiff 0.000903 True 21707.0 3511.0 6990257.0 134514.0
21 887705289381826560 2017-07-19 16:06:48 +0000 <a href="" r... This is Jeffrey. He has a monopoly on the pool... NaN 13 Jeffrey None None ... redbone 0.087582 True Weimaraner 0.026236 True 30118.0 5417.0 6990257.0 134514.0
22 887517139158093824 2017-07-19 03:39:09 +0000 <a href="" r... I've yet to rate a Venezuelan Hover Wiener. Th... NaN 14 such None None ... tow_truck 0.029175 False shopping_cart 0.026321 False 46129.0 11719.0 6990257.0 134514.0
23 887473957103951883 2017-07-19 00:47:34 +0000 <a href="" r... This is Canela. She attempted some fancy porch... NaN 13 Canela None None ... Rhodesian_ridgeback 0.054950 True beagle 0.038915 True 68965.0 18311.0 6990257.0 134514.0
24 887343217045368832 2017-07-18 16:08:03 +0000 <a href="" r... You may not have known you needed to see this ... NaN 13 None None None ... sea_lion 0.275645 False Weimaraner 0.134203 True 33602.0 10451.0 6990257.0 134514.0
25 887101392804085760 2017-07-18 00:07:08 +0000 <a href="" r... This... is a Jubilant Antarctic House Bear. We... NaN 12 None None None ... Eskimo_dog 0.035029 True Staffordshire_bullterrier 0.029705 True 30459.0 5989.0 6990257.0 134514.0
26 886983233522544640 2017-07-17 16:17:36 +0000 <a href="" r... This is Maya. She's very shy. Rarely leaves he... NaN 13 Maya None None ... toy_terrier 0.143528 True can_opener 0.032253 False 35070.0 7809.0 6990257.0 134514.0
27 886736880519319552 2017-07-16 23:58:41 +0000 <a href="" r... This is Mingus. He's a wonderful father to his... NaN,https:/... 13 Mingus None None ... Great_Pyrenees 0.186136 True Dandie_Dinmont 0.086346 True 12042.0 3312.0 6990257.0 134514.0
28 886680336477933568 2017-07-16 20:14:00 +0000 <a href="" r... This is Derek. He's late for a dog meeting. 13... NaN 13 Derek None None ... sports_car 0.139952 False car_wheel 0.044173 False 22368.0 4489.0 6990257.0 134514.0
29 886366144734445568 2017-07-15 23:25:31 +0000 <a href="" r... This is Roscoe. Another pupper fallen victim t... NaN 12 Roscoe None None ... Chihuahua 0.000361 True Boston_bull 0.000076 True 21148.0 3210.0 6990257.0 134514.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2326 666411507551481857 2015-11-17 00:24:19 +0000 <a href="" r... This is quite the dog. Gets really excited whe... NaN 2 quite None None ... barracouta 0.271485 False gar 0.189945 False 448.0 328.0 6990283.0 134514.0
2327 666407126856765440 2015-11-17 00:06:54 +0000 <a href="" r... This is a southern Vesuvius bumblegruff. Can d... NaN 7 None None None ... bloodhound 0.244220 True flat-coated_retriever 0.173810 True 110.0 41.0 6990283.0 134514.0
2328 666396247373291520 2015-11-16 23:23:41 +0000 <a href="" r... Oh goodness. A super rare northeast Qdoba kang... NaN 9 None None None ... toy_terrier 0.009397 True papillon 0.004577 True 166.0 86.0 6990283.0 134514.0
2329 666373753744588802 2015-11-16 21:54:18 +0000 <a href="" r... Those are sunglasses and a jean jacket. 11/10 ... NaN 11 None None None ... Afghan_hound 0.259551 True briard 0.206803 True 189.0 93.0 6990283.0 134514.0
2330 666362758909284353 2015-11-16 21:10:36 +0000 <a href="" r... Unique dog here. Very small. Lives in containe... NaN 6 None None None ... skunk 0.002402 False hamster 0.000461 False 779.0 574.0 6990283.0 134514.0
2331 666353288456101888 2015-11-16 20:32:58 +0000 <a href="" r... Here we have a mixed Asiago from the Galápagos... NaN 8 None None None ... Siberian_husky 0.147655 True Eskimo_dog 0.093412 True 221.0 73.0 6990283.0 134514.0
2332 666345417576210432 2015-11-16 20:01:42 +0000 <a href="" r... Look at this jokester thinking seat belt laws ... NaN 10 None None None ... Chesapeake_Bay_retriever 0.054787 True Labrador_retriever 0.014241 True 298.0 139.0 6990283.0 134514.0
2333 666337882303524864 2015-11-16 19:31:45 +0000 <a href="" r... This is an extremely rare horned Parthenon. No... NaN 9 None None None ... Newfoundland 0.278407 True groenendael 0.102643 True 199.0 92.0 6990283.0 134514.0
2334 666293911632134144 2015-11-16 16:37:02 +0000 <a href="" r... This is a funny dog. Weird toes. Won't come do... NaN 3 None None None ... otter 0.015250 False great_grey_owl 0.013207 False 509.0 357.0 6990283.0 134514.0
2335 666287406224695296 2015-11-16 16:11:11 +0000 <a href="" r... This is an Albanian 3 1/2 legged Episcopalian... NaN 1 None None None ... toy_poodle 0.063064 True miniature_poodle 0.025581 True 148.0 66.0 6990283.0 134514.0
2336 666273097616637952 2015-11-16 15:14:19 +0000 <a href="" r... Can take selfies 11/10 NaN 11 None None None ... toy_terrier 0.111884 True basenji 0.111152 True 175.0 76.0 6990283.0 134514.0
2337 666268910803644416 2015-11-16 14:57:41 +0000 <a href="" r... Very concerned about fellow dog trapped in com... NaN 10 None None None ... desk 0.085547 False bookcase 0.079480 False 104.0 35.0 6990283.0 134514.0
2338 666104133288665088 2015-11-16 04:02:55 +0000 <a href="" r... Not familiar with this breed. No tail (weird).... NaN 1 None None None ... cock 0.033919 False partridge 0.000052 False 14355.0 6637.0 6990283.0 134514.0
2339 666102155909144576 2015-11-16 03:55:04 +0000 <a href="" r... Oh my. Here you are seeing an Adobe Setter giv... NaN 11 None None None ... Newfoundland 0.149842 True borzoi 0.133649 True 80.0 13.0 6990283.0 134514.0
2340 666099513787052032 2015-11-16 03:44:34 +0000 <a href="" r... Can stand on stump for what seems like a while... NaN 8 None None None ... Shih-Tzu 0.166192 True Dandie_Dinmont 0.089688 True 156.0 68.0 6990283.0 134514.0
2341 666094000022159362 2015-11-16 03:22:39 +0000 <a href="" r... This appears to be a Mongolian Presbyterian mi... NaN 9 None None None ... German_shepherd 0.078260 True malinois 0.075628 True 164.0 74.0 6990283.0 134514.0
2342 666082916733198337 2015-11-16 02:38:37 +0000 <a href="" r... Here we have a well-established sunblockerspan... NaN 6 None None None ... bull_mastiff 0.404722 True French_bulldog 0.048960 True 119.0 45.0 6990283.0 134514.0
2343 666073100786774016 2015-11-16 01:59:36 +0000 <a href="" r... Let's hope this flight isn't Malaysian (lol). ... NaN 10 None None None ... English_foxhound 0.175382 True Ibizan_hound 0.097471 True 322.0 164.0 6990283.0 134514.0
2344 666071193221509120 2015-11-16 01:52:02 +0000 <a href="" r... Here we have a northern speckled Rhododendron.... NaN 9 None None None ... Yorkshire_terrier 0.174201 True Pekinese 0.109454 True 148.0 62.0 6990283.0 134514.0
2345 666063827256086533 2015-11-16 01:22:45 +0000 <a href="" r... This is the happiest dog you will ever see. Ve... NaN 10 None None None ... Tibetan_mastiff 0.093718 True Labrador_retriever 0.072427 True 476.0 220.0 6990283.0 134514.0
2346 666058600524156928 2015-11-16 01:01:59 +0000 <a href="" r... Here is the Rand Paul of retrievers folks! He'... NaN 8 None None None ... komondor 0.192305 True soft-coated_wheaten_terrier 0.082086 True 112.0 57.0 6990283.0 134514.0
2347 666057090499244032 2015-11-16 00:55:59 +0000 <a href="" r... My oh my. This is a rare blond Canadian terrie... NaN 9 None None None ... shopping_basket 0.014594 False golden_retriever 0.007959 True 298.0 142.0 6990283.0 134514.0
2348 666055525042405380 2015-11-16 00:49:46 +0000 <a href="" r... Here is a Siberian heavily armored polar bear ... NaN 10 None None None ... Tibetan_mastiff 0.058279 True fur_coat 0.054449 False 434.0 252.0 6990283.0 134514.0
2349 666051853826850816 2015-11-16 00:35:11 +0000 <a href="" r... This is an odd dog. Hard on the outside but lo... NaN 2 None None None ... mud_turtle 0.045885 False terrapin 0.017885 False 1225.0 853.0 6990283.0 134514.0
2350 666050758794694657 2015-11-16 00:30:50 +0000 <a href="" r... This is a truly beautiful English Wilson Staff... NaN 10 None None None ... English_springer 0.263788 True Greater_Swiss_Mountain_dog 0.016199 True 132.0 58.0 6990283.0 134514.0
2351 666049248165822465 2015-11-16 00:24:50 +0000 <a href="" r... Here we have a 1949 1st generation vulpix. Enj... NaN 5 None None None ... Rottweiler 0.243682 True Doberman 0.154629 True 109.0 41.0 6990284.0 134514.0
2352 666044226329800704 2015-11-16 00:04:52 +0000 <a href="" r... This is a purebred Piers Morgan. Loves to Netf... NaN 6 None None None ... redbone 0.360687 True miniature_pinscher 0.222752 True 298.0 141.0 6990284.0 134514.0
2353 666033412701032449 2015-11-15 23:21:54 +0000 <a href="" r... Here is a very happy pup. Big fan of well-main... NaN 9 None None None ... malinois 0.138584 True bloodhound 0.116197 True 125.0 45.0 6990284.0 134514.0
2354 666029285002620928 2015-11-15 23:05:30 +0000 <a href="" r... This is a western brown Mitsubishi terrier. Up... NaN 7 None None None ... miniature_pinscher 0.074192 True Rhodesian_ridgeback 0.072010 True 129.0 47.0 6990284.0 134514.0
2355 666020888022790149 2015-11-15 22:32:08 +0000 <a href="" r... Here we have a Japanese Irish Setter. Lost eye... NaN 8 None None None ... collie 0.156665 True Shetland_sheepdog 0.061428 True 2560.0 517.0 6990284.0 134514.0
2356 rows × 27 columns
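The `info()` output shows 2345 non-null values for `favorites` and `retweets` against 2356 rows, and row 19 (a retweet, "RT @dog_rates: This is Canela...") has NaN in those columns while the original Canela tweet appears again in row 23. Before plotting engagement, one option is to keep only original tweets that have counts. This is a minimal sketch, assuming a non-null `retweeted_status_user_id` marks a retweet; the helper `original_tweets` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd


def original_tweets(df: pd.DataFrame) -> pd.DataFrame:
    """Keep original tweets (no retweet id) that have engagement counts.

    Assumption: a non-null `retweeted_status_user_id` marks a retweet.
    """
    is_original = df["retweeted_status_user_id"].isnull()
    has_counts = df["favorites"].notnull() & df["retweets"].notnull()
    return df[is_original & has_counts]


# Hypothetical toy frame copied from rows 18, 19 and 23 of df_master
toy = pd.DataFrame({
    "tweet_id": [888554962724278272, 888202515573088257, 887473957103951883],
    "retweeted_status_user_id": [np.nan, 4.196984e+09, np.nan],
    "favorites": [19859.0, np.nan, 68965.0],
    "retweets": [3597.0, np.nan, 18311.0],
})
print(original_tweets(toy)["tweet_id"].tolist())
# → [888554962724278272, 887473957103951883]
```

On the real data this would be called as `df_plot = original_tweets(df_master)` before any of the plots below.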
In [71]:
# Draw a scatter plot with tweet_id and the number of retweets as axes
df_master.plot(kind = 'scatter', x = 'tweet_id', y = 'retweets', alpha = 0.5, color = 'red')
plt.title('Tweet Id vs Retweet Scatter plot')
[Figure: Tweet Id vs Retweet Scatter plot]
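Using `tweet_id` on the x-axis works as a rough time axis because tweet IDs follow Twitter's open-sourced Snowflake scheme: the bits above the lowest 22 hold a millisecond timestamp counted from a custom epoch of 1288834974657 ms. The sketch below (the helper `snowflake_to_datetime` is hypothetical, and the constants come from the Snowflake scheme, not from this notebook) recovers that time and checks it against the `timestamp` of row 0.

```python
import pandas as pd

# Snowflake epoch in milliseconds (2010-11-04 01:42:54.657 UTC)
TWITTER_EPOCH_MS = 1288834974657


def snowflake_to_datetime(tweet_id: int) -> pd.Timestamp:
    """Recover the creation time encoded in the high bits of a Snowflake tweet ID."""
    ms_since_epoch = (int(tweet_id) >> 22) + TWITTER_EPOCH_MS
    return pd.to_datetime(ms_since_epoch, unit="ms", utc=True)


# Row 0 of df_master: tweet 892420643555336193, timestamp 2017-08-01 16:23:56 +0000
print(snowflake_to_datetime(892420643555336193))  # same second as the archived timestamp
```

Applied to the whole column, `df_master['tweet_id'].apply(snowflake_to_datetime)` should give times that track the `timestamp` column, which is why larger IDs sit further right in time.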
In [72]:
# Draw a scatter plot with tweet_id and the number of user favourites as axes
df_master.plot(kind = 'scatter', x = 'tweet_id', y = 'user_favourites', alpha = 1, color = 'red')
plt.title('Tweet and user favorites Scatter plot')
[Figure: Tweet and user favorites Scatter plot]
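In the rows shown, `user_followers` stays between 6990257 and 6990927 and `user_favourites` is only ever 134514 or 134515. That suggests (an assumption, not verified here) that these are counts for the @dog_rates account itself rather than per-tweet values, which would make this scatter nearly flat. A quick check is to compare the spread of each column; the helper `column_spread` and the toy values are hypothetical.

```python
import pandas as pd


def column_spread(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return min, max, number of distinct values and range for each column."""
    stats = pd.DataFrame({
        "min": df[columns].min(),
        "max": df[columns].max(),
        "n_unique": df[columns].nunique(),
    })
    stats["range"] = stats["max"] - stats["min"]
    return stats


# Hypothetical toy values copied from a few rows of df_master
toy = pd.DataFrame({
    "favorites": [38693.0, 33169.0, 110.0],
    "user_favourites": [134514.0, 134514.0, 134515.0],
})
print(column_spread(toy, ["favorites", "user_favourites"]))
```

A range of 1 for `user_favourites` next to a range in the tens of thousands for `favorites` would support plotting `favorites` instead.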
In [73]:
# Draw a bar chart of the 5 most favorited tweets
test = df_master.sort_values(['favorites'], ascending=False)
test1 = test.head(5)

test1.plot(x='tweet_id', y='favorites', kind='bar')
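`DataFrame.nlargest` picks the top rows by a column in one call, without sorting the whole frame or touching the private `_slice` method. Here is a sketch on a hypothetical toy frame (values copied from rows 0 to 5), rendered off-screen so it also runs outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for df_master
toy = pd.DataFrame({
    "tweet_id": [892420643555336193, 892177421306343426, 891815181378084864,
                 891689557279858688, 891327558926688256, 891087950875897856],
    "favorites": [38693.0, 33169.0, 24965.0, 42080.0, 40226.0, 20173.0],
})

top5 = toy.nlargest(5, "favorites")  # already sorted in descending order
ax = top5.plot(x="tweet_id", y="favorites", kind="bar", legend=False, rot=45)
ax.set_ylabel("favorites")
ax.set_title("5 most favorited tweets")
plt.tight_layout()
```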
In [74]:
# Draw a bar chart of the 5 most retweeted tweets
test = df_master.sort_values(['retweets'], ascending=False)
test1 = test.head(5)

test1.plot(x='tweet_id', y='retweets', kind='bar')
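Because `DataFrame.plot` accepts a list for `y`, the retweet and favorite counts of the most retweeted tweets can share one grouped bar chart, which makes the two engagement measures easy to compare. A sketch on hypothetical toy rows (copied from rows 0 to 5):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for df_master
toy = pd.DataFrame({
    "tweet_id": [892420643555336193, 892177421306343426, 891815181378084864,
                 891689557279858688, 891327558926688256, 891087950875897856],
    "favorites": [38693.0, 33169.0, 24965.0, 42080.0, 40226.0, 20173.0],
    "retweets": [8558.0, 6294.0, 4176.0, 8691.0, 9452.0, 3130.0],
})

top5 = toy.nlargest(5, "retweets")
# One group of two bars (retweets, favorites) per tweet
ax = top5.plot(x="tweet_id", y=["retweets", "favorites"], kind="bar", rot=45)
ax.set_title("Retweets and favorites for the 5 most retweeted tweets")
plt.tight_layout()
```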
In [75]:
# Display the top 5 dog breeds (from the category2 prediction column)
df_master['category2'].value_counts().head(5)
Labrador_retriever    104
golden_retriever       92
Cardigan               73
Chihuahua              44
Pomeranian             42
Name: category2, dtype: int64
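The counts above come from `category2`, which also holds non-dog labels such as `bagel`, `banana` and `spatula` (visible in the table). Filtering on the companion flag `cat2_dog` first keeps only predictions the classifier marked as dogs. This sketch assumes `cat2_dog` holds booleans with NaN for tweets without an image (`info()` reports it as `object`, which is typical for such a column); the helper `top_dog_breeds` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd


def top_dog_breeds(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Count category2 labels, keeping only rows flagged as dogs in cat2_dog."""
    # eq(True) treats NaN (no image prediction) and False the same way
    is_dog = df["cat2_dog"].eq(True)
    return df.loc[is_dog, "category2"].value_counts().head(n)


# Hypothetical toy rows
toy = pd.DataFrame({
    "category2": ["bagel", "Pekinese", "malamute", "Labrador_retriever",
                  "Labrador_retriever", "spatula", np.nan],
    "cat2_dog": [False, True, True, True, True, False, np.nan],
})
print(top_dog_breeds(toy))
```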
In [76]:
# Keep the top 5 breeds and relabel every other breed (and missing values) as 'Others'
df1 = df_master.copy()
top5 = ['Labrador_retriever', 'golden_retriever', 'Cardigan', 'Chihuahua', 'Pomeranian']
df1.loc[~df1['category2'].isin(top5), 'category2'] = 'Others'
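The five breed names are hard-coded from the previous cell. A variation that derives them from the data, so the grouping stays correct if the counts change, could look like this; `lump_to_others` is a hypothetical helper. Like the `.loc` mask, it also turns missing values into `'Others'`.

```python
import numpy as np
import pandas as pd


def lump_to_others(labels: pd.Series, n: int = 5, other: str = "Others") -> pd.Series:
    """Keep the n most frequent labels and replace everything else with `other`."""
    top = labels.value_counts().head(n).index  # value_counts drops NaN
    # where() keeps values where the condition holds; NaN fails isin(),
    # so missing labels are replaced with `other` as well
    return labels.where(labels.isin(top), other)


s = pd.Series(["a", "a", "a", "b", "b", "c", "d", np.nan])
print(lump_to_others(s, n=2).tolist())
# → ['a', 'a', 'a', 'b', 'b', 'Others', 'Others', 'Others']
```

On the notebook's data the equivalent call would be `df1['category2'] = lump_to_others(df_master['category2'], n=5)`.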
In [77]:

# Plot a pie chart of the dog breed distribution
df1[df1['category2'].notnull()]['category2'].value_counts().plot(kind = 'pie', autopct='%1.1f%%')
plt.title('Dog breed distribution')
[Figure: Dog breed distribution]
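The `autopct='%1.1f%%'` slice labels are the same shares that `value_counts(normalize=True)` returns, so printing them next to the pie gives the exact numbers behind each slice. A sketch on a hypothetical toy series:

```python
import pandas as pd

# Hypothetical toy labels after the 'Others' grouping
lumped = pd.Series(["Others"] * 6 + ["Labrador_retriever"] * 3 + ["Cardigan"])

# Percentage per label, rounded to one decimal like the pie's autopct labels
shares = lumped.value_counts(normalize=True).mul(100).round(1)
print(shares)
```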
In [78]:
# Demonstration of seaborn: scatter plot of retweets by tweet_id (no regression line)

sns.lmplot(x='tweet_id', y='retweets', data=df_master, fit_reg=False)
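The favorite and retweet counts range from double digits to tens of thousands, so a scatter of one against the other on log axes, with the Pearson correlation in the title, is one way to see how closely the two engagement measures move together. This sketch uses plain pandas/matplotlib and hypothetical toy rows copied from `df_master`.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy rows copied from df_master (favorites, retweets)
toy = pd.DataFrame({
    "favorites": [38693.0, 33169.0, 24965.0, 448.0, 110.0, 14355.0],
    "retweets": [8558.0, 6294.0, 4176.0, 328.0, 41.0, 6637.0],
})

r = toy["favorites"].corr(toy["retweets"])  # Pearson correlation by default
ax = toy.plot(kind="scatter", x="favorites", y="retweets", alpha=0.5)
ax.set_xscale("log")  # counts span several orders of magnitude
ax.set_yscale("log")
ax.set_title(f"Retweets vs favorites (Pearson r = {r:.2f})")
plt.tight_layout()
```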
Visualization section ends here.

