Getting started with data analysis in Python: analyzing the usagov_bitly_data dataset
Extracting the time zones from the data and counting them
path = 'ch02/usagov_bitly_data2012-03-16-1331923249.txt'
open(path).readline()
'{ "a": "Mozilla\\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\\/535.11 (KHTML, like Gecko) Chrome\\/17.0.963.78 Safari\\/535.11", "c": "US", "nk": 1, "tz": "America\\/New_York", "gr": "MA", "g": "A6qOVH", "h": "wfLQtf", "l": "orofrog", "al": "en-US,en;q=0.8", "hh": "1.usa.gov", "r": "http:\\/\\/www.facebook.com\\/l\\/7AQEFzjSi\\/1.usa.gov\\/wfLQtf", "u": "http:\\/\\/www.ncbi.nlm.nih.gov\\/pubmed\\/22415991", "t": 1331923247, "hc": 1331822918, "cy": "Danvers", "ll": [ 42.576698, -70.954903 ] }\n'
Parsing the data from JSON
import json
path = 'ch02/usagov_bitly_data2012-03-16-1331923249.txt'
records = [json.loads(line) for line in open(path)]  # list comprehension
records[0]
{u'a': u'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.78 Safari/535.11',u'al': u'en-US,en;q=0.8',u'c': u'US',u'cy': u'Danvers',u'g': u'A6qOVH',u'gr': u'MA',u'h': u'wfLQtf',u'hc': 1331822918,u'hh': u'1.usa.gov',u'l': u'orofrog',u'll': [42.576698, -70.954903],u'nk': 1,u'r': u'http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/wfLQtf',u't': 1331923247,u'tz': u'America/New_York',u'u': u'http://www.ncbi.nlm.nih.gov/pubmed/22415991'}
records[0]['tz']
u'America/New_York'
print(records[0]['tz'])
America/New_York
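The loading step above assumes every line in the file is a valid JSON object. As a minimal sketch (not part of the original walkthrough), the helper below skips blank lines before parsing. The name load_records and the two sample lines are made up for illustration, and an in-memory buffer stands in for the real file.

```python
import io
import json


def load_records(lines):
    """Parse an iterable of JSON lines into dicts, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # ignore empty lines instead of letting json.loads fail
            records.append(json.loads(line))
    return records


# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    '{"tz": "America/New_York", "c": "US"}\n'
    '\n'
    '{"tz": "", "c": null}\n'
)
records = load_records(sample)
print(len(records))
print(records[0]["tz"])
```

With the real data, the same function could be called on an open file object, ideally inside a with block so the file is closed afterwards.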
Counting time zones in pure Python
time_zones = [rec['tz'] for rec in records if 'tz' in rec]  # note: some records have no time zone field
time_zones[:10]
[u'America/New_York',u'America/Denver',u'America/New_York',u'America/Sao_Paulo',u'America/New_York',u'America/New_York',u'Europe/Warsaw',u'',u'',u'']
Defining a counting function
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts

from collections import defaultdict  # counting with the standard library

def get_counts2(sequence):
    counts = defaultdict(int)  # values will initialize to 0
    for x in sequence:
        counts[x] += 1
    return counts
counts = get_counts(time_zones)
counts['America/New_York']
1251
len(time_zones)
3440
# Return the n largest (count, time zone) pairs from the dictionary
def top_counts(count_dict, n=10):
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    return value_key_pairs[-n:]
top_counts(counts)
[(33, u'America/Sao_Paulo'),(35, u'Europe/Madrid'),(36, u'Pacific/Honolulu'),(37, u'Asia/Tokyo'),(74, u'Europe/London'),(191, u'America/Denver'),(382, u'America/Los_Angeles'),(400, u'America/Chicago'),(521, u''),(1251, u'America/New_York')]
# Get the top N counts with the standard library
from collections import Counter
counts = Counter(time_zones)
counts.most_common(10)
[(u'America/New_York', 1251),(u'', 521),(u'America/Chicago', 400),(u'America/Los_Angeles', 382),(u'America/Denver', 191),(u'Europe/London', 74),(u'Asia/Tokyo', 37),(u'Pacific/Honolulu', 36),(u'Europe/Madrid', 35),(u'America/Sao_Paulo', 33)]
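To check that the hand-written counter and Counter agree, the following sketch runs both on a tiny made-up list. The sample values are hypothetical, and this top_counts is a variant of the one above: it sorts by descending count and breaks ties by key, so the result does not depend on dictionary order.

```python
from collections import Counter


def get_counts(sequence):
    """Count occurrences with a plain dict."""
    counts = {}
    for x in sequence:
        counts[x] = counts.get(x, 0) + 1
    return counts


def top_counts(count_dict, n=10):
    """Return the n most frequent (key, count) pairs, largest first."""
    # Sort by count descending; ties are broken by key for a deterministic order.
    pairs = sorted(count_dict.items(), key=lambda kv: (-kv[1], kv[0]))
    return pairs[:n]


# Hypothetical sample; the empty string mimics records with an unknown time zone.
sample = ["America/New_York", "", "America/Chicago", "America/New_York", ""]
manual = get_counts(sample)
standard = Counter(sample)

print(manual == dict(standard))
print(top_counts(manual, n=2))
```

Unlike the version above, this variant returns (key, count) pairs in the same orientation as Counter.most_common.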
Counting time zones with pandas
%matplotlib inline
from __future__ import division
from numpy.random import randn
import numpy as np
import os
import matplotlib.pyplot as plt
import pandas as pd
plt.rc('figure', figsize=(10, 6))
np.set_printoptions(precision=4)
import json
path = 'ch02/usagov_bitly_data2012-03-16-1331923249.txt'
lines = open(path).readlines()
records = [json.loads(line) for line in lines]
from pandas import DataFrame, Series
import pandas as pd
frame = DataFrame(records)  # DataFrame is the basic data structure in pandas
frame
| | _heartbeat_ | a | al | c | cy | g | gr | h | hc | hh | kw | l | ll | nk | r | t | tz | u |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | en-US,en;q=0.8 | US | Danvers | A6qOVH | MA | wfLQtf | 1.331823e+09 | 1.usa.gov | NaN | orofrog | [42.576698, -70.954903] | 1.0 | http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/... | 1.331923e+09 | America/New_York | http://www.ncbi.nlm.nih.gov/pubmed/22415991 |
1 | NaN | GoogleMaps/RochesterNY | NaN | US | Provo | mwszkS | UT | mwszkS | 1.308262e+09 | j.mp | NaN | bitly | [40.218102, -111.613297] | 0.0 | http://www.AwareMap.com/ | 1.331923e+09 | America/Denver | http://www.monroecounty.gov/etc/911/rss.php |
2 | NaN | Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ... | en-US | US | Washington | xxr3Qb | DC | xxr3Qb | 1.331920e+09 | 1.usa.gov | NaN | bitly | [38.9007, -77.043098] | 1.0 | http://t.co/03elZC4Q | 1.331923e+09 | America/New_York | http://boxer.senate.gov/en/press/releases/0316... |
3 | NaN | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)... | pt-br | BR | Braz | zCaLwp | 27 | zUtuOu | 1.331923e+09 | 1.usa.gov | NaN | alelex88 | [-23.549999, -46.616699] | 0.0 | direct | 1.331923e+09 | America/Sao_Paulo | http://apod.nasa.gov/apod/ap120312.html |
4 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | en-US,en;q=0.8 | US | Shrewsbury | 9b6kNl | MA | 9b6kNl | 1.273672e+09 | bit.ly | NaN | bitly | [42.286499, -71.714699] | 0.0 | http://www.shrewsbury-ma.gov/selco/ | 1.331923e+09 | America/New_York | http://www.shrewsbury-ma.gov/egov/gallery/1341... |
5 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | en-US,en;q=0.8 | US | Shrewsbury | axNK8c | MA | axNK8c | 1.273673e+09 | bit.ly | NaN | bitly | [42.286499, -71.714699] | 0.0 | http://www.shrewsbury-ma.gov/selco/ | 1.331923e+09 | America/New_York | http://www.shrewsbury-ma.gov/egov/gallery/1341... |
6 | NaN | Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1... | pl-PL,pl;q=0.8,en-US;q=0.6,en;q=0.4 | PL | Luban | wcndER | 77 | zkpJBR | 1.331923e+09 | 1.usa.gov | NaN | bnjacobs | [51.116699, 15.2833] | 0.0 | http://plus.url.google.com/url?sa=z&n=13319232... | 1.331923e+09 | Europe/Warsaw | http://www.nasa.gov/mission_pages/nustar/main/... |
7 | NaN | Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/2... | bg,en-us;q=0.7,en;q=0.3 | None | NaN | wcndER | NaN | zkpJBR | 1.331923e+09 | 1.usa.gov | NaN | bnjacobs | NaN | 0.0 | http://www.facebook.com/ | 1.331923e+09 | | http://www.nasa.gov/mission_pages/nustar/main/... |
8 | NaN | Opera/9.80 (X11; Linux zbov; U; en) Presto/2.1... | en-US, en | None | NaN | wcndER | NaN | zkpJBR | 1.331923e+09 | 1.usa.gov | NaN | bnjacobs | NaN | 0.0 | http://www.facebook.com/l.php?u=http%3A%2F%2F1... | 1.331923e+09 | | http://www.nasa.gov/mission_pages/nustar/main/... |
9 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4 | None | NaN | zCaLwp | NaN | zUtuOu | 1.331923e+09 | 1.usa.gov | NaN | alelex88 | NaN | 0.0 | http://t.co/o1Pd0WeV | 1.331923e+09 | | http://apod.nasa.gov/apod/ap120312.html |
10 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2)... | en-us,en;q=0.5 | US | Seattle | vNJS4H | WA | u0uD9q | 1.319564e+09 | 1.usa.gov | NaN | o_4us71ccioa | [47.5951, -122.332603] | 1.0 | direct | 1.331923e+09 | America/Los_Angeles | https://www.nysdot.gov/rexdesign/design/commun... |
11 | NaN | Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4... | en-us,en;q=0.5 | US | Washington | wG7OIH | DC | A0nRz4 | 1.331816e+09 | 1.usa.gov | NaN | darrellissa | [38.937599, -77.092796] | 0.0 | http://t.co/ND7SoPyo | 1.331923e+09 | America/New_York | http://oversight.house.gov/wp-content/uploads/... |
12 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2)... | en-us,en;q=0.5 | US | Alexandria | vNJS4H | VA | u0uD9q | 1.319564e+09 | 1.usa.gov | NaN | o_4us71ccioa | [38.790901, -77.094704] | 1.0 | direct | 1.331923e+09 | America/New_York | https://www.nysdot.gov/rexdesign/design/commun... |
13 | 1.331923e+09 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
14 | NaN | Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US... | en-us,en;q=0.5 | US | Marietta | 2rOUYc | GA | 2rOUYc | 1.255770e+09 | 1.usa.gov | NaN | bitly | [33.953201, -84.5177] | 1.0 | direct | 1.331923e+09 | America/New_York | http://toxtown.nlm.nih.gov/index.php |
15 | NaN | Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1... | zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4 | HK | Central District | nQvgJp | 00 | rtrrth | 1.317318e+09 | j.mp | NaN | walkeryuen | [22.2833, 114.150002] | 1.0 | http://forum2.hkgolden.com/view.aspx?type=BW&m... | 1.331923e+09 | Asia/Hong_Kong | http://www.ssd.noaa.gov/PS/TROP/TCFP/data/curr... |
16 | NaN | Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1... | zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4 | HK | Central District | XdUNr | 00 | qWkgbq | 1.317318e+09 | j.mp | NaN | walkeryuen | [22.2833, 114.150002] | 1.0 | http://forum2.hkgolden.com/view.aspx?type=BW&m... | 1.331923e+09 | Asia/Hong_Kong | http://www.usno.navy.mil/NOOC/nmfc-ph/RSS/jtwc... |
17 | NaN | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; r... | en-us,en;q=0.5 | US | Buckfield | zH1BFf | ME | x3jOIv | 1.331840e+09 | 1.usa.gov | NaN | andyzieminski | [44.299702, -70.369797] | 0.0 | http://t.co/6Cx4ROLs | 1.331923e+09 | America/New_York | http://www.usda.gov/wps/portal/usda/usdahome?c... |
18 | NaN | GoogleMaps/RochesterNY | NaN | US | Provo | mwszkS | UT | mwszkS | 1.308262e+09 | 1.usa.gov | NaN | bitly | [40.218102, -111.613297] | 0.0 | http://www.AwareMap.com/ | 1.331923e+09 | America/Denver | http://www.monroecounty.gov/etc/911/rss.php |
19 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4 | IT | Venice | wcndER | 20 | zkpJBR | 1.331923e+09 | 1.usa.gov | NaN | bnjacobs | [45.438599, 12.3267] | 0.0 | http://www.facebook.com/ | 1.331923e+09 | Europe/Rome | http://www.nasa.gov/mission_pages/nustar/main/... |
20 | NaN | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... | es-ES | ES | Alcal | zQ95Hi | 51 | ytZYWR | 1.331671e+09 | bitly.com | NaN | jplnews | [37.516701, -5.9833] | 0.0 | http://www.facebook.com/ | 1.331923e+09 | Africa/Ceuta | http://voyager.jpl.nasa.gov/imagesvideo/uranus... |
21 | NaN | Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6... | en-us,en;q=0.5 | US | Davidsonville | wcndER | MD | zkpJBR | 1.331923e+09 | 1.usa.gov | NaN | bnjacobs | [38.939201, -76.635002] | 0.0 | http://www.facebook.com/ | 1.331923e+09 | America/New_York | http://www.nasa.gov/mission_pages/nustar/main/... |
22 | NaN | Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ... | en-us | US | Hockessin | y3ZImz | DE | y3ZImz | 1.331064e+09 | 1.usa.gov | NaN | bitly | [39.785, -75.682297] | 0.0 | direct | 1.331923e+09 | America/New_York | http://portal.hud.gov/hudportal/documents/hudd... |
23 | NaN | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3)... | en-us | US | Lititz | wWiOiD | PA | wWiOiD | 1.330218e+09 | 1.usa.gov | NaN | bitly | [40.174999, -76.3078] | 0.0 | http://www.facebook.com/l.php?u=http%3A%2F%2F1... | 1.331923e+09 | America/New_York | http://www.tricare.mil/mybenefit/ProfileFilter... |
24 | NaN | Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES... | es-es,es;q=0.8,en-us;q=0.5,en;q=0.3 | ES | Bilbao | wcndER | 59 | zkpJBR | 1.331923e+09 | 1.usa.gov | NaN | bnjacobs | [43.25, -2.9667] | 0.0 | http://www.facebook.com/ | 1.331923e+09 | Europe/Madrid | http://www.nasa.gov/mission_pages/nustar/main/... |
25 | NaN | Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1... | en-GB,en;q=0.8,en-US;q=0.6,en-AU;q=0.4 | MY | Kuala Lumpur | wcndER | 14 | zkpJBR | 1.331923e+09 | 1.usa.gov | NaN | bnjacobs | [3.1667, 101.699997] | 0.0 | http://www.facebook.com/ | 1.331923e+09 | Asia/Kuala_Lumpur | http://www.nasa.gov/mission_pages/nustar/main/... |
26 | NaN | Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1... | ro-RO,ro;q=0.8,en-US;q=0.6,en;q=0.4 | CY | Nicosia | wcndER | 04 | zkpJBR | 1.331923e+09 | 1.usa.gov | NaN | bnjacobs | [35.166698, 33.366699] | 0.0 | http://www.facebook.com/?ref=tn_tnmn | 1.331923e+09 | Asia/Nicosia | http://www.nasa.gov/mission_pages/nustar/main/... |
27 | NaN | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)... | en-US,en;q=0.8 | BR | SPaulo | zCaLwp | 27 | zUtuOu | 1.331923e+09 | 1.usa.gov | NaN | alelex88 | [-23.5333, -46.616699] | 0.0 | direct | 1.331923e+09 | America/Sao_Paulo | http://apod.nasa.gov/apod/ap120312.html |
28 | NaN | Mozilla/5.0 (iPad; CPU OS 5_0_1 like Mac OS X)... | en-us | None | NaN | vNJS4H | NaN | u0uD9q | 1.319564e+09 | 1.usa.gov | NaN | o_4us71ccioa | NaN | 0.0 | direct | 1.331923e+09 | | https://www.nysdot.gov/rexdesign/design/commun... |
29 | NaN | Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X... | en-us | None | NaN | FPX0IM | NaN | FPX0IL | 1.331923e+09 | 1.usa.gov | NaN | twittershare | NaN | 1.0 | http://t.co/5xlp0B34 | 1.331923e+09 | | http://www.ed.gov/news/media-advisories/us-dep... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3530 | NaN | Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.1... | en-US,en;q=0.8 | US | San Francisco | xVZg4P | CA | wqUkTo | 1.331908e+09 | go.nasa.gov | NaN | nasatwitter | [37.7645, -122.429398] | 0.0 | http://www.facebook.com/l.php?u=http%3A%2F%2Fg... | 1.331927e+09 | America/Los_Angeles | http://www.nasa.gov/multimedia/imagegallery/im... |
3531 | NaN | Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6... | en-US | None | NaN | wcndER | NaN | zkpJBR | 1.331923e+09 | 1.usa.gov | NaN | bnjacobs | NaN | 0.0 | direct | 1.331927e+09 | | http://www.nasa.gov/mission_pages/nustar/main/... |
3532 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2)... | en-us,en;q=0.5 | US | Washington | Au3aUS | DC | A9ct6C | 1.331926e+09 | 1.usa.gov | NaN | ncsha | [38.904202, -77.031998] | 1.0 | http://www.ncsha.org/ | 1.331927e+09 | America/New_York | http://portal.hud.gov/hudportal/HUD?src=/press... |
3533 | NaN | Mozilla/5.0 (iPad; CPU OS 5_1 like Mac OS X) A... | en-us | US | Jacksonville | b2UtUJ | FL | ieCdgH | 1.301393e+09 | go.nasa.gov | NaN | nasatwitter | [30.279301, -81.585098] | 1.0 | direct | 1.331927e+09 | America/New_York | http://apod.nasa.gov/apod/ |
3534 | NaN | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)... | en-us | US | Frisco | vNJS4H | TX | u0uD9q | 1.319564e+09 | 1.usa.gov | NaN | o_4us71ccioa | [33.149899, -96.855499] | 1.0 | direct | 1.331927e+09 | America/Chicago | https://www.nysdot.gov/rexdesign/design/commun... |
3535 | NaN | Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/... | en-us | US | Houston | zIgLx8 | TX | yrPaLt | 1.331903e+09 | aash.to | NaN | aashto | [29.775499, -95.415199] | 1.0 | direct | 1.331927e+09 | America/Chicago | http://ntl.bts.gov/lib/44000/44300/44374/FHWA-... |
3536 | NaN | Mozilla/5.0 (BlackBerry; U; BlackBerry 9800; e... | en-US,en;q=0.5 | None | NaN | xIcyim | NaN | yG1TTf | 1.331728e+09 | go.nasa.gov | NaN | nasatwitter | NaN | 0.0 | http://t.co/g1VKE8zS | 1.331927e+09 | | http://www.nasa.gov/mission_pages/hurricanes/a... |
3537 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2)... | es-es,es;q=0.8,en-us;q=0.5,en;q=0.3 | HN | Tegucigalpa | zCaLwp | 08 | w63FZW | 1.331547e+09 | 1.usa.gov | NaN | bufferapp | [14.1, -87.216698] | 0.0 | http://t.co/A8TJyibE | 1.331927e+09 | America/Tegucigalpa | http://apod.nasa.gov/apod/ap120312.html |
3538 | NaN | Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Ma... | en-us | US | Los Angeles | qMac9k | CA | qds1Ge | 1.310474e+09 | 1.usa.gov | NaN | healthypeople | [34.041599, -118.298798] | 0.0 | direct | 1.331927e+09 | America/Los_Angeles | http://healthypeople.gov/2020/connect/webinars... |
3539 | NaN | Mozilla/5.0 (compatible; Fedora Core 3) FC3 KDE | NaN | US | Bellevue | zu2M5o | WA | zDhdro | 1.331586e+09 | bit.ly | NaN | glimtwin | [47.615398, -122.210297] | 0.0 | direct | 1.331927e+09 | America/Los_Angeles | http://www.federalreserve.gov/newsevents/press... |
3540 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | en-US,en;q=0.8 | US | Payson | wcndER | UT | zkpJBR | 1.331923e+09 | 1.usa.gov | NaN | bnjacobs | [40.014198, -111.738899] | 0.0 | http://www.facebook.com/l.php?u=http%3A%2F%2F1... | 1.331927e+09 | America/Denver | http://www.nasa.gov/mission_pages/nustar/main/... |
3541 | NaN | Mozilla/5.0 (X11; U; OpenVMS AlphaServer_ES40;... | NaN | US | Bellevue | zu2M5o | WA | zDhdro | 1.331586e+09 | 1.usa.gov | NaN | glimtwin | [47.615398, -122.210297] | 0.0 | direct | 1.331927e+09 | America/Los_Angeles | http://www.federalreserve.gov/newsevents/press... |
3542 | NaN | Mozilla/5.0 (compatible; MSIE 9.0; Windows NT ... | en-us | US | Pittsburg | y3reI1 | CA | y3reI1 | 1.331926e+09 | 1.usa.gov | NaN | bitly | [38.0051, -121.838699] | 0.0 | http://www.facebook.com/l.php?u=http%3A%2F%2F1... | 1.331927e+09 | America/Los_Angeles | http://www.sba.gov/community/blogs/community-b... |
3543 | 1.331927e+09 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3544 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0.1) ... | en-us,en;q=0.5 | US | Wentzville | vNJS4H | MO | u0uD9q | 1.319564e+09 | 1.usa.gov | NaN | o_4us71ccioa | [38.790001, -90.854897] | 1.0 | direct | 1.331927e+09 | America/Chicago | https://www.nysdot.gov/rexdesign/design/commun... |
3545 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2)... | en-us,en;q=0.5 | US | Saint Charles | vNJS4H | IL | u0uD9q | 1.319564e+09 | 1.usa.gov | NaN | o_4us71ccioa | [41.9352, -88.290901] | 1.0 | direct | 1.331927e+09 | America/Chicago | https://www.nysdot.gov/rexdesign/design/commun... |
3546 | NaN | Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Ma... | en-us | US | Los Angeles | qMac9k | CA | qds1Ge | 1.310474e+09 | 1.usa.gov | NaN | healthypeople | [34.041599, -118.298798] | 1.0 | direct | 1.331927e+09 | America/Los_Angeles | http://healthypeople.gov/2020/connect/webinars... |
3547 | NaN | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)... | en-us | US | Silver Spring | y0jYkg | MD | y0jYkg | 1.331852e+09 | 1.usa.gov | NaN | bitly | [39.052101, -77.014999] | 1.0 | direct | 1.331927e+09 | America/New_York | http://www.epa.gov/otaq/regs/fuels/additive/e1... |
3548 | NaN | Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Ma... | en-us | US | Mcgehee | y5rMac | AR | xANY6O | 1.331916e+09 | 1.usa.gov | NaN | twitterfeed | [33.628399, -91.356903] | 1.0 | https://twitter.com/fdarecalls/status/18069759... | 1.331927e+09 | America/Chicago | http://www.fda.gov/Safety/Recalls/ucm296326.htm |
3549 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | sv-SE,sv;q=0.8,en-US;q=0.6,en;q=0.4 | SE | Sollefte | eH8wu | 24 | 7dtjei | 1.260316e+09 | 1.usa.gov | NaN | tweetdeckapi | [63.166698, 17.266701] | 1.0 | direct | 1.331927e+09 | Europe/Stockholm | http://www.nasa.gov/mission_pages/WISE/main/in... |
3550 | NaN | Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ... | en-us | US | Conshohocken | A00b72 | PA | yGSwzn | 1.331918e+09 | 1.usa.gov | NaN | addthis | [40.0798, -75.2855] | 0.0 | http://www.linkedin.com/home?trk=hb_tab_home_top | 1.331927e+09 | America/New_York | http://www.nlm.nih.gov/medlineplus/news/fullst... |
3551 | NaN | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | en-US,en;q=0.8 | None | NaN | wcndER | NaN | zkpJBR | 1.331923e+09 | 1.usa.gov | NaN | bnjacobs | NaN | 0.0 | http://plus.url.google.com/url?sa=z&n=13319268... | 1.331927e+09 | | http://www.nasa.gov/mission_pages/nustar/main/... |
3552 | NaN | Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US... | NaN | US | Decatur | rqgJuE | AL | xcz8vt | 1.331227e+09 | 1.usa.gov | NaN | bootsnall | [34.572701, -86.940598] | 0.0 | direct | 1.331927e+09 | America/Chicago | http://travel.state.gov/passport/passport_5535... |
3553 | NaN | Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ... | en-us | US | Shrewsbury | 9b6kNl | MA | 9b6kNl | 1.273672e+09 | bit.ly | NaN | bitly | [42.286499, -71.714699] | 0.0 | http://www.shrewsbury-ma.gov/selco/ | 1.331927e+09 | America/New_York | http://www.shrewsbury-ma.gov/egov/gallery/1341... |
3554 | NaN | Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ... | en-us | US | Shrewsbury | axNK8c | MA | axNK8c | 1.273673e+09 | bit.ly | NaN | bitly | [42.286499, -71.714699] | 0.0 | http://www.shrewsbury-ma.gov/selco/ | 1.331927e+09 | America/New_York | http://www.shrewsbury-ma.gov/egov/gallery/1341... |
3555 | NaN | Mozilla/4.0 (compatible; MSIE 9.0; Windows NT ... | en | US | Paramus | e5SvKE | NJ | fqPSr9 | 1.301298e+09 | 1.usa.gov | NaN | tweetdeckapi | [40.9445, -74.07] | 1.0 | direct | 1.331927e+09 | America/New_York | http://www.fda.gov/AdvisoryCommittees/Committe... |
3556 | NaN | Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1... | en-US,en;q=0.8 | US | Oklahoma City | jQLtP4 | OK | jQLtP4 | 1.307530e+09 | 1.usa.gov | NaN | bitly | [35.4715, -97.518997] | 0.0 | http://www.facebook.com/l.php?u=http%3A%2F%2F1... | 1.331927e+09 | America/Chicago | http://www.okc.gov/PublicNotificationSystem/Fo... |
3557 | NaN | GoogleMaps/RochesterNY | NaN | US | Provo | mwszkS | UT | mwszkS | 1.308262e+09 | j.mp | NaN | bitly | [40.218102, -111.613297] | 0.0 | http://www.AwareMap.com/ | 1.331927e+09 | America/Denver | http://www.monroecounty.gov/etc/911/rss.php |
3558 | NaN | GoogleProducer | NaN | US | Mountain View | zjtI4X | CA | zjtI4X | 1.327529e+09 | 1.usa.gov | NaN | bitly | [37.419201, -122.057404] | 0.0 | direct | 1.331927e+09 | America/Los_Angeles | http://www.ahrq.gov/qual/qitoolkit/ |
3559 | NaN | Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ... | en-US | US | Mc Lean | qxKrTK | VA | qxKrTK | 1.312898e+09 | 1.usa.gov | NaN | bitly | [38.935799, -77.162102] | 0.0 | http://t.co/OEEEvwjU | 1.331927e+09 | America/New_York | http://herndon-va.gov/Content/public_safety/Pu... |
3560 rows × 18 columns
frame['tz'][:10]
0     America/New_York
1       America/Denver
2     America/New_York
3    America/Sao_Paulo
4     America/New_York
5     America/New_York
6        Europe/Warsaw
7
8
9
Name: tz, dtype: object
tz_counts = frame['tz'].value_counts()
tz_counts[:10]
America/New_York       1251
                        521
America/Chicago         400
America/Los_Angeles     382
America/Denver          191
Europe/London            74
Asia/Tokyo               37
Pacific/Honolulu         36
Europe/Madrid            35
America/Sao_Paulo        33
Name: tz, dtype: int64
clean_tz = frame['tz'].fillna('Missing')  # replace missing values
clean_tz[clean_tz == ''] = 'Unknown'  # replace empty strings
tz_counts = clean_tz.value_counts()
tz_counts[:10]
America/New_York       1251
Unknown                 521
America/Chicago         400
America/Los_Angeles     382
America/Denver          191
Missing                 120
Europe/London            74
Asia/Tokyo               37
Pacific/Honolulu         36
Europe/Madrid            35
Name: tz, dtype: int64
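The cleaning step distinguishes two cases: records with no tz field at all (NaN, labeled "Missing") and records whose tz is an empty string (labeled "Unknown"). Here is a small sketch of the same idea on a made-up Series, written as a single chain with fillna and replace rather than boolean assignment. The sample values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one missing value (NaN) and two empty strings.
tz = pd.Series(["America/New_York", "", np.nan, "America/New_York", ""])

# NaN becomes 'Missing'; the exact value '' becomes 'Unknown'.
clean = tz.fillna("Missing").replace("", "Unknown")
counts = clean.value_counts()
print(counts)
```

The chained form leaves the original Series untouched, which can make notebook cells safer to re-run.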
plt.figure(figsize=(10, 4))
<matplotlib.figure.Figure at 0x44bc230>
tz_counts[:10].plot(kind='barh', rot=0)
<matplotlib.axes._subplots.AxesSubplot at 0x44bcbd0>
Most frequent time zones for Windows and non-Windows users
frame['a'][1]  # look at the browser (user agent) information
u'GoogleMaps/RochesterNY'
frame['a'][50]
u'Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20100101 Firefox/10.0.2'
results = Series([x.split()[0] for x in frame.a.dropna()])
results[:5]
0               Mozilla/5.0
1    GoogleMaps/RochesterNY
2               Mozilla/4.0
3               Mozilla/5.0
4               Mozilla/5.0
dtype: object
results.value_counts()[:8]
Mozilla/5.0                 2594
Mozilla/4.0                  601
GoogleMaps/RochesterNY       121
Opera/9.80                    34
TEST_INTERNET_AGENT           24
GoogleProducer                21
Mozilla/6.0                    5
BlackBerry8520/5.0.0.681       4
dtype: int64
cframe = frame[frame.a.notnull()]  # drop rows where the agent string is missing
operating_system = np.where(cframe['a'].str.contains('Windows'), 'Windows', 'Not Windows')  # classify by whether the agent string contains "Windows"
operating_system[:5]
array(['Windows', 'Not Windows', 'Windows', 'Not Windows', 'Windows'], dtype='|S11')
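The classification relies on np.where, which picks one of two labels for each element according to a boolean mask. A minimal sketch on three made-up agent strings follows; the sample strings are shortened and hypothetical, and regex=False is passed so that 'Windows' is matched as a plain substring.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened user-agent strings.
agents = pd.Series([
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11",
    "GoogleMaps/RochesterNY",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)",
])

# Boolean mask: True where the agent string mentions Windows.
is_windows = agents.str.contains("Windows", regex=False)

# Pick a label per element based on the mask.
labels = np.where(is_windows, "Windows", "Not Windows")
print(labels.tolist())
```

The mask must not contain missing values, which is why the rows with a null agent are removed first.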
by_tz_os = cframe.groupby(['tz', operating_system])
agg_counts = by_tz_os.size().unstack().fillna(0)
agg_counts[:10]
| | Not Windows | Windows |
---|---|---|
tz | | |
| | 245.0 | 276.0 |
Africa/Cairo | 0.0 | 3.0 |
Africa/Casablanca | 0.0 | 1.0 |
Africa/Ceuta | 0.0 | 2.0 |
Africa/Johannesburg | 0.0 | 1.0 |
Africa/Lusaka | 0.0 | 1.0 |
America/Anchorage | 4.0 | 1.0 |
America/Argentina/Buenos_Aires | 1.0 | 0.0 |
America/Argentina/Cordoba | 0.0 | 1.0 |
America/Argentina/Mendoza | 0.0 | 1.0 |
# Use to sort in ascending order
indexer = agg_counts.sum(1).argsort()
indexer[:10]
tz
                                  24
Africa/Cairo                      20
Africa/Casablanca                 21
Africa/Ceuta                      92
Africa/Johannesburg               87
Africa/Lusaka                     53
America/Anchorage                 54
America/Argentina/Buenos_Aires    57
America/Argentina/Cordoba         26
America/Argentina/Mendoza         55
dtype: int64
count_subset = agg_counts.take(indexer)[-10:]
count_subset
| | Not Windows | Windows |
---|---|---|
tz | | |
America/Sao_Paulo | 13.0 | 20.0 |
Europe/Madrid | 16.0 | 19.0 |
Pacific/Honolulu | 0.0 | 36.0 |
Asia/Tokyo | 2.0 | 35.0 |
Europe/London | 43.0 | 31.0 |
America/Denver | 132.0 | 59.0 |
America/Los_Angeles | 130.0 | 252.0 |
America/Chicago | 115.0 | 285.0 |
| | 245.0 | 276.0 |
America/New_York | 339.0 | 912.0 |
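The selection above works in two steps: argsort returns the row positions that would sort the row totals in ascending order, and the last positions therefore point at the largest totals. The sketch below replays this on a tiny made-up frame. The column name os and the values are assumptions, and the argsort is done on the underlying NumPy array before selecting rows by position with iloc.

```python
import pandas as pd

# Hypothetical sample: time zone and operating-system label per record.
df = pd.DataFrame({
    "tz": ["A", "A", "A", "B", "B", "C"],
    "os": ["Windows", "Windows", "Not Windows",
           "Windows", "Not Windows", "Windows"],
})

# Count records per (tz, os) pair and pivot the os level into columns.
agg = df.groupby(["tz", "os"]).size().unstack().fillna(0)

# Row totals, then the positions that sort them in ascending order.
totals = agg.sum(axis=1)
order = totals.to_numpy().argsort()

# The last two positions are the two largest totals.
top2 = agg.iloc[order[-2:]]
print(top2)
```

Combinations that never occur (here tz "C" with "Not Windows") come out of unstack as NaN, which is why fillna(0) is applied before summing.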
plt.figure()
<matplotlib.figure.Figure at 0x5e69310>
count_subset.plot(kind='barh', stacked=True)
<matplotlib.axes._subplots.AxesSubplot at 0x5e46eb0>
plt.figure()
<matplotlib.figure.Figure at 0x5c3a2b0>
normed_subset = count_subset.div(count_subset.sum(1), axis=0)
normed_subset.plot(kind='barh', stacked=True)
<matplotlib.axes._subplots.AxesSubplot at 0x5b28d50>
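The normalization divides each row by its own total so that every row sums to 1, which makes the relative share of Windows users comparable across time zones of very different sizes. A short sketch using two rows copied from the count_subset table above; the variable names counts and shares are made up for illustration.

```python
import pandas as pd

# Two rows taken from the count_subset table.
counts = pd.DataFrame(
    {"Not Windows": [13.0, 43.0], "Windows": [20.0, 31.0]},
    index=["America/Sao_Paulo", "Europe/London"],
)

# Divide every row by its row total; axis=0 aligns the totals on the row index.
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares)
print(shares.sum(axis=1))
```

Passing axis=0 matters here: without it, pandas would try to align the row totals with the column labels instead of the row index.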