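Exploring US Bikeshare Data

Project overview
Dataset
Questions
- Questions to answer
- Interactive experience
Project workflow
- Import libraries and datasets
- Build the start screen (accept the user's city, month, and day)
- Load the matching data based on the user's input
- Display the most frequent times of travel
- Display the most popular stations and trip (start station to end station)
- Display the total and average trip duration
- Display user types, gender, and birth-year statistics
- Set up the main function
How to download and run
References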

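Project overview

This project uses Python to explore data from the bike-share systems of three major US cities: Chicago, New York City, and Washington, D.C.
The code imports the data and answers interesting questions about it by computing descriptive statistics. A script then takes raw input and creates an interactive experience in the terminal to present those statistics.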

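Dataset

Data is provided for the first six months of 2017 for all three cities. All three data files contain the same six (6) core columns:

Start Time (e.g. 2017-01-01 00:07:57)
End Time (e.g. 2017-01-01 00:20:53)
Trip Duration (in seconds, e.g. 776)
Start Station (e.g. Broadway & Barry Ave)
End Station (e.g. Sedgwick St & North Ave)
User Type (Subscriber/Registered or Customer/Casual)

The Chicago and New York City files also contain the following two columns (see the image below for the data format):

Gender
Birth Year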
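
Because the optional columns are present only in some files, it can help to check a file's header before computing statistics. The following is a minimal sketch using only the standard library. The helper name describe_columns and the two sample header strings are hypothetical and were built from the column names listed above, not copied from the real files.

```python
import csv
import io

CORE_COLUMNS = {
    "Start Time", "End Time", "Trip Duration",
    "Start Station", "End Station", "User Type",
}
OPTIONAL_COLUMNS = {"Gender", "Birth Year"}


def describe_columns(csv_text):
    """Return (has_all_core_columns, sorted optional columns that are present)."""
    # Only the first row (the header) is needed.
    header = next(csv.reader(io.StringIO(csv_text)))
    present = set(header)
    return CORE_COLUMNS <= present, sorted(OPTIONAL_COLUMNS & present)


# Hypothetical headers, assumed from the column lists in this section.
washington_like = "Start Time,End Time,Trip Duration,Start Station,End Station,User Type\n"
chicago_like = washington_like.strip() + ",Gender,Birth Year\n"

print(describe_columns(washington_like))
print(describe_columns(chicago_like))
```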

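Questions

Questions to answer

The code answers the following questions about the bike-share data:

Which month is most common in the start times (the Start Time column)?
Which day of the week (e.g. Monday, Tuesday) is most common in the start times? Hint: you can use datetime.weekday() (see its documentation).
Which hour of the day is most common in the start times?
What is the total trip duration (Trip Duration), and what is the average trip duration?
Which start station (Start Station) is most popular, and which end station (End Station) is most popular?
Which trip is most popular (that is, which combination of start station and end station)?
How many users are there of each user type?
How many users are there of each gender?
What are the earliest, the most recent, and the most common birth years?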
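
The datetime.weekday() hint can be illustrated with the standard library alone. This is only a sketch: the helper name weekday_name is hypothetical, and it assumes timestamps in the format shown in the dataset description. weekday() returns 0 for Monday through 6 for Sunday, so the result can index a list of day names.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_name(timestamp):
    """Return the weekday name for a 'YYYY-MM-DD HH:MM:SS' string."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    # weekday() counts from Monday = 0, matching the order of DAY_NAMES.
    return DAY_NAMES[parsed.weekday()]


print(weekday_name("2017-01-01 00:07:57"))
```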

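Interactive experience

The final file must be a script that takes raw input and creates an interactive experience in the terminal (such as cmd on Windows) to answer questions about the dataset. The experience is interactive because the results shown on the next screen change with what the user enters (implemented with input()).

Three questions affect the results:

Would you like to see data for Chicago, New York, or Washington?
Which month? all, january, february, … , june?
Which day? all, monday, tuesday, … sunday?

The answers determine which city is analyzed and whether the data is filtered by a particular month or day of the week. Once the matching dataset has been loaded and filtered, the user sees the statistics and can choose to restart or quit. Input should be case-insensitive: "Chicago", "CHICAGO", "chicago", and "chiCago" are all valid. You can process the input with string methods such as lower(), upper(), and title().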
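
One way to get case-insensitive input is to normalize the text before comparing it with the allowed values. The sketch below assumes hypothetical helper names (normalize_choice, ask) that are not part of the project script. The read parameter defaults to input() and exists only so the loop can be exercised without a keyboard.

```python
def normalize_choice(raw, allowed):
    """Return the trimmed, lowercased choice if it is allowed, else None."""
    cleaned = raw.strip().lower()
    return cleaned if cleaned in allowed else None


def ask(prompt, allowed, read=input):
    """Keep prompting until the reader returns an allowed value."""
    while True:
        choice = normalize_choice(read(prompt), allowed)
        if choice is not None:
            return choice


CITIES = ["chicago", "new york city", "washington"]

# All four spellings from the text normalize to the same value.
for spelling in ["Chicago", "CHICAGO", "chicago", "chiCago"]:
    print(normalize_choice(spelling, CITIES))
```

Returning the normalized value, rather than the raw text, means later code does not need to call lower() again.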

Project workflow

Import libraries and datasets

import time
import pandas as pd
import numpy as np

# Map each city name to its CSV file.
CITY_DATA = {
    'chicago': 'chicago.csv',
    'new york city': 'new_york_city.csv',
    'washington': 'washington.csv',
}

Build the start screen (accept the user's city, month, and day)

def get_filters():
    """
    Asks user to specify a city, month, and day to analyze.

    Returns:
        (str) city - name of the city to analyze, or "all" to merge all three cities
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    """
    print('\nHello! Let\'s explore some US bikeshare data!')

    def input_mod(input_print, enterable_list):
        # Re-prompt until the lowercased input is one of the allowed values.
        ret = input(input_print)
        while ret.lower() not in enterable_list:
            ret = input(input_print)
        return ret

    # Get user input for city (chicago, new york city, washington, or all).
    city = input_mod('\nPlease input the name of city which you want to analyze: Chicago, New York City, Washington or all!\n',
                     list(CITY_DATA.keys()) + ['all'])

    # Get user input for month (all, january, february, ... , june).
    month = input_mod('\nPlease input the month you want to analyze: all, january, february, ... , june!\n',
                      ['january', 'february', 'march', 'april', 'may', 'june', 'all'])

    # Get user input for day of week (all, monday, tuesday, ... sunday).
    day = input_mod('\nPlease input the day-of-week you want to analyze: all, monday, tuesday, ... sunday!\n',
                    ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday', 'all'])

    print('-'*40)
    return city, month, day

Load the matching data based on the user's input

def load_data(city, month, day):
    """
    Loads data for the specified city and filters by month and day if applicable.

    Args:
        (str) city - name of the city to analyze, or "all" to merge all three cities
        (str) month - name of the month to filter by, or "all" to apply no month filter
        (str) day - name of the day of week to filter by, or "all" to apply no day filter
    Returns:
        df - Pandas DataFrame containing city data filtered by month and day
    """
    if city.lower() == 'all':
        # If the user entered 'all', merge the three CSV files into one DataFrame.
        # pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
        frames = [pd.read_csv(path) for path in CITY_DATA.values()]
        df = pd.concat(frames, ignore_index=True, sort=False)
    else:
        df = pd.read_csv(CITY_DATA[city.lower()])

    # Convert 'Start Time' to a pandas datetime and derive the 'month' and
    # 'day_of_week' columns used to filter the data.
    df['Start Time'] = pd.to_datetime(df['Start Time'])
    df['month'] = df['Start Time'].dt.month
    # dt.day_name() replaces dt.weekday_name, which was removed in pandas 1.0.
    df['day_of_week'] = df['Start Time'].dt.day_name()

    if month.lower() != 'all':
        months = ['january', 'february', 'march', 'april', 'may', 'june']
        month = months.index(month.lower()) + 1
        df = df[df['month'] == month]

    if day.lower() != 'all':
        df = df[df['day_of_week'] == day.title()]

    return df
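
The month and day filters can be tried on a small in-memory table before running them on the full CSV files. The sketch below assumes a made-up three-row sample. Only the column names come from the dataset description, and it assumes a pandas version that provides dt.day_name().

```python
import pandas as pd

# Made-up sample rows; the real data comes from the city CSV files.
sample = pd.DataFrame({
    "Start Time": ["2017-01-02 09:00:00", "2017-01-03 10:30:00", "2017-02-06 18:15:00"],
    "Trip Duration": [300, 776, 1200],
})

# Derive the same helper columns that load_data() filters on.
sample["Start Time"] = pd.to_datetime(sample["Start Time"])
sample["month"] = sample["Start Time"].dt.month
sample["day_of_week"] = sample["Start Time"].dt.day_name()

# Keep only trips that started on a Monday in January.
january_mondays = sample[(sample["month"] == 1) & (sample["day_of_week"] == "Monday")]
print(january_mondays)
```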

Display the most frequent times of travel

def time_stats(df):
    """Displays statistics on the most frequent times of travel."""
    print('\nCalculating The Most Frequent Times of Travel...\n')
    start_time = time.time()

    # Display the most common month (as a number, 1 = January).
    popular_month = df['month'].mode()[0]
    print('The most common month is:', popular_month)

    # Display the most common day of week.
    popular_day = df['day_of_week'].mode()[0]
    print('The most common day of week is:', popular_day)

    # Display the most common start hour.
    popular_start_hour = df['Start Time'].dt.hour.mode()[0]
    print('The most common start hour is:', popular_start_hour)

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)

Display the most popular stations and trip (start station to end station)

def station_stats(df):
    """Displays statistics on the most popular stations and trip."""
    print('\nCalculating The Most Popular Stations and Trip...\n')
    start_time = time.time()

    # Display the most commonly used start station.
    popular_start_station = df['Start Station'].mode()[0]
    print('The most common start station is:', popular_start_station)

    # Display the most commonly used end station.
    popular_end_station = df['End Station'].mode()[0]
    print('The most common end station is:', popular_end_station)

    # Display the most frequent combination of start station and end station.
    # size() counts the rows in each (start, end) group; idxmax() returns the
    # pair with the largest count.
    top = df.groupby(['Start Station', 'End Station']).size().idxmax()
    print("The most frequent combination of start station and end station trip is '{}' to '{}'".format(top[0], top[1]))

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)
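
The "most popular trip" question amounts to counting (start station, end station) pairs. As a cross-check on a few rows, the sketch below uses collections.Counter from the standard library instead of the groupby call in the script. The station names are placeholders and the helper name most_common_trip is hypothetical.

```python
from collections import Counter

# Placeholder (start station, end station) pairs, not real data.
trips = [
    ("A St", "B Ave"),
    ("A St", "C Rd"),
    ("A St", "B Ave"),
    ("D Ln", "B Ave"),
]


def most_common_trip(pairs):
    """Return (start, end, count) for the most frequent pair."""
    (start, end), count = Counter(pairs).most_common(1)[0]
    return start, end, count


print(most_common_trip(trips))
```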

Display the total and average trip duration

def trip_duration_stats(df):
    """Displays statistics on the total and average trip duration."""
    print('\nCalculating Trip Duration...\n')
    start_time = time.time()

    # Display total travel time ('Trip Duration' is recorded in seconds).
    total_time = df['Trip Duration'].sum()
    print('The total travel time is:', total_time, 'seconds.')

    # Display mean travel time.
    mean_time = df['Trip Duration'].mean()
    print('The mean travel time is:', mean_time, 'seconds.')

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)
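
Since Trip Duration is recorded in seconds, large totals are easier to read when broken into hours, minutes, and seconds. A minimal sketch follows. The helper name format_duration is hypothetical and is not part of the project script.

```python
def format_duration(total_seconds):
    """Format a number of seconds as 'Hh MMm SSs'."""
    total_seconds = int(round(total_seconds))
    # Split off whole minutes first, then whole hours.
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


# 776 seconds is the example trip duration from the dataset description.
print(format_duration(776))
```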

Display user types, gender, and birth-year statistics

def user_stats(df):
    """Displays statistics on bikeshare users."""
    print('\nCalculating User Stats...\n')
    start_time = time.time()

    # Display counts of user types.
    user_types = df['User Type'].value_counts()
    print('The counts of user types are:', '\n', user_types)

    # Display counts of gender (the Washington file has no 'Gender' column).
    if 'Gender' in df.columns:
        # value_counts() skips missing values by default.
        gender = df['Gender'].value_counts()
        print('\nThe counts of gender are:', '\n', gender)
    else:
        print('\nSorry, there\'s no \'Gender\' data to analyze.')

    # Display earliest, most recent, and most common year of birth.
    if 'Birth Year' in df.columns and df['Birth Year'].notna().any():
        earliest_birth = df['Birth Year'].min()
        most_recent_birth = df['Birth Year'].max()
        most_common_year = df['Birth Year'].mode()[0]
        print('\nThe earliest year of birth is:', earliest_birth)
        print('The most recent year of birth is:', most_recent_birth)
        print('The most common year of birth is:', most_common_year)
    else:
        print('\nSorry, there\'s no data of \'Birth Year\' to analyze.')

    print("\nThis took %s seconds." % (time.time() - start_time))
    print('-'*40)

Set up the main function

def main():
    while True:
        city, month, day = get_filters()
        df = load_data(city, month, day)

        time_stats(df)
        station_stats(df)
        trip_duration_stats(df)
        user_stats(df)

        # Any answer other than 'yes' ends the program.
        restart = input('\nWould you like to restart? Enter yes or no.\n')
        if restart.lower() != 'yes':
            break


if __name__ == "__main__":
    main()

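How to download and run

Code download link: my Baidu Netdisk
Extraction code: ur0i

Requirements: Python 3 (the script imports pandas and NumPy)

Running from the terminal: Windows users can open cmd, change to the folder where the files are stored, run ipython bikeshare.py, and follow the on-screen prompts.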

References

Reference 1: https://blog.csdn.net/milton2017/article/details/54406482/
Reference 2: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.append.html
