STAT 7008 - Assignment 2
Due date: 31 Oct 2018
Question 1 (hashtag analysis)
1. tweets1.json corresponds to tweets received before a Presidential
debate in 2016, and all the other data files correspond to tweets
received immediately after the same debate. Download the files from
the following link:
https://transfernow.net/912g42y1vs78
Write code to read the data files tweets1.json to tweets5.json, and
combine tweets2.json to tweets5.json into a single file named
tweets2.json. Determine the number of tweets in tweets1.json and
tweets2.json.
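A minimal sketch of the reading and combining step, assuming each file stores one JSON-encoded tweet per line (the usual Twitter dump layout; adjust if the files instead hold a single JSON array):

```python
import json

def read_tweets(path):
    """Read a file containing one JSON-encoded tweet per line."""
    tweets = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                tweets.append(json.loads(line))
    return tweets

def combine_files(paths, out_path):
    """Concatenate several line-delimited JSON files into one."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            for tweet in read_tweets(path):
                out.write(json.dumps(tweet) + "\n")
```

`len(read_tweets("tweets1.json"))` then gives the tweet count for each file.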
2. To clean the tweets in each file with a focus on extracting
hashtags, note that 'retweeted_status' is another tweet nested within
a tweet. Select tweets using the following criteria:
- A non-empty list of hashtags, either in 'entities' or in the
'entities' of the 'retweeted_status'.
- There is a timestamp.
- There is a legitimate location.
- Hashtags are written in English, or are converted from hashtags
written partially in English (ignore the non-English characters).
Write a function that returns a dictionary of acceptable tweets,
locations and hashtags, for tweets1.json and tweets2.json respectively.
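One possible shape for the selection function, assuming the standard Twitter field names (`entities` → `hashtags`, `created_at`, `user` → `location`); what counts as a "legitimate" location is reduced here to a simple non-empty check, which you may want to tighten:

```python
import re

def clean_tweets(tweets):
    """Return a dict with parallel lists of acceptable tweets,
    their locations, and their (English-only) hashtags."""
    result = {"tweets": [], "locations": [], "hashtags": []}
    for tw in tweets:
        # Hashtags may live in the tweet itself or in its retweeted_status.
        tags = tw.get("entities", {}).get("hashtags", [])
        if not tags and "retweeted_status" in tw:
            tags = tw["retweeted_status"].get("entities", {}).get("hashtags", [])
        location = tw.get("user", {}).get("location")
        if not tags or not tw.get("created_at") or not location:
            continue
        # Keep only ASCII letters of each hashtag (drops non-English characters).
        texts = [re.sub(r"[^A-Za-z]", "", t["text"]).lower() for t in tags]
        texts = [t for t in texts if t]   # drop tags that become empty
        if not texts:
            continue
        result["tweets"].append(tw)
        result["locations"].append(location.strip().lower())
        result["hashtags"].append(texts)
    return result
```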
3. Write a function to extract the top n most tweeted hashtags from a
given hashtag list. Use the function to find the top n tweeted
hashtags in tweets1.json and tweets2.json respectively.
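Part 3 is a direct fit for `collections.Counter`; the sketch below assumes the input is a list of per-tweet hashtag lists, as produced in part 2:

```python
from collections import Counter

def top_hashtags(hashtag_lists, n):
    """Return the n most frequent hashtags as (hashtag, count) pairs."""
    counts = Counter(tag for tags in hashtag_lists for tag in tags)
    return counts.most_common(n)
```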
4. Write a function that returns a data frame containing the top n
tweeted hashtags of a given hashtag list. The columns of the returned
data frame are 'hashtag' and 'freq'.
5. Use the function to produce a horizontal bar chart of the top n
tweeted hashtags for tweets1.json and tweets2.json respectively.
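Parts 4 and 5 can share one helper; a sketch with pandas and matplotlib (the `Agg` backend and the output filename are choices for scripted use, not requirements):

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script needs no display
import matplotlib.pyplot as plt
import pandas as pd

def top_hashtags_df(hashtag_lists, n):
    """Top-n hashtags as a DataFrame with columns 'hashtag' and 'freq'."""
    counts = Counter(tag for tags in hashtag_lists for tag in tags)
    return pd.DataFrame(counts.most_common(n), columns=["hashtag", "freq"])

def plot_top_hashtags(hashtag_lists, n, out_file="top_hashtags.png"):
    """Horizontal bar chart of the top-n hashtags, most frequent on top."""
    df = top_hashtags_df(hashtag_lists, n)
    ax = df.sort_values("freq").plot.barh(x="hashtag", y="freq", legend=False)
    ax.set_xlabel("frequency")
    plt.tight_layout()
    plt.savefig(out_file)
    plt.close()
```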
6. Find the maximum and minimum timestamps of tweets1.json and
tweets2.json respectively.
7. Divide each interval defined by (min time, max time) into 10
equally spaced periods.
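Parts 6 and 7 reduce to a min/max and `pandas.date_range`; assuming the timestamps parse with `pd.to_datetime` (Twitter's `created_at` strings do):

```python
import pandas as pd

def ten_periods(timestamps):
    """Return (min time, max time, edges), where edges are the 11 equally
    spaced boundaries splitting [min, max] into 10 periods."""
    times = pd.to_datetime(pd.Series(timestamps), utc=True)
    edges = pd.date_range(times.min(), times.max(), periods=11)
    return times.min(), times.max(), edges
```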
8. For a given collection of tweets, write a function that returns a
data frame with two columns: hashtags and their time of creation. Use
the function to produce data frames for tweets1.json and tweets2.json.
Using pandas.cut or otherwise, create a third column 'level' in each
data frame that bins the time of creation by the corresponding
intervals obtained in part 7.
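A sketch of part 8, assuming the per-tweet hashtag lists and creation times come from part 2 and the 11 bin edges from part 7:

```python
import pandas as pd

def hashtag_time_df(hashtags, created_times, edges):
    """One row per (hashtag, creation time) pair, plus a 'level' column
    giving the time period (1..10) each tweet falls in."""
    rows = [(tag, t) for tags, t in zip(hashtags, created_times) for tag in tags]
    df = pd.DataFrame(rows, columns=["hashtag", "created_at"])
    df["created_at"] = pd.to_datetime(df["created_at"])
    df["level"] = pd.cut(df["created_at"], bins=edges,
                         labels=range(1, len(edges)), include_lowest=True)
    return df
```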
9. Using pandas.pivot or otherwise, create a numpy array or a pandas
data frame whose rows are the time periods defined in part 7 and whose
columns are hashtags. The entry for the ith time period and jth
hashtag is the number of occurrences of the jth hashtag in the ith
time period. Fill entries without data with zero. Do this for
tweets1.json and tweets2.json respectively.
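Part 9 as a `pivot_table` sketch, assuming the data frame from part 8 (columns `hashtag`, `created_at`, `level`):

```python
import pandas as pd

def period_hashtag_table(df):
    """Rows = time periods, columns = hashtags, entries = counts (0 if absent)."""
    table = df.pivot_table(index="level", columns="hashtag",
                           values="created_at", aggfunc="count",
                           observed=False)  # keep periods with no tweets
    return table.fillna(0).astype(int)
```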
10. Following part 9, what is the number of occurrences of the hashtag
'trump' in the sixth period of tweets1.json? What is the number of
occurrences of 'trump' in the eighth period of tweets2.json?
11. Using the tables obtained in part 9, we can also find the total
number of occurrences of each hashtag. Rank the hashtags in decreasing
order and produce a time plot of the top 20 hashtags in a single
graph. Rescale the figure so that it is neither too small nor too
large. Do this for both tweets1.json and tweets2.json.
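For part 11, the column sums of the part-9 table rank the hashtags; a plotting sketch (the figure size and output path are arbitrary choices):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt

def plot_top_hashtag_trends(table, n=20, out_file="trends.png"):
    """Line plot over time periods for the n hashtags with highest totals."""
    top = table.sum().sort_values(ascending=False).head(n).index
    ax = table[top].plot(figsize=(12, 6))  # rescale so the graph stays readable
    ax.set_xlabel("time period")
    ax.set_ylabel("count")
    plt.tight_layout()
    plt.savefig(out_file)
    plt.close()
```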
12. The file zip_codes_states.csv contains the city, state, county,
latitude and longitude of US zip codes. Read the file.
13. Select the tweets in tweets1.json and tweets2.json whose locations
appear in zip_codes_states.csv. Also remove the location 'london'.
14. Find the top 20 tweeted locations in tweets1.json and tweets2.json
respectively.
15. Since there are multiple (lon, lat) pairs for each location, write
a function that returns the average lon and the average lat of a given
location. Use the function to generate the average lon and lat for
every location in tweets1.json and tweets2.json.
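A sketch of the averaging helper in part 15, assuming zip_codes_states.csv loads into a data frame with columns `city`, `longitude` and `latitude` (check the real header names):

```python
import pandas as pd

def average_coords(zip_df, location):
    """Mean longitude/latitude over all rows matching a city name."""
    rows = zip_df[zip_df["city"].str.lower() == location.lower()]
    if rows.empty:
        return None
    return rows["longitude"].mean(), rows["latitude"].mean()
```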
16. Combine tweets1.json and tweets2.json. Then create data frames
containing the locations, counts, longitudes and latitudes for
tweets1.json and tweets2.json.
17. Using the shapefile of US states st99_d00 and the help of the
website
https://stackoverflow.com/questions/39742305/how-to-use-basemap-python-to-plot-us-with-50-states,
produce the following graphs.
18. (Optional)
Using polygon patches and the help of the website
https://stackoverflow.com/questions/39742305/how-to-use-basemap-python-to-plot-us-with-50-states,
produce the following graph.
Question 3 (extract hurricane paths)
The website http://weather.unisys.com provides hurricane path data
from 1850 onwards. We want to extract the hurricane paths for a given
year.
1. Since the link containing the hurricane information varies with the
year, and the information is spread over multiple pages, we need to
know the starting page and the total number of pages for a given year.
What is the appropriate starting page for year = '2017'?
2. To solve the second question, try inputting a large number as the
number of pages for a given year. Using an appropriate number, write a
function to extract all links, each of which holds information on one
of the hurricanes of '2017'.
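A hedged sketch of the link-collection step. The listing-page URLs and the markup of weather.unisys.com are assumptions (the site has changed over the years), so both the fetch and the pattern will likely need adjusting:

```python
import re
from urllib.request import urlopen

def links_in_page(html, base="http://weather.unisys.com"):
    """Pull candidate hurricane links out of one page of HTML."""
    hrefs = re.findall(r'href="([^"]+)"', html)
    return [h if h.startswith("http") else base + "/" + h.lstrip("/")
            for h in hrefs if "hurricane" in h.lower()]

def hurricane_links(page_urls):
    """Fetch each listing page and collect the hurricane links it contains."""
    links = []
    for url in page_urls:
        with urlopen(url) as resp:
            links.extend(links_in_page(resp.read().decode("utf-8", "replace")))
    return links
```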
3. Some of the collected links lead to hurricane summaries rather than
valid tables. Remove those links.
4. Each valid hurricane link contains four pieces of information:
- Date
- Hurricane classification
- Hurricane name
- A table of hurricane positions over time
Since all of this information is contained in a text file provided on
the webpage behind the link, write a function to download and read the
text file without saving it to a local directory (at this stage you do
not need to convert the data to another format).
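Downloading into memory needs no temporary file; a stdlib sketch (`requests` would work equally well):

```python
from urllib.request import urlopen

def fetch_track_text(url):
    """Download the hurricane track text file into memory (no local save)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")
```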
5. With the downloaded contents, write a function that converts the
contents into a list of dictionaries. Each dictionary in the list
contains the following keys: the date, the category of the hurricane,
the name of the hurricane, and a table of information for the
hurricane path. Convert the date in each dictionary to a datetime
object. Since the recorded times for the hurricane paths use Z-time
(Zulu time, i.e. UTC), convert them to datetime objects with the help
of http://www.theweatherprediction.com/basic/ztime/.
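Z-time is simply UTC. Assuming the track tables stamp each row as 'MM/DD/HHZ' (verify against the actual files), a conversion sketch:

```python
from datetime import datetime, timezone

def parse_ztime(ztime, year):
    """Parse a 'MM/DD/HHZ' Z-time string (assumed format) into a UTC datetime."""
    month, day, hour = ztime.rstrip("Z").split("/")
    return datetime(year, int(month), int(day), int(hour), tzinfo=timezone.utc)
```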
6. Some tables have missing data in the Wind column. Since the
classification of a hurricane at a given moment can be found in the
Status column of the same table, and the classification relates to the
wind speed at that moment, use the classification to impute the
missing wind data. You may want to read
https://en.wikipedia.org/wiki/Tropical_cyclone_scales.
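One way to impute: map each Status value to a representative wind speed inside its Saffir-Simpson range. The status spellings and the chosen speeds (in knots) are assumptions to adjust against the real tables:

```python
# Representative wind speeds (knots) per status; each value lies inside
# the corresponding Saffir-Simpson range, but the exact choice is ours.
STATUS_WIND = {
    "TROPICAL DEPRESSION": 30,
    "TROPICAL STORM": 50,
    "HURRICANE-1": 70,
    "HURRICANE-2": 90,
    "HURRICANE-3": 105,
    "HURRICANE-4": 125,
    "HURRICANE-5": 145,
}

def impute_wind(wind, status):
    """Use the recorded wind if present, else a typical value for the status."""
    if wind not in (None, "", "-"):
        return int(wind)
    return STATUS_WIND.get(status.upper().strip())
```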
7. Plot the hurricane paths of year '2017', sized by wind speed and
colored by classification status. Bonus marks will be given if you can
produce the graph in a creative way.
8. (Optional)
Rewrite the above functions to take the year as a parameter, so that
changing the year easily generates a plot of the hurricane paths for
that year.
