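Task summary:

Process housing-price data and build a model to predict house prices.

Analysis:

1) Perform exploratory data analysis (EDA) on the data: handle missing values, identify potentially unreasonable or erroneous entries (outliers) and justify how they are treated, and explain the relationships between the explanatory variables and the target variable.

2) Answer the following questions: if a linear regression model of house prices is built, should it include an intercept term; is multicollinearity a potential problem in the dataset; if only three variables may be used, which three predict house prices best; then build the regression model from those three variables.

3) Improve the resulting model through feature engineering; justify the feature engineering choices using EDA and present the results; compare the new and old models and explain why Adjusted R-Squared is used for the comparison; explain why the new model is more reasonable.

4) Explain how a data science process model was used to arrive at modelling and evaluation, choosing one process model (CRISP-DM or Snail Shell) for the answer; if the firm considers investing in other locations, state whether the resulting model can be applied there.

Topics covered:

Data analysis, EDA, regression models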


2019S2 BUSS6002 Assignment 1
Due Date: Friday 27 Sep 2019
Value: 15% of the total mark
Instructions

  1. Required Submission Items:
    a. ONE written report (PDF format), submitted via Canvas.
      • Assignments > Report Submission (Assignment 1)
    b. ONE Jupyter Notebook (.ipynb), submitted via Canvas.
      • Assignments > Upload Your Code File (Assignment 1)
  2. The assignment is due at 12:00pm (noon) on Friday, 27 Sep 2019. The late
    penalty for the assignment is 5% of the assigned mark per day, starting after
    12:00pm on the due date. The closing date Friday, 4 Oct 2019, 12:00pm
    (noon) is the last date on which an assessment will be accepted for marking.
  3. As per anonymous marking policy, please include your Student ID only in the
    report and do NOT include your name. The name of the report and code file
    must follow: SID_BUSS6002_Assignment1. Failing to name your submitted
    files correctly would incur a penalty.
  4. Your answers should be provided as a final report giving full explanation and
    interpretation of any results you obtain. Output without explanation will receive
    zero marks. You are required to also submit code that can reproduce your
    reported results, as reproducibility is a key component to data science. Not
    submitting your code will lead to a loss of 50% of the assignment mark.
  5. Be warned that plagiarism between individuals is always obvious to the
    markers of the assignment and can be easily detected by Turnitin.
  6. Presentation of the assignment is part of the assignment. There will be 10
    marks for the presentation of your report and code submission.
  7. The report should be NOT more than 10 pages including text, figures, tables,
    small sections of inserted code etc. Think about the best and most structured
    way to present your work, summarise the procedures implemented, support
    your results/findings and prove the originality of your work. You will provide
    your code as a separate submission to the report; however, you may insert
    small sections of your code into the report when necessary.
  8. Your code submission has no length limit, however marks are assigned for
    code presentation, so make your code as concise as possible and add
    comments when necessary to explain your logic and the purpose of each
    code segment. Make sure to remove any unnecessary code and ensure that
    your code can be run without error.
  9. Numbers with decimals should be reported to the third-decimal point.

Project Description and Dataset

Suppose you are working as a Data Scientist for a real estate investment firm. The
firm is assessing locations for investing in housing redevelopment in the United
States. For this purpose, the firm has identified several potential locations in Seattle
to purchase existing houses, which would be demolished to make space for the
redevelopment.

In order to estimate the costs involved the firm needs to know the current market
value of the houses that it needs to purchase. You are working on a project that aims
to build a model to estimate the house prices.

Seattle’s Department of Assessments has been collecting data since 2014 on house
sale prices and the characteristics of each house that was sold. You have been
given access to a copy of original database “house.db”, which is an SQLite file, as
well as a data dictionary file “house_dict.txt”. You can download the dataset and
detailed dataset description from the BUSS6002 Canvas site.

Hint: To list all tables in the database you can use the following query:

    SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;
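
As a rough illustration of how the hint query might be run from a notebook, the sketch below uses Python's standard `sqlite3` module. The helper name `list_tables` and the two demo tables are assumptions made for illustration only; the real table names in "house.db" are not given in the brief, so an in-memory database stands in for it here.

```python
import sqlite3

# The query from the hint, with plain ASCII quotes around 'table'.
LIST_TABLES_SQL = "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection, sorted by name."""
    rows = conn.execute(LIST_TABLES_SQL).fetchall()
    return [name for (name,) in rows]


# Stand-in for sqlite3.connect("house.db"): an in-memory database with two
# hypothetical tables, so the sketch runs without the assignment file.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE sales (id INTEGER, price REAL)")
demo_conn.execute("CREATE TABLE houses (id INTEGER, sqft_living REAL)")

tables = list_tables(demo_conn)
print(tables)
```

With the actual file, replacing `":memory:"` by the path to "house.db" (and dropping the two `CREATE TABLE` lines) should list whatever tables the database really contains.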

Task 1

To start your analysis, you wish to perform a thorough EDA to help you better
understand the given datasets. The results you obtain in this task will be used to
inform your modelling choice.

Requirements:
  a. Check and deal with any missing data (if any) in the given dataset.
  b. Look for and remove any potential outliers (if any) that would possibly affect
    your modelling. Justify your answer.
  c. Visualise the relationships between explanatory variables and the target
    variable through appropriate plotting. Report your analysis and findings.
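
One possible way to approach requirements a and b is sketched below for a single numeric column, using only the standard library. Dropping missing values and the 1.5 × IQR rule are assumptions chosen for illustration, not methods prescribed by the brief, and the price values are made up. Whether a flagged point should actually be removed still needs the justification the task asks for.

```python
import math
import statistics


def drop_missing(values):
    """Keep only entries that are neither None nor NaN."""
    return [v for v in values if v is not None and not math.isnan(v)]


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at k * IQR below Q1 and above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up sale prices: one missing value and one extreme value.
prices = [210000.0, 250000.0, float("nan"), 265000.0,
          280000.0, 300000.0, 320000.0, 5000000.0]

clean = drop_missing(prices)
low, high = iqr_bounds(clean)
kept = [p for p in clean if low <= p <= high]
flagged = [p for p in clean if not (low <= p <= high)]

print(f"fences: {low:.3f} to {high:.3f}")
print("flagged as potential outliers:", flagged)
```

The same idea would be applied column by column on the real data; a heavily skewed variable such as price may be better judged after a transformation than on its raw scale.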

Task 2

Suppose now you want to build a prototype model to predict house sale prices, which
will be demonstrated to a wider team. Therefore, it needs to be easily understood by
non-experts, meaning that you can only use a few variables in your model as a starting
point.

In order to make informed decisions on your modelling choices, you need to answer
the following questions:
  a. Suppose you would like to build a linear regression model to predict house sale
    prices, do you wish to include an intercept term in your model? Carefully explain
    your answer.
  b. Do you think multicollinearity could be a potential problem on the given dataset?
    Use your understanding of variables to justify your answer and verify your
    hypothesis using appropriate numeric measures. Explain your decisions to
    proceed based on your findings.
  c. If you wish to use only three variables to predict house sale prices, which three
    variables would you choose? Carefully justify your choice and explain your
    selection criterion.
  d. Build a linear regression model using the three variables you have chosen (Use
    original, i.e. not engineered, variables for this task). Report and interpret your
    regression results.
  e. Perform residual diagnostics to measure the goodness of fit. Report your
    findings.
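
For questions a, b and d, a minimal NumPy sketch of an ordinary least squares fit with an intercept, plus the variance inflation factor (VIF) as one common numeric measure of multicollinearity, might look as follows. The variable names (`sqft_living`, `sqft_above`, `grade`) and the synthetic data are assumptions for illustration; they are not taken from the data dictionary, and the brief does not say which measure to use.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y on X with an intercept; returns (coefficients, r_squared).

    coefficients[0] is the intercept, the rest follow the column order of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


def vif(X):
    """VIF of each column: 1 / (1 - R^2) from regressing that column on the others."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, r2 = fit_ols(others, X[:, j])
        factors.append(1.0 / (1.0 - r2))
    return factors


# Synthetic data (hypothetical columns): sqft_above is built to be nearly
# collinear with sqft_living, while grade is generated independently.
rng = np.random.default_rng(0)
sqft_living = rng.uniform(500, 4000, 200)
sqft_above = 0.8 * sqft_living + rng.normal(0, 50, 200)
grade = rng.uniform(1, 13, 200)
price = 50000 + 200 * sqft_living + 10000 * grade + rng.normal(0, 20000, 200)

vifs = vif(np.column_stack([sqft_living, sqft_above, grade]))
coef, r2 = fit_ols(np.column_stack([sqft_living, grade]), price)

print("VIF:", [round(v, 3) for v in vifs])
print("coefficients:", np.round(coef, 3), "R^2:", round(r2, 3))
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity, which is one way to motivate dropping one of two overlapping size variables before choosing the three predictors.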

Task 3

The model you have built so far provides an approximate estimate of house prices.
However, to accurately estimate the costs of the redevelopment plan you must be
able to estimate house prices as accurately as possible.

Your goal is now to improve your model as much as you can through feature
engineering and feature selection. You may consider all variables and apply
appropriate transformation to the variables as necessary.

Requirements:
  a. Your model should have a minimum adjusted R-Squared of 75%. If your
    modelling cannot achieve an adjusted R-Squared of 75%, report the best
    model you can obtain.
  b. Justify your choice of feature engineering strategies using EDA and present
    your results.
  c. Compare your new model with the model you have built in Task 2 with respect
    to Adjusted R-Squared. Explain why you should use Adjusted R-Squared here
    to compare the two models.
  d. Provide residual analysis to justify why your new model is more reasonable.
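
Requirement c turns on the fact that plain R-Squared never decreases when predictors are added, whereas Adjusted R-Squared, 1 - (1 - R^2)(n - 1)/(n - p - 1) for n observations and p predictors, penalises each extra variable. The sketch below simply evaluates that standard formula; the R-Squared values, sample size and predictor counts are invented inputs for illustration, not results from the assignment data.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-Squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# Invented example inputs: a 3-variable prototype versus a 12-variable
# engineered model, both fitted on 1000 observations.
baseline = adjusted_r2(0.60, n_obs=1000, n_predictors=3)
extended = adjusted_r2(0.78, n_obs=1000, n_predictors=12)

print(f"baseline adjusted R^2: {baseline:.3f}")
print(f"extended adjusted R^2: {extended:.3f}")
print("meets the 75% target:", extended >= 0.75)
```

Because the Task 3 model will normally have more predictors than the three-variable Task 2 model, comparing on the adjusted figure keeps the comparison from rewarding model size alone.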

Task 4

Suppose you have finished your analysis, now you need to report to your manager
and reflect on what you have experimented with in your project:
  a. Provide a reflection of how you have utilised the data science process model
    to arrive at modeling and model evaluation based on how you answered the
    previous three questions. Choose only one process model (CRISP-DM or
    Snail Shell) to answer this question. Explain how each part of the questions
    aligns with the different phases of the process model.
  b. The firm is also considering redevelopment projects in other locations.
    Comment on whether the model you have built can or cannot be applied in
    other locations. Justify your answer.

Marking Outline

| Marks    |
|----------|
| 20 marks |
| 30 marks |
| 30 marks |
| 10 marks |
| 10 marks |
