Case Study. Technical and Commercial understanding. Internal use only.

  • You’re a consultant for a Tech start-up with 40 staff that has created a phone app called “XYZ” with millions of users.
  • The app lets users save their photos on XYZ’s servers so that they can share photos from their phones with one another.
  • Currently, the company makes money by including adverts on its app.
  • The CTO suggests that XYZ’s adverts can be better targeted by analyzing its users’ photos.
  • For example, a user who shares photos of a baby might see adverts for family-friendly holidays.

QUESTIONS.

~ 400 words. #1

What legal and ethical issues are there as part of the decision of whether to go ahead?

Guidelines:

  • Refer to other companies’ commercial use of user data to explain what has been deemed acceptable in society’s norms.
    • Consumers should understand how their data is used, even though many uses of consumer data that benefit firms are not visible to consumers. In this case, that use is customized marketing recommendations. Users should be asked whether they are content to receive such marketing material [1].
    • Consumers today have a comparatively good understanding of how their data should be protected, following the introduction of the GDPR in Europe, the Data Protection Act 2018 in the UK, and the Personal Information Protection Law in China.
  • Explain the positive and negative consequences as a good CEO and citizen would, given that it would have real effects on the business, its users, and also society.
    • In my understanding, customers may benefit from customized recommendations because they are more likely to find the right product for themselves. Provided we tell customers how their data is processed in the system, and do not use their data before notifying them, customized advertising benefits customers by shortening their decision process. If we model society as a distributed network, the edges between nodes become shorter as recommendations become more customized, which is good for the economy.
  • Regulations to follow
    • UK Data Protection Act 2018
    • General Data Protection Regulation (GDPR)
  • What do other companies do?
    • Ask users in a pop-up window whether they want customized recommendations
  • Recommendations
    • Ask users in a pop-up window whether they want customized recommendations
  • A more in-depth analysis, from the legal and ethical aspects, is as follows.
    • On the legal side, there are laws and regulations on personal data protection, such as the GDPR, the UK Data Protection Act 2018, and the Personal Information Protection Law in China. What these have in common is that companies or institutions that use personal data commercially must show users how the data is processed, in a way that fully satisfies the purposes of the legislation.
    • The ethical side may differ from culture to culture. In my opinion, ethics is a complementary consideration that covers the issues the law cannot. That is to say, provided the scheme benefits both the customers and the company, customers are formally informed about how their data is processed, and there is no risk of leaking their personal information, it should be acceptable under ethical guidelines.
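The opt-in pop-up recommended above can be sketched in code. This is only a minimal illustration: the `User` record, the `personalized_ads_consent` field, and both function names are hypothetical and do not come from XYZ's actual system. The idea is that photos are analyzed for ad targeting only when the user has explicitly agreed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class User:
    user_id: str
    # Hypothetical field recorded from the pop-up:
    # None = never asked, True = opted in, False = declined.
    personalized_ads_consent: Optional[bool] = None


def may_analyze_photos(user: User) -> bool:
    """Only an explicit opt-in permits photo analysis for ad targeting."""
    return user.personalized_ads_consent is True


def choose_ads(user: User, targeted: List[str], generic: List[str]) -> List[str]:
    """Fall back to generic adverts unless the user has opted in."""
    return targeted if may_analyze_photos(user) else generic
```

Treating "never asked" the same as "declined" keeps the default on the safe side: no photo is processed before the user has been notified.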

~ 1000 words. #2

How could a data analysis team tackle the problem of automatically predicting the following from a user’s photos:

a. The user’s demographics.

b. The user’s interests. Include in your answer the features that might be extracted from users’ photos and a description of the types of machine learning algorithms that might be used to make predictions. E.g., NLP? Etc?

Demographics

  • Firstly, age, gender, occupation, cultural background, and family status are the five outputs for the demographics, each with its own output layer. The subcategories are simplified for advertisement recommendation purposes.

  • age [2]

    • Child (0-12 years)
    • Adolescence (13-18 years)
    • Adult (19-59 years)
    • Senior Adult (60 years and above)

  • gender

    • Male
    • Female
    • Other

  • occupation

    • Blue-collar
    • White-collar
    • Golden collar

  • cultural background

    • Eastern
    • Western

  • family status

    • Single
    • Married
    • Others
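The five demographic outputs and their subcategories listed above can be written down as a small label schema. The sketch below is an illustration only: the names `AGE_BANDS`, `DEMOGRAPHIC_LABELS`, and `age_band` are assumptions, and it further assumes that a model might output an age in years that then has to be mapped to one of the four age bands.

```python
# (name, lowest age, highest age); None means "no upper limit".
AGE_BANDS = [
    ("Child", 0, 12),
    ("Adolescence", 13, 18),
    ("Adult", 19, 59),
    ("Senior Adult", 60, None),
]

# One list of class names per output; one classifier head per key.
DEMOGRAPHIC_LABELS = {
    "age": [name for name, _, _ in AGE_BANDS],
    "gender": ["male", "female", "other"],
    "occupation": ["blue-collar", "white-collar", "golden collar"],
    "cultural background": ["eastern", "western"],
    "family status": ["single", "married", "others"],
}


def age_band(age_years: float) -> str:
    """Map an age in years (e.g. a regression output) to its band."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    whole_years = int(age_years)  # 12.9 years still counts as 12
    for name, _low, high in AGE_BANDS:
        if high is None or whole_years <= high:
            return name
    raise AssertionError("unreachable: the last band has no upper limit")
```

Keeping the schema in one place means the number of units in each output layer can be read off as `len(DEMOGRAPHIC_LABELS[key])`.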

  • Secondly, let’s look at the inputs, which are photos and descriptions, as follows

  • photos

    • Can be used as the input of the YOLOv5 network for image classification purposes
  • description

    • Generally, few people type descriptions for their images (since phones are not an ideal device for text input). We therefore assume the description of a photo is the information captured automatically with it, such as GPS location and time. In this case, the photo location can be used to infer the cultural background for customized recommendations.
  • The network training mechanism

  • Training

    • Go to Kaggle to find a dataset of user data that consists of photos and their respective descriptions
    • If Kaggle has a well-labeled dataset

      • then apply it to the YOLOv5 network
    • else

      • go to taobao.com to have the internal dataset within your system labeled manually, for roughly 0.2 yuan per picture
    • Train on an NVIDIA DGX Station, or on the GPU cluster the university provides in the cloud, for roughly one night

  • Testing

    • Feed in new data to see how well the prediction works
  • Validation

    • Apply a cross-validation approach to see how the model performs
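The cross-validation step above can be illustrated with a minimal k-fold splitter. This is a sketch in plain Python rather than the author's actual pipeline; the function name `k_fold_indices` and its parameters are assumptions. Each sample is used for validation exactly once and for training in the other k-1 rounds.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed: reproducible split
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Usage: train on `train`, score on `validation`, then average the k scores.
for train, validation in k_fold_indices(n_samples=10, k=5):
    assert not set(train) & set(validation)
```

For photo data it may be worth splitting by user rather than by photo, so that photos from the same user never appear in both the training and the validation fold.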

Interests

  • Firstly, classify the user interests based on the MBTI model [3]

    • Extraverts (E)
    • Introverts (I)
    • Sensors (S)
    • Intuitors (N)
    • Thinkers (T)
    • Feelers (F)
    • Judgers (J)
    • Perceivers (P)
  • Then follow the same procedure as for the demographics prediction, using mainly the photos and supplementing them with the descriptions. In this case, we will likely need to do the data collection and modeling ourselves, since little data on Kaggle is labeled according to this model.
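The eight MBTI letters listed above form four pairs (E/I, S/N, T/F, J/P), so the prediction target can be treated as four binary outputs that combine into one of 16 types. The sketch below shows that encoding; the names `DICHOTOMIES`, `ALL_TYPES`, and `mbti_type`, and the idea of thresholding four probabilities at 0.5, are assumptions for illustration.

```python
from itertools import product
from typing import List, Sequence

# Each pair is one binary output of the model.
DICHOTOMIES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]

# All combinations of one letter per pair: 2 ** 4 = 16 types.
ALL_TYPES: List[str] = ["".join(letters) for letters in product(*DICHOTOMIES)]


def mbti_type(first_letter_probs: Sequence[float]) -> str:
    """Combine four probabilities (for E, S, T, J) into a four-letter type."""
    if len(first_letter_probs) != len(DICHOTOMIES):
        raise ValueError("expected one probability per dichotomy")
    return "".join(
        first if prob >= 0.5 else second
        for (first, second), prob in zip(DICHOTOMIES, first_letter_probs)
    )


print(mbti_type([0.9, 0.2, 0.7, 0.4]))  # ENTP
```

Predicting four binary labels needs far fewer labeled examples per class than predicting 16 separate classes, which matters when the labeled data has to be collected in-house.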

Guidelines:

  • Discuss what additional data is required to run this analysis.

    • Since our app provides a photo storage service, I think we can also collect the users’ in-app operation data to gain a deeper understanding of each user’s profile.
  • Both identify and describe intuitively the features that can be extracted from users’ photos.

    • We do not need to extract any features from the photos by hand; YOLOv5 applies deep learning to learn them for us.
  • Both identify and describe intuitively the machine learning algorithms that can be used to make predictions.

    • Basically, the machine learning prediction problem can be framed as regression: if the regression value falls within a specific range, the sample is assigned to the corresponding category.
    • In the traditional machine learning procedure, the inputs are features and the outputs are features, and we apply a BP (backpropagation) neural network to learn the relationship between the input features and the output features.
    • A BP neural network applies the backpropagation algorithm to train the weights of each neuron to fit this relationship.
  • Differentiate features that might be effective in predicting demographics from those effective in predicting interests and explain why a certain feature might be more useful for each specific task.

    • In my opinion, the location is good for predicting the cultural background.
    • For others, let’s simply apply deep learning to make life easier.
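To make the BP neural network described above concrete, here is a minimal sketch of backpropagation for a network with one hidden layer, written in plain Python. It is an illustration under stated assumptions, not the model the team would deploy: the function name `train_bp`, the toy data, and all hyperparameters are hypothetical, and the two binary inputs merely stand in for features such as "a baby was detected" and "a beach was detected".

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_bp(samples, hidden=3, lr=0.5, epochs=5000, seed=0):
    """Train a 1-hidden-layer network with per-sample gradient descent."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
             for j in range(hidden)]
        out = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
        return h, out

    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            h, out = forward(x)
            total += 0.5 * (out - y) ** 2
            # Backward pass: propagate the error from the output to the hidden layer.
            d_out = (out - y) * out * (1 - out)
            d_h = [d_out * w2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Gradient step on every weight and bias.
            for j in range(hidden):
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_h[j] * x[i]
            b2 -= lr * d_out
        losses.append(total / len(samples))

    return (lambda x: forward(x)[1]), losses


# Toy data (hypothetical): the label is 1 only when both features are present.
toy = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
predict, losses = train_bp(toy)
```

In practice a framework such as PyTorch computes these gradients automatically; the hand-written version only shows what the algorithm does at each step.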

Furthermore, in terms of natural language processing, the analysis is as follows.

Assume there are descriptions of the photos. From a human’s perspective, I really do not know how a photo’s description could be useful in evaluating the customer profile. But that is not a problem in PyTorch: simply feed the descriptions into the deep learning framework together with the labeled data, the network does the feature engineering for you, and you obtain the learned correspondence afterwards.

~ 600 words. #3

The new advert targeting algorithm is tested on a pilot group of 10,000 users compared to a matched control group of 10,000 users.

Two weeks later, the CTO reports that the pilot group has a higher click-through rate such that a t-test between the two groups returns p=.032, a statistically significant difference. What questions might you ask the CTO to confirm whether to roll out the new algorithm? Justify why you would ask each question.

  • What is your F-test result?

    • It is an alternative to the t-test
  • What is the result of Pearson’s chi-squared test?

    • It is an alternative to the t-test that suits click/no-click counts
  • How was the experiment conducted? Did you set up a comparison group? Is it a continuous experiment or a comparison-based study?

    • How the experiment was conducted affects how the test results can be evaluated
  • The suggested approach is simply to collect data from the control group and the experimental group while holding all other variables unchanged, so that the experiment shows the difference between the two groups.
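As one quick additional analysis, the click-through rates of the two groups can be compared directly with a two-proportion z-test and a confidence interval for the difference. The sketch below is illustrative: the function name is an assumption, the click counts in the usage lines are hypothetical because the case study reports none, and it assumes each user counts as one click/no-click observation.

```python
import math


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Return (z, two-sided p-value, 95% CI for rate_a - rate_b)."""
    rate_a = clicks_a / n_a
    rate_b = clicks_b / n_b
    diff = rate_a - rate_b

    # Under the null hypothesis both groups share one pooled click rate.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail

    # The interval uses the unpooled standard error of the difference.
    se_diff = math.sqrt(
        rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b
    )
    ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
    return z, p_value, ci


# Hypothetical counts: 600 of 10,000 pilot users clicked vs 500 of 10,000 controls.
z, p_value, ci = two_proportion_z_test(600, 10_000, 500, 10_000)
print(f"z={z:.2f}, p={p_value:.4f}, 95% CI=({ci[0]:.4f}, {ci[1]:.4f})")
```

The p-value only says that a difference exists; the confidence interval shows how large the lift might be, which is what matters when weighing it against the cost and privacy risk of a full roll-out. Without a continuity correction, the square of this z statistic equals Pearson's chi-squared statistic for the same 2x2 table.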

Guidelines:

  • Both identify and justify each question that you might ask the CTO.
  • Justifications will mention the limitations of the experiment.
  • Suggest 1 quick follow-up experiment or additional analysis to confirm the findings.

Things I want to encourage for your improvements

  • I simply want to see how you conducted your experiment; based on that, we can ask further, more detailed questions.

General comment.

  • Refer to references and sources to support your answers (only when needed).
  • Focus on the relative importance of factors that have been identified, demonstrating a thorough understanding of the various technologies required to run big data analyses in the real world.

Reference

[1] https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/435817/The_commercial_use_of_consumer_data.pdf

[2] http://ieeexplore.ieee.org/document/6416855/

[3] https://thepeakperformancecenter.com/educational-learning/learning/preferences/myers-briggs-type-indicator/
