Case Study. Technical and Commercial understanding. Internal use only.

You’re a consultant for a Tech start-up with 40 staff that has created a phone app called “XYZ” with millions of users.
The app lets users save their photos on XYZ’s servers so that they can share photos from their phones with one another.
Currently, the company makes money by including adverts on its app.
The CTO suggests that XYZ’s adverts can be better targeted by analyzing its users’ photos.
For example, a user who shares photos of a baby might see adverts for family-friendly holidays.

QUESTIONS.

~ 400 words. #1

What legal and ethical issues are there as part of the decision of whether to go ahead?

Guidelines:

Refer to other companies’ commercial use of user data to explain what has been deemed acceptable in society’s norms.
- Consumers should understand how their data is used, even though many uses of consumer data that benefit firms are not visible to consumers. In this case, the use is customized marketing recommendations. Users should be asked whether they are content to receive such marketing materials [1].
- Customers now have a relatively good understanding of how their data should be protected, following the implementation of the GDPR in Europe, the Data Protection Act 2018 in the UK, and the Personal Information Protection Law in China.
Explain the positive and negative consequences as a good CEO and citizen would, given that it would have real effects on the business, its users, and also society.
- In my understanding, customers who receive customized recommendations are more likely to find the right product for themselves. Provided we have informed customers about how their data is processed in the system, and do not use their data before notifying them, this kind of customized advertising benefits customers by shortening the decision process. If we model society as a distributed network, the edges between nodes become shorter as recommendations become more customized, which is good for society’s economy.
Regulations to follow
- UK Data Protection Act 2018
- General Data Protection Regulation (GDPR)
What do other companies do?
- Ask users in a pop-up window whether they want customized recommendations
Recommendations
- Ask users in a pop-up window whether they want customized recommendations
A more in-depth analysis from the legal and ethical aspects is as follows.
- On the legal side, there are laws and regulations on personal data protection, such as the GDPR, the UK Data Protection Act 2018, and the Personal Information Protection Law in China. These require companies or institutions that use personal data commercially to show users how the data is processed, so as to fully satisfy the purposes of the legislation.
- On the ethical side, norms may differ from culture to culture. In my opinion, ethics is a complementary consideration that covers the issues the law cannot address. That is to say, provided the processing benefits both the customers and the company, the company formally informs customers about how their data is processed, and the processing creates no risk of leaking personal information, it should be acceptable under ethical guidelines.

~ 1000 words. #2

How could a data analysis team tackle the problem of automatically predicting the following from a user’s photos:

a. The user’s demographics.

b. The user’s interests. Include in your answer the features that might be extracted from users’ photos and a description of the types of machine learning algorithms that might be used to make predictions. E.g., NLP? Etc?

Demographics

Firstly, age, gender, occupation, cultural background, and family status are the five output layers for the demographics. The associated subcategories are simplified for advertisement recommendation purposes.
age [2]
- Child (0-12 years)
- Adolescence (13-18 years)
- Adult (19-59 years)
- Senior Adult (60 years and above)
gender
- Male
- Female
- Other
occupation
- Blue-collar
- White-collar
- Golden collar
cultural background
- Eastern
- Western
family status
- Single
- Married
- Others
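As a minimal sketch, the category lists above can be written down as a label vocabulary with one classification head per attribute. The names `DEMOGRAPHIC_HEADS`, `age_group`, and `encode_labels` are hypothetical and only illustrate how the labels might be prepared for training; they are not part of any library.

```python
# Label vocabulary copied from the category lists above (one head per attribute).
DEMOGRAPHIC_HEADS = {
    "age": ["child", "adolescent", "adult", "senior adult"],
    "gender": ["male", "female", "other"],
    "occupation": ["blue-collar", "white-collar", "golden collar"],
    "cultural_background": ["eastern", "western"],
    "family_status": ["single", "married", "others"],
}


def age_group(age_years: int) -> str:
    """Map an age in years to the age bands listed above."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years <= 12:
        return "child"
    if age_years <= 18:
        return "adolescent"
    if age_years <= 59:
        return "adult"
    return "senior adult"


def encode_labels(labels: dict) -> dict:
    """Turn one user's label strings into class indices, one per head."""
    return {
        head: DEMOGRAPHIC_HEADS[head].index(value)
        for head, value in labels.items()
    }


# Example: a 34-year-old user labeled as female.
encoded = encode_labels({"age": age_group(34), "gender": "female"})
```

Keeping each attribute as its own head means a missing label for one attribute (for example, unknown occupation) does not prevent the other heads from being trained on that user.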
Secondly, let’s look at the inputs, which are photos and descriptions, as follows.
photos
- Can be fed as input to a YOLOv5 network for image classification.
description
- Generally, few people type descriptions for their images (phones are not an ideal device for text input). Therefore, we assume the description of a photo is the information captured automatically with the photo, such as GPS location and time. In this case, the photo location can be used to infer cultural background for customized recommendations.
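To illustrate how automatically captured metadata might become model inputs, here is a rough sketch. It assumes the metadata has already been read from the photo into a plain dict; the key names (`taken_at`, `lat_dms`, and so on) are made up for this example, and the timestamp is assumed to use the EXIF-style `YYYY:MM:DD HH:MM:SS` layout.

```python
from datetime import datetime


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere letter to decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention.
    return -value if ref in ("S", "W") else value


def metadata_features(meta):
    """Build a small feature dict from one photo's metadata record.

    `meta` is a hypothetical dict; its key names are assumptions of this sketch.
    """
    taken = datetime.strptime(meta["taken_at"], "%Y:%m:%d %H:%M:%S")
    return {
        "lat": dms_to_decimal(meta["lat_dms"], meta["lat_ref"]),
        "lon": dms_to_decimal(meta["lon_dms"], meta["lon_ref"]),
        "hour": taken.hour,
        "is_weekend": taken.weekday() >= 5,  # Saturday=5, Sunday=6
    }


# Invented example record, for illustration only.
example = {
    "taken_at": "2023:07:15 09:30:00",
    "lat_dms": (51, 30, 0), "lat_ref": "N",
    "lon_dms": (0, 7, 30), "lon_ref": "W",
}
features = metadata_features(example)
```

The resulting latitude and longitude could then be mapped to a coarse region for the cultural background output; how to draw that mapping is a modeling decision this sketch leaves open.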
The network training mechanism
Training
- Go to Kaggle to find a dataset of user data that contains both the photos and their descriptions
If Kaggle has a well-labeled dataset
- then apply it to the YOLOv5 network
else
- go to taobao.com to have the data labeled manually, at roughly 0.2 yuan per picture, for an internal database within your system
- Train on an Nvidia DGX Station, or on the GPU cluster provided by the university on the cloud, for roughly one night
Testing
- Feed new, unseen data to the model to see how well it predicts
Validation
- Apply a cross-validation approach to see how the model works
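The cross-validation step can be sketched in plain Python as below. This is only an illustration of the idea: `train_fn` and `predict_fn` are hypothetical placeholders for whatever model is actually trained, and the majority-class stand-in exists only so the example runs on its own.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, k, train_fn, predict_fn):
    """Train on k-1 folds, score on the held-out fold, and repeat for each fold."""
    scores = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Stand-in model: always predict the most common training label.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)


def predict_majority(model, x):
    return model


scores = cross_validate(list(range(10)), ["a"] * 10, k=5,
                        train_fn=train_majority, predict_fn=predict_majority)
```

For this use case, the folds would better be split by user rather than by photo, so that photos from the same user never appear in both the training and the held-out fold.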

Interests

Firstly, classify the user interests based on the MBTI model [3]
- Extraverts (E)
- Introverts (I)
- Sensors (S)
- Intuitors (N)
- Thinkers (T)
- Feelers (F)
- Judgers (J)
- Perceivers (P)
Then follow the same procedure as for demographics prediction, using mainly the photos, supplemented by the descriptions. In this case, we will likely need to build the labeled dataset and the model ourselves, since little data labeled with this classification is available on Kaggle.
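Because the eight MBTI preferences form four either/or pairs, one possible design is four binary classifiers whose outputs are combined into a four-letter type. The sketch below assumes each classifier returns a probability for the first letter of its pair; `mbti_type` and the probabilities in the example are illustrative only.

```python
# Each MBTI dimension is a binary choice between two preferences.
MBTI_DIMENSIONS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def mbti_type(probabilities, threshold=0.5):
    """Combine four per-dimension probabilities into a four-letter type.

    probabilities[i] is a (hypothetical) model's probability of the first
    letter of MBTI_DIMENSIONS[i].
    """
    if len(probabilities) != len(MBTI_DIMENSIONS):
        raise ValueError("expected one probability per dimension")
    return "".join(
        first if p >= threshold else second
        for p, (first, second) in zip(probabilities, MBTI_DIMENSIONS)
    )


# Invented probabilities, for illustration only.
predicted = mbti_type([0.9, 0.2, 0.6, 0.1])
```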

Guidelines:

Discuss what additional data is required to run this analysis.
- Since our app provides a photo storage service, I think we can also collect users’ in-app operation data to gain a deeper understanding of each user’s profile.
Both identify and describe intuitively the features that can be extracted from users’ photos.
- We do not need to extract any features from the photos by hand; YOLOv5 applies deep learning to learn them for us.
Both identify and describe intuitively the machine learning algorithms that can be used to make predictions.
- Basically, the machine learning prediction problem can be framed as regression: if the predicted value falls within a specific range, the sample is assigned to the corresponding category.
- In the traditional machine learning procedure, the inputs are features and the outputs are target values; we apply a BP (backpropagation) neural network to learn the relationship between them.
- The BP neural network uses the backpropagation algorithm to train the weights of each neuron to fit this relationship.
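To make the BP neural network idea concrete, here is a toy sketch of backpropagation in plain Python: a network with one hidden layer, sigmoid activations, and a mean squared error loss, trained by full-batch gradient descent. The layer sizes, learning rate, and the logical-OR toy data are all assumptions chosen for illustration; a real system would use a framework such as PyTorch.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return the hidden activations and the output for one input vector."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output


def mean_squared_error(params, data):
    return sum((forward(params, x)[1] - y) ** 2 for x, y in data) / len(data)


def train_step(params, data, lr):
    """One full-batch gradient-descent step using backpropagation."""
    w1, b1, w2, b2 = params
    g_w1 = [[0.0] * len(row) for row in w1]
    g_b1 = [0.0] * len(b1)
    g_w2 = [0.0] * len(w2)
    g_b2 = 0.0
    for x, y in data:
        hidden, out = forward(params, x)
        # Error signal at the output: d(loss)/d(pre-activation).
        delta_out = 2 * (out - y) * out * (1 - out) / len(data)
        g_b2 += delta_out
        for j, h in enumerate(hidden):
            g_w2[j] += delta_out * h
            # Propagate the error back through hidden unit j.
            delta_h = delta_out * w2[j] * h * (1 - h)
            g_b1[j] += delta_h
            for i, xi in enumerate(x):
                g_w1[j][i] += delta_h * xi
    new_w1 = [[w - lr * g for w, g in zip(row, g_row)]
              for row, g_row in zip(w1, g_w1)]
    new_b1 = [b - lr * g for b, g in zip(b1, g_b1)]
    new_w2 = [w - lr * g for w, g in zip(w2, g_w2)]
    return new_w1, new_b1, new_w2, b2 - lr * g_b2


rng = random.Random(0)
params = (
    [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)],  # input -> hidden
    [0.0] * 3,
    [rng.uniform(-1, 1) for _ in range(3)],                      # hidden -> output
    0.0,
)
# Toy data (logical OR), standing in for feature vectors and labels.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
initial_loss = mean_squared_error(params, data)
for _ in range(500):
    params = train_step(params, data, lr=0.5)
final_loss = mean_squared_error(params, data)
```

Comparing `initial_loss` with `final_loss` shows whether the weight updates are reducing the error, which is the behaviour backpropagation is meant to produce.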
Differentiate features that might be effective in predicting demographics from those effective in predicting interests and explain why a certain feature might be more useful for each specific task.
- In my opinion, the location is good for predicting the cultural background.
- For others, let’s simply apply deep learning to make life easier.

What is more, the analysis of the associated natural language processing mechanism is as follows.

Assume there are descriptions of the photos. I do not know how, from a human’s perspective, the descriptions could be useful in evaluating the customer profile. But this is not a problem in PyTorch: simply feed them, together with the labeled data, into the deep learning framework, which does the feature engineering for you, and you then obtain the associated correspondence.

~ 600 words. #3

The new advert targeting algorithm is tested on a pilot group of 10,000 users compared to a matched control group of 10,000 users.

Two weeks later, the CTO reports that the pilot group has a higher click-through rate such that a t-test between the two groups returns p=.032, a statistically significant difference. What questions might you ask the CTO to confirm whether to roll out the new algorithm? Justify why you would ask each question.

What is your F-test result?
- It is an alternative to the t-test. With only two groups, a one-way F-test gives the same p-value as the equal-variance t-test, so it mainly confirms that the analysis was run correctly.
What is your Pearson’s chi-squared test result?
- It is an alternative to the t-test that suits click/no-click count data.
How was the experiment conducted? Did you set up a proper comparison group? Is it a continuous experiment or a comparison-based study?
- How the experiment was conducted affects how the test results can be interpreted.
The suggested approach, in simplified form, is to collect data from the control group and the experimental group while keeping all other variables unchanged, so that the experiment shows the difference between the two groups.
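As one quick additional analysis, the click counts of the two groups could be compared directly with a two-proportion z-test, together with a confidence interval for the uplift. The sketch below uses only the standard library. The click counts are invented for illustration, since the case study reports only the p-value, and the test assumes each user contributes one independent click/no-click outcome.

```python
import math


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference between two click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def difference_ci(clicks_a, n_a, clicks_b, n_b, z_crit=1.96):
    """Approximate 95% confidence interval for the uplift (rate_b - rate_a)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical counts: control group (a) versus pilot group (b).
z, p_value = two_proportion_z_test(300, 10_000, 350, 10_000)
low, high = difference_ci(300, 10_000, 350, 10_000)
```

The confidence interval is the more useful output for a roll-out decision: a statistically significant result can still correspond to an uplift too small to justify the legal and reputational risk of analyzing users’ photos.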

Guidelines:

Both identify and justify each question that you might ask the CTO.
Justifications will mention the limitations of the experiment.
Suggest 1 quick follow-up experiment or additional analysis to confirm the findings.

Things I want to encourage for your improvements

I simply want to see how you conducted your experiment, so that we can ask further detailed questions based on how it was conducted.

General comment.

Refer to references and sources to support your answers (only when needed).
Focus on the relative importance of factors that have been identified, demonstrating a thorough understanding of the various technologies required to run big data analyses in the real world.

Reference

[1] https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/435817/The_commercial_use_of_consumer_data.pdf

[2] http://ieeexplore.ieee.org/document/6416855/

[3] https://thepeakperformancecenter.com/educational-learning/learning/preferences/myers-briggs-type-indicator/