fastai learning: 02_production Questionnaire

1.Where do text models currently have a major deficiency?
Deep learning is currently not good at generating correct responses! We don't yet have a reliable way to, for instance, combine a knowledge base of medical information with a deep learning model so that it generates medically correct natural language responses.
2.What are possible negative societal implications of text generation models?
It is so easy to create content that appears to a layman to be compelling, but actually is entirely incorrect.
Another concern is that context-appropriate, highly compelling responses on social media could be used at massive scale—thousands of times greater than any troll farm previously seen—to spread disinformation, create unrest, and encourage conflict.
3.In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?
Human review: keep a person in the loop who checks the model's suggestions, rather than fully automating the process.
4.What kind of tabular data is deep learning particularly good at?
Natural language (book titles, reviews, etc.) and high-cardinality categorical columns.
5.What’s a key downside of directly using a deep learning model for recommendation systems?
Nearly all machine learning approaches have the downside that they only tell you what products a particular user might like, rather than what recommendations would be helpful for a user. Many kinds of recommendations for products a user might like may not be at all helpful—for instance, if the user is already familiar with the products, or if they are simply different packagings of products they have already purchased (such as a boxed set of novels, when they already have each of the items in that set).
6.What are the steps of the Drivetrain Approach?
Define a clear objective; identify the levers you can pull to influence that objective; work out what data you need to collect to decide how to set those levers; and only then build models that predict how the levers influence the objective.
7.How do the steps of the Drivetrain Approach map to a recommendation system?
The objective of a recommendation engine is to drive additional sales by surprising and delighting the customer with recommendations of items they would not have purchased without the recommendation. The lever is the ranking of the recommendations. New data must be collected to generate recommendations that will cause new sales. This will require conducting many randomized experiments in order to collect data about a wide range of recommendations for a wide range of customers. This is a step that few organizations take; but without it, you don’t have the information you need to actually optimize recommendations based on your true objective (more sales!).
8.Create an image recognition model using data you curate, and deploy it on the web.
I downloaded the Kaggle Dogs vs. Cats dataset and used it to try this out.
9.What is DataLoaders?
A fastai class that stores multiple DataLoader objects you pass to it, normally a train and a valid, although it’s possible to have as many as you like. The first two are made available as properties.
10.What four things do we need to tell fastai to create DataLoaders?
What kinds of data we are working with
How to get the list of items
How to label these items
How to create the validation set
11.What does the splitter parameter to DataBlock do?
It controls how the dataset is split into subsets, typically a training set and a validation set. For example, RandomSplitter(valid_pct=0.2) holds out 20% of the items for validation.
12.How do we ensure it always gives the same validation set?
Set a random seed (e.g. seed=42 in RandomSplitter) so that the same split is produced every time the code runs.
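The idea can be sketched in plain Python (a minimal sketch, not fastai's actual implementation; RandomSplitter does essentially this internally):

```python
import random

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle indices with a fixed seed so the same validation set
    is produced on every run (mimicking fastai's RandomSplitter)."""
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)   # seeded RNG -> reproducible order
    cut = int(valid_pct * len(items))
    return idxs[cut:], idxs[:cut]       # (train indices, valid indices)

train_a, valid_a = random_split(list(range(100)))
train_b, valid_b = random_split(list(range(100)))
assert valid_a == valid_b               # same seed, same validation set
```

Without the fixed seed, every run would shuffle differently and the validation set would change, making metrics across runs incomparable.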
13.What letters are often used to signify the independent and dependent variables?
x for the independent variable and y for the dependent variable.
14.What’s the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?
crop is the default Resize() method, and it crops the images to fit a square shape of the size requested, using the full width or height. This can result in losing some important details. For instance, if we were trying to recognize the breed of dog or cat, we may end up cropping out a key part of the body or the face necessary to distinguish between similar breeds.
pad is an alternative Resize() method, which pads the matrix of the image’s pixels with zeros (which shows as black when viewing the images). If we pad the images then we have a whole lot of empty space, which is just wasted computation for our model, and results in a lower effective resolution for the part of the image we actually use.
squish is another alternative Resize() method, which can either squish or stretch the image. This can cause the image to take on an unrealistic shape, leading to a model that learns that things look different to how they actually are, which we would expect to result in lower accuracy.
Which resizing method to use therefore depends on the underlying problem and dataset. For example, if the features in the dataset images take up the whole image, cropping may result in loss of information, so squishing or padding may be more useful.
A further alternative is RandomResizedCrop, which crops a randomly selected region of the image, so in every epoch the model sees a different part of each image and learns to be robust to that variation.
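The three strategies can be illustrated on a single row of pixels (a toy sketch in plain Python, not fastai's implementation, which operates on whole images):

```python
def crop(row, size):
    """Keep a centred window of `size` pixels; detail outside it is lost."""
    start = (len(row) - size) // 2
    return row[start:start + size]

def pad(row, size):
    """Centre the row and fill the rest with zeros (black borders)."""
    extra = size - len(row)
    left = extra // 2
    return [0] * left + row + [0] * (extra - left)

def squish(row, size):
    """Resample to `size` pixels by nearest neighbour; proportions distort."""
    return [row[int(i * len(row) / size)] for i in range(size)]

row = [1, 2, 3, 4, 5, 6]
print(crop(row, 4))    # [2, 3, 4, 5]        -> edges discarded
print(pad(row, 8))     # [0, 1, 2, 3, 4, 5, 6, 0]  -> wasted black space
print(squish(row, 3))  # [1, 3, 5]           -> content distorted
```

Each one-line comment mirrors the trade-off described above: crop loses detail, pad wastes computation, squish distorts shape.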
15.What is data augmentation? Why is it needed?
Data augmentation refers to creating random variations of our input data, such that they appear different, but do not actually change the meaning of the data. Examples of common data augmentation techniques for images are rotation, flipping, perspective warping, brightness changes and contrast changes. For natural photo images such as the ones we are using here, a standard set of augmentations that we have found work pretty well are provided with the aug_transforms function. Because our images are now all the same size, we can apply these augmentations to an entire batch of them using the GPU, which will save a lot of time.
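A horizontal flip, the simplest of these augmentations, can be sketched on a tiny "image" of nested lists (real libraries operate on tensors, and aug_transforms bundles many such operations):

```python
import random

def hflip(img):
    """Mirror each row left to right; the content's meaning is unchanged."""
    return [row[::-1] for row in img]

def augment(img, p=0.5, seed=None):
    """Apply the flip with probability p, as augmentation pipelines do."""
    rng = random.Random(seed)
    return hflip(img) if rng.random() < p else img

img = [[1, 2, 3],
       [4, 5, 6]]
print(hflip(img))  # [[3, 2, 1], [6, 5, 4]]
```

Because the flipped image still shows the same object, the label stays the same while the model sees a new input, which is why augmentation acts as nearly free extra training data.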
16.What is the difference between item_tfms and batch_tfms?
item_tfms are transformations applied to a single data sample x on the CPU. Resize() is a common transform because the mini-batch of input images to a CNN must all have the same dimensions. Assuming the images are RGB with 3 channels, Resize() as an item_tfms ensures the images have the same width and height.
batch_tfms are applied to batched data samples (aka individual samples that have been collated into a mini-batch) on the GPU. They are faster and more efficient than item_tfms. A good example of these are the ones provided by aug_transforms(). Inside are several batch-level augmentations that help many models.
17.What is a confusion matrix?
A matrix that compares the actual labels with the model's predictions: each row represents an actual class and each column a predicted class, so the diagonal counts correct predictions and the off-diagonal cells show which classes the model confuses with each other.
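In fastai it is plotted with ClassificationInterpretation.plot_confusion_matrix; the underlying computation is just counting (actual, predicted) pairs, as this plain-Python sketch shows:

```python
def confusion_matrix(actual, predicted, classes):
    """Rows = actual class, columns = predicted class."""
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        m[idx[a]][idx[p]] += 1
    return m

actual    = ["cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat"]
print(confusion_matrix(actual, predicted, ["cat", "dog"]))
# [[1, 1], [1, 2]]
```

Here one cat was misread as a dog and one dog as a cat; the diagonal entries 1 and 2 are the correct predictions.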
18.What does export save?
export saves the architecture, the trained parameters of the neural network, and the definition of the DataLoaders, so at inference time you do not need to redefine how the data is transformed.
19.What is it called when we use a model for getting predictions, instead of training?
Inference
20.What are IPython widgets?
IPython widgets are JavaScript and Python combined functionalities that let us build and interact with GUI components directly in a Jupyter notebook.
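For example, the book's bear-classifier GUI is built from a button and an output area. A minimal sketch (in a notebook the objects render as interactive controls; the print is a placeholder for real inference):

```python
import ipywidgets as widgets

btn_run = widgets.Button(description='Classify')  # Python object, rendered by JavaScript
out = widgets.Output()                            # area that captures printed output

def on_click_classify(change):
    with out:
        print('would call learn.predict() here')  # placeholder, no real model loaded

btn_run.on_click(on_click_classify)
# display(widgets.VBox([btn_run, out]))  # shows the GUI inside Jupyter
```

The Python callback runs in the kernel while the JavaScript side handles rendering and events, which is what makes the combination work inside a notebook.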
21.When might you want to use CPU for deployment? When might GPU be better?
GPUs are best for doing identical work in parallel. If you will be analyzing single pieces of data at a time (like a single image or single sentence), then CPUs may be more cost effective instead, especially with more market competition for CPU servers versus GPU servers. GPUs could be used if you collect user responses into a batch at a time, and perform inference on the batch. This may require the user to wait for model predictions. Additionally, there are many other complexities when it comes to GPU inference, like memory management and queuing of the batches.
22.What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?
Your application will require a network connection, and there will be some latency each time the model is called.
Also, if your application uses sensitive data then your users may be concerned about an approach which sends that data to a remote server, so sometimes privacy considerations will mean that you need to run the model on the edge device.
Managing the complexity and scaling the server can create additional overhead too, whereas if your model runs on the edge devices then each user is bringing their own compute resources, which leads to easier scaling with an increasing number of users.
23.What are three examples of problems that could occur when rolling out a bear warning system in practice?
Working with video data instead of images
Handling nighttime images, which may not appear in this dataset
Dealing with low-resolution camera images
Ensuring results are returned fast enough to be useful in practice
Recognizing bears in positions that are rarely seen in photos that people post online (for example from behind, partially covered by bushes, or when a long way away from the camera)
24.What is “out-of-domain data”?
Data that our model sees in production which is very different from what it saw during training. There isn't really a complete technical solution to this problem; instead, we have to be careful about our approach to rolling out the technology.
25.What is “domain shift”?
One very common problem is domain shift, where the type of data that our model sees changes over time. For instance, an insurance company may use a deep learning model as part of its pricing and risk algorithm, but over time the types of customers that the company attracts, and the types of risks they represent, may change so much that the original training data is no longer relevant.
26.What are the three steps in the deployment process?
Manual process: run the model in parallel with the existing manual process, with humans checking all of its predictions.
Limited scope deployment: try the model in a careful, geographically and time-limited rollout, with close human supervision.
Gradual expansion: widen the rollout step by step, with good reporting systems in place so that any significant change relative to the manual process is noticed.