AWS Certification

A carefully crafted and important cheat sheet. Note that it will only be useful after you have completed the entire course on Udemy.

Udemy Course for AWS ML Specialty

Cheat Sheet

Reduce the cost of Automatic Hyperparameter tuning on SageMaker

  • use log scales on parameter ranges
  • run fewer training jobs concurrently, because the tuner learns from the results of earlier runs
  • keep the hyperparameter ranges as small as possible
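To see why a log scale helps, here is a minimal sketch that samples a learning-rate-like range on a log scale. The range (1e-5 to 1e-1) and the helper name sample_log_uniform are made up for illustration; this is not how SageMaker's tuner is implemented.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw one value whose logarithm is uniform between log(low) and log(high)."""
    if low <= 0 or high <= low:
        raise ValueError("need 0 < low < high")
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_log_uniform(1e-5, 1e-1, rng) for _ in range(1000)]

# With a log scale, about half of the draws fall below the geometric midpoint (1e-3).
# A linear scale would put almost all draws above it.
share_below_midpoint = sum(s < 1e-3 for s in samples) / len(samples)
print(round(share_below_midpoint, 2))
```

Each order of magnitude gets roughly the same number of trials, so small values are explored with fewer training jobs.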

Recall is an important metric when classes are highly imbalanced and the positive case is rare. Accuracy tends to be misleading in these cases.

  • Ex: Fraud Detection
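A small sketch with made-up numbers shows how accuracy can mislead on imbalanced data: a model that never flags fraud still scores 99% accuracy, while its recall is zero.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

# Made-up data: 990 legitimate transactions (0) and 10 fraudulent ones (1).
y_true = [0] * 990 + [1] * 10
# A useless model that never flags fraud.
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```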

Cheat Sheet for Confusion Matrix

More epochs and overfitting?

  • use dropout regularization
  • stopping training early (fewer epochs) is good advice
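Early stopping can be sketched as a patience rule on the validation loss. The loss values and the patience setting below are illustrative assumptions, not values from the course.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch whose weights should be kept.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Made-up validation losses: improving at first, then rising as the model overfits.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
print(early_stopping_epoch(losses))  # 3
```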

SageMaker notebook instances are Internet-enabled by default, creating a potential security hole in your VPC.

  • VPC Interface Endpoint (PrivateLink)
  • Modify the instance's security group to allow outbound connections for training and hosting.

Edge

  • SageMaker Neo + IoT Greengrass
  • sample edge device: Nvidia Jetson

To design and push something to the edge

  • design a model that does the job, say a TensorFlow model
  • compile it for the edge device, say an Nvidia Jetson, using SageMaker Neo
  • run it on the edge using IoT Greengrass

NLP on Amazon: Comprehend

  • Another solution would be to use natural language processing through a service such as Amazon Comprehend.

You are training an XGBoost model on SageMaker with millions of rows of training data, and you wish to use Apache Spark to pre-process this data at scale. What is the simplest architecture that achieves this?

  • The SageMakerEstimator classes allow tight integration between Spark and SageMaker for several models, including XGBoost, and offer the simplest solution.

You can't deploy SageMaker to an EMR cluster

XGBoost actually requires LibSVM or CSV input

Best ML choices for imputation?

  • Categorical: Deep Learning
  • Numerical: kNN

ML workloads with sporadic traffic spikes?

  • Use Spot Instances: using spot instances in response to anticipated surges in usage is the most cost-effective approach for scaling up an EMR cluster.

Pixel-level classification is called Semantic Segmentation

What is a Loss Function?

  • Whatever you least want your model to get wrong is what the loss function should penalize.
  • Example: for fraud detection, you don't want false negatives, so FN / (FN + TP), the false negative rate, is the quantity to minimize

Reduce the Dimensionality

  • PCA
  • K-Means Clustering

KNN: Supervised; K-Means: Unsupervised

/opt/ml/code/train.py

  • the Dockerfile should set the environment variable SAGEMAKER_PROGRAM to train.py

Organizing data by date using S3 prefixes allows Glue to partition the data by date, which leads to faster queries on date ranges.
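One common way to lay out such prefixes is Hive-style key=value partitions, which Glue crawlers recognize. The sketch below only builds the object key; the prefix and file name are hypothetical.

```python
from datetime import date

def partitioned_key(prefix, day, filename):
    """Build an S3 object key with Hive-style year/month/day partitions."""
    return (
        f"{prefix}/year={day.year:04d}/month={day.month:02d}/"
        f"day={day.day:02d}/{filename}"
    )

# "clickstream" and "events.json" are hypothetical names for illustration.
print(partitioned_key("clickstream", date(2020, 3, 7), "events.json"))
# clickstream/year=2020/month=03/day=07/events.json
```

A query restricted to a date range then only has to read the matching prefixes.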

S3 lifecycle policies can automate the process of archiving old data to Glacier.

Make your own Alexa

  • Transcribe (speech to text) → Lex (chatbot engine that works on intent) → Polly (text to speech, reads the given text aloud)
  • a real implementation would also use DynamoDB and Lambda functions

AWS Rekognition won't know about your company logo, and neither will an object detection model, until you have trained it first.

Ground Truth is purpose-built for labeling tasks and can be set up very quickly; using the Mechanical Turk workforce is one of its options.

Factorization machines are relevant to handling sparse data, but they don't perform dimensionality reduction per se.

  • Factorization Machines ↔ Sparse Data

PCA is a powerful dimensionality reduction technique that will find the best dimensions.

Given a multi-class confusion matrix drawn as a heat map with a diagonal axis

  • When the question asks for the worst-predicted class, the choice with the lightest color along the diagonal is the correct one, as it represents the lowest number of correct predictions.

We can never say a graph captures the trend well but the seasonality badly.

  • either both are good or both are bad

Seasonality refers to periodic changes, while trends are longer-term changes over time.

Kinesis Analytics can do minimal transformations natively using SQL.

Amazon Forecast: RTF service on AWS for forecasting.

You are on EMR and need to use S3 → always EMRFS

SMOTE: an ingenious oversampling technique

  • Synthetic Minority Oversampling Technique

Large Batch Size → can get stuck in local minima → you will miss the true minimum

L1 Regularization Technique → reduces the number of features (very useful to fix overfitting) → if applied too aggressively, it can also cause underfitting.

L2 Regularization Technique → it weights each feature instead of removing them entirely, which can lead to better accuracy
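The difference between the two penalties can be illustrated with the shrinkage step each one induces on a weight vector. This is a simplified sketch with made-up weights and a made-up strength of 0.1, not a full training loop.

```python
def l1_shrink(weights, strength):
    """Soft-thresholding: the shrinkage step that an L1 penalty induces."""
    shrunk = []
    for w in weights:
        magnitude = max(abs(w) - strength, 0.0)
        shrunk.append(magnitude if w >= 0 else -magnitude)
    return shrunk

def l2_shrink(weights, strength):
    """Proportional shrinkage: the step that an L2 penalty induces."""
    return [w / (1.0 + strength) for w in weights]

weights = [0.05, -0.5, 2.0]  # made-up feature weights
print(l1_shrink(weights, 0.1))  # the smallest weight becomes exactly 0.0
print(l2_shrink(weights, 0.1))  # every weight shrinks, none reaches zero
```

L1 drops weak features outright, while L2 keeps every feature with a smaller weight.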

Tackle underfitting?

  • use L2 instead of L1
  • or simply reduce the L1 regularization term (the term that controls how strongly L1 is applied)

Tackle Overfitting?

  • Dropout regularization technique
  • early stopping (fewer epochs)
  • using fewer layers may help

Quantile Binning

  • splits data into a fixed number of buckets, with the same number of observations in each bin.

Unevenly distributed data and you need to preserve the distribution?

  • Quantile binning

What if Interval binning were used?

  • some intervals could hold few items and others many more → this behavior loses visibility into the distribution
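The contrast between the two binning schemes can be sketched on a small skewed sample. The data and helper names are made up for illustration, and the interval version assumes the values are not all identical.

```python
def quantile_bins(values, n_bins):
    """Assign each value a bin index so that bins hold (nearly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

def interval_bins(values, n_bins):
    """Assign each value a bin index using equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    return [min(int((v - low) / width), n_bins - 1) for v in values]

def counts(bins, n_bins):
    return [bins.count(b) for b in range(n_bins)]

# Made-up skewed data: most values are small, one is very large.
data = [1, 2, 2, 3, 3, 4, 5, 100]
print(counts(quantile_bins(data, 4), 4))  # [2, 2, 2, 2]
print(counts(interval_bins(data, 4), 4))  # [7, 0, 0, 1]
```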

SageMaker Distributed Training

Available out of the box, for example with the TensorFlow estimator, through:

  • Horovod
  • Parameter Servers

Did training fail?

  • Training with unshuffled data may cause training to fail.

Training data should always be normalized and shuffled.

SageMaker Linear Learner supports both classification and regression tasks.

F1 Score → 2 · P · R / (P + R)

  • P: Precision
  • R: Recall
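As a worked example of the formula, the sketch below computes F1 from confusion-matrix counts. The counts are made up.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up counts: 8 true positives, 2 false positives, 8 false negatives.
# Precision is 0.8 and recall is 0.5.
print(round(f1_score(tp=8, fp=2, fn=8), 4))  # 0.6154
```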

Glue and Glue ETL can impart structure to unstructured data, and perform transformations on that data as it is received.

Athena is a serverless solution that can query S3 data lakes directly when paired with Glue

Data in S3 and need visualizations?

  • S3 → Glue Crawlers → Glue Data Catalog → Athena → QuickSight

When you have a large amount of data to prepare → you want it done in parallel, and Apache Spark is the usual choice for that.

So much data on S3 and want to use it for ML?

  • approach 1:
  • - use PySpark + XGBoostSageMakerEstimator to prepare data using Spark
  • - then pass the data to SageMaker
  • approach 2: without using XGBoostSageMakerEstimator
  • - use Spark on EMR to pre-process the data and store it back in the same or another S3 bucket
  • - keep the S3 bucket accessible to SageMaker to train on

Neither Glue ETL nor Kinesis Analytics can convert to LibSVM format

scikit-learn is not for a distributed solution.

LibSVM: A Library for Support Vector Machines

What is the best imputation technique?

  • supervised learning for → discrete data
  • Deep Learning for → categorical data
  • mean or median next
  • dropping the rows comes last

Training involves multiple long-running ETL jobs which need to execute in order

  • order → Step Functions
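Ordered execution could be expressed as an Amazon States Language definition in which each Glue job is a Task state that points to the next one. This is a sketch under assumptions: the job names are hypothetical, and the helper only builds the JSON document; it does not create a state machine.

```python
import json

def sequential_glue_pipeline(job_names):
    """Build an Amazon States Language definition that runs Glue jobs in order."""
    states = {}
    for i, job in enumerate(job_names):
        state = {
            "Type": "Task",
            # The ".sync" suffix makes the step wait until the Glue job finishes.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": job},
        }
        if i + 1 < len(job_names):
            state["Next"] = job_names[i + 1]
        else:
            state["End"] = True
        states[job] = state
    return {"StartAt": job_names[0], "States": states}

# Hypothetical Glue job names.
definition = sequential_glue_pipeline(["clean-raw-data", "join-tables", "export-features"])
print(json.dumps(definition, indent=2))
```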

QuickSight's ML Insights feature allows forecasting using QuickSight itself. This is a serverless solution that contains the least number of components.

Forecasting without any overhead at all?

  • put data in S3
  • use QuickSight's native ML Insights feature
  • also use a QuickSight dashboard for visualization

XGBoost hyperparameters

  • subsample
  • alpha
  • eta
  • gamma
  • lambda

Recall (TP / (TP + FN)) is important when the cost of a false negative is higher than that of a false positive.

Detect a custom logo or t-shirt at a place with cameras?

  • custom CNN for computer vision or image detection
  • camera at the location
  • DeepLens
  • DeepLens_Kinesis_Video module
  • SageMaker

Quickly build another classifier beside the current one?

  • use transfer learning: clone the existing model and start building on top of it

Transfer learning

  • can be below or above
  • Transfer learning generally involves using an existing model as-is or adding additional layers on top of one.

Factorization Machines → float32

Factorization Machines

  • handle sparse data
  • RecordIO/protobuf in float32 format (highly unusual)

For SageMaker Pipe Mode

  • RecordIO is efficient

SageMaker Notebook if created with the default IAM role

  • it can access S3 buckets with 'sagemaker' in the name

Unless you add a policy with S3FullAccess permission to the role, access is restricted to buckets with "sagemaker" in the bucket name. Strange but true.

BlazingText format

  • Each line of the input file contains one training sentence along with its labels. Each label must be prefixed with __label__, and the tokens within the sentence, including punctuation, should be space-separated.
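A small helper can illustrate the expected line layout for BlazingText's supervised mode. The tokenization rule (a simple regular expression) and the lowercasing are assumptions made for this sketch; the label and sentence are made up.

```python
import re

def to_blazingtext_line(label, sentence):
    """Format one labeled sentence for BlazingText supervised (text classification) mode."""
    # Split into words and standalone punctuation marks, then join with single spaces.
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    return f"__label__{label} " + " ".join(tokens)

# The label and sentence are made-up examples.
print(to_blazingtext_line("positive", "Great movie, loved it!"))
# __label__positive great movie , loved it !
```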

Why Pipe mode?

  • with pipe mode, we don't copy the data to the training machine
  • we stream the data
  • it makes a big difference for big datasets
  • requirements of pipe mode? → RecordIO format

SageMaker LDA → Pipe mode works only with RecordIO input

SageMaker LDA → trains on a single instance only

SageMaker Factorization Machines → RecordIO && float32

AWS Batch

  • plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2 and Spot Instances.

Complex workflow?

  • steps must execute in order → Step Functions
  • only scheduling is needed, with no required order → AWS Batch

Learning Rate

  • Too large → overshoots the true minimum
  • Too small → slows down convergence, takes more time
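Both failure modes can be seen with plain gradient descent on a toy function. The function f(x) = x**2, the step count, and the three learning rates are illustrative choices.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(x) = x**2 (true minimum at x = 0) and return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2 * x          # derivative of x**2
        x -= learning_rate * gradient
    return x

for lr in (1.1, 0.001, 0.3):
    print(lr, abs(gradient_descent(lr)))
# 1.1   -> |x| grows every step: each update overshoots the minimum
# 0.001 -> |x| is still close to the starting point after 20 steps
# 0.3   -> |x| is very close to 0
```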

Batch Size

  • Too large → can get stuck at a local minimum
  • Smaller → more likely to reach the true minimum

What is this true minimum?

  • when training, we usually want the model to be as "least bad" as possible on one quality
  • that one quality → Loss Function
  • the lowest value that quality can actually reach → the actual minimum → the true minimum

SageMaker Seq2Seq

  • machine translation
  • we need to provide vocabulary files
  • tokenize our words into integers
  • RecordIO-protobuf format with integer tokens

Your own Universal Language Translator?

  • You speak in language 1 → AWS Transcribe → AWS Translate → AWS Polly speaks in language 2

BlazingText

  • this is for sentiment analysis
  • for sentiment analysis alone → the order of words doesn't matter
  • uses Skip-gram and CBOW (Continuous Bag of Words)
  • BlazingText doesn't use LSTM or CNN

Categorical features need to be converted into one-hot, binary representations prior to use in a neural network.
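One-hot encoding can be sketched in a few lines of plain Python. The color values are made up, and sorting the categories is just one possible way to fix the column order.

```python
def one_hot_encode(values):
    """Map each categorical value to a binary vector with a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1
        encoded.append(vector)
    return categories, encoded

# Made-up categorical feature.
categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```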

RDS, Elasticsearch, and EMR all require the provisioning of servers.

S3, Glue, Athena, and QuickSight are all serverless solutions.

Celebrity Detection

  • already-trained model under the hood of AWS Rekognition

Detect anomalies on a stream?

  • Kinesis Data Analytics has ❤️ a native Random Cut Forest algorithm; use that.
  • Random Cut Forest is Amazon's own algorithm for anomaly detection and is usually the right choice when anomaly detection is asked for on the exam. It is implemented within both Kinesis Data Analytics and SageMaker, but only Kinesis works directly on a stream as described.

LSTM: Long Short-Term Memory, a specific kind of RNN

RNN

  • feeds a neuron's output back into the same neuron (hence "recurrent")
  • when it matters how long that fed-back state persists → LSTM (long or short)

Generate music?

  • it is a time-series problem
  • use RNN

Kinesis Firehose has the ability to convert JSON data to Parquet or ORC format on the fly.

Athena performs much more efficiently and at lower cost when using columnar formats such as Parquet or ORC.

Serverless analytics?

  • JSON data arrives through Kinesis Streams
  • - send to Firehose
  • Kinesis Firehose receives it
  • - convert to Parquet or ORC and load to S3
  • Athena queries S3 using a Glue Crawler and the Glue Data Catalog, and provides analytics

AWS Rekognition can identify common objects in images right out of the box.

Comprehend could be used to produce topics for the text in the posts.

Comprehend: RTF AWS NLP

BlazingText: just an algorithm for NLP on SageMaker

Vanishing Gradient?

  • use ReLU

Reasons for vanishing gradient?

  • multiplying together many small derivatives of the sigmoid activation function across multiple layers
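The effect can be shown numerically by multiplying activation derivatives along a path through the layers. The depth of 10 and the pre-activation value of 0.5 are made-up assumptions, and weights are ignored to keep the sketch short.

```python
import math

def sigmoid_derivative(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)          # never larger than 0.25

def relu_derivative(x):
    return 1.0 if x > 0 else 0.0

def gradient_factor(derivative, pre_activations):
    """Multiply the activation derivatives along one path through the layers."""
    product = 1.0
    for x in pre_activations:
        product *= derivative(x)
    return product

# Made-up pre-activation values for a 10-layer path (all positive).
layers = [0.5] * 10
print(gradient_factor(sigmoid_derivative, layers))  # tiny: at most 0.25 ** 10
print(gradient_factor(relu_derivative, layers))     # 1.0
```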

SageMaker Object2Vec vs. SageMaker BlazingText

  • both are algorithms
  • Object2Vec creates embeddings for arbitrary objects, like Tweets
  • BlazingText can only find relationships between words, not between entire tweets

XGBoost instance type?

  • M4
  • XGBoost is a CPU-only algorithm
  • no benefit from GPUs
  • GPU types, for comparison → P3 or P2

Instance Types: https://aws.amazon.com/sagemaker/pricing/instance-types/

  • GPU: Accelerated Computing
  • - P, G
  • CPU: Standard
  • - M, T
  • Memory Optimized: Current generation
  • - R
  • Compute Optimized: Current generation
  • - C
  • Inference Accelerator
  • - another level

Non-linear classification solutions

  • kNN
  • SVM + RBF
  • SVM: Support Vector Machine
  • RBF: Radial Basis Function

Outliers can skew linear models.

  • discard them by identifying points that lie more than some multiple of the standard deviation away from the mean
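The rule above can be sketched with the standard library. The data and the cutoff of 2 standard deviations are illustrative assumptions; the right multiple depends on the dataset.

```python
import statistics

def drop_outliers(values, max_deviations=2.0):
    """Keep only values within `max_deviations` standard deviations of the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= max_deviations * stdev]

# Made-up measurements with one obvious outlier.
data = [10, 11, 9, 10, 12, 10, 11, 95]
print(drop_outliers(data))  # [10, 11, 9, 10, 12, 10, 11]
```

Note that a single extreme value inflates the standard deviation itself, so a stricter cutoff can be needed on small samples.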

Spot Instances → task nodes on EMR

Deduplication?

  • Glue ETL: FindMatches ML ❤️ feature

Assign topics for a bunch of texts

  • LDA: Latent Dirichlet Allocation, unsupervised topic modeling
  • NTM: Neural Topic Model, a SageMaker algorithm

Find topics

  • SageMaker LDA Algorithm
  • SageMaker NTM Algorithm
  • Amazon Comprehend as well (it also does sentiment analysis)

Imputation?

  • No outliers? → Mean
  • Outliers present? → Median
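A short example with made-up incomes shows why the median is the safer fill value when outliers are present.

```python
import statistics

def impute(values, strategy):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed) if strategy == "mean" else statistics.median(observed)
    return [fill if v is None else v for v in values]

# Made-up incomes (in thousands) with one missing value and one extreme outlier.
incomes = [40, 45, 50, None, 55, 1000]
print(impute(incomes, "mean"))    # the gap is filled with 238
print(impute(incomes, "median"))  # the gap is filled with 50
```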

Can a new SageMaker model be tested without impact to customers?

  • Yes
  • Production Variants are made for this purpose, much like Tesla's Shadow Mode

Curves

  • AUC: Area Under Curve
  • ROC: Receiver Operating Characteristic
  • A good ROC curve bends up toward (0, 1)
  • A perfect AUC is 1.0
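AUC can be computed without plotting the curve, as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The labels and scores below are made up, and the pairwise loop is a simple sketch rather than an efficient implementation.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))

# Made-up labels and model scores.
labels = [0, 0, 1, 1]
print(roc_auc(labels, [0.1, 0.2, 0.8, 0.9]))   # 1.0, every positive outranks every negative
print(roc_auc(labels, [0.1, 0.8, 0.35, 0.9]))  # 0.75
```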

Shuffling is recommended with SageMaker Linear Learner

How to restrict access to SageMaker notebooks to specific IAM groups?

  • put tags on SageMaker resources
  • use ResourceTag conditions in IAM policies to match the tags on those SageMaker instances

Full encryption while training because of PII data in the dataset?

  • Inter-container encryption is just a checkbox away when creating a training job via the SageMaker console.
  • It can also be specified using the SageMaker API with a little extra work

Custom inference container requirements?

  • Your inference container must respond on port 8080, and
  • must respond to ping requests in under 2 seconds.
  • Model artifacts need to be compressed in tar format, not zip.
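The port and ping requirements can be sketched with a tiny standard-library web server that answers GET /ping and POST /invocations. This is an illustration only: the "model" is a placeholder that sums its input, and the names InferenceHandler and make_server are made up for this example. Real containers usually sit behind a production web server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class InferenceHandler(BaseHTTPRequestHandler):
    def _reply(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: must answer quickly with HTTP 200.
        if self.path == "/ping":
            self._reply(200, {"status": "ok"})
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/invocations":
            self._reply(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length) or b"[]")
        # Placeholder "model": a real container would load its artifacts
        # from /opt/ml/model and run a prediction here.
        self._reply(200, {"prediction": sum(features)})

    def log_message(self, format, *args):
        pass  # keep the sketch quiet

def make_server(port=8080):
    """Create the server; call .serve_forever() on the result to start it."""
    return HTTPServer(("0.0.0.0", port), InferenceHandler)
```

Inside a container the entry point would call make_server().serve_forever() so that the health check on port 8080 is answered.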

K-Means is unsupervised. (mnemonic: KUM)

  • how to optimize the number of clusters?
  • WSS (within-cluster sum of squares) is one way; picking k where the WSS curve bends is called the elbow method
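WSS itself is simple to compute. The sketch below uses made-up one-dimensional points and hand-picked centroids for k = 1, 2, 3 instead of running K-Means, just to show where the elbow appears.

```python
def wss(points, centroids):
    """Within-cluster sum of squares: each point is charged to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min((p - c) ** 2 for c in centroids)
    return total

# Made-up 1-D data with two obvious groups, and hand-picked centroids for k = 1, 2, 3.
points = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
print(wss(points, [7.0]))              # 154.0
print(wss(points, [2.0, 12.0]))        # 4.0
print(wss(points, [1.5, 3.0, 12.0]))   # 2.5
```

The drop from k = 1 to k = 2 is large and the drop from k = 2 to k = 3 is small, so the elbow is at k = 2.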

End.

Translated from: https://medium.com/swlh/cheat-sheet-for-aws-ml-specialty-certification-e8f9c88566ba
