Predicting Sneaker Resell with Deep Learning

TL;DR: I used a pre-trained VGG16 model to predict prices for popular sneaker models (or any footwear, for that matter!), obtaining a final test loss (MSE) of 34k, which represents an average prediction error of $184, or ~30%. GAN-generated images at the end.

Part 0: Introduction

The night before any sneaker release, resellers and enthusiasts gather in online forums and chat groups to discuss the potential resell value of the next day’s release. From Air Jordans to Yeezys to any big-name collaboration, you can count on sneaker reselling to play a major part in current streetwear culture. The problem, however, is that there isn’t really a reliable method to gauge resell value accurately and quantitatively, as most predictions are based on factors like the level of hype or similar past releases. Although the sneaker community has become quite proficient (most of the time) at predicting whether or not a sneaker will have resell value, the actual resell price is still anyone’s guess.

As somebody who loves working with data and is a longtime sneaker enthusiast, I decided to use machine learning to take some of the guesswork out of sneaker resell predictions.

My Approach: Using data from popular fashion and resell platforms (StockX, Farfetch, etc.), I wanted to train a deep learning model to perform price regression on sneaker images, with the sale price as the label. The first task of this project was to crawl the web and scrape image and price data for various sneakers (or men’s footwear in general); the second was to use this data to train a CNN to perform price regression.

I also had some fun generating unique footwear silhouettes using GANs, to figure out what a computer sees as the distinguishing features between cheaper (<$500) and more expensive (>$500) shoes.

Part 1: The Data

Using custom web crawlers on StockX and Farfetch, I collected a total of ~13,000 (for the most part unique) footwear images along with their prices: 863 from StockX, mostly sneakers, and 12,034 from Farfetch, covering sneakers as well as general men’s footwear. A random sample of the data can be seen here:

Sample of the training image data

As you can see, the dataset is a good mix of sneakers, high-fashion footwear, and everyday footwear. Next, let’s have a look at the price labels associated with our images:

Raw data statistics

Prior to any data cleaning, we can see that the price labels are heavily right-skewed, with several very expensive sneakers priced as high as $38,538. A quick statistical check shows that prices over $1,367 count as outliers in my particular dataset. My initial decision was to keep these prices in the data, since they represent actual sneaker resell prices and my model might learn to predict even the highest resells. However, in my first training attempts, the model converged on average errors of $600+, which is even worse than just predicting the mean each time. After some data cleaning and removing the outlier prices:

Clean data statistics

By removing 537 image/price pairs from my dataset, the distribution becomes much more apparent, albeit still right-skewed, and overall training performance also improved significantly (more on this later). I am therefore comfortable removing ~4% of my data for a huge boost in prediction accuracy. Furthermore, I think the argument can be made that most sneakers will resell for under $1,000 (based on my data), so I believe it’s much more valuable to train a model that predicts well within this range than one that predicts poorly over a larger price range.
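
The article does not say which statistical check produced the $1,367 cutoff. One common choice is the Tukey fence (Q3 + 1.5 × IQR), so the following is only a minimal sketch under that assumption. The function names, the `(image_path, price)` record layout, and the toy prices are all hypothetical and are not taken from the author's code or data.

```python
import statistics

def iqr_upper_fence(prices, k=1.5):
    """Return Q3 + k * IQR, the upper Tukey fence for a list of prices."""
    q1, _median, q3 = statistics.quantiles(prices, n=4)
    return q3 + k * (q3 - q1)

def drop_high_outliers(records, k=1.5):
    """Keep (image_path, price) pairs whose price is at or below the fence."""
    fence = iqr_upper_fence([price for _path, price in records], k)
    kept = [(path, price) for path, price in records if price <= fence]
    return kept, fence

# Toy records for illustration only (not the article's dataset).
records = [
    ("a.jpg", 120.0), ("b.jpg", 180.0), ("c.jpg", 250.0),
    ("d.jpg", 310.0), ("e.jpg", 400.0), ("f.jpg", 38538.0),
]
kept, fence = drop_high_outliers(records)
print(f"upper fence: ${fence:,.2f}, kept {len(kept)} of {len(records)} records")
```

Only the upper fence is applied here because the price distribution described above is right-skewed; a lower fence could be added in the same way if cheap outliers were a concern.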

Part 2: The Model

After an exhaustive process of hyperparameter tuning and model testing, I found that VGG16 converged on the lowest validation loss among comparable models, including VGG19, ResNet(10/50/101), DenseNet121 and InceptionV3. The single most important factor in improving the model was the quality of my dataset: when I didn’t remove the 537 outlier prices, my model converged at a validation loss of 140k (approximately $374 in error):

Model learning curve before removing outliers; convergence at ~140k

However, after removing these labels, I was able to achieve convergence at 35k ($187 error), with a test loss of 34k, representing an average prediction error of $184 or ~30% error:

Model learning curve after removing outliers; convergence at ~35k
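
The dollar figures quoted alongside the losses match the square root of each MSE value, which suggests they are root-mean-square errors rather than mean absolute errors. The short sketch below reproduces that conversion; the helper name `mse_to_rmse` is made up for illustration.

```python
import math

def mse_to_rmse(mse):
    """Convert a mean squared error (in dollars squared) to an RMSE in dollars."""
    return math.sqrt(mse)

losses = [
    ("validation, outliers kept", 140_000),
    ("validation, outliers removed", 35_000),
    ("test, outliers removed", 34_000),
]
for label, mse in losses:
    print(f"{label}: MSE {mse:,} -> RMSE ${mse_to_rmse(mse):.0f}")
```

Because the error is squared before averaging, a few very large misses (such as a $38,538 shoe) dominate the loss, which fits the drop from ~140k to ~35k once the outlier prices are removed.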

In terms of other parameters, I used a learning rate of 0.001 with an LR scheduler to help with convergence, and a batch size of 64 (this was as high as I could go without the GPU exploding). My train/validation/test split was 70:15:15, and I used MSE loss as my regression loss function, with the model converging at ~60 epochs. I found that neither batch normalization nor unfreezing the weights gave any significant improvement in the validation and test errors for my dataset.
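
As a rough illustration of the 70:15:15 split, here is a minimal sketch using only the standard library. The function name, the fixed seed, and the use of integer indices are assumptions made for the example. The item count of 12,360 is derived from the numbers in this article (863 + 12,034 images minus the 537 removed pairs), not from the author's code.

```python
import random

def split_70_15_15(items, seed=0):
    """Shuffle items and split them into train/validation/test at 70:15:15."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n = len(shuffled)
    n_train = n * 70 // 100  # integer arithmetic avoids float rounding
    n_val = n * 15 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Indices standing in for the cleaned image/price pairs.
train, val, test = split_70_15_15(range(12_360))
print(len(train), len(val), len(test))
```

Shuffling before slicing matters here because the crawled data comes from two sources; without it, one split could end up holding mostly StockX or mostly Farfetch items.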

Now let’s test the model by running some price inferences! All of the shoes tested were released after the training data was collected, so data leakage is not a concern.

Starting with the Air Jordan 4 and Union LA collaboration in the “Off Noir” colorway, released on Aug. 29, 2020, my model predicted the price to be:

Not quite hitting the mark, but definitely within an acceptable range. Now let’s look at a less hyped shoe, the Jordan 7 Retro Greater China, released on Sept. 5, 2020:

This test wouldn’t be complete without at least one Yeezy, so let’s look at one of the newer Yeezy models, the Adidas YZY QNTM:

Conclusion: I used a pre-trained VGG16 model to predict prices for popular sneaker models (or any footwear, for that matter!), obtaining a final test loss (MSE) of 34k, which represents an average prediction error of $184, or ~30%. Yes, 30% is higher than I had hoped, especially for resellers looking to maximize profit. However, I believe this data-based approach is still more reliable and accurate than guessing, and could point people in the right direction when making purchasing decisions.

Part 3: Fun with GANs!!

Since I collected a bunch of sneaker images, I thought it would be interesting to train a GAN model to generate unique sneaker silhouettes. In particular, I wanted to see what were the distinguishing features between cheaper (<$500) and more expensive (>$500) shoes. Here’s what the images converged to after 220 epochs:

Within the cheaper category, we see more silhouettes associated with general athletic shoes, ranging from runners to skate shoes to basketball sneakers. Furthermore, Nike and Adidas account for a large portion of the shoes in this category, as the GAN was able to pick up on the Nike swoosh and the Adidas three stripes. Lastly, shoes within this category appear to be more colorful.

The most obvious feature of the expensive category seems to be the use of leather, as the majority of the shoes seem to be either dress shoes or leather boots. Colors are also more muted and neutral, consisting mostly of black, white and beige. It’s interesting to note that the two white sneakers on the 3rd row resemble the popular Gucci sneaker silhouette. I also note that the dataset for this category contained about 2,000 fewer images than the cheap sneaker dataset, which may make the generated images less representative.
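
The two GAN training sets described in this part are separated at a price of $500. A minimal sketch of that partition might look like the following. The function name and the `(image_path, price)` record layout are hypothetical. The article only says "<$500" and ">$500", so this sketch takes that literally and leaves items priced at exactly $500 out of both groups.

```python
def split_by_price(records, threshold=500.0):
    """Partition (image_path, price) pairs into cheaper and more expensive sets."""
    cheap = [r for r in records if r[1] < threshold]
    expensive = [r for r in records if r[1] > threshold]
    return cheap, expensive

# Toy records for illustration only (not the article's dataset).
records = [("runner.jpg", 140.0), ("boot.jpg", 890.0),
           ("skate.jpg", 95.0), ("loafer.jpg", 500.0)]
cheap, expensive = split_by_price(records)
print(len(cheap), "cheaper,", len(expensive), "more expensive")
```

Comparing `len(cheap)` and `len(expensive)` before training gives a quick check on the class imbalance mentioned above.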

Source code and data are available on my GitHub:

Webcrawlers | Price Estimator | GAN

Translated from: https://medium.com/swlh/predicting-sneaker-resell-with-deep-learning-d3a78b144099
