Introduction

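The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game Quick, Draw!. The drawings were captured as timestamped vectors and tagged with metadata, including what the player was asked to draw and the country/region the player was in. You can browse the recognized drawings at quickdraw.withgoogle.com/data.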
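To make the record layout concrete, here is a minimal sketch that parses one line in the shape of a simplified ndjson record. The sample line is hand-written for illustration (its values are not taken from the dataset), and the helper name `summarize` is made up; the field names `key_id`, `word`, `countrycode`, `timestamp`, `recognized` and `drawing` are the ones used by the simplified files.

```python
import json

# One hand-written line in the shape of a simplified QuickDraw record.
# The values are invented for illustration.
sample_line = (
    '{"key_id": "1", "word": "cat", "countrycode": "US", '
    '"timestamp": "2017-03-01 00:00:00 UTC", "recognized": true, '
    '"drawing": [[[0, 50, 100], [10, 0, 10]], [[20, 80], [60, 60]]]}'
)


def summarize(line):
    """Return (word, countrycode, stroke_count, point_count) for one record."""
    record = json.loads(line)
    strokes = record["drawing"]  # each stroke is [x_list, y_list]
    points = sum(len(xs) for xs, _ys in strokes)
    return record["word"], record["countrycode"], len(strokes), points


print(summarize(sample_line))
```

Each stroke is a pair of equally long lists, so `drawing[i][0]` holds the x coordinates and `drawing[i][1]` the y coordinates of stroke `i`.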

I need to classify stick-figure-style drawings, but my own dataset is too small. So I need a large dataset to train a big model first, and then fine-tune it on my data.

Dataset link: wonderking/QuickDraw on Graviti (格物钛): https://gas.graviti.cn/dataset/wonderking/QuickDraw

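Main text

Steps: convert the ndjson file to json first, then convert the json to png.

I will show the individual pieces of code first; the complete code is at the end.

1. ndjson to json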

var fs = require('fs');
var ndjson = require('ndjson'); // npm install ndjson

// Read an ndjson file and pass the parsed drawings to callback(err, drawings).
function parseSimplifiedDrawings(fileName, callback) {
  var drawings = [];
  var fileStream = fs.createReadStream(fileName);
  fileStream
    .pipe(ndjson.parse())
    .on('data', function (obj) {
      drawings.push(obj);
    })
    .on('error', callback)
    .on('end', function () {
      callback(null, drawings);
    });
}

// Convert one ndjson file, e.g. "airplane.ndjson", to a json file.
function tojson(filename) {
  var list = filename.split('.');
  parseSimplifiedDrawings('D:\\my_py\\data\\QuickDrawsimplified\\' + filename, function (err, drawings) {
    if (err) return console.error(err);
    drawings.forEach(function (d) {
      // Do something with the drawing
      console.log(d.key_id, d.countrycode);
    });
    console.log('# of drawings:', drawings.length);
    var outputPath = 'D:\\my_py\\data\\jsons\\' + list[0] + '.json'; // where the json is saved
    fs.writeFileSync(outputPath, JSON.stringify(drawings)); // save the json
  });
}

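To run the js file above you need Node.js installed first; look up a tutorial online.

Call the tojson function with the name of an ndjson file (the input directory is hard-coded inside the function), and it converts that ndjson file to a json file. Don't forget to change the path where the json is saved.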
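If you would rather avoid the Node.js dependency, the same conversion can be sketched in plain Python, since an ndjson file is just one JSON object per line. This is an alternative to the js code above, not the method used in this post; the function names `ndjson_to_list` and `ndjson_to_json` and the `limit` parameter are made up for illustration.

```python
import json


def ndjson_to_list(ndjson_path, limit=None):
    """Read an ndjson file (one JSON object per line) into a list."""
    drawings = []
    with open(ndjson_path, "r", encoding="utf-8") as src:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            drawings.append(json.loads(line))
            if limit is not None and len(drawings) >= limit:
                break  # stop early when only the first `limit` records are needed
    return drawings


def ndjson_to_json(ndjson_path, json_path, limit=None):
    """Write the records of an ndjson file as one JSON array; return the count."""
    drawings = ndjson_to_list(ndjson_path, limit)
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(drawings, dst)
    return len(drawings)
```

Passing `limit=200` would keep only the first 200 records of a class, which avoids writing the whole class to disk when only a few hundred images are needed.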

2. json to png

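import json
from scipy import interpolate  # pip install scipy
import pylab as pl  # pip install matplotlib

filename = "airplane.ndjson"  # example file name
name = filename.split(".")[0]

f = open("D:\\my_py\\data\\jsons\\" + name + ".json")
setting = json.load(f)
f.close()

for j in range(0, 200):  # convert and save the first 200 drawings
    for i in range(0, len(setting[j]['drawing'])):
        x = setting[j]['drawing'][i][0]
        y = setting[j]['drawing'][i][1]
        interp = interpolate.interp1d(x, y, kind="slinear")  # linear interpolation (not used by the plot)
        pl.plot(x, y, 'k')
    ax = pl.gca()  # all strokes of one drawing are plotted on the same axes
    ax.xaxis.set_ticks_position('top')
    ax.invert_yaxis()  # without these ax lines the drawing comes out upside down
    pl.axis('off')
    pl.savefig("D:\\my_py\\data\\images\\" + name + "\\" + name + "%d.png" % j)  # output location
    pl.close()  # without closing, every drawing ends up in the same figure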

Here f is the opened json file. Following this flow converts the drawings to png files.
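For comparison, here is a rough sketch of the same step without matplotlib: it rasterizes the strokes onto a fixed grid and writes a grayscale PNG using only the standard library. It is not the method used in this post, and its output differs from the matplotlib version (fixed canvas, one-pixel lines, no automatic scaling). It assumes the coordinates are integers in the 0 to 255 range, as in the simplified files; `rasterize` and `write_png` are hypothetical helper names.

```python
import struct
import zlib


def rasterize(strokes, size=256):
    """Draw strokes ([x_list, y_list] pairs) as black lines on a white grid."""
    grid = [[255] * size for _ in range(size)]
    for xs, ys in strokes:
        points = list(zip(xs, ys))
        if not points:
            continue
        # A single-point stroke becomes one zero-length segment (a dot).
        segments = list(zip(points, points[1:])) or [(points[0], points[0])]
        for (x0, y0), (x1, y1) in segments:
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for s in range(steps + 1):
                x = round(x0 + (x1 - x0) * s / steps)
                y = round(y0 + (y1 - y0) * s / steps)
                if 0 <= x < size and 0 <= y < size:
                    grid[y][x] = 0  # row index is y, so y grows downwards
    return grid


def _chunk(tag, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    body = tag + data
    return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)


def write_png(path, grid):
    """Write a grid of 0..255 values as an 8-bit grayscale PNG."""
    height, width = len(grid), len(grid[0])
    raw = b"".join(b"\x00" + bytes(row) for row in grid)  # filter type 0 on every row
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)  # 8-bit, grayscale
    with open(path, "wb") as out:
        out.write(b"\x89PNG\r\n\x1a\n")
        out.write(_chunk(b"IHDR", ihdr))
        out.write(_chunk(b"IDAT", zlib.compress(raw)))
        out.write(_chunk(b"IEND", b""))
```

Because the grid row index is the y coordinate, no axis inversion is needed here; the flip in the matplotlib code exists only because plot axes grow upwards.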

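Complete code

json_to_imgs.py works as follows:

It reads the file names from List.txt, builds the path of each ndjson file, converts the file to json, and then converts the json to images. There are 345 classes with 1000 images each; here I only converted the first 200 per class. It took a little over an hour.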
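The path handling described above can be sketched on its own. The helper below is hypothetical (the name `plan_outputs` and its parameters are not part of the scripts in this post); it only shows how each List.txt entry maps to a class name, an output folder, and the numbered png names.

```python
import os


def plan_outputs(list_lines, image_root, per_class=200):
    """Map each 'name.ndjson' entry to the list of png paths to be written."""
    plan = {}
    for line in list_lines:
        filename = line.strip()
        if not filename:
            continue  # ignore empty lines in the list file
        name, _ext = os.path.splitext(filename)  # 'alarm clock.ndjson' -> 'alarm clock'
        folder = os.path.join(image_root, name)
        plan[name] = [
            os.path.join(folder, "%s%d.png" % (name, j)) for j in range(per_class)
        ]
    return plan


plan = plan_outputs(["airplane.ndjson\n", "alarm clock.ndjson\n"], "images", per_class=3)
print(plan["alarm clock"][0])
```

When creating the folders, `os.makedirs(folder, exist_ok=True)` avoids a separate existence check and does not fail when the script is rerun.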

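imageTansform.py converts images with black strokes on a white background into images with white strokes on a black background; use it if you need it.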
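The inversion itself is just 255 minus each color channel. A tiny sketch on plain tuples, without Pillow, makes the arithmetic explicit; `invert_rgb` and `invert_rows` are made-up names for illustration.

```python
def invert_rgb(pixel):
    """Invert one (r, g, b) or (r, g, b, a) pixel; alpha, if present, is kept."""
    r, g, b = pixel[:3]
    return (255 - r, 255 - g, 255 - b) + tuple(pixel[3:])


def invert_rows(rows):
    """Invert every pixel of an image given as a list of rows of pixel tuples."""
    return [[invert_rgb(p) for p in row] for row in rows]


# A white pixel becomes black and a black pixel becomes white.
print(invert_rows([[(255, 255, 255), (0, 0, 0)]]))
```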

Install the dependencies before running:

node.js (look up an installation tutorial)
pip install matplotlib
pip install pillow
pip install PyExecJS
pip install scipy
npm install ndjson

json_to_imgs.py

import json
import os

from scipy import interpolate  # pip install scipy
import pylab as pl  # pip install matplotlib
import execjs  # pip install PyExecJS


def js_from_file(file_name):
    """Read a js file and return its contents as a string."""
    with open(file_name, 'r', encoding='UTF-8') as file:
        result = file.read()
    return result


if __name__ == '__main__':
    # Compile and load the js source
    ndjson_to_json = execjs.compile(js_from_file('ndjson_to_json.js'))
    with open("List.txt", "r") as f:
        filenames = [line.strip('\n') for line in f.readlines()]  # strip the trailing newline

    for filename in filenames:
        name = filename.split('.')[0]
        try:
            ndjson_to_json.call("tojson", filename)
        except Exception:
            pass  # errors raised by the js call are ignored

        image_dir = "D:\\my_py\\data\\images\\" + name
        if not os.path.exists(image_dir):  # check the same folder that is created
            os.mkdir(image_dir)

        with open("D:\\my_py\\data\\jsons\\" + name + ".json") as json_file:  # absolute path of the json file
            setting = json.load(json_file)

        for j in range(0, 200):  # convert and save the first 200 drawings
            for i in range(0, len(setting[j]['drawing'])):
                x = setting[j]['drawing'][i][0]
                y = setting[j]['drawing'][i][1]
                interp = interpolate.interp1d(x, y, kind="slinear")  # linear interpolation (not used by the plot)
                pl.plot(x, y, 'k')
            ax = pl.gca()  # all strokes of one drawing are plotted on the same axes
            ax.xaxis.set_ticks_position('top')
            ax.invert_yaxis()  # without these ax lines the drawing comes out upside down
            pl.axis('off')
            pl.savefig(image_dir + "\\" + name + "%d.png" % j)  # output location
            pl.close()  # without closing, every drawing ends up in the same figure

ndjson_to_json.js

var fs = require('fs');
var ndjson = require('ndjson'); // npm install ndjson

// Read an ndjson file and pass the parsed drawings to callback(err, drawings).
function parseSimplifiedDrawings(fileName, callback) {
  var drawings = [];
  var fileStream = fs.createReadStream(fileName);
  fileStream
    .pipe(ndjson.parse())
    .on('data', function (obj) {
      drawings.push(obj);
    })
    .on('error', callback)
    .on('end', function () {
      callback(null, drawings);
    });
}

// Convert one ndjson file, e.g. "airplane.ndjson", to a json file.
function tojson(filename) {
  var list = filename.split('.');
  parseSimplifiedDrawings('D:\\my_py\\data\\QuickDrawsimplified\\' + filename, function (err, drawings) {
    if (err) return console.error(err);
    drawings.forEach(function (d) {
      // Do something with the drawing
      console.log(d.key_id, d.countrycode);
    });
    console.log('# of drawings:', drawings.length);
    var outputPath = 'D:\\my_py\\data\\jsons\\' + list[0] + '.json'; // where the json is saved
    fs.writeFileSync(outputPath, JSON.stringify(drawings)); // save the json
  });
}

List.txt

aircraft carrier.ndjson
airplane.ndjson
alarm clock.ndjson
ambulance.ndjson
angel.ndjson
animal migration.ndjson
ant.ndjson
anvil.ndjson
apple.ndjson
arm.ndjson
asparagus.ndjson
axe.ndjson
backpack.ndjson
banana.ndjson
bandage.ndjson
barn.ndjson
baseball bat.ndjson
baseball.ndjson
basket.ndjson
basketball.ndjson
bat.ndjson
bathtub.ndjson
beach.ndjson
bear.ndjson
beard.ndjson
bed.ndjson
bee.ndjson
belt.ndjson
bench.ndjson
bicycle.ndjson
binoculars.ndjson
bird.ndjson
birthday cake.ndjson
blackberry.ndjson
blueberry.ndjson
book.ndjson
boomerang.ndjson
bottlecap.ndjson
bowtie.ndjson
bracelet.ndjson
brain.ndjson
bread.ndjson
bridge.ndjson
broccoli.ndjson
broom.ndjson
bucket.ndjson
bulldozer.ndjson
bus.ndjson
bush.ndjson
butterfly.ndjson
cactus.ndjson
cake.ndjson
calculator.ndjson
calendar.ndjson
camel.ndjson
camera.ndjson
camouflage.ndjson
campfire.ndjson
candle.ndjson
cannon.ndjson
canoe.ndjson
car.ndjson
carrot.ndjson
castle.ndjson
cat.ndjson
ceiling fan.ndjson
cell phone.ndjson
cello.ndjson
chair.ndjson
chandelier.ndjson
church.ndjson
circle.ndjson
clarinet.ndjson
clock.ndjson
cloud.ndjson
coffee cup.ndjson
compass.ndjson
computer.ndjson
cookie.ndjson
cooler.ndjson
couch.ndjson
cow.ndjson
crab.ndjson
crayon.ndjson
crocodile.ndjson
crown.ndjson
cruise ship.ndjson
cup.ndjson
diamond.ndjson
dishwasher.ndjson
diving board.ndjson
dog.ndjson
dolphin.ndjson
donut.ndjson
door.ndjson
dragon.ndjson
dresser.ndjson
drill.ndjson
drums.ndjson
duck.ndjson
dumbbell.ndjson
ear.ndjson
elbow.ndjson
elephant.ndjson
envelope.ndjson
eraser.ndjson
eye.ndjson
eyeglasses.ndjson
face.ndjson
fan.ndjson
feather.ndjson
fence.ndjson
finger.ndjson
fire hydrant.ndjson
fireplace.ndjson
firetruck.ndjson
fish.ndjson
flamingo.ndjson
flashlight.ndjson
flip flops.ndjson
floor lamp.ndjson
flower.ndjson
flying saucer.ndjson
foot.ndjson
fork.ndjson
frog.ndjson
frying pan.ndjson
garden hose.ndjson
garden.ndjson
giraffe.ndjson
goatee.ndjson
golf club.ndjson
grapes.ndjson
grass.ndjson
guitar.ndjson
hamburger.ndjson
hammer.ndjson
hand.ndjson
harp.ndjson
hat.ndjson
headphones.ndjson
hedgehog.ndjson
helicopter.ndjson
helmet.ndjson
hexagon.ndjson
hockey puck.ndjson
hockey stick.ndjson
horse.ndjson
hospital.ndjson
hot air balloon.ndjson
hot dog.ndjson
hot tub.ndjson
hourglass.ndjson
house plant.ndjson
house.ndjson
hurricane.ndjson
ice cream.ndjson
jacket.ndjson
jail.ndjson
kangaroo.ndjson
key.ndjson
keyboard.ndjson
knee.ndjson
knife.ndjson
ladder.ndjson
lantern.ndjson
laptop.ndjson
leaf.ndjson
leg.ndjson
light bulb.ndjson
lighter.ndjson
lighthouse.ndjson
lightning.ndjson
line.ndjson
lion.ndjson
lipstick.ndjson
lobster.ndjson
lollipop.ndjson
mailbox.ndjson
map.ndjson
marker.ndjson
matches.ndjson
megaphone.ndjson
mermaid.ndjson
microphone.ndjson
microwave.ndjson
monkey.ndjson
moon.ndjson
mosquito.ndjson
motorbike.ndjson
mountain.ndjson
mouse.ndjson
moustache.ndjson
mouth.ndjson
mug.ndjson
mushroom.ndjson
nail.ndjson
necklace.ndjson
nose.ndjson
ocean.ndjson
octagon.ndjson
octopus.ndjson
onion.ndjson
oven.ndjson
owl.ndjson
paint can.ndjson
paintbrush.ndjson
palm tree.ndjson
panda.ndjson
pants.ndjson
paper clip.ndjson
parachute.ndjson
parrot.ndjson
passport.ndjson
peanut.ndjson
pear.ndjson
peas.ndjson
pencil.ndjson
penguin.ndjson
piano.ndjson
pickup truck.ndjson
picture frame.ndjson
pig.ndjson
pillow.ndjson
pineapple.ndjson
pizza.ndjson
pliers.ndjson
police car.ndjson
pond.ndjson
pool.ndjson
popsicle.ndjson
postcard.ndjson
potato.ndjson
power outlet.ndjson
purse.ndjson
rabbit.ndjson
raccoon.ndjson
radio.ndjson
rain.ndjson
rainbow.ndjson
rake.ndjson
remote control.ndjson
rhinoceros.ndjson
rifle.ndjson
river.ndjson
roller coaster.ndjson
rollerskates.ndjson
sailboat.ndjson
sandwich.ndjson
saw.ndjson
saxophone.ndjson
school bus.ndjson
scissors.ndjson
scorpion.ndjson
screwdriver.ndjson
sea turtle.ndjson
see saw.ndjson
shark.ndjson
sheep.ndjson
shoe.ndjson
shorts.ndjson
shovel.ndjson
sink.ndjson
skateboard.ndjson
skull.ndjson
skyscraper.ndjson
sleeping bag.ndjson
smiley face.ndjson
snail.ndjson
snake.ndjson
snorkel.ndjson
snowflake.ndjson
snowman.ndjson
soccer ball.ndjson
sock.ndjson
speedboat.ndjson
spider.ndjson
spoon.ndjson
spreadsheet.ndjson
square.ndjson
squiggle.ndjson
squirrel.ndjson
stairs.ndjson
star.ndjson
steak.ndjson
stereo.ndjson
stethoscope.ndjson
stitches.ndjson
stop sign.ndjson
stove.ndjson
strawberry.ndjson
streetlight.ndjson
string bean.ndjson
submarine.ndjson
suitcase.ndjson
sun.ndjson
swan.ndjson
sweater.ndjson
swing set.ndjson
sword.ndjson
syringe.ndjson
t-shirt.ndjson
table.ndjson
teapot.ndjson
teddy-bear.ndjson
telephone.ndjson
television.ndjson
tennis racquet.ndjson
tent.ndjson
The Eiffel Tower.ndjson
The Great Wall of China.ndjson
The Mona Lisa.ndjson
tiger.ndjson
toaster.ndjson
toe.ndjson
toilet.ndjson
tooth.ndjson
toothbrush.ndjson
toothpaste.ndjson
tornado.ndjson
tractor.ndjson
traffic light.ndjson
train.ndjson
tree.ndjson
triangle.ndjson
trombone.ndjson
truck.ndjson
trumpet.ndjson
umbrella.ndjson
underwear.ndjson
van.ndjson
vase.ndjson
violin.ndjson
washing machine.ndjson
watermelon.ndjson
waterslide.ndjson
whale.ndjson
wheel.ndjson
windmill.ndjson
wine bottle.ndjson
wine glass.ndjson
wristwatch.ndjson
yoga.ndjson
zebra.ndjson
zigzag.ndjson

imageTansform.py

import os
from PIL import Image  # pip install pillow


def Convert(name):
    """Invert the colors of every image of one class (white pixels become black and vice versa)."""
    root = "D://my_py//data//image20//" + name
    out_dir = "D:\\my_py\\data\\image20_tra\\" + name
    if not os.path.exists(out_dir):
        os.mkdir(out_dir)
    for filename in os.listdir(root):
        img = Image.open(root + '/' + filename)
        img = img.convert("RGBA")
        pixdata = img.load()
        for y in range(img.size[1]):
            for x in range(img.size[0]):
                pixdata[x, y] = (255 - pixdata[x, y][0],
                                 255 - pixdata[x, y][1],
                                 255 - pixdata[x, y][2])
        img.save(out_dir + "\\" + filename)


if __name__ == "__main__":
    with open("List20.txt", "r") as f:
        for line in f.readlines():
            filename = line.strip('\n')  # strip the trailing newline
            Convert(filename.split('.')[0])

Image dataset

345 classes, 200 images per class

wonderking/QuickDraw on Graviti (格物钛): https://gas.graviti.cn/dataset/wonderking/QuickDraw

References:

https://zhuanlan.zhihu.com/p/40903937
