Contents

  • 1. Data preprocessing
    • 1.1 Custom dataset
    • 1.2 Splitting the dataset
    • 1.3 Data augmentation
    • 1.4 Loading the dataloaders
  • 2. Model training and prediction
    • 2.1 Train/validation function
    • 2.2 The predict function: setting num_workers matters a lot
  • 3. Trying different models and optimizers
    • 3.1 resnet18
      • 3.1.1 Baseline score 0.676 (resnet18 + trfs, bs=32)
      • 3.1.2 With trfs_sharp augmentation, score 0.71
    • 3.2 SwinTransformer
      • 3.2.1 With trfs augmentation, score 0.6813
      • 3.2.2 With trfs_sharp augmentation, score 0.731
    • 3.3 EfficientNetV2
    • 3.4 convnext
    • 3.5 ViT
  • 4. Ideas for improving the score
  • Competition link

For competition details and the baseline, see 《如何打一个CV比赛V2.0》. I ran everything for this competition on Colab, using the Datawhale sampled dataset.

# Mount Google Drive and switch to the project directory
from google.colab import drive
drive.mount('/content/drive')
import os
os.chdir('/content/drive/MyDrive/CV/华为车道检测')

Download the competition dataset:

!wget https://mirror.coggle.club/digix-2022-cv-sample-0829.zip
# Unzip and rename the folder to dataset
!unzip digix-2022-cv-sample-0829.zip
!mv digix-2022-cv-sample-0829 dataset
import os
import glob
from PIL import Image
import csv, time
import numpy as np
# pytorch-related imports
import torch
import torchvision
import torch.optim as optim
import torch.nn as nn
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.utils.data as data
from torch.utils.tensorboard import SummaryWriter
import pprint

# Fix random seeds for reproducible results
def set_seeds(seed):
    torch.manual_seed(seed)               # CPU seed
    if torch.cuda.is_available():         # GPU seeds
        torch.cuda.manual_seed(seed)      # current GPU
        torch.cuda.manual_seed_all(seed)  # all GPUs
    np.random.seed(seed)                  # make subsequent numpy random calls deterministic
    torch.backends.cudnn.deterministic = True  # use deterministic cuDNN kernels

1. Data preprocessing

1.1 Custom dataset

# Custom dataset
class ImageSet(data.Dataset):
    def __init__(self, images, labels, transform):
        self.transform = transform
        self.images = images
        self.labels = labels

    def __getitem__(self, item):
        # Guard against corrupt files by substituting a blank image
        try:
            image = Image.open(self.images[item]).convert('RGB')
        except:
            image = Image.fromarray(np.zeros((448, 448), dtype=np.int8))
            image = image.convert('RGB')
        image = self.transform(image)
        return image, torch.tensor(self.labels[item])

    def __len__(self):
        return len(self.images)

1.2 Splitting the dataset

import pandas as pd
import codecs

# Training-set annotations
lines = codecs.open('dataset/train_label.csv').readlines()
train_label = pd.DataFrame({
    'image': ['dataset/train_image/' + x.strip().split(',')[0] for x in lines],
    'label': [x.strip().split(',')[1:] for x in lines],
})
# Binarize the labels (the original dataset has 7 label types: 6 defect
# categories plus normal images; this competition only asks defect vs. normal)
train_label['new_label'] = train_label['label'].apply(lambda x: int('0' not in x))
train_label

1.3 Data augmentation

  • References: 《数据增强 - AutoAugment 系列论文(1)》, 《数据增强 - Cutout、Random Erasing、Mixup、Cutmix》, 《mixup介绍》, 《AUGMIX》
  • A PIL Image's size attribute returns (w, h), while Resize takes its argument as (h, w)
  • The transforms for the training, validation, and test sets must match, otherwise performance drops badly
  • So far I have tried sharpening, Mixup, AugMix, AutoAugment, and different input sizes. transforms.Resize((352,176)) followed by transforms.CenterCrop([320,160]) beats Resize((224,224)), since the images in this competition are (1080, 2400). Mixup, AugMix, and AutoAugment did not help much; sharpening gave a gain. (A sketch of how Mixup and AutoAugment can be wired in follows this list.)
  • I tried doubling the input size, but EfficientNet ran out of GPU memory almost immediately, so I did not pursue it.
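Since Mixup and AutoAugment came up above, here is a minimal sketch of how they could be wired into this pipeline. It is my own illustration, not the exact configuration I tested; mixup_batch and alpha=0.2 are illustrative choices.

# Sketch: AutoAugment is just another transform in the Compose pipeline.
trfs_auto = transforms.Compose([
    transforms.Resize((352, 176)),
    transforms.CenterCrop([320, 160]),
    transforms.AutoAugment(transforms.AutoAugmentPolicy.IMAGENET),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

# Sketch: Mixup works on batches, so it belongs in the training loop instead.
def mixup_batch(X, labels, alpha=0.2):
    """Blend each image with a shuffled partner; lam ~ Beta(alpha, alpha)."""
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(X.size(0), device=X.device)
    X_mixed = lam * X + (1 - lam) * X[index]
    return X_mixed, labels, labels[index], lam

# In the loop, the loss becomes a weighted sum of two CrossEntropy terms:
#   X_mixed, y_a, y_b, lam = mixup_batch(X, labels)
#   pred = model(X_mixed)
#   loss = lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
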
import cv2, os

def check_image(path):
    try:
        return os.path.exists(path)
    except:
        return False

# Keep only training samples whose image path exists
train_is_valid = train_label['image'].apply(lambda x: check_image(x))
train_label = train_label[train_is_valid]
print(len(train_label))

# Augmentation pipelines: trfs is the baseline augmentation.
trfs = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

# trfs_sharp is the improved augmentation.
trfs_sharp = transforms.Compose([
    transforms.Resize((352, 176)),
    transforms.CenterCrop([320, 160]),
    transforms.RandomAdjustSharpness(sharpness_factor=2, p=1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

from sklearn.model_selection import train_test_split
# Features into x, labels into y
x = train_label['image'].values  # .values gives arrays; without it the split results are Series and need converting later
y = train_label['new_label'].values
x_train, x_valid, y_train, y_valid = train_test_split(x, y, test_size=0.15, random_state=0)
1969

1.4 Loading the dataloaders

  • Setting num_workers=4 greatly speeds up inference later on.
  • get_dataloader takes the batch size, the augmentation transform, etc. as parameters, because while experimenting with models I kept changing these and could no longer remember which settings I had used.
# Build the train / validation / full-train / test dataloaders
def get_dataloader(bs, transforms):
    train_dataset = ImageSet(x_train, y_train, transform=transforms)
    train_all_dataset = ImageSet(x, y, transform=transforms)
    valid_dataset = ImageSet(x_valid, y_valid, transform=transforms)
    # Test dataset
    test_images = glob.glob('dataset/test_images/*')
    test_dataset = ImageSet(test_images, [0] * len(test_images), transforms)
    train_loader = DataLoader(train_dataset, batch_size=bs, shuffle=True, num_workers=4, pin_memory=True)
    valid_loader = DataLoader(valid_dataset, batch_size=bs, shuffle=False, num_workers=4, pin_memory=True)
    train_all_loader = DataLoader(train_all_dataset, batch_size=bs, num_workers=4, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=bs, shuffle=False, num_workers=4, pin_memory=True)
    print((train_dataset[0][0].shape, len(train_loader), len(train_all_loader), len(test_loader)))
    return train_loader, valid_loader, train_all_loader, test_loader
(torch.Size([3, 320, 160]), 27, 31, 157)
# Inspect an image's size
from PIL import Image
image_test = x_train[0]
image = Image.open(image_test)
arry_img = np.asarray(image)
image.size, arry_img.shape, type(image)
((1080, 2400), (2400, 1080, 4), PIL.PngImagePlugin.PngImageFile)

2. Model training and prediction

2.1 Train/validation function

  1. roc_auc_score expects the true labels plus the predicted probabilities, not the predicted labels. It can also raise "Only one class present in y_true. ROC AUC score is not defined in that case." when the labels passed in contain a single class.
  2. Calling roc_auc_score(pred_all, label_all) raises "continuous format is not supported": the first argument must be the labels. (See the small example after this list.)
  3. You could save only the best-AUC checkpoint, but training on the sampled dataset overfits, and the best valid_auc is not necessarily the best model, so I save a checkpoint every epoch instead.
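A small sanity check of the argument order (my own illustration, not competition code):

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]           # ground-truth labels
y_prob = [0.1, 0.4, 0.35, 0.8]  # predicted probability of class 1

roc_auc_score(y_true, y_prob)   # correct order: labels first, probabilities second -> 0.75
# roc_auc_score(y_prob, y_true) # ValueError: continuous format is not supported
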
# Training and validation loops
import time
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, accuracy_score, roc_auc_score
# Progress bar
from tqdm.auto import tqdm

def train_and_eval(train_loader, valid_loader=None, epoch=None, scheduler=None, save_name=None):
    best_auc = 0.0  # unused in the end, since a checkpoint is saved every epoch anyway
    num_training_steps = len(train_loader) * epoch
    progress_bar = tqdm(range(num_training_steps))
    writer = SummaryWriter(log_dir='runs/' + save_name)
    for i in range(epoch):
        # Training
        start = time.time()
        model.train()
        print("***** Running training epoch {} *****".format(i + 1))
        train_loss_sum, total_train_acc = 0.0, 0
        pred_all, label_all = [], []
        for idx, (X, labels) in enumerate(train_loader):
            if isinstance(X, list):  # handle inputs that arrive as a list of tensors
                X = [x.to(device) for x in X]
            else:
                X = X.to(device)
            labels = labels.to(device)
            # Forward pass and loss
            pred = model(X)
            loss = criterion(pred, labels)
            loss.backward()
            optimizer.step()
            if scheduler is not None:
                scheduler.step()
            optimizer.zero_grad()
            progress_bar.update(1)
            train_loss_sum += loss.item()
            pred = pred.clone().detach().cpu().numpy()  # detach: a copy with no grad; the original tensor is unaffected
            # acc needs argmax labels; auc needs probabilities, not predicted labels
            predictions = np.argmax(pred, axis=-1)  # predicted labels, for acc
            labels = labels.to('cpu').numpy()
            label_all.extend(labels)
            pred_all.extend(pred[:, 1])
            total_train_acc += accuracy_score(predictions, labels)
        avg_train_acc = total_train_acc / len(train_loader)
        train_auc = roc_auc_score(label_all, pred_all)
        # Log training metrics to tensorboard
        writer.add_scalar(tag="loss/train", scalar_value=train_loss_sum, global_step=i * len(train_loader) + idx)
        writer.add_scalar(tag="acc/train", scalar_value=avg_train_acc, global_step=i * len(train_loader) + idx)
        writer.add_scalar(tag="auc/train", scalar_value=train_auc, global_step=i * len(train_loader) + idx)
        if i % 1 == 0:  # print once per epoch
            print("Epoch {:03d} | Step {:03d}/{:03d} | Loss {:.4f} | train_acc {:.4f} | train_auc {:.4f} | "
                  "Time {:.4f} | lr = {} \n".format(
                      i + 1, idx + 1, len(train_loader), train_loss_sum / (idx + 1),
                      avg_train_acc, train_auc, time.time() - start,
                      optimizer.state_dict()['param_groups'][0]['lr']))
        if valid_loader is not None:  # validate only when a validation loader is passed
            model.eval()
            pred_all, label_all = [], []
            total_eval_loss, total_eval_accuracy = 0, 0
            for (X, labels) in valid_loader:
                with torch.no_grad():  # no gradients needed here
                    if isinstance(X, list):
                        X = [x.to(device) for x in X]
                    else:
                        X = X.to(device)
                    labels = labels.to(device)
                    pred = model(X)
                    loss = criterion(pred, labels)
                    # Accumulate loss and accuracy
                    total_eval_loss += loss.item()
                    pred = pred.clone().detach().cpu().numpy()
                    predictions = np.argmax(pred, axis=-1)
                    labels = labels.to('cpu').numpy()
                    label_all.extend(labels)
                    pred_all.extend(pred[:, 1])
                    total_eval_accuracy += accuracy_score(predictions, labels)
            avg_val_acc = total_eval_accuracy / len(valid_loader)
            val_auc = roc_auc_score(label_all, pred_all)
            writer.add_scalar(tag="loss/valid", scalar_value=total_eval_loss, global_step=i * len(valid_loader) + idx)
            writer.add_scalar(tag="acc/valid", scalar_value=avg_val_acc, global_step=i * len(valid_loader) + idx)
            writer.add_scalar(tag="auc/valid", scalar_value=val_auc, global_step=i * len(valid_loader) + idx)
            torch.save(model.state_dict(), 'model/' + '%s' % save_name + '_' + '%d' % i)
            print("val_accuracy:%.4f" % (avg_val_acc), '\t', "val_auc:%.4f" % (val_auc))
            print("val_loss: %.4f" % (total_eval_loss / len(valid_loader)), ' \t',
                  "time costed={}s \n".format(round(time.time() - start, 5)))
            print("-------------------------------")
        else:  # save a checkpoint even without a validation set
            best_auc = train_auc
            torch.save(model.state_dict(), 'model/' + '%s' % save_name + '_' + '%d' % i)

2.2 The predict function: setting num_workers matters a lot

The submission is a CSV file encoded as UTF-8 without BOM, in the following format:

  • imagename, defect_prob
  • imagename is the file name of the test image; defect_prob is the probability that the image is defective. The two fields are separated by an English comma.
  • At first I had not set num_workers on the dataloader, and swin_s still had not finished predicting the test set after 50 minutes, which was maddening. I downloaded the model to Kaggle to run prediction there, but Kaggle's torch 1.12 shipped with a torchvision that has no Swin models, and no amount of installing or upgrading fixed it; uploading the model to Kaggle as a dataset and pulling it into Colab without mounting Drive would not download either.
  • Eventually I traced it to num_workers: with num_workers=4, swin_s can finish prediction in as little as 6.5 minutes. (The GPU Colab assigns appears to vary.)
def predict(model, model_path=None):
    if model_path is not None:
        model.load_state_dict(torch.load(model_path))
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = model.to(device)
    model.eval()
    to_prob = nn.Softmax(dim=1)
    with torch.no_grad():
        imagenames, probs = list(), list()
        for batch_idx, batch in enumerate(test_loader):
            image, _ = batch
            image = image.to(device)
            pred = model(image)
            prob = to_prob(pred)
            prob = list(prob.data.cpu().numpy())
            probs += prob
    print(probs[0], len(probs))
    # Same glob pattern (and therefore ordering) as in get_dataloader
    test_images = glob.glob('dataset/test_images/*')
    with open('dataset/submission.csv', 'w', newline='', encoding='utf8') as fp:
        writer = csv.writer(fp)
        writer.writerow(['imagename', 'defect_prob'])
        for imagename, prob in zip(test_images, probs):
            imagename = os.path.basename(imagename)
            writer.writerow([imagename, str(prob[1])])

3. Trying different models and optimizers

3.1 resnet18

3.1.1 Baseline score 0.676 (resnet18 + trfs, bs=32)

# Load a pretrained resnet18
train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=32, transforms=trfs)
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # use the GPU
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=3, save_name='resnet')
***** Running training epoch 1 *****
Epoch 001 | Step 053/053 | Loss 0.3689 | train_acc 0.8782 | train_auc 0.5587 |     Time 106.9236 | lr = 0.001 val_accuracy:0.8625   val_auc:0.7425
val_loss: 0.3398     time costed=124.82111s -------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 053/053 | Loss 0.3087 | train_acc 0.8850 | train_auc 0.7146 |     Time 105.2105 | lr = 0.001 val_accuracy:0.8656   val_auc:0.8386
val_loss: 0.2950     time costed=123.19321s -------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 053/053 | Loss 0.2766 | train_acc 0.9027 | train_auc 0.7803 |     Time 104.9202 | lr = 0.001 val_accuracy:0.9031   val_auc:0.8385
val_loss: 0.2590     time costed=123.11665s -------------------------------
# Load the checkpoint from the previous run
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)
model.load_state_dict(torch.load('resnet_01'))
model = model.to(device)
# Train once more on the full training data (train_all_loader comes from get_dataloader)
optimizer = optim.SGD(model.parameters(), lr=0.0002)
train_and_eval(train_all_loader, epoch=1, save_name='resnet_all')
***** Running training epoch 1 *****
Epoch 001 | Step 053/053 | Loss 0.2586 | train_acc 0.9136 | train_auc 0.8179 |     Time 109.0911 | lr = 0.0002
predict(model, model_path=None)

3.1.2 With trfs_sharp augmentation, score 0.71

# Load a pretrained resnet18
set_seeds(2022)
train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=32, transforms=trfs_sharp)
model = torchvision.models.resnet18(pretrained=True)  # weights='DEFAULT' on newer torchvision
model.fc = torch.nn.Linear(512, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # use the GPU
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=3)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=6, save_name='resnet18')
***** Running training epoch 1 *****
Epoch 001 | Step 053/053 | Loss 0.3774 | train_acc 0.8617 | train_auc 0.4651 |             Time 144.9953 | lr = 0.001 val_accuracy:0.8625   val_auc:0.6024
val_loss: 0.3489     time costed=169.10428s -------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 053/053 | Loss 0.3024 | train_acc 0.8844 | train_auc 0.6599 |             Time 124.3967 | lr = 0.001 val_accuracy:0.8688   val_auc:0.7516
val_loss: 0.3118     time costed=145.55964s -------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 053/053 | Loss 0.2708 | train_acc 0.8956 | train_auc 0.7791 |             Time 124.2991 | lr = 0.001 val_accuracy:0.8812   val_auc:0.8182
val_loss: 0.2830     time costed=146.05037s -------------------------------
***** Running training epoch 4 *****
Epoch 004 | Step 053/053 | Loss 0.2555 | train_acc 0.9026 | train_auc 0.8296 |             Time 123.9550 | lr = 0.001 val_accuracy:0.8969   val_auc:0.8488
val_loss: 0.2573     time costed=145.67704s -------------------------------
***** Running training epoch 5 *****
Epoch 005 | Step 053/053 | Loss 0.2417 | train_acc 0.9171 | train_auc 0.8541 |             Time 122.8209 | lr = 0.001 val_accuracy:0.9250   val_auc:0.8596
val_loss: 0.2418     time costed=144.79443s -------------------------------
***** Running training epoch 6 *****
Epoch 006 | Step 053/053 | Loss 0.2208 | train_acc 0.9325 | train_auc 0.8818 |             Time 123.2759 | lr = 0.001 val_accuracy:0.9219   val_auc:0.8661
val_loss: 0.2326     time costed=144.71817s -------------------------------
# Load the checkpoint from the previous run
model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 2)
model.load_state_dict(torch.load('resnet18_5'))
model = model.to(device)
# Retrain on the full training data
optimizer = optim.SGD(model.parameters(), lr=0.0001)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.001)
train_and_eval(train_all_loader, epoch=3, save_name='resnet18_all')
***** Running training epoch 1 *****
Epoch 001 | Step 062/062 | Loss 0.2153 | train_acc 0.9311 | train_auc 0.8909 |             Time 148.2866 | lr = 0.0001 ***** Running training epoch 2 *****
Epoch 002 | Step 062/062 | Loss 0.2130 | train_acc 0.9325 | train_auc 0.8965 |             Time 147.7210 | lr = 0.0001 ***** Running training epoch 3 *****
Epoch 003 | Step 062/062 | Loss 0.2126 | train_acc 0.9311 | train_auc 0.8991 |             Time 146.8041 | lr = 0.0001
predict(model, model_path=None)

3.2 SwinTransformer

3.2.1 With trfs augmentation, score 0.6813

torchvision's SwinTransformer comes in three sizes: T (Tiny), S (Small), and B (Base). The pretrained weights are listed in 《Table of all available classification weights》.

  • In short, pass weights='DEFAULT' or weights='IMAGENET1K_V1'.
  • The classification head is the head layer, again a torch.nn.Linear; for the S and T variants it maps 768 inputs to 1000 classes.
  • With SGD, bs=64, and lr=0.001 on a cosine schedule, training 10 epochs took 21 min.
  • train_acc 0.8824, train_auc 0.7777 | val_accuracy 0.8669, val_auc 0.8997
  • Finally trained 2 epochs on the full training set; the score was 0.6813.
train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=64, transforms=trfs)
model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head = torch.nn.Linear(768, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # use the GPU
lr, weight_decay = 0.001, 0.0003
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=10, scheduler=scheduler, save_name='swin_s')
***** Running training epoch 1 *****
Epoch 001 | Step 027/027 | Loss 0.5020 | train_acc 0.8003 | train_auc 0.5437 |     Time 756.2816 | lr = 0.00034549150281252655 val_accuracy:0.8588      val_auc:0.6714
val_loss: 0.3851     time costed=888.98904s -------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 027/027 | Loss 0.3553 | train_acc 0.8825 | train_auc 0.6178 |     Time 120.7405 | lr = 9.549150281252699e-05 val_accuracy:0.8588   val_auc:0.7586
val_loss: 0.3732     time costed=140.11592s -------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 027/027 | Loss 0.3430 | train_acc 0.8825 | train_auc 0.6562 |     Time 120.7618 | lr = 0.0009045084971874806 val_accuracy:0.8588   val_auc:0.8103
val_loss: 0.3599     time costed=140.08969s -------------------------------
***** Running training epoch 4 *****
Epoch 004 | Step 027/027 | Loss 0.3325 | train_acc 0.8825 | train_auc 0.6738 |     Time 120.9380 | lr = 0.0006545084971874633 val_accuracy:0.8588   val_auc:0.7986
val_loss: 0.3569     time costed=140.3202s -------------------------------
***** Running training epoch 5 *****
Epoch 005 | Step 027/027 | Loss 0.3288 | train_acc 0.8830 | train_auc 0.6897 |     Time 120.4576 | lr = 0.0 val_accuracy:0.8588     val_auc:0.8407
val_loss: 0.3447     time costed=139.84589s -------------------------------
***** Running training epoch 6 *****
Epoch 006 | Step 027/027 | Loss 0.3240 | train_acc 0.8789 | train_auc 0.7275 |     Time 120.7112 | lr = 0.0006545084971874866 val_accuracy:0.8588   val_auc:0.8521
val_loss: 0.3297     time costed=140.0113s -------------------------------
***** Running training epoch 7 *****
Epoch 007 | Step 027/027 | Loss 0.3157 | train_acc 0.8836 | train_auc 0.7355 |     Time 120.6098 | lr = 0.0009045084971875055 val_accuracy:0.8588   val_auc:0.8899
val_loss: 0.3175     time costed=139.8843s -------------------------------
***** Running training epoch 8 *****
Epoch 008 | Step 027/027 | Loss 0.3179 | train_acc 0.8765 | train_auc 0.7576 |     Time 120.4713 | lr = 9.549150281252627e-05 val_accuracy:0.8588   val_auc:0.8959
val_loss: 0.3099     time costed=139.96295s -------------------------------
***** Running training epoch 9 *****
Epoch 009 | Step 027/027 | Loss 0.3025 | train_acc 0.8848 | train_auc 0.7701 |     Time 120.4666 | lr = 0.00034549150281254536 val_accuracy:0.8637      val_auc:0.8945
val_loss: 0.2947     time costed=139.81461s -------------------------------
***** Running training epoch 10 *****
Epoch 010 | Step 027/027 | Loss 0.3001 | train_acc 0.8824 | train_auc 0.7777 |     Time 120.6441 | lr = 0.000999999999999998 val_accuracy:0.8669    val_auc:0.8997
val_loss: 0.2868     time costed=139.93732s -------------------------------

Launch TensorBoard to inspect training:

%load_ext tensorboard
%tensorboard --logdir runs/swin_s
# Finally train 2 epochs on the full data
model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head = torch.nn.Linear(768, 2)
model.load_state_dict(torch.load('model/swin_s_9'))
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
optimizer = optim.SGD(model.parameters(), lr=0.001)  # re-create the optimizer for the reloaded model
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.0002)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_all_loader, scheduler=scheduler, epoch=2, save_name='swin_s_all')
***** Running training epoch 1 *****
Epoch 001 | Step 031/031 | Loss 0.2886 | train_acc 0.8860 | train_auc 0.8052 |     Time 142.2524 | lr = 0.0009999999999999974 ***** Running training epoch 2 *****
Epoch 002 | Step 031/031 | Loss 0.2897 | train_acc 0.8865 | train_auc 0.7930 |     Time 141.9387 | lr = 0.0009999999999999974
predict(model, model_path=None)

3.2.2 With trfs_sharp augmentation, score 0.731

train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=64, transforms=trfs_sharp)
model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head = torch.nn.Linear(768, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # use the GPU
lr, weight_decay = 0.001, 0.0003
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=10, scheduler=scheduler, save_name='swins_sharp')
Downloading: "https://download.pytorch.org/models/swin_s-5e29d889.pth" to /root/.cache/torch/hub/checkpoints/swin_s-5e29d889.pth0%|          | 0.00/190M [00:00<?, ?B/s]0%|          | 0/270 [00:00<?, ?it/s]***** Running training epoch 1 *****
Epoch 001 | Step 027/027 | Loss 0.5272 | train_acc 0.7702 | train_auc 0.4723 |     Time 149.8585 | lr = 0.00034549150281252655 val_accuracy:0.8588      val_auc:0.5290
val_loss: 0.4006     time costed=187.8727s -------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 027/027 | Loss 0.3652 | train_acc 0.8825 | train_auc 0.5693 |     Time 56.3001 | lr = 9.549150281252699e-05 val_accuracy:0.8588    val_auc:0.5995
val_loss: 0.3884     time costed=68.73339s -------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 027/027 | Loss 0.3348 | train_acc 0.8860 | train_auc 0.6288 |     Time 56.2766 | lr = 0.0009045084971874806 val_accuracy:0.8588    val_auc:0.6558
val_loss: 0.3918     time costed=68.7855s -------------------------------
***** Running training epoch 4 *****
Epoch 004 | Step 027/027 | Loss 0.3279 | train_acc 0.8860 | train_auc 0.6313 |     Time 56.7157 | lr = 0.0006545084971874633 val_accuracy:0.8588    val_auc:0.6815
val_loss: 0.3914     time costed=69.05862s -------------------------------
***** Running training epoch 5 *****
Epoch 005 | Step 027/027 | Loss 0.3363 | train_acc 0.8825 | train_auc 0.6533 |     Time 56.2564 | lr = 0.0 val_accuracy:0.8588      val_auc:0.7486
val_loss: 0.3646     time costed=68.72446s -------------------------------
***** Running training epoch 6 *****
Epoch 006 | Step 027/027 | Loss 0.3415 | train_acc 0.8754 | train_auc 0.7024 |     Time 56.3478 | lr = 0.0006545084971874866 val_accuracy:0.8588    val_auc:0.7658
val_loss: 0.3583     time costed=69.10426s -------------------------------
***** Running training epoch 7 *****
Epoch 007 | Step 027/027 | Loss 0.3127 | train_acc 0.8860 | train_auc 0.7012 |     Time 56.2839 | lr = 0.0009045084971875055 val_accuracy:0.8588    val_auc:0.7843
val_loss: 0.3557     time costed=68.70668s -------------------------------
***** Running training epoch 8 *****
Epoch 008 | Step 027/027 | Loss 0.3155 | train_acc 0.8836 | train_auc 0.7328 |     Time 56.3986 | lr = 9.549150281252627e-05 val_accuracy:0.8588    val_auc:0.8015
val_loss: 0.3494     time costed=68.8231s -------------------------------
***** Running training epoch 9 *****
Epoch 009 | Step 027/027 | Loss 0.3045 | train_acc 0.8866 | train_auc 0.7426 |     Time 55.9236 | lr = 0.00034549150281254536 val_accuracy:0.8588   val_auc:0.8112
val_loss: 0.3492     time costed=68.71339s -------------------------------
***** Running training epoch 10 *****
Epoch 010 | Step 027/027 | Loss 0.3055 | train_acc 0.8830 | train_auc 0.7635 |     Time 55.7483 | lr = 0.000999999999999998 val_accuracy:0.8588     val_auc:0.8031
val_loss: 0.3388     time costed=68.4414s -------------------------------
%load_ext tensorboard
%tensorboard --logdir runs/swins_sharp

Finally, train 3 epochs on the full data:


model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head = torch.nn.Linear(768, 2)
model.load_state_dict(torch.load('model/swins_sharp_9'))
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
optimizer = optim.SGD(model.parameters(), lr=0.0003)
scheduler = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.0003)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_all_loader, scheduler=scheduler, epoch=3, save_name='swins_sharp_all')
***** Running training epoch 1 *****
Epoch 001 | Step 031/031 | Loss 0.3156 | train_acc 0.8787 | train_auc 0.7499 |     Time 120.0241 | lr = 0.00030000000000000003 ***** Running training epoch 2 *****
Epoch 002 | Step 031/031 | Loss 0.3086 | train_acc 0.8803 | train_auc 0.7477 |     Time 63.1228 | lr = 0.00030000000000000003 ***** Running training epoch 3 *****
Epoch 003 | Step 031/031 | Loss 0.2997 | train_acc 0.8816 | train_auc 0.7771 |     Time 62.7066 | lr = 0.00030000000000000003

Predict on the test set and generate the submission:

model = torchvision.models.swin_transformer.swin_s(weights='DEFAULT')
model.head = torch.nn.Linear(768, 2)
predict(model, model_path='model/swins_sharp_all_2')  # 14 min
[0.90404904 0.09595092] 10000

3.3 EfficientNetV2

It did not perform well and runs out of GPU memory easily; the hyperparameters probably were not tuned properly. (One untested idea for the memory problem, mixed-precision training, is sketched below.)
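
A minimal sketch of mixed-precision training, which I did not try here but which usually cuts activation memory substantially. It assumes the same model / optimizer / criterion globals as the surrounding code:

# Sketch: mixed-precision training loop (untested with this pipeline)
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
for X, labels in train_loader:
    X, labels = X.to(device), labels.to(device)
    optimizer.zero_grad()
    with autocast():               # run the forward pass in float16 where safe
        pred = model(X)
        loss = criterion(pred, labels)
    scaler.scale(loss).backward()  # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()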

train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=64, transforms=trfs_sharp)
model = torchvision.models.efficientnet_v2_s(weights='DEFAULT')
model.classifier[1] = torch.nn.Linear(1280, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
lr, weight_decay = 0.001, 0.0003
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=10, scheduler=scheduler, save_name='effnet2')
Downloading: "https://download.pytorch.org/models/efficientnet_v2_s-dd5fe13b.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_v2_s-dd5fe13b.pth0%|          | 0.00/82.7M [00:00<?, ?B/s]0%|          | 0/270 [00:00<?, ?it/s]***** Running training epoch 1 *****
Epoch 001 | Step 027/027 | Loss 0.5934 | train_acc 0.7841 | train_auc 0.5029 |     Time 152.3739 | lr = 0.00034549150281252655 val_accuracy:0.8019      val_auc:0.5505
val_loss: 0.5903     time costed=185.25558s -------------------------------
***** Running training epoch 2 *****
Epoch 002 | Step 027/027 | Loss 0.5329 | train_acc 0.8437 | train_auc 0.5208 |     Time 54.4941 | lr = 9.549150281252699e-05 val_accuracy:0.8300    val_auc:0.5115
val_loss: 0.5477     time costed=64.94097s -------------------------------
***** Running training epoch 3 *****
Epoch 003 | Step 027/027 | Loss 0.5050 | train_acc 0.8656 | train_auc 0.4964 |     Time 49.6710 | lr = 0.0009045084971874806 val_accuracy:0.8400    val_auc:0.6024
val_loss: 0.5248     time costed=60.15345s -------------------------------
***** Running training epoch 4 *****
Epoch 004 | Step 027/027 | Loss 0.4718 | train_acc 0.8773 | train_auc 0.5031 |     Time 49.8339 | lr = 0.0006545084971874633 val_accuracy:0.8588    val_auc:0.5236
val_loss: 0.5017     time costed=60.28123s -------------------------------
***** Running training epoch 5 *****
Epoch 005 | Step 027/027 | Loss 0.4481 | train_acc 0.8790 | train_auc 0.5428 |     Time 49.7957 | lr = 0.0 val_accuracy:0.8588      val_auc:0.4952
val_loss: 0.4818     time costed=60.31998s -------------------------------
***** Running training epoch 6 *****
Epoch 006 | Step 027/027 | Loss 0.4234 | train_acc 0.8848 | train_auc 0.5676 |     Time 50.4792 | lr = 0.0006545084971874866 val_accuracy:0.8619    val_auc:0.5764
val_loss: 0.4550     time costed=61.06234s -------------------------------
***** Running training epoch 7 *****
Epoch 007 | Step 027/027 | Loss 0.4178 | train_acc 0.8796 | train_auc 0.5479 |     Time 49.9712 | lr = 0.0009045084971875055 val_accuracy:0.8588    val_auc:0.5897
val_loss: 0.4512     time costed=60.4179s -------------------------------
***** Running training epoch 8 *****
Epoch 008 | Step 027/027 | Loss 0.4050 | train_acc 0.8789 | train_auc 0.5802 |     Time 49.6602 | lr = 9.549150281252627e-05 val_accuracy:0.8538    val_auc:0.5747
val_loss: 0.4505     time costed=60.17702s -------------------------------
***** Running training epoch 9 *****
Epoch 009 | Step 027/027 | Loss 0.4062 | train_acc 0.8783 | train_auc 0.5692 |     Time 49.7288 | lr = 0.00034549150281254536 val_accuracy:0.8588   val_auc:0.6225
val_loss: 0.4247     time costed=60.10822s -------------------------------
***** Running training epoch 10 *****
Epoch 010 | Step 027/027 | Loss 0.3873 | train_acc 0.8807 | train_auc 0.5912 |     Time 49.3028 | lr = 0.000999999999999998 val_accuracy:0.8588     val_auc:0.5934
val_loss: 0.4345     time costed=59.68535s -------------------------------
#%load_ext tensorboard
%tensorboard --logdir runs/effnet2

3.4 convnext

I never got it to work well, so the training logs are omitted.

train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=64, transforms=trfs_sharp)
# Load a pretrained convnext
model = torchvision.models.convnext.convnext_small(pretrained=True)
model.classifier[2] = torch.nn.Linear(768, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # use the GPU
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=0.0005)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=10, scheduler=scheduler, save_name='convnext')  # 21 min

3.5 ViT

ViT uses absolute position embeddings, so when doing transfer learning:

  • the input size must match the pretraining input size (e.g. the 224x224 used on ImageNet), or
  • train from scratch without the pretrained weights.

I did not notice the input-size constraint at first, and training kept failing with AssertionError: Wrong image height! Sketches of the two options follow.
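
Hedged sketches of both options (my own illustration; the image_size in option 2 is an arbitrary example):

# Option 1: keep the pretrained weights and match the 224x224 pretraining size.
trfs_vit = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
model = torchvision.models.vit_b_16(weights='DEFAULT')
model.heads.head = torch.nn.Linear(768, 2)

# Option 2: train from scratch with a custom input size. Note torchvision's
# VisionTransformer still requires square inputs, so the 320x160 crop used
# elsewhere in this post would fail here too.
model_scratch = torchvision.models.vision_transformer.VisionTransformer(
    image_size=320, patch_size=16, num_layers=12,
    num_heads=12, hidden_dim=768, mlp_dim=3072, num_classes=2)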

train_loader, valid_loader, train_all_loader, test_loader = get_dataloader(bs=64, transforms=trfs_sharp)
model = torchvision.models.vit_b_16()  # no pretrained weights: train from scratch
model.heads.head = torch.nn.Linear(768, 2)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)  # use the GPU
lr, weight_decay = 0.001, 0.0003  # with lr=0.001 the validation acc stayed flat
# Optimizer
optimizer = optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5)
# Loss function
criterion = nn.CrossEntropyLoss()
train_and_eval(train_loader, valid_loader, epoch=10, scheduler=scheduler, save_name='VITB')

4. Ideas for improving the score

  • Data augmentation: random crop, random noise, flips

  • Stronger models: larger input sizes, higher final accuracy

  • Use the unlabelled data: pseudo-labelling (see the sketch at the end of this list)

    • Keep the ~3000 highest-confidence samples
    • Retrain on those 3000 + the training set
  • timm and torchvision weights differ, so the hyperparameters need re-tuning.

  • Training on the full dataset with a stronger model and a longer schedule should reach AUC 0.8+.

  • Pretrained weights vs. training from scratch still makes a real accuracy difference; pretrained is better.
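
A minimal sketch of the pseudo-labelling idea. It assumes predict() is modified to return its probs list; the 0.95 threshold is an arbitrary example:

# Sketch: keep high-confidence test predictions as extra training data
probs_arr = np.array(probs)   # (N, 2) softmax outputs returned by predict()
conf = probs_arr.max(axis=1)  # confidence of the predicted class
keep = conf > 0.95            # illustrative threshold; tune to get ~3000 samples

test_images = glob.glob('dataset/test_images/*')  # same ordering as test_loader
pseudo_images = [img for img, k in zip(test_images, keep) if k]
pseudo_labels = probs_arr.argmax(axis=1)[keep].tolist()

# Merge with the labelled training set and retrain
x_aug = list(x_train) + pseudo_images
y_aug = list(y_train) + pseudo_labels
pseudo_dataset = ImageSet(x_aug, y_aug, transform=trfs_sharp)
pseudo_loader = DataLoader(pseudo_dataset, batch_size=64, shuffle=True, num_workers=4)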
