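AnnaAraslanova/FBNet Code Walkthrough
AnnaAraslanova/FBNet is one of the better third-party implementations of FBNet. Latency is approximated with measurements taken on an x86 processor. Two points to keep in mind:
- PyTorch GPU parallelism (nn.DataParallel) places requirements on the input data;
- using BN layers directly in the stochastic supernet seems questionable.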
supernet_main_file.py
sample_architecture_from_the_supernet selects the best architecture from the trained supernet.
if __name__ == "__main__":
    assert args.train_or_sample in ['train', 'sample']
    if args.train_or_sample == 'train':
        train_supernet()
    elif args.train_or_sample == 'sample':
        assert args.architecture_name != '' and args.architecture_name not in MODEL_ARCH
        hardsampling = False if args.hardsampling_bool_value in ['False', '0'] else True
        sample_architecture_from_the_supernet(unique_name_of_arch=args.architecture_name, hardsampling=hardsampling)
train_supernet
manual_seed = 1
np.random.seed(manual_seed)
torch.manual_seed(manual_seed)
torch.cuda.manual_seed_all(manual_seed)
torch.backends.cudnn.benchmark = True
Since tensorboardX 1.7, the log_dir argument of SummaryWriter is named logdir.
create_directories_from_list([CONFIG_SUPERNET['logging']['path_to_tensorboard_logs']])
logger = get_logger(CONFIG_SUPERNET['logging']['path_to_log_file'])
writer = SummaryWriter(log_dir=CONFIG_SUPERNET['logging']['path_to_tensorboard_logs'])
LookUpTable writes the measured latencies to a file.
#### LookUp table consists all information about layers
lookup_table = LookUpTable(calulate_latency=CONFIG_SUPERNET['lookup_table']['create_from_scratch'])
get_loaders splits the data into a training set and a validation set.
#### DataLoading
train_w_loader, train_thetas_loader = get_loaders(CONFIG_SUPERNET['dataloading']['w_share_in_train'],
                                                  CONFIG_SUPERNET['dataloading']['batch_size'],
                                                  CONFIG_SUPERNET['dataloading']['path_to_save_data'],
                                                  logger)
test_loader = get_test_loader(CONFIG_SUPERNET['dataloading']['batch_size'],
                              CONFIG_SUPERNET['dataloading']['path_to_save_data'])
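Instantiate FBNet_Stochastic_SuperNet.
nn.Module.apply applies fn recursively to every submodule (as returned by .children()) as well as to self. Typical uses include initializing the parameters of a model (see also torch.nn.init).
Why is weights_init called here rather than initializing the weights inside the module?
There is no support for loading a snapshot to resume training.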
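torch.nn.DataParallel implements data parallelism at the module level. The container parallelizes the application of the given module by splitting the input across the specified devices, chunking along the batch dimension (other objects are copied once per device). In the forward pass the module is replicated on each device, and each replica handles a portion of the input. During the backward pass, the gradients from each replica are summed into the original module. The batch size should be larger than the number of GPUs used.
See also: Use nn.DataParallel instead of multiprocessing
Arbitrary positional and keyword inputs may be passed into DataParallel, but some types are handled specially. Tensors are scattered along the specified dim (default 0). Tuple, list and dict types are shallow-copied. Other types are shared among the threads and can be corrupted if they are written to in the model's forward pass.
Before this DataParallel module is run, the parallelized module must have its parameters and buffers on device_ids[0].
In each forward, the module is replicated on every device, so any update made to the running module inside forward is lost. For example, if module has a counter attribute that is incremented in each forward, it always stays at its initial value, because the update happens on replicas that are destroyed after forward. However, DataParallel guarantees that the replica on device[0] shares the storage of its parameters and buffers with the base parallelized module, so in-place updates to the parameters or buffers on device[0] are recorded. For example, BatchNorm2d and spectral_norm() rely on this behavior to update their buffers.
Forward and backward hooks defined on module and its submodules are invoked len(device_ids) times, each time with inputs located on one particular device. In particular, the hooks are only guaranteed to run in the correct order with respect to the operations on the corresponding device. For example, hooks set via register_forward_pre_hook() are not guaranteed to run before all len(device_ids) forward() calls, but each such hook does run before the corresponding forward() call on its device.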
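#### Model
model = FBNet_Stochastic_SuperNet(lookup_table, cnt_classes=10).cuda()
model = model.apply(weights_init)
model = nn.DataParallel(model, device_ids=[0])
The network weights and the architecture parameters (thetas) are assigned to different optimizers.
SupernetLoss computes the latency-aware loss.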
torch.optim.lr_scheduler.CosineAnnealingLR sets the learning rate of each parameter group using a cosine annealing schedule, where $\eta_{max}$ is set to the initial lr and $T_{cur}$ is the number of epochs since the last restart in SGDR:
$$
\begin{aligned}
\eta_{t+1} &= \eta_{min} + (\eta_t - \eta_{min})\frac{1 + \cos\left(\frac{T_{cur}+1}{T_{max}}\pi\right)}{1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)}, & T_{cur} \neq (2k+1)T_{max}; \\
\eta_{t+1} &= \eta_{t} + (\eta_{max} - \eta_{min})\frac{1 - \cos\left(\frac{1}{T_{max}}\pi\right)}{2}, & T_{cur} = (2k+1)T_{max}.
\end{aligned}
$$
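As a quick sanity check of the schedule, the sketch below compares the recursive update with the closed form that the PyTorch documentation gives when the learning rate is set solely by this scheduler, $\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})(1 + \cos(\frac{T_{cur}}{T_{max}}\pi))$. The function names and the numeric values are made up for illustration; this is not the PyTorch implementation.

```python
import math


def cosine_annealing_closed_form(eta_max, eta_min, t_cur, t_max):
    """Learning rate at epoch t_cur when only this schedule changes it."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))


def cosine_annealing_recursive_step(eta_t, eta_min, t_cur, t_max):
    """One step of the recursive form, valid while t_cur != (2k+1) * t_max."""
    numerator = 1 + math.cos(math.pi * (t_cur + 1) / t_max)
    denominator = 1 + math.cos(math.pi * t_cur / t_max)
    return eta_min + (eta_t - eta_min) * numerator / denominator


# Illustrative values only (not taken from CONFIG_SUPERNET).
ETA_MAX, ETA_MIN, T_MAX = 0.1, 0.0, 10

recursive_lrs = [ETA_MAX]
for t in range(T_MAX):
    recursive_lrs.append(cosine_annealing_recursive_step(recursive_lrs[-1], ETA_MIN, t, T_MAX))

closed_form_lrs = [cosine_annealing_closed_form(ETA_MAX, ETA_MIN, t, T_MAX) for t in range(T_MAX + 1)]
```

Both lists start at the initial learning rate and reach `ETA_MIN` after `T_MAX` epochs.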
#### Loss, Optimizer and Scheduler
criterion = SupernetLoss().cuda()

thetas_params = [param for name, param in model.named_parameters() if 'thetas' in name]
params_except_thetas = [param for param in model.parameters() if not check_tensor_in_list(param, thetas_params)]

w_optimizer = torch.optim.SGD(params=params_except_thetas,
                              lr=CONFIG_SUPERNET['optimizer']['w_lr'],
                              momentum=CONFIG_SUPERNET['optimizer']['w_momentum'],
                              weight_decay=CONFIG_SUPERNET['optimizer']['w_weight_decay'])

theta_optimizer = torch.optim.Adam(params=thetas_params,
                                   lr=CONFIG_SUPERNET['optimizer']['thetas_lr'],
                                   weight_decay=CONFIG_SUPERNET['optimizer']['thetas_weight_decay'])

last_epoch = -1
w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(w_optimizer,
                                                         T_max=CONFIG_SUPERNET['train_settings']['cnt_epochs'],
                                                         last_epoch=last_epoch)
TrainerSupernet encapsulates the training procedure.
#### Training Loop
trainer = TrainerSupernet(criterion, w_optimizer, theta_optimizer, w_scheduler, logger, writer)
trainer.train_loop(train_w_loader, train_thetas_loader, test_loader, model)
get_logger
""" Make python logger """
# [!] Since tensorboardX use default logger (e.g. logging.info()), we should use custom logger
logger = logging.getLogger('fbnet')
log_format = '%(asctime)s | %(message)s'
formatter = logging.Formatter(log_format, datefmt='%m/%d %I:%M:%S %p')
file_handler = logging.FileHandler(file_path)
file_handler.setFormatter(formatter)
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)

logger.addHandler(file_handler)
logger.addHandler(stream_handler)
logger.setLevel(logging.INFO)

return logger
LookUpTable
CANDIDATE_BLOCKS lists the 9 block types from Table 2 of the paper; their detailed parameters are defined in PRIMITIVES.
Block type | expansion | Kernel | Group |
---|---|---|---|
k3_e1 | 1 | 3 | 1 |
k3_e1_g2 | 1 | 3 | 2 |
k3_e3 | 3 | 3 | 1 |
k3_e6 | 6 | 3 | 1 |
k5_e1 | 1 | 5 | 1 |
k5_e1_g2 | 1 | 5 | 2 |
k5_e3 | 3 | 5 | 1 |
k5_e6 | 6 | 5 | 1 |
skip | - | - | - |
SEARCH_SPACE corresponds to the network structure in Table 1 of the paper (TBS layers only).
Input shape | Block | f | n | s |
---|---|---|---|---|
$224^2 \times 3$ | 3x3 conv | 16 | 1 | 2 |
$112^2 \times 16$ | TBS | 16 | 1 | 1 |
$112^2 \times 16$ | TBS | 24 | 4 | 2 |
$56^2 \times 24$ | TBS | 32 | 4 | 2 |
$28^2 \times 32$ | TBS | 64 | 4 | 2 |
$14^2 \times 64$ | TBS | 112 | 4 | 1 |
$14^2 \times 112$ | TBS | 184 | 4 | 2 |
$7^2 \times 184$ | TBS | 352 | 1 | 1 |
$7^2 \times 352$ | 1x1 conv | 1984 | 1 | 1 |
$7^2 \times 1504~(1984)$ | 7x7 avgpool | - | 1 | 1 |
$1504$ | fc | 1000 | 1 | - |
The number of layers is inferred from the number of input shapes in search_space.
The operator dictionary self.lookup_table_operations is created.
_generate_layers_parameters parses the layer parameters and the input shapes from SEARCH_SPACE.
def __init__(self, candidate_blocks=CANDIDATE_BLOCKS, search_space=SEARCH_SPACE,
             calulate_latency=False):
    self.cnt_layers = len(search_space["input_shape"])
    # constructors for each operation
    self.lookup_table_operations = {op_name : PRIMITIVES[op_name] for op_name in candidate_blocks}
    # arguments for the ops constructors. one set of arguments for all 9 constructors at each layer
    # input_shapes just for convinience
    self.layers_parameters, self.layers_input_shapes = self._generate_layers_parameters(search_space)
_create_from_operations measures the latency of every operator and writes the result to a file.
_read_lookup_table_from_file reads the result back from the file.
    # lookup_table
    self.lookup_table_latency = None
    if calulate_latency:
        self._create_from_operations(cnt_of_runs=CONFIG_SUPERNET['lookup_table']['number_of_runs'],
                                     write_to_file=CONFIG_SUPERNET['lookup_table']['path_to_lookup_table'])
    else:
        self._create_from_file(path_to_file=CONFIG_SUPERNET['lookup_table']['path_to_lookup_table'])
_generate_layers_parameters
# layers_parameters are : C_in, C_out, expansion, stride
layers_parameters = [(search_space["input_shape"][layer_id][0],
                      search_space["channel_size"][layer_id],
                      # expansion (set to -999) embedded into operation and will not be considered
                      # (look fbnet_building_blocks/fbnet_builder.py - this is facebookresearch code
                      # and I don't want to modify it)
                      -999,
                      search_space["strides"][layer_id]
                     ) for layer_id in range(self.cnt_layers)]

# layers_input_shapes are (C_in, input_w, input_h)
layers_input_shapes = search_space["input_shape"]

return layers_parameters, layers_input_shapes
_create_from_operations
self.lookup_table_latency = self._calculate_latency(self.lookup_table_operations,
                                                    self.layers_parameters,
                                                    self.layers_input_shapes,
                                                    cnt_of_runs)
if write_to_file is not None:
    self._write_lookup_table_to_file(write_to_file)
_calculate_latency
LATENCY_BATCH_SIZE = 1
latency_table_layer_by_ops = [{} for i in range(self.cnt_layers)]

for layer_id in range(self.cnt_layers):
    for op_name in operations:
        op = operations[op_name](*layers_parameters[layer_id])
        input_sample = torch.randn((LATENCY_BATCH_SIZE, *layers_input_shapes[layer_id]))

        globals()['op'], globals()['input_sample'] = op, input_sample
        total_time = timeit.timeit('output = op(input_sample)', setup="gc.enable()", \
                                   globals=globals(), number=cnt_of_runs)
        # measured in micro-second
        latency_table_layer_by_ops[layer_id][op_name] = total_time / cnt_of_runs / LATENCY_BATCH_SIZE * 1e6

return latency_table_layer_by_ops
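The measurement pattern used in _calculate_latency (time a call with timeit, divide by the number of runs, convert to microseconds) can be tried without PyTorch. The sketch below is only an illustration: `measure_latency_us` and `toy_op` are hypothetical names, and a private dictionary is passed as `globals` instead of writing into the module's `globals()` as the original does.

```python
import timeit


def measure_latency_us(fn, arg, number=100):
    """Average wall-clock time of fn(arg) in microseconds over `number` runs."""
    # A private namespace keeps the timed names out of the module globals.
    namespace = {"fn": fn, "arg": arg}
    total_seconds = timeit.timeit("fn(arg)", globals=namespace, number=number)
    return total_seconds / number * 1e6


def toy_op(n):
    """Stand-in for a candidate block; more work for a larger n."""
    return sum(i * i for i in range(n))


# One latency entry per (hypothetical) operator name.
latency_table = {
    name: measure_latency_us(toy_op, size)
    for name, size in [("small_op", 100), ("large_op", 10_000)]
}
```

Note that timeit turns garbage collection off while timing by default; the original re-enables it through `setup="gc.enable()"`.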
_write_lookup_table_to_file
clear_files_in_the_list empties the file if it already exists.
ops is the list of operator names. The first line of the file holds these names.
clear_files_in_the_list([path_to_file])
ops = [op_name for op_name in self.lookup_table_operations]
text = [op_name + " " for op_name in ops[:-1]]
text.append(ops[-1] + "\n")
The operator latencies are written next, one line per TBS layer.
add_text_to_file saves the result to the file.
for layer_id in range(self.cnt_layers):
    for op_name in ops:
        text.append(str(self.lookup_table_latency[layer_id][op_name]))
        text.append(" ")
    text[-1] = "\n"
text = text[:-1]

text = ''.join(text)
add_text_to_file(text, path_to_file)
_create_from_file
self.lookup_table_latency = self._read_lookup_table_from_file(path_to_file)
_read_lookup_table_from_file
latences = [line.strip('\n') for line in open(path_to_file)]
ops_names = latences[0].split(" ")
latences = [list(map(float, layer.split(" "))) for layer in latences[1:]]

lookup_table_latency = [{op_name : latences[i][op_id]
                         for op_id, op_name in enumerate(ops_names)
                        } for i in range(self.cnt_layers)]
return lookup_table_latency
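The file layout described above (a header line of space-separated operator names, then one line of latencies per layer) can be checked with a small round trip. This is a simplified sketch, not the repository's code: the two helper functions and the latency values are invented for the example.

```python
import os
import tempfile


def write_lookup_table(path, table):
    """Write the header of operator names, then one latency row per layer."""
    ops = list(table[0])
    lines = [" ".join(ops)]
    for layer in table:
        lines.append(" ".join(str(layer[op]) for op in ops))
    with open(path, "w") as f:
        f.write("\n".join(lines))


def read_lookup_table(path):
    """Rebuild the list of {op_name: latency} dictionaries from the file."""
    with open(path) as f:
        rows = [line.rstrip("\n") for line in f]
    ops = rows[0].split(" ")
    return [dict(zip(ops, map(float, row.split(" ")))) for row in rows[1:]]


# Made-up latencies (microseconds) for two layers and two operators.
table = [{"ir_k3_e1": 12.5, "skip": 0.25}, {"ir_k3_e1": 30.0, "skip": 0.5}]

with tempfile.TemporaryDirectory() as tmp_dir:
    table_path = os.path.join(tmp_dir, "lookup_table.txt")
    write_lookup_table(table_path, table)
    restored = read_lookup_table(table_path)
```

Because the header fixes the column order, the reader does not need to know the operator names in advance.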
get_loaders
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
])
train_data = datasets.CIFAR10(root=path_to_save_data, train=True,
                              download=True, transform=train_transform)
Build the index list and split the dataset.
torch.utils.data.SubsetRandomSampler samples elements randomly from a given list of indices, without replacement.
num_train = len(train_data)                        # 50k
indices = list(range(num_train))                   #
split = int(np.floor(train_portion * num_train))   # 40k

train_idx, valid_idx = indices[:split], indices[split:]

train_sampler = SubsetRandomSampler(train_idx)
train_loader = torch.utils.data.DataLoader(
    train_data, batch_size=batch_size, sampler=train_sampler,
    pin_memory=True, num_workers=32)

if train_portion == 1:
    return train_loader

valid_sampler = SubsetRandomSampler(valid_idx)
# valid_sampler, not train_sampler: the second loader must draw from valid_idx
val_loader = torch.utils.data.DataLoader(
    train_data, batch_size=batch_size, sampler=valid_sampler,
    pin_memory=True, num_workers=16)

return train_loader, val_loader
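The index split can be illustrated without torch. In the sketch below, `split_indices` mirrors the slicing in get_loaders, and `sample_without_replacement` is a hypothetical stand-in for what SubsetRandomSampler does within one epoch (a random permutation of the given indices); the 0.8 portion is an assumed value.

```python
import math
import random


def split_indices(num_samples, train_portion):
    """First `train_portion` of the indices for weights, the rest for thetas."""
    indices = list(range(num_samples))
    split = int(math.floor(train_portion * num_samples))
    return indices[:split], indices[split:]


def sample_without_replacement(indices, seed=0):
    """Visit every index exactly once, in random order."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return shuffled


train_idx, valid_idx = split_indices(num_samples=50_000, train_portion=0.8)
epoch_order = sample_without_replacement(valid_idx)
```

The two index lists are disjoint, so the weights and the architecture parameters never see the same training images.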
get_test_loader
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(CIFAR_MEAN, CIFAR_STD),
])
test_data = datasets.CIFAR10(root=path_to_save_data, train=False,
                             download=True, transform=test_transform)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          shuffle=False, num_workers=16)
return test_loader
FBNet_Stochastic_SuperNet
def __init__(self, lookup_table, cnt_classes=1000):
    super(FBNet_Stochastic_SuperNet, self).__init__()

    # self.first identical to 'add_first' in the fbnet_building_blocks/fbnet_builder.py
    self.first = ConvBNRelu(input_depth=3, output_depth=16, kernel=3, stride=2,
                            pad=3 // 2, no_bias=1, use_relu="relu", bn_type="bn")
    self.stages_to_search = nn.ModuleList([MixedOperation(
                                               lookup_table.layers_parameters[layer_id],
                                               lookup_table.lookup_table_operations,
                                               lookup_table.lookup_table_latency[layer_id])
                                           for layer_id in range(lookup_table.cnt_layers)])
    self.last_stages = nn.Sequential(OrderedDict([
        ("conv_k1", nn.Conv2d(lookup_table.layers_parameters[-1][1], 1504, kernel_size = 1)),
        ("avg_pool_k7", nn.AvgPool2d(kernel_size=7)),
        ("flatten", Flatten()),
        ("fc", nn.Linear(in_features=1504, out_features=cnt_classes)),
    ]))
forward
y = self.first(x)
for mixed_op in self.stages_to_search:
    y, latency_to_accumulate = mixed_op(y, temperature, latency_to_accumulate)
y = self.last_stages(y)
return y, latency_to_accumulate
MixedOperation
# Arguments:
# proposed_operations is a dictionary {operation_name : op_constructor}
# latency is a dictionary {operation_name : latency}
def __init__(self, layer_parameters, proposed_operations, latency):
    super(MixedOperation, self).__init__()
    ops_names = [op_name for op_name in proposed_operations]

    self.ops = nn.ModuleList([proposed_operations[op_name](*layer_parameters)
                              for op_name in ops_names])
    self.latency = [latency[op_name] for op_name in ops_names]
    self.thetas = nn.Parameter(torch.Tensor([1.0 / len(ops_names) for i in range(len(ops_names))]))
forward
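Parameters of nn.functional.gumbel_softmax:
- logits: […, num_features] unnormalized log probabilities
- tau: non-negative scalar temperature
- hard: if True, the returned samples are discretized as one-hot vectors, but are differentiated as if they were the soft samples in autograd
- dim (int): the dimension along which softmax is computed. Default: -1.
Returns:
A sampled tensor with the same shape as logits, drawn from the Gumbel-Softmax distribution. If hard=True, the returned samples are one-hot; otherwise they are probability distributions that sum to 1 across dim.
This function exists for legacy reasons and may be removed from nn.functional in the future.
The main trick for hard is to compute y_hard - y_soft.detach() + y_soft
This achieves two things:
- it makes the output value exactly one-hot (since we add and then subtract the y_soft value)
- it makes the gradient equal to the y_soft gradient (since all other gradients are stripped)
Here self.thetas needs a torch.Tensor.unsqueeze call to make it 2-dimensional.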
soft_mask_variables = nn.functional.gumbel_softmax(self.thetas, temperature)
output = sum(m * op(x) for m, op in zip(soft_mask_variables, self.ops))
latency = sum(m * lat for m, lat in zip(soft_mask_variables, self.latency))
latency_to_accumulate = latency_to_accumulate + latency
return output, latency_to_accumulate
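To make the sampling step concrete, here is a plain-Python sketch of drawing one Gumbel-Softmax sample and using it to weight per-operator latencies, as MixedOperation.forward does. It is an illustration under assumptions: the helper names and the numbers are invented, and since there is no autograd here, the straight-through part of hard=True (the y_hard - y_soft.detach() + y_soft trick) is not reproduced, only the forward values.

```python
import math
import random


def gumbel_softmax_sample(logits, tau, rng, hard=False):
    """Draw one sample: softmax((logits + Gumbel noise) / tau)."""
    # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1).
    gumbels = [-math.log(-math.log(rng.uniform(1e-10, 1.0 - 1e-10))) for _ in logits]
    scaled = [(logit + g) / tau for logit, g in zip(logits, gumbels)]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    y_soft = [e / total for e in exps]
    if not hard:
        return y_soft
    winner = max(range(len(y_soft)), key=y_soft.__getitem__)
    return [1.0 if i == winner else 0.0 for i in range(len(y_soft))]


def weighted_sum(weights, values):
    """Mask-weighted combination, as used for both outputs and latencies."""
    return sum(w * v for w, v in zip(weights, values))


rng = random.Random(0)
thetas = [1.0 / 3] * 3            # uniform initialisation, as in MixedOperation
latencies = [10.0, 20.0, 30.0]    # made-up per-operator latencies

soft_mask = gumbel_softmax_sample(thetas, tau=5.0, rng=rng)
hard_mask = gumbel_softmax_sample(thetas, tau=5.0, rng=rng, hard=True)
expected_latency = weighted_sum(soft_mask, latencies)
```

A lower `tau` pushes the soft mask towards one-hot, which is why the trainer anneals the temperature over the epochs.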
weights_init
if deepth > max_depth:
    return
if isinstance(m, torch.nn.Conv2d):
    torch.nn.init.kaiming_uniform_(m.weight.data)
    if m.bias is not None:
        torch.nn.init.constant_(m.bias.data, 0)
elif isinstance(m, torch.nn.Linear):
    m.weight.data.normal_(0, 0.01)
    if m.bias is not None:
        m.bias.data.zero_()
elif isinstance(m, torch.nn.BatchNorm2d):
    return
elif isinstance(m, torch.nn.ReLU):
    return
elif isinstance(m, torch.nn.Module):
    deepth += 1
    for m_ in m.modules():
        weights_init(m_, deepth)
else:
    raise ValueError("%s is unk" % m.__class__.__name__)
SupernetLoss
def __init__(self):
    super(SupernetLoss, self).__init__()
    self.alpha = CONFIG_SUPERNET['loss']['alpha']
    self.beta = CONFIG_SUPERNET['loss']['beta']
    self.weight_criterion = nn.CrossEntropyLoss()
forward
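The mean of torch.log(latency ** self.beta) needs to be taken, since lat.item() requires a single-element tensor and, with several GPUs, latency holds one value per device.
self.beta should be moved outside the logarithm; otherwise it loses its intended effect, because log(latency ** beta) equals beta * log(latency) and beta then only rescales the term.
ce = self.weight_criterion(outs, targets)
lat = torch.log(latency ** self.beta)

losses_ce.update(ce.item(), N)
losses_lat.update(lat.item(), N)

loss = self.alpha * ce * lat
return loss #.unsqueeze(0)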
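The remark about beta can be checked numerically. The FBNet paper writes the latency term as $\alpha \log(LAT)^{\beta}$, that is, with the exponent applied to the logarithm. The sketch below compares the two placements; the function names and the values of latency and beta are assumptions chosen for illustration.

```python
import math


def lat_term_beta_inside(latency, beta):
    """log(latency ** beta), as computed in SupernetLoss.forward."""
    return math.log(latency ** beta)


def lat_term_beta_outside(latency, beta):
    """log(latency) ** beta, the form with beta outside the logarithm."""
    return math.log(latency) ** beta


latency_us, beta = 1000.0, 0.6   # illustrative values only
inside = lat_term_beta_inside(latency_us, beta)
outside = lat_term_beta_outside(latency_us, beta)
```

With beta inside, the term is just `beta * log(latency)`, so beta acts as a second scale factor next to alpha instead of shaping how strongly large latencies are penalised.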
TrainerSupernet
def __init__(self, criterion, w_optimizer, theta_optimizer, w_scheduler, logger, writer):
    self.top1 = AverageMeter()
    self.top3 = AverageMeter()
    self.losses = AverageMeter()
    self.losses_lat = AverageMeter()
    self.losses_ce = AverageMeter()

    self.logger = logger
    self.writer = writer

    self.criterion = criterion
    self.w_optimizer = w_optimizer
    self.theta_optimizer = theta_optimizer
    self.w_scheduler = w_scheduler

    self.temperature = CONFIG_SUPERNET['train_settings']['init_temperature']
    self.exp_anneal_rate = CONFIG_SUPERNET['train_settings']['exp_anneal_rate'] # apply it every epoch
    self.cnt_epochs = CONFIG_SUPERNET['train_settings']['cnt_epochs']
    self.train_thetas_from_the_epoch = CONFIG_SUPERNET['train_settings']['train_thetas_from_the_epoch']
    self.print_freq = CONFIG_SUPERNET['train_settings']['print_freq']
    self.path_to_save_model = CONFIG_SUPERNET['train_settings']['path_to_save_model']
train_loop
First, only the network weights are trained, for self.train_thetas_from_the_epoch epochs.
Each call to _training_step trains one full epoch, so the name is not very descriptive.
best_top1 = 0.0

# firstly, train weights only
for epoch in range(self.train_thetas_from_the_epoch):
    self.writer.add_scalar('learning_rate/weights', self.w_optimizer.param_groups[0]['lr'], epoch)

    self.logger.info("Firstly, start to train weights for epoch %d" % (epoch))
    self._training_step(model, train_w_loader, self.w_optimizer, epoch, info_for_logger="_w_step_")
    self.w_scheduler.step()

for epoch in range(self.train_thetas_from_the_epoch, self.cnt_epochs):
    self.writer.add_scalar('learning_rate/weights', self.w_optimizer.param_groups[0]['lr'], epoch)
    self.writer.add_scalar('learning_rate/theta', self.theta_optimizer.param_groups[0]['lr'], epoch)

    self.logger.info("Start to train weights for epoch %d" % (epoch))
    self._training_step(model, train_w_loader, self.w_optimizer, epoch, info_for_logger="_w_step_")
    self.w_scheduler.step()

    self.logger.info("Start to train theta for epoch %d" % (epoch))
    self._training_step(model, train_thetas_loader, self.theta_optimizer, epoch, info_for_logger="_theta_step_")

    top1_avg = self._validate(model, test_loader, epoch)
    if best_top1 < top1_avg:
        best_top1 = top1_avg
        self.logger.info("Best top1 acc by now. Save model")
        save(model, self.path_to_save_model)

    self.temperature = self.temperature * self.exp_anneal_rate
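The temperature handling in train_loop amounts to a two-phase schedule: constant while only the weights are trained, then multiplied by exp_anneal_rate after every epoch in which thetas are trained. The sketch below reproduces just that bookkeeping; `temperature_schedule` is a hypothetical helper and the numbers are not the values from CONFIG_SUPERNET.

```python
def temperature_schedule(init_temperature, exp_anneal_rate, cnt_epochs, train_thetas_from_the_epoch):
    """Temperature seen by each epoch under the two-phase loop."""
    temperature = init_temperature
    per_epoch = []
    for epoch in range(cnt_epochs):
        per_epoch.append(temperature)
        # Decay is applied only at the end of epochs that also train thetas.
        if epoch >= train_thetas_from_the_epoch:
            temperature *= exp_anneal_rate
    return per_epoch


# Illustrative settings: 2 warm-up epochs, 5 epochs in total.
schedule = temperature_schedule(
    init_temperature=5.0,
    exp_anneal_rate=0.5,
    cnt_epochs=5,
    train_thetas_from_the_epoch=2,
)
```

With these settings the first three epochs run at 5.0, followed by 2.5 and 1.25.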
_training_step
The latency_to_accumulate variable has to be constructed explicitly, with as many elements as there are devices.
_intermediate_stats_logging records the loss, top1, top3, cross-entropy and latency.
_epoch_stats_logging writes the per-epoch statistics to tensorboard.
model = model.train()
start_time = time.time()

for step, (X, y) in enumerate(loader):
    X, y = X.cuda(non_blocking=True), y.cuda(non_blocking=True)
    # X.to(device, non_blocking=True), y.to(device, non_blocking=True)
    N = X.shape[0]

    optimizer.zero_grad()
    latency_to_accumulate = Variable(torch.Tensor([[0.0]]), requires_grad=True).cuda()
    outs, latency_to_accumulate = model(X, self.temperature, latency_to_accumulate)
    loss = self.criterion(outs, y, latency_to_accumulate, self.losses_ce, self.losses_lat, N)
    loss.backward()
    optimizer.step()

    self._intermediate_stats_logging(outs, y, loss, step, epoch, N, len_loader=len(loader), val_or_train="Train")

self._epoch_stats_logging(start_time=start_time, epoch=epoch, info_for_logger=info_for_logger, val_or_train='train')
for avg in [self.top1, self.top3, self.losses]:
    avg.reset()
_validate
model.eval()
start_time = time.time()

with torch.no_grad():
    for step, (X, y) in enumerate(loader):
        X, y = X.cuda(), y.cuda()
        N = X.shape[0]

        latency_to_accumulate = torch.Tensor([[0.0]]).cuda()
        outs, latency_to_accumulate = model(X, self.temperature, latency_to_accumulate)
        loss = self.criterion(outs, y, latency_to_accumulate, self.losses_ce, self.losses_lat, N)

        self._intermediate_stats_logging(outs, y, loss, step, epoch, N, len_loader=len(loader), val_or_train="Valid")

top1_avg = self.top1.get_avg()
self._epoch_stats_logging(start_time=start_time, epoch=epoch, val_or_train='val')
for avg in [self.top1, self.top3, self.losses]:
    avg.reset()
return top1_avg
_intermediate_stats_logging
# k=3 so that the second value matches the top3 meter and the Prec@(1,3) log line
prec1, prec3 = accuracy(outs, y, topk=(1, 3))
self.losses.update(loss.item(), N)
self.top1.update(prec1.item(), N)
self.top3.update(prec3.item(), N)

if (step > 1 and step % self.print_freq == 0) or step == len_loader - 1:
    self.logger.info(val_or_train+
        ": [{:3d}/{}] Step {:03d}/{:03d} Loss {:.3f} "
        "Prec@(1,3) ({:.1%}, {:.1%}), ce_loss {:.3f}, lat_loss {:.3f}".format(
            epoch + 1, self.cnt_epochs, step, len_loader - 1, self.losses.get_avg(),
            self.top1.get_avg(), self.top3.get_avg(), self.losses_ce.get_avg(), self.losses_lat.get_avg()))
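The top-k precision that the accuracy helper returns for each k can be spelled out on plain lists. The sketch below is a simplified reimplementation for illustration, not the tensor code from the repository; `topk_accuracy` and the toy scores are invented.

```python
def topk_accuracy(scores, targets, topk=(1,)):
    """Fraction of samples whose target is among the k highest-scoring classes."""
    maxk = max(topk)
    hits = {k: 0 for k in topk}
    for row, target in zip(scores, targets):
        # Class indices sorted by descending score, truncated to the largest k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:maxk]
        for k in topk:
            if target in ranked[:k]:
                hits[k] += 1
    return [hits[k] / len(targets) for k in topk]


# Four samples, three classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
targets = [1, 1, 0, 2]

top1, top2 = topk_accuracy(scores, targets, topk=(1, 2))
```

Only the first sample is correct at k=1, and the second one joins it at k=2, so the two values are 0.25 and 0.5.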
_epoch_stats_logging
self.writer.add_scalar('train_vs_val/'+val_or_train+'_loss'+info_for_logger, self.losses.get_avg(), epoch)
self.writer.add_scalar('train_vs_val/'+val_or_train+'_top1'+info_for_logger, self.top1.get_avg(), epoch)
self.writer.add_scalar('train_vs_val/'+val_or_train+'_top3'+info_for_logger, self.top3.get_avg(), epoch)
self.writer.add_scalar('train_vs_val/'+val_or_train+'_losses_lat'+info_for_logger, self.losses_lat.get_avg(), epoch)
self.writer.add_scalar('train_vs_val/'+val_or_train+'_losses_ce'+info_for_logger, self.losses_ce.get_avg(), epoch)

top1_avg = self.top1.get_avg()
self.logger.info(info_for_logger+val_or_train + ": [{:3d}/{}] Final Prec@1 {:.4%} Time {:.2f}".format(
    epoch+1, self.cnt_epochs, top1_avg, time.time() - start_time))
accuracy
""" Computes the precision@k for the specified values of k """
maxk = max(topk)
batch_size = target.size(0)

_, pred = output.topk(maxk, 1, True, True)
pred = pred.t()
# one-hot case
if target.ndimension() > 1:
    target = target.max(1)[1]

correct = pred.eq(target.view(1, -1).expand_as(pred))

res = []
for k in topk:
    correct_k = correct[:k].view(-1).float().sum(0)
    res.append(correct_k.mul_(1.0 / batch_size))

return res
PRIMITIVES
PRIMITIVES = {
    "skip": lambda C_in, C_out, expansion, stride, **kwargs: Identity(C_in, C_out, stride),
    "ir_k3": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, **kwargs),
    "ir_k5": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, kernel=5, **kwargs),
    "ir_k7": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, kernel=7, **kwargs),
    "ir_k1": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, kernel=1, **kwargs),
    "shuffle": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, shuffle_type="mid", pw_group=4, **kwargs),
    "basic_block": lambda C_in, C_out, expansion, stride, **kwargs: CascadeConv3x3(C_in, C_out, stride),
    "shift_5x5": lambda C_in, C_out, expansion, stride, **kwargs: ShiftBlock5x5(C_in, C_out, expansion, stride),
    # layer search 2
    "ir_k3_e1": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=3, **kwargs),
    "ir_k3_e3": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=3, **kwargs),
    "ir_k3_e6": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=3, **kwargs),
    "ir_k3_s4": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 4, stride, kernel=3, shuffle_type="mid", pw_group=4, **kwargs),
    "ir_k5_e1": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=5, **kwargs),
    "ir_k5_e3": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=5, **kwargs),
    "ir_k5_e6": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=5, **kwargs),
    "ir_k5_s4": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 4, stride, kernel=5, shuffle_type="mid", pw_group=4, **kwargs),
    # layer search se
    "ir_k3_e1_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=3, se=True, **kwargs),
    "ir_k3_e3_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=3, se=True, **kwargs),
    "ir_k3_e6_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=3, se=True, **kwargs),
    "ir_k3_s4_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 4, stride, kernel=3, shuffle_type="mid", pw_group=4, se=True, **kwargs),
    "ir_k5_e1_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=5, se=True, **kwargs),
    "ir_k5_e3_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=5, se=True, **kwargs),
    "ir_k5_e6_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=5, se=True, **kwargs),
    "ir_k5_s4_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 4, stride, kernel=5, shuffle_type="mid", pw_group=4, se=True, **kwargs),
    # layer search 3 (in addition to layer search 2)
    "ir_k3_s2": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=3, shuffle_type="mid", pw_group=2, **kwargs),
    "ir_k5_s2": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=5, shuffle_type="mid", pw_group=2, **kwargs),
    "ir_k3_s2_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=3, shuffle_type="mid", pw_group=2, se=True, **kwargs),
    "ir_k5_s2_se": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=5, shuffle_type="mid", pw_group=2, se=True, **kwargs),
    # layer search 4 (in addition to layer search 3)
    "ir_k3_sep": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, kernel=3, cdw=True, **kwargs),
    "ir_k33_e1": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=3, cdw=True, **kwargs),
    "ir_k33_e3": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=3, cdw=True, **kwargs),
    "ir_k33_e6": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=3, cdw=True, **kwargs),
    # layer search 5 (in addition to layer search 4)
    "ir_k7_e1": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=7, **kwargs),
    "ir_k7_e3": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=7, **kwargs),
    "ir_k7_e6": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=7, **kwargs),
    "ir_k7_sep": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, expansion, stride, kernel=7, cdw=True, **kwargs),
    "ir_k7_sep_e1": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 1, stride, kernel=7, cdw=True, **kwargs),
    "ir_k7_sep_e3": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 3, stride, kernel=7, cdw=True, **kwargs),
    "ir_k7_sep_e6": lambda C_in, C_out, expansion, stride, **kwargs: IRFBlock(C_in, C_out, 6, stride, kernel=7, cdw=True, **kwargs),
}
ConvBNRelu
The ConvBNRelu module has many options: BN can be disabled, or FrozenBatchNorm2d can be used instead. By comparison, nn.BatchNorm2d(C_out, affine=affine) is the more common choice.
def __init__(self, input_depth, output_depth, kernel, stride, pad, no_bias,
             use_relu, bn_type, group=1, *args, **kwargs):
    super(ConvBNRelu, self).__init__()

    assert use_relu in ["relu", None]
    if isinstance(bn_type, (list, tuple)):
        assert len(bn_type) == 2
        assert bn_type[0] == "gn"
        gn_group = bn_type[1]
        bn_type = bn_type[0]
    assert bn_type in ["bn", "af", "gn", None]
    assert stride in [1, 2, 4]

    op = Conv2d(input_depth, output_depth, kernel_size=kernel, stride=stride,
                padding=pad, bias=not no_bias, groups=group, *args, **kwargs)
    nn.init.kaiming_normal_(op.weight, mode="fan_out", nonlinearity="relu")
    if op.bias is not None:
        nn.init.constant_(op.bias, 0.0)
    self.add_module("conv", op)

    if bn_type == "bn":
        bn_op = BatchNorm2d(output_depth)
    elif bn_type == "gn":
        bn_op = nn.GroupNorm(num_groups=gn_group, num_channels=output_depth)
    elif bn_type == "af":
        bn_op = FrozenBatchNorm2d(output_depth)
    if bn_type is not None:
        self.add_module("bn", bn_op)

    if use_relu == "relu":
        self.add_module("relu", nn.ReLU(inplace=True))
sample_architecture_from_the_supernet
Load the model. Because save stored a model of type torch.nn.DataParallel, the model passed to load must be wrapped in the same way. Its module attribute is the original model.
logger = get_logger(CONFIG_SUPERNET['logging']['path_to_log_file'])

lookup_table = LookUpTable()
model = FBNet_Stochastic_SuperNet(lookup_table, cnt_classes=10).cuda()
model = nn.DataParallel(model)

load(model, CONFIG_SUPERNET['train_settings']['path_to_save_model'])

ops_names = [op_name for op_name in lookup_table.lookup_table_operations]
cnt_ops = len(ops_names)

arch_operations=[]
if hardsampling:
    for layer in model.module.stages_to_search:
        arch_operations.append(ops_names[np.argmax(layer.thetas.detach().cpu().numpy())])
else:
    rng = np.linspace(0, cnt_ops - 1, cnt_ops, dtype=int)
    for layer in model.module.stages_to_search:
        distribution = softmax(layer.thetas.detach().cpu().numpy())
        arch_operations.append(ops_names[np.random.choice(rng, p=distribution)])

logger.info("Sampled Architecture: " + " - ".join(arch_operations))

writh_new_ARCH_to_fbnet_modeldef(arch_operations, my_unique_name_for_ARCH=unique_name_of_arch)
logger.info("CONGRATULATIONS! New architecture " + unique_name_of_arch \
            + " was written into fbnet_building_blocks/fbnet_modeldef.py")
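The two sampling modes differ only in how an operator index is picked per layer: hard sampling takes the argmax of thetas, soft sampling draws from softmax(thetas). The sketch below shows this on plain lists; the helper names and the theta values are invented, and `random.Random.choices` stands in for `np.random.choice`.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_architecture(thetas_per_layer, ops_names, hardsampling, rng):
    """Pick one operator name per layer, by argmax or by sampling."""
    arch = []
    for thetas in thetas_per_layer:
        if hardsampling:
            index = max(range(len(thetas)), key=thetas.__getitem__)
        else:
            index = rng.choices(range(len(thetas)), weights=softmax(thetas), k=1)[0]
        arch.append(ops_names[index])
    return arch


ops_names = ["ir_k3_e1", "ir_k5_e6", "skip"]
thetas_per_layer = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]]   # made-up values

hard_arch = sample_architecture(thetas_per_layer, ops_names, True, random.Random(0))
soft_arch = sample_architecture(thetas_per_layer, ops_names, False, random.Random(0))
```

Hard sampling is deterministic for a given set of thetas, whereas soft sampling can return a different architecture on every call.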
load
model.load_state_dict(torch.load(model_path))
writh_new_ARCH_to_fbnet_modeldef
MODEL_ARCH stores the model architectures.
Check whether the name already exists.
assert len(ops_names) == 22

if my_unique_name_for_ARCH in MODEL_ARCH:
    print("The specification with the name", my_unique_name_for_ARCH, "already written \
to the fbnet_building_blocks.fbnet_modeldef. Please, create a new name \
or delete the specification from fbnet_building_blocks.fbnet_modeldef (by hand)")
assert my_unique_name_for_ARCH not in MODEL_ARCH
ops_names is converted into the list of strings ops, which is then grouped by stage and joined into ops_lines.
### create text to insert
text_to_write = " \"" + my_unique_name_for_ARCH + "\": {\n\
\"block_op_type\": [\n"

ops = ["[\"" + str(op) + "\"], " for op in ops_names]
ops_lines = [ops[0], ops[1:5], ops[5:9], ops[9:13], ops[13:17], ops[17:21], ops[21]]
ops_lines = [''.join(line) for line in ops_lines]
text_to_write += ' ' + '\n '.join(ops_lines)
e = [(op_name[-1] if op_name[-2] == 'e' else '1') for op_name in ops_names]

text_to_write += "\n\
],\n\
\"block_cfg\": {\n\
\"first\": [16, 2],\n\
\"stages\": [\n\
[["+e[0]+", 16, 1, 1]], # stage 1\n\
[["+e[1]+", 24, 1, 2]], [["+e[2]+", 24, 1, 1]], \
[["+e[3]+", 24, 1, 1]], [["+e[4]+", 24, 1, 1]], # stage 2\n\
[["+e[5]+", 32, 1, 2]], [["+e[6]+", 32, 1, 1]], \
[["+e[7]+", 32, 1, 1]], [["+e[8]+", 32, 1, 1]], # stage 3\n\
[["+e[9]+", 64, 1, 2]], [["+e[10]+", 64, 1, 1]], \
[["+e[11]+", 64, 1, 1]], [["+e[12]+", 64, 1, 1]], # stage 4\n\
[["+e[13]+", 112, 1, 1]], [["+e[14]+", 112, 1, 1]], \
[["+e[15]+", 112, 1, 1]], [["+e[16]+", 112, 1, 1]], # stage 5\n\
[["+e[17]+", 184, 1, 2]], [["+e[18]+", 184, 1, 1]], \
[["+e[19]+", 184, 1, 1]], [["+e[20]+", 184, 1, 1]], # stage 6\n\
[["+e[21]+", 352, 1, 1]], # stage 7\n\
],\n\
\"backbone\": [num for num in range(23)],\n\
},\n\
},\n\
}\
"
### open file and find place to insert
with open('./fbnet_building_blocks/fbnet_modeldef.py') as f1:
    lines = f1.readlines()
end_of_MODEL_ARCH_id = next(i for i in reversed(range(len(lines))) if lines[i].strip() == '}')
text_to_write = lines[:end_of_MODEL_ARCH_id] + [text_to_write]
with open('./fbnet_building_blocks/fbnet_modeldef.py', 'w') as f2:
    f2.writelines(text_to_write)
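Two small mechanisms in writh_new_ARCH_to_fbnet_modeldef are easy to miss: the expansion ratio is read off the last character of an operator name, and the new entry is spliced in just before the last line that consists of a closing brace. The sketch below isolates both on toy data; the function names and the sample file content are hypothetical, and unlike the original, which puts the closing brace inside the inserted text, the sketch appends it as a separate line.

```python
def expansion_from_name(op_name):
    """'ir_k3_e6' -> '6'; any name whose second-to-last char is not 'e' -> '1'."""
    return op_name[-1] if op_name[-2] == "e" else "1"


def insert_before_last_closing_brace(lines, new_entry):
    """Drop everything from the last '}' line onward, add the entry, close again."""
    end_id = next(i for i in reversed(range(len(lines))) if lines[i].strip() == "}")
    return lines[:end_id] + [new_entry, "}\n"]


# Toy stand-in for fbnet_modeldef.py.
original_lines = ["MODEL_ARCH = {\n", '    "default": {},\n', "}\n"]
new_entry = '    "my_arch": {},\n'

updated_lines = insert_before_last_closing_brace(original_lines, new_entry)
expansions = [expansion_from_name(name) for name in ["ir_k3_e6", "ir_k5_s2", "skip"]]
```

The name-based rule falls back to '1' for names such as "ir_k3_e1_se", whose expansion digit is not the final character, which is consistent with the 9 candidate blocks but would not generalise to every entry in PRIMITIVES.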