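A drawback of BERT: its global self-attention mechanism is computationally expensive.
ConvBERT introduces a new span-based dynamic convolution.
The ConvBERT model is built by iteratively stacking mixed-attention blocks and grouped feed-forward modules.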
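To make the idea of a span-based dynamic convolution more concrete, here is a toy sketch in plain Python. It is a simplification under stated assumptions, not the ConvBERT implementation: the query is the token itself (no learned projection), the span-aware key is a plain mean over the local window (a stand-in for a learned convolution), and `span_dynamic_conv` and `w_f` are hypothetical names. What it does show is the core mechanism: each position generates its own softmax-normalized kernel from its local span and then uses that kernel to mix the neighbouring tokens.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def span_dynamic_conv(x, w_f, kernel_size):
    """Toy span-based dynamic convolution.

    x:           list of n token vectors, each a list of d floats
    w_f:         kernel_size x d matrix (hypothetical kernel generator)
    kernel_size: odd window width (the config below uses 9)
    Returns (outputs, kernels), one entry per token.
    """
    n, d = len(x), len(x[0])
    pad = kernel_size // 2
    zero = [0.0] * d

    def tok(i):
        # Zero padding outside the sequence.
        return x[i] if 0 <= i < n else zero

    outputs, kernels = [], []
    for i in range(n):
        span = [tok(i + j - pad) for j in range(kernel_size)]
        # Assumption: span-aware key = mean of the window.
        key = [sum(t[c] for t in span) / kernel_size for c in range(d)]
        # Assumption: query = the token itself; combine with the key element-wise.
        feat = [x[i][c] * key[c] for c in range(d)]
        # One logit per kernel position, then normalize into a kernel.
        logits = [sum(w_f[j][c] * feat[c] for c in range(d))
                  for j in range(kernel_size)]
        kernel = softmax(logits)
        # Apply the position-specific kernel to the window.
        out = [sum(kernel[j] * span[j][c] for j in range(kernel_size))
               for c in range(d)]
        outputs.append(out)
        kernels.append(kernel)
    return outputs, kernels


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
w_f = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.2]]
outputs, kernels = span_dynamic_conv(tokens, w_f, kernel_size=3)
```

Because the kernel only spans `kernel_size` neighbours, the cost per token does not grow with sequence length, which is the contrast with global attention that the notes above draw.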

# -*- coding: utf-8 -*-
"""Copy of “A quick example of ConvBERT”

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/drive/1wwyVpznJ-f7W2tMtv1cq8J2P4Jw6kvrf

# A quick example of Finetuning ConvBERT

### Download code, pre-trained model and data
"""

!git clone https://github.com/yitu-opensource/ConvBert.git
!gdown https://drive.google.com/uc?id=1qso7-JsjXmtUIMlJjli5jUVV_JJ35-Zi
!unzip convbert_models.zip
cd ConvBert/
!mkdir -p data/models
mv ../convbert_models/* data/models/
mv vocab.txt data/
!python3 download_glue_data.py
!cd glue_data && mv CoLA cola && mv MNLI mnli && mv MRPC mrpc && mv QNLI qnli && mv QQP qqp && mv RTE rte && mv SST-2 sst && mv STS-B sts && mv diagnostic/diagnostic.tsv mnli && mkdir -p /content/ConvBert/data/finetuning_data && mv * /content/ConvBert/data/finetuning_data
!pip install tensorflow-gpu==1.15
cd /content/ConvBert

"""### Finetune pre-trained model on CoLA"""

!python3 run_finetuning.py --data-dir data \
--model-name convbert_medium-small --hparams '{"model_size": "medium-small", "task_names": ["cola"]}'

from google.colab import drive
drive.mount('/content/drive')
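
The `--hparams` argument above is a JSON string, which is easy to get wrong when quoting by hand. As a small sketch, the command can be assembled in Python so the JSON and the shell quoting are generated rather than typed. The flag names and values are the ones used in the notebook cell; the helper `build_finetune_command` is a hypothetical name introduced only for illustration.

```python
import json
import shlex


def build_finetune_command(model_name, model_size, task_names, data_dir="data"):
    """Return a shell-safe run_finetuning.py command line (hypothetical helper)."""
    # Serialize the hyperparameters exactly once, as JSON.
    hparams = json.dumps({"model_size": model_size,
                          "task_names": list(task_names)})
    args = ["python3", "run_finetuning.py",
            "--data-dir", data_dir,
            "--model-name", model_name,
            "--hparams", hparams]
    # shlex.quote adds quotes only where the shell needs them.
    return " ".join(shlex.quote(a) for a in args)


cmd = build_finetune_command("convbert_medium-small", "medium-small", ["cola"])
print(cmd)
```

Switching to another GLUE task would then only mean changing the `task_names` list, assuming the task name matches one of the directories renamed in the download step.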

Results

Config: model=convbert_medium-small, trial 1/1

answerable_classifier True
answerable_uses_start_logits True
answerable_weight 0.5
beam_size 20
conv_kernel_size 9
conv_type sdconv
data_dir data
debug False
do_eval True
do_lower_case True
do_train True
doc_stride 128
double_unordered True
embedding_size 128
eval_batch_size 32
gcp_project None
head_ratio 2
init_checkpoint None
iterations_per_loop 1000
joint_prediction True
keep_all_models False
layerwise_lr_decay 0.8
learning_rate 0.0003
linear_groups 2
log_examples False
max_answer_length 30
max_query_length 64
max_seq_length 128
model_dir data/models/convbert_medium-small/finetuning_models/cola_model
model_hparam_overrides {}
model_name convbert_medium-small
model_size medium-small
n_best_size 20
n_writes_test 5
num_tpu_cores 1
num_train_epochs 3.0
num_trials 1
predict_batch_size 32
preprocessed_data_dir data/finetuning_tfrecords/cola_tfrecords
qa_eval_file <built-in method format of str object at 0x7f750ab5bab0>
qa_na_file <built-in method format of str object at 0x7f750be1f990>
qa_na_threshold -2.75
qa_preds_file <built-in method format of str object at 0x7f750be1f828>
raw_data_dir <built-in method format of str object at 0x7f750ab80620>
results_pkl data/models/convbert_medium-small/results/cola_results.pkl
results_txt data/models/convbert_medium-small/results/cola_results.txt
save_checkpoints_steps 1000000
task_names ['cola']
test_predictions <built-in method format of str object at 0x7f750ab4abb0>
tpu_job_name None
tpu_name None
tpu_zone None
train_batch_size 32
use_tfrecords_if_existing True
use_tpu False
vocab_file data/vocab.txt
vocab_size 30522
warmup_proportion 0.1
weight_decay_rate 0.01
write_test_outputs True
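
The `learning_rate 0.0003` and `layerwise_lr_decay 0.8` entries in the dump interact: with layer-wise decay, layers closer to the input are trained with smaller learning rates. The sketch below shows one common formulation, where the rate shrinks by the decay factor for every layer below the task head. It is an assumption for illustration, not taken from the repository: the exact exponent offsets used by `run_finetuning.py` may differ, `layerwise_learning_rates` is a hypothetical name, and the layer count of 12 is assumed rather than read from the config.

```python
def layerwise_learning_rates(base_lr, decay, num_layers):
    """Map each block to its learning rate under geometric layer-wise decay."""
    rates = {"task_head": base_lr}  # top of the network: full learning rate
    for layer in range(num_layers, 0, -1):
        # Each step down the stack multiplies the rate by `decay` once more.
        rates[f"layer_{layer}"] = base_lr * decay ** (num_layers - layer + 1)
    rates["embeddings"] = base_lr * decay ** (num_layers + 1)
    return rates


# num_layers=12 is an assumed value, used only to make the example concrete.
rates = layerwise_learning_rates(base_lr=0.0003, decay=0.8, num_layers=12)
for name, lr in rates.items():
    print(f"{name:>10}: {lr:.2e}")
```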

Loading dataset cola_train
Existing tfrecords not found so creating
Writing example 0 of 8551
Writing example 2000 of 8551
Writing example 4000 of 8551
Writing example 6000 of 8551
Writing example 8000 of 8551
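
The training log that follows reports 804 steps. A quick piece of arithmetic shows how that number relates to the 8551 training examples, `train_batch_size 32` and `num_train_epochs 3.0` from the config. Rounding the number of batches per epoch up reproduces 804; this is one reading that is consistent with the log, not a statement about how the script computes it internally. The function name `training_steps` is hypothetical.

```python
import math


def training_steps(num_examples, batch_size, num_epochs):
    """Steps when the last partial batch of each epoch is rounded up."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return int(batches_per_epoch * num_epochs)


steps = training_steps(num_examples=8551, batch_size=32, num_epochs=3.0)
warmup_steps = int(steps * 0.1)  # warmup_proportion 0.1 from the config

print(steps)         # 804
print(warmup_steps)  # 80
```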

Start training: model=convbert_medium-small, trial 1/1

Training for 804 steps
Building model…
Building complete
Model size: 17476K
2020-11-22 06:49:37.660643: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-11-22 06:49:37.665460: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2020-11-22 06:49:37.665678: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2bb5d40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-22 06:49:37.665719: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-11-22 06:49:37.669099: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-11-22 06:49:37.850764: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:49:37.851533: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2bb52c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-22 06:49:37.851564: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-11-22 06:49:37.852504: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:49:37.853099: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2020-11-22 06:49:37.863347: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-11-22 06:49:38.072328: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-11-22 06:49:38.164226: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-11-22 06:49:38.178642: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-11-22 06:49:38.407491: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-11-22 06:49:38.564427: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-11-22 06:49:39.064241: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-22 06:49:39.064452: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:49:39.065201: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:49:39.065745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-11-22 06:49:39.065877: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-11-22 06:49:39.067334: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-22 06:49:39.067366: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-11-22 06:49:39.067378: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-11-22 06:49:39.067509: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:49:39.068136: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:49:39.068696: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2020-11-22 06:49:39.068754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
2020-11-22 06:50:00.245076: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-11-22 06:50:01.647667: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
10/804 = 1.2%, SPS: 0.4, ELAP: 24, ETA: 32:01 - loss: 34.5120
20/804 = 2.5%, SPS: 0.7, ELAP: 28, ETA: 18:08 - loss: 25.7527
30/804 = 3.7%, SPS: 1.0, ELAP: 31, ETA: 13:29 - loss: 30.5804
40/804 = 5.0%, SPS: 1.1, ELAP: 35, ETA: 11:07 - loss: 20.3636
50/804 = 6.2%, SPS: 1.3, ELAP: 39, ETA: 9:41 - loss: 23.4167
60/804 = 7.5%, SPS: 1.4, ELAP: 42, ETA: 8:43 - loss: 17.2042
70/804 = 8.7%, SPS: 1.5, ELAP: 46, ETA: 8:00 - loss: 30.5870
80/804 = 10.0%, SPS: 1.6, ELAP: 49, ETA: 7:27 - loss: 40.2094
90/804 = 11.2%, SPS: 1.7, ELAP: 53, ETA: 7:01 - loss: 26.7452
100/804 = 12.4%, SPS: 1.8, ELAP: 57, ETA: 6:39 - loss: 16.7169
110/804 = 13.7%, SPS: 1.8, ELAP: 1:00, ETA: 6:21 - loss: 27.0444
120/804 = 14.9%, SPS: 1.9, ELAP: 1:04, ETA: 6:05 - loss: 19.5855
130/804 = 16.2%, SPS: 1.9, ELAP: 1:08, ETA: 5:51 - loss: 16.7763
140/804 = 17.4%, SPS: 2.0, ELAP: 1:11, ETA: 5:38 - loss: 17.4109
150/804 = 18.7%, SPS: 2.0, ELAP: 1:15, ETA: 5:27 - loss: 18.4531
160/804 = 19.9%, SPS: 2.0, ELAP: 1:19, ETA: 5:17 - loss: 18.6332
170/804 = 21.1%, SPS: 2.1, ELAP: 1:22, ETA: 5:07 - loss: 19.4752
180/804 = 22.4%, SPS: 2.1, ELAP: 1:26, ETA: 4:59 - loss: 17.0534
190/804 = 23.6%, SPS: 2.1, ELAP: 1:30, ETA: 4:50 - loss: 19.5509
200/804 = 24.9%, SPS: 2.1, ELAP: 1:34, ETA: 4:43 - loss: 20.9011
210/804 = 26.1%, SPS: 2.2, ELAP: 1:37, ETA: 4:36 - loss: 22.9459
220/804 = 27.4%, SPS: 2.2, ELAP: 1:41, ETA: 4:29 - loss: 25.4820
230/804 = 28.6%, SPS: 2.2, ELAP: 1:45, ETA: 4:22 - loss: 17.2022
240/804 = 29.9%, SPS: 2.2, ELAP: 1:49, ETA: 4:16 - loss: 20.0078
250/804 = 31.1%, SPS: 2.2, ELAP: 1:53, ETA: 4:09 - loss: 22.4817
260/804 = 32.3%, SPS: 2.2, ELAP: 1:56, ETA: 4:03 - loss: 20.0815
270/804 = 33.6%, SPS: 2.2, ELAP: 2:00, ETA: 3:57 - loss: 14.9433
280/804 = 34.8%, SPS: 2.3, ELAP: 2:04, ETA: 3:52 - loss: 16.6641
290/804 = 36.1%, SPS: 2.3, ELAP: 2:08, ETA: 3:46 - loss: 23.8966
300/804 = 37.3%, SPS: 2.3, ELAP: 2:11, ETA: 3:41 - loss: 22.4076
310/804 = 38.6%, SPS: 2.3, ELAP: 2:15, ETA: 3:35 - loss: 21.9326
320/804 = 39.8%, SPS: 2.3, ELAP: 2:19, ETA: 3:30 - loss: 18.5909
330/804 = 41.0%, SPS: 2.3, ELAP: 2:22, ETA: 3:25 - loss: 18.6677
340/804 = 42.3%, SPS: 2.3, ELAP: 2:26, ETA: 3:20 - loss: 17.6923
350/804 = 43.5%, SPS: 2.3, ELAP: 2:30, ETA: 3:14 - loss: 19.8282
360/804 = 44.8%, SPS: 2.3, ELAP: 2:34, ETA: 3:09 - loss: 17.6203
370/804 = 46.0%, SPS: 2.4, ELAP: 2:37, ETA: 3:05 - loss: 14.6527
380/804 = 47.3%, SPS: 2.4, ELAP: 2:41, ETA: 3:00 - loss: 20.6327
390/804 = 48.5%, SPS: 2.4, ELAP: 2:45, ETA: 2:55 - loss: 24.8328
400/804 = 49.8%, SPS: 2.4, ELAP: 2:49, ETA: 2:50 - loss: 19.6345
410/804 = 51.0%, SPS: 2.4, ELAP: 2:52, ETA: 2:46 - loss: 21.9456
420/804 = 52.2%, SPS: 2.4, ELAP: 2:56, ETA: 2:41 - loss: 19.7812
430/804 = 53.5%, SPS: 2.4, ELAP: 3:00, ETA: 2:36 - loss: 15.1743
440/804 = 54.7%, SPS: 2.4, ELAP: 3:03, ETA: 2:32 - loss: 20.7318
450/804 = 56.0%, SPS: 2.4, ELAP: 3:07, ETA: 2:27 - loss: 15.0698
460/804 = 57.2%, SPS: 2.4, ELAP: 3:11, ETA: 2:23 - loss: 20.7934
470/804 = 58.5%, SPS: 2.4, ELAP: 3:15, ETA: 2:18 - loss: 18.8080
480/804 = 59.7%, SPS: 2.4, ELAP: 3:18, ETA: 2:14 - loss: 19.6939
490/804 = 60.9%, SPS: 2.4, ELAP: 3:22, ETA: 2:10 - loss: 21.4402
500/804 = 62.2%, SPS: 2.4, ELAP: 3:26, ETA: 2:05 - loss: 20.7852
510/804 = 63.4%, SPS: 2.4, ELAP: 3:30, ETA: 2:01 - loss: 18.5615
520/804 = 64.7%, SPS: 2.4, ELAP: 3:33, ETA: 1:57 - loss: 21.1930
530/804 = 65.9%, SPS: 2.4, ELAP: 3:37, ETA: 1:52 - loss: 19.7613
540/804 = 67.2%, SPS: 2.4, ELAP: 3:41, ETA: 1:48 - loss: 19.3402
550/804 = 68.4%, SPS: 2.4, ELAP: 3:45, ETA: 1:44 - loss: 23.7254
560/804 = 69.7%, SPS: 2.5, ELAP: 3:48, ETA: 1:39 - loss: 16.6931
570/804 = 70.9%, SPS: 2.5, ELAP: 3:52, ETA: 1:35 - loss: 17.7235
580/804 = 72.1%, SPS: 2.5, ELAP: 3:56, ETA: 1:31 - loss: 18.4697
590/804 = 73.4%, SPS: 2.5, ELAP: 4:00, ETA: 1:27 - loss: 21.6436
600/804 = 74.6%, SPS: 2.5, ELAP: 4:03, ETA: 1:23 - loss: 20.9059
610/804 = 75.9%, SPS: 2.5, ELAP: 4:07, ETA: 1:19 - loss: 16.8878
620/804 = 77.1%, SPS: 2.5, ELAP: 4:11, ETA: 1:14 - loss: 16.3921
630/804 = 78.4%, SPS: 2.5, ELAP: 4:14, ETA: 1:10 - loss: 17.7633
640/804 = 79.6%, SPS: 2.5, ELAP: 4:18, ETA: 1:06 - loss: 17.7392
650/804 = 80.8%, SPS: 2.5, ELAP: 4:22, ETA: 1:02 - loss: 21.0208
660/804 = 82.1%, SPS: 2.5, ELAP: 4:26, ETA: 58 - loss: 21.4841
670/804 = 83.3%, SPS: 2.5, ELAP: 4:29, ETA: 54 - loss: 19.7891
680/804 = 84.6%, SPS: 2.5, ELAP: 4:33, ETA: 50 - loss: 19.5995
690/804 = 85.8%, SPS: 2.5, ELAP: 4:37, ETA: 46 - loss: 18.4344
700/804 = 87.1%, SPS: 2.5, ELAP: 4:41, ETA: 42 - loss: 16.1493
710/804 = 88.3%, SPS: 2.5, ELAP: 4:44, ETA: 38 - loss: 21.7625
720/804 = 89.6%, SPS: 2.5, ELAP: 4:48, ETA: 34 - loss: 19.7536
730/804 = 90.8%, SPS: 2.5, ELAP: 4:52, ETA: 30 - loss: 18.3410
740/804 = 92.0%, SPS: 2.5, ELAP: 4:56, ETA: 26 - loss: 19.0112
750/804 = 93.3%, SPS: 2.5, ELAP: 4:59, ETA: 22 - loss: 22.4744
760/804 = 94.5%, SPS: 2.5, ELAP: 5:03, ETA: 18 - loss: 19.3824
770/804 = 95.8%, SPS: 2.5, ELAP: 5:07, ETA: 14 - loss: 21.4590
780/804 = 97.0%, SPS: 2.5, ELAP: 5:10, ETA: 10 - loss: 18.8766
790/804 = 98.3%, SPS: 2.5, ELAP: 5:14, ETA: 6 - loss: 17.2203
800/804 = 99.5%, SPS: 2.5, ELAP: 5:18, ETA: 2 - loss: 24.9802
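
Progress lines like the ones above are regular enough to parse, which makes it easier to judge whether the loss is actually trending down. The sketch below extracts the step and loss from a few lines copied from this log; the regular expression is written against the format seen here and is an assumption about the logger's output, and `parse_progress` is a hypothetical helper.

```python
import re

# Matches lines such as:
# "10/804 = 1.2%, SPS: 0.4, ELAP: 24, ETA: 32:01 - loss: 34.5120"
PROGRESS = re.compile(
    r"^(\d+)/(\d+) = [\d.]+%, SPS: [\d.]+, ELAP: [\d:]+, ETA: [\d:]+"
    r" - loss: ([\d.]+)$"
)


def parse_progress(lines):
    """Return a list of {'step', 'total', 'loss'} dicts for matching lines."""
    records = []
    for line in lines:
        match = PROGRESS.match(line.strip())
        if match:
            records.append({
                "step": int(match.group(1)),
                "total": int(match.group(2)),
                "loss": float(match.group(3)),
            })
    return records


sample = [
    "10/804 = 1.2%, SPS: 0.4, ELAP: 24, ETA: 32:01 - loss: 34.5120",
    "20/804 = 2.5%, SPS: 0.7, ELAP: 28, ETA: 18:08 - loss: 25.7527",
    "790/804 = 98.3%, SPS: 2.5, ELAP: 5:14, ETA: 6 - loss: 17.2203",
    "800/804 = 99.5%, SPS: 2.5, ELAP: 5:18, ETA: 2 - loss: 24.9802",
    "Building complete",  # non-progress lines are skipped
]

records = parse_progress(sample)
mean_loss = sum(r["loss"] for r in records) / len(records)
print(len(records), round(mean_loss, 4))
```

Run over the full log, the same parser would allow comparing the average loss of the first and last hundred steps instead of eyeballing individual values.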

================================================================================
Run dev set evaluation: model=convbert_medium-small, trial 1/1

Evaluating cola
Loading dataset cola_dev
Existing tfrecords not found so creating
Writing example 0 of 1043
Building model…
Building complete
Model size: 17476K
2020-11-22 06:55:18.078277: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:55:18.078809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2020-11-22 06:55:18.078919: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-11-22 06:55:18.078945: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-11-22 06:55:18.078967: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-11-22 06:55:18.078988: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-11-22 06:55:18.079026: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-11-22 06:55:18.079051: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-11-22 06:55:18.079071: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-22 06:55:18.079150: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:55:18.079662: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:55:18.080150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-11-22 06:55:18.080201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-22 06:55:18.080216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-11-22 06:55:18.080227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-11-22 06:55:18.080355: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:55:18.080964: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:55:18.081449: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py:900: RuntimeWarning: invalid value encountered in double_scalars
mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
cola: mcc: 0.00 - loss: 0.62
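
The `RuntimeWarning` together with `mcc: 0.00` is what one would see if the classifier predicted the same label for every dev example: the Matthews correlation coefficient then has a zero denominator. That is one plausible explanation, not something the log proves. The sketch below is a plain-Python MCC for binary labels (not scikit-learn's implementation) that makes the degenerate case explicit and falls back to 0.0, which matches the value printed above.

```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """MCC for 0/1 labels; returns 0.0 when the denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Happens when predictions (or labels) contain only one class.
        return 0.0
    return (tp * tn - fp * fn) / denom


labels = [1, 1, 0, 1, 0, 1]
constant_predictions = [1] * len(labels)  # model always answers "acceptable"

print(matthews_corrcoef_binary(labels, constant_predictions))  # 0.0
print(matthews_corrcoef_binary(labels, labels))                # 1.0
```

Checking the distribution of predicted labels on the dev set is a cheap way to confirm or rule out this case.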

Writing results to data/models/convbert_medium-small/results/cola_results.txt

Running on the test set and writing the predictions: model=convbert_medium-small, trial 1/1

Writing out predictions for [Task(cola)] test
Loading dataset cola_test
Existing tfrecords not found so creating
Writing example 0 of 1063
Building model…
Building complete
Model size: 17476K
2020-11-22 06:55:28.103373: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:55:28.103945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2020-11-22 06:55:28.104047: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-11-22 06:55:28.104081: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-11-22 06:55:28.104103: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-11-22 06:55:28.104127: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-11-22 06:55:28.104150: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-11-22 06:55:28.104169: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-11-22 06:55:28.104189: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-11-22 06:55:28.104268: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:55:28.104762: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:55:28.105196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-11-22 06:55:28.105238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-22 06:55:28.105253: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-11-22 06:55:28.105263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-11-22 06:55:28.105365: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:55:28.105869: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-22 06:55:28.106338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
Pickling predictions for 1063 cola examples (test)

ConvBERT: Improving BERT with Span-based Dynamic Convolution
