Author | Aymeric Damien    Editor | 奇予纪    Produced by | 磐创AI Team

Word2Vec (Word Embedding)

This example implements the Word2Vec algorithm with TensorFlow 2.0 to compute vector representations of words, training on a small subset of Wikipedia articles.

For more details, see the paper: Mikolov, Tomas et al. "Efficient Estimation of Word Representations in Vector Space.", 2013[1]
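In brief, the skip-gram model from that paper learns a vector for each word by maximizing the average log-probability of the context words surrounding each center word:

$$\frac{1}{T}\sum_{t=1}^{T}\;\sum_{-c \le j \le c,\, j \neq 0} \log p(w_{t+j} \mid w_t)$$

where $T$ is the corpus length and $c$ is the context window size (skip_window below). Because evaluating the full softmax over a 50,000-word vocabulary at every step is expensive, the code below approximates it with noise-contrastive estimation (NCE), drawing only a few negative samples per update.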

from __future__ import division, print_function, absolute_import

import collections
import os
import random
import urllib.request
import zipfile

import numpy as np
import tensorflow as tf

# Training parameters.
learning_rate = 0.1
batch_size = 128
num_steps = 3000000
display_step = 10000
eval_step = 200000

# Evaluation parameters.
eval_words = ['five', 'of', 'going', 'hardware', 'american', 'britain']

# Word2Vec parameters.
embedding_size = 200         # Dimension of the embedding vector.
max_vocabulary_size = 50000  # Total number of distinct words in the vocabulary.
min_occurrence = 10          # Remove all words that appear fewer than n times.
skip_window = 3              # How many words to consider on each side (left and right).
num_skips = 2                # How many times to reuse an input to generate a label.
num_sampled = 64             # Number of negative examples to sample.
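To make skip_window concrete, here is a minimal, purely illustrative sketch (the toy sentence below is not part of the dataset) of the (center, context) pairs a sliding window produces; num_skips then controls how many of those context positions are actually sampled per center word:

# Illustrative only: enumerate skip-gram (center, context) pairs for a toy sentence.
sentence = ['the', 'quick', 'brown', 'fox', 'jumps']
window = 1  # plays the role of skip_window
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            print((center, sentence[j]))  # e.g. ('quick', 'the'), ('quick', 'brown')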
# Download a small collection of Wikipedia articles.
url = 'http://mattmahoney.net/dc/text8.zip'
data_path = 'text8.zip'
if not os.path.exists(data_path):
    print("Downloading the dataset... (It may take some time)")
    filename, _ = urllib.request.urlretrieve(url, data_path)
    print("Done!")

# Unzip the dataset file. The text has already been processed.
with zipfile.ZipFile(data_path) as f:
    # Decode bytes to str so the vocabulary keys match the str entries in eval_words.
    text_words = f.read(f.namelist()[0]).decode('utf-8').lower().split()
# Build the dictionary and replace rare words with the UNK token.
count = [('UNK', -1)]
# Retrieve the most common words.
count.extend(collections.Counter(text_words).most_common(max_vocabulary_size - 1))
# Remove samples that occur fewer than 'min_occurrence' times.
for i in range(len(count) - 1, -1, -1):
    if count[i][1] < min_occurrence:
        count.pop(i)
    else:
        # The collection is ordered, so stop once a count reaches 'min_occurrence'.
        break
# Compute the vocabulary size.
vocabulary_size = len(count)
# Assign an id to each word.
word2id = dict()
for i, (word, _) in enumerate(count):
    word2id[word] = i

data = list()
unk_count = 0
for word in text_words:
    # Retrieve a word id, or assign it index 0 ('UNK') if it is not in the dictionary.
    index = word2id.get(word, 0)
    if index == 0:
        unk_count += 1
    data.append(index)
count[0] = ('UNK', unk_count)
id2word = dict(zip(word2id.values(), word2id.keys()))

print("Words count:", len(text_words))print("Unique words:", len(set(text_words)))print("Vocabulary size:", vocabulary_size)print("Most common words:", count[:10])

Output:

Words count: 17005207
Unique words: 253854
Vocabulary size: 47135
Most common words: [('UNK', 444176), ('the', 1061396), ('of', 593677), ('and', 416629), ('one', 411764), ('in', 372201), ('a', 325873), ('to', 316376), ('zero', 264975), ('nine', 250430)]
data_index = 0

# Generate a training batch for the skip-gram model.
def next_batch(batch_size, num_skips, skip_window):
    global data_index
    assert batch_size % num_skips == 0
    assert num_skips <= 2 * skip_window
    batch = np.ndarray(shape=(batch_size), dtype=np.int32)
    labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
    # Get the window size (words left and right of the current word + the word itself).
    span = 2 * skip_window + 1
    buffer = collections.deque(maxlen=span)
    if data_index + span > len(data):
        data_index = 0
    buffer.extend(data[data_index:data_index + span])
    data_index += span
    for i in range(batch_size // num_skips):
        context_words = [w for w in range(span) if w != skip_window]
        words_to_use = random.sample(context_words, num_skips)
        for j, context_word in enumerate(words_to_use):
            batch[i * num_skips + j] = buffer[skip_window]
            labels[i * num_skips + j, 0] = buffer[context_word]
        if data_index == len(data):
            buffer.extend(data[0:span])
            data_index = span
        else:
            buffer.append(data[data_index])
            data_index += 1
    # Backtrack a little to avoid skipping words at the end of a batch.
    data_index = (data_index + len(data) - span) % len(data)
    return batch, labels
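As a quick sanity check (assuming the preprocessing steps above have been run), you can draw one small batch and decode it back to words; the batch size of 8 here is arbitrary:

# Illustrative only: inspect a few (center, context) training pairs.
batch_x, batch_y = next_batch(batch_size=8, num_skips=2, skip_window=3)
for x, y in zip(batch_x, batch_y):
    print(id2word[x], '->', id2word[y[0]])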
# Ensure the following ops and variables are assigned on the CPU
# (some ops are not compatible with the GPU).
with tf.device('/cpu:0'):
    # Create the embedding variable (each row represents a word embedding vector).
    embedding = tf.Variable(tf.random.normal([vocabulary_size, embedding_size]))
    # Construct the variables for the NCE loss.
    nce_weights = tf.Variable(tf.random.normal([vocabulary_size, embedding_size]))
    nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

def get_embedding(x):
    with tf.device('/cpu:0'):
        # Look up the corresponding embedding vector for each sample in x.
        x_embed = tf.nn.embedding_lookup(embedding, x)
        return x_embed

def nce_loss(x_embed, y):
    with tf.device('/cpu:0'):
        # Compute the average NCE loss for the batch.
        y = tf.cast(y, tf.int64)
        loss = tf.reduce_mean(
            tf.nn.nce_loss(weights=nce_weights,
                           biases=nce_biases,
                           labels=y,
                           inputs=x_embed,
                           num_sampled=num_sampled,
                           num_classes=vocabulary_size))
        return loss

# Evaluation.
def evaluate(x_embed):
    with tf.device('/cpu:0'):
        # Compute the cosine similarity between the input embedding and every embedding vector.
        x_embed = tf.cast(x_embed, tf.float32)
        x_embed_norm = x_embed / tf.sqrt(tf.reduce_sum(tf.square(x_embed)))
        embedding_norm = embedding / tf.sqrt(tf.reduce_sum(tf.square(embedding), 1, keepdims=True))
        cosine_sim_op = tf.matmul(x_embed_norm, embedding_norm, transpose_b=True)
        return cosine_sim_op
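After training, evaluate can also be used to query the neighbors of any single word, not just the fixed eval_words list. A minimal sketch, assuming the variables above exist ('computer' is just an example query):

# Illustrative only: nearest neighbors of one word by cosine similarity.
query = np.array([word2id['computer']])
sim = evaluate(get_embedding(query)).numpy()
nearest = (-sim[0, :]).argsort()[1:9]  # skip index 0: the word itself
print([id2word[i] for i in nearest])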

# Define the optimizer.
optimizer = tf.optimizers.SGD(learning_rate)
# Optimization process.
def run_optimization(x, y):
    with tf.device('/cpu:0'):
        # Wrap the computation inside a GradientTape for automatic differentiation.
        with tf.GradientTape() as g:
            emb = get_embedding(x)
            loss = nce_loss(emb, y)

        # Compute gradients.
        gradients = g.gradient(loss, [embedding, nce_weights, nce_biases])

        # Update the variables following the gradients.
        optimizer.apply_gradients(zip(gradients, [embedding, nce_weights, nce_biases]))
# Words for testing.
x_test = np.array([word2id[w] for w in eval_words])

# Run training for the given number of steps.
for step in range(1, num_steps + 1):
    batch_x, batch_y = next_batch(batch_size, num_skips, skip_window)
    run_optimization(batch_x, batch_y)

    if step % display_step == 0 or step == 1:
        loss = nce_loss(get_embedding(batch_x), batch_y)
        print("step: %i, loss: %f" % (step, loss))

    # Evaluation.
    if step % eval_step == 0 or step == 1:
        print("Evaluation...")
        sim = evaluate(get_embedding(x_test)).numpy()
        for i in range(len(eval_words)):
            top_k = 8  # Number of nearest neighbors.
            nearest = (-sim[i, :]).argsort()[1:top_k + 1]
            log_str = '"%s" nearest neighbors:' % eval_words[i]
            for k in range(top_k):
                log_str = '%s %s,' % (log_str, id2word[nearest[k]])
            print(log_str)
step: 1, loss: 504.444214
Evaluation...
"five" nearest neighbors: censure, stricken, anglicanism, stick, streetcars, shrines, horrified, sparkle,
"of" nearest neighbors: jolly, weary, clinicians, kerouac, economist, owls, safe, playoff,
"going" nearest neighbors: filament, platforms, moderately, micheal, despotic, krag, disclosed, your,
"hardware" nearest neighbors: occupants, paraffin, vera, reorganized, rename, declares, prima, condoned,
"american" nearest neighbors: portfolio, rhein, aalto, angle, lifeson, tucker, sexton, dench,
"britain" nearest neighbors: indivisible, disbelief, scripture, pepsi, scriptores, sighting, napalm, strike,
step: 10000, loss: 117.166962
step: 20000, loss: 65.478333
step: 30000, loss: 46.580460
...
step: 200000, loss: 19.431549
Evaluation...
"five" nearest neighbors: three, four, eight, six, two, seven, nine, zero,
"of" nearest neighbors: the, a, and, first, with, on, but, from,
"going" nearest neighbors: have, more, used, out, be, with, on, however,
"hardware" nearest neighbors: be, known, system, apollo, and, a, such, used,
"american" nearest neighbors: UNK, and, from, s, at, in, after, about,
"britain" nearest neighbors: of, and, many, the, as, used, but, such,
...
step: 2990000, loss: 5.613834
step: 3000000, loss: 5.137326
Evaluation...
"five" nearest neighbors: four, three, six, seven, eight, two, zero, one,
"of" nearest neighbors: the, including, and, with, in, its, includes, within,
"going" nearest neighbors: how, they, when, them, make, always, your, though,
"hardware" nearest neighbors: computer, systems, network, program, physical, design, technology, software,
"american" nearest neighbors: canadian, english, australian, british, german, film, italian, author,
"britain" nearest neighbors: europe, england, china, throughout, india, france, great, british,

(Training log truncated: the loss falls from ~504 at step 1 to ~5 by step 3,000,000, and the nearest neighbors shift from random words to semantically related ones.)
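Once training finishes, the learned vectors live in the embedding variable. A minimal sketch of exporting them for downstream use (the file name text8_vectors.npy is arbitrary):

# Illustrative only: export the trained embedding matrix.
vectors = embedding.numpy()  # shape: (vocabulary_size, embedding_size)
np.save('text8_vectors.npy', vectors)
# Reload later with: vectors = np.load('text8_vectors.npy')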

[1]: https://arxiv.org/pdf/1301.3781.pdf

