NMT with Attention


1. Data Preparation

termcolor.colored is used to color the output and make it stand out, e.g. colored(f"tokenize('hello'): ", 'green').

from termcolor import colored
import random
import numpy as np
import trax
from trax import layers as tl
from trax.fastmath import numpy as fastnp
from trax.supervised import training

!pip list | grep trax

# Generator function for the train set
train_stream_fn = trax.data.TFDS('opus/medical',
                                 data_dir='./data/',
                                 keys=('en', 'de'),
                                 eval_holdout_size=0.01,  # 1% for eval
                                 train=True)

# Generator function for the eval set
eval_stream_fn = trax.data.TFDS('opus/medical',
                                data_dir='./data/',
                                keys=('en', 'de'),
                                eval_holdout_size=0.01,  # 1% for eval
                                train=False)
# Tokenize
# Global variables for the filename and directory of the vocabulary file
VOCAB_FILE = 'ende_32k.subword'
VOCAB_DIR = 'data/'

# Instantiate the raw data streams from the generator functions above
train_stream = train_stream_fn()
eval_stream = eval_stream_fn()

# Tokenize the dataset.
tokenized_train_stream = trax.data.Tokenize(vocab_file=VOCAB_FILE, vocab_dir=VOCAB_DIR)(train_stream)
tokenized_eval_stream = trax.data.Tokenize(vocab_file=VOCAB_FILE, vocab_dir=VOCAB_DIR)(eval_stream)
# Append EOS to each sentence
EOS = 1

# generator helper function to append EOS to each sentence
def append_eos(stream):
    for (inputs, targets) in stream:
        inputs_with_eos = list(inputs) + [EOS]
        targets_with_eos = list(targets) + [EOS]
        yield np.array(inputs_with_eos), np.array(targets_with_eos)

# append EOS to the train data
tokenized_train_stream = append_eos(tokenized_train_stream)
# append EOS to the eval data
tokenized_eval_stream = append_eos(tokenized_eval_stream)

# Filter out sentences that are too long so we don't run out of memory.
# length_keys=[0, 1] means we filter both English and German sentences, so
# both must be no longer than 256 tokens for training / 512 for eval.
filtered_train_stream = trax.data.FilterByLength(max_length=256, length_keys=[0, 1])(tokenized_train_stream)
filtered_eval_stream = trax.data.FilterByLength(max_length=512, length_keys=[0, 1])(tokenized_eval_stream)

# print a sample input-target pair of tokenized sentences
train_input, train_target = next(filtered_train_stream)
# Helper functions: tokenize & detokenize
def tokenize(input_str, vocab_file=None, vocab_dir=None):
    """Encodes a string to an array of integers

    Args:
        input_str (str): human-readable string to encode
        vocab_file (str): filename of the vocabulary text file
        vocab_dir (str): path to the vocabulary file

    Returns:
        numpy.ndarray: tokenized version of the input string
    """
    # Set the encoding of the "end of sentence" as 1
    EOS = 1

    # Use the trax.data.tokenize method. It takes streams and returns streams,
    # so we get around it by making a 1-element stream with `iter`.
    inputs = next(trax.data.tokenize(iter([input_str]),
                                     vocab_file=vocab_file, vocab_dir=vocab_dir))

    # Mark the end of the sentence with EOS
    inputs = list(inputs) + [EOS]

    # Add the batch dimension to the front of the shape
    batch_inputs = np.reshape(np.array(inputs), [1, -1])

    return batch_inputs


def detokenize(integers, vocab_file=None, vocab_dir=None):
    """Decodes an array of integers to a human readable string

    Args:
        integers (numpy.ndarray): array of integers to decode
        vocab_file (str): filename of the vocabulary text file
        vocab_dir (str): path to the vocabulary file

    Returns:
        str: the decoded sentence.
    """
    # Remove the dimensions of size 1
    integers = list(np.squeeze(integers))

    # Set the encoding of the "end of sentence" as 1
    EOS = 1

    # Remove the EOS to decode only the original tokens
    if EOS in integers:
        integers = integers[:integers.index(EOS)]

    return trax.data.detokenize(integers, vocab_file=vocab_file, vocab_dir=vocab_dir)
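A quick usage check of the two helpers, echoing the colored() example mentioned earlier (a sketch; it assumes the vocabulary file is already in data/):

sample_tokens = tokenize('hello', vocab_file=VOCAB_FILE, vocab_dir=VOCAB_DIR)
print(colored("tokenize('hello'): ", 'green'), sample_tokens)
print(colored("detokenize(sample_tokens): ", 'green'),
      detokenize(sample_tokens, vocab_file=VOCAB_FILE, vocab_dir=VOCAB_DIR))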
# Build buckets of different lengths to save memory.
# Buckets are defined in terms of boundaries and batch sizes.
# batch_sizes[i] determines the batch size for items with length < boundaries[i].
# So below, we'll take a batch of 256 sentences of length < 8, 128 if the length is
# between 8 and 16, and so on -- and only 2 if the length is over 512.
boundaries =  [8,   16,  32, 64, 128, 256, 512]
batch_sizes = [256, 128, 64, 32, 16,    8,   4,  2]

# Create the generators.
train_batch_stream = trax.data.BucketByLength(
    boundaries, batch_sizes,
    length_keys=[0, 1]  # As before: count inputs and targets to length.
)(filtered_train_stream)

eval_batch_stream = trax.data.BucketByLength(
    boundaries, batch_sizes,
    length_keys=[0, 1]  # As before: count inputs and targets to length.
)(filtered_eval_stream)

# Add masking for the padding (0s).
train_batch_stream = trax.data.AddLossWeights(id_to_mask=0)(train_batch_stream)
eval_batch_stream = trax.data.AddLossWeights(id_to_mask=0)(eval_batch_stream)
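A quick sanity check of the pipeline (a sketch; the loss weights added by AddLossWeights should arrive as a third element of each batch):

input_batch, target_batch, mask_batch = next(train_batch_stream)
print(colored('input_batch shape: ', 'green'), input_batch.shape)
print(colored('target_batch shape: ', 'green'), target_batch.shape)
print(colored('sample input: ', 'green'),
      detokenize(input_batch[0], vocab_file=VOCAB_FILE, vocab_dir=VOCAB_DIR))
print(colored('sample target: ', 'green'),
      detokenize(target_batch[0], vocab_file=VOCAB_FILE, vocab_dir=VOCAB_DIR))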

2. Model

def input_encoder_fn(input_vocab_size, d_model, n_encoder_layers):
    """Input encoder runs on the input sentence and creates
    activations that will be the keys and values for attention.

    Args:
        input_vocab_size: int: vocab size of the input
        d_model: int: depth of embedding (n_units in the LSTM cell)
        n_encoder_layers: int: number of LSTM layers in the encoder

    Returns:
        tl.Serial: The input encoder
    """
    # create a serial network
    input_encoder = tl.Serial(
        # an embedding layer to convert tokens to vectors
        tl.Embedding(vocab_size=input_vocab_size, d_feature=d_model),

        # feed the embeddings to a stack of n_encoder_layers LSTM layers
        [tl.LSTM(n_units=d_model) for _ in range(n_encoder_layers)]
    )

    return input_encoder
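A small shape check for the encoder (a sketch with toy sizes; Trax layers must be initialized with an input signature before being called):

enc = input_encoder_fn(input_vocab_size=100, d_model=8, n_encoder_layers=2)
toy_tokens = np.zeros((1, 5), dtype=np.int32)    # (batch, length)
enc.init(trax.shapes.signature(toy_tokens))
print(enc(toy_tokens).shape)                     # expected: (1, 5, 8) -> (batch, length, d_model)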

 

def pre_attention_decoder_fn(mode, target_vocab_size, d_model):
    """Pre-attention decoder runs on the targets and creates
    activations that are used as queries in attention.

    Args:
        mode: str: 'train' or 'eval'
        target_vocab_size: int: vocab size of the target
        d_model: int: depth of embedding (n_units in the LSTM cell)

    Returns:
        tl.Serial: The pre-attention decoder
    """
    # create a serial network
    pre_attention_decoder = tl.Serial(
        # shift right to insert the start-of-sentence token and implement
        # teacher forcing during training
        tl.ShiftRight(mode=mode),

        # an embedding layer to convert tokens to vectors
        tl.Embedding(vocab_size=target_vocab_size, d_feature=d_model),

        # feed to an LSTM layer
        tl.LSTM(n_units=d_model)
    )

    return pre_attention_decoder
def prepare_attention_input(encoder_activations, decoder_activations, inputs):
    """Prepare queries, keys, values and mask for attention.

    Args:
        encoder_activations fastnp.array(batch_size, padded_input_length, d_model): output from the input encoder
        decoder_activations fastnp.array(batch_size, padded_input_length, d_model): output from the pre-attention decoder
        inputs fastnp.array(batch_size, padded_input_length): padded input tokens

    Returns:
        queries, keys, values and mask for attention.
    """
    # set the keys and values to the encoder activations
    keys = encoder_activations
    values = encoder_activations

    # set the queries to the decoder activations
    queries = decoder_activations

    # generate the mask to distinguish real tokens from padding:
    # the mask is 1 for real tokens and 0 where `inputs` is padding (token id 0)
    mask = 1 - fastnp.equal(inputs, 0)

    # add axes to the mask for attention heads and decoder length.
    mask = fastnp.reshape(mask, (mask.shape[0], 1, 1, mask.shape[1]))

    # broadcast so mask shape is [batch size, attention heads, decoder-len, encoder-len].
    # note: for this assignment, attention heads is set to 1.
    mask = mask + fastnp.zeros((1, 1, decoder_activations.shape[1], 1))

    return queries, keys, values, mask
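A small shape check for the mask logic above, using dummy activations (a sketch; the shapes and token values are made up for illustration):

batch_size, enc_len, dec_len, d = 2, 5, 3, 4
enc_act = fastnp.zeros((batch_size, enc_len, d))
dec_act = fastnp.zeros((batch_size, dec_len, d))
toy_inputs = fastnp.array([[4, 7, 9, 0, 0], [5, 0, 0, 0, 0]])  # 0 = padding
q, k, v, mask = prepare_attention_input(enc_act, dec_act, toy_inputs)
print(mask.shape)  # expected: (2, 1, 3, 5) -> [batch, heads, decoder-len, encoder-len]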

 


def NMTAttn(input_vocab_size=33300,
            target_vocab_size=33300,
            d_model=1024,
            n_encoder_layers=2,
            n_decoder_layers=2,
            n_attention_heads=4,
            attention_dropout=0.0,
            mode='train'):
    """Returns an LSTM sequence-to-sequence model with attention.

    The input to the model is a pair (input tokens, target tokens), e.g.,
    an English sentence (tokenized) and its translation into German (tokenized).

    Args:
        input_vocab_size: int: vocab size of the input
        target_vocab_size: int: vocab size of the target
        d_model: int: depth of embedding (n_units in the LSTM cell)
        n_encoder_layers: int: number of LSTM layers in the encoder
        n_decoder_layers: int: number of LSTM layers in the decoder after attention
        n_attention_heads: int: number of attention heads
        attention_dropout: float, dropout for the attention layer
        mode: str: 'train', 'eval' or 'predict'; predict mode is for fast inference

    Returns:
        An LSTM sequence-to-sequence model with attention.
    """
    # Step 0: call the helper functions to create the layers for the input encoder
    # and the pre-attention decoder
    input_encoder = input_encoder_fn(input_vocab_size, d_model, n_encoder_layers)
    pre_attention_decoder = pre_attention_decoder_fn(mode, target_vocab_size, d_model)

    # Step 1: create a serial network
    model = tl.Serial(
        # Step 2: copy input tokens and target tokens as they will be needed later.
        tl.Select([0, 1, 0, 1]),

        # Step 3: run the input encoder on the input and the pre-attention decoder on the target.
        tl.Parallel(input_encoder, pre_attention_decoder),

        # Step 4: prepare queries, keys, values and mask for attention.
        tl.Fn('PrepareAttentionInput', f=prepare_attention_input, n_out=4),

        # Step 5: run the AttentionQKV layer, nested inside a Residual layer so its output
        # is added to the pre-attention decoder activations (i.e. the queries)
        tl.Residual(tl.AttentionQKV(d_model,
                                    n_heads=n_attention_heads,
                                    dropout=attention_dropout,
                                    mode=mode)),

        # Step 6: drop the attention mask (i.e. index = None)
        tl.Select([0, 2]),

        # Step 7: run the rest of the RNN decoder
        [tl.LSTM(n_units=d_model) for _ in range(n_decoder_layers)],

        # Step 8: prepare the output by making it the right size
        tl.Dense(n_units=target_vocab_size),

        # Step 9: log-softmax for the output
        tl.LogSoftmax()
    )

    return model
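A quick structural check (a sketch): instantiate the model and print its layer stack.

print(NMTAttn(mode='train'))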

3. Training

train_task = training.TrainTask(
    # use the train batch stream as labeled data
    labeled_data=train_batch_stream,

    # use the cross entropy loss
    loss_layer=tl.CrossEntropyLoss(),

    # use the Adam optimizer with a learning rate of 0.01
    optimizer=trax.optimizers.Adam(0.01),

    # use `trax.lr.warmup_and_rsqrt_decay` as the learning rate schedule:
    # 1000 warmup steps with a max value of 0.01
    lr_schedule=trax.lr.warmup_and_rsqrt_decay(n_warmup_steps=1000, max_value=0.01),

    # have a checkpoint every 10 steps
    n_steps_per_checkpoint=10,
)

eval_task = training.EvalTask(
    # use the eval batch stream as labeled data
    labeled_data=eval_batch_stream,

    # use the cross entropy loss and accuracy as metrics
    metrics=[tl.CrossEntropyLoss(), tl.Accuracy()],
)

# directory where the training loop writes checkpoints
output_dir = 'output_dir/'

training_loop = training.Loop(NMTAttn(mode='train'),
                              train_task,
                              eval_tasks=[eval_task],
                              output_dir=output_dir)

training_loop.run(10)
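After (much longer) training, the weights can be loaded back into an eval-mode model for decoding; a hedged sketch, assuming the checkpoint file is the model.pkl.gz that the Loop wrote into output_dir above:

# instantiate the model in eval mode and load the trained weights
model = NMTAttn(mode='eval')
model.init_from_file('output_dir/model.pkl.gz', weights_only=True)
model = tl.Accelerate(model)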

4. Testing

def logsoftmax_sample(log_probs, temperature=1.0):  # pylint: disable=invalid-name
    """Returns a sample from a log-softmax output, with temperature.

    Args:
        log_probs: Logarithms of probabilities (often coming from LogSoftmax)
        temperature: For scaling before sampling (1.0 = default, 0.0 = pick argmax)
    """
    # This is equivalent to sampling from a softmax with temperature.
    u = np.random.uniform(low=1e-6, high=1.0 - 1e-6, size=log_probs.shape)
    g = -np.log(-np.log(u))
    return np.argmax(log_probs + g * temperature, axis=-1)


def next_symbol(NMTAttn, input_tokens, cur_output_tokens, temperature):
    """Returns the index of the next token.

    Args:
        NMTAttn (tl.Serial): An LSTM sequence-to-sequence model with attention.
        input_tokens (np.ndarray 1 x n_tokens): tokenized representation of the input sentence
        cur_output_tokens (list): tokenized representation of previously translated words
        temperature (float): parameter for sampling ranging from 0.0 to 1.0.
            0.0: same as argmax, always pick the most probable token
            1.0: sampling from the distribution (can sometimes say random things)

    Returns:
        int: index of the next token in the translated sentence
        float: log probability of the next symbol
    """
    # set the length of the current output tokens
    token_length = len(cur_output_tokens)

    # calculate the next power of 2 for the padding length
    padded_length = 2**int(np.ceil(np.log2(token_length + 1)))

    # pad cur_output_tokens up to padded_length
    padded = cur_output_tokens + [0] * (padded_length - token_length)

    # the model expects the output to have a batch axis in front, so convert the
    # `padded` list to a numpy array of shape (1, padded_length)
    padded_with_batch = np.expand_dims(np.array(padded), axis=0)

    # get the model prediction. remember to use the `NMTAttn` argument defined above.
    # hint: the model accepts a tuple as input, e.g. `my_model((input1, input2))`
    output, _ = NMTAttn((input_tokens, padded_with_batch))

    # get the log probabilities from the last token output
    log_probs = output[0, token_length, :]

    # get the next symbol by drawing a log-softmax sample (cast to an int)
    symbol = int(tl.logsoftmax_sample(log_probs, temperature))

    return symbol, float(log_probs[symbol])


def sampling_decode(input_sentence, NMTAttn=None, temperature=0.0, vocab_file=None, vocab_dir=None):
    """Returns the translated sentence.

    Args:
        input_sentence (str): sentence to translate.
        NMTAttn (tl.Serial): An LSTM sequence-to-sequence model with attention.
        temperature (float): parameter for sampling ranging from 0.0 to 1.0.
            0.0: same as argmax, always pick the most probable token
            1.0: sampling from the distribution (can sometimes say random things)
        vocab_file (str): filename of the vocabulary
        vocab_dir (str): path to the vocabulary file

    Returns:
        tuple: (list, float, str)
            list of int: tokenized version of the translated sentence
            float: log probability of the translated sentence
            str: the translated sentence
    """
    # encode the input sentence
    input_tokens = tokenize(input_sentence, vocab_file=vocab_file, vocab_dir=vocab_dir)

    # initialize the list of output tokens
    cur_output_tokens = []

    # initialize an integer that represents the current output index
    cur_output = 0

    # Set the encoding of the "end of sentence" as 1
    EOS = 1

    # loop until the current output is the end-of-sentence token
    while cur_output != EOS:
        # get the index of the next word (hint: use next_symbol)
        cur_output, log_prob = next_symbol(NMTAttn, input_tokens, cur_output_tokens, temperature)

        # append the current output token to the list of output tokens
        cur_output_tokens.append(cur_output)

    # detokenize the output tokens
    sentence = detokenize(cur_output_tokens, vocab_file=vocab_file, vocab_dir=vocab_dir)

    return cur_output_tokens, log_prob, sentence


def greedy_decode_test(sentence, NMTAttn=None, vocab_file=None, vocab_dir=None):
    """Prints the input and output of our NMTAttn model using greedy decoding

    Args:
        sentence (str): a custom string.
        NMTAttn (tl.Serial): An LSTM sequence-to-sequence model with attention.
        vocab_file (str): filename of the vocabulary
        vocab_dir (str): path to the vocabulary file

    Returns:
        str: the translated sentence
    """
    _, _, translated_sentence = sampling_decode(sentence, NMTAttn, vocab_file=vocab_file, vocab_dir=vocab_dir)

    print("English: ", sentence)
    print("German: ", translated_sentence)

    return translated_sentence
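Example usage sketch (assumes the trained `model` loaded above; the sample sentence is arbitrary):

greedy_decode_test('I love languages.', model, vocab_file=VOCAB_FILE, vocab_dir=VOCAB_DIR)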

Generate multiple samples for comparison

def generate_samples(sentence, n_samples, NMTAttn=None, temperature=0.6, vocab_file=None, vocab_dir=None):
    """Generates samples using sampling_decode()

    Args:
        sentence (str): sentence to translate.
        n_samples (int): number of samples to generate
        NMTAttn (tl.Serial): An LSTM sequence-to-sequence model with attention.
        temperature (float): parameter for sampling ranging from 0.0 to 1.0.
            0.0: same as argmax, always pick the most probable token
            1.0: sampling from the distribution (can sometimes say random things)
        vocab_file (str): filename of the vocabulary
        vocab_dir (str): path to the vocabulary file

    Returns:
        tuple: (list, list)
            list of lists: token list per sample
            list of floats: log probability per sample
    """
    # define lists to contain the samples and probabilities
    samples, log_probs = [], []

    # run a for loop to generate n samples
    for _ in range(n_samples):
        # get a sample using the sampling_decode() function
        sample, logp, _ = sampling_decode(sentence, NMTAttn, temperature, vocab_file=vocab_file, vocab_dir=vocab_dir)

        # append the token list to the samples list
        samples.append(sample)

        # append the log probability to the log_probs list
        log_probs.append(logp)

    return samples, log_probs
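A usage sketch: draw a few samples and inspect their log probabilities (assumes the trained `model` from above; the sentence is arbitrary):

samples, log_probs = generate_samples('I am hungry', 4, model, temperature=0.6,
                                      vocab_file=VOCAB_FILE, vocab_dir=VOCAB_DIR)
for s, lp in zip(samples, log_probs):
    print(lp, detokenize(s, vocab_file=VOCAB_FILE, vocab_dir=VOCAB_DIR))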
# Similarity comparison: Jaccard
def jaccard_similarity(candidate, reference):
    """Returns the Jaccard similarity between two token lists

    Args:
        candidate (list of int): tokenized version of the candidate translation
        reference (list of int): tokenized version of the reference translation

    Returns:
        float: overlap between the two token lists
    """
    # convert the lists to sets to get the unique tokens
    can_unigram_set, ref_unigram_set = set(candidate), set(reference)

    # get the set of tokens common to both candidate and reference
    joint_elems = can_unigram_set.intersection(ref_unigram_set)

    # get the set of all tokens found in either candidate or reference
    all_elems = can_unigram_set.union(ref_unigram_set)

    # divide the number of joint elements by the number of all elements
    overlap = len(joint_elems) / len(all_elems)

    return overlap
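A quick check with toy token lists:

print(jaccard_similarity([1, 2, 3], [1, 2, 3, 4]))  # 3 shared tokens out of 4 unique -> 0.75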
# Similarity comparison: ROUGE-1
from collections import Counter

def rouge1_similarity(system, reference):
    """Returns the ROUGE-1 score between two token lists

    Args:
        system (list of int): tokenized version of the system translation
        reference (list of int): tokenized version of the reference translation

    Returns:
        float: overlap between the two token lists
    """
    # make a frequency table of the system tokens
    sys_counter = Counter(system)

    # make a frequency table of the reference tokens
    ref_counter = Counter(reference)

    # initialize overlap to 0
    overlap = 0

    # run a for loop over the sys_counter object (can be treated as a dictionary)
    for token in sys_counter:
        # lookup the value of the token in the sys_counter dictionary
        token_count_sys = sys_counter.get(token, 0)

        # lookup the value of the token in the ref_counter dictionary
        token_count_ref = ref_counter.get(token, 0)

        # update the overlap with the smaller of the two token counts above
        overlap += min(token_count_sys, token_count_ref)

    # get the precision (i.e. number of overlapping tokens / number of system tokens)
    precision = overlap / len(system)

    # get the recall (i.e. number of overlapping tokens / number of reference tokens)
    recall = overlap / len(reference)

    if precision + recall != 0:
        # compute the f1-score
        rouge1_score = 2 * (precision * recall) / (precision + recall)
    else:
        rouge1_score = 0

    return rouge1_score


def average_overlap(similarity_fn, samples, *ignore_params):
    """Returns the arithmetic mean overlap of each candidate sentence in the samples

    Args:
        similarity_fn (function): similarity function used to compute the overlap
        samples (list of lists): tokenized version of the translated sentences
        *ignore_params: additional parameters will be ignored

    Returns:
        dict: scores of each sample
            key: index of the sample
            value: score of the sample
    """
    # initialize the dictionary
    scores = {}

    # run a for loop over each candidate sample
    for index_candidate, candidate in enumerate(samples):
        # initialize overlap to 0.0
        overlap = 0.0

        # run a for loop over each sample
        for index_sample, sample in enumerate(samples):
            # skip if the candidate index is the same as the sample index
            if index_candidate == index_sample:
                continue

            # get the overlap between candidate and sample using the similarity function
            sample_overlap = similarity_fn(candidate, sample)

            # add the sample overlap to the total overlap
            overlap += sample_overlap

        # get the score for the candidate by computing the average
        score = overlap / (len(samples) - 1)

        # save the score in the dictionary. use the index as the key.
        scores[index_candidate] = score

    return scores


def weighted_avg_overlap(similarity_fn, samples, log_probs):
    """Returns the weighted mean overlap of each candidate sentence in the samples

    Args:
        similarity_fn (function): similarity function used to compute the overlap
        samples (list of lists): tokenized version of the translated sentences
        log_probs (list of float): log probability of the translated sentences

    Returns:
        dict: scores of each sample
            key: index of the sample
            value: score of the sample
    """
    # initialize the dictionary
    scores = {}

    # run a for loop over each candidate sample
    for index_candidate, candidate in enumerate(samples):
        # initialize overlap and the weighted sum
        overlap, weight_sum = 0.0, 0.0

        # run a for loop over each sample
        for index_sample, (sample, logp) in enumerate(zip(samples, log_probs)):
            # skip if the candidate index is the same as the sample index
            if index_candidate == index_sample:
                continue

            # convert the log probability to linear scale
            sample_p = float(np.exp(logp))

            # update the weighted sum
            weight_sum += sample_p

            # get the unigram overlap between candidate and sample
            sample_overlap = similarity_fn(candidate, sample)

            # update the overlap
            overlap += sample_p * sample_overlap

        # get the score for the candidate
        score = overlap / weight_sum

        # save the score in the dictionary. use the index as the key.
        scores[index_candidate] = score

    return scores
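Quick checks with toy token lists (a sketch):

print(rouge1_similarity([1, 2, 3], [1, 2, 3, 4]))  # precision 1.0, recall 0.75 -> F1 ~ 0.857
print(average_overlap(jaccard_similarity, [[1, 2, 3], [1, 2, 4], [1, 2, 4, 5]]))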
# Put all the functions together
def mbr_decode(sentence, n_samples, score_fn, similarity_fn, NMTAttn=None, temperature=0.6, vocab_file=None, vocab_dir=None):
    """Returns the translated sentence using Minimum Bayes Risk decoding

    Args:
        sentence (str): sentence to translate.
        n_samples (int): number of samples to generate
        score_fn (function): function that generates the score for each sample
        similarity_fn (function): function used to compute the overlap between a pair of samples
        NMTAttn (tl.Serial): An LSTM sequence-to-sequence model with attention.
        temperature (float): parameter for sampling ranging from 0.0 to 1.0.
            0.0: same as argmax, always pick the most probable token
            1.0: sampling from the distribution (can sometimes say random things)
        vocab_file (str): filename of the vocabulary
        vocab_dir (str): path to the vocabulary file

    Returns:
        tuple: (str, int, dict): the translated sentence, the index of the best sample,
            and the dictionary of scores
    """
    # generate samples
    samples, log_probs = generate_samples(sentence, n_samples, NMTAttn=NMTAttn, temperature=temperature,
                                          vocab_file=vocab_file, vocab_dir=vocab_dir)

    # use the scoring function to get a dictionary of scores
    # (pass in the relevant parameters as in the mean functions developed above)
    scores = score_fn(similarity_fn, samples, log_probs)

    # find the key with the highest score
    max_index = max(scores, key=lambda x: scores[x])

    # detokenize the token list associated with max_index
    translated_sentence = detokenize(samples[max_index], vocab_file=vocab_file, vocab_dir=vocab_dir)

    return (translated_sentence, max_index, scores)
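Example usage sketch (assumes the trained `model` loaded earlier; the sentence is arbitrary):

best_translation, _, _ = mbr_decode('She speaks English and German.', 4,
                                    weighted_avg_overlap, jaccard_similarity,
                                    model, temperature=0.6,
                                    vocab_file=VOCAB_FILE, vocab_dir=VOCAB_DIR)
print(best_translation)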

 

