This article explains how to implement a Transformer with PyTorch. Interested readers may wish to take a look: the approach introduced here is simple, fast and practical. Let's walk through "how to implement a Transformer with PyTorch" step by step.
First, construct data

1.1 sentence length

# about word embedding, taking sequence modeling as an example
# there are two input sentences, the first of length 2 and the second of length 4
src_len = torch.tensor([2, 4]).to(torch.int32)
# there are two target sentences, the first of length 4 and the second of length 3
tgt_len = torch.tensor([4, 3]).to(torch.int32)
print(src_len)
print(tgt_len)
There are two input sentences (src_len): the first has length 2 and the second has length 4.
There are two target sentences (tgt_len): the first has length 4 and the second has length 3.
1.2 generate sentences
Generate the sentences with random numbers, fill the blank positions with 0, and pad all sentences to the same length.
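Before the full list comprehension below, it may help to see what F.pad does to a single sentence. This is just a sketch; max_num_src_words is assumed to be 8, as in the summary code later on.

import torch
import torch.nn.functional as F

L = 2                                        # length of the first source sentence
max_len = 4                                  # length of the longest source sentence
sentence = torch.randint(1, 8, (L,))         # e.g. tensor([3, 5])
padded = F.pad(sentence, (0, max_len - L))   # append zeros on the right up to max_len
print(padded)                                # e.g. tensor([3, 5, 0, 0])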
src_seq = torch.cat([torch.unsqueeze(F.pad(torch.randint(1, max_num_src_words, (L,)), (0, max(src_len) - L)), 0) for L in src_len])
tgt_seq = torch.cat([torch.unsqueeze(F.pad(torch.randint(1, max_num_tgt_words, (L,)), (0, max(tgt_len) - L)), 0) for L in tgt_len])
print(src_seq)
print(tgt_seq)
src_seq holds the two input sentences and tgt_seq holds the two target sentences.
Why are the sentences made of numbers? In Chinese-English translation, every Chinese or English word is mapped to an index; working with indices is what makes the data easy to process.
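To make this concrete, here is a minimal sketch of such a word-to-index mapping; the vocabulary and the sentence are made up purely for illustration:

# hypothetical vocabulary: every word gets a unique index, 0 is reserved for padding
vocab = {"<pad>": 0, "i": 1, "love": 2, "machine": 3, "translation": 4}
sentence = ["i", "love", "machine", "translation"]
indices = [vocab[word] for word in sentence]
print(indices)   # [1, 2, 3, 4]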
1.3 generate a dictionary
In this dictionary there are 8 words (rows) in total, and each word corresponds to an 8-dimensional vector (a simplification). Note that in a real application there would be hundreds of thousands of words, and each word vector might have 512 dimensions.
# construct the word embedding tables
src_embedding_table = nn.Embedding(9, model_dim)
tgt_embedding_table = nn.Embedding(9, model_dim)
# input (source) word dictionary
print(src_embedding_table)
# target word dictionary
print(tgt_embedding_table)
In each dictionary an extra row has to be reserved for the padding index 0 (the blanks in the sentences), which is why there are nine rows.
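As a side note (a sketch of an alternative, not what this article's code does), nn.Embedding can also be told explicitly which index is the padding row, so that row stays at zero and is never updated:

import torch.nn as nn

model_dim = 8
# padding_idx=0 keeps row 0 at zero and excludes it from gradient updates
src_embedding_table = nn.Embedding(9, model_dim, padding_idx=0)
print(src_embedding_table.weight[0])   # tensor of zeros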
1.4 get vectorized sentences
Look up the sentences built in 1.2 in the dictionary to get their vector form.
# get the vectorized sentences
src_embedding = src_embedding_table(src_seq)
tgt_embedding = tgt_embedding_table(tgt_seq)
print(src_embedding)
print(tgt_embedding)
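A quick shape check, assuming model_dim = 8 as in the summary code below: each padded sentence of length 4 becomes a 4 x 8 matrix, so each batch is 2 x 4 x 8.

print(src_embedding.shape)   # torch.Size([2, 4, 8])
print(tgt_embedding.shape)   # torch.Size([2, 4, 8])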
The general procedure at this stage
import torch
import torch.nn as nn
import torch.nn.functional as F

# sentence lengths
src_len = torch.tensor([2, 4]).to(torch.int32)
tgt_len = torch.tensor([4, 3]).to(torch.int32)
# construct the sentences, filling the blanks with 0
src_seq = torch.cat([torch.unsqueeze(F.pad(torch.randint(1, 8, (L,)), (0, max(src_len) - L)), 0) for L in src_len])
tgt_seq = torch.cat([torch.unsqueeze(F.pad(torch.randint(1, 8, (L,)), (0, max(tgt_len) - L)), 0) for L in tgt_len])
# construct the dictionaries
src_embedding_table = nn.Embedding(9, 8)
tgt_embedding_table = nn.Embedding(9, 8)
# get the vectorized sentences
src_embedding = src_embedding_table(src_seq)
tgt_embedding = tgt_embedding_table(tgt_seq)
print(src_embedding)
print(tgt_embedding)

Second, position encoding
Position encoding is one of the key points of the Transformer. By adding position encoding, the Transformer replaces the sequential (time-step) information of a traditional RNN and improves the parallelism of the model. The position encoding formula is as follows (where pos indexes the rows and i indexes the columns):

PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
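The vectorized code in 2.1 and 2.2 builds exactly this table. As a cross-check, here is a plain-loop version of the same formula (a sketch, using the same 4 x 8 size as below); it should produce the same values as pe_embedding_table:

import math
import torch

max_pos, d_model = 4, 8
pe_check = torch.zeros(max_pos, d_model)
for pos in range(max_pos):
    for i in range(0, d_model, 2):
        pe_check[pos, i] = math.sin(pos / 10000 ** (i / d_model))
        pe_check[pos, i + 1] = math.cos(pos / 10000 ** (i / d_model))
print(pe_check)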
2.1 calculate the value in parentheses

# get the numerator pos
pos_mat = torch.arange(4).reshape((-1, 1))
# get the denominator
i_mat = torch.pow(10000, torch.arange(0, 8, 2).reshape((1, -1)) / 8)
print(pos_mat)
print(i_mat)
2.2 get the position encoding

# initialize the position encoding matrix
pe_embedding_table = torch.zeros(4, 8)
# even-numbered columns use the sine encoding
pe_embedding_table[:, 0::2] = torch.sin(pos_mat / i_mat)
# odd-numbered columns use the cosine encoding
pe_embedding_table[:, 1::2] = torch.cos(pos_mat / i_mat)
pe_embedding = nn.Embedding(4, 8)
# make the position encoding a fixed (non-trainable) parameter
pe_embedding.weight = nn.Parameter(pe_embedding_table, requires_grad=False)
print(pe_embedding.weight)
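With the table wrapped in nn.Embedding, the position encoding for every token in the batch can be looked up from its position index, following the same pattern as the word embeddings (a sketch):

# position indices 0..3 for each of the two source sentences
src_pos = torch.cat([torch.unsqueeze(torch.arange(int(max(src_len))), 0) for _ in src_len])
src_pe = pe_embedding(src_pos)   # shape: (2, 4, 8)
print(src_pe)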
Third, multi-head attention

3.1 self-attention mask
Some positions are blanks filled with 0, and we do not want them to affect training, so a self-attention mask is needed. The idea of the mask is to set the scores at these positions to a very large negative number; after softmax these scores become 0 and no longer influence the result.
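The effect is easy to see on a single row of attention scores (a minimal sketch; the numbers are made up):

import torch
import torch.nn.functional as F

scores = torch.tensor([2.0, 1.0, -1e9, -1e9])   # the last two positions are masked padding
print(F.softmax(scores, dim=-1))                # roughly tensor([0.7311, 0.2689, 0.0000, 0.0000])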
3.1.1 get the effective position matrix
# get the valid position matrix
valid_encoder_pos = torch.unsqueeze(torch.cat([torch.unsqueeze(F.pad(torch.ones(L), (0, max(src_len) - L)), 0) for L in src_len]), 2)
valid_encoder_pos_matrix = torch.bmm(valid_encoder_pos, valid_encoder_pos.transpose(1, 2))
print(valid_encoder_pos_matrix)
3.1.2 get invalid position matrix
invalid_encoder_pos_matrix = 1 - valid_encoder_pos_matrix
mask_encoder_self_attention = invalid_encoder_pos_matrix.to(torch.bool)
print(mask_encoder_self_attention)
A value of True marks a position that needs to be masked.
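With src_len = [2, 4] the result is deterministic: the first sentence has only 2 valid tokens out of 4, so any pair involving a padded position is True, while the second sentence has no padding and its mask is all False. For the first sentence this looks like:

print(mask_encoder_self_attention[0])
# tensor([[False, False,  True,  True],
#         [False, False,  True,  True],
#         [ True,  True,  True,  True],
#         [ True,  True,  True,  True]])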
3.1.3 get mask matrix
Fill the positions that need to be masked with a very large negative number.
# initialize the score matrix
score = torch.randn(2, max(src_len), max(src_len))
# fill the masked positions with a very large negative number
mask_score = score.masked_fill(mask_encoder_self_attention, -1e9)
print(mask_score)
Compute the softmax:
mask_score_softmax = F.softmax(mask_score, dim=-1)
print(mask_score_softmax)
As you can see, the masked positions end up with weight 0 after softmax, which is exactly the expected result.
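Putting the pieces together, one way to use this mask inside a scaled dot-product attention step could look like the sketch below. Q, K and V are random stand-ins for the projected embeddings, and this is only an illustration of the masking mechanism, not a full multi-head attention implementation:

import math
import torch
import torch.nn.functional as F

d_k = 8
Q = torch.randn(2, 4, d_k)   # stand-in queries, one 4-token sentence per batch entry
K = torch.randn(2, 4, d_k)   # stand-in keys
V = torch.randn(2, 4, d_k)   # stand-in values

scores = torch.bmm(Q, K.transpose(1, 2)) / math.sqrt(d_k)        # (2, 4, 4) attention scores
scores = scores.masked_fill(mask_encoder_self_attention, -1e9)   # hide the padded positions
attention = F.softmax(scores, dim=-1)                            # attention weights
context = torch.bmm(attention, V)                                # (2, 4, 8) weighted values
print(context.shape)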
At this point, I believe you have a deeper understanding of how to implement a Transformer with PyTorch. Why not try it out in practice yourself!