This article mainly introduces "how to configure tensorflow". In daily operation, many people have doubts about this topic, so the editor has consulted various materials and sorted out a simple, easy-to-follow method. I hope it helps you resolve your doubts about "how to configure tensorflow". Please follow along and study!
1. Convert the training sample files into nested lists (a loading sketch follows this list)
[] the whole file
[] each paragraph (sentence)
[] each character and its tag
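As a rough illustration, here is a minimal sketch of this loading step, assuming a CoNLL-style file with one "char tag" pair per line and blank lines between paragraphs (the file layout is an assumption, not confirmed by the article):
import codecs

def load_sentences(path):
    sentences = []   # the whole file
    sentence = []    # the current paragraph
    with codecs.open(path, "r", encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:                       # a blank line ends a paragraph
                if sentence:
                    sentences.append(sentence)
                    sentence = []
            else:
                sentence.append(line.split())  # [char, tag]
        if sentence:                           # last paragraph without a trailing blank line
            sentences.append(sentence)
    return sentences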
2. Convert IOB-style tags to IOBES (B: beginning, I: middle, E: ending, S: single, O: other); a sketch follows this list
B- => S- (when the entity is a single token)
I- => E- (for the last token of an entity)
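A small sketch of this rewrite; it is a simplified version that only inspects the next tag's prefix, while a stricter conversion would also check that the entity types match:
def iob_to_iobes(tags):
    new_tags = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if tag.startswith("B-"):
            # B- with no following I- marks a single-token entity
            new_tags.append(tag if nxt.startswith("I-") else tag.replace("B-", "S-"))
        elif tag.startswith("I-"):
            # I- with no following I- is the last token of an entity
            new_tags.append(tag if nxt.startswith("I-") else tag.replace("I-", "E-"))
        else:
            new_tags.append(tag)
    return new_tags

# e.g. iob_to_iobes(["B-ORG", "I-ORG", "O", "B-PER"]) -> ["B-ORG", "E-ORG", "O", "S-PER"]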
3. Create the character and ID mapping relationship
When the pkl file does not exist:
1) create a character/ID mapping dictionary
Extract chars (all characters) to generate dico (character plus frequency), sort by frequency in descending order, then generate char_to_id and id_to_char
2) similarly, create a mapping dictionary for tags and ids
Extract tags (all tags) to generate dico (tag plus frequency) in descending order, then generate tag_to_id and id_to_tag, and write them to tag_to_id.txt and id_to_tag.txt
3) store the mapping relationships in the pkl file
(serialize the mapping objects and save them to the file)
char_to_id, id_to_char, tag_to_id, id_to_tag
When the pkl file exists:
Deserialize the mapping objects, parsing the file's data back into char_to_id, id_to_char, tag_to_id, id_to_tag. A sketch of this step follows.
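A minimal sketch of this mapping step; the helper names create_dico and create_mapping and the file name maps.pkl are assumptions, since the article does not name its helpers:
import pickle

def create_dico(item_lists):
    dico = {}                                  # item -> frequency
    for items in item_lists:
        for item in items:
            dico[item] = dico.get(item, 0) + 1
    return dico

def create_mapping(dico):
    # sort by frequency descending, then lexicographically, and assign ids
    sorted_items = sorted(dico.items(), key=lambda x: (-x[1], x[0]))
    id_to_item = {i: v[0] for i, v in enumerate(sorted_items)}
    item_to_id = {v: k for k, v in id_to_item.items()}
    return item_to_id, id_to_item

chars = [["我", "爱", "北", "京"], ["北", "京"]]
char_to_id, id_to_char = create_mapping(create_dico(chars))
with open("maps.pkl", "wb") as f:              # when the pkl file does not exist
    pickle.dump([char_to_id, id_to_char], f)
with open("maps.pkl", "rb") as f:              # when the pkl file exists
    char_to_id, id_to_char = pickle.load(f)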
4. Convert the sample data into train_data: for each paragraph, the single characters, their ids, the segmentation features (values 1/2/3), and the tag id list
data = []; data.append([string, chars, segs, tags])
string: the list of all characters in a paragraph
chars: the id of each character of string in the char_to_id dictionary
segs: for each multi-character word,
    tmp = [2] * len(word)
    tmp[0] = 1
    tmp[-1] = 3
    seg_feature.extend(tmp)
After word segmentation, string is traversed word by word through the word list to build seg_feature, a list such as [0, 0, 0, 1, 2, 3, 1, 2, 2, 2, 3, 0, 0]: 0 for single-character words, 1/2/3 for the beginning/middle/end of multi-character words (see the sketch after this list)
tags: the id of each tag of string in the tag_to_id dictionary
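A sketch of the seg feature computation, assuming jieba is the segmenter; the article's get_seg_features also takes a word_list argument, and that variant is omitted here:
import jieba

def get_seg_features(string):
    seg_feature = []
    for word in jieba.cut(string):
        if len(word) == 1:
            seg_feature.append(0)      # single-character word
        else:
            tmp = [2] * len(word)      # 2 marks the middle characters
            tmp[0] = 1                 # 1 marks the beginning
            tmp[-1] = 3                # 3 marks the end
            seg_feature.extend(tmp)
    return seg_feature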
5. Batch management: split the data into batches by batch_size (a sketch follows this list)
train_manager: batch_data
    len_data
batch_data: list(); batch_data.append([strings, chars, segs, targets])
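A hedged sketch of such a batch manager; the sort-by-length detail is an assumption, and the per-batch padding of [strings, chars, segs, targets] is omitted:
import math
import random

class BatchManager(object):
    def __init__(self, data, batch_size):
        self.batch_data = self.sort_and_slice(data, batch_size)
        self.len_data = len(self.batch_data)

    def sort_and_slice(self, data, batch_size):
        num_batch = int(math.ceil(len(data) / batch_size))
        sorted_data = sorted(data, key=lambda x: len(x[0]))  # sort by string length
        return [sorted_data[i * batch_size:(i + 1) * batch_size]
                for i in range(num_batch)]

    def iter_batch(self, shuffle=False):
        if shuffle:                      # out-of-order traversal during training
            random.shuffle(self.batch_data)
        for batch in self.batch_data:
            yield batch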
6. Generate the related paths and the configuration file
result_path, ckpt_path, config_file
7. TensorFlow configuration
Set the step size
Load the cache
Create an empty model:
Find the model file name through the checkpoint file
tf.global_variables_initializer() adds a node that initializes all global variables (GraphKeys.GLOBAL_VARIABLES).
It returns an operation (op) that initializes all global variables; run this op in the session after the whole graph has been built and the model loaded.
Set whether to use the CPU or GPU
Set the number of training epochs
Shuffle train_manager, then traverse its batches for model training:
    step, batch_loss = model.run_step(sess, True, batch)
    loss.append(batch_loss)
Every 20 batches (step size steps_check), estimate the time required for the remaining batches and print the log (the loss measures the gap between the predicted and target values; np.mean(loss) averages it). A condensed sketch of this loop follows.
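A condensed sketch of the loop (TensorFlow 1.x style); model, train_manager, dev_manager and id_to_tag come from the steps above, while max_epoch and the exact logging format are assumptions:
import numpy as np
import tensorflow as tf

tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True        # CPU/GPU setting
steps_check = 20                                 # log every 20 batches

with tf.Session(config=tf_config) as sess:
    sess.run(tf.global_variables_initializer())  # run the init op after building the graph
    loss = []
    for epoch in range(max_epoch):               # max_epoch: assumed epoch count
        for batch in train_manager.iter_batch(shuffle=True):  # out-of-order traversal
            step, batch_loss = model.run_step(sess, True, batch)
            loss.append(batch_loss)
            if step % steps_check == 0:
                print("step {}, mean loss {:.4f}".format(step, np.mean(loss)))
                loss = []
        evaluate(sess, model, "dev", dev_manager, id_to_tag)  # save if dev f1 improves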
Optimize the model on the dev set and save the best checkpoint:
evaluate(sess, model, "dev", dev_manager, id_to_tag)

def evaluate(sess, model, name, data, id_to_tag):
    logger.info("evaluate:{}".format(name))
    ner_results = model.evaluate(sess, data, id_to_tag)
    eval_lines = test_ner(ner_results, FLAGS.result_path)
    message = 'eval_lines value: {}'.format(eval_lines)
    logger.info(message)
    f1 = float(eval_lines[1].strip().split()[-1])
    if name == "dev":
        best_test_f1 = model.best_dev_f1.eval()
        if f1 > best_test_f1:
            tf.assign(model.best_dev_f1, f1).eval()
            logger.info("new best dev f1 score:{:>.3f}".format(f1))
        return f1 > best_test_f1
    elif name == "test":
        best_test_f1 = model.best_test_f1.eval()
        if f1 > best_test_f1:
            tf.assign(model.best_test_f1, f1).eval()
            logger.info("new best test f1 score:{:>.3f}".format(f1))
        return f1 > best_test_f1
def evaluate(self, sess, data_manager, id_to_tag):
    results = []
    trans = self.trans.eval()
    for batch in data_manager.iter_batch():
        strings = batch[0]
        tags = batch[-1]
        lengths, scores = self.run_step(sess, False, batch)
        batch_paths = self.decode(scores, lengths, trans)
        for i in range(len(strings)):
            result = []
            string = strings[i][:lengths[i]]
            gold = iobes_iob([id_to_tag[int(x)] for x in tags[i][:lengths[i]]])
            pred = iobes_iob([id_to_tag[int(x)] for x in batch_paths[i][:lengths[i]]])
            for char, gold_tag, pred_tag in zip(string, gold, pred):
                result.append(" ".join([char, gold_tag, pred_tag]))
            results.append(result)
    return results
def test_ner(results, path):
    output_file = os.path.join(path, "ner_predict.utf8")
    with open(output_file, "w", encoding='utf8') as f:
        to_write = []
        for block in results:
            for line in block:
                to_write.append(line + "\n")
            to_write.append("\n")
        f.writelines(to_write)
    eval_lines = return_report(output_file)
    return eval_lines
-
Entity recognition on labeled test data
1. Load the parameters from config_file, the tensorflow parameters, and whether to use the GPU
2. Create a new result_file under the specified path; open the map_file and read char_to_id, id_to_char, tag_to_id, id_to_tag
3. Pass tf_config in as a parameter and load the cache
Load the trained model by reading the ckpt file under ckpt_path
4. test_list = readText2Json("/data0/nlp/test/ner/test.txt") converts the contents of the test file under that path into dicts. In each sample the first line is the sentence and every following line is "word:TYPE". Inside readText2Json, for each sample block line:
strs = line.strip().split('\n')
res = {}
string = strs[0]
res['string'] = string
value_list = []
wl = []
tag_num = 0
for s in strs[1:]:
    value = {}
    pos = find_last(s, ':')  # e.g. "University of Oxford, UK:ORG"
    word = s[:pos]
    type = s[pos + 1:]
    if not type == 'O':
        tag_num += 1         # count excluding type O
    value['word'] = word
    value['type'] = type
    value_list.append(value)  # e.g. [{'type': 'ORG', 'word': 'University of Oxford'}]
    wl.append(word)
if len(value_list) == 0:
    continue
res['value'] = value_list
res['wl'] = wl
res['tag_num'] = tag_num
res_list.append(res)
# after the loop over all samples:
return res_list
5. Traverse test_list
for tests in test_list:
    line = tests['string'].replace(' ', '').replace('\t', '').replace('\n', '')  # strip whitespace
    res_list = tests['value']
A sample element of test_list:
{'tag_num': 1,
 'wl': ['Oxford University', 'Press', 'published', 'multiple', 'English dictionaries', 'collectively'],
 'value': [{'type': 'ORG', 'word': 'Oxford University'}, {'type': 'O', 'word': 'Press'}, {'type': 'O', 'word': 'published'}, {'type': 'O', 'word': 'multiple'}, {'type': 'O', 'word': 'English dictionaries'}, {'type': 'O', 'word': 'collectively'}],
 'string': 'the collective name for the various English dictionaries published by Oxford University Press in the UK'}
inputs = input_from_line(string, char_to_id, word_list=wl)
input_from_line converts full-width symbols in string to half-width, unescapes html, and then builds:
inputs = list()
inputs.append([line])
inputs.append([[char_to_id[char] if char in char_to_id else char_to_id["<UNK>"]
                for char in line]])  # convert line to ids via the dictionary (the unknown-token key is an assumption)
inputs.append([get_seg_features(line, word_list=word_list)])  # seg features for line
inputs.append([[]])
6. Run the model to get the recognition result result
result = model.evaluate_line(sess, inputs, id_to_tag)

def evaluate_line(self, sess, inputs, id_to_tag):
    trans = self.trans.eval(session=sess)
    lengths, scores = self.run_step(sess, False, inputs)
    batch_paths = self.decode(scores, lengths, trans)
    tags = [id_to_tag[idx] for idx in batch_paths[0]]  # convert the predicted ids to tags
    return result_to_json(inputs[0][0], tags)          # convert the tagged result to json

entities = result['entities']
A sample result:
[{'start': 0, 'end': 1, 'type': 'O', 'word': 'English'}, {'start': 1, 'end': 6, 'type': 'ORGANIZATION', 'word': 'Oxford University'}, {'start': 6, 'end': 9, 'type': 'O', 'word': 'Press'}, {'start': 9, 'end': 11, 'type': 'O', 'word': 'publication'}, {'start': 11, 'end': 12, 'type': 'O', 'word': ''}, {'start': 12, 'end': 14, 'type': 'O', 'word': 'multiple'}, {'start': 14, 'end': 16, 'type': 'O', 'word': 'English'}, {'start': 16, 'end': 18, 'type': 'O', 'word': 'Dictionary'}, {'start': 18, 'end': 19, 'type': 'O', 'word': ''}, {'start': 19, 'end': 21, 'type': 'O', 'word': 'collectively'}]
7. Traverse the entities, excluding those whose type is O; for each, write the word and type at the same subscript in the standard answer res_list, together with the type and word from the recognition result entities, into result_file as one line lin, and count the number of correct results.
-
for tests in test_list:
    test_acc += tests['tag_num']  # accumulate the gold non-O tag count
-
for index, i in enumerate(entities):
    word = i['word']
    type = i['type']
    if index > len(res_list) - 1:
        break
    # if type == 'O' and res_list[index]['type'] == 'O':
    #     continue
    if type == 'O':
        continue
    test_rec += 1
    lin = res_list[index]['word'] + '\t' + res_list[index]['type'] + '\t' + type + '\t' + word + '\n'
    start_in = 0
    end_in = -1
    if index > 5:
        start_in = index - 5
    if len(res_list) - index > 5:
        end_in = index + 5
    result_file.write(lin)
    for j in res_list[start_in:end_in]:  # search a window around index for a match
        if word == j['word'] and type == j['type']:
            test_right = test_right + 1
            break
8. Print the precision, recall and other related information
zq = round(test_right / test_rec, 4) * 100   # precision
zh = round(test_right / test_acc, 4) * 100   # recall
f1 = round(2 * zq * zh / (zq + zh))
print('total number of tags: {}, number of identified tags: {}, number identified correctly: {}, precision: {}, recall: {}, f1: {}'.format(test_acc, test_rec, test_right, zq, zh, f1))
Output: total number of tags: 8773, number of identified tags: 12260, number identified correctly: 3542, precision: 28.89, recall: 40.37, f1: 34
test_acc: the number of tags in the standard answer after excluding all O types
test_rec: the total number of non-O tags the model actually identified
-
Classification model training:
1. Prepare the post.xls and other.xls files
train_datas = []
data = pd.read_excel(constant.clfytrain_path + '/{}'.format(file_name), header=None)
Read the contents of each file into train_datas
Concatenate the data of the two excel files along axis 0 (rows):
pn = pd.concat(train_datas, ignore_index=True)
A sketch of this step follows.
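A minimal sketch of this preparation, assuming post.xls holds positive samples and other.xls negative ones; the added label column is an assumption for downstream training:
import pandas as pd
import constant  # the article's constant module providing clfytrain_path

train_datas = []
for file_name, label in [("post.xls", 1), ("other.xls", 0)]:
    data = pd.read_excel(constant.clfytrain_path + '/{}'.format(file_name), header=None)
    data['label'] = label            # assumed positive/negative label column
    train_datas.append(data)

pn = pd.concat(train_datas, ignore_index=True)  # fuse along axis 0 (rows)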
At this point, the study of "how to configure tensorflow" is over. I hope it has helped resolve your doubts; combining theory with practice will help you learn better, so go and try it! If you want to keep learning more related knowledge, please continue to follow the website.