How to configure tensorflow

Shulou (Shulou.com) 06/01 Report --

This article mainly introduces "how to configure tensorflow". In daily operation, many people have doubts about this topic, so the editor consulted various materials and sorted out a simple, easy-to-use method of operation. I hope it helps you resolve your doubts about "how to configure tensorflow". Next, please follow the editor to study!

1. Convert the training sample file into nested collections (a minimal loading sketch follows this list)

[] the whole file: a list of paragraphs

[] each paragraph: a list of entries

[] each entry: a character and its tag
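
A minimal sketch of this loading step, assuming one "character tag" pair per line with blank lines separating paragraphs (the file layout and the helper name load_sentences are assumptions, not from the source):

def load_sentences(path):
    # Sketch: read a whitespace-separated "char tag" file into nested lists.
    sentences = []                          # the whole file
    sentence = []                           # one paragraph
    with open(path, encoding="utf8") as f:
        for raw in f:
            line = raw.rstrip()
            if not line:                    # a blank line closes the paragraph
                if sentence:
                    sentences.append(sentence)
                    sentence = []
            else:
                sentence.append(line.split())   # [character, tag]
    if sentence:                            # flush the last paragraph
        sentences.append(sentence)
    return sentences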

2. Convert IOB-style tags to IOBES (B beginning, I middle, E ending, S alone, O other)

B- => S- (a B- with no I- after it marks a single-token entity)

I- => E- (the last I- of an entity span becomes its ending)
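
A sketch of that rule as code (standard IOB-to-IOBES logic, assuming well-formed IOB input; the name iob_iobes matches the helper commented out in the evaluate code later in this article):

def iob_iobes(tags):
    # B- with no I- following it becomes S-; the last I- of a span becomes E-.
    new_tags = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if tag.startswith("B-"):
            new_tags.append(tag if nxt.startswith("I-") else "S-" + tag[2:])
        elif tag.startswith("I-"):
            new_tags.append(tag if nxt.startswith("I-") else "E-" + tag[2:])
        else:
            new_tags.append(tag)            # "O" passes through unchanged
    return new_tags

print(iob_iobes(["B-ORG", "I-ORG", "I-ORG", "O", "B-PER"]))
# -> ['B-ORG', 'I-ORG', 'E-ORG', 'O', 'S-PER']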

3. Create character and ID mapping relationships

When the pkl file does not exist:

1) create a character-to-ID mapping dictionary

Extract chars (all characters) to generate dico (character plus frequency), sort it in descending frequency order, then generate char_to_id and id_to_char

2) similarly, create a mapping dictionary for tags and IDs

Extract tags (all tags) to generate dico (tag plus frequency), sort it in descending frequency order, then generate tag_to_id and id_to_tag, and write them to tag_to_id.txt and id_to_tag.txt

3) store the mapping relationships in the pkl file

(serialize the mapping objects and save them to the file)

char_to_id, id_to_char, tag_to_id, id_to_tag

When the pkl file exists:

Deserialize the mapping objects, parsing the data in the file back into char_to_id, id_to_char, tag_to_id, id_to_tag
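
A minimal sketch of steps 1) to 3), assuming a Counter-based frequency dictionary; the function name create_mapping, the stand-in data, and the maps.pkl filename are assumptions:

import pickle
from collections import Counter

def create_mapping(items):
    # dico: item -> frequency; ids are assigned in descending frequency order.
    dico = Counter(items)
    ranked = sorted(dico.items(), key=lambda x: (-x[1], x[0]))
    id_to_item = {i: item for i, (item, _) in enumerate(ranked)}
    item_to_id = {item: i for i, item in id_to_item.items()}
    return item_to_id, id_to_item

chars = list("abcbcc")                          # stand-in character stream
tags = ["O", "O", "B-ORG", "E-ORG", "O", "O"]   # stand-in tag stream
char_to_id, id_to_char = create_mapping(chars)
tag_to_id, id_to_tag = create_mapping(tags)

with open("maps.pkl", "wb") as f:               # when the pkl file does not exist
    pickle.dump([char_to_id, id_to_char, tag_to_id, id_to_tag], f)
with open("maps.pkl", "rb") as f:               # when the pkl file exists
    char_to_id, id_to_char, tag_to_id, id_to_tag = pickle.load(f)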

4. Convert the sample data into the characters, the corresponding character ids, the corresponding segmentation features (values 0-3), and the corresponding tag ids, collected into the list train_data

data = []  then  data.append([string, chars, segs, tags])

string: the list of all the characters in a paragraph

chars: for each character in string, its id in the char_to_id dictionary

segs: built word by word:

tmp = [2] * len(word)
tmp[0] = 1
tmp[-1] = 3
seg_feature.extend(tmp)

After segmenting string into word_list and traversing it word by word, we obtain seg_feature (a list of the values 0-3, such as [0, 1, 2, 3, 1, 3, 0]).

tags: for each tag in the paragraph, its id in the tag_to_id dictionary
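
Putting that fragment into context, a runnable sketch of the seg-feature scheme (the single-character branch producing 0 is inferred from the example values; this get_seg_features takes an already-segmented word list, since the article does not name the segmenter):

def get_seg_features(word_list):
    # 0 = single-character word, 1 = word start, 2 = word middle, 3 = word end.
    seg_feature = []
    for word in word_list:
        if len(word) == 1:
            seg_feature.append(0)
        else:
            tmp = [2] * len(word)
            tmp[0] = 1
            tmp[-1] = 3
            seg_feature.extend(tmp)
    return seg_feature

print(get_seg_features(["牛津大学", "出版社", "的", "词典"]))
# -> [1, 2, 2, 3, 1, 2, 3, 0, 1, 3]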

5. Batch management: split the data into batches by batch size

train_manager holds: batch_data and len_data

batch_data: a list, filled with batch_data.append([strings, chars, segs, targets])
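
A hedged sketch of the batch manager (the real implementation may also sort by sentence length and pad each batch; only the split-by-batch_size and shuffled traversal described here are shown):

import math
import random

class BatchManager:
    def __init__(self, data, batch_size):
        # data: the train_data list of [string, chars, segs, tags] items.
        num_batch = math.ceil(len(data) / batch_size)
        self.batch_data = [data[i * batch_size:(i + 1) * batch_size]
                           for i in range(num_batch)]
        self.len_data = len(self.batch_data)

    def iter_batch(self, shuffle=False):
        if shuffle:                          # out-of-order traversal for training
            random.shuffle(self.batch_data)
        for batch in self.batch_data:
            yield batch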

6. Generate the related paths and the configuration file

result_path, ckpt_path, config_file

7. TensorFlow configuration

Set the step size

Load the cache

Create an empty model:

Find the model file name through the checkpoint file

tf.global_variables_initializer() adds a node that initializes all global variables (GraphKeys.GLOBAL_VARIABLES).

It returns an operation (op) that initializes all global variables; run this op in the session after the entire model has been built and loaded.
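
The checkpoint-then-initialize sequence is the usual TensorFlow 1.x restore-or-initialize pattern; a minimal sketch (the variable v and the "ckpt" directory are placeholders, not from the source):

import tensorflow as tf                      # TF 1.x API, as used in this article

v = tf.get_variable("v", shape=[1])          # stand-in for the model's variables
saver = tf.train.Saver()
with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state("ckpt")
    if ckpt and tf.train.checkpoint_exists(ckpt.model_checkpoint_path):
        saver.restore(sess, ckpt.model_checkpoint_path)   # load the trained model
    else:
        sess.run(tf.global_variables_initializer())       # create an empty model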

Set up to use CPU/GPU

Set the number of training epochs

Shuffle train_manager, then traverse its batches for model training:

step, batch_loss = model.run_step(sess, True, batch)
loss.append(batch_loss)

Every 20 batches (the step size steps_check), estimate the time required for the remaining batches and print a log line with the average loss, np.mean(loss) (the loss measures the difference between the predicted values and the target values).
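
Stitched together, the loop looks roughly like this (model, sess, the managers, np and logger come from the setup above; max_epoch, steps_check and save_model are assumed names, so this is a sketch rather than runnable code):

loss = []
for epoch in range(max_epoch):                       # the configured number of epochs
    for batch in train_manager.iter_batch(shuffle=True):
        step, batch_loss = model.run_step(sess, True, batch)
        loss.append(batch_loss)
        if step % steps_check == 0:                  # every 20 batches by default
            logger.info("step {}, mean loss {:.4f}".format(step, np.mean(loss)))
            loss = []
    if evaluate(sess, model, "dev", dev_manager, id_to_tag):
        save_model(sess, model, ckpt_path)           # keep the best dev model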

Evaluate on the dev file, optimizing and saving the model when it improves:

evaluate(sess, model, "dev", dev_manager, id_to_tag)

def evaluate(sess, model, name, data, id_to_tag):   # logger = None
    logger.info("evaluate: {}".format(name))
    ner_results = model.evaluate(sess, data, id_to_tag)
    eval_lines = test_ner(ner_results, FLAGS.result_path)
    message = 'eval_lines value: {}'.format(eval_lines)
    # print(message)
    logger.info(message)
    f1 = float(eval_lines[1].strip().split()[-1])
    if name == "dev":
        best_test_f1 = model.best_dev_f1.eval()
        if f1 > best_test_f1:
            tf.assign(model.best_dev_f1, f1).eval()
            # print("new best dev f1 score: {:>.3f}".format(f1))
            logger.info("new best dev f1 score: {:>.3f}".format(f1))
        return f1 > best_test_f1
    elif name == "test":
        best_test_f1 = model.best_test_f1.eval()
        if f1 > best_test_f1:
            tf.assign(model.best_test_f1, f1).eval()
            # print("new best test f1 score: {:>.3f}".format(f1))
            logger.info("new best test f1 score: {:>.3f}".format(f1))
        return f1 > best_test_f1

def evaluate(self, sess, data_manager, id_to_tag):
    results = []
    trans = self.trans.eval()
    for batch in data_manager.iter_batch():
        strings = batch[0]
        tags = batch[-1]
        lengths, scores = self.run_step(sess, False, batch)
        batch_paths = self.decode(scores, lengths, trans)
        for i in range(len(strings)):
            result = []
            string = strings[i][:lengths[i]]
            gold = iobes_iob([id_to_tag[int(x)] for x in tags[i][:lengths[i]]])
            pred = iobes_iob([id_to_tag[int(x)] for x in batch_paths[i][:lengths[i]]])
            # gold = iob_iobes([id_to_tag[int(x)] for x in tags[i][:lengths[i]]])
            # pred = iob_iobes([id_to_tag[int(x)] for x in batch_paths[i][:lengths[i]]])
            for char, gold, pred in zip(string, gold, pred):
                result.append(" ".join([char, gold, pred]))
            results.append(result)
    return results

def test_ner(results, path):
    output_file = os.path.join(path, "ner_predict.utf8")
    with open(output_file, "w", encoding='utf8') as f:
        to_write = []
        for block in results:
            for line in block:
                to_write.append(line + "\n")
            to_write.append("\n")
        f.writelines(to_write)
    eval_lines = return_report(output_file)
    return eval_lines

-

Entity recognition on the labeled test set

1. Load the configuration parameters from config_file and the TensorFlow parameters, including whether or not to use the GPU

2. Create a new result_file under the specified path; open the map_file and read char_to_id, id_to_char, tag_to_id, id_to_tag

3. Pass tf_config in as a parameter and load the cache

Load the trained model by reading the ckpt file under ckpt_path

4. test_list = readText2Json("/data0/nlp/test/ner/test.txt") converts the contents of the test file under the corresponding path into dicts:

strs = line.replace('\n', '').split('\t')   # '\t' is an assumption; the delimiter is garbled in the source
res = {}
string = strs[0]
res['string'] = string
value_list = []
wl = []
tag_num = 0
for str in strs[1:]:
    value = {}
    pos = find_last(str, ':')               # e.g. "University of Oxford, UK:ORG"
    word = str[:pos]
    type = str[pos + 1:]
    if not type == 'O':
        # print(type)
        tag_num += 1                        # count, excluding type O
    value['word'] = word
    value['type'] = type
    value_list.append(value)                # e.g. [{'type': 'ORG', 'word': 'University of Oxford'}]
    wl.append(word)
if len(value_list) == 0:
    continue
res['value'] = value_list
res['wl'] = wl
res['tag_num'] = tag_num
res_list.append(res)
return res_list

5. Traverse test_list:

for tests in test_list:
    line = tests['string'].replace(' ', '').replace('\t', '').replace('\u3000', '')   # strip whitespace; the exact replacements are garbled in the source
    res_list = tests['value']

One element of tests looks like:

{'tag_num': 1,
'wl': ['Oxford University', 'Publishing House', 'publish', 'multiple', 'English Dictionary', 'collectively'],
'value': [{'type': 'ORG', 'word': 'Oxford University'}, {'type': 'O', 'word': 'Press'}, {'type': 'O', 'word': 'publish'}, {'type': 'O', 'word': ''}, {'type': 'O', 'word': 'multiple'}, {'type': 'O', 'word': 'English Dictionary'}, {'type': 'O', 'word': ''}, {'type': 'O', 'word': 'collectively'}],
'string': 'any of various English dictionaries published by Oxford University Press in the UK'}

inputs = input_from_line(string, char_to_id, word_list=wl)

Inside input_from_line, full-width symbols in string are converted to half-width and HTML escapes are decoded, then:

inputs = list()
inputs.append([line])
inputs.append([[char_to_id[char] if char in char_to_id else char_to_id["<UNK>"]   # convert line to ids via the dictionary; "<UNK>" is assumed for the garbled key
                for char in line]])
inputs.append([get_seg_features(line, word_list=word_list)])                      # seg features for line
inputs.append([[]])

6. Run the model on the line to get the recognition result:

result = model.evaluate_line(sess, inputs, id_to_tag)

def evaluate_line(self, sess, inputs, id_to_tag):
    trans = self.trans.eval(session=sess)
    lengths, scores = self.run_step(sess, False, inputs)
    batch_paths = self.decode(scores, lengths, trans)
    tags = [id_to_tag[idx] for idx in batch_paths[0]]   # convert the resulting ids to tags
    return result_to_json(inputs[0][0], tags)           # convert the tagged result to json

entities = result['entities']

[{'start': 0, 'end': 1, 'type': 'O', 'word': 'English'}, {'start': 1, 'end': 6, 'type': 'ORGANIZATION', 'word': ''}, {'start': 6, 'end': 9, 'type': 'O', 'word': 'Press'}, {'start': 9, 'end': 11, 'type': 'O', 'word': 'publication'}, {'start': 11, 'end': 12, 'type': 'O', 'word': ''}, {'start': 12, 'end': 14, 'type': 'O', 'word': 'multiple'}, {'start': 14, 'end': 16, 'type': 'O', 'word': 'English'}, {'start': 16, 'end': 18, 'type': 'O', 'word': 'Dictionary'}, {'start': 18, 'end': 19, 'type': 'O', 'word': ''}, {'start': 19, 'end': 21, 'type': 'O', 'word': 'collectively'}]
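
result_to_json itself is not shown in the article; a sketch consistent with the entity list above (note that O spans are also emitted, matching the example; this reconstruction is an assumption):

def result_to_json(string, tags):
    # Merge per-character IOBES tags back into {start, end, type, word} spans.
    item = {"string": string, "entities": []}
    entity_name, entity_start = "", 0
    for idx, (char, tag) in enumerate(zip(string, tags)):
        if tag[0] == "S":
            item["entities"].append({"start": idx, "end": idx + 1,
                                     "type": tag[2:], "word": char})
        elif tag[0] == "B":
            entity_name, entity_start = char, idx
        elif tag[0] in ("I", "E"):
            entity_name += char
            if tag[0] == "E":
                item["entities"].append({"start": entity_start, "end": idx + 1,
                                         "type": tag[2:], "word": entity_name})
                entity_name = ""
        else:   # "O" characters are emitted as single-character O spans
            item["entities"].append({"start": idx, "end": idx + 1,
                                     "type": "O", "word": char})
    return item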

7. Traverse the entities, excluding entries whose type is O. For each remaining entry, write one line (lin) to result_file containing the word and type at the same index in the standard answer res_list together with the type and word from the recognition result entities, and count the number of correct results.

-

for tests in test_list:
    test_acc += tests['tag_num']   # accumulate the gold tag count (type O excluded)

-

for index, i in enumerate(entities):
    word = i['word']
    type = i['type']
    if index > len(res_list) - 1:
        break
    # if type == 'O' and res_list[index]['type'] == 'O':
    #     continue
    if type == 'O':
        continue
    test_rec += 1
    lin = res_list[index]['word'] + '\t' + res_list[index]['type'] + '\t' + type + '\t' + word + '\n'
    start_in = 0
    end_in = -1
    if index > 5:                       # search a window of up to 5 entries around index
        start_in = index - 5
    if len(res_list) - index > 5:
        end_in = index + 5
    result_file.write(lin)
    for j in res_list[start_in:end_in]:
        if word == j['word'] and type == j['type']:
            test_right = test_right + 1
            break

8. Print the precision, recall and other related information

zq = round(test_right / test_rec, 4) * 100   # precision
zh = round(test_right / test_acc, 4) * 100   # recall
f1 = round(2 * zq * zh / (zq + zh))
print('total number of tags: {}, number of recognized tags: {}, number recognized correctly: {}, precision: {}, recall: {}, F1: {}'.format(test_acc, test_rec, test_right, zq, zh, f1))

Example output: total number of tags: 8773, number of recognized tags: 12260, number recognized correctly: 3542, precision: 28.89, recall: 40.37, F1: 34

test_acc: the number of tags in the standard answer, excluding all type-O tags

test_rec: the total number of non-O tags in the results recognized by the entity model
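
Plugging the logged numbers back into these formulas reproduces the printed line (a quick sanity check, not from the source):

test_acc, test_rec, test_right = 8773, 12260, 3542
zq = round(test_right / test_rec, 4) * 100     # precision: 28.89
zh = round(test_right / test_acc, 4) * 100     # recall: 40.37
f1 = round(2 * zq * zh / (zq + zh))            # harmonic mean, rounded: 34
print(round(zq, 2), round(zh, 2), f1)          # -> 28.89 40.37 34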

-

Classification model training:

1. Prepare the post.xls and other.xls files

train_datas = []
data = pd.read_excel(constant.clfytrain_path + '/{}'.format(file_name), header=None)

Read the contents of each file into train_datas, then fuse the data from the two excel files along axis 0 (stacking their rows):

pn = pd.concat(train_datas, ignore_index=True)

At this point, the study on "how to configure tensorflow" is over. I hope it has resolved your doubts. The combination of theory and practice can better help you learn, so go and try it! If you want to continue to learn more related knowledge, please keep following the website; the editor will continue working hard to bring you more practical articles!
