2025-02-22 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to play a poetry chain game (jielong) with Python: given a verse, the program replies with another verse whose first character has the same pinyin as the last character of the previous one. The approach is simple and easy to follow.
Poetry corpus
First, we use a Python crawler to collect poems and build a corpus. The crawled pages look like this:
Figure: Crawling poetry
Since this article mainly aims to show the idea behind the project, we crawl only four index pages: Three Hundred Tang Poems, Three Hundred Ancient Poems, Three Hundred Song Ci, and Selected Song Ci, about 1,100 poems in total. To speed things up, the crawler runs concurrently and saves the results to the poem.txt file. The complete Python program is as follows:
    import re
    import requests
    from bs4 import BeautifulSoup
    from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

    # Index pages to crawl
    urls = ['https://so.gushiwen.org/gushi/tangshi.aspx',
            'https://so.gushiwen.org/gushi/sanbai.aspx',
            'https://so.gushiwen.org/gushi/songsan.aspx',
            'https://so.gushiwen.org/gushi/songci.aspx']

    # Request header shared by all requests
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36'}

    # Collect the URL of every poem linked from the index pages
    poem_links = []
    for url in urls:
        req = requests.get(url, headers=headers)
        soup = BeautifulSoup(req.text, "lxml")
        content = soup.find_all('div', class_="sons")[0]
        links = content.find_all('a')
        for link in links:
            poem_links.append('https://so.gushiwen.org' + link['href'])

    poem_list = []

    # Crawl a single poem page
    def get_poem(url):
        # e.g. url = 'https://so.gushiwen.org/shiwenv_45c396367f59.aspx'
        req = requests.get(url, headers=headers)
        soup = BeautifulSoup(req.text, "lxml")
        poem = soup.find('div', class_='contson').text.strip()
        poem = poem.replace(' ', '')
        # Drop parenthesized annotations. The full-width characters in this
        # script were garbled in the source and are restored from context.
        poem = re.sub(r"\([\s\S]*?\)", '', poem)
        poem = re.sub(r"（[\s\S]*?）", '', poem)
        poem = re.sub(r"。\([\s\S]*?）", '', poem)
        # Normalize half-width ! and ? to their full-width forms
        poem = poem.replace('!', '！').replace('?', '？')
        poem_list.append(poem)

    # Crawl concurrently; max_workers is the number of threads and can be tuned
    executor = ThreadPoolExecutor(max_workers=10)
    # submit(): the first argument is the function, followed by its arguments
    future_tasks = [executor.submit(get_poem, url) for url in poem_links]
    # Wait for all threads to finish before continuing
    wait(future_tasks, return_when=ALL_COMPLETED)

    # Write the crawled poems to a text file, shortest first
    poems = sorted(set(poem_list), key=len)
    with open('poem.txt', 'a', encoding='utf-8') as f:
        for poem in poems:
            poem = poem.replace('《', '').replace('》', '').replace('：', '').replace('“', '')
            print(poem)
            f.write(poem)
            f.write('\n')

This program crawls more than 1,100 poems and saves them to the poem.txt file, which forms our poetry corpus. These poems cannot be used directly, however; the data needs cleaning first. For example, some poems have irregular punctuation, and some entries are not poems at all but only the preface to one. This cleanup has to be done by hand. It is a little tedious, but it pays off in the quality of the sentence splitting that follows.
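The manual cleanup can be narrowed down by flagging suspicious lines first. The sketch below is only one possible heuristic, not the author's procedure: the function names needs_review and flag_lines, the set of sentence-ending marks, and the sample lines are all assumptions made for illustration.

```python
import re

# Assumed sentence-ending punctuation for the corpus (full-width forms).
SENTENCE_END = "。？！"


def needs_review(line):
    """Return True if a corpus line looks malformed and should be checked by hand."""
    text = line.strip()
    if not text:
        return True
    # A line of verse is expected to end with sentence-ending punctuation;
    # a bare preface title usually does not.
    if text[-1] not in SENTENCE_END:
        return True
    # Leftover half-width punctuation or brackets suggest irregular punctuation.
    if re.search(r"[()\[\],.?!;:]", text):
        return True
    return False


def flag_lines(lines):
    """Return (line_number, text) pairs for the lines that need manual review."""
    return [(i, l.strip()) for i, l in enumerate(lines, start=1) if needs_review(l)]


# Hypothetical sample: a clean poem, a preface title, a half-width comma.
sample = [
    "床前明月光，疑是地上霜。举头望明月，低头思故乡。\n",
    "并序\n",
    "春眠不觉晓,处处闻啼鸟。\n",
]
for number, text in flag_lines(sample):
    print(number, text)
```

Running flag_lines over the lines of poem.txt would give a short list to inspect instead of the whole file; the final decision on each line still has to be made by hand.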
Splitting poems into sentences
With the corpus in place, we split each poem into sentences. The rule is to cut at sentence-ending punctuation (。？！), which a regular expression can do. The resulting sentences are then written into a dictionary: the key is the pinyin of the first character of a sentence, and the value is the list of sentences that start with it. The dictionary is saved as a pickle file. The complete Python code is as follows:
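    import re
    import pickle
    from collections import defaultdict
    from xpinyin import Pinyin

    def main():
        # Read the corpus, one poem per line (assumes the poem.txt file
        # written by the crawler)
        with open('poem.txt', 'r', encoding='utf-8') as f:
            poems = f.readlines()

        # Split each poem at sentence-ending punctuation; skip short fragments
        sents = []
        for poem in poems:
            parts = re.findall(r'[\s\S]*?[。？！]', poem.strip())
            for part in parts:
                if len(part) >= 5:
                    sents.append(part)

        # Group sentences by the tone-marked pinyin of their first character
        pinyin = Pinyin()
        poem_dict = defaultdict(list)
        for sent in sents:
            print(sent)
            head = pinyin.get_pinyin(sent, tone_marks='marks', splitter=' ').split()[0]
            poem_dict[head].append(sent)

        with open('./poemDict.pk', 'wb') as f:
            pickle.dump(poem_dict, f)

    main()

We can look at the contents of the pickle file (poemDict.pk):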
Figure: Contents of the pickle file (partial)
Note that one pinyin key can correspond to multiple verses.
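To make the splitting rule and the shape of the dictionary concrete, here is a small standard-library sketch. It does not call xpinyin: TOY_PINYIN is a hypothetical hand-written lookup that stands in for the pinyin conversion, and the helper names split_sentences, build_index and first_char_key are assumptions for illustration.

```python
import re
from collections import defaultdict


def split_sentences(poem, min_len=5):
    """Cut a poem at sentence-ending punctuation and drop very short pieces."""
    parts = re.findall(r"[\s\S]*?[。？！]", poem.strip())
    return [p for p in parts if len(p) >= min_len]


def build_index(sentences, key_of):
    """Group sentences under the key computed from each sentence."""
    index = defaultdict(list)
    for sentence in sentences:
        index[key_of(sentence)].append(sentence)
    return index


# Hypothetical stand-in for a pinyin library: first character -> pinyin.
TOY_PINYIN = {"床": "chuáng", "举": "jǔ"}


def first_char_key(sentence):
    """Pinyin of the first character, falling back to the character itself."""
    return TOY_PINYIN.get(sentence[0], sentence[0])


poem = "床前明月光，疑是地上霜。举头望明月，低头思故乡。"
sentences = split_sentences(poem)
index = build_index(sentences, first_char_key)
for key, verses in index.items():
    print(key, verses)
```

Because the values are lists, several verses that begin with the same sound end up under one key, which is what lets the game pick a random reply later.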
Next, we read the pickle file, write the game program, and run it as an exe file. For the program to compile into an exe file without errors, we need to rewrite the __init__.py file of the xpinyin module: copy all of that file's code into mypinyin.py and change the line

    data_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'Mandarin.dat')

to

    data_path = os.path.join(os.getcwd(), 'Mandarin.dat')

That completes the mypinyin.py file. Next, we write the code for the poetry chain itself (Poem_Jielong.py). The complete code is as follows:
    import pickle
    import random
    import ctypes
    from mypinyin import Pinyin

    STD_INPUT_HANDLE = -10
    STD_OUTPUT_HANDLE = -11
    STD_ERROR_HANDLE = -12

    FOREGROUND_DARKWHITE = 0x07  # dark white
    FOREGROUND_BLUE = 0x09       # blue
    FOREGROUND_GREEN = 0x0a      # green
    FOREGROUND_SKYBLUE = 0x0b    # sky blue
    FOREGROUND_RED = 0x0c        # red
    FOREGROUND_PINK = 0x0d       # pink
    FOREGROUND_YELLOW = 0x0e     # yellow
    FOREGROUND_WHITE = 0x0f      # white

    std_out_handle = ctypes.windll.kernel32.GetStdHandle(STD_OUTPUT_HANDLE)

    # Set the CMD text color
    def set_cmd_text_color(color, handle=std_out_handle):
        return ctypes.windll.kernel32.SetConsoleTextAttribute(handle, color)

    # Reset the text color to dark white
    def resetColor():
        set_cmd_text_color(FOREGROUND_DARKWHITE)

    # Print text in CMD in the given color
    def cprint(mess, color):
        color_dict = {'blue': FOREGROUND_BLUE,
                      'green': FOREGROUND_GREEN,
                      'sky blue': FOREGROUND_SKYBLUE,
                      'red': FOREGROUND_RED,
                      'pink': FOREGROUND_PINK,
                      'yellow': FOREGROUND_YELLOW,
                      'white': FOREGROUND_WHITE}
        set_cmd_text_color(color_dict[color])
        print(mess)
        resetColor()

    color_list = ['blue', 'green', 'sky blue', 'red', 'pink', 'yellow', 'white']

    # Load the dictionary
    with open('./poemDict.pk', 'rb') as f:
        poem_dict = pickle.load(f)

    pinyin = Pinyin()
    MODE_PROMPT = 'Choose MODE (1 for manual jielong, 2 for machine jielong): '

    # Pinyin of the last character, ignoring trailing sentence punctuation.
    # Stripping punctuation here (rather than always cutting the last
    # character) lets inputs without punctuation, and 'exit', work as expected.
    def tail_of(text):
        text = text.rstrip('。？！.?!')
        return pinyin.get_pinyin(text, tone_marks='marks', splitter=' ').split()[-1]

    MODE = input(MODE_PROMPT)

    while True:
        try:
            if MODE == '1':
                enter = input('Please enter a verse or a character to begin: ')
                while enter != 'exit':
                    tail = tail_of(enter)
                    if tail not in poem_dict:
                        cprint('Unable to continue the chain.', 'red')
                        break
                    cprint('Machine reply: %s' % random.choice(poem_dict[tail]), random.choice(color_list))
                    enter = input('Your reply: ')
                MODE = 0

            if MODE == '2':
                enter = input('Please enter a verse or a character to begin: ')
                for i in range(10):
                    tail = tail_of(enter)
                    if tail not in poem_dict:
                        cprint('--> Unable to continue the chain.', 'red')
                        break
                    answer = random.choice(poem_dict[tail])
                    cprint('(%d) --> %s' % (i + 1, answer), random.choice(color_list))
                    enter = answer
                print('(* At most the first 10 rounds are shown. *)')
                MODE = 0
        except Exception as err:
            print(err)
        finally:
            if MODE not in ['1', '2']:
                MODE = input(MODE_PROMPT)

The structure of the whole project is now as follows (the Mandarin.dat file is copied from the folder of the xpinyin module):
Figure: Project files
Switch to this folder and run the following command to generate the exe file:

    pyinstaller -F Poem_jielong.py

The generated file, Poem_jielong.exe, is located in the dist subfolder of this folder. For the exe to run successfully, you need to copy the poemDict.pk and Mandarin.dat files into the dist folder.
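Both data files are looked up relative to the current working directory: the game opens './poemDict.pk', and the rewritten mypinyin.py builds its path from os.getcwd(). A start-up check along the following lines can make a missing file easier to diagnose. This is only a sketch; the name missing_data_files and the REQUIRED_FILES tuple are assumptions, not part of the original program.

```python
import os

# Data files the game expects to find in the working directory.
REQUIRED_FILES = ("poemDict.pk", "Mandarin.dat")


def missing_data_files(folder=None, required=REQUIRED_FILES):
    """Return the names of required files that are not present in folder.

    folder defaults to the current working directory, which is where the
    game and the rewritten mypinyin.py look for their data.
    """
    folder = folder or os.getcwd()
    return [name for name in required
            if not os.path.isfile(os.path.join(folder, name))]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing data files in %s: %s" % (os.getcwd(), ", ".join(missing)))
    else:
        print("All data files found.")
```

If the exe is started from a different folder (for example from a command prompt opened elsewhere), the working directory changes and the same check will report the files as missing even though they sit next to the exe.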
Test run
Run the Poem_jielong.exe file. The start screen looks like this:
Figure: Exe file start page
The project offers two modes. In manual mode, you enter a verse or a single character, the computer replies with a verse, you reply in turn, and so on; following the chaining rule in your own replies is your responsibility. In machine mode, you enter a verse or a single character and the machine automatically outputs the verses that follow (at most 10). First, we test manual mode:
Figure: Manual jielong
Then we test machine mode.
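The machine mode can be sketched without the Windows console code or the pickle file. In the example below, TOY_PINYIN and TOY_INDEX are small hand-written stand-ins for the pinyin lookup and for poem_dict, and the function name chain is an assumption for illustration; it follows the same rule of matching the pinyin of the last character to the pinyin of a verse's first character, for at most 10 rounds.

```python
import random

# Trailing punctuation to ignore when looking at the last character.
PUNCTUATION = "。？！，"

# Hypothetical toy lookup: pinyin (with tone marks) for the characters used here.
TOY_PINYIN = {"光": "guāng", "丝": "sī", "思": "sī", "辉": "huī"}

# Toy index in the same shape as poem_dict: first-character pinyin -> verses.
TOY_INDEX = {
    "guāng": ["光景不待人，须臾发成丝。"],
    "sī": ["思君如满月，夜夜减清辉。"],
}


def chain(start, index, pinyin_of, max_rounds=10, rng=random):
    """Return the verses produced by chaining from start, at most max_rounds."""
    replies = []
    current = start
    for _ in range(max_rounds):
        text = current.rstrip(PUNCTUATION)
        if not text:
            break
        # Pinyin of the last character decides which verses may follow.
        tail = pinyin_of.get(text[-1])
        candidates = index.get(tail)
        if not candidates:
            break  # no verse starts with this sound, so the chain ends
        current = rng.choice(candidates)
        replies.append(current)
    return replies


for i, verse in enumerate(chain("床前明月光", TOY_INDEX, TOY_PINYIN), start=1):
    print("(%d) --> %s" % (i, verse))
```

The second step shows why the dictionary is keyed by pinyin rather than by character: 丝 and 思 are different characters with the same pronunciation, so the chain can still continue.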
Thank you for reading. That concludes "how to play a poetry chain game with Python". After working through this article, you should have a better understanding of the approach; the specifics still need to be verified in practice.