
How to Write a PDF Converter with Python

2025-04-25 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/03

This article explains how to write a PDF-to-Word converter in Python. It is a problem many people run into in practice, so the walkthrough below goes through one way to handle it, step by step.

Preface

You have probably been there: you want to convert a PDF to Word, and a payment prompt appears right in front of you.

Want a free ride without paying? Nice try.

But this blogger does not back down. Facing difficulties head-on is a traditional virtue, after all. Hence today's topic: using Python to write a small PDF-to-Word tool (based on a website's interface).

First, the approach

A quick search turns up plenty of PDF conversion tools, including online conversion sites.

Such a site exposes a trial interface, so a script can perform the conversion by simulating the requests the browser sends.

That's right, the approach really is that simple. Today's protagonist is:

https://app.xunjiepdf.com

Packet capture shows that the site works through POST requests, which we can reproduce with the requests library.
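To make "simulating a POST request" concrete, here is a minimal sketch that builds (but does not send) a form POST using only the standard library, so the request body can be inspected offline. The URL and the machineid field are the ones that appear in the article's code; everything else is illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and form field as they appear in the article's code.
url = 'https://app.xunjiepdf.com/api/producetoken'
form = {'machineid': 'ccc052ee5200088b92342303c4ea9399'}

# A form POST carries its fields in the request body, URL-encoded.
body = urlencode(form).encode('utf-8')
req = Request(
    url,
    data=body,
    headers={'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'},
    method='POST',
)

# Nothing is sent here; we only look at what would go on the wire.
print(req.get_method())
print(req.full_url)
print(req.data.decode('utf-8'))
```

With the requests library the equivalent is a single call, requests.post(url, headers=headers, data=form), which encodes the form fields in the same way.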

Note that this interface is intended only for trial use, so the number of pages it will convert is limited. For the full feature set, please support the official product.

Second, my code

As the saying goes, ten thousand programmers write ten thousand versions of the code. Below is mine, for reference only.

Import related libraries:

import time
import requests

Define the PDF2Word class:

class PDF2Word():
    def __init__(self):
        self.machineid = 'ccc052ee5200088b92342303c4ea9399'
        self.token = ''
        self.guid = ''
        self.keytag = ''

    def produceToken(self):
        """Request a token and queue GUID for this machine id."""
        url = 'https://app.xunjiepdf.com/api/producetoken'
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0',
            'Accept': 'application/json, text/javascript, */*; q=0.01',
            'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'X-Requested-With': 'XMLHttpRequest',
            'Origin': 'https://app.xunjiepdf.com',
            'Connection': 'keep-alive',
            'Referer': 'https://app.xunjiepdf.com/pdf2word/',
        }
        data = {'machineid': self.machineid}
        res = requests.post(url, headers=headers, data=data)
        res_json = res.json()
        if res_json['code'] == 10000:
            self.token = res_json['token']
            self.guid = res_json['guid']
            print('successfully obtained token')
            return True
        else:
            return False

    def uploadPDF(self, filepath):
        """Upload the PDF and remember the task key returned by the server."""
        filename = filepath.split('/')[-1]
        url = 'https://app.xunjiepdf.com/api/Upload'
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0',
            'Accept': '*/*',
            'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
            'Content-Type': 'application/pdf',
            'Origin': 'https://app.xunjiepdf.com',
            'Connection': 'keep-alive',
            'Referer': 'https://app.xunjiepdf.com/pdf2word/',
        }
        # Query parameters copied from the site's own upload request.
        # Two values were damaged in the source text and are marked below;
        # compare them against a fresh packet capture before relying on them.
        params = (
            ('tasktype', 'pdf2word'),
            ('phonenumber', ''),
            ('loginkey', ''),
            ('machineid', self.machineid),
            ('token', self.token),
            ('limitsize', '2048'),
            ('pdfname', filename),
            ('queuekey', self.guid),
            ('uploadtime', ''),
            ('filecount', '1'),
            ('fileindex', '1'),
            ('pagerange', 'all'),
            ('picturequality', ''),
            ('outputfileextension', 'docx'),
            ('picturerotate', ''),            # value lost in the source; empty string assumed
            ('filesequence', '0 undefined'),  # as printed in the source
            ('filepwd', ''),
            ('iconsize', ''),
            ('picturetoonepdf', ''),
            ('isshare', '0'),
            ('softname', 'pdfonlineconverter'),
            ('softversion', 'V5.0'),
            ('validpagescount', '20'),
            ('limituse', '1'),
            ('filespwdlist', ''),
            ('fileCountwater', '1'),
            ('languagefrom', ''),
            ('languageto', ''),
            ('cadverchose', ''),
            ('pictureforecolor', ''),
            ('picturebackcolor', ''),
            ('id', 'WU_FILE_1'),
            ('name', filename),
            ('type', 'application/pdf'),
            ('lastModifiedDate', ''),
            ('size', ''),
        )
        # Open the file in a with-block so the handle is closed after the upload.
        with open(filepath, 'rb') as f:
            files = {'file': f}
            res = requests.post(url, headers=headers, params=params, files=files)
        res_json = res.json()
        # This literal must match the server's actual 'message' text exactly;
        # the value here is the one printed in this article.
        if res_json['message'] == 'upload successfully':
            self.keytag = res_json['keytag']
            print('upload PDF successfully')
            return True
        else:
            return False

    def progress(self):
        """Ask the server whether the conversion task has finished."""
        url = 'https://app.xunjiepdf.com/api/Progress'
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0',
            'Accept': 'text/plain, */*; q=0.01',
            'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
            'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
            'X-Requested-With': 'XMLHttpRequest',
            'Origin': 'https://app.xunjiepdf.com',
            'Connection': 'keep-alive',
            'Referer': 'https://app.xunjiepdf.com/pdf2word/',
        }
        data = {
            'tasktag': self.keytag,
            'phonenumber': '',
            'loginkey': '',
            'limituse': '1',
        }
        res = requests.post(url, headers=headers, data=data)
        res_json = res.json()
        # As in uploadPDF, this literal must match the server's actual message.
        if res_json['message'] == 'processed successfully':
            print('PDF processing completed')
            return True
        else:
            print('PDF processing')
            return False

    def downloadWord(self, output):
        """Download the converted file and write it to the output path."""
        url = 'https://app.xunjiepdf.com/download/fileid/%s' % self.keytag
        res = requests.get(url)
        with open(output, 'wb') as f:
            f.write(res.content)
        print('PDF downloaded successfully ("%s")' % output)

    def convertPDF(self, filepath, outpath):
        """Run the whole pipeline: token, upload, poll, download."""
        filename = filepath.split('/')[-1]
        filename = filename.split('.')[0] + '.docx'
        self.produceToken()
        self.uploadPDF(filepath)
        # Poll once per second until the server reports completion.
        while True:
            res = self.progress()
            if res:
                break
            time.sleep(1)
        self.downloadWord(outpath + filename)
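The loop in convertPDF polls progress() once per second with no upper bound, so a task that never finishes would hang the script. One possible refinement is a polling helper with a timeout. This is a minimal sketch, not part of the original class: wait_until, its parameters, and fake_progress are hypothetical names introduced here for illustration.

```python
import time

def wait_until(check, timeout=60.0, interval=1.0):
    """Call check() until it returns True or `timeout` seconds have passed.

    Returns True if check() succeeded, False if the deadline was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Stand-in for PDF2Word.progress(): reports success on the third call.
calls = {'n': 0}

def fake_progress():
    calls['n'] += 1
    return calls['n'] >= 3

finished = wait_until(fake_progress, timeout=5.0, interval=0.01)
print(finished, calls['n'])
```

In convertPDF, the while loop could then become something like wait_until(self.progress, timeout=120), with the download attempted only if the call returns True. The same idea applies to produceToken() and uploadPDF(), whose return values the original code ignores.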

Execute the main function:

if __name__ == '__main__':
    pdf2word = PDF2Word()
    # The second argument is the output directory; '' means the current directory.
    pdf2word.convertPDF('001.pdf', '')

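Note: convertPDF takes two arguments. The first is the path of the PDF to convert, and the second is the directory in which to save the converted file. The directory is joined to the file name by plain string concatenation, so pass an empty string for the current directory or a path that ends with '/'. Run it, and with one keystroke the ".docx" file is sitting in my directory. Very satisfying ~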
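The original code derives the output name with split('/') and split('.')[0], which cuts a name such as report.v2.pdf down to report. A sturdier way to build the output path is sketched below using os.path from the standard library. The function name build_output_path is hypothetical and not part of the article's class.

```python
import os

def build_output_path(pdf_path, out_dir=''):
    """Return the .docx path inside out_dir for the given PDF path."""
    base = os.path.basename(pdf_path)      # drop any leading directories
    stem, _ext = os.path.splitext(base)    # strip only the final extension
    return os.path.join(out_dir, stem + '.docx')

print(build_output_path('001.pdf'))
print(build_output_path('docs/report.v2.pdf', 'out'))
```

Because os.path.join inserts the separator itself, the caller no longer needs to remember the trailing slash on the output directory.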
