
How to remove watermarks from pictures and PDFs with Python

2025-01-16 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02 Report

This article shows how to remove watermarks from pictures and PDFs with Python. Many people are not familiar with the technique, so I hope it serves as a useful reference and that you learn something from it.

Some PDF study materials downloaded from the Internet carry watermarks that make them much harder to read. For example, the following image was captured from such a PDF file.

Installing the modules

PIL (Python Imaging Library) is a powerful third-party library for image processing in Python, but it only supports Python 2. Volunteers therefore created Pillow, a fork of PIL that supports Python 3 and adds new features.

pip install pillow

PyMuPDF lets Python access files with the extensions .pdf, .xps, .oxps, .epub, .cbz and .fb2. It also supports many popular image formats, including multi-page TIFF images.

pip install PyMuPDF

Import the modules you need

```python
from PIL import Image
from itertools import product
import fitz
import os
```

Getting the RGB values of the picture

Removing a watermark from a PDF works on the same principle as removing one from a picture, so let's start with the picture shown above.

In a computer, colors are commonly represented as RGB values (red, green, blue): (255, 0, 0) is red, (0, 255, 0) is green, (0, 0, 255) is blue, (255, 255, 255) is white and (0, 0, 0) is black. The principle of watermark removal is to turn the watermark's color into white, (255, 255, 255).

First, get the width and height of the picture and use the itertools module to take the Cartesian product of the width and height ranges, which yields the coordinates of every pixel. The color of each pixel consists of three RGB values plus a fourth Alpha (transparency) channel. The Alpha channel is not needed here, only the RGB data.
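To make the idea concrete, here is a minimal, dependency-free sketch. It models an image as a list of rows of (R, G, B) tuples instead of a Pillow image; that representation and the function name `whiten_color` are illustrative assumptions, not part of the original article. It replaces every pixel that exactly matches a given watermark color with white.

```python
WHITE = (255, 255, 255)

def whiten_color(pixels, watermark_rgb):
    """Return a copy of `pixels` (a list of rows of (R, G, B) tuples)
    in which every pixel equal to `watermark_rgb` is replaced by white."""
    return [
        [WHITE if px == watermark_rgb else px for px in row]
        for row in pixels
    ]

# A 2x3 "image": black text, grey watermark pixels, white and red pixels
image = [
    [(0, 0, 0), (210, 210, 210), (255, 255, 255)],
    [(210, 210, 210), (0, 0, 0), (255, 0, 0)],
]
cleaned = whiten_color(image, (210, 210, 210))
print(cleaned)
# [[(0, 0, 0), (255, 255, 255), (255, 255, 255)], [(255, 255, 255), (0, 0, 0), (255, 0, 0)]]
```

Exact matching is fragile in practice, because the anti-aliased edges of a watermark are rarely one single color; a threshold on the RGB values is more forgiving.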
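The pixel walk described above can be checked without Pillow. In this sketch a hypothetical RGBA grid and a small `getpixel` helper stand in for `Image.getpixel`; it shows that `itertools.product(range(width), range(height))` yields every `(x, y)` coordinate exactly once, column by column, and that slicing `[:3]` keeps R, G and B while dropping Alpha.

```python
from itertools import product

# Hypothetical 3x2 RGBA image stored as rows: rgba[y][x] = (R, G, B, A)
rgba = [
    [(255, 0, 0, 255), (0, 255, 0, 128), (0, 0, 255, 0)],
    [(210, 210, 210, 255), (0, 0, 0, 255), (255, 255, 255, 255)],
]
height = len(rgba)
width = len(rgba[0])

def getpixel(pos):
    """Mimic Image.getpixel((x, y)) for the grid above."""
    x, y = pos
    return rgba[y][x]

coords = list(product(range(width), range(height)))
print(coords)  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]

# Keep only the first three values (R, G, B) of each pixel
rgb_values = [getpixel(pos)[:3] for pos in coords]
print(rgb_values[1])  # (210, 210, 210)
```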

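```python
def remove_img():
    image_file = input("Please enter picture address:")
    # convert("RGB") drops the Alpha channel, so each pixel is an (R, G, B) tuple
    img = Image.open(image_file).convert("RGB")
    width, height = img.size
    # product() yields every (x, y) coordinate of the picture
    for pos in product(range(width), range(height)):
        rgb = img.getpixel(pos)
        print(rgb)
```

Removing the watermark from a picture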

Use the WeChat screenshot tool to check the RGB value of the watermark pixels.

As the screenshot shows, the watermark's RGB value is (210, 210, 210). Here, a pixel is treated as part of the watermark when the sum of its R, G and B values is at least 620 (the watermark itself sums to 630), and its color is replaced with white. Finally, save the picture.
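If no screenshot tool is at hand, one alternative (not the article's method) is to count how often each color occurs and inspect the most frequent colors that are neither pure white nor pure black; a large watermark usually appears near the top. This sketch works on any iterable of (R, G, B) tuples, for example pixel data read with Pillow; the function name `candidate_watermark_colors` and the exclusion rule are assumptions for illustration.

```python
from collections import Counter

def candidate_watermark_colors(pixels, top=3):
    """Return the `top` most common colors, ignoring pure white and pure black.

    `pixels` is any iterable of (R, G, B) tuples.
    """
    counts = Counter(
        px for px in pixels
        if px not in ((255, 255, 255), (0, 0, 0))
    )
    return counts.most_common(top)

sample = (
    [(255, 255, 255)] * 50   # background
    + [(0, 0, 0)] * 20       # text
    + [(210, 210, 210)] * 15 # watermark
    + [(90, 90, 90)] * 2     # anti-aliased text edges
)
print(candidate_watermark_colors(sample))
# [((210, 210, 210), 15), ((90, 90, 90), 2)]
```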

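```python
def remove_img():
    image_file = input("Please enter picture address:")
    img = Image.open(image_file).convert("RGB")
    width, height = img.size
    for pos in product(range(width), range(height)):
        rgb = img.getpixel(pos)
        # Watermark pixels are light grey, so their R + G + B is at least 620
        if sum(rgb) >= 620:
            img.putpixel(pos, (255, 255, 255))
    # The original save path is not shown; this output file name is an assumption
    img.save(os.path.splitext(image_file)[0] + "_no_watermark.png")
```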
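The threshold works because a watermark pixel sums to 210 + 210 + 210 = 630, which is at least 620, while black text sums to 0. The sketch below isolates that test in a hypothetical helper, `is_watermark`, so the arithmetic can be checked on a few sample colors.

```python
WATERMARK_THRESHOLD = 620  # value used in the article

def is_watermark(rgb, threshold=WATERMARK_THRESHOLD):
    """Treat any pixel whose R + G + B is at least `threshold` as watermark."""
    return sum(rgb[:3]) >= threshold

for rgb in [(210, 210, 210), (205, 205, 205), (150, 150, 150), (0, 0, 0), (255, 255, 200)]:
    print(rgb, sum(rgb), is_watermark(rgb))
# (210, 210, 210) 630 True
# (205, 205, 205) 615 False
# (150, 150, 150) 450 False
# (0, 0, 0) 0 False
# (255, 255, 200) 710 True
```

The cut-off is tight: a watermark only slightly darker, such as (205, 205, 205), would need a lower threshold, and any light content such as the pale yellow (255, 255, 200) is whitened as well. Pure white (765) also passes the test, which is harmless because it is replaced by white.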

Example result:

PDF de-watermarking

PDF watermark removal works much like picture watermark removal. After opening the PDF file with PyMuPDF, render each page into a pixmap. The pixmap exposes the RGB value of each pixel, so change the RGB of the watermark pixels to (255, 255, 255) and finally save each page as a picture.

```python
def remove_pdf():
    pdf_file = input("Please enter pdf address:")
    os.makedirs("d:/pdf_images", exist_ok=True)  # make sure the output folder exists
    pdf = fitz.open(pdf_file)
    for page_num, page in enumerate(pdf):
        # Render the page into an RGB pixmap
        pixmap = page.get_pixmap()
        for x, y in product(range(pixmap.width), range(pixmap.height)):
            rgb = pixmap.pixel(x, y)
            if sum(rgb) >= 620:
                pixmap.set_pixel(x, y, (255, 255, 255))
        # pil_save() requires Pillow to be installed
        pixmap.pil_save(f"d:/pdf_images/{page_num}.png")
        print(f"{page_num} watermark removal completed")
```
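Calling set_pixel once per pixel is slow on large pages. A possible alternative is to work on a packed byte buffer of the kind PyMuPDF exposes as `Pixmap.samples` (pixels stored back to back, `n` bytes each). This sketch assumes no padding between rows and runs on a plain `bytes` object so it does not need PyMuPDF; the function name `whiten_samples` is hypothetical, and turning the result back into a pixmap is left to the PyMuPDF documentation.

```python
def whiten_samples(samples, n, threshold=620):
    """Return a new bytearray in which every pixel whose first three
    channels sum to at least `threshold` is set to white.

    `samples` holds pixels back to back, `n` bytes each (3 for RGB,
    4 for RGBA); any fourth (alpha) byte is left unchanged.
    """
    out = bytearray(samples)
    for i in range(0, len(out) - n + 1, n):
        if out[i] + out[i + 1] + out[i + 2] >= threshold:
            out[i:i + 3] = b"\xff\xff\xff"
    return out

# Two RGB pixels: a grey watermark pixel followed by a black text pixel
buf = bytes([210, 210, 210, 0, 0, 0])
print(list(whiten_samples(buf, n=3)))  # [255, 255, 255, 0, 0, 0]
```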

Example result:

Converting the pictures to PDF

When converting pictures to PDF, pay attention to the order of the pictures: numeric file names must first be converted to int and then sorted, otherwise "10.png" sorts before "2.png". Open each picture with PyMuPDF, convert it into a single-page PDF with convert_to_pdf() (called convertToPDF() in older PyMuPDF releases), and insert that page into a new PDF file.
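The sorting pitfall is easy to demonstrate: plain `sorted()` compares file names character by character, so "10.png" lands before "2.png". This small sketch uses an illustrative helper, `numeric_key`, built on `os.path.splitext`, and assumes every file name is an integer followed by an extension, like the page images written by the PDF step.

```python
import os

def numeric_key(filename):
    """Sort key for names like '10.png': the integer before the extension."""
    stem, _ext = os.path.splitext(filename)
    return int(stem)

names = ["10.png", "2.png", "0.png", "1.png"]
print(sorted(names))                   # ['0.png', '1.png', '10.png', '2.png']
print(sorted(names, key=numeric_key))  # ['0.png', '1.png', '2.png', '10.png']
```

A name that is not a plain integer, such as "cover.png", makes int() raise ValueError, so such files would need to be filtered out first.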

```python
def pic2pdf():
    pic_dir = input("Please enter the picture folder path:")
    pdf = fitz.open()  # new, empty PDF
    # Sort numerically so that "2.png" comes before "10.png"
    img_files = sorted(os.listdir(pic_dir), key=lambda x: int(x.split('.')[0]))
    for img in img_files:
        print(img)
        imgdoc = fitz.open(os.path.join(pic_dir, img))
        pdfbytes = imgdoc.convert_to_pdf()  # convertToPDF() in older PyMuPDF
        imgpdf = fitz.open("pdf", pdfbytes)
        pdf.insert_pdf(imgpdf)  # insertPDF() in older PyMuPDF
    pdf.save("d:/demo.pdf")
    pdf.close()
```

That is all for how to remove watermarks from pictures and PDFs with Python. Thank you for reading!
