How to use Python to make picture characters move

This article mainly explains how to use Python to make picture characters move. The method introduced here is simple, fast, and practical; interested readers are welcome to follow along and learn how it works.

Generating video by animating the objects in a still image is of interest in many areas, including filmmaking, photography, and e-commerce. More precisely, image animation is the task of automatically synthesizing a video by combining the appearance extracted from a source image with the motion pattern extracted from a driving video.

In recent years, deep generative models have emerged as an effective technique for image animation and video retargeting. In particular, generative adversarial networks (GANs) and variational autoencoders (VAEs) have been used to transfer facial expressions or motion patterns between human subjects in video.

According to the paper First Order Motion Model for Image Animation, within the broader task of pose transfer, Monkey-Net was the first attempt to use a self-supervised paradigm to predict keypoints that represent pose information; at test time it estimates the pose keypoints of the driving video to complete the transfer. Building on this, FOMM uses local affine transformations around neighboring keypoints to model the motion of the object, and it explicitly accounts for occluded regions, which can then be filled in by image inpainting.

Today, with the help of the source code released with the paper, we will build a model to create the character movement we need. The specific process is as follows.

Preparation before the experiment

First of all, the Python version we use is 3.6.5. The modules used are as follows:

The imageio module is used for image and video input and output.

The Matplotlib module is used for drawing.

The numpy module is used to handle matrix operations.

The Pillow library is used for image loading and processing.

The PyTorch module is used to build and train the model.

See the requirements.txt file for the complete module requirements.
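
As a quick sanity check before starting (this snippet is not part of the project's code, just a convenience), you can confirm that the main modules import correctly and print their versions:

# Quick environment check; not part of the original project code.
import imageio
import matplotlib
import numpy
import PIL
import torch

for name, module in [("imageio", imageio), ("matplotlib", matplotlib),
                     ("numpy", numpy), ("Pillow", PIL), ("torch", torch)]:
    print(name, getattr(module, "__version__", "unknown version"))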

Loading and invoking the model

Command-line arguments are defined so that the model, images, and video can be loaded flexibly.

(1) First, read in the trained model, including the logic for loading the checkpoint:

import yaml
import torch

# These modules ship with the paper's source code (the first-order-model repository).
from modules.generator import OcclusionAwareGenerator
from modules.keypoint_detector import KPDetector
from sync_batchnorm import DataParallelWithCallback


def load_checkpoints(config_path, checkpoint_path, cpu=False):
    # Read the YAML config that describes the network architecture.
    with open(config_path) as f:
        config = yaml.load(f)

    generator = OcclusionAwareGenerator(**config['model_params']['generator_params'],
                                        **config['model_params']['common_params'])
    if not cpu:
        generator.cuda()

    kp_detector = KPDetector(**config['model_params']['kp_detector_params'],
                             **config['model_params']['common_params'])
    if not cpu:
        kp_detector.cuda()

    # Load the pretrained weights for both networks.
    if cpu:
        checkpoint = torch.load(checkpoint_path, map_location=torch.device('cpu'))
    else:
        checkpoint = torch.load(checkpoint_path)
    generator.load_state_dict(checkpoint['generator'])
    kp_detector.load_state_dict(checkpoint['kp_detector'])

    if not cpu:
        generator = DataParallelWithCallback(generator)
        kp_detector = DataParallelWithCallback(kp_detector)

    generator.eval()
    kp_detector.eval()

    return generator, kp_detector
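
As a minimal usage sketch (the config and checkpoint paths below are assumptions; substitute the config file and pretrained checkpoint you actually downloaded), the function can be called like this:

# Minimal usage sketch; paths are placeholders for your local files.
generator, kp_detector = load_checkpoints(
    config_path='config/vox-256.yaml',   # assumed config from the repository
    checkpoint_path='vox-cpk.pth.tar',   # assumed pretrained checkpoint
    cpu=True)                            # set cpu=False if a CUDA GPU is available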

(2) Then use the model to generate the animation frame by frame, and find the driving-video frame whose facial landmarks best match the source image:

import numpy as np
from tqdm import tqdm
from scipy.spatial import ConvexHull

# normalize_kp comes from the repository's animate.py.
from animate import normalize_kp


def make_animation(source_image, driving_video, generator, kp_detector,
                   relative=True, adapt_movement_scale=True, cpu=False):
    with torch.no_grad():
        predictions = []
        source = torch.tensor(source_image[np.newaxis].astype(np.float32)).permute(0, 3, 1, 2)
        if not cpu:
            source = source.cuda()
        driving = torch.tensor(np.array(driving_video)[np.newaxis].astype(np.float32)).permute(0, 4, 1, 2, 3)
        kp_source = kp_detector(source)
        kp_driving_initial = kp_detector(driving[:, :, 0])

        for frame_idx in tqdm(range(driving.shape[2])):
            driving_frame = driving[:, :, frame_idx]
            if not cpu:
                driving_frame = driving_frame.cuda()
            kp_driving = kp_detector(driving_frame)
            # Normalize the driving keypoints relative to the first driving frame.
            kp_norm = normalize_kp(kp_source=kp_source, kp_driving=kp_driving,
                                   kp_driving_initial=kp_driving_initial,
                                   use_relative_movement=relative,
                                   use_relative_jacobian=relative,
                                   adapt_movement_scale=adapt_movement_scale)
            out = generator(source, kp_source=kp_source, kp_driving=kp_norm)

            predictions.append(np.transpose(out['prediction'].data.cpu().numpy(), [0, 2, 3, 1])[0])
    return predictions


def find_best_frame(source, driving, cpu=False):
    import face_alignment

    def normalize_kp(kp):
        kp = kp - kp.mean(axis=0, keepdims=True)
        area = ConvexHull(kp[:, :2]).volume
        area = np.sqrt(area)
        kp[:, :2] = kp[:, :2] / area
        return kp

    fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=True,
                                      device='cpu' if cpu else 'cuda')
    kp_source = fa.get_landmarks(255 * source)[0]
    kp_source = normalize_kp(kp_source)
    norm = float('inf')
    frame_num = 0
    # Pick the driving frame whose normalized landmarks are closest to the source.
    for i, image in tqdm(enumerate(driving)):
        kp_driving = fa.get_landmarks(255 * image)[0]
        kp_driving = normalize_kp(kp_driving)
        new_norm = (np.abs(kp_source - kp_driving) ** 2).sum()
        if new_norm < norm:
            norm = new_norm
            frame_num = i
    return frame_num

(3) Then define the command-line arguments used to load the image, the video, and so on:

from argparse import ArgumentParser

import imageio
from skimage.transform import resize
from skimage import img_as_ubyte

parser = ArgumentParser()
parser.add_argument("--config", required=True, help="path to config")
parser.add_argument("--checkpoint", default='vox-cpk.pth.tar', help="path to checkpoint to restore")
parser.add_argument("--source_image", default='sup-mat/source.png', help="path to source image")
parser.add_argument("--driving_video", default='sup-mat/source.png', help="path to driving video")
parser.add_argument("--result_video", default='result.mp4', help="path to output")
parser.add_argument("--relative", dest="relative", action="store_true",
                    help="use relative or absolute keypoint coordinates")
parser.add_argument("--adapt_scale", dest="adapt_scale", action="store_true",
                    help="adapt movement scale based on convex hull of keypoints")
parser.add_argument("--find_best_frame", dest="find_best_frame", action="store_true",
                    help="Generate from the frame that is the most aligned with source. "
                         "(Only for faces, requires face_alignment lib)")
parser.add_argument("--best_frame", dest="best_frame", type=int, default=None,
                    help="Set frame to start from.")
parser.add_argument("--cpu", dest="cpu", action="store_true", help="cpu mode.")
parser.set_defaults(relative=False)
parser.set_defaults(adapt_scale=False)

opt = parser.parse_args()

# Read the source image and the driving video.
source_image = imageio.imread(opt.source_image)
reader = imageio.get_reader(opt.driving_video)
fps = reader.get_meta_data()['fps']
driving_video = []
try:
    for im in reader:
        driving_video.append(im)
except RuntimeError:
    pass
reader.close()

# Resize everything to 256x256 and drop any alpha channel.
source_image = resize(source_image, (256, 256))[..., :3]
driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]
generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint, cpu=opt.cpu)

if opt.find_best_frame or opt.best_frame is not None:
    i = opt.best_frame if opt.best_frame is not None else find_best_frame(source_image, driving_video, cpu=opt.cpu)
    print("Best frame: " + str(i))
    # Animate forward and backward from the best-aligned frame, then stitch the results.
    driving_forward = driving_video[i:]
    driving_backward = driving_video[:(i + 1)][::-1]
    predictions_forward = make_animation(source_image, driving_forward, generator, kp_detector,
                                         relative=opt.relative, adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)
    predictions_backward = make_animation(source_image, driving_backward, generator, kp_detector,
                                          relative=opt.relative, adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)
    predictions = predictions_backward[::-1] + predictions_forward[1:]
else:
    predictions = make_animation(source_image, driving_video, generator, kp_detector,
                                 relative=opt.relative, adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)
imageio.mimsave(opt.result_video, [img_as_ubyte(frame) for frame in predictions], fps=fps)
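
Assuming the three pieces of code above are saved together as demo.py (the script name and the file paths below are assumptions, not taken from the article), a typical invocation might look like: python demo.py --config config/vox-256.yaml --checkpoint vox-cpk.pth.tar --source_image source.png --driving_video driving.mp4 --result_video result.mp4 --relative --adapt_scale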

Construction of the model

The whole training process is one of image reconstruction: the inputs are a source image and a driving image, and the output is a new image with the appearance of the source and the pose of the driving image. Since the two input images come from the same video and therefore show the same object, training amounts to reconstructing the driving image. Broadly speaking, the model is divided into two modules: a motion estimation module and an image generation module.

The main components are as follows. (1) A VGG19 network is defined to serve as the feature extractor for the perceptual loss.

The code is as follows:

import numpy as np
import torch
from torchvision import models


class Vgg19(torch.nn.Module):
    """
    Vgg19 network for perceptual loss. See Sec 3.3.
    """
    def __init__(self, requires_grad=False):
        super(Vgg19, self).__init__()
        vgg_pretrained_features = models.vgg19(pretrained=True).features
        self.slice1 = torch.nn.Sequential()
        self.slice2 = torch.nn.Sequential()
        self.slice3 = torch.nn.Sequential()
        self.slice4 = torch.nn.Sequential()
        self.slice5 = torch.nn.Sequential()
        # Split the pretrained VGG19 into five slices, one per feature level.
        for x in range(2):
            self.slice1.add_module(str(x), vgg_pretrained_features[x])
        for x in range(2, 7):
            self.slice2.add_module(str(x), vgg_pretrained_features[x])
        for x in range(7, 12):
            self.slice3.add_module(str(x), vgg_pretrained_features[x])
        for x in range(12, 21):
            self.slice4.add_module(str(x), vgg_pretrained_features[x])
        for x in range(21, 30):
            self.slice5.add_module(str(x), vgg_pretrained_features[x])

        # ImageNet normalization constants.
        self.mean = torch.nn.Parameter(data=torch.Tensor(np.array([0.485, 0.456, 0.406]).reshape((1, 3, 1, 1))),
                                       requires_grad=False)
        self.std = torch.nn.Parameter(data=torch.Tensor(np.array([0.229, 0.224, 0.225]).reshape((1, 3, 1, 1))),
                                      requires_grad=False)

        if not requires_grad:
            for param in self.parameters():
                param.requires_grad = False

    def forward(self, X):
        X = (X - self.mean) / self.std
        h_relu1 = self.slice1(X)
        h_relu2 = self.slice2(h_relu1)
        h_relu3 = self.slice3(h_relu2)
        h_relu4 = self.slice4(h_relu3)
        h_relu5 = self.slice5(h_relu4)
        out = [h_relu1, h_relu2, h_relu3, h_relu4, h_relu5]
        return out
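
To see how such a feature extractor is typically used for a perceptual loss, here is a small illustrative sketch with random dummy tensors. The plain L1 feature comparison below is a simplified stand-in for the weighted multi-scale sum used in the actual training code:

# Illustrative sketch only: compare two dummy image batches in VGG19 feature space.
vgg = Vgg19().eval()
generated = torch.rand(1, 3, 256, 256)   # dummy "generated" image
target = torch.rand(1, 3, 256, 256)      # dummy "target" image
with torch.no_grad():
    feats_gen = vgg(generated)
    feats_tgt = vgg(target)
# Sum of mean absolute feature differences across the five VGG slices.
perceptual = sum(torch.abs(g - t).mean() for g, t in zip(feats_gen, feats_tgt))
print(float(perceptual))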

(2) Create an image pyramid to compute the multi-scale (pyramid) perceptual loss:

from torch import nn

# AntiAliasInterpolation2d is defined in the repository's modules/util.py.
from modules.util import AntiAliasInterpolation2d


class ImagePyramide(torch.nn.Module):
    """
    Create image pyramide for computing pyramide perceptual loss. See Sec 3.3
    """
    def __init__(self, scales, num_channels):
        super(ImagePyramide, self).__init__()
        downs = {}
        for scale in scales:
            # Module names cannot contain '.', so e.g. 0.5 is stored as '0-5'.
            downs[str(scale).replace('.', '-')] = AntiAliasInterpolation2d(num_channels, scale)
        self.downs = nn.ModuleDict(downs)

    def forward(self, x):
        out_dict = {}
        for scale, down_module in self.downs.items():
            out_dict['prediction_' + str(scale).replace('-', '.')] = down_module(x)
        return out_dict
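
As a rough usage sketch (the scale list below is an assumed example in the spirit of the repository's training configs), the pyramid simply returns the same frame at several resolutions:

# Rough usage sketch with a dummy frame; the scales are an assumed example.
pyramid = ImagePyramide(scales=[1, 0.5, 0.25, 0.125], num_channels=3)
frame = torch.rand(1, 3, 256, 256)
for name, downsampled in pyramid(frame).items():
    print(name, tuple(downsampled.shape))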

(3) A random TPS (thin-plate spline) transformation used for the equivariance constraint:

import torch.nn.functional as F
from torch.autograd import grad

# make_coordinate_grid is defined in the repository's modules/util.py.
from modules.util import make_coordinate_grid


class Transform:
    """
    Random tps transformation for equivariance constraints. See Sec 3.3
    """
    def __init__(self, bs, **kwargs):
        noise = torch.normal(mean=0, std=kwargs['sigma_affine'] * torch.ones([bs, 2, 3]))
        self.theta = noise + torch.eye(2, 3).view(1, 2, 3)
        self.bs = bs

        if ('sigma_tps' in kwargs) and ('points_tps' in kwargs):
            self.tps = True
            self.control_points = make_coordinate_grid((kwargs['points_tps'], kwargs['points_tps']),
                                                       type=noise.type())
            self.control_points = self.control_points.unsqueeze(0)
            self.control_params = torch.normal(mean=0,
                                               std=kwargs['sigma_tps'] * torch.ones([bs, 1, kwargs['points_tps'] ** 2]))
        else:
            self.tps = False

    def transform_frame(self, frame):
        grid = make_coordinate_grid(frame.shape[2:], type=frame.type()).unsqueeze(0)
        grid = grid.view(1, frame.shape[2] * frame.shape[3], 2)
        grid = self.warp_coordinates(grid).view(self.bs, frame.shape[2], frame.shape[3], 2)
        return F.grid_sample(frame, grid, padding_mode="reflection")

    def warp_coordinates(self, coordinates):
        theta = self.theta.type(coordinates.type())
        theta = theta.unsqueeze(1)
        transformed = torch.matmul(theta[:, :, :, :2], coordinates.unsqueeze(-1)) + theta[:, :, :, 2:]
        transformed = transformed.squeeze(-1)

        if self.tps:
            control_points = self.control_points.type(coordinates.type())
            control_params = self.control_params.type(coordinates.type())
            distances = coordinates.view(coordinates.shape[0], -1, 1, 2) - control_points.view(1, 1, -1, 2)
            distances = torch.abs(distances).sum(-1)

            result = distances ** 2
            result = result * torch.log(distances + 1e-6)
            result = result * control_params
            result = result.sum(dim=2).view(self.bs, coordinates.shape[1], 1)
            transformed = transformed + result

        return transformed

    def jacobian(self, coordinates):
        new_coordinates = self.warp_coordinates(coordinates)
        grad_x = grad(new_coordinates[..., 0].sum(), coordinates, create_graph=True)
        grad_y = grad(new_coordinates[..., 1].sum(), coordinates, create_graph=True)
        jacobian = torch.cat([grad_x[0].unsqueeze(-2), grad_y[0].unsqueeze(-2)], dim=-2)
        return jacobian
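
The sketch below illustrates, conceptually, how such a transform feeds the equivariance constraint: keypoints detected on a randomly warped frame, once warped back, should match the keypoints of the original frame. It is only an outline; kp_detector is the detector loaded earlier, the frame is a dummy tensor, and the sigma/points values are assumptions.

# Conceptual sketch of the equivariance check (parameter values are assumptions).
transform = Transform(bs=1, sigma_affine=0.05, sigma_tps=0.005, points_tps=5)

frame = torch.rand(1, 3, 256, 256)                    # stand-in for a driving frame
transformed_frame = transform.transform_frame(frame)  # randomly warped copy

kp_frame = kp_detector(frame)['value']                    # keypoints of the original frame
kp_transformed = kp_detector(transformed_frame)['value']  # keypoints of the warped frame

# Warping the keypoints of the transformed frame back should recover the originals.
equivariance_value = torch.abs(kp_frame - transform.warp_coordinates(kp_transformed)).mean()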

(4) Definition of the generator: given the source image and the keypoints, the generator transforms the image according to the motion trajectories derived from the keypoints. Part of the code is as follows:

# Helper blocks come from the repository: modules/util.py and modules/dense_motion.py.
from modules.util import ResBlock2d, SameBlock2d, UpBlock2d, DownBlock2d
from modules.dense_motion import DenseMotionNetwork


class OcclusionAwareGenerator(nn.Module):
    def __init__(self, num_channels, num_kp, block_expansion, max_features, num_down_blocks,
                 num_bottleneck_blocks, estimate_occlusion_map=False, dense_motion_params=None,
                 estimate_jacobian=False):
        super(OcclusionAwareGenerator, self).__init__()

        if dense_motion_params is not None:
            self.dense_motion_network = DenseMotionNetwork(num_kp=num_kp, num_channels=num_channels,
                                                           estimate_occlusion_map=estimate_occlusion_map,
                                                           **dense_motion_params)
        else:
            self.dense_motion_network = None

        self.first = SameBlock2d(num_channels, block_expansion, kernel_size=(7, 7), padding=(3, 3))

        # Encoder: downsampling blocks.
        down_blocks = []
        for i in range(num_down_blocks):
            in_features = min(max_features, block_expansion * (2 ** i))
            out_features = min(max_features, block_expansion * (2 ** (i + 1)))
            down_blocks.append(DownBlock2d(in_features, out_features, kernel_size=(3, 3), padding=(1, 1)))
        self.down_blocks = nn.ModuleList(down_blocks)

        # Decoder: upsampling blocks.
        up_blocks = []
        for i in range(num_down_blocks):
            in_features = min(max_features, block_expansion * (2 ** (num_down_blocks - i)))
            out_features = min(max_features, block_expansion * (2 ** (num_down_blocks - i - 1)))
            up_blocks.append(UpBlock2d(in_features, out_features, kernel_size=(3, 3), padding=(1, 1)))
        self.up_blocks = nn.ModuleList(up_blocks)

        # Residual bottleneck between encoder and decoder.
        self.bottleneck = torch.nn.Sequential()
        in_features = min(max_features, block_expansion * (2 ** num_down_blocks))
        for i in range(num_bottleneck_blocks):
            self.bottleneck.add_module('r' + str(i), ResBlock2d(in_features, kernel_size=(3, 3), padding=(1, 1)))

        self.final = nn.Conv2d(block_expansion, num_channels, kernel_size=(7, 7), padding=(3, 3))
        self.estimate_occlusion_map = estimate_occlusion_map
        self.num_channels = num_channels

(5) The discriminator, which is similar to the Pix2Pix discriminator:

# Note: the DownBlock2d used here (with norm/pool/sn options) is the variant defined in the
# repository's modules/discriminator.py, and kp2gaussian comes from modules/util.py.
class Discriminator(nn.Module):
    def __init__(self, num_channels=3, block_expansion=64, num_blocks=4, max_features=512,
                 sn=False, use_kp=False, num_kp=10, kp_variance=0.01, **kwargs):
        super(Discriminator, self).__init__()

        down_blocks = []
        for i in range(num_blocks):
            down_blocks.append(
                DownBlock2d(num_channels + num_kp * use_kp if i == 0 else min(max_features, block_expansion * (2 ** i)),
                            min(max_features, block_expansion * (2 ** (i + 1))),
                            norm=(i != 0), kernel_size=4, pool=(i != num_blocks - 1), sn=sn))
        self.down_blocks = nn.ModuleList(down_blocks)
        self.conv = nn.Conv2d(self.down_blocks[-1].conv.out_channels, out_channels=1, kernel_size=1)
        if sn:
            self.conv = nn.utils.spectral_norm(self.conv)
        self.use_kp = use_kp
        self.kp_variance = kp_variance

    def forward(self, x, kp=None):
        feature_maps = []
        out = x
        if self.use_kp:
            # Optionally concatenate keypoint heatmaps to the input image.
            heatmap = kp2gaussian(kp, x.shape[2:], self.kp_variance)
            out = torch.cat([out, heatmap], dim=1)

        for down_block in self.down_blocks:
            feature_maps.append(down_block(out))
            out = feature_maps[-1]
        prediction_map = self.conv(out)
        return feature_maps, prediction_map
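
As a quick, hypothetical smoke test (it assumes the repository's modules are importable so that the discriminator's DownBlock2d is available, and it uses dummy input rather than real frames):

# Hypothetical smoke test with a dummy batch; assumes the repository's modules are on the path.
disc = Discriminator(num_channels=3, block_expansion=64, num_blocks=4, max_features=512)
feature_maps, prediction_map = disc(torch.rand(1, 3, 256, 256))
print(len(feature_maps), tuple(prediction_map.shape))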

The generated animation is saved to the output path given by --result_video (result.mp4 by default).

At this point, you should have a deeper understanding of how to use Python to make picture characters move. Why not try it out in practice yourself?
