How to realize voiceprint recognition kaldi callhome diarization 02/11 Update SLTechnology News&Howtos


2026-02-11 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article introduces how to implement voiceprint recognition, specifically speaker diarization, with Kaldi's CALLHOME diarization recipe. Many people are unsure how to do this in practice, so the editor consulted various materials and put together a simple, easy-to-follow procedure. I hope it helps. Let's go through it step by step.

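Kaldi's CALLHOME diarization recipe is designed for mixed recordings, where a single audio file contains several speakers. It clusters the speech segments in the file by speaker.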
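To illustrate what "clustering" means here: after x-vectors are extracted and PLDA-scored, the recipe's diarization/cluster.sh step groups them by agglomerative hierarchical clustering. The sketch below is a minimal average-linkage version in plain Python. It is not Kaldi's implementation, since Kaldi's agglomerative-cluster binary differs in its details, and the function name `ahc` and the toy score matrix are hypothetical. The sketch can either stop at a given number of clusters or stop once the best average similarity falls to a threshold. These two modes loosely mirror the `--reco2num-spk` and `--threshold` options of cluster.sh used in the steps below.

```python
from typing import List, Optional


def ahc(scores: List[List[float]],
        num_clusters: Optional[int] = None,
        threshold: float = 0.0) -> List[int]:
    """Average-linkage agglomerative clustering on a symmetric similarity matrix.

    scores[i][j] is the similarity of items i and j (higher = more alike).
    If num_clusters is given, merge until that many clusters remain;
    otherwise keep merging while the best average similarity exceeds threshold.
    Returns a 1-based cluster label for each item.
    """
    clusters = [[i] for i in range(len(scores))]

    def avg_sim(a: List[int], b: List[int]) -> float:
        # Average linkage: mean similarity over all cross-cluster pairs.
        return sum(scores[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        if num_clusters is not None and len(clusters) <= num_clusters:
            break
        # Find the most similar pair of clusters.
        best, pair = float("-inf"), (0, 1)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        # Threshold mode: stop when no pair is similar enough to merge.
        if num_clusters is None and best <= threshold:
            break
        x, y = pair
        clusters[x].extend(clusters[y])  # x < y, so deleting y keeps x in place
        del clusters[y]

    labels = [0] * len(scores)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels
```

For example, with `S = [[10, 5, -3, -2], [5, 10, -4, -3], [-3, -4, 10, 4], [-2, -3, 4, 10]]`, `ahc(S, num_clusters=2)` returns `[1, 1, 2, 2]`. `ahc(S, threshold=4.5)` stops earlier and returns `[1, 1, 2, 3]`, which shows that a stricter threshold leaves more speakers.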

It is worth reading the demo scripts (run.sh) of the Kaldi recipe under egs/callhome_diarization for yourself.

The individual steps are as follows:

# Speech activity detection (SAD); the segmented data directory it produces is data/ljj_seg
steps/segmentation/detect_speech_activity.sh --cmd 'run.pl' --nj 1 \
  --mfcc-config ./conf/mfcc_hires.conf \
  --extra-left-context 79 --extra-right-context 21 \
  --extra-left-context-initial 0 --extra-right-context-final 0 \
  --frames-per-chunk 150 \
  data/ljj exp/segmentation_1a/tdnn_stats_asr_sad_1a exp/mfcc_hires \
  exp/segmentation_sad_snr/nnet_tdnn_j_ljj data/ljj

# MFCC extraction for the segmented data
steps/make_mfcc.sh --mfcc-config conf/mfcc.conf --nj 1 --cmd "run.pl" \
  --write-utt2num-frames true data/ljj_seg exp/make_mfcc mfcc
utils/fix_data_dir.sh data/ljj_seg

# Sliding-window cepstral mean and variance normalization (CMVN)
local/nnet3/xvector/prepare_feats.sh --nj 1 --cmd "run.pl" \
  data/ljj_seg data/ljj_seg_cmn exp/ljj_seg_cmn
cp data/ljj_seg/segments data/ljj_seg_cmn/
utils/fix_data_dir.sh data/ljj_seg_cmn

# x-vector extraction from 1.5 s windows every 0.75 s; the features were already
# normalized above, hence --apply-cmn false
diarization/nnet3/xvector/extract_xvectors.sh --cmd "run.pl" --nj 1 \
  --window 1.5 --period 0.75 --apply-cmn false --min-segment 0.5 \
  exp/xvector_nnet_1a data/ljj_seg_cmn exp/xvectors_ljj_seg

# PLDA scoring of the x-vector pairs, using the PLDA backend in exp/xvector_nnet_1a/xvectors_callhome1
diarization/nnet3/xvector/score_plda.sh --cmd "run.pl --mem 4G" --nj 1 \
  --target-energy 0.9 exp/xvector_nnet_1a/xvectors_callhome1 \
  exp/xvectors_ljj_seg exp/xvectors_ljj_seg/plda_scores

# Clustering when the number of speakers is known: data/ljj_seg/reco2num_spk must be
# created first (one line per recording: <recording-id> <number-of-speakers>)
diarization/cluster.sh --cmd "run.pl --mem 4G" --nj 1 \
  --reco2num-spk data/ljj_seg/reco2num_spk \
  exp/xvectors_ljj_seg/plda_scores exp/xvectors_ljj_seg/plda_scores_num_speakers

# Clustering when the number of speakers is unknown: merging stops at a score threshold of 0
diarization/cluster.sh --cmd "run.pl --mem 4G" --nj 1 --threshold 0 \
  exp/xvectors_ljj_seg/plda_scores exp/xvectors_ljj_seg/plda_scores_threshold_0

Each clustering run writes an RTTM file (rttm) in its output directory. In each line, the second column is the file (recording) name, the third is the channel, the fourth is the segment start time in seconds, the fifth is the segment duration in seconds, and the eighth is the speaker label.

When the number of speakers in the file is known (two in this case):

SPEAKER 18642259056-liujinjie.wav 0 0.000 4.510 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 4.530 1.660 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 6.210 4.880 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 11.090 1.660 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 12.800 2.130 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 14.950 4.400 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 19.390 1.810 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 21.220 5.220 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 26.440 4.410 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 30.850 2.480 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 33.340 5.120 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 38.460 5.990 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 44.480 3.910 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 48.460 3.460 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 52.060 5.420 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 57.530 5.030 <NA> <NA> 1 <NA> <NA>

When the number of speakers in the file is not known (threshold 0), the output is the same, except that the segment starting at 4.530 s is assigned to a third speaker:

SPEAKER 18642259056-liujinjie.wav 0 0.000 4.510 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 4.530 1.660 <NA> <NA> 3 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 6.210 4.880 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 11.090 1.660 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 12.800 2.130 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 14.950 4.400 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 19.390 1.810 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 21.220 5.220 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 26.440 4.410 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 30.850 2.480 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 33.340 5.120 <NA> <NA> 2 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 38.460 5.990 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 44.480 3.910 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 48.460 3.460 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 52.060 5.420 <NA> <NA> 1 <NA> <NA>
SPEAKER 18642259056-liujinjie.wav 0 57.530 5.030 <NA> <NA> 1 <NA> <NA>

Finally, pydub is used to cut the voice clips out of the recording according to these timestamps and splice them together.
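The final pydub step could look like the following minimal sketch. It assumes standard RTTM lines, with the recording id in column 2, start time in column 4, duration in column 5, and speaker label in column 8. It also assumes a WAV file that pydub can read. The helper names `parse_rttm`, `speaker_intervals_ms`, and `splice_speakers` are hypothetical, as is the output naming scheme. `AudioSegment.from_wav`, millisecond slicing, `AudioSegment.empty()`, and `export()` are standard pydub calls. pydub is imported only inside `splice_speakers`, so the parsing helpers run without it.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Turn = Tuple[str, float, float, str]  # (recording id, start s, duration s, speaker)


def parse_rttm(lines: List[str]) -> List[Turn]:
    """Parse RTTM 'SPEAKER' lines into (recording, start, duration, speaker) tuples."""
    turns: List[Turn] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank, short, or non-speaker lines
        # Column 2 = recording id, 4 = start (s), 5 = duration (s), 8 = speaker label.
        turns.append((fields[1], float(fields[3]), float(fields[4]), fields[7]))
    return turns


def speaker_intervals_ms(turns: List[Turn]) -> Dict[str, List[Tuple[int, int]]]:
    """Group turns by speaker as time-ordered (start_ms, end_ms) intervals."""
    by_speaker: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for _, start, duration, speaker in sorted(turns, key=lambda t: t[1]):
        start_ms = round(start * 1000)
        end_ms = round((start + duration) * 1000)
        by_speaker[speaker].append((start_ms, end_ms))
    return dict(by_speaker)


def splice_speakers(wav_path: str,
                    intervals: Dict[str, List[Tuple[int, int]]],
                    out_prefix: str) -> List[str]:
    """Concatenate each speaker's clips into one WAV file; returns the written paths."""
    from pydub import AudioSegment  # imported lazily: only this step needs pydub

    audio = AudioSegment.from_wav(wav_path)
    written = []
    for speaker, spans in intervals.items():
        combined = AudioSegment.empty()
        for start_ms, end_ms in spans:
            combined += audio[start_ms:end_ms]  # pydub slices are in milliseconds
        out_path = f"{out_prefix}_spk{speaker}.wav"
        combined.export(out_path, format="wav")
        written.append(out_path)
    return written
```

For instance, you could read the lines of the rttm file in exp/xvectors_ljj_seg/plda_scores_num_speakers. Then call `splice_speakers(wav, speaker_intervals_ms(parse_rttm(lines)), "ljj")`, where `wav` is the path to the original recording. This writes one file per speaker label (here `ljj_spk1.wav` and `ljj_spk2.wav`).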
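The `--window 1.5 --period 0.75 --min-segment 0.5` options of extract_xvectors.sh mean that one x-vector is extracted from each 1.5 s window, with a new window every 0.75 s, so consecutive windows overlap by 50%. The simplified sketch below shows that cutting for a single speech segment. `uniform_windows` is a hypothetical name. The handling of the last, clipped window and of segments shorter than `min_segment` is our own simplification, so Kaldi's subsegmenting may treat these cases differently.

```python
from typing import List, Tuple


def uniform_windows(start: float, end: float,
                    window: float = 1.5, period: float = 0.75,
                    min_segment: float = 0.5) -> List[Tuple[float, float]]:
    """Cut one speech segment [start, end] (seconds) into overlapping windows.

    A window of `window` seconds starts every `period` seconds; the last one is
    clipped to the segment end and dropped if it is shorter than `min_segment`.
    Simplified illustration only, not Kaldi's exact subsegmenting logic.
    """
    windows: List[Tuple[float, float]] = []
    k = 0
    t = start
    # Emit full-length windows while they end strictly before the segment end.
    while t + window < end:
        windows.append((round(t, 3), round(t + window, 3)))
        k += 1
        t = start + k * period  # multiply instead of accumulating, to limit float drift
    # The final window is clipped at the segment end.
    if end - t >= min_segment:
        windows.append((round(t, 3), round(end, 3)))
    return windows
```

Consider a segment from 0.000 s to 4.510 s, like the first one in the output. For it, `uniform_windows(0.0, 4.51)` yields six windows: (0.0, 1.5), (0.75, 2.25), (1.5, 3.0), (2.25, 3.75), (3.0, 4.5), and a clipped (3.75, 4.51).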

