2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces the principles of metagenomic binning: the information it relies on and the three main strategies for applying it.
Metagenomic binning is the process of separating sequences that belong to different genomes, using genomic signatures and assembly information. The resulting bins (more precisely, strain-level clusters or strain-level taxonomic units) are likely to be genome sequences of unknown microorganisms that cannot be cultured in the laboratory, so analyzing their genomes is important [1].
To separate a single genome from a metagenome, either sequence features or assembly information can be used. The commonly available information includes:
a. genomic signatures, such as oligonucleotide frequency (usually tetranucleotide frequency), GC content, and essential single-copy genes;
b. coverage of each contig sequence;
c. k-mer abundance computed from the sequencing data;
d. co-abundance patterns across multiple samples;
e. annotation obtained by mapping sequences to reference sequences in a database, i.e. taxonomic (species-level) binning.
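Two of the genomic signatures in item (a), GC content and tetranucleotide frequency, can be computed directly from a sequence. The following is a minimal sketch in plain Python; the function names are illustrative and do not come from any particular binning tool. It counts overlapping 4-mers on the forward strand only, whereas real tools often also merge reverse-complement k-mers.

```python
from collections import Counter
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_frequencies(seq, k=4):
    """Relative frequency of every possible k-mer (256 of them for k=4).

    k-mers containing ambiguous bases such as N are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in all_kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


if __name__ == "__main__":
    contig = "ACGTACGTGGCCAATT"
    print(gc_content(contig))
    freqs = tetranucleotide_frequencies(contig)
    print(freqs["ACGT"])
```

The resulting 256-value vector, together with GC content, is the kind of per-contig feature that a clustering step can then compare between sequences.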
Depending on the sequence data used, binning strategies fall into three categories: binning on clean reads before assembly, on contigs after assembly, and on annotated genes.
1) Reads-based binning
The expected k-mer depth of a genome varies with the abundance of that microorganism in the environmental sample. Reads belonging to different genomes can therefore be clustered and separated directly by k-mer abundance. The advantage is that this approach can cluster species present at very low abundance in the metagenome and can separate phylogenetically close species. Read utilization in metagenome assembly is very low: at a sequencing volume of 5 Gb per sample, generally only about 10% of reads from environmental samples are used in the assembly, and about 30% for gut or extreme-environment samples. Reads from many species, especially low-abundance ones, are therefore never assembled, do not appear in any contig, and are wasted. Reads-based binning makes it possible to recover genomic sequencing data for these low-abundance species. In practice, LSA (Latent Strain Analysis), a reads-based binning method, can cluster species with abundance as low as 0.00001% and is highly sensitive to different strains of the same species [2].
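The principle of separating reads by k-mer abundance can be illustrated with a toy example. This is a minimal sketch and not the LSA algorithm: it counts k-mers across all reads, assigns each read the median count of its own k-mers as a depth estimate, and splits reads at a hypothetical `threshold`. The function names, the small `k`, and the threshold are assumptions chosen for illustration.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def read_depths(reads, k=4):
    """Median k-mer count per read, as a rough proxy for genome depth."""
    counts = Counter(kmer for read in reads for kmer in kmers(read, k))
    depths = []
    for read in reads:
        read_kmers = kmers(read, k)
        # Reads shorter than k have no k-mers; give them depth 0.
        depths.append(median(counts[km] for km in read_kmers) if read_kmers else 0)
    return depths


def split_by_depth(reads, k=4, threshold=2):
    """Separate reads into a high-depth and a low-depth group."""
    depths = read_depths(reads, k)
    high = [r for r, d in zip(reads, depths) if d >= threshold]
    low = [r for r, d in zip(reads, depths) if d < threshold]
    return high, low


if __name__ == "__main__":
    # Three reads from an "abundant" genome and one from a "rare" genome.
    reads = ["ACGTACGT"] * 3 + ["TTGACCAG"]
    high, low = split_by_depth(reads)
    print(len(high), len(low))
```

Real data would need a much larger `k`, error handling for sequencing errors, and more than two groups; the sketch only shows why reads from genomes of different abundance end up with different k-mer depths.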
2) Gene-based binning
After metagenome assembly and gene prediction, the predicted genes from all samples are pooled and de-replicated to obtain a non-redundant gene set. The correlation between genes is then calculated from how each gene's abundance varies across samples, and genes are clustered using that correlation. Bins obtained with this strategy are called CAGs (co-abundance gene groups), and a CAG containing more than 700 genes is called an MGS (metagenomic species). CAGs can be used for association analysis, and MGSs can be used for subsequent assembly of individual bacterial genomes [3]. The names of the bins differ with the clustering algorithm and correlation coefficient used: besides the above, there are MLGs (metagenomic linkage groups), MGCs (metagenomic clusters), and MetaOTUs (metagenomic operational taxonomic units). The criteria for taxonomic annotation of MLGs, MGCs, MGSs, and MetaOTUs also differ.
In published metagenome-wide association studies (MWAS) and multi-omics analyses, gene-based binning is the approach used in many cases, especially in disease MWAS studies [4]. Its advantages are that binning on gene abundance patterns is easy to carry out, the workflow is relatively simple and reproducible, and the demand on computing resources is relatively low.
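The co-abundance idea can be sketched as follows. This is a simplified illustration, not the published CAG/MGS procedure: it uses Pearson correlation and a greedy seed-based grouping, and the `min_r` cutoff is an assumed value. Only the 700-gene threshold for calling a group an MGS comes from the text above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-abundance signal
    return cov / sqrt(var_x * var_y)


def co_abundance_groups(profiles, min_r=0.9):
    """Greedily group genes whose profiles correlate with a group's seed gene.

    profiles maps gene name -> list of abundances, one value per sample.
    """
    groups = []  # each entry: (seed profile, list of member gene names)
    for gene, profile in profiles.items():
        for seed_profile, members in groups:
            if pearson(seed_profile, profile) >= min_r:
                members.append(gene)
                break
        else:
            groups.append((profile, [gene]))
    return [members for _, members in groups]


def label_group(genes, mgs_min_genes=700):
    """A CAG with more than 700 genes is called an MGS."""
    return "MGS" if len(genes) > mgs_min_genes else "CAG"


if __name__ == "__main__":
    profiles = {
        "gene1": [1, 2, 3, 4],
        "gene2": [2, 4, 6, 8],
        "gene3": [4, 3, 2, 1],
        "gene4": [8, 6, 4, 2],
    }
    for group in co_abundance_groups(profiles):
        print(label_group(group), group)
```

Here `gene1` and `gene2` rise together across the four samples while `gene3` and `gene4` fall together, so the sketch places them in two separate groups.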
3) Contig-based binning
After the metagenome is assembled, all reads are mapped back to the contigs to obtain contig coverage. The contigs are then clustered by combining coverage with GC content, nucleotide composition, and other information, so that contigs belonging to different genomes are separated. Contig-based binning is widely used at present, most commonly to assemble single-species genomes, and many software tools implement it [1]. It works well for high-abundance species, but there is still considerable room for improvement. For example, nucleotide composition information is not yet fully exploited. Tetranucleotide frequency is widely used and accepted because of its simplicity. However, studies have shown that k-mer abundance is also a good species-specific signal, that longer k-mers carry more information, and that homology between genes and reference genomes is a valuable signal as well; none of these has yet been integrated into automated binning software.
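The contig-based workflow (coverage from mapped reads, then clustering on coverage plus composition) can be sketched as below. This is a toy illustration under stated assumptions: coverage is approximated as total mapped bases divided by contig length, each contig is represented by the point (log10 of coverage, GC fraction), and contigs are grouped greedily by Euclidean distance to a bin's first member. The `max_dist` value and function names are hypothetical; real binners use more careful statistical models and richer composition features.

```python
from math import log10, sqrt


def contig_coverage(contig_length, mapped_read_lengths):
    """Approximate mean depth: total mapped bases / contig length."""
    return sum(mapped_read_lengths) / contig_length


def bin_contigs(features, max_dist=0.3):
    """Greedy clustering of contigs on (log10 coverage, GC fraction).

    features maps contig name -> (coverage, gc_fraction).
    A contig joins the first bin whose seed point lies within max_dist;
    otherwise it starts a new bin.
    """
    bins = []  # each entry: (seed point, list of contig names)
    for name, (coverage, gc) in features.items():
        point = (log10(coverage + 1), gc)
        for seed, members in bins:
            dist = sqrt((seed[0] - point[0]) ** 2 + (seed[1] - point[1]) ** 2)
            if dist <= max_dist:
                members.append(name)
                break
        else:
            bins.append((point, [name]))
    return [members for _, members in bins]


if __name__ == "__main__":
    # 20 reads of 150 bp mapped to a 1,000 bp contig.
    print(contig_coverage(1000, [150] * 20))

    features = {
        "contig1": (100.0, 0.40),
        "contig2": (110.0, 0.42),
        "contig3": (10.0, 0.65),
        "contig4": (9.0, 0.63),
    }
    print(bin_contigs(features))
```

Taking the logarithm of coverage keeps a tenfold abundance difference from swamping the GC term, which is one simple way to combine the two signals; it is a design choice of this sketch, not a recommendation from the source.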
Binning results are sensitive to parameter settings, yet many binning programs expose only a limited set of adjustable parameters, so manual adjustment is often needed to obtain high-quality bins.