How to read Ka and Ks

2025-04-08 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Many people who are new to Ka and Ks are unsure how to read these values. This article explains what they mean, why they are calculated the way they are, and how to interpret them.

What is Ka/Ks?

Ka/Ks is the ratio of the number of nonsynonymous substitutions per nonsynonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks).

I don't understand. Say it again?

First, suppose you are comparing a pair of homologous gene sequences from two species. Evolution usually makes the two DNA sequences differ. Because the genetic code is degenerate, some differences change the encoded amino acid (nonsynonymous changes), while others leave it unchanged because the old and new codons are synonymous (synonymous changes). By directly counting the nonsynonymous and synonymous differences between the two sequences, we can see how the sequence has changed. The next step is to make some adjustments to these counts.
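The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the function name `count_differences` is hypothetical, the input is assumed to be two aligned, gap-free, in-frame coding sequences, and codon pairs that differ at more than one position are only tallied as "multiple" because classifying them requires choosing a mutational pathway.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def count_differences(seq1, seq2):
    """Return (synonymous, nonsynonymous, multiple) codon difference counts.

    Assumes both sequences are aligned, in frame, gap-free and upper case.
    Only codon pairs differing at exactly one position are classified.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be the same length and a multiple of 3")
    syn = nonsyn = multiple = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        ndiff = sum(a != b for a, b in zip(c1, c2))
        if ndiff == 0:
            continue
        if ndiff > 1:
            multiple += 1  # pathway-dependent, left unclassified in this sketch
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn, multiple


# AAA->AAG (Lys->Lys) and TTT->TTC (Phe->Phe) are synonymous; GGT->GAT (Gly->Asp) is not.
print(count_differences("ATGAAAGGTTTT", "ATGAAGGATTTC"))  # (2, 1, 0)
```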

I understand that. But if we can already see how the sequences have changed, why do we need to make adjustments?

Because of the way the genetic code is degenerate, only about 25% of the possible point mutations in a coding sequence are synonymous. Now assume the gene is not under selection, that is, it evolves neutrally: every mutation then has the same chance of going from rare to common, unaffected by other factors. Most new mutations are lost by chance. If the population size is N and a new allele has just arisen by mutation, the probability that it becomes fixed among the 2N alleles in the population is 1/(2N) (see genetic drift for details).
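The "about 25%" figure can be checked by enumerating every single-nucleotide change from every sense codon in the standard genetic code. The sketch below assumes all such changes are equally likely, which real sequences do not satisfy exactly; the function name `synonymous_fraction` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fraction():
    """Count single-nucleotide changes from sense codons that keep the amino acid."""
    syn = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # start only from the 61 sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if CODON_TABLE[mutant] == aa:
                    syn += 1
    return syn, total


syn, total = synonymous_fraction()
print(syn, total, round(syn / total, 3))  # 134 549 0.244
```

Under the equal-probability assumption, 134 of the 549 possible changes are synonymous, which is where the rough 25% comes from.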

In that scenario every mutation has the same probability of fixation, so a nonsynonymous mutation is as likely to be fixed as a synonymous one. Therefore, under neutral evolution, once we correct for the degeneracy of the code (by dividing the substitution counts by the numbers of nonsynonymous and synonymous sites), the nonsynonymous rate should equal the synonymous rate, that is, Ka/Ks = 1. Because Ks tells us the background rate of evolution, a deviation of Ka/Ks from 1 tells us about selection acting on the protein.
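One simple way to do this per-site correction is to count how many of the three possible changes at each codon position are synonymous, in the spirit of the Nei-Gojobori approach. The sketch below is a simplified illustration under stated assumptions, not a reimplementation of any published program: the function names are hypothetical, changes to stop codons are counted as nonsynonymous, codons with more than one difference are rejected, and the returned values are uncorrected proportions (pN and pS) rather than final Ka and Ks.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    aa = CODON_TABLE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa:
                    sites += 1 / 3  # one of three possible changes at this position
    return sites


def pn_ps(seq1, seq2):
    """Return (pN, pS): differences per nonsynonymous and per synonymous site."""
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be the same length and a multiple of 3")
    # Average the site counts of the two sequences.
    s_sites = sum(synonymous_sites(c) for c in codons1 + codons2) / 2
    n_sites = 3 * len(codons1) - s_sites
    s_diff = n_diff = 0
    for c1, c2 in zip(codons1, codons2):
        ndiff = sum(a != b for a, b in zip(c1, c2))
        if ndiff > 1:
            raise ValueError("this sketch handles single-difference codons only")
        if ndiff == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                s_diff += 1
            else:
                n_diff += 1
    return n_diff / n_sites, s_diff / s_sites


pn, ps = pn_ps("ATGGCTGGTCCTAAATTT", "ATGGCCGGTCCTAGATTT")
print(round(pn, 4), round(ps, 4), round(pn / ps, 4))
```

In this toy pair there is one synonymous and one nonsynonymous difference, yet pN comes out well below pS because there are far more nonsynonymous sites than synonymous ones. That is exactly the adjustment the paragraph above describes.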

Doesn't that sound simple?

Unfortunately, it's not. For example, consider the codons for asparagine and lysine: all of them start with AA; the lysine codons end with A or G, and the asparagine codons end with T or C. So if C is more likely to mutate to T than to A or G (which is usually the case), a mutation at the third position is more likely to be synonymous. Many methods for calculating Ka/Ks take such factors into account using different substitution models, which is why the resulting Ka/Ks values can differ slightly between methods.
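To see how much a transition bias matters, the sketch below weights each third-position change by a transition/transversion rate ratio. The parameter name `kappa` and the function names are assumptions made for illustration: `kappa` is taken here as the rate of a transition relative to the rate of each single transversion, so `kappa = 1` means no bias.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (a in PURINES) == (b in PURINES)


def third_position_synonymous_fraction(codon, kappa):
    """Share of third-position mutation weight that leaves the amino acid unchanged."""
    aa = CODON_TABLE[codon]
    syn_weight = total_weight = 0.0
    for base in BASES:
        if base == codon[2]:
            continue
        weight = kappa if is_transition(codon[2], base) else 1.0
        total_weight += weight
        if CODON_TABLE[codon[:2] + base] == aa:
            syn_weight += weight
    return syn_weight / total_weight


# AAC (asparagine): C->T keeps asparagine, C->A and C->G give lysine.
print(round(third_position_synonymous_fraction("AAC", 1.0), 3))  # 0.333
print(round(third_position_synonymous_fraction("AAC", 4.0), 3))  # 0.667
```

With no bias, one third of third-position mutations in AAC are synonymous; with transitions four times as likely as each transversion, the share doubles. A method that ignores this bias would therefore undercount the expected synonymous changes.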

Do you need to consider the time after the separation of the two species?

Good question! Because sequences keep changing over time, the number of changes we observe may be smaller than the number that actually occurred. If a site starts as A and, along one branch, is replaced by C and then by T, our comparison counts only one change. Similarly, a site that looks identical in the alignment may have been substituted several times and ended up back at the original base. Fortunately, the actual divergence can be estimated from the observed divergence. However, no method solves this problem perfectly: as the number of changes grows, the alignment carries less and less information and gradually approaches saturation, at which point the estimates become unreliable. Therefore, Ka/Ks calculations tend to be more accurate for sequences separated by a smaller genetic distance.
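The article does not name a specific correction, so as an illustration here is the Jukes-Cantor formula, one of the simplest multiple-hit corrections. It assumes all nucleotide changes are equally likely and converts an observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The function name is hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Estimated substitutions per site from the observed proportion of differences.

    Assumes equal rates for all nucleotide changes (Jukes-Cantor model).
    The correction is undefined at p >= 0.75, where the sequences are saturated.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.10, 0.30, 0.60, 0.70):
    print(p, round(jukes_cantor_distance(p), 3))
```

For small p the corrected distance is close to the observed one (0.10 becomes about 0.107), but near saturation it grows very quickly (0.60 becomes about 1.207), so small errors in p turn into large errors in d. This is why closely related sequences give more reliable estimates.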

OK, I've got the Ka and Ks values. Now what?

Well, you now have a value (Ka) that measures the amount of protein evolution. Assuming that selection does not act on silent sites (sites where a mutation is synonymous), the neutral theory predicts that Ks is proportional to the mutation rate of the gene. Here is why: let μ be the neutral mutation rate per generation. Each new neutral mutation has a fixation probability of 1/(2N), but such mutations arise at a rate of 2Nμ per generation, so the rate of neutral evolution is 2Nμ × 1/(2N) = μ, and this is what Ks measures. Seen this way, the ratio of Ka to Ks also tells us how a gene is evolving. From the figure, we find that Ka is usually less than Ks, because the two species differ much less at protein-altering sites than at silent sites. That is, in most cases selection eliminates harmful mutations and keeps the protein unchanged (purifying selection).

In a few cases (typically immune system genes coevolving with parasites), we find that Ka is much larger than Ks (that is, Ka/Ks >> 1). This is strong evidence that selection has acted to change the protein (positive selection).

I see. So if Ka equals Ks, the sequence must be evolving neutrally?

It's not that simple. Neutral evolution is one possibility that cannot be ruled out. However, if part of the gene (such as one protein domain) is under positive selection while the rest is under purifying selection, the gene-wide Ka/Ks can also come out equal to 1. There are methods that take the phylogeny of the species into account and estimate Ka/Ks for each codon in the sequence (site models). It is also possible to test whether the ratio differs along a particular lineage, which would indicate that something specific happened in that lineage (branch models). These analyses can reveal more cases of positive selection and suggest further directions for analysis.
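A small arithmetic example shows how a gene-wide ratio of 1 can hide opposite kinds of selection. All numbers below are invented for illustration, and the `Region` class is hypothetical; substitution counts are treated as already corrected for multiple hits.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Substitution and site counts for one part of a gene (made-up numbers)."""
    nonsyn_subs: float
    nonsyn_sites: float
    syn_subs: float
    syn_sites: float

    def ka_ks(self):
        ka = self.nonsyn_subs / self.nonsyn_sites
        ks = self.syn_subs / self.syn_sites
        return ka / ks


def pooled(regions):
    """Treat several regions as one by summing their counts."""
    return Region(
        sum(r.nonsyn_subs for r in regions),
        sum(r.nonsyn_sites for r in regions),
        sum(r.syn_subs for r in regions),
        sum(r.syn_sites for r in regions),
    )


domain_a = Region(nonsyn_subs=60, nonsyn_sites=300, syn_subs=10, syn_sites=100)
domain_b = Region(nonsyn_subs=0, nonsyn_sites=300, syn_subs=10, syn_sites=100)

print(domain_a.ka_ks())                      # well above 1: positive selection
print(domain_b.ka_ks())                      # 0: strong purifying selection
print(pooled([domain_a, domain_b]).ka_ks())  # gene-wide ratio looks neutral
```

Here one domain has Ka/Ks of about 2 and the other has 0, yet the pooled value is 1. Site models and branch models exist precisely to separate such signals instead of averaging them away.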

How can I convert my sequence alignment into Ka/Ks values?

There are many tools to choose from; the most commonly used and convenient is MEGA. PAML, by Ziheng Yang, is a long-established and very good choice as well. HyPhy not only computes a global Ka/Ks but also supports various models such as branch-site models, and one thing I particularly like about HyPhy is that it can run multithreaded. I will show how to use these programs in later posts.
