You want to test the genetic homogeneity of your sequences given an a priori known structure (your groups). How much do your groups differ from one another? What is the probability of getting such a difference by chance?
You have a set of DNA sequences and you can define groups among them. A group can be a geographical isolate, a sample taken at a given time, or anything else you think is pertinent. You must have at least two groups, and each group must contain at least two DNA sequences.
You compute the mean number of pairwise differences within groups (Kii or Kjj) and between groups (Kij), along with the probability that Kij exceeds Kii by chance. More precisely, for two groups, you compute the probability that Ks = average(Kii, Kjj) is smaller than or equal to a random Ks (obtained by permutation). As suggested in the original article (Hudson et al., Mol Biol Evol, 1992), we actually use K*s, which is computed from log(1 + Kii) and used in place of Ks.
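The computation described above can be sketched in a few lines of Python. This is only an illustration, not the code of the script: the function names (pairwise_diff, mean_within, ks, permutation_p), the number of permutations and the random seed are all hypothetical. Two choices are assumptions on our part: the log transform is applied to the within-group mean, following the wording log(1 + Kii) above (applying it to each pairwise difference is another possible reading), and the probability is estimated as the fraction of random regroupings whose statistic is no larger than the observed one.

```python
import math
import random
from itertools import combinations


def pairwise_diff(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def mean_within(seqs):
    """Mean number of pairwise differences within one group (Kii)."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def ks(group_i, group_j, star=False):
    """Average of the two within-group means; star=True gives K*s."""
    kii, kjj = mean_within(group_i), mean_within(group_j)
    if star:
        # Assumption: log(1 + K) is applied to each within-group mean.
        kii, kjj = math.log(1 + kii), math.log(1 + kjj)
    return (kii + kjj) / 2


def permutation_p(group_i, group_j, star=False, n_perm=1000, seed=0):
    """Return (observed statistic, estimated probability).

    Assumption: the probability is the fraction of random regroupings
    whose statistic is <= the observed one.
    """
    rng = random.Random(seed)
    observed = ks(group_i, group_j, star)
    pooled = list(group_i) + list(group_j)
    n_i = len(group_i)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign sequences to the two groups at random
        if ks(pooled[:n_i], pooled[n_i:], star) <= observed + 1e-12:
            hits += 1
    return observed, hits / n_perm


# Toy data, invented for the example: two clearly separated groups.
pop1 = ["AAAA", "AAAA", "AAAT"]
pop2 = ["GGGG", "GGGG", "GGGC"]
print(permutation_p(pop1, pop2))             # Ks and its probability
print(permutation_p(pop1, pop2, star=True))  # K*s and its probability
```

With well-separated groups such as these, almost every random regrouping mixes the two kinds of sequences and inflates the within-group differences, so the estimated probability is small.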
The sequences must be in aligned FASTA format. To set the groups, we use the sequence comment (the line starting with a '>'). ALL the sequence comments must end with a semicolon (';') followed by the population name. This name must not contain any space. The name is case sensitive, which means that Pop1 is different from pop1. All sequences must have the same length (the same number of sites). They can contain indels, but positions with indels are not taken into account. The order in which the population names appear is important. The script provides a graphic visualization of the probability associated with K*s, and the groups appear in it in the order of their first appearance in the alignment, which is useful if your sequences are ordered in time or geographically. The sequences of a given subpopulation do not have to follow each other in the sequence set: the first occurrence is the one that matters for the graphical output. For example, in the picture on the right, the first occurrence of pop5 appears before the first occurrence of pop4 in the sequence set.
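To check a file against these rules before submitting it, something like the following Python sketch can help. It is not the parser used by the script: read_groups and drop_indel_columns are hypothetical names, and we assume that indels are written as '-' in the alignment.

```python
def read_groups(fasta_text):
    """Return {population: [sequences]}, keyed in order of first appearance."""
    records = []  # [population, sequence] pairs, in file order
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ";" not in line:
                raise ValueError("comment does not end with ';population': " + line)
            pop = line.rsplit(";", 1)[1]  # case is kept: Pop1 != pop1
            if not pop or " " in pop:
                raise ValueError("bad population name in: " + line)
            records.append([pop, ""])
        elif records:
            records[-1][1] += line  # sequences may span several lines
    if len({len(seq) for _, seq in records}) > 1:
        raise ValueError("sequences do not all have the same length")
    groups = {}  # a dict keeps insertion order, i.e. first occurrence
    for pop, seq in records:
        groups.setdefault(pop, []).append(seq)
    return groups


def drop_indel_columns(groups, gap="-"):
    """Remove every site where at least one sequence has an indel."""
    seqs = [s for members in groups.values() for s in members]
    keep = [i for i in range(len(seqs[0])) if all(s[i] != gap for s in seqs)]
    return {pop: ["".join(s[i] for i in keep) for s in members]
            for pop, members in groups.items()}


# Invented example: pop5 first occurs before pop4, pop1 is not contiguous.
example = """>seq1;pop1
AC-T
>seq2;pop5
ACGT
>seq3;pop4
ACGA
>seq4;pop1
ACGT
"""
groups = drop_indel_columns(read_groups(example))
print(list(groups))  # population names in order of first occurrence
```

Note that this toy file would still be rejected by the rule that each group needs at least two sequences; it only illustrates the naming and ordering conventions.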
Imagine you are studying different populations spaced in time or geographically, and these groups fall into two categories. You want to see on the graphic whether there is a difference between the two categories. A line will be drawn on the graphic, dividing the groups into the two categories. Pay attention to the order of the groups (see above). If you leave the default NoGroup value or enter a blank value, no line will be drawn.
You get two matrices comparing all groups against all groups: one matrix for Ks, the other for K*s. The upper part of each matrix holds the values and the lower part holds the probability associated with each value. In addition, you get a graphic representation of the K*s probabilities: the more yellow the square, the more similar the two groups. The name of each group is reported along with its sample size. This matters because larger samples have more power to reveal a hidden structure.
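To make the matrix layout concrete, here is a small Python sketch that fills the upper part with the values and the lower part with the probabilities. The function name combined_matrix and the numbers are invented for the illustration; they are not output of the script.

```python
def combined_matrix(names, values, probs):
    """Build one matrix: values above the diagonal, probabilities below.

    values and probs are dicts keyed by (name_a, name_b), where name_a
    comes before name_b in names.
    """
    n = len(names)
    matrix = [[None] * n for _ in range(n)]  # the diagonal stays empty
    for i in range(n):
        for j in range(i + 1, n):
            key = (names[i], names[j])
            matrix[i][j] = values[key]  # upper part: Ks (or K*s) value
            matrix[j][i] = probs[key]   # lower part: associated probability
    return matrix


# Made-up numbers, only to show where each quantity ends up.
names = ["pop1", "pop2", "pop3"]
values = {("pop1", "pop2"): 1.5, ("pop1", "pop3"): 2.0, ("pop2", "pop3"): 0.8}
probs = {("pop1", "pop2"): 0.02, ("pop1", "pop3"): 0.40, ("pop2", "pop3"): 0.75}
for name, row in zip(names, combined_matrix(names, values, probs)):
    print(name, row)
```

Reading the result, the cell in row pop1 and column pop2 holds the value for that pair, while the cell in row pop2 and column pop1 holds its probability.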
If you have a large number of long sequences, the script should take a few seconds before displaying a completion message. In the meantime, the status bar at the bottom of your browser should display a line saying "waiting for an answer".
Yes. You have to download the package (soon available) and install it on your machine as explained in the README file.
Please use your favorite mail client and drop a line to sophieb(at)abi.snv.jussieu.fr.