- Why should I run this test?
- What do I need to run this test?
- What am I testing exactly?
- What format should I use for my sequences?
- What are the results?
- Why does nothing happen when I press the "go" button?
- Is it possible to run this script on my own server?
- Where is the panic button if everything goes wrong?

You want to test for genetic homogeneity among your sequences, given an *a priori* known structure (your groups). How much do the groups differ from one another? What is the probability of getting such a difference by chance?

You need a set of __DNA__ sequences among which you can define groups.
A group can be a geographical isolate, a sample taken at a given time, or anything else you think is pertinent.
You must have at least two groups, and each group must contain at least two DNA sequences.

You are computing the mean number of pairwise differences within groups (K_{ii} or K_{jj}) and between groups (K_{ij}), along with the probability that K_{ij} exceeds K_{ii} by chance. More precisely, for two groups, you compute the probability that Ks = average(K_{ii}, K_{jj}) is smaller than or equal to a random Ks (obtained by permutation). As suggested in the original article (Hudson et al., Mol Biol Evol, 1992), we actually use K*s, which is computed like Ks but with log(1 + K_{ii}) in place of K_{ii}.
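The permutation step described above can be sketched in Python. This is a minimal illustration, not the script's actual code. All function names are hypothetical, Ks is taken as the unweighted average of the two within-group means, the log transform is applied to each pairwise difference, and the sequences are assumed to be gap-free. The direction of the comparison is also an assumption: the sketch counts the permutations whose statistic is at most the observed one, so a small result suggests that the groups differ more than expected by chance.

```python
import random
from itertools import combinations
from math import log


def pairwise_diff(a, b):
    """Number of sites at which two aligned, gap-free sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def mean_within(seqs, star=False):
    """Mean pairwise difference within one group (log(1 + d) if star)."""
    pairs = list(combinations(seqs, 2))
    total = 0.0
    for a, b in pairs:
        d = pairwise_diff(a, b)
        total += log(1 + d) if star else d
    return total / len(pairs)


def ks(group_i, group_j, star=False):
    """Assumed definition: unweighted average of the two within-group means."""
    return (mean_within(group_i, star) + mean_within(group_j, star)) / 2


def permutation_p(group_i, group_j, n_perm=1000, star=True, seed=0):
    """Fraction of random regroupings whose statistic is <= the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = ks(group_i, group_j, star)
    pooled = list(group_i) + list(group_j)
    n_i = len(group_i)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign sequences to groups at random
        if ks(pooled[:n_i], pooled[n_i:], star) <= observed:
            hits += 1
    return hits / n_perm


# Two clearly distinct toy groups (made-up sequences, for illustration only).
group_a = ["AAAA", "AAAA", "AAAT"]
group_b = ["TTTT", "TTTT", "TTTA"]
p_value = permutation_p(group_a, group_b)
```

Group sizes are preserved in every permutation: the pooled sequences are shuffled and then cut at the size of the first group.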

The sequences must be in aligned FASTA format. To set the groups, we use the sequence comment (the line starting with a '>'). ALL sequence comments must end with a semicolon (';') followed by the population name. This name must not contain spaces and is case sensitive, which means that Pop1 is different from pop1. All sequences must have the same length (the same number of sites). They can contain indels, but positions with indels won't be taken into account. The order in which the population names appear is important: because the script provides a graphic visualization of the probability associated with the K*s, the groups appear on the graphic in the order in which they first appear in the alignment, which is useful if your sequences are ordered in time or geographically. The sequences of a given subpopulation do not have to follow each other in the sequence set; the first occurrence is the one that matters for the graphical output. For example, in the picture on the right, the first occurrence of pop5 appears before the first occurrence of pop4 in the sequence set.
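As an illustration of these format rules, here is a minimal Python sketch that reads such a file and groups the sequences by population. The function name `read_groups` and the sample data are hypothetical, and this is not the parser used by the script. It also checks the rule that at least two groups of at least two sequences are required.

```python
def read_groups(fasta_text):
    """Group aligned FASTA sequences by the population name after the last ';'."""
    records = []  # one [population, list of sequence chunks] entry per sequence
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            _, sep, pop = line.rpartition(";")
            if not sep or not pop or any(c.isspace() for c in pop):
                raise ValueError("comment must end with ';' and a name: " + line)
            records.append([pop, []])
        elif records:
            records[-1][1].append(line)
        else:
            raise ValueError("sequence data found before the first comment line")

    groups = {}  # dicts keep insertion order, i.e. order of first occurrence
    for pop, chunks in records:
        groups.setdefault(pop, []).append("".join(chunks))

    lengths = {len(seq) for seqs in groups.values() for seq in seqs}
    if len(lengths) > 1:
        raise ValueError("all sequences must have the same number of sites")
    if len(groups) < 2 or any(len(seqs) < 2 for seqs in groups.values()):
        raise ValueError("need at least 2 groups of at least 2 sequences each")
    return groups


# Made-up example: names are case sensitive and groups may be interleaved.
example = """>seq1 first isolate;Pop1
ACGT
>seq2;pop1
ACGA
>seq3;Pop1
ACGT
>seq4;pop1
AC-T
"""
groups = read_groups(example)
```

Here `list(groups)` gives `['Pop1', 'pop1']`: the names differ only by case yet form two groups, listed in order of first appearance.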

Imagine you are studying different populations separated in time or geographically, and you want to see on the graphic whether there is a difference between two categories. A line will be drawn on the graphic, dividing the groups into the two categories. Pay attention to the order (see above). If you leave the default NoGroup value or enter a blank value, no line will be drawn.

You get two matrices comparing all groups against all: one matrix is for Ks, the other for K*s. The upper part of each matrix holds the values and the lower part holds the probabilities associated with those values. In addition, you get a graphic representation of the K*s probabilities. The more yellow the square, the more similar the two groups. The name of each group is reported along with its sample size. This matters because larger samples have more power to reveal a hidden structure.
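The matrix layout can be pictured with a short Python sketch. The function name `combined_matrix` and all numbers are made up for illustration, and the diagonal is left empty as an assumption, since the layout of the diagonal is not described here.

```python
def combined_matrix(names, values, probs):
    """Put values above the diagonal and their probabilities below it.

    `values` and `probs` are dicts keyed by (name_a, name_b), where name_a
    comes before name_b in `names`.
    """
    n = len(names)
    matrix = [[None] * n for _ in range(n)]  # diagonal stays None
    for i in range(n):
        for j in range(i + 1, n):
            key = (names[i], names[j])
            matrix[i][j] = values[key]  # upper part: the statistic
            matrix[j][i] = probs[key]   # lower part: its probability
    return matrix


# Illustrative numbers only, not real results.
names = ["pop1", "pop2", "pop3"]
values = {("pop1", "pop2"): 1.5, ("pop1", "pop3"): 2.0, ("pop2", "pop3"): 0.8}
probs = {("pop1", "pop2"): 0.02, ("pop1", "pop3"): 0.01, ("pop2", "pop3"): 0.4}
matrix = combined_matrix(names, values, probs)
```

With this layout, the value for pop1 versus pop2 is read at row pop1, column pop2, and its probability at row pop2, column pop1.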

If you have a large number of long sequences, the script should take a few seconds before displaying a completion message. In the meantime, the status bar at the bottom of your browser should display a line saying "waiting for an answer".

Yes. You have to download the package (soon available) and install it on your machine as explained in the README file.

Please use your favorite mail client and drop a line to sophieb(at)abi.snv.jussieu.fr