Chromosomes and other long DNA sequences contain many highly similar repeated sub-sequences. While there are efficient methods for detecting strict repeats or already characterized repeats, there is no software available that detects all approximate repeats in DNA sequences, allowing for gaps and a substitution matrix, within a statistical framework. Here, we present an implementation of a two-step method (seed detection followed by seed extension) that detects such approximate repeats. Our method is computationally efficient enough to handle large sequences and flexible enough to account for influencing factors such as sequence-composition biases, at both the seed-detection and alignment levels.
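The two-step idea (find seeds, then extend them) can be illustrated with a minimal sketch. This is not RepSeek's code: it assumes a fixed seed length k, finds only exact direct-repeat seeds (no inverted repeats), and extends each seed without gaps using a simple X-drop rule with hypothetical scores (match +1, mismatch -3, drop-off 5). RepSeek itself uses gapped alignments, a substitution matrix and statistically chosen thresholds. The function names find_seeds, xdrop_extend and seed_and_extend are hypothetical.

```python
from collections import defaultdict


def find_seeds(seq, k):
    """Return all (i, j) with i < j and seq[i:i+k] == seq[j:j+k] (exact direct repeats)."""
    index = defaultdict(list)  # k-mer -> earlier start positions
    seeds = []
    for pos in range(len(seq) - k + 1):
        kmer = seq[pos:pos + k]
        for prev in index[kmer]:
            seeds.append((prev, pos))
        index[kmer].append(pos)
    return seeds


def xdrop_extend(seq, i, j, direction, match=1, mismatch=-3, xdrop=5):
    """Ungapped X-drop extension from the column pair (i, j).

    direction=+1 walks right starting at (i, j); direction=-1 walks left
    starting at (i-1, j-1). Returns the length of the best-scoring extension
    and stops once the score falls more than `xdrop` below the best score.
    """
    score = best = best_len = 0
    step = 0
    while True:
        a = i + step if direction > 0 else i - 1 - step
        b = j + step if direction > 0 else j - 1 - step
        if min(a, b) < 0 or max(a, b) >= len(seq):
            break  # reached a sequence end
        score += match if seq[a] == seq[b] else mismatch
        step += 1
        if score > best:
            best, best_len = score, step
        elif best - score > xdrop:
            break
    return best_len


def seed_and_extend(seq, k=4, **scores):
    """Extend every seed in both directions; return (start1, start2, length) triples.

    Seeds lying in the same repeat usually extend to the same triple, so a set
    removes duplicates. Overlapping copies (e.g. tandem repeats) are not treated
    specially in this sketch.
    """
    repeats = set()
    for i, j in find_seeds(seq, k):
        left = xdrop_extend(seq, i, j, -1, **scores)
        right = xdrop_extend(seq, i + k, j + k, +1, **scores)
        repeats.add((i - left, j - left, left + k + right))
    return sorted(repeats)


if __name__ == "__main__":
    # Two 9-bp copies differing at one position (A vs T), separated by CCCC.
    print(seed_and_extend("GATTACAGGCCCCGATTTCAGG"))  # [(0, 13, 9)]
```

In the example, the seeds GATT and CAGG lie on either side of the mismatch, and extending either one across it recovers the same 9-bp approximate repeat. An exact-repeat finder would only report the shorter exact pieces.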
Please download the RepSeek user's guide, which explains in detail how RepSeek works (latest version: May 08).
If you use the software, please cite the following paper:
Achaz, Boyer, Rocha, Viari and Coissac. Repseek, a tool to retrieve approximate repeats from large DNA sequences. Bioinformatics (2006).
Download and run the Linux version
Download and run the Mac OS X version (for Intel)
To compile the software from its C sources, download the latest source archive, RepSeek.9Dec2009.tgz, and read help.README.
If you have any comments, suggestions or bug reports, or if you want to be kept informed of RepSeek updates, please contact G. Achaz.
History of releases
RepSeek was substantially modified between summer 2005 and winter 2006. First, the four modes were abandoned in favor of simpler usage. Second, the statistics on repeats were greatly improved, so the old versions should no longer be used.