The databases used by Yakusa consist of structures with varying levels of amino acid sequence identity (from 20% to 95%).
Depending on the particular database (see below), several algorithms can be used for sequence comparisons, the representative structure of each group can be chosen in several ways, and protein structures can be split into domains.
The cluster databases are derived from protein structures and their sequences. These clusters are distributed with the PDB mirror (in the pdb/derived_data/NR subdirectory). Note: nucleic acid chains and short polypeptides of fewer than 20 amino acids are not clustered.
CLUSTER 50: structures having less than 50% sequence identity to each other.
CLUSTER 70: structures having less than 70% sequence identity to each other.
CLUSTER 90: structures having less than 90% sequence identity to each other.
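
To make the cutoffs above concrete, here is a minimal sketch of how a non-redundant set could be built for a given identity threshold. It is an illustration only, not the procedure actually used to produce the PDB cluster files: the function names (percent_identity, pick_representatives) are hypothetical, the sequences are assumed to be already aligned to the same length, and the greedy "first chain wins" choice of representative is just one of several possible strategies.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: '-' marks a gap; columns where both sequences have a gap
    are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pick_representatives(chains, threshold, min_length=20):
    """Greedily keep chains that are below `threshold` % identity
    to every representative kept so far.

    `chains` is a list of (name, aligned_sequence) pairs. Chains with
    fewer than `min_length` residues are skipped, mirroring the note
    that short polypeptides are not clustered.
    """
    representatives = []
    for name, seq in chains:
        if len(seq.replace("-", "")) < min_length:
            continue  # too short to be clustered
        if all(percent_identity(seq, rep_seq) < threshold
               for _, rep_seq in representatives):
            representatives.append((name, seq))
    return [name for name, _ in representatives]


if __name__ == "__main__":
    # Toy, made-up sequences (not real PDB chains).
    chains = [
        ("a", "ACDEFGHIKLMNPQRSTVWY"),
        ("b", "ACDEFGHIKLMNPQRSTAAA"),
        ("c", "AAAAAAAAKLMNPQRSTVWY"),
        ("d", "ACDEFG"),
    ]
    for cutoff in (50, 70, 90):
        print(cutoff, pick_representatives(chains, cutoff))
```

With a lower threshold, more chains are treated as redundant, so the resulting set is smaller; this is why a 50% set would be expected to contain fewer structures than a 90% set built from the same input.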