Structural databases in use

The databases used by Yakusa are made up of structures that share varying levels of amino acid sequence identity (from 20% to 95%). Depending on the particular database (see below), several algorithms can be used for sequence comparison, the representative structure of each group can be chosen in several ways, and protein structures can be cut up into domains; therefore

ASTRAL databases

The ASTRAL databases are partially derived from protein structures and their sequences.

PDB CLUSTER50,70,90 databases

The cluster databases are derived from protein structures and their sequences. These clusters are delivered within the PDB mirror (in the pdb/derived_data/NR sub-directory). Note: nucleic acid chains and short polypeptides of fewer than 20 amino acids are not clustered.

CULLING databases (PISCES)

For example, "CULLING database 20% identity resolution 1.6A R=0.25" contains PDB structures with

Our database

In order to get accurate results and to avoid an excessively large, uninformative output, we use a non-redundant PDB, made up of protein structures which:
  1. are made up of only one PDB chain,
  2. share no more than 80% amino acid sequence identity,
  3. have more than 80 residues.
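
The three criteria above can be sketched as a simple filter. This is only an illustrative sketch, not the procedure actually used to build the database: the Structure record, the greedy first-come selection and the naive_identity function (an ungapped, position-by-position comparison standing in for a real sequence alignment) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Structure:
    """Hypothetical record: a PDB entry and the sequence of each of its chains."""
    pdb_id: str
    chains: List[str]  # one amino acid sequence per PDB chain


def naive_identity(a: str, b: str) -> float:
    """Toy stand-in for a real alignment (assumption): percentage of
    identical residues at the same positions, over the shorter sequence."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / n


def select_non_redundant(
    structures: List[Structure],
    identity: Callable[[str, str], float] = naive_identity,
    max_identity: float = 80.0,
    min_length: int = 80,
) -> List[Structure]:
    """Greedy selection (assumption): keep a structure only if it passes
    the three criteria against the structures already kept."""
    kept: List[Structure] = []
    for s in structures:
        # Criterion 1: only one PDB chain.
        if len(s.chains) != 1:
            continue
        seq = s.chains[0]
        # Criterion 3: more than min_length residues.
        if len(seq) <= min_length:
            continue
        # Criterion 2: no more than max_identity % identity with a kept structure.
        if any(identity(seq, k.chains[0]) > max_identity for k in kept):
            continue
        kept.append(s)
    return kept
```

With a greedy scheme like this, the result depends on the order in which the structures are examined; in practice the input would typically be sorted by some quality measure first, and naive_identity would be replaced by an identity computed from a proper alignment.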

