|
PROTIN (CCP4: Unsupported Program)

NAME
protin - prepare restraints file for old refmac (obsolete).

SYNOPSIS
protin xyzin foo_cycle_i.pdb dictprotn protin.dic protcounts foo_cycle_i.counts protout foo_cycle_i.protout
DESCRIPTION
PROTIN is run before the restrained maximum likelihood refinement program REFMAC to prepare an input file which contains the ideal and observed atomic parameters and details of the restraints to be applied. The program uses a dictionary of ideal protein geometry. You have a choice of three dictionaries: the default, which is based on John Priestle's dictionary with added co-factors; John's original; and a dictionary from Victor Lamzin. See DICTPROTN below. Examples of input for different cases are given below.

KEYWORDED INPUT
The various data control lines are identified by the following keywords: CHNNAM, CHNTYP, CONTACTS, DISULPHIDE, HATOM, LIST, NONX, PEPP, SPECIAL, SYMMETRY, TITLE, VDWCUT, VDWRADII, END.

DESCRIPTION OF KEYWORDS
Compulsory keywords are: CHNNAM, CHNTYP.

CHNNAM ID <chainid> CHNTYP <ichtyp> [ROFFSET <nroffset>]
This keyword must be given once for each chain in the input coordinate file. Subsidiary keywords:

CHNTYP <ichtyp> <type information>...
<ichtyp> is the chain type code (1 to NCHTYP). This must match the information on the CHNNAM input. <type information>... may be one or more of:

    NTERMinal <nres0> <resnam> <ntertyp>
    CTERMinal <creso> <resnam> <ctertyp>
    WATer | SOLvent | HOH | NON-protein
    DISULphide <ndss> <ires1> <jres1> ... <ires_ndss> <jres_ndss>
    CISPeptide <ncis> <ires1> ... <ires_ncis>
    MULPLN <nmlp> <ires1> ... <ires_nmlp>
    CARBOhydrate <ncarb> <ires1> ... <ires_ncarb>
    SECOndary <ires1> <ires2> <kode>
    SPECial RESN <ires1> <ires2> | RESN <ires1> TO <ires2> ATNAMe <atid1> <atid2> DIST <dist> <ikwt> <ibwt> SUGA <angle>

One or more CHNTYP keywords can be entered for any chain type. Examples:

    CHNTYP 1 NTERM 1 GLY 3 CTERM 21 ALA 2 MULPLN 1 143 CISPRO 1 17
    CHNTYP 1 DISUL 1 6 11 SECO 246 256 1
    CHNTYP 1 SPEC RESN 202 107 ATNA CA NE2 DIST 2.1 5 1 SUG 120. 0 0
    CHNTYP 1 NTER 1 GLY 3 CTER 21 ASN 2 DISU 1 6 11
    CHNTYP 2 WAT

Subsidiary keywords to CHNTYP:

    KODE  LABEL                      PHI     PSI
    = 1   ALPHA HELIX (3.6/13)      -64.0   -40.0
    = 2   3-10 HELIX (3.0/10)       -75.5    -4.5
    = 3   PI HELIX (4.4/16)         -57.1   -69.7
    = 4   COLLAGEN-TYPE HELIX       -64.0   145.0
    = 5   PARALLEL BETA SHEET      -119.0   113.0
    = 6   ANTI-PARALLEL BETA       -139.0   135.0
    = 7   CLASSIC BETA-BULGE 1      -95.0   -65.0
    = 8   CLASSIC BETA-BULGE 2     -130.0   150.0
    = 9   BETA-BEND I (1-4) 2       -70.0   -30.0
    = 10  BETA-BEND I (1-4) 3       -90.0    10.0
    = 11  BETA-BEND II (1-4) 2      -60.0   130.0
    = 12  BETA-BEND II (1-4) 3       80.0     0.0
    = 13  GAMMA-TURN (1-3) 1        172.0   128.0
    = 14  GAMMA-TURN (1-3) 2         68.0   -61.0
    = 15  GAMMA-TURN (1-3) 3       -131.0   162.0

Subsidiary keywords to CHNTYP SPEC:

Default PROLSQ weights are:

    Bonding length (1-2 neighbour)            0.02A
    Angle related distance (1-3 neighbour)    0.04A
    Intraplanar distance (1-4 neighbour)      0.05A
    H bond or metal coordination distance     0.05A
    Special distance:                         ????

<ibwt> gives the code number for the Bfactor restraint to be used in PROLSQ.

    Main chain bond (1-2 neighbour)     1.00A**2
    Main chain angle (1-3 neighbour)    1.50A**2
    Side chain bond                     1.00A**2
    Side chain angle                    1.50A**2
    Special                             ?????

XSIAN is the angle O5-C1-On (default 107.6)
CHIAN is the angle C2-C1-On (default 109.0)
See CCP4 newsletter 17, March 1986, article by Pete Artymiuk for further
details.

CONTACTS [NOINtermolecular] | [TRANS <ntrans>] | [ CONFormers ] | [ OCCUP <bump_occ> ] | [ NAMEd <nbump> CHNNAM <chnid1> RESNo <ires1> ... ]
NOTE: CONTACTS requires the presence of the SYMM keyword.
Specifies that van der Waals contact searches are to be done. By default, non-bonded contacts between all symmetry equivalent molecules are checked. The number of unit cell translations to be tested can be defined by TRANS. Translations up to <ntrans> unit cells will be applied (positive and negative). <ntrans> defaults to 2. To prevent symmetry checking, specify NOINtermolecular. Atoms are considered as bumping if (a) the sum of their occupancies is greater than bump_occ (default 1.2). If you wish to reset this use key word:

    ATOM 233 N HIS B 10 0 -1.988 5.291 5.945 1.00 13.55 7
    ATOM 234 CD2AHIS B 10 0 -6.528 3.345 6.769 0.50 15.87 6
    ATOM 235 NE2AHIS B 10 0 -7.263 2.526 5.971 0.50 21.09 7
    ATOM 236 CE1AHIS B 10 0 -6.574 1.869 5.099 0.50 19.35 6
    ATOM 237 ND1AHIS B 10 0 -5.338 2.215 5.293 0.50 18.85 7
    ATOM 238 CG AHIS B 10 0 -5.260 3.171 6.298 0.50 18.26 6
    ATOM 404 CD2BHIS B 10 0 -1.925 2.079 7.626 0.50 15.07 6
    ATOM 405 NE2BHIS B 10 0 -1.383 0.986 6.999 0.50 18.07 7
    ATOM 406 CE1BHIS B 10 0 -2.210 0.582 6.038 0.50 16.09 6
    ATOM 407 ND1BHIS B 10 0 -3.240 1.428 6.021 0.50 18.78 7
    ATOM 408 CG BHIS B 10 0 -3.067 2.379 7.005 0.50 16.62 6
    ....
    ATOM 656 OE2AGLU D 13 0 -0.518 4.288 0.606 0.67 18.74
    ATOM 658 OE1AGLU D 13 0 1.477 4.954 1.382 0.67 18.61
    ATOM 660 CD AGLU D 13 0 0.555 4.929 0.567 0.67 17.79
    ATOM 657 OE2BGLU D 13 0 1.114 3.390 -0.972 0.33 18.14
    ATOM 659 OE1BGLU D 13 0 0.766 3.550 1.136 0.33 18.70
    ATOM 661 CD BGLU D 13 0 0.836 4.077 0.029 0.33 17.79

DISULPHIDE <ndss> CHNNAM <chnid1> <chnjd2> <ires1> <jres1> ...
Use primary keyword DISU to specify disulphides between different chains (max MXSS given at top of output, currently 20).

HATOM <ihatom>
<ihatom> defines the required type of hydrogen atom treatment:
LIST [ ALL | SOME | FEW ]

NONX <kchn> CHNID <chnid1> <chnid2> .. <chnid_kchn> NSPA <nsp> <ires1 jres1 kode1> ... <ires_nsp jres_nsp kode_nsp> [MATRIX]
Sets up the restraints for noncrystallographic symmetry. Example: <kchn> is the number of chains in the symmetry group. Subsidiary keywords:

    KODA  Main_Chain          Side_Chain
    1     tight restraint     tight restraint
    2     tight restraint     medium restraint
    3     tight restraint     loose restraint
    4     medium restraint    medium restraint
    5     medium restraint    loose restraint
    6     loose restraint     loose restraint

    R11 R12 R13 T1
    R21 R22 R23 T2
    R31 R32 R33 T3

R and T are the rotation and translation matrices for the symmetry transformation.

PEPP <napep>
<napep> (default 5) is the number of atoms in the main chain that should be restrained to lie in the same plane (atoms of the link group):
    =4, restrain Ca, C, O, N to one plane.
    =5, restrain Ca, C, O, N, Ca to one plane (default).

SPECIAL [ CHNNAM <chnid1> <chnid2> RESNo <ires1> <ires2> ] | [ ATNAMe <atid1> <atid2> ] | [ DIST <dist ikwt ibwt>] | [ SYMM <symop> ]
Special distance restraints between atoms on different chains. (Max MXDG given at top of output, currently 100) Example: Subsidiary keywords:

    Bonding length (1-2 neighbour)            0.02A
    Angle related distance (1-3 neighbour)    0.04A
    Intraplanar distance (1-4 neighbour)      0.05A
    H bond or metal coordination distance     0.05A
    Special distance:                         ????

<ibwt> gives the code number for the Bfactor restraint to be used in PROLSQ and can be 1, 2, 3, 4 or 5.

    Main chain bond (1-2 neighbour)     1.00A**2
    Main chain angle (1-3 neighbour)    1.50A**2
    Side chain bond                     1.00A**2
    Side chain angle                    1.50A**2
    Special                             ?????

SYMMETRY <nspgrp> | <NAMSPG>
Specify the space group in International Tables style. Default is <nspgrp> = 1.

TITLE <string>
Title used on the printer output.

VDWCUT <vdwcut>
<vdwcut> is the cut-off value for looking for possible Van der Waals contacts. This saves time. (Default=5A if VDWCUT=0 or not specified.)

VDWRADII <nvdw> <type_1> <icode_1> <dvdw_1> ... <type_nvdw> <icode_nvdw> <dvdw_nvdw>
Change Van der Waals contact distances for some atom types. The default atom types and contact distances held are as follows:

    TYPE    ICODE   DVDW
    C        1      3.70
    N        2      3.10
    O        3      3.00
    S        4      3.60
    FE       5      2.40
    H        6      2.40
    CA       7      3.80
    I        8      4.30
    C_SP2    9      3.40
    OW      10      3.00

Example:
    VDWR 1 ZN 7 3.00
In the PROTIN dictionary, ICODE numbers have been assigned to standard atom TYPE names (C, N etc.). PROTIN assigns Van der Waals contact distances using these ICODEs. It is essential that you check the PROTIN dictionary to see that the atom type you are interested in has the same ICODE as you assign here. Part of dictionary:

                                                 1   26          NIC J
 -10.73511  -2.60989   1.44399              7         1                    P
  -6.68856   0.79139  -2.59833              2         2                    N1
..........................................ICODE ..........................

If you are changing the default DVDW, be careful to choose an ICODE which isn't used for other atom types in your coordinate file. So VDWR 1 ZN 7 3.00 will override the contact distance for all atom types with ICODE = 7, including for example P. If DVDW=0 then no VDW restraint is applied to atoms with this ICODE.

END
This is obligatory as the final card.

EXAMPLES

SIMPLE

    protin XYZIN $CTEST/toxd/toxd.pdb PROTOUT $SCRATCH/protout.dat PROTCOUNTS $SCRATCH/counts.dat << END-protin
    CHNNAM ID B CHNTYP 1
    CHNNAM ID W CHNTYP 2
    !CHNTYP NTER=N-terminal resid type;CTER=C-terminal resid type
    CHNTYP 1 NTER 1 GLN 3 CTER 59 GLY 2
    CHNTYP 2 WAT
    PEPP 4
    SYMM 19
    VDWRadii 1 CA 7 3.8
    VDWCUT 5
    CONTACTS
    END
    END-protin

WITH CARBOHYDRATE
There are 4 copies of the protein chain, and 4 solvent chains. The protein chains start at residue 24 and end at 408. There are two carbohydrate groups (the NAG residues 410, 411) attached to the one glycosylated residue 315. The NAG linkage is beta(1-4).

    protin << eof
    TITLE REFINE OVALBUMIN
    CHNNAM ID A CHNTYP 1
    CHNNAM ID B CHNTYP 1
    CHNNAM ID C CHNTYP 1
    CHNNAM ID D CHNTYP 1
    CHNNAM ID E CHNTYP 2
    CHNNAM ID F CHNTYP 2
    CHNNAM ID G CHNTYP 2
    CHNNAM ID H CHNTYP 2
    CHNTYP 1 NTERM 24 GLY 5 CTERM 408 PRO 2 DISUL 1 96 143 CARBO 1 315
    CHNTYP 1 SPEC RESN 410 315 ATNA C1 ND2 DIST 1.4 1 3 SUGA -120
    CHNTYP 1 SPEC RESN 411 410 ATNA C1 O4 DIST 1.4 1 3 SUGA -114
    CHNTYP 2 NTERM 0 HOH 3 CTERM 0 HOH 2
    PEPP 4
    VDWRadii 1 CU 8 0.3
    VDWCUT 5
    CONTACTS NOINTER
    SYMM 1
    END
    eof

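Both examples above use VDWRADII, and the keyword description warns that the override is keyed on ICODE rather than on the atom name. A minimal Python sketch of that behaviour follows. It is only an illustration: the function names `apply_vdwradii` and `contact_distance` are hypothetical and are not part of PROTIN; the default table is copied from the VDWRADII description.

```python
# Default contact distances keyed by ICODE, copied from the VDWRADII table.
DEFAULT_VDW = {
    1: ("C", 3.70), 2: ("N", 3.10), 3: ("O", 3.00), 4: ("S", 3.60),
    5: ("FE", 2.40), 6: ("H", 2.40), 7: ("CA", 3.80), 8: ("I", 4.30),
    9: ("C_SP2", 3.40), 10: ("OW", 3.00),
}


def apply_vdwradii(table, overrides):
    """Return a copy of `table` with (type, icode, dvdw) overrides applied.

    Hypothetical helper illustrating the documented rule: the distance is
    stored per ICODE, so an override replaces it for every atom type that
    shares that ICODE.
    """
    updated = dict(table)
    for atom_type, icode, dvdw in overrides:
        if icode not in updated:
            raise ValueError(f"ICODE {icode} is outside the range 1 to 10")
        updated[icode] = (atom_type, dvdw)
    return updated


def contact_distance(table, icode):
    """Look up the contact distance for an atom by its ICODE only."""
    return table[icode][1]


# Equivalent of the keyword line: VDWR 1 ZN 7 3.00
table = apply_vdwradii(DEFAULT_VDW, [("ZN", 7, 3.00)])

# The P atom of NIC carries ICODE 7 in the dictionary excerpt, so it now
# receives the ZN distance as well.
p_distance = contact_distance(table, 7)
```

Because the lookup never sees the atom name, choosing an ICODE that is unused by the other atoms in the coordinate file is the only way to confine the change to one atom type.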
WITH NONCRYSTALLOGRAPHIC SYMMETRY
There are 2 identical protein chains and 1 solvent chain. The N terminus is an acetylated ALA. Residue 401 in each protein is a multiplanar group. The distance between atoms N7N and OG of residues 401 and 160 within the same chain is restrained. Noncrystallographic symmetry between the protein chains is defined and 2 residue spans have reduced restraint.

    protin XYZIN bin5.pdb DICTPROTN ideal_nad.dat PROTOUT protout.dat PROTCOUNTS counts.dat << END-protin
    TITLE restrain LDH/NADH dimer
    CHNNAM ID A CHNTYP 1
    CHNNAM ID B CHNTYP 1
    CHNNAM CHNTYP 2
    CHNTYP 1 NTERM 1 ALA 5 CTERM 331 PHE 2
    CHNTYP 1 MULPLN 1 401 CISPRO 1 138 SECO 246 256 1
    CHNTYP 1 SECO 246 256 1 SPEC RESN 401 160 ATNA N7N OG DIST 4.5 1 3
    CHNTYP 2 WAT
    CONTACTS TRANS 1
    LIST FEW
    NONX 2 CHNID A B NSPANS 2 72 107 2 197 219 4
    PEPP 4
    SYMM 18
    VDWRadii 1 CA 8 3.8
    VDWCUT 5
    END
    END-protin
    #

WITH RNA
There are 4 protein chains, each linked with an RNA chain. This example includes both protein and single stranded RNA. Links are defined for the RNA (or DNA) sugar-phosphate backbone.

    protin XYZIN coords.pdb \
    PROTOUT hktmp.protout PROTCOUNTS hktmp.counts \
    DICTPROTN ${saved}/protin_vl_rna.idl << EOF-protin
    TITLE U1A/RNA 1.92A cycle 391
    SYMMETRY P6522
    CHNNAME ID A CHNTYP 1 ROFFSET 0
    CHNNAME ID B CHNTYP 1 ROFFSET 0
    CHNNAME ID C CHNTYP 1 ROFFSET 0
    CHNNAME ID P CHNTYP 2 ROFFSET 0 # RNA
    CHNNAME ID Q CHNTYP 2 ROFFSET 0 # RNA
    CHNNAME ID R CHNTYP 2 ROFFSET 0 # RNA
    CHNNAME ID X CHNTYP 3 ROFFSET 0 # Cl ion + glycerol
    CHNNAME ID U CHNTYP 3 ROFFSET 0 # waters
    CHNNAME ID V CHNTYP 3 ROFFSET 0
    CHNNAME ID W CHNTYP 3 ROFFSET 0
    CHNNAME ID Y CHNTYP 3 ROFFSET 0
    CHNTYP 1 NTER 1 SER 3 CTER 99 SER 2
    CHNTYP 2 WAT # or NONprotein
    CHNTYP 3 WAT
    LIST FEW
    PEPP 5
    CHNTYP 2 SPEC ATNAM O3' P RESNO 1 TO 21 DIST 1.61 1 1
    CHNTYP 2 SPEC ATNAM C3' P RESNO 1 TO 21 DIST 2.61 2 2
    CHNTYP 2 SPEC ATNAM O3' O1P RESNO 1 TO 21 DIST 2.53 2 2
    CHNTYP 2 SPEC ATNAM O3' O2P RESNO 1 TO 21 DIST 2.53 2 2
    CHNTYP 2 SPEC ATNAM O3' O5' RESNO 1 TO 21 DIST 2.53 2 2
    END
    EOF-protin
    #

INPUT AND OUTPUT FILES

Input Files

    CRYST1 a b c alpha beta gamma      (a6,3f9.3,3f7.2)
    SCALE1 .....................
    SCALE2 .....................
    SCALE3 .....................

    ....
    ATOM 94 O TYR A 14 15.430 34.659 33.979 1.00 20.98 2
    ATOM 95 CB ATYR A 14 16.476 33.812 37.212 0.50 22.22 2
    ATOM 96 CB BTYR A 14 16.502 33.908 37.229 0.50 19.09 2
    .....

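The CRYST1 layout quoted above, (a6,3f9.3,3f7.2), is a fixed-column Fortran format: a 6-character record name, three 9-column reals for the cell edges and three 7-column reals for the cell angles. The Python sketch below reads such a record by slicing on those widths. The function name `parse_cryst1` and the cell values in the example are made up for illustration and do not come from PROTIN or from any file mentioned here.

```python
def parse_cryst1(line):
    """Parse a CRYST1 record laid out as (a6,3f9.3,3f7.2).

    Hypothetical helper: fields are taken by column position, as a Fortran
    formatted read would, rather than by splitting on whitespace.
    """
    if line[:6] != "CRYST1":
        raise ValueError("not a CRYST1 record")
    widths = [9, 9, 9, 7, 7, 7]  # 3f9.3 followed by 3f7.2
    values = []
    pos = 6  # skip the a6 record name
    for width in widths:
        values.append(float(line[pos:pos + width]))
        pos += width
    names = ["a", "b", "c", "alpha", "beta", "gamma"]
    return dict(zip(names, values))


# Illustrative record built with the same widths (cell values are invented).
example = "CRYST1" + "%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f" % (
    73.58, 38.73, 23.19, 90.0, 90.0, 90.0)
cell = parse_cryst1(example)
```

Slicing by column keeps the parse correct even when adjacent fields touch with no separating space, which whitespace splitting would not handle.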
The default dictionary $CLIBD/protin.dic was based on John Priestle's, using the Engh and Huber parameters [Acta Cryst. A47, 392-400, 1991]. There is also an RNA/DNA dictionary comprising five residues ADE, CYT, GUA, THY and URA plus other co-factors. A list is given below:

    ACO   Acetyl - CoA
    ADE   Rna Adenine ( =DNA + O2' )
    AP    Adenine Ribose Phosphate of CoA
    ATP   Adenosine Triphosphate
    BME
    CLM   chloramphenicol
    COA   Co-enzyme A
    CYT   Rna Cytosine ( =DNA + O2' )
    DFP   dfp ???
    FUC   fucose
    GAL   galactose
    GLC   glucose
    GMP   guanosine monophosphate - all 3 possible mono phosphates included
    GUA   Rna Guanine ( =DNA + O2' )
    HEM   Haem
    IPH   4-iodophenol
    IND
    INT   Transition state intermediate in CAT
    ISQ   Isoquialine
    MAN   mannose
    MCR   M-cresol
    MPB   Methyl paraben In insulin
    MTX   methotrexate
    NAD   Nicotinamide adenine dinucleotide
    NAG   N-acetyl Glucosamine
    NIC   nicotinamide - part of NAD
    OXA   oxalic acid
    PAA   Phenyl acetic acid
    PMS   Phenyl methyl Sulphonyl
    RES   resorcinol In insulin
    RPH   Resorcinol
    SIA   sialic acid
    SCN   Thiocyanate
    SUL   SO4
    THY   Dna Thymine
    TMP   trimethoprim - bacterial DHFR inhibitor
    URA   Rna Uracil ( =DNA + O2' )
    XYL   xylose

Output Files
NOTES

    Number  Type
    2       C-Terminal
    3       N-Amino
    4       N-Formyl
    5       N-Acetyl

    Number Type Slc     Number Type Slc
     1     ALA  A        13    MET  M
     2     ARG  R        14    PHE  F
     3     ASN  N        15    PRO  P
     4     ASP  D        16    SER  S
     5     CYS  C        17    THR  T
     6     GLN  Q        18    TRP  W
     7     GLU  E        19    TYR  Y
     8     GLY  G        20    VAL  V
     9     HIS  H        21    HEM  X
    10     ILE  I        22    WAT  O  Water
    11     LEU  L        23    SUL  U  Sulphate
    12     LYS  K

e.g.
    ATOM ... CE1A .. TYR ...............x y z 0.4 b
    ATOM ... CE1B .. TYR ...............x y z 0.6 b

PRINTER OUTPUT
The printer output is divided into a number of sections as indicated below. In the sections of distances, only distances that deviate by more than 0.2 Angstroms from the ideal values are listed if LIST FEW is given. If LIST SOME is given then the control data, the dictionary of the standard groups (the first part of the dictionary listing) and the section giving the summary counts are output. With LIST ALL, all of the following is output.

a) Details of the input control data.

b) Details of the ideal protein dictionary under the following headings:
    Dictionary of Standard Groups.
    Dictionary of Distance Restraints.
    Dictionary of Planar Groups.
    Dictionary of Chiral Centres.
    Dictionary of Potential Contact Restraints.
    Dictionary of Conformation Torsional Angles.
    Dictionary of Secondary Structure.

c) List of the atomic coordinates. Missing atoms for standard groups are listed at the end of this section. The items listed for each atom are:
    The atom sequence number (following the input order).
    The atom name.
    The chain number.
    The residue name.
    The residue number.
    The atom type number.
    The atom number within the residue.
    X, Y, Z, B, OCC.
    The chain identifier.
    The single letter amino acid code.
    The atom label (residue no. + atom name + chain identifier) for PROLSQ.

d) Table of interatomic distances giving ideal and model values. Those distances deviating by more than 0.2 Angstroms from the ideal distance are flagged with an asterisk. The items listed for each distance are as follows:
    The distance sequence number. (The position in the list of distances).
    The sequence number of the first atom.
    The sequence number of the second atom.
    The residue number of the first atom (with chain offset).
    The label of the first atom (residue no. + atom name).
    The residue number of the second atom.
    The label of the second atom (residue no. + atom name).
    The ideal distance.
    The observed (model) distance.
    The distance type code KDWT (See Appendix A Card 2b.)
    The distance type code KBWT (See Appendix A Card 2b.)
    blank or *
The distances are listed for the following categories:
    Intra-residue distances.
    Inter-residue or link distances.
    Special distances including Disulphides.
    Externally defined distances.

e) Listing of planar groups. This includes terminal group, link group and side chain planar groups followed by planes within a special group, e.g. a Haem group. The items given are:
f) Table of chiral centres. the items given are: The chiral centre sequence number. The residue name. The residue number. The 4 atoms around the chiral centre (residue number + atom name). The six distance codes. The ideal chiral volume. The model chiral volume. g) Possible Van-der-Waals contacts for Intra-residue and Link contacts, Inter-residue contacts and possible Hydrogen Bonds. The items given are: The Van-der-Waals contact sequence number. The sequence number of the first atom. The sequence number of the second atom. The residue number of the first atom + atom name. The residue number of the second atom + atom name. The ideal Van-der-Waals contact distance. The observed distance. The distance type code (See Appendix A.). * if the observed is too close. h) Conformational torsion angles (Preceded by details of any secondary structural elements defined.) The items given are:
i) Non-crystallographic symmetry. This gives for each symmetry group
(maximum 2): j) The thermal ellipsoid specifiers. These are only listed if LIST ALL
is used. The items given for each atom are: k) The summary counts for the following items: The number of atoms (NA). The number of distances (NDIS). The number of planes (NPLN). The number of chiral centres (NCHR). The number of possible contacts (NVDW). The number of present contacts. The number of torsion angles (NTOR). The number of group 1 symmetry equivalences (NSYM1). The number of group 2 symmetry equivalences (NSYM2). The number of thermal ellipsoid specifiers. The number of variable occupancy factors. ERROR MESSAGESErrors in control data A syntax error in a numerical field of a data control card will give
the following error message and the program will stop. Errors when reading dictionary file b) Planes c) Chiral centres d) Contacts e) Torsion angles Errors in reading atoms and preparing the output file If a chain identifier found in the input file was not defined in the
control data the following message will be printed. An unidentified residue type will give the following message and the
program will stop. An unidentified atom name will give the following message and the atom
will be omitted though the program will continue. The following message will be printed if missing atoms cause problems
in the definition of a plane. REFERENCES
AUTHORS
Authorship: W.A. Hendrickson and J.H. Konnert. Modifications for this version, including the conversion to FORTRAN 77, have been made by W. Pulford (Oxford), E.J. Dodson (York) and J.W. Campbell (Daresbury). The documentation for Daresbury was prepared by J.W. Campbell and was based on existing documentation by A. Sielecki, A. Wlodawer and W. Hendrickson.

PROGRAM FUNCTION AND STRUCTURE
PROTIN is run prior to running the restrained maximum likelihood protein refinement program REFMAC (see refs. 1, 2 and 3). The program reads in a standard dictionary of protein geometry and then reads in an input coordinate file and compares the observed geometry with the ideal geometry. A file (PROTOUT) of data for use by REFMAC is created containing the following sections of data. These are in the order indicated, though not all sections need be present.

    Atom list
    List of distances
    List of planar groups
    List of chiral centres
    List of possible contacts
    List of torsion angles
    List of non-crystallographic symmetry data
    List of thermal ellipsoids

The main control of the program is divided amongst 6 subroutines PROT1 to PROT6 which are called in sequence from a jiffy main program. The main functions of these subroutines are outlined below:
DATA FORMATSFormat of PROTOUTThis is an unformatted sequential file divided into a number of sections as described below. These are present in the file in the order described.

    word 1          (Integer)      Atom number.
    word 2          (Character*9)  Atom label (residue code, residue no., atom ID, chain ID).
    word 3          (Integer)      Atom type number (1=C, 2=N etc. See Appendix A).
    word 4          (Real)         Fractional coordinate X.
    word 5          (Real)         Fractional coordinate Y.
    word 6          (Real)         Fractional coordinate Z.
    word 7          (Real)         Temperature factor.
    word 8          (Real)         Occupancy.

    Intra-residue distances.
    Inter-residue or Link distances.
    Special distances including disulphides.
    Externally defined distances.

The format of the records is as follows:

    word 1          (Integer)      Distance sequence number.
    word 2          (Integer)      Sequence no. of the first atom.
    word 3          (Integer)      Sequence no. of the second atom.
    word 4          (Real)         Ideal distance in Angstroms.
    word 5          (Integer)      Distance type code KDWT (or LDWT)
    word 6          (Integer)      Distance type code KBWT (or LBWT)

    word 1          (Integer)      The plane sequence number.
    word 2          (Integer)      The number of atoms in the plane (NA).
    word 3 to NA+2  (Integer)      The NA sequence numbers of the atoms in the plane.
    word NA+3       (Integer)      The number of bonded pairs (NP).
    word NA+4 to NA+2*NP+3 (Integer) The NP pairs of bonded pair atom sequence number codes.

    word 1          (Integer)      The chiral centre sequence number.
    word 2 to 5     (Integer)      The sequence numbers of the 4 atoms around the chiral centre.
    word 6 to 11    (Integer)      A list of the 6 distance sequence number codes.
    word 12         (Real)         The ideal chiral volume.

    word 1          (Integer)      The Van der Waals contact sequence number.
    word 2          (Integer)      The sequence number of the 1st atom.
    word 3          (Integer)      The sequence number of the 2nd atom.
    word 4          (Real)         The allowed contact distance in Angstroms.
    word 5          (Integer)      The contact distance type code KTYP.

    word 1          (Integer)      The torsion angle sequence number.
    word 2          (Integer)      The residue number.
    word 3          (Integer)      The angle type, 1=Phi, 2=Psi, 3=omega, (>=4)=Chi.
    word 4 to 7     (Integer)      The sequence numbers of the 4 atoms defining the torsion angle.
    word 5 to 10    (Integer)      The six distance sequence number codes.
    word 11         (Integer)      The weighting code.
    word 12         (Real)         The ideal angle in degrees.

(i) Chains in symmetry group records (NCHN+2 words in length).

    word 1          (Integer)      The number of chains in the symmetry group (NCHN).
    word 2          (Integer)      KNOWNR =1, symmetry matrices known, =0, not known.
    word 3 to NCHN+2 (Integer)     The chain numbers of the NCHN chains.

(ii) NCHN records of transformation matrices (12 words). These are only present if KNOWNR=1.

(iii) Atom equivalence records (NCHN+2 words in length).

    word 1          (Integer)      The atom equivalence sequence number.
    word 2          (Integer)      The weighting code specification.
    word 3 to NCHN+2 (Integer)     The NCHN atom sequence numbers for this equivalence.

Format of PROTCOUNTS
This contains 11 integer values as follows:

    word 1   NOATOM   The number of atoms.
    word 2   NODIST   The number of distances.
    word 3   NOPLAN   The number of planes.
    word 4   NOCHRL   The number of chiral centres.
    word 5   NOVDW    The number of possible contacts.
    word 6   NONOW    The number of current contacts.
    word 7   NOTOR    The number of torsion angles.
    word 8   NSYMM1   The number of group1 equivalences.
    word 9   NSYMM2   The number of group2 equivalences.
    word 10  NAXES    The number of ellipsoid specifiers.
    word 11  NOCC     The number of variable occupancies.

APPENDIX A

Setting up an Ideal Parameters Dictionary for PROTIN
This section describes the way in which a standard groups dictionary is set up. Three standard dictionaries are available for proteins without defined hydrogen atoms (referred to as IDEALS). Certain problems will require user modified dictionaries. The dictionary is set up as a card image file.

Data Cards 1 Standard Groups Definition
For each type of group defined, there is a header card 1a followed by a set of atom definition cards 1b. Cards 1a and 1b are distinguished by the value of the item IN1.

Card 1a. Name card (45X,2I5,10X,A4,A1)
    IN1 IN2 IN3 IN4

If IN1 >= 1, Indicates a Name card. If IN2<0 then IN1 takes the following values for the following cases.
    1 trans-peptide
    2 cis-peptide
    3 trans-proline
    4 cis-proline
    5 disulphide bridge
If IN2>=1 then IN1=2 flags the last amino acid side chain in the list of standard groups.
If IN1 = -1, Indicates the end of data cards 1 (Also set IN3=END)
If IN2 > 0, The residue or group identification number (i.e. 1 for ALA, 2 for ARG etc.)
If IN2 = 0, Indicates that the group is a link group (e.g. cis or trans peptide, set IN1 to specify which type)
If IN2 < 0, MAIN, C-terminal or N-terminal group.
IN3 is the 3 letter amino acid name (a4)
IN4 is the 1 letter amino acid code (a1)

The set of codes for IN2 defined in the standard dictionaries are given below. It is inadvisable to alter the codes from -1 to -3 or from 1 to 20, and it should be noted that PROTIN makes some specific assumptions about particular residue types, e.g. CYS=5, GLY=8, PRO=15. MET is taken as the standard group defining chirality at CA.

    Number Type             Number Type Slc     Number Type Slc
     -1    MAIN               1    ALA  A        13    MET  M
     -2    C-Terminal         2    ARG  R        14    PHE  F
     -3    N-Amino            3    ASN  N        15    PRO  P
     -4    N-Formyl           4    ASP  D        16    SER  S
     -5    N-Acetyl           5    CYS  C        17    THR  T
                              6    GLN  Q        18    TRP  W
      0    Trans-peptide      7    GLU  E        19    TYR  Y
      0    Cis-peptide        8    GLY  G        20    VAL  V
      0    Trans-proline      9    HIS  H        21    HEM  X
      0    Cis-proline       10    ILE  I        22    WAT  O  Water
      0    Disulphide        11    LEU  L        23    SUL  U  Sulphate
                             12    LYS  K

Cards 1b Atom cards (3F10.5,10X,3I5,20X,A4)
    X Y Z KATOM IN1 IN2 LABEL

X, Y, Z are the Cartesian coordinates, in Angstroms, in a reference frame with its origin at the Calpha atom.

KATOM is the atom type code. Each atom type in the dictionary must be assigned a code in the range 1 to 10 (numbers outside this range must not be used unless the programs are modified). Each particular code may correspond to several atom types (e.g. most metals have code = 7), but the default types are as follows:

    Code  Type
     1    C
     2    N
     3    O
     4    S
     5    FE
     6    H
     7    CA
     8    I
     9    C-SP2
    10    OW

In the program PROTIN, Van der Waals contact distances are assigned on the basis of the atom type code (rather than the name). The distance associated with a particular code may be changed with the keyword VDWRADII, but remember that this will change the distances for all atom types with this code.

IN1 is a flag set to 0 for an atom card. A non-zero value indicates another card of type 1a and terminates the atom cards for the current group.

IN2 is the order number of an atom within a given residue, starting with 1 for N, 2 for Calpha etc. Amino acid side chains start with IN2=5 for the Cbeta atom. For peptide groups, corresponding negative numbers are used for denoting atoms belonging to the previous residue, e.g. with the first 3 atoms belonging to the previous residue we have:

LABEL is the atom name up to 4 characters in length following the naming
as used in the Brookhaven file format.

Data Cards 2 Interatomic Distances and Codes
For each group specified on data cards 1, a set of distance cards should be given. For each group there is a header card followed by the distance cards.

Cards 2a Header card (A4,6X,3I5)

Cards 2b. Distance cards for ND distances with up to 8 distances per card. (8(2I3,2I2))

IATM(i) Number of the origin atom within the group.
JATM(i) Number of the target atom within the group.
KDWT(i), KBWT(i) are two codes indicating the type of distance for weighting purposes. The codes are as follows:

    KDWT(i) KBWT(i) Description
       1       1    Bonded pair between 2 main chain atoms.
       1       3    Bonded pair involving 1 or more side chain atoms.
       2       2    Angle pair involving only main chain atoms.
       2       4    Angle pair involving 1 or more side chain atoms.
       3       0    Atoms determining a torsion angle of the form A-D:
                        A------D
                         \    /
                          B--C
                    e.g. O(i)---Calpha(i+1)
       4       4    Used for special inter-group contacts.

Cards 2 are terminated by a distance header card with KIND=100.

Data Cards 3 Planar Groups Information

Cards 3a. Planar groups (A4,I2,2I3,17I4)

IDGRP is the residue name (equivalent to IN3 on data cards 1a).
KIND is the group number (equivalent to IN2 on data cards 1a). KIND=100 terminates the planar groups cards.
NCNO is the number of non-hydrogen atoms in the planar group.
NA is the number of atoms in the plane for this group (max. of 17, or for link groups, 6).
IAT(1)...IAT(NA) are the numbers of the atoms within the group (equivalent to IN2 on cards 1b.) for the NA atoms of the plane.

Cards 3b. Bonded pairs for link groups (16I5)
These cards are only given for Link groups.

IA(i), JA(i) are the atom numbers for the bonded pairs. The number of pairs given is NA*(NA-1)/2. Remember that only the first NAPEP atoms as defined in the data control cards for PROTIN will be considered to form the plane.

Specification for multiplanar groups has been simplified. Multiplanar groups are now indicated by a non-zero flag in IAT(17). Otherwise the input is identical to that given above. Note that this imposes a limit of 16 atoms in the plane.

Original specification of multiplanar groups is given below. This is now redundant, but is included because many users will have dictionaries with the old type of multiplanar specification.
It has been changed because of the coupling between the dictionary and the PARAMETER MXG in the source code. This meant that if MXG was changed in the program, the dictionary also had to be edited. For compatibility with existing dictionaries, the new version of PROTIN will STILL read and deal with multiplanar specifications using the old type of specification. It is NOT necessary to edit the dictionary. To do this, the assumption made is that if KIND is greater than the largest group number given in the list of coordinates (Card 1a), then it is an old-style dictionary, and the true group number is given in IAT(17).

Old Style Dictionary Multiplanar Specification:
KIND should be a unique identifier for each plane, and the first value used should be MXG+1, where MXG is the maximum number of groups (set in a PARAMETER statement in the program). Subsequent planes should then have identifiers MXG+2, MXG+3 etc. The other parameters are as above, except that the true group number for this residue type (i.e. IN2 on data cards 1a) must be given in the last I4 field on the card (i.e. IAT(17)). This restricts the possible number of atoms in a plane to 16.

Cards 3 are terminated by a planar group card with KIND=100.

Data Cards 4 Chiral Centres Specification (A4,2I3,4I5)
    IDGRP KIND IHAND IA(1) IA(2) IA(3) IA(4)

IDGRP is the residue name (equivalent to IN3 on data cards 1a).
KIND is the residue type number (equivalent to IN2 on data cards 1a). KIND=100 terminates the chiral centre cards.
IHAND = 1 for intrinsically chiral groups,
      = 0 for those whose chirality is related to nomenclature (e.g. Leu, Val)
IA(1)..IA(4) are the numbers of the atoms within the group (equivalent to IN2 on cards 1b.) for the atom at the asymmetric centre and the three other atoms that determine the chirality of the group. MET is chosen in the standard dictionaries to specify the Calpha centre for all handed amino acids.

Cards 4 are terminated by a chiral centres card with KIND=100.

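The chiral centre card format (A4,2I3,4I5) described above can be read by column slicing, stopping at the KIND=100 terminator. The Python sketch below is only an illustration under stated assumptions: `parse_chiral_card` is a hypothetical name, the atom numbers in the example card are invented rather than taken from a real dictionary, and blank integer fields are treated as zero in the way a Fortran formatted read would treat them.

```python
def _int_field(text):
    """Read an integer field; an all-blank field counts as 0 (Fortran style)."""
    text = text.strip()
    return int(text) if text else 0


def parse_chiral_card(card):
    """Parse one chiral centre card laid out as (A4,2I3,4I5).

    Returns None for the terminating card (KIND=100), otherwise a dict
    with IDGRP, KIND, IHAND and the four atom numbers IA(1)..IA(4).
    """
    card = card.ljust(30)  # 4 + 2*3 + 4*5 columns
    idgrp = card[0:4].strip()
    kind = _int_field(card[4:7])
    if kind == 100:
        return None
    ihand = _int_field(card[7:10])
    atoms = [_int_field(card[10 + 5 * i:15 + 5 * i]) for i in range(4)]
    return {"IDGRP": idgrp, "KIND": kind, "IHAND": ihand, "IA": atoms}


# Invented example cards: a MET entry (KIND=13) and the terminator.
met_card = "%-4s%3d%3d%5d%5d%5d%5d" % ("MET", 13, 1, 2, 1, 3, 5)
end_card = "%-4s%3d" % ("END", 100)

met = parse_chiral_card(met_card)
terminator = parse_chiral_card(end_card)
```

Checking KIND before the remaining fields means a short terminator card with nothing after the KIND column is still handled.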
Data Cards 5 Non-Bonded Contact Codes (A4,6X,3I5)

Cards 5a. Header cards

IDGRP is the residue name (equivalent to IN3 on data cards 1a).
KIND is the residue type number (equivalent to IN2 on data cards 1a). KIND=100 terminates the non-bonded contact cards.
ND is the number of non-bonded contacts specified for this group.
MD is the number of non-bonded contacts specified for the prolyl link group if KIND=0.

Cards 5b. Distance cards (10(2I3,I2))
As many of these cards as required are given following the header card to hold details of NA possible contact distances (up to 10 per card).

IATM(i) is the order within the residue of the origin atom (equivalent to IN2 on a card 1b.).
JATM(i) is the order within the residue of the target atom.
KTYP(i) is the distance type code:
    =1, The relative position of the given atoms is determined by only one torsion angle.
    =2, The relative position of the given atoms is determined by two or more torsion angles.

Cards 5 are terminated by a header card with KIND=100.

Data Cards 6 Torsion Angle Specification Cards

Cards 6a. Header cards (A4,2I3,14I5)

IDGRP is the residue name (equivalent to IN3 on data cards 1a).
KIND is the residue type number (equivalent to IN2 on data cards 1a). KIND=100 terminates the torsion angle cards.
NCHI is the number of side chain (Chi) torsion angles for this residue.
IA(1), IA(2)... are the atom numbers within the group specifying the torsion angles.

e.g. for PHE IA(1), IA(2) ... = 3 1 2 3 1 2 5 6 7

    3 C(i-1)
    1 N(i)
    2 CA(i)
    3 C(i)
    1 N(i+1)
    2 CA(i+1)
    5 CB(i)
    6 CG(i)
    7 CD1(i)

    C(i-1)-N(i)-CA(i)-C(i)       specifies Phi
    N(i)-CA(i)-C(i)-N(i+1)       specifies Psi
    CA(i)-C(i)-N(i+1)-CA(i+1)    specifies omega
    N(i)-CA(i)-CB(i)-CG(i)       specifies Chi1
    CA(i)-CB(i)-CG(i)-CD1(i)     specifies Chi2

Cards 6b. Chi weighting codes (10X,6I5)
This card is not read if NCHI=0.
    IWT(1) IWT(2) ... IWT(NCHI)

IWT(i) are the weighting codes for the NCHI side chain (Chi) angles as follows:
    0 no specifications
    2 planar (e.g. Chi5 of ARG)
    3 staggered (e.g. aliphatics)
    4 orthonormal (e.g. Chi2 of aromatics)

Cards 6c. Neighbour identifications of terminal group and main chain atoms (10X,6I5)
These are only read if KIND < 0.
    MNABOR(1) ... MNABOR(6)

MNABOR(1) ... MNABOR(6) are codes as follows:
    -1 atom is from residue i-1
     0 atom is from residue i
     1 atom is from residue i+1
     5 atom is from the terminal group (e.g. OT of the carboxyl terminus)

Cards 6d. Distance codes (6I4,2(4x,6I4))
    DST1(1) ... DST1(6) DST2(1) ... DST2(6) DST3(1) ... DST3(6)

DST1(1) ... DST1(6) are the distance codes for the Phi angle. For an atom string 1-2-3-4 the six distances referred to are 1-2, 1-3, 1-4, 2-3, 2-4 and 3-4 respectively. The code must correspond to a distance number identified from cards 2.
DST2(1) ... DST2(6) are the distance codes for the Psi angle.
DST3(1) ... DST3(6) are the distance codes for the omega angle.

Cards 6 are terminated by a header card with KIND=100.

Data Cards 7 Secondary Structure Conformations (A20,I4,2F8.1)
    LABEL KODE PHI PSI

Data Cards 8 Thermal Ellipsoid Specification Cards

Cards 8a. Header cards (A4,I3,I3)
    IDGRP KIND NAT |