The molecular structure is specified in NMRFx using a sequence file. Here are examples. Note: when doing structure calculations, more complex structures may be specified in the project's .yaml file. See the project file documentation.
met ala asn glu lys
Sequence starts at "5"
met 5 ala asn glu lys
Sequence with breaks in numbering
met 5 ala asn 8 glu lys thr his 15 arg thr
a u c g
da dt dc dg
Sequence with entity names
-molecule fred -polymer chainA -coordset mono1 met 5 ala asn glu lys
-molecule mymol -polymer poly1 -coordset A met ala asn glu -polymer poly2 -coordset A val asp arg
Polymer with ATP ligand
-molecule atpBinding -polymer chainA -coordset A -sdfile atp.sdf met ala asn glu lys
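The examples above can be read as a stream of whitespace-separated tokens. The following Python sketch shows one way to interpret such a stream; it is not NMRFx's own parser. It assumes that each option (-molecule, -polymer, -coordset, -sdfile) takes exactly one value, that an integer following a residue name sets that residue's number, that later residues count up from there, and that numbering starts at 1 when no number is given and restarts with each -polymer. The names Polymer and parse_sequence are hypothetical, and the -sdfile value is stored with the current polymer purely for illustration.

```python
from dataclasses import dataclass, field

# Options seen in the examples; each is ASSUMED to take exactly one value.
OPTIONS = {"-molecule", "-polymer", "-coordset", "-sdfile"}


@dataclass
class Polymer:
    name: str = ""
    coordset: str = ""
    sdfile: str = ""
    residues: list = field(default_factory=list)  # (residue name, number) pairs


def _is_int(token):
    try:
        int(token)
    except ValueError:
        return False
    return True


def parse_sequence(text):
    """Return (molecule name, list of Polymer) for a sequence given as text."""
    tokens = text.split()
    molecule = ""
    polymers = []
    current = None
    next_number = 1  # ASSUMED default first residue number
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in OPTIONS:
            if i + 1 >= len(tokens):
                raise ValueError(f"option {tok} has no value")
            value = tokens[i + 1]
            i += 2
            if tok == "-molecule":
                molecule = value
            elif tok == "-polymer":
                current = Polymer(name=value)
                polymers.append(current)
                next_number = 1  # ASSUMED: numbering restarts per polymer
            else:
                if current is None:
                    current = Polymer()
                    polymers.append(current)
                setattr(current, tok[1:], value)  # coordset or sdfile
            continue
        # Anything else is treated as a residue name.
        if current is None:
            current = Polymer()
            polymers.append(current)
        number = next_number
        if i + 1 < len(tokens) and _is_int(tokens[i + 1]):
            number = int(tokens[i + 1])  # explicit number for this residue
            i += 1
        current.residues.append((tok, number))
        next_number = number + 1
        i += 1
    return molecule, polymers
```

Under these assumptions, parse_sequence("met 5 ala asn 8 glu lys") numbers the residues 5, 6, 8, 9 and 10.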
When generating a molecular structure in NMRFx by reading in a sequence file, it is necessary to translate the residue names into the set of atoms and bonds that define the molecular topology. To do this, NMRFx looks at each residue name in the sequence file and checks whether a file with that name and a ".prf" extension can be located.
NMRFx looks in two places for these files. First, if a local residue library directory has been set, it looks there. Local residue libraries are specified in the NMRFx Analyst GUI in the Preferences dialog (Structure section). In NMRFx Structure the location is specified in the project .yaml file as a "reslib" entry. For example:
molecule :
    reslib : reslib
    entities :
        - file : input/A.seq
          ptype : protein
If a local residue library is not specified, or the file is not found there, then the built-in residue library is searched.
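The search order described above (local residue library first, then the built-in library) can be sketched in Python as follows. This is only an illustration: the function name find_residue_file is hypothetical, and the built-in library is modeled here as an ordinary directory, which may not match how NMRFx actually stores it.

```python
from pathlib import Path
from typing import Optional


def find_residue_file(name: str, builtin_dir: Path,
                      local_dir: Optional[Path] = None) -> Optional[Path]:
    """Return the first <name>.prf found, checking local_dir before builtin_dir."""
    filename = f"{name}.prf"
    # The local library, when one is set, takes precedence over the built-in one.
    search_dirs = [d for d in (local_dir, builtin_dir) if d is not None]
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None  # not found in either location
```

A local copy of ser.prf would therefore shadow the built-in one, while a residue missing from the local library still resolves to the built-in file.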
Two special forms of residue name can be used in the sequence file. First, residue names that are followed by ":d" are automatically changed to D-amino acids. For example, in the following:
gly ser ala:d trp
The alanine residue will be D-alanine.
Second, an entry that contains an underscore character is read from the .prf file matching its full name, but its internal residue name is set to the portion before the underscore.
For example, "glu_prot" will be read from the file glu_prot.prf (which specifies the protonated form of glutamic acid), but will have the name "glu".
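The two naming rules can be summarized in a short Python sketch. The names ResidueSpec and resolve_residue are hypothetical. The sketch also assumes that a ":d" residue is read from the same .prf file as its L-form and keeps the same internal name, which the text does not state explicitly.

```python
from typing import NamedTuple


class ResidueSpec(NamedTuple):
    name: str       # internal residue name
    prf_file: str   # residue file to look up
    d_form: bool    # True when a ":d" suffix requested a D-amino acid


def resolve_residue(token: str) -> ResidueSpec:
    """Split a sequence-file residue token into name, file and D-form flag."""
    d_form = token.endswith(":d")
    base = token[:-2] if d_form else token
    # The internal name is the part before the first underscore, if any;
    # the file name keeps the full (underscore-containing) name.
    name = base.split("_", 1)[0]
    return ResidueSpec(name=name, prf_file=f"{base}.prf", d_form=d_form)
```

For instance, resolve_residue("glu_prot") gives the name "glu" with the file glu_prot.prf, and resolve_residue("ala:d") gives the name "ala" with the D-form flag set.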
If a residue file is found, that file is scanned to extract the topology for that residue. NMRFx works with residues using an internal coordinate system, where the molecular topology is determined by a tree-like structure starting from the first atom in the structure. To define the structure properly, it is therefore necessary to provide values for the valence angles, dihedral angles, bond distances and connectivity. Most of this information is defined in the ATOM and FAMILY records. The .prf files, which are derived from information originally published by Robson & Osguthorpe (J. Mol. Biol. 132:19-51), are described below. Our original use of .prf files was in our structure calculation program, PEGASUS (Johnson & Sugg, Biochem., 1992, 31, 8151-8159).
LNAME Serine
SNAME ser
RCHAR S
ATOM N N 1.32 114.00 -120.0 -0.41570
ATOM H H 1.00 123.00 0.0 0.27190 middle
ATOM H H 1.00 123.00 0.0 0.27190 end
ATOM H1 H1 1.08 109.00 60.0 0.27190 start
ATOM H2 H2 1.08 109.00 60.0 0.27190 start
ATOM H3 H3 1.08 109.00 60.0 0.27190 start
ATOM CA CX 1.47 123.00 180.0 -0.02490
ATOM HA H1 1.08 109.47 -120.0 0.08430
ATOM CB 2C 1.53 109.47 -121.5 0.21170
ATOM HB2 H1 1.08 109.47 -120.0 0.03520
ATOM HB3 H1 1.08 109.47 -120.0 0.03520
ATOM OG OH 1.42 109.47 0.0 -0.65460
ATOM HG HO 1.00 110.00 0.0 0.42750
ATOM C C 1.53 109.47 120.0 0.59730
ATOM O O 1.24 121.00 180.0 -0.56790 start
ATOM O O 1.24 121.00 180.0 -0.56790 middle
ATOM O O 1.24 180.00 120.0 -0.56790 end
ATOM OXT OH 1.24 180.00 120.0 -0.56790 end
FAMILY - N H CA middle
FAMILY - N H CA end
FAMILY - N H1 H2 H3 CA start
FAMILY N CA C CB HA
FAMILY CA CB OG HB3 HB2
FAMILY CB OG HG
FAMILY CA C + =O
FAMILY CA C OXT =O end
ANGLE CA PHI 1
ANGLE CB CHI1 2
ANGLE OG CHI2 6
ANGLE C PSI 1
ATREE CA CB C
ATREE CB OG
ATREE OG
ATREE C +
PSEUDO QB HB2 HB3
CRAD CA 4.0
LNAME: Full name of residue. Unused at present.
SNAME: Short name of residue. Unused at present.
RCHAR: Single letter name of residue. Unused at present.
ATOM: Properties of the atoms in the residue. There should be one line for each atom. The line is composed of at least 8 fields separated by white space (spaces or tabs).
Atom type. These are currently specified as the atom types used in the AMBER force field. The type is used to obtain the atomic number and non-bonded contact parameters when doing structure calculations.
Bond length, from this atom to the previous atom in the tree structure of the residue (as defined in the FAMILY lines).
Valence Angle, between this atom, its parent, and grandparent, as defined in the tree structure of the molecule.
Torsion Angle, between this atom, its parent, grandparent, and great grandparent (as defined in the tree structure of the molecule). The angle for the first "child" atom bonded to a given parent, as defined in the FAMILY lines, is an absolute torsion angle. The angles for subsequent atoms are relative to the previously defined dihedral.
Charges. These are currently taken from AMBER parameter files.
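As an illustration of the ATOM record layout, the Python sketch below splits one line into named fields. The field order (atom name, atom type, bond length, valence angle, torsion angle, charge) is inferred from the serine example, not from a formal specification. The trailing start/middle/end tag that appears on some lines of that example is not described in the text; the sketch assumes it selects atoms according to the residue's position in the chain. AtomRecord, parse_atom_line and atoms_for_position are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AtomRecord:
    name: str
    atom_type: str
    bond_length: float
    valence_angle: float
    dihedral: float
    charge: float
    position: Optional[str] = None  # "start", "middle", "end", or None


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one whitespace-separated ATOM line from a .prf file."""
    fields = line.split()
    if len(fields) < 7 or fields[0] != "ATOM":
        raise ValueError(f"not an ATOM record: {line!r}")
    # ASSUMPTION: an extra trailing field is the start/middle/end tag.
    position = fields[7] if len(fields) > 7 else None
    return AtomRecord(
        name=fields[1],
        atom_type=fields[2],
        bond_length=float(fields[3]),
        valence_angle=float(fields[4]),
        dihedral=float(fields[5]),
        charge=float(fields[6]),
        position=position,
    )


def atoms_for_position(records: List[AtomRecord], position: str) -> List[AtomRecord]:
    """Keep untagged atoms plus those tagged for the given chain position."""
    return [r for r in records if r.position is None or r.position == position]
```

With this reading, a serine at the start of a chain would keep N, H1, H2 and H3 but drop the backbone H tagged middle or end.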
FAMILY: Defines the tree structure of the residue. Each line has the form
"FAMILY parentAtom thisAtom childAtom1 childAtom2...".
For example, the line "FAMILY N CA C CB HA" implies that the atom CA is bonded to the N atom (the parent of CA is N) and has three children: C, CB and HA. If the parent is specified as "-", then it is a connector atom in the previous residue. If a child is specified as "+", then it is the connector atom in the subsequent residue. A child atom name preceded by a "-", like -CD2, implies that this child atom is actually defined in the tree structure by some other FAMILY entry, but that a bond should be drawn between this child atom and the main atom of this FAMILY line. This is used to define bonds that close rings.
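The FAMILY rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the NMRFx implementation. The "=" prefix (as in "=O") and the trailing start/middle/end tag both appear in the serine example but are not described in the text; the sketch assumes the former marks a double bond and the latter a chain-position variant. Family, parse_family_line and bonds_within_residue are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

POSITION_TAGS = ("start", "middle", "end")


@dataclass
class Family:
    parent: str                     # "-" means connector atom in previous residue
    atom: str
    children: List[str] = field(default_factory=list)       # may include "+"
    ring_closures: List[str] = field(default_factory=list)  # from "-NAME" tokens
    double_bonded: List[str] = field(default_factory=list)  # from "=NAME" (ASSUMED)
    position: Optional[str] = None


def parse_family_line(line: str) -> Family:
    """Parse 'FAMILY parentAtom thisAtom childAtom1 childAtom2 ...'."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "FAMILY":
        raise ValueError(f"not a FAMILY record: {line!r}")
    fam = Family(parent=fields[1], atom=fields[2])
    rest = fields[3:]
    if rest and rest[-1] in POSITION_TAGS:
        fam.position = rest.pop()
    for token in rest:
        if token.startswith("-") and len(token) > 1:
            # Bonded to this atom, but placed in the tree by another FAMILY line.
            fam.ring_closures.append(token[1:])
        elif token.startswith("=") and len(token) > 1:
            fam.double_bonded.append(token[1:])
            fam.children.append(token[1:])
        else:
            fam.children.append(token)
    return fam


def bonds_within_residue(fam: Family) -> List[Tuple[str, str]]:
    """Bonds from this atom to its children and ring-closure partners.

    The '+' connector is skipped because that atom is in the next residue.
    """
    partners = [c for c in fam.children if c != "+"] + fam.ring_closures
    return [(fam.atom, p) for p in partners]
```

For "FAMILY N CA C CB HA" this yields the bonds CA-C, CA-CB and CA-HA; the N-CA bond itself would come from the FAMILY line in which N is the main atom.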
ANGLE: Each rotatable bond in the residue has an ANGLE entry.
The first field names an atom; the rotatable bond is the one between that atom and its parent.
The second field is the name of the angle (PHI, PSI, CHI, etc.).
ATREE: One entry for each rotatable bond. The order in which they are specified gives the tree of rotation groups.
PSEUDO: Mapping of actual atoms in the structure to pseudo atom names, in the form "PSEUDO pseudoAtomName atom1 atom2 ...". The pseudo atom position would be at the geometric center of all the actual atoms that are listed. At present this is only used when reading in constraint files that use pseudo atom names (like CYANA .upl files), where it translates the pseudo atom name into the set of actual atoms stored in the molecular structure.
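A minimal Python sketch of the pseudo atom mapping follows. It parses a PSEUDO line, expands a pseudo atom name into its real atoms (as one might when reading a constraint file), and computes a pseudo atom position. The position is taken to be the arithmetic mean of the listed atoms' coordinates (the centroid), which is an assumption about what the text intends. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def parse_pseudo_line(line: str) -> Tuple[str, List[str]]:
    """Parse 'PSEUDO pseudoAtomName atom1 atom2 ...'."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "PSEUDO":
        raise ValueError(f"not a PSEUDO record: {line!r}")
    return fields[1], fields[2:]


def expand_atom_name(name: str, pseudo_map: Dict[str, List[str]]) -> List[str]:
    """Translate a pseudo atom name into real atoms; real names pass through."""
    return list(pseudo_map.get(name, [name]))


def pseudo_position(coords: Sequence[Sequence[float]]) -> Tuple[float, ...]:
    """Centroid (arithmetic mean) of the given x, y, z coordinates."""
    n = len(coords)
    if n == 0:
        raise ValueError("no coordinates given")
    return tuple(sum(axis) / n for axis in zip(*coords))
```

For the serine example, the line "PSEUDO QB HB2 HB3" maps a constraint on QB to the two atoms HB2 and HB3.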
CRAD: This specifies the name of an atom near the center of the residue and the approximate radius of a sphere around the central atom that would encompass all atoms of the residue. Used for accelerating calculation of the non-bond contact list. Unused at present in NMRFx.
Residue files can also be specified with .pdb or .mol files if they are located in local residue libraries. Unlike .prf files, residue files stored as .pdb or .mol files carry no information on how to make connections to the preceding and succeeding residues. The sequence file should therefore contain -entry and -exit lines before the residue name. For example:
leu -entry C1 -exit C8 phq
These lines specify the names of atoms in the residue that are used to make the connection to the adjacent residues. Note: this is an experimental feature, and may be subject to changes and require bug fixes. Please contact us for more information.