Basic Tutorial

NMRFx Structure

NMRFx Structure can be used to generate and analyze macromolecular structures. It can be run in two different ways. First, if the command is invoked with one of several subcommands (gen, batch, summary, score, predict, train), a predefined mode will be executed. Alternatively, one can pass a Python (actually Jython) script as the argument, and that script will be executed. The script has access to standard Python functions (including the standard Python library) and to specific commands provided by the NMRFx Structure program's Java code. The predefined subcommands are listed below, each with usage documentation and an example.

List of subcommands:

1. gen

nmrfxs gen [ -s seed ] [ -d directory ] [ -r report ] projectFile [ script.py ]

Generate a single structure using the data specified in a project file, initializing the random number generator with a specified seed. This is useful for testing the project file before generating a whole family of structures with the batch command. An output directory will be created if not present. After successful execution, the generated PDB and violation files are written to that directory. The output files have the seed number appended to their names (e.g. temp0.pdb, temp0.txt). By default, the output directory is placed in the current directory; the relative path to a different directory can be specified with the directory option. When debugging a generated structure, it may be useful to view the energy violations for all the constraints specified in the project file. To do this, specify the report option. This writes the constraint violations at the preparation stage to a file named energyDump$seed_prep.txt (where $seed is the seed number) within the output directory. Lastly, torsional angle molecular dynamics procedures can be defined in an executable script to replace the built-in annealing protocol. If a Python script is specified as the last argument of the command, that script will be executed. Alternatively, the script can be placed inside the project file, replacing the annealing data block.

Example: nmrfxs gen -s 0 -d ~/gen-structures -r project.yaml

2. batch

nmrfxs batch [OPTIONS] projectFile

Generate a family of structures using the data specified in a project file. An output directory and a final directory will be created if not present. All generated structures will be written to files (temp1.pdb, temp2.pdb, ...) in the output directory. A violation file for each structure (temp1.txt, temp2.txt, ...) will also be written. The best structures, along with their violation files, will be written to the final directory. Multiple structures are generated by repeatedly invoking the nmrfxs gen command with the specified project file and an incremented seed number. The number of invocations running simultaneously is specified by the -p option.

OPTIONS:

-n nStructures
The total number of structures to generate.
-k keepNStructures
The number of structures to keep (and write to the final directory). The structures with lowest total target function will be kept.
-p useNProcesses
Structure generation can be sped up by running multiple structure calculation processes simultaneously. This option specifies the number of simultaneous calculations to perform.
-s startNumber
Starting number for the seed. This defaults to 0, but if you want to run batch a second time and generate additional structures you can specify this value. It is also useful when submitting batch commands to multiple computers for a higher level of parallelization.
-a align
If this option is specified, the structures will be automatically aligned and written to a new set of files (sup_final1.pdb, sup_final2.pdb, ...). During alignment, the defined regions of the structure (those with lower residue rms) will be identified and used for the alignment.
-b baseName
Provide a base name for the superimposed files. This option implies that the structures are being aligned, i.e. that option -a is used.
-d directory
Redirect output directories into a specified base directory using a relative path. By default, if this option isn't used, the output will be placed in the current directory.
-c clean
Remove the output and final directories (if present) before generating structures. Don't do this if you are using the -s flag to generate additional structures.
-m memory
Allot a specific amount of heap memory, in megabytes (MB). By default, 512 MB of heap memory is used.

Example: nmrfxs batch -n 100 -k 10 -p 5 -a project.yaml
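
The behavior described above (repeated nmrfxs gen invocations with incremented seeds, with a limited number running at once) can be sketched in Python. This is only an illustration of the idea, not the actual implementation of nmrfxs batch. The helper names build_gen_commands and run_batch are hypothetical, and the sketch uses only the gen options documented above (-s and -d).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_gen_commands(project_file, n_structures, start_seed=0, directory=None):
    """Build one 'nmrfxs gen' command line per seed (hypothetical helper)."""
    commands = []
    for seed in range(start_seed, start_seed + n_structures):
        cmd = ["nmrfxs", "gen", "-s", str(seed)]
        if directory is not None:
            cmd += ["-d", directory]
        cmd.append(project_file)
        commands.append(cmd)
    return commands


def run_batch(commands, n_processes):
    """Run the commands with at most n_processes running at the same time.

    Returns the exit code of each command, in the order given.
    """
    def run(cmd):
        return subprocess.run(cmd).returncode

    # Each worker thread only waits on an external process, so threads suffice.
    with ThreadPoolExecutor(max_workers=n_processes) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Print the commands instead of running them; call run_batch(cmds, 2)
    # only on a machine where nmrfxs is installed.
    for cmd in build_gen_commands("project.yaml", 4, start_seed=0):
        print(" ".join(cmd))
```

Passing a different start_seed corresponds to the -s startNumber option, which is what allows a second run to add structures without repeating seeds.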

3. summary

nmrfxs summary [final/final1.txt, final/final2.txt, ...]

Analyze output files and create a summary file showing what constraints are violated. If no output files are specified as arguments, all final*.txt files in the final subdirectory of the current directory will be analyzed.

The output will be placed in a file named analysis.txt and will have a format like this:

4. score

nmrfxs score [OPTIONS] projectFile [pdbFile1.pdb, pdbFile2.pdb, ...]

Analyze the quality of generated structure(s). Note: the summary command listed above analyzes the output files from a previous run of nmrfxs batch, whereas this command loads PDB files and analyzes them according to the constraints referenced in the .yaml project file and on the command line (see options below).

OPTIONS:

-y projectFile
The project file that defines constraints to be used in the analysis.
-o outDir
The directory in which to write output files. Defaults to analysis.
-p pdbFilePattern
A pattern identifying the PDB files to analyze. This option results in execution of a Python-level glob command to search for files that match the specified pattern. If wildcard characters (*) are used, the whole pattern needs to be enclosed in single quotes.
-d directory
Specify a directory to redirect output directories generated.
-c convert
???
-s shifts
Add chemical shift files that are not specified in the YAML project file.
-d distances
Add distance constraints files that are not specified in the YAML project file.
-r range
???

Examples:

  • nmrfxs score -y project.yaml pdb/*.pdb

  • nmrfxs score -y project.yaml -p 'pdb/*.pdb'
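
When a wildcard pattern is given unquoted, the shell expands it before the program starts. When it is quoted and passed with -p, expansion is left to a Python-level glob. A minimal sketch of that kind of expansion, using the standard library's glob module, is shown here. The directory and file names are made up for illustration, and how nmrfxs calls glob internally is not documented in this tutorial.

```python
import glob
import os
import tempfile

# Create a throwaway directory holding two PDB files and one unrelated file.
with tempfile.TemporaryDirectory() as base:
    pdb_dir = os.path.join(base, "pdb")
    os.mkdir(pdb_dir)
    for name in ("final1.pdb", "final2.pdb", "notes.txt"):
        open(os.path.join(pdb_dir, name), "w").close()

    # The pattern is expanded by Python, not by the shell.
    pattern = os.path.join(pdb_dir, "*.pdb")
    matches = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(matches)  # ['final1.pdb', 'final2.pdb']
```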

5. predict

NMRFx Structure can predict chemical shifts of proteins and RNA (support for arbitrary small organic molecules is present, but not currently accessible in the command interface). Protein predictions are done using geometric methods (primarily dihedral angles and ring-current shifts). RNA predictions can be done using geometric or attribute-based methods. The geometric methods are used if the input is a .pdb file. Attribute-based methods for RNA use a sequence and dot-bracket notation specified in a .yaml file.

Protein predictions are done for these atoms: N, H, C, CA, CB, HA. RNA geometric predictions are done for all carbon-bound protons and their carbons. RNA attribute predictions are done for non-exchangeable protons and their parent carbon and nitrogen atoms.

The output of the geometric based prediction is a list of atom specifiers (residueNumber.atomName) and predicted shifts:

74.C 177.47
74.H 8.48
74.HA 4.05
75.N 112.92
75.CA 46.53

The output of the attribute-based RNA prediction is a list of atom specifiers (residueNumber.atomName), predicted shifts, and various attributes of the prediction.

  • Atom
  • Predicted Shift
  • N value: Value is the number of examples with the attributes this atom has.
  • Mean value: Value is the mean of the shifts of the examples.
  • +/- value: Value is the standard deviation of the shifts of the examples.
  • Range (lower, upper): The lower and upper limits of the shifts of the examples.
  • A list of attributes that characterize this atom's residue (within a 5-residue window)

Output example:

20.C2' 75.34 N  5 Mean 75.31 +/- 0.24 Range: 74.90 -75.45 Pp_AU_GC_CG_pP_-_-_-_-_-_-
20.H2' 4.43 N  5 Mean 4.43 +/- 0.01 Range: 4.41 -4.44 Pp_AU_GC_CG_pP_-_-_-_-_-_-
20.C1' 92.71 N  6 Mean 92.70 +/- 0.13 Range: 92.55 -92.83 Pp_AU_GC_CG_pP_-_-_-_-_-_-
22.N4 97.53 N  6 Mean 97.57 +/- 0.42 Range: 97.22 -98.38 Pp_CG_CG_-_-_-_-_-_-_-_-
22.H41 8.13 N 12 Mean 8.05 +/- 0.68 Range: 6.84 -8.67 Pp_CG_CG_-_-_-_-_-_-_-_-
22.H42 7.30 N 11 Mean 7.41 +/- 0.64 Range: 6.95 -8.50 Pp_CG_CG_-_-_-_-_-_-_-_-
22.C5 98.24 N 18 Mean 98.24 +/- 0.29 Range: 97.86 -99.20 Pp_CG_CG_-_-_-_-_-_-_-_-
22.H5 5.46 N 34 Mean 5.46 +/- 0.10 Range: 5.23 -5.77 Pp_CG_CG_-_-_-_-_-_-_-_-
22.C6 141.97 N 17 Mean 142.02 +/- 1.10 Range: 141.27 -146.07 Pp_CG_CG_-_-_-_-_-_-_-_-
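
Lines in this format can be split into their fields with a regular expression. The sketch below is written against the sample lines above only. It assumes the "-" before the second Range value is a separator rather than a minus sign, and it keeps the attribute list as an uninterpreted string. The function name parse_attribute_line is hypothetical.

```python
import re

# Pattern derived from the sample output shown above (an assumption, not a
# published format specification).
LINE_RE = re.compile(
    r"^(?P<residue>\d+)\.(?P<atom>\S+)\s+"
    r"(?P<shift>-?\d+\.\d+)\s+"
    r"N\s+(?P<n>\d+)\s+"
    r"Mean\s+(?P<mean>-?\d+\.\d+)\s+"
    r"\+/-\s+(?P<sd>\d+\.\d+)\s+"
    r"Range:\s+(?P<lower>-?\d+\.\d+)\s+-\s*(?P<upper>-?\d+\.\d+)\s+"
    r"(?P<attributes>\S+)$"
)


def parse_attribute_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "residue": int(fields["residue"]),
        "atom": fields["atom"],
        "shift": float(fields["shift"]),
        "n": int(fields["n"]),
        "mean": float(fields["mean"]),
        "sd": float(fields["sd"]),
        "range": (float(fields["lower"]), float(fields["upper"])),
        "attributes": fields["attributes"],
    }


record = parse_attribute_line(
    "20.C2' 75.34 N  5 Mean 75.31 +/- 0.24 Range: 74.90 -75.45 "
    "Pp_AU_GC_CG_pP_-_-_-_-_-_-"
)
print(record["atom"], record["shift"], record["range"])
```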

Examples:

  • nmrfxs predict protein.pdb

  • nmrfxs predict project.yaml

6. train

Information pending...