NMRFx Structure can be used to generate and analyze macromolecular structures. It can be run in two ways. If the command is invoked with one of several subcommands (gen, batch, summary, score, predict, train), a predefined mode is executed. Alternatively, a Python (actually Jython) script can be given as the argument, in which case that script is executed. The script has access to standard Python functions (including the standard Python library) and to specific commands provided by the NMRFx Structure program's Java code. The predefined subcommands are listed below, each with usage documentation and an example.
nmrfxs gen [ -s seed ] [ -d directory ] [ -r report ] projectFile [ script.py ]
Generate a single structure using the data specified in a project file, initializing the random number generator with the specified seed. This is useful for testing the project file before generating a whole family of structures with the batch command. An output directory is created if it is not already present. After successful execution, the generated PDB and violation files are written to that directory. The output file names have the seed number appended (e.g. temp0.pdb, temp0.txt, etc.). By default, the output directory is placed in the current directory; the relative path to a different directory can be specified with the directory option. When debugging a generated structure, it may be useful to view the energy violations for all the constraints specified in the project file. To do this, specify the report option, which writes the constraint violations at the preparation stage to a file named energyDump$seed_prep.txt within the output directory. Lastly, the built-in annealing protocol can be replaced by an executable script that defines torsion-angle molecular dynamics procedures. If a Python script is specified as the last argument of the command, that script is executed. Alternatively, the script can be placed inside the project file, replacing the annealing data block.
nmrfxs gen -s 0 -d ~/gen-structures -r project.yaml
nmrfxs batch [OPTIONS] projectFile
Generate a family of structures using the data specified in a project file. An output directory and a final directory are created if not already present. All generated structures are written to files (temp1.pdb, temp2.pdb, ...) in the output directory, each with a matching violation file (temp1.txt, temp2.txt, ...). The best structures, along with their violation files, are written to the final directory. The structures are generated by repeatedly invoking the nmrfxs gen command with the specified project file and an incremented seed number. The number of invocations running simultaneously is set by the -p option.
nmrfxs batch -n 100 -k 10 -p 5 -a project.yaml
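The relationship between batch and gen described above can be pictured with a small Python sketch. This is an illustration only, not the program's actual implementation: the function names, the default directory name "output", and the choice of 1 as the first seed are all assumptions made for the example. Only the -s and -d flags of nmrfxs gen come from the usage line above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def gen_commands(project_file, n_structures, out_dir="output", first_seed=1):
    """Build one `nmrfxs gen` argument list per seed.

    out_dir and first_seed are hypothetical defaults chosen for this sketch.
    """
    return [
        ["nmrfxs", "gen", "-s", str(seed), "-d", out_dir, project_file]
        for seed in range(first_seed, first_seed + n_structures)
    ]


def run_batch(commands, n_parallel):
    """Run the commands with at most n_parallel processes active at once.

    Returns the exit code of each command, in the order given.
    """
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for cmd in gen_commands("project.yaml", 3):
        print(" ".join(cmd))
```

Calling run_batch(gen_commands("project.yaml", 100), 5) would mimic the concurrency limit of -p 5. Selecting and copying the best structures to the final directory is not covered by this sketch.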
nmrfxs summary [final/final1.txt, final/final2.txt, ...]
Analyze output files and create a summary file showing what constraints are violated. If no output files are specified as arguments, all final*.txt files in the final subdirectory of the current directory will be analyzed.
The output will be placed in a file named analysis.txt and will have a format like this:
nmrfxs score [OPTIONS] projectFile [pdbFile1.pdb, pdbFile2.pdb, ...]
Analyze the quality of one or more structures. Note that the summary command listed above analyzes the output files from a previous run of nmrfxs batch, whereas the score command loads PDB files and analyzes them according to the constraints referenced in the .yaml file and on the command line (see options below).
nmrfxs score -y project.yaml pdb/*.pdb
nmrfxs score -y project.yaml -p 'pdb/*.pdb'
NMRFx Structure can predict chemical shifts of proteins and RNA (support for arbitrary small organic molecules is present, but not currently accessible from the command interface). Protein predictions are done using geometric methods (primarily dihedral angles and ring-current shifts). RNA predictions can be done using geometric or attribute-based methods. The geometric methods are used if the input is a .pdb file. Attribute-based predictions for RNA use a sequence and dot-bracket notation specified in a .yaml file.
Protein predictions are done for these atoms: N, H, C, CA, CB, HA. RNA geometric predictions are done for all carbon-bound protons and their carbons. RNA attribute-based predictions are done for non-exchangeable protons and their parent carbon and nitrogen atoms.
The output of the geometric prediction is a list of atom specifiers (residueNumber.atomName) and predicted shifts:
74.C 177.47
74.H 8.48
74.HA 4.05
75.N 112.92
75.CA 46.53
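For downstream analysis, output of this form can be read into a dictionary. The following is a minimal sketch that assumes the output consists only of whitespace-separated specifier and shift pairs, with an integer residue number before the dot. The function name parse_shifts is hypothetical and not part of NMRFx Structure.

```python
def parse_shifts(text):
    """Parse 'residueNumber.atomName shift' pairs into a dictionary.

    Keys are (residue_number, atom_name) tuples; values are shifts in ppm.
    Assumes tokens strictly alternate between specifier and shift.
    """
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating specifier/shift tokens")
    shifts = {}
    for spec, value in zip(tokens[0::2], tokens[1::2]):
        # Split on the first dot only, so the atom name is kept whole.
        residue, atom = spec.split(".", 1)
        shifts[(int(residue), atom)] = float(value)
    return shifts


if __name__ == "__main__":
    sample = "74.C 177.47 74.H 8.48 74.HA 4.05 75.N 112.92 75.CA 46.53"
    for (residue, atom), ppm in parse_shifts(sample).items():
        print(residue, atom, ppm)
```

Because the parser works on whitespace-separated tokens, it does not matter whether each pair is on its own line or several pairs share a line.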
The output of the attribute-based RNA prediction is a list of atom specifiers (residueNumber.atomName), predicted shifts, and various attributes of the prediction:
20.C2' 75.34 N 5 Mean 75.31 +/- 0.24 Range: 74.90 -75.45 Pp_AU_GC_CG_pP_-_-_-_-_-_-
20.H2' 4.43 N 5 Mean 4.43 +/- 0.01 Range: 4.41 -4.44 Pp_AU_GC_CG_pP_-_-_-_-_-_-
20.C1' 92.71 N 6 Mean 92.70 +/- 0.13 Range: 92.55 -92.83 Pp_AU_GC_CG_pP_-_-_-_-_-_-
22.N4 97.53 N 6 Mean 97.57 +/- 0.42 Range: 97.22 -98.38 Pp_CG_CG_-_-_-_-_-_-_-_-
22.H41 8.13 N 12 Mean 8.05 +/- 0.68 Range: 6.84 -8.67 Pp_CG_CG_-_-_-_-_-_-_-_-
22.H42 7.30 N 11 Mean 7.41 +/- 0.64 Range: 6.95 -8.50 Pp_CG_CG_-_-_-_-_-_-_-_-
22.C5 98.24 N 18 Mean 98.24 +/- 0.29 Range: 97.86 -99.20 Pp_CG_CG_-_-_-_-_-_-_-_-
22.H5 5.46 N 34 Mean 5.46 +/- 0.10 Range: 5.23 -5.77 Pp_CG_CG_-_-_-_-_-_-_-_-
22.C6 141.97 N 17 Mean 142.02 +/- 1.10 Range: 141.27 -146.07 Pp_CG_CG_-_-_-_-_-_-_-_-
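Records in this layout can be split into fields with a regular expression. The sketch below is based only on the sample shown above, so the field meanings are assumptions: the integer after N is taken to be a count, the two numbers after Range: are taken to be a minimum and a maximum separated by a dash, and the trailing token is kept as an opaque attribute string. The function name and the dictionary keys are hypothetical.

```python
import re

NUM = r"-?\d+(?:\.\d+)?"
RECORD = re.compile(
    rf"(?P<residue>\d+)\.(?P<atom>\S+)\s+(?P<shift>{NUM})\s+"
    rf"N\s+(?P<n>\d+)\s+"
    rf"Mean\s+(?P<mean>{NUM})\s+\+/-\s+(?P<sdev>{NUM})\s+"
    rf"Range:\s+(?P<low>{NUM})\s+-\s*(?P<high>\d+(?:\.\d+)?)\s+"
    rf"(?P<attributes>\S+)"
)


def parse_attribute_predictions(text):
    """Return one dictionary per prediction record found in text."""
    records = []
    for match in RECORD.finditer(text):
        records.append({
            "residue": int(match["residue"]),
            "atom": match["atom"],
            "shift": float(match["shift"]),
            "n": int(match["n"]),  # assumed to be a count
            "mean": float(match["mean"]),
            "sdev": float(match["sdev"]),
            "range": (float(match["low"]), float(match["high"])),
            "attributes": match["attributes"],  # kept as an opaque string
        })
    return records


if __name__ == "__main__":
    sample = (
        "20.C2' 75.34 N 5 Mean 75.31 +/- 0.24 Range: 74.90 -75.45 "
        "Pp_AU_GC_CG_pP_-_-_-_-_-_-"
    )
    print(parse_attribute_predictions(sample))
```

The pattern treats the dash before the second range value as a separator, so it would misread a negative upper bound. That limitation is accepted here because every value in the sample is positive.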
nmrfxs predict protein.pdb
nmrfxs predict project.yaml