PINE-SPARKY.2 is a project that provides graphical user interfaces to automate and visualize protein secondary structure assignment, hydrophobic core detection, 3-D structure calculation, resonance assignment and referencing error detection with a few clicks in NMRFAM-SPARKY, one of the two main software packages (the other is PONDEROSA-C/S) in the Integrative NMR platform. In ideal cases, only three NMR experiments (e.g. 1H,15N HSQC, CBCA(CO)NH and HNCACB) are required to obtain backbone chemical shifts and their derived information. PINE-SPARKY.2 nevertheless supports 19 different NMR experiments, including side chain experiments such as C(CO)NH, H(CCO)NH and HCCH-TOCSY, which are required for NOE-based 3-D structure calculation. Because PINE-SPARKY.2 is developed as a plugin to the NMRFAM-SPARKY software package, it does not need to be installed separately. There are four ways to use it: a) an individual NMRFAM-SPARKY installation, b) the NMRFAM Virtual Machine, c) support from SBGrid, and d) the version installed on NMRBox.org. Results are reproducible because the program generates a KEY for each job, so the user can re-import past results at any time.
Biologically important information such as protein secondary structures, 3-D structures, dynamics and protein-ligand interactions can be obtained from protein NMR spectroscopy. One of the most fundamental but time-consuming steps in doing so is chemical shift assignment. Our PINE web server, which provides the PINE automated assignment algorithm to the biomolecular community, processed over 2,000 jobs in 2016 alone.
In the PINE-SPARKY 2 project, we have developed the PINE web server and NMRFAM-SPARKY and integrated them so that they work together seamlessly to automate chemical shift assignment, protein secondary structure determination, 3-D structure calculation and probability visualization. In the ideal case, a user can obtain this information automatically from only three NMR spectra (e.g. 1H,15N HSQC, CBCA(CO)NH, HNCACB), or, in a very favorable case, from a single HNCACB experiment.
However, user demands vary. For instance, a user who wants to calculate structures from NOE data needs side chain chemical shift assignments. In that case, additional experiments such as 1H,13C HSQC, C(CO)NH, H(CCO)NH and HCCH-TOCSY may be included in PINE-SPARKY 2.
Although the basic idea of PINE-SPARKY 2 is to automate as much as possible, verification of the results is extremely important. PINE-SPARKY 2 therefore visualizes Bayesian probabilities with the NDP-PLOT program and places color-coded probabilistic PINE labels on the spectra so that the user can easily judge the quality of the automated results. In addition, PINE-SPARKY 2 is fully integrated into NMRFAM-SPARKY, one of the most popular NMR analysis tools in the biomolecular NMR field. The built-in tools, the original PINE-SPARKY extensions and the new PINE-SPARKY 2 work together to simplify NMR analysis.
To use PINE-SPARKY 2, you need an internet connection and access to NMRFAM-SPARKY. The following four NMRFAM-SPARKY options are available.
1. Using an individually installed NMRFAM-SPARKY version [link to download page]
2. Using the NMRFAM Virtual Machine [link to download page]
3. Getting support from SBGrid
4. Using the version installed on NMRBox.org
PINE-SPARKY 2 requires two kinds of information: a) a sequence and b) peaks picked on the supported NMR spectra.
1. Sequence file
Supported sequence file formats are:
a) 3-letter-code with indices (.seq)
b) 1-letter-code (.fasta)
c) 3-letter-code without indices (.txt)
We suggest choosing the file extension that matches your sequence content so that PINE-SPARKY 2 interprets the sequence correctly. If the file is set in the Sequence Entry window (two-letter-code “sq”) before launching PINE-SPARKY 2 (two-letter-code “ep”), it is set automatically in the PINE-SPARKY 2 window. In the Sequence Entry window, a user may also type or paste a sequence in 1-letter code instead of choosing a sequence file. This automatically generates a 3-letter-code sequence file (.seq) in the Projects directory if SPARKYHOME is already set (two-letter-code “RD”). The user may also set the file directly in PINE-SPARKY 2. Please note that the first sequence number must be 1 to run PINE jobs. You can change the numbering with two-letter-code “rn” after obtaining resonance assignments.
< Sequence Entry window (two-letter-code “sq”) >
Regardless of format and extension, the file must be plain ASCII text. Please do not use RTF, DOCX, etc.
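The conversion from a 1-letter-code sequence to a 3-letter-code sequence with indices can be sketched in Python as below. This is only an illustration: the function name fasta_to_indexed and the "XXX n" line layout are assumptions, not the documented .seq format written by NMRFAM-SPARKY.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def fasta_to_indexed(fasta_text, first_index=1):
    """Return a list of 'XXX n' strings for a 1-letter-code sequence.

    Header lines starting with '>' and all whitespace are ignored.
    The 'XXX n' layout is an assumed, illustrative format.
    """
    residues = "".join(
        "".join(line.split())
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    ).upper()
    lines = []
    for offset, letter in enumerate(residues):
        if letter not in ONE_TO_THREE:
            raise ValueError(f"unsupported residue code: {letter!r}")
        lines.append(f"{ONE_TO_THREE[letter]} {first_index + offset}")
    return lines
```

The default first_index of 1 mirrors the requirement that the first sequence number be 1 for PINE jobs.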
2. Peak picked NMR experiments
Currently, the following 19 NMR experiments are supported:
2D: 1H,15N HSQC, 1H,13C HSQC
3D: HNCO, HN(CA)CO, HNCACB, HN(CO)CACB, CBCA(CO)NH, HNCA, HN(CO)CA, HN(CA)CB, CB(CA)(CO)NH, C(CO)NH, HBHA(CO)NH, H(CCO)NH, HA(CO)NH, HCCH-TOCSY, CCH-TOCSY, HCCH-COSY, CCH-COSY
We recommend truncating unused regions of a spectrum (e.g. -1 ppm to 6 ppm in the 1H dimension of the 1H,15N HSQC) so that PINE-SPARKY 2 recognizes the dimension order more reliably.
The user may pick peaks by APES automated peak picking (two-letter-code “ae”), restricted peak picking (two-letter-code “kr”) or manual peak picking.
LAUNCHING PINE-SPARKY 2
The PINE-SPARKY 2 window can be launched with two-letter-code “ep” or from the menu NMRFAM -> Automated assignment -> Run PINE automated assignment (PINE-SPARKY 2).
< Launched PINE-SPARKY 2 window >
Your name and email address are required because the results will be sent to you by email.
SELECTING NMR EXPERIMENTS TO BE ANALYZED
The NMR experiments to be used must be added in the PINE-SPARKY 2 window. Select a spectrum and its type, then click the Add button as shown below.
< Adding NMR spectra to run PINE-SPARKY 2 >
THREE CHECK BOX OPTIONS
< Check box options to improve results and get 3-D structures >
There are three check box options above the spectrum list.
1. By checking “Use pre-assignment”, a user can incorporate existing resonance assignments into the automated chemical shift assignment, which improves the results by reducing the number of choices. Under ideal circumstances, such as well-aligned spectra with clear resonance signals, PINE easily achieves over 99% completeness and correctness. In practice, however, there are many non-ideal cases such as spin degeneracy, peak overlap and peak drift. Iterative automation with some intermediate validation is therefore required in some cases. This option makes that straightforward: the user checks the box and re-runs the automated analysis with the confirmed chemical shifts.
2. By checking “Use selective labeling”, a user can restrict the amino acid types for certain 1H,15N chemical shifts. For instance, ILV methyl labeling is a popular selective labeling method for larger systems. When the user identifies peaks in the 1H,15N HSQC that must be assigned to certain amino acids, a user label (two-letter-code “pl”) specifying the amino acid types can be attached to those peaks.
< Peaks restrained to certain amino acid types >
3. By checking “Run CS-Rosetta with PINE outputs”, a user can obtain a 3-D structure. The structure calculation is submitted to the BMRB-hosted CS-Rosetta server with the 10,000-structure sampling option. Calculated structures are sent to the user's email address separately from the PINE results. Calculation time varies with protein size and the availability of computing resources. For a small protein under 10 kDa, expect about one day or slightly longer.
When everything is prepared, the user may click the “Submit” button to start the spectrum analysis.
< After clicking Submit button, KEY is generated >
The duration varies with protein size and the experiments used. With only the minimal data set (three experiments), it takes less than a minute. You can click the Check button repeatedly until your job is finished. To import the results of a previously submitted job, you can type its KEY manually. A user can also choose to save all the results in the PINE sub-directory under the working directory. Alternatively, if you have downloaded and unzipped the results on your computer, you can click the Browse button and select the directory.
When the automated analysis by the PINE server is finished, you will be asked whether you want to plot the results. The following results are visualized with NDP-PLOT.
Secondary structure based on backbone chemical shifts
< Secondary structure predicted by PECAN (green: helix, blue: strand) >
Assignment outlier based on secondary chemical shifts
< Chemical outlier analysis by LACS (red: outlier) >
Hydrophobic core prediction
< Hydrophobic core residues are predicted from PACSY DB by statistical counting (red: buried, purple: medium, blue: exposed, gray: no assignment) >
Protein flexibility prediction (RCI-S2: random coil index order parameter)
< Random Coil Index (RCI) S2 order parameter prediction from TALOS-N (Wishart method) >
Probabilities for spin system assignments
< Bar height and color indicate assignment confidence (Green: >= 0.99, Cyan: >= 0.8, Yellow: >= 0.5, Red: < 0.5, Gray: no assignment) >
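The color coding in the caption above can be written as a small Python function. This is a sketch of the thresholds only; the function name confidence_color and the use of None for "no assignment" are assumptions made for illustration, not part of NDP-PLOT.

```python
def confidence_color(probability):
    """Map a PINE assignment probability to the plot color.

    Thresholds follow the figure caption; None stands for 'no assignment'.
    """
    if probability is None:
        return "gray"
    if probability >= 0.99:
        return "green"
    if probability >= 0.8:
        return "cyan"
    if probability >= 0.5:
        return "yellow"
    return "red"
```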
IMPORT ASSIGNMENTS IN THE NMRFAM-SPARKY
After plotting, PINE-SPARKY 2 asks whether the user wants to import the PINE assignments and generate PINE labels for further validation and completion. It then asks whether the user wants to accept assignments with probabilities higher than 0.5. In our systematic evaluation of PINE on many entries in the PACSY database, a base probability of 0.5 appeared to be the most efficient acceptance cutoff given the ratio of correct to incorrect assignments. However, the user can also decline to accept any assignments and instead use Assign the best by PINE (two-letter-code “ab”) with a custom probability. From this step on, the user can use the original PINE-SPARKY extensions (two-letter-codes “pp”, “pr” and “ab”). Detailed tutorials can be found here [Link to the PINE-SPARKY extension page]. If the PINE labels are too crowded, they can be selected with two-letter-code “se” and deleted by pressing the “Del” key.
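The acceptance step can be pictured as a simple probability filter. The sketch below assumes a hypothetical dictionary that maps an assignment label to its PINE probability; the function name and the choice to accept values equal to the cutoff are illustrative assumptions, not the behavior of the “ab” extension itself.

```python
def split_by_cutoff(assignments, cutoff=0.5):
    """Split {label: probability} into accepted and rejected dictionaries.

    Assignments at or above the cutoff are accepted (assumed boundary rule).
    """
    accepted = {k: p for k, p in assignments.items() if p >= cutoff}
    rejected = {k: p for k, p in assignments.items() if p < cutoff}
    return accepted, rejected


# Hypothetical example labels and probabilities.
example = {"G12CA": 0.97, "A13CB": 0.42, "K14N": 0.66}
accepted, rejected = split_by_cutoff(example)
```

Passing a different cutoff corresponds to choosing a custom probability instead of the recommended 0.5.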
SPIN SYSTEM CONNECTIVITY VALIDATION
To confirm the automatically assigned backbone chemical shifts, you may need to check spin system connectivity based on the given assignments. You can use the Strip Plot tool (two-letter-code “sp”) in NMRFAM-SPARKY.
3-D STRUCTURE FROM BMRB CS-ROSETTA SERVER
If “Run CS-Rosetta with PINE outputs” was selected in PINE-SPARKY 2, BMRB CS-Rosetta is started with the chemical shifts assigned by PINE. The user receives status update emails from BMRB and NMRFAM. When the job is done, the results are automatically incorporated into the PINE URL, so the user can click the Check button again to retrieve the updated package including the CS-Rosetta results.
PyMOL visualization scripts are provided: @pine_pymol (PINE probability), @core_pymol (hydrophobic core) and @rci_pymol (flexibility)
< A. PINE probabilities on the structure by @pine_pymol. B. Hydrophobicities on the structure by @core_pymol. C. RCI-S2 on the structure by @rci_pymol >
Lee, W & Markley, JL. Submitted
PINE-SPARKY.2: TECHNOLOGICAL DESCRIPTION
PINE-SPARKY.2 consists of a GUI (graphical user interface) plugin installed in NMRFAM-SPARKY and a new server system at NMRFAM that runs interactively with the GUI. In the original PINE workflow, a user uploads peak list files on the PINE web page, waits for an email from PINE, and imports the assignments into an NMR analysis program. The most important technological difference in PINE-SPARKY.2 is that the Key (job identifier) is generated on the user side rather than the server side. This makes interactive communication between NMRFAM-SPARKY and the PINE server possible: PINE-SPARKY.2 knows the URL where the results will be located as soon as a job is submitted, and it can simply fetch the results from that URL and apply them in NMRFAM-SPARKY. The generated Key is also sent to the user by email so that it can be reused whenever necessary.
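The idea of a client-generated Key can be illustrated with a few lines of Python. Everything specific here is an assumption for illustration: the base URL is a placeholder, the Key format (random hexadecimal) is invented, and the fetch callable stands in for whatever transport the real plugin uses.

```python
import secrets

# Placeholder address, not the real PINE server URL.
BASE_URL = "https://example.org/pine-results"


def new_job_key(nbytes=8):
    """Create a random job Key on the client side (assumed hex format)."""
    return secrets.token_hex(nbytes)


def result_url(key, base_url=BASE_URL):
    """Derive the result location from the Key alone."""
    return f"{base_url}/{key}/"


def check_job(key, fetch, base_url=BASE_URL):
    """Ask for results once; 'fetch' returns None while the job is running."""
    return fetch(result_url(key, base_url))
```

Because the client already holds the Key when it submits the job, it can compute the result URL immediately and poll it, and a Key typed in later leads to the same location.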
In essence, PINE-SPARKY.2 automates the conversion, execution and visualization steps of PINE, PECAN, LACS, PACSY, TALOS-N, and CS-ROSETTA. The PINE-SPARKY.2 GUI processes user data in NMRFAM-SPARKY, generates PINE-compatible files such as peak lists, chemical shift pre-assignments and specific labeling information, and sends them to the server with the Key. The server handles the computationally intensive calculations and the complex setup of the different programs, and the GUI fetches the results. When a user-submitted job is received, the server runs PINE to obtain probabilistic chemical shifts, and runs PECAN and LACS to obtain secondary structures, referencing errors, and chemical shift outliers in parallel.
Once the server detects that the probabilistic chemical shifts for a given Key have been generated, it launches the PACSY search, TALOS-N and CS-ROSETTA with the highest-probability chemical shifts. We built a CS-ROSETTA queue on the PINE server to manage CS-ROSETTA jobs, because they take hours to days to finish and it is otherwise difficult to know which CS-ROSETTA results belong to which PINE results. Jobs registered in the queue are checked every hour using CRON and JSON. When a job is finished, the server automatically downloads the coordinates from BMRB to the PINE-SPARKY.2 URL and emails the user that the job is finished.
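One pass over such a queue might look like the following Python sketch. It is not the server's code: the queue layout (PINE Key mapped to a CS-ROSETTA job identifier), the status string "finished", and the three callables are all hypothetical stand-ins for the real status check, download and email steps.

```python
def poll_queue(queue, get_status, download, notify):
    """Check every queued job once and finalize the finished ones.

    queue:      dict mapping PINE Key -> CS-ROSETTA job id (assumed layout)
    get_status: callable(job_id) -> status string
    download:   callable(job_id, key), stores coordinates under the Key
    notify:     callable(key), tells the user the job is finished
    Returns the list of Keys finalized in this pass.
    """
    finished = []
    for key, job_id in list(queue.items()):
        if get_status(job_id) == "finished":
            download(job_id, key)
            notify(key)
            del queue[key]  # keep only unfinished jobs for the next pass
            finished.append(key)
    return finished
```

A scheduler such as cron would call this function once per hour. Keeping the Key next to the job identifier is what ties each CS-ROSETTA result back to its PINE run.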
To make chemical shift assignment probabilities, secondary structures, detected chemical shift outliers, hydrophobicities and the random coil index order parameter (RCI-S2) easy for users to understand, conversion scripts written in Python are executed on the server. They generate inputs for NDPPLOT (INI files) and PyMOL. After downloading the NDPPLOT files from the server, the PINE-SPARKY.2 GUI visualizes them by launching NDPPLOT with each file in turn. The PyMOL inputs can be executed on the PyMOL command line by typing @command.
The PINE2SPARKY converter and the Assign-the-best-by-PINE extension (two-letter-code “ab”) of the original PINE-SPARKY no longer need to be run separately. To incorporate these features into NMRFAM-SPARKY, we developed a core C++/Python interface that makes it possible to create labels from Python extensions. A label on a spectrum can be generated as follows.
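As an illustration of what such a conversion script could produce for PyMOL, the sketch below turns per-residue hydrophobic core categories into PyMOL color commands, using the colors from the hydrophobic core figure (red: buried, purple: medium, blue: exposed, gray: no assignment). The input dictionary and function name are assumptions; the actual contents of the distributed PyMOL scripts are not shown in this document.

```python
CORE_COLORS = {"buried": "red", "medium": "purple", "exposed": "blue"}


def core_pymol_lines(categories):
    """Build PyMOL commands from {residue_number: category}.

    Residues without a known category keep the gray base color.
    """
    lines = ["color gray, all"]  # base color for unassigned residues
    for resi in sorted(categories):
        color = CORE_COLORS.get(categories[resi])
        if color is not None:
            lines.append(f"color {color}, resi {resi}")
    return lines


# Hypothetical input; residue 9 has no assignment.
script = "\n".join(core_pymol_lines({7: "buried", 8: "exposed", 9: None}))
```

Saving the resulting text to a file lets PyMOL run it from its command line with the @ prefix.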
sparky.Label(spectrum, 'label', 'color', (position))
The GUI uses this function to generate PINE probabilistic labels from the PINE probabilistic outputs, and Assign-the-best-by-PINE uses a probability of 0.5 as the base cutoff, which we recommend based on the ratio of correct to incorrect assignments. After the chemical shifts are imported, connectivity validation with the strip plot (two-letter-code “sp”) is required, as described above in the MANUAL section. To make connectivity validation easier for a selected assignment segment, we improved the strip plot program so that it generates strips in residue-number order if the selected peaks are assigned. Buttons were also added to the top panel of the strip plot for easy access to its features. To validate different types of connectivity (e.g. CA/CB and CO or H), we developed an additional strip plot (two-letter-code “SP”) that can be operated simultaneously with the original strip plot.
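Ordering strips by residue number can be sketched as follows. The sketch assumes group names of the form one-letter code plus residue number (e.g. "G12") and an empty name for unassigned peaks. It only illustrates the ordering idea and does not use the strip plot's actual code or the sparky module.

```python
import re


def residue_number(group_name):
    """Return the trailing residue number of a group name, or None."""
    match = re.search(r"(\d+)$", group_name or "")
    return int(match.group(1)) if match else None


def order_strips(group_names):
    """Sort assigned groups by residue number; unassigned ones go last."""
    assigned = [g for g in group_names if residue_number(g) is not None]
    unassigned = [g for g in group_names if residue_number(g) is None]
    return sorted(assigned, key=residue_number) + unassigned
```

For example, order_strips(["K10", "", "A2", "G5"]) places A2, G5 and K10 in sequence order and leaves the unassigned entry at the end.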