##########################################################################################
## pdb2dhl, pdb2epk:  Convert PDB files to dihedral angle/ECEPPAK input file format     ##
## Copyright (C) 2002  Farokh Jamalyaria                                                ##
##                                                                                      ##
## This program is free software; you can redistribute it and/or                        ##
## modify it under the terms of the GNU General Public License                          ##
## as published by the Free Software Foundation; either version 2                       ##
## of the License, or (at your option) any later version.                               ##
##                                                                                      ##
## This program is distributed in the hope that it will be useful,                      ##
## but WITHOUT ANY WARRANTY; without even the implied warranty of                       ##
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the                        ##
## GNU General Public License for more details.                                         ##
##                                                                                      ##
## You should have received a copy of the GNU General Public License                    ##
## along with this program; if not, write to the Free Software                          ##
## Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.          ##
##                                                                                      ##
## Author's contact information:                                                        ##
## Farokh Jamalyaria, farokhj@yahoo.com                                                 ##
## 9 Commons Lobby G222                                                                 ##
## College Station, TX 77840                                                            ##
##########################################################################################

Introduction:

This package automates conversion of files in the Protein Data Bank (PDB) format to
ECEPPAK input files.  The C++ program pdb2dhl takes a polypeptide's PDB file as input
and produces two output files: one containing the dihedral angle description of
the polypeptide (in the correct ECEPPAK $GEOM field format) and one containing
the amino acid sequence (in the correct ECEPPAK $SEQ field format).  The Perl script
pdb2epk.pl produces an ECEPPAK input file from a polypeptide's PDB file.

pdb2dhl is useful on its own because it generates dihedral and sequence files in the
ECEPPAK format.  pdb2epk.pl, however, is designed only for creating ECEPPAK
energy computation scripts.  This is because I wrote the package to support my own
research, which is not yet finished.  I have not had time to look into how difficult it
would be to adapt it to other ECEPPAK applications, nor to write a fully general script.
I created this package as a minimal tool for completing a larger task (research),
so if you think it needs improvement and you have the time, feel free to extend or modify
it!

You may also find the C++ functions and data structures in the source useful for future
projects (especially the code that reads PDB files).

Using the Package:

Note: As downloaded, pdb2dhl does not recognize all the end groups that ECEPPAK recognizes.
There were simply too many for me to add, so if you need to use end groups that are not
supported by default (see the function names in ppo_endgrp.cpp), then you can easily add
your own as described in "Adding Your Own End Groups", below.

Make sure your PDB file is in the correct format (see the PDB literature at www.rcsb.org).
pdb2dhl will not run correctly if your PDB file is malformed.  Its behavior is then
unpredictable, and it may crash with a stack dump.

THE PDB FILE MUST CONTAIN 1 (SPATIALLY-CONNECTED, OR POLYPEPTIDE) AMINO ACID SEQUENCE.
IT MUST NOT CONTAIN ANYTHING ELSE.
THE PDB FILE MUST LIST THE AMINO ACIDS IN THEIR CONNECTED SEQUENCE.
THE POLYPEPTIDE MUST BE PROTONATED (HYDROGENS LISTED IN THE PDB FILE). Several bioinformatics
packages can add hydrogens for you.  The chi dihedrals will not be correct if necessary
hydrogens are missing from the PDB file.

Step 0: Ensure that you have a fairly recent version of the g++ compiler and a Perl interpreter
installed on your system.  Type:

make -f make.pdb2dhl

to compile pdb2dhl.  For a debuggable executable, you can use make.pdb2dhl_d instead.

Step 1: Create a new PDB file from your original file, containing only lines beginning with
'ATOM'.  You can use stp-pdb.pl to do this (open stp-pdb.pl with a text editor and follow
the included instructions).  Then open your PDB file in a text editor and in a molecule
visualization tool such as RasMol, and note whether the N-->C direction of the amino acid
sequence runs in order of increasing or decreasing resnum.  What's a resnum, you say?  In

ATOM      1  N   HIS     1      49.668  24.248  10.436  1.00 25.00

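the resnum is the "1" immediately following the "HIS".  It sits in column 26 of the line
counting from 1 (index 25 counting from 0), at the right edge of the residue number field,
which occupies columns 23-26 in the PDB format.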

Step 2: Now that you have a PDB file containing only ATOM lines, you can perform the
conversion.  To create 2 output files, one containing dihedral angles ($GEOM field of an
ECEPPAK input file), the other containing the amino acid sequence ($SEQ field of an ECEPPAK
input file), type:

./pdb2dhl

OR

pdb2dhl

and press enter.  Then:

Type the full name (fewer than 20 characters) of the PDB file and press enter.

Type the read direction that you determined in step 1 ("f" for forward and "b" for backward).
Press enter.

Type the prefix (fewer than 20 characters) of the .inp and .seq output files (e.g., "bob"
would be used to create "bob.inp" and "bob.seq"). Press enter.

Type the N-terminus group (e.g., "H3+"), and press enter. Typing "0" will force the program
to use a default group.

Type the C-terminus group (e.g., "COOH"), and press enter. Typing "0" will force the program
to use a default group.

The program will perform the conversion, and you will return to your Unix/Linux/Cygwin shell
prompt.  The output will be in the <prefix>.inp and <prefix>.seq files.  If the program
crashes with a stack dump, increase the value of the following constant:

NUM_PDB_LINES in Pdb_dhl1.h.

Alternatively, you can run pdb2epk.pl to create an ECEPPAK input file in addition to creating
<prefix>.inp and <prefix>.seq.  The default ECEPPAK input file is for energy computation.
Pdb2epk.pl contains instructions for running it.  Please open it with a text editor and read
the initial comments if you'd like to create ECEPPAK input files.

For more help, please see sample PDB, input, and output files in the /Examples directory.
You can also test the program with the included 1GCN.pdb (already stripped of all but 'ATOM'
lines).

Adding Your Own End Groups:

pdb2dhl, as downloaded, is capable of dealing with only a handful of end groups, but you
can easily extend its functionality as described in this section.

Let's say you want pdb2dhl to be able to use an end group not currently recognized by it.
Let's call this end group YYY, and let it be an N-terminal end group (adding a C-terminal
end group mirrors the process described here).

We need to perform 4 tasks:

1) Update prt_seq_Nterm(...) in Pdb_dhl.cpp.
2) Add a new function to ppo_endgrp.cpp.
3) Update prt_inp_Nterm(...) in Pdb_dhl.cpp.
4) Update ppo_Nterm(...) in Pdb_dhl.cpp.

1) To update prt_seq_Nterm, all we have to do is add an "if" statement that will recognize
the endgroup "YYY" and write "YYY" to the .seq file:

if(strcmp(Nendgrp,"YYY")==0) f_out2 << "YYY";

2) The functions in ppo_endgrp.cpp all have the same format.  We want to add a similar
function.  The purpose of each of these functions is to compute the correct phi, psi, and
omega dihedral angles for a terminal amino acid.  If you are using this program, you
probably have significant background in chemistry and should be able to figure out which
4 atoms you need to compute each dihedral angle.  I'll leave this up to you.
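
Once you have chosen the 4 atoms, the angle itself can be computed as in the sketch below.
This is a generic atan2 formulation, not the code in ppo_endgrp.cpp; Vec3 and dihedral_deg
are names invented for this example.  For the backbone, the usual atom choices are
C(i-1)-N-CA-C for phi, N-CA-C-N(i+1) for psi, and CA-C-N(i+1)-CA(i+1) for omega.  Check the
sign and range of the result against what ECEPPAK expects before relying on it.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Dihedral angle in degrees, in the range [-180, 180], for the atom chain p1-p2-p3-p4.
// A cis arrangement gives 0 and a trans arrangement gives 180 in magnitude.
double dihedral_deg(const Vec3& p1, const Vec3& p2, const Vec3& p3, const Vec3& p4) {
    const Vec3 b1 = sub(p2, p1);   // bond p1 -> p2
    const Vec3 b2 = sub(p3, p2);   // central bond p2 -> p3
    const Vec3 b3 = sub(p4, p3);   // bond p3 -> p4
    const Vec3 n1 = cross(b1, b2); // normal of the plane p1, p2, p3
    const Vec3 n2 = cross(b2, b3); // normal of the plane p2, p3, p4
    const double y = std::sqrt(dot(b2, b2)) * dot(b1, n2);
    const double x = dot(n1, n2);
    const double pi = std::acos(-1.0);
    return std::atan2(y, x) * 180.0 / pi;
}
```

Using atan2 rather than acos keeps the sign of the angle and avoids loss of precision near
0 and 180 degrees.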

3) To update prt_inp_Nterm, we need to add an "if" statement that will recognize the new
endgroup and write one or more dihedral angles to the .inp file.  Make sure the angles you
write are decimal-aligned with the other columns of angles, as ECEPPAK expects.  The format
is as follows:

if((strcmp(Nendgrp,"YYY")==0))
  f_out << "   " << "0.000" << ...(more angles, if necessary)... << endl;
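
One way to get decimal-aligned columns without hand-counting spaces is to let the stream
pad each angle to a fixed width with 3 decimals, as sketched below.  format_angles is a
hypothetical helper, not part of Pdb_dhl.cpp, and the column width is an assumption: take
the real width from the existing lines of a .inp file.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Format each angle right-aligned in a column `width` characters wide, with 3
// decimals, so that decimal points line up from row to row.
// `width` is an assumed value supplied by the caller, not an ECEPPAK constant.
std::string format_angles(const std::vector<double>& angles, int width) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(3);
    for (double a : angles) out << std::setw(width) << a;  // setw applies to the next item only
    return out.str();
}
```

With a width of 8, a single angle of 0.0 comes out as three spaces followed by "0.000",
the same text the statement above writes by hand.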

4) To update ppo_Nterm, add an "if" statement to it that can call the function you created
in step 2 above.  If your function was called "YYY", the "if" statement would look as follows:

if(strcmp(Nendgrp,"YYY")==0) YYY(memfile,m0,m1,m2,ppo,x,y,z);


####################################################################
#Code in Format.cpp and Format.h adapted from:                     #
#                                                                  #
#Stroustrup, Bjarne. The C++ Programming Language: Special Edition.#
#    Addison-Wesley, 1997.                                         #
####################################################################