In this section, we will learn how to create a structure file from a simple PDB file. Structure files are necessary in order to run molecular simulations. We will also learn about the associated library files (called topology and parameter files) that allow us to generate the necessary files, and how and why PDB files are broken up into different segments.
It turns out that standard PDB files contain only the coordinates of atoms. The format does not require a specification of which atoms are connected to which ("CONECT" records). In fact, VMD reads only lines beginning with "ATOM" or "HETATM". How, then, does VMD draw bonds? Standard molecular viewers that read PDB files apply some default bond definition, for example that atoms 1.5 angstroms or less apart (or some other cutoff length) are bonded. This is fine for displaying a structure. In a simulation, however, bonded and non-bonded (van der Waals-type) interactions are treated very differently, so bonds must be defined explicitly. Connection records are becoming increasingly common, but you cannot count on them to specify bonds. There is also the additional complication that PDB files obtained from the RCSB often do not contain coordinates for hydrogens (X-rays can't see these little guys!). Of course, hydrogens must be included in any simulation of a protein.
The PDB (and other) specifications can be found at the RCSB. The PDB document is 200+ pages long, so use of the index is critical; for example, coordinates are discussed on page 186. In addition, here is a graphic (kindly provided by Tamas Gunda) that spells out the PDB specification. It is color-coded to help you understand what information is there (of course, the actual PDB file is just plain black-and-white text). If you have looked at any PDB files from the RCSB, you know there is also a large amount of information in the form of comments in the PDB header.
You should become familiar with the structure of the PDB file, its restrictions and its limitations. For example, the residue ID field (columns 23-26 inclusive) can run only from 0 to 9999 (10,000 unique numbers). This means that if your structure has more than 10,000 residues, other ways of counting must be used. Solvent water molecules alone, for example, could easily number in the many tens of thousands. One way to expand the number of residues in a single PDB file is to use chain IDs (column 22) and segment IDs (columns 73-76). By convention, chains are given letter identifications, so the 26 letters expand the possible number of residues to 26 × 10,000 = 260,000. There are other conventions used in chains (such as giving each monomer of a polymeric protein its own chain ID), which you will see as you examine protein PDB files.
Most PDB files that you will acquire from third parties (the RCSB, a journal article, etc.) will not be usable in their supplied form. Atom names may not be right for your simulation package, solvent and solutes may not be differentiated, and so on. VMD and its selection language allow you to manipulate and write any data in the PDB file.
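The following minimal Python sketch reads one ATOM/HETATM line and shows how the fixed-column layout works in practice. It assumes the column positions of the official PDB coordinate-record layout: chain ID in column 22 (column 21 is blank), residue ID in columns 23-26, and segment ID in columns 73-76. The `AtomRecord` class and `parse_atom_line` function are hypothetical names chosen for illustration. This is not VMD's reader, and it ignores fields such as occupancy, B-factor and element.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AtomRecord:
    record: str   # "ATOM" or "HETATM", columns 1-6
    serial: int   # columns 7-11
    name: str     # columns 13-16
    resname: str  # columns 18-20
    chain: str    # column 22
    resid: int    # columns 23-26
    x: float      # columns 31-38
    y: float      # columns 39-46
    z: float      # columns 47-54
    segid: str    # columns 73-76


def parse_atom_line(line: str) -> Optional[AtomRecord]:
    """Parse one fixed-column PDB line; return None for non-coordinate records."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        return None  # REMARK, CONECT, TER, END, ... are skipped
    # Drop the newline and pad short lines so the segment-ID slice is always valid.
    line = line.rstrip("\n").ljust(80)
    # Python slices are 0-based and end-exclusive, so columns a-b become [a-1:b].
    return AtomRecord(
        record=record,
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        resname=line[17:20].strip(),
        chain=line[21],
        resid=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        segid=line[72:76].strip(),
    )
```

To read a whole file (the filename is hypothetical), you could write `atoms = [a for a in map(parse_atom_line, open("protein.pdb")) if a is not None]`. Because the residue ID comes from a 4-character slice, the parser itself shows why values above 9999 cannot fit.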
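The distance heuristic that viewers use to draw bonds can be sketched in a few lines of Python. This assumes a single fixed cutoff of 1.5 angstroms, the example value given in the text. `guess_bonds` is a hypothetical helper, and the brute-force pair loop is for illustration only. It is not how VMD actually searches for bonds.

```python
import math
from itertools import combinations


def guess_bonds(coords, cutoff=1.5):
    """Return index pairs (i, j), i < j, whose separation is <= cutoff angstroms.

    O(N^2) brute force over all atom pairs; a sketch of the geometric
    heuristic only, with no element-dependent radii.
    """
    bonds = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if math.dist(a, b) <= cutoff:
            bonds.append((i, j))
    return bonds
```

The result depends entirely on the arbitrary `cutoff` value. That dependence is why a simulation needs bonds defined explicitly, from topology files, rather than guessed from geometry.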
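The residue-count arithmetic and the idea of splitting a large solvent box into segments can be sketched as follows. The four-character residue field allows 10,000 values, and 26 letter chain IDs give 26 × 10,000 = 260,000 unique (chain, residue) pairs. The `assign_segments` helper and its `W1`, `W2`, ... naming are hypothetical. They only illustrate how segment IDs (columns 73-76) let residue numbering restart, and they are not a convention required by VMD or any simulation package.

```python
import string

RESID_VALUES = 10_000                          # columns 23-26 hold 0..9999
CHAIN_LETTERS = len(string.ascii_uppercase)    # 26 conventional letter chain IDs


def max_unique_residues(n_chains: int = CHAIN_LETTERS) -> int:
    """Residues addressable as unique (chain, resid) pairs."""
    return n_chains * RESID_VALUES


def assign_segments(n_residues: int, prefix: str = "W", per_segment: int = 9999):
    """Label residues as (segid, resid), restarting resid at 1 in each segment.

    per_segment defaults to 9999 because numbering starts at 1 here, so
    resid never exceeds the 4-digit field. The prefix + counter naming is a
    hypothetical illustration; segid must fit in 4 characters (columns 73-76).
    """
    labels = []
    for k in range(n_residues):
        seg_index, offset = divmod(k, per_segment)
        segid = f"{prefix}{seg_index + 1}"
        if len(segid) > 4:
            raise ValueError(f"segment ID {segid!r} exceeds 4 characters")
        labels.append((segid, offset + 1))
    return labels
```

For example, 20,000 water residues would need three segments under this scheme: `W1` and `W2` hold 9,999 residues each, and `W3` holds the remaining 2.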