PDB, ENT (Standard PDB file)

Coordinate reader: MDAnalysis.coordinates.PDB.PDBReader

Coordinate writer: MDAnalysis.coordinates.PDB.PDBWriter

Topology parser: MDAnalysis.topology.PDBParser.PDBParser

Reading in

MDAnalysis parses the following PDB records (see PDB coordinate section for details):

  • CRYST1 for unit cell dimensions A, B, C, alpha, beta, gamma

  • ATOM or HETATM for serial, name, resName, chainID, resSeq, x, y, z, occupancy, tempFactor, segID

  • CONECT records for bonds

  • HEADER (Universe.trajectory.header)

  • TITLE (Universe.trajectory.title)

  • COMPND (Universe.trajectory.compound)

  • REMARK (Universe.trajectory.remarks)

All other lines are ignored. Multi-MODEL PDB files are read as trajectories with a default timestep of 1 ps (pass the dt argument to change this). Currently, MDAnalysis cannot read multi-model PDB files written by VMD, because VMD separates models with the keyword “END” instead of the “MODEL”/“ENDMDL” keywords.
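One possible workaround for VMD-style files is to rewrite the separators before loading. The sketch below is not part of MDAnalysis: end_to_models is a hypothetical helper that assumes every frame is terminated by a line containing only END, and it wraps each frame in MODEL/ENDMDL records. Whether the rewritten file then loads cleanly depends on the rest of its contents, so treat this as a starting point rather than a guaranteed fix.

```python
def end_to_models(lines):
    """Wrap frames separated by bare END lines in MODEL/ENDMDL records.

    Hypothetical helper, not an MDAnalysis function. Assumes each frame
    is terminated by a line containing only "END".
    """
    out, frame, model = [], [], 0

    def flush():
        nonlocal model, frame
        if frame:  # skip empty frames, e.g. a doubled END
            model += 1
            # MODEL serial number is right-justified in columns 11-14
            out.append("MODEL" + " " * 5 + f"{model:4d}")
            out.extend(frame)
            out.append("ENDMDL")
            frame = []

    for line in lines:
        if line.strip() == "END":
            flush()
        else:
            frame.append(line.rstrip("\n"))
    flush()  # trailing frame without a closing END
    out.append("END")
    return out


# Two single-atom frames, shortened for illustration
converted = end_to_models(["ATOM      1", "END", "ATOM      1", "END"])
print("\n".join(converted))
```

The converted lines can be written to a new file and that file passed to MDAnalysis in place of the original.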

Important

Previously, MDAnalysis did not read elements from a file. Now, if valid elements are provided, MDAnalysis will read them in and will not guess them from atom names.

MDAnalysis attempts to read segid attributes from the segID column. If this column does not contain information, segments are instead created from chainIDs. If chainIDs are also not present, then segids are set to the default 'SYSTEM' value.
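The fallback order for segids can be sketched in a few lines. This is an illustration of the rule as described, not MDAnalysis's parser: fallback_segid is a hypothetical name, and applying the rule line by line is an assumption made for simplicity.

```python
def fallback_segid(atom_line, default="SYSTEM"):
    """Pick a segment label for one ATOM/HETATM line.

    Illustrates the documented order only. PDB columns are 1-based, so
    column 22 is index 21 and columns 73-76 are the slice [72:76].
    """
    segid = atom_line[72:76].strip()
    if segid:
        return segid
    chain_id = atom_line[21:22].strip()
    if chain_id:
        return chain_id
    return default


print(fallback_segid(" " * 72 + "PROA"))  # segID column is filled
print(fallback_segid(" " * 21 + "B"))     # only chainID is present
print(fallback_segid("ATOM"))             # neither is present
```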

Writing out

MDAnalysis can both write single-frame PDBs and convert trajectories to multi-model PDBs. If the Universe is missing fields that are required in a PDB file, MDAnalysis provides default values and raises a warning. There are two exceptions to this:

  • chainIDs: if a Universe does not have chainIDs, MDAnalysis uses the first character of the segment segid instead.

  • elements: Elements are always guessed from the atom name.

These are the default values:

  • names: ‘X’

  • altLocs: ‘’

  • resnames: ‘UNK’

  • icodes: ‘’

  • segids: ‘’

  • resids: 1

  • occupancies: 1.0

  • tempfactors: 0.0
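The defaulting behaviour can be pictured as a lookup table plus one special case for chainIDs. The following is a minimal sketch under stated assumptions: fill_missing and PDB_DEFAULTS are hypothetical names, each attribute is reduced to a single value instead of a per-atom array, the warning text is invented, and element guessing is left out.

```python
import warnings

# Default values copied from the list above
PDB_DEFAULTS = {
    "names": "X",
    "altLocs": "",
    "resnames": "UNK",
    "icodes": "",
    "segids": "",
    "resids": 1,
    "occupancies": 1.0,
    "tempfactors": 0.0,
}


def fill_missing(fields):
    """Return a copy of `fields` with the documented defaults filled in.

    Hypothetical helper for illustration, not MDAnalysis code.
    """
    filled = dict(fields)
    for key, default in PDB_DEFAULTS.items():
        if key not in filled:
            warnings.warn(f"{key} missing; using default {default!r}")
            filled[key] = default
    # chainIDs are an exception: use the first character of the segid
    if "chainIDs" not in filled:
        filled["chainIDs"] = filled["segids"][:1]
    return filled


print(fill_missing({"names": "CA", "segids": "PROA"}))
```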

PDB specification

CRYST1 fields

COLUMNS    DATA TYPE      FIELD       DEFINITION
1 - 6      Record name    “CRYST1”
7 - 15     Real(9.3)      a           a (Angstroms).
16 - 24    Real(9.3)      b           b (Angstroms).
25 - 33    Real(9.3)      c           c (Angstroms).
34 - 40    Real(7.2)      alpha       alpha (degrees).
41 - 47    Real(7.2)      beta        beta (degrees).
48 - 54    Real(7.2)      gamma       gamma (degrees).
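Because CRYST1 is a fixed-column record, it can be read by slicing. The sketch below only illustrates the column layout and is not how MDAnalysis reads the record; parse_cryst1 is a hypothetical name and the cell values are made up.

```python
def parse_cryst1(line):
    """Slice a CRYST1 record by its fixed columns.

    PDB columns are 1-based and inclusive, so columns 7-15 become the
    Python slice [6:15].
    """
    if not line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    a, b, c = (float(line[i:j]) for i, j in ((6, 15), (15, 24), (24, 33)))
    alpha, beta, gamma = (
        float(line[i:j]) for i, j in ((33, 40), (40, 47), (47, 54))
    )
    return a, b, c, alpha, beta, gamma


# Build a record with the same field widths: three Real(9.3), three Real(7.2)
record = "CRYST1{:9.3f}{:9.3f}{:9.3f}{:7.2f}{:7.2f}{:7.2f}".format(
    52.0, 58.6, 61.9, 90.0, 90.0, 90.0
)
print(parse_cryst1(record))
```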

ATOM/HETATM fields

COLUMNS    DATA TYPE       FIELD         DEFINITION
1 - 6      Record name     “ATOM  ”
7 - 11     Integer         serial        Atom serial number.
13 - 16    Atom            name          Atom name.
17         Character       altLoc        Alternate location indicator.
18 - 21    Residue name    resName       Residue name.
22         Character       chainID       Chain identifier.
23 - 26    Integer         resSeq        Residue sequence number.
27         AChar           iCode         Code for insertion of residues.
31 - 38    Real(8.3)       x             Orthogonal coordinates for X in Angstroms.
39 - 46    Real(8.3)       y             Orthogonal coordinates for Y in Angstroms.
47 - 54    Real(8.3)       z             Orthogonal coordinates for Z in Angstroms.
55 - 60    Real(6.2)       occupancy     Occupancy.
61 - 66    Real(6.2)       tempFactor    Temperature factor.
67 - 72                                  (not used in the official PDB format)
73 - 76    String          segID         (unofficial PDB format)
77 - 78    LString(2)      element       Element symbol, right-justified.
79 - 80    LString(2)      charge        Charge on the atom.

Changed in version 2.10.0: Columns 67-72 are no longer read by MDAnalysis.

Note

Columns 73-76 are not part of the official PDB format, but some programs use them to store or operate on the segment ID. For instance, Chimera assigns this value to the attribute pdbSegment.
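The ATOM/HETATM layout above translates directly into a table-driven slicer. This is a minimal sketch for illustration, not MDAnalysis's implementation: ATOM_FIELDS and parse_atom are hypothetical names, the example atom is made up, and blank numeric fields would raise a ValueError here.

```python
# (field, first column, last column, converter); columns are 1-based, inclusive
ATOM_FIELDS = [
    ("serial", 7, 11, int),
    ("name", 13, 16, str.strip),
    ("altLoc", 17, 17, str.strip),
    ("resName", 18, 21, str.strip),
    ("chainID", 22, 22, str.strip),
    ("resSeq", 23, 26, int),
    ("iCode", 27, 27, str.strip),
    ("x", 31, 38, float),
    ("y", 39, 46, float),
    ("z", 47, 54, float),
    ("occupancy", 55, 60, float),
    ("tempFactor", 61, 66, float),
    ("segID", 73, 76, str.strip),
    ("element", 77, 78, str.strip),
    ("charge", 79, 80, str.strip),
]


def parse_atom(line):
    """Slice one ATOM/HETATM record into a dict of fields."""
    if line[:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    line = line.ljust(80)  # pad short lines so every slice exists
    return {
        field: convert(line[first - 1:last])
        for field, first, last, convert in ATOM_FIELDS
    }


# Build an 80-column record field by field to keep the widths explicit
record = (
    "{:6s}{:5d}".format("ATOM", 1) + " "
    + "{:4s}{:1s}{:4s}{:1s}{:4d}{:1s}".format(" N", "", "ALA", "A", 1, "")
    + " " * 3
    + "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(11.104, 6.134, -6.504, 1.0, 0.0)
    + " " * 6
    + "{:4s}{:>2s}{:2s}".format("PROA", "N", "")
)
atom = parse_atom(record)
print(atom)
```

Keeping the column numbers 1-based in the table and converting to slices in one place makes the code easy to check against the specification.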