This page was generated from examples/centroids.ipynb.
Centroids¶
Finding centroids¶
In this example, we’re going to find a “centroid” (representitive structure) for a group of conformations. This group might potentially come from clustering, using method like Ward hierarchical clustering.
Note that there are many possible ways to define the centroids. This is just one.
[1]:
%matplotlib inline
import mdtraj as md
import numpy as np
Load up a trajectory to use for the example.
[2]:
traj = md.load("ala2.h5")
print(traj)
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, without unitcells>
/Users/singhs15/work/src/dev-projs/mdtraj/mdtraj/core/trajectory.py:441: UserWarning: top= kwargs ignored since this file parser does not support it
warnings.warn("top= kwargs ignored since this file parser does not support it")
Lets compute all pairwise rmsds between conformations.
[3]:
atom_indices = [a.index for a in traj.topology.atoms if a.element.symbol != "H"]
distances = np.empty((traj.n_frames, traj.n_frames))
for i in range(traj.n_frames):
distances[i] = md.rmsd(traj, traj, i, atom_indices=atom_indices)
The algorithim we’re going to use is relatively simple: - Compute all of the pairwise RMSDs between the conformations. This is O(N^2), so it’s not going to scale extremely well to large datasets. - Transform these distances into similarity scores. Our similarities will calculated as
where \(s_{ij}\) is the pairwise similarity, \(d_{ij}\) is the pairwise distance, and \(d_\text{scale}\) is the standard deviation of the values of \(d\), to make the computation scale invariant. - Then, we define the centroid as
Using \(\beta=1\), this is implemented with the following code:
[4]:
beta = 1
index = np.exp(-beta * distances / distances.std()).sum(axis=1).argmax()
print(index)
83
[5]:
centroid = traj[index]
print(centroid)
<mdtraj.Trajectory with 1 frames, 22 atoms, 3 residues, without unitcells>