Welcome to the Vagabond tutorial!

Although some structures may benefit from specialist use of Vagabond, most structures will benefit from the automatic refinement protocol, which will be the focus of this basic tutorial.


For this tutorial you will need at least Vagabond v0.1 (along with the GUI) installed on your machine. Vagabond runs on a single thread, and therefore a personal machine will often be sufficient. You will also need internet access in order to download files from PDB-redo.

Download the data

Let us get some re-processed files from PDB-redo. We will work with data from the original deposition of PDB code 3sh4. PDB-redo automatically reprocesses all X-ray data sets possible from the PDB using the latest consistent algorithms and corrects a lot of common errors, and so is a good starting point for Vagabond.

Create a new top-level directory for your Vagabond tutorial.

mkdir tute_3sh4
cd tute_3sh4

3sh4 is a small protein with data extending to 1.5 Å resolution. Get the MTZ file (reflection list) and PDB file (atomic coordinates) using the command line, like so:

wget 'pdb-redo.eu/db/3sh4/3sh4_final.mtz'
wget 'pdb-redo.eu/db/3sh4/3sh4_final.pdb'
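
Before launching Vagabond, it can be worth confirming that both downloads actually arrived. The following is a minimal sketch; check_inputs is a hypothetical helper written for this tutorial and is not part of Vagabond or PDB-redo.

```shell
# Hypothetical helper: confirm each named file exists and is not empty.
check_inputs() {
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
}
```

From inside tute_3sh4 it could be called as: check_inputs 3sh4_final.mtz 3sh4_final.pdb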

Launch Vagabond

Launch the GUI version of vagabond on the command line, like so:


You will get an introductory screen allowing you to form a new experiment.

In this case, we would like to keep the default parameters for refinement. However, as the resolution gets worse (roughly 2.8 Å and poorer) we may decide that there is not enough information to refine subtle parameters such as bond angles, in which case we may uncheck the use of variable beta-carbon angles. Note that this will only apply when importing a PDB file, and not a previous Vagabond format.

Enter the filenames of your MTZ and PDB files, and then press OK.

Vagabond will now launch and immediately begin to process your data set, and you can watch its progress in real time. The console output contains a lot of information and you may prefer to watch the output in the same window. Press l on the keyboard to activate the log file. There are a number of keyboard shortcuts.

At first, Vagabond will be adjusting the initial torsion angles to bring the atoms onto the positions found in the PDB file, and once five cycles of that have been completed, you will get your first reported R factor.

You may also notice that you can rotate the molecule in the window. You can also pan the screen (ALT + drag mouse) and zoom (right click + drag mouse).

If you zoom in, you may notice it gets a little messy due to the complete display of every bond in the ensemble.

Pressing b will cycle through the displays for the bonds. At the moment there are two: full ensemble, and display of the average position only. During flexibility macrocycles, the average position of the ensemble will not move while the full ensemble can be seen to be flexing.

Pressing d will also cycle through displays of electron density: currently (a) none, (b) weighted map and (c) weighted map and difference map together. Note that before refinement of flexibility, flexible regions of the protein are too 'dense' and will have clear negative (red*) difference density associated with them.

Using the keys , and . will step through the residues from the currently selected residue. Pressing , for the first time will centre on the N-terminus of the first chain. You can see that the difference density indicates that this region should be a lot less dense than the model currently specifies.

Coffee break

Vagabond will take some time to refine the structure, so please take a coffee break while the computer does the hard work.

Looking at the results

When Vagabond has finished refinement, we can inspect the log file on the command line. Keep Vagabond open, but also open a terminal in your top-level tutorial directory (tute_3sh4). In every instance of Vagabond, the console log is written to vagabond.log, so we can use this to extract the important information.

grep 'Rwork/Rfree' refine_1/vagabond.log

This will give us a list of every R factor on each cycle. It should look something like this:

Rwork/Rfree: 25.6275, 26.9768 % (diff: 1.3493 %)
Rwork/Rfree: 25.2238, 26.5170 % (diff: 1.2932 %)
Rwork/Rfree: 25.3837, 26.7420 % (diff: 1.3584 %)
Rwork/Rfree: 25.1089, 26.5699 % (diff: 1.4610 %)
Rwork/Rfree: 25.1578, 26.5642 % (diff: 1.4065 %)
Rwork/Rfree: 25.1550, 26.6125 % (diff: 1.4575 %)
Rwork/Rfree: 25.0700, 26.4239 % (diff: 1.3539 %)
Rwork/Rfree: 25.0794, 26.5038 % (diff: 1.4244 %)
Rwork/Rfree: 25.0829, 26.4523 % (diff: 1.3693 %)
Rwork/Rfree: 25.0422, 26.4809 % (diff: 1.4387 %)
Rwork/Rfree: 25.0654, 26.3277 % (diff: 1.2622 %)
Rwork/Rfree: 25.0873, 26.2800 % (diff: 1.1927 %)
Rwork/Rfree: 25.1032, 26.3503 % (diff: 1.2471 %)
Rwork/Rfree: 25.0883, 26.3116 % (diff: 1.2233 %)
Rwork/Rfree: 25.0884, 26.2912 % (diff: 1.2028 %)
Rwork/Rfree: 25.0883, 26.3116 % (diff: 1.2233 %)
Rwork/Rfree: 24.8400, 26.0617 % (diff: 1.2217 %)
Rwork/Rfree: 24.7608, 25.8706 % (diff: 1.1098 %)
Rwork/Rfree: 24.8558, 25.9858 % (diff: 1.1300 %)
Rwork/Rfree: 24.8954, 25.8861 % (diff: 0.9907 %)
Rwork/Rfree: 24.8540, 25.8391 % (diff: 0.9851 %)
Rwork/Rfree: 24.8270, 25.8020 % (diff: 0.9750 %)
Rwork/Rfree: 24.8146, 25.7961 % (diff: 0.9814 %)
Rwork/Rfree: 24.8180, 25.7943 % (diff: 0.9763 %)
Rwork/Rfree: 24.8146, 25.7961 % (diff: 0.9814 %)
Rwork/Rfree: 24.7347, 25.6773 % (diff: 0.9427 %)
Rwork/Rfree: 24.7458, 25.7014 % (diff: 0.9556 %)
Rwork/Rfree: 24.7793, 25.6055 % (diff: 0.8262 %)
Rwork/Rfree: 24.8071, 25.5945 % (diff: 0.7874 %)
Rwork/Rfree: 24.7782, 25.5703 % (diff: 0.7921 %)
Rwork/Rfree: 24.7645, 25.5681 % (diff: 0.8035 %)
Rwork/Rfree: 24.7639, 25.5821 % (diff: 0.8182 %)
Rwork/Rfree: 24.7781, 25.6095 % (diff: 0.8314 %)
Rwork/Rfree: 24.7639, 25.5821 % (diff: 0.8182 %)
Rwork/Rfree: 24.6106, 25.3220 % (diff: 0.7114 %)
Rwork/Rfree: 24.7424, 25.4072 % (diff: 0.6648 %)
Rwork/Rfree: 24.8142, 25.4305 % (diff: 0.6162 %)
Rwork/Rfree: 24.9118, 25.4807 % (diff: 0.5689 %)
Rwork/Rfree: 24.8941, 25.4713 % (diff: 0.5772 %)
Rwork/Rfree: 24.8905, 25.4784 % (diff: 0.5879 %)
Rwork/Rfree: 24.9045, 25.5166 % (diff: 0.6121 %)
Rwork/Rfree: 24.8905, 25.4784 % (diff: 0.5879 %)
Rwork/Rfree: 24.6432, 25.3775 % (diff: 0.7343 %)
Rwork/Rfree: 24.4921, 25.3257 % (diff: 0.8337 %)

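As we can see, not only do both R factors show a slight downward trend (Rwork from 25.63% to 24.49%, Rfree from 26.98% to 25.33%), but the gap between Rwork and Rfree has also narrowed overall, from about 1.35% to about 0.83%. The gap does not shrink on every cycle, and it widens slightly over the last few, but the overall narrowing suggests there has been a reduction in overfitting.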
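
Rather than reading the trend off by eye, the first and last cycles can be compared directly. This is a minimal sketch: summarise_r is a hypothetical helper, not part of Vagabond, and it assumes the log lines have exactly the layout shown in the listing above.

```shell
# Hypothetical helper: summarise the R factors reported in a Vagabond log.
# Assumes lines of the form:
#   Rwork/Rfree: 25.6275, 26.9768 % (diff: 1.3493 %)
summarise_r() {
    grep 'Rwork/Rfree' "$1" | awk '
        # Adding 0 converts "25.6275," to the number 25.6275.
        NR == 1 { w0 = $2 + 0; f0 = $3 + 0 }   # first reported cycle
                { w1 = $2 + 0; f1 = $3 + 0 }   # overwritten until the last cycle
        END {
            if (NR == 0) { print "no Rwork/Rfree lines found"; exit 1 }
            printf "cycles: %d\n", NR
            printf "Rwork: %.4f -> %.4f\n", w0, w1
            printf "Rfree: %.4f -> %.4f\n", f0, f1
            printf "gap:   %.4f -> %.4f\n", f0 - w0, f1 - w1
        }'
}
```

It could be run as: summarise_r refine_1/vagabond.log. The gap is recomputed here from the rounded values in the log, so it may differ in the last digit from the diff that Vagabond itself prints.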

The R factors in Vagabond are currently always higher than the values reported from conventional atomic coordinate refinement, although this difference is expected to shrink in the future. The original deposition states R factors of 17.7/21.3%. However, Vagabond uses significantly fewer parameters than conventional atomic models. If you check the top of the log file, you will see that Vagabond uses 1998 parameters to describe this protein (which has 1451 protein atoms).
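
To put that parameter count in perspective, here is a rough back-of-the-envelope comparison. It assumes that a conventional model refines four parameters per atom (x, y, z and one isotropic B factor); that figure is an assumption for illustration only, not something Vagabond reports, and an anisotropic model would use more.

```shell
# Rough parameter-count comparison (a sketch under a stated assumption).
atoms=1451            # protein atoms, from the text above
vagabond_params=1998  # parameter count quoted from the top of the log file
# Assumption: x, y, z and one isotropic B factor per atom in a conventional model.
conventional_params=$((atoms * 4))
echo "conventional (assumed): $conventional_params"
awk -v v="$vagabond_params" -v c="$conventional_params" \
    'BEGIN { printf "Vagabond uses %.0f%% as many parameters\n", 100 * v / c }'
```

Under that assumption, Vagabond describes the protein with roughly a third of the parameters of the conventional model.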

In the GUI, you can open the current model and data in Coot using the Model > Open in Coot option from the drop-down menu. This will also open the ensemble, represented in the same way as an NMR model. Note that although Vagabond produces an ensemble, there is no evidence, at present, to suggest that the atoms within the individual strands of the ensemble are fully correlated with each other as displayed in the ensemble. Therefore, the ensemble should only be considered as a complete unit, and no analysis of correlated atom movements within an individual strand should be used to support a biological argument.

* Colour-blind and angry at this colour scheme? So am I. I'd love to come up with a visual cue which works for everyone. I have tried to keep the green light and the red dark to help distinguish on brightness instead of hue, but I am happy to hear suggestions.