Background
In my last blog post on OpenVista Server Revision Control, I talked about some changes we wanted to make to our revision control process for OpenVista Server. In particular, we wanted to store globals in ZWRITE format and we wanted to switch from Mercurial to Bazaar. In that post, I discussed the advantages of each change and promised to run benchmarks to see if these changes were feasible.
Methodology
We tested two methods of storing globals, GT.M database format and ZWRITE format. For each of these formats, we tested several iterations of a series of common source control operations using two revision control tools, Mercurial and Bazaar. The initial repositories were created manually, then a script was used to translate changes from our current Mercurial repository to the desired format for the benchmark.
More specifically, the script updates a working copy of our current Mercurial repository to a certain revision, then rsyncs the routines and globals over to the benchmark repository. For the ZWRITE repositories, the globals are exported into a number of text files and the original database files are removed. “status”, “add”, “diff”, and “commit” commands are then benchmarked. After the commit to the benchmark repository, the script checks the file sizes of the repository files to see how much they have grown. Finally, the script updates the working copy of the current Mercurial repository to the next revision, starting the process over again.
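The timing part of that loop can be sketched roughly as follows. This is not the script we actually used; it is a minimal Python illustration, and the helper names (`time_command`, `build_steps`, `benchmark_revision`) are made up for this example. It assumes `hg` or `bzr` is on the PATH and that the benchmark repository has already been synced to the revision under test.

```python
import subprocess
import time


def time_command(argv, cwd=None):
    """Run a command, discard its output, and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    result = subprocess.run(
        argv, cwd=cwd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


def build_steps(tool, message):
    """Return the four benchmarked operations, in order, for 'hg' or 'bzr'."""
    return [
        ("status", [tool, "status"]),
        ("add", [tool, "add"]),  # no arguments: add every unknown file
        ("diff", [tool, "diff"]),
        ("commit", [tool, "commit", "-m", message]),
    ]


def benchmark_revision(tool, repo_dir, message):
    """Time status, add, diff, and commit against one benchmark repository."""
    timings = {}
    for name, argv in build_steps(tool, message):
        # Return codes are ignored on purpose: a diff that finds changes
        # may exit non-zero, which is not a failure for timing purposes.
        elapsed, _ = time_command(argv, cwd=repo_dir)
        timings[name] = elapsed
    return timings
```

Calling `benchmark_revision("bzr", "/path/to/benchmark-repo", "revision 64")` would return a dictionary of wall-clock seconds per operation; the path and commit message here are placeholders.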
The script was run for 48 revisions. In that time, several types of commits were made to the main repository. Some were data changes only, others represented the installation of a KIDS build, and others were just tags (resulting in very small commits). There was even one commit where the GT.M database files were upgraded, which caused significant growth in the two repositories that stored globals in GT.M database file format. Upgrading the database files had no effect on the ZWRITE format of the globals.
All tests were run on the same machine – a Dell Dimension 4700C with a Pentium 4 2.8GHz HT Processor, 1GB of RAM, and Maxtor 6L100M0 hard drive (100GB, 7200RPM, 8MB cache). The machine was running Ubuntu 8.10 (Intrepid Ibex); the kernel was 2.6.27-7-generic #1 SMP i686. Both Mercurial and Bazaar were installed from the Ubuntu repository. The version of Mercurial used was 1.0.1 (1.0.1-5.1); the version of Bazaar used was 1.10 (1.10-1~bazaar1~intrepid1).
Results
In general, Bazaar was slower than Mercurial, but used less disk space. This was particularly apparent when globals were stored in ZWRITE format. The following graph shows the disk usage of all four test repositories, excluding the working copy files – in other words, how much space the revision control tool needs to store its data (lower is better). The blue squares represent the baseline; this is what we're doing today (Mercurial storing GT.M database files).

You can see that Mercurial's disk usage increases steadily, except for a dramatic increase in the baseline at revision 83 when we did a “mupip reorg upgrade” after upgrading to a new version of GT.M. As mentioned earlier, the ZWRITE repositories are not affected by this. Bazaar periodically compresses the repository, so on certain revisions the repository size actually shrinks. I'm not sure what Bazaar is doing at revision 64/65: those commits were not large, but the repository size grew significantly. It makes up for it 12 revisions later, when the repository shrinks back down to ~175MB. While it's hard to tell for certain with Bazaar being so erratic, it appears that in the long run Bazaar managing ZWRITE-formatted globals is going to use the least disk space.
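For anyone who wants to reproduce the disk usage numbers, here is one way to measure the repository store without the working copy. It is a sketch rather than the code behind the graph: it assumes the tool's data lives in the `.hg` or `.bzr` directory at the top of the repository, and it sums file sizes rather than allocated disk blocks, so it will not match `du` exactly. The function names are illustrative.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def repository_store_size(repo_dir, tool):
    """Size of the revision control metadata only, excluding working copy files."""
    # Assumption: the store is the top-level .hg or .bzr directory.
    meta_dir = {"hg": ".hg", "bzr": ".bzr"}[tool]
    return directory_size_bytes(os.path.join(repo_dir, meta_dir))
```

Recording `repository_store_size(repo_dir, tool)` after each commit gives the growth (or shrinkage) per revision.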
This space savings is actually pretty impressive when you consider that globals in ZWRITE format are about 375MB larger than globals in GT.M database format. Both Mercurial and Bazaar seem to do a pretty good job with compression. Unfortunately, the ZWRITE format has downsides besides the larger working copy size. Exporting the data out of GT.M requires 5-6 minutes, and the larger working copy slows down all of the Mercurial/Bazaar operations we tested.
This next graph shows the time it takes to run "hg status" or "bzr status" against the various test repositories.

The ZWRITE repositories are at a disadvantage here because the larger files take longer to scan. Bazaar is more consistent; with Mercurial there are spikes and dips between revisions. Running the status command on the first revision of the GT.M database repositories took uncharacteristically long for both Mercurial and Bazaar, so I adjusted the graph's maximum value so we could get a clear picture of what's going on for the rest of the revisions. I would guess that the 2+ minute long timings were because Linux had not populated the filesystem cache yet. Fortunately, even though the ZWRITE repositories took longer to scan, the difference was only a few seconds on average.
The next graph shows the time it takes to run "hg add" or "bzr add". Without arguments, the entire repository is scanned and any unknown files are added.

While the status graph was primarily divided by the global storage format, this graph is primarily divided by the revision control tool used. Mercurial has a clear lead over Bazaar, but the difference is always less than 3 seconds.
The next graph shows the time it takes to run "hg diff" or "bzr diff".

The diff command takes much longer than the status or add commands. From the graph, it appears that Mercurial handles binary files better, while Bazaar handles text files better. Even so, Bazaar handling globals in ZWRITE format trails Mercurial handling globals in GT.M database format by at least a minute.
The final graph shows the time it takes to run "hg commit" or "bzr commit".

Like the diff command, the commit command takes a few minutes to run. Mercurial handling ZWRITE-formatted globals commits in the shortest amount of time. Initially, Mercurial handling GT.M database files is faster than Bazaar handling ZWRITE-formatted globals, but after the database upgrade at revision 83, Bazaar pulls ahead. Bazaar handling GT.M database files is a non-starter, taking an order of magnitude longer to commit than the other test repositories. I adjusted the graph's maximum value to more clearly illustrate the differences between the lower values. Commits at 65, 87, and 98 were tags in the original repository, so they are essentially no-ops for the GT.M database formatted repositories in this test. Exporting the same database still changes the ZWRITE files because they contain timestamps of when the database was exported, so the commit times do not fall to 0 for the ZWRITE-formatted repositories.
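One way to check whether two exports differ only in their timestamps is to compare them while skipping the header. The sketch below was not part of the benchmark. It assumes the timestamp is confined to the first two lines of each export file, which should be verified against real exports before relying on it; `header_lines` and the function name are illustrative.

```python
from itertools import zip_longest


def same_data_ignoring_header(path_a, path_b, header_lines=2):
    """Return True if two export files match after skipping header_lines lines.

    Assumption: the export timestamp appears only in the skipped header.
    """
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        for _ in range(header_lines):
            a.readline()
            b.readline()
        # zip_longest pads the shorter file with None, so a length
        # mismatch is reported as a difference.
        return all(line_a == line_b for line_a, line_b in zip_longest(a, b))
```

A check like this could tell the benchmark script that a re-export carries no real data change, although the experiment described here committed the files as exported.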
Conclusion
Looking at the data, there does not appear to be a clear winner. Each tool and global storage format comes out ahead at different times, but it's important to focus on the actual time savings: a 3-second victory for one tool is not nearly as important as a 100-second victory for another. In general, Bazaar still seems to trail Mercurial in speed, but it looks as though Bazaar will use less space in the long run. With the exception of Bazaar's abysmal commit times when managing binary GT.M database files, the speed difference seems to be acceptable. Because the tests were run on an older machine, the wall-clock differences may also be smaller on faster hardware. Further, I believe the benefit of using the same revision control system as OpenVista CIS and the benefit of storing globals in a GT.M-version-neutral format outweigh the performance penalties.
Future Work
Perhaps the most obvious benchmark missing from this blog post is the performance of cloning/branching, especially over the network. Also, it would be interesting to add other revision control tools to this benchmark - most notably Git and Subversion.
Finally, committing the entire database is still a crude approach to revision control for OpenVista Server. Some research should be done to determine whether tools can be written on the M/OpenVista Server side of things to "meet us halfway" and provide a more granular set of objects to put under revision control. This could lead to smaller repositories, better performance, and perhaps most importantly, meaningful diffs across revisions and automated merging of code.