Saturday, March 5, 2011

Software : In Depth: Best Linux compression tool: 8 utilities tested


In Depth: Best Linux compression tool: 8 utilities tested

Posted: 05 Mar 2011 04:00 AM PST

In the '80s and early '90s, compression was king. As you struggled to connect to a BBS (bulletin board system) hosting the latest Amiga utilities, you dreamed of a time when things would be faster and files would not take as long to decompress as they did to download.

Fast forward a few decades and the sheer size of the data files we juggle about is pretty boggling. Many have built-in compression of some kind. Bandwidth isn't such an issue any more, and in some ways neither is disk space, but it would still be nice if there were a quick and convenient way of reclaiming a few GB here or there, or of not having to wait so long when uploading email attachments.

Compression technologies have moved on in the interim, but perhaps not as much as you may expect, because we're fighting against an exponential curve of just how far things can be compacted. Many data formats are nigh on incompressible, because they've already squeezed the redundancies out.
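A quick way to see this for yourself is to compress something twice. The following is only a minimal Python sketch using the standard zlib module; the word list and sizes are made up for illustration and have nothing to do with our test files.

```python
import random
import zlib

# Build redundant, text-like input from a small made-up vocabulary.
rng = random.Random(42)
words = [b"compress", b"archive", b"file", b"linux",
         b"data", b"tool", b"disk", b"speed"]
original = b" ".join(rng.choice(words) for _ in range(50_000))

once = zlib.compress(original, 9)   # first pass removes most of the redundancy
twice = zlib.compress(once, 9)      # second pass has very little left to remove

print(f"original:         {len(original):>8} bytes")
print(f"compressed once:  {len(once):>8} bytes")
print(f"compressed twice: {len(twice):>8} bytes")
```

The first pass should shrink the input substantially, while the second pass should leave the size roughly where it was. That is the situation any archiver faces when handed JPEGs, video or other already-compressed formats.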

Nevertheless, there are some tools available that leverage our superfast CPUs and gargantuan memory reserves to try some new tricks. In this test, we're looking at a selection of old and new tools currently available.

Some don't get a review, but are included in our tabulated data, which you'll find both in cut-down form here and in a full version online - gzip is there for comparative purposes, for instance.

Our selection

bzip2
rar
7zip
lbzip2
xz
lrzip
PeaZip
arj

RAR 4.00 beta

Originally released way back in 1993, the RAR format has gone through quite a few revisions and tweaks in the meantime. The original author, Eugene Roshal, licensed the software to a German software company who now produce the WinRAR variant and command line options for non-Windows platforms.

On the decompression side, RAR supports a lot of formats, including unusual ones, such as ISO files and CAB archives. The format is far more popular on the Windows platform, and is generally used for splitting large files into usable chunks. This makes it popular for posting large files to usenet groups, and the WinRAR utility for Windows is very well-used indeed.

The generation of parity and volume files alongside the chunks makes it easy to correct minor transmission errors and make sure you've got a perfect copy of whatever was sent. On Unix systems though, the native RAR format is pretty much nonexistent.
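To illustrate the general idea of parity data, here is a deliberately simplified Python sketch that uses plain XOR parity to rebuild one lost chunk. This is an assumption-laden stand-in, not RAR's actual recovery format, which is considerably more sophisticated; the function names are hypothetical.

```python
from functools import reduce
from typing import Optional


def split_into_volumes(data: bytes, count: int) -> list[bytes]:
    """Split data into `count` equal-sized volumes, zero-padding the last."""
    size = -(-len(data) // count)  # ceiling division
    padded = data.ljust(size * count, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(count)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(volumes: list[bytes]) -> bytes:
    """The parity volume is simply the XOR of every data volume."""
    return reduce(xor_bytes, volumes)


def rebuild_missing(volumes: list[Optional[bytes]], parity: bytes) -> list[bytes]:
    """Rebuild exactly one missing volume (marked None) from the rest."""
    missing = [i for i, v in enumerate(volumes) if v is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one missing volume")
    survivors = [v for v in volumes if v is not None]
    repaired = list(volumes)
    # XORing the parity with all surviving volumes leaves the lost one.
    repaired[missing[0]] = reduce(xor_bytes, survivors, parity)
    return repaired


message = b"The quick brown fox jumps over the lazy dog"
volumes = split_into_volumes(message, 4)
parity = make_parity(volumes)

damaged = list(volumes)
damaged[2] = None  # pretend one chunk never arrived
restored = rebuild_missing(damaged, parity)
print(b"".join(restored)[:len(message)])
```

A single XOR parity volume can only replace one lost chunk. Real-world recovery schemes can cope with more damage, at the cost of extra parity data.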

In performance terms, it does better than expected. While it is slower than most of the tools on test, it does actually manage some reasonable space savings across the different filetypes. Compression algorithms are usually focused on some particular type of data, and it may well be that better space savings would be recorded by testing against the sorts of files usually found on a Windows system.

It wasn't particularly troubled by the practically incompressible image files, and it did reasonably well with large disk images and the generic filesystem selection.

As a proprietary command line tool for Linux, though, its uses are limited, and it is probably best saved for occasions when interoperability with Windows platforms is required.

Verdict

RAR
Version: 4.00 beta
Web: www.rarlab.com
Price: 29 Euros

As with ARJ, only really useful for trading files with Windows users.

Rating: 5/10

Bzip2

Julian Seward released the original bzip2 in 1997 under a BSD licence. In case you are wondering, there was indeed a bzip before that, but it was withdrawn by the author after possible patent worries loomed menacingly (ah, software patents, don't we all love them?).

Not to worry though, because bzip2 is better than it anyway. Using a combination of different algorithms - such as run-length encoding (RLE), the Burrows-Wheeler transform, and other such cunning trickery - it immediately became noteworthy in Unix circles because of the impressive compression achieved compared to the standard utility of the day, gzip.

Cunningly coded to be almost identical in terms of usage, bzip2 soon became a shoo-in replacement for all types of archiving purposes. Most notably, much source code was shipped using a tar/bzip2 combination instead of the usual tar/gzip combination of the time.

It's somewhat disappointing that in the intervening 14 years or so bzip2 hasn't replaced gzip entirely - changing the habits of Unix users is obviously like trying to steer a particularly fat continental shelf or something.

However, for large volumes of archiving, it seems the trade-off between space savings and compute time isn't always worth it. The figures we generated for Test 3 show that bzip2 running on maximum compression does shave a few per cent off the file size, but at the expense of taking around four times as long.

So if speed is of paramount importance to you, gzip is still a better option… Hang on, before we say that, you should check out the review for lbzip2.
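If you want to check the trade-off on your own data, a rough comparison is easy to put together. This is a minimal Python sketch using the standard gzip and bz2 modules rather than the command line tools we tested; the `measure` helper and the sample data are made up, so treat any numbers it prints as indicative only.

```python
import bz2
import gzip
import random
import time


def measure(compress, data: bytes) -> tuple[int, float]:
    """Return (compressed size in bytes, elapsed seconds) for one compressor."""
    start = time.perf_counter()
    size = len(compress(data))
    return size, time.perf_counter() - start


# Made-up, text-like sample data; substitute your own files for real numbers.
rng = random.Random(1)
vocab = [b"kernel", b"module", b"patch", b"source", b"build",
         b"config", b"driver", b"header", b"object", b"script"]
sample = b" ".join(rng.choice(vocab) for _ in range(200_000))

results = {
    "gzip  (level 9)": measure(lambda d: gzip.compress(d, compresslevel=9), sample),
    "bzip2 (level 9)": measure(lambda d: bz2.compress(d, compresslevel=9), sample),
}

print(f"input            {len(sample):>9} bytes")
for name, (size, seconds) in results.items():
    print(f"{name}  {size:>9} bytes  {seconds:.3f}s")
```

Timings from a script like this vary with the machine and the data, so it is worth running it on files that resemble what you actually archive.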

Verdict

bzip2
Version: 1.0.6
Web: http://bzip.org
Price: Free (BSD-style licence)

It's fast and widely used, but switch to lbzip2 for a speed boost.

Rating: 5/10

lbzip2

This is an intriguing contender for the modern age. Using POSIX threads, this tool parallelises the compression routines so they can be run in more than one thread and the results later combined. We care about this because lots of machines now have a multi-core processor.

Standard bzip2 and indeed many of the other tools on test are only capable of running in a single thread. That means if you have a dual-core processor, such as the one we used for testing, only one core is being used for the hard work of compressing, while the other lies idle. Of course, the other core can take care of the system overhead, but it is a bit of a waste.

Parallelising the task does include a bit of overhead in terms of processor time, because there has to be a 'dispatcher' component that allocates tasks to the threads and combines their results at the end. Even so, on a dual-core machine you should see a reduction in the time taken by around 40%, depending on the actual task.

This is borne out by our results - with the same settings, lbzip2 finishes in between 35 and 45% less time. The significant thing is that it is by and large the same process, and you should end up with pretty much exactly the same files. In our tests, however, the resultant filesizes were a few bytes off in either direction, which may simply be due to slightly different application of the algorithms.

Importantly, files created with lbzip2 are valid bzip2 archives - the format hasn't changed, so they can be distributed to and uncompressed by those using bzip2. Lbzip2 is available in some repos, and some quarters suggest that it should just be aliased to the standard bzip2 commands - there is no real disadvantage to it even on a single core.
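The dispatcher idea can be sketched in a few lines of Python. To be clear, this is not how lbzip2 itself is implemented; it is a simplified illustration that compresses fixed-size chunks in separate threads and joins the resulting bzip2 streams. `CHUNK_SIZE` and `parallel_bzip2` are made-up names, and the sketch assumes that the standard bz2 module releases the interpreter lock while compressing, which is our understanding of CPython's behaviour.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 900_000  # hypothetical choice, roughly one bzip2 block at level 9


def parallel_bzip2(data: bytes, workers: int = 2) -> bytes:
    """Compress fixed-size chunks in separate threads and join the results."""
    chunks = [data[i:i + CHUNK_SIZE]
              for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands chunks to the worker threads and returns the
        # results in input order, so the streams are joined correctly.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


data = bytes(range(256)) * 10_000   # about 2.5MB of sample input
packed = parallel_bzip2(data, workers=2)

# Python's bz2.decompress accepts several streams joined back to back,
# so the ordinary single-threaded decompressor can read the result.
print(bz2.decompress(packed) == data)
```

Because each chunk is compressed independently, the output will not be byte-for-byte identical to a single-threaded run, which fits with the small size differences we saw.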

Verdict

lbzip2
Version: 0.23
Web: http://lacos.hu
Price: Free (GPL)

This is a faster version of the old Unix favourite.

Rating: 7/10

7zip

Released in 1999, 7zip (aka 7z or 7za) is a relative newcomer to compression. It was written by Igor Pavlov, who also designed the LZMA algorithm that forms the default compression mode.

The 7zip code also includes other compression methods, such as bzip2, so it can support formats other than the default .7z.

Although it's open source, the main development focus is on the Windows platform, where 7z enjoys a great deal of popularity, and the code comes with a natty front-end. The basic source code has been tweaked by some, while other projects have made use of the LZMA SDK to produce very similar variants. One of these is xz, and others include p7zip. For this test we compiled from the original source code.

Looking at the test results, it's easy to think that 7z isn't making use of the multiple cores on offer. In fact, it is a threaded application, but even so takes slightly longer than the single-threaded bzip2 archiver, and twice as long as lbzip2. We could make some allowances for this code, since it's compiled from the generic source rather than being geared to work on Linux, but it fares better than pxz, the parallelised version of the derivative xz compressor.

One area in which this algorithm does perform well is decompression, as this and the xz utilities consistently perform better than the rest of the pack (apart from gzip, which isn't as compressed to begin with).

7z is certainly a useful tool, and one which may become more worthwhile on faster machines, or in cases where you want the compression to be good, but the decompression to be speedy (such as distributing apps and data).
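The lopsided cost of LZMA is easy to observe with Python's standard lzma module. Note the assumptions: this module writes the .xz container rather than .7z, the sample data is made up, and the timings will differ from those of the command line tools in our table.

```python
import lzma
import random
import time

# Made-up, text-like payload standing in for an application bundle.
rng = random.Random(7)
vocab = [b"install", b"package", b"library", b"binary",
         b"update", b"version", b"depends", b"remove"]
payload = b" ".join(rng.choice(vocab) for _ in range(300_000))

start = time.perf_counter()
packed = lzma.compress(payload)        # done once, by whoever builds the package
compress_seconds = time.perf_counter() - start

start = time.perf_counter()
unpacked = lzma.decompress(packed)     # done many times, by everyone who installs it
decompress_seconds = time.perf_counter() - start

print(f"{len(payload)} -> {len(packed)} bytes")
print(f"compress:   {compress_seconds:.3f}s")
print(f"decompress: {decompress_seconds:.3f}s")
```

When one person compresses and many people decompress, paying a high one-off compression cost is a sensible trade.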

Verdict

7zip
Version: 9.13 beta
Web: www.7-zip.org
Price: Free (LGPL)

Pure LZMA action fares better than some of the derivatives.

Rating: 7/10

xz

Xz is another piece of software that aims to replace gzip by offering similar options and syntax. It works using the LZMA algorithms, as also used in 7z, so the results should be rather similar.

The confusing thing is that the LZMA algorithms should eat up all the test files we have and generate a good space saving without too much of an increase in time. The numbers, though, tend to indicate otherwise.

The compression results are good and, in the default mode, xz seems to be tweaked to extract more compression than 7z, but the cost of that is a major amount of time. Even if we ignore the figures for the first test, which punishes tools that try to do a good job in time terms, xz fails to inspire in the other tests too. It takes over twice as long as bzip2, for instance, to produce a measly few percent more of space savings.

Due to its single-threaded nature, it also takes nearly twice as long as 7z to produce a file of almost identical size. There is a parallelised version of xz too, pxz, for which we have included separate data in the table at the end.

This does produce some significant time savings, but not to the same extent that lbzip2 manages for the bzip2 algorithm. It manages around a 35% time saving, which is welcome and propels it ahead of the single-thread archivers in terms of speed, but doesn't help it catch up to the other thread-aware tools, or even 7z.

As mentioned in our 7z review, the advantage of this app is that it offers good compression and fast decompression times, which is why Slackware began using it for creating packages. That might be a good call for distributing packages, but it isn't a great choice if you're doing the archiving.

Verdict

xz
Version: 5.0
Web: http://tukaani.org/xz
Price: Free (GPL)

Mildly disappointing performance from a supposed bzip successor.

Rating: 7/10

lrzip

Lrzip is relatively new in the world of compression utilities, and is derived from the rzip utility. The focus here is on compressing large files, and lrzip works best on systems that have large amounts of available memory and big files (greater than 100MB) to crunch.

This is because it uses some long-range redundancy checks to compare areas of data in the hopes of being able to save some space. The default method for the actual compression is to use the LZMA algorithms as used by the original 7z and also xz and pxz archivers in this test.
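Why the reach of a compressor matters can be shown with a small Python experiment. This is not lrzip's algorithm, just a sketch under simple assumptions: DEFLATE (via zlib) can only look back 32KB for repeated data, whereas the lzma module's default settings use a dictionary of several megabytes.

```python
import lzma
import os
import zlib

block = os.urandom(64 * 1024)   # 64KB of random bytes, incompressible on its own
data = block + block            # but the second half repeats the first exactly

short_range = zlib.compress(data, 9)   # the repeat lies beyond the 32KB window
long_range = lzma.compress(data)       # the repeat is well within the dictionary

print(f"input:                 {len(data):>7} bytes")
print(f"zlib (32KB window):    {len(short_range):>7} bytes")
print(f"lzma (large window):   {len(long_range):>7} bytes")
```

The short-range compressor should save nothing here, while the long-range one should store the duplicate half almost for free. Scale the same effect up to multi-gigabyte disk images and you can see why a tool that hunts for distant matches needs plenty of memory.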

LZMA is rapidly becoming the standard algorithm, in spite of patchy performance with the other utilities that have adopted it. Whatever the secret sauce added to lrzip, it seems to work as it manages to be far faster than the other LZMA-based utilities on large files.

As well as LZMA, you can opt to use the LZO algorithms, which are insanely fast, but don't provide a great compression ratio, or the glacially slow ZPAQ, which gives maximum compression ratios.

The ZPAQ software is available as a standalone tool, but it is messy to build yourself. Nevertheless, we've included figures later for comparison.

Through extreme measures, ZPAQ seems able to produce the most compact archives. It's really more of a proof of concept than an everyday compression utility, because of the immense amount of time and resources used to generate the files. The average throughput is only about 170KB/second!

The real killer is that it takes about as long to decompress. Lrzip itself manages some reasonable times, good compression and a variety of options, which make it one to watch.

Verdict

lrzip
Version: 0.46
Web: http://ck.kolivas.org/apps/lrzip
Price: Free (GPL)

Awesome all-round performance and a great candidate for general use.

Rating: 9/10

PeaZip

PeaZip is a little unusual among our selection of archivers. Unlike the others, it's a GUI-driven app that covers a number of archive formats and offers other features besides. There are, of course, tools such as File Roller and Ark on Linux which will do similar jobs and act as frontends for most of the other archiving tools covered here, but PeaZip deserves to be here because it also creates its own archive format.

The PEA file aims to be a modern reinterpretation of the RAR format - a container for different types of compression that can also have different layers added, such as various types of compression, or be split neatly into manageable chunks for distribution.

The native format of the archive is simply a variation on the Zip algorithms. Outside the world of Linux, Zip is still the most widely used archive format, mainly for compatibility reasons. Although PeaZip does handle other formats, we've rated its performance in the table based on creating native PEA files with Zip-style algorithms. Consequently, the performance isn't that great.

It manages to nestle in the top group for speed, but compression ratios are poor, and it's often outperformed on both counts by lbzip2. However, PeaZip behaves nicely and doesn't suck up all your RAM by default, and therefore on more resource-challenged systems, its performance may look better.

When using the better compression modes available, it's about on a par with 7z. As well as simply archiving, PeaZip can be used as a general file manager, and is available skinned for both GTK/Gnome and KDE desktops. It may be more aimed at Windows users, but it is open source and has lots of Linux love.

Verdict

PeaZip
Version: 3.6
Web: www.peazip.org
Price: Free (LGPL)

Certainly a useful tool that works well with KDE and Gnome.

Rating: 7/10

arj

Conceived in the '90s, the ARJ format took a while to catch on, but became a major one for some types of archiving. Like RAR, it supports easy file splitting for archiving onto disks or, more often, for breaking files up to make them easier to transfer or distribute.

The original arj software was written for DOS, but soon became a full desktop app for Windows systems, and most of its usage is on that platform. An open source version of the software was created, which naturally found its way to Linux, and although the format has never been particularly popular for pure Linux uses, it has advantages for cross-platform file transfers.

Our results show that this software sits firmly in the camp of getting the job done quickly, but that this comes at the cost of not doing a great deal of compression. On the second test it was only marginally slower than the super-speedy gzip, but it also recorded the worst space savings for the last two tests.

That it doesn't put in a better performance than the standard gzip software on Unix systems is probably one of the reasons you will rarely find anyone using it on Linux machines, although it is nevertheless still maintained and available in just about every Linux distro.

In the battle with the RAR format, arj held its own for a long time, but in the last few years, even the commercial version has had minimal updates, and it is safe to say that this format is certainly on the endangered list.

It's nice to know that it exists, but it isn't for everyday use. Even for the splitting and distribution of files to Windows, the RAR format is a much better bet.

Verdict

arj
Version: 3.10.22
Web: http://arj.sourceforge.net
Price: Free (GPL)

The archiver that time forgot and only really useful for legacy support.

Rating: 2/10

The best Linux compression tool is... lrzip 9/10

Our first conclusion from all the data gathered for this test is that if you have multiple cores in your Linux box, you should really check out one of the threaded tools available. On a two-core machine it will make a significant difference. With more than two, it could change your world view.

As far as the tests themselves go, the first one is a bit of a stinker, because no matter what algorithm you use, it's never going to be able to compress much of what is already highly compressed data. The only thing harder to compress is pure random data, which is why we avoided that here.

The arj and RAR utilities, it soon becomes obvious, are really only useful for our Windows cousins, or for exchanging files with them via email. As for the more Linux-y tools, one of the surprises is how well the old-timer tools do.

Bzip2 (along with the thread-aware lbzip2 variant) and gzip do a pretty good job of archiving, and they manage to do it incredibly fast - there isn't a great deal of difference between archiving a file with them or simply copying it, which can be useful for all sorts of reasons.

PeaZip deserves an honourable mention for being easy to use and for providing a front-end to a lot of these utilities.

Then there are the LZMA-based tools, which may seem to be the future. It was a little surprising that the generic 7z tool seems to do better than the Slackware-favoured xz/pxz. For applications where the speed of decompressing the output is paramount, they're clearly out in front.

In that case, nothing beats pxz, apart from gzip itself. For all-round performance, though, the winner should be lrzip, which builds the popular LZMA algorithm into a fast and space-saving tool, which is why it comes out top here.
