

Don’t worry, Python fans. Greg will be back next month.

One of the constants in the history of desktop computing is that disk prices keep going down in terms of cost per MByte. As a consequence (or is it perhaps a cause?), our disk usage for work or play has also gone steadily up. Since a large part of our digital lives could be wiped out in the event of a drive failure, a lot of attention has been given to backing up personal documents, photo libraries, music collections and other pieces of data.

But not all backup strategies are equal. In the first place, it must be easy for us to make backups. Otherwise, experience shows we will not make them at all. A perfect three-tiered backup strategy that guarantees no data loss if used correctly is, in practice, next to useless if it is not user-friendly, needs complicated command-line sequences, or just introduces yet more hassle into our digital lives.

On the other hand, a backup strategy also needs to take into account the limitations of hardware as regards space and time. Most cloud-based services are limited to about 5 GBytes of space. But filling this up over a 1 Mbit/s asymmetrical Internet connection may take up to 37 hours of continuous uploading (I’m supposing a 384 kbit/s uplink here). So no, we will not be uploading the entire contents of our video library to Ubuntu One, and certainly not over a measly ADSL connection.

Finally, a good backup strategy must also be flexible: we need to plan for the worst, which could mean having to scrap our system completely and simply get a new one. But we will then need to get our data back on again, and usually in a bit of a hurry. Disaster almost never strikes at the most convenient time.

Fortunately enough, Ubuntu has some tips and tricks that make it an easy operating system to back up, rather more so, in fact, than some others. Let’s review some of the more accessible practices.

Cloud-based storage

Cloud-based storage is the current fad in home (and, up to a point, business) desktop computing. You place your documents in a certain folder (for services such as Dropbox, SpiderOak, Ubuntu One and many others), or even edit them directly online (Google Drive). They magically appear and are updated on all your other computers, often with download access from tablets and mobile phones as well. Easily sharing documents with other users may be an added perk. Providers have their own professional backup plans for their servers, so, once there, your backed-up documents are probably safer than anywhere else – it would take a major technological disaster to render them inaccessible.

However, when using this type of service, we should also be aware of several caveats. As mentioned above, space is limited on the cloud and transfer times tend to be slow. This severely limits its use for storing bulky data items such as large photo libraries or music collections, not to mention video files or large software packages downloaded from the Internet (e.g. Ubuntu CD images). On the security side of things, we should also be aware that any files we store in unencrypted form may be readable by the organization behind the service. Though we may trust these people, they may eventually get hacked - or receive a subpoena to deliver access to our stuff.

Depending on the type of information we store, this may or may not be a problem. To take an example, should a general practitioner store his/her patients’ medical data in the cloud? Depending on applicable law, it may be prudent to – at the very least – encrypt the data files before uploading them to a server that the user has no direct control of.

Taking these elements into account, a reasonable cloud-based backup strategy will probably focus only on documents, not other types of media. Even then, perhaps only part of the user’s documents will be backed up, not all of them.

Whatever strategy we apply will probably use the existing Ubuntu directory structure at some point. Most applications are already tuned to the default ~/Documents, ~/Pictures, ~/Downloads, etc. directories, so we might as well use them. On the other hand, the cloud service will be based on a single folder, for example ~/Dropbox. The simplest solution is then to use subdirectories and links to choose which files to include in the cloud backup scheme and which to exclude.

For example, we could create a BackupedImages subdirectory within Pictures

mkdir ~/Pictures/BackupedImages

that we could then create a (soft) link to in the Dropbox folder

cd ~/Dropbox

ln -s ~/Pictures/BackupedImages

Any files placed or modified in Pictures/BackupedImages will be automatically uploaded into the cloud if we are online, or as soon as we get connectivity.

Naturally, this scheme will need to be replicated on the other machines we use, to make these documents available to them as well. When doing so, it is best to pause Dropbox syncing before creating the directory and link, and start it back up again afterwards.
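
If the Dropbox command-line helper (the dropbox command shipped with the official Linux client) is available, the whole operation on a second machine could look something like this - a sketch, reusing the BackupedImages example above:

dropbox stop

mkdir ~/Pictures/BackupedImages

cd ~/Dropbox

ln -s ~/Pictures/BackupedImages

dropbox start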

This also applies to similar services such as Canonical’s Ubuntu One. However, in this particular case, you will need to use One’s control panel to add the different folders you want to back up, as directly creating a link does not seem to work well.

It may be tempting to include the Desktop itself within the backup scheme. However, many of us use the Desktop as a temporary work zone. We tend to hammer our files quite hard there, e.g. while compiling programs, so we would not want each and every file change to be reflected up into the cloud, clogging up our Internet access in the process. In such cases, it is best to do our work in another directory, and then copy files into the backup zone once they have reached a certain level of stability, perhaps at the end of each work day.
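
As a minimal sketch of that end-of-day copy - assuming a hypothetical ~/Desktop/current-project work folder and a BackupedDocs subdirectory already linked into the cloud folder, along the same lines as BackupedImages above - it could be as simple as:

cp -ru ~/Desktop/current-project ~/Documents/BackupedDocs/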

Backing up to external drives

External drives are relatively cheap nowadays. Many USB-connected models can be found, either in the 2½" form factor or in 3½" form. In the first case, the enclosure will contain a disk drive designed for laptops, which can usually run on 5V power supplied directly by the computer through the USB cable. In the second, it will contain a drive designed for desktop computers, which needs both 5V and 12V power, and so will require an additional power brick (transformer). If you need compactness and wish to minimize cabling, go for the first. If you need large capacities (above 1.5 TByte and up to 3 TByte), the second type will fulfill your needs better.

Besides the disk itself, a second factor to take into account is the connection. USB version 2 connectors are rated at 480 Mb/s line speed, which in practice translates to 30 to 35 MBytes/s. USB version 3 can in theory go up to 5 Gb/s line speed, far outstripping current spinning-platter disks’ read/write speeds. However, it does need both a USB 3 enclosure and a USB 3 port on the computer. Finally, other solutions also exist, such as Firewire (IEEE 1394) enclosures, or the newer generation of consumer-grade NAS stations that can be accessed directly through the network. In order of speed, USB 3 and high-speed Firewire connections outstrip USB 2 and Gigabit Ethernet (to a NAS), with Fast Ethernet (also to a NAS) a slower alternative. Needless to say, a WiFi connection to a NAS is to be avoided whenever possible: it has the lowest bandwidth, and this is shared between all users connected at the same time.

Whichever connection type you use, performance depends on both factors: the disk and the connection. This matters for the actual use of backups: nobody wants to wait half an hour for some videos to be copied over when, with another setup, five minutes might suffice.

The Disks applet (gnome-disks) comes as standard on current Ubuntu distributions, and has several utilities built in. One of them, the benchmark, can measure several interesting performance parameters for hard disks. Two screen captures are included here, both created with the same USB 2 adaptor but using different hard drives.

In the first (top left), we can see how a standard 500 GByte spinning platter hard drive - natively capable of reading and writing at about 100 MByte/s - is limited to 33 MByte/s reading and 20 MByte/s writing when connected through USB 2: the USB connection is the limiting factor for transfer speeds. If we were to connect this disk using USB 3, we could expect 100 MByte/s transfer speeds (the disk’s limit), though not the approximately 500 MByte/s that USB 3 is capable of pushing. If we are to transfer large files to and from this disk, these are the numbers that really matter.

On the other hand, we can also see access times in the 15 - 20 ms range. These values are rather standard for platter hard drives. They will affect backup speeds especially when transferring a large number of small files, since the disks must make a seek operation at the beginning of each transfer.

The second benchmark (next page, top left) was made with an SSD (Solid-State Drive), also connected through the very same USB 2 link. In this case, we can see that transfer speeds remain essentially the same, even though the disk itself is capable of very high throughputs, probably in excess of 250 MByte/s. This is a clear case of a speedy drive stuck behind the bottleneck of a slow connection: using USB 3 would definitely speed things up considerably when transferring large files - even more noticeably so than with the preceding spinning-platter unit.

On the other hand, we can also see how access times are greatly reduced, to the sub-1ms range. This means that using an SSD, even behind a slow USB 2 connection, is still worthwhile when transferring a lot of very small files. Access to each file to initiate each individual transfer happens more quickly. This advantage can also be retained when using a faster connection such as USB 3.
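
If you would rather measure from the terminal, the hdparm utility can produce a rough sequential read figure for comparison - a sketch, assuming hdparm is installed and that the external drive shows up as /dev/sdb (check the device name in the Disks applet or with lsblk before running it):

sudo hdparm -t /dev/sdb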

The general take-home idea is that, in each backup situation, we should take into account not only the hardware aspects, but also the kind (size and number) of files to be transferred in order to decide on the external disk units and connection technology to be used. In general, however, one can almost never go wrong in buying the fastest and largest disks one can afford at a given time.

This being said, we can also decide on several different software approaches to perform the actual file transfer.

From a personal standpoint, I tend to avoid the use of compressed files (ZIP or gzipped TAR archives), preferring simply to clone the file and directory hierarchy on the backup. This way, it is easier to navigate through the backup and retrieve a single file or several files if needed, without having to uncompress a complete disk image.

To make such a plain file system backup, several strategies can be used. The easiest is simply to copy over the complete directory tree, replicating both new files and those already existing in the backup. A more advanced option would be to copy over only new or modified files, reducing backup times by not transferring existing files. This second approach can be performed either by hand (slow and error prone), or using an automated system (quicker).

rsync is a utility that has been included in most GNU/Linux distributions for some time now. Originally designed to perform remote synchronization operations - as its name indicates - it is also very effective for doing local backups to an external drive. Supposing that we wish to back up the complete contents of our user’s home directory /home/alan to an external drive that is mounted on /media/alan/backups, we could issue:

rsync -aruv /home/alan/* /media/alan/backups/

and it will take care of the complete backup for us. Its output shows us which files are being transferred at each step.
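
For reference: -a (archive) preserves permissions and timestamps, -r recurses into subdirectories, -u skips files that are already newer on the destination, and -v lists each file as it is transferred. Before the first real run, it can be reassuring to add -n (dry run) to see what would be copied without actually transferring anything:

rsync -aruvn /home/alan/* /media/alan/backups/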

On the other hand, if we should need to recover backed-up files - for example, when “populating” a new or freshly-formatted computer - we can reverse the process with:

rsync -aruv /media/alan/backups/* /home/alan/

As you can see, it makes no difference whether our /home directory is mounted as a separate partition or not, though it is certainly good practice to do so.
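
Since the backup is a plain copy of the file hierarchy, recovering just one part of it is simply a narrower version of the same command - for instance (assuming a Documents subdirectory exists in the backup):

rsync -aruv /media/alan/backups/Documents/ /home/alan/Documents/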

Syncing files between two computers

In this day and age, many of us are the happy owners of not one, but two or more computers. When older machines are replaced by newer ones, they are not always sold off. The economics of the consumer market are set up so that selling a computer with, say, three to five years of continuous use on it will not net us much cash.

But such older machines, while not quite as fast, may in many cases still function quite well as secondary or backup machines. Current flavours of Ubuntu (or perhaps Lubuntu or Xubuntu) work quite well on a 2008-era dual-core machine or suchlike. If the computer itself has no major hardware issues, the only part that may need replacing in order to convert it into a backup unit would be a larger hard disk drive, though this may not even be the case depending on your storage needs. Actually, this could be a good way to get even more service out of a laptop with a broken screen or a dead battery.

A strategy that has worked quite well for me is to actually clone my various computers, and keep a complete copy of all of my files on each computer at all times. This way, I can choose one or the other for any given task, taking into account only the needs of that particular task (do I need the fast CPU? Or the large screen? Or the little, light ‘un to carry around all day?) and not the availability of data files. My data is always available to me.

Having accumulated several hundred GBytes over the years, copying all the files over WiFi can get to be a bother. This is when a modest investment (less than $10) in an Ethernet crossover cable can help speed up the process considerably. This is basically a piece of cable that internally connects one computer’s Ethernet TX (Transmission) pins to the other side’s RX (Reception) pins, and vice-versa. It can be used to do away with a network switch, and, since there are only two computers on a full-duplex communications link, connection speeds can actually be rather higher than through a switch.

These days, crossover cables are usually found in red, though it is best to check that what you are buying is not actually a straight patch cable (which is slightly cheaper).

Before connecting the computers, we will need to draw up a strategy for file transfers. Which protocol do we use, and which programs? Since we have already seen the rsync utility in this article, I will continue using it, but this time over a SSH link instead of to a locally-connected external disk.

Setting up an SSH link on Ubuntu systems simply means installing the openssh-server package on one computer (the client package is already installed by default on both). This can be done from the terminal:

$ sudo bash

# aptitude update

# aptitude install openssh-server

or it can also be done from within any graphical software management program you prefer: Synaptic, the Ubuntu Software Center, Muon, etc.

Once installed, the SSH server automatically creates its key pair, and starts up. You can check it is working from the terminal on the same machine it is installed on by issuing:

ssh localhost

If the terminal requests confirmation to continue connecting, and then asks you for your password, you are in business.

If you prefer to use SSH with a key pair, without having to enter a password each time you connect, you can follow the instructions in this thread on AskUbuntu: http://askubuntu.com/questions/46930/how-can-i-set-up-password-less-ssh-login
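
The gist of that method - a sketch only, with the user name and server address taken from the setup described below - is to generate a key pair on the client machine and copy the public key over to the server:

ssh-keygen -t rsa

ssh-copy-id alan@172.16.0.1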

Once both computers are connected, we will need to configure IP addresses. Since running a DHCP server on one of our machines is probably a bit high on the geek-meter for ordinary users, that leaves two choices open to us:
• manually set an IPv4 address on each computer
• use IPv6 and its autoconfiguration feature

If using IPv4, I suggest you use addresses from the 172.16.0.0/16 block (part of the 172.16.0.0/12 private range), since it seems to be rather less used in domestic routers than the more common 192.168.0.0/16. For example, you could issue on the SSH server:

sudo bash

# ifconfig eth0 172.16.0.1/16

and on the other machine

sudo bash

# ifconfig eth0 172.16.0.2/16
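
On more recent systems where ifconfig is no longer installed by default, the same configuration can be done with the ip utility - an alternative sketch, not part of the original setup, and the interface may be named something other than eth0:

# ip addr add 172.16.0.1/16 dev eth0

and, on the other machine:

# ip addr add 172.16.0.2/16 dev eth0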

If this works out well, you can now ping back and forth between the machines. On the server, to ping the other computer three times, issue:

ping 172.16.0.2 -c 3

You should see several lines starting with “64 bytes from”, each with a time in milliseconds at the end. If you get lines containing “Destination Host Unreachable”, the connection is not working.

On the other hand, IPv6 automatic address configuration has a distinct advantage over IPv4: the same link-local address will always be assigned to the same interface on each machine, every time, without needing any manual configuration. To find out which address has been assigned to the eth0 interface, issue

ifconfig eth0

In the output, look for the line beginning with “inet6 addr:”; the link-local IPv6 address is the one starting with “fe80::”.

From the other computer, I can now ping this one using the IPv6 ping6 utility:

ping6 fe80::de0e:a1ff:fe4e:7c86%eth0

Please note that I left out the /64 prefix length, and tacked on the %eth0 interface indicator at the end.

Now, on to doing the actual backup. From the other computer, I can update all my files on the SSH server using either

rsync -aruv /home/alan/* 172.16.0.1:/home/alan/

or

rsync -aruv /home/alan/* [fe80::de0e:a1ff:fe4e:7c86%eth0]:/home/alan

IPv6 addresses often need to be enclosed in square brackets ‘[ ]’. In either case, the server should prompt for my password, and then start performing the synchronization.

In the other direction, I can synchronize all files from the server to the other computer using similar rsync commands:

rsync -aruv 172.16.0.1:/home/alan/* /home/alan/

or

rsync -aruv [fe80::de0e:a1ff:fe4e:7c86%eth0]:/home/alan/* /home/alan/

To make sure all files are up-to-date on both machines, it may be necessary to perform synchronization in both directions - especially if both computers are occasionally used to work on and modify files. Naturally, this sequence can be automated in a script file that could, for example, be called backup.sh:

#!/bin/bash
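# Two-way synchronization over the direct link: first push local changes
# to the other machine, then pull back anything newer from it.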

rsync -aruv /home/alan/* [fe80::de0e:a1ff:fe4e:7c86%eth0]:/home/alan

rsync -aruv [fe80::de0e:a1ff:fe4e:7c86%eth0]:/home/alan/* /home/alan/

The file will need to be made executable with

chmod 755 backup.sh

before use. It can then be executed with

./backup.sh
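
If you would rather have the script run on a schedule than remember to launch it, a crontab entry is one option - a sketch, assuming the script lives in /home/alan, that the password-less SSH setup mentioned earlier is in place (cron cannot type passwords for you), and that both machines are connected at the chosen time. Edit your personal crontab with

crontab -e

and add a line such as the following to run the backup at 18:00 on weekdays:

0 18 * * 1-5 /home/alan/backup.sh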

Graphical frontends for automatic backups

By this point, some readers may be asking themselves if I ever intend to talk about automatic backup apps, such as the Déjà Dup program that now comes installed as standard in Ubuntu.

There are two different points of view regarding this kind of program. Automatic backups can help take a little of the hassle out of making backups: they do the remembering for you, and handle most if not all of the action.

But it can also be argued that this can in fact become a bit of a liability, since depending on automatic actions will, over time, tend to make us less aware of what the computer is actually doing. We will end up not staying on top of our backup volume’s level of free space, for example. We may take for granted that certain files are being regularly backed up, without noticing that in actual fact they are not. Graphical interfaces to the backup process add an element of abstraction that may make things easier to configure for the novice, but unfortunately they also obscure the inner workings of the process - which in turn makes mistakes and subsequent disaster situations all the more probable.

Doing manual backups, on the other hand, obliges us to remain aware of what we are doing. We can also keep visual track of the files as they go over: if something important that really should be copied is not being copied, we see it on the spot. If something weird comes up in the messages, likewise.

So, while making the backup process as automatic as possible may seem a good idea from the standpoint of hassle reduction (remember backups must be easy to do, if we want them to actually get done in the real world), perhaps a more balanced approach would be to combine automatic features such as scripts or even graphical applications with a certain level of human control and overview.

In any case, if in any doubt whether to back up or not, please do so - and in as many copies as possible.
