issue95:mon_histoire


1

Ubuntu has become well known as a distribution designed for normal users, with an emphasis on ease of use. However, its open-source nature also makes this kind of work environment especially useful for the scientist.

Investigators form a rather specific category of computer users. They tend to have quite precise needs, which overlap those of “normal” users only up to a point. For example, a statistician may in some cases use the very same spreadsheet as a manager, though for other tasks. At some point, however, the statistician will require a more powerful number-crunching environment, such as R (also available in the Ubuntu repositories).

The number of potential users for specific scientific programs is naturally much smaller than that for more ordinary tasks. Many of those developing programs for scientific purposes are actually scientists themselves, since the specialization of modern science makes a background in each specific field an advantage, to say the least. Having an open-source operating system makes building programs easier for these people, who may not be computer engineers. At the same time, having a software management tool such as apt and the repository system at your disposal makes distributing your program much easier. All of this has contributed to making a wide range of scientific applications available both for Ubuntu and for the distribution it is based on, Debian.

To illustrate this topic, in this piece I would like to show you some of the options for displaying chemical molecules in 3D on your computer, with an emphasis on organic chemistry. Applications include not only teaching chemistry itself, but also learning more about biology and, to some extent, genetics. For example, we could view a 3D model of hemagglutinin (PDB code 1RUZ), which viruses such as the infamous Influenza A virus use to bind to the host’s cells - the “H1” part of e.g. H1N1 representing the specific type of hemagglutinin contained in that virus.


2

GETTING MOLECULES

Several file formats are currently in use, but perhaps the most widespread are the MDL Molfile format (extension: .mol) and the Protein Data Bank format (extension: .pdb). Most molecule viewers are capable of handling both, or even of converting a molecule between formats. It is worth noting that both formats are originally text-based files with a well-documented structure, which illustrates how open data formats help to share data in the scientific world. Compressed versions may also be found, generally using standard gzip compression.

An example of the molecule of glycerol (glycerin) in the Molfile format is shown below. (NOTE: the distances between atoms are completely off; this is just an example.)

There are several good sources for molecule files on the Internet. One of the better known is the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB), at http://www.rcsb.org/pdb/home/home.do . This has a comprehensive collection of molecules contributed by many teams from all over the world. Of special interest for a layperson such as myself is their PDB-101 primer, http://www.rcsb.org/pdb/101/structural_view_of_biology.do , with a structured presentation by topic. The “Molecule of the Month” section contains a large assortment of articles on specific molecules that can certainly fill us in on the salient points of how the biology works.

By searching for different keywords, I was able to find a specific molecule of interest: hemoglobin (PDB code 1VWT) from human red blood cells. Each molecule is described, the team that announced it is given, as is the citation of the scientific publication in which it initially appeared. A download link is also provided (to the right of the PDB code in large letters), through which we can download the corresponding file in the PDB format.
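Since the Molfile format is plain text with fixed-width columns, its layout can be sketched in a few lines of Python. The snippet below is only an illustration and is not taken from any of the programs discussed here: the function names are made up, only the heavy atoms of glycerol (three carbons, three oxygens) are listed, and the coordinates are placeholders rather than a real geometry. It assumes the common V2000 flavour of the format: three header lines, a counts line, one line per atom, one line per bond, and a closing "M  END".

```python
# Heavy atoms of glycerol only; the coordinates are placeholders, not real geometry.
ATOMS = [
    ("C", 0.0, 0.0, 0.0),
    ("C", 1.5, 0.0, 0.0),
    ("C", 3.0, 0.0, 0.0),
    ("O", 0.0, 1.4, 0.0),
    ("O", 1.5, 1.4, 0.0),
    ("O", 3.0, 1.4, 0.0),
]
# (first atom, second atom, bond order); atoms are numbered from 1.
BONDS = [(1, 2, 1), (2, 3, 1), (1, 4, 1), (2, 5, 1), (3, 6, 1)]


def build_molfile(name, atoms, bonds):
    """Return the text of a minimal V2000 Molfile (hypothetical helper)."""
    lines = [name, "  hand-written example", ""]  # three header lines
    # Counts line: atom count and bond count, each three characters wide.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for symbol, x, y, z in atoms:
        # x, y, z take ten characters each, then a space and the element symbol.
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3s} 0" + "  0" * 11)
    for first, second, order in bonds:
        lines.append(f"{first:3d}{second:3d}{order:3d}  0")
    lines.append("M  END")
    return "\n".join(lines) + "\n"


def parse_molfile(text):
    """Read the atom and bond blocks back from V2000 Molfile text."""
    lines = text.splitlines()
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


if __name__ == "__main__":
    print(build_molfile("glycerol", ATOMS, BONDS))
```

Saving the printed text under a name such as glycerol.mol should give a file that the viewers described in this article can attempt to open, although with placeholder coordinates the resulting picture will not be realistic.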


3

VIEWING MOLECULES

There are quite a few programs available in the Ubuntu repositories to view the file we have just downloaded. One of the oldest and best known is Rasmol, which now sports a GTK interface.

The window itself is very simple: all options are accessible through the menu bar at the top. The user can rotate the structure with the mouse within the main window itself, so spatial relationships that cannot be shown on a printed page become much clearer.

When we load a file, we see it by default in stick-form representation, where bonds between atoms are represented by short sticks, color-coded by atom type (white for carbon, red for oxygen, yellow for iron, etc.). Hydrogen atoms are usually not shown directly, though this option can be set if desired. This is the molecule of hemoglobin, with its four main structures (alpha and beta units) surrounding a central space.

Other viewing options allow us to show atoms represented as filled spheres (Display > Ball and Stick, or Display > Spacefill), which can be useful for smaller molecules or to see the complete volume a molecule occupies. However, for large molecules with several hundreds or thousands of carbon atoms, the picture may be clearer if we hide individual atoms and bonds, and instead move to a view based on strands (Display > Strands) or the cartoon view (Display > Cartoon). In this screenshot, the strands view was colored by functional units (Colours > Chain) so we can distinguish the alpha and beta chains by color. We can also activate stereoscopic vision (Options > Stereo) to see a separate view for each eye if so desired.
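The chains and atom types that the viewer colors by are stored as plain text in the PDB file, so they can also be tallied with a short script. The following is a rough sketch, not part of Rasmol or any other program mentioned here, and its function names are hypothetical. It assumes the fixed-column layout of ATOM and HETATM records (chain identifier in column 22, element symbol in columns 77-78) and accepts gzip-compressed copies as well as plain files.

```python
import gzip
from collections import Counter
from pathlib import Path


def read_structure_text(path):
    """Return the text of a .pdb file, or of a gzip-compressed .pdb.gz copy."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="ascii", errors="replace") as handle:
        return handle.read()


def summarize_pdb(text):
    """Count atoms per chain and per element in PDB-format text."""
    per_chain = Counter()
    per_element = Counter()
    for line in text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip headers, remarks, connectivity records, etc.
        line = line.ljust(80)  # tolerate lines with trailing blanks stripped
        chain = line[21].strip() or "?"
        element = line[76:78].strip().capitalize() or "?"
        per_chain[chain] += 1
        per_element[element] += 1
    return per_chain, per_element


def report(path):
    """Print a short per-chain and per-element summary for one file."""
    chains, elements = summarize_pdb(read_structure_text(path))
    for chain, count in sorted(chains.items()):
        print(f"chain {chain}: {count} atoms")
    for element, count in elements.most_common():
        print(f"{element}: {count} atoms")
```

Calling report("1VWT.pdb"), assuming the hemoglobin file has been saved under that name in the current directory, would list the chains and elements found in it. Older files that leave the element columns blank are reported under "?".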


4

JMol is a more recent offering. Written in Java, it is available for different platforms such as Windows and OS X as well as GNU/Linux, and should be readily portable to others. It has similar options to Rasmol, though the interface is different. Some tools are available to make small edits to the molecule (add or delete atoms) and to connect to other programs. However, some of these, such as the Povray raytracing environment, are unfortunately no longer easily available on Ubuntu.

The default representation in JMol is sufficiently clear for easy viewing of biological models, and can be rotated using the mouse as before. This is JMol’s view of the hemoglobin model from the PDB file. Two complexes that imprison iron (Fe) atoms (in yellow) are quite visible in the lower part of the foreground.

The PyMOL Molecular Graphics System is one of the more recent applications available. Written in Python, the same modern interpreted language that has often been seen in the pages of Full Circle, its presentation revolves around not one but two windows. One of these is used as a combination of log viewer and general input dialog, while the other holds the molecule view proper and its associated options.

PyMOL has the richest collection of options of all the applications presented here - although the interface is perhaps not very intuitive. Just to set you on course, the “A” (Action) button is used to add elements to the molecule, the “S” (Show) button serves to set (activate) view options, and the “H” (Hide) button to unset (hide) features. The “C” (Color) button changes between coloring schemes. There are also more options to see molecules in several types of stereo, and some options to build videos of the molecule, which I have not explored much.

5

ROLL YOUR OWN

Playing around with existing molecule files is interesting not only in itself, but also as a way to appreciate the actual amount of useful information (accent placed on “useful”) found on the Internet. However, at some point we may wish to start drawing up our own molecules.

A simple place to start is the molecule of propane-1,2,3-triol - perhaps better known as glycerol or glycerin. It can be found not only in soaps, but also in foodstuffs and even in electronic cigarettes. Basically, we have a chain of three carbon atoms (the propane skeleton), with a hydroxyl (-OH) group hanging off each carbon.

There are in fact several applications in the Ubuntu repositories that draw planar representations of organic chemical molecules, and more can be found in various places on the Web. One of the easiest to use is Chemtool. Drawing tools in the upper toolbar allow us to place various chemical bonds at specific angles to each other, to form the carbon skeleton of our molecule. When done, a text tool can be used to add the functional groups at various places. Double and triple bonds are naturally also available. Once drawn, elements can be moved, erased, flipped horizontally and vertically, etc.

The finished molecule can be exported in various flat graphical formats such as PNG, but also in Molfile format. This can then be read in by PyMOL or any of the other viewers. In PyMOL, the missing hydrogen atoms can easily be added to our geometry.

6

However, we can see that something weird has happened to the center carbon atom: the supplementary hydrogen needed to complete its bonds has somehow grown out at a strange angle. This is not what we expected, and may be attributed to the fact that Chemtool is basically a 2D molecule-sketching application. Its output is fine for publishing on paper, but lacks the depth information needed to draw realistic 3D models of molecules.

This is where another program, Avogadro, comes in handy. This is more of a molecule builder than just a sketching tool. In much the same way as Chemtool, Avogadro comes with an interface that allows us to build up the carbon skeleton of our molecule, and then add oxygen atoms where needed to form the additional functional groups. In this case, however, additional hydrogen atoms are adjusted dynamically during construction, helping us see exactly what we are constructing.

Once the molecule has been built, we can choose Extensions > Optimize geometry and the program will calculate the most realistic positions for the atoms (using “realistic” in the sense of positions with the least potential energy). This can then be exported as a flat image file, or as a file in PDB format that can then be opened with PyMOL.

The end result is much more satisfactory, as we can observe that each of the three carbon atoms has a tetrahedral structure, not a planar one. This can be even better appreciated as we rotate the molecule with the mouse.

The software reviewed in this article comes from the apt packages named rasmol, jmol, pymol, chemtool and avogadro. They can easily be found and added to an existing *buntu installation using the usual tools, such as apt-get, synaptic, Ubuntu Software Center, etc.

issue95/mon_histoire.1429107398.txt.gz · Last modified: 2015/04/15 16:16 by auntiee