issue106:monopinion
Differences
Below are the differences between two revisions of the page.
Next revision | Previous revision | ||
issue106:monopinion [2016/02/28 16:10] – created auntiee | issue106:monopinion [2016/03/04 17:53] (current version) – andre_domenech | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | Big Data is a term that has morphed into a buzzword that is doing the rounds in today’s information technology circles. Managers speak of Big Data as a concept to somehow make use of the information that is collected by businesses from their customers. Computer system and software vendors see in it a commercial opportunity, | + | ===== 1 ===== |
+ | |||
+ | **Big Data is a term that has morphed into a buzzword that is doing the rounds in today’s information technology circles. Managers speak of Big Data as a concept to somehow make use of the information that is collected by businesses from their customers. Computer system and software vendors see in it a commercial opportunity, | ||
But what connections are there between the concept of Big Data and the Ubuntu distribution? | But what connections are there between the concept of Big Data and the Ubuntu distribution? | ||
- | Let us start by trying to define the concept. It should be stressed that the definition cannot be very precise, since terms in computer science tend to shift their meaning over time and Big Data sits at about the limit of what is possible with today’s off-the-shelf hardware. Tomorrow’s technology may change the basic tenets. | + | Let us start by trying to define the concept. It should be stressed that the definition cannot be very precise, since terms in computer science tend to shift their meaning over time and Big Data sits at about the limit of what is possible with today’s off-the-shelf hardware. Tomorrow’s technology may change the basic tenets.** | |
+ | |||
+ | Big Data and Ubuntu | |
+ | |||
+ | Big data est une expression qui s'est métamorphosée en mot à la mode qui circule dans les sphères actuelles des technologies de l' | ||
+ | |||
+ | But what are the links between the concept of Big Data and the Ubuntu distribution? | |
+ | |||
+ | Commençons par essayer de trouver une quelconque définition du concept. Je dois souligner que la définition ne peut être ni très concrète, ni très précise, puisque la géométrie de l' | ||
+ | |||
+ | ===== 2 ===== | ||
- | Naturally, Big Data is, in the first place, about large amounts of data, as the name itself implies. “Large” may have different meanings depending on context, so let us settle on the idea that “large” in “Big Data” is functionally equivalent to saying a data set “larger than what can be handled (stored or processed) in a reasonable amount of time on a single computer.” As you can see, there is some imprecision here, since the capacities in terms of processing power and disk space vary quite a bit between my laptop and a large mainframe computer. But it is clear from this definition that parallel processing is very much part of what Big Data is all about. | + | **Naturally, Big Data is, in the first place, about large amounts of data, as the name itself implies. “Large” may have different meanings depending on context, so let us settle on the idea that “large” in “Big Data” is functionally equivalent to saying a data set “larger than what can be handled (stored or processed) in a reasonable amount of time on a single computer.” As you can see, there is some imprecision here, since the capacities in terms of processing power and disk space vary quite a bit between my laptop and a large mainframe computer. But it is clear from this definition that parallel processing is very much part of what Big Data is all about. |
This being said, parallel processing can very well take place in separate threads inside a single computer, so caveats apply. | This being said, parallel processing can very well take place in separate threads inside a single computer, so caveats apply. | ||
- | Another way of clarifying the concept is by considering how Big Data is generated in the first place. In the early days of computing, data was collected mostly as written documents (printed matter, forms), and then entered painstakingly by hand into computers. Large companies employed whole rooms of people whose task was to key data onto punched cards, paper tape and, later, magnetic media. | + | Another way of clarifying the concept is by considering how Big Data is generated in the first place. In the early days of computing, data was collected mostly as written documents (printed matter, forms), and then entered painstakingly by hand into computers. Large companies employed whole rooms of people whose task was to key data onto punched cards, paper tape and, later, magnetic media. | |
- | This is no longer the case. Nowadays, large amounts of commercial | + | En premier lieu, de par sa nature, le Big data concerne de très grandes quantités de données. Le terme « grand » peut signifier diverses choses, selon le contexte, et nous devons nous mettre d' |
- | Finally, large amounts of information are now available that have been generated not through the active participation of a human being, but simply through automatic methods. An example may help clarify this. Let us say Average Joe walks out of his house one fine morning, gets into his car and goes to the grocery two streets away for a loaf of bread. Naturally, he has his mobile phone in his pocket, so his telephone company automatically has information on his whereabouts, obtained by tracking which cell base stations his phone connects to. If he has left GPS enabled on his phone, the result may be both depleted battery life and a record of his physical movements, kept by whatever app is running in the background and has been authorized to consult GPS data. If his route has passed in front of a police automatic license plate reader, data has been generated on his car’s movements. Finally, if he has used a credit card to pay for his purchase at the store, at least two different financial organizations (his bank and the store’s bank) now have data on the whereabouts of his credit card and its usage patterns. | + | Cela étant dit, le traitement en parallèle peut avoir lieu dans des fils séparés à l' |
- | It should be stressed that all of this data has been collected purely by automatic means, through the use of machines that are never turned off. Some of the information may be considered private (a private transaction between Average Joe and his baker), but much of it will in fact be considered public information in many jurisdictions. A street is, by nature, a public place, and nobody can have valid expectations of privacy concerning his movements when exposed to public view. | + | Une autre façon de clarifier le concept est d' |
+ | |||
+ | ===== 3 ===== | ||
+ | |||
+ | **This is no longer the case. Nowadays, large amounts of commercial data are actually entered by the user him- or herself. One of the effects of the growth of e-commerce, and its companions e-business and e-administration, | ||
+ | |||
+ | Finally, large amounts of information are now available that have been generated not through the active participation of a human being, but simply through automatic methods. An example may help clarify this. Let us say Average Joe walks out of his house one fine morning, gets into his car and goes to the grocery two streets away for a loaf of bread. Naturally, he has his mobile phone in his pocket, so his telephone company automatically has information on his whereabouts, obtained by tracking which cell base stations his phone connects to. If he has left GPS enabled on his phone, the result may be both depleted battery life and a record of his physical movements, kept by whatever app is running in the background and has been authorized to consult GPS data. If his route has passed in front of a police automatic license plate reader, data has been generated on his car’s movements. Finally, if he has used a credit card to pay for his purchase at the store, at least two different financial organizations (his bank and the store’s bank) now have data on the whereabouts of his credit card and its usage patterns.** | |
+ | |||
+ | Ce n'est plus le cas. Aujourd' | ||
+ | |||
+ | Enfin, de grandes quantités d' | ||
+ | |||
+ | ===== 4 ===== | ||
+ | |||
+ | **It should be stressed that all of this data has been collected purely by automatic means, through the use of machines that are never turned off. Some of the information may be considered private (a private transaction between Average Joe and his baker), but much of it will in fact be considered public information in many jurisdictions. A street is, by nature, a public place, and nobody can have valid expectations of privacy concerning his movements when exposed to public view. | ||
Processing Big Data is also quite a handful. Although there is a wide variety of programming languages and data access APIs that can be used to “do” Big Data, the main paradigm is the MapReduce pattern introduced by Google, and nowadays often implemented by Hadoop. To produce an output from large amounts of data, two individual steps are taken - and both are parallelizable. In the first “Map” step, data goes through filtering and sorting. The output from this step is then piped through into a second “Reduce” step, where global outputs can be calculated. | Processing Big Data is also quite a handful. Although there is a wide variety of programming languages and data access APIs that can be used to “do” Big Data, the main paradigm is the MapReduce pattern introduced by Google, and nowadays often implemented by Hadoop. To produce an output from large amounts of data, two individual steps are taken - and both are parallelizable. In the first “Map” step, data goes through filtering and sorting. The output from this step is then piped through into a second “Reduce” step, where global outputs can be calculated. | ||
Line 23: | Line 49: | ||
SELECT SUM(salary) FROM employee WHERE division = 'Logistics' | SELECT SUM(salary) FROM employee WHERE division = 'Logistics' | ||
- | In classical computing, this would require all employee records to be held within a single relational database table, which is then scanned - perhaps with the help of an index - to select the records in which the field “division” equals “Logistics”. The contents of the field “salary” in the selected rows are then added up and returned. | + | In classical computing, this would require all employee records to be held within a single relational database table, which is then scanned - perhaps with the help of an index - to select the records in which the field “division” equals “Logistics”. The contents of the field “salary” in the selected rows are then added up and returned.** |
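The classical, single-machine case described above can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the table layout follows the query in the text, while the employee names, the salary figures and the helper name total_salary are made up for illustration. Note that a string literal in SQL is written with straight single quotes; here the division is passed as a bound parameter instead.

```python
import sqlite3

# Hypothetical sample data: (name, division, salary). None of these
# values come from the article; they only make the query runnable.
SAMPLE_ROWS = [
    ("Ana", "Logistics", 3000),
    ("Ben", "Sales", 2800),
    ("Chloe", "Logistics", 3200),
]


def total_salary(rows, division):
    """Hold all rows in one table on one machine and sum one division."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (name TEXT, division TEXT, salary INTEGER)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    # SUM returns NULL when no row matches, so COALESCE maps that to 0.
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(salary), 0) FROM employee WHERE division = ?",
        (division,),
    ).fetchone()
    conn.close()
    return total


if __name__ == "__main__":
    print(total_salary(SAMPLE_ROWS, "Logistics"))
```

Everything here lives in one process and one table, which is exactly the assumption that stops holding once the data no longer fits on a single computer.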
- | In Big Data, things are a bit different, since employee registers will probably be spread out over several different physical computers. Actually, this is seen as desirable, since each individual record can be replicated several times, thus giving us redundancy. This can be done automatically, | + | Il faut souligner que toutes ces données ne sont collectées que par des méthodes automatiques, |
+ | |||
+ | Le traitement du Big data nécessite aussi beaucoup d' | ||
+ | |||
+ | Par exemple, on pourrait examiner une requête classique SQL telle que l' | ||
+ | |||
+ | SELECT SUM(salary) FROM employee WHERE division = 'Logistics' | |
+ | |||
+ | Dans l' | ||
+ | |||
+ | ===== 5 ===== | ||
+ | |||
+ | **In Big Data, things are a bit different, since employee registers will probably be spread out over several different physical computers. Actually, this is seen as desirable, since each individual record can be replicated several times, thus giving us redundancy. This can be done automatically, | ||
In the “Map” step, each individual worker node will analyse the registers at its disposal, producing an intermediate output that consists of, for instance, the names and salaries of those employees that satisfy the criterion: division = “Logistics”. This data is then input into “Reduce”, | In the “Map” step, each individual worker node will analyse the registers at its disposal, producing an intermediate output that consists of, for instance, the names and salaries of those employees that satisfy the criterion: division = “Logistics”. This data is then input into “Reduce”, | ||
- | Two interesting considerations arise here. In the first place, each worker node that performs the initial selection and sorting handles large amounts of data. Thus, it makes sense to keep the distance between the worker and the data it needs to work on as small as possible. Ideally, the data would reside physically on the same computer system as the worker itself. A second point is that the intermediate results are a digested version of the original data; for this reason they will, in most cases, be much smaller than the original complete set. They will then take up less space and network bandwidth when they need to be reshuffled and distributed to other workers in preparation for the “Reduce” phase. | + | Two interesting considerations arise here. In the first place, each worker node that performs the initial selection and sorting handles large amounts of data. Thus, it makes sense to keep the distance between the worker and the data it needs to work on as small as possible. Ideally, the data would reside physically on the same computer system as the worker itself. A second point is that the intermediate results are a digested version of the original data; for this reason they will, in most cases, be much smaller than the original complete set. They will then take up less space and network bandwidth when they need to be reshuffled and distributed to other workers in preparation for the “Reduce” phase.** |
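The two steps can be imitated on a single machine to make the data flow concrete. The sketch below is not Hadoop's API: threads stand in for worker nodes, each inner list of the hypothetical PARTITIONS stands for the records stored on one node, and all names and figures are invented. The “Map” function emits (name, salary) pairs for the Logistics division, as in the example above, and the “Reduce” function adds the salaries up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list represents the employee records
# physically held on one worker node, as (name, division, salary).
PARTITIONS = [
    [("Ana", "Logistics", 3000), ("Ben", "Sales", 2800)],
    [("Chloe", "Logistics", 3200)],
    [("Dev", "Finance", 4100), ("Eli", "Logistics", 2900)],
]


def map_partition(records, division="Logistics"):
    """Map step: each worker filters only the records it holds locally."""
    return [(name, salary) for name, div, salary in records if div == division]


def reduce_total(intermediate):
    """Reduce step: combine the small intermediate outputs into one sum."""
    return sum(salary for pairs in intermediate for _name, salary in pairs)


def total_salary(partitions):
    # One thread per task imitates worker nodes running in parallel.
    with ThreadPoolExecutor() as pool:
        intermediate = list(pool.map(map_partition, partitions))
    return reduce_total(intermediate)


if __name__ == "__main__":
    print(total_salary(PARTITIONS))
```

Only the filtered pairs leave each simulated worker, which mirrors the point that intermediate results are usually much smaller than the raw data. In this sketch the reduce step runs in a single place; a real framework would distribute it as well.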
- | Now, let us consider the role our favourite operating system has to play, regarding both the storage and processing of Big Data, and its original collection. It will come as no surprise to the reader that most servers on the Internet run some distribution of the GNU/Linux operating system. This is pertinent to the handling of Big Data, since the very concept could be said to have been invented by Google - who is in a privileged position to collect large amounts of data on its users, and who has been a market leader in trying to make commercial use of this information. Google is also known to be a large GNU/Linux user, at various levels (its own servers, in software development, | + | Côté Big data, les choses sont un peu différentes, |
+ | |||
+ | À l' | ||
+ | |||
+ | Deux considérations intéressantes se présentent ici. D' | ||
+ | |||
+ | ===== 6 ===== | ||
+ | |||
+ | **Now, let us consider the role our favourite operating system has to play, regarding both the storage and processing of Big Data, and its original collection. It will come as no surprise to the reader that most servers on the Internet run some distribution of the GNU/Linux operating system. This is pertinent to the handling of Big Data, since the very concept could be said to have been invented by Google - who is in a privileged position to collect large amounts of data on its users, and who has been a market leader in trying to make commercial use of this information. Google is also known to be a large GNU/Linux user, at various levels (its own servers, in software development, | ||
There are also quite a few large mainframes in service, running various operating systems. However, in a tendency started many years back, mainframe resources are often parcelled out into various virtual machines. System builders such as IBM are major players in this field, for example with the zSeries and “Linux on a z”: many instances of virtual machines running Linux coexist within the mainframe’s process and memory space. Others such as Amazon run large clouds of virtual machines, though on smaller servers with Intel x86_64 processors. | There are also quite a few large mainframes in service, running various operating systems. However, in a tendency started many years back, mainframe resources are often parcelled out into various virtual machines. System builders such as IBM are major players in this field, for example with the zSeries and “Linux on a z”: many instances of virtual machines running Linux coexist within the mainframe’s process and memory space. Others such as Amazon run large clouds of virtual machines, though on smaller servers with Intel x86_64 processors. | ||
- | This is where Ubuntu comes in. There are several choices of Linux distributions for a virtual machine, but in actual practice a choice is often made either from the RedHat subset of distributions (Red Hat Enterprise Linux with paid subscription, | + | This is where Ubuntu comes in. There are several choices of Linux distributions for a virtual machine, but in actual practice a choice is often made either from the RedHat subset of distributions (Red Hat Enterprise Linux with paid subscription, |
- | The insistence on the availability of paid support options may seem strange to some users. However, it should be taken into account that operating systems used as servers are in a commercial environment. Information systems are mission-critical to businesses’ workflow. Computer department heads are under pressure to deliver, and make sure they can continue to deliver in a timely manner. With these considerations taken into account, it makes sense to pay for quality service to make sure that if and when problems occur, they can be dealt with using not only the company’s own resources, but also high-quality external expertise. | + | Examinons maintenant le rôle joué par notre système d' |
+ | |||
+ | Beaucoup de grandes unités centrales sont également en fonction, faisant tourner divers systèmes d' | ||
+ | |||
+ | C'est ici qu' | ||
+ | |||
+ | ===== 7 ===== | ||
+ | |||
+ | **The insistence on the availability of paid support options may seem strange to some users. However, it should be taken into account that operating systems used as servers are in a commercial environment. Information systems are mission-critical to businesses’ workflow. Computer department heads are under pressure to deliver, and make sure they can continue to deliver in a timely manner. With these considerations taken into account, it makes sense to pay for quality service to make sure that if and when problems occur, they can be dealt with using not only the company’s own resources, but also high-quality external expertise. | ||
This explains why most large Linux distributions propose specific solutions to set up and configure virtual computers in the cloud, often prominently displayed in their web pages. CentOS proposes “a generic cloud-init enabled image” within their first paragraph https:// | This explains why most large Linux distributions propose specific solutions to set up and configure virtual computers in the cloud, often prominently displayed in their web pages. CentOS proposes “a generic cloud-init enabled image” within their first paragraph https:// | ||
- | Virtual (cloud) server technology is often applied to Big Data processing. In the first place, as has already been pointed out, it makes sense to place the workers near the data they will be working on, thus reducing network overhead. But most data processed by large organizations is already in the cloud, having been collected through e-commerce servers that are themselves virtual machines. When collection, storage and processing all take place within the same physical facility, data transmission costs are negligible, and transfers can take advantage of the server farm’s LAN infrastructure for speed. | + | Virtual (cloud) server technology is often applied to Big Data processing. In the first place, as has already been pointed out, it makes sense to place the workers near the data they will be working on, thus reducing network overhead. But most data processed by large organizations is already in the cloud, having been collected through e-commerce servers that are themselves virtual machines. When collection, storage and processing all take place within the same physical facility, data transmission costs are negligible, and transfers can take advantage of the server farm’s LAN infrastructure for speed.** |
+ | |||
+ | L' | ||
+ | |||
+ | Cela explique pourquoi la plupart des grandes distributions Linux proposent des solutions précises pour créer et configurer des ordinateurs virtuels dans le nuage, et ces solutions sont affichées de façon bien visible sur leurs pages Web. Sur leur site, CentOS propose « une image générique d' | ||
+ | |||
+ | La technologie du serveur virtuel (dans le nuage) s' | ||
+ | |||
+ | |||
+ | ===== 8 ===== | ||
- | In the second place, using virtualization as a basis for data processing means organizations (which need to process large amounts of data) no longer need to acquire and maintain large server farms. The infrastructure costs are externalized to cloud computing providers such as Amazon, and leased only as needed. This introduces more flexibility, | + | **In the second place, using virtualization as a basis for data processing means organizations (which need to process large amounts of data) no longer need to acquire and maintain large server farms. The infrastructure costs are externalized to cloud computing providers such as Amazon, and leased only as needed. This introduces more flexibility, |
Big Data processing today seems to be firmly in the domain of Linux-based virtual machines in the cloud, with Ubuntu as at least one of the main players in the field. But what about data collection in the first place? | Big Data processing today seems to be firmly in the domain of Linux-based virtual machines in the cloud, with Ubuntu as at least one of the main players in the field. But what about data collection in the first place? | ||
- | Please take a moment to navigate to the Ubuntu project’s homepage, http:// | + | Please take a moment to navigate to the Ubuntu project’s homepage, http:// |
- | Once these are used to make Average Joe’s consumer electronics smarter - televisions, | + | En deuxième lieu, utiliser la virtualisation comme base pour le traitement des données veut dire que les organisations, |
+ | |||
+ | Aujourd' | ||
+ | |||
+ | Veuillez prendre une minute pour naviguer jusqu' | ||
+ | |||
+ | |||
+ | ===== 9 ===== | ||
+ | |||
+ | **Once these are used to make Average Joe’s consumer electronics smarter - televisions, | |
Average Joe may even benefit from the innovation. Obtaining real-time access to traffic conditions while driving may be seen as a useful bit of progress for the large part of humanity that lives in congested urban areas. Being able to monitor and fine-tune home central heating at a distance, thus reducing heating bills and carbon emissions, cannot be seen as a bad proposition. | Average Joe may even benefit from the innovation. Obtaining real-time access to traffic conditions while driving may be seen as a useful bit of progress for the large part of humanity that lives in congested urban areas. Being able to monitor and fine-tune home central heating at a distance, thus reducing heating bills and carbon emissions, cannot be seen as a bad proposition. | ||
Line 59: | Line 131: | ||
What about individual freedom, including the freedom not to be tracked not only in the digital world, but also in our real-life physical existence? When will the community - and Canonical itself - stand up and clearly state its position on the matter? | What about individual freedom, including the freedom not to be tracked not only in the digital world, but also in our real-life physical existence? When will the community - and Canonical itself - stand up and clearly state its position on the matter? | ||
- | Just saying… | + | Just saying… |
+ | |||
+ | Une fois que ceux-ci seront utilisés pour rendre l' | ||
+ | |||
+ | Il se peut que Monsieur Tout-le-monde bénéficie de l' | ||
+ | |||
+ | Ubuntu Core se prête parfaitement à ce type d' | ||
+ | |||
+ | Il y a juste un doute qui me taraude l' | ||
+ | |||
+ | Quid de la liberté individuelle, | ||
+ | |||
+ | My two cents... |
issue106/monopinion.1456672215.txt.gz · Last modified: 2016/02/28 16:10 by auntiee