Full Circle Magazine FR


When I see the word ‘Cookbook’ in a title, I’m immediately attracted to it, but once I thumb through the book, I’m disappointed more often than not. The recipes presented are usually either so basic or so obscure that I would never use them. So when I volunteered to review this book, I expected to be disappointed once again. However, once I got into the book, I was very pleasantly surprised.

As promised, this book provides source code examples in R and Python. The R projects are limited to chapters 2 through 5, but they give enough information to whet the appetite of anyone interested in data analysis. Chapters 6 through 11 focus on Python solutions, and I must say the code is very clean and the presentation is very good.

While the subjects of some of the chapters aren’t really my cup of tea (Recommending Movies or Harvesting and geolocating Twitter data), the authors present the information in such a way that the examples can be extrapolated to cover many forms of data, not just movies or Twitter.

Chapter 1 is dedicated to preparing the data evaluation environment on your computer for both R and Python. It is done in a very clear and easy-to-follow manner, without the spurious packages that tend to obscure the intent of a project and leave the reader wondering why they are needed at all. The authors' choice of the free Anaconda Python distribution actually flies in the face of my statement above; however, it is the correct tool (in my humble opinion) for the data analysis that follows in the book, and for the work that will follow if you continue in a serious data analysis role. In the same vein, the section on setting up an R environment is very straightforward and allows the reader to choose the best tool for the particular job. Enough information is given about the usage of R vs Python for even the greenest programmer to make a reasonable decision about which one to use.

The four authors, Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort and Abhijit Dasgupta, all have extremely impressive credits and have done a tremendous job on this book. Their ‘real world’ credentials range from work at Johns Hopkins University to Master's degrees and PhDs. I doubt anyone could have come up with a more impressive group to discuss this very complex subject.

The bottom line is that if you are looking for a book to learn about data analysis and get snippets to help you along, then this is the book for you. You will want to pay close attention to Chapter 1 when setting up your analysis workstation, since the reasoning behind the packages used is clearly explained and the examples are well done. I would suggest that you install both R and Python as described in the book, since neither one is the best choice for every job.