Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists
J**W
Excellent Book, For What It Is
I'm a Python software developer with an interest in applied statistics. This is an excellent book on data analysis, but for review purposes, it's worth initially pointing out what this book is not. It is not a comprehensive survey of open source tools that are available, and it does not contain many examples of working code to implement the techniques the author talks about, though there are some. For this reason, I'd strike the "with Open Source Tools" from the title in evaluating whether you want to purchase the book. The author greatly favors mathematical notation over code examples in describing the data analysis techniques he presents. While this is not a bad thing per se, you'll have to struggle to comprehend the content if you're a programmer without an academic familiarity with math, or if you've been away from mathematics for a long time. As other reviewers have pointed out, the organization of the content is somewhat disjointed. Going from chapter to chapter, there is little in the way of causality, and the early chapters are pretty math-heavy. The reader is advised to consult appendices at the back of the book to refresh themselves on the basics, if required. Wait! I didn't say you shouldn't buy it. Despite a few shortcomings, this book does offer a good introduction and overview of several basic techniques. It's an excellent survey of the current data analysis landscape for anyone who's not familiar with it. If a topic seems irrelevant to you, it's pretty easy to skip that chapter and move forward. On top of that, the author's writing style and way of explaining relatively esoteric concepts are generally very good. As with many good books, you get the sense the author is a co-worker, trying to explain something to you in terms you can understand. It's very example-based, even if those examples don't always involve code. All in all, to get the most out of this book, the best approach is careful and methodical study.
The author covers many topics quickly, and none in depth, so if one chapter interests you, I'd plan on consulting other resources on particular topics. Luckily, the author does offer several "Further Reading" recommendations for each topic. Most books containing information on these techniques are far harder to read, and they generally cost at least twice as much. Highly recommended. Thanks for this one, Philipp.
J**S
Stunning! And unexpected
I bought this book hoping for a reference on open source tools. But the open source tools are a minor aspect of this book. The core is about data analysis--and it is fantastic. I should have known this from the title I suppose: the "data analysis" is in big font with a colorful background, and "with open source tools" is in small font--and it is literally about the same ratio in the book. Each chapter has a small section that works one example with an open source tool. And there is a chapter at the end about the array of open source tools available. But the data analysis aspects of the book--most outstanding. I have a master's in computer science, and do data and analytics for a living, so I have many books on the topic. Some books have more of a theoretical and rigorous foundation, some more of a hands-on slant. I was expecting this book to be the latter, but it is quite the former. Yet it is still very practical. It is not a "theory" work as such, just a rigorous book useful in practice (there is a big difference!). Throughout the book the author points out the value of solving the problem at hand, rather than being excessively precise--which is the bigger risk in this domain. Examples would be: using visuals to get a feel for data but not trying to use visuals to give precise answers (which they fundamentally cannot), and using techniques that get "close enough" such as perturbation. And it is extremely well written. The writing is in reasonably simple English, relative to the topic, yet not insulting or goofy the way the "Dummies" series can be for example. It is easy to read yet content rich--a fantastic combination.
A**.
It has its flaws, but in general a great overview
I've read some of the other reviews, and I do agree with most of the criticisms. There are quite a few errors in formulas and in the text, and it would've been really nice if the source code and data files were provided on a CD or were available on a website. That being said, the book addresses a lot of different topics, ranging from introductory, freshman-level statistics to more advanced data mining and machine learning techniques, and passing through notions of design. It doesn't go in depth into each of them, but offers a fairly good overview, and references in case you're interested. Furthermore, the author gives some useful hints on how to do outside-the-box thinking and how to apply these techniques in business. Being a physics grad student, I've found many of the topics pretty much basic, but even so, I've learned a lot. Overall, a great introduction; I really hope the flaws are corrected in a future second edition.
B**K
The book provides a very good math and stats foundation without diving into the code
The book provides a very good math and stats foundation without diving into the code. I liked the methodology behind the way the book is structured.
V**O
Excellent
A must-have book for anyone who (like me) does data analysis with open source tools (often developed for the occasion). A text rich in theory, but always with an eye toward its concrete application. Really well written, well organized, and often useful! Highly recommended!
D**R
Extraordinary
This is the book you want if you are trying to get quickly into scientific programming and visualization with Python and R! I strongly recommend this book!
T**M
Mixed opinion
I have to agree with a lot of the US reviews. I am missing a focus in the book. The author wants to make a point about how important it is to understand the math behind real-world problems, but I was disappointed by his attempts to convey mathematical principles. Formulas may work for some people, but to me the book failed to point out why they are necessary, or how I can add value with them in the analyses I do. In this regard, the author overpays his dues to his academic background. I can see how the author studied physics and addresses people with similarly framed minds. But for these people, the book will be too trivial. The major disappointment for me was that the book failed to live up to the expectations raised by the subtitle "with Open Source Tools". I would have expected a range of cool tools to work with; instead it's GNU and R, and there is not a single end-to-end case of getting the data, figuring out the issue and then presenting it in a graph. Sometimes, the style is too conversational, sometimes it is too strict and abstract. There are few moments when the two extremes touch. Other parts of the book, where the author shares his academic insights, felt awkward. The statement "You will never understand what mathematics is if you see it only as something you use to obtain certain results" will definitely find its way into my "Dictionary of Received Ideas". Still, after all this negative criticism, I am giving it an average 4 stars. Why? There were some conversational parts that are helpful. This happens especially when the author highlights pitfalls and real-world applications of distribution laws and shows how to interpret graphical analysis (although he doesn't point out how it's done). I can put these ideas to use, and they are valuable, because they show the true expertise of the author and can serve as a guideline for people learning to get familiar with advanced statistical analysis. And I want to give credit to the broad scope of the book.
I prefer this to textbooks that focus on one aspect only. Although the book is often too abstract, I appreciate the approach to cover many topics in 10-20 page essays.
F**S
A gold mine
Every person involved in any computational science should have read this book and always keep it at arm's reach.