A**N
Great overview of what Big Data is doing today and what can be done in the future
Big Data is a topic that is all the rage but at the same time isn't well defined. Authors Viktor Mayer-Schönberger and Kenneth Cukier give an overview of what is being done with the massive amount of data generated from online interaction, coupled with advances in practical statistics for analyzing this data. The authors go through examples of how big data is being used today to give a flavour of it, and then spend the rest of the book on what is going on in the field, how it is useful, where aspects of it are going and some of the concerns we should have about our privacy.

The authors start by discussing how Google, using its analysis of people's queries, is more predictive about flu epidemics than medical experts have been. The human genome can now be sequenced in a fraction of the time that was required when it was decoded for the first time. They discuss how big data has enabled entrepreneurs to inform customers about the optimal time to buy flight tickets, given that airlines vary their prices according to hidden methods that big data statistics has helped to make more sense of. The examples are a good starting point for the discussion with the reader. The authors then discuss how we have always been trying to gather data about our populations; the desire to conduct censuses has been with us for a long time. We made progress through sampling techniques, and statistics made it possible to learn about the population at large using smaller and less time-consuming samples. The authors discuss how big data is messy and imprecise: it is helpful for overviews but not for building models of the mechanics of what is being observed. When you try to get all of the data about something there will inevitably be noise, and looking for correlations can sometimes be the most fruitful way to use the data, establishing empirical relationships rather than searching for underlying dynamics.
The authors discuss datafication, which they describe as the consolidation of data into a larger database that can then give the population at large much more useful guidance about phenomena that require a look from above at all the data together. Matthew Maury is used to reinforce the usefulness of this approach: he was a naval officer who aggregated ships' logs to inform ship captains about the most useful routes and more efficient passages. The authors then move on to the more concrete and start to discuss the value of big data. They give the obvious background on the value of traditional data and then give food for thought on how having data for everything can lead to new ideas and utility that were unimaginable in the past. Big data analytics will be required for document translation, smart device coordination, smart cities and social network analysis. The value in big data is, of course, the data, but the utility of that data might lie further midstream or downstream, where others are better placed to harvest it. The authors go on to discuss the data value chain and how to think about it. They discuss the implications of the big data revolution, how it is enabling consumers to get the best deals, and how statisticians have a highly desirable skill set. The authors then turn to the risks of big data, which are of course numerous. Much discussed is the privacy of the data that is generated. The ownership of that data and the licensing of it are topics which will continue to surface, and the legal framework for analyzing disputes will need to be further developed. Misunderstanding correlation and causation will also be a risk in big data analytics, and hypotheticals like the government quarantining those who search for flu on Google are used as exaggerated examples. The authors finally leave the reader with a view on the future. 
They use a concrete example, in which big data statistics substantially improved the ability to find overcrowded illegal slum housing, to show how we can use data to enhance our cities and improve governance and efficiency.

Big data is a subject which continues to step into more and more categories as our ability to measure continues to improve. How big data can be used is a subject that both academics and practitioners will continue to think about and experiment with. It will give rise to a new consumer culture and potentially to new ways of organizing people and infrastructure. Big Data is an excellent, readable overview of how data has always been used to guide policy, how big data is being used today, what the value chain of the data industry looks like, what the risks of big data are and how big data can enhance the future. It's easy to read and illuminating.
M**H
Interesting and Engaging, but flawed by repetition of unsupported assertions & wacky theories; lacks any "how-to" guidance
"Big Data: A Revolution..." was often engaging and included some interesting examples, but it was a disappointment. As others mention, the authors use repetition instead of evidence or proof, and ultimately I was not convinced by many of their claims.

I encountered two huge issues in the text. First, the authors repeatedly argue that it's OK if Big Data contains "messy" data, because they assert that when "n=all" the statistical rules about sampling don't apply. This argument fails in two ways. First, if n=all but the data contains "messy" (erroneous) data points in critical places, then it will be misleading and perhaps even completely wrong. Second, when we use past data where "n=all" to project future events, it's no longer true that "n=all." Instead, we have data for "n=all(where(time=past))" and we're using that data to try to predict events in a completely separate data set ("time=future"), and it's entirely possible that there are critical differences demarcated by "time=now."

The second huge issue, for me, was the authors' focus on the idea that Big Data brings with it a huge risk that we will use data to predict future behavior, and that we will then use those predictions to punish people for acts they have not committed (e.g., the "Minority Report" problem). They distort this argument in two ways: first, by assuming that society would actually do this, and second, by asserting that any action taken based on these predictions (such as increasing scrutiny or assigning social workers to visit at-risk juveniles) is "punishment."

I was also skeptical of the authors' general reverence of, and deference to, data scientists as professionals and experts. 
The authors believe that it's plausible to expect a new profession of internal and external "algorithmists" to arise, to protect consumers' privacy interests and society's interests against potential abuses by Big Data users.

The book also failed to provide real-world "how-to" examples, instead providing only "end result" examples and conclusions that often seem incomplete and sometimes implausible. The authors' many examples of useful information extracted from Big Data doubtless represent the end-point of many, many explorations of Big Data; they probably also represent only the subset of correlations that remained after many misleading correlations were removed.

Finally, note that the book's lengthy end notes, bibliography, and index represent a full one-third of the book's length.

There's a lot of useful information in this book, especially for someone just trying to learn about the concept of Big Data. But there's also a lot of hype, and a lot of repetition of ideas without meaningful factual support.