Introduction to Data Mining
D**N
Good overview, but needs to include real-world case studies
Data mining could be considered to be "Artificial Intelligence Lite", since it deals with many of the same issues in learning, classification, and analysis as they occur in the field of artificial intelligence but does not have as its goal the construction of "thinking machines." Instead, the emphasis is on practical problems that are important in business and industry, even though the solutions of many of these problems make use of techniques that a thinking machine should be expected to have. Data mining has become an enormous industry, and has even been the subject of political and legal concerns due to the efforts of some governments to mine data on their citizens. This book gives a general overview of data mining with emphasis on classification and association analysis. Anyone who is interested in data mining could read the book, but some rather sophisticated background in mathematics will be needed to read some of the sections. Pseudocode is given throughout the book to illustrate the different data mining algorithms. There are also exercises at the end of each chapter, but noticeably missing from the book are real case studies in data mining. Including such case studies would alert the reader to the fact that data mining is of great interest from the standpoint of business and industry, and would lessen the belief that data mining is just another academic field or just another branch of statistics.
Speaking somewhat loosely, the goal of data mining is to find interesting patterns in massive amounts of data, or to classify such patterns. This of course requires a notion of what is "interesting," and one of the main problems in data mining is to find suitable 'interestingness measures'. And since one is typically dealing with large amounts of data, one must use various statistical sampling and preprocessing techniques to massage the data and obtain a 'representative' sample of the original data. In addition, one must be able to handle data that is 'anomalous', i.e. data that has characteristics that are markedly different from most of the other data, or that has attributes that are unusual compared with typical values for those attributes. These issues and techniques are discussed in detail in the first three chapters of the book, where the authors outline some of the bread-and-butter topics needed for effective manipulation of data.
The real substance and power of data mining comes from its role in classification and in discovering interesting patterns in huge data sets. The authors, in chapters 4 - 7, discuss various powerful techniques for data classification and association analysis. Association analysis in particular has been used quite extensively in recent years, due to the use of market basket transactions in on-line purchasing and the goal of marketers to learn the purchasing behavior of their customers. Association analysis uncovers relationships in the marketing data in the form of 'association rules'. For disjoint itemsets X and Y, an association rule is an implication expression between these itemsets that has a certain 'strength', measured by its 'support' and 'confidence.' The support measures how often a rule is applicable to a given data set, while the confidence measures how frequently the items in Y appear in transactions that contain X. A rule with low support may hold by chance alone, while the confidence measures the reliability of the inference made by the rule. The collection of all association rules that can be formed from a data set is too large to be practical, and so strategies must be developed to prune the number of rules. The authors discuss in detail various methods for dealing with this computational drawback, such as 'frequent itemset generation' and 'rule generation.'
The detection of anomalies consists of the identification of 'outliers', which as the name implies are data objects that lie "far away" from the other data objects. It remains of course to quantify what it means to be "far away," and for this reason this branch of data mining, as the authors point out, is sometimes called 'deviation detection' or 'exception mining'. The omission of outliers is sometimes justified, since they are merely artifacts that only serve to alter the statistics of a particular data set. However, sometimes their presence signals important information, if not a major scientific discovery. Data mining therefore must contain tools that detect anomalies intelligently and efficiently. The authors discuss anomaly detection in fair detail, emphasizing the statistical techniques that are available to do it. They classify the techniques for anomaly detection as being 'unsupervised', 'supervised', and 'semi-supervised'. As the name implies, supervised anomaly detection requires the existence of a training set with both anomalous and "normal" data, with each class being labeled as such. When these labels are unavailable, one has to perform unsupervised anomaly detection, and for this approach to work the anomalies must be distinct from one another. If the normal data is labeled but the anomalies are not, one must do semi-supervised anomaly detection. The only weakness in the authors' discussion is that they do not include real-world case studies that illustrate the different techniques, such as clustering and density methods.
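To make the support and confidence measures described in this review concrete, here is a minimal Python sketch. It is not taken from the book: the function names and the toy market-basket transactions are assumptions invented for illustration, and the rule checked is X = {milk, diapers}, Y = {beer}.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of all transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Among transactions that contain X, the fraction that also contain Y."""
    x_items, y_items = frozenset(x), frozenset(y)
    covered = [t for t in transactions if x_items <= t]
    if not covered:
        return 0.0  # the rule never applies, so report zero confidence
    return sum(1 for t in covered if y_items <= t) / len(covered)


# Hypothetical toy data, made up for this example.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]

x, y = {"milk", "diapers"}, {"beer"}
rule_support = support(x | y, transactions)       # support of the rule X -> Y
rule_confidence = confidence(x, y, transactions)  # confidence of X -> Y

print(f"support    = {rule_support:.2f}")     # 0.40
print(f"confidence = {rule_confidence:.2f}")  # 0.67
```

Two of the five baskets contain milk, diapers, and beer together (support 2/5), and of the three baskets containing milk and diapers, two also contain beer (confidence 2/3). Pruning strategies such as frequent itemset generation exist because evaluating every possible X and Y this way does not scale.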
L**Z
Book Pages are YELLOW
I mean at least I didn’t have to pay $170 so I guess you get what you pay for 😂
S**T
A Reasonable Academic Approach to DM
We used this book in a class which was my first academic introduction to data mining.
The book's strengths are that it does a good job covering the field as it was around the 2008-2009 timeframe. Included are discussions of exploring data, classification, association analysis, cluster analysis, and anomaly detection. Additional bonus appendices cover some elements of linear algebra, dimensionality reduction, probability and statistics, regression analysis, and optimization, in case those concepts are fuzzy for the student. They're by no means thorough enough to learn the topic, merely to remind the reader of salient points they should remember.
I liked the structure of the book, with each analysis topic being divided into a basic concepts and algorithms chapter, followed by an additional issues and algorithms chapter.
I liked that when algorithms were presented, they were presented as pseudocode rather than in any particular language.
What I did not like is that separating the concepts from their applications created a bit too much distance for those wanting to apply these concepts. In our class, we were using a tool called Weka, which provides reference implementations of various data mining algorithms in Java, and sometimes it was difficult to tell what we should learn from the results of our experiments. The book did not discuss this very deeply, and certainly not against the types of results that we were getting from our application.
During the course, because I knew we would be relying on Weka, I purchased a copy of ISBN-10: 0123748569 http://www.amazon.com/Data-Mining-Practical-Techniques-Management/dp/0123748569/ref=pd_bxgy_b_text_b, which was written by the group that maintains Weka. I found their book to be helpful while I ran the Weka tool, and I was able to use it to develop command line use of the tool and solve some memory management problems. This book also covers much the same ground, although from a bit more practical perspective.
Later, because I'm interested in data mining in a large database environment, I purchased ISBN-10: 0123814790 http://www.amazon.com/Data-Mining-Concepts-Techniques-Management/dp/0123814790/ref=pd_bxgy_b_text_c, which is much more focused on the "how" of data mining, including the use of data cubes and what is needed to process them with data mining algorithms.
I have no real complaints about Tan's book; I only wish it had slightly more thorough explanations of what one should learn from results, since data mining is certainly an iterative process. If you're interested in Weka, I recommend the Witten book, and if you're new to data modeling as well, I recommend the Han book.
C**Z
Look elsewhere, this book is simply too old.
So I've only read the first chapter, and I have to say, so far, I am not impressed. As others have said, the quality of the book itself is cheap: extremely thin pages, poor printing, and no color. The lack of color is especially disappointing as there are many graphics that would have benefited from it. I purchased a used, hardcover copy. I did receive it in time, and it is in near new condition...well, as new as it can be for such an old book. Ultimately though, what's so unbelievable to me is the fact that this book is 15 years old! Surely data mining has evolved since the writing of this book. Unfortunately, it's a textbook for a course I'm enrolled in. Bad on my professor for selecting this, but that's on her and/or possibly the school too. This is a graduate level course and I'm having difficulty understanding why this book was chosen. I definitely plan on bringing this up to my professor. If I were buying a book to explore this subject, I would not be buying this one. It's simply too old. Technology just changes too fast, and for a 15 year old book, I can ONLY see it covering a rudimentary introduction to the subject. That does not require a whole book, and I'm sure the majority of its contents is so dated that it's no longer applicable. Find something else.
B**F
A good overview
A good general summary of various areas of data mining, with a few sections that give the opportunity to go more in depth. It gives more basics than specifics, so while it may be something a person would keep around to reference, there may be more comprehensive guides people would go with.
It was an enjoyable read; neither overly dense nor too simplified. And it comes in paperback so weaklings like me don't strain to lift it.
M**Y
Good textbook for data mining!
It's a great book for data mining - lots of pretty examples covering clustering, classification, and some other concepts too. There are some exercises after each chapter, but the answers are not included within this book. My course didn't go into too much detail about the topics covered in this book, but it was still a nice alternative to reading my university's lecture slides (which tend to lack some information). You will need to find a free PDF document online somewhere with the answers in it (not hard, just google it).
E**D
Book's content great, Structure just awful!
Great content, really! But the organization of information is the worst I've ever seen in a book. Data mining is a lot about structuring data before you process it. The authors miss this point in writing a book: There is only a one-page table of contents for ~713 pages of complex knowledge. No page numbers are given when referring to other sections of the book. The funniest part is the index: It is made automatically by some stupid algorithm, and the reader has to guess which of the often ~30 different sections listed per keyword is the right one :D
L**L
Five Stars
Good price and prompt delivery
P**N
This is a better book for machine learning than for data mining
Received well packed, professional, and as described. This is a better book for machine learning than for data mining.
A**I
Solid on mathematical concepts - But not for beginners
Tan, Steinbach and Kumar have authored a very good book on the elements of data mining (data science). If you have a degree in mathematics and are comfortable with computational aspects, with a curious mind for data mining, then this book is for you! The authors take a deep dive and seamlessly merge the concepts from linear algebra, calculus and matrix operations with computational aspects such as databases, data handling, and logical processing to present a very comprehensive foundation of data science methods. You’ll be particularly impressed with the depth of coverage with respect to SVMs, non-classical methods such as ensemble techniques, and addressing the class imbalance problem. The authors go into detail on (seldom used) sequence analysis, infrequent pattern analysis, BIRCH and OPOSSUM variations in clustering, and subgraph analysis methods. Of particular interest is the chapter on outlier analysis (anomaly detection), which is handled very well. Some readers will miss the treatment of genetic algorithms, which are only mentioned in passing.
However, a glaring omission is the treatment of regression concepts; perhaps the authors feel that regression has been (and is) so widely covered in other books that they’ve focused on the more esoteric aspects. But to truly appreciate this book one is expected to have a strong foundation, and a high level of comfort, in advanced mathematics. Overall it’s a very good book.