Exploring Data in Engineering, the Sciences, and Medicine
Two recent and ongoing developments have greatly increased both the range of opportunities for exploratory data analysis and the variety of tools that support it. The first is the dramatic rise in the number of datasets freely available on the Internet; the second is the similarly dramatic evolution of the Open Source software movement, which has made powerful analysis packages like R freely available as well. The objective of this book is to provide a reasonably thorough introduction to a useful subset of these analysis tools, illustrating what they are, what they do, and when and how they sometimes fail or do something very different from what we expect. Specific topics covered include descriptive characterizations like summary statistics (mean, median, standard deviation, MAD scale estimate, etc.), graphical techniques like boxplots and nonparametric density estimates, various forms of regression modeling (standard linear regression models, logistic regression, and highly robust techniques like least trimmed squares), and the recognition and treatment of important data anomalies like outliers and missing data.
The book also introduces a variety of dynamic data analysis tools, including autocorrelation analysis, parametric and nonparametric spectrum estimation, and the use of nonlinear data cleaning filters to improve dynamic characterization results. It assumes familiarity with calculus and linear algebra but no prior exposure to probability or statistics. Both simulation-based and real data examples are included. The book is intended either as an introductory textbook for an exploratory data analysis course, like the ones the author taught at the ETH using some of this material, or for self-study. Exercises are included at the end of each chapter, and both R code and datasets are available through the associated OUP website.