Data Science: Kelleher & Tierney - Review

Paul Clough, Data Scientist at Peak Indicators, reviews a new data science book by John Kelleher and Brendan Tierney, published as part of the MIT Press Essential Knowledge Series: an ideal primer for anyone wanting to find out what data science is all about.

I continually review books and articles around the topic of data science to help keep abreast of the field and discover new training materials.

In 2018 a new book by John Kelleher and Brendan Tierney from the Dublin Institute of Technology, entitled “Data Science”, was released as part of the MIT Press Essential Knowledge Series. The book describes itself as “A concise introduction to the emerging field of data science, explaining its evolution, relation to machine learning, current uses, data infrastructure issues, and ethical challenges”, and it really does manage to cover many of the concepts associated with the field.

Data science is often defined as a process or set of principles that guides the extraction of knowledge from data, e.g. to assist with decision making or the understanding of (often large) volumes of data.

This is similar to the goal of existing disciplines, such as statistical analysis and data mining; however, the differences arise from the scale and diversity of the data being analysed, and from bringing together a number of existing disciplines that have not necessarily been combined before.

Data Management & Data Governance

Data science could also be taken to include areas such as data visualisation, data management and governance, data and societal issues (e.g., privacy and ethics) and big data infrastructures and analytics.

Kelleher & Tierney’s book comprises seven chapters. These introduce and define data and data science, describe the data science lifecycle and typical ecosystem, and summarise the basic machine learning algorithms (such as regression, k nearest neighbours and neural networks) that drive areas including predictive analytics, anomaly detection and market segmentation.

The methods are presented in a very readable form with clear examples from business and society. The book is also exemplary in providing a chapter on privacy and ethics. These topics form an important and increasingly vital consideration when utilising and deploying data science in practice.

“Every successful data science project begins by clearly defining the problem that the project will help solve.”

An Excellent Overview Of A Complex Field

Finally, the book concludes with some examples of future trends (e.g. medical data science and smart cities) and reflections on the principles of success, such as obtaining buy-in from senior management for data science projects and the need to continually update and maintain machine learning models.

Overall, the book covers a lot of ground in a very readable manner, providing an excellent overview of a complex and emerging field. What’s more, the book is small (although it packs in a lot of content) and reasonably priced (around £11.95), making it an affordable and accessible way into learning about data science. I highly recommend it!

If you found this interesting, Peak Indicators will be exhibiting at the forthcoming HRD Summit and The Gartner Data & Analytics Conferences this year, with a focus on how companies can start to capitalize quickly on the benefits of machine learning by adopting its Tallinn toolkit and methodology, without the need for an army of data scientists.

If you can't make the HRD Summit and The Gartner Data & Analytics Conferences this year, contact us for a consultation and discover the benefits for your company!
