Book Review: Pentaho for Big Data Analytics (November 2013)


Book review by: David Fombella Pombal (twitter: @pentaho_fan)

Book Title: Pentaho for BIg Data Analytics

Authors: Manoj R Patil, Feris Thia

Paperback: 118 pages

I would like to suggest this book if you want to get started with Pentaho Open Source BI tool together with Hadoop and Big Data.

Target Audience
If you are  a Data Scientist, a Hadoop programmer, a Big Data enthusiast, or a developer working in the Business Intelligence domain who is aware of Hadoop or the Pentaho tools and want to try out creating a solution in the Big Data space, this is your manual.

Rating: 7 out of 10

Chapter 1, The Rise of  Pentaho Analytics along with Big Data

This chapter serves as a brief summary of the Pentaho tools and its history around Business Intelligence field, weaving in stories on the rise of Big Data.

Pentaho Tools:

Server Applications

  • Business Analytics (BA) Server: Java-based BI system with a report management system and lightweight process-flow engine, HTML5-based web interface. In Community Edition , there is another substitute application called Business Intelligence (BI) Server


  • Data Integration (DI) Server: Enterprise version only server for the ETL processes and Data Integration

Thin Client Tools

  • Pentaho Interactive Reporting: WYSIWYG type of design interface used to construct simple and adhoc reports on the fly without the need of having IT or programming skills. There are several CE alternatives as WAQR (Web Ad-Hoc Query Reporting) and Saiku Reporting.

PIRPentaho Interactive Reporting (EE)

saikurepSaiku Reporting (CE)

WAQRjpgWeb Ad Hoc Query Reporting

  • Pentaho Analyzer: An advanced OLAP viewer with support for drag-and-drop. It is an EE intuitive analytical visualization tool with the capability  to filter and drill down into data, stored in a Mondrian (Pentaho ROLAP engine) data source.

analyzer_territoryPentaho Analyzer

  • Pentaho Dashboard Designer (EE): Commercial plugin that allows users to create dashboards with an easy graphical interface

Design Tools

  • Schema Workbench: Graphical tool for creating ROLAP schemas for Pentaho Analysis (Mondrian).
  • Aggregation Designer: Generate pre-calculated tales  to improve the performance of Mondrian OLAP schemas with this tool.
  • Design Studio: An eclipse-based application and plugin, that eases the creation of business process flows with a special XML script to define action sequences xactions.
  • Report Designer: A banded report designing tool with a great GUI, useful to create sub-reports, charts and graphs.
  • Data Integration:  This wonderful ETL tool is also known as Kettle, and is composed by an ETL engine and GUI  that allows the user to design ETL jobs and transformations.
  • Metadata Editor: This tool is used to create business models and acts as an abstraction layer from the underlying physical database.


chp1Pentaho BI Suite components

Chapter 2, Setting Up the ground

In this topic we will install Pentaho BI Suite CE and Saiku OLAP plugin from Marketplace. Besides, in the chapter we learn how to administer data sources using Pentaho User Console and Pentaho Administration Console.

chp2 marketplaceMarketplace plugin

Chapter 3, Churning Big Data with Pentaho

This chapter provides a basic understanding of the Big Data ecosystem and an example to analyze data sitting on the Hadoop framework using Pentaho. At the end of this chapter, you will learn how to translate diverse data sets into meaningful data sets using Hadoop/Hive.
This chapter covers the following subjects:
• Overview of Big Data and Hadoop
• Hadoop architecture
• Big Data capabilities of Pentaho Data Integration (PDI)  Kettle
• Working with PDI and Hortonworks Data Platform, a Hadoop distribution
• Loading data from Hadoop Distributed File System (HDFS) to Hive using PDI

Hadoop ecosystemThe Hadoop ecosystem

HDFS to hive transformationHDFS to Hive transformation

Chapter 4, Pentaho Business Analytics Tools

This topics gives a quick summary of the business analytics life cycle. We will look at several applications such as Pentaho Action Sequence and Pentaho Report Designer, as well as the Community Dashboard Editor (CDE), Community Data Access (CDA) and Community Dashboard Framework (CDF) plugins and their configuration, in order to get in touch with them.


Hive Java queryHive Java query using User Defined Java Class Step

Chapter 5, Visualization of Big Data

This chapter provides a basic understanding of visualizations and examples to analyze the patterns using various charts based on Hive data. This chapter shows us  how to create an interactive analytical dashboard that gets data from Hive. Summarizing this chapter covers the following themes:
• Evolution of data visualization and its classification
• Data source preparation
• Consumption of HDFS-based data through HiveQL
• Creation of several types of charts
• Making charts more attractive using styling

hive query chp5Hive query

DashboardStock Price Analysis Dashboard

Appendix A, Big Data Sets

Talks about data preparation with one sample from stock exchange data.

Appendix B, Hadoop Setup

Takes you through the installation and configuration of the third-party Hadoop distribution, Hortonworks Sandbox, which is used throughout the book .




Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s