Book review by: David Fombella Pombal (twitter: @pentaho_fan)
Book Title: Pentaho for Big Data Analytics
Authors: Manoj R Patil, Feris Thia
Paperback: 118 pages
I recommend this book if you want to get started with the Pentaho open source BI tools together with Hadoop and Big Data.
If you are a Data Scientist, a Hadoop programmer, a Big Data enthusiast, or a developer working in the Business Intelligence domain who knows Hadoop or the Pentaho tools and wants to try building a solution in the Big Data space, this is your manual.
Rating: 7 out of 10
Chapter 1, The Rise of Pentaho Analytics along with Big Data
This chapter is a brief summary of the Pentaho tools and their history in the Business Intelligence field, weaving in stories about the rise of Big Data.
- Business Analytics (BA) Server: Java-based BI system with a report management system, a lightweight process-flow engine, and an HTML5-based web interface. The Community Edition ships an equivalent application called Business Intelligence (BI) Server.
- Data Integration (DI) Server: Enterprise Edition only server for ETL processes and data integration.
Thin Client Tools
- Pentaho Interactive Reporting: WYSIWYG design interface for building simple ad hoc reports on the fly without IT or programming skills. CE alternatives include WAQR (Web Ad Hoc Query Reporting) and Saiku Reporting.
Pentaho Interactive Reporting (EE)
Saiku Reporting (CE)
Web Ad Hoc Query Reporting
- Pentaho Analyzer: An advanced OLAP viewer with drag-and-drop support. It is an intuitive EE analytical visualization tool that can filter and drill down into data stored in a Mondrian (Pentaho ROLAP engine) data source.
- Pentaho Dashboard Designer (EE): Commercial plugin that allows users to create dashboards with an easy graphical interface
- Schema Workbench: Graphical tool for creating ROLAP schemas for Pentaho Analysis (Mondrian).
- Aggregation Designer: Generates pre-calculated tables to improve the performance of Mondrian OLAP schemas.
- Design Studio: An Eclipse-based application and plugin that eases the creation of business process flows, using a special XML script to define action sequences (xactions).
- Report Designer: A banded report designing tool with a great GUI, useful to create sub-reports, charts and graphs.
- Data Integration: This wonderful ETL tool, also known as Kettle, is composed of an ETL engine and a GUI that lets the user design ETL jobs and transformations.
- Metadata Editor: This tool is used to create business models and acts as an abstraction layer from the underlying physical database.
Pentaho BI Suite components
Chapter 2, Setting Up the Ground
In this chapter we install Pentaho BI Suite CE and the Saiku OLAP plugin from the Marketplace. The chapter also shows how to administer data sources using the Pentaho User Console and the Pentaho Administration Console.
Chapter 3, Churning Big Data with Pentaho
This chapter provides a basic understanding of the Big Data ecosystem and an example to analyze data sitting on the Hadoop framework using Pentaho. At the end of this chapter, you will learn how to translate diverse data sets into meaningful data sets using Hadoop/Hive.
This chapter covers the following subjects:
• Overview of Big Data and Hadoop
• Hadoop architecture
• Big Data capabilities of Pentaho Data Integration (PDI) Kettle
• Working with PDI and Hortonworks Data Platform, a Hadoop distribution
• Loading data from Hadoop Distributed File System (HDFS) to Hive using PDI
The Hadoop ecosystem
HDFS to Hive transformation
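To give a flavor of what an HDFS to Hive load amounts to, here is a minimal Python sketch that only builds the two HiveQL statements such a load typically needs: one to define the table, one to move the file into it. The table name, column names and HDFS path are hypothetical placeholders that I made up for illustration; the book itself carries out this step through PDI rather than hand-written statements.

```python
def build_hive_load_statements(table, columns, hdfs_path, delimiter=","):
    """Return (create_sql, load_sql) HiveQL strings for a delimited text file in HDFS.

    `columns` is a list of (name, hive_type) pairs.
    """
    column_defs = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    create_sql = (
        f"CREATE TABLE IF NOT EXISTS {table} ({column_defs}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        "STORED AS TEXTFILE"
    )
    # LOAD DATA INPATH moves the file from its HDFS location into the table.
    load_sql = f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table}"
    return create_sql, load_sql


# Hypothetical table layout and path, not taken from the book.
create_sql, load_sql = build_hive_load_statements(
    "stock_prices",
    [("symbol", "STRING"), ("trade_date", "STRING"), ("close_price", "DOUBLE")],
    "/user/demo/stock_prices.csv",
)
print(create_sql)
print(load_sql)
```

The statements are plain strings, so they can be pasted into a Hive shell or passed to whichever client you use to talk to Hive.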
Chapter 4, Pentaho Business Analytics Tools
This chapter gives a quick summary of the business analytics life cycle. It looks at several applications, such as Pentaho Action Sequence and Pentaho Report Designer, as well as the Community Dashboard Editor (CDE), Community Data Access (CDA) and Community Dashboard Framework (CDF) plugins and their configuration, so that the reader gets a first hands-on contact with them.
Hive Java query using User Defined Java Class Step
Chapter 5, Visualization of Big Data
This chapter provides a basic understanding of visualizations, with examples that analyze patterns using various charts based on Hive data. It shows how to create an interactive analytical dashboard that gets its data from Hive. In summary, this chapter covers the following themes:
• Evolution of data visualization and its classification
• Data source preparation
• Consumption of HDFS-based data through HiveQL
• Creation of several types of charts
• Making charts more attractive using styling
Stock Price Analysis Dashboard
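Dashboard charts like these usually consume aggregated results rather than raw rows. As a rough sketch of the kind of aggregation a HiveQL GROUP BY query would perform before the data reaches a chart, the Python below averages closing prices per symbol and month over a few in-memory rows. The rows, symbols and column layout are made-up illustration values, not data from the book.

```python
from collections import defaultdict

# Made-up sample rows: (symbol, month, closing price).
rows = [
    ("AAA", "2013-01", 10.0),
    ("AAA", "2013-01", 12.0),
    ("AAA", "2013-02", 11.0),
    ("BBB", "2013-01", 20.0),
]


def average_close_by_month(rows):
    """Average the closing price per (symbol, month).

    Similar in spirit to a HiveQL query of the form
    SELECT symbol, month, AVG(close_price) ... GROUP BY symbol, month
    (hypothetical column names).
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for symbol, month, close in rows:
        bucket = totals[(symbol, month)]
        bucket[0] += close
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(totals.items())}


averages = average_close_by_month(rows)
for (symbol, month), avg in averages.items():
    print(symbol, month, avg)
```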
Appendix A, Big Data Sets
Covers data preparation, using a sample of stock exchange data.
Appendix B, Hadoop Setup
Takes you through the installation and configuration of the third-party Hadoop distribution, Hortonworks Sandbox, which is used throughout the book.