Hello friends, today I am going to review Pentaho Data Integration Beginner’s Guide – Second Edition.
Below you can check the link to purchase the book:
Book Title: Pentaho Data Integration Beginner’s Guide – Second Edition
Author: María Carina Roldán
Paperback: 502 pages
I recommend this book because, if you are new to Pentaho Data Integration, you will learn a lot about this cool tool. If you are already advanced with PDI, you can use it as a reference guide.
This book is an excellent starting point for database administrators, data warehouse developers, or anyone who is responsible for ETL and data warehouse projects and needs to load data into them.
Rating: 9 out of 10
Although this book targets PDI 4.4.0 CE, some new features of PDI 5.0.1 CE are listed in an appendix of the book.
Chapter 1 – Getting Started with Pentaho Data Integration
In this chapter you learn what Pentaho Data Integration is and install the software required to start using the PDI graphical designer. As an additional task, the MySQL DBMS server is installed.
Chapter 2 – Getting Started with Transformations
This chapter introduces the basic terminology of PDI and gives an introduction to handling runtime errors. We will also learn the simplest ways of transforming data.
Chapter 3 – Manipulating Real-World Data
Here we will learn how to get data from different sorts of files (csv, txt, xml …) using PDI. We will also send data from Kettle to plain files.
Chapter 4 – Filtering, Searching, and Performing Other Useful Operations with Data
This chapter explains how to sort and filter data, group data by different criteria, and look up data outside the main stream of data. Some data cleansing tasks are also performed in this chapter.
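To give a rough idea of what these operations mean, here is a minimal Python sketch (not taken from the book, and not how PDI implements them). The sample rows, the `countries` lookup table, and the `summarize` function are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows and lookup table, invented for this sketch.
sales = [
    {"city": "Lima", "amount": 120},
    {"city": "Quito", "amount": 80},
    {"city": "Lima", "amount": 40},
    {"city": "Oslo", "amount": 15},
]
countries = {"Lima": "Peru", "Quito": "Ecuador"}


def summarize(rows, lookup, min_amount):
    """Filter rows, look up a country per row, group by it, and sort the totals."""
    # Filter: keep only rows at or above the threshold.
    kept = [row for row in rows if row["amount"] >= min_amount]

    # Lookup + group: fetch the country from outside the main stream
    # and add up the amounts per country.
    totals = defaultdict(int)
    for row in kept:
        country = lookup.get(row["city"], "unknown")
        totals[country] += row["amount"]

    # Sort: largest total first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


result = summarize(sales, countries, min_amount=20)
```

In PDI each of these operations would be a separate step on the canvas; the sketch only shows the logic behind them.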
Chapter 5 – Controlling the Flow of Data
In this chapter, a very important one for ETL developers, we will learn how to control the flow of data. In particular, we will cover the following topics: copying and distributing rows, splitting streams based on conditions, and merging streams of data.
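The difference between copying and distributing rows is easy to mix up, so here is a small Python sketch of the idea. It is only an illustration under my own assumptions: the function names are hypothetical, and distribution is modeled as a simple round-robin.

```python
from itertools import cycle


def copy_rows(rows, n_targets):
    """Copying: every target stream receives all the rows."""
    return [list(rows) for _ in range(n_targets)]


def distribute_rows(rows, n_targets):
    """Distributing: rows are handed out to the targets in turn (round-robin)."""
    targets = [[] for _ in range(n_targets)]
    for row, target in zip(rows, cycle(targets)):
        target.append(row)
    return targets


def split_rows(rows, condition):
    """Splitting: rows go to one of two streams depending on a condition."""
    matched, unmatched = [], []
    for row in rows:
        (matched if condition(row) else unmatched).append(row)
    return matched, unmatched


# Hypothetical sample rows.
rows = [{"id": i, "amount": i * 10} for i in range(1, 5)]

copied = copy_rows(rows, 2)
distributed = distribute_rows(rows, 2)
big, small = split_rows(rows, lambda row: row["amount"] > 20)
```

Merging is the reverse of splitting: in this sketch, `big + small` would bring the two streams back together, although not in the original row order.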
Chapter 6 – Transforming Your Data by Coding
Chapter 7 – Transforming the Rowset
This chapter is dedicated to learning how to convert rows to columns (denormalizing) and columns to rows (normalizing). Furthermore, you will be introduced to a very important topic in data warehousing: time dimensions.
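If normalizing and denormalizing sound abstract, this short Python sketch may help. It is my own illustration, not code from the book; the field names (`product`, `q1`, `q2`, `measure`, `value`) are invented for the example.

```python
def normalize(rows, key, value_fields):
    """Columns to rows: emit one output row per (key, field) pair."""
    out = []
    for row in rows:
        for field in value_fields:
            out.append({key: row[key], "measure": field, "value": row[field]})
    return out


def denormalize(rows, key, name_field="measure", value_field="value"):
    """Rows to columns: gather all rows sharing a key into one wide row."""
    wide = {}
    for row in rows:
        target = wide.setdefault(row[key], {key: row[key]})
        target[row[name_field]] = row[value_field]
    return list(wide.values())


# Hypothetical sample data: one row per product, one column per quarter.
wide_rows = [
    {"product": "A", "q1": 10, "q2": 12},
    {"product": "B", "q1": 7, "q2": 9},
]

long_rows = normalize(wide_rows, "product", ["q1", "q2"])
round_trip = denormalize(long_rows, "product")
```

Normalizing the two wide rows gives four narrow rows, and denormalizing those brings back the original two.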
Chapter 8 – Working with Databases
This is the first of two chapters fully dedicated to working with databases. We will learn how to connect to a database, preview and get data from it, and insert, update, and delete data.
Chapter 9 – Performing Advanced Operations with Databases
This chapter explains several advanced database operations, such as doing simple and complex lookups in a database. An introduction to dimensional modeling and loading dimensions is also included.
Chapter 10 – Creating Basic Task Flows
So far, we have been working with data (running transformations). A PDI transformation does not run in isolation and is usually embedded in a bigger process. Such processes, like generating a daily report and transferring it to a shared repository, or updating a data warehouse and sending a notification by email, can be implemented with PDI jobs. In this chapter we will be introduced to jobs, executing tasks upon conditions, and working with arguments and named parameters.
Chapter 11 – Creating Advanced Transformations and Jobs
This chapter is about learning techniques for creating complex transformations and jobs (creating subtransformations, implementing process flows, nesting jobs, iterating the execution of jobs and transformations …).
Chapter 12 – Developing and Implementing a Simple Datamart
This chapter covers the following: an introduction to a sales datamart based on a provided database, loading the dimensions and fact table of the sales datamart, and automating what has been done.
Appendix A- Working With Repositories
PDI allows us to store our transformations and jobs under 2 different configurations: file-based and database repository. Throughout this book we have used the file-based option; however, the database repository is convenient in some situations.
Appendix B- Pan and Kitchen – Launching Transformations and Jobs from the Command Line
Although we have used Spoon to run jobs and transformations, you can also run them from a terminal window. Pan is a command-line program that lets you launch the transformations designed in Spoon, either from .ktr files or from a repository. The counterpart to Pan is Kitchen, which allows you to run jobs from .kjb files or from a repository.
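As a rough sketch of what such a command line looks like, the Python snippet below only builds the argument lists for Pan and Kitchen and does not run anything. It assumes a Linux install where the launcher scripts are `pan.sh` and `kitchen.sh`, and it uses the `-file` and `-level` options; the file paths and the `build_command` helper are hypothetical.

```python
def build_command(script, file_path, level="Basic"):
    """Build the argument list for a Pan or Kitchen run from a file.

    script:    launcher script, e.g. "./pan.sh" or "./kitchen.sh" (assumed names)
    file_path: path to the .ktr or .kjb file (hypothetical paths below)
    level:     logging level passed through the -level option
    """
    return [script, f"-file={file_path}", f"-level={level}"]


# Pan runs transformations (.ktr), Kitchen runs jobs (.kjb).
pan_cmd = build_command("./pan.sh", "/path/to/transformation.ktr")
kitchen_cmd = build_command("./kitchen.sh", "/path/to/job.kjb", level="Detailed")

# Printing the commands shows what you would type in a terminal.
print(" ".join(pan_cmd))
print(" ".join(kitchen_cmd))
```

To actually launch a run from Python, you could pass one of these lists to `subprocess.run` from the PDI installation folder.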
Appendix C- Quick Reference – Steps and Job Entries
This appendix summarizes the purpose of the steps and job entries used in the labs throughout the book.
Appendix D- Spoon Shortcuts
This very useful appendix includes tables summarizing the main Spoon shortcuts.
Appendix E- Introducing PDI 5 features
New PDI 5 features (PDI 5 is now available).