Parallelization jobs in Kettle – Pentaho Data Integration


Reblogged from http://spektom.blogspot.com.es/2014/02/parallelization-monster-framework-for.html

We always end up with ROFL in our team, when trying to find a name for strange looking ETL processes diagrams. This monster has no name yet:

Parallel kettle job

This is a parallelization framework for Pentaho Kettle 4.x. As you probably know in the upcoming version of Kettle (5.0) there’s native ability to launch job entries in parallel, but we haven’t got there yet.

In order to run a job in parallel, you have to call this abstract job, and provide it with 3 parameters:

  • Path to your job (which is supposed to run in parallel).
  • Number of threads (concurrency level).
  • Optional flag that says whether to wait for completion of all jobs or not.
Regarding the number of threads, as you can see the framework supports up to 8 threads, but it can be easily extended.
How this stuff works. “Thread #N” transformations are executed in parallel on all rows copies. Rows are split then, and filtered in these transformations by the given number of threads, so only a relevant portion of rows is passed to the needed job (Job – Thread #N). For example, if the original row set was:
           [“Apple”, “Banana”, “Orange”, “Lemon”, “Cucumber”]
and the concurrency level was 2, then the first job (Job – Thread #1) will get the [“Apple”, “Banana”, “Orange”] and the second job will get the rest: [“Lemon”, “Cucumber”]. All the other jobs will get an empty row set.
Finally, there’s a flag which tells whether we should wait until all jobs are completed.
I hope one will find attached transformations useful. And if not, at least help me find a name for the ETL diagram. Fish, maybe? 🙂
Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s