“As a System Administrator, I need to find a scheduling solution for our Pentaho Data Integration Jobs.”
Reblog from http://opendevelopmentnotes.blogspot.com/2014/09/pentaho-data-integration-scheduling.html
Scheduling is a crucial task in all ETL and Data Integration processes. The scheduling options available in the community edition of Pentaho Data Integration (Kettle) basically rely on the operating system's capabilities (Cron on Linux, Task Scheduler on Windows), but there is at least one other free, open source and solid alternative for job scheduling: Jenkins.
Jenkins is a Continuous Integration tool, a de facto standard in Java projects, and it is so extensible and easy to use that it does a perfect job of scheduling Jobs and Transformations developed in Kettle.
So let's start building a (probably) production-ready scheduling solution.
OS: Oracle Linux 6
Before starting Jenkins, verify that Java is installed by running:
#java -version
and if it is not found on your system, just install it with:
#sudo yum install java
Now it’s time to start Jenkins:
#sudo service jenkins start
Open your browser and go to the Jenkins web console, which listens on port 8080 by default (for example http://localhost:8080).
Resolve port conflict
If you are not able to navigate to the web page check the log file:
#sudo cat /var/log/jenkins/jenkins.log
Probably there is a port conflict (in my case I was running another web application on the same machine).
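A quick way to confirm the conflict is to search the log for the JVM's bind error. The function below is a minimal sketch: its name is hypothetical, and the default log path assumes the RPM package layout (/var/log/jenkins/jenkins.log).

```shell
# Hypothetical helper: print the log lines that show a failed port bind.
# Assumes the RPM default log location unless another path is passed.
find_bind_errors() {
  log="${1:-/var/log/jenkins/jenkins.log}"
  # java.net.BindException ("Address already in use") is thrown by the JVM
  # when the requested port is already held by another process.
  # grep -n prefixes each match with its line number and exits 0 only
  # if at least one match was found.
  grep -n 'BindException' "$log"
}
```

If it reports a match, sudo netstat -tlnp | grep ':8080' shows which process is already listening on the default port.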
Look at your config file:
#sudo nano /etc/sysconfig/jenkins
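and change the default ports (JENKINS_PORT, 8080 by default; older packages also define JENKINS_AJP_PORT), then restart Jenkins:
#sudo service jenkins restart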
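If you prefer to script the edit instead of using nano, the sketch below rewrites the JENKINS_PORT line in place. It assumes the RPM-style /etc/sysconfig/jenkins format (VAR="value" lines); the function name is hypothetical, and 8081 in the usage example is only an example of a free port.

```shell
# Hypothetical helper: set_jenkins_port FILE PORT
# Rewrites the JENKINS_PORT="..." line of FILE and keeps FILE.bak as a backup.
set_jenkins_port() {
  config="$1"
  port="$2"
  # Refuse anything that is not a plain number, so it is safe inside the sed expression
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  # GNU sed: -i.bak edits in place and saves the original as FILE.bak
  sed -i.bak "s/^JENKINS_PORT=.*/JENKINS_PORT=\"$port\"/" "$config"
}
```

For example: sudo sh -c '. ./set_jenkins_port.sh; set_jenkins_port /etc/sysconfig/jenkins 8081', assuming the function was saved in set_jenkins_port.sh.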
Now that Jenkins is up and running, it’s time to test a simple Job.
The transformation and job used for the test are self-explanatory.
Go to the Jenkins web console and click on New Item.
Give it a name and select Freestyle project.
Under Build Triggers, check Build periodically and set the schedule to * * * * * (every minute, only to test the job).
Now add an Execute shell step in the Build section containing the Kitchen command, and save the project.
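A possible content for the Execute shell step is sketched below. The install directory /opt/pentaho/data-integration, the job file /opt/etl/jobs/test_job.kjb and the function name run_kettle_job are hypothetical; only Kitchen's -file and -level options come from PDI itself. With the RPM package, the build runs as the jenkins user, so that user needs read access to the .kjb file.

```shell
# Hypothetical paths: adjust to where PDI and your job actually live.
PDI_HOME="${PDI_HOME:-/opt/pentaho/data-integration}"
JOB_FILE="${JOB_FILE:-/opt/etl/jobs/test_job.kjb}"

run_kettle_job() {
  rc=0
  # -file: the job to run; -level: how much log detail Kitchen prints,
  # which ends up in the Jenkins Console Output
  "$PDI_HOME/kitchen.sh" -file="$JOB_FILE" -level=Basic || rc=$?
  # Kitchen exits with 0 on success; any other code is passed on
  # so that Jenkins marks the build as failed
  if [ "$rc" -ne 0 ]; then
    echo "Kitchen failed with exit code $rc" >&2
  fi
  return "$rc"
}
```

In the Execute shell box, the definitions above are followed by a final line containing run_kettle_job. Jenkins marks the build as failed whenever that last command exits with a non-zero status.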
Wait one minute and look at the Build History on the left side of the page: you will find your Job running.
Click the build item and select Console Output to see the main output of Kitchen.
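The same log can also be read outside the browser, for instance from a monitoring script, because Jenkins serves the plain-text console of the last build under /job/NAME/lastBuild/consoleText. The helper below is a hypothetical sketch that only builds that URL. It assumes a job name without spaces or other characters that would need URL encoding.

```shell
# Hypothetical helper: console_text_url JENKINS_URL JOB_NAME
console_text_url() {
  base="${1%/}"   # Jenkins root URL, trailing slash removed
  job="$2"        # job name as shown in the web console
  printf '%s/job/%s/lastBuild/consoleText\n' "$base" "$job"
}
```

For example: curl -s "$(console_text_url http://localhost:8080 kettle_test)", where kettle_test is a placeholder job name. If security is enabled, add -u user:apitoken to the curl call.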
Jenkins is a powerful tool and, even though scheduling is not its primary purpose, you can use it as your Enterprise Scheduler, taking advantage of all its options for executing, monitoring and managing your Kettle Jobs.
Explore all the features that Jenkins provides and build your own free, solid and open source scheduling solution.
Take advantage of the large Jenkins community to handle the most complex scheduling scenarios and, from time to time, if you find anything interesting, remember to give it back to the community.