Pentaho Data Integration scheduling with Jenkins


“As a System Administrator, I need to find a scheduling solution for our Pentaho Data Integration jobs.”
Reblog from http://opendevelopmentnotes.blogspot.com/2014/09/pentaho-data-integration-scheduling.html
Scheduling is a crucial task in all ETL and Data Integration processes. The scheduling options available in the community edition of Pentaho Data Integration (Kettle) basically rely on the operating system's capabilities (Cron on Linux, Task Scheduler on Windows), but there is at least one other free, open source and solid alternative for job scheduling: Jenkins.
Jenkins is a Continuous Integration tool, the de facto standard in Java projects, and it is so extensible and easy to use that it does a perfect job of scheduling Jobs and Transformations developed in Kettle.
So let’s start building a (probably) production-ready scheduling solution.

System configuration

OS: Oracle Linux 6
PDI: 5.1.0.0
Java: 1.7
Jenkins: 1.5

Install Jenkins

Installing Jenkins on Linux is trivial: run a few commands and in a few minutes you will have the system up and running.

#sudo wget -O /etc/yum.repos.d/jenkins.repo http://pkg.jenkins-ci.org/redhat/jenkins.repo
#sudo rpm --import https://jenkins-ci.org/redhat/jenkins-ci.org.key
#sudo yum install jenkins

At the end of the installation process you will have your Jenkins system ready to run.

Before starting Jenkins, verify that Java is installed by running:

#java -version

and if it’s not found on your system just install it with:

#sudo yum install java
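Since this setup pairs PDI 5.1 with Java 1.7, it can help to check the installed version from a script rather than by eye. The following is a minimal bash sketch; java_major_version and java_at_least are hypothetical helper names, and the default minimum of 7 simply mirrors the configuration listed above.

```shell
#!/usr/bin/env bash
# Read `java -version` output on stdin and print the major version.
# Handles the legacy "1.7.0_65" scheme (-> 7) and the newer "11.0.2" scheme (-> 11).
java_major_version() {
  local line version
  # `java -version` prints a line such as: java version "1.7.0_65"
  line=$(grep -m1 'version "') || return 1
  version=${line#*\"}        # drop everything up to the opening quote
  version=${version%%\"*}    # drop everything from the closing quote on
  if [[ $version == 1.* ]]; then
    version=${version#1.}    # legacy scheme: strip the leading "1."
  fi
  echo "${version%%[._-]*}"  # keep only the leading number
}

# Succeed if the installed Java is at least the given major version
# (default 7, an assumption matching the Java 1.7 used in this setup).
java_at_least() {
  local required=${1:-7} major
  major=$(java -version 2>&1 | java_major_version) || return 1
  [ "$major" -ge "$required" ]
}
```

For example, java_at_least 7 || sudo yum install java only installs Java when the check fails.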

Now it’s time to start Jenkins:

#sudo service jenkins start

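Open your browser and go to the Jenkins console page (by default on port 8080, e.g. http://localhost:8080 when browsing from the same machine).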
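If you are unsure which port the console listens on, the RPM package reads it from JENKINS_PORT in /etc/sysconfig/jenkins. A small sketch that derives the console URL from that file, assuming the RPM layout and falling back to the default port 8080 (jenkins_console_url is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Print the Jenkins console URL from the JENKINS_PORT setting in the
# sysconfig file (assumed RPM location: /etc/sysconfig/jenkins).
jenkins_console_url() {
  local config=${1:-/etc/sysconfig/jenkins} host=${2:-localhost} port=""
  if [ -r "$config" ]; then
    # Use the last JENKINS_PORT= line and keep only its digits (drops the quotes).
    port=$(grep -E '^JENKINS_PORT=' "$config" | tail -n 1 | cut -d= -f2 | tr -dc '0-9')
  fi
  # Fall back to Jenkins' default HTTP port when nothing usable was found.
  echo "http://${host}:${port:-8080}/"
}
```

With the default configuration this prints http://localhost:8080/, which you can open in the browser or probe with curl -I.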

Resolve port conflict

If you are not able to reach the web page, check the log file:

#sudo cat /var/log/jenkins/jenkins.log

A likely cause is a port conflict (in my case, another web application was running on the same machine).

Look at your config file:

#sudo nano /etc/sysconfig/jenkins

and change the default ports:

JENKINS_PORT="8082"

JENKINS_AJP_PORT="8011"
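The same change can be scripted, which is handy when provisioning several servers. This sketch assumes GNU sed (as shipped with Oracle Linux) and the variable names shown above; set_jenkins_ports is a hypothetical helper, and it keeps a .bak copy of the original file.

```shell
#!/usr/bin/env bash
# Rewrite the existing JENKINS_PORT and JENKINS_AJP_PORT lines of a Jenkins
# sysconfig file in place, keeping a backup as <file>.bak (GNU sed -i.bak).
# Lines that are not already present in the file are not added.
set_jenkins_ports() {
  local config=$1 http_port=$2 ajp_port=$3
  if [[ ! $http_port =~ ^[0-9]+$ || ! $ajp_port =~ ^[0-9]+$ ]]; then
    echo "ports must be numeric" >&2
    return 1
  fi
  [ -w "$config" ] || { echo "cannot write $config" >&2; return 1; }
  sed -i.bak \
    -e "s/^JENKINS_PORT=.*/JENKINS_PORT=\"${http_port}\"/" \
    -e "s/^JENKINS_AJP_PORT=.*/JENKINS_AJP_PORT=\"${ajp_port}\"/" \
    "$config"
}
```

For example, run set_jenkins_ports /etc/sysconfig/jenkins 8082 8011 as root, then restart Jenkins with sudo service jenkins restart so the new ports take effect.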

Job example

Now that Jenkins is up and running, it is time to test a simple job.

The example transformation and job are self-explanatory.

Scheduling

Go to the Jenkins web console and click on New Item.
Give it a name and select the free-style project option.
Under Build Triggers, check Build periodically and set the schedule (for testing only, * * * * * runs the job every minute).
Fill in the Build section with an Execute shell step containing the Kitchen command, then save the project.
Wait one minute and look at the Build History on the left side of the page: you will find your job running.
Click the build item and select Console Output to see the main output of Kitchen.
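The exact Kitchen command depends on your installation. As a sketch, the Execute shell step can wrap kitchen.sh in a small function that passes the job file and log level and propagates Kitchen's exit code, so Jenkins marks the build as failed when the job fails. The install path /opt/pentaho/data-integration and the job file path are assumptions, and run_kettle_job is a hypothetical helper; -file and -level are standard Kitchen options.

```shell
#!/usr/bin/env bash
# Run a Kettle job with Kitchen and return Kitchen's exit code.
# Jenkins fails the build whenever an Execute shell step exits non-zero.
run_kettle_job() {
  local kitchen=$1 job_file=$2 level=${3:-Basic} rc=0
  if [ ! -x "$kitchen" ]; then
    echo "Kitchen not found or not executable: $kitchen" >&2
    return 127
  fi
  # -file points at the .kjb job; -level sets Kitchen's logging verbosity.
  "$kitchen" -file="$job_file" -level="$level" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "Kitchen failed with exit code $rc" >&2
  fi
  return "$rc"
}

# Example call in the Jenkins build step (hypothetical paths):
# run_kettle_job /opt/pentaho/data-integration/kitchen.sh /opt/etl/jobs/my_job.kjb Basic
```

Returning Kitchen's own exit code, rather than always succeeding, is what lets the Jenkins build status and history double as a job monitor.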

Conclusion

Jenkins is a powerful tool and, even though scheduling is not its primary purpose, you can use it as your enterprise scheduler, taking advantage of all its options for executing, monitoring and managing your Kettle jobs.
Explore all the features that Jenkins provides and build your own free, solid and open source scheduling solution.
Take advantage of the large Jenkins community to handle even the most complex scheduling scenarios and, if you find something interesting, remember to give it back to the community.

Creating a connection to SAP HANA using Pentaho PDI


 

Reblog from http://scn.sap.com/community/developer-center/hana/blog/2014/09/04/creating-a-connection-to-sap-hana-using-pentaho-pdi

In this blog post we are going to learn how to create a HANA Database Connection within Pentaho PDI.

1) Go to the SAP HANA Client installation path and copy “ngdbc.jar”.

* You can get SAP HANA Client and SAP HANA Studio from: https://hanadeveditionsapicl.hana.ondemand.com/hanadevedition/

 


2) Copy the jar file to: <YourPentahoRootFolder>/data-integration/lib
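Steps 1 and 2 can also be done from a shell. This sketch assumes the Linux HANA client was installed in its usual location, /usr/sap/hdbclient, and takes the PDI root folder (the one containing data-integration) as an argument; install_hana_driver is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Copy the SAP HANA JDBC driver (ngdbc.jar) into PDI's lib folder.
# CLIENT_DIR defaults to /usr/sap/hdbclient, an assumed install location.
install_hana_driver() {
  local client_dir=${1:-/usr/sap/hdbclient} pdi_root=$2
  local jar="$client_dir/ngdbc.jar" lib="$pdi_root/data-integration/lib"
  [ -n "$pdi_root" ] || { echo "usage: install_hana_driver CLIENT_DIR PDI_ROOT" >&2; return 1; }
  [ -f "$jar" ] || { echo "ngdbc.jar not found in $client_dir" >&2; return 1; }
  [ -d "$lib" ] || { echo "PDI lib folder not found: $lib" >&2; return 1; }
  cp "$jar" "$lib/"
}
```

Restart Spoon after copying the jar, since PDI only loads the libraries in data-integration/lib at startup.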


3) Start Pentaho PDI and create a new connection.

* Make sure your JAVA_HOME environment variable is set correctly.
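A quick way to verify the JAVA_HOME requirement before launching Spoon is to check that it points at a directory with an executable bin/java. A minimal bash sketch (check_java_home is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Verify that JAVA_HOME (or the directory given as $1) contains an
# executable bin/java; print the directory on success.
check_java_home() {
  local home=${1:-$JAVA_HOME}
  if [ -z "$home" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$home/bin/java" ]; then
    echo "JAVA_HOME=$home does not contain an executable bin/java" >&2
    return 1
  fi
  echo "Using Java from $home"
}
```

For example, run check_java_home && ./spoon.sh from the data-integration folder.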


4) Create a transformation, then right-click on Database connections to create a new database connection.


 

5) Select “Generic Database” as the connection type and “Native(JDBC)” as the access type.

 


6) Fill in the following parameters under Settings:

Connection Name: NAMEYOURCONNECTION

Custom Connection URL: jdbc:sap://YOUR_IP_ADDRESS:30015

Custom Driver Class Name: com.sap.db.jdbc.Driver

User Name: YOURHANAUSER

Password: YOURHANAPASSWORD
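The port in the connection URL is not arbitrary. For a single-container HANA system the SQL port follows the pattern 3<instance number>15, so the 30015 above corresponds to instance 00. The sketch below builds the URL from a host and a two-digit instance number; hana_jdbc_url is a hypothetical helper, and multitenant (MDC) systems use different ports, so check your landscape before relying on this mapping.

```shell
#!/usr/bin/env bash
# Build the Custom Connection URL for a single-container HANA system,
# assuming the SQL port pattern 3<NN>15 (NN = two-digit instance number).
hana_jdbc_url() {
  local host=$1 instance=$2
  if [[ ! $instance =~ ^[0-9]{2}$ ]]; then
    echo "instance number must be two digits, e.g. 00" >&2
    return 1
  fi
  echo "jdbc:sap://${host}:3${instance}15"
}
```

For example, hana_jdbc_url 10.0.0.5 00 prints jdbc:sap://10.0.0.5:30015.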


 

7) Test your connection.
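If the test fails, it is worth ruling out network problems first. This sketch checks that the HANA SQL port is reachable from the PDI machine using bash's /dev/tcp redirection and the coreutils timeout command; hana_port_open is a hypothetical helper, and 30015 is only the default taken from the URL above.

```shell
#!/usr/bin/env bash
# Succeed if a TCP connection to HOST:PORT opens within SECS seconds.
# Relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
hana_port_open() {
  local host=$1 port=${2:-30015} secs=${3:-5}
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

For example, hana_port_open 10.0.0.5 30015 && echo reachable, with 10.0.0.5 as a placeholder address. A reachable port does not validate the user or password, so the Test button in PDI remains the final check.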
