Improving Performance with Pentaho Data Integration (Kettle) Table Input Step and Oracle Databases

When using the Pentaho Data Integration (a.k.a. Kettle) Table Input step to connect to Oracle via a JDBC connection, there is a connection property you can set that can dramatically improve performance when retrieving data: defaultRowPrefetch. Oracle JDBC drivers allow you to set the number of rows to prefetch from the server while the result set is being populated during a query. Prefetching rows into the client reduces the number of round trips to the server. The default value for this property is 10.

In the Table Input step, edit your connection, click on the Options tab, and add a parameter named defaultRowPrefetch with your desired value.
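If you are connecting from plain Java rather than through the PDI connection dialog, the same property can be passed to the Oracle driver in a java.util.Properties object. A minimal sketch (the credentials, URL, and prefetch value are placeholders; adjust for your environment):

```java
import java.util.Properties;

public class PrefetchDemo {
    public static void main(String[] args) {
        // Placeholder credentials -- substitute your own.
        Properties props = new Properties();
        props.setProperty("user", "scott");
        props.setProperty("password", "tiger");
        // Raise the Oracle row prefetch from its default of 10:
        props.setProperty("defaultRowPrefetch", "500");

        // With the Oracle JDBC driver on the classpath you would then call:
        // Connection con = DriverManager.getConnection(
        //     "jdbc:oracle:thin:@//dbhost:1521/ORCL", props);
        System.out.println(props.getProperty("defaultRowPrefetch"));
    }
}
```

The right value depends on row width and network latency; larger prefetch sizes trade client memory for fewer round trips.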


Publishing Pentaho Reporting Templates Tips!

Yesterday I had issues refreshing the repository cache after uploading new .prpt files to my solution repository on BI Server version 3.6.

After a while I noticed that the reports that failed had accented Latin characters such as é and á in their names, so I replaced them with plain ASCII (e, a) and everything worked correctly.

The second problem was opening reports with a + character in the title from the Pentaho User Console. The solution was to avoid using these kinds of special characters.

Hope this helps!

Solution for converting the CLOB datatype to VARCHAR

A reader asked for a solution for converting the CLOB datatype to VARCHAR: all the documents they could find talk about converting BLOBs to VARCHAR, and applying those examples to CLOBs produces errors.

and we said…

dbms_lob.substr( clob_column, for_how_many_bytes, from_which_byte );

for example:

select dbms_lob.substr( x, 4000, 1 ) from T;

will get me the first 4000 characters of the CLOB (for CLOBs, DBMS_LOB.SUBSTR measures the amount in characters rather than bytes).

Note that when using SQL, as above, the maximum length is 4000. You can get up to 32K using PL/SQL:

declare
  my_var long;
begin
  for x in ( select X from t )
  loop
    my_var := dbms_lob.substr( x.X, 32000, 1 );
  end loop;
end;
/

Oracle translate tips

Oracle/PLSQL: Translate Function

In Oracle/PLSQL, the translate function replaces a sequence of characters in a string with another set of characters. However, it replaces a single character at a time. For example, it will replace the 1st character in the string_to_replace with the 1st character in the replacement_string. Then it will replace the 2nd character in the string_to_replace with the 2nd character in the replacement_string, and so on.

The syntax for the translate function is:

translate( string1, string_to_replace, replacement_string )

string1 is the string in which the character replacement will take place.

string_to_replace is the string that will be searched for in string1.

replacement_string contains the characters that replace, position for position, the corresponding characters of string_to_replace.

Applies To:

  • Oracle 8i, Oracle 9i, Oracle 10g, Oracle 11g

For Example:

translate('1tech23', '123', '456') would return '4tech56'
translate('222tech', '2ec', '3it') would return '333tith'

Pentaho Data Integration (Kettle) Loading into LucidDB

By far, the most popular way for PDI users to load data into LucidDB is to use the PDI Streaming Loader. The streaming loader is a native PDI step that:

  • Enables high performance loading, directly over the network without the need for intermediate IO and shipping of data files.
  • Lets users choose more interesting (from a DW perspective) load types into tables: in addition to simple INSERTs, it allows MERGE (a.k.a. UPSERT) and UPDATE, all in the same bulk loader.
  • Enables the metadata for the load to be managed, scheduled, and run in PDI.
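The MERGE the loader performs is the standard SQL upsert. A hand-written equivalent would look roughly like this (table and column names are illustrative, not from an actual load):

```sql
-- Hypothetical upsert: update rows that already exist, insert the rest
merge into dw.customers t
using stg.customers_feed s
on (t.customer_id = s.customer_id)
when matched then
  update set name = s.name, city = s.city
when not matched then
  insert (customer_id, name, city)
  values (s.customer_id, s.name, s.city);
```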

However, we’ve had some known issues. In fact, until PDI 4.2 GA and LucidDB 0.9.4 GA it’s pretty problematic unless you run through the process of patching LucidDB outlined on this page: Known Issues.

In some ways, we have to admit, that we released this piece of software too soon. Early and often comes with some risk, and many have felt the pain of some of the issues that have been discovered with the streaming loader.

In some ways, we've built an unnatural approach to loading for PDI: PDI wants to PUSH data into a database, while LucidDB wants to PULL data from remote sources with its integrated ELT- and DML-based approach (with connectors to databases, Salesforce, etc.). Our streaming loader "fakes" a pull data source and allows PDI to "push" into it.

There are multiple threads involved, and when exceptions happen users have received cruddy error messages such as "Broken Pipe" that are unhelpful at best and frustrating at worst. Almost all of these contortions will have been sorted out by the time PDI 4.2 GA and LucidDB 0.9.4 GA are released, and the streaming loader should then be working A-OK. Some users would just as soon avoid the patch instructions above and have posed the question: in a general sense, if not the streaming loader, how would I load data into LucidDB?

Again, LucidDB likes to “pull” data from remote sources. One of those is CSV files. Here’s a nice, easy, quick (30k r/s on my MacBook) method to load a million rows using PDI and LucidDB:

LucidDB pulling data

This transformation outputs 1 million rows to a text file, waits for that to complete, then proceeds to load that data into a new table in LucidDB. Step by step, the LucidDB statements:

-- Point LucidDB at the directory containing the just-generated flat file.
-- LucidDB has sensible defaults and can "guess" the datatypes by scanning
-- the file. (The directory path and table names here are illustrative.)
create server csv_file_server
foreign data wrapper sys_file_wrapper
options (directory '/tmp/pdi_output/', file_extension 'txt', with_header 'yes');

-- Create a foreign table for the data file ("DATA.txt") that was output by PDI
create foreign table stg.pdi_data
server csv_file_server
options (filename 'DATA');

-- Load the data from the flat file into a staging table
-- (assuming stg.staged_data has been created with matching columns):
insert into stg.staged_data select * from stg.pdi_data;

We hope to have the streaming loader ready to go in LucidDB 0.9.4 and PDI 4.2. Until then, consider this easy, straightforward method of loading data from PDI into LucidDB: it's high performance, proven, and stable.

Example file: csv_luciddb_load.ktr

Pentaho and Google Web Kit GeoMap issues resolved

Last week, while I was developing a GWT dashboard to deploy in Pentaho (remember to use GWT SDK version 2.0.4; newer versions are not valid for Pentaho),

I ran into several issues, and finding documentation about charts was especially difficult.

-) You'll need to install the GWT plugin for Eclipse

-) New > Web Application Project (the blue Google logo)

-) Add the GWT Visualization jar (version 1.1.2, for example) as an external library to your project

-) Inherit from gwt.visualization in your module's .gwt.xml:

<!-- Other module inherits                                      -->
<inherits name="com.google.gwt.visualization.Visualization"/>

Here I attach source code for creating a GeoMap chart; remember to add it to a RootPanel in the onModuleLoad() method of your project.





// creation of the world map
// (data_world_chart, grafico_mundo and panel are fields of the entry-point
// class; panel is a VerticalPanel that has been added to the RootPanel)
private void createWorldMap() {

    GeoMap.Options world_chart_options = GeoMap.Options.create();
    world_chart_options.setDataMode(GeoMap.DataMode.REGIONS);
    //world_chart_options.setColors(0xFF8747, 0xFFB581, 0xc06000);

    data_world_chart = DataTable.create();

    data_world_chart.addColumn(ColumnType.STRING, "Country");
    data_world_chart.addColumn(ColumnType.NUMBER, "Population");

    data_world_chart.addRows(5);
    data_world_chart.setValue(0, 0, "Germany");
    data_world_chart.setValue(0, 1, 1000);
    data_world_chart.setValue(1, 0, "Spain");
    data_world_chart.setValue(1, 1, 1500);
    data_world_chart.setValue(2, 0, "Italy");
    data_world_chart.setValue(2, 1, 1000);
    data_world_chart.setValue(3, 0, "France");
    data_world_chart.setValue(3, 1, 600);
    data_world_chart.setValue(4, 0, "Portugal");
    data_world_chart.setValue(4, 1, 200);

    grafico_mundo = new GeoMap(data_world_chart, world_chart_options);
    panel.add(grafico_mundo);
}


GeoMap GWT