Pentaho 6.0.x Install


Interesting IT Tip's

Introduction

After reading the changelog, I thought that the jump in version number from 5.4 to 6.0 was not justified. But then I realized that this is the first version of Pentaho under the Hitachi brand, and choosing a round number marks a milestone as completed. A good sign is that they kept the release day of the community edition (CE) the same as that of the commercial one.

Recommendation: Check out the ETL; it is a wonderful tool. Spend time with it: it will let you clean and process data from several sources (text, NoSQL, DB, Excel, SAP ERP) and send it on its way [to services or users] in different formats. Of course, it can be used with other suites. Then proceed with the web server, which is primarily a client-side tool that shows processes according to user privileges and schedules processes. To build dashboards you can choose between two sets of building blocks: CTools or…


Pentaho Mondrian: Custom Formatting with Cell Formatter


Repost from http://diethardsteiner.github.io/mondrian/2015/07/29/Mondrian-Cell-Formatter.html

Formatting measures in cubes is essential for the readability of any analysis results. Pentaho Mondrian features the formatString (attribute of the measure declaration in a Mondrian cube definition) and FORMAT_STRING (MDX query) options, which allow you to define the format in Visual Basic syntax. Say you want to display 12344 as 12,344: you can easily create the following formatting mask: #,###. This kind of approach works for most use cases.
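To make the effect of such a mask concrete, here is a small JavaScript sketch that imitates only the thousands grouping for whole numbers. The helper name groupThousands is hypothetical, and this is not how Mondrian evaluates format strings; it merely reproduces the visible result for the example figure.

```javascript
// Hypothetical helper: inserts a comma between each group of three digits.
// It only imitates the grouping effect of a mask such as #,### for integers.
function groupThousands(n) {
    var digits = Math.round(Math.abs(n)).toString();
    // Add a comma at every position followed by a multiple of three digits.
    var grouped = digits.replace(/\B(?=(\d{3})+$)/g, ',');
    return (n < 0 ? '-' : '') + grouped;
}

console.log(groupThousands(12344)); // 12,344
```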

However, sometimes you might require a custom formatting option that is not covered by the Visual Basic formatting options. Imagine, for example, that one of the measures of your cube is a duration in seconds. The integer value that you store in your database is not really a nice way of presenting this measure, unless of course you are only dealing with very small figures. But what if you have a figure like 102234 seconds? How many days, hours, etc. is this?

One approach of dealing with this is to create hidden calculated members which break this figure down into days, hours, minutes etc:

Measure   MDX calculation
days      CAST(INT(sec/(60*60*24)) AS INTEGER)
hours     CAST(INT((sec/(60*60))-(days*24)) AS INTEGER)
minutes   CAST(INT((sec/60)-(days*(24*60))-(hours*60)) AS INTEGER)
seconds   CAST(ROUND(sec-(days*(24*60*60))-(hours*60*60)-(minutes*60),0) AS INTEGER)
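The arithmetic behind these calculated members can be checked outside Mondrian with a few lines of JavaScript. This is only a sketch: the function name breakDown is hypothetical, and Math.floor and Math.round stand in for the MDX INT and ROUND functions, assuming a non-negative number of seconds.

```javascript
// Mirrors the four calculated members listed above: each unit is what is
// left over after the larger units have been taken out of the total.
function breakDown(sec) {
    var days = Math.floor(sec / (60 * 60 * 24));
    var hours = Math.floor(sec / (60 * 60) - days * 24);
    var minutes = Math.floor(sec / 60 - days * 24 * 60 - hours * 60);
    var seconds = Math.round(sec - days * 24 * 60 * 60 - hours * 60 * 60 - minutes * 60);
    return { days: days, hours: hours, minutes: minutes, seconds: seconds };
}

console.log(breakDown(102234)); // { days: 1, hours: 4, minutes: 23, seconds: 54 }
```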

You could then create a visible measure which concatenates all these invisible measures and displays the figure as 1d 4h 23min 54sec. Example final calculated visible measure:

CAST([Measures].[Days] AS STRING) || "d " || IIF([Measures].[Hours] < 10, "0", "") || CAST([Measures].[Hours] AS STRING) || ... and so forth

However, while this approach works, you will realize that you cannot sort by this measure properly! That’s rather inconvenient.
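Why the concatenated measure sorts badly can be illustrated with plain JavaScript arrays. This is a sketch of the general problem with sorting text, not of how any particular Mondrian client sorts; the asText helper is hypothetical and only imitates the kind of string the calculated measure would produce.

```javascript
// Three durations in seconds: 2 days, 10 days and 1 day.
var durations = [172800, 864000, 86400];

// Hypothetical formatter imitating the concatenated string measure.
function asText(sec) {
    var days = Math.floor(sec / 86400);
    var hours = Math.floor((sec % 86400) / 3600);
    return days + 'd ' + (hours < 10 ? '0' : '') + hours + 'h';
}

// Sorting the raw integers gives the true order: 1 day, 2 days, 10 days.
var byValue = durations.slice().sort(function (a, b) { return a - b; });

// Sorting the formatted strings compares characters, so "10d" lands before "1d".
var byText = durations.map(asText).sort();

console.log(byValue); // [ 86400, 172800, 864000 ]
console.log(byText);  // [ '10d 00h', '1d 00h', '2d 00h' ]
```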

Fortunately enough, Mondrian also provides a Cell Formatter, which allows you to access the value of e.g. a measure and manipulate it any way for display purposes, but – and this is the very important bit – this does not influence the underlying data type. So in our example, the integer value for the duration will still be an integer value and hence the sorting will work! The other really good point is that you can use various languages to manipulate the value, e.g. Java or JavaScript.

To add a special Cell Formatter, simply nest the CellFormatter XML element within the Measure or CalculatedMember XML element. Then nest a Script XML element within it to specify the script language, and finally place your code inside the Script element. Example (this time including weeks as well):

<Measure name="Duration" column="duration" visible="true" aggregator="sum">
    <CellFormatter>
        <Script language="JavaScript">
            var result_string = '';
            // access Mondrian value
            var sec =  value;
            var weeks = Math.floor(sec/(60*60*24*7));
            var days = Math.floor((sec/(60*60*24)) - (weeks*7));
            var hours = Math.floor(sec/(60*60) - (weeks*7*24) - (days*24));
            var minutes = Math.floor((sec/60) - (weeks*7*24*60) - (days*24*60) - (hours*60));
            var seconds = Math.floor(sec - (weeks*7*24*60*60) - (days*24*60*60) - (hours*60*60) - (minutes*60));
            result_string = weeks.toString() + 'w ' + days.toString() + 'd ' + hours.toString() + 'h ' + minutes.toString() + 'min ' + seconds.toString() + 'sec';
            return result_string;
        </Script>
    </CellFormatter>
</Measure>

You could of course improve the JavaScript further by only showing the relevant duration portions:

var result_string = '';
// access Mondrian value
var sec =  value;
var weeks = Math.floor(sec/(60*60*24*7));
var days = Math.floor((sec/(60*60*24)) - (weeks*7));
var hours = Math.floor(sec/(60*60) - (weeks*7*24) - (days*24));
var minutes = Math.floor((sec/60) - (weeks*7*24*60) - (days*24*60) - (hours*60));
var seconds = Math.floor(sec - (weeks*7*24*60*60) - (days*24*60*60) - (hours*60*60) - (minutes*60));
if(weeks !== 0){
    result_string = weeks.toString() + 'w ' + days.toString() + 'd ' + hours.toString() + 'h ' + minutes.toString() + 'min ' + seconds.toString() + 'sec';
} else if(days !== 0){
    result_string = days.toString() + 'd ' + hours.toString() + 'h ' + minutes.toString() + 'min ' + seconds.toString() + 'sec';
} else if(hours !== 0){
    result_string = hours.toString() + 'h ' + minutes.toString() + 'min ' + seconds.toString() + 'sec';
} else if(minutes !== 0){
    result_string = minutes.toString() + 'min ' + seconds.toString() + 'sec';
} else if(seconds !== 0){
    result_string = seconds.toString() + 'sec';
} else {
    // always provide a display value - do not return null
    result_string = '0sec';
}
return result_string;

Note: The script has to return a value in any situation – it must not return null, otherwise there can be issues with the client tool. E.g. Analyzer doesn’t display all the results properly if null is returned.
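One way to honour this rule is to guard against missing input before doing any arithmetic. The sketch below wraps the logic in a stand-alone function so it can be tried outside Mondrian; the name formatDuration and the units table are hypothetical, and whether Mondrian actually passes null for empty cells is an assumption. Inside a Script element you would keep using value and return directly, as in the examples above.

```javascript
// Hypothetical stand-alone version of the formatter, assuming non-negative input.
function formatDuration(value) {
    var sec = Number(value);
    // Always fall back to a display value for null, undefined or non-numeric input.
    if (value === null || value === undefined || isNaN(sec)) {
        return '0sec';
    }
    var units = [['w', 604800], ['d', 86400], ['h', 3600], ['min', 60], ['sec', 1]];
    var remaining = Math.floor(sec);
    var parts = [];
    for (var i = 0; i < units.length; i++) {
        var amount = Math.floor(remaining / units[i][1]);
        remaining -= amount * units[i][1];
        // Skip leading zero units, but keep every unit after the first non-zero one.
        if (amount !== 0 || parts.length > 0) {
            parts.push(amount + units[i][0]);
        }
    }
    return parts.length > 0 ? parts.join(' ') : '0sec';
}

console.log(formatDuration(102234)); // 1d 4h 23min 54sec
console.log(formatDuration(null));   // 0sec
```

Driving the loop from a table of units keeps the "only show the relevant portions" rule in one place instead of repeating it in a chain of else-if branches.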

Important: When adding the CellFormatter, make sure that you remove the formatString attribute from the Measure or CalculatedMember XML element, otherwise this attribute will take precedence over the CellFormatter.

Amazing, isn’t it? If only I had known about this feature earlier! A big thanks to Matt Campbell for pointing out that Mondrian has a Cell Formatter.

Free Learning Campaign #Packt


Every day Packt Publishing is giving away books for free to help teach new tech skills

From 30th April, 2015, Packt Publishing has thrown open the virtual doors of its new Free Learning Library and is offering its customers a daily chance to grab a fresh free eBook from its website. The publisher is encouraging people to learn new skills and try out new technologies, so every day it will be offering a different eBook from its huge list of titles, free for anyone to download. The Free Learning Library will be open all year round, but each title will only be up for 24 hours, so make sure you keep checking back to get your hands on the latest book! Packt has well over 2000 titles published, and the range of topics that could potentially feature is huge. From AngularJS to Zabbix, there’s going to be something to appeal to everyone; this is a great opportunity to try out a different technology or a new technique. All you have to do is click on the day’s free eBook and it will instantly be added to your account. New customers are also encouraged to take advantage, with the offer being a brilliant chance to try out Packt’s great range of books and products; all that’s required is a Packt account. Find out more: http://bit.ly/1Kt6U6w #FreeLearning

Working with Big (lots) Data and Pentaho – Extreme Performance


Codeks Blog

OK, firstly, I’m not talking proper BigData here. This is not Hadoop, or even an analytical database. (Let’s not get into whether an analytical database counts as bigdata though!) And it’s certainly not NoSQL. In terms of disk space, we’re looking at hundreds of gigabytes, not terabytes. Yet this project involves more data than the Hadoop projects I’ve done.

So tens of billions of records. Records that must be processed in a limited environment in extremely tight time windows.  And yes; I’m storing all of that in MySQL!

Hey, wake up, yes, I did say billions of records in MySQL, try not to lose consciousness again…  (It’s not the first time I’ve had billions of rows in MySQL either – Yet I know some of you will guffaw at the idea)

In fact, in this project we are moving away from a database cluster, to a single box. The database cluster has 64 nodes and 4TB…
