Error : Can’t connect to X11 window server using ‘localhost:10.0’


I ran into this error on Pentaho BI Server CE (version 6.1 CE in my case) running on a UNIX OS.

I had to add this flag to the Java options in the start-pentaho.sh script:

-Djava.awt.headless=true

With this flag, Java starts in headless mode and we get rid of the error:

Error : Can’t connect to X11 window server using ‘localhost:10.0’

Filter data by user on Pentaho CDE


  1. Create a dashboard in CDE.
  2. Create a Simple Parameter, for example param_user.
  3. In the Layout perspective, add a Code Snippet. We will use Dashboards.context.user, which holds the name of the logged-in user:
    $(document).ready(function() {
        // read the session user and push it into the dashboard parameter
        var param_user = Dashboards.context.user;
        Dashboards.fireChange("param_user", param_user);
    });
    
  4. In the Data Sources perspective, create a query with param_user as a parameter:
    SELECT * FROM
    tablename
    WHERE user=${param_user}
    
  5. Use your query in a CDE component (Table, Chart, Text Component).

Pentaho Mondrian: Custom Formatting with Cell Formatter


Repost from http://diethardsteiner.github.io/mondrian/2015/07/29/Mondrian-Cell-Formatter.html

Formatting measures in cubes is essential for the readability of any analysis results. Pentaho Mondrian offers a formatString option (an attribute of the measure declaration in a Mondrian cube definition) and a FORMAT_STRING option (in MDX queries), both of which let you define the format in Visual Basic syntax. Say you want to display 12344 as 12,344: you can easily create the following formatting mask: #,###. This approach works for most use cases.

However, sometimes you might require a custom formatting option which is not covered by the Visual Basic formatting options. Imagine, for example, that one of the measures of your cube is a duration in seconds. The integer value that you store in your database is not a nice way of presenting this measure, unless of course you are only dealing with very small figures. But what if you have a figure like 102234 seconds? How many days, hours etc. is this?

One approach of dealing with this is to create hidden calculated members which break this figure down into days, hours, minutes etc:

measure    MDX calculation
days       CAST(INT(sec/(60*60*24)) AS INTEGER)
hours      CAST(INT((sec/(60*60))-(days*24)) AS INTEGER)
minutes    CAST(INT((sec/60)-(days*(24*60))-(hours*60)) AS INTEGER)
seconds    CAST(ROUND(sec-(days*(24*60*60))-(hours*60*60)-(minutes*60),0) AS INTEGER)

You could then create a visible measure which concatenates all these invisible measures and displays the figure as 1d 4h 23min 54sec. Example final calculated visible measure:

CAST([Measures].[Days] AS STRING) || "d " || IIF([Measures].[Hours] < 10, "0", "") || CAST([Measures].[Hours] AS STRING) || ... and so forth

However, while this approach works, you will realize that you cannot sort by this measure properly! That’s rather inconvenient.
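
The sorting problem is easy to reproduce outside Mondrian. The following is only a small sketch with made-up sample rows (the `durations` array and its labels are hypothetical, not taken from any cube): sorting by the concatenated string compares characters, while sorting by the underlying seconds compares numbers.

```javascript
// Hypothetical sample data: the raw seconds and the concatenated display string.
var durations = [
    { sec: 172800, label: '2d 00h' },
    { sec: 864000, label: '10d 00h' },
    { sec: 90000,  label: '1d 01h' }
];

// Sorting by the string measure compares character by character.
var byLabel = durations.slice().sort(function (a, b) {
    return a.label < b.label ? -1 : (a.label > b.label ? 1 : 0);
}).map(function (d) { return d.label; });

// Sorting by the numeric measure gives the order a user expects.
var bySeconds = durations.slice().sort(function (a, b) {
    return a.sec - b.sec;
}).map(function (d) { return d.label; });

console.log(byLabel);   // [ '10d 00h', '1d 01h', '2d 00h' ]
console.log(bySeconds); // [ '1d 01h', '2d 00h', '10d 00h' ]
```

In the string ordering, ten days ends up before one day, which is exactly the kind of result that makes a string measure unsuitable for sorting.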

Fortunately enough, Mondrian also provides a Cell Formatter, which allows you to access the value of e.g. a measure and manipulate it any way for display purposes, but – and this is the very important bit – this does not influence the underlying data type. So in our example, the integer value for the duration will still be an integer value and hence the sorting will work! The other really good point is that you can use various languages to manipulate the value, e.g. Java or JavaScript.

To add a special Cell Formatter, simply nest the CellFormatter XML element within the Measure or CalculatedMeasure XML element. Then nest another Script XML element within this one to specify the Script Language and finally nest your code within this element. Example (this time including weeks as well):

<Measure name="Duration" column="duration" visible="true" aggregator="sum">
    <CellFormatter>
        <Script language="JavaScript">
            var result_string = '';
            // access Mondrian value
            var sec =  value;
            var weeks = Math.floor(sec/(60*60*24*7));
            var days = Math.floor((sec/(60*60*24)) - (weeks*7));
            var hours = Math.floor(sec/(60*60) - (weeks*7*24) - (days*24));
            var minutes = Math.floor((sec/60) - (weeks*7*24*60) - (days*24*60) - (hours*60));
            var seconds = Math.floor(sec - (weeks*7*24*60*60) - (days*24*60*60) - (hours*60*60) - (minutes*60));
            result_string = weeks.toString() + 'w ' + days.toString() + 'd ' + hours.toString() + 'h ' + minutes.toString() + 'min ' + seconds.toString() + 'sec';
            return result_string;
        </Script>
    </CellFormatter>
</Measure>

You could of course improve the JavaScript further by only showing the relevant duration portions:

var result_string = '';
// access Mondrian value
var sec =  value;
var weeks = Math.floor(sec/(60*60*24*7));
var days = Math.floor((sec/(60*60*24)) - (weeks*7));
var hours = Math.floor(sec/(60*60) - (weeks*7*24) - (days*24));
var minutes = Math.floor((sec/60) - (weeks*7*24*60) - (days*24*60) - (hours*60));
var seconds = Math.floor(sec - (weeks*7*24*60*60) - (days*24*60*60) - (hours*60*60) - (minutes*60));
if(weeks !== 0){
    result_string = weeks.toString() + 'w ' + days.toString() + 'd ' + hours.toString() + 'h ' + minutes.toString() + 'min ' + seconds.toString() + 'sec';
} else if(days !== 0){
    result_string = days.toString() + 'd ' + hours.toString() + 'h ' + minutes.toString() + 'min ' + seconds.toString() + 'sec';
} else if(hours !== 0){
    result_string = hours.toString() + 'h ' + minutes.toString() + 'min ' + seconds.toString() + 'sec';
} else if(minutes !== 0){
    result_string = minutes.toString() + 'min ' + seconds.toString() + 'sec';
} else if(seconds !== 0){
    result_string = seconds.toString() + 'sec';
} else {
    // always provide a display value - do not return null
    result_string = '0sec';
}
return result_string;

Note: The script has to return a value in any situation – it must not return null, otherwise there can be issues with the client tool. E.g. Analyzer doesn’t display all the results properly if null is returned.
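
If you want to check the formatting logic before putting it into the schema, one option is to wrap it in a plain function and run it in any JavaScript engine. The sketch below is an assumption-laden rewrite, not the script from the schema: the function name `formatDuration` is hypothetical, it uses a loop over unit sizes instead of the explicit subtractions, and it treats null, undefined or negative input as zero so that a display value is always returned.

```javascript
// Hypothetical helper mirroring the CellFormatter logic as a plain function.
function formatDuration(value) {
    // Fall back to 0 for null/undefined/NaN/negative input (an assumption),
    // so the function never returns null.
    var sec = Number(value);
    if (!isFinite(sec) || sec < 0) {
        sec = 0;
    }
    sec = Math.floor(sec);

    // Unit suffixes and their size in seconds, largest first.
    var units = [
        ['w', 7 * 24 * 60 * 60],
        ['d', 24 * 60 * 60],
        ['h', 60 * 60],
        ['min', 60],
        ['sec', 1]
    ];

    var parts = [];
    for (var i = 0; i < units.length; i++) {
        var amount = Math.floor(sec / units[i][1]);
        sec -= amount * units[i][1];
        // Skip leading zero units, but keep every unit after the first non-zero one.
        if (amount !== 0 || parts.length > 0) {
            parts.push(amount + units[i][0]);
        }
    }
    return parts.length > 0 ? parts.join(' ') : '0sec';
}

console.log(formatDuration(102234)); // 1d 4h 23min 54sec
console.log(formatDuration(null));   // 0sec
```

Inside Mondrian the body would still read the cell value from `value` and end with a `return`, as in the script above; the wrapper is only there to make the logic testable on its own.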

Important: When adding the CellFormatter, make sure that you remove the formatString attribute from the Measure or CalculatedMeasure XML element, otherwise this attribute will take precedence over the CellFormatter.

Amazing isn’t it? If I had only known about this feature earlier on! A big thanks to Matt Campbell for pointing out that Mondrian has a Cell Formatter.

Pentaho Reporting Video Course


I would like to recommend this excellent video course created by my friend Francesco Corti and officially reviewed by Paul Hernandez and me.

Pentaho Reporting [Video]

http://www.packtpub.com/pentaho-reporting/video


Course Contents:

    1. Getting Started with Pentaho Reporting [15:57 minutes]
      • Installing Pentaho Reporting
      • Loading and Saving Reports and Having a Preview
      • Building a Report Using the Report Wizard
      • Building the ‘My First Report’
      • Customizing the ‘My First Report’
      • Advanced Customization on the My First Report

    2. Dive Deeper into the Pentaho Reporting Engine’s XML and Java APIs [11:44 minutes]
      • Setting the Java Development Environment
      • Embedding a Pentaho Report in an Enterprise Web Application
      • Embedding a Pentaho Report in a SWING Application
      • Introducing Serialized Reports
      • Building a Report Using Pentaho Reporting’s Java API

    3. Configuring the JDBC Database and Other Data Sources [12:43 minutes]
      • Configuring Your Data Source to a DBMS Using JDBC
      • Configuring Your Data Source to an OLAP Engine (Mondrian)
      • Configuring Your Data Source to an XML File and a Table
      • Configuring Your Data Source to Metadata and PDI
      • Working with Data Sources in Java

    4. Introducing Graphic Chart Types – Pie, Bar, Line, and Others [10:36 minutes]
      • Incorporating a Line Chart into a Pentaho Report
      • Incorporating Supported Charts and Common Properties
      • Incorporating and Customizing Charts into a Report
      • Incorporating Images into a Report

    5. Modifying Reports Using Parameters and Internationalization [11:14 minutes]
      • Parameterizing a Pentaho Report
      • Parameterizing a Pentaho Report Using Java
      • Working with Functions and Expressions
      • Working with Formulas
      • Internationalization and Localization of Pentaho Reports

    6. Adding Subreports and Cross Tabs in Your Reports [09:52 minutes]
      • Adding a Multi-page Subreport in a Pentaho Report
      • Parameterizing and Adding Chart Subreport in a Pentaho Report
      • Adding a Side-by-Side Subreport in a Pentaho Report
      • Adding Cross Tabs in a Pentaho Report

    7. Building Interactive SWING and HTML Reports [12:29 minutes]
      • Building Interactive Reports in SWING
      • Building Interactive Reports in HTML

    8. Using Pentaho Reporting in the Pentaho Suite [13:10 minutes]
      • Using Pentaho Reporting with Pentaho Business Intelligence Server
      • Using Pentaho Reporting with Pentaho Data Integration (Kettle)

What you will learn from this video course

  • Install Pentaho Report in your development or production environment
  • Create impressive reports with advanced charts, interaction, multi-language support and much more
  • Use the Pentaho Report Engine in your Java environment for web and swing applications
  • Interact and customize your Pentaho reports using Java (in a web and swing application)
  • Develop your basic and advanced reports using several data sources, including OLAP engines
  • Deploy and use your Pentaho Reports inside the Pentaho suite, in particular in the Pentaho Business Intelligence Server and the Pentaho Data Integration

Who this video course is for

If you are a Java developer or IT professional who wants to assemble custom reporting solutions with Pentaho Reporting, this video course is ideal for you. Master the advanced concepts within Pentaho Reporting such as sub-reports, cross-tabs, data source configuration, and metadata-based reporting.

In Detail

Pentaho Report Designer is one of the most important core modules of the Pentaho BI Suite, and it builds impressive reports using Open Source Business Intelligence solutions. Pentaho Report Designer helps you to develop professional applications with multi-language support and parameterized reports.

You will learn exactly how to develop basic and advanced reports using the Pentaho Report Designer environment, and a more customized Java environment. All of the examples are described in depth with the source code, and you will be guided through this course using a step-by-step approach which will ensure that you’ll achieve impressive results.

This course begins with the installation of the Java Development Environments using practical examples, moving onto how to develop impressive reports using tables, charts and sub-reports. The examples will also be shown in a Java development environment for web and swing applications.

Next, you will be taken on a practical run through the Pentaho Report Designer. This guide will then explain Java APIs, data source connections, and the development of several chart types. You will also learn the most relevant advanced features needed to make a report, such as internationalization, parameterization, interaction, functions, expressions, sub-reports and cross-tabs, leading the way to the use of reports in the Pentaho Suite (especially in the Pentaho BI server and Pentaho Data Integration).

With the Pentaho Report basic and advanced development video course, you’ll get in touch with the enterprise development of reports, with one of the most relevant Open Source Business Intelligence solutions.

Open Business Analytics Training in London #BI #BigData #ETL #OLAP



Dates: From 28th April to 1st May 2014

Duration: 24 hours over 4 days

Location: Executive offices group meeting rooms. London.

Address: Central Court, 25 Southampton Bldgs – WC2A 1AL.

Training contents:

DAY 1
Business Intelligence Open Source Introduction and BI Server User Console
a. Pentaho 5 Architecture and new features, Mondrian, Kettle, etc…
b. Users and roles in Pentaho 5.
c. Browsing the repository in the user console.
d. Design tools.
Pentaho Data Integration (Kettle) ETL tool
a. Best Practices of ETL Processes.
b. Functional Overview (Jobs, transformations, stream control)
c. Parameters and Variables in kettle
• Environment variables and shared connections.
• ETL scheduling
d. Jobs
• Overview
• Step types (Mail, File Management, Scripting, etc…)
• Steps description
e. Transformations
• Overview
• Step types (Input, Output, Transform, Big Data, etc…)
• Steps description
f. Practical exercises
g. Data profiling with DataCleaner (pattern analysis, value distribution, date gap analysis …)
h. Talend Open Studio vs Kettle comparison
DAY 2
Data warehousing, OLAP and Mondrian
a. Datawarehouse – Datamart.
b. Star database schemas.
c. Multidimensional/OLAP
d. Mondrian ROLAP engine.
e. JPivot and MDX.
f. Designing OLAP structures Schema Workbench.
g. Tips to maximize Mondrian performance.
h. Alternatives to JPivot: STPivot, Saiku Analytics, OpenI
i. Practical Exercises
Social Intelligence
a. Introduction
b. Social KPIs (Facebook, Twitter …)
c. Samples
DAY 3
Reporting
a. AdHoc Reporting
• WAQR
• Pentaho Metadata Editor
• Creating a business model
b. Pentaho Reporting. Report Designer.
c. Practical Exercises
Big Data
a. Big Data Introduction
b. Pentaho Big Data components
c. Relational vs Columnar and Document Databases
DAY 4
Dashboards and Ctools
a. Introduction.
• Previous concepts.
• Best practices in dashboard design.
• Practical design.
b. Types of dashboards.
c. CDF
• Introduction.
• Samples.
d. CDE (Dashboard Editor)
• Introduction.
• Samples.
• Practical Exercise.
e. Ad hoc Dashboards.
• Introduction.
• Samples.
f. STDashboard
Plug-ins
a. SPARKL (Application designer)
b. Startup Rules (substitute for xactions)

Parallelization jobs in Kettle – Pentaho Data Integration


Reblogged from http://spektom.blogspot.com.es/2014/02/parallelization-monster-framework-for.html

Our team always ends up ROFL when trying to find a name for strange-looking ETL process diagrams. This monster has no name yet:

Parallel kettle job

This is a parallelization framework for Pentaho Kettle 4.x. As you probably know, the upcoming version of Kettle (5.0) has a native ability to launch job entries in parallel, but we haven’t got there yet.

In order to run a job in parallel, you have to call this abstract job, and provide it with 3 parameters:

  • Path to your job (which is supposed to run in parallel).
  • Number of threads (concurrency level).
  • Optional flag that says whether to wait for completion of all jobs or not.

Regarding the number of threads: as you can see, the framework supports up to 8 threads, but it can be easily extended.

How this works: the “Thread #N” transformations are executed in parallel, each on a copy of all rows. Each transformation then filters the rows according to the given number of threads, so that only the relevant portion of rows is passed to the corresponding job (Job – Thread #N). For example, if the original row set was:
           [“Apple”, “Banana”, “Orange”, “Lemon”, “Cucumber”]
and the concurrency level was 2, then the first job (Job – Thread #1) will get [“Apple”, “Banana”, “Orange”] and the second job will get the rest: [“Lemon”, “Cucumber”]. All the other jobs will get an empty row set.
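
The split in this example can be sketched in a few lines of JavaScript. This is not the code of the Kettle transformations; the function name `splitRows`, its parameters, and the rule of cutting the rows into contiguous chunks of ceil(rows / threads) are assumptions chosen because they reproduce the example above.

```javascript
// Hypothetical illustration of the row split, not the framework's actual logic.
// rows:       the original row set
// threads:    requested concurrency level (at least 1)
// maxThreads: number of "Thread #N" slots the framework provides (8 in the post)
function splitRows(rows, threads, maxThreads) {
    // Assumed rule: contiguous chunks, rounded up so earlier threads get the extra row.
    var chunkSize = Math.ceil(rows.length / threads);
    var result = [];
    for (var i = 0; i < maxThreads; i++) {
        // Slots beyond the requested concurrency level receive an empty row set.
        result.push(i < threads ? rows.slice(i * chunkSize, (i + 1) * chunkSize) : []);
    }
    return result;
}

var parts = splitRows(['Apple', 'Banana', 'Orange', 'Lemon', 'Cucumber'], 2, 8);
console.log(parts[0]); // [ 'Apple', 'Banana', 'Orange' ]
console.log(parts[1]); // [ 'Lemon', 'Cucumber' ]
console.log(parts[2]); // []
```
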

Finally, there’s a flag which tells whether we should wait until all jobs are completed.

I hope you will find the attached transformations useful. And if not, at least help me find a name for the ETL diagram. Fish, maybe? 🙂