Working with Big (lots) Data and Pentaho – Extreme Performance


Originally posted on Codeks Blog:

OK, firstly, I’m not talking proper BigData here.  This is not Hadoop, or even an analytical database.  (Lets not get into whether an analytical database counts as bigdata though!). And it’s certainly not NoSQL.  Disk space we’re looking at 100’s of gigabytes, not terabytes.  Yet this project involves more data than the Hadoop projects I’ve done.

So tens of billions of records. Records that must be processed in a limited environment in extremely tight time windows.  And yes; I’m storing all of that in MySQL!

Hey, wake up, yes, I did say billions of records in MySQL, try not to lose consciousness again…  (It’s not the first time I’ve had billions of rows in MySQL either – Yet I know some of you will guffaw at the idea)

In fact, in this project we are moving away from a database cluster, to a single box. The database cluster has 64 nodes and 4TB…

View original 495 more words

Pentaho Mondrian: The MDX Generate Function


Repost from http://diethardsteiner.github.io/cde/2015/04/23/MDX-Generate-Function.html

This is a brief overview of how you can use the MDX GENERATE() function. All the examples below work with the SteelWheelsSales Cube, which ships with the Pentaho BA Server. Try the examples listed below either with Pentaho Analyzer, the Saiku CE plugin, Pivot4J or Roland Bouman’s Pentaho Analysis Shell (Pash) (The last 3 ones are plugins, which you can download via the Pentaho Marketplace).

The GENERATE() function has two different applications:

  • If the second parameter is a set, it will return the members that exist in both sets ( a UNION() of the sets). You can think of it as a kind of for-each function which checks if members in the first set exist in the second one. ALL can be used as a third parameter to keep any duplicates.
  • If the second parameter is a string expression then the members will be concatenated.

Examples:

Find the members that are in both sets:

SELECT
    [Measures].[Sales] ON 0
    , NON EMPTY GENERATE(
        {[Markets].[Country].[Australia], [Markets].[Country].[USA]}
        , [Markets].[Country].[Australia]
    ) ON 1
FROM [SteelWheelsSales]
Country Sales
Australia 630,623

In this case Australia survives, because it is a member of the first as well as the second set.

For each member of the first set retrieve the first child:

SELECT
    [Measures].[Sales] ON 0
    , NON EMPTY GENERATE(
        {[Markets].[Country].[Australia], [Markets].[Country].[USA]}
        , [Markets].currentMember.firstChild
    ) ON 1
FROM [SteelWheelsSales]
State Province Sales
NSW 305,567
CA 1,505,542

Next we’d like to know the top performing country within each territory:

SELECT
    [Measures].[Sales] ON 0
    , NON EMPTY GENERATE(
        [Markets].[Territory].Members
        , TOPCOUNT([Markets].CurrentMember.Children, 1, [Measures].[Sales])
    ) ON 1
FROM [SteelWheelsSales]
Country Sales
Australia 630623
Spain 1215687
Japan 188168
USA 3627983

Taking the previous example a bit further, we can get the top performing country and customer combination based on sales for each territory:

SELECT
    [Measures].[Sales] ON 0
    , NON EMPTY GENERATE(
        [Markets].[Territory].Members
        , TOPCOUNT([Markets].CurrentMember.Children * [Customers].[Customer].Members, 1, [Measures].[Sales])
    ) ON 1
FROM [SteelWheelsSales]
Country Customer Sales
Australia Australian Collectors, Co. 200,995
Spain Euro+ Shopping Channel 912,294
Singapore Dragon Souveniers, Ltd. 172,990
USA Mini Gifts Distributors Ltd. 654,858

There is also an option to retain duplicates in the results using the ALL flag as the third argument:

SELECT
    [Measures].[Sales] ON 0
    , NON EMPTY GENERATE(
        {[Markets].[Country].[Australia], [Markets].[Country].[Australia]}
        , [Markets].currentMember.firstChild
        , ALL
    ) ON 1
FROM [SteelWheelsSales]
State Province Sales
NSW 305,567
NSW 305,567

Now let’s have a look at a completely different way we can use the GENERATE() function:

Let’s show the territories with all their countries next to each other in one line. We can define a delimiter as the third parameter of the GENERATE() function:

WITH 
MEMBER [Measures].[Countries] AS
    GENERATE(
            [Markets].CurrentMember.Children
            , [Markets].CurrentMember.Name
            , "-"
        )
SELECT
    {[Measures].[Countries]} ON 0
    , NON EMPTY [Markets].[Territory].Members ON 1
FROM [SteelWheelsSales]
Territory Countries
#null Germany-Ireland-Israel-Netherlands-Poland-Portugal-Russia-Singapore-South Africa-Spain-Switzerland
APAC Australia-New Zealand-Singapore
EMEA Austria-Belgium-Denmark-Finland-France-Germany-Ireland-Italy-Norway-Spain-Sweden-Switzerland-UK
Japan Hong Kong-Japan-Philippines-Singapore
NA Canada-USA

As mentioned earlier, this is just a brief intro the extremely useful GENERATE() function. I hope that the example gave you some ideas on how to use this function.

Install Pentaho BI Server 5 Enterprise Edition with PostgreSQL repository


Reblog from:

Install Pentaho BI Server 5 Enterprise Edition with PostgreSQL repository.

Pentaho provides different ways to install Pentaho BI server. Each method has its own flexibility in installation.
1. Installer mode – Easy to install BA & DI server & tools in one flow with default PostgreSQL repo & default Tomcat server. (New Postgres installed on port 5432.)
2. Archive mode – BA server installed with own BA repository & default Tomcat server. Necessary tools need to be installed manually.
3. Manual mode – BA server installed with own BA repository & own application server (Tomcat or JBoss). Necessary tools need to installed manually.

We have a Postgres instance running on our server and are good with Tomcat as application server so Archive mode of installation is suitable for us. Pentaho installation requires two things be installed before starting with Pentaho installation.

  • Java 7
  • PostgreSQL

Archive mode installation files can be accessible only to license purchased users. Download biserver-ee-5.x-dist.zip from Pentaho customer portal with account credentials here: https://support.pentaho.com/forums/20413716-Downloads

Unzip the archive file and you can see the installation files inside extracted directory.

1
2
3
$ unzip biserver-ee-5.x-dist.zip
$ cd biserver-ee-5.x;ls
install.bat  installer.jar  install.sh  license.txt  README.txt

In remote servers Pentaho can be installed on console mode with ‘-console’. Accept the license and provide the installation path to install Pentaho BI server.

1
$ java -jar installer.jar -console

Find biserver-ee directory under the installation path and set sh files to executable mode.

1
2
$ cd biserver-ee;
$ find . -type f -iname '*.sh' -exec chmod a+x {} \;

Let’s create repository databases by running queries in SQL files located at biserver-ee/data/postgresql.
quartz, hibernate and jackrabbit databases will be created by executing these SQL files. Database names, usernames and password can be changed by modifying in SQL files if required.

1
2
3
4
5
$ cd biserver-ee/data/postgresql
$ psql -U postgres -p 5432 -a -f create_jcr_postgresql.sql
$ psql -U postgres -p 5432 -a -f create_quartz_postgresql.sql
$ psql -U postgres -p 5432 -a -f create_repository_postgresql.sql
$ psql -U postgres -p 5432 -a -f pentaho_mart_postgresql.sql

Pentaho uses postgresql as default database and files are configured to use postgresql. So just verify the database_name, username and password to work with installed postgresql and databases created.

– biserver-ee/pentaho-solutions/system/quartz/quartz.properties

1
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate

– biserver-ee/pentaho-solutions/system/hibernate/hibernate-settings.xml

1
    <config-file>system/hibernate/postgresql.hibernate.cfg.xml</config-file>

– biserver-ee/pentaho-solutions/system/hibernate/postgresql.hibernate.cfg.xml

1
2
3
4
5
    <property name="connection.driver_class">org.postgresql.Driver</property>
    <property name="connection.url">jdbc:postgresql://localhost:5432/hibernate</property>
    <property name="dialect">org.hibernate.dialect.PostgreSQLDialect</property>
    <property name="connection.username">pentaho_user</property>
    <property name="connection.password">password</property>

There are more occurrences in this file. Carefully do the necessary changes in all the places.
– biserver-ee/pentaho-solutions/system/jackrabbit/repository.xml
– biserver-ee/pentaho-solutions/system/jackrabbit/repository/workspaces/default/workspace.xml

– biserver-ee/tomcat/webapps/pentaho/META-INF/context.xml

1
2
3
4
5
<Resource name="jdbc/PDI_Operations_Mart" auth="Container" type="javax.sql.DataSource"
            factory="org.apache.commons.dbcp.BasicDataSourceFactory" maxActive="20" maxIdle="5"
            maxWait="10000" username="pentaho_user" password="password"
            driverClassName="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/hibernate"
            validationQuery="select 1"/>
1
Download postgresql and h2 drivers then place it under biserver-ee/tomcat/lib

postgresql-9.x.jdbc4.jar
h2-1.2.x.jar

Change Tomcat port on these two files to run Pentaho on different port.
Specify the Pentaho solutions path, server URL and port in web.xml of Tomcat webapp.

biserver-ee/tomcat/webapps/pentaho/WEB-INF/web.xml

1
2
3
4
5
6
7
8
9
        <context-param>
                <param-name>solution-path</param-name>
                <param-value>$INSTALLATION_PATH/biserver-ee/pentaho-solutions</param-value>
        </context-param>
        <context-param>
                <param-name>fully-qualified-server-url</param-name>
                <param-value>http://localhost:8080/pentaho/</param-value>
        </context-param>

Pentaho can be configured to run on different ports by changing ports on Tomcat server.xml and web.xml
– biserver-ee/tomcat/biserver-ee/tomcat/server.xml

1
2
3
    <Connector URIEncoding"UTF-8" port"9090" protocol"HTTP/1.1"
               connectionTimeout"20000"
               redirectPort"8443" />
1
2
3
4
5
- biserver-ee/tomcat/webapps/pentaho/WEB-INF/web.xml
         <context-param>
                <param-name>fully-qualified-server-url</param-name>
                <param-value>http://localhost:9090/pentaho/</param-value>
        </context-param>

Install license

A license needs to be installed to use Pentaho. Navigate to the license-installer directory in installation path. Feed license file to install_license.sh, separating more than one license file with spaces.

1
$ ./install_license.sh install ../license/Pentaho\ BI\ Platform\ Enterprise\ Edition.lic

Install plugins
Archive mode of installation installs only BI Server. Necessary plugins can to be installed manually. Here install the plugins for reporting, analyzer and dashboard. Plugins are available at the same place where download BI server. Download these three files and place at any path on server
Reporting – pir-plugin-ee-5.x-dist.zip
Analyzer – pdd-plugin-ee-5.0.0.1-dist.zip
Dashboard – paz-plugin-ee-5.0.0.1-dist.zip

All the plugins installed through same procedure
– Unzip the plugins and navigate to extracted directory
– run installer on console, accept the license and provide $INSTALLTION_PATH/biserver-ee/pentaho-solutions/system as location to install plugins

1
$ java -jar installer.jar -console

Lets start the BI server

1
biserver-ee$ ./start-pentaho.sh

Install the licenses for the plugins by login as admin user (default – Admin:password) or install through the command line
Administration -> License -> install licenses for plugin by click +

Troubleshooting:

1
2
biserver-ee$ tail -f tomcat/logs/catalina.out
biserver-ee$ tail -f tomcat/logs/pentaho.log

If the pentaho.xml is present at tomcat/conf/Catalina directory, delete it. It will be generated again when you start the BA Server.

Start and stop the BI server :

1
2
biserver-ee$ ./start-pentaho.sh
biserver-ee$ ./stop-pentaho.sh

Playing with Jedox (BI + CPM)


 

Today I would like to introduce Jedox , this MOLAP tool is the perfect mate for Pentaho BI.

BI & CPM

Analyze. Plan. Act.

Jedox empowers you to analyze historical data for meaningful insight, and to plan and forecast intelligently. Jedox unleashes your capabilities in one solution, unifying Business Intelligence and Corporate Performance Management (BI & CPM) into your competitive edge.

Below I explain some examples of Jedox capabilities:

1-) If we start pasting a following view with 2014 Actual and Budget data by Region

jedox ini pasteview

2-) We could copy actual 2014 on empty 2015 Budget

copy actual;2014

2

 

2b

3-) Then we will increase value a 20%

##20%

3a3b

 

 

4-) Now we will predict 2016 based on 2014 and 2015 data

predict 2014:2015

4a4b

 

5-) Distribute a fixed amount based on a previous year

17M like 2014; Actual

5a5b

 

The Pentaho Journey – A Big Data Booster


Originally posted on Pentaho Business Analytics Blog:

Pentaho-Journey-ingographic-updated

Ten years ago we set out to commoditize BI, disrupt the existing old school proprietary vendors and give customers a better choice. It’s been an exciting journey building up a scalable analytic platform, building an open source community, surviving a deep recession, beating up the competition, building a great team, providing great products and services to our customers, and being a major player in the big data market.

Some of the key points along the way:

2008 – the recession hits and frankly as painful as that was it actually helped Pentaho as we were the best value proposition in BI and people were looking to reduce expenditures. It also drastically reduced the cost of commercial office space and we opened up our San Fran office in Q2 2009.

2009 – at a happy hour in Orlando in November we stumbled upon our ability to natively integrate Pentaho Data Integration…

View original 222 more words

Pentaho 5.3 – Taming messy and extreme data


anonymousbi:

#Future has arrived #Pentaho53

Originally posted on Pentaho Business Analytics Blog:

cloudpoints
There has definitely been an evolution of how the industry talks about data. About five years ago the term ‘Big Data’ emerged to define the volume aspect of Big Data. Soon after, the definition of Big Data expanded to a better one that explains what it really is; not just big, but data that moves extremely fast, often lacks structure, varies greatly from existing data, doesn’t fit well with more traditional database technologies, and frankly, is best described as “messy”.

Fast-forward to 2015 and Pentaho’s announcement of version 5.3 this week to deliver on demand big data analytics at scale on Amazon Web Services and Cloudera Impala. This release is driven by what we see in more and more of our customers – (a new data term for you) — EXTREME data problems! Our customer NASDAQ is a very interesting example of where traditional relational data systems have maxed out and have been replaced by cloud architectures that include Hadoop, Pentaho…

View original 285 more words

Packt $5 eBook Bonanza campaign #packt5dollar


Packt $5 eBook Bonanza – Every title, every topic

Hi friends, the $5 eBook Bonanza is here!

Treat yourself to the eBook or Video of your choice for just $5 and get as many as you like until January 6th 2015. To get you started, we’ve put together the Top 20 Titles of 2014 for you to pick up here. But don’t forget, you can get ANY eBook or Video for $5 in this offer.

For specific tech, you can browse all our categories at the bottom of this page.
5dollar-300px_0