Big Data Bucks the Global Economic Trend



As I approach the end of my first year as Pentaho’s CEO, I’ve been reflecting on two things: one, the exceptional opportunity that big data analytics presents to individual companies and the global economy, and two, my good fortune in having joined such a courageous and visionary team of people. As you start planning for 2013, I wanted to take a few minutes to share some of my reflections.

Let me start with the opportunity. Many Western economies that we do business in have struggled this year, from 25 percent unemployment levels in Spain to double-dip recession in the UK to disappointing job growth in the US. And yet in the past year at Pentaho, we achieved record growth overall, with the last quarter being our best ever for big data sales.

Economists stress the urgent need for growth, but as many countries have learned the hard way, growth doesn’t arise simply from cost-cutting or producing more of the same, but from intelligently creating new demand.

And that’s precisely where big data analytics comes in.  Big data analytics, with its potential to tap into the staggering wealth of online and corporate data, is empowering companies to identify new revenue opportunities and use precious resources more profitably and more sustainably.

Our customer stories prove that this opportunity is no pipe dream – whether it’s German-based developer Travian Games, which uses Pentaho to analyze usage patterns from 120 million gamers to innovate its products, or Shareable Ink, an enterprise cloud application provider that analyzes and presents critical, document-based information to health care professionals in order to improve patient care.  These companies are outmaneuvering and outpacing their competitors in very crowded, saturated markets.  And new use cases for big data analytics are emerging all the time, like clickstream analysis in online retailing and device analytics in IT departments.

One of the biggest revelations for me this year has been hearing prospective customers tell me how hard it is to work with big data technology.  With amazing foresight, our product development team decided several years ago to take ‘the road less travelled’ in business intelligence and prioritized solving the most complex aspects of data integration for customers.  This laid the foundation for our early leadership in big data analytics.  Kenneth Wrife from our Swedish partner Omicron summed it up best when he said, “Data visualization is of course necessary, but also something of a commodity.  The hardest part of data integration is extracting data from a variety of different types of sources and assimilating that data so that it is ready to be analyzed.  That’s what Pentaho does better than any other vendor.”

However, don’t imagine for a second that we’re resting on our laurels.  In order for big data analytics to ‘cross the chasm’ into mainstream adoption, we aim to make it much easier for ordinary business users, data analysts and data scientists to work with.  It also needs to be more accessible to the growing range of SaaS-deployed applications on mobile devices, especially tablets.  Without giving the game away, I can promise you that you will see some very exciting new developments unfolding in these areas in the fourth quarter.

We’d love to hear from you if you are defying the global economic slump and using big data analytics to identify new sources of value for your company.  We hope you like what you see in Q4 and, as ever, thanks for your continued support.


Pentaho Data Integration: Remote execution with Carte


Requirements

  • Software: PDI/Kettle 4.3.0, installed on your PC and on a server
  • Knowledge: Intermediate (to follow this tutorial you should have a good working knowledge of the software, so not every single step is described)

Carte is an often overlooked small web server that comes with Pentaho Data Integration/Kettle. It allows remote execution of transformations and jobs. It even allows you to create static and dynamic clusters, so that you can easily run your power-hungry transformations or jobs on multiple servers. In this session you will get a brief introduction on how to work with Carte.

Now let’s get started: SSH to the server where Kettle is running (this assumes you have already installed Kettle there).

Encrypt password

Carte requires a user name and password. It’s good practice to encrypt this password. Thankfully Kettle already comes with an encryption utility.
In the PDI/data-integration/ directory run:
sh encr.sh -carte yourpassword
The utility prints the obfuscated password, for example:
OBF:1mpsdfsg323fssmmww3352gsdf7

Open pwd/kettle.pwd and paste the encrypted password after “cluster: ”:

vi ./pwd/kettle.pwd
# Please note that the default password (cluster) is obfuscated using the Encr script provided in this release
# Passwords can also be entered in plain text as before
#
cluster: OBF:1mpsdfsg323fssmmww3352gsdf7

Please note that “cluster” is the default user name.
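
As far as I am aware, you are not limited to the default account: you can add further users as additional lines in the same user: password format (please verify this against the documentation of your Kettle version). A purely hypothetical example, reusing the obfuscated string from above:
# hypothetical additional Carte user; use the OBF string that encr.sh printed for that user's password
reportuser: OBF:1mpsdfsg323fssmmww3352gsdf7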

Start carte.sh
Make sure first that the port you will use is available and open.

In the simplest form you start carte with just one slave that resides on the same instance:

nohup sh carte.sh localhost 8181 > carte.err.log &
After this, press CTRL+C to get your shell prompt back; the backgrounded Carte process keeps running.
To see if it started:
tail -f carte.err.log
Although it is outside the scope of this session, I will give you a brief idea of how to set up a cluster: if you want to run a cluster, you have to create a configuration XML file. Examples can be found in the pwd directory. Open one of these XML files, amend it to your needs and then issue the following command:

sh carte.sh ./pwd/carte-config-8181.xml >> ./pwd/err.log
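
To give you a rough idea of what such a configuration file looks like, here is a minimal, untested sketch of a single master definition (the name master1 is just a placeholder; please compare the element names against the carte-config-*.xml samples shipped in the pwd directory before relying on this):
<slave_config>
  <!-- minimal sketch: one Carte instance acting as master on port 8181 -->
  <slaveserver>
    <name>master1</name>
    <hostname>localhost</hostname>
    <port>8181</port>
    <master>Y</master>
  </slaveserver>
</slave_config>
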
Check if the server is running

Issue the following commands:

[root@ip-11-111-11-111 data-integration]# ifconfig
eth0      Link encap:Ethernet  HWaddr …
          inet addr:11.111.11.111  Bcast:
[… details omitted …]
[root@ip-11-111-11-111 data-integration]# wget http://cluster:yourpassword@11.111.11.111:8181
--2011-01-31 13:53:02--  http://cluster:*password*@11.111.11.111:8181/
Connecting to 11.111.11.111:8181... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Reusing existing connection to 11.111.11.111:8181.
HTTP request sent, awaiting response... 200 OK
Length: 158 [text/html]
Saving to: `index.html'
100%[======================================>] 158         --.-K/s   in 0s
2011-01-31 13:53:02 (9.57 MB/s) - `index.html' saved [158/158]

If you get output like the above, the web server answered the request, hence Carte is running.

With the wget command you have to pass in:
  • the user name (cluster)
  • the password (yourpassword)
  • the IP address (11.111.11.111)
  • the port number (8181)
Or you can install lynx:
[root@ip-11-111-11-111 data-integration]# yum install lynx
[root@ip-11-111-11-111 data-integration]# lynx http://cluster:yourpassword@11.111.11.111:8181
Lynx will ask you for the user name and password and then show a simple text representation of the website: not much more than a nearly empty status page.
                                                            Kettle slave server
Slave server menu
   Show status
Commands: Use arrow keys to move, '?' for help, 'q' to quit, '<-' to go back.
  Arrow keys: Up and Down to move.  Right to follow a link; Left to go back.
 H)elp O)ptions P)rint G)o M)ain screen Q)uit /=search [delete]=history list

You can also just type the URL (e.g. http://11.111.11.111:8181) into your local web browser: you will be asked for the user name and password and then you should see the same extremely basic page.
Define slave server in Kettle

  1. Open Kettle, open a transformation or job
  2. Click on the View panel
  3. Right click on Slave server and select New.
Specify all the details and click OK. In the tree view, right click on the slave server you just set up and choose Monitor. Kettle will now display the running transformations and jobs in a new tab.
Your transformations can only use the slave server if you specify it in the Execute a transformation dialog.
For jobs you have to specify the remote slave server in each job entry dialog.
If you want to set up a cluster schema, define the slaves first, then right click on Kettle cluster schemas. Define a Schema Name and the other details, then click on Select slave servers. Specify the servers that you want to work with and define one as the master. A full description of this process is outside the scope of this article. For further info, the “Pentaho Kettle Solutions” book will give you a detailed overview.
For me a convenient way to debug a remote execution is to open a terminal window, ssh to the remote server and tail -f carte.err.log. You can follow the error log in Spoon as well, but you’ll have to refresh it manually all the time.
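
Another quick check I find useful (a sketch that reuses the user name, password, IP address and port from the examples above): if I remember correctly, Carte serves its status page under /kettle/status, so you can poll the slave server from any terminal without opening Spoon:
# fetch the Carte status page from the command line (credentials, IP and port as configured earlier)
wget -qO- http://cluster:yourpassword@11.111.11.111:8181/kettle/status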

First Impressions are Important!


My mother always told me that first impressions mean everything. I believe the same holds true for companies. I could talk to you all day about the value and benefits of our technology, services and company standards, but at the end of the day you probably only have a few minutes to explore what Pentaho is all about.

We are very excited to unveil a new and improved version of the Pentaho product “Test Drive.”  Take a look at our “Try Pentaho Now” page to:

  • Get to know Pentaho in under 2 minutes – watch the Pentaho Business Analytics video
  • Test drive Pentaho through a new hands-on, download-free product experience with tutorials, videos and samples to “read, watch and do.”

We think you will leave impressed with Pentaho!

Let us know what you think….

(Post by Donna Prlich
Director, Product Marketing
Pentaho)