Pivotal Greenplum useful DBA SQL queries


1. Viewing table data distribution across segment servers

To view the data distribution of a table’s rows (the number of rows on each segment), you can run a query such as:

SELECT gp_segment_id, count(*)
FROM abc
GROUP BY gp_segment_id;
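To quantify skew rather than eyeballing the counts, you can compare the largest segment's row count against the average per segment. This is a minimal sketch using the same example table abc; if gp_toolkit is installed, its skew views (e.g. gp_toolkit.gp_skew_coefficients) offer a built-in alternative.

```sql
-- Rough skew check: ratio of the largest segment's row count
-- to the average row count per segment (1.0 = perfectly even)
SELECT max(cnt)::float / avg(cnt) AS skew_ratio
FROM (
    SELECT gp_segment_id, count(*) AS cnt
    FROM abc
    GROUP BY gp_segment_id
) AS per_segment;
```

A ratio well above 1.0 suggests the distribution key concentrates rows on a few segments.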

To see the data distribution of a table’s rows on segment servers and the directory location of where the data is located, you can run a query such as:

SELECT
table_name.gp_segment_id,
hostname,
fselocation as datadir,
count(*)
FROM table_name,
pg_filespace_entry pgfse,
gp_segment_configuration gsc
WHERE
gsc.dbid=pgfse.fsedbid
and table_name.gp_segment_id= pgfse.fsedbid
GROUP BY
table_name.gp_segment_id,hostname,datadir
ORDER BY
gp_segment_id;

Balanced Distribution: A table is considered to have a balanced distribution if all of the segments have roughly the same number of rows.

2. Check the size of a table

Table level:

select
 pg_size_pretty(pg_relation_size('schema.tablename'));

Replace schema.tablename with your table.

Table and index (total relation size):

select
pg_size_pretty(pg_total_relation_size('schema.tablename'));

Replace schema.tablename with your table.

3. Check the schema size

Schema level:

psql> select schemaname ,
round(
sum(pg_total_relation_size(schemaname||'.'||tablename))
/1024/1024) "Size_MB"
from pg_tables where schemaname='SCHEMANAME' group by 1;

Replace SCHEMANAME with your schema name.

4. Check the database size

To see the size of a specific database:

psql> select
pg_size_pretty(pg_database_size('DATABASE_NAME'));

Example:
gpdb=# select pg_size_pretty(pg_database_size('gpdb'));

pg_size_pretty
----------------
24 MB
(1 row)

To see all database sizes:

psql> select
datname, pg_size_pretty(pg_database_size(datname))
from pg_database;

Alternatively, using gp_toolkit:

psql> select
sodddatname,
(sodddatsize/1073741824) AS sizeinGB
from gp_toolkit.gp_size_of_database;
 

6. Check partitioned table size including indexes and partitions

The following SQL gives you the employee_daily table size, including all of its partitions:

select
schemaname,
tablename,
round(
sum(pg_total_relation_size(schemaname || '.' || partitiontablename))
/1024/1024) "MB"
from pg_partitions
where tablename='employee_daily'
group by 1,2;

schemaname | tablename | MB
-----------+----------------+-----
public | employee_daily | 254

7. Generate scripts to grant DML privileges to another user or role

david=>
select
'grant select,insert,update, delete on '||schemaname||'.'||tablename||' to gpadmin'
from pg_tables
where tableowner='david';

?column?
------------------------------------------------------------------------------
grant select,insert,update, delete on david.countries to gpadmin
grant select,insert,update, delete on david.departments to gpadmin
grant select,insert,update, delete on david.employees to gpadmin
grant select,insert,update, delete on david.job_history to gpadmin
grant select,insert,update, delete on david.jobs to gpadmin
grant select,insert,update, delete on david.locations to gpadmin
grant select,insert,update, delete on david.regions to gpadmin
grant select,insert,update, delete on david.ext_countries_error to gpadmin
grant select,insert,update, delete on david.ext_countries to gpadmin
grant select,insert,update, delete on david.ext_departments_error to gpadmin
grant select,insert,update, delete on david.ext_departments to gpadmin
grant select,insert,update, delete on david.ext_employees_error to gpadmin
grant select,insert,update, delete on david.ext_employees to gpadmin
grant select,insert,update, delete on david.ext_job_history_error to gpadmin
grant select,insert,update, delete on david.ext_job_history to gpadmin
grant select,insert,update, delete on david.ext_jobs_error to gpadmin
grant select,insert,update, delete on david.ext_jobs to gpadmin
grant select,insert,update, delete on david.ext_locations_error to gpadmin
grant select,insert,update, delete on david.ext_locations to gpadmin
grant select,insert,update, delete on david.ext_regions_error to gpadmin
grant select,insert,update, delete on david.ext_regions to gpadmin
(21 rows)
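To actually execute the generated GRANT statements, one approach that works even on older psql versions is to spool the output to a file and run the file. The file path here is just an example, and note the trailing semicolon appended inside the string so each generated line is executable:

```sql
-- Spool the generated statements to a file, then run the file
\t on
\o /tmp/grants.sql
select 'grant select,insert,update,delete on '||schemaname||'.'||tablename||' to gpadmin;'
from pg_tables
where tableowner='david';
\o
\t off
\i /tmp/grants.sql
```

\t on suppresses headers and row counts so the spooled file contains only the statements.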
-- Count
8. select count(*) from pg_tables where schemaname='david'; 

 -- List users
9. select rolname from pg_roles where rolcanlogin is TRUE;

-- List roles
10. select rolname from pg_roles where rolcanlogin is FALSE;  

11. select * from pg_views ;
12. select * from information_schema.table_privileges;
13. select * from pg_user;
14. select * from pg_views where viewname like '%priv%';
15. select * from pg_views where viewname like '%role%';
16. select * from pg_views where viewname like '%gran%';
17. select * from pg_views where viewname like '%part%';
18. select * from pg_views where viewname like '%schema%';
19. select * from pg_views where viewname like '%stat%';
20. select * from information_schema.role_table_grants;
21. select * from information_schema.role_routine_grants;
22. select * from information_schema.applicable_roles;
23. select * from information_schema.enabled_roles;
24. select * from gp_toolkit.gp_roles_assigned;

25. Find dependent views created on any table

The following query will give you the dependent views for a table:

# select
a.*, b.definition
from information_schema.view_table_usage a,
pg_views b
where a.table_name='test'
and a.view_name=b.viewname;

When you alter an existing table (for example dropping a column), you may want to determine if there are any views that depend on the columns you are changing.

Example:
\dt test
List of relations
Schema | Name | Type | Owner | Storage
--------+------+-------+------------+---------
public | test | table | gpadmin | heap

create view test_view as (select * from test);

-- Run the following query, replacing 'test'
-- in "a.table_name='test'" with your table

select
a.*,b.definition
from
information_schema.view_table_usage a,
pg_views b
where a.table_name='test'
and
a.view_name=b.viewname;

view_catalog | view_schema | view_name | table_catalog |
table_schema | table_name | definition
-------------+------------+-----------+-----------+--------------
+------------+--------------------
ddata | public | test_view | ddata |
public| test | SELECT test.a FROM test;

26. Connections per database and activity:

SELECT usename,
 application_name, -- only valid for 9.x
 client_addr,
 backend_start,
 query_start,
 current_query
FROM pg_stat_activity
order by usename, client_addr;

26b. Running queries and their age:

SELECT age(query_start, backend_start) AS queryage,*
 FROM pg_stat_activity 
WHERE current_query NOT LIKE '%IDLE%'
 ORDER BY queryage DESC;
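Once a runaway query is identified with the statement above, it can be cancelled from a superuser session. This is a sketch; note that in older Greenplum/PostgreSQL versions the pg_stat_activity PID column is named procpid (newer versions use pid), so adjust accordingly:

```sql
-- Cancel the longest-running non-idle query (superuser only);
-- pg_cancel_backend() stops the query but keeps the session alive
SELECT pg_cancel_backend(procpid)
FROM pg_stat_activity
WHERE current_query NOT LIKE '%IDLE%'
ORDER BY age(query_start, backend_start) DESC
LIMIT 1;
```

Use pg_terminate_backend() instead only if the whole session must be dropped.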

27. List of users:

select * from pg_user;

28. Connections per database:

SELECT datname, numbackends 
FROM pg_stat_database
WHERE numbackends > 0
ORDER BY numbackends DESC, datname;

29. Connections per database (v2):

SELECT datname, COUNT(*) AS numbackends
FROM pg_stat_activity 
GROUP BY datname HAVING COUNT(*) > 0;

30. List procedures by user:

select * from pg_proc, pg_user
 where pg_user.usesysid = pg_proc.proowner
 order by proname;

31. Top 20 tables by size:

SELECT nspname || '.' || relname AS table,
 pg_size_pretty(pg_total_relation_size(c.oid)) AS size
FROM pg_class c
LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema') AND
 c.relkind <> 'i' AND
 n.nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;

Oracle: convert a string field with a comma-separated list of elements into a set of rows


I will show one tricky way of creating a subquery that builds a set of rows from a string field containing a list of values separated by commas.

Take the example of a string field with the content ‘A,B,C,D’. Using REGEXP_SUBSTR you can extract one of the 4 matches (A, B, C, D): the regex [^,]+ matches any character sequence in the string which does not contain a comma.

If you run:

SELECT REGEXP_SUBSTR ('A,B,C,D','[^,]+') as set_of_rows
FROM   DUAL

you’ll get A.

and if you try running:

SELECT REGEXP_SUBSTR ('A,B,C,D','[^,]+',1,1) as set_of_rows
FROM   DUAL

you’ll also get A, only now we also passed two additional parameters: start looking at position 1 (which is the default), and return the 1st occurrence.

Now let’s run:

SELECT REGEXP_SUBSTR ('A,B,C,D','[^,]+',1,2) as set_of_rows
FROM   DUAL

this time we’ll get B (the 2nd occurrence), and using 3 as the last parameter will return C, and so on.

Using CONNECT BY together with the LEVEL pseudo-column makes sure you’ll receive all the relevant results (though not necessarily in the original order):

SELECT DISTINCT REGEXP_SUBSTR ('A,B,C,D','[^,]+',1,LEVEL) as set_of_rows
FROM   DUAL
CONNECT BY REGEXP_SUBSTR ('A,B,C,D','[^,]+',1,LEVEL) IS NOT NULL
order by 1

will return:

set_of_rows
A
B
C
D

which not only contains all 4 results, but also breaks them into separate rows in the result set, which makes it useful inside an IN() SQL clause.

This query “abuses” the CONNECT BY functionality to generate rows in a query on DUAL. As long as the expression passed to CONNECT BY is true, it generates a new row and increases the value of the pseudo-column LEVEL. LEVEL is then passed to REGEXP_SUBSTR to get the nth value when applying the regular expression.
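The DUAL trick above generates rows for a single literal. To split an actual column across many table rows, a common Oracle 11g-style variant combines REGEXP_COUNT with a correlated CONNECT BY; the table t and columns id and csv_col here are hypothetical names:

```sql
-- Split t.csv_col ('A,B,C,...') into one row per element.
-- PRIOR SYS_GUID() prevents CONNECT BY from cycling across rows.
SELECT t.id,
       REGEXP_SUBSTR(t.csv_col, '[^,]+', 1, LEVEL) AS element
FROM   t
CONNECT BY LEVEL <= REGEXP_COUNT(t.csv_col, '[^,]+')
       AND PRIOR t.id = t.id
       AND PRIOR SYS_GUID() IS NOT NULL;
```

The PRIOR t.id = t.id condition keeps each row's expansion independent of the others.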

Removing Special Characters from a string field in Oracle


Today, during consultancy work, I ran into an issue loading a table from Oracle into PostgreSQL. When I checked the logs I saw that some Oracle varchar fields had strange characters at the end, which caused the INSERT statements to fail. Initially I tried using the Pentaho Data Integration replace-values-in-string step to replace CR, LF and CRLF, since they looked like carriage returns when I copied the log files into Notepad++. All attempts were unsuccessful, so I decided to look for an Oracle function and soon found a proper solution.

REGEXP_REPLACE helped me, as you can see in the query below:

SELECT
REGEXP_REPLACE(customer_description, '[^[:alnum:]'' '']', NULL)
FROM dim_customer;

 

Brief Explanation

The [[:alnum:]] character class represents alphabetic and numeric characters, the same as using [a-zA-Z0-9] in a regular expression. Since the negated class also lists an escaped single quote and a space, the pattern removes every character that is not alphanumeric, an apostrophe or a space.
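If you only want to strip non-printable characters (the carriage returns and line feeds that usually cause these load failures) while keeping punctuation, the [:cntrl:] class is a more surgical alternative; dim_customer and customer_description are the same example names as above:

```sql
-- Remove only control characters (CR, LF, tabs, etc.)
SELECT REGEXP_REPLACE(customer_description, '[[:cntrl:]]', NULL)
FROM   dim_customer;
```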

 

Hope you have enjoyed 🙂

Increase MySQL output to 80K rows/second in Pentaho Data Integration


One of our clients has a MySQL table with around 40M records. Loading the table took around 2.5 hours. Watching the statistics of the transformation, I noticed that the bottleneck was the write to the database: I was stuck at around 2,000 rows/second. You can imagine that it will take a long time to write 40M records at that speed.
I was looking at ways to improve the speed. There were a couple of options:
  1. Tune MySQL for better performance on inserts
  2. Use the MySQL bulk loader step in PDI
  3. Write SQL statements to a file with PDI and read them with the mysql binary
When I discussed this with one of my contacts at Basis06, he mentioned they had faced a similar issue a while ago, and that speed can be boosted with some simple JDBC connection settings:

useServerPrepStmts=false
rewriteBatchedStatements=true
useCompression=true

These options should be entered in PDI on the connection: double-click the connection, go to Options and set these values.

Used together, useServerPrepStmts=false and rewriteBatchedStatements=true will “fake” batch inserts on the client. Specifically, the insert statements:

INSERT INTO t (c1,c2) VALUES ('One',1);
INSERT INTO t (c1,c2) VALUES ('Two',2);
INSERT INTO t (c1,c2) VALUES ('Three',3);

will be rewritten into:

INSERT INTO t (c1,c2) VALUES ('One',1),('Two',2),('Three',3);

The third option useCompression=true compresses the traffic between the client and the MySQL server.

Finally, I increased the number of copies of the output step to 2, so that two threads insert into the database.

All of this together increased the speed to around 84,000 rows a second! WOW!

Pentaho BI Server 5.0.1CE MySQL installation guide


How to install Pentaho BI Server 5.0.1 Community Edition with MySQL 5.x on Windows

Hi all, this is David Fombella, and I would like to share a way to deploy the latest Pentaho Community edition, installing its security into a MySQL database server.

Help me keep the guides up to date and the posts flowing by donating, every small amount of money helps!


Here is the download link for biserver 5.0.1 CE:

http://sourceforge.net/projects/pentaho/files/Business%20Intelligence%20Server/5.0.1-stable/biserver-ce-5.0.1-stable.zip/download

Creation of Databases & Users

In the first stage we are going to create the hibernate database and the user hibuser. To achieve this, execute the SQL script create_repository_mysql.sql included at biserver-ce\data\mysql5.

SQL code

CREATE DATABASE IF NOT EXISTS `hibernate` DEFAULT CHARACTER SET latin1;
USE hibernate;
GRANT ALL ON hibernate.* TO 'hibuser'@'localhost' identified by 'password';
commit;

Next we create the quartz database and the user pentaho_user. You only need to execute the SQL script create_quartz_mysql.sql included at biserver-ce\data\mysql5:

CREATE DATABASE IF NOT EXISTS `quartz` DEFAULT CHARACTER SET latin1;

grant all on quartz.* to 'pentaho_user'@'localhost' identified by 'password';

USE `quartz`;

DROP TABLE IF EXISTS QRTZ5_JOB_LISTENERS;
DROP TABLE IF EXISTS QRTZ5_TRIGGER_LISTENERS;
DROP TABLE IF EXISTS QRTZ5_FIRED_TRIGGERS;
DROP TABLE IF EXISTS QRTZ5_PAUSED_TRIGGER_GRPS;
DROP TABLE IF EXISTS QRTZ5_SCHEDULER_STATE;
DROP TABLE IF EXISTS QRTZ5_LOCKS;
DROP TABLE IF EXISTS QRTZ5_SIMPLE_TRIGGERS;
DROP TABLE IF EXISTS QRTZ5_CRON_TRIGGERS;
DROP TABLE IF EXISTS QRTZ5_BLOB_TRIGGERS;
DROP TABLE IF EXISTS QRTZ5_TRIGGERS;
DROP TABLE IF EXISTS QRTZ5_JOB_DETAILS;
DROP TABLE IF EXISTS QRTZ5_CALENDARS;

CREATE TABLE QRTZ5_JOB_DETAILS
  (
    JOB_NAME  VARCHAR(200) NOT NULL,
    JOB_GROUP VARCHAR(200) NOT NULL,
    DESCRIPTION VARCHAR(250) NULL,
    JOB_CLASS_NAME   VARCHAR(250) NOT NULL,
    IS_DURABLE VARCHAR(1) NOT NULL,
    IS_VOLATILE VARCHAR(1) NOT NULL,
    IS_STATEFUL VARCHAR(1) NOT NULL,
    REQUESTS_RECOVERY VARCHAR(1) NOT NULL,
    JOB_DATA BLOB NULL,
    PRIMARY KEY (JOB_NAME,JOB_GROUP)
);

CREATE TABLE QRTZ5_JOB_LISTENERS
  (
    JOB_NAME  VARCHAR(200) NOT NULL,
    JOB_GROUP VARCHAR(200) NOT NULL,
    JOB_LISTENER VARCHAR(200) NOT NULL,
    PRIMARY KEY (JOB_NAME,JOB_GROUP,JOB_LISTENER),
    FOREIGN KEY (JOB_NAME,JOB_GROUP)
        REFERENCES QRTZ5_JOB_DETAILS(JOB_NAME,JOB_GROUP)
);

CREATE TABLE QRTZ5_TRIGGERS
  (
    TRIGGER_NAME VARCHAR(200) NOT NULL,
    TRIGGER_GROUP VARCHAR(200) NOT NULL,
    JOB_NAME  VARCHAR(200) NOT NULL,
    JOB_GROUP VARCHAR(200) NOT NULL,
    IS_VOLATILE VARCHAR(1) NOT NULL,
    DESCRIPTION VARCHAR(250) NULL,
    NEXT_FIRE_TIME BIGINT(13) NULL,
    PREV_FIRE_TIME BIGINT(13) NULL,
    PRIORITY INTEGER NULL,
    TRIGGER_STATE VARCHAR(16) NOT NULL,
    TRIGGER_TYPE VARCHAR(8) NOT NULL,
    START_TIME BIGINT(13) NOT NULL,
    END_TIME BIGINT(13) NULL,
    CALENDAR_NAME VARCHAR(200) NULL,
    MISFIRE_INSTR SMALLINT(2) NULL,
    JOB_DATA BLOB NULL,
    PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
    FOREIGN KEY (JOB_NAME,JOB_GROUP)
        REFERENCES QRTZ5_JOB_DETAILS(JOB_NAME,JOB_GROUP)
);

CREATE TABLE QRTZ5_SIMPLE_TRIGGERS
  (
    TRIGGER_NAME VARCHAR(200) NOT NULL,
    TRIGGER_GROUP VARCHAR(200) NOT NULL,
    REPEAT_COUNT BIGINT(7) NOT NULL,
    REPEAT_INTERVAL BIGINT(12) NOT NULL,
    TIMES_TRIGGERED BIGINT(10) NOT NULL,
    PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
    FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
        REFERENCES QRTZ5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE QRTZ5_CRON_TRIGGERS
  (
    TRIGGER_NAME VARCHAR(200) NOT NULL,
    TRIGGER_GROUP VARCHAR(200) NOT NULL,
    CRON_EXPRESSION VARCHAR(200) NOT NULL,
    TIME_ZONE_ID VARCHAR(80),
    PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
    FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
        REFERENCES QRTZ5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE QRTZ5_BLOB_TRIGGERS
  (
    TRIGGER_NAME VARCHAR(200) NOT NULL,
    TRIGGER_GROUP VARCHAR(200) NOT NULL,
    BLOB_DATA BLOB NULL,
    PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
    FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
        REFERENCES QRTZ5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE QRTZ5_TRIGGER_LISTENERS
  (
    TRIGGER_NAME  VARCHAR(200) NOT NULL,
    TRIGGER_GROUP VARCHAR(200) NOT NULL,
    TRIGGER_LISTENER VARCHAR(200) NOT NULL,
    PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP,TRIGGER_LISTENER),
    FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
        REFERENCES QRTZ5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE QRTZ5_CALENDARS
  (
    CALENDAR_NAME  VARCHAR(200) NOT NULL,
    CALENDAR BLOB NOT NULL,
    PRIMARY KEY (CALENDAR_NAME)
);

CREATE TABLE QRTZ5_PAUSED_TRIGGER_GRPS
  (
    TRIGGER_GROUP  VARCHAR(200) NOT NULL,
    PRIMARY KEY (TRIGGER_GROUP)
);

CREATE TABLE QRTZ5_FIRED_TRIGGERS
  (
    ENTRY_ID VARCHAR(95) NOT NULL,
    TRIGGER_NAME VARCHAR(200) NOT NULL,
    TRIGGER_GROUP VARCHAR(200) NOT NULL,
    IS_VOLATILE VARCHAR(1) NOT NULL,
    INSTANCE_NAME VARCHAR(200) NOT NULL,
    FIRED_TIME BIGINT(13) NOT NULL,
    PRIORITY INTEGER NOT NULL,
    STATE VARCHAR(16) NOT NULL,
    JOB_NAME VARCHAR(200) NULL,
    JOB_GROUP VARCHAR(200) NULL,
    IS_STATEFUL VARCHAR(1) NULL,
    REQUESTS_RECOVERY VARCHAR(1) NULL,
    PRIMARY KEY (ENTRY_ID)
);

CREATE TABLE QRTZ5_SCHEDULER_STATE
  (
    INSTANCE_NAME VARCHAR(200) NOT NULL,
    LAST_CHECKIN_TIME BIGINT(13) NOT NULL,
    CHECKIN_INTERVAL BIGINT(13) NOT NULL,
    PRIMARY KEY (INSTANCE_NAME)
);

CREATE TABLE QRTZ5_LOCKS
  (
    LOCK_NAME  VARCHAR(40) NOT NULL,
    PRIMARY KEY (LOCK_NAME)
);

INSERT INTO QRTZ5_LOCKS values('TRIGGER_ACCESS');
INSERT INTO QRTZ5_LOCKS values('JOB_ACCESS');
INSERT INTO QRTZ5_LOCKS values('CALENDAR_ACCESS');
INSERT INTO QRTZ5_LOCKS values('STATE_ACCESS');
INSERT INTO QRTZ5_LOCKS values('MISFIRE_ACCESS');
commit;

Finally we create the jackrabbit database and the user jcr_user. You only need to execute the SQL script create_jcr_mysql.sql included at biserver-ce\data\mysql5:

CREATE DATABASE IF NOT EXISTS `jackrabbit` DEFAULT CHARACTER SET latin1;
grant all on jackrabbit.* to 'jcr_user'@'localhost' identified by 'password';
commit;

Configuring JDBC Security

This section describes how to configure Pentaho BI Platform JDBC security to use a MySQL server; the platform will then point to the hibernate database on the MySQL server instead of the packaged HSQL in-memory database.

1. applicationContext-spring-security-hibernate.properties.

Edit the file pentaho-solutions\system\applicationContext-spring-security-hibernate.properties.

Original code

jdbc.driver=org.hsqldb.jdbcDriver
jdbc.url=jdbc:hsqldb:hsql://localhost:9001/hibernate
jdbc.username=hibuser
jdbc.password=password
hibernate.dialect=org.hibernate.dialect.HSQLDialect
 

Change it to the snippet below:

jdbc.driver=com.mysql.jdbc.Driver
jdbc.url=jdbc:mysql://localhost:3306/hibernate
jdbc.username=hibuser
jdbc.password=password
hibernate.dialect=org.hibernate.dialect.MySQLDialect
 

    2. hibernate-settings.xml

Edit the file pentaho-solutions\system\hibernate\hibernate-settings.xml.

Original code

<config-file>system/hibernate/hsql.hibernate.cfg.xml</config-file>
 

Change it to the snippet below:

<config-file>system/hibernate/mysql5.hibernate.cfg.xml</config-file>
 

3. mysql5.hibernate.cfg.xml

Edit the file pentaho-solutions\system\hibernate\mysql5.hibernate.cfg.xml .

You do not need to make any changes to this file if you use the default user hibuser. However, if you would like to specify a custom user, change the connection.username and connection.password properties.

Original code

<property name="connection.driver_class">com.mysql.jdbc.Driver</property>
<property name="connection.url">jdbc:mysql://localhost:3306/hibernate</property>
<property name="dialect">org.hibernate.dialect.MySQL5InnoDBDialect</property>
<property name="connection.username">hibuser</property>
<property name="connection.password">password</property>
 

4. quartz.properties

Edit the file pentaho-solutions\system\quartz\quartz.properties .

Original code

org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
 

Change it to the snippet below:

org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
 

Configuring Hibernate and Quartz

Hibernate and Quartz need to use the hibernate and quartz databases that were created on the MySQL server. To do so, modify the context.xml file.

5. context.xml

Edit the file tomcat\webapps\pentaho\META-INF\context.xml.

Remember to delete tomcat\conf\Catalina\localhost\pentaho.xml; on startup, Pentaho creates pentaho.xml as a copy of context.xml.

Original code

<Resource name="jdbc/Hibernate" auth="Container" type="javax.sql.DataSource"
factory="org.apache.commons.dbcp.BasicDataSourceFactory" maxActive="20" maxIdle="5"
maxWait="10000" username="hibuser" password="password"
driverClassName="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:hsql://localhost/hibernate"
validationQuery="select count(*) from INFORMATION_SCHEMA.SYSTEM_SEQUENCES" />

<Resource name="jdbc/Quartz" auth="Container" type="javax.sql.DataSource"
factory="org.apache.commons.dbcp.BasicDataSourceFactory" maxActive="20" maxIdle="5"
maxWait="10000" username="pentaho_user" password="password"
driverClassName="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:hsql://localhost/quartz"
validationQuery="select count(*) from INFORMATION_SCHEMA.SYSTEM_SEQUENCES"/>
 

Change it to the snippet below:

<Resource name="jdbc/Hibernate" auth="Container" type="javax.sql.DataSource"
factory="org.apache.commons.dbcp.BasicDataSourceFactory" maxActive="20" maxIdle="5"
maxWait="10000" username="hibuser" password="password"
driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/hibernate"
validationQuery="select 1" />

<Resource name="jdbc/Quartz" auth="Container" type="javax.sql.DataSource"
factory="org.apache.commons.dbcp.BasicDataSourceFactory" maxActive="20" maxIdle="5"
maxWait="10000" username="pentaho_user" password="password"
driverClassName="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost:3306/quartz"
validationQuery="select 1"/>
 

6. repository.xml

Edit the file pentaho-solutions\system\jackrabbit\repository.xml.

One of the new features of Pentaho 5 is the use of the Jackrabbit content repository.

http://jackrabbit.apache.org/repository-server-howto.html


Comment the original code in the FileSystem part

<!--
Replace the following "FileSystem" XML node to use supported databases as
the repository file system.  Change the url, user, password and other parameters
to suit your db installation.  The schemaObjectPrefix should
be a unique prefix that will be prepended to the table names.
NOTE: The database must be pre-created in and match the parameters.  See Jackrabbit
documentation for further explanation.
-->
<FileSystem>
<param name="path" value="${rep.home}/repository"/>
</FileSystem>
 

Make this code active on FileSystem part of the code

<FileSystem>
<param name="driver" value="com.mysql.jdbc.Driver"/>
<param name="url" value="jdbc:mysql://localhost:3306/jackrabbit"/>
<param name="user" value="jcr_user"/>
<param name="password" value="password"/>
<param name="schema" value="mysql"/>
<param name="schemaObjectPrefix" value="fs_repos_"/>
</FileSystem>
 

Comment the original code in the DataStore part

<!--
data store configuration
-->
<!--
Replace the following "DataStore" XML node to use supported databases as the data
store for the repository.  Change the url, user, password and other parameters
to suit your db installation.  The schemaObjectPrefix should
be a unique prefix that will be prepended to the table names.
NOTE: The database must be pre-created in and match the parameters.  See Jackrabbit
documentation for further explanation.
-->
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore"/>
 

Make this code active on DataStore part of the code

 <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
    <param name="url" value="jdbc:mysql://localhost:3306/jackrabbit"/>
    <param name="user" value="jcr_user"/>
    <param name="password" value="password"/>
    <param name="databaseType" value="mysql"/>
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="minRecordLength" value="1024"/>
    <param name="maxConnections" value="3"/>
    <param name="copyWhenReading" value="true"/>
    <param name="tablePrefix" value=""/>
    <param name="schemaObjectPrefix" value="ds_repos_"/>
  </DataStore>

Below the security part comment the original code in the FileSystem Workspace part

<!--
virtual file system of the workspace:
class: FQN of class implementing the FileSystem interface
-->
<!--
Replace the following "FileSystem" XML node to use supported databases as
the repository file system.  Change the url, user, password and other parameters
to suit your db installation.  The schemaObjectPrefix should
be a unique prefix that will be prepended to the table names.
NOTE: The database must be pre-created in and match the parameters.  See Jackrabbit
documentation for further explanation.
-->
<FileSystem>
<param name="path" value="${wsp.home}"/>
</FileSystem>
 

Make this code active on FileSystem WorkSpace part of the code

<FileSystem>
<param name="driver" value="com.mysql.jdbc.Driver"/>
<param name="url" value="jdbc:mysql://localhost:3306/jackrabbit"/>
<param name="user" value="jcr_user"/>
<param name="password" value="password"/>
<param name="schema" value="mysql"/>
<param name="schemaObjectPrefix" value="fs_ws_"/>
</FileSystem>

Below  FileSystem Workspace part you will find the PersistenceManager part

<!--
persistence manager of the workspace:
class: FQN of class implementing the PersistenceManager interface
-->
<!--
Replace the following "PersistenceManager" XML node to use a supported database as the
persistenanceManager store.  Change the url, user, password and parameters
to suit your db installation.  The schemaObjectPrefix should
be a unique prefix that will be prepended to the table names.
NOTE: The database must be pre-created in and match the parameters.  See Jackrabbit
documentation for further explanation.
-->

<PersistenceManager>
<param name="url" value="jdbc:h2:${wsp.home}/db"/>
<param name="schemaObjectPrefix" value="${wsp.name}_"/>
</PersistenceManager>

 

Make this code active on PersistenceManager part of the code

<PersistenceManager>
<param name="url" value="jdbc:mysql://localhost:3306/jackrabbit"/>
<param name="user" value="jcr_user" />
<param name="password" value="password" />
<param name="schema" value="mysql"/>
<param name="schemaObjectPrefix" value="${wsp.name}_pm_ws_"/>
</PersistenceManager>

Below you will find FileSystem Versioning part

<!--
Configures the filesystem to use for versioning for the respective
persistence manager
-->
<!--
Replace the following "FileSystem" XML node to use a supported database as
the repository file system.  Change the url, user, password and other parameters
to suit your db installation.  The schemaObjectPrefix should
be a unique prefix that will be prepended to the table names.
NOTE: The database must be pre-created in and match the parameters.  See Jackrabbit
documentation for further explanation.
-->
<FileSystem>
<param name="path" value="${rep.home}/version" />
</FileSystem>
 

Make this code active on FileSystem Versioning part

<FileSystem>
<param name="driver" value="com.mysql.jdbc.Driver"/>
<param name="url" value="jdbc:mysql://localhost:3306/jackrabbit"/>
<param name="user" value="jcr_user"/>
<param name="password" value="password"/>
<param name="schema" value="mysql"/>
<param name="schemaObjectPrefix" value="fs_ver_"/>
</FileSystem>

Below you will find PersistenceManager Versioning part

<!--
Configures the persistence manager to be used for persisting version state.
Please note that the current versioning implementation is based on
a 'normal' persistence manager, but this could change in future
implementations.
-->
<!--
Replace the following "PersistenceManager" XML node to use a supported database as the
persistenanceManager store.  Change the url, user, password and parameters
to suit your db installation.  The schemaObjectPrefix should
be a unique prefix that will be prepended to the table names.
NOTE: The database must be pre-created in and match the parameters.  See Jackrabbit
documentation for further explanation.
-->
<PersistenceManager>
<param name="url" value="jdbc:h2:${rep.home}/version/db"/>
<param name="schemaObjectPrefix" value="version_"/>
</PersistenceManager>

Make this code active on PersistenceManager Versioning part

<PersistenceManager>
<param name="url" value="jdbc:mysql://localhost:3306/jackrabbit"/>
<param name="user" value="jcr_user" />
<param name="password" value="password" />
<param name="schema" value="mysql"/>
<param name="schemaObjectPrefix" value="pm_ver_"/>
</PersistenceManager>

Quit HSQL Hypersonic automatic startup

By default the Hypersonic database starts up automatically. To avoid this, comment out or delete the following snippets of code in web.xml:

<!-- [BEGIN HSQLDB DATABASES] -->
<context-param>
<param-name>hsqldb-databases</param-name>
<param-value>sampledata@../../data/hsqldb/sampledata,hibernate@../../data/hsqldb/hibernate,quartz@../../data/hsqldb/quartz</param-value>
</context-param>
<!-- [END HSQLDB DATABASES] -->
 

The second section you need to comment out or eliminate:

<!-- [BEGIN HSQLDB STARTER] -->
<listener>
<listener-class>org.pentaho.platform.web.http.context.HsqldbStartupListener</listener-class>
</listener>
<!-- [END HSQLDB STARTER] -->
 

Sign into the user console

Using the URL http://localhost:8080/pentaho you can reach the Pentaho login screen:

Once you are in the user console, you will see the Home perspective with several buttons:

  • Browse Files: navigate through the Public and Home root directories and cut, copy, share, download or schedule all content included in Pentaho (PRPT reports, action sequences, jobs, transformations, OLAP views, Saiku views, etc.)
  • Create New: JPivot views, plus components added via the Marketplace
  • Manage Data Sources: create, edit and delete JDBC datasources, Mondrian metadata sources and Metadata.xmi ad hoc reporting datasources
  • Documentation: documentation for Pentaho 5.0


One of the main changes in Pentaho 5 is that there is no longer an external Administration Console; administration capabilities are now included in the User Console. In this version our “ancient” friend joe has been replaced with the admin user. In addition, the new roles available are the following:

  • Administrator
  • Power User
  • Report Author
  • Business Analyst

You can access the administration perspective in the user console by using the dropdown and selecting Administration:

Then you will reach the administration perspective.


Oracle tips (Display sessions by user,application and Tablespace info)


Sometimes you are developing with Kettle and need to open several connections to the same database; in this case I will show how to check the number of connections open in Oracle. Below are some useful queries:

-- Show sessions by user
select osuser, username, machine, program
from v$session
order by osuser;

-- Show sessions by application
select program Application, count(program) Number_of_Sessions
from v$session
group by program
order by Number_of_Sessions desc;

If you detect that you have too many connections, it is time to make your transformations transactional; this feature prevents exhausting the connection pool. (Check the option “Make the transformation database transactional?”)


Below is a query that displays the size and free space of the different tablespaces.

-- Show Tablespaces
select b.tablespace_name, tbs_size SizeMb, a.free_space FreeMb
from  (select tablespace_name, round(sum(bytes)/1024/1024 ,2) as free_space
from dba_free_space
group by tablespace_name) a,
(select tablespace_name, sum(bytes)/1024/1024 as tbs_size
from dba_data_files
group by tablespace_name) b
where a.tablespace_name(+)=b.tablespace_name;
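Building on the same query, a percent-used column makes it easier to spot tablespaces that are nearly full; this is a sketch using the same dba_free_space and dba_data_files views:

```sql
-- Tablespace usage with percent used
select b.tablespace_name,
       b.tbs_size as size_mb,
       nvl(a.free_space, 0) as free_mb,
       round((1 - nvl(a.free_space, 0) / b.tbs_size) * 100, 1) as pct_used
from  (select tablespace_name, round(sum(bytes)/1024/1024, 2) as free_space
       from dba_free_space
       group by tablespace_name) a,
      (select tablespace_name, sum(bytes)/1024/1024 as tbs_size
       from dba_data_files
       group by tablespace_name) b
where a.tablespace_name(+) = b.tablespace_name
order by pct_used desc;
```

NVL covers tablespaces with no free extents, which would otherwise show NULL through the outer join.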

Slowly changing dimensions



January 25, 2012

The “Slowly Changing Dimension” problem is a common and persistent issue in data warehousing. Simply put, it applies to records whose attributes change over time. Here is an example: Sue is a customer with a major retailer. She first lived in Potter Heigham, Norfolk, so the original entry in the customer lookup table has the following record:

lngCustomerURN | strName | Town           | County
1001           | Sue     | Potter Heigham | Norfolk

Later she moved to Cambridge, Cambridgeshire, in March 2009. How should we modify the customer table to reflect the change? This is the “Slowly Changing Dimension” problem.

There are in general three ways to solve this type of problem, and they are categorized as follows:

Type 1: Replace the old record with a new one. The only problem is that no trace of the old record exists and any historical information is lost. If, for example, we had sales for one region associated with her, after the replacement those sales move with her.

Type 2: We add a new record to the customer table. We retain the existing history of old transactions against the customer and new transactions are linked to the new customer record.

Type 3: The original record is modified to reflect the change. This generally involves an additional column (or columns). This simplified customer record shows how a type 3 record could look:

lngCustomerURN | strName | strPreviousTown | strPreviousCounty | strCurrentTown | strCurrentCounty
1001           | Sue     | Potter Heigham  | Norfolk           | Cambridge      | Cambridgeshire

It is important to recognise that even with a type 2 change, additional fields need to exist in the table in order to ensure that the data retain their integrity:

lngCustomerURN | strName | strTown        | strCounty      | blnActive | blnDateActive
1001           | Sue     | Potter Heigham | Norfolk        | False     | 22/11/2003
1001           | Sue     | Cambridge      | Cambridgeshire | True      | 12/03/2009
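A type 2 change can be applied as an update/insert pair. This is a sketch against the example table above, here given the hypothetical name dim_customer, assuming the change date is known:

```sql
-- Expire the current row for customer 1001...
UPDATE dim_customer
SET    blnActive = 'False'
WHERE  lngCustomerURN = 1001
AND    blnActive = 'True';

-- ...then insert the new version of the record
INSERT INTO dim_customer
    (lngCustomerURN, strName, strTown, strCounty, blnActive, blnDateActive)
VALUES
    (1001, 'Sue', 'Cambridge', 'Cambridgeshire', 'True', '12/03/2009');
```

Running both statements in one transaction guarantees exactly one active row per customer at all times.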

There is also a type 6 change, which incorporates the combined functionality of types 1, 2 and 3. Type 6 was first described by Ralph Kimball, who called it a ‘hybrid approach’ combining the three basic techniques:

  • Type 2, creating new rows to capture change
  • Type 3, adding attributes to reflect an alternative view
  • Type 1, overwriting the attribute in all earlier dimension rows.