Useful Pivotal Greenplum DBA SQL queries


1. Viewing table data distribution across segment servers

To view the data distribution of a table’s rows (the number of rows on each segment), you can run a query such as:

SELECT gp_segment_id, count(*)
FROM abc
GROUP BY gp_segment_id;
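
To get a quick sense of how evenly those rows are spread, here is a minimal sketch (assuming the same example table abc) that reports the smallest and largest per-segment row counts:

-- Quick skew check (illustrative): the closer min_rows and max_rows are,
-- the more evenly the table is distributed across the segments.
SELECT min(cnt) AS min_rows, max(cnt) AS max_rows
FROM (
  SELECT gp_segment_id, count(*) AS cnt
  FROM abc
  GROUP BY gp_segment_id
) AS per_segment;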

To see the data distribution of a table’s rows across the segment servers, together with the directory where the data is stored, you can run a query such as:

SELECT
table_name.gp_segment_id,
hostname,
fselocation as datadir,
count(*)
FROM table_name,
pg_filespace_entry pgfse,
gp_segment_configuration gsc
WHERE
gsc.dbid=pgfse.fsedbid
and table_name.gp_segment_id=pgfse.fsedbid
GROUP BY
table_name.gp_segment_id, hostname, datadir
ORDER BY
gp_segment_id;

Replace table_name with the table you want to inspect.

Balanced Distribution: A table is considered to have a balanced distribution if all of the segments have roughly the same number of rows.

2. Check the size of a table

Table level:

select
 pg_size_pretty(pg_relation_size('schema.tablename'));

Replace schema.tablename with the table you want to check.

Table and index (total size):

select
pg_size_pretty(pg_total_relation_size('schema.tablename'));

Replace schema.tablename with the table you want to check.

3. Check the schema size

Schema level:

psql> select schemaname ,
round(
sum(pg_total_relation_size(schemaname||'.'||tablename))
/1024/1024) "Size_MB"
from pg_tables where schemaname='SCHEMANAME' group by 1;

Replace SCHEMANAME with your schema name.

4. Check the database size

To see the size of a specific database:

psql> select
pg_size_pretty(pg_database_size('DATABASE_NAME'));

Example:
gpdb=# select pg_size_pretty(pg_database_size('gpdb'));

pg_size_pretty
----------------
24 MB
(1 row)

To see all database sizes:

psql> select
datname, pg_size_pretty(pg_database_size(datname))
from pg_database;

psql> select
sodddatname,
(sodddatsize/1073741824) AS sizeinGB
from gp_toolkit.gp_size_of_database;

6. Check partitioned table size including indexes and partitions

Table size with partitions: The following SQL gives the size of the employee_daily table, including all of its partitions:

select
schemaname,
tablename,
round(
sum(pg_total_relation_size(schemaname || '.' || partitiontablename))
/1024/1024) "MB"
from pg_partitions
where tablename='employee_daily'
group by 1,2;

schemaname | tablename | MB
-----------+----------------+-----
public | employee_daily | 254

7. Generate scripts to grant DML privileges to another user or role

david=>
select
'grant select,insert,update, delete on '||schemaname||'.'||tablename||' to gpadmin'
from pg_tables
where tableowner='david';

?column?
------------------------------------------------------------------------------
grant select,insert,update, delete on david.countries to gpadmin
grant select,insert,update, delete on david.departments to gpadmin
grant select,insert,update, delete on david.employees to gpadmin
grant select,insert,update, delete on david.job_history to gpadmin
grant select,insert,update, delete on david.jobs to gpadmin
grant select,insert,update, delete on david.locations to gpadmin
grant select,insert,update, delete on david.regions to gpadmin
grant select,insert,update, delete on david.ext_countries_error to gpadmin
grant select,insert,update, delete on david.ext_countries to gpadmin
grant select,insert,update, delete on david.ext_departments_error to gpadmin
grant select,insert,update, delete on david.ext_departments to gpadmin
grant select,insert,update, delete on david.ext_employees_error to gpadmin
grant select,insert,update, delete on david.ext_employees to gpadmin
grant select,insert,update, delete on david.ext_job_history_error to gpadmin
grant select,insert,update, delete on david.ext_job_history to gpadmin
grant select,insert,update, delete on david.ext_jobs_error to gpadmin
grant select,insert,update, delete on david.ext_jobs to gpadmin
grant select,insert,update, delete on david.ext_locations_error to gpadmin
grant select,insert,update, delete on david.ext_locations to gpadmin
grant select,insert,update, delete on david.ext_regions_error to gpadmin
grant select,insert,update, delete on david.ext_regions to gpadmin
(21 rows)
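
Note that the generated statements above have no trailing semicolon. One way to make them directly runnable is to append the terminator and spool the output to a file with psql meta-commands; a minimal sketch (the file name is illustrative):

-- Spool semicolon-terminated grant statements to a file, then replay the file.
\t on
\o /tmp/grants.sql
select 'grant select,insert,update,delete on '||schemaname||'.'||tablename||' to gpadmin;'
from pg_tables
where tableowner='david';
\o
\t off
\i /tmp/grants.sql
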
-- Count tables in a schema
8. select count(*) from pg_tables where schemaname='david';

-- List users (roles that can log in)
9. select rolname from pg_roles where rolcanlogin is TRUE;

-- List roles (cannot log in)
10. select rolname from pg_roles where rolcanlogin is FALSE;

11. select * from pg_views;
12. select * from information_schema.table_privileges;
13. select * from pg_user;
14. select * from pg_views where viewname like '%priv%';
15. select * from pg_views where viewname like '%role%';
16. select * from pg_views where viewname like '%gran%';
17. select * from pg_views where viewname like '%part%';
18. select * from pg_views where viewname like '%schema%';
19. select * from pg_views where viewname like '%stat%';
20. select * from information_schema.role_table_grants;
21. select * from information_schema.role_routine_grants;
22. select * from information_schema.applicable_roles;
23. select * from information_schema.enabled_roles;
24. select * from gp_toolkit.gp_roles_assigned;

25. Find dependent views created on a table

The following query lists the views that depend on a given table:

# select
a.*, b.definition
from information_schema.view_table_usage a,
pg_views b
where a.table_name='test'
and a.view_name=b.viewname;

When you alter an existing table (for example dropping a column), you may want to determine if there are any views that depend on the columns you are changing.

Example:
\dt test
List of relations
Schema | Name | Type | Owner | Storage
--------+------+-------+------------+---------
public | test | table | gpadmin | heap

create view test_view as (select * from test);

-- Run the following query, substituting your table name
-- in the predicate a.table_name='<table>'

select
a.*,b.definition
from
information_schema.view_table_usage a,
pg_views b
where a.table_name='test'
and
a.view_name=b.viewname;

 view_catalog | view_schema | view_name | table_catalog | table_schema | table_name | definition
--------------+-------------+-----------+---------------+--------------+------------+---------------------------
 ddata        | public      | test_view | ddata         | public       | test       | SELECT test.a FROM test;

26. Connections per database and activity:

SELECT usename,
 application_name, -- only valid for 9.x
 client_addr,
 backend_start,
 query_start,
 current_query
FROM pg_stat_activity
order by usename,client_addr

26b. Running queries and age:

SELECT age(query_start, backend_start) AS queryage, *
 FROM pg_stat_activity 
WHERE current_query NOT LIKE '%IDLE%'
 ORDER BY queryage DESC

27. List of users:

select * from pg_user

28. Connections per database:

SELECT datname, numbackends 
FROM pg_stat_database
WHERE numbackends > 0
ORDER BY numbackends DESC, datname

29. Connections per database (v2):

SELECT datname, COUNT(*) AS numbackends
FROM pg_stat_activity 
GROUP BY datname HAVING COUNT(*) > 0

30. List procedures and their owners:

select * from pg_proc, pg_user
 where pg_user.usesysid = pg_proc.proowner
 order by proname

31. Top 20 tables by size:

SELECT nspname || '.' || relname AS table,
 pg_size_pretty(pg_total_relation_size(c.oid)) AS size
FROM pg_class c
LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema') AND
 c.relkind <> 'i' AND
 n.nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 20;

Slowly Changing Dimensions

January 25, 2012

The “Slowly Changing Dimension” problem is a common and persistent issue in data warehousing. Put simply, it applies to records whose attributes change over time. Here is an example: Sue is a customer of a major retailer. She first lived in Potter Heigham, Norfolk, so the original entry in the customer lookup table holds the following record:

lngCustomerURN  strName  Town            County
1001            Sue      Potter Heigham  Norfolk

Later, in March 2009, she moved to Cambridge, Cambridgeshire. How should we modify the customer table to reflect the change? This is the “Slowly Changing Dimension” problem.

There are in general three ways to solve this type of problem, and they are categorized as follows:

Type 1: Replace the old record with the new one. The drawback is that no trace of the old record remains, so any historical information is lost. If, for example, we had sales for one region associated with her, those sales would move with her once the replacement is made.
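
As an illustration, a minimal Type 1 sketch in SQL, assuming a hypothetical customer table with the column names used in the examples below:

-- Type 1 (illustrative): overwrite the attributes in place; the old values are lost.
UPDATE customer
SET strTown   = 'Cambridge',
    strCounty = 'Cambridgeshire'
WHERE lngCustomerURN = 1001;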

Type 2: We add a new record to the customer table. The existing history of old transactions stays linked to the old record, while new transactions are linked to the new customer record.

Type 3: The original record is modified to reflect the change; this generally involves adding an extra column (or columns). This is a simplified customer record showing how a Type 3 row could look:

lngCustomerURN  strName  strPreviousTown  strPreviousCounty  strCurrentTown  strCurrentCounty
1001            Sue      Potter Heigham   Norfolk            Cambridge       Cambridgeshire
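
A minimal Type 3 sketch in SQL, again assuming a hypothetical customer table and using the column names from the example above:

-- Type 3 (illustrative): move the old values into the "previous" columns,
-- then overwrite the current ones. The right-hand side of each assignment
-- refers to the values as they were before the update.
UPDATE customer
SET strPreviousTown   = strCurrentTown,
    strPreviousCounty = strCurrentCounty,
    strCurrentTown    = 'Cambridge',
    strCurrentCounty  = 'Cambridgeshire'
WHERE lngCustomerURN = 1001;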

It is important to recognise that even with a Type 2 change, additional fields need to exist in the table in order to ensure that the data retain their integrity:

lngCustomerURN  strName  strTown         strCounty       blnActive  blnDateActive
1001            Sue      Potter Heigham  Norfolk         False      22/11/2003
1001            Sue      Cambridge       Cambridgeshire  True       12/03/2009
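
A minimal Type 2 sketch in SQL, assuming a hypothetical customer table with the blnActive and blnDateActive columns shown above:

-- Type 2 (illustrative): expire the current row, then add a new active row.
BEGIN;

UPDATE customer
SET blnActive = FALSE
WHERE lngCustomerURN = 1001
  AND blnActive = TRUE;

INSERT INTO customer
  (lngCustomerURN, strName, strTown, strCounty, blnActive, blnDateActive)
VALUES
  (1001, 'Sue', 'Cambridge', 'Cambridgeshire', TRUE, DATE '2009-03-12');

COMMIT;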

There is also a Type 6 change, which incorporates the combined functionality of Types 1, 2 and 3. Type 6 was first described by Ralph Kimball, who called it a ‘hybrid approach’ combining the three basic techniques:

  • Type 2, creating new rows to capture change
  • Type 3, adding attributes to reflect an alternative view
  • Type 1, overwriting the attribute in all earlier dimension rows.

Welcome to Open Source Business Intelligence experiences!


Business Intelligence is a growing field. Its roots are in data warehousing (collecting data), data mining (doing analytics on data) and decision support (dashboards, reports and event notifications). That is why I am attracted to this field. During 2011 I worked as a Business Intelligence consultant at CSC, focused on designing and developing BI solutions (ETL processes, ad-hoc reporting, OLAP) with Pentaho Business Intelligence and Analytics. Currently I am working at Stratebi in Madrid (Pº de la Castellana).

Nowadays, you need to provide the business decision makers with the tools they need to make use of the data. In this context, “tools” means much more than just software. It means everything the business users need to understand what information is available, find the subsets they need, and structure the data to illuminate the underlying business dynamics.

In this blog I will try to share my experiences with Business Intelligence and open source software.