Oracle: convert a string field containing a list of elements into a set of rows

I will show a tricky way of writing a subquery that builds a set of rows from a string field containing a comma-separated list of values.

Take as an example a string field with the content 'A,B,C,D'. A single call to REGEXP_SUBSTR extracts only one of the 4 matches (A, B, C, D): the regex [^,]+ matches any sequence of characters in the string that does not contain a comma.

If you run:

SELECT REGEXP_SUBSTR ('A,B,C,D','[^,]+') as set_of_rows FROM dual

you’ll get A.

and if you try running:

SELECT REGEXP_SUBSTR ('A,B,C,D','[^,]+',1,1) as set_of_rows FROM dual

you’ll also get A, only now we have passed two additional parameters: start looking at position 1 (which is the default), and return the 1st occurrence.

Now let’s run:

SELECT REGEXP_SUBSTR ('A,B,C,D','[^,]+',1,2) as set_of_rows FROM dual

This time we’ll get B (the 2nd occurrence); using 3 as the last parameter will return C, and so on.
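To see the occurrence parameter at work in one place, the sketch below asks for occurrences 1 through 5 of the same literal in a single query. The column aliases are only illustrative names. The fifth column comes back NULL because the string holds just four elements.

```sql
-- Each column asks for a different occurrence of the pattern [^,]+
SELECT REGEXP_SUBSTR('A,B,C,D', '[^,]+', 1, 1) AS item_1,  -- A
       REGEXP_SUBSTR('A,B,C,D', '[^,]+', 1, 2) AS item_2,  -- B
       REGEXP_SUBSTR('A,B,C,D', '[^,]+', 1, 3) AS item_3,  -- C
       REGEXP_SUBSTR('A,B,C,D', '[^,]+', 1, 4) AS item_4,  -- D
       REGEXP_SUBSTR('A,B,C,D', '[^,]+', 1, 5) AS item_5   -- NULL: there is no 5th match
FROM dual;
```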

Using a hierarchical CONNECT BY together with the LEVEL pseudo column makes sure you receive all the relevant results (not necessarily in the original order, though!):

SELECT DISTINCT REGEXP_SUBSTR ('A,B,C,D','[^,]+',1,LEVEL) as set_of_rows
FROM dual
CONNECT BY REGEXP_SUBSTR ('A,B,C,D','[^,]+',1,LEVEL) IS NOT NULL
order by 1

will return:

A
B
C
D

This not only contains all 4 results but also splits them into separate rows of the result set, which makes it useful as a subquery inside an SQL IN () clause.

This query “abuses” the CONNECT BY functionality to generate rows in a query on dual. As long as the expression passed to CONNECT BY is true, it generates a new row and increases the value of the pseudo column LEVEL. LEVEL is then passed to REGEXP_SUBSTR to get the nth value when applying the regular expression.
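As a minimal sketch of the IN () usage, the query below filters a table by the elements of the list. The table orders and its column status_code are hypothetical names used only for illustration; replace them with your own. Note that [^,]+ skips empty elements, so a list such as 'A,,C' yields only A and C.

```sql
-- orders and status_code are hypothetical names, not part of the original example
SELECT o.*
FROM   orders o
WHERE  o.status_code IN (
         -- one row per comma-separated element of the list
         SELECT REGEXP_SUBSTR('A,B,C,D', '[^,]+', 1, LEVEL)
         FROM   dual
         CONNECT BY REGEXP_SUBSTR('A,B,C,D', '[^,]+', 1, LEVEL) IS NOT NULL
       );
```

DISTINCT and ORDER BY are left out here because an IN () test does not care about duplicates or ordering.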


Removing Special Characters from a string field in Oracle

Today, while doing consultancy work, I ran into a problem loading a table from Oracle into PostgreSQL. When I checked the logs I saw that some Oracle VARCHAR fields had strange characters at the end, which caused the INSERT statements to fail. Initially I tried replacing values in the string with Pentaho Data Integration, targeting CR, LF and CRLF, since the characters looked like carriage returns when I copied the log files into Notepad++. All those attempts were unsuccessful, so I decided to look for Oracle functions and soon found a proper solution.

REGEXP_REPLACE helped me, as you can see in the query below:

SELECT REGEXP_REPLACE( customer_description ,'[^[:alnum:]'' '']', NULL)
 FROM dim_customer


Brief Explanation

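The [:alnum:] character class represents alphabetic and numeric characters; for plain ASCII data it behaves like a-zA-Z0-9 inside a bracket expression. The pattern in the query is a negated bracket expression (the doubled '' is just an escaped single quote inside the SQL string), so it matches every character that is not alphanumeric, a single quote or a space, and REGEXP_REPLACE removes each match by replacing it with NULL. Keep in mind that this also strips punctuation such as commas and periods.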
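To try the expression without touching a real table, here is a small sketch that runs it against a literal with a trailing CR and LF appended through CHR(13) and CHR(10). The sample value 'ACME Corp' and the column aliases are made up for illustration.

```sql
-- 'ACME Corp' followed by CR (CHR(13)) and LF (CHR(10)) simulates the dirty field
SELECT REGEXP_REPLACE('ACME Corp' || CHR(13) || CHR(10),
                      '[^[:alnum:]'' '']', NULL)           AS cleaned,        -- ACME Corp
       LENGTH('ACME Corp' || CHR(13) || CHR(10))            AS length_before,  -- 11
       LENGTH(REGEXP_REPLACE('ACME Corp' || CHR(13) || CHR(10),
                             '[^[:alnum:]'' '']', NULL))    AS length_after    -- 9
FROM dual;
```

Comparing the two lengths is a quick way to confirm that invisible characters were actually removed.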


Hope you have enjoyed 🙂

Oracle tips (Display sessions by user,application and Tablespace info)

Sometimes, when developing with Kettle, you need to open several connections to the same database. Here I show how to check the number of open connections in Oracle. Below are some useful queries:

-- Show sessions by user
select osuser, username, machine, program
from v$session
order by osuser;

-- Show sessions by application
select program Application, count(program) Number_of_Sessions
from v$session
group by program
order by Number_of_Sessions desc;
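To judge whether the session count is actually close to the instance limit, one option is the v$resource_limit view, which reports current and peak usage next to the configured limit. This is a complementary sketch, not part of the original tip, and like v$session it needs a user with access to the dynamic performance views.

```sql
-- Compare current and peak usage against the configured limits
select resource_name, current_utilization, max_utilization, limit_value
from v$resource_limit
where resource_name in ('sessions', 'processes');
```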

If you find you have too many connections, it is time to make your transformations transactional. This feature keeps them from exhausting the connection pool. (Check the option "Make the transformation database transactional?")


The query below displays the size and free space of the different tablespaces.

-- Show Tablespaces
select b.tablespace_name, tbs_size SizeMb, a.free_space FreeMb
from  (select tablespace_name, round(sum(bytes)/1024/1024 ,2) as free_space
from dba_free_space
group by tablespace_name) a,
(select tablespace_name, sum(bytes)/1024/1024 as tbs_size
from dba_data_files
group by tablespace_name) b
where a.tablespace_name(+)=b.tablespace_name;
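The same report can also be written with an ANSI LEFT JOIN instead of the (+) operator. The sketch below is one possible variant; the pct_free column and the aliases df and fs are additions for illustration. Tablespaces with no free extents show 0 rather than NULL.

```sql
-- Size, free space and percentage free per tablespace (permanent datafiles only)
select df.tablespace_name,
       round(df.tbs_size, 2)                               as size_mb,
       round(nvl(fs.free_space, 0), 2)                     as free_mb,
       round(nvl(fs.free_space, 0) / df.tbs_size * 100, 1) as pct_free
from  (select tablespace_name, sum(bytes)/1024/1024 as tbs_size
       from dba_data_files
       group by tablespace_name) df
left join
      (select tablespace_name, sum(bytes)/1024/1024 as free_space
       from dba_free_space
       group by tablespace_name) fs
  on fs.tablespace_name = df.tablespace_name
order by pct_free;
```

Sorting by pct_free puts the fullest tablespaces at the top.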

Improving Performance with Pentaho Data Integration (Kettle) Table Input Step and Oracle Databases

When you use the Pentaho Data Integration (a.k.a. Kettle) Table Input step to connect to Oracle through JDBC, one setting in your connection information can dramatically improve performance when retrieving data: the defaultRowPrefetch property. Oracle JDBC drivers let you set the number of rows to prefetch from the server while the result set is being populated during a query. Prefetching row data into the client reduces the number of round trips to the server. The default value for this property is 10.

In the Table Input step, edit your connection, click on the Options tab, and add a parameter named defaultRowPrefetch whose value is the number of rows to prefetch.

Solution for converting CLOBS datatype to VARCHAR

I need a solution for converting the CLOB datatype to VARCHAR. All the documents I refer to talk about converting BLOBs to VARCHAR, and when I try to apply the examples to CLOBs, I get errors.

and we said…

dbms_lob.substr( clob_column, for_how_many_bytes, from_which_byte );

for example:

select dbms_lob.substr( x, 4000, 1 ) from T;

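will get me the first 4000 characters of the CLOB (for a CLOB, dbms_lob.substr counts the amount and the offset in characters; they are bytes only for BLOBs).

Note that when using SQL as I did, the max length is 4000. You can get 32k using PL/SQL: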

declare
  my_var long;
begin
  for x in ( select X from t ) loop
    my_var := dbms_lob.substr( x.X, 32000, 1 );
  end loop;
end;
/

Oracle translate tips

Oracle/PLSQL: Translate Function

In Oracle/PLSQL, the translate function replaces a sequence of characters in a string with another set of characters. However, it replaces a single character at a time. For example, it will replace the 1st character in the string_to_replace with the 1st character in the replacement_string. Then it will replace the 2nd character in the string_to_replace with the 2nd character in the replacement_string, and so on.

The syntax for the translate function is:

translate( string1, string_to_replace, replacement_string )

string1 is the string in which characters will be replaced.

string_to_replace is the set of characters that will be searched for in string1.

replacement_string is the set of replacement characters: each character in string_to_replace is replaced with the character at the same position in replacement_string.

Applies To:

  • Oracle 8i, Oracle 9i, Oracle 10g, Oracle 11g

For Example:

translate('1tech23', '123', '456'); would return '4tech56'
translate('222tech', '2ec', '3it'); would return '333tith'
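The two examples above can be run directly against dual. The third column in this sketch goes one step beyond them and shows a common idiom for deleting characters: when replacement_string is shorter than string_to_replace, the characters without a counterpart are removed. The leading 'x' is a placeholder mapped to itself, needed because an empty replacement_string makes the function return NULL. The column aliases are illustrative.

```sql
SELECT TRANSLATE('1tech23', '123', '456') AS example_1,      -- 4tech56
       TRANSLATE('222tech', '2ec', '3it') AS example_2,      -- 333tith
       -- 'x' maps to 'x'; 1, 2 and 3 have no counterpart, so they are dropped
       TRANSLATE('1tech23', 'x123', 'x')  AS digits_removed  -- tech
FROM dual;
```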