LEARNING TO LOVE LOBS

Natalka Roshak's picture
articles: 

LOBs, or Large OBjects, are Oracle's preferred way of handling and storing non-character data, such as mp3s, videos, pictures, etc., and long character data. Binary large objects, or BLOBs, and character large objects, or CLOBs, can store up to terabytes of data - much more than the paltry 4000 bytes permitted in a varchar2 column. LOBs and CLOBs offer DBAs and developers great flexibility and storage space; the tradeoff is that they're a bit clunkier to handle.

The first thing to know about LOBs is that there are two basic types: external LOBs, which are stored outside the database, and internal LOBs, which are stored in the database. External LOBs are of the BFILE datatype; essentially, the database stores a pointer to the LOB's location in the file system. As such, they can't participate in transactions, and access is read-only. This article will deal with internal LOBs.

Fun with CLOBs

In many ways, CLOBs can be manipulated much like VARCHAR2s, and CLOB columns in the database are much easier to deal with than LONG columns. But LOB column values do have the annoying feature that they often can't easily be manipulated with standard DML statements; we need to call an Oracle-supplied package, DBMS_LOB, instead. Fortunately, many of the subprograms in DBMS_LOB, especially those intended for use with CLOBs, are quite similar to analogous SQL functions.

For example, finding the length or a substring of a CLOB is as simple as finding the length or a substring of a CHAR or VARCHAR2. For this example, we'll use this simple table:

create table documents
(doc_id number, doc_description varchar2(100), doc_text clob);

Selecting a substring from the varchar2 column is straightforward:

SQL> select substr(doc_description,1,24)
  2  from documents
  3  where doc_id=2;

SUBSTR(DOC_DESCRIPTION,1,24)
----------------------------
Macavity the Mystery Cat

And it's almost as straightforward to select a substring from the CLOB column, because the DBMS_LOB package provides the very similar dbms_lob.substr program:

SQL> select dbms_lob.substr(doc_text,41,1)
  2  from documents
  3  where doc_id=2;

DBMS_LOB.SUBSTR(DOC_TEXT,41,1)
------------------------------------------
Macavity's a Mystery Cat: he's called the

The only operational difference between the SQL function SUBSTR and the DBMS_LOB function is that the argument orders are slightly different. In dbms_lob.substr, the requested substring length comes first, followed by the offset.

Lengths are just as straightforward:

SQL> select length(doc_description), dbms_lob.getlength(doc_text)
  2  from documents
  3  where doc_id=2;

LENGTH(DOC_DESCRIPTION) DBMS_LOB.GETLENGTH(DOC_TEXT)
----------------------- ----------------------------
                     37                         2462

It Gets Even Easier

In fact, with Oracle 9i and above, you can use the plain SQL functions LENGTH and SUBSTR on CLOBs -- although you should use them only on relatively small CLOBs, up to about 100K in size, as they'll give poor performance on larger LOBs.

SQL> select length(doc_text), substr(doc_text,1,41)
  2  from documents
  3  where doc_id=2;

LENGTH(DOC_TEXT) SUBSTR(DOC_TEXT,1,41)
---------------- ------------------------------------------
            2462 Macavity's a Mystery Cat: he's called the

Many other string comparison and manipulation functions are available for use with smaller LOBs, from 9i up:

SQL> select instr(doc_text,'Macavity, Macavity, there''s no one like Macavity',1,1)
  2  from documents
  3  where doc_id=2;

INSTR(DOC_TEXT,'MACAVITY,MACAV
------------------------------
                           944

Although for larger LOBs, you should again use the dbms_lob functions:

SQL> select dbms_lob.instr(doc_text,'Macavity, Macavity, there''s no one like Macavity',1,1)
  2  from documents
  3  where doc_id=2;

INSTR(DOC_TEXT,'MACAVITY,MACAV
------------------------------
                           944

You can also use INSERT and UPDATE to edit LOB values in 9i and above, just as you would to edit character or LONG data:

SQL> insert into documents
  2  values
  3  (3, 'To a Mouse, by Robbie Burns',
  4  'To a Mouse, On Turning her up in her Nest with the Plough
  5  
  6  Wee, sleekit, cow''rin'', tim''rous beastie,
  7  O what a panic''s in thy breastie!
  8  Thou need na start awa sae hasty,
  9  Wi'' bickering brattle!
 10  I wad be laith to rin an'' chase thee
 11  Wi'' murd''ring pattle!...');

1 row inserted

For more information on using selected regular SQL statements and functions to interact with small LOBs in 9i and above, see the chapter in the Oracle Database Application Developer's Guide - Large Objects on "Data Interface for Persistent LOBs" (chapter 13 in the 10gR2 documentation).

Character data in BLOBs

Unlike CLOBs, BLOBs can store any type of data. Even if we stuff character data into a BLOB, Oracle will store and treat it as raw data. This means that we can't use SQL functions line SUBSTR to handle BLOB data. Consider this simple table:

create table docs_mixedmedia
(doc_id number, doc_type varchar2(10),
  doc_description varchar2(100), doc_contents blob);

We can store text data in a BLOB:

SQL> select doc_id, doc_type, doc_description
  2  from docs_mixedmedia
  3  where doc_id=2;

    DOC_ID DOC_TYPE   DOC_DESCRIPTION
---------- ---------- ------------------------------------------------
         2 TEXT       Macavity  - The Mystery Cat, a poem by T S Eliot

We can use the SQL function LENGTH on it:

SQL> select length(doc_contents) 
  2  from docs_mixedmedia
  3  where doc_id=2;

LENGTH(DOC_CONTENTS)
--------------------
                2509

But not SUBSTR:

SQL> select substr(doc_contents,1,41) 
  2  from docs_mixedmedia
  3  where doc_id=2;

ORA-00932: inconsistent datatypes: expected NUMBER got BLOB

And if we try to peek at the contents of the BLOB using dbms_lob.substr, we get a nasty surprise:

SQL> select dbms_lob.substr(doc_contents,41,1)
  2  from docs_mixedmedia
  3  where doc_id=2;

DBMS_LOB.SUBSTR(DOC_CONTENTS,4
--------------------------------------------------------------------------------
4D6163617669747927732061204D797374657279204361743A20686527732063616C6C6564207468

There's character data in there (trust me!), but since it's a BLOB, dbms_lob.substr returns the raw data, which is displayed as the hex representation of the characters. For example, 0x4D is 77 in decimal notation, which is an M in my character set:

SQL> select chr(77) from dual;

CHR(77)
-------
M

To read out the character data stored in the BLOB, we'll need to repeat this operation for each character's hex value and then paste them together into a string.

The following anonymous block uses the built-in SQL function rawtohex to convert the output of dbms_lob.substr to a character representation of the hex value (eg. '4D61636176...'), and a quick in-line function, hex_to_decimal, to convert this string to a number. hex_to_decimal is based on a similar function by Connor McDonald, which is available online.

declare  
  v_hold_blob blob; 
  v_len   NUMBER; 
  v_raw_chunk RAW(10000); 
  v_chr_string varchar2(32767);
  v_position    NUMBER; 
  c_chunk_len number := 1;
  function hex_to_decimal
  --this function is based on one by Connor McDonald
  --http://www.jlcomp.demon.co.uk/faq/base_convert.html
  ( p_hex_str in varchar2 ) return number
  is
    v_dec   number;
    v_hex   varchar2(16) := '0123456789ABCDEF';
  begin
    v_dec := 0;
    for indx in 1 .. length(p_hex_str) 
    loop
      v_dec := v_dec * 16 + instr(v_hex,upper(substr(p_hex_str,indx,1)))-1;
    end loop;
    return v_dec;
  end hex_to_decimal;  
begin 
  select doc_contents into v_hold_blob
  from docs_mixedmedia
  where doc_id=2;  

  -- find the length of the blob column 
  v_len := dbms_lob.getlength(v_hold_blob); 
  dbms_output.put_line('Column Length: ' || TO_CHAR(v_len)); 

  -- read the first 32767 chars into v_chr_string
  v_position := 1; 
  WHILE ( v_position <= least(v_len,32767) )  LOOP 
    v_raw_chunk := dbms_lob.substr(v_hold_blob,c_chunk_len,v_position); 
    v_chr_string := v_chr_string || chr( hex_to_decimal(rawtohex(v_raw_chunk)));
    v_position := v_position + c_chunk_len;
  END LOOP; 

  --print out the varchar2 string 255 bytes at a time
  dbms_output.put_line('Column Data:');
  print(v_chr_string,255);
end;

(Note: "print" is a simple function that loops through a string, inserting line breaks at word boundaries to create lines of N or less characters, and prints the lines out with dbms_output.put_line. It's available on my website here.)

Sample output from this code:

Column Length: 2509
Column Data:
Macavity's a Mystery Cat: he's called the Hidden Paw--
For he's the master criminal who can defy the Law.
He's the bafflement of Scotland Yard, the Flying Squad's despair:
For when they reach the scene of crime--Macavity's not there!
[...]

Conclusion

We've seen how easy it is to do simple manipulations on CLOBs, and how to extract character data stored in a BLOB. A guide to more handy LOB functions can be found on my website, at http://toolkit.rdbms-insight.com/lob_quick_ref.php.

About the author

Natalka Roshak is a senior Oracle and Sybase database administrator, analyst, and architect. She is based in Kingston, Ontario, and consults across North America. More of her scripts and tips can be found in her online DBA toolkit at http://toolkit.rdbms-insight.com/.

Comments

This is very informative article with some basic knowledge to learn clob and blob, easy to learn for beginners and encourage them to learn more detail. Thanks for the good work.

Thanks Natalka. Your article is, as usually, simple to understand and putting the information right on the table :)
There is, however, something I would like to add: Oracle SQL Developer (probably other tools as well) have some issues when handling LOBs.

I'm getting the following error and have seen this might be a bug in 11g:

ora-29913:error in executing odciexttableopen callout
ora-29400:data cartridge error
kup-04027:file name check failed:

Is anybody else seeing this or know if it's confirmed as a bug?

Great article!

Steve