Sunday, September 10, 2017

Duplication check on a BLOB column containing image files

My title
We can store multimedia files like music, image, or video files either inside the database (BLOB type) or outside the database (BFILE type).

BLOB:

The BLOB datatypes are read-write binary large datatypes, that lets us store large binary files like video, audio, and PDF documents in the database. The maximum size of these types can range from 8 TB to 128 TB based on the DB_BLOCK_SIZE initialization parameter, which is set during the database creation in the init.ora file. The formula for calculating the maximum size of the LOB type is (4 GB - 1) * DB_BLOCK_SIZE value. Once this parameter is set, there is no going back for any change except for re-creating the database. The maximum value that can be set for the DB_BLOCK_SIZE parameter is 16 KB for Linux X86 environments and 32 KB for other environments. However, setting this parameter’s value to its maximum may affect the I/O operations.

BFILE: Binary File

BFILE is a read-only datatype that the database uses to access external binary files with a maximum size limit of 4 GB. Even though the BFILE and the BLOB types intends to store binary data, they both work differently from each other. Both the BFILE and BLOB column stores only the locators which point to the actual data that resides outside the ROW, but the BLOB type stores its binary data inside the database whereas the BFILE type stores its data outside the database in the OS layer. The BFILE type is capable of storing external binary files which can be in PDF, GIF, JPEG, MPEG, MPEG2, text, or other formats.

Note: The External LOB type doesn’t support UPDATE or DELETE operations as they are READ-ONLY.

Consider the below example for creating a table which holds a sequence number and a BLOB column to hold image files.

CREATE TABLE mytable 
  ( 
     seq        NUMBER, 
     image_file BLOB 
  );
/

When we want to maintain uniqueness for a non-LOB column, we can either create an unique index, primary key (which by default creates an unique index), or an unique key (which by default creates an unique index).

Can we also create a similar unique index, primary key or an unique key for a LOB column?

The answer is NO!

Oracle does not allow us to create any of those for a LOB column. Let's check it out using the below examples.
  • Creating an unique index on a LOB column

CREATE UNIQUE INDEX idx_image_file ON mytable(image_file);

Error report -SQL Error: ORA-02327: cannot create index on expression with datatype LOB
02327. 00000 -  "cannot create index on expression with datatype %s"
*Cause:    An attempt was made to create an index on a non-indexable
           expression.
  • Creating a unique constraint on a LOB column
ALTER TABLE mytable ADD CONSTRAINT uk_image_file UNIQUE (image_file); 

Error report -
SQL Error: ORA-02329: column of datatype LOB cannot be unique or a primary key
02329. 00000 -  "column of datatype %s cannot be unique or a primary key"
*Cause:    An attempt was made to place a UNIQUE or a PRIMARY KEY constraint
           on a column of datatype VARRAY, nested table, object, LOB, FILE
           or REF.

So, how can we achieve the uniqueness for a LOB column? There is a neat trick to do that.

We can create another column to our table which can be used for holding the hash value of the image and check uniqueness for that column. The hash value is actually a compact numeric version of the image file which will be same for same input image. By checking uniqueness on this column, we can make sure that the image is not duplicated.

Let's try this using the below examples.

Firstly, we are adding a new column to our MYTABLE. This new column is a virtual column which will return the hash value of the IMAGE_FILE column.

ALTER TABLE mytable ADD (hash_value generated always AS 
   (dbms_crypto.hash(image_file, 2)) virtual);

The second parameter to the DBMS_CRYPTO.HASH function accepts any of the below hash functions.

  • HASH_MD4    = 1
  • HASH_MD5    = 2
  • HASH_SH1     = 3
  • HASH_SH256 = 4
  • HASH_SH384 = 5
  • HASH_SH512 = 6

After the column is added, we must create a unique index on this virtual column as below,

CREATE UNIQUE INDEX idx_hash_value ON mytable(hash_value);

We must now create a directory object for the images' location as below,

CREATE OR replace directory dir_pictures AS 'F:\Pictures';

Now, we are all set to insert the image file into our table using the below script.

DECLARE 
    l_bf_src        BFILE := Bfilename('DIR_PICTURES', 'Batman.JPG'); 
    l_bl_dest       BLOB := Empty_blob(); 
    l_i_src_offset  INTEGER := 1; 
    l_i_dest_offset INTEGER := 1; 
BEGIN 
    dbms_lob.Createtemporary(l_bl_dest, TRUE, dbms_lob.SESSION); 

    dbms_lob.OPEN(l_bf_src, dbms_lob.file_readonly); 

    dbms_lob.OPEN(l_bl_dest, dbms_lob.lob_readwrite); 

    dbms_lob.Loadblobfromfile(l_bl_dest, l_bf_src, dbms_lob.lobmaxsize, 
    l_i_dest_offset, l_i_src_offset); 

    INSERT INTO mytable 
                (seq, 
                 image_file) 
    VALUES      (1, 
                 l_bl_dest ); 
         commit;

    dbms_output.Put_line('Total size of the image file is ' 
                         ||dbms_lob.Getlength(l_bl_dest) 
                         ||' Bytes'); 

    dbms_lob.CLOSE(l_bf_src); 

    dbms_lob.CLOSE(l_bl_dest); 
END; 
/ 

We can now check our image file using any IDE. I am currently using the SQL DEVELOPER 4.1.3 version for checking the inserted image file.


Now, when we rerun the same PL/SQL block again to insert the same image, we get the below unique index violation.

Error report -
ORA-00001: unique constraint (SYS.IDX_HASH_VALUE) violated
ORA-06512: at line 11
00001. 00000 -  "unique constraint (%s.%s) violated"
*Cause:    An UPDATE or INSERT statement attempted to insert a duplicate key.
           For Trusted Oracle configured in DBMS MAC mode, you may see
           this message if a duplicate entry exists at a different level.
*Action:   Either remove the unique restriction or do not insert the key.

We have found a workaround to check uniqueness on the LOB columns in this blog.


Thank you,
Boobal Ganesan

No comments:

Post a Comment