2012-04: SOUNDEX and other "cool" features - part one for DB2 V8

New Scalar functions in every release IBM

DB2 has always been delivered with Scalar functions and in every release IBM create and/or improve more and more. This month I thought I would do a trawl through the archives and list out my personal favorites from DB2 V8, 9, and 10. This is not a complete list of all new scalar functions in each release but nearly!! In fact there are so many that I decided to split this newsletter into three parts one for V8 one for 9 and one for 10. Starting off with DB2 V8 – The last ever version of DB2 with a V in the title…shame really as I was looking forward to V12.

See the Scalar functions in DB2 V9: Newsletter 2012-05 SOUNDEX part 2

See the Scalar functions in DB2 10: Newsletter 2012-06 SOUNDEX part 3

Starting off with DB2 V8 : ASCII function

Now V8 introduced us to the ASCII function:

SELECT ASCII('BOXWELL')              
FROM SYSIBM.SYSDUMMY1 ;              
         66

So it takes the leftmost character of any given string expression and returns the ASCII value, in this case 66 for B – I must admit that I have *not* yet found a use for this little gem… Now V8 was also a major change due to use of UNICODE and so IBM released CHARACTER_LENGTH to complement the LENGTH scalar function. The difference being that CHARACTER_LENGTH counts the characters and LENGTH counts the bytes. Now in the US or GB these will probably be the same but in Germany they are nearly always different!

SELECT CHARACTER_LENGTH(                                  
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
                       , CODEUNITS32)                     
FROM SYSIBM.SYSDUMMY1 ;                                   
         10                                               

SELECT CHARACTER_LENGTH(                                  
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
                       , CODEUNITS16)                     
FROM SYSIBM.SYSDUMMY1 ;                                   
         10                                               

SELECT CHARACTER_LENGTH(                                  
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
                       , OCTETS)                          
FROM SYSIBM.SYSDUMMY1 ;                                   
         11                                               

SELECT LENGTH(                                            
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
              )                                           
FROM SYSIBM.SYSDUMMY1 ;                                   
         11

Note that the CODEUNITS/OCTETS is how you specify the CCSID schema. Not that clear but you can see that the umlaut (ü) in Düsseldorf is counted as 1 character for 32 and 16 but as two bytes for Octets and the simple LENGTH function. I bet that there are hundreds of SQLs out there that do not use this feature and are therefore not 100% correct! Just waiting for your first UNICODE named employee or customer to cause problems somewhere! Now of course with a new LENGTH you also had to be able to SUBSTR the data on a character boundary *not* on the byte boundary and sure enough along came SUBSTRING

SELECT SUBSTRING(                                         
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
                , 1 , 2 , CODEUNITS32)                    
FROM SYSIBM.SYSDUMMY1 ;                                   
Dü                                                        

SELECT SUBSTRING(                                         
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
                , 1 , 2 , CODEUNITS16)                    
FROM SYSIBM.SYSDUMMY1 ;                                   
Dü                                                        

SELECT SUBSTRING(                                         
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
                , 1 , 2 , OCTETS)                         
FROM SYSIBM.SYSDUMMY1 ; 
D                                                                                  

SELECT SUBSTR(                                                         
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)                   
                , 1 , 2)                                               
FROM SYSIBM.SYSDUMMY1 ;                                                

DSNE610I NUMBER OF ROWS DISPLAYED IS 0
DSNT408I SQLCODE = -331, ERROR:  CHARACTER CONVERSION CANNOT BE PERFORMED
         BECAUSE A STRING, POSITION 1, CANNOT BE CONVERTED FROM 1208 TO 1141,
         REASON 16
DSNT418I SQLSTATE   = 22021 SQLSTATE RETURN CODE
DSNT415I SQLERRP    = DSNXROHB SQL PROCEDURE DETECTING ERROR 
DSNT416I SQLERRD    = -117  0  0  -1  0  0 SQL DIAGNOSTIC INFORMATION
DSNT416I SQLERRD    = X'FFFFFF8B'  X'00000000'  X'00000000'  X'FFFFFFFF'
         X'00000000'  X'00000000' SQL DIAGNOSTIC INFORMATION

OK just like CHARACTER_LENGTH you “tell” the system which schema to use by giving the 32, 16 or octets again. Here you can see what happens if you attempt to split the byte by using the normal SUBSTR but exactly on a two byte character. Again this is a “lurking” bug that will byte people at some point (Sorry about that I could not resist!) Finally IBM brought out TIMESTAMPDIFF. Bonnie Baker once held a class where she wondered out loud “Who on Earth programmed this?”

 SELECT TIMESTAMPDIFF( 64 ,                                  
         CAST(CURRENT_TIMESTAMP                             
            - CAST('1939-05-10-01.00.00.000000' AS TIMESTAMP)
         AS CHAR(22))) AS GRANDADS_AGE_IN_MONTHS
FROM SYSIBM.SYSDUMMY1 ;                     
---------+---------+---------+---------+---------+---------+-
GRANDADS_AGE_IN_MONTHS                                      
---------+---------+---------+---------+---------+---------+-
                   874

Now the first option is the bit that Bonnie was talking about (Oh dear another pun has crept in – sorry!) this number is just a simple bit flag Valid values for numeric-expression equivalent intervals 1 Microseconds 2 Seconds 4 Minutes 8 Hours 16 Days 32 Weeks 64 Months 128 Quarters 256 Years The question is just “Why?” Why not use the words??? There is no answer except “Blame the programmer!”. The important thing to remember is that TIMESTAMPDIFF works *only* with assumptions and has rounding errors such as One year has 365 days One year has 52 weeks One month has 30 days These are obviously not always true (Think of leap years or simply days in February) but if you are looking for ball-park figures – like Grandad – then who cares? The other thing to remember is that the input must be casted as a CHAR(22) like in my example. So stay tuned for more Scalar fun next month with a bumper crop of interesting DB2 9 new functions! Feel free to send me your comments and ask questions. TTFN, Roy Boxwell Senior Architect