2011-06: Unused index analysis

 

Building on what we have read over the last months, there is now a methodology available for checking if all of your indexes are actually being used as you think they should be.

 

Check the LASTUSED field in the RTS tables

First, you can simply check the LASTUSED field in the RTS tables, which gives you a good pointer to “used” or “not used”. But beware if the index is Unique and is there purely to stop duplicate inserts, then this style of usage is *not* recorded in the RTS!

You should have full EXPLAIN data for production Static SQL and you should  also trap and run snapshot EXPLAINs of all the Dynamic SQL. Over time, you  can easily build up a usage table to show which indices are being actively used by the SQL running on your machine.

Now all you have to do is work out which indices are:

  1. Marked as not used in the RTS
  2. Not listed in any EXPLAIN table (for dynamic and static)

 

Analyze candidate indexes in depth

This list of candidate indexes then needs to be analyzed in depth. The way I do it, is as follows:

– Pick an index (start with the largest, longest nastiest index(s), where the payback is good or where you have lots of unused indexes on a give table, where the INSERT and DELETE ratio is relatively high).

– Then find the table that it is indexing. Use this as a “driver” for the following EXPLAINS. Use EXPLAIN to get all the access paths that use this table, (this captures non-use of existing indexes), and store into a temporary work table. Use DSN_VIRTUAL_INDEXES to mark our candidate index(s) as DROPPED, and then re-run the EXPLAINS into a new table.
Compare the access paths.

If there is no difference in access path, then you can be 95% sure that the index can indeed be dropped. Now all you must do, is make sure it is either non-unique, or that it is not being used as a duplication stopper!

If all is ok, drop the index and monitor the table usage for a week or two.

 

Repeat the entire procedure every six months or so…

 

What is important to stress here, is that some indexes – even though “not  used” – are actually used by the DB2 optimizer to decide access path. This  is because the SYSCOLDIST is based on the table and not the index, so any  access path choice can use the filter factors and cardinality data in the table. Virtual Indexes does not stop this usage as the data has not been deleted from SYSCOLDIST. This means, that to get that last 5% certainty, you must do the EXPLAINS and the compare *again* after physically dropping the index.

I hope that this little bit of data was of interest and/or of use and, as always, comments or questions are more than welcome!
TTFN,
Roy Boxwell
Senior Architect

2011-07: Recursive SQL – Examples (VIEW explode etc)

 

Ahhh What a topic! The most scary SQL that anyone can write must be a recursive cursor…The possibility to do it was introduced in DB2 V8 but I still only see sporadic use and normally it is my own SQLs! The thing is to learn to trust the recursive SQL and once you know how it works it can become a great friend but you must build trust up over time.

 

Here is a typical example use of recursive SQL for the trivial task of getting a list of numbers

WITH TEMP(N) AS
 (SELECT 1
  FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT N+1
  FROM TEMP
  WHERE N < 10)
SELECT N FROM TEMP
;

Running this in SPUFI gives you:

---------+---------+---------+---------+---------+
          N                                               
---------+---------+---------+---------+---------+
          1
          2
          3
          4
          5
          6
          7
          8
          9
         10
DSNE610I NUMBER OF ROWS DISPLAYED IS 10
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

 

So how does it work and what controls are there?

Basically it was the creation of CTEs (Common Table Expressions) that allowed the creation of recursive SQL. The WITH xxx AS is the definition of a CTE – Then within (‘s the SQL has two “stages” the first is what I call the seeder – It starts the whole process off and generates the “first row” if you like. Then there is the UNION ALL that is needed to actually join all of the data together. If you try and use just UNION you get a nasty SQL error.

WITH TEMP(N)                
         AS 
(SELECT 1                                    
  FROM SYSIBM.SYSDUMMY1 
UNION                              
SELECT N+1                                                             
FROM TEMP 
WHERE N < 10) 
SELECT N FROM TEMP 
;                                  
---------+---------+---------+---------+---------+---------+---------
DSNT408I SQLCODE= -342, ERROR:THE COMMON TABLE EXPRESSION TEMP MUST 
NOT USE 
SELECT DISTINCT AND MUST USE UNION ALL BECAUSE IT IS RECURSIVE

See? The SQL parser knows what it is doing. So now onto the next stage and that is the recursion statement and *very important* the emergency brake for the whole caboodle.

The recursive part is relatively clear. In this case we wish to select one  number higher than the current number in the TEMP table. This could carry  on forever so there is also an emergency brake that will stop any runaway SQL. In this case I wish to stop after reaching 10 (WHERE N < 10).

 

An explain of this query yields this access path:

---+---------+---------+---------+---------+---------+-------+----+
QNO  PNO  SQ  M  TABLE_NAME   A   CS  INDEX  IO  UJOG  P  CE  TYPE
--+---------+---------+---------+---------+---------+--------+----+
01   01   00  0  TEMP         R   00         N   ----  S     SELECT
02   01   00  0                   00             ----        UNIONA
03   01   00  0  SYSDUMMY1    R   00         N   ----  S     NCOSUB
04   01   00  0  TEMP         R   00         N   ----  S     NCOSUB

Not really a great access path, but it is recursive after all!

Now you can write recursive SQL without using an “emergency brake” but let me show you a sensible usage of recursive SQL first. Imagine you’re interested in extracting DDL for a specific table and you are not sure about the dependencies of the VIEWs or MQTs that are being used (you could always buy our DDL Generator pocket tool of course 😉 <cough> <cough> anyway let us assume you do *not* have it. How do you find all of the “exploding VIEWs/MQTs”? Which ones are dependant on which objects etc. from one level all the way “back up the tree”? How would you write this SQL? You could do it with numerous fetches and probes of the DB2 catalog,  but you could also do it with one single SQL statement (note that this SQL statement has been surgically killed. – If you are interested in getting an actual running version then please contact us.

 

Here’s the outline SQL

WITH VIVLIST                          
       (MAX                          
       ,BCREATOR                      
       ,BNAME                        
       ,BTYPE                        
       ,DCREATOR                      
       ,DNAME                        
       ,DTYPE) AS                    
 (SELECT 1                            
        ,STRIP(A.BCREATOR)            
        ,STRIP(A.BNAME)              
        ,A.BTYPE                      
        ,STRIP(A.DCREATOR)            
        ,STRIP(A.DNAME)              
        ,A.DTYPE                      
  FROM SYSIBM.SYSVIEWDEP A            
  WHERE A.DCREATOR = :WS-CREATOR      
    AND A.DNAME    = :WS-NAME        
  UNION ALL                          
  SELECT B.MAX + 1                    
        ,STRIP(A.BCREATOR)            
        ,STRIP(A.BNAME)              
        ,A.BTYPE                      
        ,STRIP(A.DCREATOR)           
        ,STRIP(A.DNAME)              
        ,A.DTYPE                      
  FROM SYSIBM.SYSVIEWDEP A            
       ,VIVLIST B                      
   WHERE A.DNAME    = B.BNAME          
     AND A.DCREATOR = B.BCREATOR      
     AND B.MAX       < 256            
  )                                    
 .
 .
 .  
;

Now, in this cursor my emergency brake is the B.MAX < 256, which stops this after 256 number of “view in view” definitions have been found. In practice a loop is impossible, as DB2 guarantees that you cannot generate cyclic views and I am pretty sure that no-one has more than 255 dependencies on a single view, or MQT. Anyway, I still like the brake, because when you do not have one (for example if you comment out the “AND B.MAX < 256” line) you get this “worrying” warning at BIND time:

 DSNX105I  -S810 BIND SQL WARNING
            USING MDB2 AUTHORITY
            PLAN=(NOT APPLICABLE)
            DBRM=SQLDDLD
            STATEMENT=2476
            SQLCODE=347
            SQLSTATE=01605
            TOKENS=VIVLIST

For details refer to DB2 for z/OS messages.

A quick look in the DB2 for z/OS codes leads you to
>>
+347 THE RECURSIVE COMMON TABLE EXPRESSION name MAY CONTAIN AN INFINITE LOOP

Explanation: The recursive common table expression called name may not complete. This warning is based on not finding specific syntax as part of the iterative portion of the recursive common table expression. The expected syntax includes:

  • incrementing an INTEGER column in the iterative select list by 1.
  • a predicate in the where clause of the iterative portion of the form „counter_col < constant“ or „counter_col < :hostvar“.

The absence of this syntax in the recursive common table expression may result in an infinite loop. The data or some other characteristic of the recursive common table expression may allow the successful completion of the statement anyway.
<<

So now you know why I use that brake!

 

Finally here are two examples

The first copied and slightly changed from the IDUG Code Place for doing maths with recursion:

--- PRINTS THE FACTORIAL OF NUMBERS FROM 1 TO N
WITH TEMP( N , FACT) AS
( SELECT
        1
,CAST (1 AS DECIMAL(31 , 0))
  FROM SYSIBM.SYSDUMMY1
UNION ALL
  SELECT N + 1 , FACT * ( N + 1 )
  FROM TEMP
  WHERE N < 21
)
SELECT *
FROM TEMP
;

Running this query gives you:

---------+---------+---------+---------+-------  
          N                             FACT                        
---------+---------+---------+---------+-------                           
          1                              1.                       
          2                              2. 
          3                              6. 
          4                             24. 
          5                            120.
          6                            720.
          7                           5040.  
          8                          40320. 
          9                         362880.
         10                        3628800.  
         11                       39916800. 
         12                      479001600.                
         13                     6227020800.                   
         14                    87178291200.                 
         15                  1307674368000.     
         16                 20922789888000. 
         17                355687428096000. 
         18               6402373705728000. 
         19             121645100408832000. 
         20            2432902008176640000. 
         21           51090942171709440000. 
DSNE610I NUMBER OF ROWS DISPLAYED IS 21

And now something new from the List Serv (again slightly changed)
One of the posters posted this request:

Given a table A1:

Number          Asset
1               AAAA
1               BBBB
1               CCCC
2               DDDD
2               EEEE

The result should be:

1      AAAA,BBBB,CCCC
2      DDDD,EEEE

“Does that work? I am not sure, if recursive SQL will do it.”

 

A short time later the same poster posted this extremely elegant piece of SQL code

WITH S (LEVEL, NUMBER, ASSET, ASSET_LIST) AS
     (SELECT 1 , A1.NUMBER, MIN(A1.ASSET) AS ASSET,
             MIN(A1.ASSET CONCAT SPACE(3000)) AS ASSET_LIST 
       FROM A1 
       GROUP BY A1.NUMBER 
       UNION ALL 
       SELECT LEVEL + 1 , S.NUMBER, A1.ASSET,
              STRIP(S.ASSET_LIST) CONCAT ',' CONCAT A1.ASSET
       FROM S                 
       INNER JOIN TABLE 
              (SELECT A1.NUMBER, MIN(A1.ASSET) AS ASSET 
                FROM A1           
                WHERE S.NUMBER = A1.NUMBER
                AND S.ASSET  < A1.ASSET
                GROUP BY A1.NUMBER 
               ) AS A1 
             ON S.NUMBER = A1.NUMBER
     )                                      
SELECT S.NUMBER, MAX(S.ASSET_LIST)  AS LIST 
  FROM S 
  GROUP BY S.NUMBER
;

I hope that this little bit of data was of interest and/or of use and as always comments or questions are more than welcome!
TTFN,
Roy Boxwell
Senior Architect

2011-09: CRUD – How to find which program is doing what to which table

 

Now it might sound a bit rude, but CRUD is one of my favourite SQL queries and I use it a lot!

 

CRUD. Create, Read, Update, and Delete

However, this newsletter topic probably created most fun for me in advance because I have a couple of colleagues -including non-technicians- reading the stuff before it gets published and the feedback on this content is definitely worth a mention. It’s all about this nice acronym that doesn’t leave any open questions for me – it actually stands for Create, Read, Update, and Delete I guess that Insert, Select, Update, and Delete (ISUD) just doesn’t have a ring to it! Anyway it looks like some others never heard of it before, or associate something  totally different… a couple of funny prompts for clarification later led me to do quick search in Wikipedia, which opened up some other associations besides my topic…

  • A sticky substance, especially when dirty and/or encrusted
  • Crud (game), played on a billiard table
  • Chalk river unidentified deposits, corroded products containing radioactive nuclear species
  • Neil Crud, Welsh rock musician, journalist and DJ
  • Crud Puppy, a cartoon character from User Friendly
  • Crud, minced oath for crap, meaning feces
  • Crud, a radio station in Rochdale College
  • All right folks, we’ll skip those for today, cause what I associate with
  • CRUD is „Create, read, update and delete“ summarized by Wiki as basic functions of a computer database

All right folks, we’ll skip those for today, cause what I associate with CRUD is „Create, read, update and delete“ summarized by Wiki as basic functions of a computer database.

 

Which static SQL program does what to which of my tables?”

Anyway this topic helps you guys with that age old problem of Table usage or even, horror of horrors, the Auditors asking questions…
The problem is “Which static SQL program does what to which of my tables?”
So my first exercise for this is; I wish to see all packages in the TEST collection that have anything whatsoever to do with the SYSIBM.SYSDATABASE table.
Now you could trawl through the SYSxxxxAUTH etc with GRANTEE and GRANTOR and with PUBLIC etc or you could run this (make use of the exclude/include place holder if you like):

SELECT SUBSTR(TCREATOR, 1 , 8) AS CREATOR                  
     , SUBSTR(TTNAME, 1 , 18)   AS NAME                                 
     , SUBSTR(GRANTEE, 1 , 8)  AS PROGRAM 
     , CASE WHEN INSERTAUTH = 'Y' THEN 'C' 
            ELSE '-'  
       END      AS C                                  
     , CASE WHEN SELECTAUTH = 'Y' THEN 'R' 
            ELSE '-'  
       END      AS R  
     , CASE WHEN UPDATEAUTH = 'Y' THEN 'U' 
            ELSE '-'
       END      AS U
     , CASE WHEN DELETEAUTH = 'Y' THEN 'D'                    
            ELSE '-'  
       END      AS D  
     , CASE WHEN COLLID = ' ' THEN '** PLAN **' 
            ELSE COLLID                                 
       END      AS "PLAN/COLLECTION" 
  FROM SYSIBM.SYSTABAUTH 
 WHERE GRANTEETYPE = 'P' 
   AND COLLID      =    'TEST' 
-- AND NOT GRANTEE =    '<exclude>'
-- AND GRANTEE     =    '<include>' 
   AND TCREATOR    =    'SYSIBM' 
   AND TTNAME      =    'SYSDATABASE'                           
--ORDER BY 1 , 2
ORDER BY 3 , 2 
WITH UR  
;

 

The output looks like

---------+---------+---------+---------+---------+---------+--------
CREATOR   NAME                PROGRAM   C  R  U  D  PLAN/COLLECTION     
---------+---------+---------+---------+---------+---------+--------
SYSIBM    SYSDATABASE         M2DBIN09  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         M2DBRI09  -  R  -  -  TEST
SYSIBM    SYSDATABASE         M2DBRT09  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         M2DBTS09  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         MDB2DB41  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         MDB2DB45  -  R  -  -  TEST  
SYSIBM    SYSDATABASE         MDB2DBPD  -  R  -  -  TEST                  
SYSIBM    SYSDATABASE         MDB2DBTS  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         MDB2SY8T  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         MDB2SY9C  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         MDB2SYSC  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         MDB2SYST  -  R  -  -  TEST           
SYSIBM    SYSDATABASE         O2RTS030  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         SQLDDLD   -  R  -  -  TEST       
SYSIBM    SYSDATABASE         SQLDDLS   -  R  -  -  TEST 
SYSIBM    SYSDATABASE         SQLDV9DB  -  R  -  -  TEST 
SYSIBM    SYSDATABASE         SQLDV9TS  -  R  -  -  TEST 
DSNE610I NUMBER OF ROWS DISPLAYED IS 17                                   
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

 

The data shown from this simple example lists 17 programs with select, or lets say read authority on SYSDATABASE. Now, let’s have a look at a scenario where I have a program that updates  SYSTABLESPACESTATS. I use the same SQL statement and change the last line of the where clause to

  AND TTNAME      =    'SYSTABLESPACESTATS'
---------+---------+---------+---------+---------+---------+-------
CREATOR  NAME                PROGRAM   C  R  U  D  PLAN/COLLECTION
---------+---------+---------+---------+---------+---------+------- 
SYSIBM   SYSTABLESPACESTATS  M2DBRT09  -  R  U  -  TEST 
SYSIBM   SYSTABLESPACESTATS  MDB2DB46  -  R  -  -  TEST 
SYSIBM   SYSTABLESPACESTATS  MDB2T003  -  -  U  -  TEST 
SYSIBM   SYSTABLESPACESTATS  MOD#CTLG  C  R  U  D  TEST 
DSNE610I NUMBER OF ROWS DISPLAYED IS 4

A lot less programs and you can see not just read access, but updates are  also done. There is even a MOD#CTLG program that does additionally inserts and updates.

Simple, easy and very straight forward if you need to determine what your static SQL applications are shooting against your DB2 tables.
If you like to do some checks which programs in your shop update the salary table, or who has authority to access sensitive financial data this  may be the easiest way. There are lots of alternatives that you can think  of yourselves! Very handy for housekeeping, documentation, or checking out which programs will be affected by a table change etc.

 

I hope that this little bit of data was of interest and/or of use and as always comments or questions are more than welcome!
TTFN,
Roy Boxwell
Senior Architect

2011-10: PLAN to PACKAGE Migration – Do you have any “bad guys”

 

Now we all know that DB2 10 is here – You do know that don’t you?? 🙂

Do away with DBRMs being bound directly to a PLAN

Anyway one of the points of DB2 10 is to finally do away with DBRMs being bound directly to a PLAN. Now you absolutely must go to PACKAGES and COLLECTIONS. I have seen no real need of having PLAN-based programs for years, but some people/sites/software still seem to insist upon having these things so I know that some shops are now facing the following questions:

 

  1. Let DB2 10 “do the magic” – In other words let the migration create all the collections for you?
  2. Bite the Bullet – Do it now in DB2 V8 or DB2 9; thus allowing *you* to decide on naming conventions, etc? Of course I would be failing in my marketing skills if I forgot to plug our DBRM Reconstruct feature of Bind ImpactExpert that can also do this for you…

 

Guess what I would choose?? I have two reasons for choosing answer number 2 and they are:

  1. I like to control my own naming standards.
  2. I need to create these PACKAGES *before* the Migration.

 

Use PLAN MANAGEMENT during a migration

Why (2) ? You may well ask. Quite simple – If you rebind/bind now with current statistics, and you happen to be on DB2 9, you can use PLAN MANAGEMENT during migration to keep your (new) current access path “locked down” so in case of regression you can fall back to a good access path. Additionally, if you are still on DB2 V8, then a rebind before skip-level migration is basically a must so you have a chance to find any problems before landing at the door of DB2 10 with unknown, weird, or downright strange access paths, and you have simply no idea where they came from. For example, could they have come in V8, or is it a new special rule access path from DB2 9, or even perhaps an artefact from bad statistics somewhere down the chain?

At this point, I should also mention that you need good consistent statistics for *any* migration. Make sure that your SYSCOLDIST and SYSKEYTGTDIST contain good trustworthy data! (See my earlier newsletter about our Statistics HealthCheck Freeware for more details there.)

 

What is the worst thing that could happen?

So back to this newsletter – What is the worst thing that could happen? Well, what if you happen to have two PLANs that both have a DBRM called ROYBOY, but both ROYBOY’s have different pre-compile timestamps? This is nasty – really really nasty – and should of course “never happen”, but it  does happen and far too often for my taste! So now to the meat and veg of the newsletter – Here is a nice little query that will show which DBRMs you have on your system that have the same name, but different pre-compile timestamps. These all need to be “sorted out” (That’s a euphemism for FREE of course 🙂   ) before beginning the journey to DB2 10.

 

SELECT SUBSTR(B.NAME , 1 , 8) AS DBRM_NAME 
     , COUNT(*)               AS NUMBER_DIFF_PRECOMPTS 
  FROM                                               
      (SELECT NAME 
            , PRECOMPTS 
         FROM SYSIBM.SYSDBRM 
        GROUP BY NAME , PRECOMPTS) AS B 
 GROUP BY B.NAME
HAVING COUNT(*) > 1

 

The output looks like this:

---------+---------+---------+---
DBRM_NAME  NUMBER_DIFF_PRECOMPTS                
---------+---------+---------+---  
PLMEM1                         2 
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

Using this output you can select the needed data, or you could imbed this SELECT in another SELECT to get the needed data like this:

SELECT SUBSTR(A.NAME , 1 , 8 )      AS DBRM_NAME 
      ,SUBSTR(A.PLNAME , 1 , 8 )    AS PLAN_NAME 
      ,SUBSTR(A.PLCREATOR , 1 , 8 ) AS CREATOR  
      ,A.PRECOMPTS                  
  FROM SYSIBM.SYSDBRM A 
     ,(SELECT SUBSTR(B.NAME , 1 , 8) AS DBRM_NAME 
            , COUNT(*)               AS NUMBER_DIFF_PRECOMPTS 
         FROM   
            (SELECT NAME 
                  , PRECOMPTS
               FROM SYSIBM.SYSDBRM 
              GROUP BY NAME , PRECOMPTS) AS B 
        GROUP BY B.NAME
       HAVING COUNT(*) > 1
      ) AS C  
 WHERE C.DBRM_NAME = A.NAME                 
 ORDER BY 1 , 2                                            
;

The output then looks like this:

---------+---------+---------+---------+---------+---------+-             
DBRM_NAME  PLAN_NAME  CREATOR   PRECOMPTS      
---------+---------+---------+---------+---------+---------+-             
PLMEM1     TESTP4     MDU       2007-11-26-19.48.41.982926      
PLMEM1     TESTP5     MDU       2009-10-26-10.22.12.362131  
PLMEM1     TESTP6     MDU       2007-11-26-19.48.41.982926    
DSNE610I NUMBER OF ROWS DISPLAYED IS 3

This now gives the “overview” of what is possibly wrong. In this case, you can see the different timestamps and could deduce what action(s) must be taken.

I hope that this little bit of data was of interest and/or of use and, as always, comments or questions are more than welcome!
TTFN,
Roy Boxwell
Senior Architect

2012-01: Native SQL procedures

 

These were introduced years and years ago in DB2 9 – Remember that version?? Anyway the basic idea was to streamline and simplify the DB2 side of things as well as the DBA work ( Creation, control and performance.) The central change was the elimination of a need to initialize a Language Environment where the compiled program will execute. The creation of this environment and the need to pass all of the SQL calls back-and-forth added a large amount of overhead and hindered the mass acceptance of Stored Procedures (on the mainframe). As an added bonus it also gave us the possible usage of zIIP – Only from DRDA at the moment but any usage is pretty good!
Now this newsletter is *not* going to get into the coding aspects as there are tons of examples and red books out there but more into the technical realization (How do I EXPLAIN the darn things) and the “Replace all , none or some” methodology.

 

How to create a Native SQL procedure?

To create a Native SQL Procedure you must *not* use the FENCED or EXTERNAL NAME keywords in the PROCEDURE creation body. An extra tip for the SPUFI users amongst us is to switch the SPUFI SQL format default to be SQLPL it makes it a lot easier to build and test (Unless you use DataStudio to do it all for you of course!). In case you are wondering the default panel looks like this

                     CURRENT SPUFI DEFAULTS             SSID: S91A      
 ===>             
             
1  SQL TERMINATOR .. ===> ;    (SQL Statement Terminator)              
2  ISOLATION LEVEL   ===> CS   (RR=Repeatable Read, CS=Cursor
                                Stability,UR=Uncommitted Read)         
3  MAX SELECT LINES  ===> 9999 (Max lines to be return from SELECT)
4  ALLOW SQL WARNINGS===> YES  (Continue fetching after sqlwarning)
5  CHANGE PLAN NAMES ===> NO   (Change the plan names used by SPUFI)  
6  SQL FORMAT....... ===> SQL  (SQL, SQLCOMNT, or SQLPL)

 

This then “enables” the line number of the SQL to be used as QUERYO automatically thus enabling you to explain and *find* the results!!! A very good idea these days!

 

The question now is: Have you migrated all your “old” C programs across yet? If not – Why not?

Now we all know the saying “if it aint broke don’t fix it!” however in this case the time and effort involved in choosing the stored procedures that get migrated is well worth it. First you should get yourself some knowledge from the documentation and the web all about building, versioning (a very cool feature of Native SQL procedures by the way!) and DEPLOYing these things (An even cooler feature as the DEPLOY option stops the BIND on production and therefore the access path currently in use in test is simply “copied over” to the remote production system. Of course you must have access to the remote production system from test to do this and that is normally a no no – sadly!).
As always it is better to start out with a couple of “the usual suspects” and once these have been migrated across and monitored *and* the savings calculated – Then you should roll out a general plan for all of the current “heavy hitters”

For further reading I can recommend the red book “DB2 9 Stored Procedures: Through the CALL and Beyond“ sg247604 which was last updated in Feb 2011 so it is still “fresh”!

There are also two compact and concise technical articles. First is a Blog entry from Peggy Zagelow and then a very nice technote. Last, but not least, have alook at this 2 parts article from Linda Claussen.

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2012-02: HASH Index – What is it good for?

Oh dear! I seem to have made a HASH of it!

Many thanks to Jeff for requesting information about HASH indexing and thus causing the terrible pun in the first line!

 

A brand new access path for DB2 10

Hash access is one of the new goodies that arrived in DB2 10 NFM and is a brand new access path for DB2. (Anyone remember HDAM?) First the good news – When Hash access is used, it is pretty fast! If you are lucky, one I/O is all that is needed unless you get your hash space definition hopelessly wrong. In which case, it gets messy!

 

HASH Access. What is it?

Simply put, hash access takes your given key (max 255 bytes) and computes a 64 bit number. DB2 computes “the relative page number” using the MOD function with 2 modulo arguments: “this 64 bit number” and “a prime number prime number derived from the size of the table”.   Next, DB2 computes the hash anchor point using the MOD function with 2 modulo arguments:  “this 64 bit number” and “another prime number between 17 and 53”. Now the “relative page number” and the “hash anchor point” act like a cross-reference directly to the row in question. Of course, life is a bit more complicated than that, and you can easily get collisions where two keys end up with the same numbers.  If this happens, the second one gets written to the ominous sounding “hash over flow area” and its associated index (Remember for later that this is also a sparse index!). By the way, the optimal setting is to have between 0% and 8% of rows in the overflow area.

 

Any physical limitations?

Yes, there are a whole bunch! The table must be in a UTS, and it must be in RRF format. The hash key is *not* updateable (déjà vu here with partitioned key changes from aeons ago!). DROP a hash, and your table space is put into REORGP, which of course needs a SHRLEVEL NONE REORG to repair it and that is not a recoverable scenario either. No CLUSTER indexes are allowed of course. Hash index *must* be implicitly created. No LOB or XML columns and it cannot be an MQT. <phew!>

Which all reminds me of a certain Monty Python sketch: “But apart from that, what have the Romans done for us?”. Just replace Romans with Hash Access. 🙂

So that is the list of physical limitations – What about logical limitations?

 

What about logical limitations?

No parallel access or star/hybrid joins are possible. Hash access is not used for sensitive dynamic scrollable cursor with multi-element in-list. RUNSTATS is a waste of time as it is a sparse index and so, by definition, only has a few percent of the table rows in it. My advice is to simply exclude hash indexes from your DB2 database maintenance during RUNSTATS processing. However, remember you must still monitor these indexes for REORG and for re-sizing as well as hash usage.

For which data does it actually makes sense to change to be Hash accessible? Straight off the bat are: Single Row unique equals and IN predicates. No range scans. Access is random. The row that you are searching for is normally found. The size of the table is static or well known. The columns all have the same size (no large variants please!), which should ideally be between 30 and 2000 bytes. Finally it would be perfect if 20 or more rows fit on a page.

Good candidates should have four or more index levels in order to save tons of I/O.

Bad candidates have high insert rates or updates that change row length.
So you can see that you have to do a fair bit of work to find the candidates!

Once you have decided to change a table to be hash-based, you can see from EXPLAIN whether or not a given query will attempt to use the hash method or not. Access types of H, HN, and MH will confirm usage – You can use the RTS to see when and how often the last HASH access actually occurred (HASHLASTUSED and REORGHASHACCESS columns).

 

If you decide to stop using the hash organization and drop it, remember the REORGP.

LOADing is still a bit of trauma of course. The current best method is:
Create table, Load Data, Alter to Hash, and REORG.

Mass Inserts are also pretty bad.

I did some testing here loading up 90,000 rows with an eight byte (repeatable) random key and found the elapsed time varies greatly with the size of the HASHSPACE and with the bufferpool definition of the table space.  (Remember that basic BP tuning is always a good idea!) Here is what I found:

1 MB Hash Space leads to 73,183 overflow records 36 secs and 251 hash pages
Then I went high with the HASH SPACE:
256 MB Hash Space leads to zero overflow records 96 secs and 65,327 hash pages
Then I stepped down (in binary of course!):
128 MB Hash Space leads to zero overflow records 82 secs and 32,573 hash pages
64 MB Hash Space leads to zero overflow records 44 secs and 16,381 hash pages
32 MB Hash Space leads to zero overflow records 23 secs and 8,191 hash pages
16 MB Hash Space leads to zero overflow records 22 secs and 4,093 hash pages
8 MB Hash Space leads to one overflow record 22 secs and 2,039 hash pages
4 MB Hash Space leads to 21,620 overflow records 36 secs and 1,021 hash pages
2 MB Hash Space leads to 55,897 overflow records 38 secs and 509 hash pages

And then finally the optimum size for my little test:
9 MB Hash Space leads to zero overflow records 19 secs and 2,297 hash pages

Then I decided to test out the AUTOESTSPACE YES function to see what changes it made.
The following is a list of results from my test data:
256MB Hash Space LOADed 90,000 using 32,719 Hash pages with 0 overflows
REORG of above changed the hash pages down to 653 with 3,762 overflow records
1MB Hash Space with no data REORGed to 31 pages
1MB Hash Space LOADed 90,000 using 127 Hash pages with 72,728 overflows
REORG of above changed the hash pages up to 653 with 3,762 overflow records
8MB Hash space LOADed 90,000 using 1,021 Hash pages with 0 overflows
REORG of above changed the hash pages down to 653 with 3,762 overflow records

So you can see that REORG aims for about 5% of rows, which is OK.
Finally, also check out “Chapter 20. Organizing tables by hash for fast access to individual rows” in the Managing Performance book for some great details about how to squeeze the last bytes out of your hash space!

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2012-03: INCLUDE Index Usage with Examples to find Redundant Indexes

DB2 10 contains a whole bunch of great new stuff, but my personal favourite has got to be the new INCLUDE syntax for CREATE INDEX.

What it does is quite simply magical – it allows you to “add” columns to an index which are *not* used for uniqueness, but are in the index of course. This means you must so allow that wonderful little Y to appear in the INDEXONLY column in the PLAN_TABLE when you BIND / REBIND with EXPLAIN(YES) – You are all doing that, aren’t you?

Up until DB2 10, indexes “bred” because the business process *needed* a unique key for COL, COL2, and COL3, but for INDEXONLY Y access you *needed* COL5 and COL6; thus leading to a second basically pointless index coming into existence. Now of course, the Optimizer is a clever little devil and may, in its infinite wisdom, decide never to use the first index, but an INSERT causes the odd -811. The end user suffers under the extra CPU, disk space and I/O time of two indexes. Now with INCLUDE syntax, you no longer need that second index.

 

DB2 10 new INCLUDE syntax for CREATE INDEX

The following indexes existed for the following query runs (I did a full runstats of course!)

For QUERYs 1 – 34

CREATE TYPE 2 UNIQUE INDEX IQA061HU.IQAXY1092    
                        ON IQA061HU.IQATY109     
                         ( STMT_KEY              
                         , STMT_TYPE             
                         )                       
                           USING STOGROUP SYSDEFLT
                           BUFFERPOOL BP0        
                           FREEPAGE 0            
                           PCTFREE 10            
                           CLOSE YES;

Leaf pages 2079 space 10800

For QUERYs 21 – 34

CREATE TYPE 2 UNIQUE INDEX IQA061HU.IQAXY1093    
                        ON IQA061HU.IQATY109     
                         ( STMT_KEY              
                         , STMT_TYPE             
                         , ISOLATION_LEVEL       
                         )                       
                           USING STOGROUP SYSDEFLT
                           BUFFERPOOL BP0        
                           FREEPAGE 0            
                           PCTFREE 10            
                           CLOSE YES;

Leaf pages 2294 space 10800

For QUERYs 41 – 54

CREATE TYPE 2 UNIQUE INDEX IQA061HU.IQAXY109I      
                        ON IQA061HU.IQATY109       
                         ( STMT_KEY                
                         , STMT_TYPE               
                         )                         
                           INCLUDE (ISOLATION_LEVEL)
                           USING STOGROUP SYSDEFLT 
                           BUFFERPOOL BP0          
                           FREEPAGE 0              
                           PCTFREE 10              
                           CLOSE YES;

Leaf pages 2294 space 10800

 

The queries looked like

EXPLAIN ALL SET QUERYNO = 1, 21, 41 FOR             
SELECT STMT_KEY, STMT_TYPE                  
FROM IQA061HU.IQATY109                      
;                                           
COMMIT ;                                    
EXPLAIN ALL SET QUERYNO = 2, 22, 42 FOR             
SELECT STMT_KEY, STMT_TYPE                  
FROM IQA061HU.IQATY109                      
WHERE STMT_KEY = X'01329149008E899B000001D6'
;                                           
COMMIT ;                                    
EXPLAIN ALL SET QUERYNO = 3, 23, 43 FOR             
SELECT STMT_KEY, STMT_TYPE                  
FROM IQA061HU.IQATY109                      
WHERE STMT_KEY = X'01329149008E899B000001D6'
  AND STMT_TYPE = 'M'                       
;                                             
EXPLAIN ALL SET QUERYNO = 11, 31, 51 FOR           
SELECT STMT_KEY, STMT_TYPE, ISOLATION_LEVEL
FROM IQA061HU.IQATY109                     
;                                          
COMMIT ;                                   
EXPLAIN ALL SET QUERYNO = 12, 32, 52 FOR           
SELECT STMT_KEY, STMT_TYPE, ISOLATION_LEVEL
FROM IQA061HU.IQATY109                     
WHERE STMT_KEY = X'01329149008E899B000001D6'
;                                          
COMMIT ;                                   
EXPLAIN ALL SET QUERYNO = 13, 33, 53 FOR           
SELECT STMT_KEY, STMT_TYPE, ISOLATION_LEVEL
FROM IQA061HU.IQATY109                     
WHERE STMT_KEY = X'01329149008E899B000001D6'
  AND STMT_TYPE = 'M'                      
;                                          
COMMIT ;                                   
EXPLAIN ALL SET QUERYNO = 14, 34, 54 FOR           
SELECT STMT_KEY, STMT_TYPE, ISOLATION_LEVEL
FROM IQA061HU.IQATY109                     
WHERE STMT_KEY = X'01329149008E899B000001D6'
  AND STMT_TYPE = 'M'                      
  AND ISOLATION_LEVEL = 'CS'               
;

Results were

QUERY  QNO  PNO  SQ  M  TABLE_NAME    A   CS  INDEX         IO
---------+---------+---------+---------+---------+---------+--
00001  01   01   00  0  IQATY109      I   00  IQAXY1092     Y
00002  01   01   00  0  IQATY109      I   01  IQAXY1092     Y
00003  01   01   00  0  IQATY109      I   02  IQAXY1092     Y
00011  01   01   00  0  IQATY109      R   00                N
00012  01   01   00  0  IQATY109      I   01  IQAXY1092     N
00013  01   01   00  0  IQATY109      I   02  IQAXY1092     N
00014  01   01   00  0  IQATY109      I   02  IQAXY1092     N

00021  01   01   00  0  IQATY109      I   00  IQAXY1092     Y
00022  01   01   00  0  IQATY109      I   01  IQAXY1092     Y
00023  01   01   00  0  IQATY109      I   02  IQAXY1092     Y
00031  01   01   00  0  IQATY109      I   00  IQAXY1093     Y
00032  01   01   00  0  IQATY109      I   01  IQAXY1093     Y
00033  01   01   00  0  IQATY109      I   02  IQAXY1093     Y
00034  01   01   00  0  IQATY109      I   03  IQAXY1093     Y

00041  01   01   00  0  IQATY109      I   00  IQAXY109I     Y
00042  01   01   00  0  IQATY109      I   01  IQAXY109I     Y
00043  01   01   00  0  IQATY109      I   02  IQAXY109I     Y
00051  01   01   00  0  IQATY109      I   00  IQAXY109I     Y
00052  01   01   00  0  IQATY109      I   01  IQAXY109I     Y
00053  01   01   00  0  IQATY109      I   02  IQAXY109I     Y
00054  01   01   00  0  IQATY109      I   02  IQAXY109I     Y

 

As can be seen, the first block gave especially bad results when the ISOLATION_LEVEL was added – leading to a tablespace scan in the worst case scenario!
Creating the second index alleviated the problem, and as you can see, the access’s all went INDEXONLY = Y, but now we have two indexes! That is double the disk space and double the time for updates, inserts and deletes. After I dropped the first two indexes and then created the INCLUDE one, you can see that the access is the same (Apart from the 2 columns for the query 54 of course – Here the original index would actually be better because the column is in the predicate list not just in the select list!) and now there is only the one index „to worry about“ – for RUNSTATS, REORGs, etc.

 

Now all you need to do is find where you have these „double“ defined indexes. The method is to look for any unique indexes where there exists another index with fewer equal columns. Easier said than done… however LISTSERV can help you here! If you are registered you can easily find some SQL written by Larry Kirkpatrick that very handily does nearly what you need! Search for “This could be a useful query when going to V10” to get a really neat SQL that you can simply SPUFI (You must do one tiny change and that is the SPACE line must be rewritten to look like INTEGER(A.SPACEF) AS UNIQ_IX_KB, to actually get the allocated space) to get this result from my test database:

---------+---------+----- --+---------+---------+---------+
UNIQUE_IX_TO_DEL           UNIQ_IX_KB  IX_WITH_PART_UNIQUE     
---------+---------+--------+---------+---------+---------+
IQA061HU.IQAXY1092             10800  IQA061HU.IQAXY1093      
IQA061HU.IQAXY1092             10800  IQA061HU.IQAXY109I

 

Here you can see that it is correctly telling me to drop the 1092 index as it is completely redundant and of course it finds it again due to my INCLUDEd column index with the 109I Now of course what you actually want to do is the inverse of this query. You probably want to DROP the longer index *and* the shorter index and then recreate the shorter with INCLUDE columns like the longer. Now is also a good time to think about using DSN_VIRTUAL_INDEXES to check the effects of the DROP before you actually do it…  also take a look in my Newsletter from June 2011 for details about finding dead indexes (or ask me to resend it).
Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2012-04: SOUNDEX and other „cool“ features – part one for DB2 V8

New Scalar functions in every release IBM

DB2 has always been delivered with Scalar functions and in every release IBM create and/or improve more and more. This month I thought I would do a trawl through the archives and list out my personal favorites from DB2 V8, 9, and 10. This is not a complete list of all new scalar functions in each release but nearly!! In fact there are so many that I decided to split this newsletter into three parts one for V8 one for 9 and one for 10.
Starting off with DB2 V8 – The last ever version of DB2 with a V in the title…shame really as I was looking forward to V12.

See the  Scalar functions in DB2 V9: Newsletter 2012-05 SOUNDEX part 2

See the Scalar functions in DB2 10:  Newsletter 2012-06  SOUNDEX part 3

 

Starting off with DB2 V8 :  ASCII function

Now V8 introduced us to the ASCII function:

SELECT ASCII('BOXWELL')              
FROM SYSIBM.SYSDUMMY1 ;              
         66

So it takes the leftmost character of any given string expression and returns the ASCII value, in this case 66 for B – I must admit that I have *not* yet found a use for this little gem…

Now V8 was also a major change due to use of UNICODE and so IBM released CHARACTER_LENGTH to complement the LENGTH scalar function. The difference being that CHARACTER_LENGTH counts the characters and LENGTH counts the bytes. Now in the US or GB these will probably be the same but in Germany they are nearly always different!

SELECT CHARACTER_LENGTH(                                  
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
                       , CODEUNITS32)                     
FROM SYSIBM.SYSDUMMY1 ;                                   
         10                                               

SELECT CHARACTER_LENGTH(                                  
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
                       , CODEUNITS16)                     
FROM SYSIBM.SYSDUMMY1 ;                                   
         10                                               

SELECT CHARACTER_LENGTH(                                  
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
                       , OCTETS)                          
FROM SYSIBM.SYSDUMMY1 ;                                   
         11                                               

SELECT LENGTH(                                            
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
              )                                           
FROM SYSIBM.SYSDUMMY1 ;                                   
         11

Note that the CODEUNITS/OCTETS is how you specify the CCSID schema. Not that clear but you can see that the umlaut (ü) in Düsseldorf is counted as 1 character for 32 and 16 but as two bytes for Octets and the simple LENGTH function. I bet that there are hundreds of SQLs out there that do not use this feature and are therefore not 100% correct! Just waiting for your first UNICODE named employee or customer to cause problems somewhere!
Now of course with a new LENGTH you also had to be able to SUBSTR the data on a character boundary *not* on the byte boundary and sure enough along came SUBSTRING

SELECT SUBSTRING(                                         
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
                , 1 , 2 , CODEUNITS32)                    
FROM SYSIBM.SYSDUMMY1 ;                                   
Dü                                                        

SELECT SUBSTRING(                                         
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
                , 1 , 2 , CODEUNITS16)                    
FROM SYSIBM.SYSDUMMY1 ;                                   
Dü                                                        

SELECT SUBSTRING(                                         
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)      
                , 1 , 2 , OCTETS)                         
FROM SYSIBM.SYSDUMMY1 ; 
D                                                                                  

SELECT SUBSTR(                                                         
    CAST('DüSSELDORF' AS VARCHAR(128) CCSID UNICODE)                   
                , 1 , 2)                                               
FROM SYSIBM.SYSDUMMY1 ;                                                

DSNE610I NUMBER OF ROWS DISPLAYED IS 0
DSNT408I SQLCODE = -331, ERROR:  CHARACTER CONVERSION CANNOT BE PERFORMED
         BECAUSE A STRING, POSITION 1, CANNOT BE CONVERTED FROM 1208 TO 1141,
         REASON 16
DSNT418I SQLSTATE   = 22021 SQLSTATE RETURN CODE
DSNT415I SQLERRP    = DSNXROHB SQL PROCEDURE DETECTING ERROR 
DSNT416I SQLERRD    = -117  0  0  -1  0  0 SQL DIAGNOSTIC INFORMATION
DSNT416I SQLERRD    = X'FFFFFF8B'  X'00000000'  X'00000000'  X'FFFFFFFF'
         X'00000000'  X'00000000' SQL DIAGNOSTIC INFORMATION

OK just like CHARACTER_LENGTH you “tell” the system which schema to use by giving the 32, 16 or octets again. Here you can see what happens if you attempt to split the byte by using the normal SUBSTR but exactly on a two byte character. Again this is a “lurking” bug that will byte people at some point (Sorry about that I could not resist!)

Finally IBM brought out TIMESTAMPDIFF. Bonnie Baker once held a class where she wondered out loud “Who on Earth programmed this?”

 SELECT TIMESTAMPDIFF( 64 ,                                  
         CAST(CURRENT_TIMESTAMP                             
            - CAST('1939-05-10-01.00.00.000000' AS TIMESTAMP)
         AS CHAR(22))) AS GRANDADS_AGE_IN_MONTHS
FROM SYSIBM.SYSDUMMY1 ;                     
---------+---------+---------+---------+---------+---------+-
GRANDADS_AGE_IN_MONTHS                                      
---------+---------+---------+---------+---------+---------+-
                   874

Now the first option is the bit that Bonnie was talking about (Oh dear another pun has crept in – sorry!) this number is just a simple bit flag

Valid values for numeric-expression equivalent intervals
1 Microseconds
2 Seconds
4 Minutes
8 Hours
16 Days
32 Weeks
64 Months
128 Quarters
256 Years

The question is just “Why?” Why not use the words??? There is no answer except “Blame the programmer!”. The important thing to remember is that TIMESTAMPDIFF works *only* with assumptions and has rounding errors such as
One year has 365 days
One year has 52 weeks
One month has 30 days

These are obviously not always true (Think of leap years or simply days in February) but if you are looking for ball-park figures – like Grandad – then who cares? The other thing to remember is that the input must be casted as a CHAR(22) like in my example.

So stay tuned for more Scalar fun next month with a bumper crop of interesting DB2 9 new functions!
Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2012-05: SOUNDEX and other „cool“ features – part two for DB2 V9

 

Part two of my walk through new Scalar functions and I will start with a whole new “group” of functions all to do with character manipulation for cross-system strings. ASCII_CHR, ASCII_STR, EBCDIC_CHR, EBCDIC_STR, UNICODE, and UNICODE_STR

 

See the  Scalar functions in DB2 V8: Newsletter 2012-04 SOUNDEX part 1

See the Scalar functions in DB2 10:  Newsletter 2012-06  SOUNDEX part 3

 

Character manipulation for cross-system strings in DB2 V9

SELECT ASCII('$') -- THIS IS THE V8 FUNCTION AS AN EXAMPLE TO GIVE US INPUT                              
FROM SYSIBM.SYSDUMMY1 ;                                        
         36                                                    
                                                               
SELECT ASCII_CHR(36)                                           
FROM SYSIBM.SYSDUMMY1 ;                                        
          $                                                              

SELECT ASCII_STR(                                              
                'EMBEDDED UNICODE ' CONCAT UX'C385' CONCAT ' FRED'
                )                                              
FROM SYSIBM.SYSDUMMY1 ;                                        
EMBEDDED UNICODE \C385 FRED                                    

DSNT400I SQLCODE = 000,  SUCCESSFUL EXECUTION                  
DSNT418I SQLSTATE   = 01517 SQLSTATE RETURN CODE               

SELECT EBCDIC_CHR(124)                                                 
FROM SYSIBM.SYSDUMMY1 ;                                                
          §                                                                      

SELECT EBCDIC_STR(                                                     
                 'EMBEDDED UNICODE ' CONCAT UX'C385' CONCAT ' FRED'       
                 )                                                      
FROM SYSIBM.SYSDUMMY1 ;                                                
EMBEDDED UNICODE \C385 FRED                                            

DSNT400I SQLCODE = 000,  SUCCESSFUL EXECUTION                          
DSNT418I SQLSTATE   = 01517 SQLSTATE RETURN CODE                       
                                                                       
SELECT UNICODE(UX'C385')                                               
      ,HEX(UNICODE(UX'C385'))                                          
FROM SYSIBM.SYSDUMMY1 ;                                                
      50053  0000C385                                                  
                                                                     
SELECT UNICODE_STR(                                                    
                  'A UNICODE CHAR \C385 WITH BACK SLASH AT THE END'     
                  )                                                      
FROM SYSIBM.SYSDUMMY1 ;                                                
A UNICODE CHAR . WITH BACK SLASH AT THE END \                          

DSNT400I SQLCODE = 000,  SUCCESSFUL EXECUTION 
DSNT418I SQLSTATE   = 01517 SQLSTATE RETURN CODE

The _CHR functions return the character in the respective code page using the decimal number as input. The _STR functions return a string where the “unprintable” data is prefixed with a \ and then the hex codes are listed out. Note the extra use of the \\ in the UNICODE_STR to get a \ in UNICODE.

Then we have the SOUNDEX and DIFFERENCE functions which are pretty neat and excellent for searching for names that are similar but not exact (Eg Roy and Roi)  here are some samples of both

SELECT DIFFERENCE('LUCY' , 'JUICY')
      ,SOUNDEX('LUCY')            
      ,SOUNDEX('JUICY')           
FROM SYSIBM.SYSDUMMY1 ;           
          3  L200  J200           

SELECT DIFFERENCE('BEER' , 'BUYER')
      ,SOUNDEX('BEER')            
      ,SOUNDEX('BUYER')           
FROM SYSIBM.SYSDUMMY1 ;           
          4  B600  B600           

SELECT DIFFERENCE('COOL' , 'KEWL')
      ,SOUNDEX('COOL')            
      ,SOUNDEX('KEWL')            
FROM SYSIBM.SYSDUMMY1 ;           
          3  C400  K400           

SELECT DIFFERENCE('AARDVARK' , 'ZOMBIE')
      ,SOUNDEX('AARDVARK')             
      ,SOUNDEX('ZOMBIE')               
FROM SYSIBM.SYSDUMMY1 ;                
          0 A631  Z510

The SOUNDEX takes any input string and converts it down into a four character string. As you can see the first character is the first character of the string the rest are the “value” of the string. This is of academic interest as it really is of use when you plug in the DIFFERENCE which returns a value of 0, 1, 2, 3, or 4. 0 is the words have no similarity 4 is the words are practically the same. That BEER == BUYER was clear!

MONTHS_BETWEEN harks back to one of my examples from last month

SELECT MONTHS_BETWEEN(                                      
              CURRENT_TIMESTAMP                             
            , CAST('1939-05-10-01.00.00.000000' AS TIMESTAMP)
                     ) AS GRANDADS_AGE_IN_MONTHS            
FROM SYSIBM.SYSDUMMY1 ;                                     
---------+---------+---------+---------+---------+---------+-
           GRANDADS_AGE_IN_MONTHS                           
---------+---------+---------+---------+---------+---------+-
              874.645161290322580

Given two timestamps it calculates the number of months between – However this time more accurately than the bit switch stuff in TIMESTAMPDIFF

Lots of string stuff came in 9 as well. LPAD, RPAD, LOCATE_IN_STRING, and OVERLAY here are some examples:

SELECT LPAD('FRED', 8, 'Q')
FROM SYSIBM.SYSDUMMY1 ;   
QQQQFRED                  

SELECT RPAD('FRED', 8, 'Q')
FROM SYSIBM.SYSDUMMY1 ;   
FREDQQQQ                  

SELECT LOCATE_IN_STRING('RÖY LIVES IN DÜSSELDORF' ,
                        'Ü' , 1 , CODEUNITS32)    
FROM SYSIBM.SYSDUMMY1 ;                           
         15

LPAD pads any given input string to the left with the supplied padding character. RPAD does the same but from the right hand side. These two are very handing for filling data up with “.” To make alignment better on the eye. LOCATE_IN_STRING does what it says – it returns the position in the string of the character given but this works, like the CHARACTER_LENGTH and SUBSTRING functions in V8, at the character and not the byte level. Further there is an “instance”, which I did not use in the example but it defaults to 1, so that you can find the nth character if desired. Very handy if you want to check that the user has entered four –‘s in the input field etc.

SELECT LOCATE_IN_STRING('TEST 1-2-3-4-5' , '-' , 1 , 4 , CODEUNITS32)
FROM SYSIBM.SYSDUMMY1 ;                                             
         13                                                         

SELECT LOCATE_IN_STRING('TEST 1-2-3-4 5' , '-' , 1 , 4 , CODEUNITS32)
FROM SYSIBM.SYSDUMMY1 ;                                             
          0

Here you can see that I start the search at the start (1) and that I want the position of the 4th occurrence. So the first returns 13 which is correct and the next returns 0 as there are indeed only three hyphens. As you can imagine – lots of potential uses here!

OVERLAY is the same as good old INSERT except that the length argument is optional.

Maths got a small boost in DB2 9 with the introduction of QUANTIZE. A strange function that outputs a quantized DECFLOAT field based on the parameters input. Again I have never found a use for it but it *must* be used somewhere… Here are the documented examples of what it does:

 QUANTIZE(2.17, DECFLOAT(0.001)) = 2.170
 QUANTIZE(2.17, DECFLOAT(0.01)) = 2.17
 QUANTIZE(2.17, DECFLOAT(0.1)) = 2.2
 QUANTIZE(2.17, DECFLOAT(’1E+0’)) = 2
 QUANTIZE(2.17, DECFLOAT(’1E+1’)) = 0E+1
 QUANTIZE(2, DECFLOAT(INFINITY)) = NAN –- exception
 QUANTIZE(-0.1, DECFLOAT(1) ) = 0
 QUANTIZE(0, DECFLOAT(’1E+5’)) = 0E+5
 QUANTIZE(217, DECFLOAT(’1E-1’)) = 217.0
 QUANTIZE(217, DECFLOAT(’1E+0’)) = 217
 QUANTIZE(217, DECFLOAT(’1E+1’)) = 2.2E+2
 QUANTIZE(217, DECFLOAT(’1E+2’)) = 2E+2

The first parameter is simply changed to have the same look-and-feel as the second parameter definition. I guess when working with floating point, bigint etc. it would be nice to get the output all the same format…

Finally they brought out a few new DATE, TIME functions. EXTRACT, TIMESTAMPADD and TIMESTAMP_ISO. EXTRACT is a high level version of existing functions and looks like this

SELECT EXTRACT(YEAR FROM CURRENT_TIMESTAMP)                            
      ,        YEAR(CURRENT_TIMESTAMP)                                         
      ,EXTRACT(MONTH FROM CURRENT_TIMESTAMP)                           
      ,        MONTH(CURRENT_TIMESTAMP)                                        
      ,EXTRACT(DAY FROM CURRENT_TIMESTAMP)                             
      ,        DAY(CURRENT_TIMESTAMP)                                          
FROM SYSIBM.SYSDUMMY1 ;                                                
  2012         2012            3            3           30           30
                                                                     
SELECT EXTRACT(HOUR FROM CURRENT_TIMESTAMP)                            
      ,        HOUR(CURRENT_TIMESTAMP)                                         
      ,EXTRACT(MINUTE FROM CURRENT_TIMESTAMP)                          
      ,        MINUTE(CURRENT_TIMESTAMP)                                       
      ,EXTRACT(SECOND FROM CURRENT_TIMESTAMP)                          
      ,        SECOND(CURRENT_TIMESTAMP)                                       
FROM SYSIBM.SYSDUMMY1 ;                                                
 10           10           11           11   26.511601              26

As can be seen the functions are the same (apart from the better accuracy at EXTRACT(SECOND FROM xxxx) but you could always simply encapsulate it in an INTEGER function and then you would really have 1:1 functionality. The results are all INTEGER apart from SECOND which is DECIMAL(8,6)

TIMESTAMP_ISO simply returns a version of your given date in the ISO timestamp format – pretty handy when all you get as input is a date.

SELECT TIMESTAMP_ISO (CURRENT DATE)                         
FROM SYSIBM.SYSDUMMY1 ;                                     
2012-03-30-00.00.00.000000

TIMESTAMPADD was obviously written by the same programmer who developed TIMESTAMPDIFF as it uses the same “odd” Bit masking to get the job done. If you remember my example with granddads age in months we can now reverse it so

SELECT TIMESTAMPADD( 64                                     
            , 874                                           
            , CAST('1939-05-10-01.00.00.000000' AS TIMESTAMP)
                     ) AS GRANDADS_BIRTHDATE_PLUS_MONTHS    
FROM SYSIBM.SYSDUMMY1 ;                                     
---------+---------+---------+---------+---------+---------+-
GRANDADS_BIRTHDATE_PLUS_MONTHS                              
---------+---------+---------+---------+---------+---------+-
2012-03-10-01.00.00.000000

Aha! The first parameter is 64 (Which means Months – Look at last month’s newsletter for a table listing all valid values) the second parameter is 874 so it adds 874 months to the supplied timestamp. There are better uses I am sure!

Last but not least are two TEXT extender functions that I have not actually used but seem pretty neat. CONTAINS searches through the text index for a hit. The result is 1 if the text is found in the document. Here is an example from the docu:

SELECT EMPNO
FROM EMP_RESUME
WHERE RESUME_FORMAT = ’ascii’
AND CONTAINS(RESUME, ’cobol’) = 1

If the text index column RESUME contains the word “cobol” it returns 1

Finally SCORE just cut-and-pasted from the docu as I have not got the text extender stuff installed

The following statement generates a list of employees in the order of how well their resumes matches the query „programmer AND (java OR cobol)“, along with a relevance value that is normalized between 0 (zero) and 100.

SELECT EMPNO, INTEGER(SCORE(RESUME, ’programmer AND
      (java OR cobol)’) * 100) AS RELEVANCE
  FROM EMP_RESUME
 WHERE RESUME_FORMAT = ’ascii’
   AND CONTAINS(RESUME, ’programmer AND (java OR cobol)’) = 1
 ORDER BY RELEVANCE DESC

 

DB2 first evaluates the CONTAINS predicate in the WHERE clause, and therefore, does not evaluate the SCORE function in the SELECT list for every row of the table. In this case, the arguments for SCORE and CONTAINS must be identical.

If you have the text support then these two functions could be very useful indeed!
OK, next month DB2 10 functions!
Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

 

 

2012-06 SOUNDEX and other “cool” features – part three for DB2 10

Part three of my walk through new Scalar functions and I will start with a whole new “group” of functions all to do with Bit manipulation. In the past it took assembler or REXX or Language Environment math functions to manipulate bits but now in DB2 10 it is easy! Five functions arrived BITAND, BITANDNOT , BITOR, BITXOR, and BITNOT

 

See the Scalar functions in DB2 V8:  Newsletter 2012-04  SOUNDEX part 1

See the  Scalar functions in DB2 V9: Newsletter 2012-05  SOUNDEX part 2

 

Bit Manipulation in DB2 10

SELECT BITAND( 128 , 1)
FROM SYSIBM.SYSDUMMY1 ; 
          0            

SELECT BITAND( 129 , 1)
FROM SYSIBM.SYSDUMMY1 ;
           1            

SELECT BITOR ( 128 , 1)
FROM SYSIBM.SYSDUMMY1 ; 
        129              

SELECT BITXOR( 129 , 1)
FROM SYSIBM.SYSDUMMY1 ;
        128           

SELECT BITNOT( 1 )   
FROM SYSIBM.SYSDUMMY1 ; 
         -2           

SELECT BITNOT( -1 )  
FROM SYSIBM.SYSDUMMY1 ; 
          0

All pretty straightforward stuff at the BIT level – remember how DB2 stores numbers though! That’s why the 1 NOTted goes to -2 and -1 goes to 0. Very handy for testing, setting and resetting bits – Stuff that we all still do (well I do!)

Next up is DECODE which is an apt name because it is a condensed rewrite of CASE which automatically handles the NULL equal case (You could argue that it is a simplified version!) Here is a CASE structure and the equivalent DECODE:

SELECT                                                   
  CASE IBMREQD                                           
    WHEN 'Y' THEN 'DB2 10'                               
    WHEN 'N' THEN 'DB2 9'                                 
    ELSE 'DB2 V8'                                        
  END                                                    
FROM SYSIBM.SYSDUMMY1                                    
;

DB2 10

SELECT                                                   
  DECODE( IBMREQD, 'Y', 'DB2 10', 'N' , 'DB2 9' , 'DB2 V8')
FROM SYSIBM.SYSDUMMY1                                    
;

DB2 10

Here you can see how it works, first is the column to be tested then pairs of data and finally (optional) the ELSE value. This really comes in handy when any of the columns can be NULL (remember that NULL = NULL is not actually EQUAL – Null is an unknown value and two unknowns are never equivalent) to get around this people would add cumbersome OR and AND logic like in the IBM Docu example

CASE
   WHEN c1 = var1 OR
    (c1 IS NULL AND
     var1 ISNULL) THEN ’a’
   WHEN c1 = var2 OR
    (c1 IS NULL AND
     var2 ISNULL) THEN ’b’
   ELSE NULL
END

The values of c1, var1, and var2 can be null values.

This nasty CASE can be simply replaced with this
DECODE(c1, var1, ’a’, var2, ’b’)

This would definitely be a coding win!

NVL arrived but is just a synonym for COALESCE

SELECT NVL((SELECT IBMREQD                
            FROM SYSIBM.SYSDUMMY1         
            WHERE IBMREQD = 'Y')          
                          , 'B' , 'C')
  FROM SYSIBM.SYSDUMMY1                     
;                                          
          Y                                           

SELECT NVL((SELECT IBMREQD                
            FROM SYSIBM.SYSDUMMY1         
            WHERE IBMREQD = 'N')          
                          , 'B' , 'C')
  FROM SYSIBM.SYSDUMMY1                     
;                                          
          B

First embedded select gets a row with value Y back second gets NULL of course leading to B being output.

Then we got TIMESTAMP_TZ which is basically an embedded CAST statement around a TIMESTAMP column that looks like this

SELECT              CURRENT TIMESTAMP                                    
      ,TIMESTAMP_TZ(CURRENT TIMESTAMP)                     
      ,TIMESTAMP_TZ(CURRENT TIMESTAMP , 8)                 
      ,TIMESTAMP_TZ(CURRENT TIMESTAMP , '+02:00')          
      ,TIMESTAMP_TZ(CURRENT TIMESTAMP , '+02:00' , 8)       
FROM SYSIBM.SYSDUMMY1
2012-03-30-11.33.10.534659Timestamp
2012-03-30-11.33.10.534659+00:00Now with a Timezone field
2012-03-30-11.33.10.53465900+00:00Now with eight digits for seconds and a timezone
2012-03-30-11.33.10.534659+02:00Now with an adjustment Timezone
2012-03-30-11.33.10.53465900+02:00Now lengthened and adjusted

The documentation does not make it that clear that you can actually have three parameters. The numeric parameter is the accuracy of the seconds (Six is the default) and the string must be a valid Time zone offset in the range -12:59 to +14:00.

Finally for this three month long race through the new Scalar functions is my personal favorite TRIM. Now TRIM simply does what LTRIM and RTRIM have always done but at the same time and is basically the same as STRIP except that the enabling PTF was only closed on the 19. March 2012… So a bit new for most sites!

 

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect