2012-09: DB2 Catalog Statistics – revisited

1 – Which data is used by the DB2 Optimizer and which is updated by RUNSTATS?

2 – Which default column values trigger the DB2 Optimizer to use its own internal default values?

 

Every now and again I hold a little presentation called Are you a RUNSTATS Master?, where I describe in detail what the DB2 Optimizer uses from the DB2 catalog data for access path selection.

I am always surprised at how often people say “just that data?” or “Is that it?” (the various other reasons for access path selection like CP speed, Number of CPs, RID Pool size, Sort Pool size, Max data caching size, and, of course, the 80 bufferpools are also mentioned, but these have nothing to do with RUNSTATS).

So generally the answer is “Yes”; however, the devil is in the detail of all the permutations and combinations. The DB2 Optimizer’s algorithms are top secret, but the input data it uses is fully described in the documentation.

What I want to do this month is show the Catalog data that is used, the default values that can cause surprising things to happen and the problem of correlations in the catalog.

 

First is – Which data is used by the DB2 Optimizer and which is updated by RUNSTATS?

Here is a complete list of the eleven tables used by the DB2 Optimizer:

  • SYSIBM.SYSCOLDIST
  • SYSIBM.SYSCOLSTATS *
  • SYSIBM.SYSCOLUMNS
  • SYSIBM.SYSINDEXES
  • SYSIBM.SYSINDEXPART
  • SYSIBM.SYSKEYTARGETS            DB2 9 and up (same as SYSCOLUMNS)
  • SYSIBM.SYSKEYTGTDIST            DB2 9 and up (same as SYSCOLDIST)
  • SYSIBM.SYSROUTINES
  • SYSIBM.SYSTABLES
  • SYSIBM.SYSTABLESPACE
  • SYSIBM.SYSTABSTATS

* degree of parallelism only and, after APAR PK62804, also “sometimes” used to bound filter factor estimates…
Now we can also list out all of the columns (obviously not including the key columns) which are used by the DB2 Optimizer:

SYSCOLDIST
CARDF, COLGROUPCOLNO, COLVALUE, FREQUENCYF, HIGHVALUE, LOWVALUE, NUMCOLUMNS, QUANTILENO, STATSTIME

SYSCOLSTATS
COLCARD, HIGHKEY, LOWKEY

SYSCOLUMNS
COLCARDF, HIGH2KEY, LOW2KEY

SYSINDEXES
CLUSTERING*, CLUSTERRATIO, CLUSTERRATIOF, DATAREPEATFACTORF, FIRSTKEYCARDF, FULLKEYCARDF, NLEAF, NLEVELS

SYSINDEXPART
LIMITKEY*

SYSKEYTARGETS
CARDF, HIGH2KEY, LOW2KEY, STATS_FORMAT

SYSKEYTGTDIST
CARDF, KEYGROUPKEYNO, KEYVALUE, FREQUENCYF, HIGHVALUE, LOWVALUE, NUMKEYS, QUANTILENO, STATSTIME

SYSROUTINES
CARDINALITY*, INITIAL_INSTS*, INITIAL_IOS*, INSTS_PER_INVOC*, IOS_PER_INVOC*

SYSTABLES
CARDF, EDPROC*, NPAGES, NPAGESF, PCTROWCOMP

SYSTABLESPACE
NACTIVE, NACTIVEF

SYSTABSTATS
CARD, CARDF, NPAGES

Notes: Columns marked with * are not updated by RUNSTATS, and columns marked with _ are not updatable at all. The column STATSTIME is used only if there are duplicates in the SYSCOLDIST table, in which case the DB2 Optimizer uses the “newer” data – which was probably inserted by a user.

 

Second is – Which default column values trigger the DB2 Optimizer to use its own internal default values?

SYSCOLUMNS
If COLCARDF = -1 then use 25

SYSINDEXES
If CLUSTERRATIOF <= 0 then use CLUSTERRATIO
If CLUSTERRATIO <= 0 then use 0.95 if the index is CLUSTERING = ‘Y’, otherwise 0.00
If FIRSTKEYCARDF = -1 then use 25
If FULLKEYCARDF = -1 then use 25
If NLEAF = -1 then use 33 (which is SYSTABLES.CARDF / 300)
If NLEVELS = -1 then use 2

SYSROUTINES
If CARDINALITY = -1 then use 10,000
If INITIAL_INSTS = -1 then use 40,000
If INITIAL_IOS = -1 then use 0
If INSTS_PER_INVOC = -1 then use 4,000
If IOS_PER_INVOC = -1 then use 0

SYSTABLES
If CARDF = -1 then use 10,000
If NPAGESF <= 0 then use NPAGES
If NPAGES = -1 then use 501 (which is CEILING(1 + SYSTABLES.CARDF / 20))

SYSTABLESPACE
If NACTIVEF <= 0 then use NACTIVE
If NACTIVE <= 0 then use 501 (which is CEILING(1 + SYSTABLES.CARDF / 20))

SYSTABSTATS
If CARDF = -1 then use SYSTABSTATS.CARD
If CARD = -1 then use 10,000
If NPAGES = -1 then use 501 (which is CEILING(1 + SYSTABSTATS.CARDF / 20))
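If you want to see whether any of these internal defaults could be in play on your subsystem, a quick look for the -1 values in the catalog is enough. Here is a minimal sketch of my own (checking just SYSTABLES and SYSINDEXES – extend it to SYSCOLUMNS, SYSROUTINES etc. in the same way):

SELECT CREATOR, NAME, CARDF, NPAGES
FROM SYSIBM.SYSTABLES
WHERE TYPE = 'T'
  AND (CARDF = -1 OR NPAGES = -1)
ORDER BY CREATOR, NAME
;

SELECT CREATOR, NAME, FIRSTKEYCARDF, FULLKEYCARDF, NLEAF, NLEVELS
FROM SYSIBM.SYSINDEXES
WHERE FIRSTKEYCARDF = -1
   OR FULLKEYCARDF  = -1
   OR NLEAF         = -1
   OR NLEVELS       = -1
ORDER BY CREATOR, NAME
;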

 

So now you can see that non-floating-point “old” data may still be used today, causing access path headaches!

Now, to top it all, the data in SYSCOLDIST and SYSKEYTGTDIST never simply gets “deleted”. Once that data is inserted, it stays there until it is overwritten by new data or the object is dropped. This all leads to some very old data in these two tables that can, and does, cause the DB2 Optimizer a ton of grief! One of the first things I do is select the MIN(STATSTIME) from these tables just to see how old the data really is. Do it yourself and be surprised! I have seen sites with eight-year-old data in SYSCOLDIST, and that cannot be good!
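The check itself is trivial; something like this does the trick (my own quick queries – SYSKEYTGTDIST only exists from DB2 9 on):

SELECT MIN(STATSTIME) AS OLDEST_STATS
     , COUNT(*)       AS NUM_ROWS
FROM SYSIBM.SYSCOLDIST
;

SELECT MIN(STATSTIME) AS OLDEST_STATS
     , COUNT(*)       AS NUM_ROWS
FROM SYSIBM.SYSKEYTGTDIST
;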

Finally, now onto correlations… There are lots of little tricks that DBAs use to massage access path choice, and one of these is to simply set NLEVELS to 15 for a given index; lots of queries then simply refuse to touch it, as it appears to be HUGE. But simply updating columns can cause the DB2 Optimizer, in the best case, to ignore your updates – or perhaps make things even worse! So here is a list of the correlations (in other words, if you change xxx, remember to change yyy and zzz as well):

  • Relationships exist among certain columns of certain tables:
    • Columns within SYSCOLUMNS
    • Columns in the tables SYSCOLUMNS and SYSINDEXES
    • Columns in the tables SYSCOLUMNS and SYSCOLDIST
    • Columns in the tables SYSCOLUMNS, SYSCOLDIST, and SYSINDEXES
  • If you plan to update some values, keep in mind the following correlations:
    • COLCARDF and FIRSTKEYCARDF/FULLKEYCARDF. COLCARDF of the first index column must equal FIRSTKEYCARDF and, for a single-column index, FULLKEYCARDF as well (see the query after this list)
    • COLCARDF, LOW2KEY and HIGH2KEY. For a non-default COLCARDF, LOW2KEY and HIGH2KEY must be filled with data, and if COLCARDF is 1 or 2, DB2 uses LOW2KEY and HIGH2KEY as domain statistics to generate frequencies
    • CARDF in SYSCOLDIST. CARDF is related to COLCARDF, FIRSTKEYCARDF and FULLKEYCARDF. It must be at a minimum
      • A value between FIRSTKEYCARDF and FULLKEYCARDF if the index contains the same set of columns
      • A value between MAX(COLCARDF of each column) and the product of the COLCARDFs of all the columns in the group
    • CARDF in SYSTABLES. CARDF must be equal to or larger than any other cardinality, such as COLCARDF, FIRSTKEYCARDF, FULLKEYCARDF, and CARDF in SYSCOLDIST
    • FREQUENCYF and COLCARDF or CARDF. The number of frequencies collected must be less than or equal to COLCARDF for the column or CARDF for the column group
    • FREQUENCYF. The sum of frequencies collected for a column or column group must be less than or equal to 1
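
Here is a minimal sketch of the single-column-index check mentioned above (my own query, not an official one); it lists the indexes where the three cardinalities disagree:

SELECT IX.CREATOR, IX.NAME
     , CO.COLCARDF, IX.FIRSTKEYCARDF, IX.FULLKEYCARDF
FROM SYSIBM.SYSINDEXES IX
   , SYSIBM.SYSKEYS    KY
   , SYSIBM.SYSCOLUMNS CO
WHERE IX.COLCOUNT  = 1
  AND KY.IXCREATOR = IX.CREATOR
  AND KY.IXNAME    = IX.NAME
  AND KY.COLSEQ    = 1
  AND CO.TBCREATOR = IX.TBCREATOR
  AND CO.TBNAME    = IX.TBNAME
  AND CO.NAME      = KY.COLNAME
  AND CO.COLCARDF     <> -1
  AND IX.FULLKEYCARDF <> -1
  AND (CO.COLCARDF <> IX.FIRSTKEYCARDF
    OR CO.COLCARDF <> IX.FULLKEYCARDF)
;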

 

Do not forget that our little Freeware tool StatisticsHealthCheck will find all bad correlations, old data and badly updated data for you and it is FREE!

So I hope this little round-up of Catalog Statistics data was interesting and, as usual, if you have any comments or questions, please feel free to mail me!

2012-10: Existence check SQL – Through the ages

 

Before DB2 V7
DB2 V7
DB2 V8

Over the years, we have all been faced with the problem of checking for existence (not just with DB2 and SQL either – that is for another BLOG!). In the days before even SYSIBM.SYSDUMMY1 existed, it was pretty nasty, and in my old firm we created our own single-row dummy table for exactly this requirement (and for doing date arithmetic etc.). As the programmers of the world re-invent the wheel time and time again, the simple existence check has also evolved over the years. Here is a brief selection of the various ways that people have done it in the past – and they could well still be running today, every one to two seconds, somewhere in the world…

Before DB2 V7

Line 188 in the EXPLAIN output:

SELECT COUNT(*)
FROM ZIPCODES
WHERE COL1 = 'xxx'

If the COUNT(*) was > 0 – BINGO! (Line 199 WITH UR)

Or line 211:

SELECT 1
FROM ZIPCODES
WHERE COL1 = 'xxx'

If the SQLCODE was 0 or -811 – BINGO! (Line 222 WITH UR)

Or line 234:

SELECT 1
FROM SYSIBM.SYSDUMMY1
WHERE EXISTS (SELECT 1
              FROM ZIPCODES A
              WHERE A.COL1 = 'xxx')

If the SQLCODE was 0 – BINGO! (Line 247 WITH UR)

Or line 261:

SELECT 1
FROM SYSIBM.SYSDUMMY1 D
WHERE EXISTS (SELECT 1
              FROM ZIPCODES A
              WHERE A.COL1 = 'xxx'
                AND D.IBMREQD = D.IBMREQD)

If the SQLCODE was 0 – BINGO! Note the use of the “correlated” predicate to actually force good access! (Line 275 WITH UR)

DB2 V7

In DB2 V7 you could then add FETCH FIRST 1 ROW ONLY, so that the scan would stop. Line 290:

SELECT 1
FROM ZIPCODES
WHERE COL1 = 'xxx'
FETCH FIRST 1 ROW ONLY

If the SQLCODE was 0 – BINGO! (Line 302 WITH UR)

Adding WITH UR “allegedly” also sped up the processing of SYSDUMMY1, and the possibility arrived to declare a cursor with OPTIMIZE FOR 1 ROW and issue a single FETCH – sketched below.
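As a sketch, that cursor variant would have looked something like this (my own reconstruction, using the same ZIPCODES example and an invented cursor name):

DECLARE EXISTCHK CURSOR FOR
SELECT 1
FROM ZIPCODES
WHERE COL1 = 'xxx'
OPTIMIZE FOR 1 ROW
FOR FETCH ONLY
WITH UR

OPEN it, issue a single FETCH (SQLCODE 0 means the row exists, +100 means it does not) and CLOSE it again.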

DB2 V8

In DB2 V8 you could now drop the correlation check and still get good performance. Same as line 234:

SELECT 1
FROM SYSIBM.SYSDUMMY1
WHERE EXISTS (SELECT 1
              FROM ZIPCODES A
              WHERE A.COL1 = 'xxx')

If the SQLCODE was 0 – BINGO!

As Terry Purcell wrote on listserv many years ago, “The FETCH FIRST 1 ROW ONLY made all other existence checks obsolete”, and so now is the time to dig through all of your old SQL and see if you can improve even simple existence checks! Here is the EXPLAIN output of all of these variations in a DB2 10 NF:

---------+---------+---------+---------+---------+---------+---------
LINE   QNO  TABLE_NAME    A   P  CE  TYPE             MS        SU
---------+---------+---------+---------+---------+---------+---------
00188  01   ZIPCODES      R   S  R   SELECT          137        388
00199  01   ZIPCODES      R   S  R   SELECT          137        388
00211  01   ZIPCODES      R   S      SELECT          187        529
00222  01   ZIPCODES      R   S      SELECT          187        529
00234  01   SYSDUMMY1     R   S      SELECT           11         29
00234  02   ZIPCODES      R   S      NCOSUB           11         29
00247  01   SYSDUMMY1     R   S      SELECT           11         29
00247  02   ZIPCODES      R   S      NCOSUB           11         29
00261  01   SYSDUMMY1     R   S      SELECT            2          6
00261  02   ZIPCODES      R   S      CORSUB            2          6
00275  01   SYSDUMMY1     R   S      SELECT            2          6
00275  02   ZIPCODES      R   S      CORSUB            2          6
00290  01   ZIPCODES      R          SELECT            1          1
00302  01   ZIPCODES      R          SELECT            1          1

Access is always a tablespace scan, sometimes with sequential prefetch active. So you can see that today WITH UR makes no difference, and the absolute best performance is indeed with the FETCH FIRST 1 ROW ONLY code. Just for fun, I ran the above COBOL program calling each of these methods 10,000,000 times… the CPU usage varied from 10.87 seconds for the worst to 10.63 seconds for the best. That is how good DB2 really is: 0.24 / 10,000,000 is a very, very small number indeed! Sometimes you can teach an old dog new tricks!

Finally, here’s a real cutey from about ten years ago, where the task was not to see if a given row existed, but whether a given TABLE existed.

DECLARE TESTTAB CURSOR FOR
SELECT 'PS_DOES_TABLE_EXIST'
FROM PS_TAX_BALANCE
WHERE 1 = 0
FOR FETCH ONLY

I hope this has been fixed to actually query the DB2 catalog these days because, as of DB2 9, it now gets a PRUNED access path.

 ---------+---------+---------+---------+---------+---------+---
 LINE   QNO   TABLE_NAME  A P CE  TYPE       MS       SU
 ---------+---------+---------+---------+---------+---------+---
 00319 01                         PRUNED     0         0

Which would probably really kill the application!!! As usual, if you have any questions or comments, please feel free to mail me!

2012-11: EXPLAIN table maintenance and clean-up

Remember the good old days when there was a <userid>.PLAN_TABLE with all your access path data in it?
When the package was dropped or changed by some program maintenance you could simply do a DELETE WHERE NOT EXISTS style SQL to keep your data up to date and pristine?

Of course those days have *long* gone…
Nowadays everyone on the block has got DataStudio installed and it installs a whole bunch of “new” hidden tables to enable Query Tuning. This is OK of course, but do you actually take care of this data? If it is the production EXPLAIN tables – Are you image copying, runstating and reorging when you should? Do you have all the indexes you need? Are you deleting too much or not enough data? Can you, in fact, delete anything these days without the risk that you will delete an access path that may “come back from the dead”?

First, a quick review of which tables you may well have in existence at the moment:

 

DB2 V8

PLAN_TABLE – The good old original, now with 58 columns
DSN_STATEMNT_TABLE – Also available as a published API, with 12 columns used for costs
DSN_FUNCTION_TABLE – As above, with 15 columns for checking which function is called
DSN_DETCOST_TABLE – The hidden tables first officially externalized by DataStudio
DSN_FILTER_TABLE
DSN_PGRANGE_TABLE
DSN_PGROUP_TABLE
DSN_PREDICAT_TABLE
DSN_PTASK_TABLE
DSN_QUERY_TABLE – Not populated by EXPLAIN(YES) for BIND/REBIND
DSN_SORT_TABLE
DSN_SORTKEY_TABLE
DSN_STRUCT_TABLE
DSN_VIEWREF_TABLE

Now, most of the descriptive text for the new ones is “IBM internal use only”, but some of them are pretty cool! Please remember that all this data is as-is and will probably *not* help you in your day-to-day tuning one little bit. That being said, I do use the DSN_PREDICAT_TABLE quite a lot.
The first “problem” is the DSN_QUERY_TABLE, because it contains a CLOB(2M) column, which means that it can grow very, very quickly – well, not the DSN_QUERY_TABLE itself of course, rather its auxiliary table that holds the LOB data.

 

DB2 9

PLAN_TABLE – Now with 59 columns
DSN_STATEMNT_TABLE
DSN_FUNCTION_TABLE
DSN_DETCOST_TABLE
DSN_FILTER_TABLE
DSN_PGRANGE_TABLE
DSN_PGROUP_TABLE
DSN_PREDICAT_TABLE
DSN_PTASK_TABLE
DSN_QUERYINFO_TABLE – New for query rewrite & accelerator usage, and with two LOBs
DSN_QUERY_TABLE
DSN_SORT_TABLE
DSN_SORTKEY_TABLE
DSN_STATEMENT_CACHE_TABLE – New, with a LOB column
DSN_STRUCT_TABLE
DSN_VIEWREF_TABLE

The DSN_STATEMENT_CACHE_TABLE must be separately maintained as it has nothing to do with the PLAN_TABLE of course!

 

DB2 10

PLAN_TABLE – Now with 64 columns
DSN_STATEMNT_TABLE
DSN_FUNCTION_TABLE
DSN_COLDIST_TABLE – New, and contains the SYSCOLDIST data
DSN_DETCOST_TABLE
DSN_FILTER_TABLE
DSN_KEYTGTDIST_TABLE – New, and contains the SYSKEYTGTDIST data
DSN_PGRANGE_TABLE
DSN_PGROUP_TABLE
DSN_PREDICAT_TABLE
DSN_PTASK_TABLE
DSN_QUERYINFO_TABLE
DSN_QUERY_TABLE
DSN_SORT_TABLE
DSN_SORTKEY_TABLE
DSN_STATEMENT_CACHE_TABLE
DSN_STRUCT_TABLE
DSN_VIEWREF_TABLE

 

Now it must be apparent that the amount and size of this data is getting out of hand. If someone, by accident, creates all of these tables in production, then every BIND or REBIND with EXPLAIN(YES) will be doing a vast amount of I/O, possibly up to the point that you run out of space in one or more of the tablespaces (the LOB ones especially really hurt!). I actually filled two packs with data… cost me a pizza…

In DB2 10, IBM aligned all of the EXPLAIN tables to have a “common” key: QUERYNO, APPLNAME, PROGNAME, COLLID, GROUP_MEMBER, SECTNOI, VERSION and EXPLAIN_TIME. Before this, some or nearly all of these columns were not there (DSN_PGRANGE_TABLE and DSN_QUERY_TABLE being the worst with only QUERYNO, GROUP_MEMBER, and EXPLAIN_TIME as a “key”)

This leads to some minor problems in determining which rows can be deleted, but nothing disastrous!

To clean up in DB2 10, the simplest and best approach is a DELETE WHERE NOT EXISTS from the child tables against the PLAN_TABLE and, as the keys are nowadays all the same, it is very simple:

 

DELETE
FROM <userid>.DSN_<name>_TABLE A 
WHERE NOT EXISTS (SELECT 1
                  FROM <userid>.PLAN_TABLE B
                  WHERE A.QUERYNO = B.QUERYNO
                    AND A.APPLNAME = B.APPLNAME
                    AND A.PROGNAME = B.PROGNAME
                    AND A.COLLID = B.COLLID
                    AND A.GROUP_MEMBER = B.GROUP_MEMBER
                    AND A.SECTNOI = B.SECTNOI
                    AND A.VERSION = B.VERSION
                    AND A.EXPLAIN_TIME = B.EXPLAIN_TIME)
;

And you are done once you have done this on all 16 tables…

DSN_STATEMNT_TABLE
DSN_FUNCTION_TABLE
DSN_COLDIST_TABLE
DSN_DETCOST_TABLE
DSN_FILTER_TABLE
DSN_KEYTGTDIST_TABLE
DSN_PGRANGE_TABLE
DSN_PGROUP_TABLE
DSN_PREDICAT_TABLE
DSN_PTASK_TABLE
DSN_QUERYINFO_TABLE
DSN_QUERY_TABLE
DSN_SORT_TABLE
DSN_SORTKEY_TABLE
DSN_STRUCT_TABLE
DSN_VIEWREF_TABLE

Of course, the first real work is to find out which data you no longer need in the <userid>.PLAN_TABLE, and here the new catalog table SYSIBM.SYSPACKCOPY can help you – once you have bound everything in NFM, of course!

  
DELETE
FROM <userid>.PLAN_TABLE A
WHERE NOT EXISTS (SELECT 1
                  FROM (SELECT COLLID
                             , NAME
                             , BINDTIME
                        FROM SYSIBM.SYSPACKCOPY
                        UNION ALL
                        SELECT COLLID
                             , NAME
                             , BINDTIME
                        FROM SYSIBM.SYSPACKAGE
                       ) B
                  WHERE A.COLLID    = B.COLLID
                    AND A.PROGNAME  = B.NAME
                    AND A.BIND_TIME = B.BINDTIME)
;

This finds all of the currently “in use” (or possibly in use) collections and packages, and then simply deletes from the PLAN_TABLE any rows that are not there. Having run this, you can then run the 16 other tidy-up queries before finally doing a REORG with inline RUNSTATS and COPY, as this data is critical for your business!

Now, as it is critical… where have you got it stored? Does it have the same stringent standards as other production data? One table per tablespace, with a correct SEGSIZE, bufferpool etc.? Now is the time to review where your EXPLAIN data actually lives and take corrective action – it will speed up BIND and REBIND as deadlocks will reduce – and finally add all of the tablespaces to your DB2 Database Maintenance system! Just doing INSERTs is all well and good, but when you try to SELECT the data the performance can be terrible! For this the RealTime DBAExpert (RTDX) is your friend.

At this point it is also worth remembering that indexes can be a great help! You should have two indexes on the PLAN_TABLE: one on BIND_TIME, COLLID, NAME and one on BIND_TIME, QUERYNO, PROGNAME. All of the other EXPLAIN tables need at least EXPLAIN_TIME, QUERYNO and PROGNAME. You can happily add or alter these, of course, as it might even make sense to have indexes on QUERYNO, APPLNAME, PROGNAME, COLLID, GROUP_MEMBER, SECTNOI, VERSION, and EXPLAIN_TIME. Feel free to mix’n’match for what you are doing with your tables! For example, if you are using HINTs, it is recommended to have QUERYNO, APPLNAME, PROGNAME, VERSION, COLLID, OPTHINT, or the performance will be pretty bad.
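As a sketch, the two PLAN_TABLE indexes described above could be created like this (the index names and the <userid> qualifier are placeholders of my own; I use PROGNAME, which is the PLAN_TABLE column that corresponds to the package NAME):

CREATE INDEX <userid>.PLAN_TABLE_IX1
       ON <userid>.PLAN_TABLE
          (BIND_TIME, COLLID, PROGNAME)
;
CREATE INDEX <userid>.PLAN_TABLE_IX2
       ON <userid>.PLAN_TABLE
          (BIND_TIME, QUERYNO, PROGNAME)
;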
Bottom line is – Just make sure that you *only* create the EXPLAIN tables you actually *need*
and you *only* create the indexes you actually *need*

 

Happy EXPLAINing! As usual, comments and queries are always welcome!

TTFN,
Roy Boxwell
Senior Architect

2011-11: Index Compression – Which ones are worth it?

Before DB2 9, index spaces were stuck at 4k pages; in DB2 9 the full range of page sizes was opened up – 8k, 16k and even 32k pages. The classic answer to any DB2 question is “It Depends”, and for index compression that is still the correct answer!

The PROs for index compression are:

  • Less disk space
  • Fewer levels in the index
  • Faster I/O

Of course, the world is not really that simple. There are also CONs for index compression:

  • Increased CPU
  • Worse I/O response time
  • Increased bufferpools (more bufferpools, not just increasing the sizes of existing ones)
  • Real memory
  • Standards

Also bear in mind that you do not *have* to use compression to use the larger spaces. In fact, just switching to a larger page size can reduce index splits and index levels without the CPU and bufferpool tradeoffs that you must accept with compression.

If you’re interested in compressing indexes, here are a few quick facts about how it works:

  • The compression is done purely on disk. In the bufferpool, on the log, etc., it will always be uncompressed.
  • The compression is done at the “page” level.
  • The compression is only on the LEAF pages (it uses the old VSAM methodology).
  • The index *must* be defined with either 8k, 16k, or 32k pages, as these will then be compressed down to a 4k page on disk.

How to choose which index to compress? There are these simple rules:

  1. Only compress big indexes when it is worth it. (Rule of Thumb: anything between 48k and 1536k is *not* worth compressing – see the catalog query after this list.)
  2. A good candidate is an index with “normal” sequential access, because there is less I/O.
  3. A bad candidate is an index with “normal” random access, a bad bufferpool hit ratio and a CPU-limited system, as you will bring more data than needed into the bufferpool.
  4. Run the DSN1COMP job and look at its results to choose the best option.
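
To find the “big” candidates from rule 1, a quick catalog query is enough. A minimal sketch of my own (the 384-page threshold is simply the 1536k Rule of Thumb expressed in 4k leaf pages – adjust to taste):

SELECT CREATOR, NAME, NLEAF, NLEVELS, PGSIZE
FROM SYSIBM.SYSINDEXES
WHERE NLEAF > 384
  AND COMPRESS = 'N'
ORDER BY NLEAF DESC
;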

There are two Rules of Thumb that you may want to keep in mind:

  1. Normally 8k is the best choice, as 16k can result in too much bufferpool space being wasted (refer to the 1st example below).
  2. Use 8k if you get 50% or more compression, unless you get 75% or more, in which case you can indeed use 16k (refer to the 2nd example below).

I have never seen a 32k space worth using at all. However, remember I have not tested every possible key length and data mix in the whole universe!

The linked txt file contains typical JCL for DSN1COMP. Note the use of PARM LEAFLIM(10000): this specifies how many index leaf pages should be evaluated to determine the compression estimate, and stops DSN1COMP from processing every leaf page of an input data set that is, by definition, pretty large.
The range is from 1 to 99000000. If the LEAFLIM PARM is not specified, the entire index will be scanned.

The output from DSN1COMP shows the results from evaluating the compression with different index page sizes. Look for message DSN1940I to see the details of the compression report; see the following two examples:

Example 1:

  8  K Page Buffer Size yields a     
 51  % Reduction in Index Leaf Page Space                          
     The Resulting Index would have approximately
 49  % of the original index's Leaf Page Space
     No Bufferpool Space would be unused 
     ----------------------------------------------
 16  K Page Buffer Size yields a                                 
 68  % Reduction in Index Leaf Page Space
     The Resulting Index would have approximately
 32  % of the original index's Leaf Page Space
 21  % of Bufferpool Space would be unused to
     ensure keys fit into compressed buffers
     ----------------------------------------------
 32  K Page Buffer Size yields a 
 68  % Reduction in Index Leaf Page Space
     The Resulting Index would have approximately
 32  % of the original index's Leaf Page Space
 60  % of Bufferpool Space would be unused to 
     ensure keys fit into compressed buffers

Here you can quickly see that the 8k page wins even though it gives less saved space (51% instead of the 68% of the 16k page): no bufferpool space will be unused, a.k.a. wasted. The following shows another output, where the 16k page wins:

Example 2:

  8  K Page Buffer Size yields a
 51  % Reduction in Index Leaf Page Space
     The Resulting Index would have approximately
 49  % of the original index's Leaf Page Space
     No Bufferpool Space would be unused
     ----------------------------------------------
 16  K Page Buffer Size yields a 
 76  % Reduction in Index Leaf Page Space 
     The Resulting Index would have approximately 
 24  % of the original index's Leaf Page Space 
     No Bufferpool Space would be unused
     ---------------------------------------------- 
 32  K Page Buffer Size yields a            
 76  % Reduction in Index Leaf Page Space        
     The Resulting Index would have approximately   
 24  % of the original index's Leaf Page Space 
 47  % of Bufferpool Space would be unused to 
     ensure keys fit into compressed buffers

The important thing is to strike the right balance between space savings, bufferpool usage, and CPU when deciding on the best page size for your indexes. This brings us to the downside of all this – standards. Agree on, and write down, which index gets what page size, and verify it. In shops where the DDL is generated by many DBAs or tools, the “compress index methodology” must be documented and understood by everyone; otherwise, it will go horribly wrong at some point!

Monitoring and tuning bufferpools must also be done (or redone) with compression, and your need for real memory may go up, as may your CPU (a little bit).

There is a great IBM Redpaper that describes the whole process in great detail. One nice little snippet of info about indexes that came in with DB2 9 was the asymmetric page split. Before DB2 9, when you were inserting within a range (not just inserting at the end of a table!), the page splits were always 50/50, and the first page’s space was then almost always never used until the next index REORG came along. In DB2 9 this changed: the engine can detect that this insert pattern is happening and adapt the split, even up to 90/10, thus drastically reducing the number of splits.

In DB2 9, index lookaside was enabled for indexes with CLUSTERRATIOF >= 0.80, not just for the CLUSTERING index. Also in DB2 9, RANDOM indexes were delivered, but I have never seen one in the wild – so to speak.

In DB2 10, we now have the excellent INCLUDE columns to eliminate redundant indexes that were only created for index-only access (see the sketch below). The point is that every index that exists on a table adds to the “overhead” of INSERT and DELETE, as well as disk space – not to forget that the poor little Optimizer can get a headache trying to work out the best access path with all the data there is to read and process! This overhead can get quite large, and reducing the number of indexes should always be a goal for the DBA group; refer to my earlier newsletter about RTS to see how I go hunting for redundant indexes.
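As a reminder of the syntax, here is a purely hypothetical example (table, index and column names are made up): where you used to need a unique index on ACCOUNT_ID plus a second index on (ACCOUNT_ID, STATUS) just for index-only access, one index can now do both jobs:

CREATE UNIQUE INDEX SCOTT.ACCOUNT_IX1
       ON SCOTT.ACCOUNT
          (ACCOUNT_ID)
          INCLUDE (STATUS)
;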

Below is a little table from the IBM labs detailing the CPU usage of different INSERTs:

Scope                             9672-Z17 CPU time
No Index                          40 to 80µs
One index with no index read I/O  40 to 140µs 
One index with index read I/O     130 to 230µs 
Five indexes with index read I/O  500 to 800µs

Feel free to send me your  questions or comments.
TTFN,
Roy Boxwell
Senior Architect

2011-12: SOFTWARE ENGINEERING Holiday present – Free DB2 HealthCheck Series

 

Hi!

As a thank you to everybody joining my monthly News from the Labs Newsletter, this month we have a seasonal freebie for you all! SEGUS offers a DB2 HealthCheck package that pinpoints weaknesses and opportunities for improvements and optimization of your DB2 system. Just reply to this issue and my support colleagues will ship our Licensed Freeware edition of our HealthCheck series for DB2 z/OS.

 

DB2 Subsystem check, including Coupling Facility

PerformanceHealthCheck for DB2 z/OS (PHC) checks your DB2 subsystem for a range of problems and lists out what it finds, including the latest enhancement – the Coupling Facility checker. I read on listserv about people with “Coupling Facilities under stress”, and so I added some CF checks. It checks the six important values in your CF: the level of the microcode, the transfer time, the number of rejects, the false contention percentage, the subchannel busy percentage and, finally, the all-paths-busy count. From these KPIs you can see whether your CF is “under stress” or not! Now, to get all this juicy data, the LOAD library *must* be APF authorized of course! Remember that the normal PerformanceHealthCheck still runs fine without being APF auth’d; just the CF check must be.

 

Analyzes and assesses a complete DB2 Subsystem

Along with PHC comes StatisticsHealthCheck for DB2 z/OS (SHC), which many of you may already be familiar with. It allows you to analyze and assess a complete DB2 subsystem, down to a single database, and tells you what is “wrong” or inconsistent with your DB2 catalog statistics. This enables you to spot any problems before they get bad, and to improve performance by providing the DB2 Optimizer with better information on which to base its cost estimates. It fully supports, and is compliant with, DB2 10. It is a perennial favorite and you cannot run it enough – especially when you are going to migrate to a new DB2 version, this software is a must to protect yourself from strange optimizer behavior.

The binaries come with the product documentation, including a full feature overview that summarizes what our PHC can do for you!
Feel free to send me your questions or comments.
TTFN,
Roy Boxwell
Senior Architect

2013-01: Finding the right SSD (solid state disk) candidates

 

A while back IBM introduced support for SSDs (solid state disks) on the mainframe.
They have been around for years known as Flash memory, SD Cards, USB sticks, or Thumb drives and have been fantastic in their flexibility and speed of use.

As a PC user, I have loads of them and recently actually “retired” all of my sticks that were under 4GB (I remember my first 128 MB stick with happy memories!). The reason we love these little babies is that they are small, very fast and take no power. The downside is that they are slow (if you have just USB 1.1 or 2), expensive, and they do not last forever. I read a paper that stated after about 5000 rewrites the data could start to “lose its cohesion” which reminded me more of a transporter pattern buffer failure in Star Trek than losing a Word or Excel file from my stick – but I digress…

 

Where’s the beef?

Well, the marketing hype is that SSDs will dramatically increase your I/O speed, increase throughput, make everyone very happy, and put the HDD (Hard Disk Drive or “spinning drive”) into the vinyl retirement home within a few years, just like the CD has done. Now, of course, there is a lot of truth in all this hype. Think about what we as DBAs do a lot of the time… We try to increase performance, and we do this by attempting to reduce CPU and I/O. Why do we try to reduce I/O? Because it is so slow! That’s why DB2 has gotten HUGE bufferpools over the years: so that it can, with luck, avoid an I/O, because catching the right block as it merrily spins around and around is actually quite tricky!

 

Advantages of the SSD

The major difference is, of course, that SSDs do not spin, so there is no rotation and seek time for a physical head to “find” the right block – plus the extra, incredible bonus of more than one access at the same time! A disk head can only be in one place at one time, but an SSD is just memory and can be read in parallel! A lot of people forget this little fact, but it makes a dramatic difference to I/O times.

 

Technical performances

With normal HDDs ranging in size from 146 GB up to 900 GB, spinning at 15000 rpm for the 146 GB to 450 GB ones and 10000 rpm for the bigger ones, the random read time for a 4k page is about 6 ms. When you use an SSD, that plummets down to around 1 ms, so that is six times faster on its own. But do not forget the overlapping data access on an SSD: an HDD at 15000 rpm can do about 200 random 4k page reads a second, whereas an SSD can do 5000 random 4k page reads a second!

 

SSDs really win when it comes to random page reads, but they also give a boost with DB2 list prefetch when the number of pages to be read is 15% of the total or less – the lower the percentage, the better the performance compared to HDD access. When you add striping of data into the mix, the results just get better.

 

Downsides

OK – sounds too good to be true? Well, it is all true, but it COSTS!!! That’s why today you typically buy a machine with intelligent software for “tiering” the data between SSD, HDD, and SATA (the third type of mainframe HDD, 1 or 2 terabytes in size but even slower than the other HDDs, as it revolves at 7200 rpm!). The downside of “tiering” is that the controller does the work and you have nearly no idea where the data is and what to check for. Better, in my opinion, is an SMS class that allocates the data either to SSD or to “other”.

 

Identifying the right candidates

OK, that’s the introduction, now onto the real topic “Which data?”

Simply put, any data that fits this list of properties is a candidate.

 

The best indicators are listed first, and the more of them that match, the better it will be!

  1. Random read – As a random page read is about 1ms, any random big hitter will benefit
  2. Low cache hit ratio – If this object is also not often hit in the disk cache
  3. High I/O rate – If this object gets a lot of traffic or IO/Sec/GB
  4. High read percentage – if users hit this object and read lots of it
  5. High read-only disconnect time – use SMF record types 42-6 and 74-5

 

All of these things play a role in the decision of where your data should live. Finding out the numbers can be a bit tricky, depending on which monitors, traces etc. you have at your disposal, but there are some pretty good things available for nearly nothing or free. The DISPLAY BUFFERPOOL command, when a table’s objects are isolated in their own bufferpool, can give very good information about the random vs. sequential access ratios and usage – as can all the normal monitors out there, of course!
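For example, if the objects of interest have been isolated into their own bufferpool (BP8 here is just an illustrative choice), a command like the following shows, among other things, the random and sequential getpage counters since the last interval:

-DISPLAY BUFFERPOOL(BP8) DETAIL(INTERVAL)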

 

Once you have found your typically large object(s) that are randomly accessed, you then face another interesting question: does CLUSTERing play a role anymore? The idea, of course, is that “two rows are next to each other”, so one getpage gets both rows; but on an SSD you really have absolutely no idea where the data physically is (there are algorithms for data placement to increase the lifespan of the SSD, stop bad data etc., all of which “move” the data around), and so the answer is: “not really”. Which is why, in SOFTWARE ENGINEERING’s RealTimeDBAExpert (RTDX) product, we have these extra lines available for the decision about whether or not to REORG:

BATCH ON-DEMAND

REORG TABLESPACE    REGULAR  CRITICAL
MIN PAGES                64         0   No REORG if object is smaller
PCT CHANGED                              Percentage changed rows
PCT INDREF               10        10   Percentage FARINDREF+NEARINDREF
PCT UNCLUSTINS           10        10   Sensitive-  > _______ SSD mult. 2
                         -2              for objects > _______ pages
                         -3              for MEMBER CLUSTER
PCT HASH OVER                            Percentage hash overflow usage
MASS DELETES              0         0   Number of mass deletes

 

Here you can see that two thresholds have been created for DB2 10 NF. First is the “Sensitive > _______” threshold, which uses the new columns in the Real-Time Statistics tables in DB2 10 NF to recommend a REORG due to unclustered data if the number of accesses that are sensitive to sequential order exceeds this value. And then, if the data is on an SSD, a multiplier is applied to the threshold, because the need to REORG for CLUSTERing is much lower. RTDX defaults to two, which in this case would simply double the 10% to 20% before recommending a REORG. This whole system dramatically reduces the need for REORGs, and their huge use of I/O and CPU, of course!

 

What other side effects are there? Well, if you start to think about it a bit, you come to the conclusion that an SSD is really just a BUFFERPOOL dressed up as an HDD. This implies the chance to resize your actual bufferpools for non-SSD usage and, of course, to potentially revisit your SSD data attributes (PCTFREE, FREESPACE etc.), as the notion of “data placement” is basically *gone* with SSDs.

A last couple of points, though:

  1. Leave indexes on HDD.
  2. Leave mainly sequentially accessed objects on HDD.

 

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2013-02: SYSCOPY – Do you know what is in it?

If you have written your own DB2 database maintenance programs then you almost certainly run SQL queries against the DB2 Catalog. If you are also checking for Incremental Image Copies (IIC) or Full Image Copies (FIC) then you will probably be using a mix of Real-Time Statistics tables (RTS) and the SYSIBM.SYSCOPY to figure out which type of utility to generate. Further if you are in DB2 10 (any mode! CM8, CM9, or NF) then this newsletter is for you!

I had a problem in one of our test centers with a cursor that I noticed was taking a long time to finish and so I went into our SQL PerformanceExpert tool and extracted the EDM Pool data (this is the new data in DB2 10 NF that is synonymous with the Dynamic Statement Cache counters and statistics) and sorted by Total Elapsed Time descending to get this:

Analyze+ for DB2 z/OS ----- EDM Pool (6/12) ---------- Stmt 1 from 316
Command ===>                                          Scroll ===> CSR
                                                             DB2: QA1B
Primary cmd: END, SE(tup), Z(oom), L(ocate) total elapse time
Line cmd: Z(oom), A(nalyze), D(ynamic Analyze), E(dit Statement), 
P(ackage), S(tatement Text)
                           Total      Average       Total      Average
            StmtID   Elapsed Time  Elapsed Time   CPU Time    CPU Time
    HHHH:MM:SS.ttt HHH:MM:SS.ttt HHH:MM:SS.ttt HHH:MM:SS.ttt
            115967   1:28.107705   29.369235     1:12.151391  24.050464
            114910      8.367834    0.000331     6.779229      0.000268
             79642      7.998559    0.054412     6.346829      0.043176
            114907      5.760045    0.000238     4.378691      0.000181
            115974      5.031890    2.515945     2.937258      1.468629
              5439      4.037261    0.000739     2.685938      0.000492

Over one hour total, and over 29 minutes average, for our small amount of test data set alarm bells ringing – so I drilled down to the SQL:

Analyze+ for DB2 z/OS-View EDM-Pool Statement LINE 00000001 COL 001 080
Command ===>                                           Scroll ===> CSR
                                                          DB2: QA1B
Primary cmd: END

Collection: RTDX0510_PTFTOOL
Package   : M2DBSC09
Contoken  : 194C89620AE53D88  PrecompileTS :2012-10-29-15.34.40.938230
StmtID    :             115967  StmtNo     :  1223 SectNo:         2
----------------------------------------------------------------------
DECLARE
 SYSCOPY-IC-MODI-9N
CURSOR WITH HOLD FOR
SELECT
 T1.N1 , T1.N2 , T1.N3 , T1.N4 , T1.N5 , T1.N6 , T1.N7 , T1.N8 , T1.N9
 , T1.N10 , T1.N11 , T1.N12
FROM (
  SELECT
    ICTS.DBNAME AS N1
  , ICTS.TSNAME AS N2
  , ICTS.TIMESTAMP AS N3
  , ' ' AS N4
  , ICTS.DSNUM AS N5
  , ICTS.ICTYPE AS N6
  , DAYS ( :WORK-CURRENT-DATE ) - DAYS ( ICTS.TIMESTAMP ) AS N7
  , ICTS.OTYPE AS N8
  , ICTS.DSNAME AS N9
  , ICTS.ICUNIT AS N10
  , ICTS.INSTANCE AS N11
  , ICTS.STYPE AS N12
  FROM SYSIBM.SYSCOPY ICTS
  WHERE ICTS.ICBACKUP IN ( ' ' , 'LB' , 'FC' )
  AND ICTS.OTYPE = 'T'
  UNION
   SELECT
    ICIX.DBNAME AS N1
  , CAST ( TABLES.TSNAME AS CHAR ( 8 ) CCSID EBCDIC ) AS N2
  , ICIX.TIMESTAMP AS N3
  , ICIX.TSNAME AS N4
  , ICIX.DSNUM AS N5
  , ICIX.ICTYPE AS N6
  , DAYS ( :WORK-CURRENT-DATE ) - DAYS ( ICIX.TIMESTAMP ) AS N7
  , ICIX.OTYPE AS N8
  , ICIX.DSNAME AS N9
  , ICIXS.ICUNIT AS N10
  , ICIX.INSTANCE AS N11
  , ICIX.STYPE AS N12
   FROM SYSIBM.SYSCOPY ICIX
      , SYSIBM.SYSINDEXES INDEXES
      , SYSIBM.SYSTABLES TABLES
   WHERE ICIX.ICBACKUP IN ( ' ' , 'LB' , 'FC' )
   AND ICIX.OTYPE = 'I'
   AND VARCHAR ( ICIX.DBNAME , 24 ) = INDEXES.DBNAME
   AND VARCHAR ( ICIX.TSNAME , 24 ) = INDEXES.INDEXSPACE
   AND INDEXES.TBNAME = TABLES.NAME
   AND INDEXES.TBCREATOR = TABLES.CREATOR
   AND TABLES.TYPE IN ( 'H' , 'M' , 'P' , 'T' , 'X' ) )
AS T1
ORDER BY CAST (T1.N1 AS CHAR ( 8 ) CCSID EBCDIC )
       , CAST (T1.N2 AS CHAR ( 8 ) CCSID EBCDIC )
       , N3 DESC
FOR FETCH ONLY
WITH UR
HOSTVARIABLE NAME                  NULLABLE  TYPE          LENGTH SCALE
 --------------------------------  -------   -----------    -----  ----
 WORK-CURRENT-DATE                  NO       CHAR              26  
 WORK-CURRENT-DATE                  NO       CHAR              26   
**************************** Bottom of Data ****************************

 

Ok, ok – this SQL is not going to win a beauty contest any day soon, but it used to run just fine… so now I EXPLAINed it:

Analyze+ for DB2 z/OS ----- Explain Data (1/6) -------- Entry 1 from 7
Command ===>                                           Scroll ===> CSR
EXPLAIN: DYNAMIC     MODE: CATALOG                        DB2: QA1B
Primary cmd:
  END, T(Explain Text), V(iolations), R(unstats), P(redicates),  
  S(tatement Text), C(atalog Data), M(ode Catalog/History), Z(oom),
  PR(int Reports), SAVExxx, SHOWxxx 
                               
Line cmd: 
Z(oom), C(osts), I(ndexes of table), S(hort catalog), T(able), 
V(irtual indexes of table), X(IndeX)                              
Collection: RTDX0510_PTFTOOL   Package: M2DBSC09       Stmt :    1223
Version   : - NONE -                                                          
Milliseconds: 77519  Service Units:         220222  Cost Category: B   
                                                                                
  QBNO QBTYPE CREATOR  TABLE NAME       MTCH IX METH PRNT TABL PRE  MXO
  PLNO TABNO  XCREATOR INDEX NAME ACTYP COLS ON OD   QBLK TYPE FTCH PSQ
  ---- ------ -------- ----------  ----- ---- -- ---- ---- ---- ---- ---
    1 SELECT R510PTFT T1          R       0 N   0      0   W   S    0
    1 5  
    1 SELECT                              0 N   3      0  -         0
    2 0 
    2 UNION                               0     3      1  -         0
    1 0 
    3 NCOSUB SYSIBM   SYSCOPY     R       0 N   0     2 T    S      0
    1 1       
    4 NCOSUB SYSIBM   SYSCOPY     R       0 N   0     2 T    S      0
    1 2       
    4 NCOSUB SYSIBM   SYSINDEXES  I       2 N   1     2 T           0
    2 3      SYSIBM   DSNDXX02       
    4 NCOSUB SYSIBM   SYSTABLES   I       2 N   1     2 T           0
    3 4      SYSIBM   DSNDTX01 
  --- ------ -------- ---------- ------   ---- ---- -- ---- ----   ---


This is *after* I had REORGed SYSCOPY, SYSTSIXS and SYSTSTAB and then run RUNSTATS on SYSTSIXS and SYSTSTAB – you cannot do inline RUNSTATS on those two, of course!

Two tablespace scans against SYSCOPY is not brilliant, of course, but in this system we only have 4,000 tablespaces and 2,500 indexes… so then I used the Catalog primary command to have another look at the catalog data:

TS   : DSNDB06 .SYSCOPY
Stats: 2013-02-04-10.49.32.600316
Partitions:  0 , Tables: 1 , NACTIVEF 18.272 pages
Type      :  Neither a LOB nor a MEMBER CLUSTER.
RTS data TOTALROWS: 347.087 , Pages: 18.268

Table: SYSIBM.SYSCOPY
Stats: 2013-02-04-10.49.32.600316
No. of rows (CARDF): 347.082 , Pages: 18.268

Index: SYSIBM.DSNUCH01
Stats: 2013-02-04-10.49.32.600316     Type: Type-2 index
  Levels: 3 , Leaf pages: 3.945
  FIRSTKEYCARDF: 101 , FULLKEYCARDF: 347.082                  
  RTS data Levels: 3 , Leaf pages: 3.945 , TOTALENTRIES: 347.087
  CLUSTERING: Y , CLUSTERED: Y , CLUSTERRATIO = 100,00%
  DATAREPEATFACTORF: 18.268
  Indexcolumn     ! Format       ! Dist. Values ! A/D ! NL ! Stats
   ---------------+--------------+--------------+-----+----+------
    DBNAME        ! CHAR(8)      !          101 ! ASC ! N  ! OK
    TSNAME        ! CHAR(8)      !          712 ! ASC ! N  ! OK 
    START_RBA     ! CHAR(6)      !       72.398 ! DSC ! N  ! OK
    TIMESTAMP     ! TIMESTAMP(6) !      347.082 ! DSC ! N  ! OK 
                                                                              
  Index: SYSIBM.DSNUCX01
  Stats: 2013-02-04-10.49.32.600316     Type: Type-2 index
    Levels: 3 , Leaf pages: 509
    FIRSTKEYCARDF: 1.820 , FULLKEYCARDF: 1.820
    RTS data Levels: 3 , Leaf pages: 509 , TOTALENTRIES: 347.087
    CLUSTERING: N , CLUSTERED: Y , CLUSTERRATIO = 100,00%            
    DATAREPEATFACTORF: 18.275
    Indexcolumn    ! Format      ! Dist. Values ! A/D ! NL ! Stats
    ---------------+-------------+--------------+-----+----+------
    DSNAME         ! CHAR(44)    !        1.820 ! ASC ! N  ! OK

 

Here I had a heart attack! 347,082 rows?!?!?!?!?!? How in the wide wide world of sports did that happen? Time to drill down into the contents of SYSCOPY with this little query:

SELECT ICTYPE , STYPE,  COUNT(*)
FROM SYSIBM.SYSCOPY
GROUP BY ICTYPE , STYPE
;

 

Which returned these rather surprising results:

---------+---------+---------+-----
ICTYPE  STYPE                         
---------+---------+---------+-----
A        A                4
B                        46 
C        L             1669
C        O                4
F                       100
F        W               16   
I                         4     
L        M           344723 
M        R               18
R                       151
S                        62 
W                        18
W        S                1
Y                         2
Z                       269 
DSNE610I NUMBER OF ROWS DISPLAYED IS 15

 

The L and M combination appears 344,723 times!!!

Grab your handy DB2 10 SQL reference and page on down to DB2 Catalog tables, SYSIBM.SYSCOPY and you will see:

ICTYPE CHAR(1) NOT NULL
Type of operation:
A ALTER
B REBUILD INDEX
C CREATE
D CHECK DATA LOG(NO) (no log records for the range are available for RECOVER utility)
E RECOVER (to current point)
F COPY FULL YES
I COPY FULL NO
L SQL (type of operation)
M MODIFY RECOVERY utility
P RECOVER TOCOPY or RECOVER TORBA (partial recovery point)
Q QUIESCE
R LOAD REPLACE LOG(YES)
S LOAD REPLACE LOG(NO)
T TERM UTILITY command
V REPAIR VERSIONS utility
W REORG LOG(NO)
X REORG LOG(YES)
Y LOAD LOG(NO)
Z LOAD LOG(YES)

Now, in my version, the L entry has a ‘|’ by it to signify that it is new. Scroll on down further to STYPE to read:

STYPE CHAR(1) NOT NULL
Sub-type of operation:
When ICTYPE=L, the value is:
M Mass DELETE, TRUNCATE TABLE, DROP TABLE, or ALTER TABLE ROTATE PARTITION.
The LOWDSNUM column contains the table OBID of the affected table.

So, in other words, every time a program does a mass delete, DB2 inserts a row into SYSCOPY. So then I ran another query to see when this all began and, hopefully, ended:

SELECT MAX(ICDATE), MIN(ICDATE)
FROM SYSIBM.SYSCOPY
WHERE ICTYPE = 'L'
;
---------+---------+---------+--------

---------+---------+---------+--------
121107  120828
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

So we started getting records on the 28th of August 2012 and the last one was on the 7th of November 2012 – in just about ten weeks, even we managed 344,723 mass deletes!
So now, with my Sherlock Holmes deerstalker hat on, the question was “Why did it stop in November?” Happily, we keep a history of APARs here, and that’s when this one bubbled to the surface:

PM52724: MASS DELETES ENDS UP WITH LOCK ESCALATION ON SYSCOPY IN V10. BECAUSE PM30991 INTALLED CODE INSERTING L 12/01/04 PTF PECHANGE

I will let you go and read the text, but suffice it to say that IBM realized what a disaster this “logging” of mass deletes was and HIPERed a quick fix to stop it! Plus, you can see the APAR that “brought in the dead mouse”: PM30991.

PM30991 UK66327 Closed 2011-03-30
PM52724 UK80113 Closed 2012-07-03

So if you installed PM30991 but not PM52724, you probably have some cleaning up to do…
Now try to figure out how to clear up the mess!

By the way I also rewrote the Ugly Duckling SQL:

SELECT  T1.N1    
         ,T1.N2 
         ,T1.N3  
         ,T1.N4                
         ,T1.N5                
         ,T1.N6        
         ,T1.N7                             
         ,T1.N8  
         ,T1.N9               
         ,T1.N10               
         ,T1.N11                 
         ,T1.N12 
    FROM (                     
   SELECT ICTS.DBNAME    AS N1
         ,ICTS.TSNAME    AS N2
         ,ICTS.TIMESTAMP AS N3      
         ,' '            AS N4
         ,ICTS.DSNUM     AS N5 
         ,ICTS.ICTYPE    AS N6   
         ,DAYS ( :WORK-CURRENT-DATE ) - DAYS ( ICTS.TIMESTAMP ) AS N7
         ,ICTS.OTYPE     AS N8  
         ,ICTS.DSNAME    AS N9
         ,ICTS.ICUNIT    AS N10  
         ,ICTS.INSTANCE  AS N11 
         ,ICTS.STYPE     AS N12
     FROM SYSIBM.SYSCOPY ICTS       
    WHERE ICTS.ICBACKUP IN ('  ','LB','FC') 
      AND ICTS.OTYPE    = 'T'               
UNION ALL    
   SELECT ICIX.DBNAME     AS N1   
         ,CAST(TABLES.TSNAME      
          AS CHAR(8) CCSID EBCDIC) AS N2 
         ,ICIX.TIMESTAMP  AS N3 
         ,ICIX.TSNAME     AS N4 
         ,ICIX.DSNUM      AS N5
         ,ICIX.ICTYPE     AS N6 
         ,DAYS ( :WORK-CURRENT-DATE ) - DAYS ( ICIX.TIMESTAMP ) AS N7
         ,ICIX.OTYPE      AS N8
         ,ICIX.DSNAME    AS N9 
         ,ICIX.ICUNIT    AS N10     
         ,ICIX.INSTANCE  AS N11    
        ,ICIX.STYPE     AS N12  
    FROM SYSIBM.SYSCOPY ICIX 
        ,SYSIBM.SYSINDEXES INDEXES
        ,SYSIBM.SYSTABLES TABLES 
   WHERE ICIX.ICBACKUP IN ('  ','LB','FC')
     AND ICIX.OTYPE        = 'I' 
     AND ICIX.DBNAME      = INDEXES.DBNAME
     AND ICIX.TSNAME      = INDEXES.INDEXSPACE
     AND INDEXES.TBNAME    = TABLES.NAME 
     AND INDEXES.TBCREATOR = TABLES.CREATOR
 ) AS T1        
ORDER BY CAST(T1.N1 AS CHAR(8) CCSID EBCDIC)
        ,CAST(T1.N2 AS CHAR(8) CCSID EBCDIC)
        ,        N3 DESC               
  FOR FETCH ONLY 
  WITH UR 
  ;

It now performs like this:

Milliseconds:    55911  Service Units:   158836  Cost Category: A 
                                                        
QBNO QBTYPE CREATOR  TABLE NAME        MTCH IX METH PRNT TABL PRE  MXO
PLNO TABNO  XCREATOR INDEX NAME  ACTYP COLS ON OD   QBLK TYPE FTCH PSQ
---- ------ -------------- ----- ----  ---- -- ---- ---- ---  ---  --
  1 NCOSUB  SYSIBM   SYSINDEXES  I        0  N   0   2   T    S    0
  1 3       SYSIBM   DSNDXX07 
  1 NCOSUB  SYSIBM   SYSTABLES   I        2  N   1   2   T         0
  2 4       SYSIBM   DSNDTX01          
  1 NCOSUB  SYSIBM   SYSCOPY     I        2  N   1   2   T    S    0
  3 2       SYSIBM   DSNUCH01  
  2 UNIONA                                0  N   3   0   -         0
  1 0
  5 NCOSUB  SYSIBM   SYSCOPY     R         0  N   0   2   T    S   0
  1 1                             
---- ------  ------------------ ----- ---- -- ---- ---- ---  ---  ---

I am sure that once I have deleted all the SYSCOPY rows (note that we do not need to RECOVER on our test machine, so I have the luxury of being able to delete the data – you, of course, cannot!) it will return to being a nice little SQL!
After a large DELETE run, which left only 2,365 rows, followed by a REORG with inline RUNSTATS, the original SQL now looks like this:

Milliseconds:      672  Service Units:      1909  Cost Category: B         
                                                                              
QBNO QBTYPE CREATOR  TABLE NAME       MTCH IX METH PRNT TABL PRE   MXO
PLNO TABNO  XCREATOR INDEX NAME ACTYP COLS ON OD   QBLK TYPE FTCH  PSQ
---- ------ -------- ---------------  ---- -- ---  ---  --- ----  ----
  1  SELECT R510PTFT T1         R        0 N    0    0  W    S      0
  1 5  
  1 SELECT                               0 N    3    0  -           0
  2 0
  2 UNION                                0      3    1  -           0
  1 0
  3 NCOSUB SYSIBM   SYSCOPY      R       0 N    0    2  T    S      0
  1 1                                                     
  4 NCOSUB SYSIBM   SYSCOPY      R       0 N    0    2  T    S      0
  1 2                                                   
  4 NCOSUB SYSIBM   SYSINDEXES   I       2 N    1    2  T           0
  2 3      SYSIBM   DSNDXX02                                   
  4 NCOSUB SYSIBM   SYSTABLES    I       2 N    1    2  T           0
  3 4      SYSIBM   DSNDTX01
---- ------ -------- ----------------  ---- -- --- ---  --- ----  ----

 

And my version:

Milliseconds:   631  Service Units:   1792  Cost Category: A     
                                                                               
QBNO QBTYPE CREATOR  TABLE NAME        MTCH IX METH PRNT TABL PRE  MXO
PLNO TABNO  XCREATOR INDEX NAME  ACTYP COLS ON OD   QBLK TYPE FTCH PSQ
---- ------ -------- ----------------- ---- -- ---  ---- ---  ---  ---- 
   1 NCOSUB SYSIBM   SYSCOPY     R        0 N    0    2  T    S     0
   1 2 
   1 NCOSUB SYSIBM   SYSINDEXES  I        2 N    1    2  T          0
   2 3      SYSIBM   DSNDXX02 
   1 NCOSUB SYSIBM   SYSTABLES   I        2 N    1    2  T          0
   3 4      SYSIBM   DSNDTX01                                   
   2 UNIONA                               0 N    3    0  -          0
   1 0 
   5 NCOSUB SYSIBM   SYSCOPY     R        0 N    0    2  T    S     0
   1 1 
 ---- ------ -------- ----------------- --- -- --- ---- ---  ---   ---

Doesn’t look quite so impressive now…sniff…sniff

Finally here’s my SYSCOPY query for all cases:

SELECT ICTYPE, STYPE, MIN(ICDATE) AS OLDEST, MAX(ICDATE) AS NEWEST
     , COUNT(*) AS COUNT                                         
FROM SYSIBM.SYSCOPY
GROUP BY ICTYPE , STYPE                                 
;                                               
---------+---------+---------+---------+---------+---------+------
ICTYPE  STYPE  OLDEST  NEWEST        COUNT       
---------+---------+---------+---------+---------+---------+------
A       A      121228  121228            4                
B              121228  130128           46                
C       L      100809  130204         1669             
C       O      120827  120827            4              
F              100809  130204          100                   
F       W      100809  130204           16               
I              130131  130204            4               
M       R      130102  130131           18                 
R              120829  130130          151               
S              120829  130131           62            
W              100809  130204           18         
W       S      100809  100809            1                 
Y              120828  120828            2               
Z              120828  130201          269
DSNE610I NUMBER OF ROWS DISPLAYED IS 14

All the L records have gone! Yippee!

 

As always if you have any comments or questions please email me!
TTFN
Roy Boxwell

2013-03: Incredible out of the box data – DSC, „SSC“, RTS

Over the years, we (the DB2 community) have got used to more and more data in our databases and, actually, more and more data in our meta-database (the DB2 Catalog).

 

Real Time Statistics (RTS)

In fact, the amount of information that was “of use” took a quantum leap when the Real Time Statistics (RTS) tables were introduced back in DB2 V7 (as a set of post-GA APARs). This new data source freed us from having to run a RUNSTATS to see when we need a REORG, and also helped greatly with Image Copy and REORG decisions. When it was first announced, a lot of shops were worried about the overhead of this data, but as DB2 collected the statistics internally anyway, the actual overhead is negligible – especially in relation to the benefit it gives!

For more details about RTS usage, see one of my earlier Newsletters. Nearly all of the SOFTWARE ENGINEERING products use the RTS data in some way, even if it is just displaying the data on the screen “out of interest”.

 

Dynamic Statement Cache (DSC)

Then in DB2 V8 came the next jump forward – EXPLAIN STMTCACHE ALL (as long as your user id had SYSADM, of course; otherwise you only get your own statements back…). This enabled a look inside the Dynamic Statement Cache (DSC), which before had been hidden behind the veil of IFCIDs. Now, to really get useful information, you *must* have started these two traces:

– START TRACE(PERFM) CLASS(31) IFCID(318)
– START TRACE(ACCTG) CLASS(3)

And then you waited for a “representative amount of data to be accumulated” – is that one minute, one day, one month? Again, the overhead plays a major role, but all I have seen is “between 2% and 4%”, which for me is low enough to justify having these traces always switched “on”. Of course, the data that is returned is priceless and more than outweighs the overhead. (I have written a newsletter all about the DSC.)
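Once the traces have been running for a while, the flow is simply to EXPLAIN the cache and then query the table it fills. A minimal sketch of my own (I list only a handful of the DSN_STATEMENT_CACHE_TABLE columns, and the table is qualified with your own userid):

EXPLAIN STMTCACHE ALL
;
SELECT STMT_ID
     , CACHED_TS
     , STAT_EXEC
     , STAT_CPU
     , STAT_ELAP
     , CAST(SUBSTR(STMT_TEXT, 1, 100) AS CHAR(100)) AS STMT_START
FROM <userid>.DSN_STATEMENT_CACHE_TABLE
ORDER BY STAT_CPU DESC
;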

What I have seen is that the “hit ratio” of the DSC is basically a worthless metric. I have been to shops where the hit ratio was 99%+, but the DSC was actually flushing statements out at a rate of over 20,000 per hour! Yes, that is not a typo: 20K statements/hour! The “problem” is that if you have, say, 4,000 statements in the cache (which is normal for a cache size of about 150,000 KB, by the way) and you imagine that one of the SQLs is executed 16,000,000 times while all the rest are flushed, you still have a “hit ratio” of nearly 100%! The better metric is your “flushes per hour”, which you should try to reduce to less than 4,000 if you can…
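To put rough numbers on that example (my own back-of-the-envelope arithmetic): one statement found in the cache 16,000,000 times against roughly 4,000 misses for the flushed statements gives

   hit ratio  ≈ 16,000,000 / (16,000,000 + 4,000) ≈ 99.98%
   flush rate = 4,000 statements per hour (if that churn repeats every hour)

so the “green” hit ratio happily hides a cache that is being shredded.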

Remember that literals kill the DSC – and JAVA developers kill it for fun!

To squeeze even more out of the DSC, you must make sure it is as big as a house – set it to 320,000 KB if you can! Make sure literals are used only when absolutely needed! Check all special register usage and any other reasons why the DSC was not correctly used. Our new SQL WorkloadExpert does all this, and more, for you of course.

 

The new Static Statement Cache (SSC)

Having slipped in a mention of our new product, from now on called WLX, we can head on into the new world introduced with DB2 10 NF, and that is what I call the Static Statement Cache (SSC) – IBM calls it static SQL in the EDMPOOL, but I find SSC a *much* better name!

This new addition to the “very cheap but great data” from DB2 is the final piece in a large, complex jigsaw puzzle. It treats static SQL exactly the same as dynamic SQL, so you now have performance metrics and data in the same format and style as the DSC. IBM have not provided an “EXPLAIN SSC ALL” – you still have to write mainframe assembler to get the data from the IFCIDs, but that is why we are here! We write the assembler, the high-speed STCs and the GUI front end so you do not have to worry about all that!

 

Build your own SQL Performance Warehouse

Further, IBM added an enhancement to the DSC (and also to the new SSC) which means that flushed statements are now *also* written out as an IFCID record – this is fantastic news!

It simply means that you no longer have “invisible” SQL on your machine – all executed SQL is now freely available along with its performance metrics. This is great news for SQL tuners and performance managers. You can now finally see what is really running on the machine, when it runs, who runs it, what it costs, how it adds to the 4-hour rolling average, etc. Then you can analyze the data to find trends, bottlenecks and areas for tuning that, up until now, were not even known to be there! There is no finger-pointing here: the data simply did not exist before, so no-one could see which possible benefits existed.

So now you can build your own SQL Performance Warehouse for long-term analysis and tuning, which should contain:

RTS data – Object list, When was the object last REORGed, How many rows?, etc.
DSC data – Which SQLs are run? How bad are they? Can I make them “better”?, etc.
SSC data – Which packages are causing me pain? Which SQLs are never run?, etc.

Further, you can add a cross-reference to objects to get object-level usage statistics for all your tables, indexes, packages, collections, etc. These can also be used for an application-level analysis, which then leads to one of the really great things: a “before and after” comparison. To explain this, think about the following scenario:

 

Find the “bad guy”

You have a “bad guy” SQL that would really fly if a new index was created.
The problem is: “Will this new index help this one query but cripple all of the rest?” – and of course you never knew!
Now you can find out by simply gathering all the data for, say, a week, then creating the index (with RUNSTATS etc.), and then collecting data for another week. Once that week is done, simply compare all of the SQL that had anything to do with that table and see how the CPU, I/O and elapsed times compare. If the overall average CPU goes down, then you can conclude it was a winner! However, if it has gone up – then you might as well drop the index and think again…
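As a purely hypothetical sketch of what such a comparison could boil down to (the warehouse table WLX_STMT_METRICS and its columns are invented names – your own warehouse layout will differ):

-- Average CPU and elapsed time per execution, before and after the index change
SELECT PERIOD                                          -- e.g. 'BEFORE' or 'AFTER'
     , SUM(CPU_TIME)     / SUM(EXECUTIONS) AS AVG_CPU_PER_EXEC
     , SUM(ELAPSED_TIME) / SUM(EXECUTIONS) AS AVG_ELAPSED_PER_EXEC
FROM   WLX_STMT_METRICS
WHERE  TABLE_NAME = 'MY_BAD_GUY_TABLE'
GROUP BY PERIOD
;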

All of this data is also great for charting, graphing and reporting in speedometers, barometers, histograms, pie charts and radar diagrams, which raises the awareness of the SQL workload at the management level in a really good visual way.

 

I hope this brief introduction to the “new” SSC and the enhanced DSC has awakened your interest in the topic – I am really excited about all this data (but then again, I am a geek!)

As always, if you have any questions or comments, please email me.
TTFN
Roy Boxwell

2013-04: DSC Hit Ratio – Overrated metric???

 

Dynamic SQL Cache (DSC) settings

I have recently been to a bunch of shops where everyone was very happy with their Dynamic SQL Cache (DSC) settings. They all said “We have statements for up to four days in the cache and a 99.2% hit ratio so everything is green!”

They were so wrong….

The problem is that one statement in the DSC that is executed 16,000,000 times gives a HUGE false positive on the hit ratio, and *one* really old statement does not automatically imply that *all* statements are so long-lived in the DSC.

 

What I have found is that the DSC flush rate per hour is a much, much better metric for seeing whether your DSC is thrashing itself to death. If you have 10,000 statements or more being flushed out of the cache per hour, then your cache is not too healthy!

At the sites where I looked, the peak was 25,000 statements an hour being flushed! This is seriously bad when the DBAs were sure that everything “is green”…

 

So how do you work out your Flush/hour ???

Well, you could always buy and install SQL WorkloadExpert, which does the calculations for you, or you could run this set of SQL to get nearly the same style of data!

 

First,

do an EXPLAIN STMTCACHE ALL on one of the members of your data-sharing group, or in your non-data-sharing DB2 (remember that this command only extracts the local data!):

 EXPLAIN STMTCACHE ALL;
---------+---------+---------+---------+---------+---------
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0
---------+---------+---------+---------+---------+---------

 Then,

execute the following SQL:

SELECT COUNT(*)                                   AS NUM_STMTS        -- statements currently in the cache
     , INTEGER((MAX(STMT_ID) - MIN(STMT_ID) + 1 - COUNT(*))
               /
               (TIMESTAMPDIFF(2, CAST(MAX(CACHED_TS) - MIN(CACHED_TS)
                                      AS CHAR(22))) / 3.6E+03))
                                                  AS FLUSH_PER_HOUR   -- flushed statements per hour of cache age
     , MAX(STMT_ID) - MIN(STMT_ID) + 1 - COUNT(*) AS FLUSHED_STMTS    -- IDs assigned but no longer present
     , MIN(STMT_ID)                               AS MIN_STMT
     , MAX(STMT_ID)                               AS MAX_STMT
     , MIN(CACHED_TS)                             AS MIN_STMT_TS
     , MAX(CACHED_TS)                             AS MAX_STMT_TS
     , DECIMAL(TIMESTAMPDIFF(2, CAST(MAX(CACHED_TS) - MIN(CACHED_TS)
                                     AS CHAR(22))) / 3.6E+03
              , 12, 3)                            AS MAX_AGE_OF_CACHE -- age of the oldest entry in hours
FROM <userid>.DSN_STATEMENT_CACHE_TABLE
;

 

I use the TIMESTAMPDIFF built-in function (BIF) here with argument 2 to get the difference in seconds, and then divide by 3,600 to get back to hours.

The output will look like:

 ---------+---------+---------+---------+---------+---------+---------
NUM_STMTS   FLUSH_PER_HOUR   FLUSHED_STMTS    MIN_STMT     MAX_STMT
---------+---------+---------+---------+---------+---------+---------
     69                4               200        7750         8018

---------+---------+---------+---------+---------+---------+---------+-
MIN_STMT_TS                  MAX_STMT_TS               MAX_AGE_OF_CACHE
---------+---------+---------+---------+---------+---------+---------+-
2013-04-24-09.49.55.712231   2013-04-26-09.52.46.758709          48.047

 

Here, on one of our small test systems, you can see that we have a very low usage and flush rate; however, I am sure that you will all be surprised when you run this in your pre-production or production DB2s!

 

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Software Architect.

2013-05: Access Path Changes During a DB2 10 Skip-Level Migration

Lately I have been asked a lot about skip-level migration and so, even though I was surprised that people are *still* on DB2 V8 (even the nails have rusted on the perch…), I have dug out the “old” marketing whitepaper below, as I thought it could actually make a nice newsletter as a final, last-gasp piece of information for DB2 V8…

Obviously this is not written in the normal “Roy” style, as it is primarily a marketing paper, but there is still good data in it – and next month we will be back in the land of technical newsletters – I promise!

So here it is: DB2 10 Skip-Level Migration.

 

DB2 10 Skip Level Migration
A speedier way to DB2 10 

For the second time in the life cycle of DB2 z/OS, IBM supports a version jump for the migration – now from Version 8 to Version 10. It was also possible to skip a version ten years ago, namely with the jump from Version 5 to Version 7. The direct jump from DB2 Version 8 to DB2 Version 10 is referred to as a “Skip-Level Migration”.

During the DB2 V10 beta program, this function was available to selected customers; it has been intensively tested and is a great option for DB2 customers that are currently on DB2 Version 8 (I sincerely hope that is not many!). If a site plans the skip migration ahead and prepares it wisely, the migration from V8 to DB2 10 can be as smooth as any other migration scenario. IBM estimates that the migration project should take around one and a half times as long as a simple DB2 migration would. However, a skip-level migration carries a much greater risk, because software changes for two versions are implemented simultaneously.

 

General recommendations

This newsletter describes some special aspects to be considered when performing a skip-level migration and rounds out the information provided by IBM.

IBM’s best practices contain a customer recommendation for detailed migration planning as well as an extensive test strategy.

It is especially important to ascertain the current service levels of all applications when conducting this special type of migration. The measures described in this newsletter should support this.

 

 

Test & review RUNSTATS

Regarding applications, IBM’s best practices strongly recommend that customers test and review their RUNSTATS procedures in order to guarantee that the DB2 Optimizer can choose efficient access paths by using all of the required information in the DB2 catalog (including the new statistics introduced in DB2 9).
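As a hedged sketch only – the object names are placeholders and the exact keywords should be checked against the Utility Guide for your release – a reviewed RUNSTATS job could look something like this, including the multi-column distribution and histogram statistics that arrived with DB2 9:

RUNSTATS TABLESPACE MYDB.MYTS
  TABLE(MYSCHEMA.MYTAB)
    COLGROUP(STATUS, REGION) FREQVAL COUNT 10
  INDEX(MYSCHEMA.MYIX1
        HISTOGRAM NUMCOLS 2 NUMQUANTILES 10)
  SHRLEVEL CHANGE
  REPORT YES
  UPDATE ALL

The point of the review is not the exact keywords, but making sure the Optimizer actually gets the column group and skew information it needs for your critical queries.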

 

REBIND

Version 10 – as well as Version 9 – contains a number of Optimizer enhancements that can, and will, drastically improve performance. For static SQL this is only realized after the migration by REBINDing all packages; for dynamic SQL, the new Optimizer code is used instantly. For several reasons, customers should plan to REBIND all static application packages as part of the skip-level migration. The changes to the static access paths can be cross-checked using the EXPLAIN information in the PLAN_TABLEs, and dynamic SQL access paths can be compared using the dynamic statement cache (DSC) as a base for an EXPLAIN.
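One common trick for driving the global REBIND – shown here only as a hedged sketch, and the COLLID filter is purely illustrative – is to generate the REBIND commands straight out of the catalog:

-- Generate one REBIND command per package
SELECT DISTINCT 'REBIND PACKAGE(' CONCAT RTRIM(COLLID) CONCAT '.'
                CONCAT RTRIM(NAME) CONCAT ') EXPLAIN(YES)'
FROM   SYSIBM.SYSPACKAGE
WHERE  COLLID LIKE 'MYCOLL%'
;

The generated commands can then be run in a batch DSN (TSO) step, and the EXPLAIN(YES) output feeds the PLAN_TABLE comparison mentioned above.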

 

Access Path Changes Due to Optimizer Enhancements

Part of the DB2 migration is the very time-consuming process of performing the necessary checks and validations for all of the current access paths. With a skip-level migration, additional attention to the changed Optimizer behavior is strongly recommended.
Unfortunately, the access path changes caused by a migration are not predictable in advance. Or are they?

 

The Early Precheck

Each access path depends upon the rules and algorithms of the Optimizer and the DB2 version-dependent code.
By comparing all Version 8 access paths with the resulting access paths of Version 10, the tool Bind ImpactExpert (BIX) identifies in advance the changes that the new Optimizer will make.

DB2 applies several version-specific optimizations to the internal executable code of an access path. Thus, it is important to reactivate those performance options – fast column processing, a.k.a. SPROCS (selects) and UPROCS (updates), etc. – because these options get “lost” during a migration without REBINDs.
One important thing to mention here: DB2 automatically REBINDs all packages that were bound in Version 5 or earlier. To stay in control of the access paths during the AUTO-REBIND: BIX-IT!
Bind ImpactExpert (BIX) safely ensured performance stability during migrations from V7 to V8 and from V8 to V9. Depending on the individual service levels of the applications affected by the migration, the tool prechecks the whole environment or just a subset that requires special attention. At a number of customer installations, so-called Optimizer “patterns” were determined (patterns classify different types of access path changes).

Some examples:

– Index accesses change to table space scans for small tables
– Non-matching index scans change to table space scans for large indexes

The functional changes to the RUNSTATS utility in DB2 9 will also have an impact on access paths, since the catalog statistics will change after executing the new RUNSTATS.

Version 8 also extended the functionality of the RUNSTATS utility. Adapting your RUNSTATS strategies accordingly can correct access path degradations that are discovered.

 

Preparing Suitable Catalog Statistics Prior to Migration

When preparing for a migration, special attention should be paid to your RUNSTATS procedures.

On the one hand, the Optimizer requires detailed statistics in order to choose (more) efficient access paths. On the other hand, after the migration it is necessary to REBIND all packages in order to activate the new DB2 10 Optimizer code. Due to the amount of time required to execute RUNSTATS for all objects, it is not advisable to do this on the day of the migration, when entering DB2 conversion mode (CM). Also remember that a RUNSTATS instantly invalidates the access paths in the DSC for dynamic SQL.

 

What is the best way to prepare the migration with RUNSTATS without risking the DSC?

Bind ImpactExpert’s DSC Protection allows the execution of RUNSTATS for all objects under DB2 Version 8 without any risk, by preventing unwanted invalidation of the DSC. This means there will be no uncontrolled access path changes for dynamic SQL during the preparation phase. With static SQL, the Optimizer only determines (new) access paths when the REBIND is executed.

 

What is a practical method to compare all V8 and DB2 10 access paths?

Bind ImpactExpert categorizes each SQL statement into one of the following:

  •  Same access paths (or unimportant changes)
  •  Improved access paths
  •  Degraded access paths
  •  Changed access paths (not automatically rated)

 

Is your RUNSTATS strategy under version 8 prepared for the new challenges of the DB2 Optimizer under Version 10?

Comparing access paths with Bind ImpactExpert before the migration will help with the preparations.

 

How do Optimizer patterns specifically influence your applications in your DB2 system?

This too is analyzed and answered by Bind ImpactExpert.

Enhancements to Bind ImpactExpert support skip-level migration. It provides extended rules and standards as well as results categories.

Using Bind ImpactExpert, the first analyses of the access path changes between Versions 8 and 10 show results similar to those found in the past. However, the percentage of changes seen during a skip-level migration is much higher.

During a “regular” migration, only 5% to 30% of all statements (dynamic and static) have access path changes. This number can double during skip-level migration.

 

Using APCOMPARE/APREUSE in DB2 10

There are two new REBIND options in DB2 10 that allow you to compare and reuse access paths (APCOMPARE and APREUSE).

These options suppress a REBIND execution in the event of an access path change. However, unless the packages were bound or rebound under DB2 9 NF, these options do not support a skip-level migration. Using them later (for example in your daily maintenance procedures) is also highly controversial, because they only know that access paths have “changed” and thereby suppress REBINDs for access path improvements as well.
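For completeness, this is roughly how the new options look on a single package (collection and package names are placeholders): APREUSE(ERROR) tries to reuse the existing access paths and fails the REBIND if it cannot, while APCOMPARE(WARN) goes ahead but reports any access path changes.

REBIND PACKAGE(MYCOLL.MYPACK) APREUSE(ERROR) EXPLAIN(YES)
REBIND PACKAGE(MYCOLL.MYPACK) APCOMPARE(WARN) EXPLAIN(YES)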

 

Using Package and Access Path Stability after the Migration

Version 9 introduced Plan Management, which is also known as package or access path stability. It is available as a REBIND PACKAGE parameter and as a ZPARM. In DB2 10, an enhancement to this function provides you with the option of keeping the old Version 8 access paths when performing the REBIND. If required, an access path can then be rolled back to the old one. PLANMGMT(EXTENDED), for example, keeps three access path copies (CURRENT, PREVIOUS, and ORIGINAL).
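As a hedged sketch of how this looks at the REBIND level (package names are again placeholders): the first REBIND keeps CURRENT, PREVIOUS and ORIGINAL copies of the access paths, and the second one falls back to the previous copy if the new access paths turn out to be a disaster.

REBIND PACKAGE(MYCOLL.MYPACK) PLANMGMT(EXTENDED) EXPLAIN(YES)
REBIND PACKAGE(MYCOLL.MYPACK) SWITCH(PREVIOUS)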

Bind ImpactExpert conveniently includes Plan Management in the skip-level migration so a reversal of degraded access paths is possible at any time.

It reduces the overhead associated with this feature by up to 90% by limiting its use to those access paths that have degraded. For complete details about Plan Management, refer to the SOFTWARE ENGINEERING white paper “Package Stability – A great feature with some weak points”.

This Bind ImpactExpert function – which was so useful during migrations from V8 to DB2 9 – is also usable for skip-level migration.

 

Summary

DB2 10 offers significant performance improvements. To get the most out of DB2 10 improvements, global REBINDs are a must. With Bind ImpactExpert, you can predict access paths before you migrate and avoid surprises in a production DB2 10 environment.

As usual, any questions or comments are welcome.
TTFN,
Roy Boxwell
Senior Software Architect