2013-11: FREE DB2 HealthCheck

This month, as a thank you to everyone on the News from the Labs “nice” list, we have a seasonal freebie!

Our DB2 HealthCheck package pinpoints weaknesses and opportunities for improvements and optimization in DB2 z/OS systems.
Just click this link, fill in the form and select “Healthcheck Freeware” further down the page.
My support colleagues will ship our Licensed Freeware edition of our HealthCheck series for DB2 z/OS.

PerformanceHealthCheck for DB2 z/OS (PHC)

PerformanceHealthCheck for DB2 z/OS (PHC) checks your DB2 subsystem for a range of problems and lists out what it finds including the latest enhancement – the Coupling Facility checker.

I read on listserv about people with “Coupling Facilities under stress” and so I added some CF checks. PHC checks the six important values in your CF: the level of the microcode, the transfer time, the number of rejects, the false contention percentage, the subchannel busy percentage and, finally, the all paths busy count. From these KPIs you can see if your CF is “under stress” or not! Now, to get all this juicy data, the LOAD library *must* be APF authorized, of course!

Remember that the normal Performance HealthCheck still runs fine without being APF authorized; only the CF check needs it.

Statistics HealthCheck for DB2 z/OS (SHC)

Along with PHC comes Statistics HealthCheck for DB2 z/OS (SHC), which lots of you may already be familiar with. It allows you to analyze and assess anything from a complete DB2 subsystem down to a single database, and tells you what is “wrong” or inconsistent with your DB2 catalog statistics.

This enables you to catch problems before they get bad and to improve performance by giving the DB2 Optimizer better information on which to base its cost estimates. It fully supports, and is compliant with, DB2 10. It is a perennial favorite and you cannot run it enough. Especially when you’re going to migrate to a new DB2 version, this software is a must to protect yourself from strange optimizer behavior.

The binaries come with the product documentation, including a full feature overview that summarizes what our PHC can do for you!
As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

30/06/2014 | 22/06/2021

2013-09: Roy’s first DB2 11 review

Well now I can finally talk about DB2 11 for the first time … We have had it for over a year now and have been in the ESP from the beginning. In this newsletter I want to give you a “first look” at what I think is cool, great or odd!

Here’s my personal list for DB2 11, ranked from “Brilliant” down to “Why?”:

Brilliant
Great
Useful
Why?

Brilliant

REORG TP with NPSIs got a nice boost in performance and the new ability to do TP copies during a DSNUM ALL REORG is *very* good. REORG can now do much better and faster inline statistics including histograms. The switch phase got a massive boost as well. Mapping tables also became history in this release! You can finally drop ‘em all from your system. DB2 will create and use them on the fly. In NFM you *must* use the new format mapping table so you might as well let DB2 do the work for you!
RUNSTATS now works *without* a profile – Hooray! Terry Purcell can now sleep better at night, but the update of the RTS TOTALROWS and TOTALENTRIES is *still* only done if you run with SHRLEVEL REFERENCE – and how many of you do that?? A new option, RESET ACCESSPATH, finally gives you the chance to *remove* all access path data in the catalog for the given objects. This then enables a new RUNSTATS to collect only the information you actually need/want. zIIP offload has been greatly increased as well.
For the Optimizer there is Index Skipping to enable faster duplicate discarding, a new In Memory Data Cache (IMDC) to enable a better run time optimizer choice depending on the current amount of free memory (this memory cache value, MXDTCACH, defaults to 20MB and can be increased), and a bunch of new query rewrite techniques to help “automatic bad SQL generators” get a chance of index access. E.g. WHERE YEAR(date_column) = 2013 is rewritten to WHERE date_column BETWEEN '2013-01-01' AND '2013-12-31'. Finally, OR and IN lists with NULLs get a rewrite boost as well, so WHERE COLUMN IN ('A', 'Y') OR COLUMN IS NULL is simply rewritten to COLUMN IN ('A', 'Y', NULL). All these changes allow index usage where it was not possible before.
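
To see why the YEAR() rewrite matters, here is a minimal Python sketch of the idea. It is not how DB2 does it internally: the sorted list simply stands in for an index on the date column, and the function names are made up for illustration. A function wrapped around the column forces a look at every row, whereas the equivalent range can be located with two binary searches.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def year_to_range(year):
    """Rewrite YEAR(col) = year as an inclusive (low, high) date range."""
    return date(year, 1, 1), date(year, 12, 31)


def scan_with_function(dates, year):
    # Function on the column: every row has to be inspected.
    return [d for d in dates if d.year == year]


def probe_sorted_index(sorted_dates, year):
    # Range predicate: two binary searches on the sorted "index" find the slice.
    low, high = year_to_range(year)
    start = bisect_left(sorted_dates, low)
    end = bisect_right(sorted_dates, high)
    return sorted_dates[start:end]


# Hypothetical sample data, sorted like index entries.
dates = sorted([
    date(2012, 12, 31),
    date(2013, 1, 1),
    date(2013, 6, 15),
    date(2013, 12, 31),
    date(2014, 1, 1),
])

# Both forms return the same rows; only the access method differs.
assert scan_with_function(dates, 2013) == probe_sorted_index(dates, 2013)
print(probe_sorted_index(dates, 2013))
```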

The optimizer team also delivered a seriously good boost for embedded CASE structures that used to be anathema to DBAs:

SELECT COL2, COL3
FROM TAB1
WHERE COL1 = CASE (CAST(? AS INT))
             WHEN 1 THEN 'R'
             WHEN 2 THEN 'O'
             ELSE 'Y'
             END
;

SELECT TAB2.COL1, TAB1.COL2, TAB2.COL2
FROM TAB1, TAB2
WHERE TAB2.COL1 = CASE WHEN TAB1.COL1 = 'Y'
                  THEN TAB1.COL2
                  ELSE TAB1.COL3
                  END
;

Both of these used to be stage 2 but can now actually use indexes!

Great

Expanded RBA/LRSN from six to ten bytes. DB2’s six byte RBA meant it could address 2 to the power of 48 bytes of data on the Log, or 256TB. Back in the late 80’s this was thought to be “more than enough” … So now we have 2 to the power of 80, or 1YB. Yep, that’s a Yottabyte of Log! This is optional, but from NFM on DB2 will internally always store ten bytes – the externalized data is dependent on whether you do the migration or not. However, it is highly recommended, as otherwise you will get a small performance hit for converting down to six bytes. The other boost this gives is a reduction in spins waiting for a unique LRSN in data sharing which – with the faster and faster hardware – was becoming a bottleneck.
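
The arithmetic behind those numbers is quick to check. This little Python sketch just counts the byte addresses a six byte and a ten byte value can represent, using binary units (1TB = 2 to the power of 40 bytes, 1YB = 2 to the power of 80 bytes); the function name is only for illustration.

```python
def log_capacity_bytes(rba_length_in_bytes):
    """Number of distinct byte addresses an RBA of the given length can hold."""
    return 2 ** (8 * rba_length_in_bytes)


TB = 2 ** 40  # binary terabyte
YB = 2 ** 80  # binary yottabyte

six_byte = log_capacity_bytes(6)
ten_byte = log_capacity_bytes(10)

print(six_byte // TB, "TB")              # 256 TB
print(ten_byte // YB, "YB")              # 1 YB
print(ten_byte // six_byte, "x bigger")  # 4294967296 x bigger (2 to the power of 32)
```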

DSN1COPY – The first time a dataset is opened for normal business after a DSN1COPY, some basic “sanity” checks are run to try and flag “bad” DSN1COPY runs before too much work, hopefully any, has actually been done. This is very handy as the DSN1COPY is notoriously difficult with its parameters and input control cards.

AUTOSIZE on Bufferpools gets a wonderful fix to stop “runaway” autonomic mismanagement. In other words, you can set upper and lower bounds which will *not* be exceeded – very nice!

Pseudo-empty index page cleanup daemon – A very nice “set it and forget it” feature that actually removes completely empty pages from indexes, thus reducing the need for REORG INDEX and boosting performance.

Useful

Global Variables – Pretty handy feature in DB2 11 to easily pass data along a chain of programs. A bit like a SPA in good old IMS!

JVM only being started once and using 64 Bit storage is a major winner for most shops as nearly everyone uses JAVA these days and sometimes even on Big Iron.

Archive Tables – Similar to “time travel” tables from DB2 10 but looked upon as an “extension of the current table”. This is very handy indeed when all you want is a bucket to throw any DELETEs into. Plus you can use this bucket as input to be added to the current table or not – Very nice feature this!

DROP COLUMN – I will be using this, as until now I have *always* had to: create a copy table, cross load it, drop the original table, create the new table without the column, cross load back to it, recreate any dependent views and GRANT access back. Then REORG, RUNSTATS, REBIND – done, *phew*. Now you can simply drop the column, schedule a REORG, and you are done!

Autonomous procedures – This gives the chance to COMMIT any work done *just* in the stored procedure. The main task can still ROLLBACK but that will not affect the data COMMITted in the proc. This could be very handy or a complete disaster depending on your business logic!

Faster recovery of directory data, because the update RBA/LRSNs of certain directory tables/indexes are now stored in SYSLGRNX. This is a very good feature for recovery of catalog objects, of course!

Why?

GROUP BY multiple columns and aggregates has been added – now here is another new SQL feature, like OLAP queries, that nearly no-one will ever use!

Enforce NUMTCB=1 for stored procedures … this is an enhancement?!?!?

And last but not least

Finally my personal little favorite – LIKE_BLANK_INSIGNIFICANT is a new ZPARM – it makes the rules for LIKE of static and varchar get a little bit more aligned. This is, of course, a change in behavior and you must make sure that your programs react as you still think they should!

That’s all for now – next time I will dive into details on some of the above topics.
If anyone has any wishes please email me and I will see what I can do!

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

30/06/2014 | 22/06/2021

2013-08: Inline LOBs

In DB2 10 you can now define LOBs to also be “inline”.

This small feature is actually fantastic for performance and general use and I urge all DB2 users to evaluate using them!

What they enable is the use of just the “base” table space and not the aux (LOB) space at all! This is a very good thing as LOB access can(!) sometimes be painfully slow and cause bottlenecks in the processing.

In the DB2 10 Performance guide Chapter 4.3 there is a nice list of benefits:

Inline LOBs offer the following performance advantages over LOBs that are stored in auxiliary tables (sometimes called outline LOBs):

Disk space savings because two LOBs cannot share a page on a LOB table space
Disk space savings because the inline portion of a LOB can be compressed
Synchronous I/Os to the AUX index and LOB table space are avoided
CPU savings associated with accessing the AUX index and LOB table space
Sequential and dynamic prefetch I/O for LOBs
Improved effectiveness of FETCH CONTINUE when scanning rows
Index on expression can be enabled for LOB data

Note the LAST one which I think is actually one of the best reasons!!!

Obviously there are a few “considerations”:

Check how long your LOBs are now and try to size for the 80 / 20 rule
Remember that the Page size and the related BP size will need to be adjusted and tuned
If you rarely use the LOB column then don’t bother!
If going from “old” style space to an inline LOB with compression then you must do a sort of triple jump:

a. ALTER to get a UTS and also in RRF format – REORG to action
b. ALTER to get INLINE LOB usage – REORG to action
c. Now REORG it again to actually get COMPRESSION as that is not done by the earlier REORGs!

An example SQL from the DB2 Performance chapter 4.3

Here’s an example SQL that I took from the DB2 Performance chapter 4.3 and changed a little to report more info:

WITH LOB_DIST_TABLE (LOB_LENGTH                    
                    ,LOB_COUNT)                           
AS (                                               
    SELECT LOBCOL_LENGTH                                 
            ,COUNT(*)                                       
    FROM (                                           
    SELECT ((LENGTH(STATEMENT) / 1000) + 1) * 1000 
            AS LOBCOL_LENGTH                     
            FROM SYSIBM.SYSPACKSTMT                     
            ) LOB_COL_LENGTH_TABLE                      
    GROUP BY LOBCOL_LENGTH                               
   )                                                     
SELECT '01000' AS SIZE                                   
       ,COALESCE(SUM(LOB_COUNT) , 0) AS COUNT                 
       ,(COALESCE(SUM(LOB_COUNT) , 0) * 100.00) /               
        (SELECT SUM(LOB_COUNT) FROM LOB_DIST_TABLE) AS PERCENTAGE
FROM LOB_DIST_TABLE                                             
WHERE LOB_LENGTH <= 1000                                       
UNION ALL                                                      
SELECT '02000' AS SIZE
       ,COALESCE(SUM(LOB_COUNT) , 0) AS COUNT
       ,(COALESCE(SUM(LOB_COUNT) , 0) * 100.00) /
        (SELECT SUM(LOB_COUNT) FROM LOB_DIST_TABLE) AS PERCENTAGE
FROM LOB_DIST_TABLE
WHERE LOB_LENGTH <= 2000
UNION ALL                                                      
SELECT '04000' AS SIZE                                         
       ,COALESCE(SUM(LOB_COUNT) , 0) AS COUNT                   
       ,(COALESCE(SUM(LOB_COUNT) , 0) * 100.00) /               
        (SELECT SUM(LOB_COUNT) FROM LOB_DIST_TABLE) AS PERCENTAGE
FROM LOB_DIST_TABLE                                            
WHERE LOB_LENGTH <= 4000                                        
UNION ALL                                                      
SELECT '08000' AS SIZE                                         
,COALESCE(SUM(LOB_COUNT) , 0) AS COUNT                   
,(COALESCE(SUM(LOB_COUNT) , 0) * 100.00) /                
(SELECT SUM(LOB_COUNT) FROM LOB_DIST_TABLE) AS PERCENTAGE
FROM LOB_DIST_TABLE                                             

WHERE LOB_LENGTH <= 8000                                       
UNION ALL                                                        

SELECT '12000' AS SIZE                                         
,COALESCE(SUM(LOB_COUNT) , 0) AS COUNT                   
,(COALESCE(SUM(LOB_COUNT) , 0) * 100.00) /               
(SELECT SUM(LOB_COUNT) FROM LOB_DIST_TABLE) AS PERCENTAGE
FROM LOB_DIST_TABLE                                            
WHERE LOB_LENGTH <= 12000                                      
UNION ALL                                                       

SELECT '16000' AS SIZE                                          
,COALESCE(SUM(LOB_COUNT) , 0) AS COUNT                   
,(COALESCE(SUM(LOB_COUNT) , 0) * 100.00) /               
(SELECT SUM(LOB_COUNT) FROM LOB_DIST_TABLE) AS PERCENTAGE

FROM LOB_DIST_TABLE                                            
WHERE LOB_LENGTH <= 16000                                      
UNION ALL                                                       

SELECT '20000' AS SIZE                                          
,COALESCE(SUM(LOB_COUNT) , 0) AS COUNT                   
,(COALESCE(SUM(LOB_COUNT) , 0) * 100.00) /               
(SELECT SUM(LOB_COUNT) FROM LOB_DIST_TABLE) AS PERCENTAGE

FROM LOB_DIST_TABLE                                             
WHERE LOB_LENGTH <= 20000                                      
UNION ALL                                                      

SELECT '24000' AS SIZE                                        
,COALESCE(SUM(LOB_COUNT) , 0) AS COUNT                  
,(COALESCE(SUM(LOB_COUNT) , 0) * 100.00) /              
(SELECT SUM(LOB_COUNT) FROM LOB_DIST_TABLE) AS PERCENTAGE
FROM LOB_DIST_TABLE                                           
WHERE LOB_LENGTH <= 24000                                      
UNION ALL                                                     
SELECT '28000' AS SIZE                                        
      ,COALESCE(SUM(LOB_COUNT) , 0) AS COUNT                  
      ,(COALESCE(SUM(LOB_COUNT) , 0) * 100.00) /              
       (SELECT SUM(LOB_COUNT) FROM LOB_DIST_TABLE) AS PERCENTAGE
FROM LOB_DIST_TABLE                                           
WHERE LOB_LENGTH <= 28000                                     
UNION ALL                                                      
SELECT '32000' AS SIZE                                        
      ,COALESCE(SUM(LOB_COUNT) , 0) AS COUNT                  
      ,(COALESCE(SUM(LOB_COUNT) , 0) * 100.00) /              
       (SELECT SUM(LOB_COUNT) FROM LOB_DIST_TABLE) AS PERCENTAGE
FROM LOB_DIST_TABLE                                           
WHERE LOB_LENGTH <= 32000                                     
UNION ALL                                                     
SELECT '99999' AS SIZE                                        
      ,COALESCE(SUM(LOB_COUNT) , 0) AS COUNT                  
      ,(COALESCE(SUM(LOB_COUNT) , 0) * 100.00) /              
       (SELECT SUM(LOB_COUNT) FROM LOB_DIST_TABLE) AS PERCENTAGE
FROM LOB_DIST_TABLE                                           
WHERE LOB_LENGTH >  32000                                     
ORDER BY 1                                                    
;

Now it returns this data:

---------+---------+---------+-------
SIZE         COUNT           PERCENTAGE
---------+---------+---------+-------
01000       112102              95.66
02000       115957              98.95
04000       116771              99.65
08000       117044              99.88
12000       117110              99.94
16000       117140              99.96
20000       117173              99.99
24000       117174              99.99
28000       117175              99.99
32000       117176             100.00
99999            0                .00

DSNE610I NUMBER OF ROWS DISPLAYED IS 11
As can be seen just changing the inline LOB length to be 1000 bytes would “hit” over 95% of the rows!
This might even mean no change to page size and/or bufferpool would be needed!
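
If you want to turn that report into a sizing decision, the 80 / 20 rule boils down to “take the smallest bucket whose cumulative percentage reaches your target”. Here is a small Python sketch of that step, using the cumulative figures from the output above; the function name and the target values are just examples, not a recommendation.

```python
# (upper bound in bytes, cumulative percentage) copied from the report above
distribution = [
    (1000, 95.66),
    (2000, 98.95),
    (4000, 99.65),
    (8000, 99.88),
    (12000, 99.94),
    (16000, 99.96),
    (20000, 99.99),
    (24000, 99.99),
    (28000, 99.99),
    (32000, 100.00),
]


def smallest_inline_length(distribution, target_percent):
    """Return the smallest bucket that covers at least target_percent of rows."""
    for upper_bound, cumulative_percent in distribution:
        if cumulative_percent >= target_percent:
            return upper_bound
    return None  # target cannot be reached with the buckets given


print(smallest_inline_length(distribution, 80.0))  # 1000
print(smallest_inline_length(distribution, 99.0))  # 4000
```

The trade-off stays the same as described above: a longer inline length covers more rows but makes every row in the base table space longer, so the page size and bufferpool need to be checked again.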

Remember to change the driver CTE to be your candidate LOB column and table, as here I have used STATEMENT and SYSIBM.SYSPACKSTMT.

Return the LOB column names and tables needed

Here’s a little query, taken from the performance book and enhanced/corrected, to return the LOB column names and tables needed:

SELECT SUBSTR(CO.NAME , 1 , 30) AS COLUMN_NAME
      ,STRIP(CO.TBCREATOR) CONCAT '.' CONCAT STRIP(CO.TBNAME)
                                      AS TABLE_NAME
FROM SYSIBM.SYSCOLUMNS CO
WHERE CO.COLTYPE IN ('BLOB' , 'CLOB' , 'DBCLOB')            
  AND NOT EXISTS (SELECT TB.NAME                            
                  FROM SYSIBM.SYSTABLES TB                  
                  WHERE TB.NAME    = CO.TBNAME              
                    AND TB.CREATOR = CO.TBCREATOR           
                    AND TB.TYPE    = 'X')                   
ORDER BY 2 , 1
;

Create an index

One very nice feature is the ability to create an index on expression:

 CREATE INDEX IQATW005.LOB_IOE_IX      
  ON IQA0610.IQATW005                  
    (CHAR(SUBSTR(SQL_TEXT , 1 , 254)) )
     USING STOGROUP SYSDEFLT            
          PRIQTY    14400              
          SECQTY    14400              
     ERASE NO                           
     FREEPAGE    5                      
     PCTFREE     5                      
     BUFFERPOOL BP0                     
     CLOSE YES                          
;

This is only available for SUBSTR, and only with a fixed start and end, but it is still very useful as you can now “scan” the text in your WHERE clauses!

The performance benefits can be immense for inline LOBs and so I recommend a quick test and then roll-out to production!
As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

23/06/2014 | 22/06/2021

2013-10: DB2 11 Expanded RBA/LRSN

Now if you were a good attendee at the IOD (and paid attention to the freebies!), you probably got a free copy of the great little book “DB2 11: The Database for Big Data & Analytics”.

If you did not get a copy at the IOD you can download a free electronic copy here.

On page 30 there is the following text:

“Implementation of the expanded RBA/LRSN formats is optional but highly recommended as DB2 11 will use 10-byte values internally when in NFM, and there will be a small performance penalty for converting these to old Format”

Italics added by me of course!

Now this caused a couple of questions at the IOD about how to manage a REORG of the whole world. So my newsletter this month suggests a way to do this without just REORGing everything from DB0 to DBZ!

How to manage a REORG without just REORGing everything from DB0 to DBZ?

1) Base it on a *usage* metric
Start with your “known suspects” list – Every shop has a top ten list of SQLs normally based on CPU. What you must do is change this list to not be on CPU but on *usage*. The trick is how to actually do this! Well, I would start with the average executes per hour for all objects (To get this metric you must EXPLAIN or have the EXPLAIN data). Obviously the objects at the top of this list should be REORGed first!

2) 50 Reorgs a night
Get your system automation or DB2 Database Maintenance system to use the above data to drive, say, 50 Reorgs a night *purely* based on usage metrics.

3) Wait

4) You’re done when no more REORGs are triggered for this metric
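
The four steps above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not part of any product: the object names, the executes-per-hour figures and the function name are all made up, and the batch size is shrunk to 2 so the example stays readable.

```python
def next_reorg_batch(executes_per_hour, already_converted, batch_size=50):
    """Pick the most heavily used objects that have not been REORGed yet."""
    candidates = [
        (name, rate)
        for name, rate in executes_per_hour.items()
        if name not in already_converted
    ]
    # Highest usage first: these objects benefit most from the new format.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in candidates[:batch_size]]


# Hypothetical usage metrics: average executes per hour per table space.
usage = {
    "DB1.TS_ORDERS": 12000.0,
    "DB1.TS_ITEMS": 800.0,
    "DB2.TS_AUDIT": 15.0,
}

converted = set()
night = 0
while True:
    batch = next_reorg_batch(usage, converted, batch_size=2)
    if not batch:
        break  # step 4: nothing left to trigger, so you are done
    night += 1
    print("Night", night, "REORG", batch)
    converted.update(batch)  # step 3: wait for the next night
```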

Of course it would be easier if all the data was there in front of you but then you would need our SQL WorkLoadExpert (WLX) to get the usage metrics which is, of course, linked to our RealTime DBAExpert (RTDX) which takes care of the REORGs but I digress…

SYSCOPY’s handling

What is also “interesting” is the way SYSCOPY handles all of this… If you select the LENGTH of the START_RBA before the CATENFM job has REORGed SYSCOPY you will get 6 back. After the CATENFM has done SYSCOPY you then get 10 back. This way your “home grown” systems can “know” if the RBA is 6 or 10 Bytes…

SELECT LENGTH(START_RBA)
FROM SYSIBM.SYSCOPY 
FETCH FIRST 1 ROW ONLY 
;

RBA & LRSN extensions

The next interesting point is that an RBA has been extended on the left and an LRSN has been extended on the left *and* the right. So you need to be aware of whether or not the row was written in a data-sharing environment. Probably not that interesting to most people as their shops are one or the other and not normally a mix, but for our software we must know what is in the START_RBA so we use a little CASE to do the work for us if running in DB2 V8, 9 or 10:

CASE WHEN GROUP_MEMBER > ' ' 
     THEN 
          SUBSTR(X'00' CONCAT START_RBA CONCAT X'000000' , 1 , 10)
     ELSE 
          SUBSTR(X'00000000' CONCAT START_RBA , 1 , 10)
END
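
For a “home grown” program that reads the values outside of SQL, the same padding can be mirrored in a few lines. This Python sketch follows the CASE expression above byte for byte; the function name is hypothetical, and passing ten byte values through unchanged is my own assumption for rows that are already in the extended format.

```python
def to_ten_bytes(start_rba, group_member):
    """Pad a six byte RBA or LRSN to the ten byte format."""
    if len(start_rba) == 10:
        return start_rba  # assumption: already extended, leave as is
    if len(start_rba) != 6:
        raise ValueError("expected a 6 or 10 byte value")
    if group_member.strip():
        # Data sharing (LRSN): one byte on the left, three on the right.
        return b"\x00" + start_rba + b"\x00\x00\x00"
    # Non data sharing (RBA): four bytes on the left.
    return b"\x00\x00\x00\x00" + start_rba


value = bytes.fromhex("C1D2E3F4A5B6")
print(to_ten_bytes(value, "DSN1").hex())  # 00c1d2e3f4a5b6000000
print(to_ten_bytes(value, "    ").hex())  # 00000000c1d2e3f4a5b6
```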

The last thing I will mention this month is that the SYSLGRNX format has changed, which might catch a few people out if you are using CI style programs to read it! The DB2 11 NFM version is now in a segmented space and has the extended trailer so beware!

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

23/06/2014 | 22/06/2021

2014-01: Complementing IBM’s ACCESS PLAN Stability

Part 1: Dynamic SQL

with SEG’s complete RUNSTATS Rescue package:
Why it is a good idea but why it fails in certain cases?

The story

This month I want to tell you a story. This story is true, but the names have been changed to protect the innocent!

A big company regularly scheduled production staging on Thursday nights. One night last year everything went as normal – until the Friday morning…

… then the telephones started ringing, and people were complaining about slow or non-existent response times for a rather critical business application.

The problem quickly escalated from the standard “Help desk”-level answers of “Switch it off and on, reboot”, and “Have you changed anything?” to senior managers demanding to know why things were not working anymore.

At this point, the DBA group was not actually involved, as it was first thought that a “bad” package had been promoted “by accident” on the night before.

The production group backed out the staged packages, but it didn’t help…
The delays got worse. They then stopped nearly all of the WebSphere Servers to at least allow *some* work through the overloaded system. Now in full panic mode, the DBA and the Java teams got involved. They both quickly found the “culprit”, which was an extremely large and complex dynamic SQL statement that had worked fine until some time Thursday night, and was now behaving *very* badly indeed.
The DBA team REORGed the “big” tables involved, in the hope that it would then all get better…
It didn’t.
Finally the DBA team proposed creating a new index which was quickly done and RUNSTATed in production.

The SQL then switched the access path back to a good one.
The WebSphere servers were all re-started, and gradually everything returned to normal.
This whole process actually took two days!
The company involved relies upon its logistics chains and Just-In-Time delivery, so this outage had some serious repercussions, of course…

What really happened?

The DBA team then investigated further and found out that what had really happened was that a RUNSTATS, on just one small table, had been run on the Thursday night at “an inappropriate time”, thus causing the statement’s access path to go “pear-shaped” (aka Belly Up) all day Friday and half of Saturday…

How to fix the problem quickly and easily?

The DBA team then thought about ways that such a problem – should it reoccur in the future – could be quickly and easily fixed. Now my part of the story begins…

This company uses our software and had a license to run the Enterprise Statistics Distribution (ESD) component of Bind ImpactExpert, which extracts, and optionally converts, all of the DB2 Catalog data that the DB2 Optimizer needs to do its job. Normally, customers use this to copy all the production statistics over to a sand box style system, to see if a DB2 APAR or DB2 Migration will cause unforeseen problems. For this, they use the Early-PreCheck component of our tool Bind ImpactExpert for Dynamic or Static SQL. Now we do have another scenario, called DSC (Dynamic Statement Cache) Protection, that would nearly do what they wanted, but also does a lot more and, of course, costs more to use!
And so arose the idea for our new PocketTool called RUNSTATS Rescue. “Why is it a PocketTool?” I hear you ask. “Because it only costs pocket money!” (aka Pin Money in the USA, or an “allowance” if you prefer). These tools are inexpensive to use – really!

Now, before you stop reading at this point and start complaining about the fact this Newsletter is just 100% Marketing, please bear in mind that what I describe here could also be written by you – then you just need to give me credit for sharing the idea…

RUNSTATS Rescue

The idea is to use EXPLAIN in any way, shape or form, either in SPUFI, or directly in any monitor, to simply EXPLAIN the “culprit” SQL, and to remember the PLAN_TABLE owner you are using, as well as the QUERYNO you just used.

Using these two inputs, RUNSTATS Rescue analyses the EXPLAIN output to build a list of extract and update control cards for our ESD, for all of the tables used and *all* of the indexes – even those *not* used, of course! Finally a DSC flush RUNSTATS is also generated for all Tablespaces involved in the query to make sure that the next time this “culprit” SQL comes into the system, it will then use the correct statistics.

Now, of course, the question is: “How do I know which statistics to use as the Rescue statistics?”

The answer: “The ones that were there before you did a REORG with inline RUNSTATS or a stand-alone RUNSTATS”. This is the key point to bear in mind: You must simply run the ESD extract before any normal DB2 Database maintenance jobs run.

Most shops have days, or weekends, when they run these, and it is not a problem to extract the data and then copy it, for example, to a GENGROUP, to enable easily finding the date and time of the last extract when the statistics were “good”, thus enabling the RUNSTATS Rescue job to revert the required statistics very quickly. This gives the DBA group much-needed time to find out what really happened and take any appropriate action – almost – at their leisure.
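
Finding “the last extract when the statistics were good” is then just a matter of picking the newest generation taken before the suspect RUNSTATS ran. A minimal Python sketch of that lookup follows; the timestamps and the function name are invented for the example and have nothing to do with how the GENGROUP itself is organized.

```python
from datetime import datetime


def last_good_extract(extract_times, bad_runstats_time):
    """Return the newest extract taken strictly before the suspect RUNSTATS."""
    earlier = [t for t in extract_times if t < bad_runstats_time]
    return max(earlier) if earlier else None


# Hypothetical extract generations, one before each maintenance window.
extracts = [
    datetime(2014, 1, 4, 20, 0),
    datetime(2014, 1, 11, 20, 0),
    datetime(2014, 1, 18, 20, 0),
]
suspect_runstats = datetime(2014, 1, 16, 23, 30)

print(last_good_extract(extracts, suspect_runstats))  # 2014-01-11 20:00:00
```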

10 Rescue Steps

Select the new scenario RUNSTATS Rescue
Generate some JCL
Optionally copy to a GENGROUP dataset
Insert the EXPLAIN TABLE-CREATOR and QUERYNO
Automatic launch of our catalog browser
Drill down to the Index level
Ask for a “new” file name for the extracted “rescue” statistics
Perform the RUNSTATS Rescue extraction
Reset the statistics and execute the RUNSTATS
The “Rescued” Statistics


Here’s a walkthrough example of how it looks in real life:

Near the bottom, you can see the new scenario RUNSTATS Rescue – select it to get a little pop-up window with the three steps. The first step must only be done once and then simply be plugged into an existing production job. I would recommend the first job of the normal DB2 Database Maintenance job stream.

1 – Select the new scenario RUNSTATS Rescue in the little pop-up window with the three steps

2 – Generate some JCL

The first option just generates some JCL looking like:

[Screenshot: News 2014-01, image 3]

3 – Optionally copy to a GENGROUP dataset

At the end comes the optional step to copy to a GENGROUP dataset:

[Screenshot: News 2014-01, image 4]

4 – Insert the EXPLAIN TABLE-CREATOR and QUERYNO

Selecting the second option then requests the required input data as discussed earlier:

[Screenshot: News 2014-01, image 5]

5 – Automatic launch of our catalog browser

Hitting “enter” then launches our catalog browser to enable you to see which objects were being used by that SQL…

[Screenshot: News 2014-01, image 6]

6 – Drill down to the Index Level

[Screenshot: News 2014-01, image 7]

7 – Ask for a “new” file name for the extracted “rescue” statistics

PF3-ing out of the tool then asks for your original Production Statistics datasets, as extracted by the job in the first step, and a “new” file name for the extracted “rescue” statistics:

[Screenshot: News 2014-01, image 8]

8 – Perform the RUNSTATS Rescue extraction

The next JCL appears that performs the RUNSTATS Rescue extraction, including the optional steps for GENGROUP support, as seen here:

[Screenshot: News 2014-01, image 9]

9 – Reset the statistics and execute the RUNSTATS

Finally, the third option is selected, which actually resets the statistics and executes the RUNSTATS to flush the DSC:

[Screenshot: News 2014-01, image 10]

10 – The “Rescued” Statistics

Now the next time that statement appears, it will use the “rescued” statistics and get back its old Access Path.


Next Month

Next month I wish to expand upon this topic with the capability of doing the same for Static SQL.

The month after that, I will go into detail about the DSC Protection scenario I mentioned earlier. That is not a pocket tool, of course, but it *is* very interesting!

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

23/06/2014 | 22/06/2021

2014-09: Bad Data Day


– Duplicates found in a migration Precheck
– The “bug”
– FIX the “bug”

One of my favorite topics is STATISTICS and RUNSTATS.
This month I have a short newsletter involving both of them!

Some time ago we were helping one of our customers to perform an Early Precheck (Going from DB2 9 to 10). To do so, we requested a copy of their entire DB2 production statistics so the optimizer could work here at our labs in Dusseldorf – just like at the customer site.

Duplicates found in a migration Precheck

We loaded up the data and I noticed something that just jumped right out…

SELECT TYPE 
      ,FREQUENCYF 
      ,NUMCOLUMNS 
      ,SUBSTR(COLVALUE , 1 , 11) AS COLVALUE 
      ,COLGROUPCOLNO 
       FROM SYSIBM.SYSCOLDIST 
       WHERE TBOWNER = 'aaaaaaa' 
         AND TBNAME  = 'bbbbbb' 
         AND NAME    = 'cccccc' 
ORDER BY 1 , 3 , 2 
; 
---------+---------+---------+---------+---------+---------+---------
TYPE             FREQUENCYF   NUMCOLUMNS   COLVALUE      COLGROUPCOLNO
---------+---------+---------+---------+---------+---------+---------
F    +0.6066539624818615E-02            1 . xxxxxxxxxx .. 
F    +0.6066539624818615E-02            1 . xxxxxxxxxx 
F    +0.6287988717751475E-02            1 . xxxxxxxxxx .. 
F    +0.6287988717751475E-02            1 . xxxxxxxxxx 
F    +0.8554928116458912E-02            1 . xxxxxxxxxx .. 
F    +0.8554928116458912E-02            1 . xxxxxxxxxx 
F    +0.8578238547293950E-02            1 . xxxxxxxxxx .. 
F    +0.8578238547293950E-02            1 . xxxxxxxxxx 
F    +0.8852136109605646E-02            1 . xxxxxxxxxx 
F    +0.8852136109605646E-02            1 . xxxxxxxxxx .. 
F    +0.1229042465777374E-01            1 . xxxxxxxxxx 
F    +0.1229042465777374E-01            1 . xxxxxxxxxx .. 
F    +0.1331608361451540E-01            1 . xxxxxxxxxx .. 
F    +0.1331608361451540E-01            1 . xxxxxxxxxx 
F    +0.1342098055327308E-01            1 . xxxxxxxxxx .. 
F    +0.1342098055327308E-01            1 . xxxxxxxxxx 
F    +0.1633478440765281E-01            1 . xxxxxxxxxx 
F    +0.1633478440765281E-01            1 . xxxxxxxxxx .. 
F    +0.8439891140288000E+00            1 . xxxxxxxxxx .. 
F    +0.8439891140288000E+00            1 . xxxxxxxxxx 
DSNE610I NUMBER OF ROWS DISPLAYED IS 20 
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

There were “duplicates”!
This is just one of many examples. Looking at the output above, you can see the problem: notice the data in COLGROUPCOLNO? Sometimes there is a hex value and sometimes not!

The “bug”

Now, believe this or not, all of this goes back to a bug in RUNSTATS in DB2 V8 which got this APAR:

PK33517: COLGROUPCOLNO ASSOCIATED WITH SINGLE COLUMN CONTAINS A NUMERIC VALUE INSTEAD OF ZERO LENGTH FIELD ACCORDING TO SQL REF

What’s interesting here is that this APAR is marked as FIN, so the “bug” disappeared in DB2 9!

As you can see, the bug caused an erroneous value in the COLGROUPCOLNO to be set for single column frequency rows. The bug was then fixed in DB2 9, but the “bad” data that had been inserted was not automatically cleaned up and, as I hope you all know, the SYSCOLDIST data is never automatically deleted – it is only ever updated.

So when the bug was fixed, the low-value or hexadecimal column number in that field was no longer EQUAL to a zero length field, and so an insert was done.
Since then these rows have just stayed there…
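To picture how the duplicates came about, here is a minimal toy model in Python. It is only a sketch: the function name, the match key (name, value, COLGROUPCOLNO) and the sample hex value are assumptions made for illustration, not the real RUNSTATS internals.

```python
# Toy model only: the match key and the sample values are illustrative
# assumptions, not documented RUNSTATS behaviour.

def runstats_upsert(catalog, name, colvalue, frequencyf, colgroupcolno):
    """Update the row whose key matches exactly, otherwise insert a new row."""
    for row in catalog:
        key = (row["name"], row["colvalue"], row["colgroupcolno"])
        if key == (name, colvalue, colgroupcolno):
            row["frequencyf"] = frequencyf
            return "updated"
    catalog.append({
        "name": name,
        "colvalue": colvalue,
        "frequencyf": frequencyf,
        "colgroupcolno": colgroupcolno,
    })
    return "inserted"


catalog = []

# With the bug: the single-column row carries a column number (hypothetical value).
first = runstats_upsert(catalog, "CCCCCC", "xxxxxxxxxx", 0.8439, b"\x00\x01")

# After the fix: a zero-length COLGROUPCOLNO no longer matches the old row.
second = runstats_upsert(catalog, "CCCCCC", "xxxxxxxxxx", 0.8439, b"")

print(first, second, len(catalog))
```

Both calls end in an insert, so the toy catalog holds two rows for the same column value, which is the pattern visible in the SPUFI output above.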

FIX the “bug”

I wrote a little query just to show the count of how many of these bad rows existed in the SYSCOLDIST:

SELECT COUNT(*) AS BAD_GUYS 
FROM SYSIBM.SYSCOLDIST 
WHERE NUMCOLUMNS                = 1 
  AND TYPE                      = 'F' 
  AND NOT LENGTH(COLGROUPCOLNO) = 0 
; 
---------+---------+---------+---------+------
    BAD_GUYS 
---------+---------+---------+---------+------
       8680 
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

All of these entries should be deleted from SYSCOLDIST to help the optimizer pick the right access paths!

I first noticed this problem during a test of our Statistics HealthCheck product, which flagged over 1,200 critical problems in the DB2 catalog. Thinking I had broken something, I went through all of the checks and found the above bad data.
Now is as good a time as any to check your stats *and* download our Statistics Healthcheck Freeware!!

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

18/06/2014, 22/06/2021

2014-02: Complementing IBM’s ACCESS PLAN Stability

Part 2: the Story continues for Static SQL…

The Story continues
Rescue Dynamic SQL & Static SQL
Rescue “pure” Static SQL
An errant Access path onto the good path
Next month

Welcome back to the second part of this newsletter.
Remember that last month I talked about a nasty real-world event that caused pain and grief to a large number of fellow DBAs?

Short reminder of part 1: rescue Dynamic SQL

A big company regularly scheduled production staging on Thursday nights. One night last year everything went as normal, until the Friday morning… Then the telephones started ringing, and people were complaining about slow or non-existent response times for a rather critical business application…

See the previous newsletter here and the “walk through” example for Dynamic SQL rescue.

The Story continues for Static SQL

Well that was “only” Dynamic SQL that was killing them.

SOFTWARE ENGINEERING GmbH created the new Pocket Tool: RUNSTATS Rescue to gallop over the hill and rescue them from bad statistics timing problems.

…Everyone then lamented “But my Static SQL *also* goes horribly wrong if the statistics are badly timed. What can SEG do there?”

Rescue Static & Dynamic SQL

Naturally we have the ability to interface with Static SQL as well as Dynamic. Now, as all roads lead to Rome, or as I like to say “There is more than one way to skin a cat” (my co-workers don’t like that saying much for some reason… anyway, I digress), the interface to the dynamic RUNSTATS Rescue is the PLAN_TABLE owner and the QUERYNO used for the EXPLAIN. RUNSTATS Rescue does the rest.

So for static, there is a “simple” entry point: just cut-and-paste the static SQL statement that is “causing grief” into SPUFI and EXPLAIN it, just as if it were dynamic SQL. The problem here, however, is that you must then manually generate all required REBINDs.

Rescue “pure” Static SQL

What we decided to do was to create a new Pocket Tool extension for “pure” Static SQL. If you only have Dynamic SQL or if you only have Static SQL, then you will only need a single licence! The data that the new extension requires is all currently available in your SQL monitor, Trace etc., which tells you who the bad guy is, namely: Collection ID, Package name, Version and Statement Id. If you have this data *and* you have done your last BIND or REBIND with EXPLAIN(YES) (and I hope that’s true for 100% of production packages!), you are ready to rock’n’roll!

Two prepare options

So this is how it will look. First we have a new option in the Menu selection list:

[Image: News 2014-02 Bild1]

Here you can see that we now have two Prepare options: one is for Dynamic, and is the same as in last month’s newsletter, and the other is a new Static one which then leads you to the next pop–up:

A REBIND step instead of a RUNSTATS step

[Image: News 2014-02 Bild2]

Here you simply enter the required data and then, after it has been verified, you generate and run the RUNSTATS Rescue job, just like last time.

However, now you do not get a RUNSTATS step at the end of the job. Instead, you’ll get a REBIND step for either the single PACKAGE or, if REBIND ALL is Y, for all affected PACKAGES in any COLLECTION. Obviously this option must be used “with care”; however, if the statistics are bad for one package, why shouldn’t they be bad for all packages?

An errant Access path onto the good path

That’s it!

Now you have three ways to get an errant Access Path back onto the good path. (Remember that you can *always* cut–and–paste Static SQL into an EXPLAIN SPUFI – which means there are actually two ways to do Static).

Next month

Next month I will go into detail about the DSC Protection scenario from our Bind ImpactExpert tool that is *really* cool and helps deal with the same type of problems, but at a much higher level and in a fully integrated manner. It also completely handles the tricky question of:

“OK, so this query gets better when I reset my statistics but *this* query gets worse!”

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

18/06/2014, 22/06/2021

2014-03: Complementing IBM’s ACCESS PLAN Stability

Part 3: DSC Protection

In the last Newsletter of this series about protecting your assets and/or access paths, I want to tell you how SOFTWARE ENGINEERING GmbH’s Bind Impact Expert DSC Protection works.

In the previous newsletters of this series (2014-01 Dynamic SQL, 2014-02 Static SQL), we saw how RUNSTATS Rescue lets you quickly and cheaply get back your old DB2 Catalog Access Path Statistics.

This is all well and good but then come the questions:

How will a RUNSTATS affect my Dynamic SQL?
What happens if I reset the statistics for this statement? What are the side effects?

DSC Protection was created in order to handle this. In a nutshell, it will:

Extract all needed DDL from production (Optional if it is already on test of course!)
Extract all needed catalog statistics from production
Run a “special” RUNSTATS (More on this baby later)
Extract all needed catalog statistics from the _HIST tables in production
Capture and Explain the DSC from production
Export DSC from production
Transfer all data to test
Now logoff from production and logon to test
Create the DDL on test (Optional)
Import the dynamic SQL
Update all statistics in test from the extracted _HIST data
Explain all SQL from production DSC
Compare the “new” Access Paths
If for a given DB.TS the access is only “improved”, create a DB.TS pair ready to send back to production for actual RUNSTATS
If for a given access path the result is “worsened” or “changed” then reset the statistics back to the actual production statistics and re-analyse
Keep going until no more access paths are worsened or changed
Transfer the RUNSTATS control cards to production
Logoff from test and logon to production
Run a “normal” RUNSTATS using the transferred control cards
You are done! Simple, huh?
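The compare / reset / re-analyse loop in the middle of that list can be sketched in a few lines of Python. This is a toy model under loud assumptions: the statements, the DB.TS names, the `explain` stand-in and the rule "reset every object a bad statement touches" are all hypothetical and only illustrate the shape of the iteration, not how the product actually works.

```python
# Hypothetical toy model of the "keep going until nothing is worsened" loop.

def explain(statement, new_stats):
    """Toy EXPLAIN: worst effect among objects that currently carry new statistics."""
    effects = [e for obj, e in statement["effects"].items() if obj in new_stats]
    for verdict in ("worsened", "changed", "improved"):
        if verdict in effects:
            return verdict
    return "unchanged"


def runstats_candidates(statements, objects):
    """Return the DB.TS pairs whose new statistics only ever improve access paths."""
    new_stats = set(objects)  # start with the "future" statistics everywhere
    while True:
        bad = [s for s in statements
               if explain(s, new_stats) in ("worsened", "changed")]
        if not bad:
            break
        for s in bad:
            # Reset these DB.TS back to the actual production statistics.
            new_stats -= set(s["effects"])
    return sorted(obj for obj in new_stats
                  if any(s["effects"].get(obj) == "improved" for s in statements))


statements = [
    {"name": "Q1", "effects": {"DB1.TS1": "improved"}},
    {"name": "Q2", "effects": {"DB1.TS2": "worsened", "DB1.TS3": "improved"}},
    {"name": "Q3", "effects": {"DB1.TS3": "improved", "DB1.TS4": "unchanged"}},
]
print(runstats_candidates(statements, ["DB1.TS1", "DB1.TS2", "DB1.TS3", "DB1.TS4"]))
```

In this made-up example only DB1.TS1 survives: Q2 gets worse, so everything it touches is reset, and DB1.TS4 changes nothing, so there is no reason to send it back to production.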

We start at the Main Menu:

Select the DSC Protection scenario:

[Image: News 2014-03 DSC bild2]

To begin use the primary command X to start the eXport chain of jobs:

[Image: News 2014-03 DSCbild3]

Once you get to the blue line, you must then run the “special” RUNSTATS.

What this does is run the RUNSTATS without invalidating the DSC. We do not want to kill the good access paths; we want to see if a RUNSTATS will help us first, *before* we really run the RUNSTATS! That is why the scenario is called DSC Protection, after all! It is there to protect your DSC from a mistimed RUNSTATS.

The HISTORY option must be ALL or ACCESSPATH

What is very important is that the HISTORY option *must* be ALL or ACCESSPATH. This then fills the _HIST tables with data that we need later in the analysis. There is another “little” problem here: there is no SYSTABLESPACE_HIST table! IBM “forgot it” years ago and it has never been created… This is why there is a “prepare extract” step: we must create a VIEW which gives us a “fake” SYSTABLESPACE_HIST table, as NACTIVEF is very good for the DB2 Optimizer if you ask me!

Once all of these steps have been done and you have file transferred all the needed files across to test, you can logoff from production and log onto the test machine.

Here you select the DSC Protection scenario again:

[Image: News 2014-03 DSCbild4]

Now you use I to Import the data:

Again you just run through the jobs a step at a time; DDL create is, of course, optional. Note that here we just apply the _HIST data – not the “current” statistics but the “future” statistics. When all of these steps are done you are then ready to use primary command N for a “new” run. Submit the job and wait for it to complete.

Once it has finished use Refresh to update the display:

[Image: News 2014-03 DSCbild6]

and then you can use line command S to view the statements:

[Image: News 2014-03 DSCbild7]

Here you see the overview and you can drill down to the statement level:

[Image: News 2014-03 DSCbild8]

Note that on both these panels there is the R command for Reset statistics that simply adds the affected objects to an internal RUNSTATS reset table which at the entry panel can also be selected with an R:

[Image: News 2014-03 DSCbild9]

It then displays another pop-up with a short list of options:

[Image: News 2014-03 DSCbild10]

Using the first option just shows you a list of the internal RUNSTATS table where you can see the Production and Test names of the objects (of course you can rename all the extracted data), and you may choose whether or not specific DB.TS pairs should be added to or removed from the table:

[Image: News 2014-03 DSCbild11]

Once you have finished selecting your tables the next option should then look a little familiar to you all by now!

It uses the extracted statistics to build an update of the production data:

When it has finished and reset various statistics, you naturally want to re-run the EXPLAINs to see if the bad paths are now all gone, or perhaps even new worse ones have appeared! In my experience this never actually happens, but just to make absolutely sure, use the next option to re-generate the JCL and re-do the analysis. Now you can start all over again or you can see that all is ok. At this point you will now have a small list of DB.TS in the internal table that would actually give you guaranteed performance improvements if run on production.

Then you select the last option to see this:

[Image: News 2014-03 DSCbild13]

And these RUNSTATS should simply be executed on Production using the normal RUNSTATS utility JCL and with UPDATE ALL or ACCESSPATH. History no longer matters of course!

Now you are done! Easy peasy lemon squeezy! (Or Easy as pie if you prefer!) That, my dear readers, is why this is *not* a pocket tool and does *not* cost pocket money (Pin money).

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

17/06/2014, 22/06/2021

2014-04: Are you going to PIECEs?

This month I have a really nice – and rather long – technical newsletter to sink your teeth into!

The problem: Too low internal PIECE limit in DB2

The solution: Set of SQL queries for my newsletter readers to really find the bad guys

Create DBD diagnose cards
Cut and paste the control cards into some DB2 Utility JCL
Do “F OBDINNUM” in the REPAIR job output
Automatically parse the millions of lines of REPAIR output and output *only* the OBIDs

Whole SQL.txt JCL.txt REXX.txt

Reach your real maximum number of PIECEs!

The Problem: Too low internal PIECE limit in DB2

We start with a little APAR that IBM did way back in 2013:

2013 IBM APAR

PM75150: ALLOW MAXIMUM NUMBER OF PIECES FOR AN NPI OF LARGE TABLE SPACE, INDEPENDENT OF TABLE SPACE MAXIMUM NUMBER OF DATASETS

	Reported release	A10
	Status	CLOSED UR1
	PE	NoPE
	HIPER	NoHIPER
	Special Attention	NoSpecatt
	Submitted date	2012-10-16
	Closed date	2013-02-25
	Last modified date	2013-04-02

So far, so good. Looks like any run of the mill standard DB2 10 fix of “no real importance”… until you read the “small print” that is!

******************************************************************************************************************
USERS AFFECTED: All DB2 users of Non-Partitioned Index (NPI) on a large table space.
******************************************************************************************************************

PROBLEM DESCRIPTION:
For a large table space, the maximum number of pieces for an NPI is incorrectly inherited from the
maximum number of partitions of the table space. When inserting data to a table or creating the NPI,
the customer may receive the following message,

DSNT501I  DMBC DSNKINSL RESOURCE UNAVAILABLE  CORRELATION-ID=QFSBTC036
CONNECTION-ID=SERVER  LUW-ID=  GAF263F9.C741.120818101839=561408
REASON 00C9009D
TYPE 00000201

****************************************************************************************************************
RECOMMENDATION:
****************************************************************************************************************
For an NPI on a LARGE(5-byte RID) table space, the index maximum number of pieces is independent of the table space maximum number of partitions, it is calculated from index page size and piece size as the following:

Maximum NPI number of pieces = MINIMUM( 4096, 2^32 / (index piece size/index page size) )

In other words, DB2 set the internal piece limit way too low!
This is very bad news if you think you have 4,096 pieces but you really have only 256!
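To make the APAR’s formula concrete, here is a small worked calculation in Python. The function name and the sample piece and page sizes are illustrative choices, and integer division is assumed (which is exact for the power-of-two sizes used here); only the formula itself and the two hex values come from the APAR text.

```python
def max_npi_pieces(piece_size_bytes, page_size_bytes):
    """MINIMUM(4096, 2^32 / (index piece size / index page size))."""
    pages_per_piece = piece_size_bytes // page_size_bytes
    return min(4096, 2**32 // pages_per_piece)


GB = 2**30

# Illustrative combinations of piece size and index page size.
print(max_npi_pieces(2 * GB, 4096))    # small pieces: capped at 4096
print(max_npi_pieces(4 * GB, 4096))    # exactly 4096
print(max_npi_pieces(64 * GB, 4096))   # large pieces with 4K pages

# The REPAIR DBD values quoted in the APAR, decoded from hex:
print(int("0100", 16), int("1000", 16))
```

The last line decodes the existing value X'0100' and the rebuilt value X'1000' from the APAR’s REPAIR output, which is where the 256 versus 4,096 comparison comes from.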

Now the APAR carries on with helpful hints and tips (Quoting here from the APAR text):

1. run a query, such as the sample query below, to identify the databases, table spaces and indexes,

SELECT TS.DBNAME AS DB_NAME,
       TS.NAME AS TS_NAME,
       IX.NAME AS INDEX_NAME,
       IX.CREATOR AS IX_CREATOR,
       HEX(IX.OBID) AS INDEX_OBID,
       IX.CLOSERULE AS CLOSERULE
FROM   SYSIBM.SYSINDEXES IX,
       SYSIBM.SYSTABLES TB,
       SYSIBM.SYSTABLESPACE TS
WHERE  TB.NAME = IX.TBNAME AND
       TB.CREATOR = IX.TBCREATOR AND
       TB.TSNAME = TS.NAME AND
       TB.DBNAME = TS.DBNAME AND
       (TS.PARTITIONS > 254 OR
        TS.TYPE = 'L' OR
        TS.TYPE = 'K' OR
        TS.DSSIZE > 0) AND
        TS.PARTITIONS <> 0 AND
        (TS.STATUS = 'C' OR
        NOT ((IX.INDEXTYPE = 'P' OR
        IX.INDEXTYPE = 'D') AND
        TB.PARTKEYCOLNUM <>0)) AND
        TS.DBNAME <> 'DSNDB06'
ORDER BY IX.OBID;

+-------------------------------------------------------------+
|DB_NAME |TS_NAME |INDEX_NAME|IX_CREATOR|INDEX_OBID|CLOSERULE |
+-------------------------------------------------------------+
|DB161410|XTB10000|I_NODEIDXT|SC161410  |0056      |N         |
|DB161410|TU161410|I_DOCIDTB1|SC161410  |0058      |N         |
 ...
 
2. run REPAIR DBD DIAGNOSE against all databases from the above query.
 
For example REPAIR DBD DB161410. By searching OBDINNUM in the REPAIR DBD output, it has the following mismatch,
 
DSNU916I DSNUEDFT - OBDS DIFFER AT OFFSET X'00000052'
OBDINNUM - INFORMATION NOT IN DB2 CATALOG
DSNU904I DSNUEDFT - EXISTING VALUE  X'0100'
DSNU905I DSNUEDFT - REBUILT  VALUE  X'1000'
DSNU913I DSNUEDDR - OBD X'0056' COMPARISON COMPLETE -- 0 ERRORS
 
which confirms the index I_NODEIDXT with OBID X'0056' has incorrect maximum number of index pieces.
 
3. fix these indexes found in the above repair DBD output by altering the CLOSE attribute of the identified indexes from their current value and then altering the CLOSE attribute back to their original value. The index name and index attribute information can be found in the query result at step 1.
 
for example, the index SC161410.I_NODEIDXT has the CLOSE attribute CLOSERULE = N. To correct the maximum number of index pieces for I_NODEIDXT, run the following DDL
 
   ALTER INDEX SC161410.I_NODEIDXT CLOSE YES
   ALTER INDEX SC161410.I_NODEIDXT CLOSE NO

So what I decided to do was to create a set of queries for my newsletter readers to make this process a tad easier and really find the bad guys!

The solution: Set of SQL queries for my newsletter readers to really find the bad guys

Create DBD diagnose cards

The first step is nearly the same SQL as in step 1 above:

-- CREATE DBD DIAGNOSE CARDS FOR ANY DATABASES THAT ARE CANDIDATES  
SELECT DISTINCT SUBSTR('  REPAIR OBJECT LOG YES DBD DIAGNOSE DATABASE '
CONCAT STRIP(TS.DBNAME) , 1 , 54 )              
FROM SYSIBM.SYSINDEXES    IX,                                        
     SYSIBM.SYSTABLES     TB,
     SYSIBM.SYSTABLESPACE TS                                       
WHERE TB.NAME    = IX.TBNAME                                         
   AND TB.CREATOR = IX.TBCREATOR                                      
   AND TB.TSNAME  = TS.NAME                                           
   AND TB.DBNAME  = TS.DBNAME                                         
   AND (TS.PARTITIONS > 254                                           
     OR TS.TYPE       = 'L'                                           
     OR TS.TYPE       = 'K'                                           
     OR TS.DSSIZE     > 0)                                            
   AND TS.PARTITIONS <> 0                                             
   AND (TS.STATUS = 'C'                                               
     OR NOT ((IX.INDEXTYPE = 'P'                                      
           OR IX.INDEXTYPE = 'D')                                     
        AND TB.PARTKEYCOLNUM <> 0))                                  
   AND TS.DBNAME <> 'DSNDB06'                                         

;

It returns this style of output (If you have *no* output then congratulations you have *no* problem!):

REPAIR OBJECT LOG YES DBD DIAGNOSE DATABASE A140XO82
REPAIR OBJECT LOG YES DBD DIAGNOSE DATABASE ANKDB1  
REPAIR OBJECT LOG YES DBD DIAGNOSE DATABASE ANKDB3
.
.
.

Cut and paste these control cards into some DB2 Utility JCL

Now you must cut and paste these control cards into some DB2 Utility JCL and get them all run.

The next query lists out all the index data you need from the above databases:


-- NOW LIST INDEX OBID'S ON THE ABOVE DATABASES THAT MUST BE SEARCHED
-- FOR IN THE REPAIR SYSOUT BY USING "F OBDINNUM" AND CHECKING IF THE
-- OBID'S MATCH                                                 
SELECT SUBSTR(TS.DBNAME , 1 , 8)       AS DB_NAME,                
       HEX(IX.OBID)                    AS INDEX_OBID,                
       SUBSTR(STRIP(IX.CREATOR) CONCAT '.' CONCAT                    
              STRIP(IX.NAME) , 1 , 71) AS INDEX                       
       FROM SYSIBM.SYSINDEXES    IX,                                      
       SYSIBM.SYSTABLES     TB,                                      
       SYSIBM.SYSTABLESPACE TS                                       
  WHERE TB.NAME    = IX.TBNAME                                       
    AND TB.CREATOR = IX.TBCREATOR                                    
    AND TB.TSNAME  = TS.NAME                                         
    AND TB.DBNAME  = TS.DBNAME                                       
    AND (TS.PARTITIONS > 254                                         
      OR TS.TYPE       = 'L'                                         
      OR TS.TYPE       = 'K'                                         
      OR TS.DSSIZE     > 0)                                          
    AND TS.PARTITIONS <> 0                                           
    AND (TS.STATUS = 'C'                                             
      OR NOT ((IX.INDEXTYPE = 'P'                                    
            OR IX.INDEXTYPE = 'D')                                   
          AND TB.PARTKEYCOLNUM <> 0))                                
    AND TS.DBNAME <> 'DSNDB06'                                       
  ORDER BY 1 , 2 
;

The output looks like:

---------+---------+---------+---------+----------
 DB_NAME   INDEX_OBID  INDEX 
---------+---------+---------+---------+---------
 A140XO82  000C        USER001.RESB~0
 ANKDB1    000A        ANKDB1.ANKIX11
 ANKDB3    0008        ANKDB3.ANKIX3
 . . .

Do “F OBDINNUM” in the REPAIR job output

Now comes the really horrible bit – you must now do “F OBDINNUM” in the REPAIR job output and see if you have any matches for the above indexes in the relevant database. Again, if you have none then congratulations, you have no problem!

If you find data like this (just as in the IBM APAR description):

DSNU916I DSNUEDFT - OBDS DIFFER AT OFFSET X'00000052'
OBDINNUM - INFORMATION NOT IN DB2 CATALOG
DSNU904I DSNUEDFT - EXISTING VALUE  X'0100'
DSNU905I DSNUEDFT - REBUILT  VALUE  X'1000'
DSNU913I DSNUEDDR - OBD X'0056' COMPARISON COMPLETE -- 0 ERRORS

then you must now do the next step to generate the corrective ALTER flip-flops:

-- CREATE DRIVER TABLE FOR CARTESIAN JOIN PROCESSING     
DECLARE GLOBAL TEMPORARY TABLE DRIVER (NUMBER SMALLINT) ;
INSERT INTO SESSION.DRIVER VALUES 1 ;                    
INSERT INTO SESSION.DRIVER VALUES 2 ;                    
INSERT INTO SESSION.DRIVER VALUES 3 ;                    
INSERT INTO SESSION.DRIVER VALUES 4 ;                    
INSERT INTO SESSION.DRIVER VALUES 5 ;                    
INSERT INTO SESSION.DRIVER VALUES 6 ;                    
-- NOW GENERATE CORRECTIVE CLOSE FLIPS ONE FOR NO -> YES           
-- AND ONE FOR YES -> NO                                           
-- NO -> YES CLOSERULE FLIP CONTROL CARDS                          
WITH T1 ( IXNAME                                                   
        , DBNAME                                                   
        , OBID)                                                    
     AS ( SELECT STRIP(IX.CREATOR) CONCAT '.' CONCAT STRIP(IX.NAME)
               , TS.DBNAME                                         
               , IX.OBID                                           
          FROM SYSIBM.SYSINDEXES IX,                               
               SYSIBM.SYSTABLES TB,                                
               SYSIBM.SYSTABLESPACE TS                             
          WHERE TB.NAME    = IX.TBNAME                             
            AND TB.CREATOR = IX.TBCREATOR                          
            AND TB.TSNAME  = TS.NAME                               
            AND TB.DBNAME  = TS.DBNAME                             
            AND (TS.PARTITIONS > 254                               
              OR TS.TYPE       = 'L'                               
              OR TS.TYPE       = 'K'                               
              OR TS.DSSIZE     > 0)                                
            AND TS.PARTITIONS <> 0                                 
            AND (TS.STATUS = 'C'                                   
              OR NOT ((IX.INDEXTYPE = 'P'                          
                    OR IX.INDEXTYPE = 'D')                         
                  AND TB.PARTKEYCOLNUM <> 0))                      
            AND TS.DBNAME <> 'DSNDB06'                             
            AND IX.CLOSERULE = 'N'                                 
          ORDER BY TS.DBNAME , HEX(IX.OBID)                        
        )                                                          
SELECT CAST(CASE NUMBER              
            WHEN 1 THEN 'ALTER INDEX'
            WHEN 2 THEN IXNAME       
            WHEN 3 THEN 'CLOSE YES $'
            WHEN 4 THEN 'ALTER INDEX'
            WHEN 5 THEN IXNAME       
            WHEN 6 THEN 'CLOSE NO  $'
            END AS CHAR(72))         
FROM T1, SESSION.DRIVER              
ORDER BY DBNAME, OBID, NUMBER        
;

Which generates the first set of flip-flops, for indexes currently defined with CLOSE NO (flip to YES, then back to NO):

ALTER INDEX                               
D9999TOP.BADBIGIX                         
CLOSE YES $                               
ALTER INDEX                               
D9999TOP.BADBIGIX                         
CLOSE NO  $                               
ALTER INDEX                               
D9999TOP.BADBIGIB                         
CLOSE YES $                               
ALTER INDEX                               
D9999TOP.BADBIGIB                         
CLOSE NO  $

And then the next flip-flop:

-- CREATE DRIVER TABLE FOR CARTESIAN JOIN PROCESSING     
DECLARE GLOBAL TEMPORARY TABLE DRIVER (NUMBER SMALLINT) ;
INSERT INTO SESSION.DRIVER VALUES 1 ;                    
INSERT INTO SESSION.DRIVER VALUES 2 ;                    
INSERT INTO SESSION.DRIVER VALUES 3 ;                    
INSERT INTO SESSION.DRIVER VALUES 4 ;                    
INSERT INTO SESSION.DRIVER VALUES 5 ;                    
INSERT INTO SESSION.DRIVER VALUES 6 ;                    
-- YES -> NO CLOSERULE FLIP CONTROL CARDS                          
WITH T1 ( IXNAME                                                   
        , DBNAME                                                   
        , OBID)                                                    
     AS ( SELECT STRIP(IX.CREATOR) CONCAT '.' CONCAT STRIP(IX.NAME)
               , TS.DBNAME                                         
               , IX.OBID                                           
          FROM SYSIBM.SYSINDEXES IX,                               
               SYSIBM.SYSTABLES TB,                                
               SYSIBM.SYSTABLESPACE TS                             
          WHERE TB.NAME    = IX.TBNAME                             
            AND TB.CREATOR = IX.TBCREATOR                          
            AND TB.TSNAME  = TS.NAME                               
            AND TB.DBNAME  = TS.DBNAME                             
            AND (TS.PARTITIONS > 254                               
              OR TS.TYPE       = 'L'                               
              OR TS.TYPE       = 'K'                               
              OR TS.DSSIZE     > 0)                                
            AND TS.PARTITIONS <> 0                                 
            AND (TS.STATUS = 'C'                                   
              OR NOT ((IX.INDEXTYPE = 'P'                          
                    OR IX.INDEXTYPE = 'D')                         
                  AND TB.PARTKEYCOLNUM <> 0))                      
            AND TS.DBNAME <> 'DSNDB06'                             
            AND IX.CLOSERULE = 'Y'                                 
          ORDER BY TS.DBNAME , HEX(IX.OBID)                        
        )                                                          
SELECT CAST(CASE NUMBER              
            WHEN 1 THEN 'ALTER INDEX'
            WHEN 2 THEN IXNAME       
            WHEN 3 THEN 'CLOSE NO  $'
            WHEN 4 THEN 'ALTER INDEX'
            WHEN 5 THEN IXNAME       
            WHEN 6 THEN 'CLOSE YES $'
            END AS CHAR(72))         
FROM T1, SESSION.DRIVER              
ORDER BY DBNAME, OBID, NUMBER        
;

Which generates the flip-flops for indexes currently defined with CLOSE YES (flip to NO, then back to YES):

ALTER INDEX                               
USER001.RESB~0                            
CLOSE NO  $                               
ALTER INDEX                               
USER001.RESB~0                            
CLOSE YES $

Note that the above ALTERs use a $ as terminator, so you must either use a C ALL $ ; style ISPF command or just use a --#SET TERMINATOR $ line in the SPUFI.
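For anyone who has not met the driver-table trick used in both generators above, here is the same idea mirrored in Python. It is only an illustration of the Cartesian join: the function name and its `close_rule` parameter are made up for this sketch, and the SQL remains the thing to actually run.

```python
from itertools import product

DRIVER = [1, 2, 3, 4, 5, 6]  # the six rows inserted into SESSION.DRIVER


def flip_flop_cards(index_names, close_rule):
    """Cross join each index with the driver rows: six output lines per index.

    close_rule is the index's current CLOSERULE, 'N' or 'Y'.
    """
    flip, back = ("YES", "NO ") if close_rule == "N" else ("NO ", "YES")
    cards = []
    for ixname, number in product(index_names, DRIVER):
        # Same job as the CASE NUMBER expression in the SQL.
        cards.append({
            1: "ALTER INDEX",
            2: ixname,
            3: f"CLOSE {flip} $",
            4: "ALTER INDEX",
            5: ixname,
            6: f"CLOSE {back} $",
        }[number])
    return cards


for card in flip_flop_cards(["USER001.RESB~0"], "Y"):
    print(card)
```

Every index row is paired with every driver row, and the driver number simply picks which of the six lines to emit, which is why one row per index turns into a complete pair of ALTER statements.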

Automatically parse the millions of lines of REPAIR output and output only the OBIDs

At this point, when even I had a HUGE list of candidates, I decided to automate it even further so I wrote a little REXX that parses the millions of lines of REPAIR output and *only* outputs OBIDs for real bad guys. That way you can see what you really have to do very quickly. Here’s how my output looks:

Began checking at: 2014-02-11-12.34.08.752641
Database A140XO82 started.                   
Database A140XO82 ended.                     
Database ANKDB1   started.                   
Database ANKDB1   ended.                     
Database ANKDB3   started.                   
Database ANKDB3   ended.                     
Database D9999TOP started.                   
OBID X'0004' is X'0100' rebuilt X'1000'      
OBID X'0006' is X'0100' rebuilt X'1000'      
OBID X'0008' is X'0100' rebuilt X'1000'      
OBID X'000A' is X'0100' rebuilt X'1000'      
OBID X'000C' is X'0100' rebuilt X'1000'      
OBID X'000E' is X'0100' rebuilt X'1000'      
OBID X'0010' is X'0100' rebuilt X'1000'      
Database D9999TOP ended.                     
Database DATAPBG2 started.                   
Database DATAPBG2 ended.                     
.
.
.

Here you can see that my first actual problem database is the D9999TOP. When I then plug that into my flip flop generator by using cut-and-paste, for both cases, like this:

AND TB.DBNAME  = 'D9999TOP' 
  AND HEX(IX.OBID) IN (       
                  '0004'      
                , '0006'      
                , '0008'      
                , '000A'      
                , '000C'      
                , '000E'      
                , '0010'      
                 )

I get the following output:

ALTER INDEX      
D9999TOP.BADBIGIA
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIA 
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIB 
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIB 
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIC
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIC
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGID
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGID
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIE
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIE
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIF
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIF
CLOSE NO  $      
ALTER INDEX      
D9999TOP.BADBIGIG
CLOSE YES $      
ALTER INDEX      
D9999TOP.BADBIGIG
CLOSE NO  $
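If you have a lot of databases to work through, the cut-and-paste predicate can itself be generated. A minimal Python sketch, assuming you already have the database name and its list of bad OBIDs to hand; the helper name is hypothetical and the layout simply copies the snippet shown earlier:

```python
def obid_predicate(dbname, obids):
    """Build the extra predicate that gets pasted into the flip-flop generators."""
    lines = [
        f"AND TB.DBNAME  = '{dbname}'",
        "  AND HEX(IX.OBID) IN (",
    ]
    for i, obid in enumerate(obids):
        # First entry has no leading comma; the rest line up underneath it.
        prefix = " " * 18 if i == 0 else " " * 16 + ", "
        lines.append(f"{prefix}'{obid}'")
    lines.append(" " * 17 + ")")
    return "\n".join(lines)


print(obid_predicate("D9999TOP", ["0004", "0006", "0008"]))
```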

Reach your real maximum number of pieces!

There was no output from the other query, so once these ALTERs ran my work on this database was done and I could happily move onto the next.

This is much quicker than searching through REPAIR sysout! The REXX should be copied to a PROC library and called NPI. Then use the JCL (which must be tailored for your site, of course!) to execute it, and Bob’s your uncle!
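For readers who want to see the parsing idea without opening the REXX, here is a rough Python equivalent. It is not the REXX itself, just a sketch that assumes the REPAIR messages appear in the order shown in the APAR sample (DSNU916I, the OBDINNUM line, DSNU904I, DSNU905I, then DSNU913I); the function name is made up.

```python
import re

HEX_VALUE = re.compile(r"X'([0-9A-F]+)'")
OBD_VALUE = re.compile(r"OBD X'([0-9A-F]+)'")


def find_bad_obids(lines):
    """Return (obid, existing, rebuilt) for every OBD with an OBDINNUM mismatch."""
    bad = []
    capture = False            # True while inside an OBDINNUM mismatch
    existing = rebuilt = None
    for line in lines:
        if "DSNU916I" in line:
            capture = False    # a new mismatch starts; its field name follows
        elif "OBDINNUM" in line:
            capture = True
        elif capture and "DSNU904I" in line:
            match = HEX_VALUE.search(line)
            existing = match.group(1) if match else None
        elif capture and "DSNU905I" in line:
            match = HEX_VALUE.search(line)
            rebuilt = match.group(1) if match else None
        elif "DSNU913I" in line:
            obd = OBD_VALUE.search(line)
            if obd and existing and rebuilt:
                bad.append((obd.group(1), existing, rebuilt))
            capture = False
            existing = rebuilt = None
    return bad


SAMPLE = """\
DSNU916I DSNUEDFT - OBDS DIFFER AT OFFSET X'00000052'
OBDINNUM - INFORMATION NOT IN DB2 CATALOG
DSNU904I DSNUEDFT - EXISTING VALUE  X'0100'
DSNU905I DSNUEDFT - REBUILT  VALUE  X'1000'
DSNU913I DSNUEDDR - OBD X'0056' COMPARISON COMPLETE -- 0 ERRORS
""".splitlines()

for obid, existing, rebuilt in find_bad_obids(SAMPLE):
    print(f"OBID X'{obid}' is X'{existing}' rebuilt X'{rebuilt}'")
```

Mismatches on any other DBD field are ignored, so only the OBIDs that really need the CLOSE flip-flop are reported.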

Once all of the ALTERs have been executed then, finally, all of your NPIs can actually reach their real maximum number of pieces!
This month I also used (stole!) the neat Cartesian Join idea from Marcus Davage’s blog, which was posted on Listserv and which everyone loved instantly: http://spufidoo.wordpress.com/2013/02/28/judicious-cartesian-joins/

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

16/06/2014, 22/06/2021

2014-05: Why SIZE matters for DB2

“Are your space management and monitoring tools up-to-date, or are they still DB2 V7?”

IFCIDs : Instant monitoring of secondary allocations

DB2 V7 : Space management without “the IBM algorithm”
DB2 V8 : Running out of datasets & Degenerated extents
DB2 9 : UTS spaces introduction & PBG’s problem
DB2 10 : ALTER PBGs
DB2 11 : Remove any empty parts in PBGs

SAX : Space management processing

Since DB2 V8, the DBAs of the world have all had the ability to forget about PQTY and SQTY in the DDL for Tablespaces and Indexes.

At first, almost no one trusted the sliding scale algorithm, and SOFTWARE ENGINEERING’s product Space AssuranceExpert (aka SAX) monitored and reacted instantly to secondary allocations.

However, we now have DB2 11 and I thought it would be interesting to review what was done in DB2 V7 (when our SAX was launched), and the difference nowadays in the DB2 11 world.

IFCIDs: Instant monitoring of secondary allocations

In DB2, every time a secondary allocation is done, DB2 can be made to spit out an IFCID. SAX runs as a started task, active 24×7, from DB2 start up until just before DB2 shut down.

It catches all of these IFCIDs thrown by DB2 and performs an analysis with five basic questions:

– Can this dataset reach its maximum physical size *before* running out of physical extents? (The actual size is dependent on the “geometry” of the object of course!)
– Will this object run out of datasets? (The number of datasets an object can have is, once again, dependent on the “geometry” of the object)
– Is this partition nearing its maximum size?
– Did DB2 ask for one extent but get more back?
– Are any of my SMS disk storage pools running out of space?

DB2 V7 : Space management without “the IBM algorithm”

Remember, back in those old days of DB2 V7, we only had 255 extents and 254 partitions, but datasets could still get pretty big pretty fast.

The problem lots of shops had was that an important dataset would “hit the buffers” of the maximum number of extents *way* before it ever ran out of physical space, thus causing grief, wailing and gnashing of teeth!

SAX stopped all this by giving WTO “heads-up” style messages in two flavors.

– First a warning message,
– and then a critical message.

This gave DBAs and space managers much-needed time to plan for the outage and the inevitably long-running REORG needed to action the required ALTER, or perhaps even any DROP/RECREATE that had to be done.

DB2 V8 : Running out of datasets & Degenerated extents

IBM also noticed this problem and so introduced in DB2 V8 the “sliding scale” of secondary allocations, as long as the ZPARM OPTIMIZE EXTENT SIZING field (MGEXTSZ) was set to YES (this is the default from DB2 9, by the way).

Of course, to really use this, you then had to ALTER the PQTY and SQTY of all existing spaces to -1, remember to delete all PRIQTY and SECQTY lines in your DDL, and also rely on the TSQTY and IXQTY ZPARMs giving a big enough “first default”. (By the way, the defaults for these two ZPARMs are 0, which is actually translated to be 720k or one cylinder for normal spaces and 7200k or 10 cylinders for LOB spaces). This all probably explains why the take-up of this great feature has not been that spectacular and, in fact, Listserv *still* gets questions about “How good is this feature?”

This also explains why the primary reason for having SAX is still valid at most shops today!

Running out of datasets

Now the problem of running out of datasets is very, very evil indeed… For a non-partitioned space, you can have up to 32 datasets. DB2 will happily allocate away, and you will never know, or even be informed, when the last possible dataset has just been allocated. And, of course, you will not know that the 33rd one cannot be allocated until you get a -904 unavailable resource error!

By definition this is “not good”, as you must do a HUGE REORG with a bunch of managers breathing down your neck and *not* make any mistakes with the new allocations. (Again, this is a very good reason to have SAX doing all the monitoring and triggering early warning “heads-up” style messages!)

A partition running out of space is rare, but when it does happen it is, of course, a disaster! The idea in SAX is to warn when the end is near for any of the partitions in use and thus, as before, allow time for the ALTER etc.

Degenerated extents

Degenerated extents are annoying as well. You have 255 extents, DB2 requests one but gets up to five back! This is “wasting” your precious supply of extents, so SAX can also warn you if this starts happening, and remedial action can again be planned to correct the problem (normally a volume defrag in this case).
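
A minimal sketch of what such a check could look like, assuming the rule described in the SAX help panel quoted later in this newsletter (warn when the last extent does not reach a given percentage of the SECQTY in the DB2 catalog). The function name, the parameter names, the KB units and the 50% figure are all hypothetical and are not taken from SAX itself:

```python
def is_degenerated(last_extent_kb: int, secqty_kb: int, min_percent: int) -> bool:
    """Return True when the last extent is smaller than min_percent of SECQTY.

    Illustrative only: names and units are assumptions, not the SAX logic.
    """
    if secqty_kb <= 0:
        # No usable SECQTY to compare against (e.g. -1 for the sliding scale).
        return False
    # Integer arithmetic avoids any rounding surprises with percentages.
    return last_extent_kb * 100 < secqty_kb * min_percent


# DB2 asked for 7200 KB, but the space arrived in small chunks and the last one is 1440 KB.
print(is_degenerated(last_extent_kb=1440, secqty_kb=7200, min_percent=50))  # True
# The last extent is the full SECQTY, so nothing to report.
print(is_degenerated(last_extent_kb=7200, secqty_kb=7200, min_percent=50))  # False
```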

Now in z/OS 1.7 “Extent Constraint Removal” was introduced for the DATACLAS which, if set to “Y”, allows 7,257 extents. It still limits you to 123 extents per volume and 59 volumes (123 × 59 = 7,257), so watch out if you are using huge “virtual” disks (e.g. MOD 54 or EAV): you can end up wasting space, as you still cannot exceed 123 extents per volume.
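
The arithmetic behind those limits can be checked with a few lines. This is only a sketch using the two figures quoted above; the function and constant names are made up for illustration:

```python
EXTENTS_PER_VOLUME = 123  # per-volume limit quoted above
MAX_VOLUMES = 59          # volume limit quoted above


def max_extents(volumes_used: int) -> int:
    """Extents a dataset can reach when it is spread over volumes_used volumes."""
    if not 1 <= volumes_used <= MAX_VOLUMES:
        raise ValueError("volumes_used must be between 1 and 59")
    return EXTENTS_PER_VOLUME * volumes_used


print(max_extents(MAX_VOLUMES))  # 7257, the overall maximum
print(max_extents(1))            # 123, all you get on one huge volume
```

The second call shows the catch: a dataset that sits entirely on one very large volume is still capped at 123 extents, however much free space that volume has.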

Finally, when an SMS Pool runs out of space, either for sort/work or image copy, it is *not* good! The idea here is also to give a “heads-up” style alert so that the DBA can trigger the space management people to have a look at the state and size of the SMS storage groups, this time alerted by percentage used or GBs of space free.

DB2 V8 introduced a big change – Partitions went up to a maximum of 4,096, and the calculation about how many pieces your NPI can have got “a little bit complex” (see also my previous newsletter: “Are you going to PIECEs” here!).

DB2 9: UTS spaces introduction & PBG’s problem

In DB2 9 the next major advance came with UTS spaces, and the one that caused the most grief was, of course, PBG. Why?

Well, the first problem was that some people went mad and used MAXPARTITIONS 4096 right from the get-go, which they then found out could not simply be changed, and this ended up being a huge problem.

IBM came out with a bunch of fixes for these people, but the recommendation is still true today: “Use the number you expect to use!”

PBGs, however, came with a new set of space management problems:

– By definition every partition is full, and so a TP REORG is “dangerous”, especially if you have VARCHAR, and even more so if compressed.
– ALTER at TP level is not supported for PBG.
– Getting rid of empty partitions was not supported.
– Adding partitions dynamically (by command) was not supported.
– What to do if the partition that is “in use” is growing and
  – it is the last allowed partition, or
  – MAXPARTITIONS is set to one?

Now these are “non trivial” because the DB2 catalog is so defined and you would not want an alert every time someone created a table or index!
The trick here is to treat these cases as if the object were a normal space: instead of warning that you are using the last part, SAX waits until you are using e.g. 80% of that part, and then e.g. at 90% comes the critical threshold warning.
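
As a rough illustration of that two-level idea, here is a minimal sketch. The 80% and 90% values are just the example figures from the paragraph above, and the function name and return strings are hypothetical; this is not how SAX is implemented:

```python
def classify(percent_used: float,
             warning_at: float = 80.0,
             critical_at: float = 90.0) -> str:
    """Map the percentage used of the last partition to an alert level."""
    if percent_used >= critical_at:
        return "CRITICAL"
    if percent_used >= warning_at:
        return "WARNING"
    return "OK"


# A freshly created PBG with MAXPARTITIONS 1 is on its last part but nearly empty.
print(classify(2.5))   # OK
print(classify(84.0))  # WARNING
print(classify(93.0))  # CRITICAL
```

Keying on percentage used rather than on "last partition reached" is what stops a brand new object from raising an alert the moment it is created.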

DB2 10: ALTER PBGs

With DB2 10 came the ability to ALTER PBGs to add parts, which made using DSN1COPY to clone data around a lot better!

DB2 11: Remove any empty parts in PBGs

Further, in DB2 11, the REORG utility can be used to remove any empty parts in PBGs by setting the ZPARM REORG_DROP_PBG_PARTS to ENABLE (DISABLE is the default).

Space Management processing

The SAX tool way of processing all this info is neatly summarised in the help panel of the tool itself:

AUDIT SMS STOGROUPS
Should the Space AssuranceExpert audit SMS stogroups. Y/N
If Y is entered, a pop-up window will allow you to enter up
to 24 SMS storage groups which will be audited.
If WARN IF % ALLOC > or WARN IF GB FREE < is specified and
exceeded, a warning (WTO) will be issued.

AUDIT DEGENERATED XTS
Audit secondary quantity for de-generated extents. If
specified, a warning is issued in case of the last extent
does not reach this percentage of the SECQTY specified
in the DB2 catalog. If this field is left blank, no
auditing is performed.

SUPERVISE LPS
Supervise linear pagesets. If specified, a warning is issued
in case of high allocated reaches this percentage of the
maximum data set size for partitioned objects.

For non-partitioned objects, a warning is issued for every
newly allocated data set as soon as the data set number
reaches this percentage of the maximum number of data sets:

Two different values may be entered for warning and critical
values with different message ids. This may be useful for
automation reasons (see below).

Object type: TABLESPACE      ! Maximum number of data sets
-----------------------------+----------------------------
LOB tablespaces              ! 254
-----------------------------+----------------------------
Non-partitioned tablespaces  ! 32
-----------------------------+----------------------------
Partitioned tablespaces      ! 1 (Percent used check)
-----------------------------+----------------------------
Partitioned By Growth        ! MAXPARTITIONS. LPS check if
tablespaces                  ! more than one. If on last
                             ! partition then percent used.
-----------------------------+----------------------------

Object type: INDEX           ! Maximum number of data sets
-----------------------------+----------------------------
Non-partitioned indexes on   ! MIN ( 4096 , 2 power 32 /
tablespace with LARGE,       ! ( DSSIZE / TS PGSIZE))
DSSIZE, or more than 64      ! Eg: 128 GB DSSIZE with
Partitions                   ! 8 KB Tablespace Page
                             ! gives 256 Pieces (datasets)
                             ! Or 4 GB DSSIZE with
                             ! 4 KB Tablespace Page
                             ! gives 4096 Pieces (datasets)
-----------------------------+----------------------------
Non-partitioned indexes      ! 32
otherwise                    !
-----------------------------+----------------------------
Partitioned indexes          ! 1 (Percent used check)
-----------------------------+----------------------------

To support automation based on WTO ids two different
thresholds may be specified:
Field (1) specifies a warning threshold using WTO ids
O2RTSU04 - 12W (non-partitioned spaces)
O2RTSU04 - 14W (partitioned spaces)
O2RTSU04 - 16W (partition by growth spaces)

Field (2) specifies a critical threshold using WTO ids
O2RTSU04 - 13W (non-partitioned spaces)
O2RTSU04 - 15W (partitioned spaces)
O2RTSU04 - 17W (partition by growth spaces)
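
The non-partitioned index formula in the help panel above, MIN ( 4096 , 2 power 32 / ( DSSIZE / TS PGSIZE)), is easy to try out for your own geometry. The following is a small sketch of that formula only; the function and constant names are my own for illustration:

```python
KB = 1024
GB = 1024 ** 3


def max_npi_pieces(dssize_bytes: int, ts_page_bytes: int) -> int:
    """MIN(4096, 2**32 / (DSSIZE / tablespace page size)) from the help panel."""
    pages_per_dataset = dssize_bytes // ts_page_bytes
    return min(4096, 2 ** 32 // pages_per_dataset)


# The two examples given in the help panel:
print(max_npi_pieces(128 * GB, 8 * KB))  # 256
print(max_npi_pieces(4 * GB, 4 * KB))    # 4096
```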

So now you know why size matters for DB2!

The big question now is:
“Are your space management and monitoring tools up-to-date, or are they still DB2 V7?”

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect
