Roy Boxwell, Autor bei SOFTWARE ENGINEERING

20. Juli 202220. Juli 2022

2022-07 IBM problem data requests…

Most of us have been there … something somewhere goes wrong … things are checked, changes are undone, tests are re-run and in the end you have no idea why a failure happens.

Who You Gonna Call?

Yep, it is time to open a Case at IBM technical support … So you open a Case and you type in as much detail as possible about when and what happened but it is *never* enough! In the world of Db2, the first question that *always* comes back is „Please supply us with further information“, like:

SYSLOG
Master Log
MEPL
Detailed EREP
Complete SVC dump

WTF? („What’s That For“ before anyone complains)

SYSLOG

The syslog is the console of a z/OS system and any and all interesting, and sometimes not so interesting, messages from *all* running „things“ are in here – it is normally enormous! The problem begins when IBM Technical Support asks „please provide us with the SYSLOG from 06:00 to 06:30 on the day of the event“.

SDSF

SDSF is your friend here and I really mean it! All you do is go to SDSF and then enter primary command LOG. From this panel you enter three primary commands, one after another, and you are done!

PT ODSN ‚your.dataset.name‘ * NEW
PT 06.00.00 22/06/2022 06.30.00 22/06/2022
PT CLOSE

That is it! Your dataset will then just have the data from between those times. This is *extremely* handy! Note that the date format is locale-dependent and, as I am in Europe, we have DD/MM/YYYY. I am sure you know your own date format!

Master LOG

This is the first SDSF dataset in your ssidMSTR STC. So, once more in SDSF, using *MSTR as a prefix and then putting line command ? next to the sub-system in question shows you three DDNAMEs. The first one, JESMSGLG, is the one they normally need. Here you use line command XDC to get an SDSF Open Print Data Set window:

xxxxMSTR STC09394           SDSF Open Print Data Set                         
COMMAND INPUT ===>                                         SCROLL ===> CSR
                                                                             
                                                                             
Data set name  ===> 'xxxxxxx.SYSLOG.PRINT'                                   
Member to use  ===>                                                          
Disposition    ===> NEW        (OLD, NEW, SHR, MOD)                          
                                                                             
Management class     ===>           (Blank for default management class)     
Storage class        ===>           (Blank for default storage class)        
  Volume serial      ===>           (Blank for authorized default volume)    
  Device type        ===>           (Generic unit or device address)         
Data class           ===>           (Blank for default data class)           
  Space units        ===> CYLS      (BLKS, TRKS, CYLS, BY, KB, or MB)        
  Primary quantity   ===> 19        (In above units)                         
  Secondary quantity ===> 19        (In above units)                         
  Directory blocks   ===>           (Zero for sequential data set)           
  Record format      ===> FBA                                                
  Record length      ===> 121                                                
  Block size         ===>                                                    
Data set name type   ===>           (LIBRARY, blank, ... See Help for more)  
Extended attributes  ===>           (NO, OPT, or blank)

Here you can see I choose type FBA, LRECL 121 and a disposition of NEW for a new dataset. Hit ENTER and SDSF tells you how many lines it just wrote to that file:

PRINT CLOSED  23025 LINE

View the file and max down to the bottom:

023019 0------ JES2 JOB STATISTICS ------        
023020 -  17 MAY 2022 JOB EXECUTION DATE         
023021 -            2 CARDS READ                 
023022 -       28,616 SYSOUT PRINT RECORDS       
023023 -            0 SYSOUT PUNCH RECORDS       
023024 -        3,099 SYSOUT SPOOL KBYTES        
023025 -    50,488.70 MINUTES EXECUTION TIME

So we know we are in the correct file! Here you can do some updating of „sensitive“ data like IP address, User Name etc. Remember to just change the data, not blindly delete it! Naturally, you can delete stuff *after* the event of interest and probably a ton of stuff from *before* but be careful what you delete!

MEPL

Say what? MEPL is the Module Entry Point List and IBM need it to see which PTFs and APARs have been applied in the application address space and the Db2 system. To get a MEPL I use a normal Utility job jcl with DIAGNOSE and a DISPLAY MEPL like this:

//MEPL     EXEC PGM=DSNUTILB,REGION=32M,       
//         PARM=(ssss,'DIAGNOSEMEPL')          
//STEPLIB  DD DISP=SHR,DSN=DSNsss.SDSNEXIT.ssss
//         DD DISP=SHR,DSN=DSNsss.SDSNLOAD     
//CEEDUMP  DD SYSOUT=*                         
//SYSUDUMP DD SYSOUT=*                         
//SYSPRINT DD SYSOUT=*                         
//SYSIN    DD *                                
 DIAGNOSE                                      
    DISPLAY MEPL                               
 DIAGNOSE END                                  
/*

This will output the MEPL to SYSPRINT which starts like this:

DSNU000I    173 12:57:08.82 DSNUGUTC - OUTPUT START FOR UTILITY, UTILID = DIAGNOSEMEPL                               
DSNU1044I   173 12:57:08.83 DSNUGTIS - PROCESSING SYSIN AS EBCDIC                                                      
DSNU050I    173 12:57:08.83 DSNUGUTC -  DIAGNOSE DISPLAY MEPL                                                          
DSNU861I    173 12:57:08.84 DSNUDIAG - DISPLAY MEPL FOR SUBSYSTEM xxxx                                                 
    0000 20B92820 C2C5D7D3 0140D4C5 D7D360D3  C9D2C540 C6D6D940 C4E2D5E4 E3C9D3C2    *....BEPL. MEPL-LIKE FOR DSNUTILB*
    0020 28100000 C4E2D5C1 C1404040 F0F761F1  F461F1F6 E4C9F3F9 F3F9F340 00000000    *....DSNAA   07/14/16UI39393 ....*
    0040 28100100 C4E2D5C1 D7D9C840 F1F261F2  F361F1F5 F1F34BF4 F6404040 00000000    *....DSNAPRH 12/23/1513.46   ....*
    0060 28100200 C4E2D5C6 D4D5C6D4 F1F061F1  F761F1F8 E4C9F5F8 F8F4F040 00000000    *....DSNFMNFM10/17/18UI58840 ....*
    0080 28100240 C4E2D5C6 D7D4E2C7 F1F061F1  F761F1F8 E4C9F5F8 F8F4F040 00000000    *... DSNFPMSG10/17/18UI58840 ....*
.
.
.

It is quite long! Here in my test system nearly 5000 lines are written to SYSPRINT. Then, like with the ssidMSTR, I use ? against the job and then XDC against the SYSPRINT DD card this time to create another file with type FBA and LRECL 133 to get your.mepl.list.

Detailed EREP

Now it gets interesting… The EREP (Environmental Record Editing and Printing Program) is the API to the system LOGREC dataset where all „events of interest“ on a z/OS LPAR are recorded. It contains far less than the console log but is a treasure trove of data for the IBM Technical Support.

Here’s my job to simply do a Detailed EREP as per IBM standards:

//*------------------------------------------------------------------*/
//*  EREP: DETAILED REP PRINT                                        */
//*------------------------------------------------------------------*/
//EREP     EXEC PGM=IFCEREP1,PARM='CARD'                               
//SERLOG   DD DISP=SHR,DSN=xxxxxxxx.LOGREC                            
//DIRECTWK DD DISP=(NEW,DELETE),UNIT=SYSDA,SPACE=(CYL,(50,50))         
//EREPPT   DD SYSOUT=*,DCB=BLKSIZE=133                                 
//TOURIST  DD SYSOUT=*,DCB=BLKSIZE=133                                 
//SYSIN    DD *                                                        
ACC=N                                                                  
HIST=N                                                                 
ZERO=N                                                                 
PRINT=PS                                                               
TYPE=S                                                                 
/*                                                                     
//* IF REQUIRED YOU CAN ADD DATE, TIME RANGES TO FILTER DOWN           
//* WITHIN THE SYSIN LIKE:                                             
//* DATE=(YYDDD,YYDDD)                                                 
//* TIME=(HHMM-HHMM)

Do not forget to give your LOGREC DSN for the SERLOG DD. Most of the time I get just a few rows of output and then create another file using XDC from the EREPPT DD name but this time with type FB and LRECL 133 to get the.erep.list

Complete SVC Dump

If your Db2 system receives a dump, for whatever reason, it normally writes out an SVC dump to a special dataset that can be used to analyze what went wrong. It is very important that the SVC dump is complete and *not* partial …

Default Size

The default size is only 500MB which is way too small for a halfway decent production Db2 sub-system these days. It must normally be increased to at least 16000MB. To change this you issue a console command like:

CHNGDUMP SET,SDUMP,MAXSPACE=16000M

But make sure you have enough local page datasets space to handle your normal load PLUS the size of this dump dataset…auxilliary swapping (paging) while dumping is a painfully slow experience you do not want to suffer!

If successful, the SVC dump will be COMPLETE and then you are nearly done …

File Transfer

Most of the files I have described so far are quite small but the SVC dump is a monster. You must TERSE it using JCL like:

//AMATERSE  EXEC PGM=AMATERSE,PARM='SPACK'
//SYSPRINT  DD SYSOUT=*                   
//SYSUT1    DD DISP=SHR,                  
//             DSN=xxxxxxxx.xxxxxxxx       
//SYSUT2    DD DISP=(,CATLG),UNIT=SYSDA,             
//             DSN=xxxxxxxx.xxxxxxxx.TRS,  
//             SPACE=(CYL,(99,99),RLSE)

I use the SPACK parameter which is, according to the documentation, much better at compression than the PACK parameter. Fun factoid of the day: SPACK is the „complex“ format whereas PACK is the „simple“ format – Gotta love IBM for that! IBM do prefer the TERSE style of compression, and please do *not* change the file ending! Then doing a ZIP has no real bonus and just confuses the automatic systems at IBM. Leave „.TRS“ at the end and they know it has been TERSED.

Then download the xxxxx.xxxxx.TRS file as BINARY to the PC and all the other files as TEXT to the PC. Then simply upload by drag-and-drop to your IBM Case and you are ready for the next question!

Have you Switched it Off and On again?

I wish I never hear this about a mainframe Db2 problem!

I hope this was of some interest, and if you have any other Tips & Tricks about getting „standard“ data to IBM, I would love to hear from you!

TTFN

Roy Boxwell

20. Juni 202221. Juni 2022

2022-06 Apollo 13 has landed!

Yes, I admit it, I was surprised that IBM actually called it Db2 13 for z/OS in the end. I know IMS also had a 13 release but I still believed they would jump to 14! After all Apollo 13 was a disaster, albeit with a happy ending caused by a whole bunch of engineers working really well with the hardware and the design. I guess that was the reason for the „Apollo“ code name of this release of Db2. Anyways, on to my review of all things new and interesting in Db2 13.

What’ve We Got?

The most important bit is that you *must* be on Db2 12 FL510 to even think about getting to Db2 13. This is done to make the migration as easy as possible with no gotcha’s happening along the way!

System Check First!

Remember that automatic rebind will occur for any packages created before Db2 11. This should not happen but is never a good idea in production! Further, a whole bunch of ZPARMs have been removed and so you must check to see if you set these differently than default, if so then check what will happen in your shop with the „new“ values. Here’s a direct link to the docu all about new, changed and deprecated ZPARMs: https://www.ibm.com/docs/en/db2-for-zos/13?topic=13-subsystem-parameter-changes-in-db2

FL100 Hits

FL100 allows you to fallback to Db2 12 FL510 and also permits co-existence in data-sharing. This enables 24×7 availability as you go round-robin with the load libraries and bounce your Db2 members.

What Else do You get at FL100?

Index look-aside optimizations are there for INSERT, UPDATE and DELETE.

Sort

It gets seriously enhanced:

the ability to generate machine code enabling DECFLOAT
if you use grouping set, multiple distincts and PERCENTILE you also get generated machine code
Sort can use its very own workfile
larger sort tree size
a check for ordered data on first iteration
LISTAGG gets SUBSTR support
Ability to avoid rereading a workfile, but only if Watson Machine learning is enabled
Shrinking the length of long varchars (over 100 bytes), again only if Watson Machine learning is enabled
z15 expanded support of SORTL, this also requires Watson Machine learning to be enabled

PBG Insert

The insert mechanism for PBGs got a nice update to drive a retry if the partition lock fails on the first insert attempt. This could well stop excessive growth of PBGs with heavy insert activity.

Below-the-Bar (BTB) Reduced and Above-the-Bar (ATB) Enhanced

As in all releases of Db2, the amount of BTB storage has been reduced primarily by reducing the agent BTB storage to use ATB storage instead. This applies to dynamic SQL statement text and attribute strings. When BTB storage exceeds 64% Db2 will automatically trigger a storage contraction of all private storage pools. Further, if you are at z/OS 2.5 you can use a new dynamic allocation function by updating the ALLOCxx parmlib member and set the SYSTEM SWBSTORAGE to ATB. This reduces the overhead per open dataset from 5KB down to 4KB. It might not sound much but when you have 200,000 open datasets it all adds up! The ATB storage clean-up also got improved to stop the usage of the IARV64 REQUEST(DISCARDDATA) during thread deallocation. Now a system level timer is used to trigger these clean-ups.

ECSA Reduced

For IFI users the ECSA usage has also been reduced from 50MB down to 8MB, however, you must make sure that you set aside around 50MB in HVCOMMON and 25MB for private storage to compensate for this. There is no such thing as a free lunch after all! The DDF per thread ECSA is now down to be the same as a local thread whereas before it was always 2KB more for a remote than a local thread. When the ECSA exceeds 85% Db2 will automatically trigger a storage contraction of all allocated pools within the ECSA.

DBAT Optimizations

These reduce the frequency and number of terminations and also flatten the spike when a surge of short-term DBATs increase usage quickly.

External Security Improvements

Db2 13 now caches the plan authorization checks as long as you have z/OS 2.5 and zparm AUTHEXIT_CACHEREFRESH set to ALL. At the same time, more IDs per plan are now cached.

Enhance RECOVERY Feature

RECOVER is enhanced so that it can use TP/IP level copies when a DSNUM ALL recover is requested. Pre-requisite is that it is only allowed on objects that are based on a UTS. Previously, RECOVER required a TS/IX level Image Copy to process this. Now it will check to see if it can do the RECOVER based upon the TP/IP copies that have been done.

FL500 Hits

Artificial Intelligence finally comes to Db2 for z/OS with SQL data insights. These give cognitive intelligence to Built-in-Functions and can be used in SQL queries. More on these new functions later!

Role-Based Package Ownership

This completes the ability to use Trusted Context which assigns a ROLE to an inbound connection. It can now use this role as the package owner.

Utility Changes

Page sampling is now allowed for inline statistics. Within a REORG TABLESPACE or a LOAD utility you can now use page sampling (Like you can within RUNSTATS). Before this, only row sampling was allowed and page sampling can give a much bigger saving in CPU and elapsed time.

Online Conversion from PBG to PBR

As long as you have some sort of usable partitioning key you can now use ALTER to migrate from PBGs, which are getting very big and unwieldy, to the new and much better PBR RPN type UTS spaces. After the ALTER, a REORG simply completes the migration.

FTB Expanded

Unique index maximum size for FTB is now raised up to 128 bytes and for duplicate indexes up to 120 bytes. This greatly increases the possible scope of FTB usage.

Lock Timeout Controls

CURRENT LOCK TIMEOUT special register can be used to change the TIMEOUT by the application or even the SQL Statement. This can help reduce lock contention.

Profile Updates

The profile table can now be used for local threads as long as DDF is started with AUTO or COMMAND. These tables can now also contain CURRENT LOCK TIMEOUT and also the RELEASE_PACKAGE keyword with one of the COMMIT attributes.

Active LOG Delete

You can now delete an active log while Db2 is running, using the new REMOVELOG option from the -SET LOG command without having to stop and start D2b afterwards. This is very handy when you wish to resize all of your active logs!

How fast? – Very fast!

The first „real“ FL version 500 has *no* catalog changes! This simplifies migration again as there is no „real“ CATMAINT. There is still a CATMAINT job but it just sets internal flags and does *not* do any changes to the Catalog.

FL501 Hits

Deadlock Priority

New built-in global variable SYSIBMADM.DEADLOCK_RESOLUTION_PRIORITY gives you the ability to set your relative weighting about „who should die“ when a deadlock is detected. This is also added to the profile tables by the way.

Catalog Changes!

One duplicate catalog index is, finally, dropped and the brand new SYSIBM.SYSUTILITIES table is created and will start to be INSERTed into, more about this table later! Some of the real-time statistics (RTS) statistical columns got uprated from INTEGER to BIGINT or SMALLINT to INTEGER. This helps for really big shops where the RTS numbers just maxxed out all the time! Further, lock escalation is disabled on all RTS and RTS History tables.

Index Splits

The RTS also got three new columns in the SYSINDEXSPACESTATS table:

REORGTOTALSPLITS – How many index splits since last REORG
REORGSPLITTIME – Aggregated elapsed time for all index splits since last REORG
REORGEXCSPLITS – Number of abnormally long (over one second) index splits since last REORG

All of these columns give good data about your index structures and possible changes to their definitions.

AI BiFs

We get three new Built-in Functions in Db2 13 FL500:

AI_ANALOGY – Computes an analogy score between two values
AI_SEMANTIC_CLUSTER – Computes a semantic clustering score of a member argument against a set of clustering arguments
AI_SIMILARITY – Computes a similarity score between two values

In fact, these are the *only* BiF changes in Db2 13 up to FL501! I will not go into detail about how they work here but there is a bit of work that must be done „behind the scenes“ to get these beasts working… again – no such thing as a free lunch!

SYSIBM.SYSUTILITIES

Is a very interesting catalog table that contains a list of all Utilities that have run on the machine. It is updateable by anyone with the necessary rights so I recommend at least a yearly purge or it could grow to crazy sizes… The first support is pretty basic, just EVENTID, NAME, JOBNAME, UTILID, USERID, STARTTS, ENDTS, ELAPSEDTIME, CPUTIME, ZIIPTIME, RETURNCODE, CONDITION: Blank – Active or stopped utility, E – Execution has ended, F – Execution terminated by -STA db(x) sp(x) ACCESS(FORCE) or T – Execution terminated by -TERM UTILITY, RESTART: N – not restarted, Y – restarted, NUMOBJECTS, LISTNAME, STARTLOGPOINT: RBA for non-data-sharing or LRSN for data-sharing, GROUP_MEMBER, SORTNAME, SORTCPUTIME and SORTZIIPTIME. All of the other columns are „for future use“ but with those columns that are filled you can do some interesting Utility analysis! The interesting factoid about this table is that all the time columns were originally planned as being CHAR(8)… Thankfully IBM decided to change them to BIGINT columns containing microseconds thus making the table actually usable!

Getting there…

Of course you have to get to Db2 13 before you can start using all the stuff I have just been discussing. To make sure you get there as quickly and painlessly as possible, why not download our Migration HealthCheck software? It is free-of-charge, with licenced extensions, software that lists out the „state“ of your Db2. This includes all the prereqs for Db2 12 FL510 which must be reached before going to Db2 13 FL100 as well as all other deprecated features!

Any Plans?

What are your plans for going to Db2 13? I would be delighted to hear from you!

TTFN

Roy Boxwell

Update: One of my valued readers asked me if I could check whether or not LISTAGG accepts ORDER BY or not in Db2 13. Sadly I can confirm that only SUBSTR was added. ORDER BY still dies a death with an SQLCODE -390. My reader had even opened an AHA Idea request for this support, which I also voted for by the way, but it was rejected by Db2 support. The question that both of us have is „Why not?“ What is so hard about adding sort support? My personal theory is that, internally, it uses a LOB style object format which is not sortable, but why not just tell us? In Boston I will ask around…

24. Mai 202223. Juni 2022

2022-05 Let’s get hashed

This month I will delve into the wonderful new field QUERY_HASH in the SYSIBM.SYSPACKSTMT Db2 Catalog table.

Isn’t it Old?

The QUERY_HASH column first appeared back in the Db2 11 tables SYSQUERY and DSN_STATEMENT_CACHE_TABLE, where it is used to identify dynamic SQL and enable joining between these two tables.

Now it Gets Interesting

In Db2 12 it was added throughout the Catalog to tables SYSDYNQRY, SYSPACKSTMT and DSN_STATEMNT_TABLE, thus adding the availability to Static SQL as well as finishing the Dynamic SQL support.

For Static SQL, the column is actually very interesting. I will now show you a list of little queries that you can use to see what it is and how to use it at your site.

First up: Baseline

How many static SQL statements do you have in the Db2 catalog at this moment?

SELECT COUNT(*)
FROM SYSIBM.SYSPACKSTMT A
FOR FETCH ONLY
WITH UR
;

Gives me:

---------+---------+---------+---------+---------+---------+
324675
DSNE610I NUMBER OF ROWS DISPLAYED IS 1
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

0 = 0 = 0 ?

But that includes the „dummy“ stuff in all packages, so let’s rerun with a better predicate:

SELECT COUNT(*)
FROM SYSIBM.SYSPACKSTMT A
WHERE NOT (A.SEQNO  = 0
       AND A.STMTNO = 0
       AND A.SECTNO = 0)
FOR FETCH ONLY
WITH UR
;

Which shows my true list of actual SQLs:

---------+---------+---------+---------+---------+---------
319015
DSNE610I NUMBER OF ROWS DISPLAYED IS 1
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

So how many different hashes do I have?

SELECT COUNT(*)
     , HEX(A.QUERY_HASH) AS QUERY_HASH
FROM SYSIBM.SYSPACKSTMT A
WHERE NOT (A.SEQNO  = 0 
       AND A.STMTNO = 0 
       AND A.SECTNO = 0)
GROUP BY A.QUERY_HASH
ORDER BY 1 DESC
FETCH FIRST 100 ROWS ONLY
FOR FETCH ONLY
WITH UR
;

Gives me:

---------+---------+---------+---------+-----
      QUERY_HASH
---------+---------+---------+---------+-----
73671 00000000000000000000000000000000
27228 40404040404040404040404040404040
12161 00000000000000000000000025B72000
 9821 40C7C5D7C9E240404040404040404040
 7372 00000000000000000000000024CC5000
 5989 000000000000000000000000257CD000
 5324 000000000000000000000000257CB000
 4530 40404070580000800000000000000000
 4265 1372041862047A61177D116D03002220
 4033 00000000000000000000000024E2F000
 3791 000000000000000000000000257EB000
 3114 27B830EAC5C4D44040C7C5D7C9E24040
 3089 7A69031C0174677E6844533C59555533
 2881 7B7F67796A17051E04077C027E142055
 2690 1473780D166A031F575A2F432047382F
 2446 6C166B000A6E13186161751F1A255340
 2264 6D760D7A75066A7A111A691E62592154
 2248 27B8353AC5C4D44040C7C5D7C9E24040
 2098 000000000000000000000000257BA000

Now I was a bit worried about all the low-values and all the 4040 entries but thought „The low-values are probably not bound or not executable or some such.“ The spaces were more worrying! Then I noticed the rows with lots of leading zeroes…

Details, Details…

At this point I thought we needed to break down the SQLs between true „real“ SQLs and „fake“ ones – FETCH, OPEN, CLOSE, SET etc. which are not EXPLAINable. So I added the EXPLAINABLE column to the select to see if I was right:

SELECT COUNT(*)
     , A.EXPLAINABLE
     , HEX(A.QUERY_HASH) AS QUERY_HASH
FROM SYSIBM.SYSPACKSTMT A
WHERE NOT (A.SEQNO  = 0 
       AND A.STMTNO = 0 
       AND A.SECTNO = 0)
GROUP BY A.EXPLAINABLE, A.QUERY_HASH
ORDER BY 1 DESC, 2
FETCH FIRST 100 ROWS ONLY
FOR FETCH ONLY
WITH UR
;

Gives me:

---------+---------+---------+---------+---------+--------
      EXPLAINABLE QUERY_HASH
---------+---------+---------+---------+---------+--------
73485 N           00000000000000000000000000000000
27228 N           40404040404040404040404040404040
12161 N           00000000000000000000000025B72000
 9821 N           40C7C5D7C9E240404040404040404040
 7372 N           00000000000000000000000024CC5000
 5989 N           000000000000000000000000257CD000
 5324 N           000000000000000000000000257CB000
 4530 N           40404070580000800000000000000000
 4055 N           1372041862047A61177D116D03002220
 4033 N           00000000000000000000000024E2F000
 3791 N           000000000000000000000000257EB000

Aha! So I guessed right all these, well over a third of *all* SQLs are not actually explainable and so a QUERY_HASH would be a little bit pointless.

Getting There…

So, now I added a predicate to remove all those:

SELECT COUNT(*)
     , A.EXPLAINABLE
     , HEX(A.QUERY_HASH) AS QUERY_HASH
FROM SYSIBM.SYSPACKSTMT A
WHERE NOT (A.SEQNO  = 0 
       AND A.STMTNO = 0 
       AND A.SECTNO = 0)
  AND NOT A.EXPLAINABLE = 'N'
GROUP BY A.EXPLAINABLE, A.QUERY_HASH
ORDER BY 1 DESC, 2
FETCH FIRST 100 ROWS ONLY
FOR FETCH ONLY
WITH UR
;

Which gives me much better data:

---------+---------+---------+---------+---------+--------
    EXPLAINABLE QUERY_HASH
---------+---------+---------+---------+---------+--------
372 Y           6014030D641C1325583D214B504C3750
372 Y           3E4E5C30101603600A60620574076268
312 Y           70106E0C106E150F7274790C53255340
307 Y           49335C5B4A1C6B6276101914001D6D73
248 Y           53473E64001574120C191862767E1360

Enhanced It All!

Then I enhanced the query to now join back to the SYSPACKAGE and show me columns of interest from there, especially TYPE as I had a suspicion!

SELECT A.COLLID, A.NAME, A.CONTOKEN, A.TIMESTAMP
      ,A.BINDTIME, A.VALID, A.OPERATIVE, A.LASTUSED
      ,A.TYPE, B.STATUS, B.EXPLAINABLE, B.STATEMENT
FROM SYSIBM.SYSPACKAGE  A
    ,SYSIBM.SYSPACKSTMT B
WHERE NOT (B.SEQNO  = 0 
       AND B.STMTNO = 0 
       AND B.SECTNO = 0)
  AND     A.LOCATION    = ''
  AND     A.LOCATION    = B.LOCATION
  AND     A.COLLID      = B.COLLID
  AND     A.NAME        = B.NAME
  AND     A.CONTOKEN    = B.CONTOKEN
  AND     B.QUERY_HASH  = X'00000000000000000000000000000000'
  AND NOT B.EXPLAINABLE = 'N'
ORDER BY 2 , 1
FOR FETCH ONLY
WITH UR
;

Not my TYPE

Scrolling right to the TYPE column:

----+---------+---------+---------+---------+---------+---------+----
VALID OPERATIVE LASTUSED   TYPE STATUS EXPLAINABLE STATEMENT
----+---------+---------+---------+---------+---------+---------+----
Y     Y         2021-12-23      C      Y      DECLARE DB2JCCCURSOR8 C
Y     Y         0001-01-01      H      Y      DECLARE DB2JCCCURSOR1 C
Y     Y         0001-01-01      C      Y      DECLARE DB2JCCCURSOR3 C
Y     Y         2016-09-02 1
Y     Y         2016-09-02 1
Y     Y         2016-09-02 1
Y     Y         0001-01-01 1
Y     Y         0001-01-01 1
Y     Y         2022-05-12 N
Y     Y         2022-05-12 N
Y     Y         0001-01-01 T
Y     Y         0001-01-01 T

Aha! Advanced Triggers (1) , Procedures (N) and Normal Triggers (T). Functions(F) would also be there. There’s a Gotcha here, too: RESTful Services identify themselves like a normal package but with HOSTLANG=’R‘, yet they do not have a usable Hash. So then I removed all of these from the picture like this:

SELECT A.COLLID, A.NAME, A.CONTOKEN, A.TIMESTAMP
      ,A.BINDTIME, A.VALID, A.OPERATIVE, A.LASTUSED
      ,HEX(B.QUERY_HASH) AS QUERY_HASH
      ,B.STATUS, B.STATEMENT
FROM SYSIBM.SYSPACKAGE  A
    ,SYSIBM.SYSPACKSTMT B
WHERE NOT (B.SEQNO  = 0 
       AND B.STMTNO = 0 
       AND B.SECTNO = 0)
  AND A.LOCATION    = ''
  AND A.LOCATION    = B.LOCATION
  AND A.COLLID      = B.COLLID
  AND A.NAME        = B.NAME
  AND A.CONTOKEN    = B.CONTOKEN
  AND A.TYPE        = ' ' -- ONLY REAL PACKAGES
  AND B.EXPLAINABLE = 'Y' -- ONLY EXPLAINABLE SQL
  AND NOT B.HOSTLANG = 'R' -- NO RESTFUL SERVICES 
ORDER BY 2 , 1
FETCH FIRST 100 ROWS ONLY
FOR FETCH ONLY
WITH UR
;

Then I got the real data out:

---------+---------+---------+---------+---------+---------+---------
COLLID     NAME    CONTOKEN TIMESTAMP
---------+---------+---------+---------+---------+---------+---------
PTFCOLL008 ADMDMEM +oæ (p8  2021-10-08-12.40.14.562716
PTFCOLL008 ADMDMEM +oæ (p8  2021-10-08-12.40.14.562716
PTFCOLL008 BUILDS  ?À ã     2016-12-28-08.50.58.486446
PTFCOLL008 CLEAN   ââ  r4   2016-12-28-08.50.59.911739
PTFCOLL008 CLEAN   ââ  r4   2016-12-28-08.50.59.911739

-+---------+---------+---------+---------+--
BINDTIME                   VALID OPERATIVE
-+---------+---------+---------+---------+--
2022-01-20-15.26.20.128052 Y     Y
2022-01-20-15.26.20.128052 Y     Y
2022-01-20-15.26.20.533383 Y     Y
2022-01-20-15.26.20.639940 Y     Y
2022-01-20-15.26.20.639940 Y     Y

-----+---------+---------+---------+---------+------
LASTUSED   QUERY_HASH                       STATUS
-----+---------+---------+---------+---------+------
2022-03-21 48534A5F7A741F0E75131067686F066A C
2022-03-21 585850457A760F066F091067686F0668 C
2022-02-18 6A0A777E720B7B671E703E40232C584B C
0001-01-01 59283E494E332341514B37572A29376F C
0001-01-01 6F051C041E1C136A0024374B293F3742 C

--------+---------+---------+---------+---------+---
STATEMENT
--------+---------+---------+---------+---------+---
DECLARE PTFTOOL-02 CURSOR FOR SELECT A . PMEMBERNAME
DECLARE PTFTOOL-01 CURSOR FOR SELECT A . PMEMBERNAME
DECLARE GET-MEMBER CURSOR FOR SELECT A . RMEMBERNAME
DECLARE PTFTOOL-02 CURSOR FOR SELECT D . PTFNO , D .
DECLARE PTFTOOL-01 CURSOR WITH HOLD FOR SELECT D . P

Finally, I got my „real“ useful HASH data with text.

And Now?

So what can you do with the QUERY_HASH? One simple thing, is to just use it to pull out all the duplicate (see note below!) SQLs that you have in different collections or packages:

SELECT A.COLLID, A.NAME, A.CONTOKEN, A.TIMESTAMP
      ,A.BINDTIME, A.VALID, A.OPERATIVE, A.LASTUSED
      ,B.STATUS, B.STATEMENT
FROM SYSIBM.SYSPACKAGE  A
    ,SYSIBM.SYSPACKSTMT B
WHERE NOT (B.SEQNO  = 0 
       AND B.STMTNO = 0 
       AND B.SECTNO = 0)
  AND A.LOCATION    = ''
  AND A.LOCATION    = B.LOCATION
  AND A.COLLID      = B.COLLID
  AND A.NAME        = B.NAME
  AND A.CONTOKEN    = B.CONTOKEN
  AND A.TYPE        = ' ' -- ONLY REAL PACKAGES
  AND B.QUERY_HASH  = X'621B6C6564170F63151C5E45544E4A40'
  AND B.EXPLAINABLE = 'Y' -- ONLY EXPLAINABLE SQL
  AND NOT B.HOSTLANG = 'R' -- NO RESTFUL SERVICES 
ORDER BY 2 , 1
FETCH FIRST 100 ROWS ONLY
FOR FETCH ONLY
WITH UR
;

This shows me all the packages with the selected QUERY_HASH value:

---------+---------+---------+---------+
COLLID           NAME     CONTOKEN
---------+---------+---------+---------+
MDB2VNEX_TEST    DSMALTER ) }_8
RTDX0510RCH      DSMALTER î Ö t Y
RTDX0510_AT      DSMALTER À´ Â ¢
RTDX0510_BE      DSMALTER _K W y
RTDX0510_COLL_AN DSMALTER À´ Â ¢
RTDX0510_DA      DSMALTER î Ö t Y

---------+---------+---------+---------+---------+----
TIMESTAMP                  BINDTIME
---------+---------+---------+---------+---------+----
2019-04-08-14.40.11.723943 2022-01-20-15.23.56.457297
2021-11-10-08.34.22.949593 2022-02-28-09.51.35.203521
2021-11-09-10.25.16.043349 2022-05-10-09.05.47.146861
2021-11-17-06.25.25.922112 2022-03-21-13.44.10.759957
2017-11-09-13.17.35.820393 2022-04-25-14.27.48.875174
2021-12-01-11.36.44.218374 2022-03-21-13.44.49.788358

---+---------+---------+---------+--
VALID OPERATIVE LASTUSED   STATUS
---+---------+---------+---------+--
Y     Y         2021-11-22 C
Y     Y         0001-01-01 C
Y     Y         0001-01-01 C
Y     Y         0001-01-01 H
Y     Y         2021-03-24 C
Y     Y         0001-01-01 H

--+---------+---------+---------+---
STATEMENT
--+---------+---------+---------+---
INSERT INTO DSM_ALTER VALUES ( : H ,
INSERT INTO DSM_ALTER VALUES ( : H ,
INSERT INTO DSM_ALTER VALUES ( : H ,
INSERT INTO DSM_ALTER VALUES ( : H ,
INSERT INTO DSM_ALTER VALUES ( : H ,
INSERT INTO DSM_ALTER VALUES ( : H ,

This does add an extra, pretty neat, element to the DBA’s tool kit.

Useful for You?

Do you think you will be using this feature or just let external tooling handle all of this for your system?

I would be very interested to hear any, and all, of your thoughts!

TTFN

Roy Boxwell

Note: One thing that I have noticed is that the hash algorythm is not actually that good! I get duplicates which are not actually duplicate SQLs when only one – four characters are different (typically table names!) Not really a major problem but something you had all better be aware of!

25. April 202225. April 2022

2022-04 A brief history of the Universal Tablespace (UTS) Part Two

This month I wish to finish off, what I started last month, my musings about UTS over the releases and years.

Db2 12 Changes to the UTS Picture

For Partitioned By Range (PBR), a brand-new space was created called the UTS PBR Relative Page Number (RPN) which was, in my opinion, the best thing in Db2 12! Quite simply, it allows the dynamic ALTERing of the partition and all of the related partitioned indexes DSSIZE on-the-fly even when a LOAD is running! This was great! Any users out there who have had a nightmare LOAD? Yep, now, as long as you are actively monitoring your data and index partition sizes, you can issue these ALTERs automatically and never hit the buffers like you do today.

DSSIZE gets propagated through

To enable this feature DSSIZE was improved to be settable at the data partition level and also extended to partitioned indexes. The available values were changed to allow any integer from 1 GB to 1024 GB. This allows extreme flexibility for all sizing requirements. Note, however, that NPSIs are still stuck in the middle of nowhere with just PIECESIZE as a helping hand…

Everything groovy???

So, what was wrong with this picture? Well, the change from PBR to PBR RPN was, shall we say, a little bit painful. The RID size got extended and as the RID is stored in *every* header page it required a tablespace reorg with partition-based inline image copies. Now, as you can well imagine, most people’s PBRs are pretty big, and allocating 4096 Virtual Tapes in parallel was just not going to happen! After a while IBM enhanced REORG so that you could put multiple copies on one tape, sort of like STACK, but much better – and not just for Tape but also for DASD. This has really accelerated the acceptance and usage of PBR RPN.

The future is bright!

Check this Blog entry:

https://www.idug.org/blogs/emil-kotrc1/2021/03/12/why-universal-table-spaces-uts

It is revealed that in Apollo (Db2 for z/OS vNext – Which has now been released as Db2 13 for z/OS), the ability to migrate from a PBG to a PBR will become available instead of the UNLOAD, DROP, CREATE, LOAD method which is the only way up until Db2 12. This will be very handy as the PBR RPN is the best way to go forward with large tablespaces (>60 GB) as long as you have *some* sort of available partitioning scheme, of course!

No more worries??

What do you need to worry about now? Well, you remember that huge LOAD I mentioned earlier that comes at the partition level? You must simply monitor the sizes of your partitioned objects and add a few GBs of space when required, on the fly, with no outage.

How?

Well, we have a little product called SAX+ which does exactly that! It starts the required OPx IFCID traces and computes the values against given thresholds, then automatically issues the ALTERs giving you a seamless experience with PBR RPNs. The only “problem” left now is when you are approaching the 1024 GB absolute PBR RPN physical limit. SAX+ warns you about this as well. Then you will have to either schedule a REBALANCE or a new LIMITKEY definition and REORG to spread the load again. However, when you know this well in advance it is no longer a serious problem!

Not yet at PBR RPN? – No problem!

PBRs which are not yet RPNs are monitored to warn when a threshold of usage in a data or index partition is exceeded. This also gives you more than enough lead time to get the REORG ready to switch it to RPN or just resize/rebalance your partitions.

What about PBGs?

PBGs are also fully supported within SAX+ as it can automatically add partitions, if desired, which additionally avoids SQLCODE -904’s. Plus, SAX+ adjusts the way it works depending on the MAXPARTITIONS and the actual number of allocated partitions. For example, MAXPARTITIONS 10 with a 90% warning would alert you when the ninth partition is allocated and in use. When the tenth partition gets allocated the warning switches from an LDS warning (Running out of available partitions) to a “last partition” filling up warning similar to PBRs which are not yet RPNs. This obviously helps a lot when you have MAXPARTITIONS 1 which is the IBM recommendation these days.

Anything else?

Naturally, SAX+ also takes care of the other problems that can catch you unawares:

Running out of Linear Datasets (LOB, PBG, XML, non-UTS space and NPSI)
Running out of space
Running out of extents and also badly fragmented extents
Running out of volumes
Running out of physical range for SEQUENCES or IDENTITY columns
Running out of physical range for Numeric Primary key columns
Running out of DBATs
Running out of space in SMS Storage Groups

All of the entries in the above list are very annoying when they hit you with no warning, especially running out of Linear Datasets. Think NPSIs here! Every PIECESIZE is an LDS and so you can run out of these quicker than you think.

LOG problems?

One last thing that SAX+ can help you with is detecting Db2 Log problems together with stalled logs… Not something you normally think of with a space management tool! When Db2 starts running out of space in active logs it issues the DSNJ110E message and I know shops who have „missed“ this alert. The problem was that the Db2 Log SMS Storage Group was getting full *and* the tape offload had stalled… As you can imagine this scenario is not pretty, so checking you Log storage groups is probably a good idea to guarantee you do not hit this problem!

That’s enough about UTS for this month. If IBM bring out another TLA for a TLA I will scream!

TTFN,

Roy Boxwell

21. März 202221. März 2022

2022-03 A brief history of the Universal tablespace (UTS)

To understand the UTS concept it helps to look back and see what we, at least in the Db2 world, started with!

In the beginning it was simple

In the beginning were simple tablespaces with the possibility of multiple tables all mixing rows with each other that meant performance got very bad after a very short period of time. We also got single table partitioned tablespaces that required a Partitioning Index (PI) to be created before you could use the table at all. You could also create non-partitioned indexes (NPIs) that were Unique or Non-Unique and were based on the data from all partitions. You could, sort of, manage the size of these using the PIECESIZE parameter, but most shops just took the default and never used it.

Something new in V2R1

Then in DB2 V2R1, we got segmented spaces where all of any given segment only held rows for one table. This improved performance a lot, especially for insert and mass delete processing. It stayed like this right up until DB2 V8.1 when the requirement for a PI was dropped as now you could “partition by table”. This was a leap forward and also brought in the new Data-Partitioned Secondary Index (DPSI) to allow a different partitioning scheme as opposed to the actual partitioning scheme. First, only non-unique DPSIs were allowed (Just think about up to 4096 index checks for breaking a unique rule!) Unique was then finally actually allowed in DB2 V9.1 as long as the DPSI columns were a superset of the partitioning columns.

DB2 V8.1 the *big* release!

DB2 V8.1 was also the release that renamed our good old NPIs to be non-partitioned secondary indexes (NPSIs) The rules for “When is an index a partitioned index (PI or DPSI)?” or “When is it a non-partitioned secondary index (NPSI)?” are still pretty hard to explain – and most users just seem to guess!

Death of a PI

The drop of the PI was very good but most shops kept them hanging around as the application code was using them all over the place. DPSIs became a problem right out of the get go causing massive index probes and performance headaches right up to this day. Nowadays, the vast majority of indexes are NPSIs with a few, hopefully well chosen, DPSIs and any required PIs.

DB2 V9.1 new things appearing

Then in DB2 V9.1 along came the new kid on the block – UTS.

IBM has mentioned on multiple occasions that any new Db2 features will only be developed for UTS usage, which has been seen with the developments of CLONE spaces, HASH spaces, Currently committed locking behavior, Pending DDL, Inline LOBs, XML multi-versioning, and ALTER TABLE with DROP COLUMN. This list just keeps on growing.

The Universal TableSpace was born!

The UTS was created with two distinct and very different flavors.

Partition-By-Range (PBR) is a single-table, segmented tablespace that is partitioned on data values that are given at the table level. Partitions can be from 1 GB DSSIZE up to 64 GB DSSIZE in DB2 V9.1 (In DB2 10 it rose to 256 GB DSSIZE.) The maximum physical size of the partitions and the corresponding partitioned and non-partitioned indexes are all, relative to each other, the same. This was a modernization of the original partitioned table space with its index-based partitioning scheme.

PBRs got used straightaway as the benefits of the segmented definition were quickly realized. They gave fast spacemap usage for variable length rows and much faster mass delete processing. Changing the maximum possible size, the DSSIZE, of a partition or even of a partitioned index was still very nasty as it required a complete tablespace REORG to action. For NPSIs there was no real change – still just PIECESIZE here to, sort of, manage the maximum size of the datasets.

Calculating the maximum possible sizes is also “non-trivial”. For the data partition it is relatively straightforward but for the PI, the DPSI, and the NPSI it is a pretty nasty set of calculations you must do – and any mistake can quickly lead to an SQLCODE -904!

Partition-By-Growth (PBG) is a single-table, segmented tablespace that can have from 1 to 4096 partitions, each of which can be from 1 GB DSSIZE up to 64 GB DSSIZE in DB2 V9.1 (In DB2 10 it rose to 256 GB DSSIZE.) The PBG cannot have a partitioning key but can have multiple NPSIs, if desired. The PBG was basically viewed as just one huge container where you could just insert rows forever. This has benefits and drawbacks!

Benefits are: No worries about PI and DPSI definitions as you can only have NPSIs. No worries about running out of space – I mean 4096 huge partitions is a vast amount of space!

Drawbacks are: No PI and so no way to balance the partitions, a terrible space map search algorithm that simply kills mass insert processes and a huge problem with utilities as it is “one huge space” – and reorganizing one part in the middle of 100’s just causes grief. Plus the inherent problems in copying or cloning data are more complicated, especially if the number of parts on source and target are not the same.

It all started so easily…

In the beginning people used MAXPARTITIONS 4096 thinking “I will never need to change this” but very very quickly it turned out that this was a disastrous idea! The problem being that *all* of the partitions required some control block storage in memory, even if not physically defined or even in use! This caused massive problems and the recommendation to drop down a *lot* in the usage of MAXPARTITIONS. Over the years it has dropped down even more and so now, as of Db2 12, the IBM recommendation for a PBG definition is:

MAXPARTITIONS 1 DSSIZE 64 GB

For the older readers amongst us it should be apparent that this is the good old Linear Dataset (LDS) limit for simple and segmented spaces but rolled out in a new way! For the younger people out there, when DB2 began you could only have 2GB in one VSAM dataset, and so simple and segmented datasets got the ability to allocate up to 32 VSAM datasets, thus giving us the *massive* amount of 64GB space! Coincidence? I think not!

That’s enough about UTS for this month, come back next month for a wrap up about how Db2 12 has changed this picture and all the ways to handle these spaces in the best possible way!

TTFN,

Roy Boxwell

15. Februar 202215. Februar 2022

2022-02 ZPARMs never stop changing!

This month I want to go through some of the absolutely most important ZPARMs that control how your Db2 systems behave in a very significant manner. All of the following ZPARMs have a performance impact of some sort and we are always trying to squeeze the last drop of performance out of our Db2 sub-systems, aren’t we?

Starting with the Easy Stuff…

CACHEDYN. YES/NO, default YES. Should always be set to YES unless you do not care about saving dynamic SQL performance. Back a few decades ago, the recommendation was to have this set to NO as default! Hard to believe that these days, where most shops have 80% – 90% dynamic SQL during the day!

Now we Get to the Numerics!

OUTBUFF. 400 – 400,000, default 4,000. This is *extremely* important and you really should set it to the highest possible value you can afford in real memory! As a minimum, it should be 102,400 KB (100MB). This is the buffer that Db2 uses to write log records before they are „really“ written to disk. The larger the buffer, the greater the chance that by a ROLLBACK the data required is in the buffer and not on disk. This is a big win and the default of 4,000 KB is crazy low!

Skeletons in the Closet?

EDM_SKELETON_POOL. 5,120 – 4,194,304, default 51,200. This is one of my personal favorites (I wrote a newsletter solely on this a few years ago) The default is way to small these days. I personally recommend at least 150,000 KB and actually even more if you can back it with real memory. Just like OUTBUFF, pour your memory in here but keep an eye on paging! If Db2 starts to page you are in serious trouble! Raising this can really help with keeping your DSC in control.

DBDs are Getting Bigger…

EDMDBDC. 5,000 – 4,194,304, default 23,400. The DBD Cache is getting more and more important as, due to UTS usage, the size of DBDs is increasing all the time. The default just doesn’t cut the mustard anymore so jump up to 40,960 as soon as you can.

DSC is Always too Small!

EDMSTMTC. 5,000 – 4,194,304, default 113,386. The EDM Statement Cache (really the Dynamic Statement Cache) is where Db2 keeps a copy of the prepared statements that have been executed. So when the exact same SQL statement with the exact same set of flags and qualifiers is executed, Db2 can avoid the full prepare and just re-execute the statement. This is basically a no-brainer and should be set to at least 122,880 KB. Even up to 2TB is perfectly ok. Remember: A read from here is *much* faster than a full prepare, so you get a very quick ROI and great value for the memory invested! Keep raising the value until your flushing rates for DSC drop down to just 100’s per hour, if you can! Remember to cross check with the EDM_SKELETON_POOL ZPARM as well. It always takes two to Tango…

How Many SQLs?

MAXKEEPD. 0 – 204,800, default 5,000. The Max Kept Dyn Stmts parameter is how many prepared SQLs to keep past commit or rollback. It should be set to a minimum of 8,000 or so. Raising this might well cause a large memory demand in the ssidDBM1 address space so care must be taken.

RIDs Keep Getting Longer…

MAXRBLK. 0, 128 – 2,000,000, default 1,000,000. RID POOL SIZE is the maximum amount of memory to be available for RID Block entries. It should be at least 1,000,000 and, if you can, push it to the maximum of 2,000,000. Unless you want to switch off all RID Block access plans in which case you set it to zero – Obviously not really recommended!

Sorts Always Need More Space

MAXSORT_IN_MEMORY. 1000 to SRTPOOL. The maximum in-memory sort size is the largest available space to complete ORDER BY, GROUP BY or both SQL Clauses. Remember that this is per thread, so you must have enough memory for lots of these in parallel. The number should be between 1,000 and 2,000, but whatever value you choose, it must be less than or equal to the SRTPOOL size.

Sparse or Pair-wise Access?

MXDTCACH. 0 – 512, default 20. Max data caching is the maximum size of the sparse index or pair-wise join data cache in megabytes. If you do not use sparse index, pair-wise join, or you are not a data warehouse shop, then you can leave this at its default. Otherwise, set it to be 41 MB or higher. If it is a data warehouse subsystem, then you could set this as high as 512 MB. (This ZPARM replaced the short-lived SJMXPOOL, by the way.)

Sort Node Expansion

SRTPOOL. 240 – 128,000, default 10,000. SORT POOL SIZE is the available memory that is needed for the sort pool. The default is 10,000 KB and should really be set to 20,000 KB at least, if not more! IFCID 96 can really help you size this parameter. Remember that the number of sort nodes leapt up from 32,000 in Db2 11 to 512,000 nodes for non-parallelism sorts and 128,000 nodes for a sort within a parallel child task in Db2 12. This means raising this ZPARM can have an even greater positive effect than before.

Your „Top Ten List“

These ten ZPARMs really influence how your Db2 system works and so must always be changed with great care and attention to detail. Always do a before and after appraisal to see whether or not changing them helped or hindered your system!

And Finally…

I have updated our Pocket Tool, Performance Health Check, to check and report all these ZPARMs, as well as all the other checks like the 6 Byte RBA/LRSN or the Mapping table changes or the reason for REORG etc etc. Feel free to download and run it, as it is but a click away!

If you have any comments, or other ZPARMs you think are also important for performance, feel free to drop me a line!

TTFN,

Roy Boxwell

27. Januar 202227. Januar 2022

2022-01 Fazed by phases?

This month I wish to do a quick run through, and review, of the effects of REBIND with active packages and how phases, both in and out, have given us some interesting problems to deal with!

Phase-In

As I mentioned in an earlier blog, the phase-in/phase-out was a very good idea indeed! It finally meant that we could do REBINDs whenever we wanted and no-one had to shut down servers to simply action a REBIND. This was always especially galling when it was an „empty“ package or just to get an FL upgrade.

Allowed?

Remember, Phase-in is only allowed with PLANMGMT(EXTENDED) style packages and if the package is not a generated package for a trigger or SQL routine, such as a procedure or user-defined function.

Problems ahead?

The problems began soon after it was introduced in June 2019 with PH09191 and FL505. Now, it looked really good, and indeed it was, but it had one little flaw… it still required a SIX lock to do the business.

Problem solved?

This was solved in January 2021 with APAR PH28693 which changed the lock from a SIX to a much better U lock. With this APAR all of our problems were fixed and life would be good!

Nope…

Sadly, we then found out that the inactive/phased-out packages were actually causing rapid space growth in the SYSPACKCOPY (Not in SYSPACKAGE!) Remember that the phased-out packages all get copied across with a new number to the „back-up“ SYSPACKCOPY for use until the active thread finally disconnects.

The Problem Gets Worse

Now these backed-up, old packages are sitting around in SYSPACKCOPY waiting until the next REBIND comes along whereupon Db2 will attempt a FREE. Various customers noticed that, for their intensely used packages, this free never really happened and so they „hit the buffers“ of 14 copy_ids for packages… very nasty!

Another APAR to the Rescue!

IBM then created a new APAR, PH33295, that enhanced the FREE PACKAGE command to have a new PLANMGMTSCOPE sub-clause PHASEOUT. All this does is delete all of those old, no longer used packages. However, I can testify that doing a:

FREE PACKAGE(*.*.(*)) PLANMGMTSCOPE(PHASEOUT)

in production is one of the scariest commands I have ever issued!!! Personally, I would have preferred a brand new command like FREE PHASEDOUT PACKAGES or so…

Invalid as Well

While you are ridding yourself of 100’s of packages you could also issue the PLANMGMTSCOPE(INVALID) command which removes the phased-out and also gets rid of the dead, invalid packages which cannot ever be used anyway.

Do not Forget to Tidy up!

Once all these commands have been done, a REORG of the tablespace DSNDB06.SYSTSPKC is very highly recommended!

How have you experienced phase-in and phase-out? Do you think it’s a great feature or not?

Whatever you think, I would love to hear from you!

TTFN,

Roy Boxwell

15. Dezember 202116. Dezember 2021

2021-12 What’s in a Log anyway?

Hi! This month I wish to briefly delve into the inner workings of the D2b Log. The topic is very broad and complex and so this newsletter is only really skimming the surface of this topic!

What is the Log?

The Db2 Log is the central area where all data changes, plus a whole lot more, gets written away by Db2 as it does its normal work. Most shops allocate several large VSAM datasets for the Active Logs which, when full, get written off to Archive Logs.

Rules of the Log

How much Log do you need? The golden rules are all time based:

1) At least 24 hours of coverage on your Active Logs

2) At least 30 days of coverage on your Archive Logs

Any less and you could seriously run into trouble, breaking the 24 hour rule means that possibly normal ROLLBACKs might suddenly be requesting tape mounts which is not pretty and breaking the 30 days rule might put you into a world of pain when data sets get migrated off and scratched in a recovery scenario.

Take Care of Your Logs!

These Logs must be looked after and cared for as they save your company’s life on a daily and hourly basis. Normally, no-one really cares about the logs as they just „work“ and that’s it! However, the Db2 Log is actually a bit of a bottleneck these days.

Remember LOG NO?

Many years ago, Roger Miller said „LOG NO datasets will be implemented in DB2 over my dead body“ as he was pretty adament that LOG NO was a bad idea. The driver behind the requirement was just the sheer size of the logs being written by Db2 and the belief that writing fewer logs would make application faster.

How Many?

When you look at the history of Db2 you can see it started with between two and 31 active logs and between 10 and 1000 archive logs. We all thought „that will be more than enough“. Nowadays we have between two and 93 active and between 10 and 10,000 archives! Some shops rotate through their six byte RBAs in 24 hours and go through 1000’s of logs so we have grown up a little bit!

Before it Hits the Log…

Remember that there is another really important ZPARM that affects the LOG before the log is even hit – OUTBUFF it started out at a value between 40K and 4000K and is now between 400(K) and 400000(K). Just set it to the highest value you can! Db2 will always look here first before even looking at the active log and so if the data is here it is much faster than VSAM access!

Bottleneck?

Some customers were convinced that one area that was slowing down Db2 was the log write and the externalization of the data within. Roger was 100% against this but even he lost this battle and so the LOG NO space was born. However, if you ever do a ROLLBACK then the space goes into COPY pending which is horrible!

Checkpoint Charlie

The number of system checkpoints also has a direct impact on log size and usage. You must decide whether you wish to go time based (IBM-recommended way is three minutes) or transaction based – or even a mix of the two methods. I am in favour of keeping my life simple so I would always recommend the checkpoint every three minutes rule. You could argue that at „quiet times“ overnight too many checkpoints will be taken, but I counter that argument with „When was the last time you ever had a quiet time on your machine?“

Index Split?

Believe it or not, index splits seem to take up some 55% of the Db2 Log at one shop I visited. We decided to try different index pages sizes , including compression, and this all really helped in reducing the log load pressure. This has a direct effect on DASD, elapsed and CPU time which was an allround winner!

Crystal Ball?

So, how do you look into the Db2 Log? If you have no tooling then you only get the IBM stuff which is – shall we say – basic. It was here that I decided to write a little COBOL program, that I have called Db2 Archive Log Viewer for Db2 z/OS, that would read all of the Archive logs (No VSAM – keep it simple!) and handle normal non-spanned log records to give me a glimpse into what on earth Db2 was writing into the darn things!

Log HealthCheck

So what does Db2 Archive Log Viewer for Db2 z/OS then do? Well, it reads as many Archive Logs as you can give it and reports on the contents of the Logs in absolute numbers and in size. Using this data enables you to get a new view into what your Db2 system is actually doing and who is causing all your logs to be written.

Surprise!

I was pretty surprised by what I saw inside my logs and I hope you find it interesting to peer inside your logs!

If you have any ideas or desires about log datasets, feel free to email me!

TTFN

Roy Boxwell

23. November 202123. November 2021

2021-11 When is an NPI not an NPI?

This month, I wish to discuss everything about indexes but focusing on Non-Partitioned Index’s (from now on NPIs). I must confess that the real name of an NPI is a Non-Partitioned Secondary Index but I really dislike that as a term…

In the Beginning was the Index

Well, actually, it was the data, but then we needed a quick way of accessing the data and so the index was born. In Db2, these are b-tree based indexes with a few system pages and then a hierarchy of non-leaf pages starting with the root page, which then branches down in levels until it finally hits the leaf page – and these contain the direct pointers to the data. Very fast and just a few getpage I/Os to read the non-leaf/leaf pages in.

Hope Springs Eternal

The hope is that all the system stuff, the root page, and quite a few other non-leaf pages will be hanging around in the bufferpool and so no real I/O is required, but you all know the real world? It never is like that!

Bigger and Bigger

Tables and tablespaces got bigger and bigger over time, so then the Partitioned Tablespace was born. At first it required a partitioning index (PI), but nowadays you should all be using table based partitioning.
Brief aside: Recently, there was a mega outage caused by a really nasty bug in Db2 that brought everything crashing down (PH41947). Root cause was the use of index-based partitioning – still! Do IBM test all these old things on all levels of code? Nope. Time to get my Migration HealthCheck freeware and migrate away from index-based partitioning as fast as you can!

A Star was Born

Well ok, not really a star but the PI started life a little bit dazed and confused, and the formula for calculating „how big can my PI can get before it goes *boom*“ is also impressive! (Check out my Know Your Limits presentation for the formulae!)

And Then Came the NPI

Now this PI was naturally unique, but we all needed a way to check for other data and possibly non-unique column combinations. So the non-partitioned index (NPI) was born. This is an index over the *complete* partition range and was normally heavily used by application SQL.

Pain points began

First major problem was the REORG of a single partition. It did the Tablepart (TP) data and it did the Index Part (IP), but it also had to update all the pointers within the NPI. This used to be terribly slow. At one shop, when reorging 22 partitions of a 254 partition TS it took 26 hours at the TP level and only 18 hours at the complete TS level. This is when we added to our RealTime DBAExpert utility generation software the ability to „roll up“ from a TP reorg to a TS reorg based on number of partitions being REORGed, percentage of partitions being REORGed, or whether or not at least one NPI existed.

A Piece of Cake?

The other major problem with NPIs is that they can have a PIECESIZE, which is how large a Linear Page Set (LPS) can get before a new one is created. You can have from 1 to 4096 pieces for a single NPI, and the calculation to work out how many pieces you are allowed to have is another brain-numbingly complex one – which is also in my limits presentation.

Some sample SQLs for you:

-- FIRST YOU MUST CALCULATE THE MAXIMUM NUMBER OF POSSIBLE PARTITIONS
SELECT INTEGER(MIN(4096 , POWER(2E00 , 32)
/ (
(64E00 * 1024E00 * 1024E00 * 1024E00) -- DSSIZE TS
/ (32 * 1024) -- TS PGSIZE
)
)) AS MAX_NBR_PARTS
FROM SYSIBM.SYSDUMMY1 ;
-- THEN YOU PLUG THAT NUMBER INTO HERE
SELECT MIN(
(64E00 * 1024E00 * 1024E00 * 1024E00) -- DSSIZE TS
, (POWER(2E00 , 32) / 2048 -- NBR PARTS
) * 4096 -- IX PGSIZE
) AS MAX_INDEX_SIZE
FROM SYSIBM.SYSDUMMY1 ;
-- FOR AN NPI YOU JUST NEED THE PIECESIZE AND THE INDEX PAGESIZE
SELECT INTEGER(MIN(4096 , POWER(2E00 , 32)
/ (
(8E00 * 1024E00 * 1024E00 * 1024E00) -- PIECESIZE
/ 4096 -- IX PGSIZE
)
)) AS MAX_PIECES
FROM SYSIBM.SYSDUMMY1 ;

Death by Index

Suffice it to say you can be killed by running out of space in your PI, running out of space in your NPI, or even running out of NPI LPS pieces…

A New Type of Index

A brand new index type, data-partitioned secondary index, DPSI was created in Db2 V8 so that you could access partitions without having to scan thousands of datasets. Naturally, DPSIs only work if the application SQL is cognizant of their definition and index. Failure to do so can give terrible performance. These are naturally also bound to their respective partitions. It was also at this time that NPIs got rebranded to be NPSIs…

Separation for Utilities is Good

The major utility benefit that partitioned TSs give, is the ability to run them in parallel, including RECOVER. It was a really great idea so that you would only REORG, COPY or RUNSTATS the partitions that required it, and all the partition level utilities could run in parallel.

Db2 9 was All Change

To make these REORGS faster, IBM changed the way REORG worked with partitioned objects in Db2 9. They did speed it up but it meant that you could no longer work in parallel! You could add ranges to the PARTs being reorged but still not in parallel. This caused much grief and gnashing of teeth amongst loads of customers. Thus, we developed the so-called NPI Job. Anything that has a dependency on an NPI could be squirreled away in the NPI job(s) to then be run sequentially by the job scheduler after all other jobs were finished. Nasty!

Faster, Better, Cheaper

Now, after over 35 years, the b-tree is showing its age. IBM came up with the FTB or FIT which, simply put, copies all the non-leaf pages into a private area of the bufferpool and then, using L2 cache-aware code, removes the need for I/Os, apart from the very last two, leaf and data page, and thus reduces I/O and CPU.

Limits Again

Of course this is not for *every* index. Only unique 64 bytes long or less with no IOE, TIMESTAMP with TIMEZONE, or versioning allowed. But hey, it was a start!

Duplicate Allowed

Now even duplicate indexes are there. Well, to start with, only 56 bytes or less but we are getting there! You will require PH30978 UI75643 and then set the new ZPARM FTB_NON_UNIQUE_INDEX to be YES.

Is an NPI Always an NPI?

Now to the core of the matter. As you should all be aware we can, talking about base tablespaces, only create Universal Table Spaces (UTS) in the three flavors we know and love – Partitioned by Growth, Partitioned by Range, and Partitioned by Range Relative Page Numbering.

And???

Well, imagine you have a TS with MAXPARTITIONS = 1 – by definition it cannot go to multiple partitions but all indexes created on it are also by definition NPIs. This is also sort of true for MAXPARTITIONS > 1 if the current number of defined and existing partitions (NUMPARTS) is equal to one.

„Fake“ NPIs

These are therefore, in my humble opinion, fake NPIs – You could argue that the NUMPARTS = 1 is „risky“, as Db2 could add a new partition by the next INSERT/UPDATE, but I think the risk is low. For these cases I think the jobs should not go to special NPI handling.

What are your Db2 maintenance jobs doing with these fake NPIs?

I would be very interested to hear from you all about your thoughts and plans here!

TTFN

Roy Boxwell

19. Oktober 202127. Oktober 2021

2021-10 Creating Clones

This month I’m reviewing CLONE Table usage. It is one of several, what I call „esoteric“, Db2 abilities/functions that I will be running through over the coming months, plus some blogs that are either badly misunderstood, not used by anyone, or just very odd.

Attack of the Clones

Clones arrived in a blaze of glory way back in DB2 9 (remember that capital B?) and then promptly disappeared. I have had nothing to do with them – ever – and I only received one question about their usage. Until now…

What Changed?

Well, what happened, is that I was asked if our RealTime DBAExpert utility generating software worked with CLONE tables, and I had to do some quick checking in my head about *what* these things were!

How Do They Work?

So what is a CLONE Table? It is basically a duplicate table that lives in the „same“ tablespace but with a different INSTANCE. This is the first place where people make mistakes. You read a lot about renaming the VSAM LDS. That *never* happens with CLONEs. The „trick“ that IBM uses is the INSTANCE, but I am getting ahead of my self here!

In the Beginning…

Create a Database, Tablespace and a Table with a couple of indexes:

CREATE DATABASE "TESTDB"
BUFFERPOOL BP0
INDEXBP BP0
STOGROUP SYSDEFLT
;
COMMIT ;
CREATE TABLESPACE "TESTTS" IN "TESTDB"
USING STOGROUP SYSDEFLT
PRIQTY -1
SECQTY -1
ERASE NO
FREEPAGE 5
PCTFREE 10
GBPCACHE CHANGED
TRACKMOD YES
LOG YES
DEFINE YES
DSSIZE 1 G
MAXPARTITIONS 1
BUFFERPOOL BP0
LOCKSIZE ANY
LOCKMAX SYSTEM
CLOSE YES
COMPRESS NO
MAXROWS 255
SEGSIZE 32
;
COMMIT ;
CREATE TABLE BOXWELL.TEST_BASE
(COL1 CHAR(12) NOT NULL
,COL2 INTEGER NOT NULL
,COL3 INTEGER NOT NULL)
IN TESTDB.TESTTS
;
COMMIT ;
CREATE UNIQUE INDEX BOXWELL.TEST_BASE_IX1 ON BOXWELL.TEST_BASE
(COL1, COL2, COL3)
USING STOGROUP SYSDEFLT
PRIQTY -1
SECQTY -1
ERASE NO
FREEPAGE 5
PCTFREE 10
GBPCACHE CHANGED
BUFFERPOOL BP0
CLOSE YES
COPY NO
;
COMMIT ;
CREATE INDEX BOXWELL.TEST_BASE_IX2 ON BOXWELL.TEST_BASE
(COL2, COL3)
USING STOGROUP SYSDEFLT
PRIQTY -1
SECQTY -1
ERASE NO
FREEPAGE 5
PCTFREE 10
GBPCACHE CHANGED
BUFFERPOOL BP0
CLOSE YES
COPY NO
;
COMMIT ;

Insert some data:

INSERT INTO BOXWELL.TEST_BASE VALUES ('A', 1 , 2);
INSERT INTO BOXWELL.TEST_BASE VALUES ('B', 1 , 2);
INSERT INTO BOXWELL.TEST_BASE VALUES ('C', 1 , 2);
INSERT INTO BOXWELL.TEST_BASE VALUES ('D', 1 , 2);
INSERT INTO BOXWELL.TEST_BASE VALUES ('E', 1 , 2);
INSERT INTO BOXWELL.TEST_BASE VALUES ('F', 2 , 2);
INSERT INTO BOXWELL.TEST_BASE VALUES ('G', 2 , 2);
INSERT INTO BOXWELL.TEST_BASE VALUES ('H', 2 , 3);
INSERT INTO BOXWELL.TEST_BASE VALUES ('I', 2 , 3);
INSERT INTO BOXWELL.TEST_BASE VALUES ('J', 2 , 3);
COMMIT ;

What Says RTS?

First, make sure the real-time statistics (RTS) have all been externalized:

-ACCESS DATABASE(TESTDB) SPACENAM(*) MODE(STATS)

Then run a little SQL:

SELECT *
FROM SYSIBM.SYSTABLESPACESTATS
WHERE DBNAME = 'TESTDB'
;
SELECT *
FROM SYSIBM.SYSINDEXSPACESTATS
WHERE DBNAME = 'TESTDB'
;

You should see one row from SYSTABLESPACESTATS with 10 TOTALROWS and 10 REORGINSERTS etc. and two rows from SYSINDEXSPACESTATS with 10 TOTALENTRIES and 10 REORGINSERTS etc. Now we have what I call the „base“ table.

Use ISPF as well…

In ISPF 3.4 you should see datasets like this:

DB2DC1.DSNDBD.TESTDB.TESTRBAS.I0001.A001
DB2DC1.DSNDBD.TESTDB.TESTTS.I0001.A001
DB2DC1.DSNDBD.TESTDB.TEST1BZC.I0001.A001

The Fun Begins …

Now we create a CLONE. To do this, you do *not* CREATE – that would be way too easy – a CLONE Table. You actually issue an ALTER statement like this:

ALTER TABLE BOXWELL.TEST_BASE
  ADD CLONE RINGO.AARDVARK
;
COMMIT ;

Now do that RTS select and the ISPF 3.4 again … As if by magic you will now see double the rows in the RTS … Check out the INSTANCE column:

------+---------+---------+
PSID  PARTITION  INSTANCE
------+---------+---------+
   2          1         1
   2          1         2

Aha! We now have two sets of RTS Counters – This is a good thing! ISPF also looks different:

DB2DC1.DSNDBD.TESTDB.TESTRBAS.I0001.A001
DB2DC1.DSNDBD.TESTDB.TESTRBAS.I0002.A001
DB2DC1.DSNDBD.TESTDB.TESTTS.I0001.A001
DB2DC1.DSNDBD.TESTDB.TESTTS.I0002.A001
DB2DC1.DSNDBD.TESTDB.TEST1BZC.I0001.A001
DB2DC1.DSNDBD.TESTDB.TEST1BZC.I0002.A001

Notice all the INSTANCE values here?

Finally the Boss Guy – SYSTABLESPACE. Here is where the access is controlled using, yet again, INSTANCE and its good friend CLONE:

SELECT *
FROM SYSIBM.SYSTABLESPACE
WHERE DBNAME = 'TESTDB'
;

--+---------+--
INSTANCE CLONE
--+---------+--
       1 Y

This is showing you all the information you need. The current base table is still the original table and this tablespace is in a „clone relationship“ – slightly better than „it’s complicated“ but close!

Test Select

Run this to see what you get back:

SELECT COUNT(*) FROM BOXWELL.TEST_BASE ; 
SELECT COUNT(*) FROM RINGO.AARDVARK    ;

You should get ten from the first count and zero from the second.

So What Is the Point?

Now we, finally, get to the raison d’être of CLONEs. The idea is that using table name ringo.aardvark you can INSERT data, perhaps very slowly over a period of days, into the CLONE TABLE and the application is not aware of and cannot be affected by it. Once the INSERT processing is completed you may then do the EXCHANGE DATA command to instantaneously swap the tables around. OK, it must actually just do a one byte update of the INSTANCE column in the SYSTABLESPACE, but I digress…

Here’s How it Looks

EXCHANGE DATA BETWEEN TABLE BOXWELL.TEST_BASE
                        AND RINGO.AARDVARK
;
COMMIT ;

Now do those COUNT(*) SQLs again:

SELECT COUNT(*) FROM BOXWELL.TEST_BASE
---------+---------+---------+--------
0
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

SELECT COUNT(*) FROM RINGO.AARDVARK
---------+---------+---------+--------
10
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

Whoopee! You can see that the table name has not changed but all the data has! This is pretty cool!

Downsides …

Never a day without pain, my mother used to say, and CLONEs come with a bunch of pain points!

Pain Point Number One

Reduced utility support. You can only run MODIFY RECOVERY, COPY, REORG (without inline statistics!) and QUIESCE against these. Why? Because there is only one set of catalog statistics for them. A RUNSTATS would destroy all of the data for *both* objects and the current object access paths might all go south; further, you absolutely *must* add the keyword CLONE to the utility control cards. You *cannot* rely on LISTDEF to do this for you and it is documented:

This utility processes clone data only if the CLONE keyword is specified. The use of CLONED YES on the LISTDEF statement is not sufficient.

but people still miss this and then *think* they are working with their clones but they are not! This can get very embarrassing…

Pain Point Number Two

You must remember to RUNSTATS the „new“ base after the EXCHANGE has been done. The RTS is always in step, the Catalog is probably way out of line…

When You Are Quite Finished

Once a CLONE table is no longer required you can easily drop it but naturally not with a DROP but with another ALTER statement:

ALTER TABLE BOXWELL.TEST_BASE
      DROP CLONE
;
COMMIT ;

Pain Point Number Three

The problem here is not that bad, but, depending on when you do the DROP CLONE, your „base“ could be the instance two! You, and your vendors, must make sure your Db2 and non-Db2 utilities are happy with this state of affairs!

RealTime DBAExpert?

Yep, we are in the clear! Our software does indeed support these esoteric beasts.

Over To You!

Do you use CLONEs? If so, why? Any war stories or is everything hunky dory?

As usual I would love to hear from you!

TTFN,

Roy Boxwell

Updates

I got quite a few updates about clones:

DDL Disaster

One of the major issues that I missed was another pain point: DDL Changes. These are really nasty as you must throw away your clone before you can do the ALTER and then recreate the CLONE relationship.

Commands

I also did not mention that various commands also need the CLONE keyword to be applied to CLONE spaces. For example -START DATABASE(xxx) SPCENAM(yyy) CLONE