2021-06 How large is my system?

After I held my Deprecated Items IDUG talk at the Virtual 2021 North American IDUG one of the attendees asked me if I also had some SQL for quickly retrieving the current „size of the database“ – Normally I use our very own SpaceManager product for this as it uses CSI access calls to get up to the second info about the size of the VSAM LDS, however for a quick and dirty query the Db2 catalog can be partially trusted!

Where to begin?

The easiest places to start are the two SYSIBM.SYSxxxxxPART tables. These contain all the partition details and if the PARTITION is zero then it is a non-partitioned tablespace or it is a NPSI defined index.

What to select?

SELECT COUNT(*)                          AS INDEX_PARTS
      ,SUM(IP.SPACEF)                    AS KB
      ,INTEGER(SUM(IP.SPACEF) / 1024)    AS MB
      ,INTEGER(SUM(IP.SPACEF) / 1048576) AS GB
FROM SYSIBM.SYSINDEXES   IX
    ,SYSIBM.SYSINDEXPART IP
WHERE     IX.NAME    = IP.IXNAME
  AND     IX.CREATOR = IP.IXCREATOR
  AND NOT IP.SPACE   < 0
;

This query just shows you the current size of all of your indexes. Notice the NOT IP.SPACE < 0 – This is to not count any DEFINE NO indexes as these use no space of course! I simply added the two INTEGER functions – naturally you can change these to any format you would like.

Table time!

SELECT COUNT(*)                          AS TABLE_PARTS
      ,SUM(TP.SPACEF)                    AS KB
      ,INTEGER(SUM(TP.SPACEF) / 1024)    AS MB
      ,INTEGER(SUM(TP.SPACEF) / 1048576) AS GB
FROM SYSIBM.SYSTABLESPACE TS
    ,SYSIBM.SYSTABLEPART TP
WHERE     TS.NAME   = TP.TSNAME
  AND     TS.DBNAME = TP.DBNAME
  AND NOT TP.SPACE  < 0
;

This does the same but now for tableparts of course!

More detail?

Now these two SQLs are OK for quick results but it is probably more interesting to see which index types and which tablespace types are taking up how much space on your disks. So here the expanded queries follow:

SELECT COUNT(*)                          AS INDEX_PARTS
      ,IX.INDEXTYPE
      ,IX.IX_EXTENSION_TYPE
      ,IX.PAGENUM
      ,SUM(IP.SPACEF)                    AS KB
      ,INTEGER(SUM(IP.SPACEF) / 1024)    AS MB
      ,INTEGER(SUM(IP.SPACEF) / 1048576) AS GB
FROM SYSIBM.SYSINDEXES   IX
    ,SYSIBM.SYSINDEXPART IP
WHERE      IX.NAME    = IP.IXNAME
   AND     IX.CREATOR = IP.IXCREATOR
   AND NOT IP.SPACE   < 0
GROUP BY IX.INDEXTYPE
        ,IX.IX_EXTENSION_TYPE
        ,IX.PAGENUM
ORDER BY IX.INDEXTYPE
        ,IX.IX_EXTENSION_TYPE
        ,IX.PAGENUM
;

The INDEXTYPE column can be Blank (For ancient Type 1 indexes which I hope no-one has), 2 for „normal“ Type 2 Indexes, D for Data-partitioned Secondary Indexes (DPSIs) and P for Indexes which are both Partitioned and Partitioning (which I also hope no-one has anymore!)

What’s in a TYPE?

The IX_EXTENSION_TYPE column can be Blank for „normal“ indexes, N for Node ID indexes on XML spaces, S for Scalar Expressions, T for Spatial or V for real XML Indexes.

Using new features yet?

PAGENUM is just telling you how many of your indexes are enjoying the great feature of fully variable DSSIZE, adjustable on-the-fly with no outage (As long as you adjust upwards of course!) If set to „A“ for Absolute then sadly not, much better is „R“ for Relative!

SELECT COUNT(*)                          AS TABLE_PARTS
      ,TS.TYPE
      ,TS.PAGENUM
      ,SUM(TP.SPACEF)                    AS KB
      ,INTEGER(SUM(TP.SPACEF) / 1024)    AS MB
      ,INTEGER(SUM(TP.SPACEF) / 1048576) AS GB
FROM SYSIBM.SYSTABLESPACE TS
    ,SYSIBM.SYSTABLEPART  TP
WHERE     TS.NAME   = TP.TSNAME
  AND     TS.DBNAME = TP.DBNAME
  AND NOT TP.SPACE  < 0
GROUP BY TS.TYPE
     ,TS.PAGENUM
ORDER BY TS.TYPE
     ,TS.PAGENUM
;

What’s in a tablespace TYPE?

Now at the tablespace level we have, again, the TYPE column which can be Blank for a „normal“ table, G for UTS PBG, L for Large, O for LOB, P for XML or R for UTS PBR. Going forward you will have to migrate any of the Blank tablespaces to PBG/PBR and L should be gone anyway!

My Output

When I run these here in Düsseldorf in my little Db2 12 FL508 test system I get this output:

---------+---------+---------+---------+---------+---------+--
INDEX_PARTS                       KB           MB           GB
---------+---------+---------+---------+---------+---------+--
       4453  +0.5951116000000000E+07         5811            5
---------+---------+---------+---------+---------+---------+--
TABLE_PARTS                       KB           MB           GB
---------+---------+---------+---------+---------+---------+--
       3733  +0.3255953100000000E+08        31796           31
---------+---------+---------+---------+---------+---------+-
INDEX_PARTS INDEX IX_EXT PAGE                     KB    MB GB     
            TYPE  TYPE   NUM
---------+---------+---------+---------+---------+---------+-
       3280 2            A   +0.5100220000000000E+07  4980  4
         25 2     N      A   +0.1800000000000000E+05    17  0
          3 2     S      A   +0.1224000000000000E+05    11  0
          8 2     V      A   +0.3744000000000000E+04     3  0
         15 D            A   +0.8112000000000000E+04     7  0
         24 P            A   +0.1392000000000000E+05    13  0
       1098 P            R   +0.7948800000000000E+06   776  0
---------+---------+---------+---------+---------+--------
TABLE_PARTS TYPE PAGENUM                     KB     MB  GB
---------+---------+---------+---------+---------+--------
        363      A      +0.5242134000000000E+07   5119   4 
       2094 G    A      +0.2109695400000000E+08  20602  20
          4 L    A      +0.1312000000000000E+04      1   0
        116 O    A      +0.1186235000000000E+07   1158   1
         25 P    A      +0.2952000000000000E+05     28   0
         33 R    A      +0.6932800000000000E+05     67   0
       1098 R    R      +0.4934048000000000E+07   4818   4

So you can see I already have over 1000 UTS PBR RPN table parts in use! Not bad! As a vendor I cannot just migrate everything of course so those 24 Index Controlled Partitioning indexes and the 33 Absolute Partitions aren’t going anywhere soon!

How does your site look? How are your plans for migrating to UTS PBR RPN going?

As always, I would love to hear the results!

TTFN,

Roy Boxwell

2021-05 Soundex and other cool features part 7 for Db2 12

In this, obviously, never ending series of new features, I will roll up all the new ones since my „SOUNDEX and other „cool“ features – Part six All new for Db2 12“ newsletter from 2018-08 where I first covered the new stuff in Db2 12.

What was new in FL100

At that level, a new table function called BLOCKING_THREADS was introduced, which is primarily used by the command DISPLAY BLOCKERS to help in looking for bad guys who are ready to BLOCK your change to FLxxx

FL500 Changed Scalars

Two scalar BiF got changed: TRANSLATE and UPPER can now both use UNI_60 as a locale.

FL500 New Scalars

In FL500, we got a nice bunch of new Scalar functions:

GENERATE_UNIQUE_BINARY (like GENERATE_UNIQUE except it returns a BINARY(16) value)
HASH_CRC32
HASH_MD5
HASH_SHA1
HASH_SHA256
PERCENTILE_CONT
PERCENTILE_DISC
WRAP

FL500 Changed Aggregates

One aggregate BiF got updated: ARRAY_AGG can now be used for associative arrays.

FL501?

It brought in just one new Aggregate BiF, which was the very famous LISTAGG. This is famous, as it was the „proof of concept“ that Agile actually worked and could function in the wild.

FL502?

Two changes to Scalar BiFs in this release:

GRAPHIC The first argument now accepts numeric data types, including SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE, FLOAT, and DECFLOAT.
VARGRAPHIC The first argument now accepts numeric data types, including SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE, FLOAT, and DECFLOAT.

FL503?

Nothing new for us in that release!

FL504?

It brought in a brand new concept for Db2 on z/OS. The concept of „pass-through BiFs“. These are just checked for valid syntax before being passed on down to the Accelerator that would actually execute them. Obviously, you must be careful about their usage!

CUME_DIST
CUME_DIST (aggregate)
FIRST_VALUE
LAG
LAST_VALUE
LEAD
NTH_VALUE
NTILE
PERCENT_RANK
PERCENT_RANK (aggregate)
RATIO_TO_REPORT
REGEXP_COUNT
REGEXP_INSTR
REGEXP_LIKE
REGEXP_REPLACE
REGEXP_SUBSTR

FL505?

Brought in a whole bunch of encryption and decryption BiFs:

ENCRYPT_DATAKEY converts a block of plain text to a block of encrypted text using a specified algorithm and key label.
DECRYPT_DATAKEY_datatype functions return the block of encrypted text from ENCRYPT_DATAKEY as a block of the datatype specified by the function used. List of datatypes: DECRYPT_DATAKEY_INTEGER, DECRYPT_DATAKEY_BIGINT, DECRYPT_DATAKEY_DECIMAL, DECRYPT_DATAKEY_VARCHAR, DECRYPT_DATAKEY_CLOB, DECRYPT_DATAKEY_VARGRAPHIC, DECRYPT_DATAKEY_DBCLOB, and DECRYPT_DATAKEY_BIT.

FL506?

Here IBM came up with a slew of „alternative names“ for existing BiFs and a new method of calling HASH routines. The background here was to make porting to Db2 on z/OS that little bit easier!

CHAR_LENGTH CHARACTER_LENGTH, which returns the length of its argument in the number of string units that are specified
COVAR_POP COVARIANCE or COVAR, which return the population covariance of a set of number pairs
HASH HASH_MD5, HASH_SHA1, or HASH_SHA256, which return the result of applying a hash algorithm to an input argument, depending on the value specified for the second argument for the HASH function:

        0 (default) HASH_MD5 Returns VARBINARY(16) instead of BINARY(16)
        1 HASH_SHA1          Returns VARBINARY(20) instead of BINARY(20)
        2 HASH_SHA256        Returns VARBINARY(32) instead of BINARY(32)

POW POWER, which returns the value of one argument raised to the power of a second argument
RANDOM RAND, which returns a double precision floating-point random number
STRLEFT LEFT, which returns a string that consists of the specified number of leftmost bytes or the specified string units
STRPOS POSSTR, which returns the position of the first occurrence of an argument within another argument
STRRIGHT RIGHT, which returns a string that consists of the specified number of rightmost bytes or specified string units
TO_CLOB CLOB, which returns a CLOB representation of the first argument
TO_TIMESTAMP TIMESTAMP_FORMAT, which returns a timestamp for a character string expression, using a specified format to interpret the string

FL507?

Here IBM development added another slew of „pass through“ BiFs:

ADD_DAYS
BTRIM
DAYS_BETWEEN
NEXT_MONTH
REGR_AVGX
REGR_AVGY
REGR_COUNT
REGR_INTERCEPT or REGR_ICPT
REGR_R2
REGR_SLOPE
REGR_SXX
REGR_SXY
REGR_SYY
ROUND_TIMESTAMP if invoked with a date expression

FL508?

None that I have read of yet!

FL509?

None yet…

Naturally I will be keeping this newsletter up-to-date, as necessary.

Any questions or ideas, do not hesitate to drop me a line,

TTFN,

Roy Boxwell

2021-04 PBG Partition Pain

One of the things I dearly love about Db2 is that I never ever stop learning. I have been working with Db2 since V1.3 and I was completely surprised by what I just learnt!

PBG What is it?

For those that do not know, PBG is a Partitioned-by-Growth Universal Tablespace (UTS) that came into existence in Db2 V9 as well as its twin – Partitioned-by-Range (PBR). IBM have repeatedly said that all new developments will focus on tables/indexes etc that are backed by UTS and so deprecated all the others types (Simple, Segmented, Classic Partitioned). When you create a PBG you specify how many partitions it can get to using the MAXPARTITIONS clause and it starts by allocating just the very first partition.

First Part – First Problem

Quickly after GA the first problems with this data model started appearing. First up were the people who used 4096 as MAXPARTITIONS and then started running out of memory as all the internal control blocks were being allocated for 1000’s of partitions that would most probably never exist!

Next problem – Copy problems

Then users of DSN1COPY and other cloning methods hit the next really nasty problem. This was: What happened if in subsystem A you had three partitions but in subsystem B only one and you use DSN1COPY to copy the data? This problem was unsolvable until another new DDL parameter was quickly rolled out: NUMPARTS. This controls how many partitions are physically allocated at DDL creation time. This should naturally *only* be used if you intend to cross-clone datasets where the number of partitions do not match.

Running out of space?

Various people have written monitoring software over the years that intercept IFCIDs from Db2 and audits the data within. Our very own SpaceAssuranceExpert (SAX) does this to warn you if you are about to hit any one of the many Db2 limits of space management. Naturally one of these limits is when you are nearing, or have allocated, the last partition, in other words you are soon reaching or have indeed reached MAXPARTITIONS. This means you are about to run out of partitions and, if in the last partition, SAX monitors how full it is to warn the DBA group of possible -904’s coming up. All well and good so far!

Newest problem – Death by descending key

It then came to my attention that Db2 12 does not, in fact, work that way in one special case. If the DDL has the same number for MAXPARTITIONS and NUMPARTS (So all possible partitions are allocated at creation) and you have a CLUSTER defined index where all columns are also DESCENDING then the following happens:

First data is inserted in partition one and extents are taken as required until the partition is full (So DSSIZE has been reached) then the inserts are targeted at the *last* partition! Once this partition is full it then goes to MAXPARTITIONS – 1 and fills that one up!

This naturally kills any attempt to see how full the dataset is online. Obviously you can do a spacemap analysis to review exactly how much free space is there, or hope that the Real-time Statistics table can help you, but doing this instantly for every IFCID is just crazy!

Is that a must?

Thankfully you can toggle off this very strange behaviour by setting MAXPARTITIONS to one more than NUMPARTS. Then Db2 behaves itself and goes back to 1 , 2 , 3 etc.

An ALTER to the rescue?

But what if you already have such a definition? If you already have MAXPARTITIONS 3 and NUMPARTS 3, what about doing an ALTER like:

ALTER TABLESPACE SAXTDB06.SAXTTS06
  MAXPARTITIONS 4 ;

Nope. Then Db2 gets *very* dumb! It then inserts to partition one until it is full then it skips to the *old* MAXPARTITIONS partition three and fills it up and then goes backwards until partition two is full before then jumping to partition four! Whoever thought this up must have been crazy! In a space with three partitions which is then ALTERed to have a MAXPARTITIONS of four you then fill partitions 1, 3, 2, 4 in that order… In my opinion this is a bug in Db2 as this space should be handled exactly as if it was defined with MAXPARTITIONS 4 and NUMPARTS 3!

What about an Index ALTER?

BINGO! Doing an ALTER on the index like:

ALTER INDEX SAXTEST.SAXTTB06_NEW_INDEX
  NOT CLUSTER ;

Then gets the inserts going „back“ into the normal partitioning sequence. All very odd, confusing and misleading!

There must be reason!

I have thought long and hard about „Why?“ and I cannot think of one answer that makes *any* sense! If Db2 started inserting in the last partition I could sort of get it – It might reduce the unclustered insert counts – but only doing this *after* filling the first partition just makes it all seem so pointless…and then after an ALTER it just gets worse!

One „pointer“ found

While looking around the web for any help I did stumble across this little gem:

The new V12 looping through partition function, can be either ascending or descending order. If the search is in descending order, then it is possible to skip a partition between the original target and last partition of the table space. Therefore, a sanity check of 04E-00C90101 LOC=DSNIDM DSNISGNS ERQUAL50C4 will be encountered, to catch this situation.

In APAR PI68087 from 2016.

I think this undocumented behavior should be either corrected or, even better, removed as soon as possible!

As usual, any questions or comments please feel free to mail me!

TTFN

Roy Boxwell

2021-03 Set Phasers to Stun!

This month I am going to talk about phasers – Nah! Actually, just about REBIND phase-in but still interesting even, if you cannot vaporize an evil alien with a bad access path!

A short history of REBIND

It all started way way way back when… REBIND is used to rebuild the internal structures of plans and packages. I mention plans as even these days an „empty“ plan *still* has structures within it that get rebuilt at REBIND time. It is a major prerequisite of Db2 release migrations to make sure that any and all active plans and packages which are older than Db2 vnext – 3 get a REBIND. For example, Migrating to Db2 12 will cause an auto-rebind of any plans and packages last bound earlier than Db2 10, and we all know how bad that will be – In fact, the recommendation is to REBIND all of these in Db2 11 NFM well *before* the migration anyway!

Where’s the beef?

OK, so you see you must REBIND for migration. Any other reason? Well, what about a new index or a RUNSTATS with COLGROUP or HISTOGRAM or whatever! You get the gist? You might be using our Bind ImpactExpert (BIX) to minimize the number of REBINDs, but even so, you cannot get by not doing any REBINDs or the optimizer has no chance to get a different, better access plan all worked out!

What’s the problem?

Sadly, most shops are 24×7 these days and so lots of packages are *always* active! To REBIND a package you must get an exclusive lock of course. This caused a serious amount of pain in Db2 12 as it requires regular REBINDs.

Why that?

FL Levels cause all the grief… the APPLCOMPAT is tied to the package and so when an application, in this case think JDBC or SPUFI etc., wishes to use a new feature then that package *must* be REBINDed at the appropriate level. The very first FL501 with just LISTAGG caused endless problems as companies could not get the simple REBINDs of the JDBC access packages (SYSLHxxx etc.) through without stopping all of their servers… sub optimal!

Ah! Got it…FL505 to the rescue!

So IBM development came up with „phase-in“ REBIND – A really brilliant idea! In a nutshell the REBIND *always* works. You get a new package, possibly with new access plans, and any new work that arrives runs with the new package. Meanwhile, back in the old world, the currently running transactions are still using the old version until they commit and de-allocate. Over time, and hopefully a brief period of time, all users of the old package will have gone and now only the new package is in use.

For all REBINDs?

Not all, but most! The supported syntax is any form of APREUSE, PLANMGMT must be extended or you cannot do phase-in at all and the package is *not* a generated package for a TRIGGER, an SQL Routine or a UDF.

Next REBIND and you are FREE

The next time this package gets a REBIND, Db2 looks in SYSPACKAGE and sees there are phased-out packages and attempts a FREE to get rid of them. If, however, the package is *still* in use, Db2 will write out an IFCID (393) to warn the DBA that something is hanging on desperately to an old phased-out package for far longer than it is good for!

How does that help?

If you have started that IFCID, and you capture and process them, you get a nice little list of the bad guys causing you to not be able to FREE up the phased-out packages. This gives you all the data you need to issue a term thread, as it even gives you the TOKEN, or stop the servers as there is a limit to how many of these old packages can be left hanging around in limbo.

14 and you are out!

Yep, after 14 packages have been phased-out you get a really nasty RC 00E30307 and the REBIND will fail.

A question for you all

Do any of you think you will ever get anywhere near 14 phased-out copies? Do you think that adding support for such an IFCID and then a Use Case to our SQL WorkloadExpert (WLX) monitor for online reporting in Eclipse or ZOWE would be worth it?

I await your answers with bated breath!

TTFN

Roy Boxwell

2021-02 Hidden features of EXPLAIN

EXPLAIN has been with us since DB2 V2.1 and, as I have previously written in older newsletters, (2019-03 EXPLAIN data review and 2012-11 EXPLAIN table maintenance), it has gotten bigger and better over the years. In this newsletter, I wish to quickly bring everyone up-to-date, based on my two older newsletters, and then discuss the usage of CTE Opthints.

Up to speed

Back in 2012, I listed out the tables EXPLAIN can use and ended with Db2 10, so first up are the new and changed tables for Db2 11 and Db2 12:

Db2 11

PLAN_TABLENow with 66 columns
DSN_COLDIST_TABLE
DSN_DETCOST_TABLE
DSN_FILTER_TABLE
DSN_FUNCTION_TABLE
DSN_KEYTGTDIST_TABLE
DSN_PGRANGE_TABLE
DSN_PGROUP_TABLE
DSN_PREDICAT_TABLE
DSN_PREDICATE_SELECTIVITYNew but input for the BIND QUERY command only!
DSN_PTASK_TABLE
DSN_QUERYINFO_TABLETwo LOB tables as well
DSN_QUERY_TABLEOne LOB table as well
DSN_SORTKEY_TABLE
DSN_SORT_TABLE
DSN_STATEMENT_CACHE_TABLEOnly for DSC
DSN_STATEMNT_TABLE
DSN_STAT_FEEDBACKNew table containing RUNSTATS recommendations when EXPLAIN is executed.
DSN_STRUCT_TABLE
DSN_VIEWREF_TABLE

Db2 12

PLAN_TABLENow with 67 columns
DSN_COLDIST_TABLE
DSN_DETCOST_TABLE
DSN_FILTER_TABLE
DSN_FUNCTION_TABLE
DSN_KEYTGTDIST_TABLE
DSN_PGRANGE_TABLE
DSN_PGROUP_TABLE
DSN_PREDICAT_TABLE
DSN_PREDICATE_SELECTIVITY
DSN_PTASK_TABLE
DSN_QUERYINFO_TABLETwo LOB tables as well
DSN_QUERY_TABLEOne LOB table as well
DSN_SORTKEY_TABLE
DSN_SORT_TABLE
DSN_STATEMENT_CACHE_TABLEOnly for DSC
DSN_STATEMNT_TABLE
DSN_STAT_FEEDBACK
DSN_STRUCT_TABLE
DSN_VIEWREF_TABLE

So you can see, that not a lot really happened in Db2 12 as far as any new EXPLAIN tables, but the one big change was the new column sprinkled throughout all of them: PER_STMT_ID

Something new

PER_STMT_ID BIGINT NOT NULL The persistent statement identifier for SQL statements in Db2 catalog tables.
For example, this column corresponds to the following catalog table columns that identify
SQL statements:
• STMT_ID in SYSIBM.SYSPACKSTMT, for SQL statements in packages.
• SDQ_STMT_ID in SYSIBM.SYSDYNQUERY, for stabilized dynamic SQL statements.

This column makes it *much* easier to track your mix of dynamic and static SQL all through the system!

CTE Opthints

I was using one of these the other day and the customer I was working with was amazed to see what it is and how it works. As in all things to do with OPTHINTs, caution must always be used! The best OPTHINT is no OPTHINT!

In the Beginning

Many, many years ago, sometime around DB2 V8, I found a CTE Opthint documented in the internet and thought „Wow! That is the future of hints!“ Then they completely disappeared… try doing a google search and you will see what I mean. The cool thing is – They still work! I do not know for how long, but they still work in Db2 12 FL507 at least.

Time to Test

First create a couple of test table candidates and one index:

CREATE TABLE BOXWELL.T1 (C1 CHAR(8) NOT NULL
                        ,C2 CHAR(8) NOT NULL
                        ,C3 SMALLINT NOT NULL);
CREATE TABLE BOXWELL.T2 (C1 CHAR(8) NOT NULL
                        ,C2 CHAR(8) NOT NULL
                        ,C3 SMALLINT NOT NULL);
CREATE INDEX INDX1_T2 ON T2 ( C1 ) CLUSTER ;

The SQL of interest is:

SELECT T1.*        
FROM T1
, T2
WHERE T1.C1 = T2.C1
;

So first you do a normal EXPLAIN and get this output:

---------+---------+---------+---------+---------+---------+---------+---------+---------+--------
LINE   QNO  PNO  SQ  M  TABLE_NAME          A   PA  CS  INDEX         IO  UJOG  UJOGC  P
---------+---------+---------+---------+---------+---------+---------+---------+--------
01000  01   01   00  0  T1                  R       00                N   ----  ----   S
01000  01   02   00  1  T2                  I   T   00  INDX1_T2      Y   -Y--  ----   S

So Db2 accesses T1 first with a tablespace scan (A = R) and then uses the index to access T2. Now add the CTE to the front so the SQL looks like:

WITH DSN_INLINE_OPT_HINT
(TABLE_CREATOR
, TABLE_NAME
, ACCESS_TYPE
, ACCESS_CREATOR
, ACCESS_NAME
, JOIN_SEQ) AS
(VALUES (NULL
, NULL
, 'INDEX'
, NULL
, NULL
, NULL)
)
SELECT T1.*        
 FROM T1            
    , T2            
 WHERE T1.C1 = T2.C1
 ;                  

Rules, Rules, Rules

The golden rules of CTE Opthint are that the CTE must be the first CTE, and it must be called DSN_INLINE_OPT_HINT, your ZPARM OPTHINTS must be set to YES to allow them, of course. Just include the columns and the rows you actually need. Every additional column or row is one more step towards optimizer disapproval and the CTE Opthint being ignored.

In the example above, the value NULL is a wild card short hand for all values and that row is simply telling the optimizer „If nothing else is found try and enforce INDEX access“.

So now the access path looks like:

---------+---------+---------+---------+---------+---------+---------+---------+---------+--------
LINE   QNO  PNO  SQ  M  TABLE_NAME          A   PA  CS  INDEX         IO  UJOG  UJOGC  P
---------+---------+---------+---------+---------+---------+---------+---------+--------
01001  01   01   00  0  T2                  I       00  INDX1_T2      Y   ----  ----   S
01001  01   02   00  1  T1                  R   T   00                N   -Y--  ----   S

Seeing double?

As you can see, Db2 now uses the index first. So what if you had two indexes?? Create an index on the first table:

CREATE INDEX INDX1_T1 ON T1 ( C1 ) CLUSTER ;

Now the „normal“ SQL EXPLAIN shows:

---------+---------+---------+---------+---------+---------+---------+---------+---------+--------
LINE   QNO  PNO  SQ  M  TABLE_NAME          A   PA  CS  INDEX         IO  UJOG  UJOGC  P
---------+---------+---------+---------+---------+---------+---------+---------+--------
02000  01   01   00  0  T1                  R       00                N   ----  ----   S
02000  01   02   00  1  T2                  I   T   00  INDX1_T2      Y   -Y--  ----   S

We are back to tablespace scan on T1 and then IX on T2. Now, using the CTE with just INDEX (That is the one we just used) gives you:

---------+---------+---------+---------+---------+---------+---------+---------+---------+--------
LINE   QNO  PNO  SQ  M  TABLE_NAME          A   PA  CS  INDEX         IO  UJOG  UJOGC  P
---------+---------+---------+---------+---------+---------+---------+---------+--------
02001  01   01   00  0  T1                  I       00  INDX1_T1      N   ----  ----    
02001  01   02   00  1  T2                  I       01  INDX1_T2      Y   ----  ----     

So, we have now got double index access (which is what we wanted!) What about trying to push T2 up to the first table to be used? Just add a second row in the CTE like this:

WITH DSN_INLINE_OPT_HINT
(TABLE_CREATOR
, TABLE_NAME
, ACCESS_TYPE
, ACCESS_CREATOR
, ACCESS_NAME
, JOIN_SEQ) AS
(VALUES (NULL
, NULL
, 'INDEX'
, NULL
, NULL
, NULL)
  ,(NULL
  , 'T2'
  , NULL
  , 'BOXWELL'
  , 'INDX1_T2'
  , 1 )       
)
SELECT T1.*        
 FROM T1            
    , T2            
 WHERE T1.C1 = T2.C1
 ;           

And the output now changes to be this:

---------+---------+---------+---------+---------+---------+---------+---------+---------+--------
LINE   QNO  PNO  SQ  M  TABLE_NAME          A   PA  CS  INDEX         IO  UJOG  UJOGC  P
---------+---------+---------+---------+---------+---------+---------+---------+--------
02002  01   01   00  0  T2                  I       00  INDX1_T2      Y   ----  ----   S
02002  01   02   00  2  T1                  I       00  INDX1_T1      N   ----  ----    

Isn’t that cool?

I think these are very, very handy items and are a crucial extra piece in the puzzle of „tipping point“ SQLs. CTE Opthints work for both Dynamic and static SQL by the way.

One final bit of info about these CTE Opthints: If defined OK and accepted by EXPLAIN you will get:

 DSNT404I SQLCODE = 394, WARNING:  USER SPECIFIED OPTIMIZATION HINTS USED DURING ACCESS PATH SELECTION                                      
 DSNT418I SQLSTATE   = 01629 SQLSTATE RETURN CODE                           
 DSNT415I SQLERRP    = DSNXOPCO SQL PROCEDURE DETECTING ERROR               
 DSNT416I SQLERRD    = 20 0  4000000  1143356589  0  0 SQL DIAGNOSTIC INFORMATION                                                       
 DSNT416I SQLERRD    = X'00000014'  X'00000000'  X'003D0900'  X'44263CAD'   
          X'00000000'  X'00000000' SQL DIAGNOSTIC INFORMATION               
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0                  

If you get something wrong you will get *no* message SQLCODE +394 (This happens if your column names are invalid for example), and if the SQL Opthint is *not* used you get this SQLCODE output:

 DSNT404I SQLCODE = 395, WARNING:  USER SPECIFIED OPTIMIZATION HINTS ARE INVALID (REASON CODE = 'xx'). THE OPTIMIZATION HINTS ARE IGNORED
 DSNT418I SQLSTATE   = 01628 SQLSTATE RETURN CODE                         
 DSNT415I SQLERRP    = DSNXOOP SQL PROCEDURE DETECTING ERROR              
 DSNT416I SQLERRD    = 20 0  4000000  1142580885  0  0 SQL DIAGNOSTIC INFORMATION                                                     
 DSNT416I SQLERRD    = X'00000014'  X'00000000'  X'003D0900'  X'441A6695' 
          X'00000000'  X'00000000' SQL DIAGNOSTIC INFORMATION             
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0                

Tipping points

We all have had SQLs that run fine for years and then, normally after a RUNSTATS, the order of the tables swaps… this is not really seen by anyone until the SQL starts performing really badly! If you have one of these „tipping point“ SQLs you can now simply „lock it in“ with a simple CTE at the start of the SQL.

Columns of interest

In my examples I only used a few of the allowable columns in a CTE Opthint. Here’s a list of as many as I have found:

Column NameRemarks
TABLE_CREATORNULL for all
TABLE_NAMENULL for all
CORRELATION_NAMENeeded if duplicate table names
ACCESS_TYPERSCAN, INDEX, INLIST, MULTI_INDEX
ACCESS_CREATORIndex Schema, NULL for all
ACCESS_NAMEIndex Name, NULL for all
JOIN_SEQJoin sequence number. 1 for first table.
JOIN_METHODNLJ, SMJ, HYBRID
PARALLELISM_MODECPU, IO, SYSPLEX
ACCESS_DEGREEDegree of required parallelism
JOIN_DEGREEDegree of required parallelism
TABNONormally never required. If used take the number from the PLAN_TABLE
QBLOCKNONormally never required. Must be used if duplicate table names and correlation Ids. Take the value from the PLAN_TABLE
PREFETCHS or L. Sequential or List Prefetch to be used. From this column on I have never used them!
QBLOCK_TYPE
MATCHING_PRED
HINT_TYPE
OPT_PARM

As usual, if you have any questions or ideas please drop me a line!

TTFN,

Roy Boxwell

Update: One of my readers pointed out that he uses these beasts and even had to open a PMR. Here I quote:

„I learned the hard way that CTE Hints were not honored when you did a rebind with APREUSE(WARN).  I reported this to IBM, they did fix it with UI68523, you want to be sure you have that.“

Another reader found that I had cut-and-pasted the DDL a bit too fast so all of the DDL got corrected where it was wrong (The table create and the second index create)

Yet another sent me an updated column list and so have done some changes to the list of columns used in the table.

It is Now Official!

IBM just released, on the 2022-06-24, an update on the AHA page DB24ZOS-I-717 that CTE inline hints are now supported with a pointer to the APAR PK04574 (Closed in 2015!!!). Now we must just wait for the docu update I guess!

2021-01 Migration Mayhem – Update

Well, this month I want to do a simple review of the data I have so far received from readers, a plug for my upcoming webinar on the Db2 Night show, and an announcement about the next version!

The results are in!

We have surely had a *very* interesting few months of results (naming no names!) and I also got great feedback about my freeware program for aiding and abetting with Db2 deprecated data in last month’s newsletter – a freebie give away.

MT DB?

Who knew? There are a *lot* of empty databases out there! Well over 2,000! Guys guys guys (and gals gals gals) … Time to start dropping those bad old boys!

MT TS?

Strangely enough, there are far fewer empty tablespaces – not yet even 200 have been reached!

Classic Partitions

Over 100 partitioned spaces with over 5,000 partitions, which I call „classic“, are still out there running away happily – From these, nearly 60 are LARGE…

Segmented all the way…

Segmented is still in popular demand, of course, and I am not surprised at all that I got over 6,000 spaces back.

As simple as myself…

Simple, however, surprised me a bit, as there are nearly 200 of these ancient animals still roaming free over the data prairie… Time to move on with these old artifacts too!

RRF has no roll over victory!

That there are nearly 2,000 BRF partitions out there is also a bit worrying for me although, to be fair, quite a few were work database entries and, I must be honest here, I do not know if BRF in Work tablespaces is a „problem“ or not – Any volunteers to find this out?

Multi-table troubles…

Nearly 500 tablespaces are multi-table with around 1,500 tables within them all. This could cause problems after Db2 12 FL508, naturally. Start planning to move these on as soon as you can.

Running out of log???

Six byte RBAs were amazingly high! Over 20,000 TP/IP datasets are *still* at six bytes … This is pretty bad … remember time is running out and you really must just do a REORG of the darn things and you are done!

The opposite of antonyms…

Finally, my favorite pet peeve – SYNONYMS – Most people have just a few but one customer had over 16,000!

Gone – and hopefully soon forgotten

What have I *not* seen anywhere in the wild? Hash Tables – No great surprise though, as they were unloved and there must have been a reason behind the decision to kill ‚em off!

Plug!

Please visit the Db2 Night show website:

https://www.dbisoftware.com/db2nightshow/

There, you can register for show #Z113 or, if you are reading this after January 29th 2021, you can download the replay and the slide deck!

I will be running through and showing all the SQLs you need to fix any and all of the deprecated features in our freeware and listed above!

Next Version available now!

We have released version 1.2 of the MORE000 Migration HealthCheck program which now includes some changes suggested by readers/users.

  • Addition of an aggregate total of tables sum
  • Ordering of the multi-table tablespace output to make it easier to read

Finally, I missed one deprecated feature: Procedure – SQL External. These got deprecated in Db2 11 so now MORE0000 outputs all the Procedure/Function counts as well as the names and specific names of SQL External ones. To correct these, you could be lucky and just have to do a DROP and CREATE or you might have to do some code changes, but it is all described in the documentation.

Wishlist a bit earlier this year?

If you have any wishes then please drop me a line!

As always, many thanks for taking the time to read and

TTFN

Roy Boxwell

 

 

 

 

2020-12 Migration Mayhem?

Nah, not really! Well at least I hope not! The problem lies with all of the old baggage we carry around with us all the time. Humans, and computers, are pretty bad at simply „forgetting“. Things just keep popping up and annoying us at the most inopportune moments…

What can you do about it?

Well, you can check out how your Db2 subsystems are looking to deal the the „computer part“. Your brain, on the other hand, is sadly a little bit too much for a Db2 newsletter!

What are the problems?

The number one problem with Db2 system migrations, is the deprecated features that are still firmly nailed to their perches but are 100% dead. They will not cause a problem today, or tomorrow, but at some point they will start to smell… and I mean smell really bad!

Tell me more!

Here’s a list of all the deprecated (and semi-deprecated) items that should be checked and verified at your site:

  1. Use of SYNONYMS
  2. Use of HASH objects
  3. Use of segmented spaces
  4. Use of classic partitioned objects (not using table based partitioning)
  5. Use of simple spaces
  6. Use of six byte RBA
  7. Use of BRF
  8. Use of LARGE objects (This is semi-deprecated)
Anything else?

Well yes! You could also check how many empty implicit databases you have and how many empty tablespaces you have while you are checking your subsystem out. While you are scanning, it could also be cool to list out all the Db2 sub-system KPIs, and what about seeing how many tables you actually have in multi-table tablespaces that, at some point, must also be migrated off into a UTS PBG or UTS PBR tablespace?

We do it all!

Our little program does all of this for you. It runs through your Db2 Catalog in the blink of an eye and reports all of the data mentioned above.

What does it cost?

Nothing – It is our freeware for 2020/2021 and you only have to be registered on our website to request it along with a password to run it.

How does it look?

Here is an example output from one of my test systems here in Düsseldorf:

Db2 Migration HealthCheck V1.0 for DC10 V12R1M507 started at  
2020-11-23-12.12.10
Lines with *** are deprecated features

Number of DATABASES : 594
# of empty DATABASES : 237
# of implicit DATABASES : 385
# of empty implicit DATABASES: 207

Number of TABLESPACES : 5258
of which HASH organized : 0
of which PARTITIONED CLASSIC : 18 ***
# Partitions : 218 ***
of which SEGMENTED : 1507 ***
of which SIMPLE : 3 ***
of which LOB : 152
of which UTS PBG : 3525
# Partitions : 3531
of which UTS PBR : 19
# Partitions : 1161
of which XML : 34

Number of tablespaces as LARGE : 8 ***
Number of empty tablespaces : 28
Number of BRF table partitions : 0
Number of multi-table TSs : 55
# of tables within these : 239

Number of ACCELERATOR ONLY : 0
Number of ALIASes : 5428
Number of ARCHIVEs : 0
Number of AUXs : 147
Number of CLONEs : 3
Number of GTTs : 235
Number of HISTORYs : 2
Number of MQTs : 1
Number of TABLEs : 5230
Number of VIEWs : 43
Number of XMLs : 34

Number of SYNONYMs : 1 ***

Number of Indexes : 23243
of which HASH : 0
of which type 2 : 23210
# of partitioned IXs : 6
# Partitions : 160
of which DPSI : 18
# Partitions : 164
of which PI : 15
# Partitions : 1138

Number of table partitions : 6606
of which DEFINE NO : 2848
of which six byte RBA <11 NFM: 0
of which six byte RBA Basic : 0
of which ten byte RBA : 3759
of which unknown RBA : 2847
Number of index partitions : 24666
of which DEFINE NO : 20140
of which six byte RBA <11 NFM: 0
of which six byte RBA Basic : 0
of which ten byte RBA : 4527
of which unknown RBA : 20139

Number of STOGROUPS : 10
Number of VOLUMES : 0

Number of PLANs : 54
Number of PACKAGES (total) : 6053
Number of PACKAGES (distinct) : 545
Number of SQL statements : 441833

Db2 Migration HealthCheck V1.0 for DC10 V12R1M507 ended at
2020-11-23-12.12.12

Db2 Migration HealthCheck ended with RC: 0

 

Any line with *** at the end means that you have something to do at some point in the future.  The names of all the found objects are written to DD card DEPRECAT so you can then start building a „to do“ list. I would start now to slowly „fix“ all of these before it is 03:00 in the morning, someone is migrating to Db2 14 FL 608 and it all goes horribly wrong…

What’s wrong with LARGE?

This is not actually deprecated but any tablespaces marked as LARGE tend to also not have a valid DSSIZE in them. This is fine if you have built a CASE construct to derive the value from the tablespace definition. But what you should do is an ALTER and a REORG to „move“ the LARGE to a „proper“ tablespace. IBM and 3rd Party Software vendors hate having to remember that ancient tablespaces are still out there!

All on my own?

Naturally not! For example, after all the ALTERs have been done, a lot of the spaces are simply in Advisory REORG pending status and you could use our RealtimeDBAExpert (RTDX) software to automatically generate the required REORGs to action the changes.

Synonyms??

Well, you can do them all yourself by reading one of my older newsletters – just remember to watch out for the GRANTs afterwards.

How many deprecated objects do you have?

I would love to get screenshots of the output at your sites which I would then all sum up and publish as an addendum to this newsletter. Just so that people can see how many Parrots we all have pining for the fjords!

TTFN

Roy Boxwell

 

 

 

 

2020-11 Recover Résumé

This month I hope to answer a reader question – I was asked all about when/how to use the TOLOGPOINT phrase in a RECOVER utility. So here is my RECOVER résumé.

What is it?

RECOVER is one of those utilities that you hear a lot about but normally do not get to use that often! This is what makes it so dangerous and so interesting at the same time of course. We should all be using it on a regular basis so that when the time comes we are not afraid, or left dithering, while desperately trying to come up to speed with a „new“ Utility at 03:00 in the morning…

What does it do?

It recovers – what else stupid! Well, actually it does a ton of stuff but at its most basic it takes a dataset (this can be any valid type of COPY’ed dataset) and recreates a VSAM Linear Dataset (LDS) from it. Typically, a Full Image Copy (FIC) is used as input but it can, and does, use Incremental Image Copies (IIC) as well. Finally, it can also apply log updates to get the LDS to the point you wish it be at, but note that this whole process is only forward in nature. RECOVER also has BACKOUT YES which starts at the current LDS and UNDOes updates against the object using the log. It could well be that going backwards is actually faster than applying an old image copy and going forwards! There is one small limit to BACKOUT YES:

If you specify BACKOUT YES, the recovery point must be within the most recent Db2 system checkpoints that are recorded in the BSDS for each member. Otherwise, the recovery cannot proceed and returns an error.

That’s all?

Basically yes! It is designed to get data back to a point at which you are happy that it is good and consistent. Note here that *you* must be happy. Db2 does not give a hoot about transactional consistency in this case because you are doing the recover. RECOVER guarantees that a Unit Of Work (UOW) will always be consistent with COMMITted data and indexes will also be consistent – this might well mean that you must rebuild or recover your indexes, naturally!

What is recovered?

Recover uses database and space name as input so it is instantly apparent that the „unit of recovery“ is *not* a table but a tablespace (which can contain 1000’s of tables) and/or an index (Remember that an indexspace can only contain one index). There is an exception here and that is, you can also give a Page Number to recover but that is very rarely used these days as IO Errors really are a thing of the past.

The phases of Recover

First it accesses SYSIBM.SYSCOPY to see which dataset(s) it must use (Unless you use LOGONLY option in which case RECOVER assumes that some other process got the LDS back *or* you use the new NOSYSCOPY parameter to give it a dataset name that has been deleted from SYSCOPY presumably by a harsh usage of MODIFY RECOVER) and it checks the SYSIBM.SYSLGRNX to build a list of log datasets that possibly must be scanned for recovery info (Unless you use LOGRANGE NO which is not recommended!) This is also what the REPORT RECOVERY utility does, and you should also run that utility every now and again to get used to the style and type of data we are talking about.

Are you consistent?

If using Flashcopy Consistent Copies it then runs two early phases of work looking for uncommitted work that was backed out at the point of the Flashcopy and also uncommitted work that was committed at the point of Flashcopy.

The real work

Then comes the LOGAPPLY phase where, using data from SYSLGRNX, it reads and applies all log records from the FIC/IIC up to the recovery point you wished for. Here the Fast Log Apply (FLA) greatly speeds up the entire process.

FLA?

Fast Log Apply uses up to 10MB of memory to buffer the LOG being used for the RECOVER which made the LOG APPLY phase much quicker. The value of the, deprecated in Db2 9, ZPARM LOGAPSTG was set at a maximum of  100MB so this meant you could run 10 RECOVER jobs in parallel each with 100’s of RECOVERs to benefit from FLA. Then in Db2 10 they upped the maximum to 510MB by APAR while at the same time removing the ZPARM meaning you could do 51 parallel jobs.

FLA – What does it do?

As William Favero wrote many years ago: Fast log apply was first delivered in DB2 Version 6 and as the name implies, aids in processing log records by sorting log records by page before applying log record changes. It uses one log read task per recovery job and multiple log apply tasks whenever possible. By sorting the log records in page order prior to applying them, fast log apply can almost completely eliminate the need to touch a page set page more than once. This process can drastically reduce the log apply times. This is not just used by RECOVER but for today’s blog we can ignore all the other uses of FLA.

Enough of FLA, back to the point, well at least TOLOGPOINT?

Then come two special phases but only if using TOLOGPOINT – See later for details about this parameter. Here the recover LOGCSR phase builds info for any inflight, indoubt, inabort, and postponed abort units of recovery (Transactions) to be applied and the LOGUNDO phase rolls back any uncommitted changes that transactions made to any of the recovered objects.

What’s in a name?

Recover accepts four (well, actually five…) different parameters telling it what to do at the highest level:

  1. TOCOPY – recover the named object to this IC.
  2. TORBA – recover up to and including this Relative Byte Address (RBA) on the log. These days it is recommended not to use this anymore but to use the next option. See more later.
  3. TOLOGPOINT – recover up to and including this log point (Timestamp).
  4. TOLASTCOPY – recover up to the last copy taken – This can be a FIC or an IIC.
  5. TOLASTFULLYCOPY – recover up to the last FIC.

RBA or LOGPOINT?

If you are running non-datasharing then an RBA is the same as LOGPOINT and so you can use TOLOGPOINT with no worries. If, however, you are in datasharing then you can *only* use TORBA for the time span before this data sharing group was created. Thus, to keep things easier, it makes sense to just use TOLOGPOINT.

Is it safe?

The problem here is the famous saying „A little knowledge is dangerous“. It is recommended by IBM that the use of TOCOPY, TOLOGPOINT (TORBA), TOLASTCOPY and TOLASTFULLCOPY be restricted to senior users who know exactly what they are doing! Why so? Because all of these options are so called „Point in Time“ (PiT) recoveries and they are all special!

Aren’t we all special?

Nope! If you do a „normal“ RECOVER it brings you back to the current time. All of your data is back as if the transactions had actually all executed. This is fine if you have had a media failure etc. The problem is that most of the recoveries done these days are due to bad application code. This means you have to recover to a Point in Time when the bug was not live…

The problems of PiT

The absolute biggest problem is data consistency. I am not even talking about referential integrity (RI) or Check Rules here! Imagine you are a bank and someone withdraws 200 Dollars at 10:01 in the morning. You do a PiT to 10:00 and what is the end result? Your data is possibly ok and consistent but the end user has got 200 bucks and your bank hasn’t… this is traditionally known as „not good“…

Then come all the „other problems“ – RI is number one here… You recover a dependent table to a PiT where the parent table has no row… this is „sub-optimal“. Or, even worse IMHO, you do a PiT to a point where a Check Constraint was added… this is 100% Nightmare time…

The problem is…

We are not alone! The objects that we have and use are all defined and managed in the Db2 Catalog and any PiT you do to so-called „user data“ has no knowledge and no inference with the Db2 Catalog data. Here is where all the really nasty problems lurk around and loiter in the shadows… The basic rule is „If you changed anything in the catalog – check it all before doing the PiT!“ It is much easier to check it all beforehand – doing it afterwards can be especially painful!

SLB?

System Level Back-up came in a while ago and RECOVER got updated to also be able to use it as a basis for recovery data. It started out with quite a few Ifs and Buts e.g. No REORG, REBUILD, LOAD REPLACE etc. but these all disappeared with z/OS 1.11 as then it could handle when datasets moved around the disks. There are still two problem areas where SLBs cannot be used as input and that is when you have done a REORG to remove a dropped column or you have done a REORG to remove unused partitions at the end. Apart from these two minor irritations SLBs are just as good as FICs.

In depth per option

All of the TOxxxCOPYs do the same thing. They recover the object to that COPY and thus use *no* log access. If the IC is not available, for whatever reason, Db2 falls back to the next earliest IC and then *does* do a LOGAPPLY up to the PiT of the requested IC.

TOLOGPOINT. Recover chooses the newest FIC before this point. If any IICs exist these are merged, then it does LOGAPPLY up to and including the record contained by the LOGPOINT. In this case if there is no FIC or SLB then it will use *just* LOGAPPLY from the creation of the object, LOAD TS or REORG TS. Naturally, this LOGAPPLY is only possible as long as you have *not* run any MODIFY RECOVERY that deleted records from the SYSLGRNX. This is why MODIFY RECOVERY actually inserts records in SYSCOPY!

Any uncommitted transactions running at the PiT are rolled back leaving them all in a transactionally consistent state.

Scoping out problems?

New in Db2 12 is the parameter SCOPE with default value UPDATED. Now this was quite a good idea as it meant that RECOVER now only does real work for objects that have really *changed* since the last FIC. It does this little bit of magic by looking into the SYSLGRNX to see whether or not any changes have actually been processed. This is also a serious problem! John Campbell tells us all „Please change all RECOVER jobs to use SCOPE ALL or make sure that PTF PH20056 is applied“.

Where’s the beef?

It is a well known fact that the road to disaster is paved with good intentions. The SCOPE UPDATED sounds great at first but then what would happen if you are using LISTDEF to do a PiT and the Db2 Catalog is involved? Further what happens if you, purely by bad luck, RECOVER SYSLGRNX before any other tablespaces? Yep, the „bit of magic“ then has no data and is effectively blind and you get seriously bad recovered data.

Top Tip

If you want to use TOxxxCOPY but *require* consistency just use TOLOGPOINT instead and simply specify the LOGPOINT of the IC you wished to use anyway. This way Db2 guarantees that even SHRLEVEL CHANGE copies will not cause a problem!

Pain Points

RI – This is the number one bug bear! You must make sure that all objects in the RI chain get the *same* PiT. Failure to do so will leave you in CHECK pending and possibly a nasty mess to sort out!

Check Constraint added – If added try and make sure you hit a quiesce point or FIC otherwise verify that the PiT is before the creation timestamp of the constraint.

Catalog – As mentioned the catalog is a description of the objects and is *not* automatically updated by recover. This is for the carbon based beings to sort out. ALTER can be especially annoying here! Plus SYSLGRNX as mentioned above.

Identity Columns – These will no longer be in sync with SYSSEQUENCES and so care must be taken if they *must* be synchronized.

Compression dictionaries can kill you if you recover a single LDS of a multiple LDS but with a different version of the dictionary. Then the data is naturally not readable any more!

How to find a LOGPOINT

We have now seen how it all hangs together so the big question now is „How do you find an acceptable TOLOGPOINT?“

The xxxxMSTR address space contains numerous LOGPOINTs:

06.24.45 STC03367 DSNJ003I -SC10 DSNJOFF3 FULL ARCHIVE LOG VOLUME  
DSNAME=DB2SC1.SC10.ARCHLG1.D20280.T0624437.A0000374,
STARTRBA=00D6CAB49FFF50D10000, ENDRBA=00D6CAB49FFF52ECFFFF,
STARTLRSN=00D8A08DB179EBDBB800, ENDLRSN=00D8A09432EF45F1A800,
UNIT=SYSALLDA, COPY1VOL=SC1L00, VOLSPAN=00, CATLG=YES

From this message you get the ENDLRSN. If you convert the LRSN to a timestamp you get:

SELECT TIMESTAMP(BX'00D8A09432EF45F1A800' CONCAT BX'000000000000')
FROM SYSIBM.SYSDUMMY1 ;
---------+---------+---------+---------+---------+---------+------
2020-10-06-04.24.43.727967
DSNE610I NUMBER OF ROWS DISPLAYED IS 1
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

So you now have a LRSN to timestamp to work with. Be aware that the LRSN is *local* time as you can see here it is 04.24 but reported in the MSTR as 06.24 Berlin time is UTC+2.

Another possibly good starting point is to run the DSNJU004 routine and look at the CHECKPOINT QUEUE right at the very end of the output:

                    CHECKPOINT QUEUE 
09:20:31 OCTOBER 12, 2020
TIME OF CHECKPOINT 04:24:46 OCTOBER 06, 2020
BEGIN CHECKPOINT RBA 00D6CAB49FFF5308C5A8
END CHECKPOINT RBA 00D6CAB49FFF530A143A
END CHECKPOINT LRSN 00D8A09435A32E201600
TIME OF CHECKPOINT 03:55:38 OCTOBER 06, 2020
BEGIN CHECKPOINT RBA 00D6CAB49FFF511E3612
END CHECKPOINT RBA 00D6CAB49FFF511F5CB8
END CHECKPOINT LRSN 00D8A08DB297403D0E00

Full of RBAs and LRSNs – Great stuff indeed! Plus all the normal Active and Archive log details.

I have a horrible old REXX where I can enter a timestamp and it outputs the LRSN. You must manually edit it but it does work!

/* REXX */ 
NUMERIC DIGITS 64
RBAFORMAT = 'E' /* Output format (E = extended, B = basic) */
OFFSET = '000000000000' /* Set to STCK offset of data sharing group */
CVT = C2D(STORAGE(10, 4))
CHKBYTE = STORAGE(D2X(CVT + 304), 1)
CVTTZ = STORAGE(D2X(CVT + 304), 4)
IF BITAND(CHKBYTE, '80'X) = '80'X THEN CVTTZ = C2D(CVTTZ, 4)
ELSE CVTTZ = C2D(CVTTZ)
CVTTZ = CVTTZ * 1.048576 / 3600
CVTTZ = FORMAT(CVTTZ, 2, 0)
SAY 'Timezone diff. hours ' CVTTZ
MS = (DATE('BASE') - 693595) * 24 * 60 * 60
MS = MS + TIME('SECONDS')
MS = MS - (CVTTZ * 60 * 60)
MS = MS * 1000000
SAY 'Today MS = ' MS
LRSNBIN = X2D(SUBSTR(D2X(MS), 1, 12))
IF LENGTH(OFFSET) = 20 THEN OFFSET = SUBSTR(OFFSET, 3, 12)
LRSN = D2X(LRSNBIN + X2D(OFFSET))
IF RBAFORMAT = 'E' THEN LRSN = '00'LRSN'000000'
SAY 'Current LRSN (with STCK offset 'OFFSET') = 'LRSN
/* Now do for any time e.g. */
/* 2019-11-08-11.59.21.445591 00D6FE3F55460D3AD600 From syscopy */
/* output 00D6FE3F55460D000000 */
fromdate = '2019-11-08 11.59.21' /* example */
MS = (DATE('BASE','8 Nov 2019') - 693595) /* Here the date */
MS = MS * 24 * 60 * 60
MS = MS + ( 11 * 60 * 60) /* Here the hours */
MS = MS + ( 59 * 60) /* Here the minutes */
MS = MS + 21 /* Here the seconds */
MS = MS - (CVTTZ * 60 * 60)
MS = MS * 1000000
MS = MS + 445591 /* Here the rest seconds */
SAY 'From date MS = ' MS
LRSNBIN = X2D(SUBSTR(D2X(MS), 1, 12))
IF LENGTH(OFFSET) = 20 THEN OFFSET = SUBSTR(OFFSET, 3, 12)
LRSN = D2X(LRSNBIN + X2D(OFFSET))
IF RBAFORMAT = 'E' THEN LRSN = '00'LRSN'000000'
SAY 'From date = ' fromdate
SAY 'From date LRSN (with STCK offset 'OFFSET') = 'LRSN

Feel free to change it to allow the date/time/seconds as a parameter – I never got the time – pardon the pun…

SYSCOPY contains LRSNs and TIMESTAMPs and can be great starting point. SYSLGRNX obviously is nowadays also worth a look and the RTS also contains LRSNs and TIMESTAMPs.

REPORT RECOVERY also lists out a ton of info including LRSNs and timestamps of course.

Ready, Steady, RECOVER!

So with all the sources available and the ability to switch LRSN to Timestamp and vice versa you are now fully prepared to use a PiT. Just remember to think it all through, get the application people on board and get the plan approved before you start doing RECOVER!

Redirected RECOVER?

Testing all this is fraught with danger of course… You cannot simply do a RECOVER of a production table while *anything* is running in production. So how can you test whether or not your RECOVER JCL is correct and whether or not your Service Level Agreements are met by your RECOVER JCL? IBM Db2 development have now solved this for us with „Redirected RECOVERY“ APAR PH27043 PTF UI72057 for Db2 12 FL500 and above. It adds a new parameter to the RECOVER syntax “ FROM xxx.yyy“ so you can RECOVER an unused „dummy“ object from a live production object with *no* interference! There are, as usual, a couple of caveats: The „dummy“ space must be „the same“ as the to-be-recovered space. Think table defs, DSSIZE, BUFFERPOOL etc. here. Finally, only UTS are supported at this time.

I would love to hear of any of your war stories about RECOVER use or abuse!

Finally, many thanks to Vasu for asking me to dig deeper into RECOVER.

TTFN

Roy Boxwell

 

2020-10 Mapping Table Mystery

We all know, and love, REORG Mapping tables, right? Well, this month I want to run through the ins and outs of defining and using these little beasts!

In the Beginning

In the beginning, well DB2 V5 actually, was a requirement for a list of Row Ids (RIDs) to be stored so that the SHRLEVEL CHANGE REORG could map between an original RID and the new „shadow“ RID. It was created with normal DDL:

CREATE TABLESPACE table-space-name SEGSIZE integer;

CREATE TABLE table-name1
(TYPE CHAR(1) NOT NULL,
SOURCE_RID CHAR(5) NOT NULL,
TARGET_XRID CHAR(9) NOT NULL,
LRSN CHAR(6) NOT NULL)
IN database-name.table-space-name
;

 CREATE TYPE 2 UNIQUE INDEX index-name1 ON table-name1
 (SOURCE_RID ASC,
TYPE,
TARGET_XRID,
LRSN)
;

Notice that the TARGET_XRID is nine characters even though a RID is, at least way back then, actually five bytes!

Tablespace or Index space size?

What was interesting from the start, was that the size of the tablespace was irrelevant – it was *only* the index that was ever used. So I saw a ton of tablespaces with 7200, 720 allocations, all of which were completely pointless. I always ALTERed these back to 48, 48. Actually, I normally just DROPped and CREATEd them again!

It is also interesting to see how many shops actually do COPY, RUNSTATS, and even REORGs of these tablespaces… Completely pointless of course as they are not used by normal SQL, never used for RECOVERY, and a waste of time for all utilities. If possible, EXCLUDE them from all utilities!

Static or Dynamic?

So shops started heading off in two directions: Some created 1000’s of mapping tables, all with Job name as schema or table name, so that they could just use generator software and always „know“ that the mapping table would be there. The other group decided to add a DROP and CREATE step at the beginning of the REORG job, and a DROP at the end. This was known as „dynamic“ mapping tables.

Static meant less traffic to the catalog, but 1000’s of tables loafing around doing nothing, Dynamic meant DB2 Catalog traffic but only a few tables existing and only for the lifetime of the REORG. Basically, it was down to every shop to work out their best way to handle these tables.

Changes through the releases

In DB2 V7 it was even recommended to name the table the same as the Utility ID and as these, by definition, are unique, you could never hit the „duplicate name“ problem.

In DB2 V8 it was noted that REORG always empties the table after completion.

In DB2 V9 you could now also use a PBG for the tablespace. This is actually required if you have more than two billion rows in the tablespace to be REORGed.

CREATE TABLESPACE table-space-name MAXPARTITIONS integer;

instead of the segmented syntax.

All new ball game

In Db2 11 it all changed! The ability was given to use a MAPPINGDATABASE or even nothing at all! This latter option seemed great, at first glance, but then it became apparent that it caused Db2 catalog contentions… not good for massively parallel REORG jobs!

The MAPPINGDATABASE xxxxx value overides the ZPARM REORG_MAPPING_DATABASE in cases where you would like to use another „location“ for your implicit tables. From the docu:

REORG MAPPING DB field (REORG_MAPPING_DATABASE subsystem parameter)
The REORG_MAPPING_DATABASE subsystem parameter specifies the default database that REORG TABLESPACE SHRLEVEL CHANGE uses to implicitly create the mapping table.

Acceptable values: database-name
Default: blank
Update: option 31 on panel DSNTIPB
DSNZPxxx: DSN6SPRM.REORG_MAPPING_DATABASE
Online changeable: Yes
Data sharing scope: Member

database-name
The name of the default database that REORG TABLESPACE SHRLEVEL CHANGE uses to implicitly create the mapping table. The name must a character string with maximum length of 8 bytes.
blank
An implicitly defined database is used.

When processing a REORG TABLESPACE SHRLEVEL CHANGE request, the REORG utility has the option to create its own mapping table and mapping index, instead of relying on user's input. Specifying this subsystem parameter with a valid database name directs REORG to allocate the mapping table in the database that is specified. By default, REORG uses an implicitly defined database for the mapping table allocation.

So nearly all shops have <blank> – which led to the creation of hundreds of implicit databases…

Top tip: Create a „master“ Database that is *just* for REORG SHRLEVEL CHANGE and change the ZPARM to use it. Then you have a really good high level way to exclude utilities from bothering with these tables and you stop using up your DBIDs!

Also in Db2 11, the mapping table got updated as the LRSN increased to ten bytes…

<optional>CREATE TABLESPACE table-space-name SEGSIZE integer;
or
<optional>CREATE TABLESPACE table-space-name MAXPARTITIONS integer;

CREATE TABLE table-name1
(TYPE CHAR(1) NOT NULL,
SOURCE_RID CHAR(5) NOT NULL,
TARGET_XRID CHAR(9) NOT NULL,
LRSN CHAR(10) NOT NULL)
<optional>IN database-name.table-space-name
;

CREATE UNIQUE INDEX index-name1 ON table-name1
 (SOURCE_RID ASC,
TYPE,
TARGET_XRID,
LRSN);

Actually, IBM changed the names of the columns as well (Though the column names are never really used…) so the CREATE really looks like:

<optional>CREATE TABLESPACE table-space-name SEGSIZE integer;
or
<optional>CREATE TABLESPACE table-space-name MAXPARTITIONS integer;

CREATE TABLE table-name1
(TYPE CHAR(1) NOT NULL,
ORID CHAR(5) NOT NULL,
NRID CHAR(9) NOT NULL,
LRSN CHAR(10) NOT NULL)
<optional>IN database-name.table-space-name
;

CREATE UNIQUE INDEX index-name1 ON table-name1
(ORID ASC,
TYPE,
NRID,
LRSN)
;

If this format was not available in NFM, then REORG created a table to use on its own using either an implicit database, or the database from the ZPARM, if specified. This uses up another DBID of course.

All change!

Then along came Db2 12 and they changed the mapping table again…

<optional>CREATE TABLESPACE table-space-name SEGSIZE integer;
or
<optional>CREATE TABLESPACE table-space-name MAXPARTITIONS integer;

CREATE TABLE table-name1
(TYPE CHAR(1) NOT NULL,
ORID CHAR(7) NOT NULL,
NRID CHAR(11) NOT NULL,
LRSN CHAR(10) NOT NULL)
<optional>IN database-name.table-space-name
;

CREATE UNIQUE INDEX index-name1 ON table-name1
(ORID ASC,
TYPE,
NRID,
LRSN)
;

This time, it was the RIDs expanding by two bytes each that forced the change. Just like in Db2 11 NFM, if this format was not available in Db2 12 FL500 and above, then REORG creates a table to use on its own using either an implicit database or the database from the ZPARM if specified.

Enough history…

So now we know how the table looks and how it is used and defined. The question is: Which of the various methods is the best?

Method 1: Define 1000’s of static mapping tables.

Method 2: Create each table in each utility job using the utility id as the mapping table name.

Method 3: Create a „master“ Database and use it in the MAPPINGDATABASE utility parameter or ZPARM, thus allowing Db2 to create all tablespaces, tables, and indexes in it.

Method 4: Do nothing, have no MAPPINGxxx utility parameter and let Db2 „do the magic.“

Method 1: Is a bit mad, as whenever IBM development change the definition, you have 1000’s of changes to do and they do clutter up the Db2 catalog. However if you are paranoid about contention and you are doing lots, and I really mean *lots*, of parallel REORGS then this method is probably the best way to avoid DBD contention.

Method 2: Is not bad, as you only create as many objects as you have jobs and they are always dropped at the end, so it reduces clutter but it does increase the risk of Db2 Catalog contention.

Method 3: My favorite! (Unless doing *lots* of parallel REORGS!) You have one DB with one DBID and you can easily exclude it from other utilities and cleanup, and recreate is a simple DROP and CREATE. From my point of view, this is a winner! There is still Db2 Catalog contention of course as the TS, TB and IX must all be created before all being thrown away at the end of the REORG.

Method 4: Is the worst of the all IMHO. You will run through your DBIDs quicker than you can blink, and you increase contention on the Db2 Catalog as it must also create your implicit database all the time. This is for test only! However, it it does have the advantage of a new DBID every time and so very low chance of DBD contention if you have lots of parallel running REORGS.

Results from my tests

Using static caused no trouble anywhere and DBIDs did not get wasted.

Using dynamic, including creation of the database, allowed DBID reuse.

Using dynamic without the database also allowed DBID reuse as the DROP TABLESPACE also dropped the implicitly created database.

Using dynamic without the database & tablespace caused a Widow database to be left after dropping of the TABLE, (remember the current rules on dropping implicitly created databases? If you drop the Table then the Tablespace gets dropped but not the Database!) This is pretty bad of course, as not only are your DBIDs going up they are being left „used“ which is not good!

Using MAPPINGDATABASE with an existing database works, as I mentioned, great and is my favorite! (Remember – Only if not doing massively parallel REORGS)

What are your thoughts on mapping table solutions?

I would love to hear from you!

TTFN

Roy Boxwell

Update: Michael Harper contacted me with some comments and I summarize them below:

  1. Definitely use a default mapping database – do not leave it blank as this will reduce contention issues
  2. If not using shared ZPARMs also use different default mapping databases as again this will reduce contention issues

Update: One of my readers pointed out that when running large numbers of parallel REORGS (Over 100!) you can hit DBD contention issues if you use one DB for the mapping Tables. In this scenario you either require a dynamic table being created for each job in its own database or use the implicit databases (DSNnnnnn) and make sure you DROP any widow style DBs.

Another Update: One of my readers mentioned that you can still get contention when you or Db2 create mapping tables. This is naturally true for creating anything within Db2. If you want a 100% guarantee of no contention then you *must* create all mapping tables in advance which would be Method 1 in my Blog.

2020-08 SYSLGRNX through the looking glass

This month I want to have a good look inside the SYSLGRNX – Not just for fun, but actually to review what is inside and ask the question: Is any of it “interesting” for us as DBAs?

Create your own?

Since Db2 10 the Db2 Directory table SYSIBM.SYSLGRNX has been available for normal SQL use. I wrote a newsletter all about this (Discovering hidden recovery problems in the SYSLGRNX.) This “older” newsletter has now been updated with a newer version of the SYSLGRNX private table/index and how to cross load it and RUNSTATS it, which we will be using within this newsletter. 

What is it?

The SYSLGRNX table is used by Db2 to “remember” when a physical page set (object) changed from R/O access to R/W access. This is a requirement to speed up RECOVER jobs. When this is done correctly, the RECOVER can simply look in SYSLGRNX to find any RBA/LRSN ranges that must actually be checked in the Db2 Log. Without this table, Db2 would *have* to access all the logs from the last Full Image Copy/Incremental Image Copy to see if any data changes had been done. This means that the table is pretty critical when it comes to doing a RECOVER job!

Why look in it?

Well, it is nice to know that Db2 is keeping tabs on everything, but what if it makes a mistake? Heaven forbid there could be bugs in the code or perhaps an LPAR crash… Basically, it boils down to a good chance that you have some garbage in the table. This does not have any impact at all *until* you try and RECOVER an object… then Db2 requires an archive log from 2017 and the object is marked as “unrecoverable” – Whoops!

SQL Time – LRSN or RBA Sir?

So now you have to decide: are you interested in data sharing (DS) or in non-data sharing (Non-DS) systems? DS use the Log Record Sequence Number (LRSN) to mark data on the log. It is basically a ten-byte super-duper timestamp field, with/without an offset, which points to a place on the log. If Non-DS, you use a Relative Byte Address (RBA) which is really just a ten-byte address pointing to a place on the log. All of the sample SQL assumes you are running in a DS world and that you have created the updated private version of SYSLGRNX. 

Start with Open Units of Work

Let’s have a quick review of all current Open Units of Work (OUW) that exist:

-- OPEN UNITS OF WORK IN SYSLGRNX                     
SELECT L.LGRDBID                    AS DBID           
      ,SUBSTR(TS.DBNAME, 1, 8)      AS DBNAME         
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,SUBSTR(TS.NAME, 1, 8)        AS SPACE          
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,'TS'                                           
      ,L.LGRMEMB                    AS DM             
FROM BOXWELL.SYSLGRNX2    L                           
    ,SYSIBM.SYSTABLESPACE TS                          
WHERE TS.DBID     = L.LGRDBID                         
  AND TS.PSID     = L.LGRPSID                          
--AND HEX(L.LGRSPBA)  = '00000000000000000000'        
  AND HEX(L.LGRELRSN) = '00000000000000000000'        
UNION ALL                                             
SELECT L.LGRDBID                    AS DBID           
      ,SUBSTR(IX.DBNAME, 1, 8)      AS DBNAME         
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,SUBSTR(IX.INDEXSPACE, 1, 8)  AS SPACE          
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,'IX'                                           
      ,L.LGRMEMB                    AS DM             
FROM BOXWELL.SYSLGRNX2    L                           
    ,SYSIBM.SYSINDEXES    IX                           
WHERE IX.DBID   = L.LGRDBID                           
  AND IX.ISOBID = L.LGRPSID                           
--AND HEX(L.LGRSPBA)  = '00000000000000000000'        
  AND HEX(L.LGRELRSN) = '00000000000000000000'        
ORDER BY 7                                             
WITH UR                                               
  ;                                                   

This SQL is written for DS. To convert to Non-DS just comment out the LRSN lines and uncomment out the RBA lines not forgetting the WHERE statements!

An OUW is one where the Stop RBA/End LRSN is not yet filled (so all zeroes), by the way. These are all “in use” objects and I have seen ranges from six to 17,000 of these. The Stop RBA / End LRSN is updated when the page set gets pseudo, or physically, closed again. How many of these you can/should have all depends on your workload, your choice of ZPARM PCLOSEN and when you copy the SYSLGRNX of course.

Here on my test system I only get these:

---------+---------+---------+---------+---------+---------+---------+---------+
  DBID  DBNAME        ID  NEG_PSID  SPACE       PART  LGRX_UCTIMESTAMP         
---------+---------+---------+---------+---------+---------+---------+---------+
     6  DSNDB06     2210  N         SYSTSTSS       1  2020-07-29-13.53.27.740000
     6  DSNDB06     2208  N         SYSTSISS       1  2020-08-26-12.46.56.690000
     6  DSNDB06     2068  N         SYSTSPKG       1  2020-08-26-13.46.57.710000
     6  DSNDB06     2018  N         SYSTSDBA       1  2020-08-26-14.13.13.310000
     6  DSNDB06      371  N         SYSSEQ         0  2020-08-26-14.13.13.310000
     6  DSNDB06     1994  N         SYSTSCOL       1  2020-08-26-14.13.13.320000
     6  DSNDB06     2014  N         SYSTSTAB       1  2020-08-26-14.13.13.320000
     6  DSNDB06     2016  N         SYSTSTSP       1  2020-08-26-14.13.13.320000
     6  DSNDB06     2012  N         SYSTSTPT       1  2020-08-26-14.13.13.320000
     6  DSNDB06     2020  N         SYSTSDBU       1  2020-08-26-14.13.13.320000
     6  DSNDB06     2010  N         SYSTSTAU       1  2020-08-26-14.13.13.340000
   505  DSN00171       2  N         SYSLGRNX       1  2020-08-26-14.13.33.340000
DSNE610I NUMBER OF ROWS DISPLAYED IS 12                                        

But here you can see there is something odd in that first row! I do not believe we have an OUW from July as it is the end of August at time of writing.

Grouping the output

Now to get a view of how many records per object you have. With this output you can instantly see where you have *way* too many records and need at least a MODIFY RECOVERY and/or a full image copy!

-- GROUP RECORDS IN SYSLGRNX                              
SELECT L.LGRDBID                    AS DBID               
      ,SUBSTR(TS.DBNAME, 1, 8)      AS DBNAME             
      ,L.LGRPSID                    AS ID                 
      ,L.LGRNEGPSID                 AS NEG_PSID           
      ,SUBSTR(TS.NAME, 1, 8)        AS SPACE              
      ,L.LGRPART                    AS PART               
      ,MIN(L.LGRUCTS)               AS MIN_LGRX_UCTIMESTAMP
      ,MAX(L.LGRUCTS)               AS MAX_LGRX_UCTIMESTAMP
--    ,MIN(HEX(L.LGRSRBA))          AS MIN_START_RBA      
--    ,MAX(HEX(L.LGRSRBA))          AS MAX_START_RBA      
--    ,MIN(HEX(L.LGRSPBA))          AS MIN_END_RBA        
--    ,MAX(HEX(L.LGRSPBA))          AS MAX_END_RBA        
      ,MIN(HEX(L.LGRSLRSN))         AS MIN_START_LRSN     
      ,MAX(HEX(L.LGRSLRSN))         AS MAX_START_LRSN     
      ,MIN(HEX(L.LGRELRSN))         AS MIN_END_LRSN       
      ,MAX(HEX(L.LGRELRSN))         AS MAX_END_LRSN       
      ,'TS'                                               
      ,COUNT(*)                                           
FROM BOXWELL.SYSLGRNX2    L                               
    ,SYSIBM.SYSTABLESPACE TS                              
WHERE TS.DBID = L.LGRDBID                                 
  AND TS.PSID = L.LGRPSID                                 
--AND NOT L.LGRSRBA  = L.LGRSPBA                          
  AND NOT L.LGRSLRSN = L.LGRELRSN                         
GROUP BY L.LGRDBID                                        
       , TS.DBNAME                                         
       , L.LGRPSID                                        
       , L.LGRNEGPSID                                     
       , TS.NAME                                          
       , L.LGRPART                                        
UNION ALL                                                 
SELECT L.LGRDBID                    AS DBID               
      ,SUBSTR(IX.DBNAME, 1, 8)      AS DBNAME             
      ,L.LGRPSID                    AS ID                 
      ,L.LGRNEGPSID                 AS NEG_PSID           
      ,SUBSTR(IX.INDEXSPACE, 1, 8)  AS SPACE              
      ,L.LGRPART                    AS PART               
      ,MIN(L.LGRUCTS)               AS MIN_LGRX_UCTIMESTAMP
      ,MAX(L.LGRUCTS)               AS MAX_LGRX_UCTIMESTAMP
--    ,MIN(HEX(L.LGRSRBA))          AS MIN_START_RBA      
--    ,MAX(HEX(L.LGRSRBA))          AS MAX_START_RBA      
--    ,MIN(HEX(L.LGRSPBA))          AS MIN_END_RBA        
--    ,MAX(HEX(L.LGRSPBA))          AS MAX_END_RBA        
      ,MIN(HEX(L.LGRSLRSN))         AS MIN_START_LRSN     
      ,MAX(HEX(L.LGRSLRSN))         AS MAX_START_LRSN
      ,MIN(HEX(L.LGRELRSN))         AS MIN_END_LRSN 
      ,MAX(HEX(L.LGRELRSN))         AS MAX_END_LRSN 
      ,'IX'                                         
      ,COUNT(*)                                     
FROM BOXWELL.SYSLGRNX2    L                          
    ,SYSIBM.SYSINDEXES    IX                        
WHERE IX.DBID   = L.LGRDBID                         
  AND IX.ISOBID = L.LGRPSID                         
--AND NOT L.LGRSRBA  = L.LGRSPBA                    
  AND NOT L.LGRSLRSN = L.LGRELRSN                    
GROUP BY L.LGRDBID                                  
       , IX.DBNAME                                  
       , L.LGRPSID                                  
       , L.LGRNEGPSID                               
       , IX.INDEXSPACE                              
       , L.LGRPART                                  
ORDER BY 14 DESC                                    
WITH UR                                             
  ;                                                 

Again, this is a DS version – same rules apply as the first SQL and, in fact, for all following SQL as well!

Output looks like:

---------+---------+---------+---------+---------+---------+---------+---------+
  DBID  DBNAME        ID  NEG_PSID  SPACE       PART  MIN_LGRX_UCTIMESTAMP     
---------+---------+---------+---------+---------+---------+---------+---------+
     6  DSNDB06     2208  N         SYSTSISS       1  2019-09-28-15.02.57.790000
     6  DSNDB06     2210  N         SYSTSTSS       1  2019-09-28-15.02.57.780000
     6  DSNDB06     2068  N         SYSTSPKG       1  2019-09-28-13.51.16.650000
     6  DSNDB06     2228  N         SYSTSSFB       1  2019-09-28-15.02.58.840000
     6  DSNDB06     2010  N         SYSTSTAU       1  2019-09-28-13.51.18.800000
   348  MVNXTEST     352  N         MVNXS89        1  2020-03-11-07.21.38.830000
     1  DSNDB01      127  N         SPT01          1  2019-09-28-13.51.16.620000
-------+---------+---------+---------+---------+---------+---------+--
MAX_LGRX_UCTIMESTAMP        MIN_START_LRSN        MAX_START_LRSN     
-------+---------+---------+---------+---------+---------+---------+--
2020-08-26-12.46.56.690000  00D6CADBE78BCD209000  00D86990DB7295A35400
2020-07-29-13.53.27.740000  00D6CADBE77162E15600  00D84A37CE9CCF71C400
2020-08-26-13.46.57.710000  00D6CACBE16C6D6D2800  00D867213DA095E82400
2020-08-26-09.16.50.090000  00D6CADBE7FB6E5FCC00  00D8670666C5C4344600
2020-08-26-14.13.13.340000  00D6CACBE2DBBC140000  00D866E87FD0328A4600
2020-07-02-08.42.53.340000  00D7A00B986CE7398600  00D82800085E355D1800
2020-08-26-10.18.22.240000  00D6CACBE1285CB92800  00D866E87F9F228F0A00
-----+---------+---------+---------+---------+---------+---

MIN_END_LRSN          MAX_END_LRSN                         
-----+---------+---------+---------+---------+---------+---
00D6CADE5D36D0BD2000  00000000000000000000  TS        22519
00D6CADE5D34CB2B6C00  00000000000000000000  TS        21092
00D6CACCE2EF932A2400  00000000000000000000  TS          698
00D6CADE5D381865B400  00D86710F530EDF11800  TS          568
00D6CACCE2F4C7A05800  00000000000000000000  TS          487
00D7A00A120AD792D800  00D8280A9A34E82F9800  TS          450
00D6CACCE344A2021200  00D866F33992A0EDD600  TS          448

Here you can see that the RTS tablespaces are in desparate need of a Full Image Copy and a MODIFY RECOVER! I hope you have no numbers as high as mine!

Oh, for the poor orphans

Yes indeed, orphans also exist in Db2 SYSLGRNX sometimes… Run this SQL to check: 

-- ORPHAN RECORDS IN SYSLGRNX - SHOULD BE NONE        
SELECT L.LGRDBID                    AS DBID           
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,L.LGRMEMB                    AS DM             
FROM BOXWELL.SYSLGRNX2    L                           
WHERE NOT EXISTS                                      
 (SELECT 1                                            
  FROM SYSIBM.SYSTABLESPACE TS                        
  WHERE TS.DBID     = L.LGRDBID                        
    AND TS.PSID     = L.LGRPSID                       
 )                                                    
  AND NOT EXISTS                                      
 (SELECT 1                                            
  FROM SYSIBM.SYSINDEXES    IX                        
  WHERE IX.DBID   = L.LGRDBID                         
    AND IX.ISOBID = L.LGRPSID                         
 )                                                    
ORDER BY LGRX_UCTIMESTAMP                              
WITH UR                                               
  ;                                                   

In my DS there are none, but when I run in my non-DS system I get:

---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----
  DBID      ID  NEG_PSID    PART  LGRX_UCTIMESTAMP            START_RBA             END_RBA            
---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----
   414       2  N              1  2020-01-06-13.31.11.100000  00000000008EB1D404EC  00000000008EB1D419EC
   414       2  N              1  2020-01-06-13.31.11.110000  00000000008EB1D438A7  00000000008EB1D49074

 

Now I happen to remember this! A colleague was “playing” with CLONE Objects and caused a bit of a problem back in the first week of the year… I have to now figure out how to get rid of these bad boys really…REPAIR should do it for me, but in a productive system I would probably contact IBM for assistance.

SYSLGRNX for Dummies?

You may have noticed that in most of the queries there is a predicate:

--AND NOT L.LGRSRBA  = L.LGRSPBA                    
  AND NOT L.LGRSLRSN = L.LGRELRSN                   

So, the question is: What??? Why would the SYSLGRNX have entries where the Start RBA was equal to the Stop RBA or the Start LRSN was equal to the End LRSN? These are special “events” or the so-called “dummy” entries. To see how many you have, run this little query:

-- SPECIAL/DUMMY ENTRIES IN SYSLGRNX                  
SELECT L.LGRDBID                    AS DBID           
      ,SUBSTR(TS.DBNAME, 1, 8)      AS DBNAME         
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,SUBSTR(TS.NAME, 1, 8)        AS SPACE          
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,'TS'                                           
      ,L.LGRMEMB                    AS DM             
FROM BOXWELL.SYSLGRNX2    L                           
    ,SYSIBM.SYSTABLESPACE TS                          
WHERE TS.DBID    = L.LGRDBID                           
  AND TS.PSID    = L.LGRPSID                          
  AND L.LGRSLRSN = L.LGRELRSN                         
--AND L.LGRSRBA  = L.LGRSPBA                          
UNION ALL                                             
SELECT L.LGRDBID                    AS DBID           
      ,SUBSTR(IX.DBNAME, 1, 8)      AS DBNAME         
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,SUBSTR(IX.INDEXSPACE, 1, 8)  AS SPACE          
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,'IX'                                           
      ,L.LGRMEMB                    AS DM             
FROM BOXWELL.SYSLGRNX2    L                            
    ,SYSIBM.SYSINDEXES    IX                          
WHERE IX.DBID    = L.LGRDBID                          
  AND IX.ISOBID  = L.LGRPSID                          
  AND L.LGRSLRSN = L.LGRELRSN                         
--AND L.LGRSRBA  = L.LGRSPBA                          
ORDER BY 7                                            
WITH UR                                               
  ;                                                   

I get data like this here on my test DS system:

---------+---------+---------+---------+---------+---------+---------+---------+
  DBID  DBNAME        ID  NEG_PSID  SPACE       PART  LGRX_UCTIMESTAMP         
---------+---------+---------+---------+---------+---------+---------+---------+
   320  R510D0DE     122  N         R510S23        1  2019-09-30-07.53.56.670000
   320  R510D0DE     122  N         R510S23        1  2019-09-30-07.53.58.880000
   348  MVNXTEST     122  N         MVNXS23        1  2019-11-07-11.24.59.920000
   348  MVNXTEST     122  N         MVNXS23        1  2019-11-07-11.24.59.950000
   348  MVNXTEST     389  N         MVNXS001       1  2019-11-07-14.48.37.780000
   348  MVNXTEST     389  N         MVNXS001       1  2019-11-07-14.48.38.890000
   348  MVNXTEST     398  N         MVNXS002       1  2019-11-07-14.48.40.050000
   348  MVNXTEST     398  N         MVNXS002       1  2019-11-07-14.48.40.090000
   348  MVNXTEST     429  N         MVNXS007       1  2019-11-07-14.48.42.200000
   348  MVNXTEST     389  N         MVNXS001       1  2020-02-21-11.25.54.480000
   348  MVNXTEST     389  N         MVNXS001       1  2020-02-21-11.25.55.560000
   348  MVNXTEST     398  N         MVNXS002       1  2020-02-21-11.25.56.680000
   348  MVNXTEST     398  N         MVNXS002       1  2020-02-21-11.25.57.700000
-------+---------+---------+---------+---------+------
START_LRSN            END_LRSN                      DM
-------+---------+---------+---------+---------+------
00D6CCFFBDC159BB2E00  00D6CCFFBDC159BB2E00  TS       2
00D6CCFFBFC00FF01200  00D6CCFFBFC00FF01200  TS       2
00D6FCF5C92F2D4F5200  00D6FCF5C92F2D4F5200  TS       2
00D6FCF5C976B5E74600  00D6FCF5C976B5E74600  TS       2
00D6FD234DC040BFE200  00D6FD234DC040BFE200  TS       1
00D6FD234EBD47CE8400  00D6FD234EBD47CE8400  TS       1
00D6FD235042B0392800  00D6FD235042B0392800  TS       2
00D6FD2350B1394C5800  00D6FD2350B1394C5800  TS       2
00D6FD2351B0B0B72A00  00D6FD2351B0B0B72A00  TS       2
00D7823BB0FED94EA800  00D7823BB0FED94EA800  TS       1
00D7823BB1C7253F4A00  00D7823BB1C7253F4A00  TS       1
00D7823BB2F61E0A3400  00D7823BB2F61E0A3400  TS       2
00D7823BB32BFC34CE00  00D7823BB32BFC34CE00  TS       2

What I can see from this, is that for any DDL CREATE with LOGGED attribute, or any LOAD REPLACE LOG(NO) utility, you get an entry that you can also find in SYSCOPY. The RECOVER utility might well need this data as well if you are rolling forward. 

The oldest are the bestest!

Finally, a simple “review” of the SYSLGRNX data sorted by LGRNX_UCTIMESTAMP. Just so you get an overview, this SQL has a FETCH FIRST 500 ONLY but feel free to change that number!

-- OLDEST VALID RECORDS IN SYSLGRNX                   
SELECT L.LGRDBID                    AS DBID           
      ,SUBSTR(TS.DBNAME, 1, 8)      AS DBNAME         
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,SUBSTR(TS.NAME, 1, 8)        AS SPACE           
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,L.LGRMEMB                    AS DM             
      ,'TS'                                           
FROM BOXWELL.SYSLGRNX2    L                           
    ,SYSIBM.SYSTABLESPACE TS                          
WHERE TS.DBID     = L.LGRDBID                         
  AND TS.PSID     = L.LGRPSID                         
--AND NOT L.LGRSRBA  = L.LGRSPBA                      
  AND NOT L.LGRSLRSN = L.LGRELRSN                     
UNION ALL                                             
SELECT L.LGRDBID                    AS DBID           
      ,SUBSTR(IX.DBNAME, 1, 8)      AS DBNAME         
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,SUBSTR(IX.INDEXSPACE, 1, 8)  AS SPACE          
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,L.LGRMEMB                    AS DM             
      ,'IX'                                           
FROM BOXWELL.SYSLGRNX2    L                           
    ,SYSIBM.SYSINDEXES    IX                          
WHERE IX.DBID   = L.LGRDBID                           
  AND IX.ISOBID = L.LGRPSID                           
--AND NOT L.LGRSRBA  = L.LGRSPBA                      
  AND NOT L.LGRSLRSN = L.LGRELRSN                     
ORDER BY LGRX_UCTIMESTAMP                             
FETCH FIRST 500 ROWS ONLY                              
WITH UR                                               
  ;                                                   

Here you can see that I rarely ever image copy the Db2 catalog…

---------+---------+---------+---------+---------+---------+---------+---------+
  DBID  DBNAME        ID  NEG_PSID  SPACE       PART  LGRX_UCTIMESTAMP         
---------+---------+---------+---------+---------+---------+---------+---------+
     6  DSNDB06     2002  N         SYSTSIPT       1  2019-09-28-13.45.32.200000
     6  DSNDB06     2000  N         SYSTSIXS       1  2019-09-28-13.45.32.250000
     6  DSNDB06     2016  N         SYSTSTSP       1  2019-09-28-13.46.59.970000
     6  DSNDB06     2012  N         SYSTSTPT       1  2019-09-28-13.46.59.970000
     6  DSNDB06     2034  N         SYSTSSTG       1  2019-09-28-13.48.02.210000
     1  DSNDB01      148  N         DSNSPT02       0  2019-09-28-13.51.16.620000


-------+---------+---------+---------+---------+------
START_LRSN            END_LRSN                  DM   
-------+---------+---------+---------+---------+------
00D6CACA98D8486B6C00  00D6CACCE2F5E2B14600       1  TS
00D6CACA99560741CC00  00D6CACCE2F9B5CCD800       1  TS
00D6CACAEC7D724A9600  00D6CACCE2F31722C800       1  TS
00D6CACAEC7C8EA08A00  00D6CACCE2F7C7553800       1  TS
00D6CACB27FFD7A40200  00D6CACCE2F214B70E00       1  TS
00D6CACBE127037D0A00  00D6CACCE31529801400       1  IX
00D6CACBE1279F5EA000  00D6CACCE31662377800       1  IX

What’s in a name?

Looking at the outputs you can see the NEG_PSID column. This is set based on whether or not the original PSID in the SYSLGRNX was negative or not. If it is negative it means you are dealing with a CLONE object, so buyer beware on those entries!

The DM column is the Data-sharing member number, it is 0 (zero) for Non-DS and 1 to 32 for DS systems. Sometimes you can get a clue about why an entry is there when you can link it to a member. Use the -DISPLAY GROUP command to get the number for your member:

DSN7100I  -SC10 DSN7GCMD                                            
*** BEGIN DISPLAY OF GROUP(GSC10C11) CATALOG LEVEL(V12R1M505)       
                  CURRENT FUNCTION LEVEL(V12R1M505)                 
                  HIGHEST ACTIVATED FUNCTION LEVEL(V12R1M505)       
                  HIGHEST POSSIBLE FUNCTION LEVEL(V12R1M506)        
                  PROTOCOL LEVEL(2)                                  
                  GROUP ATTACH NAME(SC1 )                           
---------------------------------------------------------------------
DB2          SUB                     DB2    SYSTEM    IRLM          MEMBER   ID  SYS  CMDPREF   STATUS   LVL    NAME      SUBSYS IRLMPROC
-------- --- ---- --------  -------- ------ --------  ----   --------
MEMSC10    1 SC10 -SC10     ACTIVE   121507 S0W1      JC10   SC10IRLM
MEMSC11    2 SC11 -SC11     QUIESCED 121507 S0W1      JC11   SC11IRLM
---------------------------------------------------------------------

Here the ID column is the member number. It is possible you might have DM ids for members that no longer exist… these must also be cleared out as well!

As always, any questions or comments would be most welcome!

TTFN,

Roy Boxwell