2020-11 Recover Résumé

This month I hope to answer a reader question – I was asked all about when/how to use the TOLOGPOINT phrase in a RECOVER utility. So here is my RECOVER résumé.

What is it?

RECOVER is one of those utilities that you hear a lot about but normally do not get to use that often! This is what makes it so dangerous and so interesting at the same time of course. We should all be using it on a regular basis so that when the time comes we are not afraid, or left dithering, while desperately trying to come up to speed with a “new” Utility at 03:00 in the morning…

What does it do?

It recovers – what else, stupid! Well, actually it does a ton of stuff, but at its most basic it takes a dataset (this can be any valid type of COPY’ed dataset) and recreates a VSAM Linear Dataset (LDS) from it. Typically, a Full Image Copy (FIC) is used as input but it can, and does, use Incremental Image Copies (IIC) as well. Finally, it can also apply log updates to get the LDS to the point you wish it to be at, but note that this whole process is only forward in nature. RECOVER also has BACKOUT YES, which starts at the current LDS and UNDOes updates against the object using the log. It could well be that going backwards is actually faster than applying an old image copy and going forwards! There is one small limit to BACKOUT YES:

If you specify BACKOUT YES, the recovery point must be within the most recent Db2 system checkpoints that are recorded in the BSDS for each member. Otherwise, the recovery cannot proceed and returns an error.

That’s all?

Basically yes! It is designed to get data back to a point at which you are happy that it is good and consistent. Note here that *you* must be happy. Db2 does not give a hoot about transactional consistency in this case because you are doing the recover. RECOVER guarantees that a Unit Of Work (UOW) will always be consistent with COMMITted data and indexes will also be consistent – this might well mean that you must rebuild or recover your indexes, naturally!

What is recovered?

Recover uses database and space name as input so it is instantly apparent that the “unit of recovery” is *not* a table but a tablespace (which can contain 1000’s of tables) and/or an index (Remember that an indexspace can only contain one index). There is an exception here and that is, you can also give a Page Number to recover but that is very rarely used these days as IO Errors really are a thing of the past.

The phases of Recover

First it accesses SYSIBM.SYSCOPY to see which dataset(s) it must use. There are two exceptions: with the LOGONLY option RECOVER assumes that some other process got the LDS back, and with the new NOSYSCOPY parameter you give it a dataset name that has been deleted from SYSCOPY, presumably by a harsh usage of MODIFY RECOVERY. It then checks SYSIBM.SYSLGRNX to build a list of log datasets that possibly must be scanned for recovery info (unless you use LOGRANGES NO, which is not recommended!). This is also what the REPORT RECOVERY utility does, and you should run that utility every now and again to get used to the style and type of data we are talking about.

Are you consistent?

If using FlashCopy consistent copies, it then runs two early phases of work: the first looks for work that was still uncommitted at the point of the FlashCopy, and was therefore backed out of the copy, and the second reapplies that work.

The real work

Then comes the LOGAPPLY phase where, using data from SYSLGRNX, it reads and applies all log records from the FIC/IIC up to the recovery point you wished for. Here the Fast Log Apply (FLA) greatly speeds up the entire process.

FLA?

Fast Log Apply uses up to 10MB of memory per job to buffer the LOG being used for the RECOVER, which makes the LOGAPPLY phase much quicker. The ZPARM LOGAPSTG (deprecated in Db2 9) had a maximum of 100MB, which meant you could run 10 RECOVER jobs in parallel, each with 100’s of RECOVERs, and still benefit from FLA. Then in Db2 10 they upped the maximum to 510MB by APAR, while at the same time removing the ZPARM, meaning you could do 51 parallel jobs.

FLA – What does it do?

As William Favero wrote many years ago: Fast log apply was first delivered in DB2 Version 6 and as the name implies, aids in processing log records by sorting log records by page before applying log record changes. It uses one log read task per recovery job and multiple log apply tasks whenever possible. By sorting the log records in page order prior to applying them, fast log apply can almost completely eliminate the need to touch a page set page more than once. This process can drastically reduce the log apply times. This is not just used by RECOVER but for today’s blog we can ignore all the other uses of FLA.
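To make the "sort by page first" idea a bit more tangible, here is a tiny Python sketch. It is only an illustration of the principle, not how Db2 implements FLA: the record layout (page number, LRSN, change) and all the names are invented for the example.

```python
from collections import defaultdict

# Hypothetical log records in log order: (page_number, lrsn, change)
log_records = [
    (7, 101, "upd A"),
    (3, 102, "upd B"),
    (7, 103, "upd C"),
    (3, 104, "upd D"),
    (9, 105, "upd E"),
]

def group_by_page(records):
    """Group log records by page, keeping log order within each page."""
    by_page = defaultdict(list)
    for page, lrsn, change in records:
        by_page[page].append((lrsn, change))
    # Pages are visited in ascending order, so each page is touched once.
    return [(page, sorted(changes)) for page, changes in sorted(by_page.items())]

def count_page_visits(records):
    """Count how often the current page changes when applying in log order."""
    visits, current = 0, None
    for page, _lrsn, _change in records:
        if page != current:
            visits += 1
            current = page
    return visits

plan = group_by_page(log_records)
visits_in_log_order = count_page_visits(log_records)
visits_after_sort = len(plan)
print(visits_in_log_order, "page visits in log order,", visits_after_sort, "after sorting")
```

With these five made-up records the apply in log order jumps between pages five times, while the sorted plan touches each of the three pages exactly once.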

Enough of FLA, back to the point, well at least TOLOGPOINT?

Then come two special phases but only if using TOLOGPOINT – See later for details about this parameter. Here the recover LOGCSR phase builds info for any inflight, indoubt, inabort, and postponed abort units of recovery (Transactions) to be applied and the LOGUNDO phase rolls back any uncommitted changes that transactions made to any of the recovered objects.

What’s in a name?

Recover accepts four (well, actually five…) different parameters telling it what to do at the highest level:

  1. TOCOPY – recover the named object to this IC.
  2. TORBA – recover up to and including this Relative Byte Address (RBA) on the log. These days it is recommended not to use this anymore but to use the next option. See more later.
  3. TOLOGPOINT – recover up to and including this log point (an RBA or, in data sharing, an LRSN, which is effectively a timestamp).
  4. TOLASTCOPY – recover up to the last copy taken – This can be a FIC or an IIC.
  5. TOLASTFULLCOPY – recover up to the last FIC.

RBA or LOGPOINT?

If you are running non-data sharing then an RBA is the same as a LOGPOINT and so you can use TOLOGPOINT with no worries. If, however, you are in data sharing then you can *only* use TORBA for the time span before this data sharing group was created. Thus, to keep things easier, it makes sense to just use TOLOGPOINT.

Is it safe?

The problem here is the famous saying “A little knowledge is dangerous”. It is recommended by IBM that the use of TOCOPY, TOLOGPOINT (TORBA), TOLASTCOPY and TOLASTFULLCOPY be restricted to senior users who know exactly what they are doing! Why so? Because all of these options are so called “Point in Time” (PiT) recoveries and they are all special!

Aren’t we all special?

Nope! If you do a “normal” RECOVER it brings you back to the current time. All of your data is back as if the transactions had actually all executed. This is fine if you have had a media failure etc. The problem is that most of the recoveries done these days are due to bad application code. This means you have to recover to a Point in Time when the bug was not live…

The problems of PiT

The absolute biggest problem is data consistency. I am not even talking about referential integrity (RI) or Check Rules here! Imagine you are a bank and someone withdraws 200 Dollars at 10:01 in the morning. You do a PiT to 10:00 and what is the end result? Your data is possibly ok and consistent but the end user has got 200 bucks and your bank hasn’t… this is traditionally known as “not good”…

Then come all the “other problems” – RI is number one here… You recover a dependent table to a PiT where the parent table has no row… this is “sub-optimal”. Or, even worse IMHO, you do a PiT to a point where a Check Constraint was added… this is 100% Nightmare time…

The problem is…

We are not alone! The objects that we have and use are all defined and managed in the Db2 Catalog, and any PiT you do to so-called “user data” has no knowledge of, and no effect on, the Db2 Catalog data. Here is where all the really nasty problems lurk around and loiter in the shadows… The basic rule is “If you changed anything in the catalog – check it all before doing the PiT!” It is much easier to check it all beforehand – doing it afterwards can be especially painful!

SLB?

System Level Back-up came in a while ago and RECOVER got updated to also be able to use it as a basis for recovery data. It started out with quite a few Ifs and Buts e.g. No REORG, REBUILD, LOAD REPLACE etc. but these all disappeared with z/OS 1.11 as then it could handle when datasets moved around the disks. There are still two problem areas where SLBs cannot be used as input and that is when you have done a REORG to remove a dropped column or you have done a REORG to remove unused partitions at the end. Apart from these two minor irritations SLBs are just as good as FICs.

In depth per option

All of the TOxxxCOPYs do the same thing. They recover the object to that COPY and thus use *no* log access. If the IC is not available, for whatever reason, Db2 falls back to the next earliest IC and then *does* do a LOGAPPLY up to the PiT of the requested IC.

TOLOGPOINT. Recover chooses the newest FIC before this point. If any IICs exist these are merged, then it does LOGAPPLY up to and including the log record at the LOGPOINT. If there is no FIC or SLB, it will use *just* LOGAPPLY, starting from the creation of the object, a LOAD TS or a REORG TS. Naturally, this LOGAPPLY is only possible as long as you have *not* run any MODIFY RECOVERY that deleted records from the SYSLGRNX. This is why MODIFY RECOVERY actually inserts records in SYSCOPY!

Any uncommitted transactions running at the PiT are rolled back leaving them all in a transactionally consistent state.
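The selection logic for TOLOGPOINT described above can be sketched in a few lines of Python. This is just my reading of the description, not Db2's code: the copy list is invented, log points are plain integers instead of 10-byte RBAs/LRSNs, and the function name is hypothetical. The 'F' and 'I' codes mirror the ICTYPE values for full and incremental copies in SYSCOPY.

```python
def plan_tologpoint(copies, logpoint):
    """Return (base_full_copy, incrementals_to_merge, log_apply_start).

    copies   -- list of (ictype, start_point); 'F' = full, 'I' = incremental
    logpoint -- the requested recovery point
    """
    fulls = [point for ictype, point in copies
             if ictype == "F" and point < logpoint]
    if not fulls:
        # No usable full copy: recovery would have to rely on LOGAPPLY
        # alone, so there is no base copy and nothing to merge.
        return None, [], None
    base = max(fulls)  # newest FIC before the log point
    incrementals = sorted(point for ictype, point in copies
                          if ictype == "I" and base < point < logpoint)
    # Log apply starts after the last copy that was used.
    log_apply_start = incrementals[-1] if incrementals else base
    return base, incrementals, log_apply_start

# Hypothetical copy history for one tablespace
copies = [("F", 100), ("I", 150), ("F", 200), ("I", 250), ("I", 300), ("F", 400)]
plan = plan_tologpoint(copies, logpoint=320)
no_copy_plan = plan_tologpoint(copies, logpoint=50)
print(plan)
```

For a log point of 320 the sketch picks the full copy at 200, merges the incrementals at 250 and 300, and leaves the range from 300 to 320 for LOGAPPLY. The full copy at 400 is ignored because it is after the requested point.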

Scoping out problems?

New in Db2 12 is the parameter SCOPE with default value UPDATED. Now this was quite a good idea as it meant that RECOVER now only does real work for objects that have really *changed* since the last FIC. It does this little bit of magic by looking into the SYSLGRNX to see whether or not any changes have actually been processed. This is also a serious problem! John Campbell tells us all “Please change all RECOVER jobs to use SCOPE ALL or make sure that PTF PH20056 is applied”.

Where’s the beef?

It is a well known fact that the road to disaster is paved with good intentions. The SCOPE UPDATED sounds great at first but then what would happen if you are using LISTDEF to do a PiT and the Db2 Catalog is involved? Further what happens if you, purely by bad luck, RECOVER SYSLGRNX before any other tablespaces? Yep, the “bit of magic” then has no data and is effectively blind and you get seriously bad recovered data.

Top Tip

If you want to use TOxxxCOPY but *require* consistency just use TOLOGPOINT instead and simply specify the LOGPOINT of the IC you wished to use anyway. This way Db2 guarantees that even SHRLEVEL CHANGE copies will not cause a problem!

Pain Points

RI – This is the number one bugbear! You must make sure that all objects in the RI chain get the *same* PiT. Failure to do so will leave you in CHECK pending and possibly with a nasty mess to sort out!

Check Constraint added – If one was added, try to make sure you hit a quiesce point or FIC; otherwise verify that the PiT is before the creation timestamp of the constraint.

Catalog – As mentioned the catalog is a description of the objects and is *not* automatically updated by recover. This is for the carbon based beings to sort out. ALTER can be especially annoying here! Plus SYSLGRNX as mentioned above.

Identity Columns – These will no longer be in sync with SYSSEQUENCES and so care must be taken if they *must* be synchronized.

Compression dictionaries can kill you if you recover a single LDS of a multiple LDS but with a different version of the dictionary. Then the data is naturally not readable any more!

How to find a LOGPOINT

We have now seen how it all hangs together so the big question now is “How do you find an acceptable TOLOGPOINT?”

The xxxxMSTR address space contains numerous LOGPOINTs:

06.24.45 STC03367 DSNJ003I -SC10 DSNJOFF3 FULL ARCHIVE LOG VOLUME  
DSNAME=DB2SC1.SC10.ARCHLG1.D20280.T0624437.A0000374,
STARTRBA=00D6CAB49FFF50D10000, ENDRBA=00D6CAB49FFF52ECFFFF,
STARTLRSN=00D8A08DB179EBDBB800, ENDLRSN=00D8A09432EF45F1A800,
UNIT=SYSALLDA, COPY1VOL=SC1L00, VOLSPAN=00, CATLG=YES

From this message you get the ENDLRSN. If you convert the LRSN to a timestamp you get:

SELECT TIMESTAMP(BX'00D8A09432EF45F1A800' CONCAT BX'000000000000')
FROM SYSIBM.SYSDUMMY1 ;
---------+---------+---------+---------+---------+---------+------
2020-10-06-04.24.43.727967
DSNE610I NUMBER OF ROWS DISPLAYED IS 1
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

So you now have an LRSN and a timestamp to work with. Be aware that the LRSN is in UTC, *not* local time: as you can see here it is 04.24, but the message in the MSTR was reported at 06.24 because Berlin time is UTC+2 in summer.
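If you want to do the same conversion away from the mainframe, a few lines of Python are enough. This is a minimal sketch under two assumptions: the data sharing group has no LRSN (STCK) offset, and leap seconds are ignored. The function name is mine. The result is a UTC value, and adding the two hours of Berlin summer time gets you to the 06.24 shown in the MSTR.

```python
from datetime import datetime, timedelta

TOD_EPOCH = datetime(1900, 1, 1)  # the TOD clock (STCK) counts from here, in UTC

def lrsn_to_utc(lrsn_hex):
    """Decode an extended (20 hex digits) or basic (12 hex digits) LRSN."""
    lrsn_hex = lrsn_hex.strip().upper()
    if len(lrsn_hex) == 20:
        # Extended format: skip the leading byte, take the 8-byte TOD value
        tod = int(lrsn_hex[2:18], 16)
    elif len(lrsn_hex) == 12:
        # Basic format: the high-order 6 bytes of the 8-byte TOD value
        tod = int(lrsn_hex, 16) << 16
    else:
        raise ValueError("expected 12 or 20 hex digits")
    micros = tod >> 12  # bit 51 of the TOD clock is one microsecond
    return TOD_EPOCH + timedelta(microseconds=micros)

end_lrsn_utc = lrsn_to_utc("00D8A09432EF45F1A800")
end_lrsn_berlin = end_lrsn_utc + timedelta(hours=2)  # UTC+2 in October 2020
print(end_lrsn_utc)     # 2020-10-06 04:24:43.727967
print(end_lrsn_berlin)  # 2020-10-06 06:24:43.727967
```

The first value matches the SQL result above to the microsecond. A basic 6-byte LRSN only has a granularity of 16 microseconds, so the same point in basic format decodes to a value a few microseconds earlier.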

Another possibly good starting point is to run the DSNJU004 routine and look at the CHECKPOINT QUEUE right at the very end of the output:

                    CHECKPOINT QUEUE 
09:20:31 OCTOBER 12, 2020
TIME OF CHECKPOINT 04:24:46 OCTOBER 06, 2020
BEGIN CHECKPOINT RBA 00D6CAB49FFF5308C5A8
END CHECKPOINT RBA 00D6CAB49FFF530A143A
END CHECKPOINT LRSN 00D8A09435A32E201600
TIME OF CHECKPOINT 03:55:38 OCTOBER 06, 2020
BEGIN CHECKPOINT RBA 00D6CAB49FFF511E3612
END CHECKPOINT RBA 00D6CAB49FFF511F5CB8
END CHECKPOINT LRSN 00D8A08DB297403D0E00

Full of RBAs and LRSNs – Great stuff indeed! Plus all the normal Active and Archive log details.
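If you save the DSNJU004 output to a file, picking the checkpoints out of it is easy to script. Here is a small Python sketch that only understands the lines exactly as they appear in the extract above; the real report layout may differ between versions, and the function and key names are my own invention.

```python
import re
from datetime import datetime

MONTHS = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
          "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]

TIME_LINE = re.compile(
    r"TIME OF CHECKPOINT\s+(\d\d):(\d\d):(\d\d)\s+([A-Z]+)\s+(\d\d?),\s+(\d{4})")
POINT_LINE = re.compile(r"(BEGIN|END) CHECKPOINT (RBA|LRSN)\s+([0-9A-F]+)")

SAMPLE = """\
                    CHECKPOINT QUEUE
09:20:31 OCTOBER 12, 2020
TIME OF CHECKPOINT 04:24:46 OCTOBER 06, 2020
BEGIN CHECKPOINT RBA 00D6CAB49FFF5308C5A8
END CHECKPOINT RBA 00D6CAB49FFF530A143A
END CHECKPOINT LRSN 00D8A09435A32E201600
TIME OF CHECKPOINT 03:55:38 OCTOBER 06, 2020
BEGIN CHECKPOINT RBA 00D6CAB49FFF511E3612
END CHECKPOINT RBA 00D6CAB49FFF511F5CB8
END CHECKPOINT LRSN 00D8A08DB297403D0E00
"""

def parse_checkpoint_queue(text):
    """Return one dict per checkpoint: time, begin_rba, end_rba, end_lrsn."""
    checkpoints = []
    for line in text.splitlines():
        line = line.strip()
        m = TIME_LINE.match(line)
        if m:
            hh, mm, ss, month, day, year = m.groups()
            when = datetime(int(year), MONTHS.index(month) + 1, int(day),
                            int(hh), int(mm), int(ss))
            checkpoints.append({"time": when})
            continue
        m = POINT_LINE.match(line)
        if m and checkpoints:
            key = (m.group(1) + "_" + m.group(2)).lower()
            checkpoints[-1][key] = m.group(3)
    return checkpoints

def last_checkpoint_before(checkpoints, when):
    """Newest checkpoint taken at or before the given time, or None."""
    candidates = [c for c in checkpoints if c["time"] <= when]
    return max(candidates, key=lambda c: c["time"]) if candidates else None

checkpoints = parse_checkpoint_queue(SAMPLE)
chosen = last_checkpoint_before(checkpoints, datetime(2020, 10, 6, 4, 0, 0))
print(chosen["time"], chosen["end_lrsn"])
```

Asking for the last checkpoint before 04:00 on 6th October returns the 03:55:38 entry and its END CHECKPOINT LRSN, which is then a candidate starting value when hunting for a LOGPOINT.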

I have a horrible old REXX where I can enter a timestamp and it outputs the LRSN. You must manually edit it but it does work!

/* REXX */ 
NUMERIC DIGITS 64
RBAFORMAT = 'E' /* Output format (E = extended, B = basic) */
OFFSET = '000000000000' /* Set to STCK offset of data sharing group */
CVT = C2D(STORAGE(10, 4))
CHKBYTE = STORAGE(D2X(CVT + 304), 1)
CVTTZ = STORAGE(D2X(CVT + 304), 4)
IF BITAND(CHKBYTE, '80'X) = '80'X THEN CVTTZ = C2D(CVTTZ, 4)
ELSE CVTTZ = C2D(CVTTZ)
CVTTZ = CVTTZ * 1.048576 / 3600
CVTTZ = FORMAT(CVTTZ, 2, 0)
SAY 'Timezone diff. hours ' CVTTZ
MS = (DATE('BASE') - 693595) * 24 * 60 * 60
MS = MS + TIME('SECONDS')
MS = MS - (CVTTZ * 60 * 60)
MS = MS * 1000000
SAY 'Today MS = ' MS
LRSNBIN = X2D(SUBSTR(D2X(MS), 1, 12))
IF LENGTH(OFFSET) = 20 THEN OFFSET = SUBSTR(OFFSET, 3, 12)
LRSN = D2X(LRSNBIN + X2D(OFFSET))
IF RBAFORMAT = 'E' THEN LRSN = '00'LRSN'000000'
SAY 'Current LRSN (with STCK offset 'OFFSET') = 'LRSN
/* Now do for any time e.g. */
/* 2019-11-08-11.59.21.445591 00D6FE3F55460D3AD600 From syscopy */
/* output 00D6FE3F55460D000000 */
fromdate = '2019-11-08 11.59.21' /* example */
MS = (DATE('BASE','8 Nov 2019') - 693595) /* Here the date */
MS = MS * 24 * 60 * 60
MS = MS + ( 11 * 60 * 60) /* Here the hours */
MS = MS + ( 59 * 60) /* Here the minutes */
MS = MS + 21 /* Here the seconds */
MS = MS - (CVTTZ * 60 * 60)
MS = MS * 1000000
MS = MS + 445591 /* Here the rest seconds */
SAY 'From date MS = ' MS
LRSNBIN = X2D(SUBSTR(D2X(MS), 1, 12))
IF LENGTH(OFFSET) = 20 THEN OFFSET = SUBSTR(OFFSET, 3, 12)
LRSN = D2X(LRSNBIN + X2D(OFFSET))
IF RBAFORMAT = 'E' THEN LRSN = '00'LRSN'000000'
SAY 'From date = ' fromdate
SAY 'From date LRSN (with STCK offset 'OFFSET') = 'LRSN

Feel free to change it to allow the date/time/seconds as a parameter – I never got the time – pardon the pun…
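For anyone who would rather not edit REXX, here is the same calculation as a parameterised Python sketch. It follows the REXX logic step by step (microseconds since 1900 in UTC, keep the high-order six bytes, add the STCK offset), but the function name and parameters are mine, and the time zone offset has to be passed in because there is no CVT to read it from. Like the REXX, it ignores leap seconds.

```python
from datetime import datetime, timedelta

TOD_EPOCH = datetime(1900, 1, 1)  # start of the TOD clock (STCK), in UTC

def timestamp_to_lrsn(local_ts, utc_offset_hours=0,
                      stck_offset="000000000000", extended=True):
    """Convert a local timestamp to an LRSN hex string.

    utc_offset_hours -- local time minus UTC (e.g. 1 for Berlin in winter)
    stck_offset      -- STCK offset of the data sharing group, 12 or 20 hex digits
    extended         -- True for the 10-byte format, False for the 6-byte format
    """
    if len(stck_offset) == 20:
        stck_offset = stck_offset[2:14]  # same as SUBSTR(OFFSET, 3, 12) in the REXX
    utc = local_ts - timedelta(hours=utc_offset_hours)
    delta = utc - TOD_EPOCH
    micros = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
    # Dropping the low four bits gives the 6-byte LRSN (16 microsecond units)
    lrsn6 = (micros >> 4) + int(stck_offset, 16)
    basic = format(lrsn6, "012X")
    return "00" + basic + "000000" if extended else basic

syscopy_example = timestamp_to_lrsn(datetime(2019, 11, 8, 11, 59, 21, 445591),
                                    utc_offset_hours=1)
print(syscopy_example)  # 00D6FE3F55460D000000
```

The example reproduces the value noted in the REXX comments for the SYSCOPY timestamp 2019-11-08-11.59.21.445591, assuming it was taken at UTC+1.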

SYSCOPY contains LRSNs and TIMESTAMPs and can be a great starting point. SYSLGRNX obviously is nowadays also worth a look and the RTS also contains LRSNs and TIMESTAMPs.

REPORT RECOVERY also lists out a ton of info including LRSNs and timestamps of course.

Ready, Steady, RECOVER!

So with all the sources available and the ability to switch LRSN to Timestamp and vice versa you are now fully prepared to use a PiT. Just remember to think it all through, get the application people on board and get the plan approved before you start doing RECOVER!

Redirected RECOVER?

Testing all this is fraught with danger of course… You cannot simply do a RECOVER of a production table while *anything* is running in production. So how can you test whether or not your RECOVER JCL is correct and whether or not your Service Level Agreements are met by your RECOVER JCL? IBM Db2 development have now solved this for us with “Redirected RECOVERY” (APAR PH27043, PTF UI72057) for Db2 12 FL500 and above. It adds a new parameter to the RECOVER syntax, “FROM xxx.yyy”, so you can RECOVER an unused “dummy” object from a live production object with *no* interference! There are, as usual, a couple of caveats: the “dummy” space must be “the same” as the to-be-recovered space. Think table defs, DSSIZE, BUFFERPOOL etc. here. Finally, only UTS are supported at this time.

I would love to hear of any of your war stories about RECOVER use or abuse!

Finally, many thanks to Vasu for asking me to dig deeper into RECOVER.

TTFN

Roy Boxwell

 

2020-10 Mapping Table Mystery

We all know, and love, REORG Mapping tables, right? Well, this month I want to run through the ins and outs of defining and using these little beasts!

In the Beginning

In the beginning, well DB2 V5 actually, there was a requirement for a list of Row Ids (RIDs) to be stored so that the SHRLEVEL CHANGE REORG could map between an original RID and the new “shadow” RID. The mapping table was created with normal DDL:

CREATE TABLESPACE table-space-name SEGSIZE integer;

CREATE TABLE table-name1
(TYPE CHAR(1) NOT NULL,
SOURCE_RID CHAR(5) NOT NULL,
TARGET_XRID CHAR(9) NOT NULL,
LRSN CHAR(6) NOT NULL)
IN database-name.table-space-name
;

 CREATE TYPE 2 UNIQUE INDEX index-name1 ON table-name1
 (SOURCE_RID ASC,
TYPE,
TARGET_XRID,
LRSN)
;

Notice that the TARGET_XRID is nine characters even though a RID is, at least way back then, actually five bytes!

Tablespace or Index space size?

What was interesting from the start, was that the size of the tablespace was irrelevant – it was *only* the index that was ever used. So I saw a ton of tablespaces with 7200, 720 allocations, all of which were completely pointless. I always ALTERed these back to 48, 48. Actually, I normally just DROPped and CREATEd them again!

It is also interesting to see how many shops actually do COPY, RUNSTATS, and even REORGs of these tablespaces… Completely pointless of course as they are not used by normal SQL, never used for RECOVERY, and a waste of time for all utilities. If possible, EXCLUDE them from all utilities!

Static or Dynamic?

So shops started heading off in two directions: Some created 1000’s of mapping tables, all with Job name as schema or table name, so that they could just use generator software and always “know” that the mapping table would be there. The other group decided to add a DROP and CREATE step at the beginning of the REORG job, and a DROP at the end. This was known as “dynamic” mapping tables.

Static meant less traffic to the catalog, but 1000’s of tables loafing around doing nothing. Dynamic meant DB2 Catalog traffic, but only a few tables existing and only for the lifetime of the REORG. Basically, it was down to every shop to work out their best way to handle these tables.

Changes through the releases

In DB2 V7 it was even recommended to name the table the same as the Utility ID and as these, by definition, are unique, you could never hit the “duplicate name” problem.

In DB2 V8 it was noted that REORG always empties the table after completion.

In DB2 V9 you could now also use a PBG for the tablespace. This is actually required if you have more than two billion rows in the tablespace to be REORGed.

CREATE TABLESPACE table-space-name MAXPARTITIONS integer;

instead of the segmented syntax.

All new ball game

In Db2 11 it all changed! The ability was given to use a MAPPINGDATABASE or even nothing at all! This latter option seemed great, at first glance, but then it became apparent that it caused Db2 catalog contentions… not good for massively parallel REORG jobs!

The MAPPINGDATABASE xxxxx value overrides the ZPARM REORG_MAPPING_DATABASE in cases where you would like to use another “location” for your implicit tables. From the docu:

REORG MAPPING DB field (REORG_MAPPING_DATABASE subsystem parameter)
The REORG_MAPPING_DATABASE subsystem parameter specifies the default database that REORG TABLESPACE SHRLEVEL CHANGE uses to implicitly create the mapping table.

Acceptable values: database-name
Default: blank
Update: option 31 on panel DSNTIPB
DSNZPxxx: DSN6SPRM.REORG_MAPPING_DATABASE
Online changeable: Yes
Data sharing scope: Member

database-name
The name of the default database that REORG TABLESPACE SHRLEVEL CHANGE uses to implicitly create the mapping table. The name must be a character string with a maximum length of 8 bytes.
blank
An implicitly defined database is used.

When processing a REORG TABLESPACE SHRLEVEL CHANGE request, the REORG utility has the option to create its own mapping table and mapping index, instead of relying on user's input. Specifying this subsystem parameter with a valid database name directs REORG to allocate the mapping table in the database that is specified. By default, REORG uses an implicitly defined database for the mapping table allocation.

So nearly all shops have <blank> – which led to the creation of hundreds of implicit databases…

Top tip: Create a “master” Database that is *just* for REORG SHRLEVEL CHANGE and change the ZPARM to use it. Then you have a really good high level way to exclude utilities from bothering with these tables and you stop using up your DBIDs!

Also in Db2 11, the mapping table got updated as the LRSN increased to ten bytes…

<optional>CREATE TABLESPACE table-space-name SEGSIZE integer;
or
<optional>CREATE TABLESPACE table-space-name MAXPARTITIONS integer;

CREATE TABLE table-name1
(TYPE CHAR(1) NOT NULL,
SOURCE_RID CHAR(5) NOT NULL,
TARGET_XRID CHAR(9) NOT NULL,
LRSN CHAR(10) NOT NULL)
<optional>IN database-name.table-space-name
;

CREATE UNIQUE INDEX index-name1 ON table-name1
 (SOURCE_RID ASC,
TYPE,
TARGET_XRID,
LRSN);

Actually, IBM changed the names of the columns as well (Though the column names are never really used…) so the CREATE really looks like:

<optional>CREATE TABLESPACE table-space-name SEGSIZE integer;
or
<optional>CREATE TABLESPACE table-space-name MAXPARTITIONS integer;

CREATE TABLE table-name1
(TYPE CHAR(1) NOT NULL,
ORID CHAR(5) NOT NULL,
NRID CHAR(9) NOT NULL,
LRSN CHAR(10) NOT NULL)
<optional>IN database-name.table-space-name
;

CREATE UNIQUE INDEX index-name1 ON table-name1
(ORID ASC,
TYPE,
NRID,
LRSN)
;

If this format was not available in NFM, then REORG created a table to use on its own using either an implicit database, or the database from the ZPARM, if specified. This uses up another DBID of course.

All change!

Then along came Db2 12 and they changed the mapping table again…

<optional>CREATE TABLESPACE table-space-name SEGSIZE integer;
or
<optional>CREATE TABLESPACE table-space-name MAXPARTITIONS integer;

CREATE TABLE table-name1
(TYPE CHAR(1) NOT NULL,
ORID CHAR(7) NOT NULL,
NRID CHAR(11) NOT NULL,
LRSN CHAR(10) NOT NULL)
<optional>IN database-name.table-space-name
;

CREATE UNIQUE INDEX index-name1 ON table-name1
(ORID ASC,
TYPE,
NRID,
LRSN)
;

This time, it was the RIDs expanding by two bytes each that forced the change. Just like in Db2 11 NFM, if a mapping table in this format is not available in Db2 12 FL500 and above, then REORG creates a table to use on its own, using either an implicit database or the database from the ZPARM if specified.

Enough history…

So now we know how the table looks and how it is used and defined. The question is: Which of the various methods is the best?

Method 1: Define 1000’s of static mapping tables.

Method 2: Create each table in each utility job using the utility id as the mapping table name.

Method 3: Create a “master” Database and use it in the MAPPINGDATABASE utility parameter or ZPARM, thus allowing Db2 to create all tablespaces, tables, and indexes in it.

Method 4: Do nothing, have no MAPPINGxxx utility parameter and let Db2 “do the magic.”

Method 1: Is a bit mad, as whenever IBM development change the definition, you have 1000’s of changes to do and they do clutter up the Db2 catalog. However if you are paranoid about contention and you are doing lots, and I really mean *lots*, of parallel REORGS then this method is probably the best way to avoid DBD contention.

Method 2: Is not bad, as you only create as many objects as you have jobs and they are always dropped at the end, so it reduces clutter but it does increase the risk of Db2 Catalog contention.

Method 3: My favorite! (Unless doing *lots* of parallel REORGS!) You have one DB with one DBID and you can easily exclude it from other utilities and cleanup, and recreate is a simple DROP and CREATE. From my point of view, this is a winner! There is still Db2 Catalog contention of course as the TS, TB and IX must all be created before all being thrown away at the end of the REORG.

Method 4: Is the worst of them all IMHO. You will run through your DBIDs quicker than you can blink, and you increase contention on the Db2 Catalog as it must also create your implicit database all the time. This is for test only! However, it does have the advantage of a new DBID every time and so a very low chance of DBD contention if you have lots of parallel running REORGS.

Results from my tests

Using static caused no trouble anywhere and DBIDs did not get wasted.

Using dynamic, including creation of the database, allowed DBID reuse.

Using dynamic without the database also allowed DBID reuse as the DROP TABLESPACE also dropped the implicitly created database.

Using dynamic without the database & tablespace caused a Widow database to be left after dropping of the TABLE, (remember the current rules on dropping implicitly created databases? If you drop the Table then the Tablespace gets dropped but not the Database!) This is pretty bad of course, as not only are your DBIDs going up they are being left “used” which is not good!

Using MAPPINGDATABASE with an existing database works, as I mentioned, great and is my favorite! (Remember – Only if not doing massively parallel REORGS)

What are your thoughts on mapping table solutions?

I would love to hear from you!

TTFN

Roy Boxwell

Update: Michael Harper contacted me with some comments and I summarize them below:

  1. Definitely use a default mapping database, as this will reduce contention issues – do not leave the ZPARM blank
  2. If not using shared ZPARMs, also use a different default mapping database per member, as again this will reduce contention issues

Update: One of my readers pointed out that when running large numbers of parallel REORGS (Over 100!) you can hit DBD contention issues if you use one DB for the mapping Tables. In this scenario you either require a dynamic table being created for each job in its own database or use the implicit databases (DSNnnnnn) and make sure you DROP any widow style DBs.

Another Update: One of my readers mentioned that you can still get contention when you or Db2 create mapping tables. This is naturally true for creating anything within Db2. If you want a 100% guarantee of no contention then you *must* create all mapping tables in advance which would be Method 1 in my Blog.

2020-08 SYSLGRNX through the looking glass

This month I want to have a good look inside the SYSLGRNX – Not just for fun, but actually to review what is inside and ask the question: Is any of it “interesting” for us as DBAs?

Create your own?

Since Db2 10 the Db2 Directory table SYSIBM.SYSLGRNX has been available for normal SQL use. I wrote a newsletter all about this (Discovering hidden recovery problems in the SYSLGRNX.) This “older” newsletter has now been updated with a newer version of the SYSLGRNX private table/index and how to cross load it and RUNSTATS it, which we will be using within this newsletter. 

What is it?

The SYSLGRNX table is used by Db2 to “remember” when a physical page set (object) changed from R/O access to R/W access. This is a requirement to speed up RECOVER jobs. When this is done correctly, the RECOVER can simply look in SYSLGRNX to find any RBA/LRSN ranges that must actually be checked in the Db2 Log. Without this table, Db2 would *have* to access all the logs from the last Full Image Copy/Incremental Image Copy to see if any data changes had been done. This means that the table is pretty critical when it comes to doing a RECOVER job!

Why look in it?

Well, it is nice to know that Db2 is keeping tabs on everything, but what if it makes a mistake? Heaven forbid there could be bugs in the code or perhaps an LPAR crash… Basically, it boils down to a good chance that you have some garbage in the table. This does not have any impact at all *until* you try and RECOVER an object… then Db2 requires an archive log from 2017 and the object is marked as “unrecoverable” – Whoops!

SQL Time – LRSN or RBA Sir?

So now you have to decide: are you interested in data sharing (DS) or in non-data sharing (Non-DS) systems? DS use the Log Record Sequence Number (LRSN) to mark data on the log. It is basically a ten-byte super-duper timestamp field, with/without an offset, which points to a place on the log. If Non-DS, you use a Relative Byte Address (RBA) which is really just a ten-byte address pointing to a place on the log. All of the sample SQL assumes you are running in a DS world and that you have created the updated private version of SYSLGRNX. 

Start with Open Units of Work

Let’s have a quick review of all current Open Units of Work (OUW) that exist:

-- OPEN UNITS OF WORK IN SYSLGRNX                     
SELECT L.LGRDBID                    AS DBID           
      ,SUBSTR(TS.DBNAME, 1, 8)      AS DBNAME         
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,SUBSTR(TS.NAME, 1, 8)        AS SPACE          
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,'TS'                                           
      ,L.LGRMEMB                    AS DM             
FROM BOXWELL.SYSLGRNX2    L                           
    ,SYSIBM.SYSTABLESPACE TS                          
WHERE TS.DBID     = L.LGRDBID                         
  AND TS.PSID     = L.LGRPSID                          
--AND HEX(L.LGRSPBA)  = '00000000000000000000'        
  AND HEX(L.LGRELRSN) = '00000000000000000000'        
UNION ALL                                             
SELECT L.LGRDBID                    AS DBID           
      ,SUBSTR(IX.DBNAME, 1, 8)      AS DBNAME         
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,SUBSTR(IX.INDEXSPACE, 1, 8)  AS SPACE          
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,'IX'                                           
      ,L.LGRMEMB                    AS DM             
FROM BOXWELL.SYSLGRNX2    L                           
    ,SYSIBM.SYSINDEXES    IX                           
WHERE IX.DBID   = L.LGRDBID                           
  AND IX.ISOBID = L.LGRPSID                           
--AND HEX(L.LGRSPBA)  = '00000000000000000000'        
  AND HEX(L.LGRELRSN) = '00000000000000000000'        
ORDER BY 7                                             
WITH UR                                               
  ;                                                   

This SQL is written for DS. To convert to Non-DS just comment out the LRSN lines and uncomment the RBA lines, not forgetting the WHERE predicates!

An OUW is one where the Stop RBA/End LRSN is not yet filled (so all zeroes), by the way. These are all “in use” objects and I have seen ranges from six to 17,000 of these. The Stop RBA / End LRSN is updated when the page set gets pseudo, or physically, closed again. How many of these you can/should have all depends on your workload, your choice of ZPARM PCLOSEN and when you copy the SYSLGRNX of course.

Here on my test system I only get these:

---------+---------+---------+---------+---------+---------+---------+---------+
  DBID  DBNAME        ID  NEG_PSID  SPACE       PART  LGRX_UCTIMESTAMP         
---------+---------+---------+---------+---------+---------+---------+---------+
     6  DSNDB06     2210  N         SYSTSTSS       1  2020-07-29-13.53.27.740000
     6  DSNDB06     2208  N         SYSTSISS       1  2020-08-26-12.46.56.690000
     6  DSNDB06     2068  N         SYSTSPKG       1  2020-08-26-13.46.57.710000
     6  DSNDB06     2018  N         SYSTSDBA       1  2020-08-26-14.13.13.310000
     6  DSNDB06      371  N         SYSSEQ         0  2020-08-26-14.13.13.310000
     6  DSNDB06     1994  N         SYSTSCOL       1  2020-08-26-14.13.13.320000
     6  DSNDB06     2014  N         SYSTSTAB       1  2020-08-26-14.13.13.320000
     6  DSNDB06     2016  N         SYSTSTSP       1  2020-08-26-14.13.13.320000
     6  DSNDB06     2012  N         SYSTSTPT       1  2020-08-26-14.13.13.320000
     6  DSNDB06     2020  N         SYSTSDBU       1  2020-08-26-14.13.13.320000
     6  DSNDB06     2010  N         SYSTSTAU       1  2020-08-26-14.13.13.340000
   505  DSN00171       2  N         SYSLGRNX       1  2020-08-26-14.13.33.340000
DSNE610I NUMBER OF ROWS DISPLAYED IS 12                                        

But here you can see there is something odd in that first row! I do not believe we have an OUW from July as it is the end of August at time of writing.
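To catch old stragglers like that July row without reading the whole list, the same query can be limited by age. This is only a sketch: the seven-day cut-off is an arbitrary number I picked for illustration, and it assumes that LGRUCTS in the BOXWELL.SYSLGRNX2 copy is a real TIMESTAMP column.

```sql
-- OPEN ENTRIES OLDER THAN 7 DAYS (DS VERSION)
-- ASSUMPTION: LGRUCTS IS A TIMESTAMP COLUMN IN THE COPY TABLE
-- ASSUMPTION: 7 DAYS IS JUST AN EXAMPLE CUT-OFF
SELECT L.LGRDBID                    AS DBID
      ,L.LGRPSID                    AS ID
      ,L.LGRNEGPSID                 AS NEG_PSID
      ,L.LGRPART                    AS PART
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA
      ,HEX(L.LGRSLRSN)              AS START_LRSN
      ,L.LGRMEMB                    AS DM
FROM BOXWELL.SYSLGRNX2    L
WHERE L.LGRUCTS < CURRENT TIMESTAMP - 7 DAYS
--AND HEX(L.LGRSPBA)  = '00000000000000000000'
  AND HEX(L.LGRELRSN) = '00000000000000000000'
ORDER BY 5
WITH UR
  ;
```

Anything this returns is "open" on paper but has not been touched for a week, so it is worth a closer look.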

Grouping the output

Now to get a view of how many records per object you have. With this output you can instantly see where you have *way* too many records and need at least a MODIFY RECOVERY and/or a full image copy!

-- GROUP RECORDS IN SYSLGRNX                              
SELECT L.LGRDBID                    AS DBID               
      ,SUBSTR(TS.DBNAME, 1, 8)      AS DBNAME             
      ,L.LGRPSID                    AS ID                 
      ,L.LGRNEGPSID                 AS NEG_PSID           
      ,SUBSTR(TS.NAME, 1, 8)        AS SPACE              
      ,L.LGRPART                    AS PART               
      ,MIN(L.LGRUCTS)               AS MIN_LGRX_UCTIMESTAMP
      ,MAX(L.LGRUCTS)               AS MAX_LGRX_UCTIMESTAMP
--    ,MIN(HEX(L.LGRSRBA))          AS MIN_START_RBA      
--    ,MAX(HEX(L.LGRSRBA))          AS MAX_START_RBA      
--    ,MIN(HEX(L.LGRSPBA))          AS MIN_END_RBA        
--    ,MAX(HEX(L.LGRSPBA))          AS MAX_END_RBA        
      ,MIN(HEX(L.LGRSLRSN))         AS MIN_START_LRSN     
      ,MAX(HEX(L.LGRSLRSN))         AS MAX_START_LRSN     
      ,MIN(HEX(L.LGRELRSN))         AS MIN_END_LRSN       
      ,MAX(HEX(L.LGRELRSN))         AS MAX_END_LRSN       
      ,'TS'                                               
      ,COUNT(*)                                           
FROM BOXWELL.SYSLGRNX2    L                               
    ,SYSIBM.SYSTABLESPACE TS                              
WHERE TS.DBID = L.LGRDBID                                 
  AND TS.PSID = L.LGRPSID                                 
--AND NOT L.LGRSRBA  = L.LGRSPBA                          
  AND NOT L.LGRSLRSN = L.LGRELRSN                         
GROUP BY L.LGRDBID                                        
       , TS.DBNAME                                         
       , L.LGRPSID                                        
       , L.LGRNEGPSID                                     
       , TS.NAME                                          
       , L.LGRPART                                        
UNION ALL                                                 
SELECT L.LGRDBID                    AS DBID               
      ,SUBSTR(IX.DBNAME, 1, 8)      AS DBNAME             
      ,L.LGRPSID                    AS ID                 
      ,L.LGRNEGPSID                 AS NEG_PSID           
      ,SUBSTR(IX.INDEXSPACE, 1, 8)  AS SPACE              
      ,L.LGRPART                    AS PART               
      ,MIN(L.LGRUCTS)               AS MIN_LGRX_UCTIMESTAMP
      ,MAX(L.LGRUCTS)               AS MAX_LGRX_UCTIMESTAMP
--    ,MIN(HEX(L.LGRSRBA))          AS MIN_START_RBA      
--    ,MAX(HEX(L.LGRSRBA))          AS MAX_START_RBA      
--    ,MIN(HEX(L.LGRSPBA))          AS MIN_END_RBA        
--    ,MAX(HEX(L.LGRSPBA))          AS MAX_END_RBA        
      ,MIN(HEX(L.LGRSLRSN))         AS MIN_START_LRSN     
      ,MAX(HEX(L.LGRSLRSN))         AS MAX_START_LRSN
      ,MIN(HEX(L.LGRELRSN))         AS MIN_END_LRSN 
      ,MAX(HEX(L.LGRELRSN))         AS MAX_END_LRSN 
      ,'IX'                                         
      ,COUNT(*)                                     
FROM BOXWELL.SYSLGRNX2    L                          
    ,SYSIBM.SYSINDEXES    IX                        
WHERE IX.DBID   = L.LGRDBID                         
  AND IX.ISOBID = L.LGRPSID                         
--AND NOT L.LGRSRBA  = L.LGRSPBA                    
  AND NOT L.LGRSLRSN = L.LGRELRSN                    
GROUP BY L.LGRDBID                                  
       , IX.DBNAME                                  
       , L.LGRPSID                                  
       , L.LGRNEGPSID                               
       , IX.INDEXSPACE                              
       , L.LGRPART                                  
ORDER BY 14 DESC                                    
WITH UR                                             
  ;                                                 

Again, this is a DS version – same rules apply as the first SQL and, in fact, for all following SQL as well!

Output looks like:

---------+---------+---------+---------+---------+---------+---------+---------+
  DBID  DBNAME        ID  NEG_PSID  SPACE       PART  MIN_LGRX_UCTIMESTAMP     
---------+---------+---------+---------+---------+---------+---------+---------+
     6  DSNDB06     2208  N         SYSTSISS       1  2019-09-28-15.02.57.790000
     6  DSNDB06     2210  N         SYSTSTSS       1  2019-09-28-15.02.57.780000
     6  DSNDB06     2068  N         SYSTSPKG       1  2019-09-28-13.51.16.650000
     6  DSNDB06     2228  N         SYSTSSFB       1  2019-09-28-15.02.58.840000
     6  DSNDB06     2010  N         SYSTSTAU       1  2019-09-28-13.51.18.800000
   348  MVNXTEST     352  N         MVNXS89        1  2020-03-11-07.21.38.830000
     1  DSNDB01      127  N         SPT01          1  2019-09-28-13.51.16.620000
-------+---------+---------+---------+---------+---------+---------+--
MAX_LGRX_UCTIMESTAMP        MIN_START_LRSN        MAX_START_LRSN     
-------+---------+---------+---------+---------+---------+---------+--
2020-08-26-12.46.56.690000  00D6CADBE78BCD209000  00D86990DB7295A35400
2020-07-29-13.53.27.740000  00D6CADBE77162E15600  00D84A37CE9CCF71C400
2020-08-26-13.46.57.710000  00D6CACBE16C6D6D2800  00D867213DA095E82400
2020-08-26-09.16.50.090000  00D6CADBE7FB6E5FCC00  00D8670666C5C4344600
2020-08-26-14.13.13.340000  00D6CACBE2DBBC140000  00D866E87FD0328A4600
2020-07-02-08.42.53.340000  00D7A00B986CE7398600  00D82800085E355D1800
2020-08-26-10.18.22.240000  00D6CACBE1285CB92800  00D866E87F9F228F0A00
-----+---------+---------+---------+---------+---------+---

MIN_END_LRSN          MAX_END_LRSN                         
-----+---------+---------+---------+---------+---------+---
00D6CADE5D36D0BD2000  00000000000000000000  TS        22519
00D6CADE5D34CB2B6C00  00000000000000000000  TS        21092
00D6CACCE2EF932A2400  00000000000000000000  TS          698
00D6CADE5D381865B400  00D86710F530EDF11800  TS          568
00D6CACCE2F4C7A05800  00000000000000000000  TS          487
00D7A00A120AD792D800  00D8280A9A34E82F9800  TS          450
00D6CACCE344A2021200  00D866F33992A0EDD600  TS          448

Here you can see that the RTS tablespaces are in desperate need of a Full Image Copy and a MODIFY RECOVERY! I hope you have no numbers as high as mine!
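If you only want the worst offenders, a HAVING clause on the count does the trick. Treat this as a sketch: the limit of 1000 records is a number I made up for illustration, so pick whatever fits your shop.

```sql
-- OBJECTS WITH MORE THAN 1000 RECORDS IN SYSLGRNX (DS VERSION)
-- ASSUMPTION: 1000 IS JUST AN EXAMPLE LIMIT
SELECT L.LGRDBID                    AS DBID
      ,L.LGRPSID                    AS ID
      ,L.LGRNEGPSID                 AS NEG_PSID
      ,L.LGRPART                    AS PART
      ,COUNT(*)                     AS RECORDS
FROM BOXWELL.SYSLGRNX2    L
--WHERE NOT L.LGRSRBA  = L.LGRSPBA
WHERE NOT L.LGRSLRSN = L.LGRELRSN
GROUP BY L.LGRDBID
       , L.LGRPSID
       , L.LGRNEGPSID
       , L.LGRPART
HAVING COUNT(*) > 1000
ORDER BY 5 DESC
WITH UR
  ;
```

This version skips the joins to the catalog, so tablespaces and indexes come out together and you must look up the names yourself.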

Oh, for the poor orphans

Yes indeed, orphans also exist in Db2 SYSLGRNX sometimes… Run this SQL to check: 

-- ORPHAN RECORDS IN SYSLGRNX - SHOULD BE NONE        
SELECT L.LGRDBID                    AS DBID           
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,L.LGRMEMB                    AS DM             
FROM BOXWELL.SYSLGRNX2    L                           
WHERE NOT EXISTS                                      
 (SELECT 1                                            
  FROM SYSIBM.SYSTABLESPACE TS                        
  WHERE TS.DBID     = L.LGRDBID                        
    AND TS.PSID     = L.LGRPSID                       
 )                                                    
  AND NOT EXISTS                                      
 (SELECT 1                                            
  FROM SYSIBM.SYSINDEXES    IX                        
  WHERE IX.DBID   = L.LGRDBID                         
    AND IX.ISOBID = L.LGRPSID                         
 )                                                    
ORDER BY LGRX_UCTIMESTAMP                              
WITH UR                                               
  ;                                                   

In my DS system there are none, but when I run it on my non-DS system I get:

---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----
  DBID      ID  NEG_PSID    PART  LGRX_UCTIMESTAMP            START_RBA             END_RBA            
---------+---------+---------+---------+---------+---------+---------+---------+---------+---------+----
   414       2  N              1  2020-01-06-13.31.11.100000  00000000008EB1D404EC  00000000008EB1D419EC
   414       2  N              1  2020-01-06-13.31.11.110000  00000000008EB1D438A7  00000000008EB1D49074

 

Now I happen to remember this! A colleague was “playing” with CLONE Objects and caused a bit of a problem back in the first week of the year… I now have to figure out how to get rid of these bad boys… REPAIR should do it for me, but in a production system I would probably contact IBM for assistance.

SYSLGRNX for Dummies?

You may have noticed that in most of the queries there is a predicate:

--AND NOT L.LGRSRBA  = L.LGRSPBA                    
  AND NOT L.LGRSLRSN = L.LGRELRSN                   

So, the question is: What??? Why would the SYSLGRNX have entries where the Start RBA was equal to the Stop RBA or the Start LRSN was equal to the End LRSN? These are special “events” or the so-called “dummy” entries. To see how many you have, run this little query:

-- SPECIAL/DUMMY ENTRIES IN SYSLGRNX                  
SELECT L.LGRDBID                    AS DBID           
      ,SUBSTR(TS.DBNAME, 1, 8)      AS DBNAME         
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,SUBSTR(TS.NAME, 1, 8)        AS SPACE          
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,'TS'                                           
      ,L.LGRMEMB                    AS DM             
FROM BOXWELL.SYSLGRNX2    L                           
    ,SYSIBM.SYSTABLESPACE TS                          
WHERE TS.DBID    = L.LGRDBID                           
  AND TS.PSID    = L.LGRPSID                          
  AND L.LGRSLRSN = L.LGRELRSN                         
--AND L.LGRSRBA  = L.LGRSPBA                          
UNION ALL                                             
SELECT L.LGRDBID                    AS DBID           
      ,SUBSTR(IX.DBNAME, 1, 8)      AS DBNAME         
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,SUBSTR(IX.INDEXSPACE, 1, 8)  AS SPACE          
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,'IX'                                           
      ,L.LGRMEMB                    AS DM             
FROM BOXWELL.SYSLGRNX2    L                            
    ,SYSIBM.SYSINDEXES    IX                          
WHERE IX.DBID    = L.LGRDBID                          
  AND IX.ISOBID  = L.LGRPSID                          
  AND L.LGRSLRSN = L.LGRELRSN                         
--AND L.LGRSRBA  = L.LGRSPBA                          
ORDER BY 7                                            
WITH UR                                               
  ;                                                   

I get data like this here on my test DS system:

---------+---------+---------+---------+---------+---------+---------+---------+
  DBID  DBNAME        ID  NEG_PSID  SPACE       PART  LGRX_UCTIMESTAMP         
---------+---------+---------+---------+---------+---------+---------+---------+
   320  R510D0DE     122  N         R510S23        1  2019-09-30-07.53.56.670000
   320  R510D0DE     122  N         R510S23        1  2019-09-30-07.53.58.880000
   348  MVNXTEST     122  N         MVNXS23        1  2019-11-07-11.24.59.920000
   348  MVNXTEST     122  N         MVNXS23        1  2019-11-07-11.24.59.950000
   348  MVNXTEST     389  N         MVNXS001       1  2019-11-07-14.48.37.780000
   348  MVNXTEST     389  N         MVNXS001       1  2019-11-07-14.48.38.890000
   348  MVNXTEST     398  N         MVNXS002       1  2019-11-07-14.48.40.050000
   348  MVNXTEST     398  N         MVNXS002       1  2019-11-07-14.48.40.090000
   348  MVNXTEST     429  N         MVNXS007       1  2019-11-07-14.48.42.200000
   348  MVNXTEST     389  N         MVNXS001       1  2020-02-21-11.25.54.480000
   348  MVNXTEST     389  N         MVNXS001       1  2020-02-21-11.25.55.560000
   348  MVNXTEST     398  N         MVNXS002       1  2020-02-21-11.25.56.680000
   348  MVNXTEST     398  N         MVNXS002       1  2020-02-21-11.25.57.700000
-------+---------+---------+---------+---------+------
START_LRSN            END_LRSN                      DM
-------+---------+---------+---------+---------+------
00D6CCFFBDC159BB2E00  00D6CCFFBDC159BB2E00  TS       2
00D6CCFFBFC00FF01200  00D6CCFFBFC00FF01200  TS       2
00D6FCF5C92F2D4F5200  00D6FCF5C92F2D4F5200  TS       2
00D6FCF5C976B5E74600  00D6FCF5C976B5E74600  TS       2
00D6FD234DC040BFE200  00D6FD234DC040BFE200  TS       1
00D6FD234EBD47CE8400  00D6FD234EBD47CE8400  TS       1
00D6FD235042B0392800  00D6FD235042B0392800  TS       2
00D6FD2350B1394C5800  00D6FD2350B1394C5800  TS       2
00D6FD2351B0B0B72A00  00D6FD2351B0B0B72A00  TS       2
00D7823BB0FED94EA800  00D7823BB0FED94EA800  TS       1
00D7823BB1C7253F4A00  00D7823BB1C7253F4A00  TS       1
00D7823BB2F61E0A3400  00D7823BB2F61E0A3400  TS       2
00D7823BB32BFC34CE00  00D7823BB32BFC34CE00  TS       2

What I can see from this, is that for any DDL CREATE with LOGGED attribute, or any LOAD REPLACE LOG(NO) utility, you get an entry that you can also find in SYSCOPY. The RECOVER utility might well need this data as well if you are rolling forward. 
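To line these dummy entries up against SYSCOPY, something like the following could be used. It is a sketch from memory of the catalog documentation, where ICTYPE 'C' stands for CREATE and 'S' for LOAD REPLACE LOG(NO), so please check those values against the manual for your level before trusting the output.

```sql
-- CREATE AND LOAD REPLACE LOG(NO) EVENTS IN SYSCOPY
-- ASSUMPTION: ICTYPE 'C' = CREATE, 'S' = LOAD REPLACE LOG(NO)
SELECT SUBSTR(C.DBNAME, 1, 8)       AS DBNAME
      ,SUBSTR(C.TSNAME, 1, 8)       AS SPACE
      ,C.DSNUM                      AS PART
      ,C.ICTYPE                     AS ICTYPE
      ,C.TIMESTAMP                  AS SYSCOPY_TIMESTAMP
      ,HEX(C.START_RBA)             AS START_RBA
FROM SYSIBM.SYSCOPY       C
WHERE C.ICTYPE IN ('C', 'S')
ORDER BY 5
WITH UR
  ;
```

In data sharing the START_RBA column holds an LRSN, so it can be compared by eye with the START_LRSN values from the dummy entries.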

The oldest are the bestest!

Finally, a simple “review” of the SYSLGRNX data sorted by LGRX_UCTIMESTAMP. Just so you get an overview, this SQL has a FETCH FIRST 500 ROWS ONLY but feel free to change that number!

-- OLDEST VALID RECORDS IN SYSLGRNX                   
SELECT L.LGRDBID                    AS DBID           
      ,SUBSTR(TS.DBNAME, 1, 8)      AS DBNAME         
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,SUBSTR(TS.NAME, 1, 8)        AS SPACE           
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,L.LGRMEMB                    AS DM             
      ,'TS'                                           
FROM BOXWELL.SYSLGRNX2    L                           
    ,SYSIBM.SYSTABLESPACE TS                          
WHERE TS.DBID     = L.LGRDBID                         
  AND TS.PSID     = L.LGRPSID                         
--AND NOT L.LGRSRBA  = L.LGRSPBA                      
  AND NOT L.LGRSLRSN = L.LGRELRSN                     
UNION ALL                                             
SELECT L.LGRDBID                    AS DBID           
      ,SUBSTR(IX.DBNAME, 1, 8)      AS DBNAME         
      ,L.LGRPSID                    AS ID             
      ,L.LGRNEGPSID                 AS NEG_PSID       
      ,SUBSTR(IX.INDEXSPACE, 1, 8)  AS SPACE          
      ,L.LGRPART                    AS PART           
      ,L.LGRUCTS                    AS LGRX_UCTIMESTAMP
--    ,HEX(L.LGRSRBA)               AS START_RBA      
--    ,HEX(L.LGRSPBA)               AS END_RBA        
      ,HEX(L.LGRSLRSN)              AS START_LRSN     
      ,HEX(L.LGRELRSN)              AS END_LRSN       
      ,L.LGRMEMB                    AS DM             
      ,'IX'                                           
FROM BOXWELL.SYSLGRNX2    L                           
    ,SYSIBM.SYSINDEXES    IX                          
WHERE IX.DBID   = L.LGRDBID                           
  AND IX.ISOBID = L.LGRPSID                           
--AND NOT L.LGRSRBA  = L.LGRSPBA                      
  AND NOT L.LGRSLRSN = L.LGRELRSN                     
ORDER BY LGRX_UCTIMESTAMP                             
FETCH FIRST 500 ROWS ONLY                              
WITH UR                                               
  ;                                                   

Here you can see that I rarely ever image copy the Db2 catalog…

---------+---------+---------+---------+---------+---------+---------+---------+
  DBID  DBNAME        ID  NEG_PSID  SPACE       PART  LGRX_UCTIMESTAMP         
---------+---------+---------+---------+---------+---------+---------+---------+
     6  DSNDB06     2002  N         SYSTSIPT       1  2019-09-28-13.45.32.200000
     6  DSNDB06     2000  N         SYSTSIXS       1  2019-09-28-13.45.32.250000
     6  DSNDB06     2016  N         SYSTSTSP       1  2019-09-28-13.46.59.970000
     6  DSNDB06     2012  N         SYSTSTPT       1  2019-09-28-13.46.59.970000
     6  DSNDB06     2034  N         SYSTSSTG       1  2019-09-28-13.48.02.210000
     1  DSNDB01      148  N         DSNSPT02       0  2019-09-28-13.51.16.620000


-------+---------+---------+---------+---------+------
START_LRSN            END_LRSN                  DM   
-------+---------+---------+---------+---------+------
00D6CACA98D8486B6C00  00D6CACCE2F5E2B14600       1  TS
00D6CACA99560741CC00  00D6CACCE2F9B5CCD800       1  TS
00D6CACAEC7D724A9600  00D6CACCE2F31722C800       1  TS
00D6CACAEC7C8EA08A00  00D6CACCE2F7C7553800       1  TS
00D6CACB27FFD7A40200  00D6CACCE2F214B70E00       1  TS
00D6CACBE127037D0A00  00D6CACCE31529801400       1  IX
00D6CACBE1279F5EA000  00D6CACCE31662377800       1  IX

What’s in a name?

Looking at the outputs you can see the NEG_PSID column. This is set based on whether the original PSID in SYSLGRNX was negative. If it is negative, it means you are dealing with a CLONE object, so buyer beware on those entries!

The DM column is the data-sharing member number: it is 0 (zero) for non-DS and 1 to 32 for DS systems. Sometimes you can get a clue about why an entry is there when you can link it to a member. Use the -DISPLAY GROUP command to get the number for your member:

DSN7100I  -SC10 DSN7GCMD                                            
*** BEGIN DISPLAY OF GROUP(GSC10C11) CATALOG LEVEL(V12R1M505)       
                  CURRENT FUNCTION LEVEL(V12R1M505)                 
                  HIGHEST ACTIVATED FUNCTION LEVEL(V12R1M505)       
                  HIGHEST POSSIBLE FUNCTION LEVEL(V12R1M506)        
                  PROTOCOL LEVEL(2)                                  
                  GROUP ATTACH NAME(SC1 )                           
---------------------------------------------------------------------
DB2          SUB                     DB2    SYSTEM    IRLM
MEMBER   ID  SYS  CMDPREF   STATUS   LVL    NAME      SUBSYS IRLMPROC
-------- --- ---- --------  -------- ------ --------  ----   --------
MEMSC10    1 SC10 -SC10     ACTIVE   121507 S0W1      JC10   SC10IRLM
MEMSC11    2 SC11 -SC11     QUIESCED 121507 S0W1      JC11   SC11IRLM
---------------------------------------------------------------------

Here the ID column is the member number. It is possible you might have DM ids for members that no longer exist… these must be cleared out as well!
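A quick way to see which member ids actually appear in your SYSLGRNX copy, so you can compare them with the ID column of the -DISPLAY GROUP output, might look like this. It is a minimal sketch that only uses the columns already seen in the queries above.

```sql
-- SYSLGRNX RECORDS PER DATA-SHARING MEMBER
SELECT L.LGRMEMB                    AS DM
      ,COUNT(*)                     AS RECORDS
      ,MIN(L.LGRUCTS)               AS MIN_LGRX_UCTIMESTAMP
      ,MAX(L.LGRUCTS)               AS MAX_LGRX_UCTIMESTAMP
FROM BOXWELL.SYSLGRNX2    L
GROUP BY L.LGRMEMB
ORDER BY 1
WITH UR
  ;
```

Any DM value that does not show up in the -DISPLAY GROUP list is a candidate for a member that no longer exists.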

As always, any questions or comments would be most welcome!

TTFN,

Roy Boxwell

2020-07 IDUG 2020 What I have learnt

As noted in my last newsletter, the virtual world is influencing us all now. The IDUG North America 2020 had to “go virtual” and here are a few points I picked up over the last weeks while manning our virtual booth and attending virtual sessions. These are just randomly sorted and there were a ton of other interesting IDUG presentations that I recommend you try and read!

Always on

IFCID 376 for incompatible changes should be on all the time in all systems. The reasoning behind this is quite simple: the earlier you see problems, the more time you have to get an application fix. Naturally, you can and should use the APPLCOMPAT feature to lock down applications, but when you wish to use newer features you cannot rely on this anymore. This leads to the paradox of “don’t go there yet, but go there as fast as you can!” Judicious use of this IFCID could well save you an embarrassment or two!

PBGs forever

Well, only in small doses! The idea with these is to use them as a plug-in replacement for segmented and simple spaces, but only with MAXPARTITIONS set to 1. A bit strange though, as that caps your maximum size quite a bit, but the reasoning here is to use the DSSIZE to actually control the size of your objects and not let them grow and grow and grow.
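As a rough illustration of that idea, a one-partition PBG could be created like this. The database and tablespace names are made up, and the DSSIZE and SEGSIZE values are just examples, not recommendations from the presentation.

```sql
-- SKETCH OF A ONE-PARTITION PBG, SIZE CAPPED BY DSSIZE
-- ASSUMPTION: DBDEMO AND TSDEMO1 ARE HYPOTHETICAL NAMES
-- ASSUMPTION: 4 G AND SEGSIZE 32 ARE EXAMPLE VALUES ONLY
CREATE TABLESPACE TSDEMO1
  IN DBDEMO
  MAXPARTITIONS 1
  DSSIZE 4 G
  SEGSIZE 32
  LOCKSIZE ANY
  ;
```

With MAXPARTITIONS 1 the object can never grow beyond one dataset of DSSIZE, which is exactly the control the idea is after.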

FTB switch off

Sadly, the use of Fast Traversal Blocks (FTB) is not recommended at this time due to ongoing issues. In fact, a Red Alert came during the IDUG noting that we should also switch off Insert Algorithm 2 (IAG2) – there is even an APAR out there to change the IAG2 default to “off”. This is a real shame, as these two features were always touted as the best thing since sliced bread. Oh well…

New Db2 Version

Yes indeed! – Spotted in at least one IBM presentation were tantalizing hints about a new release coming out. Remember when Db2 12 came out, the list of reasons for a new version/release was just “Major control block changes or a New PL/X compile to take advantage of any new architecture level” – Well, a few months ago, a new reason appeared: “Changes to terms and conditions” which I quite liked, as that is pretty transparent! Now I have seen a new one: “Non-UTS Tablespaces will be retired in the future as part of a future function level or VNEXT”

Yep, VNEXT is back – Yoo-hoo!

RUNSTATS you like

Not really learnt by me, but a few people mentioned that when you REORG only collect stats at the TS level if things have really changed. If you REORG Indexes *never* collect inline stats. Why? These days the reasons for REORG can be for actioning DDL changes and, in these cases, spending all the CPU to re-get the stats data is probably pointless. On the other hand, with indexes you can get statistical data drift, and so it makes *no* sense to do inline stats with index reorgs at all.

RTS Time travel

In Db2 12 we got a whole bunch of SYSIBM.SYSxxxxx_H style tables. We got two for the RTS tables, and you must “activate” them as they are not active by default. There are also no indexes and no automatic deletion, so when you do the ALTERs make sure you have thought about, and created, performance indexes and a purge process for old, dead data. These tables can give very good insights into the overall performance and usage of all the objects in a Db2 system.
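For reference, the “activation” is done with ALTERs along these lines. I am quoting the history table names from memory of the Db2 12 documentation, so treat them as an assumption and check them in your catalog first. The indexes and the purge process are still up to you.

```sql
-- SKETCH: SWITCH ON RTS HISTORY IN DB2 12
-- ASSUMPTION: HISTORY TABLE NAMES AS REMEMBERED FROM THE
--             DB2 12 DOCUMENTATION - VERIFY BEFORE RUNNING
ALTER TABLE SYSIBM.SYSTABLESPACESTATS
  ADD VERSIONING
  USE HISTORY TABLE SYSIBM.SYSTABSPACESTATS_H
  ;
ALTER TABLE SYSIBM.SYSINDEXSPACESTATS
  ADD VERSIONING
  USE HISTORY TABLE SYSIBM.SYSIXSPACESTATS_H
  ;
```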

Distributed Level

For access to Db2 12 we got a little update about exactly which levels are required:

Need V11.1 FP1 (JCC driver level 3.72 to 4.22) to exploit beyond V12R1M500 (or NFM mode), otherwise ANY supported level should work

• Need clientApplCompat driver parameter to exploit features in M501+

  • Made optional with APAR PH08482 for down-level clients and Db2 Connect Servers

This is still a major cause of confusion even today, nearly three years after Db2 12 was first released!

Buzzword Bingo

Then I started the Jenkins, Pipeline learning curve, but I must admit, as an old COBOL guy, all the new stuff is a bit strange. GiT is still a term of abuse for me! But the presentation went into near epic depth about how to actually get it all working… I added this to my bucket list! I especially liked the SCORCHING JIT compiler optimization setting.

REORG Rules

The use of REORGINSERTS is no longer recommended as a reason for a REORG. A few releases ago, the REORGUPDATES reason was also removed. Now we only have REORGDELETES from the classic counters in the RTS that should be used as a trigger for a TP/TS level REORG.
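A simple RTS check built on REORGDELETES alone could look like this. It is only a sketch, and the ten percent limit is my own example value, not a number from the presentation.

```sql
-- REORG CANDIDATES BASED ON REORGDELETES ONLY
-- ASSUMPTION: 10 PERCENT OF TOTALROWS IS JUST AN EXAMPLE TRIGGER
SELECT SUBSTR(TS.DBNAME, 1, 8)      AS DBNAME
      ,SUBSTR(TS.NAME, 1, 8)        AS SPACE
      ,TS.PARTITION                 AS PART
      ,TS.TOTALROWS                 AS TOTALROWS
      ,TS.REORGDELETES              AS REORGDELETES
FROM SYSIBM.SYSTABLESPACESTATS TS
WHERE TS.TOTALROWS > 0
  AND TS.REORGDELETES * 100 > TS.TOTALROWS * 10
ORDER BY 5 DESC
WITH UR
  ;
```

Rows where TOTALROWS or REORGDELETES is NULL drop out automatically, which is fine here as there is nothing to judge without statistics.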

So, did you learn anything new at the IDUG? Is there anything I missed?

As always, any questions or comments would be most welcome and I would love to “virtually” meet you all soon!

TTFN,

Roy Boxwell

2020-06 Let’s get Virtual with each other

Due to the Corona virus a whole bunch of us IT folk have been getting a lot of “virtual” recently! We are viewed as being “system relevant” and some of us *must* work from home to keep the systems rolling. This month I thought I would walk down the road that we at Software Engineering and SEGUS have taken, all about virtual, and simulated, Db2.

Virtual, really?

The first thing is to think about what we mean by “virtual”. When I hold a live webinar, I am talking live and you are listening live, so there is nothing really virtual about it, is there? The same is true about Db2 subsystems. If you wish to test a major upgrade of your software (or even just a FL switch) you really should get it all tested in a virtual world before letting it loose in the real world!

How to get there?

You need a complete copy of your Db2 Catalog and Directory. That is enough – User data is not required and actually could be classified as dangerous from the Audit point of view! How you get this done is your business but I use ICE our InstantCloning Expert to get the job done really really fast!

Enough?

Nope, once you have got this data copied across you need to “virtualize” the ZPARMS, bufferpools and even the hardware you currently have in production. IBM came up with an interesting way of doing this with hideously complicated HEX updates of EXPLAIN columns but I use VOX our VirtualOptimizer Expert to do all this for me!

Done yet?

Of course not! What is the good of having a virtual production system when you have no SQL to actually run on it? Just doing an EXPLAIN is ok of course – see later – but actually running the SQL that runs on the productive system should be your goal. For this I use the WLX (SQL WorkLoadExpert) tool to collect as much SQL as I like on production and then use this “workload” on my virtual production to actually check what happens when software is running.

Finished?

Still not yet! The next part of the puzzle is to also EXPLAIN all of the dynamic and static SQL both before and after the “change” whatever that change may be! You then compare the outputs of about 120,000 explains to see where anything diverges in its access plan. I use our BIX BindImpact Expert to do all this automatically for dynamic and static SQL.

You must be done by now!

Never! The final piece of the virtual puzzle is now to switch on the IFCID 376 while running the SQL workloads to give you a heads up if any application changes are rolling down the road towards you. We are all aware that application changes take a while to get approved and done and the further out you can see the obstacle the easier it is to plan and avoid the crash!

It’s all too much!

Well, you can even do a “cheaper” variant – Instead of cloning the whole Db2 subsystem just copy the DDL, using DEFINE NO of course, and the production Statistics using our ProductionSimulator tool to a sand-box style Db2 subsystem. You can also rename the objects on the fly and this all enables EXPLAIN processing to be done without a real workload. This will only show you changing access plans for your static and dynamic SQL but is clearly much better than nothing!

Sounds Cool!

It is! You have a nearly 100% automated method to completely check any Db2 subsystem for any changes and using our advanced checking algorithms we can weed out all the background noise to let you see just “the facts ma’am”

But Virtual with each other?

As you are probably aware, the IDUG NA 2020 has gone virtual on us all and so it has a bunch of live sessions and a whole raft of pre-recorded sessions. Happily, they also include mine – released on the 3rd August – “A DBA’s Got to Know Their Limitations!” I am also holding a VSP about Zowe (The zGUI r(evolution) – What is ZOWE going to do for me?) on Monday 10th August at 4pm EDT (22:00 Berlin) with live Q&A.

Please visit our virtual Booth at

https://www.idug.org/page/expo-hall

Click on the SEGUS booth, middle row right hand side,  then scroll down to register and get the chance to win an iPad!!! A real one as well…

 

As always, any questions or comments would be most welcome and I would love to “virtually” meet you all soon!

TTFN,

Roy Boxwell

2020-05 Things I never knew

Believe it or not, I actually do not know everything (Please do not tell my long-suffering wife!) about Db2. I do know a ton of stuff, but you never ever stop learning.

I thought I would create a newsletter out of all the stuff I never knew before that I have learnt over the last few years. Maybe you’re very clever and know all of this already? But if you don’t, that’s OK too.

Sliding Scale

We have a nice piece of software called Space AssuranceExpert for Db2 z/OS (SAX), that guarantees that you do not hit any of the various limits within space management, and beyond, on Db2. When IBM introduced their sliding scale of secondary allocations it collided with our variant. (We use “seed” values to get, what I believe, is a much better utilization of available space, but I digress…) so we changed our system to check the MGEXTSZ. This, by the way, is one of the worst documented ZPARMs I have ever tried to understand! The documentation states “MGEXTSZ default YES – If SECQTY is greater than 0 then the sliding scale will be used”. So, when you create an object using SECQTY -1 or just not using SECQTY (also gives you -1) it would appear that the sliding scale is switched off… Well what IBM really meant was the SECQTY of the allocated VSAM dataset. Naturally this is either zero (no secondaries for you!) or a positive number! If you look at the Db2 Catalog tables you will also see that the fields you use as PRIQTY and SECQTY are actually stored as PQTY, SQTY, and SQTYI so, technically speaking, IBM are correct. The only case where MGEXTSZ is actually used is if you *do* have a positive value in the SQTY/SQTYI field. If MGEXTSZ is YES then this value is ignored and the sliding scale is used, if MGEXTSZ is NO then the value of SQTY/SQTYI, converted back to kilobytes of course, will be used.

An extent by any other name

Talking about SAX: I also saw something quite amazing when a customer allocated a PBG space with 11 maxpartitions and a DSSIZE of 8GB. He managed to hit maxpartitions without getting warned by SAX. “Impossible”, I said! There is no way it can allocate one dataset at 8GB with *no* extents… Yet this customer had somehow managed it… Their EAVs are so huge that they could allocate without a *single* extent being requested. Now, I always thought that every dataset that is allocated gets one extent, and that this extent would trigger the IFCID 258. I was wrong! The creation of an LDS, if it does not require more than one extent, does not trigger this IFCID. It does, however, trigger the IFCID 97 (LDS Creation). So I added this IFCID into our SAX system too, and now we are able to catch LDS problems even with *massive* disks!

Index for DGTT

This brings me to the next item. If you create a DGTT:

DECLARE GLOBAL TEMPORARY TABLE T (COL1 CHAR(1) NOT NULL);

You get *no* dataset being created. After all, why would you? It just goes in DSNDB07 etc., so a create makes no sense, right?

SAX has seen nothing either:

O2RT-SU04-011I: 10:01:09 - Datasets will be processed now                  

Now create an index on that DGTT:

DECLARE GLOBAL TEMPORARY TABLE T (COL1   CHAR(1) NOT NULL);
CREATE INDEX I ON SESSION.T (COL1 ) ;        

Now you do get a dataset being allocated and deleted!

O2RT-SU04-011I: 10:01:09 - Datasets will be processed now                 
O2RT-SU04-024I: LDS creation for DB2DC1.DSNDBD.DSNDB07.TIX12768.I0001.A001 
O2RT-SU04-011I: 14:41:58 - Datasets will be processed now                 

Notice the database name “DSNDB07”? This tells you that it is creating a dataset in the work database, but this is not really documented anywhere. Naturally, if you think about it for a while, it *must* create a VSAM LDS simply to hold the data, as it cannot use another one of the DSNDB07 tablespaces for that purpose! This is something to bear in mind when using indexes on DGTTs: you get dataset create/delete elapsed time…

RUNSTATS for XML

In Db2 12 FL100, RUNSTATS was enhanced to enable Frequency and Histogram data to be collected into the SYSKEYTGTDIST catalog table. I created an XML Object:

DROP DATABASE      "BOXWELLX" ;                                     
COMMIT ;                                                           
  CREATE DATABASE "BOXWELLX"                                       
         BUFFERPOOL BP0                                            
         INDEXBP    BP0                                            
         STOGROUP   SYSDEFLT                                       
         CCSID UNICODE                                              
;                                                                  
COMMIT ;                                                           
  CREATE       TABLESPACE "XMLCUSTO"                               
         IN "BOXWELLX"                                             
         USING STOGROUP SYSDEFLT                                   
         PRIQTY      720                                           
         SECQTY      720                                           
         ERASE NO                                                  
         DEFINE YES                                                
         FREEPAGE    0                                             
         PCTFREE     5                                              
         GBPCACHE CHANGED                                          
         TRACKMOD YES                                              
         LOG      YES                                              
         MAXPARTITIONS  6                                           
         BUFFERPOOL BP0                                            
         LOCKSIZE ROW                                              
         LOCKMAX SYSTEM                                            
         CLOSE YES                                                 
         COMPRESS NO                                               
         CCSID UNICODE                                             
         MAXROWS  255                                               
         SEGSIZE    4                                              
;                                                                  
COMMIT ;                                                           
  CREATE TABLE "BOXWELL"."XMLCUSTOMER"                             
  ("CID"                            BIGINT                  NOT NULL
  ,"INFO"                           XML                             
  ,CONSTRAINT CID                                                  
   PRIMARY KEY                                                     
  ("CID"                                                           
  )                                                                 
  )                                                                
  IN  "BOXWELLX"."XMLCUSTO"                                        
  AUDIT NONE                                                       
  DATA CAPTURE NONE                                                
  CCSID UNICODE                                                    
      VOLATILE CARDINALITY                                         
  APPEND NO                                                         
;                                                                  
  COMMIT ;                                                   
    CREATE UNIQUE INDEX                "BOXWELL"."XML_PRIMARY"
           ON "BOXWELL"."XMLCUSTOMER"                         
           ("CID"                            ASC             
           )                                                 
           CLUSTER                                           
           USING STOGROUP SYSDEFLT                            
           PRIQTY      720                                   
           SECQTY      720                                   
           ERASE NO                                          
           FREEPAGE    0                                      
           PCTFREE    10                                     
           GBPCACHE CHANGED                                  
           DEFINE YES                                        
           COMPRESS NO                                       
           BUFFERPOOL BP0                                    
           CLOSE NO                                          
           PIECESIZE    2 G                                  
           COPY NO                                           
  ;                                                          
  COMMIT ;                                                   
    CREATE INDEX                       "BOXWELL"."CUST_TYPE" 
           ON "BOXWELL"."XMLCUSTOMER"                        
           ("INFO"                                           
           )                                                 
    GENERATE KEY USING XMLPATTERN                            
'/customerinfo/phone/@type'
    AS SQL VARCHAR( 128)                                     
           NOT CLUSTER                                       
           NOT PADDED                                        
           USING STOGROUP SYSDEFLT                           
           PRIQTY        4                                   
           SECQTY        4                                   
           ERASE NO                                          
           FREEPAGE    0                                     
           PCTFREE    10                                     
           GBPCACHE CHANGED                                  
           DEFINE YES                                        
           COMPRESS NO                                        
           BUFFERPOOL BP0                                    
           CLOSE YES                                         
           PIECESIZE    2 G                                  
           COPY NO                                           
  ;                                                          
    CREATE INDEX                       "BOXWELL"."CUST_ZIP"  
           ON "BOXWELL"."XMLCUSTOMER"                        
           ("INFO"                                           
           )                                                
    GENERATE KEY USING XMLPATTERN                           
'//pcode-zip'
    AS SQL DECFLOAT(34)                                     
           NOT CLUSTER                                      
           NOT PADDED                                       
           USING STOGROUP SYSDEFLT                          
           PRIQTY        4                                  
           SECQTY        4                                  
           ERASE NO                                         
           FREEPAGE    0                                    
           PCTFREE    10                                    
           GBPCACHE CHANGED                                 
           DEFINE YES                                        
           COMPRESS NO                                      
           BUFFERPOOL BP0                                   
           CLOSE YES                                        
           PIECESIZE    2 G                                  
           COPY NO                                          
  ;                                                         
  COMMIT ;                                                  
    CREATE INDEX                       "BOXWELL"."CUST_PHONE"
           ON "BOXWELL"."XMLCUSTOMER"                       
           ("INFO"                                          
           )                                                
    GENERATE KEY USING XMLPATTERN                           
'/customerinfo/phone'
    AS SQL VARCHAR( 128)                                    
           NOT CLUSTER                                      
           NOT PADDED                                       
           USING STOGROUP SYSDEFLT                          
           PRIQTY        4                                  
           SECQTY        4                                  
           ERASE NO                                          
           FREEPAGE    0                                    
           PCTFREE    10                                    
           GBPCACHE CHANGED                                 
           DEFINE YES                                        
           COMPRESS NO                                      
           BUFFERPOOL BP0                                   
           CLOSE YES                                        
           PIECESIZE    2 G                                  
           COPY NO                                          
  ;                                                         
  COMMIT ;                                                

Phew!

Then I needed to insert a few rows. Here are the ten I used for my tests:

SET CURRENT SQLID = 'BOXWELL' ;                                  
COMMIT ;                                                         
DELETE FROM BOXWELL.XMLCUSTOMER ;                                
COMMIT ;                                                         
INSERT INTO BOXWELL.XMLCUSTOMER VALUES (06001, XMLPARSE(DOCUMENT '
<customerinfo><name>Justin Horovitz</name>                        
<addr country="United States"><street>327 Ramses Ave</street>    
<city>Glendale</city><prov-state>California</prov-state>         
<pcode-zip>91208-06001</pcode-zip></addr>                        
<phone type="work">818-956-06001</phone></customerinfo>'));      
INSERT INTO BOXWELL.XMLCUSTOMER VALUES (06002, XMLPARSE(DOCUMENT '
<customerinfo><name>Matthew Broad</name>                         
<addr country="United States"><street>808 Mayo St</street>       
<city>Burbank</city><prov-state>California</prov-state>          
<pcode-zip>9150600-06002</pcode-zip></addr>                      
<phone type="work">818-541-06002</phone></customerinfo>'));      
INSERT INTO BOXWELL.XMLCUSTOMER VALUES (06003, XMLPARSE(DOCUMENT '
<customerinfo><name>Laura McCarthy</name>                        
<addr country="United States"><street>5224 Grover Court</street> 
<city>San Jose</city><prov-state>California</prov-state>         
<pcode-zip>95123-06003</pcode-zip></addr>                        
<phone type="work">408-956-06003</phone></customerinfo>'));      
INSERT INTO BOXWELL.XMLCUSTOMER VALUES (06004, XMLPARSE(DOCUMENT '
<customerinfo><name>Mira Glass</name>                            
<addr country="United States"><street>444 Valencia St</street>   
<city>San Francisco</city><prov-state>California</prov-state>    
<pcode-zip>94110-06004</pcode-zip></addr>                        
<phone type="work">415-762-06004</phone></customerinfo>'));      
INSERT INTO BOXWELL.XMLCUSTOMER VALUES (06005, XMLPARSE(DOCUMENT
'<customerinfo><name>Amir Malik</name>                           
<addr country="United States"><street>555 Bailey Ave</street>    
<city>San Jose</city><prov-state>California</prov-state>         
<pcode-zip>95141-06005</pcode-zip></addr>                        
<phone type="work">408-555-06005</phone></customerinfo>'));      
INSERT INTO BOXWELL.XMLCUSTOMER VALUES (06995, XMLPARSE(DOCUMENT
'<customerinfo><name>Amir Malik</name>                           
<addr country="United States"><street>555 Bailey Ave</street>    
<city>San Jose</city><prov-state>California</prov-state>         
<pcode-zip>95141-06995</pcode-zip></addr>                         
<phone type="work">408-555-06995</phone></customerinfo>'));
INSERT INTO BOXWELL.XMLCUSTOMER VALUES (06996, XMLPARSE(DOCUMENT
'<customerinfo><name>Kathy Smith</name>                          
<addr country="Canada"><street>25 EastCreek</street>             
<city>Toronto</city><prov-state>Ontario</prov-state>             
<pcode-zip>M8X-3T6-06996</pcode-zip></addr>                      
<phone type="work">416-555-06996</phone></customerinfo>'));      
INSERT INTO BOXWELL.XMLCUSTOMER VALUES (06997, XMLPARSE(DOCUMENT
'<customerinfo><name>Jim Noodle</name>                       
<addr country="Canada"><street>25 EastCreek</street>         
<city>Markham</city><prov-state>Ontario</prov-state>         
<pcode-zip>N9C-3T6-06997</pcode-zip></addr>                  
<phone type="work">905-555-06997</phone></customerinfo>'));  
INSERT INTO BOXWELL.XMLCUSTOMER VALUES (06998, XMLPARSE(DOCUMENT
'<customerinfo><name>Anant Jhingran</name>                   
<addr country="United States"><street>555 Bailey Ave</street>
<city>San Jose</city><prov-state>California</prov-state>     
<pcode-zip>95141-06998</pcode-zip></addr>                    
<phone type="work">408-555-06998</phone>                     
<phone type="home">416-555-06998</phone>                     
<phone type="cell">905-555-06998</phone>                     
<phone type="cottage">613-555-06998</phone></customerinfo>'));
INSERT INTO BOXWELL.XMLCUSTOMER VALUES (06999, XMLPARSE(DOCUMENT
'<customerinfo><name>Bert and Ernie Inc.</name>              
<addr country="Canada"><street>1 Yonge Street</street>       
<city>Toronto</city><prov-state>Ontario</prov-state>         
<pcode-zip>M5W-IE6-06999</pcode-zip></addr>                  
<phone type="work">416-555-06999</phone></customerinfo>'));  
COMMIT ;                                                     

If you run a RUNSTATS on that tablespace:

RUNSTATS TABLESPACE BOXWELLX.XMLCUSTO         
   TABLE(ALL)                                 
   INDEX(ALL                                  
   KEYCARD                                    
         FREQVAL   NUMCOLS 0001 COUNT   10 MOST
         FREQVAL   NUMCOLS 0002 COUNT   10 MOST
         HISTOGRAM NUMCOLS 0001               
         HISTOGRAM NUMCOLS 0002               
        )                                     
   SHRLEVEL CHANGE                            
   REPORT YES                                 
   UPDATE ALL                                  
   SORTDEVT SYSALLDA                          
   SORTNUM  0004                              

In Db2 11 and Db2 12 I got *exactly* the same output, even though I had created three XML indexes that *should* have got the SYSKEYTGTDIST statistics! The Db2 Catalog entries for these three indexes also look a bit “odd”, as the UNIQUERULE column is set to “D” (I was expecting an “X” for XML!) and the IX_EXTENSION_TYPE is set to “V” (XML). Interestingly, the implicit I_DocId index *is* a UNIQUERULE “X” and IX_EXTENSION_TYPE blank entry even though, in my eyes, it is *not* an XML index!

Now, when you create the test objects listed above, you also get a whole bunch of implicit objects. These include the XML tablespace (mine was called XXML0000), which contains the implicit table BOXWELL.XXMLCUSTOMER with five columns, and an implicit I_NodeId index.

If you now run this RUNSTATS:

RUNSTATS TABLESPACE BOXWELLX.XXML0000          
   TABLE(ALL)                                   
   INDEX(ALL                                   
   KEYCARD                                     
         FREQVAL   NUMCOLS 0001 COUNT   10 MOST
         FREQVAL   NUMCOLS 0002 COUNT   10 MOST
         HISTOGRAM NUMCOLS 0001                
         HISTOGRAM NUMCOLS 0002                
        )                                      
   SHRLEVEL CHANGE                             
   REPORT YES                                  
   UPDATE ALL                                   
   SORTDEVT SYSALLDA                           
   SORTNUM  0004                             

In Db2 11 it moans about the KEYCARD, FREQVAL and HISTOGRAM keywords:

DSNU1354I -QB1A 120 09:14:41.60 DSNUSIIX - KEYWORD KEYCARD IS NOT SUPPORTED FOR XML OBJECTS. THE KEYWORD IS IGNORED 
DSNU1354I -QB1A 120 09:14:41.60 DSNUSIIX - KEYWORD FREQVAL IS NOT SUPPORTED FOR XML OBJECTS. THE KEYWORD IS IGNORED 
DSNU1354I -QB1A 120 09:14:41.60 DSNUSIIX - KEYWORD HISTOGRAM IS NOT SUPPORTED FOR XML OBJECTS. THE KEYWORD IS IGNORED

And it does not create the SYSKEYTGTDIST entries. In Db2 12 it moans, but a bit differently:

DSNU1354I -DC10 120 09:13:07.35 DSNUSIIX - KEYWORD KEYCARD IS NOT SUPPORTED FOR CERTAIN TYPES OF XML OBJECTS.
THE KEYWORD IS IGNORED.
DSNU1354I -DC10 120 09:13:07.35 DSNUSIIX - KEYWORD FREQVAL IS NOT SUPPORTED FOR CERTAIN TYPES OF XML OBJECTS.
THE KEYWORD IS IGNORED.
DSNU1354I -DC10 120 09:13:07.35 DSNUSIIX - KEYWORD HISTOGRAM IS NOT SUPPORTED FOR CERTAIN TYPES OF XML OBJECTS.
THE KEYWORD IS IGNORED.

In my case, right at the end of the job, were the successful results:

DSNU1353I -DC10 120 09:13:07.43 DSNUSUKD - SYSKEYTGTDIST CATALOG STATISTICS FOR
                                       CUST_TYPE  KEYSEQ 0001
                              FREQUENCY           COLVALUE
                              ---------           --------
                              7,6923076923076E-01 X'776F726B0000000000000000000000000000000000000000000000000000
                                                    000000000000000000000000000000000000000000000000000000000000
                                                    000000000000000000000000000000000000000000000000000000000000
                                                    000000000000000000000000000000000000000000000000000000000000
                                                    00000000000000000004'

And

DSNU1356I -DC10 120 09:13:07.44 DSNUSUKD - SYSKEYTGTDIST CATALOG HISTOGRAM STATISTICS FOR CUST_TYPE KEYSEQ 1 QUANTILE 1
                              LOWVALUE          = X'63656C6C0000000000000000000000000000000000000000000000000000
                                                    000000000000000000000000000000000000000000000000000000000000
                                                    000000000000000000000000000000000000000000000000000000000000
                                                    000000000000000000000000000000000000000000000000000000000000
                                                    00000000000000000004'
                              HIGHVALUE         = X'63656C6C0000000000000000000000000000000000000000000000000000
                                                    000000000000000000000000000000000000000000000000000000000000
                                                    000000000000000000000000000000000000000000000000000000000000
                                                    000000000000000000000000000000000000000000000000000000000000
                                                    00000000000000000004'
                              CARDF             = 1,0E+00
                              FREQUENCYF        = 7,6923076923076E-02

Most of the other columns were also reported, apart from one set of data which was strangely missing. Can you guess which one and why? Scroll back up to the creation of the indexes and you will see that I made one resolve into a DECFLOAT. This is *not* supported and so is simply ignored!
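If you want to cross-check the hex values in the RUNSTATS report, here is a small Python sketch that decodes them and recomputes the frequencies from the inserted test rows. It rests on my reading of the report layout, not on documented fact: I am assuming the key value is UTF-8 text padded with X'00' to the 128 bytes of the VARCHAR(128) index key, followed by a two-byte length. The hex strings are rebuilt from those parts rather than pasted from the report, and the function name is purely illustrative.

```python
from collections import Counter


def decode_key_value(hex_text: str) -> str:
    """Decode a SYSKEYTGTDIST-style hex key value into readable text.

    Assumption: the last two bytes hold the length of the real value,
    and everything between the value and the length is X'00' padding.
    """
    raw = bytes.fromhex(hex_text)
    length = int.from_bytes(raw[-2:], "big")
    return raw[:length].decode("utf-8")


# Rebuilt from the report: 4 bytes of data, 124 bytes of padding, 2-byte length.
colvalue = "776F726B" + "00" * 124 + "0004"
lowvalue = "63656C6C" + "00" * 124 + "0004"

print(decode_key_value(colvalue))
print(decode_key_value(lowvalue))

# The phone/@type values in the ten inserted documents:
# ten "work" entries plus one each of "home", "cell" and "cottage".
phone_types = ["work"] * 10 + ["home", "cell", "cottage"]
counts = Counter(phone_types)
frequencies = {name: count / len(phone_types) for name, count in counts.items()}
print(frequencies["work"], frequencies["cell"])
```

The two hex values decode to the phone types “work” and “cell”, and 10 out of 13 and 1 out of 13 agree with the FREQUENCY and FREQUENCYF values reported above.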

Where’s the beef?

Well, the interesting point about this “voyage of XML discovery” is you *must* run a RUNSTATS on a different tablespace than where the indexes were created. Does your current RUNSTATS creation software handle this? Or are you missing out on possible XMLEXISTS exploitation due to missing RUNSTATS? From the “What’s New?”:

Db2 can use frequency and histogram statistics for XML indexes to estimate the filter factor of XMLEXISTS predicates when the following conditions are true:

The predicate is one of the following types:

  • Equality predicates: =
  • Range predicates: >, >=, <, <=

The right side of the predicate is a literal value.

Our Freeware StatisticsHealthCheck was just upgraded to version 3.4 to handle exactly this “problem”.

Naturally, if you can read, it is an advantage and the IBM Db2 documentation clearly states:

“XML indexes are related to XML tables, and not to the associated base tables. If you specify a base table space and an XML index in the same RUNSTATS control statement, Db2 generates an error. When you run RUNSTATS against a base table, RUNSTATS collects statistics only for indexes on the base table, including the document ID index.”

Db2 11/12 – Utilities – RUNSTATS – Collection of statistics on XML objects

I had never noticed that paragraph before… Shame on me!

Caught in a BIND

Back in Db2 10, IBM introduced the BIND PACKAGE(xxx) EXPLAIN(ONLY) syntax to fill the PLAN_TABLE and its buddies, but without actually doing a BIND. Very cool feature – until you find out that it jolly well *does* do a BIND, with all of the locking implications that we all know and hate, before casually doing a ROLLBACK as if “these are not the droids you are looking for”. One of my pet bugbears I can tell you!

And this can really hurt in production!

Learning Curve

What have you learned from Db2 over the years that was strange/obtuse or just downright weird? Don’t be shy! We’d love to hear about it.

As always, any questions or comments would be most welcome!

TTFN,

Roy Boxwell

2020-04 Four Flavors of Db2 Audit

As well as simply getting it done, I will also run through the four different ways that you can process the gathered data.


Four ways to get a Db2 z/OS Audit done


1- First up

First option is the simplest, cheapest and quickest:

Do nothing.

Whether or not this will help your company is a non-trivial question of course!

Naturally this is an absolute No No.


2- Then we have

Next option is relatively simple and cheap, but requires a bit of work: 

Write it all yourself but based on existing data that some other process already extracts for you, (SMF for example). 

If you happen to have the skills for extracting the required audit data from existing data that is being collected anyway, then this might well be the best method if you are really strapped for resources. 


3- Getting there 

Then we have not so simple, still cheap, but a ton of work: 

Write it all yourself and add all the IFCIDs you actually need to audit your system as well as capturing all the SQL. 

This needs a serious amount of skill to get started and to keep up with the agile world of Db2. You will also need to take care of the amount of data that you will be collecting.

However, the auditor will be happy as you have everything they could ask for.


4- Aha! The only true way 

Last option is simple, not so cheap but very quick: 

Third party software that does it all for you.

This is my preferred solution, especially as we just happen to sell one (WorkLoadExpert Audit).

This is actually the only real way to go. You probably don’t have the time to keep all these things up-to-date and running correctly. 

Data Collected – Now what? 

So, you have chosen one of these ways to gather the data. Now you must evaluate what you got. Here again we have four separate ways to go forward: 

First up 

There it is! 

Do nothing. Just point at the datasets, print outs, database objects and say “It is all in there…” 

This is not really a solution and any auditor worth his, or her, salt would quite rightly be extremely upset! 

Then we have 

A whole bunch of pre-written SQLs. 

SPUFI is ok, but much better would be to see these in a GUI where graphical viewing is built in and saving and sharing results is much easier.  

This is not bad, but still a manual “island” process. Just Db2 and nothing else plus it must be triggered by humans. 

Getting there

A whole bunch of pre-written and custom SQLs.

This time, all run in Batch and the results are emailed to the auditor directly. These emails can “just sit there” until the auditor checks the results. Naturally, if anything is found, then the underlying data must still be there for a detailed analysis.

Better, as it is getting automatic but still not really “round”, as it is still Db2 in isolation…

Aha! The only true way

Use of LEEF or SYSLOGGER-style formats to export all audit data.

The data is then in a data-lake where SPLUNK, QRADAR et al can happily slice and dice their way through the data.

This is the best way!

You also get an extra bonus point for *removing* the data from the mainframe. As auditors *love* a single point of control, this is the only real way forward. It also pushes the Db2 data into the world of other data that auditors use and require.
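To give a feel for what such an export record might look like, here is a minimal Python sketch that formats one audit event as a LEEF 1.0 line: a pipe-delimited header followed by tab-separated key=value attributes. The vendor and product names, the event id and all of the attribute values are invented for illustration. This is not how WorkLoadExpert does it, just the general shape of the format.

```python
def to_leef(event_id: str, attrs: dict,
            vendor: str = "ExampleVendor",
            product: str = "ExampleDb2Audit",
            version: str = "1.0") -> str:
    """Format one audit event as a LEEF 1.0 line.

    Header:     LEEF:1.0|Vendor|Product|Version|EventID|
    Attributes: key=value pairs separated by tab characters.
    """
    header = f"LEEF:1.0|{vendor}|{product}|{version}|{event_id}|"
    body = "\t".join(f"{key}={value}" for key, value in attrs.items())
    return header + body


# Hypothetical audit event; the field values are made up.
event = {
    "devTime": "2020-04-01 07:00:00",
    "usrName": "BOXWELL",
    "action": "GRANT",
    "object": "BOXWELL.XMLCUSTOMER",
}
line = to_leef("DB2_AUTH_CHANGE", event)
print(line)
```

A real exporter would also have to escape any pipe or tab characters that turn up in the values, which this sketch does not attempt.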


Db2 Audit with “GIVE&TAKE” :


Software Engineering GmbH and SEGUS Inc are launching a new free Give&Take which this time is the Audit support from WorkLoadExpert.

If you would like to take part, then please just fire off an email to db2support@segus.com telling us who you are and which firm you work for and we will get in touch!

Give and Take 

By the way, it is called “Give&Take” because :

  • we Give you the software, for free, to run for a trial period, and
  • we would like to Take away what you think, feel, and find about the software after the trial period. 

More about Give&Take


TTFN, 

Roy Boxwell 

2020-03 Db2 REORG SYSCOPY: DRAIN Delays are Despicable

I heard about an interesting problem the other day. Please remember that “interesting” to me is just that: “interesting”. “Interesting” for the DBAs and employees of the firm where it happened is, naturally, a bit different.

A normal start

  • Monday morning and all is well until around 07:00, when delays start appearing in online transactions.
  • Soon the delays are gaining the advantage and customers are starting to complain.
  • At around about 07:20 nearly the whole machine just sat there…
  • About 15 minutes later everything started running normally.

Hmmm, interesting.

The stage is set

So, the lead DBAs were off and running, looking for bad SQL that could possibly have caused the disturbance in the force. They checked whether the coupling facility was under stress, they checked SYSIBM.SYSCOPY for parallel running REORG, MODIFY, or QUIESCE, and they used our WorkLoadExpert (WLX) to see what was happening in the time window involved.

Tuesday arrives

And so do I! As luck would have it, I am at this site to hold a presentation all about BindImpactExpert, which saves you from bad access paths, and RunstatsRescue, which rescues you from badly timed RUNSTATS. Now this site already has these products, but I must present to a new intake of DBA and developer employees.

Check everything

After my presentation we checked everything and found a few timeouts and deadlocks, but nothing serious. Then I got my Deer Stalker hat on, (now there’s an image!), and decided to see where delays were coming in. One of the developers had already done a quick WLX check and had seen very high Drain Lock values.

WLX outputs a summary of what workload has been processed which, here in the labs on my little test system, looks like this:

Wait times in microseconds because of …                                
 latch requests               :                 594                    0
 page latch                   :                   0                    0
 drain locks                  :                   0                    0
 drain lock claims            :                   0                    0
 log writer                   :               32230                    0
 synchronous I/O              :             6840389                 9623
 locks                        :                   0                    0
 synchronous execute          :                   0                    0
 global locks                 :                   0                    0
 other threads read  activity :            28429563                    0
 other threads write activity :               13166                    0 

At the actual customer site I could see a 1000 times increase in wait drain locks between two runs!

Utility versus SQL

Now, as I am sure you are all aware, a drain is *only* used by a command or a utility, so I started thinking:

“There must be a parallel running something somewhere!”

So I used WLX to show me the SQLs that had the highest wait drain locks. I took the top two (over 30,000 seconds of delay!) and got their tablespace names from the Db2 Catalog using the first referenced table column.

Horrible Job to do

It is not a pleasant task to search master address space sysouts, but in this case it was the only way. Using the tablespace names from the Db2 Catalog, I just navigated to the date and time in question and did F commands in SDSF on the tablespace names.

BINGO!

After a few minutes I found a strange message (Correlation Id=010.TLPLKNC3) about a drain not being possible for an internal Db2 system task. This happens to be used by REORG, and it gave me the info about where the drain came from. I looked at that system’s log output in the time range, and sure enough there was a REORG of that very table which kept failing due to not getting the drain!

A retry too far?

At this site they use a 3rd Party software tool to generate REORG, RUNSTATS and COPY and it had a default of RETRY 30. It kept trying 30 times before eventually failing.

This explains the missing SYSCOPY entry as the REORG had failed!
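To see why a generous RETRY value can hurt so much, here is a back-of-the-envelope Python sketch of how long a REORG could keep trying to get its drain. It assumes the usual pattern of one initial attempt plus RETRY further attempts, each waiting up to DRAIN_WAIT seconds, with RETRY_DELAY seconds in between. The DRAIN_WAIT and RETRY_DELAY values below are purely illustrative and are *not* the values used at this site. Only the RETRY 30 comes from the story.

```python
def worst_case_drain_seconds(drain_wait: int, retry: int, retry_delay: int) -> int:
    """Upper bound on the time a utility spends trying to get a drain.

    Assumption: one initial attempt plus `retry` retries, each waiting up to
    `drain_wait` seconds, with `retry_delay` seconds between attempts.
    """
    attempts = retry + 1
    return attempts * drain_wait + retry * retry_delay


# RETRY 30 as in the story; the other two values are hypothetical.
total = worst_case_drain_seconds(drain_wait=30, retry=30, retry_delay=10)
print(f"{total} seconds, roughly {total / 60:.1f} minutes")
```

While each of those drain requests is pending, new work on the object has to queue behind it, so every extra retry is another window in which online transactions can stall.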

The other one?

So that was one bad boy found – What about the other? That tablespace did not appear in any of the sysouts. So I drilled down to get the full SQL text (Over 8000 bytes long!) and scrolled on down to the FROM lines – and there was the *first* table name! After the dust had settled, I went back and saw that this one table was actually in every single SQL in the top 200 delay candidates! A pretty central table if you ask me!

Who? and Why?

The management level now wanted to know who did it? And why? I left that part up to the customer, of course, as I do not want to get involved in a finger pointing exercise! My feeling is: like most disasters, it was probably a chain of events something like:

  1. REORG generated on Sunday.
  2. Due to some unforeseen problem the JCL was shunted to Monday.
  3. On Monday at 07:00 it started and killed the machine.

Never again?

The best way is to generate jobs straight to the Job Scheduler for instantaneous execution (no waiting or shunting allowed) and guess what? We have the RealTimeDBAExpert (RTDX) that allows you to do just that! You can easily exclude objects from utilities based on days of the week, hours of the day, etc. If you have a bought-in or home-grown system, would it also have caused this disaster?

Console Messages

If the customer had had our WLX Console Message Use Case licensed, it would have also made the detective work much easier, as then you have a central place to go where *all* console messages from *all* members are written and searchable! This would have saved a lot of time and trouble.

Bottom Line

(Removing my Deer Stalker hat and replacing it with a mortar board.)

Look everywhere, trust no-one and remember that a DRAIN almost certainly has nothing to do with SQL or a badly timed RUNSTATS.

As always, any questions or comments would be most welcome!

TTFN,

Roy Boxwell

2020-02 Db2 UPDATE column: UPDATEs for nothing and CPU ain’t free!

A bad misquote of a great Dire Straits song, but it is one great thing I saw last year!

What is an Update?

We all know what an update is, right? You have a column containing some value and you wish to update it to a new – different – value. You code an UPDATE with SET and all is done.

But, what happens “under the covers” when the column value you are updating is exactly the same as the new value?

Suspected real-time abuse

The problem surfaced gradually… as all good problems do… The DBAs were wondering why an SQL that was executed half a million times per day was waiting for other threads for so long. (This was discovered using our WorkLoadExpert (WLX)). The wait times were frighteningly high, and so it was decided that this SQL should be the target of some sort of tuning effort.

It then came out, while gathering basic tablespace statistical data, that the related tablespace had not been REORGed for over six months when, according to SEG’s WorkLoadExpert, there were half a million updates every day against it!

RTS or SEG Bug?

Naturally, the first idea was that it must be a bug: either in the Real-time Statistics (not incrementing the REORGUPDATES counter – it could even be NULL, for example) or, heaven forbid, in our WorkLoadExpert.

I took a closer look at the SQL:

UPDATE aaa.bbb
 SET COL1 = ?
 WHERE COL1 = ?
 ;

and just sort of wondered out loud,

“They are not using the same value in both parameter markers are they?”

The “they” in this case was the developer of course…

Oh My Word!

After a quick e-mail discussion, it turned out that this was indeed the case! Db2 is clever but sometimes not that brilliant! The developer had had the idea of executing this SQL to “see” if the value existed or not… He did not think about what Db2 then actually does…

Under the Covers

Db2 does not “know” what the current value of COL1 is. In this case it used index access to find the target page and obtain an X lock on it (this was the reason for the very large wait times on the other threads!). Once the lock was held, Db2 could then discover that there was *nothing* to do, and so it did nothing! Then it happily released the lock(s) after doing nothing and returned SQLCODE 0.

No Log data was written as nothing was done, and REORGUPDATES was not incremented as nothing was done, but the CPU/Elapsed overhead was enormous!

The right way

The head DBA has said the SQL should look like:

SELECT 'A' FROM aaa.bbb
WHERE COL1 = ?
FETCH FIRST 1 ROW ONLY
WITH UR
;

This is now on its way through change management! Naturally, it is the way the developer should have coded it from the get go!

What can you do?

Now this, of course, caused alarm bells to ring as “cut-and-paste” is your friend. If there is bad code in one place it is probably being copied further, even as you read this! Using SEG’s WorkLoadExpert and the Real-time Statistics, you can easily pull out and analyze any “bad guys”.


Put simply, use the UPDATE count from WLX and correlate it to the REORGUPDATES counter. If they are wildly different, taking into account REORGLASTTIME and WLX_TIMESTAMP, then you have a candidate to further track down!
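The Real-time Statistics half of that comparison could look something like this. It is a sketch only: the database and tablespace names are placeholders, and the WLX side is not shown, so the UPDATE count from WLX still has to be held against this result by hand.

```sql
-- Updates counted by the Real-time Statistics since the last REORG.
-- 'DBPROD01' and 'TSCENTRL' are hypothetical placeholders.
SELECT DBNAME, NAME, PARTITION
     , COALESCE(REORGUPDATES, 0) AS REORGUPDATES  -- counter can be NULL
     , REORGLASTTIME
FROM SYSIBM.SYSTABLESPACESTATS
WHERE DBNAME = 'DBPROD01'
  AND NAME   = 'TSCENTRL'
ORDER BY PARTITION
WITH UR
;
```

A REORGUPDATES value near zero, for an object that WLX shows receiving hundreds of thousands of UPDATEs a day since REORGLASTTIME, is a likely “UPDATE for nothing” candidate.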


Now, where are those refrigerators we have to move?

As always, I would be pleased to hear from you and any war stories you have!

TTFN,
Roy Boxwell
Senior Architect


2020-01 How RUNSTATS causes an error SQLCODE

Impossible! I hear you all say… How can a RUNSTATS *cause* an SQL error code? Well, my gentle readers, read on.


Two examples in SPUFI at the end of this newsletter

What we know

We are all, I hope, aware that a badly timed RUNSTATS can cripple your SQL Performance. Just think of a see-sawing, or volatile table, and it gets a RUNSTATS at the zero point… Tablespace scan is then a good access choice… After a couple of million inserts that is probably not the best!

Awful, Implicit Casting

Implicit casting came along a few Db2 releases ago and either made you very happy: “I never have to worry about using apostrophes again!” or very angry: “Developers must know what they are coding!” The thing is, we got it and you cannot *not* do it!

It goes both ways…

The idea behind Implicit casting, in case you don’t know, is that:

The predicates COL1 = 1 and COL1 = ‘1’ are the same to Db2.

It will take that ‘1’ and “cast” it to a variable type that will then be able to be compared to whatever type COL1 is.

So,

  • if COL1 contains a numeric representation of data everything is ok.
  • But if COL1 is CHAR(1) and contains a ‘Y’ you then get an SQLCODE -420.

Nasty, nasty business…

Access path is also sub-optimal

When Db2 does implicit casting, it casts to DECFLOAT to then do the comparison etc. (See my DECFLOAT newsletter about what I think about that data type!) Anyway, it is *not* good for performance. In fact, it got so annoying that a few users actually asked for a ZPARM to switch off implicit casting! They actually wanted an error whenever they compared mismatched data types. This request was, of course, turned down.

So where’s the beef?

So, what happened in production was quite simple really: a query had been running for three years with never a problem. Then one day, after a RUNSTATS, it started returning SQLCODE -420. This was due to the fact that the table processing order had switched, due to the RUNSTATS running at a “bad” time.

This in turn exposed the buggy SQL WHERE predicate, which previously had never seen the bad data because it was removed in an earlier branch of the query! This could also happen when the column in question actually contains non-numeric data, perhaps due to a code bug.

See the example SQL at the end.
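To make the order dependency a bit more concrete, here is a minimal sketch. The tables T1 and T2 and their columns are invented for illustration and are not the customer's query; T1.STATUS is assumed to be CHAR(1) holding both digits and letters.

```sql
-- Hypothetical tables: T2.KIND = 'N' is assumed to mark the rows whose
-- matching T1.STATUS contains a digit.
SELECT T1.ID
FROM T1, T2
WHERE T1.ID     = T2.ID
  AND T2.KIND   = 'N'
  AND T1.STATUS = 1          -- implicit cast of STATUS to DECFLOAT
;

-- Comparing like with like avoids the cast altogether:
SELECT T1.ID
FROM T1, T2
WHERE T1.ID     = T2.ID
  AND T2.KIND   = 'N'
  AND T1.STATUS = '1'        -- character to character, no cast
;
```

In the first statement, as long as Db2 goes to T2 first, only the “numeric” rows of T1 ever reach the STATUS predicate. If the access path flips and T1 is processed first, the predicate can meet a ‘Y’ and the SQLCODE -420 appears. Whether it does depends entirely on the access path Db2 chooses. The second statement does not care about the order.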

RUNSTATS Rescue for Db2 z/OS

This user site has our software RUNSTATS Rescue so they quickly got the query up and running, without doing a code change, in a matter of seconds.

Of course, this bad code was discovered in the middle of the year end production freeze so they could not simply change the application code! This code change has been scheduled and will be done in the new year.

Bottom Line

RUNSTATS can cause negative SQLCODEs to be returned and RUNSTATS Rescue buys you the needed time to continue running – even with buggy SQL code!

As always, I would be pleased to hear from you and any war stories you have!

TTFN,
Roy Boxwell
Senior Architect


PS: Just for the record, here are two examples in SPUFI:


 SELECT *                                                               
 FROM SYSIBM.SYSDUMMY1                                                  
 WHERE IBMREQD = '1'                                                    
 ;                                                                      
 ---------+---------+---------+---------+---------+---------+---------+-
 IBMREQD                                                                
 ---------+---------+---------+---------+---------+---------+---------+-
 DSNE610I NUMBER OF ROWS DISPLAYED IS 0                                 
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100            
 ---------+---------+---------+---------+---------+---------+---------+-
 SELECT *                                                               
 FROM SYSIBM.SYSDUMMY1                                                  
 WHERE IBMREQD = 1                                                      
 ;                                                                      
 ---------+---------+---------+---------+---------+---------+---------+-
 IBMREQD                                                                
 ---------+---------+---------+---------+---------+---------+---------+-
 DSNE610I NUMBER OF ROWS DISPLAYED IS 0                                 
 DSNT408I SQLCODE = -420, ERROR:  THE VALUE OF A STRING ARGUMENT WAS NOT
          ACCEPTABLE TO THE DECFLOAT FUNCTION     
 DSNT418I SQLSTATE   = 22018 SQLSTATE RETURN CODE                        
 DSNT415I SQLERRP    = DSNXRNUM SQL PROCEDURE DETECTING ERROR            
 DSNT416I SQLERRD    = -245 0  0  -1  0  0 SQL DIAGNOSTIC INFORMATION    
 DSNT416I SQLERRD    = X'FFFFFF0B'  X'00000000'  X'00000000'  X'FFFFFFFF'
          X'00000000'  X'00000000' SQL DIAGNOSTIC INFORMATION            

And the result when the column actually contains a valid numeric value and not “Y”:

CREATE TABLE BOXWELL.SYSDUMMY1 LIKE SYSIBM.SYSDUMMY1       
 ;                                                          
 ---------+---------+---------+---------+---------+---------
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0  
 ---------+---------+---------+---------+---------+---------
 INSERT INTO  BOXWELL.SYSDUMMY1 VALUES ('1')                
 ;                                                          
 ---------+---------+---------+---------+---------+---------
 DSNE615I NUMBER OF ROWS AFFECTED IS 1                      
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0  
 ---------+---------+---------+---------+---------+---------
 SELECT *                                                   
 FROM BOXWELL.SYSDUMMY1                                     
 WHERE IBMREQD = '1'                                        
 ;                                                          
 ---------+---------+---------+---------+---------+---------
 IBMREQD                                                    
 ---------+---------+---------+---------+---------+---------
 1                                                          
 DSNE610I NUMBER OF ROWS DISPLAYED IS 1                     
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100
 ---------+---------+---------+---------+---------+---------
 SELECT *                                                   
 FROM BOXWELL.SYSDUMMY1                                     
 WHERE IBMREQD = 1                                          
 ;                                                          
 ---------+---------+---------+---------+---------+---------
 IBMREQD                                                    
 ---------+---------+---------+---------+---------+---------
 1                                                          
 DSNE610I NUMBER OF ROWS DISPLAYED IS 1                     
 DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

Cool and dangerous! Just like me! LoL ! ! !