2025-01 Compress and Replicate This!

This month, I want to delve into one of those newish, but well-hidden, features of Db2 that arrived some years ago but did not get that much publicity!

Compress Yes!

We all love compression! It saves tons of disk space and, over the years, we even got Index Compression. Yes, I know it is *not* really compression at all – just the leading bytes – but it is better than nothing! We also got better, newer, Table compression algorithms (Looking at you Mr. Huffman!) and we always had the problems of un-compressing data if “we” needed to look at it off-line.

Dictionaries Anymore?

Db2, at least for Tables, uses dictionaries of varying sizes to save “common strings” mapped to small numbers. These dictionaries are pretty obviously only valid for one set of data from one table and/or partition. If you do a REORG after inserting a ton of new data then the new dictionary can be very different from the prior one. Db2 writes the *previous* dictionary to the log and we carry on fine …

Or Do We?

Well, what happens when you wish to read data from the log that was inserted *before* your current dictionary was created? This is, and will always be, a small technical problem! The programs out there now simply crawl back through the Db2 log looking for the compression dictionary that matches your timescale. This obviously takes time, I/O and CPU!

Online?

Even if the dictionary you are looking for is the current one, accessing it might well cause DBD Locks to happen when it has to be opened, and it can even cause Db2 GBP dependency problems. The fun continues if the logs you need have been archived to tape, of course.

DCC

Data Capture Changes are the keywords here, with DCC as the standard TLA. As you are probably aware, setting this at the table level enables Db2 to support data replication using the log. IBM calls its version DataPropagator. The DDL syntax in CREATE/ALTER TABLE is DATA CAPTURE NONE/CHANGES:

DATA CAPTURE Specifies whether the logging of the following actions on the table includes additional information to support data replication processing:

• SQL data change operations

• Adding columns (using the ADD COLUMN clause)

• Changing columns (using the ALTER COLUMN clause)

NONE Do not record additional information to the log. This is the default.

CHANGES Write additional data about SQL updates to the log.
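
As a minimal sketch of how this looks in practice (MYSCHEMA.MYTABLE is a hypothetical name, substitute your own), you can switch an existing table over and then check what the catalog says. The DATACAPTURE column should show 'Y' for CHANGES and a blank for NONE:

```sql
-- Hypothetical table name, used purely for illustration
ALTER TABLE MYSCHEMA.MYTABLE
      DATA CAPTURE CHANGES
;
-- Review the setting in the Db2 catalog
SELECT CREATOR, NAME, DATACAPTURE
FROM SYSIBM.SYSTABLES
WHERE CREATOR = 'MYSCHEMA'
  AND NAME    = 'MYTABLE'
;
```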

This Changes Everything!

Indeed, it does! DATA CAPTURE CHANGES (DCC and also CDC sometimes!) causes a full row to be logged for every update. Without DCC, an update of just the very last byte of data in a row produced a tiny log record of basically just that last byte. With DCC you always get the full row. For INSERT and DELETE you always got the full row anyway. Why the change? So that the replication software has it easy! Simple as that!

Side Effects?

There are a few side effects, of course. Mainly, the log size grows as you are logging more data, and any use of the SQL TRUNCATE will behave like a MASS DELETE (so no IGNORE DELETE TRIGGERS or IMMEDIATE options, for example). There are also a few ZPARMs that must all be reviewed and/or changed. Firstly, the ZPARM UTILS_BLOCK_FOR_CDC with options YES/NO and default NO. If set to NO then no utilities are barred from working on these DCC tables. If set to YES then:

  • CHECK DATA with DELETE YES LOG NO
  • LOAD with SHRLEVEL NONE or REFERENCE
  • RECOVER with TOLOGPOINT, TORBA, TOCOPY, TOLASTCOPY, or TOLASTFULLCOPY
  • REORG TABLESPACE with DISCARD
  • REPAIR with LOCATE DELETE

Terminated with Extreme Prejudice!

Will all be terminated. Why, you may wonder? Well, all of these can obviously *flood* the log with rows, millions of rows of data. If using data replication then their usage should be “evaluated” before attempting them – hence the ZPARM. Too many customers shot themselves in the foot with a badly timed LOAD, for example, so we got this “protection” ZPARM. Naturally, it is changeable online, so once you have reviewed the requirement for the utility you can still fire it off, if desired, and then, very quickly, switch back!

More?

Yes, there is always more! ZPARM REORG_DROP_PBG_PARTS with options DISABLE/ENABLE and default DISABLE. If it is set to ENABLE, REORG is allowed to drop empty PBG partitions after a successful REORG. However, if the table within is DCC then this drop of empty partitions will be skipped! Finally, we have the ZPARM RESTRICT_ALT_COL_FOR_DCC with options YES/NO and default NO. It specifies whether to allow ALTER TABLE ALTER COLUMN for DCC tables. If it is set to YES you will get an SQLCODE -148 if ALTERing a DCC object using any of the options SET DATA TYPE, SET DEFAULT or DROP DEFAULT. Again, this is to stop accidental log and data replication flooding!
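
To make that last ZPARM a bit more concrete, here is a small sketch of the three kinds of ALTER that should be rejected with SQLCODE -148 when RESTRICT_ALT_COL_FOR_DCC is YES and the table is DCC. The table, column names and data types are all hypothetical:

```sql
-- Assumes MYSCHEMA.MYTABLE is defined with DATA CAPTURE CHANGES
-- and that COL1 is currently a shorter VARCHAR (both hypothetical)
ALTER TABLE MYSCHEMA.MYTABLE
      ALTER COLUMN COL1 SET DATA TYPE VARCHAR(100)
;
ALTER TABLE MYSCHEMA.MYTABLE
      ALTER COLUMN COL2 SET DEFAULT 'X'
;
ALTER TABLE MYSCHEMA.MYTABLE
      ALTER COLUMN COL2 DROP DEFAULT
;
```

With the ZPARM at its default of NO, these are subject only to the normal ALTER rules.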

Compression Woes

So now you can fully see the quandary! If the rows are all compressed on the log you must access the dictionary to un-compress them on the “other side”, where you do the replication, or just locally if you want to see the data. As mentioned, getting our grubby little hands on the dictionary can cause performance issues so what can we do?

CDDS!

This arrived in Db2 11 for data-sharing people out there with a, and I quote, “GDPS® Continuous Availability with zero data loss environment.” It actually stands for Compression Dictionary Data Set and it brought in a couple of new ZPARMs:

  • CDDS_MODE with options NONE, SOURCE and PROXY and default NONE.
  • CDDS_PREFIX a 1 – 39 Byte z/OS dataset prefix for the VSAM Dataset where the “magic” happens.

VSAM???

Yep, you read that right! The CDDS VSAM dataset (CDDS_PREFIX.CDDS) stores up to three versions of the dictionaries that your table/tableparts are using, thus enabling fast, problem-free access to the dictionaries. Naturally, the CDDS must be available to *both* systems, and this is why VSAM was chosen, of course!

Start Me Up!

Some new commands also came in here. -START CDDS instructs all members of a data-sharing group to allocate the CDDS and start using it. -STOP CDDS tells all members to stop using CDDS, close and deallocate it. This must be done before recovering the CDDS, for example.

So???

So now you have a possible way of getting great performance in data replication, but how do you get it all running and how do you review what’s going on? This is where REORG and a new standalone utility DSNJU008 come in.

Reorg Yes But No…

The way to initialize your CDDS is simply to REORG all of your tables/table partitions with the INITCDDS option. Do not panic! If this option is set to YES it will *not* do a REORG! It will purely externalize the current dictionary and carry on. This means you can use a “generator” SQL statement like this:

SELECT DISTINCT 'REORG TABLESPACE ' CONCAT STRIP(A.DBNAME)
                        CONCAT '.' CONCAT STRIP(A.TSNAME)
                        CONCAT ' INITCDDS YES'
FROM SYSIBM.SYSTABLES A
WHERE A.DATACAPTURE = 'Y'
;

To get all your REORG cards, and then one mass pseudo Reorg later your CDDS is up and running!
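
If you have a lot of DCC tables that are not compressed at all, a possible variation of the generator is to join to SYSIBM.SYSTABLEPART and only pick table spaces with at least one compressed partition. This is just a sketch: it assumes that the COMPRESS column is blank for non-compressed partitions, and that restricting to TYPE = 'T' suits your shop:

```sql
-- Variation of the generator: only table spaces that actually
-- have compression defined on at least one partition
SELECT DISTINCT 'REORG TABLESPACE ' CONCAT STRIP(A.DBNAME)
                        CONCAT '.' CONCAT STRIP(A.TSNAME)
                        CONCAT ' INITCDDS YES'
FROM SYSIBM.SYSTABLES    A
    ,SYSIBM.SYSTABLEPART B
WHERE A.DATACAPTURE = 'Y'
  AND A.TYPE        = 'T'        -- assumption: base tables only
  AND B.DBNAME      = A.DBNAME
  AND B.TSNAME      = A.TSNAME
  AND B.COMPRESS   <> ' '        -- assumption: blank = not compressed
;
```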

Standalone

The DSNJU008 utility can be executed using JCL like this:

//CDDSPRIN EXEC PGM=DSNJU008
//STEPLIB  DD DSN=<Your Db2 exit lib>,DISP=SHR
//         DD DSN=<Your Db2 load lib>,DISP=SHR
//SYSUT1   DD DSN=<Your CDDS prefix>.CDDS,DISP=SHR
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
LIMIT(9999)
/*

Please refer to the Db2 for z/OS Utilities Guide and Reference for all the details about other keywords, but the interesting factoid is that VERSIONS are stored 1, 2, 3 with 1 being the most recent and the rest going backwards in time.

What Time Was It?

The program outputs everything you could ever wish for about your data sharing systems’ use of compression dictionaries, including the Dictionary timestamp, which I find very nice as every now and again it really can pay to analyze the data with a new compression test, just to see if you can squeeze any more onto your SSDs!

Why All This Trouble, Again?

The idea behind all of this work is to make using IFCID 306 better, faster and cheaper. It also has a knock-on effect into the “vendor scape”, as lots of vendors offer tooling for data replication or log reading/analysis and they all should now work with this feature, enabling you, the DBA, to be more productive at less cost. If their version of REORG/LOAD does *not* handle the CDDS then you must always schedule a new pseudo Reorg with INITCDDS afterwards to handle the VSAM Updates correctly.

Using It?

My question for this month, dear readers, is: Are any of you using this feature or planning to use it soon?

TTFN

Roy Boxwell