2014-08: A Million ways to kill your DSC (Dynamic Statement Cache)

That’s No Way to Treat Your Best Friend!

 

The Dynamic Statement Cache (DSC) can, and should, be your best friend for helping SQL run nice and fast on your machine. If you are good, you have a latency (how long the statements stay in the DSC) of about two days, and everything looks hunky-dory at first glance.

However, it could be that your best friend is not all he’s cracked up to be!

The first inkling that something somewhere is not right, is when you see your DSC flush rate soaring upwards (i.e. how many statements are being flushed per hour). Statements are flushed all the time through RUNSTATS, or security changes, etc. But when you see thousands of statements grabbing their coats and heading for the door you know something is wrong – and it can’t just be the music!

 

1,000’s of INSERTS statements are taking a “slot” from the DSC for no purpose!

What I have seen is that lots of people concentrate on Dynamic SQLs that are SELECT or UPDATE or predicated DELETEs and MERGEs, but no-one bothers about simple INSERT statements… and guess what I had seen? Yep, 1,000’s are INSERTS using literals, each of which takes a “slot” from the DSC for no purpose whatsoever!!! These are really, really nasty indeed…

 

How to list and fix the „bad guys“ (the INSERTS Statements)

To find them,

– I use a snap of the DSC.

– Then I select only the INSERT Statements,

– and then simply exclude any that have parameter markers.

What you are left with, is the list of “bad guys” that, if excessive in number, must be fixed as soon as possible!

 

Here’s how I do this using our SQL PerformanceExpert (SPX):

1 First I snap the DSC and show only the INSERTS:

News from the labs Newsletter 2014-08 - A Million ways to kill your DSC - Screenshot1

Here you can see in our DB2 10 NF test system that there are 2,673 statements.

 

2 Identifying the „good guys“

After filtering on “INSERT”, we get the next Panel:

News from the labs Newsletter 2014-08 - A Million ways to kill your DSC - Screenshot2

Here I can see only 200 INSERTS are in the cache. Note that some have multiple executions.

These are the “good guys” – Multiply executed but singly prepared.

 

3 Now I use the primary command PR to just dump these statements in a file for ISPF usage:

"News from the labs" 2014-08 - A Million ways to kill your DSC - Screenshot3 - SQL Statements in a file for ISPF usage

Here all of the SQL text is shown, (it carries on over to the right hand side, of course!)

Now I can see some INSERTS with Parameter Markers, but I want to see the ones without.

 

4  INSERTS without Parameter Markers

Simply doing these ISPF commands enables me to see that view too:

X ALL

F ALL ?

F ALL INSERT 1

 

Then I simply page down looking for INSERTS with no matching question marks.

Of course you can also do this with a small REXX, etc

News from the labs Newsletter 2014-08 - A Million ways to kill your DSC - Screenshot4

Now I can quickly see that there appears to be a long set of INSERTS – all with literals.

All of these should be changed to be Parameter Markers to help boost overall system Performance!

 

Naturally I use the above SQL data for much more than just Parameter Marker usage. You can imagine the fun to be had just conducting text searches to see which SQL uses which Built-In-Function, for example. CHAR9 usage anyone?

The possibilities are endless!

Now I get to spend lots of time with my best friend.

 

Happy Tuning!

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2014-07: APREUSE(WARN) – like getting married?


DB2 11 REBIND, something old, something new, something borrowed, something blue

Something olde,
Something new,
Something borrowed,
Something blue,
oh my goodness,
what’s going on,
within DB2?


Olde, New, Borrowed and Blue… in DB2 11 REBIND processing

Now once you stop laughing, the idea is that:

olde” is REBIND processing which changes old access paths,
new” is the new access path that comes from REBIND processing,
borrowed” is the use, and abuse, of OPT_HINT processing, and finally
blue” is, of course, IBM itself!!!

 

New in DB2 11… REBINDs with APREUSE(WARN)

Now in DB2 11 CM and above, you can issue REBINDs with the APREUSE(WARN) feature. This allows you to be nearly 100% sure that your access paths will *not* change, *but* it will regenerate the run time structures which allows fast xPROC processing. How does it do this magic? Quite simply: DB2 uses “internal” Opt Hints. They are not externalized in the PLAN_TABLE and they ignore the OPT_HINT ZPARM setting completely. In fact, they are generated from the “new”, in DB2 9 NF and above, directory element Explain Plan section in the Package, (this contains a sort of compressed version of the PLAN_TABLE access path data but nowhere near all of the access path data is saved away).

 

So what happens is, DB2 gets told to REBIND a package

– first it extracts the current access plan and stores it away,
– then it calls the optimizer for the new access path using the stored version as an OPT_HINT.
– If the hint is used, the access path is honoured and, if running with EXPLAIN(YES), you get PLAN_TABLE data with “APREUSE” in the column HINT_USED while OPTHINT stays empty.

 

Unknown access path “deviations“ if the hint failed

Now, if a hint failed, a new access path is created but you then get spaces in the HINT_USED column. I bet you see what I am aiming for??? Yep, all this does is create unknown access path “deviations” while also retaining possibly useless and obsolete access paths! The whole system makes nearly no sense to me whatsoever!

 

APREUSE(ERROR) to migrate to DB2 11

Now with APREUSE(ERROR) at least you either got new code for a complete package or you didn’t get *any* runnable code. It was still not taking advantage of any new optimizer functionality, but on the other hand was 100% stable.

The way I see it today, the only way to actually use this feature is purely in version migration and with some sort of explain tool to actually check out stuff before & after the event.

 

My way to migrate to DB2 11 is

1) Make sure RUNSTATS is 100% up to date and correct

2) Make sure plan management is active and possibly in EXTENDED mode

3) Make sure all packages are bound with EXPLAIN(YES)

4) Migrate to DB2 11 CM

5) Use a tool to analyse all current access paths and see which ones go *bad* with and without using APREUSE(WARN). Also, check out any CHANGED access paths (table order, different Index etc.)

6) Run with APREUSE(WARN) only those packages where no changes are found and also use APRETAINDUP(NO)

7) Run without APREUSE(WARN) on those packages with improvements to get the full functionality that you have paid good money for

8) Work first on any statements that go bad and then the changed ones

9) From this point on analyse every REBIND package and only do the REBIND if improvements in access path are visible

10) Get ready for DB2 vnext and repeat the above!

 

APREUSE(WARN)’s drawback

The real dangers with APREUSE(WARN) are that you will end up locking your code into access paths that are completely obsolete and could be dramatically improved with the creation of a new index etc. This new data is not and cannot be used due to the hint nature of this parameter. Worse, in my opinion, is the fact that there are a ton of reasons when the hint is not used and new access paths come charging over the hill like some irate mother-in-law!!!

 

Use APREUSE(WARN) *just* for release migration and *only* for the first rebind

Last word: As always with new parameters and options there are times to use them and times to *not* use them!  In this case *just* for release migration and *only* for the first rebind. Then — just like a wedding dress — never ever use again until the next one.

 

As usual, any comments or criticisms are greatly welcome!

How do you plan to use this parameter?

 

TTFN Roy Boxwell
Senior Software Architect

2014-06: SQL Metrics – Ripe for the Plucking

After a short absence, I am now back with my newsletters on a regular basis. During the summer, when you were (hopefully) relaxing and enjoying time with friends and family, my co-workers were busy here updating our websites. You may not see a big difference in „the look“, as most of the improvements were behind the scenes, but we have now migrated to a completely new platform and intend to be adding more features and enhancements to our sites as we move forward; all in the name of providing even better service for our customers!

As many of you know, August is the time of year us Europeans take our long summer vacations. Dreaming of Ice-cream in Italy, the balmy breezes of the Balearic Islands, or – in my case – the dense fog banks of Dover, (what white cliffs?).

So I’ll keep this newsletter short, while I unpack my swimsuit. For the past several months, I’ve been visiting various customer sites on DB2 roadshows. It’s always fascinating to re-acquaint myself with some real-life situations, but it never ceases to amaze me that so many sites still rely on the same traditional methods that have been used for the past 10 – 20 years.

Poorly designed SQL

Modern structures, languages, APIs etc. do nothing to stop bad, poorly-designed, SQL from landing in Production. It is, at the end of the day, the DBA who is left to pick up the pieces and try to figure out how to get a quart *out* of a pint pot, (or a large frame into a small swimsuit.) One of our newest product developments is called SQL WorkloadExpert (WLX), and it was designed to work for DB2 10 NFM and above only. Why, you ask? Because it needs the new and enhanced IFCIDs that came with that release – notably the enhanced 316, which gets written when a Dynamic Statement Cache is flushed, and the new IFCIDs 400 and 401, which give the same data but from the EDM Pool for static SQL. (Personally, I refer to this as the Static Statement Cache, or SSC.)

Build a Data Warehouse containing all the Dynamic SQL

This new and useful data is used to build a Data Warehouse containing all the SQL that is running in a plex. Also, using EXPLAIN data, DB2 Catalog data, and the RealTimeStatistics table, it builds up a complete picture of who is running what, when – and how often – on your machine. This is particularly interesting for some of those sites that I visited recently. It is truly amazing how much “low hanging fruit” is out there for really, really simple tuning that yields a huge bang for the buck! The aforementioned IFCIDs establish performance metrics that give you a handle on where CPU, IO, and Elapsed time is all going. These metrics cause the so-called “bad guys” to virtually leap off the page! The most quantitative metrics that are most useful for the “first” tuning aims, are the Executions per hour, CPU per hour and IO (Getpages) per hour.

 

2,500 seconds of CPU per hour reduced only by adding one little index…

For example, at one of those shops I saw the following: (This is a very large shop with 18 way data sharing and 10’s of CPUs per machine) One SQL was running 1.3 Million times per hour, using 2,500 seconds of CPU per hour and 240 Million getpages per hour. This is intense! The SQL was not particularly complex in this case either, and the per execution rates were not particularly significant. However, by simply adding one little index, the site reduced CPU consumption to 1,700 seconds per hour and 180 Million getpages per hour! Immediately, this provided a savings of 40% overall by providing such a huge performance boost. Another SQL’s CPU consumption was reduced from 17,111 seconds per hour down to just 16 seconds per hour! …Wow. Finally, a third SQL had an Elapsed time of 30,000 seconds per hour reduced down to 30 seconds per hour! The point I am trying to make is:   Don’t always trust blindly in your traditional Top Ten reports. By digging just a little deeper, you may be able to find some very easy (tuning) fruit, ripe for plucking. Now how do I tune my mind back in to „work“ mode…

Happy Tuning!

As usual any questions or comments are welcome,
TTFN Roy Boxwell
Senior Software Architect

2012-07: Spatial Indexes – Do you know where your customers live?

Well I must admit that I did not know what a long and painful birth this “little” newsletter would be!

First I had to get a whole bunch of installation jobs run, then a few APARs from good old IBM, and then I had to run the “enable” Spatial support job (DSN5ENB) and finally the Spatial bind job (DSN5BND). This took a while and gave me a few extra grey hairs…however if you follow the documented approach it all works – now of course it assumes that everyone in the world lives in the USA or the UK as continental Europe uses the comma as a decimal separator but the Spatial support has hard coded decimal point – This also cost me LOTS of grey hairs as I had to manipulate the data using the ISPF Editor to get the format “right”.

 

So there I am with 20 rows of IBM test data and a newsletter to write…hmmm…need more data methinks.. So I decide to use GOOGLE to find some data and Voila! I find 43,191 ZIPCODEs from the USA with Latitude and Longitude…of course it is comma separated…load into excel change the format and save away to then be uploaded to the host and then INSERTED into a new spatial table…All goes horribly wrong because you cannot use a column during the insert of a function to get a point… GRRR!!

But let us do what I did step by step:

First I created a little table to hold the data from the excel:

CREATE DATABASE ZIPCODE;
COMMIT;
CREATE TABLE BOXWELL.ZIPCODES
(
ZIP           INTEGER     NOT NULL,
CITY          VARCHAR(64) NOT NULL,
STATE         CHAR(2)     NOT NULL,
LATITUDE      DECIMAL(9 , 6) NOT NULL,
LONGITUDE     DECIMAL(9 , 6) NOT NULL,
TIME_ZONE     SMALLINT    NOT NULL,
DST_FLAG      SMALLINT    NOT NULL
)
IN DATABASE ZIPCODE;
COMMIT;

Then I loaded it up with data

OUTPUT START FOR UTILITY, UTILID = LOAD
PROCESSING SYSIN AS EBCDIC
LOAD DATA FORMAT DELIMITED COLDEL ';' CHARDEL '"' DECPT '.'
INTO TABLE BOXWELL.ZIPCODES
(ZIP INTEGER,
CITY CHAR,
STATE CHAR,
LATITUDE DECIMAL,
LONGITUDE DECIMAL,
TIME_ZONE SMALLINT,
DST_FLAG SMALLINT)
RECORD (1) WILL BE DISCARDED DUE TO 'ZIP'
CONVERSION ERROR FOR BOXWELL.ZIPCODES
ERROR CODE '02 - INPUT NUMERIC INVALID'
(RE)LOAD PHASE STATISTICS - NUMBER OF RECORDS=43191 FOR TABLE
BOXWELL.ZIPCODES
(RE)LOAD PHASE STATISTICS - TOTAL NUMBER OF RECORDS LOADED=43191 FOR
TABLESPACE ZIPCODE.ZIPCODES
(RE)LOAD PHASE STATISTICS - NUMBER OF INPUT RECORDS NOT LOADED=1
(RE)LOAD PHASE STATISTICS - NUMBER OF INPUT RECORDS PROCESSED=43192
(RE)LOAD PHASE COMPLETE, ELAPSED TIME=00:00:01
DISCARD PHASE STATISTICS - 1 INPUT DATA SET RECORDS DISCARDED
DISCARD PHASE COMPLETE, ELAPSED TIME=00:00:00
UTILITY EXECUTION COMPLETE, HIGHEST RETURN CODE=4

The first record in the excel table was the headings record so it was OK to lose it by the way.

 

Then I created a new table with the LOCATION column and tried to INSERT like this:

CREATE DATABASE ZIPCODES;
---------+---------+---------+---------+---------+---------+---------+
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0
---------+---------+---------+---------+---------+---------+---------+
COMMIT;
---------+---------+---------+---------+---------+---------+---------+
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0
---------+---------+---------+---------+---------+---------+---------+
CREATE TABLE USA.ZIPCODES
(
LOCATION      DB2GSE.ST_POINT ,
LONGITUDE     DECIMAL(9 , 6) NOT NULL,
LATITUDE      DECIMAL(9 , 6) NOT NULL,
STATE         CHAR(2)     NOT NULL,
TIME_ZONE     SMALLINT    NOT NULL,
DST_FLAG      SMALLINT    NOT NULL,
ZIP           INTEGER     NOT NULL,
CITY          VARCHAR(64) NOT NULL
)
IN DATABASE ZIPCODES;
---------+---------+---------+---------+---------+---------+---------+
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0
---------+---------+---------+---------+---------+---------+---------+
COMMIT;
---------+---------+---------+---------+---------+---------+---------+
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0
---------+---------+---------+---------+---------+---------+---------+
INSERT INTO USA.ZIPCODES
SELECT DB2GSE.ST_POINT('POINT (LONGITUDE LATITUDE)' , 1003)
,LONGITUDE
,LATITUDE
,STATE
,TIME_ZONE
,DST_FLAG
,ZIP
,CITY
FROM BOXWELL.ZIPCODES
;
---------+---------+---------+---------+---------+---------+---------+
DSNT408I SQLCODE = -443, ERROR:  ROUTINE GSEGEOMFROMTEXT (SPECIFIC NAME
STCO00002GFT) HAS RETURNED AN ERROR SQLSTATE WITH DIAGNOSTIC TEXT
GSE3049N  Invalid WKT format, expecting a number instead of
LONGITUDE  LATITUDE).
DSNT418I SQLSTATE   = 38SV8 SQLSTATE RETURN CODE
DSNT415I SQLERRP    = DSNXRUFS SQL PROCEDURE DETECTING ERROR
DSNT416I SQLERRD    = -101 0  0  -1  0  0 SQL DIAGNOSTIC INFORMATION
DSNT416I SQLERRD    = X'FFFFFF9B'  X'00000000'  X'00000000' X'FFFFFFFF'
X'00000000'  X'00000000' SQL DIAGNOSTIC INFORMATION
---------+---------+---------+---------+---------+---------+---------+
DSNE618I ROLLBACK PERFORMED, SQLCODE IS 0
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0

Oh dear… So now I wrote a select to generate a set of INSERTS like this:

SELECT 'INSERT INTO USA.ZIPCODES VALUES (' ||
'DB2GSE.ST_POINT(''POINT (' || LONGITUDE ||
' ' || LATITUDE || ')'' , 1003)' ||
', '  || LONGITUDE ||
' , ' || LATITUDE ||
' ,''' || STATE ||
''', '  || TIME_ZONE ||
' , '  || DST_FLAG  ||
' , ' ||  ZIP ||
' ,''' || CITY || ''') §'
FROM BOXWELL.ZIPCODES  ;

To create the geometry for the LOCATION column you can use the ST_POINT function and that takes as input a LONGITUDE and a LATITUDE and, all important, the SRS_ID which tells the system which Spatial Reference System you are using – I use 1003 in my examples because my data is “old” and used the WGS84 standard. The supplied Geometries have these names and, more importantly, the SRS_IDs that must be used!

SELECT * FROM DB2GSE.ST_SPATIAL_REFERENCE_SYSTEMS;
---------+---------+---------+-------------+-
SRS_NAME                               SRS_ID
---------+---------+---------+-------------+-
DEFAULT_SRS                                 0
NAD83_SRS_1                                 1
NAD27_SRS_1002                           1002
WGS84_SRS_1003                           1003
DE_HDN_SRS_1004                          1004

 

This SQL created output like this:

INSERT INTO USA.ZIPCODES VALUES (DB2GSE.ST_POINT('POINT (-71,013202 43,005895)' , 1003), -71,013202 , 43,005895 ,'NH', -5 , 1 , 210 ,'Portsmouth') §
INSERT INTO USA.ZIPCODES VALUES (DB2GSE.ST_POINT('POINT (-71,013202 43,005895)' , 1003), -71,013202 , 43,005895 ,'NH', -5 , 1 , 211 ,'Portsmouth') §
INSERT INTO USA.ZIPCODES VALUES (DB2GSE.ST_POINT('POINT (-71,013202 43,005895)' , 1003), -71,013202 , 43,005895 ,'NH', -5 , 1 , 212 ,'Portsmouth') §
INSERT INTO USA.ZIPCODES VALUES (DB2GSE.ST_POINT('POINT (-71,013202 43,005895)' , 1003), -71,013202 , 43,005895 ,'NH', -5 , 1 , 213 ,'Portsmouth') §
INSERT INTO USA.ZIPCODES VALUES (DB2GSE.ST_POINT('POINT (-71,013202 43,005895)' , 1003), -71,013202 , 43,005895 ,'NH', -5 , 1 , 214 ,'Portsmouth') §
INSERT INTO USA.ZIPCODES VALUES (DB2GSE.ST_POINT('POINT (-71,013202 43,005895)' , 1003), -71,013202 , 43,005895 ,'NH', -5 , 1 , 215 ,'Portsmouth') §
INSERT INTO USA.ZIPCODES VALUES (DB2GSE.ST_POINT('POINT (-72,637078 40,922326)' , 1003), -72,637078 , 40,922326 ,'NY', -5 , 1 , 501 ,'Holtsville') §
INSERT INTO USA.ZIPCODES VALUES (DB2GSE.ST_POINT('POINT (-72,637078 40,922326)' , 1003), -72,637078 , 40,922326 ,'NY', -5 , 1 , 544 ,'Holtsville') §
INSERT INTO USA.ZIPCODES VALUES (DB2GSE.ST_POINT('POINT (-66,749470 18,180103)' , 1003), -66,749470 , 18,180103 ,'PR', -4 , 0 , 601 ,'Adjuntas') §
.
.
.

Now to get this to fly I had to change my SPUFI output row maximum, default column width, and output record sizes but finally I had a file with 43,191 INSERTS. As you can see the DECIMAL numbers have a comma in them so I had to do ISPF “change all” commands to change them to full stops and then I had to write a small dynamic SQL program to actually execute these SQLs as they were up to 177 bytes wide…I already had a high-speed version of DSNTIAD that I had written myself a while ago and a simple couple of changes and shazam! I had 43,191 rows of data – the INSERTS took 15 minutes of CPU by the way….

 

So now I am the proud owner of 43,191 zip codes *and* their spatial “locations” – what could I do? Well the first thing that sprung to mind was – How many cities with how many zip codes are there within a radius of 15 miles of Phoenix, AZ ?

SELECT A.CITY, COUNT(*)
FROM USA.ZIPCODES A
WHERE DB2GSE.ST_WITHIN(A.LOCATION,DB2GSE.ST_BUFFER (
(SELECT B.LOCATION
FROM USA.ZIPCODES B
WHERE B.ZIP = 85009)     -- phoenix
, 15 , 'STATUTE MILE')) = 1
GROUP BY A.CITY
;

 

There are about 73 different spatial functions by the way. In the above SQL I use just two

– ST_BUFFER has three parameters. The first is a “geometry” or LOCATION type field, then a distance, and finally the units. What it does is create a geometry space that is centered on the input geometry and is then the number of units around it (A radius in this case as we have a point as the input geometry). Thus we have a “space” 15 statute miles in radius centered on Phoenix AZ (Well actually on the location of zip code 85009 but that is near enough for me!).

– ST_WITHIN has two parameters which are both “geometries” and if one is within the other it returns a 1 else a 0 thus enabling the simple SQL I wrote.

This query returns:

---------+---------+---------+---------+---------+--------
CITY
---------+---------+---------+---------+---------+--------
Avondale                                             1
Carefree                                             1
Cashion                                              1
Cave Creek                                           1
Chandler                                             2
Fountain Hills                                       1
Gilbert                                              1
Glendale                                            12
Laveen                                               1
Mesa                                                 5
Palo Verde                                           1
Paradise Valley                                      1
Peoria                                               5
Phoenix                                             71
Scottsdale                                           8
Sun City                                             2
Sun City West                                        1
Surprise                                             2
Tempe                                                8
Tolleson                                             1
Tortilla Flat                                        1
Wickenburg                                           1
Youngtown                                            1
DSNE610I NUMBER OF ROWS DISPLAYED IS 23
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

 

Or the same for, say, Moline, IL

SELECT A.CITY, COUNT(*)
FROM USA.ZIPCODES A
WHERE DB2GSE.ST_WITHIN(A.LOCATION,DB2GSE.ST_BUFFER (
(SELECT B.LOCATION
FROM USA.ZIPCODES B
WHERE B.ZIP = 61265)     -- moline
, 15 , 'STATUTE MILE')) = 1
GROUP BY A.CITY
;
---------+---------+---------+---------+---------+---------+
CITY
---------+---------+---------+---------+---------+---------+
Andalusia                                               1
Barstow                                                 1
Bettendorf                                              1
Blue Grass                                              2
Buffalo                                                 1
Carbon Cliff                                            1
Coal Valley                                             1
Colona                                                  1
Davenport                                               9
East Moline                                             1
Eldridge                                                1
Hampton                                                 1
Le Claire                                               1
Milan                                                   1
Moline                                                  2
Orion                                                   1
Osco                                                    1
Pleasant Valley                                         1
Port Byron                                              1
Preemption                                              1
Princeton                                               1
Rapids City                                             1
Rock Island                                             5
Sherrard                                                1
Silvis                                                  1
Taylor Ridge                                            1
DSNE610I NUMBER OF ROWS DISPLAYED IS 26

 

Finally I wondered about the distance between two locations or geometries so I thought how many English Chains (I kid you not!) and Miles between these two cities??

SELECT INTEGER(DB2GSE.ST_DISTANCE(
(SELECT B.LOCATION
FROM USA.ZIPCODES B
WHERE B.ZIP = 61265)     -- moline
, (SELECT B.LOCATION
FROM USA.ZIPCODES B
WHERE B.ZIP = 85009)     -- phoenix
, 'STATUTE MILE')) AS MILES
,INTEGER(DB2GSE.ST_DISTANCE(
(SELECT B.LOCATION
FROM USA.ZIPCODES B
WHERE B.ZIP = 61265)     -- moline
, (SELECT B.LOCATION
FROM USA.ZIPCODES B
WHERE B.ZIP = 85009)     -- phoenix
, 'BRITISH CHAIN (SEARS 1922)')) AS CHAINS
FROM SYSIBM.SYSDUMMY1
;
---------+---------+---------+---------+-----
MILES       CHAINS
---------+---------+---------+---------+-----
1306       104520
DSNE610I NUMBER OF ROWS DISPLAYED IS 1

Isn’t that great! You can even specify which type of chain!!! The Function here ST_DISTANCE just accepts two geometries and the Unit you wish to have output. Just for your reference the number of units are in the GSE view DB2GSE.ST_UNITS_OF_MEASURE the first few rows look like

---------+---------+---------
UNIT_NAME
---------+---------+---------
METRE
FOOT
US SURVEY FOOT
CLARKE'S FOOT
FATHOM
NAUTICAL MILE
GERMAN LEGAL METRE
US SURVEY CHAIN
US SURVEY LINK
US SURVEY MILE
KILOMETRE
CLARKE'S YARD
CLARKE'S CHAIN
CLARKE'S LINK
BRITISH YARD (SEARS 1922)
BRITISH FOOT (SEARS 1922)
BRITISH CHAIN (SEARS 1922)
BRITISH LINK (SEARS 1922)
BRITISH YARD (BENOIT 1895 A)
BRITISH FOOT (BENOIT 1895 A)
BRITISH CHAIN (BENOIT 1895 A)
BRITISH LINK (BENOIT 1895 A)
BRITISH YARD (BENOIT 1895 B)
BRITISH FOOT (BENOIT 1895 B)
BRITISH CHAIN (BENOIT 1895 B)
BRITISH LINK (BENOIT 1895 B)
BRITISH FOOT (1865)
INDIAN FOOT

There is also a handy DESCRIPTION column that explains how and why.

 

So now I did an EXPLAIN of the three queries (1 is Phoenix, 2 is Moline, and 3 is Distance) and got this

---------+---------+---------+---------+---------+---------+----------+
LINE   QNO  PNO  SQ  M  TABLE_NAME A CS  INDEX IO  UJOG  UJOGC P CE
---------+---------+---------+---------+---------+---------+----------+
00001  01   01   00  0  ZIPCODES   R 00        N   ----  ----   S
00001  01   02   00  3               00        N   ----  ---Y      S
00001  02   01   00  0  ZIPCODES   R 00        N   ----  ----   S
00002  01   01   00  0  ZIPCODES   R 00        N   ----  ----   S
00002  01   02   00  3               00        N   ----  ---Y      S
00002  02   01   00  0  ZIPCODES   R 00        N   ----  ----   S
00003  01   01   00  0  SYSDUMMY1  R 00        N   ----  ----   S
00003  02   01   00  0  ZIPCODES   R 00        N   ----  ----   S
00003  03   01   00  0  ZIPCODES   R 00        N   ----  ----   S
00003  04   01   00  0  ZIPCODES   R 00        N   ----  ----   S
00003  05   01   00  0  ZIPCODES   R 00        N   ----  ----   S
-----+---------+---------+---------+---------+---------+-----------+
QUERYNO PROGNAME STMT_TYPE CC PROCMS PROCSU REASON
-----+---------+---------+---------+---------+---------+-----------+
      1 DSNESM68 SELECT    B   327   929    TABLE CARDINALITY UDF
      2 DSNESM68 SELECT    B   327   929    TABLE CARDINALITY UDF
      3 DSNESM68 SELECT    B   676  1918    TABLE CARDINALITY UDF

 

No big surprises as I have not run a RUNSTATS and have not created any indexes yet.

 

So now a full runstats and a re-explain

---------+---------+---------+---------+---------+---------+---------+-
LINE   QNO  PNO  SQ  M  TABLE_NAME A  CS  INDEX IO  UJOG  UJOGC  P  CE
---------+---------+---------+---------+---------+---------+---------+-
00001  01   01   00  0  ZIPCODES   R   00        N   ----  ----   S
00001  01   02   00  3                 00        N   ----  ---Y      S
00001  02   01   00  0  ZIPCODES   R   00        N   ----  ----   S
00002  01   01   00  0  ZIPCODES   R   00        N   ----  ----   S
00002  01   02   00  3                 00        N   ----  ---Y      S
00002  02   01   00  0  ZIPCODES   R   00        N   ----  ----   S
00003  01   01   00  0  SYSDUMMY1  R   00        N   ----  ----   S
00003  02   01   00  0  ZIPCODES   R   00        N   ----  ----   S
00003  03   01   00  0  ZIPCODES   R   00        N   ----  ----   S
00003  04   01   00  0  ZIPCODES   R   00        N   ----  ----   S
00003  05   01   00  0  ZIPCODES   R   00        N   ----  ----   S
-----+---------+---------+---------+---------+---------+---------+----
QUERYNO  PROGNAME  STMT_TYPE CC PROCMS PROCSU  REASON
-----+---------+---------+---------+---------+---------+---------+----
      1  DSNESM68  SELECT    B    2427   6893  UDF
      2  DSNESM68  SELECT    B    2427   6893  UDF
      3  DSNESM68  SELECT    B     676   1918  TABLE CARDINALITY UDF

The MS and SU numbers went up as now the optimizer realized the size of the table. So now I created a unique index on column ZIP

CREATE UNIQUE INDEX USA.ZIPCODES_IX ON USA.ZIPCODES (ZIP  ASC) USING STOGROUP SYSDEFLT PRIQTY 720 SECQTY 720 CLUSTER FREEPAGE  0 PCTFREE  10 CLOSE YES;

 

Run RUNSTATs and re-explained

 ---------+---------+---------+---------+---------+---------+---------+----
LINE   QNO  PNO  SQ  M  TABLE_NAME A CS INDEX        IO  UJOG  UJOGC  P  CE
---------+---------+---------+---------+---------+---------+---------+-----
00001  01   01   00  0  ZIPCODES   R 00               N  ----  ----   S
00001  01   02   00  3               00               N  ----  ---Y      S
00001  02   01   00  0  ZIPCODES   I 01 ZIPCODES_IX   N  ----  ----
00002  01   01   00  0  ZIPCODES   R 00               N  ----  ----   S
00002  01   02   00  3               00               N  ----  ---Y      S
00002  02   01   00  0  ZIPCODES   I 01  ZIPCODES_IX  N  ----  ----
00003  01   01   00  0  SYSDUMMY1  R 00               N  ----  ----   S
00003  02   01   00  0  ZIPCODES   I 01  ZIPCODES_IX  N  ----  ----
00003  03   01   00  0  ZIPCODES   I 01  ZIPCODES_IX  N  ----  ----
00003  04   01   00  0  ZIPCODES   I 01  ZIPCODES_IX  N  ----  ----
00003  05   01   00  0  ZIPCODES   I 01  ZIPCODES_IX  N  ----  ----
-----+---------+---------+---------+---------+---------+-------+--------
QUERYNO PROGNAME STMT_TYPE CC  PROCMS   PROCSU  REASON
-----+---------+---------+---------+---------+---------+-------+--------
     1  DSNESM68 SELECT    B     2427     6893  UDF
     2  DSNESM68 SELECT    B     2427     6893  UDF
     3  DSNESM68 SELECT    B      676     1918  TABLE CARDINALITY UDF

Aha! My index is being used but the cost estimates did not change a bit! So now is the time for a Spatial Index to rear its ugly head and get involved. Now a spatial index is a very, very special type of beast. It is *not* a standard B-Tree style index and it exists “outside” the scope of normal DB2 Indexes. Now to create a SPATIAL INDEX you must use a stored procedure with a bunch of parameters:

 

sysproc.ST_create_index ( table_schema/NULL, table_name , column_name , index_schema/NULL, index_name , other_index_options/NULL, grid_size1 , grid_size2 , grid_size3 , msg_code , msg_text )

 

I am actually using the command processor DSN5SCLP so my syntax looks like:

DSN5SCLP /create_idx ZA00QA1A +
-tableschema USA -tablename ZIPCODES -columnname LOCATION +
-indexschema USA -indexname LOC_IX +
-otherIdxOpts "FREEPAGE 0" +
-gridSize1 1.0 -gridSize2 2.0 -gridSize3 3.0

When it executes it just tells you this:

********************************* TOP OF DATA *****
GSE0000I  The operation was completed successfully.
******************************** BOTTOM OF DATA ***

 

Now do a re-explain

---------+---------+---------+---------+---------+---------+---------+--
LINE QNO PNO SQ  M TABLE_NAME A CS INDEX         IO  UJOG  UJOGC  P  CE
---------+---------+---------+---------+---------+---------+---------+--
00001 01 01  00  0 ZIPCODES   I 01 LOC_IX        N   ----  ----
00001 01 02  00  3              00               N   ----  ---Y      S
00001 02 01  00  0 ZIPCODES   I 01 ZIPCODES_IX   N   ----  ----
00002 01 01  00  0 ZIPCODES   I 01 LOC_IX        N   ----  ----
00002 01 02  00  3              00               N   ----  ---Y      S
00002 02 01  00  0 ZIPCODES   I 01 ZIPCODES_IX   N   ----  ----
00003 01 01  00  0 SYSDUMMY1  R 00               N   ----  ----   S
00003 02 01  00  0 ZIPCODES   I 01 ZIPCODES_IX   N   ----  ----
00003 03 01  00  0 ZIPCODES   I 01 ZIPCODES_IX   N   ----  ----
00003 04 01  00  0 ZIPCODES   I 01 ZIPCODES_IX   N   ----  ----
00003 05 01  00  0 ZIPCODES   I 01 ZIPCODES_IX   N   ----  ----
-----+---------+---------+---------+---------+---------+---------+-----
QUERYNO  PROGNAME STMT_TYPE CC  PROCMS   PROCSU  REASON
-----+---------+---------+---------+---------+---------+---------+----
      1  DSNESM68 SELECT    B        2        5  UDF
      2  DSNESM68 SELECT    B        2        5  UDF
      3  DSNESM68 SELECT    B      676     1918  TABLE CARDINALITY UDF

BINGO! Looks like a winner to me!

Now a RUNSTATS and a re-explain

DSNU000I    200 09:00:34.20 DSNUGUTC - OUTPUT START FOR UTILITY, UTILID = RUNSTATS
DSNU1044I   200 09:00:34.35 DSNUGTIS - PROCESSING SYSIN AS EBCDIC
DSNU050I    200 09:00:34.37 DSNUGUTC -  RUNSTATS TABLESPACE ZIPCODES.ZIPCODES TABLE(ALL) INDEX(ALL KEYCARD)
SHRLEVEL REFERENCE UPDATE ALL SORTDEVT SYSALLDA SORTNUM 4
DSNU186I  -QA1A 200 09:00:34.39 DSNUGSRI - A SPATIAL INDEX CANNOT BE PROCESSED BY THIS UTILITY
DSNU012I    200 09:00:34.41 DSNUGBAC - UTILITY EXECUTION TERMINATED, HIGHEST RETURN CODE=8

Argghhh! That was *not* in the documentation!!! So not just LOAD dies but also RUNSTATS!! The message itself is *very* helpful:

 

DSNU186I csect-name– A SPATIAL INDEX CANNOT BE PROCESSED BY THIS UTILITY

Explanation: The target objects specified for the utility include one or more spatial indexes, which are not supported by this utility. If the utility specifies a table space, then you can drop the spatial indexes, run the utility, and then recreate the indexes.
System action: The utility does not execute.
Severity: 8

 

I will try a REORG just for fun…nope it dies as well. Looks like you must remember to DROP all spatial indexes before any ”normal” DB2 Database Maintenance and then CREATE them back afterwards. Or if you have our RealTimeDatabaseExpert you can simply Exclude them from processing.

 

Finally this month a few CPU statistics from before and after index creation:

Query   Without Spatial Index   With Spatial Index
 Elapsed       CPU Elapsed CPU
Phoenix Query
Moline Query
Distance
202 seconds
186 seconds
2
186 seconds
172
1
25 seconds
13
2
23 seconds
9
1

Note that these figures are for 100 times execution in batch. As can be seen use of Spatial Indexes is very very good!

 

Well I learnt a lot this month and hope some of you did as well – As usual any comments or questions please feel free to mail me!

 

2011-01: APARS for RTS, RUNSTATS, and SQL PERFORMANCE for DB2 z/OS – Which are needed and why?

The Programmers‘ 1st Law of Bugdom; all non-trivial programs have bugs –  ergo   DB2 has bugs!

Of course, software being software means “keeping an eye” on the “APARs of interest”.  This is hard work because you must search the IBM technical problem database – lots of people do not like doing this, have no time to do this, or do not know how to do this!  At the end of this newsletter, I’ve attached lists of important APARs to save you the work.

First up: Real-Time Statistics (RTS):

Real-time statistics came in as an optional extra in DB2 V7.  They are now firmly embedded in the architecture of DB2 to the point of being used for space allocation, and even the Optimizer is starting to use this data. In this case, it would be very sensible to make sure that the data is as accurate as possible.  To do this, any bugs must be found and removed from the code.
Of course, you are already using RTS to drive your major DB2 utilities, aren’t you?  I hope so!  This makes 90% of all RUNSTATS irrelevant and stops 95% of REBINDs, which stops 99% of bad access paths in production from ever occurring.

Next up: RUNSTATS:

As you all know, the DB2 Optimizer makes its cost-based decisions on several columns in the DB2 catalog, and RUNSTATS updates some of these as well – just as RTS has bugs, so does RUNSTATS.  When you have bad catalog data, access paths and performance can suffer. SEGUS has a licensed FREEWARE product called “Statistics HealthCheck”, which checks the DB2 catalog for internal consistency and plausibility.  There is also a fully licensed version that will actually generate “corrective” RUNSTATS.  But, of course, if the bad catalog data comes from a bug in RUNSTATS, a new run of RUNSTATS will not help you and you will still suffer from bad performance.

Last up: SQLPERFORMANCE:

This is a keyword available in the IBM technical problem database search that I use because the Optimizer also has bugs.  (Of course, it also has more features than you can shake a stick at).  In other words, “it has lots of switches and buttons and dials” that can be tweaked, pushed, prodded, and flipped.  All of these have a direct effect on performance, so when Terry Purcell et al decides to change the composite filter factor for negative square-roots, it is nice if you can see this and apply the new feature where and when needed.

As an “aide de camp”, I maintain a most current APAR lists.  I track these and update my lists every month (I only keep about two years’ worth, because that old enough). Please contact us if you wish to receive this list. As soon as it becomes available, you will get an updated list containing the latest APARs for RTS, RUNSTATS and SQLPERFORMANCE.

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2011-04: RTS – data for DBA daily usage

IBM brought in the RTS in DB2 V7 and it was a paradigm-shattering event! Well, at least for me…
What it did, was finally give us DBAs the ability to “see” what is happening in our DB2 sub-systems, without having to run an expensive RUNSTATS utility.

Up until this time, shops had to run 1000’s of RUNSTATS to see if a REORG was needed, and, of course, these RUNSTATS caused massive side effects:

  1. The DSC was invalidated for all RUNSTATSed objects
  2. Automatic REBIND detection software then scheduled a REBIND – which sometimes caused severe performance problems when the access path “went south” or “turned pear-shaped”
  3. Lots of CPU was wasted on non-changed data by RUNSTATS

 

The moment that RTS came into town it changed the whole game. Now you only need a RUNSTATS when you actually need a RUNSTATS and you only need a REORG when you actually need a REORG. (See my list of reasons why you do not actually need a REORG later in this newsletter). Just to complete the picture: You only need a COPY when you actually need a COPY! Lots of shops exploiting the RTS reported huge costs savings from using the real-time data for threshold based maintenance – if you haven’t yet considered the RTS then please do so, as it is a gold mine!

 

RTS tables from DB2 V8 up until DB2 10.

To start with, let us have a look at the RTS tables from DV8-10
As can readily be seen, IBM keeps adding columns to these tables!
Note that all the counter columns are NULLable. The way that the RTS data has been externalized, and which column(s) are updated  -and when – has changed a few times over the years. Now it seems to be very stable and the basic rule is:
When you first create an object (as long as it is not a DEFINE NO one of course!) it gets an entry in the RTS. The counters contain 0, but the relevant utility timestamp is NULL – which stops externalization of the counters (apart from REORGLASTTIME which is set to creation date).
When a RUNSTATS – or a COPY – has been done, then these counters start counting as well.

Now, just looking at the columns gives a great idea of the usefulness of this data: Upfront is a caveat, “The use of REORGLEAFNEAR is not recommended to decide upon a REORG or not”. The REORGLEAFNEAR is basically worthless as a data point. Recently, at a local DB2 user group meeting it was explained that the numbers in this column are really a waste of time and that, “IBM informed us that to use REORGLEAFNEAR in the REORG decision process was incorrect and could cause repetitive, needless, REORGs.” – You have been warned!! But apart from that, the data can help DBAs every day in their work.

 

REORG decisions:

Do not do a REORG
Simple huh? I basically see people all the time doing REORGs of TSs just because “The software told me to do it”. This is no excuse! The only real reason to REORG tablespaces is either to

a) reclaim space,

b) change attributes or

c) re-sort the data.

Now the last one really only makes sense if you do indeed have sequential access of the TS, otherwise, if the access is through an index and is effectively random, why bother with an expensive REORG???

 

RUNSTAT decisions:

Do not do a RUNSTATS
Equally simple! I see 1000’s of RUNSTATS conducted for absolutely no good reason. They kill your Dynamic Statement Cache (DSC), they can kill your access paths for REBINDs and BINDs, and what if you ran it when the table was empty but then along comes a LOAD of 2,000,000 rows??? Yup, a RUNSTATS run must be aware of the usage and environment and should be well timed. Basically, the rule is to run a RUNSTATS when the DB2 Optimizer benefits from the fresh statistics and generates a new, better access path. This may be a challenge if you don’t have the right database maintenance solution, but it’s another gold mine if you do!

 

COPY decision:

Do COPYs on-the-fly. Use SHRLEVEL CHANGE and run your ICs during the day. At 08:30 a.m. you have copied the world and your SLA’s for RECOVERY are great. What happens when you have a problem at 05:30 p.m.? I guess your SLA will *not* be met, right? What you must do, is learn to not be afraid of image copies during the day. They can save your life, but don’t do them on a fixed schedule. Do them when they’re needed!

At the INDEX level the RTS data can be used for the same set of Utilities as above.

 

IX REORG decisions:

The REORG of an index can bring real plus points! If REORGPSEUDODELETES is greater than 10% of TOTALENTRIES or REORGNUMLEVELS is greater than zero, then it is definitely worth doing an index REORG to clear out the junk and get the index back into tip-top condition. This should not change your access paths, if they were bound when the index was “fit”, but it will improve your elapsed time as the number of IOs will go down dramatically.

 

IX RUNSTAT decisions:

Do not do a RUNSTATS
Same as TS RUNSTATS, but with the added hint of never actually doing just an Index RUNSTATS on its own. If you do this, then the data stored in the DB2 catalog will start to “drift” apart and, eventually, even the DB2 Optimizer will “see” this and start to make bad decisions based on inaccurate data. If you can, always try to do a consistent full RUNSTATS at the TP or TS level. BTW, if you’d like to check your DB2 catalog for inconsistency, please try our Statistics HealthCheck Freeware. As the name says, it’s FREE! !
Also note that a RUNSTATS at the index level will only invalidate access paths in the DSC that actually currently use this index, so if the optimizer has picked a tablespace scan or another index, then just running a RUNSTATS index will not enable a new access path for existing statements – another reason to run at the TS or TP level!

 

IX COPY decision:

Do not do index COPYs on their own! Personally I am no fan of COPY YES indexes. The REBUILD INDEX utility is so fast these days, (I have examples of a 17,000,000 row table with 17 indexes and the rebuild took three minutes elapsed…). and in a data sharing system you can even bring the system crashing down around you if you do too many ALTERs at the same time, so be very careful! But, if you feel you must have COPY YES indexes to reduce CPU consumption during RECOVERY, then do the TS and the IX at the same time, in the same copy statement of course!

In the IBM manual “Performance Monitoring and Tuning Guide”, look at the chapter “Real-time Statistics” to see which thresholds IBM uses to determine when a utility should be run – and, of course, make sure you tailor them to suit your site, if you don’t have a set it and forget it solution like our RealTime DBAExpert! Solutions like that also manage the RTS. Orphans build up in the RTS and can cause problems, so every now and again make sure that only databases and tablespaces that actually exist are also in the RTS. Watch out for DSNDB01, Work Databases and the RTS itself, as the first two are *in* the RTS but the RTS is *not* in the RTS.

 

Finally, a quick run through some of my favourite columns in the RTS:

DBNAME & NAME – Remember that the DSNDB01 is in the RTS, but watch out for SYSLGRNX NACTIVE and do not forget that the RTS is not in the RTS, of course!
EXTENTS – This is the number of extents in the *last* linear dataset, not the total over all datasets.
INSTANCE – If using the CLONE system, then each CLONE has two entries in the RTS, so you must always also join to the INSTANCE column in the SYSIBM.SYSTABLESPACE to make sure you are looking at real data and not behind-the-scenes data – also for indexes, of course!
LASTUSED – A great place to begin the “unused index analysis saga”.

And, of course, the new DB2 10 fields, first the TABLESPACE ones:
REORGCLUSTERSENS – The number of times data has been read by SQL statements that are sensitive to the clustering sequence – in other words CLUSTER is important!
REORGSCANACCESS – The number of times data is accessed for SELECT, FETCH, searched UPDATE, or searched DELETE. Great for REORG decisions!
REORGHASHACCESS – The number of times data is accessed using hash access for SELECT, FETCH, searched UPDATE, searched DELETE, or used to enforce referential integrity constraints. Great for checking whether your usage of HASH Access is bringing dividends or not.
HASHLASTUSED – When it was last actually used. The docu states TIMESTAMP but I always see a DATE column here, just like normal index LASTUSED.
DRIVETYPE – Does this object live on a normal disk (HDD) or on a solid state drive (SSD)? Only for the first created object dataset.
LPFACILITY – Does the disk unit that this object lives on support the new High Performance List Prefetch facilty Y or N?
STATS01 – Some internally used data for IBM

 

And now the INDEX ones:

REORGINDEXACCESS – The number of times the index was used for SELECT, FETCH, searched UPDATE, searched DELETE, or used to enforce referential integrity constraints. Great for REORG decisions as, if this number is high, then a REORG may not bring much in performance terms. Extra bonus data is that for hash overflow indexes, this is the number of times DB2 has used the hash overflow index – which is a *BAD* thing and should be controlled!
DRIVETYPE – Does this object live on a normal disk (HDD) or on a solid state drive (SSD)? Only for the first created object dataset.
STATS101 – Some internally used data for IBM. I just hope it is nothing to do with the Orwellian room 101!
Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2011-02: XML Table Spaces – How they work and what you must take care of, or „LOBs the Next Generation“

 

XML has become more and more interesting for DB2 over the years.  Starting with DB2 9 it has also become much easier to use!

One of the weird things about XML is that the table spaces come in three (yes three!) different flavors depending on the base table space/table definition. Therefore, they are a bit „trickier“ than a „normal“ table space.

Our newsletter this month covers the relatively straight-forward rules for creating these.

 

Segmented table spaces:

If you create a segmented table space with an XML column type, DB2 will automatically:

  1. Add a column to the base table “DB2_GENERATED_DOC_ID_FOR_XML” BIGINT.
  2. Create an XML Partition by Growth (PBG) universal table space, (the name will be Xyyynnnn where yyy is the first three UTF-8 bytes of the base table space, and nnnn four numeric’s starting with 0000).
  3. Create an XML table in the above space. The name will be Xyyyyyyyyyyyyyyyyyynnn, where the y-y are the first 18 UTF-8 characters of the base table name, if it already exists.  nnn is used as three numeric’s starting at 000 until a unique name is found.
  4. Create a document ID index on the implicit column DB2_GENERATED_DOC_ID_FOR_XML in the base table. The name will be I_DOCIDyyyyyyyyyyyyyyyyyynnn, where the y-y are the first 18 UTF-8 characters of the base table name, if it already exists. nnn is used as three numeric’s starting at 000 until a unique name is found.
  5. Create a node ID index on each XML column in the base table. The name will be I_NODEIDyyyyyyyyyyyyyyyyyynnn, where the y-y are the first 18 UTF-8 characters of the XML table name, if it already exists. nnn is used as three numeric’s starting at 000 until a unique name is found.

 

PBG spaces:

If you create a Partitioned by Growth (PBG) table space with an XML column type, DB2 does the same as above, but with a default DSSIZE of 4GB – also for *all* of the indexes.

 

PBR spaces:

If you create a Partitioned by Range (PBR) table space with an XML column, DB2 will create as many PBR XML table spaces for you with their associated indexes but with a possibly larger DSSIZE.
This can all lead to a spectacular growth in storage space. Imagine a 4096 partition space, which is then altered to have two XML columns…the mind boggles!

 

What does this boil down to?

  • When using XML spaces make sure your back-up and recovery plans work.
  • Coordinate space management proactively.
  • Use COMPRESS wisely. XML table spaces inherit the COMPRESS attribute or can be ALTERed and then REORGed to enable compression. They compress very well.
  • Monitor your 16k Bufferpool usage because this is always used by XML table spaces.
  • Watch out if you copy data from subsystem to subsystem! XML data tends not to copy across too well, (e.g., DSN1COPY does not support it) because the XML descriptive data in the XSR system tables tends to be “forgotten”…

 

New from IBM:

Take a look at IBM’s RedBook „Extremely pureXML in DB2 10 for z/OS, SG24-7915-00“, which is full of great info about the use and maintenance of XML data.

 

Life of an XML column example:

  • Create segmented table space XMLCUSTO.
  • Create table XMLCUSTOMER (columns CID BIGINT and INFO XML).
  • DB2 adds the column DB2_GENERATED_DOC_ID_FOR_XML BIGINT, and it creates table space XXML0000.
  • DB2 creates the XML table XXMLCUSTOMER containing the three columns:
     DOCID         BIGINT
     MIN_NODE_ID   VARBIN(128)    NOT NULL
     XMLDATA       VARBIN(15850)  NOT NULL
  • DB2 creates the unique index I_DOCIDXMLCUSTOMER on the base table with the following column:
    DB2_GENERATED_DOC_ID_FOR_XML

 

  • Finally, DB2 creates the clustering, not padded, non-unique index I_NODEIDXXMLCUSTOMER on the XML table with columns:
     DOCID
     XMLDATA

 

A table INSERT:

INSERT INTO XMLCUSTOMER VALUES (06001, XMLPARSE(DOCUMENT '
 <customerinfo><name>John Smith</name>
 <addr country="United States"><street>327 Smith Ave</street>
 <city>Glendale</city><prov-state>California</prov-state>
 <pcode-zip>91208-06001</pcode-zip></addr>
 <phone type="work">888-111-0001</phone></customerinfo>'));

 

A SELECT:  

SELECT CID
FROM BOXWELL.XMLCUSTOMER
WHERE XMLEXISTS ('//addr[city="San Jose"]' passing INFO)

You will also find useful information about XML basics, XML integration in DB2 9, 10 and XML index design in our presentation entitled „XML Episode 9 – the return of the hierarchical Empire –„.

 

If you would like to sample SQL to create and populate some example XML spaces, please contact us.

 

Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2011-03: PBG – Recommended Usage and REORG Strategy

When IBM introduced us to the „Universal Table Space“, (UTS), they brought  in two distinct flavors:

  1. Range-Partitioned – which finally enabled segmented, partitioned data (hoorah!)
  2. Partition-By-Growth (PBG)

The advantages of UTS are many, including: better space management for VARCHAR style data, improved MASS DELETE performance, TS scans limited to   active segments, and immediate data reuse after delete.

Enabling the PBG option with the MAXPARTITIONS clause on the CREATE TABLESPACE was a wonderful addition to the world of DB2. Finally, you never had to „worry“ about a data set running out of space. As soon as it  got to, say, 2GB a new partition was simply „added“, and the world carried on spinning. What a great idea! No more 64GB limit for a segmented space since a PBG can get to be 128TB.

Then the doubts set in…

How do I REORG this?
How do I recover this?
How do I manage this?

 

How do I REORG PBG?

Now you may be wondering, „What’s the problem with REORG?“ Well, take a second to reflect on what a good old partitioned  space looked like at the physical level: lots of partitions with varying amounts of data in them and none of them going to a secondary linear data  set of course.

Now in the bright new world of a PBG: lots of partitions absolutely choc  full of data and none of them going to a secondary linear data set of  course.

See the problem now?
If you REORG a partition of a PBG, you run the very  real risk of not being able to fit your data back in the partition! This is known by the very technical term: „not a good thing“.

To try and get around the REORG problem, IBM then introduced an „opaque“  ZPARM called REORG_IGNORE_FREESPACE.  If set to YES, the REORG simply uses zero for the PCTFREE and FREEPAGE options when reorganizing a partition.  The hope being that, without adding space around newly inserted records, it would „fit back in the box.“ However, we are all aware that it is easy  to unpack stuff, but pretty hard to pack stuff back into the original box.

 

How do I recover PBG?

The good news is: RECOVERY and COPY are no  problem for PBGs. All you have to do is take care of your current REORG thresholds and *always* REORG a PBG at the table space level – time to check your „home-grown“ or 3rd party DB2 database maintenance routines! –  unless you are 101% sure that the data will actually fit back inside! Also bear in mind the DBA mantra: „Never REORG a table space unless you have sequential access, or you must resize it“.

 

How do I manage partitioned table spaces?

As with all partitioned table spaces, you must still manage the „limits of DB2“, but no longer the „am I reaching the maximum size of a partition?“.  Instead, you must check „am I reaching the MAXPARTITIONS value?“  I hope you have automatic DB2 space management software that does *both* of these checks for you now.  As always with DB2 – old stuff and problems disappear and new stuff and problems come in!

 

What else is not allowed?

Restrictions to partition-by-growth table spaces

Well, a quick look in the IBM documentation provides the following restrictions that apply to partition-by-growth table spaces:

  • The PART option of the LOAD utility is not supported.
  • The REBALANCE option of the REORG utility is not supported.
  • The default SEGSIZE value 32.
  • Table spaces must be DB2-managed (not user-managed) so that DB2 has the “freedom” to create data sets as partitions become full.
  • Table spaces cannot be created with the MEMBER CLUSTER option.
  • Partitions cannot be explicitly added, rotated, or altered. Therefore, ALTER TABLE ADD PARTITION, ALTER TABLE ROTATE PARTITION, or ALTER TABLE ALTER PARTITION statements cannot target a partition partition-by-growth table space.
  • XML spaces are always implicitly defined by DB2.
  • If the partition-by-growth table space is explicitly defined, the LOB table space for the first partition of the partition-by-growth table space  is defined based on the SQLRULES(DB2) or SQLRULES(STD). Any additional LOB  table space for the newly-grown partition in the partition-by-growth table  space is always implicitly defined by DB2, regardless of whether SQLRULES is in effect. Specification of the SQLRULES(DB2) or SQLRULES(STD) option does not affect the LOB table space for implicitly defined partition-by-growth table spaces.
  • A non-partitioning index (NPI) always uses a 5 byte record identifier (RID).
  • Partitioned indexes are not supported.

Quite a hefty list of restrictions there! But, if you can live with the above and you want exceptionally large segmented table spaces, then PBGs are the way to go. Remember the other DB2 DBA mantra: „It Depends“, since PBG spaces should not be used for every table space – at least not yet anyway!
Feel free to send me your comments and ask questions.

TTFN,
Roy Boxwell
Senior Architect

2011-05: Virtual Indexes – General usage hints and tips

This month’s newsletter addresses a fantastic feature, which I bet you’ll – if you don’t already – exploit!

Just imagine you could give a new index a  “try” before you build it, or you could even pre-check if your  application’s performance will degrade if an index that you suspect isn’t used anymore (refer to the unused IX, but used stats dilemma!). This is  exactly what a virtual index can do for you!

 

IBM brought in the DSN_VIRTUAL_INDEXES table in DB2 9 and fitted it into DB2 V8 CM, but with the shorter name, DSN_VIRTUAL_INDEX. Since then it has been studiously ignored by nearly everybody!
This table is actually incredibly useful in the DBAs job it is just that IBM made it very difficult to fill, change and test the effectiveness of what you can actually do with it…

So how do you get one?

 

Check out the DSN910.SDSNSAMP(DSNTESC) member or the DSNTIJOS

First check out the DSN910.SDSNSAMP(DSNTESC) member or the DSNTIJOS where IBM supply the sample DDL for all the OSC tables.

 

The DSN_VIRTUAL_INDEXES looks like.

"TBCREATOR"             VARCHAR(128)  NOT NULL
"TBNAME"                VARCHAR(128)  NOT NULL
"IXCREATOR"             VARCHAR(128)  NOT NULL 
"IXNAME"                VARCHAR(128)  NOT NULL  
"ENABLE"                CHAR(1)       NOT NULL      
                         CHECK("ENABLE" IN('Y','N')) 
"MODE"                  CHAR(1)       NOT NULL 
                          CHECK("MODE" IN('C','D'))
"UNIQUERULE"            CHAR(1)       NOT NULL 
                          CHECK("UNIQUERULE" IN('D','U'))
"COLCOUNT"              SMALLINT      NOT NULL
                          CHECK("COLCOUNT" > 0) 
"CLUSTERING"            CHAR(1)       NOT NULL  
                          CHECK("CLUSTERING" IN('Y','N')) 
"NLEAF"                 INTEGER       NOT NULL  
                          CHECK("NLEAF" >= -1)
"NLEVELS"               SMALLINT      NOT NULL 
                          CHECK("NLEVELS" >= -1) 
"INDEXTYPE"             CHAR(1)       NOT NULL WITH DEFAULT 
                          CHECK("INDEXTYPE" IN('D','2')) 
"PGSIZE"                SMALLINT      NOT NULL 
                          CHECK("PGSIZE" IN(4, 8, 16, 32)) 
"FIRSTKEYCARDF"         FLOAT        NOT NULL WITH DEFAULT -1
                         CHECK("FIRSTKEYCARDF" = -1 
                            OR "FIRSTKEYCARDF" >= 0)
"FULLKEYCARDF"          FLOAT        NOT NULL WITH DEFAULT -1
                          CHECK("FULLKEYCARDF" = -1 
                             OR "FULLKEYCARDF" >= 0)   
"CLUSTERRATIOF"         FLOAT        NOT NULL WITH DEFAULT -1
                          CHECK("CLUSTERRATIOF" = -1 
                             OR "CLUSTERRATIOF" >= 0) 
"PADDED"                CHAR(1)       NOT NULL WITH DEFAULT
                          CHECK("PADDED" IN(' ','Y','N')) 
"COLNO1"                SMALLINT     
                          CHECK("COLNO1" IS NULL 
                             OR "COLNO1" > 0)
"ORDERING1"             CHAR(1)    
                          CHECK("ORDERING1" IS NULL 
                             OR "ORDERING1" IN('A','D')) 
.
.
.
"COLNO64"               SMALLINT                     
                          CHECK("COLNO64" IS NULL  
                             OR "COLNO64" > 0) 
"ORDERING64"            CHAR(1)            
                         CHECK("ORDERING64" IS NULL 
                             OR "ORDERING64" IN('A','D'))

 

As can be seen there are an awful lot of columns that all have to be filled in correctly to get it to work, and lots of columns have got CHECK  rules.

The contents of the columns is fully described in the “Performance Monitoring and Tuning Guide” however the INSERTs, SELECTs, DELETEs and UPDATEs must be coded by yourselves…(Hint: Use the DRAW REXX to generate these statements for you – If you do not have the DRAW REXX then contact us for a copy).

 

Here is an IBM example INSERT:

INSERT INTO DSN_VIRTUAL_INDEXES 
 (TBCREATOR, TBNAME, IXCREATOR, IXNAME, ENABLE, MODE, UNIQUERULE, COLCOUNT, CLUSTERING, NLEAF, NLEVELS, INDEXTYPE, PGSIZE, FIRSTKEYCARDF, FULLKEYCARDF, CLUSTERRATIOF, PADDED, COLNO1, ORDERING1, COLNO2, ORDERING2, COLNO3, ORDERING3)
 VALUES
 ('SAPR3', 'AUSP', 'SAPR3', 'AUSPTEST', 'Y', 'C', 'D', 7, 'N', 130000, 5, '2', 4, 1, 3693479, 0.95, 'N', 1, 'A', 6, 'A', 3, 'A');

As the Index columns are NULLable, just ignoring them all in the INSERT is “OK”. Tthe trick is getting the NLEAF, NLEVELS, FIRSTKEYCARDF, FULLKEYCARDF and CLUSTERRATIOF “correct”, so that EXPLAIN can actually use it!
Of course, if the index is inserted as ENABLE = ‘Y’ and MODE = ‘D’ then you can simply do an INSERT from SELECT and simulate DROPped indexes – which is quite wonderful!

 

Here is another IBM example INSERT from SELECT:

INSERT INTO DSN_VIRTUAL_INDEXES
(TBCREATOR, TBNAME, IXCREATOR, IXNAME, ENABLE, MODE, UNIQUERULE, COLCOUNT, CLUSTERING, NLEAF, NLEVELS, INDEXTYPE, PGSIZE, FIRSTKEYCARDF, FULLKEYCARDF, CLUSTERRATIOF, PADDED)                       

SELECT TBCREATOR, TBNAME, CREATOR, NAME, 'Y', 'D', UNIQUERULE, COLCOUNT, CLUSTERING, NLEAF, NLEVELS, '2', 4, FIRSTKEYCARDF, FULLKEYCARDF, CLUSTERRATIOF, PADDED
  FROM SYSIBM.SYSINDEXES
WHERE CREATOR = 'SAPR3'
    AND NAME = 'AUSP~Z1';

There was an APAR (PM14223), opened last year for V8 UK60838 and V9 UK60839, which was closed on 2010-09-27, and corrected various problems with regard to empty tables and check constraints.

 

Set of EXPLAIN tables with the correct owner and the DSN_VIRTUAL_INDEXES

To actually get this feature to work, you will also need a set of EXPLAIN tables with the correct owner and the DSN_VIRTUAL_INDEXES with the *same* owner. So if you only create the SYSIBM or the OSC creator, then it is unlikely you will get the benefit of this feature. This is the reason I often hear , It didn’t work at our shop!”

Once everything is in place and you have enabled a virtual index, any BIND, REBIND or EXPLAIN will consider this index. This is good if it’s intentional, but it’s really bad if you just have a record sitting there!

Some golden rules of this table are:

  1. Only leave ENABLEd indexes for the duration of your tests – Never insert Enabled indexes and then go on vacation!
  2. Delete or set “disabled” as soon as you can! (Remember that EXPLAIN will use the virtual indexes, but the actual access paths for BIND/REBIND will not!)
  3. Take care to specify the “correct” values for space and clustering values.

The “correct” values are the hard part…
For Space you can use the space calculation in the DB2 Administration Guide.
For Cardinality you can use estimates or count(*) results.
For Clusterratiof you can use 0, 0.5 , 0.85 , 0.95 and 1.

If you manage all these pieces correctly, the capability is just great. However, it all reminds me a bit on the note Terry Purcell made during a presentation at IDUG NA the 1st week of May: We just make it possible, if you want to use it nicely you may want to talk to your tools vendor. If you have some DB2 experts in your shop, you may even figure it out yourself. – I fully agree. There are tools out there that act as a complete “front end” for this feature. They warn if ENABLEd indexes are found, they allow simple change, copy, insert, update and delete and also  multiple explain runs to see if any of your queries will be affected by a created, changed or dropped index – subsystem wide. Of course I am referring to Bind ImpactExpert and SQL PerformanceExpert from us.  (Well I have to mention our software somewhere in the newsletter!).

 

In a future newsletter I will describe a methodology for redundant index removal that will use this feature – so stay tuned!
I hope that this little bit of data was of interest and/or of use, and, as always, comments or questions are more than welcome!
TTFN,
Roy Boxwell
Senior Architect

2011-06: Unused index analysis

 

Building on what we have read over the last months, there is now a methodology available for checking if all of your indexes are actually being used as you think they should be.

 

Check the LASTUSED field in the RTS tables

First, you can simply check the LASTUSED field in the RTS tables, which gives you a good pointer to “used” or “not used”. But beware if the index is Unique and is there purely to stop duplicate inserts, then this style of usage is *not* recorded in the RTS!

You should have full EXPLAIN data for production Static SQL and you should  also trap and run snapshot EXPLAINs of all the Dynamic SQL. Over time, you  can easily build up a usage table to show which indices are being actively used by the SQL running on your machine.

Now all you have to do is work out which indices are:

  1. Marked as not used in the RTS
  2. Not listed in any EXPLAIN table (for dynamic and static)

 

Analyze candidate indexes in depth

This list of candidate indexes then needs to be analyzed in depth. The way I do it, is as follows:

– Pick an index (start with the largest, longest nastiest index(s), where the payback is good or where you have lots of unused indexes on a give table, where the INSERT and DELETE ratio is relatively high).

– Then find the table that it is indexing. Use this as a “driver” for the following EXPLAINS. Use EXPLAIN to get all the access paths that use this table, (this captures non-use of existing indexes), and store into a temporary work table. Use DSN_VIRTUAL_INDEXES to mark our candidate index(s) as DROPPED, and then re-run the EXPLAINS into a new table.
Compare the access paths.

If there is no difference in access path, then you can be 95% sure that the index can indeed be dropped. Now all you must do, is make sure it is either non-unique, or that it is not being used as a duplication stopper!

If all is ok, drop the index and monitor the table usage for a week or two.

 

Repeat the entire procedure every six months or so…

 

What is important to stress here, is that some indexes – even though “not  used” – are actually used by the DB2 optimizer to decide access path. This  is because the SYSCOLDIST is based on the table and not the index, so any  access path choice can use the filter factors and cardinality data in the table. Virtual Indexes does not stop this usage as the data has not been deleted from SYSCOLDIST. This means, that to get that last 5% certainty, you must do the EXPLAINS and the compare *again* after physically dropping the index.

I hope that this little bit of data was of interest and/or of use and, as always, comments or questions are more than welcome!
TTFN,
Roy Boxwell
Senior Architect