Tuesday, April 16, 2013

Lot Genealogy, Part 3: Debugging Lots and Batches

As you might have noticed I've just updated the LOT_GENEALOGY source code for this project to a new version following the discovery of a bug where the product of a batch is two different item numbers. This is a rare occurrence in our system, which is why it went undetected for so long, but it might not be in yours.

Rather than dwell on that I thought I'd talk about exactly what the process was that led to us discovering the fixing the error.

The first report came from the users; a lot which they knew had two item numbers was only showing one. Other lots didn't seem to be affected. This was then traced back to a single batch, 73591, running this SQL gave us the transactions that affected the cache for that batch. To publish this data I'm obfuscating the data (see here);

       HIDEDATA.Hide('LGG.LN', LGG.LOT_NUMBER) Lot_Number,
       HIDEDATA.Hide('LGG.ID', LGG.Item_Description) Item_Description,
 WHERE 1 = 1
   AND LGG.Batch_No = '73591';

The result of this is the following transactions;

LG Transactions For A Single Batch
As you can see the first thing we have is three "WIP Completion" transactions. Each of these represents the completion of the batch and the generation of an output. In this case Item Numbers 073081 and 07440. However without knowing what information was put *into* the batch in the first place it's not possible to usefully use this information in the Genealogy - It has to wait until the "WIP Issue" transactions are processed (each of which represents an ingredient).

The next stage for debugging the cache was to (on a test system!) change the LOT_GENEALOGY_GETTRANSACTIONS view so that rather than looking at *everything* it only looks at the records for a single batch - this is simply done by adding the following where clause to the view SQL;

-- Restrict transactions to a single batch (for debugging)
AND GBH.Batch_No = '73591'

Now we're restricted the input records to just those affecting that batch we can just re-build the whole cache - it will take longer to do the delete than the insert. The script to do it is;

  -- Call the procedure

Once that has been completed the entire content of the cache is the single batch we're tracking. The SQL to show it is;

SELECT HIDEDATA.Hide('LGG.LN', LGW.Master_Lot_Number) Master_Lot_Number,
       HIDEDATA.Hide('LGG.LN', LGW.Ingred_Lot_Number) Ingred_Lot_Number,
       HIDEDATA.Hide('LGG.ID', LGW.Ingred_Item_Description) Ingred_Item_Description,
       HIDEDATA.Hide('LGG.ID', LGW.Product_Item_Description) Product_Item_Description,
 WHERE 1 = 1

The result of this is;

Lot Genealogy Cache: Result for a Single Batch
As you can see the only product for Ingredient Lot LGG.LN 37 is item number 073002, if we look at the transactions earlier we can see that it should be reporting Item Number 07440 as well - it's not which means something is going wrong in the "WIP Issue" part of the cache processing.

If we look at the source code (available here - via Google Drive) you'll see that the final part of the WIP Issue is creating the records in LOT_GENEALOGY_BATCH_PRODUCT so the next stage to check is to see if these records are being created correctly. Here is the SQL;

SELECT LGBP.batch_number,
       HIDEDATA.Hide('LGG.ID', LGBP.Product_Item_Description) Product_Item_Description,
 WHERE 1 = 1
   AND LGBP.Batch_Number = '73591'

This gives us the result;

Lot Genealogy Batch Products
This shows us that the correct batch products are being recorded - this is important as we now have both ends of the transaction; the correct transactions are going in and the correct products are coming out. However we also know that the cache isn't being updated correctly therefore the culprits must be the two pieces of SQL that are actually doing the inserts.

If you look at the first one;

INSERT INTO NOETIX_SYS.Lot_Genealogy_WhereUsed
    SELECT v_Transaction.Lot_Number, -- Master_Lot_Number
           v_Transaction.Lot_Number, -- Ingred_Lot_Number
           v_Transaction.Item$item, -- INGRED_ITEM$ITEM
           v_Transaction.Item_Description, -- INGRED_ITEM_DESCRIPTION
           v_Transaction.Item_Type_Code, -- INGRED_ITEM_TYPE_CODE
           v_Transaction.Batch_Number, -- Batch_Number
           v_Product.Product_Lot_Number, -- Product_Lot_Number
     (SELECT 1
              FROM NOETIX_SYS.Lot_Genealogy_WhereUsed LGI
             WHERE LGI.MASTER_LOT_NUMBER = v_Transaction.Lot_Number
               AND LGI.Ingred_LOT_NUMBER = v_Transaction.Lot_Number
               AND LGI.Batch_Number = v_Transaction.Batch_Number
               AND (v_Product.Product_Lot_Number IS NULL AND
                   v_Product.Product_Lot_Number = LGI.PRODUCT_LOT_NUMBER));

You can see that this is actually just a select from DUAL so the fact that it's only working on the first pass (i.e. for the first product) means that the offending part of the code must be the NOT EXISTS.

Looking at the WHERE clause in the sub-select reveals that it's not using the Product_Item_Number field. Not using this field means that after it's inserted the first product it mistakes the second one for a duplicate and skips over it.

Altering the final AND statement to;

AND (v_Product.Product_Lot_Number IS NULL AND LGI.PRODUCT_LOT_NUMBER IS NULL OR (v_Product.Product_Lot_Number = LGI.PRODUCT_LOT_NUMBER AND v_Product.Product_Item_Number = LGI.PRODUCT_ITEM_NUMBER))

For BOTH pieces of SQL (the one updating  all the existing records as well as this one which is creating a new master record) addresses the issue logically, if you rebuild the cache from scratch (using the script above) and re-run the SQL to get the content of the cache you will now see;

Lot Genealogy Cache - Complete
The first thing you notice is there are a lot more records. The new records are for Item Number 07440 which is the one we were missing earlier.

If you do a source comparison between the new and old versions of the package you'll notice that this wasn't the only change I made - I've added a great deal more logging to the WIP Issue transaction so it's possible to see what is going on and what "SQL%ROWCOUNT = 0" actually means! I also added a warning to WIP Completion if no records are updated.

I hope this is of some interest ... Debugging an application, even one you've written yourself, is a skill that's very difficult to transfer. Hopefully there is enough information here for people to at least make a start for future issues (which will, by their very nature, be completely different I'm sure!).

No comments: