Friday, August 16, 2013

Designing and Building your CMDB, Part 1: From System Description To Configuration Items

This series of posts is going to be a slight departure from normal in that it won't be showing you any code. We are going through the process of designing a CMDB (that's a Configuration Management Database) to hold details for all the systems (500+) that we administer. The point of this post is to, by means of an example, show you the sort of questions you should be asking yourself when you put together a CMDB.

So let's start with a description of a system;

"System X is a fairly simple VB.NET solution deployed using IIS and installed on the server WIN005. It consists of two  applications; User Interface (an application open to all users), and Admin Interface (only available to a few).

The Admin Interface works on a vanilla IIS install but the User Interface requires the installation of  the Visual Studio 2010 Tools for Office Runtime.

The installation files for the software are located on WIN080\Software (a file share) as are the bi-monthly patch files that are applied.

At the back end the database, SYSXDB, is SQL Server 2008R2 and is held on a SQL Server cluster called SQLC001.

The application uses Active Directory for authentication, and the User Interface renders some information  from Google Maps to which it requires access.

The users of the Solution are spread across two Countries; France and the United Kingdom. We have internally configured the system so that in the UK users know the solution as as 'InfoMaps' and in France it's known as 'LocalMaps'."

I'm sure there are probably parts in that description that you'll recognise from systems you've worked on. As you can see despite it being only a fairly simple VB.Net website with a couple of plugins there is already quite a lot of information here to capture in our CMDB. If we take this structure and put it into Visio then as a system overview we get something like this;

System X: An Overview
Now for most small organisations this is probably 95% of the information they're ever going to need. If you're a small company and aren't expecting do significantly increase in size and you're not planning on managing hundreds of systems across the globe then you can make do - let's be honest we all have our "go to guy" for a particular system and so long as they're not on holiday (or haven't left!) then they can keep the system ticking over quite happily from both the users and managements perspecitive.

The problem comes when you don't just have one system, or a few, you start to have tens of systems like this and each system takes some time to administer. Suddenly your team of 3/4 software engineers don't really have any time to do anything new because they're too busy keeping the systems that the business is already relying on working to put in anything new.

Once you approach this level you need to significantly increase the quality of information you are holding on each system; you stop needing "Bob" to fix your problem but instead you need "someone who knows IIS" or "someone who can fix SQL Server". If all the knowledge is in Bob's head then Bob quickly becomes a bottle-neck through which all issues have to go through - this isn't good for Bob (although he might think in the short term that it is!) and it's certainly not good for the company or the users.

So let's go back to the description for System X and look for all the items in the configuration that we might want to store information on in our CMDB. Each of these items will become a Configuration Items (CI) in the CMDB. It's fairly easy looking at the system description to just pick things out;
  • IIS
  • WIN005
  • User Interface
  • Admin Interface
  • Visual Studio 2010 Tools for Office Runtime
  • WIN080\Software
  • SQLC001
  • Active Directory
This is a fairly long list, but is only part of the story. We (as IT Professionals) then need to take this list and add in the non-obvious things that will help us troubleshoot the system when there's a problem six months after it's gone live and we've all moved on to other projects. Again there is no easy way to do this and you're heavily reliant on vendors to provide "full and complete" information.

The sort of questions that need to be picked out from the system description are; have both applications been installed into the same Application Pool in IIS? Is the Application Pool running as a local user or is it using network credentials? How are we connecting to the database? Are users typing in http://win005 to access the site or have we setup DNS entries (http://infomaps for example)? How are we deciding if a user has access to the Admin Interface? Etc.

So let's assume someone technical has gone through the system, had the discussions with the vendor, and found out how everything is not just connected but configured. Here's the list of things we might like to consider turning into CI's in additional the ones we've already identified;
  • Application Pool: SystemXUserInterface (Installed on WIN005)
  • Application Pool: SystemXAdminInterface (Installed on WIN005)
  • SYSTEMXSERVER (Active Directory account Used by both Application Pools and SQLC001 to grant access to SYSXDB)
  • "UK InfoMaps Standard Users" (Active Directory Group, Used By "System X User Interface")
  • "FR LocalMaps Standard Users " (Active Directory Group, Used by "System X User Interface")
  • "UK InfoMaps Administrators" (Active Directory Group, Used By "System X User Interface")
  • "FR LocalMaps Administrators" (Active Directory Group, Used by "System X User Interface")
  • DNS Entry: (Maps to WIN005)
  • DNS Entry: (Maps to WIN005)
  • (Used by System X Admin Interface to send email notifications)
  • Firewall Ports: 80,443 (Required for access to WIN005)
  • VT001 (Hyper-V server hosting WIN005 - a virtual server)
Now this list is looking a little more comprehensive!

But how do we know we've captured everything? or even captured enough details for us to be able to properly support the system after we've put it in?

In Part 2 we'll look at "testing" our configuration to try and identify the gaps.

Tuesday, August 13, 2013

PL/SQL: Dynamically Building Your Data Archive

The purpose of this blog post is just to outline a design I put together as part of an internal project for dynamically building a data archive using rules based on the source data being fed into the system. It's far from complete but I think it highlights an interesting way of building an archive for your data when you don't know when you're doing the designing exactly what data you will be putting into it.

At the moment in order to put data from various sources into the data archive a multitude of different loading programs are used (SSIS, command-line applications, scripts, etc) each of which uses it's own rules to determine where the source data ends up (largely dependent on what rules the developer used when putting it together) and inter-dependencies are largely invisible.

New feeds are added at a rate of one every other month and the system should cope with this wile keeping track of the dependencies in the database.

In essence the problem this solution is trying to solve is to provide a single point of entry into the data archive where you can put your source data and which will then be put into the archive using a pre-specified set of rules to determine where the data ends up and what format it's in.

A simple diagram for the system is;
System Diagram
The specific bit that is "in scope" for this work is the "LOAD Process". How data gets into the DATASOURCE tables is really dependent on where the data is coming from, what format it's in, etc and it's practically impossible to write something so universally generic to cover every possible option from a CSV text file to a database link.

The aim of the solution will be to process the data as it arrives but it's possible that it could be adapted to work with data in batches.

I've created a fairly simple structure using the tables;
  • SOURCEDATATYPE - This holds a unique reference and description for each different data source
  • STAGINGOUTPUT - This table holds the raw data as loaded into the database from the external feed (I went with this name in case it's necessary to use staging tables for the IMPORT process to manipulate the data prior to it being loaded via the LOAD process)
  • ENTITY - This is the name for a table that is being created as part of the LOAD process in the Data Archive.
  • ENTITYDETAIL - This table contains information on how the data from the source table should be manipulated before being moved into the ENTITY table.
Here's a simple data structure;
Database Structure
As you can see it's pretty self explanatory.

Once you've configured the data source type, and entity details then you're ready to start loading data.

In order to load the database I've created a package called DW_LOADDATA. This has two routines;
  • ProcessAll, and
  • ProcessRow (p_rowID ROWID)
Basically "ProcessAll" loops through the unprocessed rows and passes them one at a time to the "processRow" routine.

The process row routine performs the following steps;
  • Get the new record from STAGINGOUTPUT
  • Identify the ENTITY/ENTITYDETAIL for the feed specified in the STAGINGOUTPUT record
  • Check to see if the ENTITY exists - if not create it.
  • Work out the column name, and if that doesn't exist as part of the ENTITY create it
  • Does a value already exist? If so update it (using MERGE), otherwise INSERT the new value
  • Mark the STAGINGOUTPUT record as processed
Sounds simple? Well it's less than 150 lines of code include comments and formatting ...

The key is the information in the ENTITY/ENTITYDETAIL tables. For example let's suppose I'm loading sales data and I want to create an ENTITY called SUPPLIER_SALES_BY_MONTH with separate columns for each month of data.

In the ENTITY table I'd create a simple record with the name of the new ENTITY (bearing in mind the actual name of the table will be prefixed with the Short_Code from the SOURCEDATATYPE table) and then in the ENTITYDETAIL table create the following rows;

SELECT 1, 1, 2,
  '''PERIOD_'' || TO_CHAR(SO.DATE01, ''YYYYMM'')', -- column_name_expression
  'SO.NUMBER01', -- row_unique_expression
  'OLD.VALUE = NVL(OLD.VALUE, 0) + SO.NUMBER04', -- value_expression
  'NUMBER', -- on_create_type
  '0' -- on_create_default
  '''SUPPLIER_NAME''', -- column_name_expression
  'SO.NUMBER01', -- row_unique_expression
  'OLD.VALUE = SO.TEXT01', -- value_expression
  'VARCHAR2(80)', -- on_create_type
  '0' -- on_create_default

I know "INSERT INTO ..." probably isn't the best way to do this but this is only an example!

As you can see the column_name_expression is looking at the SO (STAGINGOUTPUT) table and formatting the first date to YYYYMM - so a value of 13-JAN-2013 will create/ update the column PERIOD_201301.

The value (for the supplier) is being updated to add on the sales for that month.

The second column that's created is the SUPPLIER_NAME - this is simply the name of the supplier. If I run this using some random test data I end up with a table that looks like;
Generated Table
I've created a script which creates the objects and loads some simple test data. It's available here (via Google Drive - DO NOT RUN IT IN AN EXISTING DATABASE SCHEMA UNLESS YOU WANT OBJECTS STARTING WITH SAL_ TO BE DROPPED!). You'll need to have setup a user with default tablespace permissions in order to get the script to work.

Let me know in the comments if you find this useful