Saturday, December 15, 2012

A Time dimension specifically for PowerPivot & AS Tabular

I need a time dimension BOYEE!!
The other day I needed to build a model that would enable slicing of events by time of day.  My users wanted to be able to easily focus in on events that happened just in the morning or just after noon.  Several different slices or bands of time.

So obviously, since I'm a big proponent of dimensional models, and since my users naturally think dimensionally when they analyze and explore data, I knew that I needed a big clock in my model... 

Make that a time dimension.  But not just any old time dimension.  I needed one with flavor...








Uh..  ok, by flavor, I mean I needed my time dimension to work with PowerPivot and AS Tabular.  Specifically, I needed the time component (the alternate key) returned as a datetime field with 12/30/1899 as the base date. 
 
Quick aside, I just finished reading the SQL2012 Analysis Services Tabular book by Chris Webb, Marco Russo and Alberto Ferrari.  Great book and definitely a must read for anyone working with AS Tabular.  In it, they go into detail about a tuning technique involving splitting a datetime stamp column into 2 parts in the relational source query to optimize memory usage in the model.  The basic idea is that typically your analysis will be on either dates or times, but not both at the same time (both on a given axis or filter).  So the datetime stamp is more useful separated.  And instead of bringing the datetime stamp into your model and then breaking out the separate date and time components in calculated columns within the model, it's a more efficient use of memory for your model to split the 2 parts in the source query.  Especially since the datetime stamp with typically have many more unique values (potentially every row could be unique), and that means you would get minimal compression on the field.

That said, the other key part to this solution is that PowerPivot and AS Tabular use a zero date of 12/30/1899.  In terms of the actual underlying data values, PowerPivot & Tabular represent a datetime value as a decimal where the numbers to the left of the decimal point represent the date and the numbers to the right of the decimal represent the time component.

So I wanted to get a datetime value that I could work with on its own in my Tabular model.  But I also wanted one that I could easily add back to the date component if I ever needed it for detail-level reporting or something. 

I set out looking for a query to build the full set of time data points that I needed (every second for a 24 hour period). I already had a recursive CTE to produce my date dimension, so I knew I could use the same approach to produce this time dimension.  Thankfully, I found a sample (thanks to Mark Hudson - check his blog here).  And I was able to tweek it to return the base date I needed and the bands my users wanted.  The final result set was this:
 










Obviously, you can extend this and add more useful attributes and user-defined bands.  Below is the SQL code.  Hopefully this is helpful for anybody out there that needs to build a time dimension for their dimensional model with bands.  And especially for the PowerPivot and Tabular users that need to do time-based analyses.
 
Next time I'll walk through my actual scenario and how I integrated the time dimension into my Tabular model. 

-----------------------------------------------------------------------------------------------------------------------

WITH cteTime
AS
(
  SELECT CAST('00:00:00' AS TIME(0)) ClockTime

  UNION ALL
  SELECT DATEADD(SECOND,1,ClockTime)
  FROM cteTime
  WHERE ClockTime < CAST('23:59:59' AS TIME(0))
)
SELECT
 ((DATEPART(HOUR,ClockTime) * 10000)
  + (DATEPART(MINUTE,ClockTime) * 100)
  + DATEPART(SECOND,ClockTime)) AS TimeKey
 ,CONVERT(datetime,'12/30/1899 '+ CONVERT(VARCHAR(8),ClockTime)) as DateTimeStamp
 ,ClockTime
 ,CONVERT(VARCHAR(10),ClockTime,109) AS Time12HourDescription
 ,DATEPART(HOUR,ClockTime) AS HourID
 ,DATEPART(MINUTE,ClockTime) AS MinuteID
 ,CASE
  WHEN DATEPART(MINUTE,ClockTime) < 15 THEN 1
  WHEN DATEPART(MINUTE,ClockTime) < 30 THEN 2
  WHEN DATEPART(MINUTE,ClockTime) < 45 THEN 3
  WHEN DATEPART(MINUTE,ClockTime) < 60 THEN 4
  END AS QuarterHourID

 ,ROW_NUMBER() OVER (ORDER BY ClockTime) as SecondSequence
 ,CASE
  WHEN ClockTime BETWEEN '00:00:00' AND '03:59:59' THEN 'After Midnight'
  WHEN ClockTime BETWEEN '04:00:00' AND '07:59:59' THEN 'Early Morning'
  WHEN ClockTime BETWEEN '08:00:00' AND '11:59:59' THEN 'Late Morning'
  WHEN ClockTime BETWEEN '12:00:00' AND '15:59:59' THEN 'Afternoon'
  WHEN ClockTime BETWEEN '16:00:00' AND '19:59:59' THEN 'Evening'
  WHEN ClockTime BETWEEN '20:00:00' AND '23:59:59' THEN 'Late Night'
  END AS PeriodName
FROM cteTime
  OPTION (MAXRECURSION 0)


No comments:

Post a Comment