Who knew DeLoreans topped out at 88 MPH...
First, let's set the stage by defining the business questions we want to answer with our analysis. Let's say we want to see what kinds of patterns appear in our event data across the hours of a business day, and whether those patterns are consistent across days, months, and years. We're trying to understand our peak hours versus our slow times to help with capacity planning, scheduling, etc. This gives us a good goal to work toward.
Now for this exercise, we need a model that will allow us to slice by date, as well as time, to see these patterns. Here's the basic dimensional model that we'll build to address the questions above.
I tried a few different approaches, and the one I settled on was this: assume (for this example) that each customer shops at the same time every day. Given that, I multiplied each customer's CustomerID by 10, added that many seconds to 00:00, and combined the result with the OrderDate of the customer's orders. Not the best solution, but it produces a nicely distributed dataset. The altered OrderDate we'll work from is calculated in SQL as:
DATEADD(second,(CustomerID * 10),[OrderDate])
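If you want to sanity-check the shift outside SQL Server, the same DATEADD logic can be sketched in Python (an illustration only; the function name is mine, and CustomerID 360 is just a made-up example value):

```python
from datetime import datetime, timedelta

def spread_order_time(customer_id: int, order_date: datetime) -> datetime:
    """Mimic DATEADD(second, CustomerID * 10, OrderDate):
    shift each order CustomerID * 10 seconds past midnight."""
    return order_date + timedelta(seconds=customer_id * 10)

# 360 * 10 = 3600 seconds = 1 hour past midnight
print(spread_order_time(360, datetime(2007, 7, 1)))  # 2007-07-01 01:00:00
```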
Now, with the source datetimestamp in place, we'll need to define a date column and a time column in our Sales table that we can join to the date dimension and time dimension respectively.
Calculated Column Solution:
One way to do this is to simply bring the raw datetime into your PowerPivot / Tabular model, and then create calculated columns with DAX to break apart the time and date. The DAX for the Time column would be:
And the DAX for the date column would be:
Technically this works. You get the grain right (seconds) with minimal effort, and you get the '1899-12-30' base date by default without having to do anything special. The downside of this approach, as I mentioned in the previous post, is that you waste precious memory loading the raw datetime source field that you likely won't even use in your analysis. This field typically has a unique value on every row, so the Vertipaq compression algorithms won't be as effective on it as they would be if you split the date from the time in the source query. For small datasets where optimization isn't that important, feel free to stop here.
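The DAX formulas themselves aren't reproduced here, but the split they perform — one calculated column for the date part, one for the time-of-day part — can be sketched in Python (an illustration of the logic, not the DAX itself):

```python
from datetime import datetime, date, time

def split_datetime(dt: datetime) -> tuple[date, time]:
    """Split a datetime into its date part and its time-of-day part,
    as the two DAX calculated columns would."""
    return dt.date(), dt.time()

d, t = split_datetime(datetime(2007, 7, 1, 8, 30, 20))
print(d, t)  # 2007-07-01 08:30:20
```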
Datasource Query Solution:
On larger datasets, and when optimization and query speed are critical, there's a better approach we can explore. It involves splitting the source datetimestamp into separate date and time columns in the source query. This is the approach I mentioned in my prior post that Webb, Russo and Ferrari touched on in the tuning section of their book on AS Tabular.
To accomplish this splitting, we use some date manipulation in our source query SQL. The date part is pretty straightforward, so let's knock that out first.
We need a simple date with zeros for the time component. I originally tried just converting the OrderDate from datetime to the date datatype (yyyy-MM-dd) to get rid of the time portion. Unfortunately, PowerPivot imported the date datatype as text. So to avoid that, I simply cast it back to datetime in my source query, like this:
CONVERT(datetime, CONVERT(date, DATEADD(second,(CustomerID * 10),[OrderDate])))
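The equivalent of that double CONVERT — keep the date, zero out the time, but stay a datetime — can be sketched in Python (illustrative only; the helper name is mine):

```python
from datetime import datetime

def date_part(dt: datetime) -> datetime:
    """Mimic CONVERT(datetime, CONVERT(date, dt)): keep the calendar
    date, zero out the time component, and remain a datetime."""
    return datetime(dt.year, dt.month, dt.day)

print(date_part(datetime(2007, 7, 1, 8, 30, 20)))  # 2007-07-01 00:00:00
```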
Now for the time component. We need a timestamp with the desired '1899-12-30' base date to match up with our Time dimension. The main challenge to address here (and one that may not be immediately obvious) is matching up the granularity of the event time data with the grain of our Time dimension. For now, we'll stick with the grain I defined in the last post: seconds in a 24 hour period.
The source field for our timestamp (SalesOrderHeader.OrderDate) is the datetime datatype (yyyy-MM-dd hh:mm:ss.000). Since we are adding whole seconds to tweak our AW source, technically we could get away without addressing milliseconds directly. But in the real world, timestamps will typically carry full precision (milliseconds or finer, depending on the datatype or source), and you'll need a technique to round off to seconds in order to get a perfect match between each event timestamp and your Time dimension.
There are several ways to do this. The simplest I came up with was to convert the source datetime to the time datatype (this drops the date component), then cast to datetime2 with zero precision (i.e., no milliseconds). This gives us the time with the default zero date (1900-01-01), like this:
CONVERT(datetime2(0),(CONVERT(time,DATEADD(second,(CustomerID * 10),[OrderDate]))))
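That time-extraction step can also be sketched in Python for a quick check (illustrative only; Python's truncation to whole seconds here stands in for the datetime2(0) cast):

```python
from datetime import datetime

SQL_ZERO_DATE = datetime(1900, 1, 1)  # SQL Server's default zero date

def time_part(dt: datetime) -> datetime:
    """Mimic CONVERT(datetime2(0), CONVERT(time, dt)): drop the date
    and the sub-second part, re-based on the 1900-01-01 zero date."""
    return SQL_ZERO_DATE.replace(hour=dt.hour, minute=dt.minute,
                                 second=dt.second)

print(time_part(datetime(2007, 7, 1, 8, 30, 20, 456000)))  # 1900-01-01 08:30:20
```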
The last little twist is that we need the '1899-12-30' base date that our Time dimension uses. For that, we just subtract 2 days from the zero date, like this:
DATEADD(day, -2, CONVERT(datetime2(0),(CONVERT(time,DATEADD(second,(CustomerID * 10),[OrderDate])))))
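Putting the pieces together, the whole time-key expression — shift the timestamp, keep only the time of day, re-base on 1899-12-30 — can be sketched end to end in Python (illustrative only; the function name and sample CustomerID are mine):

```python
from datetime import datetime, timedelta

BASE_DATE = datetime(1899, 12, 30)  # PowerPivot/Excel time base date

def order_time_key(customer_id: int, order_date: datetime) -> datetime:
    """Full time-key logic: shift by CustomerID * 10 seconds, keep only
    the time of day (whole seconds), re-base on 1899-12-30."""
    shifted = order_date + timedelta(seconds=customer_id * 10)
    return BASE_DATE.replace(hour=shifted.hour, minute=shifted.minute,
                             second=shifted.second)

print(order_time_key(360, datetime(2007, 7, 1)))  # 1899-12-30 01:00:00
```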
Now we've got the two date parts we need to build our model. After loading the SalesOrder, Date, and Time tables, just define relationships on OrderDate and OrderTime (from SalesOrder to DimDate and DimTime respectively), which gives us this model:
With the model in place, let's focus on answering the questions in our time analysis.
To see the distribution of Sales events across dates and across the times of day, we can create a simple pivot chart and table off our model like this:
Here, I'm slicing by the hour buckets (4-hour blocks) across months. And I've added some conditional formatting in the pivot table below the chart to highlight the time blocks with the most sales.
This lets us see pretty clearly that mid-morning is the busiest time block, while the other periods stay relatively busy as well. We also see a sharp rise in sales in summer 2007 that we could zoom in on and look at more closely. None of the 4-hour blocks are completely dead, so there's no clear downtime for our staffing / retooling question. Apparently Adventure Works is killin' it around the clock :-D. Anyway, we can easily slice by different time buckets and filter down to just certain years or months to explore different patterns in the data. Lots of directions you can go with this.
That's it for now. Here's a link to the workbook (2013 version) with the model for reference. I'll build this out a bit more with some more bells and whistles in my next post.