The One Thing MSPs Should Monitor to Measure Their Operations

"But Steve, what's the one thing I need to look at to know things are operating as they should be?" I was asked this same question 4 times this past week.  My homework was to figure out what is the One Thing a Service Manager or Owner should be monitoring. 

The challenge is that MSP operations are very complex.  If all we did was Break/Fix, life would be very easy.  But we don't! To make the complexity simple, we need to divide and conquer the KPI question(s).  As I pondered the arsenal of metrics we’re aware of in the National Gallery of Autotask Advanced Live Reports, it dawned on me that there are three types of metrics: Strategic, Tools, and Performance Indicators.

We use the Strategic ones to make decisions, for example, Service Delivery Forecast (SDF) and Project Availability Forecast (PAF).  We use the Tools to help us know where to coach the Team or to manage all Open Tickets, for example, the reports on Scheduled Tickets with no future service calls, On-Hold tickets with Due Dates in the past, Reopen Rate, and Escalation Rate.

But the big daddies of them all are the Performance Reports.  But whose performance?  An individual’s, the Team’s, or the performance of the whole Company? Well, this is where we’ll start the discussions and see how far they may carry us.  It could take the rest of the year to get through Performance, Strategic, and the ones that are just plain Service Delivery Tools.

Service Delivery KPIs by Role: 

  • Service Delivery Team: Labor Profitability Margins 

  • Techs: Real-Time Time Entry (RTTE) 

  • Service Coordinators: SLA Performance 

  • Service Managers: 

      • Resource Utilization 

      • Reactive Hours per Endpoint per Month (RHEM) 

  • Operations Managers: Mean Time To Resolve (MTTR) 

  • Project Managers: Earned Value 

KPI for Service Delivery:  Labor Profitability Margins 

Hold on now, partner. What about CSAT, SLA, or Resource Utilization? Aren't those important?  Well, yes, of course, but unless the operation is sustainable, none of those scores will matter. And if the combination of these is all optimized, Labor Profitability will be maximized.  Therefore, Labor Profitability is the final scorecard for how the Service Delivery Operation is running. 

But since you brought up those other possible numbers, let's further divide the Service Delivery Operation into the various Roles and Responsibilities on the Team. 

Techs: Real-Time Time Entry (Goal: -.02) 

Yes, this is a negative number.  The reason is that Time Entry, Documentation, and Customer communications are all part of the engagement.

Any Tech worth their weight in gold is updating the Customer with: 

  • What was accomplished? 

  • What are the next steps? 

  • When will they hear from someone? 

  • And who will they hear from? 

Good Documentation is key to Service Delivery efficiency.  We say it all the time, but do we hold the Techs accountable, and do we schedule Documentation clean-up time? 

Time Entries drive both Customer communication and Documentation.  Therefore, Real-Time Time Entry is the Tech's KPI, and it’s what they’re being paid to do, along with fixing the Customers' sh.. (Stuff). 

Service Coordinator:  SLA performance (Goal: within Contractual and non-Contractual Agreements)  

Service Coordinators are responsible for Intake and Open Ticket Management.  As the hub of the Company, they should be empowered to take ownership of every Open Ticket and every Tech's Dashboard/Calendar.  The beauty of the SLA Performance metric is that it has three parts: 

  1. First Response: Is the Service Coordinator Triaging the Tickets in a Timely Fashion? 

  2. Tech Engagement: Is the Service Coordinator proactively managing the workload across all Techs and setting them up for success? 

  3. Completion: Is the Service Coordinator managing all Open Tickets, driving them from New to Completion, and alerting the Service Manager when full Team Collaboration is needed? 
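
The three parts above can be tallied per ticket. Here is a minimal sketch, assuming each ticket records how many hours it took to reach each milestone. The field names and target values are hypothetical, not Autotask fields; your real targets come from your own agreements.

```python
from dataclasses import dataclass

# Hypothetical targets, in business hours. Real values come from your
# contractual and non-contractual agreements.
TARGETS = {"first_response": 1.0, "tech_engagement": 4.0, "completion": 16.0}


@dataclass
class TicketTimes:
    """Hours from ticket creation to each milestone (illustrative fields)."""
    first_response: float
    tech_engagement: float
    completion: float


def sla_performance(tickets, targets=TARGETS):
    """Return the fraction of tickets that met each of the three targets."""
    if not tickets:
        return {part: 0.0 for part in targets}
    return {
        part: sum(getattr(t, part) <= limit for t in tickets) / len(tickets)
        for part, limit in targets.items()
    }


tickets = [
    TicketTimes(0.5, 2.0, 10.0),
    TicketTimes(2.0, 3.0, 12.0),   # missed First Response
    TicketTimes(0.8, 6.0, 20.0),   # missed Tech Engagement and Completion
    TicketTimes(0.2, 1.0, 8.0),
]
for part, rate in sla_performance(tickets).items():
    print(f"{part}: {rate:.0%}")
```

Scoring the three parts separately shows where the Service Coordinator needs help: Triage, workload balancing, or driving tickets to the finish line.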

P.S. from the Advanced Global Service Coordinator Training proposal: 

Prerequisites with Owner's sign-off before we start Service Coordinator training: 

  • The Service Coordinator / Dispatcher is the hub of the Company. If this person does not handle the Service Coordinator / Dispatcher job, then no matter what the rest of the Team does, the Customer will be disappointed, and Chaos will reign. 

  • The Service Coordinator / Dispatcher also needs to be the Single Point of Coordination (SPoC) for all Customer requests, which means they must have complete Ownership of the Techs' calendar, workday, and duties. 

  • Service Coordinator / Dispatcher daily duties are reviewed, adjusted if needed, and accepted. 

  • The Service Coordinator / Dispatcher is one of 3 Sys Admins and needs to know how to keep Dashboards, WFRs, and Holiday Sets updated, renew Live Report Schedules, and give Client Portal access. 

Service Manager: There are two KPIs 

Resource Utilization (Goal: 80%) 

Labor is the inventory of an MSP: it is what is required to provide the services.  The average in the industry is 70%.  Best-in-class MSPs (that’s you!) regularly run at about 80-84%.  It’s not uncommon for an MSP to be running with a Resource Utilization of around 50%, meaning 50% of their pay is spent providing services to the Customer, and the rest is wasted.  Some of that waste is necessary overhead (Training, Research, Standard Build documentation, POC, etc.), but those minutes do not go toward driving the company's core function, at least not in the near term.  These activities position the Company to provide better service to the Customer in the future.  Even Google doesn't allow more than 20% of any employee’s time to be on non-near-term productivity. 
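
To make the arithmetic concrete, here is a minimal sketch. It assumes Resource Utilization is simply the hours spent providing services to Customers divided by the hours paid. The Tech names and hours are made up for illustration.

```python
def resource_utilization(service_hours, paid_hours):
    """Share of paid hours spent providing services to Customers."""
    if paid_hours <= 0:
        raise ValueError("paid_hours must be positive")
    return service_hours / paid_hours


# Hypothetical week: each Tech is paid for 40 hours.
PAID_HOURS = 40.0
team = {"Tech A": 33.0, "Tech B": 28.0, "Tech C": 20.0}

for name, hours in team.items():
    print(f"{name}: {resource_utilization(hours, PAID_HOURS):.0%}")

# Team-level figure: total service hours over total paid hours.
team_rate = resource_utilization(sum(team.values()), PAID_HOURS * len(team))
print(f"Team: {team_rate:.1%}")
```

In this made-up team, one Tech is above the 80% goal, one sits at the industry average, and one is at 50%, which drags the Team figure below 70%.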

RHEM (Goal: -.20) 

This is a tricky calculation as you need to know exactly how many endpoints you’re supporting.  But what’s an endpoint?  People and devices!  Yes, but what people?  We could count the # of computers, but some are spares or to support machine-to-machine automation. Do we count those?  We could count users, but if 2 or 3 people share the same workstation, does it triple what it takes to support the one machine?  What about which devices?  Every printer, scanner, POS device?  So, we leave it up to each MSP to decide, but then there is no industry benchmarking.   

Here is Webroot’s definition:   

"An endpoint is any device that is physically an endpoint on a network. Laptops, desktops, mobile phones, tablets, servers, and virtual environments can all be considered endpoints. When one considers a traditional home antivirus, the desktop, laptop, or smartphone that antivirus is installed on is the endpoint. " 

Building off Webroot, we will define Endpoint as any hardware item that is listed in Autotask Configurations. 

But that’s only part of the problem:  What’s the average technician effort, and how many tickets were completed last month?  These are easier numbers to get to, but most MSPs are unaware of the actual number.  And when you look at the Tech's time entries (as an explanation for the low utilization #), we often hear, "Well, that’s not all the time."  OK, but then where is it?  Are we back to RTTE being the most important Tech KPI?  Why, yes!  And as far as the # of Tickets goes, remove Merged Tickets and Alerts with no time entries. 

FYI: this number only pertains to Managed Service Customer tickets, but you can extrapolate the efficiency number across the rest of the organization.   
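
Here is a rough sketch of the calculation described above, assuming RHEM is the month's reactive hours on Managed Service tickets divided by the number of endpoints. The ticket record and its field names are hypothetical, not Autotask fields, and every sample ticket is assumed to be reactive.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """Illustrative ticket record; these are not Autotask field names."""
    hours: float            # total of the ticket's time entries
    managed_service: bool   # ticket belongs to a Managed Service Customer
    merged: bool = False
    alert: bool = False


def countable(ticket):
    """Apply the counting rules from the text above."""
    if not ticket.managed_service or ticket.merged:
        return False
    if ticket.alert and ticket.hours == 0:
        return False
    return True


def rhem(tickets, endpoint_count):
    """Return (reactive hours per endpoint, tickets counted) for one month."""
    if endpoint_count <= 0:
        raise ValueError("endpoint_count must be positive")
    counted = [t for t in tickets if countable(t)]
    total_hours = sum(t.hours for t in counted)
    return total_hours / endpoint_count, len(counted)


month = [
    Ticket(hours=1.5, managed_service=True),
    Ticket(hours=0.5, managed_service=True),
    Ticket(hours=0.0, managed_service=True, alert=True),   # dropped
    Ticket(hours=2.0, managed_service=True, merged=True),  # dropped
    Ticket(hours=3.0, managed_service=False),              # dropped
]
value, ticket_count = rhem(month, endpoint_count=10)
print(f"RHEM: {value:.2f} hours per endpoint across {ticket_count} tickets")
```

The endpoint count is the number you have to decide on first. With the definition above, it would be the count of hardware items listed in Autotask Configurations.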

Operations Manager: MTTR (Goal: 2 Bus Days) 

MTTR is the average of the total time the Customer is waiting for something to be fixed, changed, or installed.  It pertains only to non-project work, as project timelines are all negotiable.  We hide behind the Status SLA Event "Waiting Customer," not to be confused with the Status "Waiting Customer" (why did Autotask have to use the same name for two different things?  I mean, is a Project a Project or is it a Ticket??).  By “hiding behind”, I mean we put tickets on-hold and then claim we completed them within our contractual obligations (read: SLA expectations).  FALSE!  We trick the system into thinking we're doing well, when in fact, from the Customer's perspective, they’ve been waiting a long time and don't want to hear excuses.  From a Customer Experience point of view, MTTR is a better indicator of how well the MSP is delivering a Superior Service to the Customer than SLA Performance. 

The reason it’s the Operations Manager's KPI is that the Operations Manager is responsible for the process of getting tickets from New to Complete.  The Service Coordinator is responsible for driving the tickets through every process the Ops Manager has deemed the MSP’s standard. 
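
For illustration, here is a minimal sketch of MTTR measured the way the Customer experiences it. It assumes MTTR is counted in business days from ticket creation to completion, with no time subtracted for on-hold or "Waiting Customer" periods. Holidays are ignored, and the sample dates are made up.

```python
from datetime import date, timedelta
from statistics import mean


def business_days_waited(created, completed):
    """Count weekdays after the creation date, up to and including completion.

    On-hold and "Waiting Customer" time is deliberately NOT subtracted.
    Holidays are ignored in this sketch.
    """
    days = 0
    current = created
    while current < completed:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days += 1
    return days


def mttr(tickets):
    """Mean business days from creation to completion."""
    return mean(business_days_waited(c, d) for c, d in tickets)


# Hypothetical (created, completed) pairs for non-project tickets.
tickets = [
    (date(2021, 7, 5), date(2021, 7, 6)),   # Mon -> Tue: 1 day
    (date(2021, 7, 2), date(2021, 7, 5)),   # Fri -> Mon: 1 day
    (date(2021, 7, 5), date(2021, 7, 9)),   # Mon -> Fri: 4 days
]
print(f"MTTR: {mttr(tickets):.1f} business days")
```

The design choice that matters is the missing pause button: the clock keeps running while the ticket is on hold, because the Customer is still waiting.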

Project Manager: Earned Value 

It’s been years since I have calculated Earned Value.  It’s a handy calculation that can detect if the cost overrun is a schedule problem or a scope problem.  If you want to know more about it, either Google “Earned Value”, or shoot me an email.  I'd be happy to dust off the paperback book and refresh my memory.  By the time we get around to writing on this specific KPI, we'll have more information and another voice to add to the discussion. 
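
Until then, here is a minimal sketch of the standard textbook Earned Value formulas. These are the general project management definitions, not necessarily the exact method we will cover later, and the project figures are hypothetical.

```python
def earned_value_metrics(budget_at_completion, percent_complete,
                         planned_value, actual_cost):
    """Standard earned value figures for a project at a status date."""
    earned_value = budget_at_completion * percent_complete
    return {
        "EV": earned_value,
        "CV": earned_value - actual_cost,    # negative = over budget
        "SV": earned_value - planned_value,  # negative = behind schedule
        "CPI": earned_value / actual_cost,   # below 1.0 = over budget
        "SPI": earned_value / planned_value, # below 1.0 = behind schedule
    }


# Hypothetical project: 20,000 budget, 50% complete, with 12,000 of work
# planned by now and 11,000 actually spent.
metrics = earned_value_metrics(20_000, 0.50, 12_000, 11_000)
for name, value in metrics.items():
    print(f"{name}: {value:,.2f}")
```

Here both variances are negative: the project has spent more than the work completed is worth, and it has completed less work than planned.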

In the meantime, sit back and relax.  This is only the beginning of the KPI train ride.  Just like on September 6th, when we kicked off a year-long discussion on Service Delivery Foundational Improvements (SDFI), this series has a long way to go.  Unlike SDFI (how to properly use Autotask so you can take full advantage of the tool), Service Delivery Performance Improvements (SDPI) are much more difficult.  Mostly because, unlike foundational improvements, we're not dealing with a machine; we're dealing with people.  And like the difference between a Network Device and an End-User device, people's opinions matter.   And for these reasons, there’s no end in sight.  Stay with us and enjoy the ride.