We’ve warned before of icebergs – feature requests that seem trivial but hide a ton of development complexity. When we set out to help our customers measure and understand the quality of their support, we hit an iceberg.
Our Respond product is all about making support conversations feel natural and personable, similar to consumer messaging. As our customers scaled, we knew growing teams needed a simple and consistent way to understand how they were performing. There are numerous industry standards for this – Customer Effort Score (CES), Customer Satisfaction (CSAT), Net Promoter Score (NPS), etc – but we wanted to understand the core of the problem and design a solution that was in line with our mission. When we talked with customer support managers, we learned what was most important for them to understand was the quality of support their team was giving, and how their customers felt at the end of a conversation.
As Peter Drucker said, “What gets measured, gets managed”. Before we introduced conversation ratings we offered high-level information on the volume and speed at which support teams were handling conversations, but that didn’t help teams know if they were doing a good job or giving high-quality support. As teams scaled, this became an essential question they were unable to answer using Intercom.
Existing rating patterns
To understand how to approach this problem, we looked broadly at rating patterns across the web. There are a myriad of structured scales for rating things; virtually every product, hotel, restaurant, article, video or even person you view online seems to come attached with a score, rating or recommendation. These signals have become powerful social proof to help us judge the quality and value of products and services we view.
Early in the project, our team began looking more critically at the various rating scales used in consumer products from stars, to thumbs, to text buttons and emoji. We examined the products and places where each scale was used, what it aimed to measure, and how the results were aggregated and used.
We made our decision on what to build based on a number of clear product principles.
1. Be clear what you’re asking
Asking users if they’d recommend your product isn’t enough to tell a support manager how – or even if – a teammate is doing a great job or if they could have done better. We knew asking them to rate their support interaction was going to provide more useful data.
It has to be clear to the end user what they’re being asked to rate.
Clarity is key. It has to be clear to the end user what they’re being asked to rate, and it has to be clear to the business what information has been collected and how it should be used. In initial prototypes we asked customers to “rate your experience”. While we thought this was broad enough to encapsulate all aspects of the experience, it turned out to be too vague. It wasn’t clear if they were being asked to rate the person that helped them, the company as a whole, the product they were using or whether they got their problem resolved. We also found that results were being used by support teams to measure the performance of the team and individuals on the team, not general satisfaction with the company or product. It was essential that the question we asked captured this information.
We tested some variations on the question, including using the company name or the teammate’s name. We found that using the teammate’s name personalized the question and made it clear that you were rating how the person helped you. We also saw that it increased the response rate by 2 percentage points compared to using the company name. The final phrasing we released with was, “Help John understand how they’re doing. Rate your conversation.”
2. It must be universally understood
Our messenger has reached nearly 2 billion end users around the world, so we needed to make sure our rating scale was uniformly interpreted by both the teammates using Intercom and their end users. During design, we explored star, thumbs and emoji rating scales.
One of the first prototypes we explored was a five star rating scale. This type of rating scale aggregates well for reporting, and is a common and clear rating pattern found on Amazon products, movie reviews, Yelp, etc.
However, when we began testing the prototype of the star rating system we uncovered some surprising biases about how end users perceive stars, and the difficulty some managers had interpreting the aggregated results. Star ratings can be subjective with some people only giving a five star rating when they’re bowled over by amazing service and others giving it when they were merely satisfied. This is a constant challenge for Uber, whose drivers are required to maintain a rating above 4.6. With a threshold so high, it seems like the rating system is gamed to only have one positive rating (5 stars), and four distinct ratings for you to express just how unsatisfied you are.
Managers also found it hard to interpret the aggregate results. What does 3.9 stars mean? Seeing that 85% of customers were happy with the service you provided is clear, but seeing that your score is 3.9 stars isn’t.
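The difference between the two summaries is easy to see with a quick sketch. This is a hypothetical illustration – the sample ratings and the “4 stars or above counts as happy” threshold are invented for the example, not Intercom’s actual reporting logic:

```python
# Hypothetical sketch: the same star ratings summarized two ways.
# A mean star score obscures what a percent-positive figure makes obvious.
ratings = [5, 5, 4, 4, 4, 3, 5, 2, 4, 5]  # invented sample of 1-5 star ratings

# The ambiguous summary: an average score.
mean_score = sum(ratings) / len(ratings)

# The clear summary: treat 4 or 5 stars as "happy" (an assumed threshold).
percent_positive = 100 * sum(1 for r in ratings if r >= 4) / len(ratings)

print(f"Mean score: {mean_score:.1f} stars")        # hard to interpret
print(f"Happy customers: {percent_positive:.0f}%")  # immediately clear
```

Both numbers come from the same data, but only the second answers the question a manager is actually asking.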
To avoid subjectivity we considered a simpler thumbs up/thumbs down design like you might find on YouTube and, more recently, Netflix. Although the latter uses this system primarily to train its recommendation engine, we wondered: could it work to measure feelings?
Conversations are never just good or bad.
The advantage of thumbs is that they leave no room for interpretation from the end user’s perspective. A conversation was either “good” or “bad”, and it’s quick and easy to respond. But from the team’s perspective, it lacks the granularity that gives managers a deep understanding of the spectrum of feeling from end users.
Conversations are never just “good” or “bad”; they can be terrible or amazing, and all shades in between. Should a support manager interpret a thumbs up as “ok” or “amazing”? Similarly, does a thumbs down mean the end user didn’t get what they were looking for, or that they were really annoyed to begin with because of their issue?
This system didn’t tell us how end users felt at the end of a conversation, and it didn’t surface outliers to managers. Outliers are always the first thing managers review and act on.
It’s no secret that Intercom is a fan of emoji 🙌 . Our Slack channels, internal docs and emails are full of them, not to mention a few blog posts, reports and tweets. One of the early prototypes for conversation ratings used emoji as the rating scale, and this ultimately became the direction we pursued. Using emoji faces felt expressive, modern and in line with the Intercom messenger style. Knowing emoji can be subjectively interpreted, we also included a text description for each emoji on hover, so end users would be clear about what rating they were giving.
This prototype was also well received by both teammates and end users. Managers felt it allowed them to understand how their customers felt, and end users found it modern and fun, and clearly understood what they were being asked. While most feedback was positive, there were challenges too: some customers felt it was too expressive or not professional enough for their business.
It may feel like emoji are a fad, but even if they are, they’re clearly not going anywhere. Oxford Dictionaries’ 2015 Word of the Year was 😂 , and Facebook recently reported that 5 billion emoji are sent every day on Messenger. It’s clear that emoji use is only increasing, becoming more and more a part of our written language.
3. Don’t just rate…motivate
In the first beta we shipped, we allowed users to add a comment to any rating they gave, positive or negative. In early research, support managers indicated they mainly wanted to measure quality to see what they could improve on; however, once we began looking at the results, we discovered something really interesting. People left comments on positive ratings nearly as much as they did on negative ratings. And support teams loved it!
Users leave comments on about 50% of negative ratings and on about 30% of positive ones. Given that over 80% of the ratings given are positive, we realized we were collecting four times as many positive comments about conversations as negative ones.
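The back-of-the-envelope arithmetic behind that ratio is worth spelling out. This sketch plugs in an illustrative positive share of 87% (an assumption on our part – the text only says “over 80%”) alongside the stated comment rates:

```python
# Back-of-the-envelope sketch of the comment volumes described above.
# The 87% positive share is an illustrative assumption ("over 80%" in the text);
# the 30% and 50% comment rates come from the figures quoted.
positive_share = 0.87          # fraction of all ratings that are positive
positive_comment_rate = 0.30   # ~30% of positive ratings get a comment
negative_comment_rate = 0.50   # ~50% of negative ratings get a comment

# Comments as a fraction of all ratings, by sentiment.
positive_comments = positive_share * positive_comment_rate
negative_comments = (1 - positive_share) * negative_comment_rate

ratio = positive_comments / negative_comments
print(f"Positive-to-negative comment ratio: {ratio:.1f}x")
```

Even though negative ratings attract comments at a higher rate, positive ratings so heavily outnumber them that positive comments dominate the total.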
Customer support is a hard job, and can sometimes feel a little thankless when there’s a constant stream of incoming queries from customers. It feels good to know when customers have been helped and appreciate it.
Once we uncovered this gold mine of positive motivation, we pushed it one step further. In the top right of the inbox, we added a small heart that fills up when you get a positive rating. Clicking on it creates a pop of confetti on the screen and shows a small card with the feedback. This may seem frivolous, but it’s become one of our favorite parts of the feature.
What we built
When a conversation is closed, Operator will ask the end user to rate the conversation. A card is displayed with five emojis to choose from, and the user has the option to add a comment. For the support team, we display insights on the ratings they received, broken down by teammate.
What started out as a simple request to measure customer satisfaction for support teams became a much more complex project than we anticipated. Many other support products already had this feature, so it may seem like shipping a simple “me too” version would have met our customers’ needs, but bolting on features that way quickly leads to an incoherent product.
In fact, the card we released – three words, “Rate your conversation”, and five emoji – almost looks embarrassingly simple. But this simple card provides incredibly powerful information that teams can use to improve the way they work, has an engagement rate of over 50% in the messenger, and brings a little shot of joy to support teammates’ daily work.