Grape expectations: How genAI reshaped a wine company’s customer service team
New York-based Wine Enthusiast offers online customers what it calls everything they need to live the wine lifestyle — from the vino itself to corkscrews, glasses, wine cellars, furniture, and even two magazines on the topic. The company also receives 100,000 customer service inquiries annually.
During the COVID-19 pandemic, the 45-year-old online retailer’s presence boomed. Consumers were staying home, nesting, building out their perfect office space, and drinking more.
For more than a year, Wine Enthusiast had been using a SaaS-based platform from San Francisco-based startup Pathlight for performance management metrics of its customer-facing teams. Then Pathlight pitched a new generative artificial intelligence (genAI) product called Conversation Intelligence; it could transcribe every customer service conversation, grade customer reps based on company metrics, and red-flag potential problems.
The large language model (LLM) that underpins the tool uses Wine Enthusiast’s own data to learn company policies and procedures and determine whether a representative followed procedures — and whether the customer left happy or not after a call, according to John Burke, head of customer service and systems at Wine Enthusiast.
When it came to customer service calls, historically, the company had to manually comb through each call to discern customer trends or problems, an impossible task to perform at scale. As a result, Wine Enthusiast barely scratched the surface in terms of analyzing customer service conversations. And when complaints came in, they were all anecdotal; finding persistent problems was next to impossible.
Now, genAI tools essentially act as “autonomous analysts,” Burke said. The LLMs the tools use can quickly sift through the bulk of customer conversations, analyze content, and synthesize the transcripts into reports that surface consumer trends and product issues.
Computerworld spoke with Burke about the rollout of genAI at Wine Enthusiast, the project history, hurdles, and benefits.
What problem were you trying to solve with genAI? “The company had a relatively small customer service footprint, and it wasn’t really able to manage the volume of people coming in. People think of customer service as point-of-sales service, but we’re talking product warranties and supporting these products. Wine cellars are built to last. Some will last 10 to 15 years and they’re going to need parts and maintenance.
“My role coming in was to figure out how do we responsibly grow this part of the business to meet the demand, the expectation of customers — especially in the Amazon world that we live in today of immediacy and technology — without going out and hiring 60 more people?”
How did you go about solving the issue? “Phase one was getting us on a set of tools and platforms just to better communicate with customers. We’d moved on to Zendesk as our platform, and one of the first challenges we ran into — even though Zendesk was great in allowing us to communicate with the customer — was knowing what those customers were contacting us about.
“That started with us really putting the onus on our service team that when you complete a conversation, answer a couple questions to tell us what you talked about. To no surprise, what we found was that 90% of the reason for conversation was questions. But questions about what?
“I don’t blame the team. They’re moving from call to call and they don’t want to have to stop and answer six or seven questions.
“My focus is not just on call volume or how many tickets you address. It’s about what quality you’re delivering to customers and with what consistency. We came across Pathlight because it had a really cool coaching platform that would basically take all these different metrics that mattered to us and roll them into the concept of what they call a Health Score, which is a digestible way for the team to understand where they stand.
“Instead of saying, ‘you’re doing really well in first-contact resolution but not so well in chat response time,’ we say, ‘your overall Health Score is 90 and here’s where you want to improve.’
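The Health Score rollup Burke describes can be pictured as a weighted average of per-agent metrics. The sketch below is purely illustrative — the metric names and weights are invented here, and Pathlight’s actual formula is not described in the interview:

```python
# Illustrative "Health Score" rollup: several per-agent metrics, each
# normalized to 0-100, combined by weight into one digestible number.
# Metric names and weights are hypothetical, not Pathlight's real formula.

def health_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of normalized (0-100) metric scores."""
    total_weight = sum(weights.values())
    return sum(metrics[name] * w for name, w in weights.items()) / total_weight

agent = {
    "first_contact_resolution": 96.0,  # strong
    "chat_response_time": 80.0,        # room to improve
    "policy_adherence": 92.0,
}
weights = {
    "first_contact_resolution": 0.4,
    "chat_response_time": 0.3,
    "policy_adherence": 0.3,
}

print(round(health_score(agent, weights)))  # 90
```

A single rolled-up number like this is what lets a manager say “your overall Health Score is 90” instead of reciting each metric separately.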
“About a year into our relationship with Pathlight…, they said they were developing a product that, in addition to assessing the [service] agent’s performance, would also analyze every conversation that happens. So, it can tell you what they’re talking about, what their sentiment is, what the resolution looks like, and if they’re adhering to our policies and procedures. That’s what started us on our journey with AI.”
Are the majority of service rep conversations conducted by voice or messaging app? “Our channel mix is 70% voice and 30% everything else. For us, the challenge was how do we get something meaningful from these phone conversations that in some cases are 20, 30, or even 40 minutes long?
“That’s where the pain point started for us. With Pathlight, we have like a rubric and we can digitally grade our reps. But my leadership team was coming to me saying, ‘John, I just spent 20 minutes listening to a phone call and I graded one conversation. How am I supposed to do my job and also evaluate the team?’
“Historically, the only time we looked at a [service] recording was if the customer complained, and we went back and tried to figure out what happened. We were always looking at the worst of the worst conversations and using them to evaluate our team, when they’re having hundreds of conversations that are perfectly pleasant.”
How much work was involved prior to the genAI rollout evaluating agents? “I have a relatively small leadership team, and I’d say they spent half their time evaluating. So much of that evaluation was about recovery. This customer is upset because their order didn’t arrive on time. Many of the team felt like they were a lawyer trying to build a case against a client. It truly did become half [the management team’s] time either trying to determine who are their top performers and who are the ones on the team who need additional coaching and training, or who’s adhering to our processes?
“For us, we look at certain business metrics. We want to make the customer happy, but we also don’t want to give away the store — finding the balance of how to make the customer feel fulfilled when something goes wrong without an immediate knee-jerk reaction of giving a full refund.”
In what ways did your older method of evaluating customer support fall short of your goals? “We ended up in a space where we were just looking at the worst of the worst. The bigger challenge for me — I sit in on our marketing and commerce meetings — was the question that always came up: what products are people liking or disliking, and what key issues are we seeing? I knew we had a problem when, before that meeting every week, I was Slacking my team and asking, ‘What have you guys heard this week?’
“It was so anecdotal, and I felt silly presenting that to the marketing team. Their immediate follow-up questions were always, ‘How many? What customers? What product lines?’ Every time, all I could say was, ‘That’s all I’ve got for you.’”
When did you begin your rollout of genAI and when did you complete it? “We started it in August of last year and it took us about a month of experimenting. Then we went fully live around September. We’ve been live ever since. I’d say we’re pretty well done tweaking the prompt. We’ve got it pretty well dialed in as to who we are and what conversations should look like, which has been really helpful.
“We’ve effectively eliminated manual grading. We don’t do it anymore. We just let the system do it.”
Were you concerned that Pathlight’s cloud-based LLM would use your proprietary data to train itself, and could potentially expose your data later on? “I’ve been a student of AI and I like to be on the emerging end of technology. So, I’ve kept myself educated on privacy concerns and ethical limitations of AI — governance and things like that. I didn’t immediately have that concern, partially because we’re not a bank. We’re not in insurance or healthcare. If the language model wanted to learn against our customer base, I wasn’t particularly concerned about that.
“The concern I did have — and Pathlight was very transparent about it up front — was what about customer credit card information? They assured us the model was trained to detect those patterns and remove them from its learnings, which gave me a little comfort. That’s the only thing we don’t own, the customer’s personal information. Getting that walled off gave us the comfort to say, ‘We’re fine to proceed.’”
Did you create a genAI team to deploy the platform, or did you mostly rely on Pathlight for the expertise? “Being a medium-sized business, we didn’t have the luxury of saying this new space is going to get a new team. It was largely myself and a couple of my managers. We sort of fell into it very early on with Pathlight. The first conversation we had where they were analyzing our calls and showing us what the readouts looked like was not even in a Pathlight product. It was still a prototype. So we were seeing the sausage being made on the back end. We hoped we helped in developing aspects of the product early on.”
You’ve called your genAI tech “autonomous analysts.” Why? How does it work? “The way Pathlight pitched the product to us ended up being the reverse in terms of value. They looked at it and said this is going to be the way to eliminate the manual process of evaluating your team, and the byproduct of that will be you’ll know more about what your customers are talking about.
“The value for us was the exact opposite: I want to know what our customers are talking about. I want to fix those issues up front. And then, naturally our team is going to perform better because of that.
“So, having this robot in the background listening to calls all day long and surfacing the stuff most important to us both on the agent and customer level was incredibly helpful to us, especially when my team’s biggest complaint before was they were spending half their day or more not even doing the work, just listening and scrubbing through calls and then having to go through the manual process of evaluating. That’s another area we struggled in.
“My leadership team has different backgrounds. They have different management styles. One of my managers who has been in this industry for 40 years is a tough grader. It takes a lot to impress her. So, when I looked at scores when manually graded, the agents she evaluated were generally graded a lot lower than one of our other managers who is a little more forgiving.
“When we switched to AI, that bias was removed. What we were seeing was the actual analysis of the conversation without the human nature of thinking, ‘Well, the agent has had a tough week.’ Or ‘the customer was really laying into them, and I think they really did well enough.’ We removed that element from the equation.”
How do you store your customer service interactions, and how is Pathlight’s LLM able to sift through them? “We currently use a cloud-based telephony system called Aircall. Aircall and Pathlight integrate through APIs. So, basically the conversations are recorded securely on the Aircall side, and we give Pathlight access to those recordings for a brief period of time to analyze them and move on.
“That was something important to us; we didn’t have to change the way we were operating. We could still use our same phone system and our ticketing system and just allow Pathlight secure access to only the data it needed to accomplish the assessment.”
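The integration pattern Burke describes — recordings stay in the telephony system, and the analysis side gets short-lived access to each one — can be sketched as below. Every endpoint, function, and field name here is invented for illustration; this is not Aircall’s or Pathlight’s actual API:

```python
# Hypothetical sketch of the integration pattern described above: audio never
# leaves the phone system permanently; the analysis service is handed a
# short-lived, scoped URL per recording, processes it, and moves on.
# All names and URLs are illustrative stand-ins, not real APIs.

from dataclasses import dataclass

@dataclass
class Recording:
    call_id: str
    audio_url: str  # signed, time-limited URL issued by the telephony side

def issue_scoped_url(call_id: str, ttl_seconds: int = 900) -> Recording:
    """Stand-in for the telephony API granting brief access to one recording."""
    return Recording(call_id, f"https://example.invalid/recordings/{call_id}?ttl={ttl_seconds}")

def analyze(recording: Recording) -> dict:
    """Stand-in for the analysis step: transcribe, grade, summarize."""
    return {"call_id": recording.call_id, "sentiment": "positive", "resolved": True}

# Process each call through the scoped-access handoff.
results = [analyze(issue_scoped_url(cid)) for cid in ["call-001", "call-002"]]
print(len(results))  # 2
```

The design choice worth noting is the scoped, expiring access: the retailer keeps operating its existing phone and ticketing systems unchanged, and the vendor only ever touches the minimum data needed for the assessment.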
Were there any hurdles? Did you need to tag your data for better recognition, for example? “Admittedly to this day, we’re still tweaking it. So much of the value comes down to the prompts that we put into the AI model on the front end. For us, it was educating the model on what our business was. It’s not as straightforward as, ‘We sell wine.’ You’re going to hear about corkscrews. You’re going to hear about furniture. You’re going to hear about magazine articles. You’re going to hear about refunds.
“It took us a couple iterations with the support of Pathlight to say, ‘It’s just not getting our customer yet.’”
“The other area where we really had to train the model was around our procedures. Early on, we were finding the AI wasn’t able to tell us if the customer’s issue was resolved. That was because it didn’t really understand what ‘resolved’ meant to us. Was that a return? Was that a refund? Is that a credit? So, over time we continued to tweak the prompts, even to the point of helping the system understand a customer doesn’t have to leave the conversation happy if we have accomplished certain goals that protect the business, provide the customer a good experience. They could still be annoyed, but we could have still delivered on what our expectation was.”
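The prompt tuning Burke describes — spelling out what “resolved” means for this specific business so the model doesn’t guess — might look something like the sketch below. The wording, categories, and helper are illustrative assumptions, not Wine Enthusiast’s or Pathlight’s actual prompt:

```python
# Hedged example of business-specific prompt tuning: defining "resolved" for
# the model, including the point that a customer can leave annoyed and the
# call still count as resolved. Wording is illustrative, not the real prompt.

RESOLUTION_RUBRIC = """\
You are reviewing a customer service transcript for a wine-lifestyle retailer
(wine, corkscrews, glassware, wine cellars, furniture, magazines).

A conversation counts as RESOLVED when any of the following happened:
- a return, refund, or store credit was issued per policy, or
- a warranty part, repair, or replacement was arranged, or
- the rep answered the customer's question and confirmed understanding.

Important: the customer does not have to sound happy for the call to be
resolved. If the rep protected the business (no unwarranted full refund) and
delivered the agreed outcome, mark it RESOLVED even if the customer is annoyed.

Answer with exactly one word: RESOLVED or UNRESOLVED."""

def build_prompt(transcript: str) -> str:
    """Combine the standing rubric with one transcript to evaluate."""
    return f"{RESOLUTION_RUBRIC}\n\nTranscript:\n{transcript}"

print("RESOLVED" in build_prompt("Customer: my corkscrew broke..."))  # True
```

The key idea is that each iteration moves business judgment ("a credit per policy counts; a happy tone is not required") out of the model’s guesswork and into explicit instructions.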
“I think that was a learning process for us. We had an initial prompt we built, but it wasn’t until we started seeing the output that we realized we needed to tell it a little more about our business, a little more about our products, for it to really understand what we were looking for.”
Did you have to become an LLM prompt engineer? Who took on those roles? “It was really a partnership. It was largely Pathlight. We had a great team we were working with and we’d get on the phone once a week and look at the data coming in. For a while we did double the work. What I would do is have Pathlight’s AI evaluate the conversation and then I’d separately have my managers do it in isolation. Then we’d compare the results to see not just what they both thought of the interaction, but what did they determine the root issue was? Was this a product return? Was this a product issue? How did they think the rep did?
“That really helped us identify areas where the prompt needed to be updated. We’d talk that through with Pathlight and they’d provide suggestions on how to tweak it or how to ask a question in a certain way to get more detail out of the AI, and that process has continued to work.”
What was the result of using the genAI from Pathlight? “A couple things came out of it, some expected and some not. We knew from our understanding of the product it was going to be able to tell us some of the basics. What did the customer call about? What was their sentiment?
“One unexpected finding for us was opportunities to double down on hospitality training. The AI was grading a higher number of agent conversations as negative than we were comfortable with. Our initial impression was that this had to be wrong, and we adjusted the prompt to make it a little more forgiving. But what we found was that when customers got very demanding, our reps didn’t necessarily have all the tools they needed to de-escalate, avoid conflict, to pivot. Or, more importantly, to know when to get themselves out and pass that conversation along to someone else. I think that was a huge finding for us.”
Did the AI surface anything unexpected? “We identified some product-related themes even before we went live, and this was one of the moments that totally sold me. We’d given Pathlight 150 recordings of totally random conversations, and they were just going to build us something to show us what their tool could do at scale. It just so happened within those 150 conversations they identified four or five customers with similar complaints about a product that ended up being a manufacturing defect that was causing rusting on electrical components. We really lucked out, because we’re not usually able to see something like that this clearly and identify we have a manufacturing issue with this specific set of products…
“We were able to trace it all the way back to the factory. We identified the serial numbers. It ended up being a small subset of a specific [wine] cellar we made. And we addressed it. We also made it right with those customers. That was one of those moments when I thought, ‘I don’t know how we ever would have found that insight out.’ That’s not something we were looking for. It’s never happened to one of our products before. It’s not part of any checklist. It wasn’t even part of our evaluation process. We were looking at the rep’s performance, not the products. This was cool.”
Were you able to calculate an ROI? “For many others I’ve spoken with, that’s been the most elusive part of rolling out AI. This is where I continue to struggle. The direct answer is, no. I know it’s delivering value for our business and I can demonstrate that by what my team’s doing on a daily basis. We’ve been able to directly map the investments we’ve made in hospitality training for [customer service reps] because we know those conversations are happening the way they’re supposed to. We’ve been able to assess and confirm procedural compliance, and we know our agents are using the tools we’ve given them to make the right business decisions. But I don’t know that we’ve ever been able to put a specific number on it to be able to say this drove this percent of ROI.
“It’s been a validation tool more than anything. It allows us to assess what’s going on. It has given us the ability to be more nimble. Perfect example: we ran a promotion with a sale a couple weeks ago. We ran a test to see how people would respond to a shipping-related discount. It was not something we’d ever done before. The marketing team asked if we could figure out what customers were saying.
“So, we put in a prompt for the system to look at conversations over the past two days: who’s talking about this shipping discount, and what are they saying? We were very [quickly] able to pull out an analysis of it. Alternatively, I can’t think of another good way of doing that, short of saying to the team, ‘Listen to every call and have every rep make a tally every time a customer mentions that.’ And a lot of that would have to be planned ahead of time. There’s almost no way to go back afterward and say, ‘Hey, how did that promotion go?’”
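The ad-hoc query Burke describes — scan the last two days of conversations for mentions of the shipping discount — reduces to filtering stored transcripts by date and topic. The sketch below uses a literal keyword match on made-up transcripts for clarity; in the actual workflow an LLM prompt does the semantic matching instead:

```python
# Sketch of the retrospective promotion query: pull transcripts from the past
# two days that mention a shipping discount. Transcripts are invented, and a
# keyword match stands in for the LLM's semantic matching.

from datetime import datetime

transcripts = [
    {"ts": datetime(2024, 2, 1, 10), "text": "Does the free shipping discount apply to cellars?"},
    {"ts": datetime(2024, 2, 1, 15), "text": "I'd like to return a corkscrew."},
    {"ts": datetime(2024, 2, 2, 9),  "text": "The shipping discount didn't show at checkout."},
]

def mentions_of(keyword: str, since: datetime) -> list[str]:
    """Transcripts from `since` onward that mention the keyword."""
    return [t["text"] for t in transcripts
            if t["ts"] >= since and keyword.lower() in t["text"].lower()]

cutoff = datetime(2024, 2, 1)  # "past two days" relative to the promotion
hits = mentions_of("shipping discount", cutoff)
print(len(hits))  # 2
```

The point of the transcript archive is exactly what Burke notes: the question can be asked *after* the promotion runs, with no tally sheet planned in advance.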
Is there a next step in evolving your AI strategy? “What we have done thus far has largely been retrospective. Let’s look back on how the team performed over the past six months. Let’s look at how reps are improving on certain metrics. Let’s look at how they’re reacting to products. I want to get to a more proactive state, like with the corrosion issue. I want to find more of those. Show me the thing that maybe a human wouldn’t even put together that we can go out and action off of and maybe fix something before it’s really a problem.
“There are all these examples of times when we present an issue and ask how many examples of it do we have? Well, maybe one or two. Well, then is it a coincidence? Do we invest resources in fixing it? I think if we can move to a place where we’re prompting the model not just to look back on what it’s done but to say, ‘You know our business well enough now. Tell us if something seems odd to you.’”