
Voice Enabled RAG-Based AI Agent for Paychex Product Recommendations

Goergen Institute
Paychex
Capstone Team
Baichuan Duan  ·  Lexiang Yang  ·  Shixin Lin  ·  Zhengyang Zhu
University of Rochester × Paychex  ·  April 2026
A Capstone Project · Spring 2026

Listening Between the Lines of a Sales Call

How a team of graduate students built an AI that hears what clients really mean — and surfaces the right Paychex product at the right moment.

This capstone project, sponsored by Paychex, Inc., developed a voice-enabled AI recommendation agent that listens to real-time sales conversations and suggests relevant Paychex products to sales representatives. The system combines real-time speech transcription, pain-point detection, and retrieval-augmented generation (RAG) to deliver timely, grounded product recommendations during live customer interactions.

187 Products  ·  11 Categories  ·  87 Pages  ·  <200ms Latency

Background

Paychex is a leading provider of human capital management solutions in the United States, offering payroll and HR services to nearly 800,000 business clients. Every day, Paychex handles hundreds of thousands of sales and service interactions, each of which represents an opportunity to recommend relevant products. However, sales representatives often need to multitask during live calls, making it easy to miss high-value recommendation opportunities. Our project addresses this challenge by building a real-time AI copilot that listens to the call and suggests relevant products to the representative.

Every conversation is a door that opens only once. Our system makes sure no one walks past it without noticing.

— Project Philosophy

Learn More

Learn more about Paychex, the sponsor of this capstone project, and the Goergen Institute for Data Science, where the project was conducted.

Timeline

February 2026 · Foundation: Defined project scope, built the initial data pipeline, and validated the vector knowledge base.
March 2026 · Text-Based Agent: Developed the RAG pipeline with intent detection and grounded product recommendations.
April 2026 · Voice Integration: Integrated real-time speech transcription with pain-point detection and confidence-based triggering.
Late April 2026 · Delivery: Finalized the system, documentation, and handoff to the Paychex Data Science team.

Data

Our knowledge base draws from two complementary sources. A Python crawler harvested 87 product pages from Paychex.com, which we cleaned with regex filters to strip navigation, footers, and marketing noise. In parallel, the Paychex team provided an authoritative catalog of 187 products across 11 categories.
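As a rough illustration of the regex cleaning step, the sketch below drops boilerplate lines from scraped page text and collapses whitespace. The patterns and the `clean_page_text` name are assumptions made for this example; the project's actual filters are not listed on this page.

```python
import re

# Hypothetical boilerplate patterns, chosen only for illustration.
BOILERPLATE_PATTERNS = [
    re.compile(r"^(skip to content|log ?in|sign up|contact sales)$", re.IGNORECASE),
    re.compile(r"^©.*$"),                                   # copyright footers
    re.compile(r"^(privacy|terms|sitemap)\b.*$", re.IGNORECASE),
]

def clean_page_text(raw: str) -> str:
    """Drop boilerplate lines and collapse runs of whitespace."""
    kept = []
    for line in raw.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if not line:
            continue  # skip blank lines
        if any(p.match(line) for p in BOILERPLATE_PATTERNS):
            continue  # skip navigation, footer, and legal lines
        kept.append(line)
    return "\n".join(kept)

raw = (
    "Skip to content\n\n  Payroll   for small business \n"
    "© 2026 Example Inc.\nPrivacy Policy\nRun payroll in minutes."
)
cleaned = clean_page_text(raw)
```

Line-level filters like these are blunt: a real product sentence that happens to start with "Terms" would also be dropped, so any production pattern list would need checking against the crawled pages.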

System Pipeline

The system is built on a two-tier streaming architecture, inspired by production sales copilots like Gong and Chorus. Product data from the Paychex website is scraped, cleaned, chunked, and stored in a ChromaDB vector database. During live calls, dual-channel audio capture separates the customer and representative voices and sends each stream to Voxtral for real-time transcription.

A Fast Path scores every customer utterance in under 100ms using lexical keywords and embedding-based semantic similarity, maintaining a rolling pain-point state that accumulates weak signals over time. When the rolling state crosses a threshold, a Slow Path runs asynchronously in the background: an LLM verifies the context and, if confidence is high enough, GPT-4o-mini generates a grounded product recommendation via the RAG pipeline. If confidence is only moderate, the system surfaces a clarifying question instead.
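The Fast Path idea can be sketched in a few lines: blend a keyword score with an embedding similarity for each utterance, then fold the result into a decaying rolling state that fires once enough weak signals pile up. Everything specific here is an assumption for illustration: the keyword weights, the `alpha` blend, the decay and threshold values, and the toy 2-D vectors standing in for real text-embedding-3-small embeddings.

```python
import math
import re
from dataclasses import dataclass, field

# Hypothetical keyword weights; the project's real lexicon is not shown here.
KEYWORDS = {
    "payroll_errors": {"payroll": 0.3, "mistake": 0.4, "late": 0.2},
    "hiring": {"hiring": 0.4, "recruit": 0.4, "onboarding": 0.3},
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def score_utterance(text, utterance_vec, prototypes, alpha=0.5):
    """Blend keyword hits (lexical) with embedding similarity (semantic)."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    scores = {}
    for pain, weights in KEYWORDS.items():
        lexical = min(1.0, sum(w for kw, w in weights.items() if kw in tokens))
        semantic = max(0.0, cosine(utterance_vec, prototypes[pain]))
        scores[pain] = alpha * lexical + (1 - alpha) * semantic
    return scores

@dataclass
class RollingPainState:
    decay: float = 0.8       # assumed: share of old evidence kept per utterance
    threshold: float = 1.0   # assumed: level that triggers the Slow Path
    scores: dict = field(default_factory=dict)

    def update(self, utterance_scores):
        """Decay old evidence, add new scores, return pain points at or over threshold."""
        for pain in set(self.scores) | set(utterance_scores):
            old = self.scores.get(pain, 0.0) * self.decay
            self.scores[pain] = old + utterance_scores.get(pain, 0.0)
        return [p for p, s in self.scores.items() if s >= self.threshold]

prototypes = {"payroll_errors": [1.0, 0.0], "hiring": [0.0, 1.0]}  # toy vectors
state = RollingPainState()
first = state.update(
    score_utterance("we had a payroll mistake last month", [1.0, 0.0], prototypes))
second = state.update(
    score_utterance("checks went out late again", [0.8, 0.6], prototypes))
```

In this toy run neither utterance crosses the threshold alone, but the second one pushes the accumulated payroll score over it, which is the "weak signals over time" behavior described above.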

Fast Path · runs on every utterance · < 100ms
  01 · Dual-Channel Capture
  02 · Speaker-Labeled Transcript
  03 · Rolling Pain Scoring
Slow Path · triggered only on threshold cross · async
  04 · LLM Verifier
  05 · Vector Retrieval
  06 · Recommendation Card
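The confidence gate at the end of the Slow Path could look roughly like the sketch below: a high verifier confidence yields a recommendation, a moderate one yields a clarifying question. The two cut-off values, the `Card` type, and the choice to show nothing at low confidence are assumptions for this example. The generators are passed as callables so that the more expensive recommendation call only happens on the branch that needs it.

```python
from dataclasses import dataclass
from typing import Callable, Optional

HIGH_CONFIDENCE = 0.75      # assumed cut-off for showing a recommendation
MODERATE_CONFIDENCE = 0.45  # assumed cut-off for asking a clarifying question

@dataclass
class Card:
    kind: str   # "recommendation" or "clarifying_question"
    text: str

def gate_output(confidence: float,
                make_recommendation: Callable[[], str],
                make_question: Callable[[], str]) -> Optional[Card]:
    """Pick what to surface; a generator is only called for the chosen branch."""
    if confidence >= HIGH_CONFIDENCE:
        return Card("recommendation", make_recommendation())
    if confidence >= MODERATE_CONFIDENCE:
        return Card("clarifying_question", make_question())
    return None  # assumed: low confidence surfaces nothing to the rep
```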

Key Features

  • Dual-channel audio capture for native speaker separation
  • Two-tier architecture: Fast Path scoring + Slow Path reasoning
  • Rolling pain-point state that accumulates weak signals over time
  • Lexical keywords combined with embedding-based semantic scoring
  • Confidence-gated output with clarifying questions as fallback
  • Non-blocking async pipeline with cooldown and dedup logic
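One simple way to realize the cooldown and dedup logic from the list above is a small gate that refuses a product already shown on the call and enforces a minimum gap between cards. This is a sketch under assumptions: the 30-second cooldown, the class name, and the product IDs are hypothetical, and the timestamp is passed in explicitly to keep the example deterministic.

```python
class SuggestionThrottle:
    """Suppress repeated or too-frequent recommendation cards."""

    def __init__(self, cooldown_s: float = 30.0):  # assumed cooldown length
        self.cooldown_s = cooldown_s
        self._last_shown_at = None
        self._shown_products = set()

    def allow(self, product_id: str, now: float) -> bool:
        # Dedup: never show the same product twice in one call.
        if product_id in self._shown_products:
            return False
        # Cooldown: keep a minimum gap between any two cards.
        if (self._last_shown_at is not None
                and now - self._last_shown_at < self.cooldown_s):
            return False
        self._shown_products.add(product_id)
        self._last_shown_at = now
        return True

throttle = SuggestionThrottle()
decisions = [
    throttle.allow("product_a", now=0.0),    # first card is shown
    throttle.allow("product_a", now=100.0),  # duplicate is suppressed
    throttle.allow("product_b", now=10.0),   # still inside the cooldown
    throttle.allow("product_b", now=31.0),   # cooldown has elapsed
]
```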

What the System Does

I. Dual-Channel Audio Capture for Native Speaker Separation
II. Two-Tier Architecture: Fast Path Scoring + Slow Path Reasoning
III. Rolling Pain State with Time-Decaying Signals
IV. Lexical and Embedding-Based Semantic Scoring
V. Confidence-Gated Recommendations with Clarifying Fallback
VI. Non-Blocking Async Pipeline with Cooldown and Dedup
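The non-blocking behavior can be illustrated with `asyncio`: the Fast Path loop schedules Slow Path work as a background task and moves on to the next utterance without waiting for it. This is a minimal sketch, not the project's pipeline: `slow_path` and `run_call` are hypothetical names, the sleep stands in for LLM verification and retrieval, and a plain substring check stands in for the rolling-state trigger.

```python
import asyncio

async def slow_path(pain_point, results):
    """Stand-in for LLM verification, retrieval, and card generation."""
    await asyncio.sleep(0.01)
    results.append(f"card:{pain_point}")

async def run_call(utterances, trigger_word):
    results, processed, tasks = [], [], []
    for text in utterances:
        processed.append(text)  # Fast Path work happens inline
        if trigger_word in text:
            # Schedule the Slow Path; do not await it here.
            tasks.append(asyncio.create_task(slow_path(trigger_word, results)))
    # Every utterance was handled before any Slow Path task finished.
    pending_after_loop = sum(not t.done() for t in tasks)
    await asyncio.gather(*tasks)
    return processed, results, pending_after_loop

processed, results, pending_after_loop = asyncio.run(
    run_call(["hello", "payroll was late", "thanks, bye"], "payroll"))
```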


Tech Stack

  • Python
  • Web Scraping
  • OpenAI API (GPT-4o-mini, text-embedding-3-small)
  • Mistral Voxtral Transcribe v2 Realtime
  • LangChain
  • ChromaDB
Technical Stack

Built With

Language · Python · Core runtime and orchestration
Language Model · GPT-4o-mini · Intent analysis and recommendation
Speech Model · Mistral Voxtral · Real-time voice transcription
Embedding · text-embedding-3-small · Semantic vector representation
Framework · LangChain · Pipeline orchestration
Vector Store · ChromaDB · Semantic product retrieval
Data Processing · Pandas · Data cleaning and transformation
Version Control · Git & GitHub · Collaboration and deployment

Team Members

The Team Behind This Project
Zhengyang Zhu: Project Manager · Data Scientist · AI Engineer
Baichuan Duan: Data Scientist · AI Engineer
Lexiang Yang: Data Scientist · AI Engineer
Shixin Lin: Data Scientist · AI Engineer

Acknowledgements

We would like to thank our sponsors at Paychex — Daniel Card, Daniel Riggi, Jing Zhu, Ledion Lico, Michael Lyons, and Shubham Tamhane — for their continuous guidance and support throughout the project. We are also grateful to our instructors, Dr. Ajay Anand and Dr. Cantay Caliskan, for their mentorship.

Each conversation represents a meaningful opportunity to cross-sell and upsell relevant products.

— Paychex Project Brief

Resources

Project Repository

Paychex Sales Voice Agent

Explore the project source code, implementation details, and development materials in our GitHub repository.

View on GitHub