Listening Between the Lines of a Sales Call
How a team of graduate students built an AI that hears what clients really mean — and surfaces the right Paychex product at the right moment.
This capstone project, sponsored by Paychex, Inc., developed a voice-enabled AI recommendation agent that listens to real-time sales conversations and suggests relevant Paychex products to sales representatives. The system combines real-time speech transcription, pain-point detection, and retrieval-augmented generation (RAG) to deliver timely, grounded product recommendations during live customer interactions.
Background
Paychex is a leading provider of human capital management solutions in the United States, offering payroll and HR services to nearly 800,000 business clients. Every day, Paychex handles hundreds of thousands of sales and service interactions, each of which represents an opportunity to recommend relevant products. However, sales representatives often need to multitask during live calls, making it easy to miss high-value recommendation opportunities. Our project addresses this challenge by building a real-time AI copilot.
Every conversation is a door that opens only once. Our system makes sure no one walks past it without noticing.
— Project Philosophy
Learn More
Learn more about Paychex, the sponsor of this capstone project, and the Goergen Institute for Data Science, where the project was conducted.
Timeline
Data
Our knowledge base draws from two complementary sources. A Python crawler harvested 87 product pages from Paychex.com, which we cleaned with regex filters to strip navigation, footers, and marketing noise. In parallel, the Paychex team provided an authoritative catalog of 187 products across 11 categories.
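The regex cleaning step can be pictured with a minimal sketch like the one below. The patterns, the `BOILERPLATE_PATTERNS` list, and the `clean_page_text` helper are illustrative assumptions, not the project's actual filters, which are not shown here.

```python
import re

# Hypothetical boilerplate patterns; the real filters used on the crawled
# pages are not documented in this write-up.
BOILERPLATE_PATTERNS = [
    # Navigation labels that appear alone on a line.
    re.compile(r"^(home|products|pricing|login|sign in|contact us)$", re.IGNORECASE),
    # Footer copyright lines.
    re.compile(r"^©.*$"),
    # Short call-to-action lines, optionally followed by punctuation.
    re.compile(r"^(get started|request a quote|learn more)\W*$", re.IGNORECASE),
]


def clean_page_text(raw: str) -> str:
    """Drop boilerplate lines and collapse whitespace in scraped page text."""
    kept = []
    for line in raw.splitlines():
        # Collapse runs of whitespace and trim the ends.
        line = re.sub(r"\s+", " ", line).strip()
        if not line:
            continue
        # Skip any line that matches a boilerplate pattern.
        if any(pattern.match(line) for pattern in BOILERPLATE_PATTERNS):
            continue
        kept.append(line)
    return "\n".join(kept)
```

Line-level filtering like this is easy to audit: each pattern removes one recognizable kind of noise, so a dropped line can always be traced back to the rule that removed it.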
System Pipeline
The system is built on a two-tier streaming architecture, inspired by production sales copilots like Gong and Chorus. Product data from the Paychex website is scraped, cleaned, chunked, and stored in a ChromaDB vector database. During live calls, dual-channel audio capture separates the customer and representative voices and sends each stream to Voxtral for real-time transcription. A Fast Path scores every customer utterance in under 100 ms using lexical keywords and embedding-based semantic similarity, maintaining a rolling pain-point state that accumulates weak signals over time. When the rolling state crosses a threshold, a Slow Path runs asynchronously in the background: an LLM verifies the context and, if confidence is high enough, GPT-4o-mini generates a grounded product recommendation via the RAG pipeline. If confidence is only moderate, the system surfaces a clarifying question instead.
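To make the Fast Path idea concrete, here is a minimal sketch of how lexical and semantic scores might be blended and accumulated into a rolling state. Everything named here is an assumption for illustration: the `PainTopic` and `RollingPainState` classes, the equal weights, the exponential decay of 0.8, the threshold of 1.0, and the toy two-dimensional embeddings standing in for real embedding vectors.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class PainTopic:
    keywords: Tuple[str, ...]    # lexical cues for the topic (hypothetical)
    centroid: Tuple[float, ...]  # reference embedding for the topic (toy values)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_utterance(text: str, embedding: Sequence[float],
                    topics: Dict[str, PainTopic],
                    w_lex: float = 0.5, w_sem: float = 0.5) -> Dict[str, float]:
    """Blend a keyword hit (0 or 1) with clipped cosine similarity per topic."""
    lowered = text.lower()
    scores = {}
    for name, topic in topics.items():
        lexical = 1.0 if any(k in lowered for k in topic.keywords) else 0.0
        semantic = max(0.0, cosine(embedding, topic.centroid))
        scores[name] = w_lex * lexical + w_sem * semantic
    return scores


@dataclass
class RollingPainState:
    decay: float = 0.8       # assumed: how quickly old evidence fades
    threshold: float = 1.0   # assumed: level that would wake the Slow Path
    scores: Dict[str, float] = field(default_factory=dict)

    def update(self, utterance_scores: Dict[str, float]) -> List[str]:
        """Decay old evidence, add new evidence, return topics over threshold."""
        for name in set(self.scores) | set(utterance_scores):
            previous = self.scores.get(name, 0.0)
            self.scores[name] = self.decay * previous + utterance_scores.get(name, 0.0)
        return sorted(n for n, s in self.scores.items() if s >= self.threshold)


# Two weak signals in a row: neither crosses the threshold alone.
TOPICS = {"payroll_errors": PainTopic(("payroll", "paycheck"), (1.0, 0.0))}
state = RollingPainState()
first = state.update(score_utterance("Our paychecks were late again", (0.8, 0.6), TOPICS))
second = state.update(score_utterance("It takes forever to fix", (0.6, 0.8), TOPICS))
```

In this toy run the first utterance scores about 0.9 and triggers nothing, while the second adds about 0.3 to the decayed state and pushes `payroll_errors` over the threshold. That is the sense in which weak signals accumulate over time.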
Key Features
- Dual-channel audio capture for native speaker separation
- Two-tier architecture: Fast Path scoring + Slow Path reasoning
- Rolling pain-point state that accumulates weak signals over time
- Lexical keywords combined with embedding-based semantic scoring
- Confidence-gated output with clarifying questions as fallback
- Non-blocking async pipeline with cooldown and dedup logic
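The confidence gating, cooldown, and dedup features listed above could interact roughly as in the following sketch. The `OutputGate` class, its thresholds (0.8 and 0.5), and the 30-second cooldown are hypothetical values chosen for illustration, and the asynchronous scheduling itself is left out.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class OutputGate:
    high: float = 0.8         # assumed: at or above this, recommend a product
    moderate: float = 0.5     # assumed: at or above this, ask a clarifying question
    cooldown_s: float = 30.0  # assumed: minimum gap between surfaced outputs
    last_emit: float = field(default=float("-inf"), init=False)
    seen: Set[str] = field(default_factory=set, init=False)

    def decide(self, product: str, confidence: float,
               now: Optional[float] = None) -> str:
        """Return 'recommend', 'clarify', or 'silent' for one Slow Path result."""
        now = time.monotonic() if now is None else now
        if confidence < self.moderate:
            return "silent"              # too uncertain to interrupt the rep
        if now - self.last_emit < self.cooldown_s:
            return "silent"              # still cooling down from the last output
        if confidence >= self.high:
            if product in self.seen:
                return "silent"          # dedup: already recommended this product
            self.seen.add(product)
            self.last_emit = now
            return "recommend"
        self.last_emit = now
        return "clarify"                 # moderate confidence: ask, don't assert
```

Passing `now` explicitly keeps the gate deterministic under test. In a live pipeline the default monotonic clock would be used instead.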
What the System Does
DEMO
Tech Stack
- Python
- Web Scraping
- OpenAI API (GPT-4o-mini, text-embedding-3-small)
- Mistral Voxtral Transcribe v2 Realtime
- LangChain
- ChromaDB
Built With
Team Members
Acknowledgements
We would like to thank our sponsors at Paychex — Daniel Card, Daniel Riggi, Jing Zhu, Ledion Lico, Michael Lyons, and Shubham Tamhane — for their continuous guidance and support throughout the project. We are also grateful to our instructors, Dr. Ajay Anand and Dr. Cantay Caliskan, for their mentorship.
Each conversation represents a meaningful opportunity to cross-sell and upsell relevant products.
— Paychex Project Brief
Resources
Paychex Sales Voice Agent
Explore the project source code, implementation details, and development materials in our GitHub repository.