Unlocking Customer Insights at Scale: Batch Inference with LLMs on Databricks


Analyze multilingual survey data at scale — translated, summarized, and classified using AI — all within Databricks.

Image by ChatGPT-4o

Why This Matters

Modern organizations constantly collect feedback through surveys, support tickets, and product reviews. However, most of this feedback is unstructured, diverse in language, and difficult to process manually. Trying to understand it all without automation is labor-intensive and limits how fast teams can respond to customer needs.

Batch inference offers a scalable solution to this problem. By combining Databricks’ AI platform capabilities with generative AI models, we can automate the translation, classification, and summarization of customer feedback. The outcome: faster insights and data-driven decisions.

This isn’t just about improving efficiency — it’s about giving cross-functional teams access to insights they can trust. With batch inference, product managers, analysts, and support teams don’t need to wait for data science teams to tag survey responses. The insights are continuously updated as new data arrives.

In this post, we’ll walk through a complete example of how to:

  • Load customer survey data from CSV files using Auto Loader
  • Use ai_translate() to convert feedback into English
  • Apply ai_query() with Llama 3 to identify sentiment, summarize responses, and detect key topics
  • Build a real-time dashboard for easy exploration

This project showcases how batch inference can bring value to both technical and business teams by automating insight generation from qualitative feedback.

What Is Batch Inference with LLMs on Databricks?

Batch inference is the process of applying machine learning models — like LLMs — to large datasets all at once, rather than in real time. It’s useful when you have a backlog of data or routinely process data in batches, such as weekly survey exports or monthly customer reviews.

On Databricks, you can use SQL-native functions like ai_query() and ai_translate() to apply these models directly inside your workflows — no external APIs or custom ML infrastructure required.
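
To make that concrete, here is the general shape of a call. This is a minimal sketch: the table product_reviews, its column review_text, and the endpoint name databricks-meta-llama-3-3-70b-instruct are placeholders, not names from this project.

    -- Translate a text column and summarize it, entirely in SQL
    SELECT
      review_text,
      ai_translate(review_text, 'en') AS review_en,
      ai_query(
        'databricks-meta-llama-3-3-70b-instruct',  -- any served foundation model endpoint
        CONCAT('Summarize this review in one sentence: ', review_text)
      ) AS summary
    FROM product_reviews;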

Key Benefits:

  • Scalable: Works with billions of rows thanks to serverless compute
  • Efficient: Process new data incrementally with Auto Loader
  • Secure and governed: Integrated with Unity Catalog and Delta Lake
  • Accessible: Use SQL and Python, no ML engineering required

Setting the Stage: The Survey

To demonstrate this workflow, we built a sample survey using Google Forms. It includes five questions:

  1. How satisfied are you with our service?
  2. What did you like most?
  3. What could we improve?
  4. How likely are you to recommend us to a friend or colleague?
  5. Any additional comments?

Image by Author

We generated responses in English, Spanish, Chinese, French, and Japanese to reflect real-world feedback diversity. You can download the CSVs here and follow along in your own workspace.

Step 1: Incremental Ingestion with Auto Loader

To start, we saved each batch of survey results as a CSV file and uploaded them into a Databricks Volume. Using Auto Loader, we ingested the data into a Delta table incrementally:
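
A minimal sketch of this step, assuming the CSVs land in a Unity Catalog Volume at the placeholder path /Volumes/main/survey/responses; the streaming table below runs Auto Loader under the hood:

    -- Each new CSV dropped into the Volume is ingested incrementally,
    -- and each file is processed exactly once
    CREATE OR REFRESH STREAMING TABLE raw_survey AS
    SELECT *
    FROM STREAM read_files(
      '/Volumes/main/survey/responses/',  -- placeholder Volume path
      format => 'csv',
      header => true
    );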

Auto Loader ensures that new files are processed exactly once, supports schema evolution, and works well for production pipelines. It’s ideal for survey systems that may evolve over time.

Before we move forward, let’s take a quick look at the raw dataset.

Image by Author

As shown in the image above, the original survey results are a mix of different languages, with inconsistent formats and open-ended responses. It’s messy and unstructured — exactly the type of data that’s difficult to summarize manually. Without AI, turning this into actionable insights would be extremely time-consuming and error-prone.

Step 2: Translate Responses with ai_translate()

Next, we applied ai_translate() to convert non-English feedback to English:
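
A minimal sketch of this step, assuming the open-ended answers live in columns named liked_most, improve, and comments (placeholder names):

    -- Keep the original text and add English translations alongside it
    CREATE OR REPLACE TABLE survey_translated AS
    SELECT
      *,
      ai_translate(liked_most, 'en') AS liked_most_en,
      ai_translate(improve,    'en') AS improve_en,
      ai_translate(comments,   'en') AS comments_en
    FROM raw_survey;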

This approach gives us access to both the original and translated versions, helping maintain transparency and traceability.

Step 3: Normalize and Prepare Data for AI

With the translated responses available, we built a view that uses English as the standard language for all downstream analyses:
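
A minimal sketch, reusing the placeholder column names from the previous step:

    -- Standardize on English, falling back to the original text
    -- when no separate translation is available
    CREATE OR REPLACE VIEW survey_english AS
    SELECT
      *,
      COALESCE(liked_most_en, liked_most) AS liked_most_std,
      COALESCE(improve_en,    improve)    AS improve_std,
      COALESCE(comments_en,   comments)   AS comments_std
    FROM survey_translated;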

By coalescing the original and translated columns, we ensure each record is complete and ready for inference.

Step 4: Generate Insights with ai_query() and Llama 3

Now for the most exciting part — extracting insights using LLMs. The ai_query() function allows us to embed natural language prompts inside SQL queries. For each customer response, we extract:

  • Sentiment (Positive / Neutral / Negative)
  • A one-sentence summary of the feedback
  • Topic classification (e.g., Support, UI, Shipping)
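
A minimal sketch of the enrichment query, continuing with the placeholder names above and assuming a served Llama 3 endpoint named databricks-meta-llama-3-3-70b-instruct (substitute whichever endpoint your workspace exposes):

    -- One ai_query() call per extracted field, each with a plain-text prompt
    CREATE OR REPLACE TABLE survey_enriched AS
    SELECT
      *,
      ai_query('databricks-meta-llama-3-3-70b-instruct',
        CONCAT('Classify the sentiment of this customer feedback as Positive, ',
               'Neutral, or Negative. Reply with the label only: ', comments_std)
      ) AS sentiment,
      ai_query('databricks-meta-llama-3-3-70b-instruct',
        CONCAT('Summarize this customer feedback in one sentence: ', comments_std)
      ) AS summary,
      ai_query('databricks-meta-llama-3-3-70b-instruct',
        CONCAT('Assign one short topic label (e.g., Support, UI, Shipping) to this ',
               'customer feedback. Reply with the label only: ', comments_std)
      ) AS topic
    FROM survey_english;

Keeping one prompt per field makes each output trivial to parse; a single prompt returning JSON, unpacked with from_json(), would reduce the number of model calls at the cost of stricter prompting and parsing.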

These insights can help product and support teams prioritize improvements based on real customer data.

Because ai_query() is built into Databricks SQL, you don’t need to host your own models or manage APIs. Everything runs on serverless infrastructure and scales automatically.

Step 5: Visualize Results in a Dashboard

The final step is making insights accessible to stakeholders. We can create a dashboard using the Databricks SQL editor:

  • Pie chart of sentiment (Positive, Neutral, Negative)
  • Bar chart of topics categorized by sentiment
  • Table of feedback summaries
  • Filters for topic and translation status
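
Each visualization is backed by an ordinary SQL query. For example, the sentiment pie chart can be driven by a simple aggregate over the enriched table from the sketches above:

    -- Response counts per sentiment label, charted as a pie in the SQL editor
    SELECT sentiment, COUNT(*) AS responses
    FROM survey_enriched
    GROUP BY sentiment
    ORDER BY responses DESC;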

Business users can explore trends over time, highlight areas for improvement, and identify customer pain points — without writing a single line of code.

The dashboard updates automatically when new survey files are ingested.

Why This Approach Works

This workflow provides a fast, scalable way to process and analyze unstructured survey feedback. With just a few SQL and Python commands, we:

  • Automated ingestion of multilingual survey data
  • Used AI functions to translate and enrich responses
  • Applied LLM-powered classification and summarization
  • Built a fully interactive dashboard

It’s a reusable solution that can be adapted for product reviews, support tickets, internal employee feedback, and more.

Try It Yourself

Want to build this for your team? Start with a CSV file of survey responses and use Auto Loader to ingest them. From there, apply ai_translate() and ai_query() to generate insights. You’ll be amazed at how much value is hidden in free-text feedback.

While this post focused on customer surveys, the same workflow applies to many real-world applications — such as product reviews, app store feedback, support tickets, and onboarding responses. Anywhere your team is dealing with open-ended text, this pattern can automate insight generation.

