Question: Alternative to MLeap for Real-Time Inference Without Spark Context with SparkXGBClassifier

This content originally appeared on DEV Community and was authored by himadri bhattacharjee

We are exploring alternatives to MLeap for running inference without Spark, since MLeap has limitations with Spark/PySpark version compatibility and library updates.

Our Setup & Goal

  • Environment: PySpark 3.5.5
  • Algorithm: Distributed training with XGBoost on Spark (SparkXGBClassifier).
  • Goal: Run real-time inference without requiring a Spark session/context, to reduce overhead and response latency.

What We Did

  1. Took a dataset (Titanic), converted it to Parquet, and split it into 80% (train) and 20% (test).
  2. Trained on the 80% split with Spark, including preprocessing and XGBoost.
  3. Evaluated on the 20% split with Spark and logged the trained model.
  4. Tried multiple logging/serialization approaches:
    • MLflow pyfunc
    • ONNX
    • XGBoost native model (JSON/binary)
  5. For inference: loaded the same 20% data, applied preprocessing outside Spark, reloaded the trained model, and ran predictions.
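One place where step 5 can drift from Spark is categorical encoding. If the Spark pipeline used a StringIndexer (an assumption; the post does not list its preprocessing stages), each category's index depends on the fitted label order, which by default is descending frequency with ties broken alphabetically. Below is a minimal sketch of reproducing that mapping outside Spark. The function names are hypothetical, and the safer input is the `labels` list exported from the fitted StringIndexerModel rather than a refit on different data.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Sequence


def mapping_from_labels(labels: Sequence[str]) -> Dict[str, float]:
    """Build the category -> index map from labels exported from Spark.

    `labels` is assumed to be the ordered list taken from the fitted
    StringIndexerModel, so position i is the index Spark assigned.
    """
    return {label: float(i) for i, label in enumerate(labels)}


def refit_label_order(values: Iterable[Optional[str]]) -> Dict[str, float]:
    """Approximate StringIndexer's default ordering on raw training values.

    Assumption: default ordering (descending frequency, ties alphabetical)
    and nulls excluded from the counts. Only matches Spark if `values` is
    exactly the column Spark was fitted on.
    """
    counts = Counter(v for v in values if v is not None)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return mapping_from_labels(ordered)


def encode(value: Optional[str], mapping: Dict[str, float]) -> float:
    """Encode one value; unseen or missing categories become NaN here.

    Returning NaN is a choice made for this sketch, not Spark's behavior.
    """
    return mapping.get(value, float("nan"))
```

Refitting the mapping on the 20% test split alone would generally produce a different order than the one learned on the 80% training split, which is one way identical models can score differently.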

The Problem

  • In all approaches tested (MLflow pyfunc, ONNX, XGBoost native save/load), accuracy differs between:
    • Spark-based evaluation (during training)
    • Non-Spark inference (real-time service)
  • It seems precision is lost when the model is saved and reloaded outside Spark.
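To narrow down where the mismatch comes from, it may help to compare per-row probabilities rather than aggregate accuracy. The sketch below assumes a binary classifier and two lists of positive-class probabilities for the same rows in the same order, one collected from Spark and one from the non-Spark service. The names `ParityReport` and `compare_predictions` are hypothetical, and the `> 0.5` decision rule is an assumption about how labels are derived.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ParityReport:
    n_rows: int
    label_mismatches: int      # rows where the predicted class differs
    max_abs_prob_diff: float   # largest probability gap across rows
    worst_row: int             # index of that row, -1 if all identical


def compare_predictions(
    spark_probs: Sequence[float],
    service_probs: Sequence[float],
    threshold: float = 0.5,
) -> ParityReport:
    """Compare positive-class probabilities row by row."""
    if len(spark_probs) != len(service_probs):
        raise ValueError("both inputs must cover the same rows in the same order")

    mismatches = 0
    max_diff = 0.0
    worst_row = -1
    for i, (p_spark, p_service) in enumerate(zip(spark_probs, service_probs)):
        diff = abs(p_spark - p_service)
        if diff > max_diff:
            max_diff, worst_row = diff, i
        # Assumed decision rule: class 1 only when probability exceeds threshold.
        if (p_spark > threshold) != (p_service > threshold):
            mismatches += 1

    return ParityReport(len(spark_probs), mismatches, max_diff, worst_row)
```

A very small maximum difference (float rounding) points toward the serialized model being fine, while large differences on specific rows suggest that the features fed to the model differ, for example in encoding, column order, or missing-value handling.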

Main Requirement

  • The accuracy from Spark-based evaluation and non-Spark inference must match.
  • We need a way to serialize and deserialize models that works across Spark training and non-Spark inference.
  • Prefer portable formats (JSON or similar).
  • Must avoid Spark context overhead at inference for real-time serving.
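For the portable-format requirement, one option that avoids MLeap is exporting the underlying XGBoost booster as JSON and scoring it with the plain `xgboost` package. The sketch below assumes `spark_model` is the fitted SparkXGBClassifierModel (for a PipelineModel, possibly its last stage), that features arrive as dense rows in the same column order used for training, and that missing values are NaN. The function names and file path are hypothetical.

```python
from typing import List, Sequence


def export_booster_json(spark_model, path: str) -> None:
    """Save the native booster held by a fitted SparkXGBClassifierModel.

    A path ending in .json makes XGBoost write its JSON model format.
    """
    spark_model.get_booster().save_model(path)


def load_and_score(path: str, rows: Sequence[Sequence[float]]) -> List[float]:
    """Score rows without a Spark session.

    Imports are local so a service can import this module even before
    numpy and xgboost are installed in every environment.
    """
    import numpy as np
    import xgboost as xgb

    booster = xgb.Booster()
    booster.load_model(path)

    # XGBoost stores features as float32; cast explicitly so the service
    # sees the same values the trees were built on. NaN marks missing
    # values (assumed to match the training configuration).
    features = np.asarray(rows, dtype=np.float32)
    dmatrix = xgb.DMatrix(features, missing=np.nan)

    # For a binary logistic objective this is the positive-class probability.
    return booster.predict(dmatrix).tolist()
```

This only carries the trees across. Any Spark preprocessing stages (indexers, imputers, assemblers) still have to be reproduced exactly on the serving side, which is a likely place for accuracy to diverge.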

Question

👉 Is there any solution or alternative to MLeap for serving models trained with Spark (e.g., XGBoost with PySpark), but performing inference outside of Spark (lightweight, real-time)?

  • Should support PySpark 3.5.5
  • Must work with XGBoost distributed training
  • Should prevent accuracy mismatch between Spark and non-Spark inference
  • JSON or portable serialization preferred

Any recommendations for frameworks, libraries, or best practices beyond MLeap would be greatly appreciated.


APA

himadri bhattacharjee | Sciencx (2025-08-21T05:55:00+00:00) Question: Alternative to MLeap for Real-Time Inference Without Spark Context with SparkXGBClassifier. Retrieved from https://www.scien.cx/2025/08/21/question-alternative-to-mleap-for-real-time-inference-without-spark-context-with-sparkxgbclassifier/
