Fast Estimation of Wasserstein Distances via Regression on Sliced Wasserstein Distances

Researchers have developed a regression-based method for fast, accurate estimation of Wasserstein distances between probability distributions. The approach trains linear models that use Sliced Wasserstein distances as predictive features, achieving high accuracy while drastically reducing computation time. It could accelerate applications in machine learning, computer vision, and single-cell biology, where Wasserstein metrics are fundamental.


New AI Research Enables Fast, Accurate Estimation of Complex Wasserstein Distances

Researchers have developed a method to rapidly estimate the computationally intensive Wasserstein distance across many pairs of probability distributions, a core metric in machine learning for comparing distributions by the geometry of their mass. The approach, detailed in a new paper (arXiv:2509.20508v2), uses a regression model trained on cheaper-to-compute Sliced Wasserstein (SW) distances to predict the true Wasserstein distance with high accuracy. The method could substantially speed up tasks in fields like computer vision, single-cell biology, and generative modeling, where these distances are fundamental.

Bridging the Gap Between Speed and Accuracy

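The core challenge the research addresses is the trade-off between computational speed and precision. The exact Wasserstein distance is a powerful measure of distributional similarity, but computing it requires solving an optimal transport problem whose cost grows roughly cubically with the number of support points, which becomes prohibitive when distances are needed for many pairs. Simpler surrogates like the Sliced Wasserstein distance, which averages closed-form one-dimensional Wasserstein distances over random projections, are fast but only approximate (and lower-bound) the true distance. The new method bridges this gap by using SW distances not as a replacement, but as predictive features.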
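A minimal sketch of the speed-versus-accuracy contrast, assuming two uniform point clouds of equal size and the squared Euclidean cost (W2). The exact distance uses SciPy's `linear_sum_assignment`, an assignment solver that stands in for a general optimal transport solver (valid here because, with equal uniform weights, an optimal coupling is a permutation). SW2 is a Monte Carlo estimate over random directions, where sorting solves each one-dimensional problem. The helper names `exact_w2` and `sliced_w2` are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def exact_w2(x, y):
    """Exact W2 between uniform clouds of equal size.

    With equal weights an optimal coupling is a permutation, so OT reduces to
    an assignment problem (roughly cubic in the number of points)."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    rows, cols = linear_sum_assignment(cost)
    return float(np.sqrt(cost[rows, cols].mean()))

def sliced_w2(x, y, n_proj=200, seed=0):
    """Monte Carlo Sliced W2: average the closed-form 1D W2^2 over random directions."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    px = np.sort(x @ theta.T, axis=0)  # (n, n_proj): sorting solves each 1D OT problem
    py = np.sort(y @ theta.T, axis=0)
    return float(np.sqrt(((px - py) ** 2).mean()))

rng = np.random.default_rng(1)
x = rng.normal(size=(128, 3))
y = rng.normal(loc=1.0, size=(128, 3))
sw, w = sliced_w2(x, y), exact_w2(x, y)
print(f"SW2 = {sw:.3f}  <=  W2 = {w:.3f}")
```

Because every projection onto a unit direction is 1-Lipschitz, each one-dimensional distance is at most W2, so this estimate never exceeds the exact value regardless of which directions are sampled.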

The model leverages both standard SW distances, which provide a theoretical lower bound, and lifted SW distances, which provide an upper bound. By regressing the true Wasserstein distance onto these bounds, the system learns a highly accurate predictive relationship. The team proposed two efficient linear models: an unconstrained version with a closed-form solution and a parsimonious constrained model using only half the parameters, both learnable from a surprisingly small number of example distribution pairs.
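The idea of regressing W onto a lower and an upper SW-type bound can be sketched as follows. Everything here is an assumption for illustration, not the paper's exact formulation: W2 with equal-size uniform clouds; SW2 as the lower-bound feature; as the "lifted" upper-bound feature, a projected-Wasserstein-style distance that pairs points by their rank along each direction and measures that pairing's cost in the full space; an unconstrained model `W ~ a*lower + b*upper` fit by least squares in closed form; and a hypothetical constrained model `W ~ alpha*lower + (1 - alpha)*upper` with one parameter instead of two. The training pairs are synthetic.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def exact_w2(x, y):
    """Exact W2 for equal-size uniform clouds (optimal plan is a permutation)."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    return float(np.sqrt(cost[rows, cols].mean()))

def sw_features(x, y, n_proj=100, seed=0):
    """Return (lower, upper): Sliced W2 and a lifted, projected-Wasserstein-style W2."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    xs = x[np.argsort(x @ theta.T, axis=0)]  # (n, L, d): points ordered by rank per direction
    ys = y[np.argsort(y @ theta.T, axis=0)]
    diff = xs - ys                               # rank-matched pairs
    gap = np.einsum("nld,ld->nl", diff, theta)   # gaps between sorted 1D projections
    lower = np.sqrt((gap ** 2).mean())           # SW2 <= W2
    upper = np.sqrt((diff ** 2).sum(-1).mean())  # cost of valid couplings >= W2
    return float(lower), float(upper)

# Synthetic training data: Gaussian clouds with random scale and shift.
rng = np.random.default_rng(0)

def random_cloud(n=64, d=2):
    return rng.normal(size=(n, d)) * rng.uniform(0.5, 2.0) + rng.uniform(-3, 3, size=d)

pairs = [(random_cloud(), random_cloud()) for _ in range(20)]
feats = np.array([sw_features(x, y) for x, y in pairs])  # (m, 2): columns lower, upper
w = np.array([exact_w2(x, y) for x, y in pairs])         # regression targets

# Unconstrained model W ~ a*lower + b*upper: ordinary least squares (closed form).
coef, *_ = np.linalg.lstsq(feats, w, rcond=None)

# Constrained model W ~ alpha*lower + (1 - alpha)*upper with alpha in [0, 1]:
# a one-parameter least-squares problem, solved in closed form and then clipped.
lo, up = feats[:, 0], feats[:, 1]
alpha = float(np.clip(np.dot(w - up, lo - up) / np.dot(lo - up, lo - up), 0.0, 1.0))

# Predict W2 for an unseen pair from its SW features alone.
x_new, y_new = random_cloud(), random_cloud()
l_new, u_new = sw_features(x_new, y_new)
pred_unconstrained = coef[0] * l_new + coef[1] * u_new
pred_constrained = alpha * l_new + (1 - alpha) * u_new
print(pred_unconstrained, pred_constrained, exact_w2(x_new, y_new))
```

The exact solver is only called on the training pairs; new pairs need only the two SW-type features. Clipping `alpha` keeps the constrained estimate between the two bounds, which is one reason a bound-aware parameterization can be sample-efficient.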

Empirical Validation Across Diverse Domains

The researchers tested their estimation framework across several domains. Empirical validation covered Gaussian mixtures, point-cloud classification, and Wasserstein-space visualization of 3D data. The model was evaluated on diverse datasets including MNIST point clouds, ShapeNetV2 3D objects, MERFISH Cell Niches, and scRNA-seq data, the latter two drawn from spatial transcriptomics and single-cell analysis.

Across all tests, the regression-based estimator consistently outperformed the current state-of-the-art approximation method, Wasserstein Wormhole, particularly in low-data regimes where sample efficiency matters most. Once trained, the model predicts the Wasserstein distance for a new pair of distributions by computing its SW distances and taking a linear combination of them, so no optimal transport problem has to be solved and the estimate is far cheaper than direct computation.
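To illustrate why prediction scales to many pairs, here is a sketch that estimates an all-pairs W2 matrix without calling any OT solver. Assumptions, not from the paper: equal-size uniform point clouds; SW2 as the lower-bound feature and a projected-Wasserstein-style lifted distance as the upper-bound feature; and a hypothetical constrained model `alpha*lower + (1 - alpha)*upper` with a placeholder `alpha`. Each cloud is sorted once along a shared set of directions, so every pair afterwards costs only elementwise arithmetic.

```python
import numpy as np

def sorted_along_directions(clouds, n_proj=64, seed=0):
    """Sort every cloud once along a shared set of random unit directions."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, clouds[0].shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Each entry is (n, n_proj, d): points ordered by their rank along each direction.
    ordered = [x[np.argsort(x @ theta.T, axis=0)] for x in clouds]
    return theta, ordered

def predicted_w2_matrix(clouds, alpha, n_proj=64, seed=0):
    """All-pairs W2 estimates from SW-type features; no OT problem is solved."""
    theta, ordered = sorted_along_directions(clouds, n_proj, seed)
    k = len(clouds)
    est = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            diff = ordered[i] - ordered[j]  # rank-matched pairs along each direction
            lower = np.sqrt((np.einsum("nld,ld->nl", diff, theta) ** 2).mean())  # SW2
            upper = np.sqrt((diff ** 2).sum(-1).mean())  # lifted, projected-W-style
            est[i, j] = est[j, i] = alpha * lower + (1 - alpha) * upper
    return est

# Hypothetical collection: 6 clouds of 100 points in 3D with different centers.
rng = np.random.default_rng(2)
clouds = [rng.normal(size=(100, 3)) + c for c in rng.uniform(-2, 2, size=(6, 3))]
D = predicted_w2_matrix(clouds, alpha=0.5)  # alpha=0.5 is a placeholder, not a fitted value
print(np.round(D, 2))
```

In practice `alpha` (or the two unconstrained coefficients) would come from a small set of pairs with exact distances; sharing directions across the whole collection keeps the features comparable from pair to pair.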

Accelerating Existing Frameworks with RG-Wormhole

The utility of the estimator extends beyond standalone use. The researchers demonstrated that it can be integrated into existing pipelines to accelerate them. By employing their fast regressor to guide the training process, they created an enhanced version of the Wasserstein Wormhole model, which they term RG-Wormhole (Regression-Guided Wormhole). This hybrid approach leverages the accuracy of the regression model to make the Wormhole training process more efficient, showcasing the method's versatility as a plug-in component for improving other optimal transport methodologies.

Why This Matters: Key Takeaways

  • Breaks the Speed-Accuracy Trade-off: The method provides a fast approximation of the Wasserstein distance, a linear model over cheap SW features, that retains much of the exact metric's accuracy, enabling its use in large-scale applications.
  • Enables New Scientific Analysis: By making Wasserstein distance calculations feasible for massive datasets in fields like computational biology (e.g., MERFISH, scRNA-seq) and 3D vision (e.g., ShapeNetV2), it opens doors to more nuanced analyses of complex data shapes.
  • Enhances Existing AI Models: RG-Wormhole demonstrates the estimator's value as a component for accelerating and improving other state-of-the-art machine learning models that rely on optimal transport.

Frequently Asked Questions