Weight-Space Linear Recurrent Neural Networks

WARP (Weight-space Adaptive Recurrent Prediction) is a novel recurrent neural network architecture that parameterizes its hidden state as the weights of an auxiliary neural network. This enables gradient-free adaptation and in-context learning, with a physics-informed variant outperforming the next best model by more than an order of magnitude. The model ranked in the top three on 4 of 6 challenging real-world datasets.

WARP: A Brain-Inspired AI Model Unifying Weight-Space Learning and Linear Recurrence

Researchers have introduced WARP (Weight-space Adaptive Recurrent Prediction), a novel sequence modeling framework that fundamentally rethinks the architecture of recurrent neural networks. By explicitly parameterizing its hidden state as the weights of an auxiliary neural network, WARP enables efficient, gradient-free adaptation and demonstrates superior performance on a range of challenging tasks, including a physics-informed variant that outperforms the next best model by more than an order of magnitude.

Redefining the Hidden State in Sequence Modeling

Conventional Recurrent Neural Networks (RNNs) process sequences by collapsing temporal information into a fixed-dimensional hidden state vector. In a significant departure, the WARP model parameterizes its entire hidden state as the weights and biases of a separate, auxiliary neural network. The recurrence is driven by input differences, creating a dynamic, brain-inspired system where the "memory" is a fully functional, adaptable sub-network.
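To make the idea concrete, here is a minimal sketch of that recurrence, assuming a linear update on the flattened weight vector driven by input differences. All names (A, B, aux_forward) and dimensions are illustrative placeholders rather than the paper's exact parameterization, and the recurrence matrices would be learned in the real model, not random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: inputs x_t, auxiliary MLP with one hidden layer.
d_in, d_hid, d_out = 4, 8, 1

# The hidden state is the flattened parameter vector of the auxiliary MLP:
# W1 (d_hid x d_in), b1 (d_hid), W2 (d_out x d_hid), b2 (d_out).
n_params = d_hid * d_in + d_hid + d_out * d_hid + d_out

# Recurrence matrices (random stand-ins here; learned in the real model).
A = np.eye(n_params) * 0.99                    # state transition
B = rng.normal(0, 0.1, size=(n_params, d_in))  # input projection

def unpack(theta):
    """Reshape the flat state vector into the auxiliary network's weights."""
    i = 0
    W1 = theta[i:i + d_hid * d_in].reshape(d_hid, d_in); i += d_hid * d_in
    b1 = theta[i:i + d_hid]; i += d_hid
    W2 = theta[i:i + d_out * d_hid].reshape(d_out, d_hid); i += d_out * d_hid
    b2 = theta[i:i + d_out]
    return W1, b1, W2, b2

def aux_forward(theta, x):
    """Query the auxiliary network whose weights ARE the recurrent state."""
    W1, b1, W2, b2 = unpack(theta)
    return W2 @ np.tanh(W1 @ x + b1) + b2

# Linear recurrence over weight-space, driven by input differences.
theta = rng.normal(0, 0.01, size=n_params)  # initial weight-state
x_prev = np.zeros(d_in)
for x_t in rng.normal(size=(10, d_in)):     # a toy input sequence
    theta = A @ theta + B @ (x_t - x_prev)  # update the sub-network's weights
    y_t = aux_forward(theta, x_t)           # predict with the adapted weights
    x_prev = x_t
```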

This architectural shift from a static vector to a malleable weight-space is the core innovation. It allows the auxiliary network's parameters to be updated efficiently at test time without backpropagation, enabling powerful in-context learning capabilities, as the continuation sketch below illustrates. Furthermore, the framework allows for the seamless integration of domain knowledge, such as physical laws, directly into the model's structure as priors.
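Continuing the sketch above, test-time adaptation falls out of the same update rule: feeding context observations through the recurrence rewrites the auxiliary network's weights with no loss function, no backward pass, and no optimizer state. The context data here is purely hypothetical.

```python
# In-context adaptation (sketch): new observations at test time update the
# auxiliary network's weights through the same linear recurrence --
# gradient-free by construction.
context = rng.normal(size=(5, d_in))        # hypothetical test-time context
for x_t in context:
    theta = A @ theta + B @ (x_t - x_prev)  # same update rule as in training
    x_prev = x_t

# theta now encodes the context; query the adapted sub-network directly.
y_query = aux_forward(theta, rng.normal(size=d_in))
```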

Empirical Performance and Generalization

Empirical validation across diverse benchmarks confirms WARP's potential. The model matches or surpasses state-of-the-art baselines on various classification tasks, ranking in the top three on 4 of 6 challenging real-world datasets. Its expressiveness and generalization were further demonstrated through extensive experiments in sequential image completion, multivariate time series forecasting, and dynamical system reconstruction.

The most striking result comes from a physics-informed variant of WARP. This version, which integrates specific physical priors into the auxiliary network's formulation, outperformed the next best model by more than 10x, highlighting the advantage of its adaptable, structured memory. Ablation studies confirmed the necessity of the model's key components, establishing weight-space linear RNNs as a compelling new paradigm.
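One plausible reading of "physical priors in the auxiliary network's formulation" is a known dynamics term that the weight-state network merely corrects. The sketch below (reusing aux_forward and theta from the earlier snippets) is an assumed illustration built around a toy damped oscillator, not the paper's actual formulation.

```python
def f_phys(s, omega=1.0, gamma=0.1):
    """Known physics prior: damped harmonic oscillator in (position, velocity)."""
    pos, vel = s
    return np.array([vel, -omega**2 * pos - gamma * vel])

def step(theta, s, dt=0.01):
    """One Euler step: the physics prior plus a learned residual correction."""
    # Hypothetical design choice: the sub-network corrects only the velocity.
    correction = aux_forward(theta, np.concatenate([s, s]))  # pad to d_in = 4
    ds = f_phys(s) + np.array([0.0, correction.item()])
    return s + dt * ds

# Roll out a trajectory, as in dynamical system reconstruction.
s = np.array([1.0, 0.0])
trajectory = [s]
for _ in range(100):
    s = step(theta, s)
    trajectory.append(s)
```

Because the prior already captures the dominant dynamics, the adaptable sub-network only has to model the residual, which is one way such a variant could achieve order-of-magnitude gains over unstructured baselines.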

Why This Matters: Key Takeaways

  • Architectural Innovation: WARP moves beyond fixed hidden states, using a full neural network's weights as a dynamic, adaptable memory system.
  • Efficient Adaptation: The model supports gradient-free, test-time adaptation of its auxiliary network, enabling strong in-context learning.
  • Prior Integration: The framework uniquely accommodates the integration of domain-specific knowledge, such as physics, leading to an order-of-magnitude performance gain in specialized applications.
  • Broad Competence: It demonstrates state-of-the-art or superior results across classification, forecasting, and reconstruction tasks, proving its generalizability.
