A Faster Path to Smarter AI: The New Anc-VI Method

Anc-VI is a new accelerated value iteration method that improves the convergence rate of reinforcement learning models, achieving faster Bellman error reduction compared to standard VI.


This content originally appeared on HackerNoon and was authored by Anchoring

:::info Authors:

(1) Jongmin Lee, Department of Mathematical Science, Seoul National University;

(2) Ernest K. Ryu, Department of Mathematical Science, Seoul National University and Interdisciplinary Program in Artificial Intelligence, Seoul National University.

:::

Abstract and 1 Introduction

1.1 Notations and preliminaries

1.2 Prior works

2 Anchored Value Iteration

2.1 Accelerated rate for Bellman consistency operator

2.2 Accelerated rate for Bellman optimality opera

3 Convergence when y=1

4 Complexity lower bound

5 Approximate Anchored Value Iteration

6 Gauss–Seidel Anchored Value Iteration

7 Conclusion, Acknowledgments and Disclosure of Funding and References

A Preliminaries

B Omitted proofs in Section 2

C Omitted proofs in Section 3

D Omitted proofs in Section 4

E Omitted proofs in Section 5

F Omitted proofs in Section 6

G Broader Impacts

H Limitations

Abstract

Value Iteration (VI) is foundational to the theory and practice of modern reinforcement learning, and it is known to converge at a O(γ k )-rate, where γ is the discount factor. Surprisingly, however, the optimal rate in terms of Bellman error for the VI setup was not known, and finding a general acceleration mechanism has been an open problem. In this paper, we present the first accelerated VI for both the Bellman consistency and optimality operators. Our method, called Anc-VI, is based on an anchoring mechanism (distinct from Nesterov’s acceleration), and it reduces the Bellman error faster than standard VI. In particular, Anc-VI exhibits a O(1/k)-rate for γ ≈ 1 or even γ = 1, while standard VI has rate O(1) for γ ≥ 1 − 1/k, where k is the iteration count. We also provide a complexity lower bound matching the upper bound up to a constant factor of 4, thereby establishing optimality of the accelerated rate of Anc-VI. Finally, we show that the anchoring mechanism provides the same benefit in the approximate VI and Gauss–Seidel VI setups as well.

1 Introduction

Value Iteration (VI) is foundational to the theory and practice of modern dynamic programming (DP) and reinforcement learning (RL). It is well known that when a discount factor γ < 1 is used, (exact) VI is a contractive iteration in the k·k∞-norm and therefore converges. The progress of VI is measured by the Bellman error in practice (as the distance to the fixed point is not computable), and much prior work has been dedicated to analyzing the rates of convergence of VI and its variants.

\

\ 37th Conference on Neural Information Processing Systems (NeurIPS 2023).

\

:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

\


This content originally appeared on HackerNoon and was authored by Anchoring


Print Share Comment Cite Upload Translate Updates
APA

Anchoring | Sciencx (2025-01-14T22:55:55+00:00) A Faster Path to Smarter AI: The New Anc-VI Method. Retrieved from https://www.scien.cx/2025/01/14/a-faster-path-to-smarter-ai-the-new-anc-vi-method/

MLA
" » A Faster Path to Smarter AI: The New Anc-VI Method." Anchoring | Sciencx - Tuesday January 14, 2025, https://www.scien.cx/2025/01/14/a-faster-path-to-smarter-ai-the-new-anc-vi-method/
HARVARD
Anchoring | Sciencx Tuesday January 14, 2025 » A Faster Path to Smarter AI: The New Anc-VI Method., viewed ,<https://www.scien.cx/2025/01/14/a-faster-path-to-smarter-ai-the-new-anc-vi-method/>
VANCOUVER
Anchoring | Sciencx - » A Faster Path to Smarter AI: The New Anc-VI Method. [Internet]. [Accessed ]. Available from: https://www.scien.cx/2025/01/14/a-faster-path-to-smarter-ai-the-new-anc-vi-method/
CHICAGO
" » A Faster Path to Smarter AI: The New Anc-VI Method." Anchoring | Sciencx - Accessed . https://www.scien.cx/2025/01/14/a-faster-path-to-smarter-ai-the-new-anc-vi-method/
IEEE
" » A Faster Path to Smarter AI: The New Anc-VI Method." Anchoring | Sciencx [Online]. Available: https://www.scien.cx/2025/01/14/a-faster-path-to-smarter-ai-the-new-anc-vi-method/. [Accessed: ]
rf:citation
» A Faster Path to Smarter AI: The New Anc-VI Method | Anchoring | Sciencx | https://www.scien.cx/2025/01/14/a-faster-path-to-smarter-ai-the-new-anc-vi-method/ |

Please log in to upload a file.




There are no updates yet.
Click the Upload button above to add an update.

You must be logged in to translate posts. Please log in or register.