This content originally appeared on DEV Community and was authored by polar3130
Organizations adopting LLMs at scale often struggle with fragmented API usage, inconsistent authentication methods, and lack of visibility across teams. Tools like Gemini CLI make local development easier, but they also introduce governance challenges—especially when authentication silently bypasses centralized gateways.
In this article, I walk through how to route Gemini CLI traffic through LiteLLM Proxy, explain why this configuration matters for enterprise environments, and highlight key operational considerations learned from hands-on testing.
Why Use a Proxy for Gemini CLI?
Before diving into configuration, it’s worth clarifying why an LLM gateway is needed in the first place.
Problems with direct Gemini CLI usage
If developers run Gemini CLI with default settings:
- Authentication may fall back to Google Account login → usage disappears from organizational audits
- API traffic may hit multiple GCP projects/regions → inconsistent cost attribution
- Personal API keys or user identities may be used → security and compliance risks
- Team-wide visibility into token usage becomes impossible → cost governance cannot scale
LiteLLM Proxy as a solution
LiteLLM Proxy provides:
- A unified OpenAI-compatible API endpoint
- Virtual API keys with per-user / per-project scoping
- Rate, budget, and quota enforcement
- Centralized monitoring & analytics
- Governance applied regardless of client tool (CLI, IDE, scripts)
This makes it suitable for organizations where 50–300+ developers may use Gemini, GPT, Claude, or Llama models across multiple teams.
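To make the "unified OpenAI-compatible endpoint" point concrete, here is a minimal sketch of a client request once the proxy is up. The proxy URL and virtual key are placeholders, and the model name must match an entry in the proxy's `model_list` (shown later in this article):

```bash
# Any OpenAI-compatible client can target the proxy with a virtual key.
curl -s https://<LiteLLM Proxy URL>/v1/chat/completions \
  -H "Authorization: Bearer <virtual key>" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gemini-2.5-flash",
        "messages": [{"role": "user", "content": "Hello from behind the proxy"}]
      }'
```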
Architecture Overview
For this walkthrough, I deployed LiteLLM Proxy onto Cloud Run, using Cloud SQL for metadata storage.
Why this design?
- Cloud Run scales automatically and supports secure invocations.
- Cloud SQL stores key usage, analytics, and configuration.
- Vertex AI IAM is handled via the LiteLLM Proxy’s service account.
- API visibility is centralized and independent of client behavior.
Caveats
- Cloud SQL connection limits must be considered when scaling Cloud Run.
- Cold starts may slightly increase latency for short-lived CLI invocations.
- Multi-region routing is out of scope but may be required for HA.
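For reference, a Cloud Run deployment of the official LiteLLM image could look roughly like the sketch below. Treat it as an outline rather than a recipe: the service name, image tag, Cloud SQL instance, service account, and `DATABASE_URL` format are placeholders, and the config file from the next section is assumed to be baked into the image (or passed to the container's `--config` argument).

```bash
# Hedged deployment sketch -- all names and values are placeholders.
# LiteLLM reads its Postgres connection string from DATABASE_URL;
# in production, source it from Secret Manager instead of a plain env var.
# Port 4000 is LiteLLM's default container port.
gcloud run deploy litellm-proxy \
  --image ghcr.io/berriai/litellm:main-latest \
  --region us-central1 \
  --port 4000 \
  --no-allow-unauthenticated \
  --service-account litellm-proxy@<PROJECT>.iam.gserviceaccount.com \
  --add-cloudsql-instances <PROJECT>:us-central1:<CLOUD_SQL_INSTANCE> \
  --set-env-vars "DATABASE_URL=postgresql://<user>:<password>@/<db>?host=/cloudsql/<PROJECT>:us-central1:<CLOUD_SQL_INSTANCE>"
```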
Configuration: LiteLLM Proxy
Below is a minimal configuration enabling Gemini models via Vertex AI:
```yaml
model_list:
  - model_name: gemini-2.5-pro
    litellm_params:
      model: vertex_ai/gemini-2.5-pro
      vertex_project: os.environ/GOOGLE_CLOUD_PROJECT
      vertex_location: us-central1
  - model_name: gemini-2.5-flash
    litellm_params:
      model: vertex_ai/gemini-2.5-flash
      vertex_project: os.environ/GOOGLE_CLOUD_PROJECT
      vertex_location: us-central1

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  ui_username: admin
  ui_password: os.environ/LITELLM_UI_PASSWORD
```
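Before deploying, the same config can be smoke-tested locally. The `litellm` CLI ships with the `litellm[proxy]` extra; the port, file path, and dummy values below are illustrative, and the Vertex AI calls assume Application Default Credentials with access to your project:

```bash
# Local smoke test of the config above (values are illustrative).
pip install 'litellm[proxy]'
gcloud auth application-default login   # ADC for Vertex AI access

export GOOGLE_CLOUD_PROJECT=<your-project-id>
export LITELLM_MASTER_KEY=sk-local-test-key
export LITELLM_UI_PASSWORD=changeme

litellm --config ./litellm_config.yaml --port 4000
```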
Operational notes & recommendations
- Region selection: Vertex AI availability varies by location; `us-central1` is generally safest for new Gemini releases.
- Key management: store `LITELLM_MASTER_KEY` and the UI credentials in Secret Manager, not in plain environment variables.
- Production settings to consider: `num_retries`, `timeout`, `async_calls`, and request logging policies.
- Access control: use Cloud Run's invoker IAM or an API Gateway layer for stronger boundaries.
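As one way to act on the Secret Manager recommendation above, the sketch below stores the master key as a secret and wires it into the Cloud Run service at runtime; the secret name, service account, and service name are placeholders:

```bash
# Store the master key in Secret Manager (value piped via stdin here).
printf '%s' 'sk-<strong-random-value>' | \
  gcloud secrets create litellm-master-key --data-file=-

# Allow the proxy's service account to read the secret.
gcloud secrets add-iam-policy-binding litellm-master-key \
  --member serviceAccount:litellm-proxy@<PROJECT>.iam.gserviceaccount.com \
  --role roles/secretmanager.secretAccessor

# Expose it to the Cloud Run service as LITELLM_MASTER_KEY.
gcloud run services update litellm-proxy \
  --region us-central1 \
  --update-secrets LITELLM_MASTER_KEY=litellm-master-key:latest
```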
Virtual key issuance
```bash
curl -X POST https://<proxy>/key/generate \
  -H "Authorization: Bearer <master key>" \
  -H "Content-Type: application/json" \
  -d '{"models": ["gemini-2.5-pro","gemini-2.5-flash"], "duration":"30d"}'
```
This key will later be used by the Gemini CLI.
Configuration: Gemini CLI
Point the CLI to LiteLLM Proxy:
```bash
export GOOGLE_GEMINI_BASE_URL="https://<LiteLLM Proxy URL>"
export GEMINI_API_KEY="<virtual key>"
```
Important
GEMINI_API_KEY must be a LiteLLM virtual key, not a Google Cloud API key.
Gemini CLI now behaves as if it were talking to Vertex AI, but traffic flows through LiteLLM.
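Before relying on the CLI, it is worth confirming that the proxy accepts the virtual key at all. Because LiteLLM exposes an OpenAI-compatible surface, a model listing is a quick check (URL and key are placeholders; the response should be limited to the models the key is scoped to):

```bash
# Quick sanity check of the virtual key against the proxy itself.
curl -s https://<LiteLLM Proxy URL>/v1/models \
  -H "Authorization: Bearer <virtual key>"
```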
Testing the End-to-End Path
Once configured, run a simple test through Gemini CLI:
```
$ gemini hello
Loaded cached credentials.
Hello! I'm ready for your first command.
```
On the LiteLLM dashboard, you should see request logs, latency, and token usage.
Important Note: Authentication Bypass in Gemini CLI
During testing, I observed situations where:
- Gemini CLI worked normally
- but LiteLLM Proxy showed zero usage
Why it happens
Gemini CLI supports three authentication methods:
- Login with Google
- Use Gemini API Key
- Vertex AI
When a user signs in with Login with Google:
- The CLI uses Google OAuth credentials
- These credentials automatically route traffic directly to Vertex AI
- `GOOGLE_GEMINI_BASE_URL` is ignored
- LiteLLM Proxy is completely bypassed
If OAuth login is left enabled:
- Teams lose visibility of CLI usage
- Costs appear under personal or unintended projects
- Security review cannot track data flowing to Vertex AI
- API limits and budgets set on LiteLLM do not apply
This is the number one issue organizations should be aware of.
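One lightweight mitigation is a wrapper script that refuses to launch the CLI unless the proxy variables are set and no cached Google OAuth credentials are present. The cache path `~/.gemini/oauth_creds.json` is an assumption based on the CLI's defaults at the time of writing; confirm it against your installed version:

```bash
#!/usr/bin/env bash
# gemini-guarded: launch Gemini CLI only when proxy routing is in place.
# Assumes OAuth credentials are cached at ~/.gemini/oauth_creds.json;
# verify this path for your Gemini CLI version.
set -euo pipefail

: "${GOOGLE_GEMINI_BASE_URL:?Set GOOGLE_GEMINI_BASE_URL to the LiteLLM Proxy URL}"
: "${GEMINI_API_KEY:?Set GEMINI_API_KEY to a LiteLLM virtual key}"

if [[ -f "${HOME}/.gemini/oauth_creds.json" ]]; then
  echo "Cached Google OAuth credentials found; remove them so the CLI" >&2
  echo "cannot silently bypass the proxy: rm ~/.gemini/oauth_creds.json" >&2
  exit 1
fi

exec gemini "$@"
```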
Summary
In this article, we walked through how to route Gemini CLI traffic through LiteLLM Proxy and highlighted key lessons from testing.
Benefits
- Unifies API governance across CLI, IDE, and backend services
- Enables per-user quotas, budgets, and access scopes
- Provides analytics across all models and providers
- Gives SRE/PFE teams full visibility into LLM usage patterns
Limitations / Things to Consider
- Gemini CLI’s Google-auth login bypasses proxies unless explicitly disabled
- Cloud Run + Cloud SQL requires connection pooling considerations
- The proxy's model list must be updated when Vertex AI releases new model versions
- LiteLLM Enterprise features (SSO, RBAC, audit logging) may be necessary for large orgs