This content originally appeared on DEV Community and was authored by Chris
Hey devs 👋
I'm working solo on a project called Azura and I’d love blunt technical + product feedback before I go too deep.
TL;DR
- Local-first personal AI assistant (Windows / macOS / Linux)
- Runs 7B-class models locally on your own machine
- Optional cloud inference with 70B+ models (potentially up to ~120B if I can get a GPU cluster cheap enough)
- Cloud only sees temporary context for a given query, then it’s gone
- Goal: let AI work with highly personalized data while keeping your data on-device and making AI compute more sustainable by offloading work to the user’s hardware
Think of it as Signal, but for AI:
- private by default
- transparent about what leaves your device
- and actually usable as a daily “second brain”.
Problem I’m trying to solve
Most AI tools today:
- ship all your prompts and files to a remote server
- keep embeddings / logs indefinitely
- centralize all compute in big datacenters
That’s bad if you want to:
- use AI on sensitive data (legal docs, internal company info, personal notes)
- build a long-term memory of your life and work
- not rely 100% on someone else’s infrastructure for every tiny inference
On top of that, current AI usage is very cloud-heavy. Every small task hits a GPU in a datacenter, even when a smaller local model would be good enough.
Azura’s goal:
Let AI work deeply with your personal data while keeping that data on your device by default, and offload as much work as possible to the user’s hardware to make AI more sustainable.
Core concept
Azura has two main execution paths:
1. Local path (default)
- Desktop app (Win / macOS / Linux)
- Local backend (Rust / llama.cpp / vector DB)
- Uses a 7B model running on your machine
- Good for:
- day-to-day chat
- note-taking / journaling
- searching your own docs/files
- “second brain” queries that don’t need frontier-level reasoning
2. Cloud inference path (optional)
- When a query is too complex or heavy for the local 7B:
  - Azura builds a minimal context (chunks of docs, metadata, etc.)
  - Sends that context + query to a 70B+ model in the cloud (ideally up to ~120B later)
Data handling:
- Files / context are used only temporarily for that request
- Held in memory or short-lived storage just long enough to run the inference
- Then discarded – no long-term cloud memory of your life
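The local-vs-cloud decision above could be sketched as a simple routing heuristic. This is a minimal illustration, not Azura's actual logic: `Route`, `route_query`, and the ~3,000-token threshold are all invented for the sketch.

```rust
// Hypothetical router between the local 7B and the cloud 70B+ path.
// All names and thresholds here are illustrative assumptions.
#[derive(Debug, PartialEq)]
enum Route {
    Local, // 7B model on the user's machine
    Cloud, // 70B+ model, ephemeral context only
}

/// Rough token estimate: ~4 characters per token for English text.
fn estimate_tokens(text: &str) -> usize {
    text.chars().count() / 4 + 1
}

/// Decide where to run a query. Assumed policy: queries flagged as
/// complex, or with too much context for a small context window,
/// go to the cloud; everything else stays local by default.
fn route_query(query: &str, context_chars: usize, complex: bool) -> Route {
    let total_tokens = estimate_tokens(query) + context_chars / 4;
    // A 7B model with a ~4k context window leaves little headroom
    // for the answer, so route anything bigger to the larger model.
    if complex || total_tokens > 3_000 {
        Route::Cloud
    } else {
        Route::Local
    }
}

fn main() {
    assert_eq!(route_query("summarize my notes", 500, false), Route::Local);
    assert_eq!(route_query("prove this theorem", 0, true), Route::Cloud);
    println!("router ok");
}
```

In practice the "complex" flag would itself come from a cheap classifier or user toggle; the point is that the routing decision is explicit code the user can inspect, which is what makes the "transparent about what leaves your device" promise auditable.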
Context engine (high-level idea)
On top of “just call an LLM”, I’m experimenting with a structured context engine:
- Ingests: files, PDFs, notes, images
- Stores: embeddings + metadata (timestamps, tags, entities, locations)
- Builds: a lightweight relationship graph (people, projects, events, topics)
- Answers questions like:
  - “What did I do for project A in March?”
  - “Show me everything related to ‘Company A’ and ‘pricing’.”
  - “What did I wear at the gala in Tokyo?” (from ingested images)
Standard RAG is part of this, but the goal is an ongoing personal knowledge base that the LLM can query, not just a vector search API.
All of this long-term data lives on-device.
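As a toy illustration of the metadata filtering layer described above (the struct and field names are hypothetical, not Azura's real schema, and a real engine would combine this with embedding similarity):

```rust
// Minimal sketch of an on-device knowledge entry; schema is illustrative.
#[derive(Debug)]
struct Entry {
    text: String,
    tags: Vec<String>,     // e.g. project names, topics
    entities: Vec<String>, // people, companies, places
    timestamp: u64,        // unix seconds, for time-scoped queries
}

/// Answer a query like "everything related to 'Company A' and 'pricing'"
/// by requiring every term to match some tag or entity on the entry.
fn query<'a>(store: &'a [Entry], must_match: &[&str]) -> Vec<&'a Entry> {
    store
        .iter()
        .filter(|e| {
            must_match.iter().all(|m| {
                e.tags.iter().any(|t| t.as_str() == *m)
                    || e.entities.iter().any(|x| x.as_str() == *m)
            })
        })
        .collect()
}

fn main() {
    let store = vec![
        Entry {
            text: "Call with Company A about pricing tiers".into(),
            tags: vec!["pricing".into()],
            entities: vec!["Company A".into()],
            timestamp: 1_700_000_000,
        },
        Entry {
            text: "Gym notes".into(),
            tags: vec!["health".into()],
            entities: vec![],
            timestamp: 1_700_000_100,
        },
    ];
    let hits = query(&store, &["Company A", "pricing"]);
    assert_eq!(hits.len(), 1);
    println!("{}", hits[0].text);
}
```

The design question this raises for feedback: whether the relationship graph should be a first-class store the LLM queries via tool calls, or just extra metadata folded into vector-search reranking.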
Sustainability angle (important to me)
Part of the vision is:
- Don’t hit a giant GPU cluster for every small query.
- Let the user’s device handle as much as possible (7B locally).
- Use big cloud models only when they actually add value.
Over time, I’d like Azura to feel like a hybrid compute layer:
- Local where possible,
- Cloud only for heavy stuff,
- Always explicit and transparent.
- And most of all, PRIVATE.
What I’d love feedback on
1. Architecture sanity
- Does the “local-first + direct cloud inference” setup look sane?
- Any better patterns you’ve used for mixing on-device models with cloud models?
2. Security + privacy model
- For ephemeral cloud context: what would you need to see (docs / guarantees / telemetry) to trust this?
- Anything obvious I’m missing around temporary file handling?
3. Sustainability / cost
- As engineers: do you care about offloading compute to end-user devices vs fully cloud?
- Any horror stories optimizing 7B vs 70B usage that I should know about?
4. Would you actually use this?
- If you’re into self-hosting / local LLMs: what’s missing for this to replace “Ollama + notebook + random SaaS” for you?
Next steps
Right now I’m:
- Testing 7B models on typical consumer hardware
- Designing the first version of the context engine and schema
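For the hardware testing, the number that matters most is decode throughput. A rough measurement harness could look like this; `generate_stub` is a placeholder (a real harness would call into llama.cpp bindings instead):

```rust
use std::time::Instant;

// Stand-in for a real llama.cpp generation call; it just sleeps per
// token so the harness has something deterministic to measure.
fn generate_stub(n_tokens: usize) {
    for _ in 0..n_tokens {
        std::thread::sleep(std::time::Duration::from_millis(1));
    }
}

/// Measure decode throughput in tokens per second.
fn tokens_per_second(n_tokens: usize) -> f64 {
    let start = Instant::now();
    generate_stub(n_tokens);
    n_tokens as f64 / start.elapsed().as_secs_f64()
}

fn main() {
    let tps = tokens_per_second(100);
    // Rough expectation: a quantized 7B model typically manages on the
    // order of 10-30 tok/s on consumer CPUs, much more with a GPU.
    assert!(tps > 0.0);
    println!("{tps:.1} tok/s");
}
```

Reporting tok/s alongside RAM usage and quantization level (Q4 vs Q8, etc.) per test machine would make the "typical consumer hardware" claim concrete.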
If this resonates, I’d really appreciate:
- Architecture critiques
- “This will break because X” comments
- Ideas for must-have features for a real, daily-use personal AI
Thanks for reading 🙏
Happy to dive into any part in more detail if you’re curious.
Chris | Sciencx (2025-11-14T12:04:16+00:00) Azura: local-first personal assistant (feedback wanted). Retrieved from https://www.scien.cx/2025/11/14/azura-local-first-personal-assistant-feedback-wanted/