Eight. Eight hours of average screentime every day.
That is a third of my whole day gone with the wind every single day.
I'm not proud of it, of course... though I'm much better at managing my time now. (Android says I used my phone 11 hours less than last week. Woo-hoo!) After that experience, I really don't want to go back to being that person, so, using the tools I have available, I want to create something on my phone that nudges me to stop this insidious habit of mine whenever I fall down that pit again.
That’s when this personal project was born: an assistant, or better yet, a digital coach, to help steer my actions whenever I overuse my phone. It should observe what I’m doing on my phone, keep track of my usage, and intervene to remind me of the patterns I’m falling back into. Also, I’m a huge fan of F.R.I.D.A.Y., Tony’s new assistant in Avengers: Age of Ultron, so if it’s gonna tell me to stop my Netflix binge as my coach, I’ll probably listen to it, haha.
So that’s basically what my blog is going to be, my journey in developing a mini FRIDAY-like character to un-glue my face from my phone.
Here are some teasers from my app:
This project will be divided into 3 dev logs, each posted in their separate blog.
- Devlog #1: Building a proof-o'-concept.
- Devlog #2: On building an A.I agent that effectively reads, uses, and responds to your phone's rich context.
- Devlog #3: On building habit-prevention tools for the A.I agent and polishing the "screentime coach" personality.
Devlog #1 - Building a proof-o'-concept.
Section 1 - Planning is the Easy Part
So “A Screen Time AI Coach” app, huh? That’s fairly new and original. As mentioned above, now I’d like to go into detail about what my A.I should do.
- First, it needs to be able to understand my phone’s context state.
- Second, it needs to have ways to communicate with the user.
- Third, it needs to feel like FRIDAY — or at least something that can nudge me away from my hours-long movie rampage (heh...).
Okay, sounds easy. I already have some good solutions in mind for these features.
- To show the A.I what's on my phone screen, a simple interval-based screenshotting system + an image-to-text summarizer should do it. My thought process is: since I'm gonna use an LLM, which is much better with text than pictures, a text summarizer is in order (there's a sketch of this call right after the list).
- To have it communicate with the user, I really dig a classic RPG-style pop-over dialogue system, so I’m going with that.
- With some clever prompt engineering for the LLM, making it act like FRIDAY is probably no problem. Plenty of people have documented doing this before, so I don't think I'll get stuck if problems arise. Also, let's just use GPT-4.1-mini for the summarizer and GPT-4.1 for FRIDAY.
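Here's a minimal sketch of what that summarizer call could look like, assuming Kotlin with OkHttp; the function name and the one-line instruction are my illustration, not the app's exact code:

```kotlin
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody

// Send one base64-encoded screenshot to GPT-4.1-mini and get a text summary back.
fun summarizeScreenshot(apiKey: String, base64Png: String): String {
    val json = """
    {
      "model": "gpt-4.1-mini",
      "messages": [
        {"role": "system",
         "content": "Describe what the user is doing on this phone screen in 1-2 sentences."},
        {"role": "user", "content": [
          {"type": "image_url",
           "image_url": {"url": "data:image/png;base64,$base64Png"}}
        ]}
      ]
    }
    """.trimIndent()

    val request = Request.Builder()
        .url("https://api.openai.com/v1/chat/completions")
        .header("Authorization", "Bearer $apiKey")
        .post(json.toRequestBody("application/json".toMediaType()))
        .build()

    OkHttpClient().newCall(request).execute().use { response ->
        // Real code would parse choices[0].message.content out of the response JSON.
        return response.body!!.string()
    }
}
```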
Alrighty! Seems like we have a pretty solid system laid out. Here is a diagram of the step-by-step internal logic:
When I started this project, I really thought, “Hah! I might get this done within a week.” I mean, it’s just an Android app that sends screenshots for the A.I to comment on and tracks time usage, right? Plus, with the power of GitHub Copilot and Claude 4.5 helping me code at blazing speed, it really did seem possible. Looking back... how wrong I was, haha!
Me back then:
Section 2 - The Android App Reality Check
I have to confess: I’m an experienced full-stack web developer. I am NOT an Android app developer, NOR have I ever touched Android Studio (or the Android SDK), nor used the OpenAI API.
I thought web dev and Android dev were similar enough that I could easily transfer my skills. I mean, how different is a web app vs an Android app, really? Both are apps, right? Apples to apples, I thought.
The answer? Surprisingly different!
I’m not going into too much detail because I want this blog to revolve more around the LLM system, but here are some notable surprises I encountered for tech-savvy people:
“You want to take some screenshots of your phone?” said Android. “It’s simple, just run the code `Android.Tools.captureScree-` NO! REGISTER a WHOLE internal screen-capturing system and ASK for user permission EVERY TIME you want to do that,” Android calmly explains.

Uh. I also want this app running in the foreground so Android won’t kill it to save on resources. That too took a considerable amount of time to learn.
The overlay dialogue system also requires additional permission from the user (duh. You think Android is gonna let you display anything over the user’s screen willy-nilly?)
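At least that one is a short check. A minimal sketch, assuming you call it from an Activity:

```kotlin
import android.app.Activity
import android.content.Intent
import android.net.Uri
import android.provider.Settings

// Check the "draw over other apps" permission and, if it's missing,
// send the user to the system settings screen where they can grant it.
fun ensureOverlayPermission(activity: Activity) {
    if (!Settings.canDrawOverlays(activity)) {
        activity.startActivity(
            Intent(
                Settings.ACTION_MANAGE_OVERLAY_SETTINGS,
                Uri.parse("package:${activity.packageName}")
            )
        )
    }
}
```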
Anyhow, here is the app’s internal component diagram:
You might notice I blurred some of the components. Don’t worry, they’re revealed in the next section.
Section 3 - Teaching the AI to Understand Context
Now for the part I want to center the blog around. Before we discuss further, we need to know a little about the OpenAI API (don’t worry, it’ll be quick and it’s necessary to understand the jargon I encountered in my project).
TL;DR: Instead of the natural conversation-like flow on the ChatGPT website, where you just send messages and GPT answers, the API is much more barebones. Every time you send a request to the API, you have to send the WHOLE conversation. Think of the GPT API as a person with no memory whatsoever, and every time you need it to respond, you have to remind it of your entire conversation again. Again and again. Every. Single. Request.
ChatGPT be like on every request:
Got it? Cool.
Other than that, you give it a system prompt (basically a guideline for the GPT), and the LLM will try to follow it.
So each request body to the chatbot API needs to have the following (sketched in code right after this list):
- The system prompt
- The screenshot summary
- The entire chat history, with user and assistant messages in order
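In code, the assembly could look like this (the Message class and function name are mine, for illustration):

```kotlin
// One flat list of messages, rebuilt from scratch on every single request.
data class Message(val role: String, val content: String)

fun buildRequestMessages(
    systemPrompt: String,
    history: List<Message>,      // full user/assistant history, in order
    screenshotSummary: String
): List<Message> = buildList {
    add(Message("system", systemPrompt))     // the guideline
    addAll(history)                          // the ENTIRE conversation, every time
    add(Message("user", screenshotSummary))  // the newest observation
}
```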
Alright? Simple. And after a few days of coding, we have the following:
Cool, right? Message me if you’re curious about what the system prompt looks like 😄.
That’s our proof of concept, yeah? Great, now see you in the next devlo-
Oh wait, what is this?
Uh oh. The total token count is rising rapidly.
My token count is getting quite big, huh? (A token is basically the LLM's unit of text, roughly a short word fragment; the longer the chat history in the request, the more tokens the LLM needs to consume.)
Now, my initial token count is 3,000, and it grows by about 300 with each screenshot loop, since the history keeps accumulating. So 3,000 (loop 1) + 3,300 (loop 2) + 3,600 (loop 3) + ... Did I mention my app takes 3 screenshots every 18 seconds? That's 18 seconds per loop, 200 loops an hour: 6,570,000 tokens an hour, 157,680,000 tokens a day. What's GPT-4.1's token price again? 3 dollars per 1 million tokens?
473 bloody America doll hairs PER DAY. That is my monthly rent! What the bollock! Sigh... These things are never easy...
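If you want to check my math, here's the back-of-the-envelope version; the numbers are the ones above, and the 3 dollars per 1M tokens price is my rough assumption:

```kotlin
fun main() {
    val loopsPerHour = 3600 / 18  // 200 screenshot loops per hour
    // Arithmetic series: 3000 tokens on loop 1, then +300 tokens every loop after
    val tokensPerHour = (0 until loopsPerHour).sumOf { 3000 + 300 * it }
    val tokensPerDay = tokensPerHour * 24L
    val dollarsPerDay = tokensPerDay / 1_000_000.0 * 3.0
    println("$tokensPerHour tokens/hour, $tokensPerDay tokens/day")
    println("that costs about $" + "%.2f".format(dollarsPerDay) + " per day")
}
```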
Problem numero uno: How to manage token economy
To be honest, this problem isn't all that bad; I just need to optimize what goes into the LLM's context window. There are many ways to do it:
- Truncating unnecessary data in the chat history
- Summarizing past chat history (long-term self-condensing context)
- A better screenshot strategy (don’t screenshot when the user is AFK, for example)
All are valid approaches to ultimately reduce the request size, but I just wanted to continue prototyping, so the low-hanging fruit I could do fast was to implement an internal memory.
In short, every time the LLM responds, have it revise the chat history, pick out notable events or information about the user, and save that as a memory to be sent along with future requests. Also, for every request, only take a certain number of messages from the chat history (let's say the 20 most recent). I've sketched this below.
I know it’s not perfect, and there are many more details I’m not listing here for the sake of the devlog’s length, but it does the job of preventing the chat history from endlessly expanding out of bounds.
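Here's a minimal sketch of the idea, reusing the Message class from the earlier snippet. ChatManager is the component's real name; the fields and methods are my reconstruction, not the actual app code:

```kotlin
// Sliding window + self-maintained memory, instead of the full history.
class ChatManager(private val maxRecent: Int = 20) {
    private val history = mutableListOf<Message>()
    private var memory = ""  // condensed notes about the user, rewritten by the LLM

    fun buildContext(systemPrompt: String, newSummary: String): List<Message> =
        buildList {
            add(Message("system", systemPrompt))
            if (memory.isNotBlank()) add(Message("system", "Notes about the user: $memory"))
            addAll(history.takeLast(maxRecent))  // cap the window at the 20 newest messages
            add(Message("user", newSummary))
        }

    fun record(userMsg: Message, assistantMsg: Message, revisedMemory: String) {
        history += userMsg
        history += assistantMsg
        memory = revisedMemory  // the LLM returns updated notes alongside each reply
    }
}
```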
And now the maximum token count varies between 8,500 and 9,500.
Each screenshot loop's token count isn't going much over 9,000 anymore. Great!
So that brings us to the previously blurred-out components:
ChatManager, which is the chatbot component that creates memories and only takes the 20 most recent chat messages.
Problem numero dos: Talking to a Black Box
Great! Token economy is no longer an issue, at least for the near future.
Earlier, I told you that my app by default takes 3 screenshots every 18 seconds, and not all of the phone summaries will be that exciting.
Take myself, for example: if you were the A.I, you’d see a timeline like this:
OpenAIAnalyzer’s output:
- At 10:00 AM, user wakes up and opens their phone screen, swiping left and right to display various apps.
- At 10:01 AM, user opens YouTube. YouTube home UI includes videos like “How to train your dove to fly to you,” “Why eating 1000 bananas every day is bad for you,” “How To Quickly Stop a Crying Baby” by HowToBasic, etc...
- At 10:02 AM, user opens a YouTube Short, it’s a clip of Iron Man’s final moment in Avengers: Endgame. (Oops, spoiler.)
- At 10:03 AM, user is viewing a YouTube Short clip about how Doakes is the Bay Harbor Butcher.
- At 10:04 AM, user is viewing another YouTube Short clip about...
- ...
- At 06:00 PM, user exits the YouTube app and locks the phone.
Now, should the A.I comment on every single YouTube Short clip over the span of 8 hours?
Just imagine FRIDAY saying this every couple of seconds, repeated for 8 hours
Annoying, right? Ideally, it should comment on something funny in the first short or two, but afterward the A.I should recognize my YouTube Shorts addiction pattern and, after an hour, start a conversation to intervene.
So, how do you make the A.I recognize certain repeating patterns based on memories, timeline, and chat history over a long period of time? Well, that’s for devlog#2 since this devlog is already long enough. But I wanted at least to have the A.I not re-comment on repeating or unimportant stuff (like your home screen or a blank screen).
The fix is actually simple but takes a bit of trial and error to tweak it right.
Just include the following in the system prompt:
[RULES — RESPONSE BEHAVIOR]
* Avoid redundancy.
* Do NOT comment on trivial or static events (e.g., home screen, idle phone, blank screen).
* If the current observation is essentially identical to a very recent one, ignore it completely — do not mention, repeat, or comment on it.
* You only respond when the situation warrants your insight or when a new, relevant observation occurs.
Now the A.I won't respond if I'm AFK for a few minutes, as you can see in the response log.
FRIDAY isn't responding while I'm AFK on its chat screen
The last message was at 9:31 and it's now 9:34, so FRIDAY hasn't spent the past 3 minutes rambling about a mundane chat history screen
Section 4 - Proof-of-Concept Demo
Ah, finally! Through all that, let’s check out our end product, shall we?
- FRIDAY watching me watching CinemaWins watching Iron Man 2:
- FRIDAY warning me about my chronic short content addiction:
Here we have a proof o’ concept that can:
- A: Understand our phone context via screenshot capture, although the summaries definitely need to be more robust in the future if we want to capture all information and nuances
- B: Communicate with the user through an overlay dialogue system.
- C: Talk like FRIDAY, commenting on my phone screen content and warning me off short-form content.
The past few weeks were quite a fun journey (and an exhausting one too) for me.
Was there a better idea to solve my YouTube addiction than trying to create an A.I app? Most likely. An A.I that can read your phone screen sounds like a big privacy issue and a bit overkill for such a task - but I wanted to learn about A.I and prompt engineering while doing something exciting in my free time, sooo... no regrets, haha!
But to be honest, I’ll try to evolve this project into something more practical in the future — something I would actually use. Got a lot of ideas but little time to implement them. Sigh... Oh well, I’ll probably find ways to build this project faster.
P.S. GitHub Copilot and Claude Sonnet 4.5 are pretty good, but sometimes there are issues they just can’t get past.
At the time of writing this devlog, I was trying to record some video demos for it, but the issue is: I can’t take screenshots and record video at the same time, since Android only allows one media projection session at once - and Claude didn’t help by taking half a day to write code that failed spectacularly.
Make sure to tell this amateur A.I engineer what he did wrong in the comments.
That’s all. Thank you for reading. See you in the next devlog, where I’ll improve the A.I’s ability to read my phone context. ✌
— Jackal