How do you evaluate whether your software is doing what it's supposed to do?
Do you test all your app's possible cases, branches, and states? I don't, at least not manually. Nobody's got time to click through all the edge cases by hand. QA'ing a simple login form takes time, let alone testing complex applications.
Having robots do that helps a ton, and I recommend writing automated tests to help you sleep well at night (and release fewer bugs)!
Ignoring the burden of writing and maintaining tests, testing a "normal" web application is straightforward because it's predictable. Throw something at your app and expect a result. The result should always be the same. Most apps are CRUD apps anyway; easy peasy.
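
To make that concrete: a unit test for deterministic code is as boring as it gets. Call a function, compare the result. Here's a throwaway sketch using Node's built-in test runner (the `slugify` helper is made up for illustration):

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

// Made-up, fully deterministic helper: same input, same output, every time.
function slugify(title: string): string {
  return title.toLowerCase().trim().replace(/[^a-z0-9]+/g, "-");
}

test("slugify produces a URL-safe slug", () => {
  assert.equal(slugify("Unit Testing AI Apps"), "unit-testing-ai-apps");
});
```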
But what if there are unpredictable parts in your app's core?
If you're riding the AI buzzword wave, you probably implemented an "I know everything" smart-ass right in your app's core that's known for lying and spreading fake news. (Yes, I mean some sort of LLM.)
How would you test your app's quality if you're building software on top of software you probably don't understand?
Here's Hamel Husain's recommendation:
There are three levels of evaluation to consider:
- Level 1: Unit Tests
- Level 2: Model & Human Eval (this includes debugging)
- Level 3: A/B testing
I'm not planning to get into serious AI work or LLM programming anytime soon, but unit testing software sitting on top of LLMs is fascinating and worth more than a bookmark!
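
To give Level 1 some shape, here's a rough sketch of what an assertion-based unit test for an LLM-backed function could look like. Both `generateReleaseNotes` and `callModel` are made up for illustration; the idea is to assert on deterministic properties of the response (JSON shape, length limits, banned phrases) instead of exact wording.

```ts
import { test } from "node:test";
import assert from "node:assert/strict";

type ReleaseNotes = { title: string; bullets: string[] };

// Stand-in for a real LLM client. In a unit test you'd stub it or replay
// recorded responses so the test stays fast and deterministic.
async function callModel(prompt: string): Promise<string> {
  return JSON.stringify({ title: "v1.2.0", bullets: ["Fixed the login button"] });
}

// Made-up LLM-backed feature: turn a changelog into structured release notes.
async function generateReleaseNotes(changelog: string): Promise<ReleaseNotes> {
  const raw = await callModel(`Summarize this changelog as JSON: ${changelog}`);
  return JSON.parse(raw) as ReleaseNotes;
}

// Level 1: check properties you can verify deterministically
// (shape, limits, banned phrases), not the exact wording.
test("release notes have the expected shape", async () => {
  const notes = await generateReleaseNotes("- fix: login button ignored clicks");

  assert.equal(typeof notes.title, "string");
  assert.ok(Array.isArray(notes.bullets) && notes.bullets.length > 0);
  assert.ok(notes.bullets.every((bullet) => bullet.length <= 200));
  assert.ok(!JSON.stringify(notes).toLowerCase().includes("as an ai"));
});
```

Assertions like these can't tell you whether the summary is actually faithful to the changelog; that's what the model and human eval in Level 2 is for.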