This content originally appeared on HackerNoon and was authored by Pair Programming AI Agent
Table of Links
2. Prior conceptualisations of intelligent assistance for programmers
3. A brief overview of large language models for code generation
4. Commercial programming tools that use large language models
5. Reliability, safety, and security implications of code-generating AI models
6. Usability and design studies of AI-assisted programming
7. Experience reports and 7.1. Writing effective prompts is hard
7.2. The activity of programming shifts towards checking and unfamiliar debugging
7.3. These tools are useful for boilerplate and code reuse
8. The inadequacy of existing metaphors for AI-assisted programming
8.1. AI assistance as search
8.2. AI assistance as compilation
8.3. AI assistance as pair programming
8.4. A distinct way of programming
9. Issues with application to end-user programming
9.1. Issue 1: Intent specification, problem decomposition and computational thinking
9.2. Issue 2: Code correctness, quality and (over)confidence
9.3. Issue 3: Code comprehension and maintenance
9.4. Issue 4: Consequences of automation in end-user programming
9.5. Issue 5: No code, and the dilemma of the direct answer
9.2. Issue 2: Code correctness, quality and (over)confidence
The second challenge lies in verifying whether the code generated by the model is correct. In GridBook, a system that synthesizes spreadsheet formulas from natural language, users could see the natural language utterance, the synthesized formula, and the formula’s result. Of these, participants relied heavily on ‘eyeballing’ the final output to judge the correctness of the code, rather than, for example, reading the formula or testing it rigorously.
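To see why eyeballing is a weak check, consider a minimal sketch (the scenario, data, and function names are hypothetical, not from the GridBook study): a generated computation can produce a plausible-looking number while silently mishandling an edge case that a simple targeted test would catch.

```python
# Minimal sketch (hypothetical data and names): AI-generated code whose
# output looks plausible, so 'eyeballing' passes, but a targeted test fails.

# Suppose an assistant synthesizes this from "average the monthly sales":
def average_sales(sales):
    # Subtle bug: a missing month (None) is silently counted as zero,
    # dragging the average down instead of being excluded.
    return sum(s or 0 for s in sales) / len(sales)

sales = [1200, 1350, None, 1280]  # one month has no data yet
print(average_sales(sales))       # 957.5 -- plausible at a glance

# Checking against the mean of the known values (1276.67) catches the bug:
known = [s for s in sales if s is not None]
expected = sum(known) / len(known)
assert average_sales(sales) == expected, "missing values mishandled"  # fails
```

The flawed version looks right on every run a user is likely to glance at; only a deliberate comparison against an independently computed expectation reveals the discrepancy.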
While this lack of rigorous testing by end-user programmers is unsurprising, some users, particularly those with low computer self-efficacy, might overestimate the accuracy of the AI, compounding the overconfidence end-user programmers are known to have in the accuracy of their programs (Panko, 2008). Moreover, end-user programmers may be unable to discern the quality of non-functional aspects of the generated code, such as security flaws, poor robustness, or performance problems, even when the functional output looks right.
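The point about non-functional quality can be made concrete with another hypothetical sketch: generated code that returns exactly the right rows for ordinary input, and so survives any amount of eyeballing, while harbouring a classic SQL injection flaw that a parameterised query would avoid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, email TEXT)")
conn.execute("INSERT INTO customers VALUES ('Ada', 'ada@example.com')")
conn.execute("INSERT INTO customers VALUES ('Bob', 'bob@example.com')")

# Hypothetical AI-generated lookup: correct output for ordinary input,
# so a purely functional eyeball check passes...
def find_customer(name):
    query = f"SELECT * FROM customers WHERE name = '{name}'"  # string-built: unsafe
    return conn.execute(query).fetchall()

print(find_customer("Ada"))          # [('Ada', 'ada@example.com')] -- looks right

# ...but crafted input slips through the string-built query (SQL injection):
print(find_customer("' OR '1'='1"))  # returns every row in the table

# A parameterised version behaves identically on normal input, yet is safe:
def find_customer_safe(name):
    return conn.execute(
        "SELECT * FROM customers WHERE name = ?", (name,)
    ).fetchall()

print(find_customer_safe("' OR '1'='1"))  # [] -- no injection
```

Nothing in the visible results for ordinary input distinguishes the unsafe version from the safe one, which is precisely why such flaws are invisible to output-based evaluation.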
:::info Authors:
(1) Advait Sarkar, Microsoft Research, University of Cambridge (advait@microsoft.com);
(2) Andrew D. Gordon, Microsoft Research, University of Edinburgh (adg@microsoft.com);
(3) Carina Negreanu, Microsoft Research (cnegreanu@microsoft.com);
(4) Christian Poelitz, Microsoft Research (cpoelitz@microsoft.com);
(5) Sruti Srinivasa Ragavan, Microsoft Research (a-srutis@microsoft.com);
(6) Ben Zorn, Microsoft Research (ben.zorn@microsoft.com).
:::
:::info This paper is available on arXiv under CC BY-NC-ND 4.0 DEED license.
:::