Exploiting Vision-LLM Vulnerability: Enhancing Typographic Attacks with Instructional Directives

This article proposes a linguistic augmentation scheme for typographic attacks using explicit instructional directives.


This content originally appeared on HackerNoon and was authored by Text Generation

Abstract and 1. Introduction

  2. Related Work

    2.1 Vision-LLMs

    2.2 Transferable Adversarial Attacks

  3. Preliminaries

    3.1 Revisiting Auto-Regressive Vision-LLMs

    3.2 Typographic Attacks in Vision-LLMs-based AD Systems

  4. Methodology

    4.1 Auto-Generation of Typographic Attack

    4.2 Augmentations of Typographic Attack

    4.3 Realizations of Typographic Attacks

  5. Experiments

  6. Conclusion and References

4.2 Augmentations of Typographic Attack

Inspired by the success of instruction-prompting methodologies [37, 38] and the greedy reasoning in LLMs [39], and to further exploit the ambiguity between textual and visual tokens in Vision-LLMs, we propose to augment the typographic attack prompts within images by explicitly providing instruction keywords that emphasize text-to-text alignment over visual-language alignment. Our approach realizes this concept in the form of instructional directives: ❶ command directives, which emphasize a false answer, and ❷ conjunction directives, which additionally include attack clauses. In particular, we develop the following:

• Command Directive. By embedding commands with the attacks, we aim to prompt the Vision-LLMs into greedily producing erroneous answers. Our work investigates the "ANSWER:" directive as a prefix before the first attack prompt.

• Conjunction Directive. Conjunctions and connectors (or the lack thereof) link separate attack concepts together so that the overall text appears more coherent, thereby increasing the likelihood of multi-task success. In our work, we investigate "AND," "OR," "WITH," or simply empty spaces as prefixes between attack prompts.

While other forms of directives can also be useful for enhancing the attack success rate, we focus on investigating basic directives related to typographic attacks in this work.
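
As a rough illustration of how the two directive types combine into one text string, the following is a minimal sketch rather than the authors' implementation. The directive values ("ANSWER:", "AND", "OR", "WITH", and the empty connector) come from the description above; the function name `compose_attack_text`, its parameters, and the placeholder clauses are hypothetical and used only for illustration.

```python
from typing import Sequence

# Directive values named in the text above.
COMMAND_DIRECTIVE = "ANSWER:"
# The empty string stands for "simply empty spaces" between attack prompts.
CONJUNCTION_DIRECTIVES = ("AND", "OR", "WITH", "")


def compose_attack_text(
    prompts: Sequence[str],
    command: str = COMMAND_DIRECTIVE,
    conjunction: str = "AND",
) -> str:
    """Join attack prompts into one string (hypothetical helper).

    The command directive is placed before the first prompt, and the
    conjunction directive is placed between consecutive prompts.
    """
    if not prompts:
        raise ValueError("at least one attack prompt is required")
    if conjunction not in CONJUNCTION_DIRECTIVES:
        raise ValueError(f"unsupported conjunction directive: {conjunction!r}")

    cleaned = [p.strip() for p in prompts]
    # An empty conjunction degrades to a single space between prompts.
    separator = f" {conjunction} " if conjunction else " "
    body = separator.join(cleaned)
    # An empty command means no command directive is prepended.
    return f"{command} {body}" if command else body


if __name__ == "__main__":
    clauses = ["first attack clause", "second attack clause"]
    for conj in CONJUNCTION_DIRECTIVES:
        print(repr(compose_attack_text(clauses, conjunction=conj)))
```

The sketch only covers the text composition step; how the resulting string is rendered into an image is outside its scope.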


:::info Authors:

(1) Nhat Chung, CFAR and IHPC, A*STAR, Singapore and VNU-HCM, Vietnam;

(2) Sensen Gao, CFAR and IHPC, A*STAR, Singapore and Nankai University, China;

(3) Tuan-Anh Vu, CFAR and IHPC, A*STAR, Singapore and HKUST, HKSAR;

(4) Jie Zhang, Nanyang Technological University, Singapore;

(5) Aishan Liu, Beihang University, China;

(6) Yun Lin, Shanghai Jiao Tong University, China;

(7) Jin Song Dong, National University of Singapore, Singapore;

(8) Qing Guo, CFAR and IHPC, A*STAR, Singapore and National University of Singapore, Singapore.

:::


:::info This paper is available on arXiv under a CC BY 4.0 DEED license.

:::


