This content originally appeared on HackerNoon and was authored by Language Models (dot tech)
Table of Links
3 Method and 3.1 Phase 1: Taxonomy Generation
3.2 Phase 2: LLM-Augmented Text Classification
4 Evaluation Suite and 4.1 Phase 1 Evaluation Strategies
4.2 Phase 2 Evaluation Strategies
5.3 LLM-Augmented Text Classification
5.4 Summary of Findings and Suggestions
6 Discussion and Future Work, and References
3.2 Phase 2: LLM-Augmented Text Classification
After the taxonomy is finalized, we next train a text classifier that can be reliably deployed to perform label assignments at very large scale and in real time. Following recent work that shows the strengths of LLMs as annotators of training data [8, 15], we propose to leverage LLMs to obtain a "pseudo-labeled" corpus set using the taxonomy yielded in Phase 1, then use these labels to train more efficient classifiers at scale. Specifically, we prompt an LLM to infer the primary label (as a multiclass classification task) and all applicable labels (as a multilabel classification task) on a "medium-to-large" scale corpus sample that covers the range of labels in the taxonomy, creating a representative training dataset that can be used to build a lightweight classifier, such as a Logistic Regression model or a Multilayer Perceptron classifier. In this way, we induce "pseudo labels" from the LLM classifier and transfer its knowledge to a more efficient and manageable model that can be deployed and served at scale. An illustrative figure of this phase is presented in Figure 3.
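To make the two-step pipeline concrete, here is a minimal sketch of the pseudo-labeling and distillation flow, assuming a hypothetical `llm_annotate` helper that wraps whatever LLM API is available (the paper's exact prompts are not reproduced here) and an illustrative placeholder `TAXONOMY`; TF-IDF features with Logistic Regression stand in for the lightweight classifier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Illustrative label set standing in for the Phase 1 taxonomy (hypothetical).
TAXONOMY = ["billing", "technical support", "account access"]

def llm_annotate(text: str) -> dict:
    """Hypothetical LLM annotator -- not the authors' exact prompt.
    Should prompt an LLM with the taxonomy and return, e.g.,
    {"primary": "billing", "applicable": ["billing", "account access"]}."""
    raise NotImplementedError("wire this to your LLM API of choice")

def pseudo_label(corpus):
    """Run the LLM over a corpus sample to build the pseudo-labeled set."""
    records = [llm_annotate(doc) for doc in corpus]
    primary = [r["primary"] for r in records]        # multiclass targets
    applicable = [r["applicable"] for r in records]  # multilabel targets
    return primary, applicable

def distill(corpus, primary, applicable):
    """Transfer the LLM's pseudo labels to lightweight, servable classifiers."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(corpus)

    # Multiclass head: predict the single primary label per document.
    primary_clf = LogisticRegression(max_iter=1000).fit(X, primary)

    # Multilabel head: predict all applicable labels, one-vs-rest.
    binarizer = MultiLabelBinarizer(classes=TAXONOMY)
    Y = binarizer.fit_transform(applicable)
    multilabel_clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

    return vectorizer, primary_clf, multilabel_clf, binarizer
```

At inference time the fitted vectorizer and classifiers replace the LLM entirely, which is what makes real-time, large-scale label assignment feasible.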
:::info This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.
:::
:::info Authors:
(1) Mengting Wan, Microsoft Corporation;
(2) Tara Safavi (Corresponding authors), Microsoft Corporation;
(3) Sujay Kumar Jauhar, Microsoft Corporation;
(4) Yujin Kim, Microsoft Corporation;
(5) Scott Counts, Microsoft Corporation;
(6) Jennifer Neville, Microsoft Corporation;
(7) Siddharth Suri, Microsoft Corporation;
(8) Chirag Shah, University of Washington (work done while at Microsoft);
(9) Ryen W. White, Microsoft Corporation;
(10) Longqi Yang, Microsoft Corporation;
(11) Reid Andersen, Microsoft Corporation;
(12) Georg Buscher, Microsoft Corporation;
(13) Dhruv Joshi, Microsoft Corporation;
(14) Nagu Rangan, Microsoft Corporation.
:::