Models – Oversimplified


This content originally appeared on DEV Community and was authored by Mindy Zwanziger

Generally writing this for my own benefit as I'm diving into Natural Language Processing (NLP) for a current project. In the era of AI, folks throw around the term "model" and my mind (even as a certified math person™) replaces that with <vague mathy, computer-sciencey magic thingamajig>.

But I wanted to understand it a little more and did a little digging. My current understanding can be narrowed down to:

  • a set of training data
  • a set of features (things about the data, like "is capitalized")
  • a set of weights (numbers, often between 0 and 1) for each feature
  • a loop where the program makes a guess, adjusts the weights, and tries again, thousands or millions of times, until it's right often enough to be worth keeping around
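That loop can be sketched in a few lines. This is a toy perceptron-style update I've invented for illustration (the data, feature names, and 0.1 step size are all made up, and real training is far more involved):

```python
import random

# Toy training data: feature values (0 or 1) and the correct label.
# Everything here is invented for illustration.
examples = [
    ({"is_capitalized": 1, "is_followed_by_Inc": 1}, "ORG"),
    ({"is_capitalized": 1, "is_followed_by_Inc": 0}, "PERSON"),
    ({"is_capitalized": 0, "is_followed_by_Inc": 0}, "LOC"),
]
features = ["is_capitalized", "is_followed_by_Inc"]
labels = ["ORG", "PERSON", "LOC"]

# Start with random weights for each (feature, label) pair.
random.seed(0)
weights = {f: {c: random.random() for c in labels} for f in features}

def predict(feats):
    # Score each label by summing the weights of its active features.
    scores = {c: sum(weights[f][c] * feats.get(f, 0) for f in features)
              for c in labels}
    return max(scores, key=scores.get)

# The training loop: guess, compare, nudge the weights, repeat.
for _ in range(100):
    for feats, correct in examples:
        guess = predict(feats)
        if guess != correct:
            for f, v in feats.items():
                if v:
                    weights[f][correct] += 0.1  # reward the right label
                    weights[f][guess] -= 0.1    # punish the wrong one

print(predict({"is_capitalized": 1, "is_followed_by_Inc": 1}))
```

After a few passes the weights settle so that a capitalized word followed by "Inc" scores highest as ORG.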

Concretely, if you were implementing an NLP task like named-entity recognition, you might have categories that label a word as a person (PERSON), an organization (ORG), or a location (LOC).

So you'd start with some basic features like these:

word_features = {
    "is_capitalized": true,
    "previous_word": "new",
    "next_word": "announced",
    "is_followed_by_Inc": true,
}
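A feature dictionary like that could come from a tiny extractor run over a tokenized sentence. This is a sketch under my own assumptions (the function name, the example tokens, and the exact feature logic are all invented here):

```python
def extract_features(tokens, i):
    """Build a feature dict for the token at position i (toy sketch)."""
    word = tokens[i]
    return {
        "is_capitalized": word[:1].isupper(),
        "previous_word": tokens[i - 1].lower() if i > 0 else "",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "",
        "is_followed_by_Inc": i + 1 < len(tokens) and tokens[i + 1] == "Inc",
    }

tokens = ["New", "Acme", "announced", "a", "merger"]
print(extract_features(tokens, 1))  # features for "Acme"
```

Each word in the sentence gets its own little dictionary like this, and those are what the model actually sees.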

And you might start off with random weights and then, through the loops (this is the "training" part of creating a model), it'd eventually get to something like this:

weights = {
    "is_capitalized": {
        "ORG": 0.8,     // High - most organization names are capitalized
        "PERSON": 0.7,  // ...same for person names
        "LOC": 0.6      // Somewhat high for locations - some are capitalized and some aren't, like "school" vs. "Fred Meyer"
    },
    "previous_word": {
        "new": {
            "ORG": 0.5, // etc... for the rest of the categories and features
        },
    },
}

Then, of course, there's some probability mathy mathness in there that looks at all the weights across all the features and decides which category is most probable.
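One common version of that mathy mathness is score-then-normalize: sum the weights of the features that fired for each category, then squash the scores into probabilities. Here's a sketch using a softmax (the `is_followed_by_Inc` weights are values I made up; only the `is_capitalized` ones come from the example above):

```python
import math

weights = {
    "is_capitalized": {"ORG": 0.8, "PERSON": 0.7, "LOC": 0.6},
    "is_followed_by_Inc": {"ORG": 0.9, "PERSON": 0.1, "LOC": 0.2},  # invented values
}
active = ["is_capitalized", "is_followed_by_Inc"]  # features that fired for this word
labels = ["ORG", "PERSON", "LOC"]

# Total score per category = sum of the weights of the active features.
scores = {c: sum(weights[f][c] for f in active) for c in labels}

# Softmax turns raw scores into probabilities that sum to 1.
z = sum(math.exp(s) for s in scores.values())
probs = {c: math.exp(s) / z for c, s in scores.items()}

print(max(probs, key=probs.get))  # the most probable category
```

With these numbers, a capitalized word followed by "Inc" comes out most probably an ORG, which matches the intuition.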

Makes me think of those personality tests I took in middle school: "if you answered mostly C, you're a sporty tortoise"! Though I suspect it's more complicated than that.




Mindy Zwanziger | Sciencx (2025-02-10). Models – Oversimplified. Retrieved from https://www.scien.cx/2025/02/10/models-oversimplified/