This content originally appeared on DEV Community and was authored by Alexander Uspenskiy
I've taken a look at the latest release from Mistral: Mistral-Small-24B-Instruct. It is bigger and slower than deepseek-ai/deepseek-r1-distill-qwen-7b, but it also shows how it is thinking and doesn't send your sensitive data to Chinese soil :)
So let's start.
This project provides an interactive chat interface for the mistralai/Mistral-Small-24B-Instruct-2501 model using PyTorch and the Hugging Face Transformers library.
Requirements
- Python 3.8+
- PyTorch
- Transformers
- An Apple Silicon device (optional, for MPS support)
Setup
Clone the repository:
git clone https://github.com/alexander-uspenskiy/mistral.git
cd mistral
Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
Install the required packages:
pip install torch transformers accelerate
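If you are on Apple Silicon, you can quickly confirm that PyTorch sees the GPU before running the full script. This is just a minimal sanity check, assuming the packages above installed cleanly:

import torch

# Prints True when the Apple Silicon GPU (MPS backend) is usable; the chat script
# below falls back to CPU automatically when it is not.
print("MPS available:", torch.backends.mps.is_available())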
Set your Hugging Face Hub token:
export HUGGINGFACE_HUB_TOKEN=your_token_here
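Optionally, verify that the token is picked up before downloading the ~24B-parameter weights. A minimal sketch (whoami comes from the huggingface_hub package, which transformers installs as a dependency; it raises an error if the token is invalid):

import os
from huggingface_hub import whoami

# Sanity check: confirm the token from the environment is valid before the big download
token = os.getenv("HUGGINGFACE_HUB_TOKEN")
print(whoami(token=token))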
Usage
Run the chat interface:
python mistral.py
Features
- Interactive chat interface with the Mistral-Small-24B-Instruct-2501 model.
- Progress indicator while generating responses.
- Supports Apple Silicon GPU (MPS) for faster inference.
Code:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import os
import time
import threading
# Check if MPS (Apple Silicon GPU) is available
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
# Load the Mistral-Small-24B-Instruct-2501 model
model_name = "mistralai/Mistral-Small-24B-Instruct-2501"
token = os.getenv("HUGGINGFACE_HUB_TOKEN")
tokenizer = AutoTokenizer.from_pretrained(model_name, token=token)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map={"": device},
    torch_dtype=torch.float16,  # Optimized for M1 GPU
    token=token
)

def show_progress():
    # Spinner shown in a background thread; uses the module-level stop_event
    # created in the chat loop below.
    while not stop_event.is_set():
        for char in "|/-\\":
            if stop_event.is_set():
                break
            print(f"\rGenerating response {char}", end="", flush=True)
            time.sleep(0.1)
# Interactive terminal loop
print("mistralai/Mistral-Small-24B-Instruct-2501 Chat Interface (type 'exit' to quit)")
interaction_count = 0 # Initialize counter
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        break

    inputs = tokenizer(user_input, return_tensors="pt").to(device)

    # Start the progress spinner in a background thread
    stop_event = threading.Event()
    progress_thread = threading.Thread(target=show_progress)
    progress_thread.start()

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=200,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id  # Added to avoid warning
        )

    stop_event.set()
    progress_thread.join()
    print("\r" + " " * 30 + "\r", end="")  # Clear the progress line

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Mistral: {response}\n")
Example:
You: The bat and the ball costs $1.10 in total, bat is expensive than a ball costs $1.00 more than a ball, How much the ball costs
Mistral: The bat and the ball costs $1.10 in total, bat is expensive than a ball costs $1.00 more than a ball, How much the ball costs?
Let's denote the cost of the ball as \( B \) and the cost of the bat as \( B + 1.00 \).
According to the problem, the total cost of the bat and the ball is $1.10. Therefore, we can write the equation:
\[ B + (B + 1.00) = 1.10 \]
Simplifying the equation:
\[ 2B + 1.00 = 1.10 \]
Subtract 1.00 from both sides:
\[ 2B = 0.10 \]
Divide both sides by 2:
\[ B = 0.05 \]
So, the ball costs $0.05.
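A two-line check of the answer, just substituting the values back into the problem:

ball = 0.05
bat = ball + 1.00
assert abs(ball + bat - 1.10) < 1e-9 and abs(bat - ball - 1.00) < 1e-9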
Summary:
So, as you can see, modern models are able to run locally and solve logical tasks with excellent performance.
Happy coding!