deep-learning

How Griffin’s Local Attention Window Beats Global Transformers at Their Own Game

A Friendly Introduction To Model Pruning With PyTorch

torch.compile — The Missing Manual

Quantization-Aware Training With PyTorch

Zero‑Reboot GPU Power: CUDA 12 on WSL 2 in 30 Minutes

PyTorch — A Comprehensive Performance Tuning Guide

Converting Unstructured Data into a Knowledge Graph Using an End-to-End Pipeline

Fraud Detection Using Artificial Intelligence and Machine Learning

Only the Beginning Matters: How the LLM Decides Where to Focus Attention

Training Deep-Learning Models At Ultra-Scale Using PyTorch

Stepping into the Future of 3D Vision with VGGT

Scaling Up: A Beginner’s Guide to Multi-Node Distributed Training in PyTorch

Mastering GPU Memory Management With PyTorch and CUDA

Linear Attention and Long Context Models

State Space Models vs RNNs: The Evolution of Sequence Modeling

How AI Chooses What Information Matters Most

The HackerNoon Newsletter: Why The Hell is Observability So Darn Expensive!? (3/14/2025)

Unleashing the Beast: Building a Production-Grade, Real-Time Anomaly Detection Pipeline for…

Unlock the Hidden Secrets of Machine Learning: A 10-Year Expert’s Journey from Theory to Code That…

Mastering time series analysis with Python

Artificial Intelligence Is Not What You Think It Is

The Math Behind nn.BCELoss()

FAISS & RAG: The Dynamic Duo of Knowledge-Powered AI

How To Optimize Memory Usage For Training LLMs In PyTorch

Revolutionizing Deep Learning: Advanced Knowledge Distillation for Optimized Teacher-Student Model…

The Chinese Software Industry is Shifting From the Dinosaur Model to the Monkey-Troop Model

Transformer-Squared: Stop Finetuning LLMs

How To Train Your PyTorch Models (Much) Faster

Transformers for Long-Term Time Series Forecasting

A New Approach to Attention — Differential Transformers | Paper Walkthrough and PyTorch…

Fine-Tuning of DeepSeek LLM for Text Classification and Sentiment Analysis: Techniques, Code…

Text Classification in the era of Transformers

AI Is Now Creating Antidotes for Snake Venom

Creating Human Faces from Scratch: A Hands-On Guide to GANs

Icon Detection for Test Automation: A Deep Learning Playbook

RNNs vs. Transformers: Innovations in Scalability and Efficiency

Hawk and Griffin: Mastering Long-Context Extrapolation in AI

Griffin Model: Advancing Copying and Retrieval in AI Tasks

Hawk and Griffin Models: Superior Latency and Throughput in AI Inference

Recurrent Models: Enhancing Latency and Throughput Efficiency

Recurrent Models: Decoding Faster with Lower Latency and Higher Throughput

Training speed on longer sequences

Efficient linear recurrences on device

Efficient Training: Scaling Griffin Models for Large-Scale AI on TPUs

Hawk and Griffin Models: Superior NLP Performance with Minimal Training Data

Griffin Models: Outperforming Transformers with Scalable AI Innovation

Recurrent Models Scale as Efficiently as Transformers

RG-LRU: A Breakthrough Recurrent Layer Redefining NLP Model Efficiency

RNN Models Hawk and Griffin: Transforming NLP Efficiency and Scaling

AI Voice Conversion: Recreate Any Speaker’s Voice with VITS

Implement Transformers (Bidirectional) from Scratch in PyTorch for Sequence Classification

Embeddings for RAG – A Complete Overview

Let’s Build our own GPT Model from Scratch with PyTorch

Why Advanced LLMs, Such as GPT-4 or Claude, Fail in Critical Use Cases Despite Large Training Data