Embeddings bunch into a skinny cone, neighbors look the same for every query, recall craters. This is the classic space collapse that hides real semantics.
Problem types
- No.5 Semantic ≠ Embedding
- No.6 Logic Collapse and Recovery
What it looks like in practice
- Cosine similarity is high for almost everything, top-k lists barely change across different queries
- Near neighbors are dominated by boilerplate or global terms
- Recall@k on a held-out set drops after re-ingest or model swap
- IVF or HNSW shows busy lists with poor separation
60-second quick test
- Sample about 5k vectors from your store
- Compute per-dimension variance and the PCA explained variance ratio up to 50 components
- Plot PC1 and PC2, then check the cone
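A minimal sketch of that plot, assuming the 5k sample is saved as sample_embeddings.npy (the file name is a placeholder). A healthy space scatters as a roughly round blob; a collapsed one shows a narrow wedge.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.load("sample_embeddings.npy")  # ~5k sampled vectors, shape [N, d]
P = PCA(n_components=2).fit_transform(X - X.mean(axis=0, keepdims=True))

plt.scatter(P[:, 0], P[:, 1], s=2, alpha=0.3)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Cone check: a narrow wedge means collapsed geometry")
plt.savefig("cone_check.png")
```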
Rules of thumb
- Red flag if PC1 explained variance is above 0.70, or if the per-dimension variance has a Gini above 0.6
- Also bad if the median cosine to the centroid is above 0.55
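For the Gini flag, a quick check looks like the sketch below. The standard mean-absolute-difference definition of Gini, applied to per-dimension variances, is an assumption here; the article's exact metric may differ.

```python
import numpy as np

X = np.load("sample_embeddings.npy")  # file name as in the quick test above
v = np.sort(X.var(axis=0))            # per-dimension variance, ascending

# Gini via the sorted-values identity: sum((2i - n - 1) * v_i) / (n * sum(v))
n = v.size
gini = ((2 * np.arange(1, n + 1) - n - 1) * v).sum() / (n * v.sum())
print("variance Gini:", float(gini))  # red flag above 0.6
```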
Common root causes
- Model change without re-whitening
- Mixed normalization states: some vectors L2-normalized and some not (see the norm-check sketch after this list)
- Truncation or unicode normalization bugs in the text pipeline
- Over-aggressive stopword removal or duplicate boilerplate in documents
- FAISS metric set to inner product while vectors were already L2 normalized for cosine
- Shard trained on one domain, then queried with another
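Mixed normalization states in particular are cheap to detect before touching anything else. A minimal sketch, assuming an all_embeddings.npy dump; the unit-norm tolerance is an arbitrary choice.

```python
import numpy as np

X = np.load("all_embeddings.npy")
norms = np.linalg.norm(X, axis=1)
print("norm min / median / max:", norms.min(), np.median(norms), norms.max())

# A fraction strictly between 0 and 1 means shards were ingested in mixed states
frac_unit = float(np.mean(np.abs(norms - 1.0) < 1e-3))
print("fraction ~unit-norm:", frac_unit)
```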
Minimal fix
Goal is to keep geometry isotropic enough for cosine to carry meaning.
1. Mean-center, then L2-normalize all vectors again.
2. Whiten with a small rank:
   - Fit PCA on a random subset of around 50k vectors; pick r so that cumulative EVR sits between 0.90 and 0.98.
   - Transform, then L2-normalize again.
3. Rebuild the index with a metric that matches the vector state:
   - For cosine, use an L2 index with normalized vectors.
   - For inner product, use IP and avoid double normalization.
4. Trash and re-ingest any mixed-state shards. Do not patch in place.
You usually see recall recover right away after steps 1 and 3.
Harder fixes if the minimal path is not enough
- Domain de-duplication and boilerplate masking before embedding. Keep per-doc tf-idf masks or a learned salience score to damp headers, nav, and legal text
- Subspace drop. If PC1 to PCk are topic or style axes, drop the k components where EVR spikes, then renormalize (see the sketch after this list)
- Temperature up sampling for rare intents so neighbors reflect intent not frequency
- Metric sanity. Cosine with normalized vectors is often safer than IP on mixed magnitudes
- FAISS hygiene. Retrain IVF or PQ codebooks after geometry changes, avoid reusing old centroids
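A minimal sketch of the subspace drop, in the spirit of the all-but-the-top post-processing trick. The cutoff k = 3 and the file names are placeholders; inspect the EVR curve first and pick k where it spikes.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

X = np.load("all_embeddings.npy")
Xc = X - X.mean(axis=0, keepdims=True)

k = 3  # placeholder: choose where the EVR curve spikes
p = PCA(n_components=k).fit(Xc)

# Project out the top-k axes: x <- x - sum_j (x . u_j) u_j
X_drop = Xc - (Xc @ p.components_.T) @ p.components_
X_drop = normalize(X_drop, norm="l2", axis=1)
np.save("embeddings_subspace_dropped.npy", X_drop)
```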
WFGY guardrails that help
- BBMC residual checks that flag geometry drift and trigger a re-whiten when the residue grows
- BBPF multi-path retrieval to avoid single-cone collapse during query expansion
- BBCR, a bridge step for when the chain stalls on near-duplicates
- BBAM attention damping that prevents one-token hijacks in long answers
Tiny scripts you can paste
Variance and cone checks
```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

X = np.load("sample_embeddings.npy")  # shape [N, d]
Xn = normalize(X, norm="l2", axis=1)

# Cone check on the normalized vectors, before any centering
# (centering would erase the very cone we are trying to measure)
c = Xn.mean(axis=0, keepdims=True)
c = c / np.linalg.norm(c)  # unit centroid, so the dot product is a true cosine
cos = (Xn @ c.T).ravel()
print("median cos to centroid:", float(np.median(cos)))

# Variance check: PCA on the mean-centered, normalized vectors
Xc = Xn - Xn.mean(axis=0, keepdims=True)
p = PCA(n_components=min(50, Xc.shape[1])).fit(Xc)
evr = p.explained_variance_ratio_
print("PC1 EVR:", evr[0], "PC1..5 cum:", evr[:5].sum())
```
Whiten then renormalize
```python
import numpy as np
import joblib
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

X = np.load("all_embeddings.npy")
mu = X.mean(0, keepdims=True)
Xc = X - mu

# whiten=True rescales each kept component to unit variance, which is the
# actual whitening step; n_components=0.95 keeps ~95% cumulative EVR
p = PCA(n_components=0.95, svd_solver="full", whiten=True).fit(Xc)
Z = p.transform(Xc)
Z = normalize(Z, norm="l2", axis=1)

joblib.dump({"mu": mu, "pca": p}, "whitener.pkl")  # reuse on queries at search time
np.save("embeddings_whitened.npy", Z)
```
FAISS rebuild sketch
import faiss, numpy as np
Z = np.load("embeddings_whitened.npy").astype("float32") # L2 normalized
d = Z.shape[1]
index = faiss.IndexHNSWFlat(d, 32)
index.hnsw.efConstruction = 200
faiss.normalize_L2(Z)
index.add(Z)
faiss.write_index(index, "hnsw_cosine.faiss")
Acceptance checks before you declare it fixed
- PC1 EVR at or below 0.35 and PC1 to PC5 cumulative at or below 0.70
- Median cosine to centroid at or below 0.35 after renormalization
- Neighbor overlap rate across 20 random queries at or below 0.35 for k equal to 20
- Recall on a held-out set improves, and top-k results vary with the query
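The overlap number is easy to compute from the rebuilt index. A minimal sketch, assuming the hnsw_cosine.faiss index and whitened vectors from the scripts above; sampling stored vectors as stand-in queries and defining overlap as mean pairwise intersection over k are both assumptions.

```python
import numpy as np
import faiss
from itertools import combinations

index = faiss.read_index("hnsw_cosine.faiss")
Z = np.load("embeddings_whitened.npy").astype("float32")

rng = np.random.default_rng(0)
q = Z[rng.choice(len(Z), size=20, replace=False)].copy()
faiss.normalize_L2(q)

_, I = index.search(q, 20)  # top-20 neighbor ids per query
sets = [set(row) for row in I]
overlaps = [len(a & b) / 20 for a, b in combinations(sets, 2)]
print("mean neighbor overlap:", float(np.mean(overlaps)))  # flag above 0.35
```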
TL;DR
Cone geometry means the space collapsed. Re-center, whiten, renormalize, rebuild. Then re-check PC1 EVR and neighbor overlap. If your reasoning chain still stalls, label it No.6 and insert a bridge step with BBCR.
Series index
All articles in this Problem Map series live here → ProblemMap Articles Index