# Day 6 · Vector anisotropy and collapse (No.5, No.6)

Embeddings bunch into a skinny cone, neighbors look the same for every query, and recall craters. This is the classic space collapse that hides real semantics.

Problem types

  • No.5 Semantic ≠ Embedding
  • No.6 Logic Collapse and Recovery


This content originally appeared on DEV Community and was authored by PSBigBig

What it looks like in practice

  • Cosine similarity is high for almost everything, top-k lists barely change across different queries
  • Near neighbors are dominated by boilerplate or global terms
  • Recall@k on a held out set drops after re-ingest or model swap
  • IVF or HNSW shows busy lists with poor separation

60 second quick test

  1. Sample about 5k vectors from your store
  2. Compute per dimension variance and PCA explained variance ratio up to 50 components
  3. Plot PC1 and PC2, then check the cone

Rules of thumb

  • Red flag if PC1 explained variance is above 0.70 or if the per dimension variance has Gini above 0.6
  • Also bad if the median cosine to the centroid is above 0.55
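The rules of thumb mention a Gini value for the per dimension variance, but the scripts further down do not compute it. Here is a minimal sketch, assuming the standard sorted-rank Gini formula; the helper names `gini` and `per_dimension_variance_gini` are made up for this example, and the data is synthetic.

```python
import numpy as np

def gini(values):
    """Gini coefficient of a non-negative 1-D array: 0 is even, values near 1 are concentrated."""
    v = np.sort(np.asarray(values, dtype=np.float64))
    n = v.size
    total = v.sum()
    if n == 0 or total == 0.0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2.0 * (ranks * v).sum() / (n * total) - (n + 1.0) / n)

def per_dimension_variance_gini(X):
    """Gini of the per-dimension variance of an [N, d] embedding matrix."""
    return gini(X.var(axis=0))

# Synthetic illustration only: one inflated dimension concentrates the variance
rng = np.random.default_rng(0)
even = rng.normal(size=(2000, 64))
skewed = even.copy()
skewed[:, 0] *= 30.0
print("even:", per_dimension_variance_gini(even))
print("skewed:", per_dimension_variance_gini(skewed))
```

On a real store you would pass the sampled vectors instead of the synthetic matrices and compare the result to the 0.6 threshold.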

Common root causes

  • Model change without re whitening
  • Mixed normalization states, some vectors L2 normalized and some not
  • Truncation or unicode normalization bugs in the text pipeline
  • Over aggressive stopword removal or duplicate boilerplate in documents
  • FAISS metric that does not match the vector state, for example an inner product index over vectors that were never L2 normalized when cosine was intended
  • Shard trained on one domain, then queried with another
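Mixed normalization states are cheap to detect before touching the index. This is a rough sketch, not part of the original toolkit: `norm_report` and its `tol` parameter are hypothetical names, and the store here is synthetic.

```python
import numpy as np

def norm_report(X, tol=1e-3):
    """Summarize row norms and flag a store that mixes unit and non-unit vectors."""
    norms = np.linalg.norm(X, axis=1)
    frac_unit = float((np.abs(norms - 1.0) <= tol).mean())
    return {
        "frac_unit": frac_unit,
        "min_norm": float(norms.min()),
        "max_norm": float(norms.max()),
        "mixed": 0.0 < frac_unit < 1.0,
    }

# Synthetic illustration: half the rows are unit length, half are scaled by 5
rng = np.random.default_rng(0)
unit = rng.normal(size=(100, 32))
unit /= np.linalg.norm(unit, axis=1, keepdims=True)
store = np.vstack([unit[:50], 5.0 * unit[50:]])
report = norm_report(store)
print(report)
```

A `frac_unit` strictly between 0 and 1 suggests the store was written by more than one pipeline version.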

Minimal fix

The goal is to keep the geometry isotropic enough for cosine to carry meaning.

  1. Mean center then L2 normalize all vectors again
  2. Whiten with a small rank
     • Fit PCA on a random subset around 50k, pick r so that cumulative EVR sits between 0.90 and 0.98
     • Transform, then L2 normalize again
  3. Rebuild the index with a metric that matches the vector state
     • For cosine use an L2 index with normalized vectors
     • For inner product use IP and avoid double normalization
  4. Trash and re ingest any mixed state shards. Do not patch in place

You usually see recall recover right away after steps 1 and 3.
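Why the metric has to match the vector state can be checked numerically. The sketch below uses plain NumPy and toy data chosen for illustration: on unit vectors, squared L2 distance equals 2 minus twice the cosine, so both give the same ranking, while inner product on mixed magnitudes can promote a long vector over a better-aligned one.

```python
import numpy as np

# Identity check on random unit vectors: ||x - q||^2 = 2 - 2 * cos(x, q)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
X /= np.linalg.norm(X, axis=1, keepdims=True)
q = rng.normal(size=16)
q /= np.linalg.norm(q)

cos = X @ q
l2_sq = ((X - q) ** 2).sum(axis=1)
identity_holds = bool(np.allclose(l2_sq, 2.0 - 2.0 * cos))
print("L2 and cosine agree on unit vectors:", identity_holds)

# Counterexample for inner product on mixed magnitudes
query = np.array([1.0, 0.0])
docs = np.array([
    [1.0, 0.0],    # same direction as the query, unit length
    [6.0, 8.0],    # cosine 0.6 to the query, but norm 10
])
ip = docs @ query
cosine = ip / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
best_by_ip = int(np.argmax(ip))
best_by_cosine = int(np.argmax(cosine))
print("top hit by IP:", best_by_ip, "top hit by cosine:", best_by_cosine)
```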

Harder fixes if the minimal path is not enough

  • Domain de duplication and boilerplate masking before embedding. Keep per doc tf idf masks or a learned salience to damp headers, nav, legal text
  • Subspace drop. If PC1 to PCk are topic or style axes rather than content, remove the leading k components where EVR spikes, then renormalize
  • Temperature up sampling for rare intents so neighbors reflect intent not frequency
  • Metric sanity. Cosine with normalized vectors is often safer than IP on mixed magnitudes
  • FAISS hygiene. Retrain IVF or PQ codebooks after geometry changes, avoid reusing old centroids

WFGY guardrails that help

  • BBMC residual checks that flag geometry drift and trigger re whiten when the residue grows
  • BBPF multi path retrieval to avoid single cone collapse during query expansion
  • BBCR a bridge step when the chain stalls on near duplicates
  • BBAM attention damping that prevents one token hijacks in long answers

Tiny scripts you can paste

Variance and cone checks

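import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

X_raw = np.load("sample_embeddings.npy")     # shape [N, d]

# Cone check on the stored geometry: cosine of each unit vector to the unit centroid.
# Do this before mean centering, because centering moves the centroid to the origin.
Xn = normalize(X_raw, norm="l2", axis=1)
centroid = normalize(Xn.mean(axis=0, keepdims=True), norm="l2", axis=1)
cos = (Xn @ centroid.T).ravel()
print("median cos to centroid:", float(np.median(cos)))

# Variance check: mean center, renormalize, then inspect the PCA spectrum
X = X_raw - X_raw.mean(axis=0, keepdims=True)
X = normalize(X, norm="l2", axis=1)

p = PCA(n_components=min(50, *X.shape)).fit(X)
evr = p.explained_variance_ratio_
print("PC1 EVR:", evr[0], "PC1..5 cum:", evr[:5].sum())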

Whiten then renormalize

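from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize
import numpy as np, joblib

X = np.load("all_embeddings.npy")
mu = X.mean(0, keepdims=True)   # keep the mean so queries can be centered the same way
Xc = X - mu

# whiten=True rescales each kept component to unit variance;
# without it transform() only rotates and truncates
p = PCA(n_components=0.95, svd_solver="full", whiten=True).fit(Xc)  # keep 95% cumulative EVR
Z = p.transform(Xc)
Z = normalize(Z, norm="l2", axis=1)

# Save the fitted transform; queries must pass through the same mu and PCA before search
joblib.dump({"mu": mu, "pca": p}, "whitener.pkl")
np.save("embeddings_whitened.npy", Z)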
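Once the store is whitened, queries have to go through the same transform or they land in a different space. A minimal sketch of the query side, assuming the `{"mu", "pca"}` dictionary layout saved above; `whiten_queries` is a hypothetical helper name.

```python
import numpy as np

def whiten_queries(Q, mu, pca):
    """Apply a saved centering + PCA transform to query vectors, then L2 normalize.

    `pca` is any fitted object exposing transform(), such as the sklearn PCA
    stored under the "pca" key of whitener.pkl.
    """
    Q = np.atleast_2d(np.asarray(Q, dtype=np.float64))
    Z = pca.transform(Q - mu)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return (Z / np.clip(norms, 1e-12, None)).astype("float32")
```

Typical use would be `w = joblib.load("whitener.pkl")` followed by `whiten_queries(q, w["mu"], w["pca"])`, with the result passed to the index search.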

FAISS rebuild sketch

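import faiss, numpy as np

# FAISS expects contiguous float32; the whitened vectors were L2 normalized when saved
Z = np.ascontiguousarray(np.load("embeddings_whitened.npy"), dtype="float32")
d = Z.shape[1]

# IndexHNSWFlat uses the L2 metric by default; on unit vectors L2 ranking matches cosine ranking
index = faiss.IndexHNSWFlat(d, 32)   # 32 = HNSW links per node (M)
index.hnsw.efConstruction = 200      # set before add() so it affects graph construction

faiss.normalize_L2(Z)                # in place; cheap guard after the float32 cast
index.add(Z)
faiss.write_index(index, "hnsw_cosine.faiss")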

Acceptance checks before you declare it fixed

  • PC1 EVR at or below 0.35 and PC1 to PC5 cumulative at or below 0.70
  • Median cosine to centroid at or below 0.35 after renormalization
  • Neighbor overlap rate across 20 random queries at or below 0.35 for k equal to 20
  • Recall on a held out set improves and top k varies with the query
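The neighbor overlap rate is not defined precisely above. One reasonable reading, used in this sketch as an assumption, is the mean fraction of shared ids between the top-k lists of every pair of queries. `topk_ids` and `neighbor_overlap_rate` are hypothetical helper names, and the brute-force search stands in for whatever index you actually query.

```python
import numpy as np
from itertools import combinations

def topk_ids(X, Q, k):
    """Brute-force top-k neighbor ids by inner product (cosine if rows are unit length)."""
    scores = Q @ X.T
    return np.argsort(-scores, axis=1)[:, :k]

def neighbor_overlap_rate(ids):
    """Mean fraction of shared neighbor ids over all pairs of queries."""
    sets = [set(int(i) for i in row) for row in ids]
    k = len(sets[0])
    pairs = list(combinations(sets, 2))
    return float(np.mean([len(a & b) / k for a, b in pairs]))

# Toy id lists: identical lists mean full collapse, disjoint lists mean none
collapsed = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
healthy = np.array([[1, 2, 3], [4, 5, 6]])
partial = np.array([[1, 2, 3, 4], [3, 4, 5, 6]])
print(neighbor_overlap_rate(collapsed), neighbor_overlap_rate(healthy), neighbor_overlap_rate(partial))
```

For the acceptance check you would feed it the ids returned for 20 random queries at k equal to 20 and compare the result to 0.35.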

TL;DR

Cone geometry means the space collapsed. Re center, whiten, renorm, rebuild. Then re check PC1 EVR and neighbor overlap. If your reasoning chain still stalls, label it No.6 and insert a bridge step with BBCR.

Series index
All articles in this Problem Map series live here → ProblemMap Articles Index

