🎯 The Heart of a Proxmox Cluster: Understanding Corosync for a Stable Homelab
This content originally appeared on DEV Community and was authored by Chikara Inohara

📝 Introduction

Setting up a Proxmox cluster feels like unlocking a new superpower, doesn't it? You get to:

  • Manage multiple servers from a single interface
  • Live-migrate VMs like you're in The Matrix
  • Feel like a proper sysadmin (even if you're just wearing pajamas)

Cluster screenshot
Please don't judge my messy cluster... pve1 decided to take a vacation and these VMs are just test dummies!

But here's the thing - I never really stopped to think about what's actually happening under the hood to make all this magic work. It just... worked, you know?

What changed my mind?
Recently at work, I had to do some research on cluster technologies, and I fell down the rabbit hole of learning about Corosync - the critical component that keeps Proxmox clusters from falling apart. It was one of those "aha!" moments where everything suddenly clicked!

So today, let's dive into what I learned about Corosync, why it matters, and answer the big question for us homelabbers: "Should I actually care about this stuff?"

🤝 What Exactly is Corosync?

Think of Corosync as the nervous system of your cluster.

It's the open-source software that lets all your Proxmox servers gossip with each other, constantly checking if everyone's still alive and sharing important updates. Without it, your cluster would be like a group chat where nobody knows if anyone else is online.

Corosync's Main Jobs:

  1. 📋 Membership Management

    • Keeps track of who's in the club
    • Knows exactly which nodes are active right now
  2. 💬 Messaging

    • Makes sure commands reach all nodes
    • "Hey everyone, we're starting VM 101 on node 3!"
  3. ⚖️ Quorum Management

    • The "majority rules" system
    • This is the big one! (More on this in a sec)

⚖️ Understanding "Quorum" - The Cluster's Democracy

Deep dive into Proxmox Quorum docs

If you remember just one thing from this post, make it Quorum. It's basically democracy for servers - decisions only happen when the majority agrees.

🧠 The Dreaded "Split-Brain" Problem

Let me paint you a picture of what could go wrong without quorum:

Imagine you have a 4-node cluster, and suddenly your network has a bad day. The cluster splits into two groups of two nodes each.

Split brain diagram

Without quorum rules, both groups would think:

  • "The other guys must have crashed!"
  • "We're the real cluster now!"
  • "Let's start all those VMs that were on the other nodes!"

Result? Both sides try to run the same VMs, write to the same storage, and basically create digital chaos. This nightmare scenario is called a split-brain, and yes, it's as scary as it sounds! 😱

How Quorum Saves the Day

The solution is elegantly simple:

The Majority Rules
Only the group with MORE than half the total votes can keep operating.

  • Got 3 nodes and 2 are talking? ✅ You have quorum (2 > 1.5)
  • Got 4 nodes and only 2 are talking? ❌ No quorum (2 = 2, not greater)
  • Got 5 nodes and 3 are talking? ✅ You have quorum (3 > 2.5)
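The majority check above is simple enough to sketch in a few lines of shell - a toy illustration of the rule Corosync applies, not actual Corosync code (the `has_quorum` function name is mine):

```shell
#!/bin/bash
# Toy illustration of the quorum rule: a partition keeps operating
# only if it holds a strict majority of the total votes.
has_quorum() {
  local total=$1 alive=$2
  # floor(total/2) + 1 is the smallest strict majority
  local needed=$(( total / 2 + 1 ))
  [ "$alive" -ge "$needed" ]
}

has_quorum 3 2 && echo "3 nodes, 2 alive: quorum ✅"
has_quorum 4 2 || echo "4 nodes, 2 alive: no quorum ❌"
has_quorum 5 3 && echo "5 nodes, 3 alive: quorum ✅"
```

Note that the comparison is strictly "more than half" - exactly half (the 2-of-4 case) is never enough, which is precisely what prevents both sides of a 50/50 split from running at once.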

Any group without a majority goes into "safe mode" and stops all cluster operations - the cluster filesystem (/etc/pve) even becomes read-only. It might seem harsh, but it's way better than data corruption!

When you see this scary red X in Proxmox:

No Quorum screenshot

Your node is basically saying: "I'm in the minority, so I'm sitting this one out to avoid causing problems!"

💥 When Things Get Aggressive

Nodes take "safety first" to the extreme. If a node running HA services loses contact with the cluster for too long (on the order of a minute), its watchdog might literally reboot it as a precaution - this self-reset is Proxmox's form of fencing!

I learned this the hard way when a brief network hiccup caused one of my nodes to panic and restart. Not fun when you have important VMs running!

You can watch the drama unfold in real-time in your system logs:

System log screenshot
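How quickly Corosync decides a node is unreachable is governed by the totem token timeout in the cluster config. A trimmed excerpt for orientation - the values and cluster name here are illustrative, not a recommendation to change your own settings:

```
# /etc/pve/corosync.conf (totem section, illustrative excerpt)
totem {
  cluster_name: homelab
  config_version: 3
  # token: how long (in ms) corosync waits for the token before
  # declaring a node unreachable and reforming the membership
  token: 3000
  version: 2
}
```

The HA watchdog reboot described above operates on a much longer timescale than this membership timeout, so a sub-second network blip normally just causes log noise, not a reboot.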

🔢 Why Odd Numbers are Your Friend

Here's why everyone recommends an odd number of nodes:

Quorum = ⌊ Total Nodes / 2 ⌋ + 1

Let me break it down with real examples:

| Nodes | Can Survive | Why? |
| --- | --- | --- |
| 3 nodes | 1 failure | 2 remaining > 1.5 ✅ |
| 4 nodes | 1 failure | a 2nd failure leaves 2 = 2 ❌ Risk of a 2v2 split! |
| 5 nodes | 2 failures | 3 remaining > 2.5 ✅ |

The takeaway?
Even numbers = potential 50/50 splits = bad times

Stick with 3, 5, or 7 nodes for a happier cluster life!
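You can see the even-number penalty by tabulating failure tolerance (total nodes minus the quorum threshold) - a quick throwaway loop, nothing Proxmox-specific:

```shell
#!/bin/bash
# Failure tolerance per cluster size: survives = nodes - quorum threshold.
# Note that 3 and 4 nodes both tolerate just 1 failure -
# the 4th node adds cost without adding resilience.
for n in 2 3 4 5 6 7; do
  q=$(( n / 2 + 1 ))
  echo "nodes=$n quorum=$q survives=$(( n - q )) failure(s)"
done
```

Every even step (2, 4, 6) tolerates no more failures than the odd size below it, which is the whole argument for 3, 5, or 7.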


🏢 The "Enterprise-Grade" Setup (aka Overkill for Most of Us)

If you're running mission-critical stuff, here's what the pros recommend:

  • Redundant dedicated networks for Corosync
  • Separate physical switches just for cluster traffic
  • Multiple NICs on each node
  • Basically, treat Corosync traffic like it's made of gold

For a homelab? Yeah... probably not happening. But it's good to know what "best practice" looks like!
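In config terms, "redundant dedicated networks" means giving each node more than one Corosync link. A hypothetical nodelist entry with two links might look like this (node name and addresses are made up for illustration; in practice you'd set this up with `pvecm create --link0/--link1` or by editing the config very carefully):

```
# /etc/pve/corosync.conf (nodelist excerpt, hypothetical addresses)
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.0.1     # link 0: dedicated cluster network
    ring1_addr: 192.168.1.21  # link 1: fallback over the regular LAN
  }
  # ...one node block per cluster member...
}
```

With two links, Corosync can fail over cluster traffic if the dedicated network dies, which is exactly the redundancy the enterprise checklist is after.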

🏡 The Realistic Homelab Approach

Here's what I'm actually running (and it works fine!):

Homelab network diagram

Everything goes through a single NIC per node - management, VM traffic, Corosync, the works. Is it perfect? Nope. Does it work? Absolutely!

⚠️ Watch Out For These Gotchas:

  1. Network Saturation

    • Don't try to migrate VMs while uploading ISOs while backing up while... you get it
    • I've definitely made my cluster unhappy by being too ambitious with simultaneous transfers
  2. Cheap Switches

    • That $20 switch might save money but could cause random cluster hiccups
    • Invest in something decent if you're having stability issues

My advice? Start simple with single NICs. Only add complexity when you actually hit problems!

🤔 "But I Only Have 2 Nodes!"

A 2-node cluster isn't great for High Availability (since losing one = losing quorum), but it's totally fine if you just want easier management!

The Emergency Recovery Trick

When one node dies in a 2-node cluster, here's your lifeline:

```
# Check if you've lost quorum
$ pvecm status
# Output: Quorum: No 😱

# Tell the surviving node it's now a 1-node cluster
$ pvecm expected 1

# Check again
$ pvecm status
# Output: Quorum: Yes 🎉
```

Pro tip: QDevice to the rescue!
You can also add a QDevice - basically a tiny third voter (like a Raspberry Pi) that breaks ties in 2-node clusters. It's a bit more complex to set up, but worth investigating if you're stuck with 2 nodes long-term.
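The arithmetic behind the QDevice trick is the same majority rule as before - the QDevice adds a vote without adding a full node, so a 2-node cluster becomes a 3-vote one:

```shell
#!/bin/bash
# Why a QDevice fixes 2-node clusters: it adds a vote, not a node.
two_node=2
with_qdevice=3
echo "2 nodes:           quorum=$(( two_node / 2 + 1 )); losing a node leaves 1 vote -> stuck"
echo "2 nodes + QDevice: quorum=$(( with_qdevice / 2 + 1 )); losing a node leaves 2 votes -> still quorate"
```

With 3 total votes the threshold stays at 2, so either node can die and the survivor plus the QDevice still form a majority.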

💭 Final Thoughts

So that's what I've learned about Corosync - the unsung hero keeping our Proxmox clusters from descending into chaos!

The TL;DR:

  • Understand Quorum (majority rules!)
  • Keep your network stable (especially latency)
  • Use odd numbers of nodes when possible
  • Don't overthink it for a homelab

The beauty of homelabbing is learning enterprise concepts and then figuring out what actually matters for your setup. You don't need redundant 10Gb networks and enterprise switches - you just need to understand the principles and adapt them to your reality (and budget)!

What's your cluster setup like? Are you running the recommended odd number of nodes, or living dangerously with an even number? Let me know in the comments!

Found this helpful? Drop a ❤️ and follow for more homelab adventures and my DevOps learning journey too! I'm always breaking things and (usually) fixing them, so there's plenty more to come!

