This content originally appeared on HackerNoon and was authored by Ali Pourbazargan
The Disaster That Changed Everything
In October 2019, we made a critical mistake that led to 10,000 important files disappearing overnight. It was a disaster—one that could have ruined our business. But five years later, that same experience saved our new company from an even bigger crisis.
This is a story about data loss, misconfigurations, and the hard lessons that led us to build a bulletproof backup system. If you're running a system that stores critical data, it could help you avoid making the same mistakes.
Background: How Gama Stored Data
Gama (gama.ir) is a K-12 educational content-sharing platform launched in 2014 in Iran, with over 10 million users worldwide. It provides services such as:
- Past Papers
- Tutorials & Learning Resources
- Online Exams & School Hub
- Live Streaming & Q&A Community
- Tutoring Services
Since our content was user-generated, secure file storage was a top priority. We used MooseFS, a distributed file system, running on five nodes with triple replication for redundancy.
Our Backup Strategy
Our backup strategy was a single external HDD holding a copy of every file. It worked fine, and we rarely needed it. But then we made a dangerous assumption.
The Migration That Led to Disaster
One of our engineers suggested migrating to GlusterFS, a better-known distributed file system. It sounded great: more scalability, wider adoption, and seemingly better performance. After weighing the cost-benefit tradeoff, we decided to switch.
Two months later, the migration was complete. Our team was thrilled with the new system. Everything seemed stable… until it wasn't.

There was just one small problem: our backup HDD was 90% full, and we needed to make a decision.
The Mistake
Because we had never really needed our full backups before, we assumed GlusterFS was reliable enough.
We removed our old backup strategy and trusted GlusterFS replication.

That was a bad decision.
The Day Everything Went Wrong
Two months later, one morning, we started receiving reports: some files were missing.
At first, we thought it was a minor network glitch. But as we dug deeper, we found Gluster reporting missing chunks and sync errors.
- Files were disappearing.
- More and more pages were throwing errors.
- It was spreading fast.
The Immediate Response
3:30 AM: We decided to restart the GlusterFS cluster, believing a fresh bootstrap would fix the problem. At first, it seemed to work!
We thought we had solved it.

Then a WhatsApp message from the content team came in:
“The files are empty.”
Wait, what? The files existed, but they contained nothing.

We checked manually. The files still had their size and metadata, but when we opened them, they were completely blank.

10,000 files were gone.
The Backup That Was Useless
We had a backup HDD. That should have saved us, right?
Wrong. During the GlusterFS migration, we had restructured our directory layout: every file now had a new hashed path in the database.

Our old backups were useless because they referenced the old filenames, which we could no longer map to the new hashed paths.

We tried multiple recovery methods. Nothing worked.
In the end, we had to email thousands of users, asking them to re-upload their lost files.

It was a nightmare. But it forced us to rethink everything.
How We Fixed It: Introducing Gama File Keeper (GFK)
After this disaster, we completely redesigned our storage and backup strategy. Our solution had two parts:
1. Gama File Keeper (GFK): A Smarter Storage System
- Every uploaded file is mapped with a checksum, making it trackable even if renamed.
- Instead of hard deletions, files now go through a 3-month soft-delete period before permanent removal.
- Recovery is now instant using checksum-based matching.
2. Backapp: A Multi-Layered Backup Strategy
We no longer rely on a single storage system. Instead, we implemented a three-layered backup strategy:
- Warm Backup (Every 2 Hours): Real-time sync within the same data center.
- Cold Backup (Every 6 Hours): Replicated to a separate data center.
- Offline Backup (Weekly): Stored on physical HDDs in a separate location.
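A minimal sketch of how a scheduler could decide which of the three layers is due, assuming the intervals above. The layer targets are illustrative placeholders, not Backapp's actual configuration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    interval_hours: int
    target: str  # destination URI (illustrative only)


# The three layers from the article.
LAYERS = [
    Layer("warm", 2, "sync://same-datacenter/replica"),
    Layer("cold", 6, "sync://offsite-datacenter/replica"),
    Layer("offline", 7 * 24, "hdd://offsite-vault"),
]


def due_layers(hours_since_last: dict[str, int]) -> list[str]:
    """Return the layers whose interval has elapsed since their last run.

    A layer with no recorded run is treated as due immediately.
    """
    return [
        layer.name
        for layer in LAYERS
        if hours_since_last.get(layer.name, layer.interval_hours)
        >= layer.interval_hours
    ]
```

The key design point is that each layer has its own clock and destination, so a failure in one tier (say, the warm sync) never blocks or corrupts the others.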
Database Backups
- Full backups every 24 hours, stored for 12 months.
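Enforcing the 12-month retention window amounts to pruning dumps older than the cutoff; a small sketch of that policy (the function name is ours, not from Backapp):

```python
from datetime import date, timedelta

RETENTION_DAYS = 365  # full dumps are kept for 12 months


def backups_to_delete(backup_dates: list[date], today: date) -> list[date]:
    """Return the dump dates that have aged out of the retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return sorted(d for d in backup_dates if d < cutoff)
```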
The Real Test: How This System Saved Us in 2025
Fast forward five years. Gamatrain.com, our new business in the UK, faced another rare incident.
But this time, we didn't lose a single file.

Why? Because of the lessons we learned in 2019 and the system we built to prevent a repeat.
Lessons for Every Engineer
- Never trust a single storage system—even if it seems rock solid.
- Backups should be independent, multi-layered, and stored in different locations.
- Disasters will happen. Your resilience depends on how well you prepare for them.
What’s the worst data loss disaster you’ve faced? Share your experience in the comments!
#devops #backupstrategy #datarecovery #engineeringfailures #disasterrecovery

Ali Pourbazargan | Sciencx (2025-03-04T09:12:39+00:00) How We Lost 10,000 Files Overnight—And Built a Bulletproof Backup System. Retrieved from https://www.scien.cx/2025/03/04/how-we-lost-10000-files-overnight-and-built-a-bulletproof-backup-system/