How to Build a High-Performance, Free ELT Pipeline Locally using DuckDB Post date December 3, 2025 Post author By Henry Post categories In airflow, data-engineering, data-science, programming, python
When the System Works but the Data Lies: Notes on Survivorship Bias in Large-Scale ML Pipelines Post date December 1, 2025 Post author By Jeet Mehta Post categories In data-engineering, data-observability, data-quality, distributed-systems, large-scale-ml-pipelines, mlops, production-engineering, survivorship-bias
Stop Making Technical Decisions on Feel Post date November 30, 2025 Post author By Clay Gambetti Post categories In Agile, data-engineering, product-management, product-owner, software engineering
What You Already Know About Big Data Post date November 28, 2025 Post author By PATRICK OKARE Post categories In big-data-analytics, data-engineering, data-veracity, database, etl, everyday-sources-of-data, micro interaction, what-is-big-data
Lessons From The Night I Met Dbt on Databricks Post date November 28, 2025 Post author By PATRICK OKARE Post categories In data-build-tool, data-engineering, databricks, dbt, etl, measuring-customer-sentiment, medallion-architecture, sales-model
What the Heck is dbc? Post date November 27, 2025 Post author By Shawn Gordon Post categories In apache-arrow, apache-arrow-ecosystem, columnar-data-formats, data-engineering, dbc, dbc-installer, duckdb, in-memory-analytics
Designing Reliable API Systems: Exception Handling with Spring Boot’s ControllerAdvice Post date November 27, 2025 Post author By AdiA Post categories In controlleradvice-tutorial, data-engineering, exceptionhandler-example, global-exception-handler-java, java-api, java-microservices-reliability, spring-boot, spring-boot-error-handling
Conversational Analytics: the Next Generation of Data Analysis and Business Intelligence Post date November 27, 2025 Post author By Risharoo Post categories In ai, artificial-intelligence, business-intelligence, data-analysis, data-analytics, data-engineering, data-science, data-visualization
Stop Hacking SQL: How to Build a Scalable Query Automation System Post date November 26, 2025 Post author By Ivan Timonov Post categories In data-engineering, data-quality, query-automation-system, scalable-query-automation, sql, sql-automation, sql-hacking, sql-query-automation-system
Data Quality on Spark, Part 2: Soda Post date November 25, 2025 Post author By Ivan Kurchenko Post categories In data-engineering, data-quality, python, soda, spark
Google & Yale Turned Biology Into a Language Here’s Why That’s a Game-Changer for Devs Post date November 22, 2025 Post author By GlobalHawk Post categories In ai, bioinformatics, data-engineering, deep-tech, Google, hackernoon-top-story, llm, yale-ai-research
Improving RAG with Hierarchies and Content Assembly Post date November 21, 2025 Post author By Andre Rabold Post categories In data-engineering, genai, knowledge-graph, large-language-models, retrieval-augmented-gen
Final Project Report 2| Apache SeaTunnel Adds Metalake Support Post date November 20, 2025 Post author By William Guo Post categories In apache-seatunnel, bigdata, data-engineering, data-science, metalake, open-source-development, opensource, plugin-architecture
Make Your Data Pipelines 5X Faster with Adaptive Batching Post date November 20, 2025 Post author By LJ Post categories In adaptive-batching, ai-pipeline-scaling, automatic-batching-system, cocoindex, data-engineering, gpu-optimization, rust-engine-ai, sentence-transformers-batching
If Data Is the New Oil, We Already Built a Planet-Sized Spill Post date November 6, 2025 Post author By Carl Watts Post categories In archive-ai, data-engineering, digital-disposophobia, digital-transformation, meta-data-operations, saving-bits, semantic-storage, thought-experiment
Why Your GenAI Strategy Demands an All-Inclusive Data Modernization Post date November 6, 2025 Post author By Rudrendu Paul Post categories In ai-strategy, business-logic, cloud-data-migration, cloud-migration, data-engineering, enterprise-data-modernization, generative-ai, legacy-system-modernization
Beyond Data: The Rising Need for AI Security Post date November 4, 2025 Post author By Sarath Chandra Vidya Sagar Machupalli Post categories In ai-security, architecture, big-data, cybersecurity, data-engineering, data-privacy, data-security, techxchange
The Observability Debt Hypothesis: Why Perfect Dashboards Still Mask Failing Systems Post date November 3, 2025 Post author By Jeet Mehta Post categories In data-engineering, devops, distributed-systems, monitoring, observability, site-reliability-engineering, software-reliability, system-design
Why Multimodal AI Broke the Data Pipeline — And How Daft Is Beating Ray and Spark to Fix It Post date November 3, 2025 Post author By YK Sugi Post categories In ai-data-processing, aiops, daft-vs-ray-data, data-engineering, distributed-systems, mlops, multimodal-ai, ray-data-performance
AI Native Data Pipeline – What Do We Need? Post date November 2, 2025 Post author By LJ Post categories In ai, ai-native-data-pipeline, cocoindex, data-engineering, data-for-ai, data-ownership, data-pipeline, what-is-cocoindex
A Hands-On Guide to Building the Speed Layer of the Lambda Architecture Post date October 31, 2025 Post author By RisingWave Labs Post categories In data-engineering, kafka, lambda, risingwave, stream-processing
How to Extract and Embed Text and Images from PDFs for Unified Semantic Search Post date October 27, 2025 Post author By LJ Post categories In ai, cocoindex, data-engineering, multimodal-search, pdf-indexing, qdrant, semantic-search, text-and-image-embeddings
Python Script to Read and Judge 1,500 Legal Cases Post date October 20, 2025 Post author By GlobalHawk Post categories In Automation, data-engineering, etl-pipeline, legal-tech, negation-handling, python, real-world-nlp-applications, web-scraping
Synchronizing Data from MySQL to PostgreSQL Using Apache SeaTunnel Post date October 20, 2025 Post author By William Guo Post categories In apache-seatunnel, data-engineering, data-science, data-sync, hackernoon-top-story, mysql, postgresql, real-time-etl
The Data Infrastructure Behind Every Successful AI Startup Post date October 6, 2025 Post author By Muhammad Usman Post categories In ai, ai-development, ai-startups, bright-data, data-engineering, data-infrastructure, data-pipelines, large-language-models
Building a Modern Dashboard with Python and Tkinter Post date September 28, 2025 Post author By Thomas Reid Post categories In data-engineering, data-visualization, programming, python, technology
A Developer’s Guide to DolphinScheduler 3.1.9 Worker Startup Process Post date September 26, 2025 Post author By William Guo Post categories In apache-dolphinscheduler, bigdata, Code Analysis, data-engineering, data-science, dolphinscheduler-3.1.9, opensource, workflow-orchestration
Risks of Synthetic Data: A Technical Overview Post date September 24, 2025 Post author By Dr.Ahmed Gamal Post categories In artificial-intelligence, data-engineering, data-science, machine-learning-risks, synthetic-data
From “Decentralized” to “Unified”: SUPCON Uses SeaTunnel to Build an Efficient Data Collection Frame Post date September 22, 2025 Post author By William Guo Post categories In apacheseatunnel, bigdata, cdc, data-engineering, data-sync, hackernoon-top-story, high-availability, supcon
Unified Data, Smarter Agents—Is Your Architecture Future-Proof? Post date September 17, 2025 Post author By Saradha Post categories In ai, aws, Azure, big-data-analytics, data, data-engineering, etl, product
Debugging Data Pipelines Shouldn’t Be a Guessing Game Post date September 16, 2025 Post author By Edward van Eechoud Post categories In data-analytics, data-engineering, data-science, open source, python
The Price of BigQuery and the True Cost of Being Data-Driven Post date September 1, 2025 Post author By maximzltrv Post categories In cost-optimization-bigquery, data-engineering, data-platform, data-warehouse, data-warehouse-build, dwh-for-machine-learning, Google Cloud Platform, modern-dwh-architecture
The Terminal That Changed My ETL Career Forever Post date August 28, 2025 Post author By Girish Dhamane Post categories In command-line, data-engineering, etl, linux, unix
AI Just Learned to Feel Embarrassed (And It’s Hilariously Human) Post date August 19, 2025 Post author By Girish Dhamane Post categories In ai, artificial-intelligence, data-engineering, humor, technology
I Accidentally Became a Full-Stack Developer (And I Only Know SQL) Post date August 19, 2025 Post author By Girish Dhamane Post categories In careers, data-engineering, devops, full-stack-developer, technology
Inside the Bonkers DIY Project to Corral Every Gadget Rumor on Earth Post date August 14, 2025 Post author By Bill Anderson Post categories In data-engineering, dev-blog, hackernoon-top-story, machine-learning, news-aggregator, python, rust, tech-zeitgeist-machine
Control Processing Concurrency for Large Scale RAG Pipelines in Production Post date August 12, 2025 Post author By LJ Post categories In big-data-performance, cocoindex, concurrency-control, data-engineering, rag, rag-architecture, scalable-pipelines, workflow-optimization
From Wrangling Code to Taming Chaos: How Being a Software Engineer Made Me a Better Operator Post date August 8, 2025 Post author By Charles Wong Post categories In bizops, data-engineering, engineering-management, north-star-metrics, objective-functions, product-development, product-management, software-development
How I Built a Bulletproof ETL Pipeline with Data Validation That Business Teams Actually Trust Post date August 6, 2025 Post author By Sriram Murali Post categories In apache-spark, big-data, data-engineering, databricks, etl-pipeline
Plug, Play, and Ship: Modular Pipelines Get a Major Upgrade Post date August 1, 2025 Post author By LJ Post categories In ai, ai-lego-bricks, cocoindex, cocoindex-custom-targets, custom-targets, data-engineering, modular-pipelines, python
How Tripadvisor Delivers Real-Time Personalization at Scale with ML Post date July 22, 2025 Post author By ScyllaDB Post categories In data-engineering, good-company, microservices-architecture, real-time-ml-models, scylladb-aws, travel-recommendations, tripadvisor-personalization, visitor-platform
The HackerNoon Newsletter: AI Race With China Risks Undermining Western Values (7/17/2025) Post date July 17, 2025 Post author By Noonification Post categories In ai, artificial-intelligence, data-engineering, hackernoon-newsletter, latest-tect-stories, nextjs, noonification, stablecoin
Redefining Data Operations With Data Flow Programming in CocoIndex Post date July 17, 2025 Post author By LJ Post categories In ai, building-data-applications, cocoindex, data-engineering, data-flow-in-cocoindex, data-flow-programming, data-operations, hackernoon-top-story
UUIDs in Python: Use Cases & How to Speed Them Up ⚡ Post date July 16, 2025 Post author By Eric Narro Post categories In data-engineering, database, python, uuid, uuid-generator-python
Turn Your PDF Library into a Searchable Research Database with 100 Lines of Code Post date July 11, 2025 Post author By LJ Post categories In ai, ai-academic-search-engine, data-engineering, hackernoon-top-story, llm-based-paper-parsing, pdf-metadata-extraction, research-paper-indexing-tool, semantic-search-for-pdfs
Building an End-to-End Data Pipeline in the Google Cloud Post date June 6, 2025 Post author By Angela Niederberger Post categories In bigquery, data-engineering, dataform, Google Cloud Platform, kestra
How I Think About Handling Updates in Indexing Pipelines Post date May 28, 2025 Post author By LJ Post categories In ai, data-engineering, data-processing, data-science, etl, how-to-handle-updates, indexing-pipelines, streaming
AWS Regions and Availability Zones: A Useful Guide for Beginners Post date April 29, 2025 Post author By luminousmen Post categories In aws, aws-regions, cloud, data, data-engineering, devops, hackernoon-top-story, what-is-an-aws-region
Accelerating Data Engineering Pipelines Post date April 24, 2025 Post author By Dr.Ahmed Gamal Post categories In ai, data, data-engineering, data-science, pipeline
Synthetic Data Generation of Singular Structured Tabular Data: SDV vs LLM Post date April 9, 2025 Post author By Lu Zhenna Post categories In data-engineering, data-science, genai, generative-ai-tools, synthetic-data
Keep Your Indexes Fresh With This Real-time Pipeline Post date April 7, 2025 Post author By LJ Post categories In ai, Continuous Integration, data-engineering, data-indexing, data-science, etl-tools, live-data-sync, real-time-data
Time Travel Queries: The Data Time Machine Fixing Bugs & Solving Disputes in Real-Time Systems Post date March 28, 2025 Post author By RisingWave Labs Post categories In case-study, data-engineering, database-design, fintech, streaming-data-processing
Stop Moving Data Manually—Let DolphinScheduler’s Output Variables Do the Heavy Lifting For You Post date March 20, 2025 Post author By William Guo Post categories In apache-dolphinscheduler, data-engineering, dolphinscheduler-guide, opensource, programming, shell-scripts, technical-writing, workflow-orchestration
Data Transformation and Discretization: A Comprehensive Guide Post date March 12, 2025 Post author By Aleeza Adnan Post categories In data-engineering, data-mining, data-preparation, data-preprocessing, data-science, data-transformation, discretization, normalization
Trino 471: When SQL Meets AI (and S3 Gets Easier) Post date March 7, 2025 Post author By Shashank Mayya Post categories In ai, apache-iceberg, aws, data-engineering, sql