Spark and PySpark: Redefining Distributed Data Processing Post date August 29, 2025 Post author By Manasvi Arya Post categories In apache-spark, distributed-data-processing, good-company, pyspark, python, python-pyspark, python-spark, sruthi-erra-hareram
How I Built a Bulletproof ETL Pipeline with Data Validation That Business Teams Actually Trust Post date August 6, 2025 Post author By Sriram Murali Post categories In apache-spark, big-data, data-engineering, databricks, etl-pipeline
The HackerNoon Newsletter: Complete Gemini CLI Setup Guide for Your Terminal (7/13/2025) Post date July 13, 2025 Post author By Noonification Post categories In access-control, ai, ai-agent, ai-moderation, apache-spark, data-breaches, Gemini, hackernoon-newsletter, latest-tect-stories, noonification, web3
The HackerNoon Newsletter: A Data Engineer's Guide to PyIceberg (7/6/2025) Post date July 6, 2025 Post author By Noonification Post categories In ai, ai-moderation, apache-spark, future-of-ai, gtm-strategies, hackernoon-newsletter, hackernoon-sia-partnership, latest-tect-stories, noonification, pyiceberg, startups
How to Write Complex Queries in Apache Spark SQL Using CTE (WITH Clause) Post date June 29, 2025 Post author By Islam Elbanna Post categories In apache-spark, apache-spark-sql, common-table-expressions, cte-with-clause, spark-cte-chaining, spark-sql-with-clause, sql-modular-queries, sql-reusable-queries
How to Fix Data Skew in Apache Spark with the Salting Technique Post date June 27, 2025 Post author By Islam Elbanna Post categories In apache-spark, big-data, data-skew-issues, pyspark, salting, salting-benefits, scala, when-yo-use-salting
Orchestrating Airflow DAGs with GitHub Actions – A Lightweight Approach to Data Curation Across Spa Post date October 25, 2024 Post author By Alex Merced Post categories In airflow, airflow-deployment, apache-spark, dbt, dremio, github, github-actions, snowflake
What The Heck is Apache Polaris? Post date September 11, 2024 Post author By Shawn Gordon Post categories In apache-iceberg, apache-polaris, apache-polaris-explained, apache-spark, data-space, databricks, snowflake, what-is-apache-polaris
Accelerating Write-Intensive Data Workloads on AWS S3 Post date September 10, 2021 Post author By Bin Fan Post categories In apache-spark, aws-s3, caching, cloud, data-orchestration, performance, software-development, storage
Share Large Amounts of Live Data With Delta Sharing and Docker Post date September 3, 2021 Post author By Frank Munz Post categories In apache-spark, delta-lake, linux-foundation, machine-learning, open source, pandas, programming, python
How to Authenticate Kafka Using Kerberos (SASL), Spark, and Jupyter Notebook Post date July 19, 2021 Post author By Artem Gogin Post categories In apache-spark, jupyter-notebook, kafka, kerberos, programming, pyspark, spark, spark-streaming
Analyzing Dogecoin Tweet Sentiment in Real Time Post date May 25, 2021 Post author By Merlin Post categories In apache-kafka, apache-spark, cryptocurrency, data-analytics, dogecoin, real-time-processing, stream-processing, twitter-sentiment-analysis
Introduction to Delight: Spark UI and Spark History Server Post date May 8, 2021 Post author By Jean-Yves "JY" Stephan Post categories In apache-spark, big-data, data-engineering, data-science, monitoring, open source, spark-history-server, spark-ui
Apache Spark Ecosystem Post date April 30, 2021 Post author By Anello Post categories In apache, apache-spark, big-data, data, spark
The DeltaLog: Fundamentals of Delta Lake [Part 2] Post date March 18, 2021 Post author By Adi Polak Post categories In apache-spark, beginners-guide, big-data-engineer, data-engineering, delta-lake, delta-lake-fundamentals, deltalog, hackernoon-top-story
ACID Transactions: Fundamentals of Delta Lake – Part 1 Post date March 6, 2021 Post author By Adi Polak Post categories In acid-transactions-delta-lake, apache-spark, big-data, delta-lake, delta-lake-fundamentals, deltalog-acid-transactions, hackernoon-top-story, scala