Transforming and Querying JSON Data in AWS S3 with Glue and Athena


AWS Glue is an AWS service that helps discover, prepare, and integrate data at any scale. It can aid in transforming, discovering, and integrating data from multiple sources.

In this blog, we have a data source that periodically uploads JSON files to an Amazon S3 bucket named uploads-bucket. These JSON files contain log entries.

Below is an example of one of the log-entry JSON files being uploaded to the Amazon S3 bucket named uploads-bucket.

{
  "timestamp": "2025-10-04T10:15:30Z",
  "level": "ERROR",
  "service": "authentication",
  "message": "Failed login attempt",
  "user": {
    "id": "4626",
    "username": "johndoe",
    "ip_address": "192.168.1.101"
  },
  "metadata": {
    "session_id": "abc123xyz",
    "location": "Kathmandu"
  }
}
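
Before building the pipeline, it helps to see how such a nested record flattens into a single tabular row, since that is the shape we will later write to Parquet. A quick sketch with pandas (the file name is hypothetical; any copy of the sample above works):

import json
import pandas as pd

# Load the sample record shown above and flatten the nested fields
with open("sample_log.json") as f:
    record = json.load(f)

df = pd.json_normalize(record)
print(list(df.columns))
# ['timestamp', 'level', 'service', 'message', 'user.id',
#  'user.username', 'user.ip_address', 'metadata.session_id',
#  'metadata.location']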

Step 1: Creating the Processing Lambda Function

Now we will implement an AWS Lambda function that processes Amazon S3 object-created events for these JSON files. The function will:

  1. Transform (flatten) the JSON data.
  2. Write it to a second Amazon S3 bucket named target-bucket in Parquet format.
  3. Catalog the data in AWS Glue for querying.

import json
import boto3
import awswrangler as wr
import pandas as pd

GLUE_DATABASE = "log_database"
TARGET_BUCKET = "target-bucket"

def parse_event(event):
    # EventBridge S3 object-created structure
    key = event['detail']['object']['key']
    bucket = event['detail']['bucket']['name']
    return key, bucket

def read_object(bucket, key):
    s3 = boto3.resource('s3')
    obj = s3.Object(bucket, key)
    return obj.get()['Body'].read().decode('utf-8')

def create_database():
    # wr.catalog.databases() returns a DataFrame whose database names
    # are in the "Database" column
    databases = wr.catalog.databases()
    if GLUE_DATABASE not in databases['Database'].values:
        wr.catalog.create_database(GLUE_DATABASE)
        print(f"Database {GLUE_DATABASE} created")
    else:
        print(f"Database {GLUE_DATABASE} already exists")

def lambda_handler(event, context):
    key, bucket = parse_event(event)
    object_body = read_object(bucket, key)

    create_database()

    log_entry = json.loads(object_body)

    # Flatten JSON
    log_df = pd.json_normalize(log_entry)

    # Create partition columns for Glue
    log_df['log_level'] = log_df['level']
    log_df['service_name'] = log_df['service']

    # Write to target-bucket as Parquet and register/update the "logs"
    # table in the Glue Data Catalog (columns cast to string for simplicity)
    wr.s3.to_parquet(
        df=log_df.astype(str),
        path=f"s3://{TARGET_BUCKET}/data/logs/",
        dataset=True,
        database=GLUE_DATABASE,
        table="logs",
        mode="append",
        partition_cols=["log_level", "service_name"],
        description="Application log table",
        parameters={
            "source": "Application",
            "class": "log"
        },
        columns_comments={
            # awswrangler sanitizes column names when cataloging, so the
            # dotted json_normalize names (e.g. user.id) become underscored
            "timestamp": "Time of the log event",
            "level": "Log level (INFO, WARN, ERROR, etc.)",
            "service": "Service generating the log",
            "message": "Log message",
            "user_id": "User identifier",
            "user_username": "Username"
        }
    )
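
For a quick local sanity check of the event parsing, we can invoke the helper with a minimal EventBridge-style payload. This is a trimmed sketch (a real Object Created event also carries fields such as version, id, time, and region, and the object key below is hypothetical):

# Minimal EventBridge "Object Created" payload for a local smoke test
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "uploads-bucket"},
        "object": {"key": "logs/2025-10-04-log.json"},  # hypothetical key
    },
}

key, bucket = parse_event(sample_event)
assert (key, bucket) == ("logs/2025-10-04-log.json", "uploads-bucket")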

We need to attach the necessary permissions to the Lambda function's execution role. The function needs read access to uploads-bucket, write access to target-bucket, and access to the AWS Glue Data Catalog, in addition to the usual CloudWatch Logs permissions for Lambda.
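
A rough sketch of the policy statements for the execution role (bucket names match this post; the Glue actions listed are a reasonable starting set and may need adjusting for your account):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::uploads-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::target-bucket",
        "arn:aws:s3:::target-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:CreateDatabase",
        "glue:GetTable",
        "glue:CreateTable",
        "glue:UpdateTable",
        "glue:GetPartitions",
        "glue:BatchCreatePartition"
      ],
      "Resource": "*"
    }
  ]
}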

Step 2: Creating an Amazon EventBridge Rule

Now we will create an Amazon EventBridge rule to invoke our AWS Lambda function whenever an object is uploaded to the Amazon S3 bucket. Note that uploads-bucket must have EventBridge notifications turned on (bucket Properties → Event notifications → Amazon EventBridge → On) for these events to be delivered. We will create a new Amazon EventBridge rule as follows:

  • Name: s3-uploads
  • Event bus: default
  • Rule type: Rule with an event pattern

(Screenshot: creating the Amazon EventBridge rule)

Scroll down to the Event pattern section, and enter and select:

  • Event source: AWS services
  • AWS service: S3
  • Event type: Object Created
  • Specific bucket(s) by name: uploads-bucket

(Screenshot: specifying the S3 bucket)

Target configuration:

  • Target type: AWS service
  • Select a target: Lambda function
  • Function: select our Lambda function

(Screenshot: selecting the Lambda function as the target)

The event pattern JSON here will be:

{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": {
      "name": ["uploads-bucket"]
    }
  }
}
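
If uploads-bucket also receives unrelated files, the same pattern can be narrowed with EventBridge content filtering, for example matching only keys under a logs/ prefix (the prefix itself is hypothetical):

{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": {
      "name": ["uploads-bucket"]
    },
    "object": {
      "key": [{ "prefix": "logs/" }]
    }
  }
}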

Step 3: Adding a Partition Index to the logs Table Using AWS Glue

When the Lambda function we created runs, it will create a table named logs in the Glue database (log_database in our code).

In the AWS Console, go to the Glue dashboard.

  • To see databases, in the left-hand menu, under Data Catalog, click Databases. We will find the database our Lambda function created in this list. Click on this database.
  • Click on the logs table. The table data is stored in the target Amazon S3 bucket named target-bucket.
  • We will now add a partition index to this table for faster query performance.
  • To begin creating an index, click Add index.
  • Enter and select the following to configure the index:
      Index name: log_level_index
      Index keys: log_level
    Then click Update. (A programmatic alternative with boto3 is sketched after this list.)
  • We can also view this data using Amazon Athena: at the top of the page, click Actions and then View data.
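
As an alternative to the console steps above, the same partition index can be created programmatically; a minimal sketch with boto3's Glue client, using the database and table created by our Lambda function:

import boto3

glue = boto3.client("glue")

# Create the same partition index on the logs table via the Glue API
glue.create_partition_index(
    DatabaseName="log_database",
    TableName="logs",
    PartitionIndex={
        "Keys": ["log_level"],
        "IndexName": "log_level_index",
    },
)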

Step 4: Searching Within Your Indexed Amazon S3 Data

With the index now in place, we can query efficiently using log_level as a filter.

First, open the Athena console, open the query editor, and select our database. (If this is your first time using Athena, set a query result location in Amazon S3 under Settings first.)

Example query: Retrieve all ERROR logs from the authentication service:

SELECT "timestamp", service, message, user_username, metadata_location
FROM logs
WHERE log_level = 'ERROR'
  AND service_name = 'authentication';
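
The same query can also be run from Python with awswrangler, which submits it to Athena and returns the result as a pandas DataFrame (a sketch, assuming AWS credentials and an Athena query result location are already configured):

import awswrangler as wr

# Run the Athena query and fetch the result as a DataFrame
errors_df = wr.athena.read_sql_query(
    sql="""
        SELECT "timestamp", service, message, user_username, metadata_location
        FROM logs
        WHERE log_level = 'ERROR'
          AND service_name = 'authentication'
    """,
    database="log_database",
)
print(errors_df.head())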

In conclusion, we can combine AWS Lambda, Amazon S3, AWS Glue, and Athena to build a fully serverless, scalable data pipeline that transforms, catalogs, and queries log data in near real time. Leveraging Glue partitions and partition indexes allows for efficient storage and faster analytics, with Athena enabling ad-hoc queries without the overhead of managing traditional databases.

