🛡️ Data quality, SQL, duckdb and http_client on CI🦆

💭 CI, duckdb et and data protection

To efficiently yet effortlessly manage data quality, we created a GitHub Action to install duckdb :

🦆 Effortless Data Quality w/duckdb on Git…


This content originally appeared on DEV Community and was authored by adriens

💭 CI, duckdb et and data protection

To efficiently yet effortlessly manage data quality, we created a GitHub Action to install duckdb :

... but recently I had to face an another challenge : as part of our CI, I had the need to validate data... that were relying on web resources.

I needed to be sure that a GitHub Account was really existing (for example to avoid typos) as part of our CI.

In this very short article, I'll show how to use DuckDB with the http_client extension to verify GitHub handles stored in a table, for example to lint data as part of a CI pipeline thanks to GitHub Duckdb Action... and do the job with a very simple SQL script and CHECK constraints.

🍿 For impatients

🔖 Resources

🕹️ Do the job with SQL

First install extension:

INSTALL http_client FROM community;
LOAD http_client;

Then create a table with some example data:

create or replace table person_gh_member
(
  sam_accountname varchar primary key,
  gh_member varchar not null
);

-- insert rows
INSERT INTO person_gh_member (sam_accountname, gh_member)
values
('adriens', 'adriens'), 
('jdoe', 'johndoe'), 
('asmith', 'annasmithRRRRR');

Finally, run a query to check the status of each GitHub handle:

create or replace view v_person_gh_status as
    select 
    sam_accountname,
    gh_member,
    'https://github.com/' || gh_member as gh_url,
    cast(http_get(gh_url).status as integer) as http_gh_status
    from person_gh_member;
    --where gh_status <> 200;

from v_person_gh_status;

Now, to lint, we can add a check to see if any status is not 200:

from v_person_gh_status
    where http_gh_status <> 200;
create or replace table lint_gh_handle(
    gh_handle varchar primary key,
    gh_url varchar not null unique check (gh_url like 'https://github.com/%'),
    gh_status integer check (gh_status = 200)
);

insert into lint_gh_handle(
            gh_handle,
            gh_url,
            gh_status)
select
    gh_member,
    'https://github.com/' || gh_member as gh_url,
    cast(http_get(gh_url).status as integer) as gh_status
from person_gh_member;


This content originally appeared on DEV Community and was authored by adriens


Print Share Comment Cite Upload Translate Updates
APA

adriens | Sciencx (2025-09-08T04:02:44+00:00) 🛡️ Data quality, SQL, duckdb and http_client on CI🦆. Retrieved from https://www.scien.cx/2025/09/08/%f0%9f%9b%a1%ef%b8%8f-data-quality-sql-duckdb-and-http_client-on-ci%f0%9f%a6%86/

MLA
" » 🛡️ Data quality, SQL, duckdb and http_client on CI🦆." adriens | Sciencx - Monday September 8, 2025, https://www.scien.cx/2025/09/08/%f0%9f%9b%a1%ef%b8%8f-data-quality-sql-duckdb-and-http_client-on-ci%f0%9f%a6%86/
HARVARD
adriens | Sciencx Monday September 8, 2025 » 🛡️ Data quality, SQL, duckdb and http_client on CI🦆., viewed ,<https://www.scien.cx/2025/09/08/%f0%9f%9b%a1%ef%b8%8f-data-quality-sql-duckdb-and-http_client-on-ci%f0%9f%a6%86/>
VANCOUVER
adriens | Sciencx - » 🛡️ Data quality, SQL, duckdb and http_client on CI🦆. [Internet]. [Accessed ]. Available from: https://www.scien.cx/2025/09/08/%f0%9f%9b%a1%ef%b8%8f-data-quality-sql-duckdb-and-http_client-on-ci%f0%9f%a6%86/
CHICAGO
" » 🛡️ Data quality, SQL, duckdb and http_client on CI🦆." adriens | Sciencx - Accessed . https://www.scien.cx/2025/09/08/%f0%9f%9b%a1%ef%b8%8f-data-quality-sql-duckdb-and-http_client-on-ci%f0%9f%a6%86/
IEEE
" » 🛡️ Data quality, SQL, duckdb and http_client on CI🦆." adriens | Sciencx [Online]. Available: https://www.scien.cx/2025/09/08/%f0%9f%9b%a1%ef%b8%8f-data-quality-sql-duckdb-and-http_client-on-ci%f0%9f%a6%86/. [Accessed: ]
rf:citation
» 🛡️ Data quality, SQL, duckdb and http_client on CI🦆 | adriens | Sciencx | https://www.scien.cx/2025/09/08/%f0%9f%9b%a1%ef%b8%8f-data-quality-sql-duckdb-and-http_client-on-ci%f0%9f%a6%86/ |

Please log in to upload a file.




There are no updates yet.
Click the Upload button above to add an update.

You must be logged in to translate posts. Please log in or register.