Open-Source ETL to prepare data for RAG 🦀 🐍

This content originally appeared on DEV Community and was authored by Linghua


My friend and I have built an open-source ETL framework, CocoIndex, to prepare data for RAG.

🔥 Features:

  • Data flow programming
  • Custom logic: you can plug in your own choice of chunking, embedding, and vector store, and snap in your own logic like Lego bricks. We have three examples in the repo for now. In the long run, we also want to support dedupe, reconcile, etc.
  • Incremental updates: we provide state management out of the box to minimize recomputation. Right now, it checks whether a file from a data source has been updated. In the future, it will work at a finer granularity, e.g., at the chunk level.
  • Python SDK (Rust core 🦀 with Python bindings 🐍)
  • 🔗 GitHub repo: CocoIndex. We'd appreciate your support with a GitHub star ⭐!
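
To make the custom-logic and incremental-update ideas above concrete, here is a minimal sketch in plain Python. It does not use the CocoIndex API: every name in it (`IncrementalIndexer`, `fixed_size_chunker`, `toy_embedder`, `demo`) is hypothetical, the "vector store" is just a dictionary, and detecting an updated file by hashing its content is an assumption made for illustration, not a description of how CocoIndex tracks state.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical plug-in signatures for this sketch (not CocoIndex types).
Chunker = Callable[[str], list[str]]
Embedder = Callable[[str], list[float]]


def fixed_size_chunker(text: str, size: int = 200) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def toy_embedder(chunk: str) -> list[float]:
    """Stand-in embedding: four bytes of a SHA-256 digest scaled to [0, 1]."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]


class IncrementalIndexer:
    """Re-processes a file only when its content fingerprint has changed."""

    def __init__(self, chunker: Chunker, embedder: Embedder) -> None:
        self.chunker = chunker      # swappable chunking logic
        self.embedder = embedder    # swappable embedding logic
        self.fingerprints: dict[str, str] = {}  # path -> content hash (the "state")
        self.store: dict[str, list[tuple[str, list[float]]]] = {}  # toy vector store

    def update(self, paths: list[Path]) -> list[str]:
        """Index new or changed files; return the paths that were recomputed."""
        recomputed = []
        for path in paths:
            text = path.read_text(encoding="utf-8")
            fingerprint = hashlib.sha256(text.encode("utf-8")).hexdigest()
            key = str(path)
            if self.fingerprints.get(key) == fingerprint:
                continue  # unchanged since the last run, so skip the work
            self.store[key] = [(c, self.embedder(c)) for c in self.chunker(text)]
            self.fingerprints[key] = fingerprint
            recomputed.append(key)
        return recomputed


def demo() -> tuple[list[str], list[str], list[str]]:
    """Run three passes over two temp files; return file names recomputed per pass."""
    with tempfile.TemporaryDirectory() as tmp:
        a, b = Path(tmp) / "a.txt", Path(tmp) / "b.txt"
        a.write_text("first document", encoding="utf-8")
        b.write_text("second document", encoding="utf-8")
        indexer = IncrementalIndexer(fixed_size_chunker, toy_embedder)
        first = indexer.update([a, b])    # both files are new
        second = indexer.update([a, b])   # nothing changed
        a.write_text("first document, edited", encoding="utf-8")
        third = indexer.update([a, b])    # only a.txt changed
        return tuple([Path(p).name for p in run] for run in (first, second, third))
```

Swapping in a different chunker or embedder only means passing a different function to the constructor, which is the plug-in idea in miniature. Tracking fingerprints per chunk instead of per file would be the finer-grained variant the last sentence of the incremental-updates bullet points toward.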

We're sincerely looking for feedback and would love to learn from your thoughts. Contributors are welcome too, if you're interested :) Thank you so much!




Linghua | Sciencx (2025-03-09T02:34:26+00:00) Open-Source ETL to prepare data for RAG 🦀 🐍. Retrieved from https://www.scien.cx/2025/03/09/open-source-etl-to-prepare-data-for-rag-%f0%9f%a6%80-%f0%9f%90%8d/
