One idea I have always had in mind is to build an open-source alternative to https://getpocket.com/, a kind of personal knowledge base. The main idea is to allow searching it with natural-language queries.
This is the rough sketch I had in mind. Cleaning/splitting and indexing can easily be covered by Haystack (I am a community maintainer of this project). If anyone is interested, they can contribute connectors to this FOSS library or build a solution on top of it. It also has a Streamlit-powered UI.
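To make the clean/split/index/search flow concrete, here is a minimal sketch in plain Python. It deliberately does not use Haystack's API: `clean`, `split`, `tokenize`, and `TinyIndex` are hypothetical names made up for illustration, and the scoring is simple keyword TF-IDF rather than real natural-language retrieval, which a library like Haystack would replace with a proper retriever.

```python
import math
import re
from collections import Counter


def clean(text):
    """Strip HTML tags and collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split(text, size=50):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy in-memory index (hypothetical stand-in for a real document store)."""

    def __init__(self):
        self.chunks = []            # (source, chunk_text, token_counts)
        self.doc_freq = Counter()   # number of chunks containing each token

    def add(self, source, raw_text):
        """Clean, split, and index one saved page."""
        for chunk in split(clean(raw_text)):
            counts = Counter(tokenize(chunk))
            self.chunks.append((source, chunk, counts))
            self.doc_freq.update(counts.keys())

    def search(self, query, top_k=3):
        """Return up to `top_k` (score, source, chunk) tuples, best first."""
        n = len(self.chunks)
        terms = set(tokenize(query))
        scored = []
        for source, chunk, counts in self.chunks:
            score = sum(
                counts[t] * math.log(1 + n / self.doc_freq[t])
                for t in terms
                if t in counts
            )
            if score > 0:
                scored.append((score, source, chunk))
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]
```

Usage would look something like `index = TinyIndex()`, then `index.add(url, page_html)` for each saved page, then `index.search("some question")`. A connector in this picture is whatever fetches `page_html` from a given source.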