---
title: Shenute AI
emoji: 🦉
colorFrom: yellow
colorTo: blue
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Ancient Egyptian & Coptic translation RAG assistant
---
Shenute AI — Named in honour of Apa Shenute the Archimandrite
A comprehensive RAG (Retrieval-Augmented Generation) assistant for Ancient Egyptian and Coptic translation, grammar, and historical linguistics. Shenute is designed to assist researchers, domain experts, and language enthusiasts by retrieving accurately sourced passages from embedded dictionaries, lexicons, and grammars and placing them directly in the AI's context.
Online URL:
Shenute AI on Hugging Face Spaces
Features:
- Multi-Provider AI Options: Seamlessly switch between Local AI (via Ollama), Google Gemini API, and Hugging Face serverless endpoints.
- RAG-Powered Memory: Deep contextual grounding from trusted Egyptological and Coptological references.
- Automated Ingestion Pipeline: Ingest CCL (Comprehensive Coptic Lexicon) XMLs, Faulkner dictionaries, and pedagogical PDFs into a robust local ChromaDB vector store.
- Coptic OCR Integration: Includes automated fallback OCR (with LLM garble detection) for scanning image-heavy Coptic PDFs, and a "Force OCR" toggle in the UI.
- Hieroglyph Vision Support: Optional support for recognizing and summarizing ancient hieroglyphs using the llava vision model.
Technologies:
- Streamlit: Framework for building the interactive web application.
- LangChain: For RAG orchestration and connecting AI models.
- ChromaDB: Local vector store for document embeddings.
- PyMuPDF: For PDF parsing and text extraction.
- Ollama: For local AI model execution (optional).
- Google Generative AI: For utilizing the Gemini API.
- Hugging Face API: For serverless model inference.
- Docker: For containerizing the application.
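How these pieces fit together: at query time the vector store returns the ingested chunks most similar to the question, and those chunks are prepended to the model's prompt. A minimal, library-free sketch of that retrieval step (toy two-dimensional embeddings and a hand-rolled cosine similarity standing in for ChromaDB and the real embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, store, k=2):
    """Return the texts of the k chunks closest to the query embedding."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item["vec"]), reverse=True)
    return [item["text"] for item in ranked[:k]]

# Toy store: embedding/chunk pairs standing in for ChromaDB entries.
store = [
    {"vec": [1.0, 0.0], "text": "Faulkner: ankh, 'life'"},
    {"vec": [0.0, 1.0], "text": "CCL: noute, 'god'"},
    {"vec": [0.9, 0.1], "text": "Grammar: the sdm.f verb form"},
]
context = retrieve([1.0, 0.1], store, k=2)
prompt = "Answer using only these sources:\n" + "\n".join(context)
```

In the app itself, LangChain wires this retrieval into the chat chain so every answer is grounded in, and can cite, the ingested references.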
Installation:
Clone the repository:
```shell
git clone https://github.com/your-username/Shenute_app.git
cd Shenute_app
```

Create a virtual environment (optional, if not using Docker):

```shell
python3 -m venv .venv
source .venv/bin/activate  # On Windows use .venv\Scripts\activate
```

Install dependencies:

```shell
pip install -r requirements.txt
```

Set environment variables: create a .env file in the root directory:

```shell
# Choose AI Provider: "Local AI", "Gemini API", or "Hugging Face"
AI_PROVIDER=Local AI

# Local AI Configuration
LOCAL_AI_BASE_URL=http://localhost:11434

# Gemini API Configuration
GEMINI_API_KEY=your_gemini_api_key_here

# Hugging Face Configuration
HF_TOKEN=your_huggingface_token_here
```

Run the application:

```shell
streamlit run app.py
```

Alternatively, to run via Docker Compose:

```shell
docker-compose up --build -d
```

Open your browser and go to http://localhost:8501 to interact with the application.
How to Use:
Select or configure an AI Provider:
- Open the sidebar and choose Local AI, Gemini API, or Hugging Face, depending on your configuration.
Ingest Documents:
- Go to the Ingestion page to upload Coptic or Egyptian grammar PDFs/XMLs into the ChromaDB vector store.
- Toggle "Force OCR" if the PDF has legacy unreadable fonts.
Chat and Translate:
- Go to the Chat page to ask translation questions or request grammatical analyses. The AI will retrieve and cite the ingested texts.
Example Usage:
- Local Model: Connect to Ollama locally and ask questions without an internet connection, using weights like llama3.
- PDF Ingestion with OCR: Upload an English-Coptic dictionary. The system will detect garbled legacy fonts using an LLM and automatically route those pages through the custom Coptic OCR API.
Requirements:
- Python 3.11+
- Streamlit
- LangChain (Community, Google GenAI, OpenAI, Ollama, Chroma)
- ChromaDB
- PyMuPDF
- PyTesseract
- Docker (optional)
To install these requirements, run:
pip install -r requirements.txt
Contributing:
Feel free to fork the repository, create a branch, and submit pull requests for improvements or bug fixes.
License:
This project is licensed under the MIT License - see the LICENSE file for details.
Contact Information:
- Your Name Email: your-email@example.com