Transcribe audio or YouTube videos into text
Generate spoken audio from text using Edge TTS
Generate depth maps from your images