Add markdown files

This commit is contained in:
parent 25a8cd077b
commit 77199541cd

LICENSE (new file, 21 lines)
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 kuraa

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.en.md (new file, 178 lines)
@@ -0,0 +1,178 @@
<picture>
  <source media="(prefers-color-scheme: dark)" srcset="readme/screenshot-main.png">
  <img src="readme/screenshot-main.png" alt="CrossSubtitle-AI Screenshot" width="100%">
</picture>

<div align="center">

# CrossSubtitle-AI

**AI-Powered, Local-First Subtitle Workbench**

[](https://github.com/AndySkaura/crosssubtitle-ai/releases)
[](https://github.com/AndySkaura/crosssubtitle-ai/blob/main/LICENSE)
[](#)

**English** · [简体中文](./README.md)

</div>

---
## About

CrossSubtitle-AI is a **local-first** audio/video subtitle processing tool. It uses [Whisper](https://github.com/ggerganov/whisper.cpp) for speech recognition, [Silero VAD](https://github.com/snakers4/silero-vad) for voice activity detection, and supports OpenAI-compatible APIs for intelligent translation — helping you quickly transcribe and translate media files into bilingual subtitles.

All speech recognition runs locally on your machine. No audio or video files are ever uploaded to any server, ensuring your data privacy.
## Features

- **Speech Recognition** — High-accuracy speech-to-text powered by Whisper, supporting 17 source languages including Chinese, English, Japanese, Korean, French, and more
- **Voice Activity Detection** — Silero VAD precisely splits speech segments and automatically filters out silence
- **Smart Translation** — Connect to any OpenAI-compatible API (GLM, DeepSeek, ChatGPT, etc.) to translate transcripts into your target language
- **Audio Extraction** — Built-in FFmpeg automatically extracts audio and converts it to 16kHz mono WAV
- **Multiple Export Formats** — Export subtitles in SRT, VTT, and ASS formats
- **Bilingual Export** — Export side-by-side original + translated bilingual subtitles
- **Subtitle Editor** — Built-in editor for modifying both source text and translations line by line
- **Drag & Drop** — Drag and drop files to quickly create tasks
- **Task Queue** — Batch process multiple media files with real-time progress tracking
- **Bilingual UI** — Switch between Chinese and English interface languages
- **Local-First** — Speech recognition runs entirely locally, no data upload required
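The 16 kHz mono conversion behind the **Audio Extraction** feature maps onto a standard FFmpeg invocation. A minimal sketch of the equivalent command line, with placeholder file names (the app's internal call may differ):

```python
def extract_audio_args(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command line that extracts 16 kHz mono PCM WAV,
    the input format Whisper and Silero VAD expect."""
    return [
        "ffmpeg",
        "-i", src,            # input media file
        "-vn",                # drop any video stream
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        "-y", dst,            # overwrite the output WAV
    ]

args = extract_audio_args("input.mp4", "audio.wav")
# Run with: subprocess.run(args, check=True)
```

Passing the argument list to `subprocess.run` directly (rather than a shell string) avoids quoting problems with unusual media file names.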
## Workflow

1. **Choose Mode** — Select "Source" for transcription only, or "Translate" mode for automatic translation after transcription
2. **Add Task** — Click "Add Task" or drag-and-drop media files onto the window
3. **Wait for Processing** — Tasks go through: Audio Extraction → VAD Segmentation → Speech Recognition → (Optional) Translation
4. **Review & Edit** — View and modify recognition results and translations in the subtitle editor
5. **Export Subtitles** — Export as SRT, VTT, or ASS format
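The export formats in step 5 differ mainly in cue framing; for example, SRT separates milliseconds with a comma while VTT uses a dot. A small illustrative sketch of the timestamp arithmetic (not the app's actual `subtitle.rs` code):

```python
def fmt_timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (sep is ',' for SRT, '.' for VTT)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02}{sep}{ms:03}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Render one SRT cue: index line, timing line, then the text."""
    return (f"{index}\n"
            f"{fmt_timestamp(start, ',')} --> {fmt_timestamp(end, ',')}\n"
            f"{text}\n")

print(srt_cue(1, 1.5, 3.25, "Hello world"))
# 1
# 00:00:01,500 --> 00:00:03,250
# Hello world
```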
## Screenshots

| Subtitle | Subtitle Editor |
|:---:|:---:|
| ![Subtitle](readme/screenshot-main.png) | ![Subtitle Editor](readme/screenshot-editor.png) |
## Installation

Download the installer for your platform from [GitHub Releases](https://github.com/AndySkaura/crosssubtitle-ai/releases):

| Platform | Package |
|:---:|:---:|
| macOS (Apple Silicon) | `.dmg` |
| Windows | `.exe` (NSIS Installer) |

> The Whisper model (~500MB) will be downloaded on first launch. An internet connection is required.
## Usage

### Quick Start

1. Open the app and select a mode from the top toolbar:
   - **Source** — Speech recognition only, outputs source-language subtitles
   - **Translate** — Transcribes, then translates via an LLM API
2. Click "Add Task" or drag-and-drop files onto the window
3. Wait for processing to complete
4. Review and edit results in the subtitle editor on the right
5. Click "Export" to save subtitles in your preferred format
### Translation Configuration

Before using the translation feature, configure the LLM API:

- Fill in the LLM API Base, API Key, and Model in "Advanced Settings"
- Works with any OpenAI-compatible service, including:
  - **GLM (Zhipu AI)** — GLM-4.7-Flash available for free
  - **DeepSeek**
  - **ChatGPT**
  - **Self-hosted** — Ollama, vLLM, etc.
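All of the services above speak the same Chat Completions request shape, which is why a single API Base / Key / Model triple is enough. A hypothetical sketch of a translation request body (the prompt wording and helper name are illustrative, not taken from the app):

```python
import json

def build_translation_request(model: str, segments: list[str], target_lang: str) -> dict:
    """Assemble a Chat Completions payload asking the model to translate
    numbered subtitle segments, one output line per input line."""
    numbered = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(segments))
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": f"Translate each numbered subtitle line into {target_lang}. "
                        "Keep the numbering and return exactly one line per input line."},
            {"role": "user", "content": numbered},
        ],
    }

payload = build_translation_request("GLM-4.7-Flash", ["Hello.", "How are you?"], "Chinese")
print(json.dumps(payload, ensure_ascii=False)[:80])
```

POSTing this JSON to `<API Base>/chat/completions` with an `Authorization: Bearer <API Key>` header is the standard OpenAI-compatible pattern.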
### Advanced Settings

- **Whisper Model Path** — Path to a local ggml model file
- **VAD Model Path** — Path to a local Silero VAD ONNX model file
- **Batch Size** — Number of segments to translate per batch (10-15)
- **Context Size** — Number of preceding segments to include as context for translation (0-5)
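Batch Size and Context Size act together as a sliding window over the segment list: each batch of segments is sent for translation along with a few segments that precede it, kept as read-only context. A hypothetical sketch of that grouping (the helper name and exact policy are assumptions, not the app's `translate.rs`):

```python
def batches_with_context(segments: list[str], batch_size: int = 10,
                         context_size: int = 2) -> list[tuple[list[str], list[str]]]:
    """Split segments into translation batches, pairing each batch with the
    `context_size` segments that immediately precede it."""
    out = []
    for start in range(0, len(segments), batch_size):
        context = segments[max(0, start - context_size):start]
        out.append((context, segments[start:start + batch_size]))
    return out

batches = batches_with_context([f"seg{i}" for i in range(25)], batch_size=10, context_size=2)
# 3 batches; the second batch carries ["seg8", "seg9"] as context
```

A larger context window tends to improve pronoun and terminology consistency across cue boundaries, at the cost of more prompt tokens per request.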
## Development

### Prerequisites

- [Rust](https://www.rust-lang.org/) toolchain
- [Node.js](https://nodejs.org/) (18+)
- [FFmpeg](https://ffmpeg.org/) (must be available on the command line)
- [CMake](https://cmake.org/) (required for compiling whisper-rs)
### Local Development

```bash
# Clone the repository
git clone https://github.com/AndySkaura/crosssubtitle-ai.git
cd crosssubtitle-ai

# Install frontend dependencies
npm install

# Start development mode
npm run tauri-dev
```
### Build

```bash
# macOS DMG build
npm run tauri-build-dmg

# Windows NSIS build
npm run tauri-build-windows
```
## Tech Stack

| Layer | Technology |
|:---|:---|
| Desktop Framework | [Tauri v2](https://v2.tauri.app/) |
| Frontend | [Vue 3](https://vuejs.org/) + [TypeScript](https://www.typescriptlang.org/) |
| State Management | [Pinia](https://pinia.vuejs.org/) |
| Styling | [Tailwind CSS](https://tailwindcss.com/) |
| Internationalization | [vue-i18n](https://vue-i18n.intlify.dev/) |
| Speech Recognition | [whisper-rs](https://github.com/tazz4843/whisper-rs) (Whisper) |
| Voice Detection | [ort](https://github.com/pykeio/ort) (Silero VAD ONNX) |
| Audio Processing | FFmpeg |
| LLM Translation | OpenAI-compatible API |
## Project Structure

```
src/                 Vue frontend
  components/        UI components (TaskQueue, SubtitleEditor)
  stores/            Pinia state management
  locales/           i18n locale files (zh-CN, en)
  lib/               Type definitions
src-tauri/           Rust backend
  src/
    audio.rs         Audio extraction & WAV reading
    vad.rs           Silero VAD voice activity detection
    whisper.rs       Whisper speech recognition interface
    translate.rs     OpenAI-compatible translation interface
    subtitle.rs      SRT / VTT / ASS export
    task.rs          Task orchestration & event broadcasting
    state.rs         Application state
```
## License

This project is licensed under the [MIT](./LICENSE) License.

## Acknowledgements

- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) — High-performance Whisper inference implementation
- [Silero VAD](https://github.com/snakers4/silero-vad) — High-accuracy voice activity detection
- [Tauri](https://tauri.app/) — Lightweight desktop application framework
- All contributors and users

---

<div align="center">
Made by <a href="https://kuraa.cc">kuraa</a>
</div>
README.md (193 lines changed; deleted lines below are marked with a leading `-`)
@@ -1,61 +1,178 @@
<picture>
  <source media="(prefers-color-scheme: dark)" srcset="readme/screenshot-main.png">
  <img src="readme/screenshot-main.png" alt="CrossSubtitle-AI Screenshot" width="100%">
</picture>

<div align="center">

# CrossSubtitle-AI
- A local-first subtitle workbench based on `Tauri v2 + Vue 3 + Pinia + Tailwind CSS`, covering the following MVP pipeline:
**AI-Powered, Local-First Subtitle Workbench**

- - Import audio/video files and create a task queue
- - Extract 16kHz mono WAV with `ffmpeg`
- - Run basic VAD segmentation and generate a speech-segment timeline
- - Hand segments to the Whisper transcription/translation stage
- - Optionally connect an OpenAI-compatible API to generate Chinese translations
- - Push task progress and subtitle segments in real time
- - Export `SRT / VTT / ASS`
[](https://github.com/AndySkaura/crosssubtitle-ai/releases)
[](https://github.com/AndySkaura/crosssubtitle-ai/blob/main/LICENSE)
[](#)

- ## Directory Structure
[English](./README.en.md) · **简体中文**
- - `src/`: Vue frontend UI, Pinia state, subtitle editor
- - `src-tauri/src/audio.rs`: audio extraction and WAV reading
- - `src-tauri/src/vad.rs`: VAD API and the basic energy-detection implementation
- - `src-tauri/src/whisper.rs`: Whisper interface layer
- - `src-tauri/src/translate.rs`: OpenAI-compatible sliding-window translation
- - `src-tauri/src/subtitle.rs`: SRT / VTT / ASS export
- - `src-tauri/src/task.rs`: task orchestration and event broadcasting

</div>

- ## Current Implementation Notes
---
- - The repository now includes the full project skeleton and the core data flow.
- - The frontend `npm run build` passes, and `cargo check` passes on the Rust side.
- - `whisper.rs` is wired to the real `whisper-rs` and transcribes segment by segment from the VAD output; when the target language is English, Whisper's native `translate` mode is enabled.
- - `vad.rs` is wired to the `ort`-based Silero VAD inference entry point; when the model is missing or inference fails, it automatically falls back to energy detection so the pipeline never breaks.
## About

- ## Preparation Before Running
CrossSubtitle-AI is a **local-first** audio/video subtitle processing tool. It uses [Whisper](https://github.com/ggerganov/whisper.cpp) for speech recognition, combines it with [Silero VAD](https://github.com/snakers4/silero-vad) for voice activity detection, and supports OpenAI-compatible APIs for intelligent translation, helping you quickly transcribe and translate media files into bilingual subtitles.

- 1. Install the Rust toolchain.
- 2. Install `cmake`; `whisper-rs-sys` needs it on first compilation.
- 3. Install `ffmpeg` and make sure it can be invoked directly from the command line.
- 4. Install frontend dependencies:
All speech recognition happens locally; no audio or video files are uploaded to any server, protecting your data privacy.
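The energy-detection fallback described in the old README's implementation notes (used when the Silero VAD model is missing or fails) can be pictured as simple frame-RMS thresholding. An illustrative sketch with arbitrary frame length and threshold, not the app's actual values:

```python
import math

def energy_vad(samples: list[float], frame_len: int = 512,
               threshold: float = 0.02) -> list[bool]:
    """Mark each frame as speech when its RMS energy exceeds a fixed
    threshold, a crude stand-in for model-based VAD."""
    flags = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        flags.append(rms > threshold)
    return flags

silence = [0.0] * 512
tone = [0.1 * math.sin(i / 5) for i in range(512)]
print(energy_vad(silence + tone))  # [False, True]
```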
## Features

- **Speech Recognition** — High-accuracy Whisper-based speech-to-text, supporting 17 source languages including Chinese, English, Japanese, Korean, and French
- **Voice Activity Detection** — Silero VAD precisely splits speech segments and automatically filters out silent regions
- **Smart Translation** — Connect an OpenAI-compatible API (Zhipu GLM, DeepSeek, ChatGPT, etc.) to translate the source text into the target language
- **Audio Extraction** — Built-in FFmpeg automatically extracts audio and converts it to 16kHz mono WAV
- **Multiple Export Formats** — Export subtitles as SRT, VTT, or ASS
- **Bilingual Export** — Export bilingual subtitles with the original and the translation side by side
- **Subtitle Editor** — Built-in editor for line-by-line edits to both source text and translation
- **Drag & Drop** — Drag files in to quickly create tasks
- **Task Queue** — Batch-process multiple media files and watch progress in real time
- **Bilingual UI** — Built-in Chinese / English interface switching
- **Local-First** — Speech recognition runs entirely locally, no data upload required

## Workflow

1. **Choose Mode** — Pick "Source" for speech recognition only, or "Translate" to translate automatically after recognition
2. **Add Task** — Click the "Add Task" button or drag media files onto the window
3. **Wait for Processing** — Tasks go through: audio extraction → VAD segmentation → speech recognition → (optional) translation
4. **Review & Edit** — Inspect and edit recognition results and translations line by line in the subtitle editor
5. **Export Subtitles** — Export as SRT, VTT, or ASS

## Screenshots

| Sample | Subtitle Editor |
|:---:|:---:|
| ![Sample](readme/screenshot-main.png) | ![Subtitle Editor](readme/screenshot-editor.png) |

## Installation

Download the installer for your platform from [GitHub Releases](https://github.com/AndySkaura/crosssubtitle-ai/releases):

| Platform | Package |
|:---:|:---:|
| macOS (Apple Silicon) | `.dmg` |
| Windows | `.exe` (NSIS installer) |

> The Whisper model (~500MB) is downloaded on first launch; make sure your network connection is working.

## Usage

### Quick Start

1. Open the app and pick a working mode in the top toolbar:
   - **Source** — Speech recognition only, outputs source-language subtitles
   - **Translate** — After recognition, calls an LLM API to translate into the chosen language
2. Click "Add Task" or drag files onto the window
3. Wait for the task to finish
4. Review and edit results in the subtitle editor on the right
5. Click "Export" and choose a format to save the subtitle file

### Translation Configuration

Configure the LLM API before using the translation feature:

- Fill in the LLM API Base, API Key, and Model under "Advanced Settings"
- Any OpenAI-compatible service works, for example:
  - **Zhipu GLM** — GLM-4.7-Flash is recommended and free to use
  - **DeepSeek**
  - **ChatGPT**
  - **Self-hosted** — e.g. Ollama, vLLM

### Advanced Settings

- **Whisper Model Path** — Path to a local ggml model file
- **VAD Model Path** — Path to a local Silero VAD ONNX model
- **Batch Size** — Number of segments translated per batch (10-15)
- **Context Size** — Number of preceding segments used as translation context (0-5)

## Development

### Prerequisites

- [Rust](https://www.rust-lang.org/) toolchain
- [Node.js](https://nodejs.org/) (18+)
- [FFmpeg](https://ffmpeg.org/) (must be available on the command line)
- [CMake](https://cmake.org/) (needed to compile whisper-rs)

### Local Development

```bash
# Clone the repository
git clone https://github.com/AndySkaura/crosssubtitle-ai.git
cd crosssubtitle-ai

# Install frontend dependencies
npm install

# Start development mode
npm run tauri-dev
```
- 5. If you need Chinese translation, set environment variables:
### Build

```bash
- export OPENAI_API_BASE=https://your-openai-compatible-endpoint/v1
- export OPENAI_API_KEY=your_api_key
- export OPENAI_MODEL=GLM-4-Flash-250414
# macOS DMG build
npm run tauri-build-dmg

# Windows NSIS build
npm run tauri-build-windows
```
- 6. To actually enable ONNX Runtime inference, make sure an ONNX Runtime library that `ort` can load dynamically is present on the machine, or provide the runtime to match your deployment.
## Tech Stack

- 7. Launch the desktop app:

| Layer | Technology |
|:---|:---|
| Desktop Framework | [Tauri v2](https://v2.tauri.app/) |
| Frontend | [Vue 3](https://vuejs.org/) + [TypeScript](https://www.typescriptlang.org/) |
| State Management | [Pinia](https://pinia.vuejs.org/) |
| Styling | [Tailwind CSS](https://tailwindcss.com/) |
| Internationalization | [vue-i18n](https://vue-i18n.intlify.dev/) |
| Speech Recognition | [whisper-rs](https://github.com/tazz4843/whisper-rs) (Whisper) |
| Voice Detection | [ort](https://github.com/pykeio/ort) (Silero VAD ONNX) |
| Audio Processing | FFmpeg |
| LLM Translation | OpenAI-compatible API |

- ```bash
- npm run dev
- ```
## Project Structure

```
src/                 Vue frontend
  components/        Components (TaskQueue, SubtitleEditor)
  stores/            Pinia state management
  locales/           i18n files (zh-CN, en)
  lib/               Type definitions
src-tauri/           Rust backend
  src/
    audio.rs         Audio extraction & WAV reading
    vad.rs           Silero VAD voice activity detection
    whisper.rs       Whisper speech recognition interface
    translate.rs     OpenAI-compatible translation interface
    subtitle.rs      SRT / VTT / ASS export
    task.rs          Task orchestration & event broadcasting
    state.rs         Application state
```
- ## Next Steps
## License

- - Add model input-name auto-detection and more error logging to `src-tauri/src/vad.rs`.
- - Add a file picker, task resume, batch export, and test cases.
- - Add hardware-acceleration options and a model configuration panel for `whisper-rs`.
This project is open-sourced under the [MIT](./LICENSE) license.

## Acknowledgements

- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) — High-performance Whisper inference implementation
- [Silero VAD](https://github.com/snakers4/silero-vad) — High-accuracy voice activity detection
- [Tauri](https://tauri.app/) — Lightweight desktop application framework
- All contributors and users

---

<div align="center">
Made by <a href="https://kuraa.cc">kuraa</a>
</div>
BIN readme/screenshot-editor.png (new file): binary file not shown. Size after: 387 KiB
BIN readme/screenshot-main.png (new file): binary file not shown. Size after: 1.1 MiB