This commit is contained in:
kura 2026-05-04 15:02:40 +08:00
parent 25a8cd077b
commit 77199541cd
5 changed files with 354 additions and 38 deletions

21
LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 kuraa
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

178
README.en.md Normal file

@@ -0,0 +1,178 @@
<picture>
<source media="(prefers-color-scheme: dark)" srcset="readme/screenshot-main.png">
<img src="readme/screenshot-main.png" alt="CrossSubtitle-AI Screenshot" width="100%">
</picture>
<div align="center">
# CrossSubtitle-AI
**AI-Powered, Local-First Subtitle Workbench**
[![GitHub Release](https://img.shields.io/github/v/release/AndySkaura/crosssubtitle-ai?style=flat-square)](https://github.com/AndySkaura/crosssubtitle-ai/releases)
[![GitHub License](https://img.shields.io/github/license/AndySkaura/crosssubtitle-ai?style=flat-square)](https://github.com/AndySkaura/crosssubtitle-ai/blob/main/LICENSE)
[![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Windows-blue?style=flat-square)](#)
**English** · [简体中文](./README.md)
</div>
---
## About
CrossSubtitle-AI is a **local-first** audio/video subtitle processing tool. It uses [Whisper](https://github.com/ggerganov/whisper.cpp) for speech recognition, [Silero VAD](https://github.com/snakers4/silero-vad) for voice activity detection, and supports OpenAI-compatible APIs for intelligent translation — helping you quickly transcribe and translate media files into bilingual subtitles.
All speech recognition runs locally on your machine. No audio or video files are ever uploaded to any server, ensuring your data privacy.
## Features
- **Speech Recognition** — High-accuracy speech-to-text powered by Whisper, supporting 17 source languages, including Chinese, English, Japanese, Korean, and French
- **Voice Activity Detection** — Silero VAD precisely splits speech segments and automatically filters out silence
- **Smart Translation** — Connect to any OpenAI-compatible API (GLM, DeepSeek, ChatGPT, etc.) to translate transcripts into your target language
- **Audio Extraction** — Built-in FFmpeg automatically extracts audio and converts to 16kHz mono WAV
- **Multiple Export Formats** — Export subtitles in SRT, VTT, and ASS formats
- **Bilingual Export** — Export side-by-side original + translated bilingual subtitles
- **Subtitle Editor** — Built-in editor for modifying both source text and translations line by line
- **Drag & Drop** — Drag and drop files to quickly create tasks
- **Task Queue** — Batch process multiple media files with real-time progress tracking
- **Bilingual UI** — Switch between Chinese and English interface languages
- **Local-First** — Speech recognition runs entirely locally, no data upload required
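The audio-extraction step is implemented in `src-tauri/src/audio.rs`; purely as an illustration (the function name is hypothetical, not the project's actual code), the equivalent FFmpeg argument list for producing a 16kHz mono WAV could be assembled like this:

```python
def extract_audio_cmd(src: str, dst: str) -> list[str]:
    # -vn drops the video stream; -ar/-ac select 16 kHz mono;
    # pcm_s16le is the standard 16-bit PCM WAV codec
    return ["ffmpeg", "-y", "-i", src, "-vn",
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", dst]
```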
## Workflow
1. **Choose Mode** — Select "Source" for transcription only, or "Translate" mode for automatic translation after transcription
2. **Add Task** — Click "Add Task" or drag-and-drop media files onto the window
3. **Wait for Processing** — Tasks go through: Audio Extraction → VAD Segmentation → Speech Recognition → (Optional) Translation
4. **Review & Edit** — View and modify recognition results and translations in the subtitle editor
5. **Export Subtitles** — Export as SRT, VTT, or ASS format
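The real exporter lives in `src-tauri/src/subtitle.rs` (Rust); as a sketch of the SRT format produced in step 5, a cue with the `HH:MM:SS,mmm` timestamp layout can be generated like this (illustrative only, not the project's source):

```python
def srt_timestamp(seconds: float) -> str:
    # SRT uses HH:MM:SS,mmm with a comma before milliseconds
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def srt_block(index: int, start: float, end: float, text: str) -> str:
    # One numbered cue: index, time range, then the subtitle text
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```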
## Screenshots
| Main View | Subtitle Editor |
|:---:|:---:|
| ![Main View](readme/screenshot-main.png) | ![Subtitle Editor](readme/screenshot-editor.png) |
## Installation
Download the installer for your platform from [GitHub Releases](https://github.com/AndySkaura/crosssubtitle-ai/releases):
| Platform | Package |
|:---:|:---:|
| macOS (Apple Silicon) | `.dmg` |
| Windows | `.exe` (NSIS Installer) |
> The Whisper model (~500MB) will be downloaded on first launch. An internet connection is required.
## Usage
### Quick Start
1. Open the app and select a mode from the top toolbar:
- **Source** — Speech recognition only, outputs source language subtitles
- **Translate** — Transcribes then translates via an LLM API
2. Click "Add Task" or drag-and-drop files onto the window
3. Wait for processing to complete
4. Review and edit results in the subtitle editor on the right
5. Click "Export" to save subtitles in your preferred format
### Translation Configuration
Before using the translation feature, configure the LLM API:
- Fill in the LLM API Base, API Key, and Model in "Advanced Settings"
- Works with any OpenAI-compatible service, including:
- **GLM (Zhipu AI)** — GLM-4.7-Flash available for free
- **DeepSeek**
- **ChatGPT**
- **Self-hosted** — Ollama, vLLM, etc.
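Any OpenAI-compatible service exposes `POST {API Base}/chat/completions`. The sketch below shows the rough shape of a translation request body; the prompt wording is an assumption for illustration, not the app's actual prompt:

```python
def translation_request(api_base: str, model: str,
                        segments: list[str], target_lang: str) -> tuple[str, dict]:
    # Every OpenAI-compatible server accepts this endpoint and payload shape
    url = f"{api_base.rstrip('/')}/chat/completions"
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": f"Translate each numbered line into {target_lang}."},
            {"role": "user",
             "content": "\n".join(f"{i + 1}. {s}" for i, s in enumerate(segments))},
        ],
    }
    return url, payload
```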
### Advanced Settings
- **Whisper Model Path** — Path to a local ggml model file
- **VAD Model Path** — Path to a local Silero VAD ONNX model file
- **Batch Size** — Number of segments to translate per batch (10-15)
- **Context Size** — Number of preceding segments to include as context for translation (0-5)
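How Batch Size and Context Size interact can be sketched as follows: each batch carries the preceding `context_size` segments as read-only context for the translator (an illustrative sketch, not the project's implementation):

```python
def translation_batches(segments: list[str],
                        batch_size: int = 10, context_size: int = 2) -> list[dict]:
    # Slide over the transcript in fixed-size batches; prepend up to
    # `context_size` earlier segments so the LLM sees surrounding dialogue
    batches = []
    for start in range(0, len(segments), batch_size):
        context = segments[max(0, start - context_size):start]
        batches.append({"context": context,
                        "to_translate": segments[start:start + batch_size]})
    return batches
```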
## Development
### Prerequisites
- [Rust](https://www.rust-lang.org/) toolchain
- [Node.js](https://nodejs.org/) (18+)
- [FFmpeg](https://ffmpeg.org/) (must be available on the command line)
- [CMake](https://cmake.org/) (required for compiling whisper-rs)
### Local Development
```bash
# Clone the repository
git clone https://github.com/AndySkaura/crosssubtitle-ai.git
cd crosssubtitle-ai
# Install frontend dependencies
npm install
# Start development mode
npm run tauri-dev
```
### Build
```bash
# macOS DMG build
npm run tauri-build-dmg
# Windows NSIS build
npm run tauri-build-windows
```
## Tech Stack
| Layer | Technology |
|:---|:---|
| Desktop Framework | [Tauri v2](https://v2.tauri.app/) |
| Frontend | [Vue 3](https://vuejs.org/) + [TypeScript](https://www.typescriptlang.org/) |
| State Management | [Pinia](https://pinia.vuejs.org/) |
| Styling | [Tailwind CSS](https://tailwindcss.com/) |
| Internationalization | [vue-i18n](https://vue-i18n.intlify.dev/) |
| Speech Recognition | [whisper-rs](https://github.com/tazz4843/whisper-rs) (Whisper) |
| Voice Detection | [ort](https://github.com/pykeio/ort) (Silero VAD ONNX) |
| Audio Processing | FFmpeg |
| LLM Translation | OpenAI-compatible API |
## Project Structure
```
src/ Vue frontend
components/ UI components (TaskQueue, SubtitleEditor)
stores/ Pinia state management
locales/ i18n locale files (zh-CN, en)
lib/ Type definitions
src-tauri/ Rust backend
src/
audio.rs Audio extraction & WAV reading
vad.rs Silero VAD voice activity detection
whisper.rs Whisper speech recognition interface
translate.rs OpenAI-compatible translation interface
subtitle.rs SRT / VTT / ASS export
task.rs Task orchestration & event broadcasting
state.rs Application state
```
## License
This project is licensed under the [MIT](./LICENSE) License.
## Acknowledgements
- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) — High-performance Whisper inference implementation
- [Silero VAD](https://github.com/snakers4/silero-vad) — High-accuracy voice activity detection
- [Tauri](https://tauri.app/) — Lightweight desktop application framework
- All contributors and users
---
<div align="center">
Made by <a href="https://kuraa.cc">kuraa</a>
</div>

193
README.md

@@ -1,61 +1,178 @@
<picture>
<source media="(prefers-color-scheme: dark)" srcset="readme/screenshot-main.png">
<img src="readme/screenshot-main.png" alt="CrossSubtitle-AI 截图" width="100%">
</picture>
<div align="center">
# CrossSubtitle-AI
**AI 驱动的本地优先字幕工作台**
[![GitHub Release](https://img.shields.io/github/v/release/AndySkaura/crosssubtitle-ai?style=flat-square)](https://github.com/AndySkaura/crosssubtitle-ai/releases)
[![GitHub License](https://img.shields.io/github/license/AndySkaura/crosssubtitle-ai?style=flat-square)](https://github.com/AndySkaura/crosssubtitle-ai/blob/main/LICENSE)
[![Platform](https://img.shields.io/badge/platform-macOS%20%7C%20Windows-blue?style=flat-square)](#)
[English](./README.en.md) · **简体中文**
</div>
---
## 简介
CrossSubtitle-AI 是一款**本地优先**的音视频字幕处理工具。它利用 [Whisper](https://github.com/ggerganov/whisper.cpp) 进行语音识别,结合 [Silero VAD](https://github.com/snakers4/silero-vad) 进行语音活动检测,并支持接入 OpenAI 兼容接口进行智能翻译,帮助你将音视频文件快速转录并翻译为双语字幕。
整个过程在本地完成语音识别,无需上传音视频文件到任何服务器,保护你的数据隐私。
## 功能特性
- **语音识别** — 基于 Whisper 的高精度语音转文字,支持中文、英文、日文、韩文、法文等 17 种源语言
- **语音活动检测** — Silero VAD 精准切分语音片段,自动过滤静音区域
- **智能翻译** — 接入 OpenAI 兼容接口(如智谱 GLM、DeepSeek、ChatGPT 等),将原文翻译为目标语言
- **音频抽取** — 内置 FFmpeg 自动抽取音频并转换为 16kHz 单声道 WAV
- **多种导出格式** — 支持 SRT、VTT、ASS 三种字幕格式导出
- **双语导出** — 支持原文 + 译文并排显示的双语字幕导出
- **字幕编辑器** — 内置字幕编辑器,支持逐条修改原文和译文
- **拖拽导入** — 支持拖拽文件快速创建任务
- **任务队列** — 批量处理多个音视频文件,实时查看处理进度
- **双语界面** — 内置中文 / 英文界面切换
- **本地优先** — 语音识别完全在本地运行,无需上传数据
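音频抽取的实际逻辑位于 `src-tauri/src/audio.rs`(Rust);下面用 Python 粗略示意等价的 FFmpeg 参数组装方式(函数名为假设,仅作说明,并非项目源码):

```python
def extract_audio_cmd(src: str, dst: str) -> list[str]:
    # -vn 丢弃视频流;-ar/-ac 设定 16 kHz 单声道;
    # pcm_s16le 为标准 16 位 PCM WAV 编码
    return ["ffmpeg", "-y", "-i", src, "-vn",
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", dst]
```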
## 使用流程
1. **选择模式** — 选择「原文」仅做语音识别,或「翻译」模式在识别后自动翻译
2. **添加任务** — 点击「添加任务」按钮或直接拖拽音视频文件到窗口
3. **等待处理** — 任务将依次经历:音频抽取 → VAD 切分 → 语音识别 →(可选)翻译
4. **编辑校对** — 在字幕编辑器中逐条查看、修改识别结果和译文
5. **导出字幕** — 导出为 SRT、VTT 或 ASS 格式
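实际的导出逻辑在 `src-tauri/src/subtitle.rs` 中以 Rust 实现;这里仅用 Python 草图示意第 5 步中 SRT 的 `HH:MM:SS,mmm` 时间戳格式(非项目源码):

```python
def srt_timestamp(seconds: float) -> str:
    # SRT 时间戳格式为 HH:MM:SS,mmm,毫秒前用逗号分隔
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def srt_block(index: int, start: float, end: float, text: str) -> str:
    # 单条字幕块:序号、时间区间、字幕文本
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```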
## 截图
| 主界面 | 字幕编辑器 |
|:---:|:---:|
| ![主界面](readme/screenshot-main.png) | ![字幕编辑器](readme/screenshot-editor.png) |
## 安装
从 [GitHub Releases](https://github.com/AndySkaura/crosssubtitle-ai/releases) 下载对应平台的安装包:
| 平台 | 安装包 |
|:---:|:---:|
| macOS (Apple Silicon) | `.dmg` |
| Windows | `.exe` (NSIS 安装包) |
> 首次启动时需要下载 Whisper 模型(约 500MB),请确保网络通畅。
## 使用方式
### 快速开始
1. 打开应用,在顶部工具栏选择工作模式:
- **原文** — 仅进行语音识别,输出原文字幕
- **翻译** — 识别后调用 LLM 接口翻译为指定语言
2. 点击「添加任务」或拖拽文件到窗口
3. 等待任务处理完成
4. 在右侧字幕编辑器中查看和修改结果
5. 点击「导出」选择格式保存字幕文件
### 翻译模式配置
使用翻译功能前,需要先配置 LLM API:
- 在「高级设置」中填入 LLM API Base、API Key 和 Model
- 支持任何兼容 OpenAI API 的服务,如:
- **智谱 GLM** — 推荐免费使用 GLM-4.7-Flash
- **DeepSeek**
- **ChatGPT**
- **自建服务** — 如 Ollama、vLLM 等
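任何 OpenAI 兼容服务都暴露 `POST {API Base}/chat/completions` 接口;下面的 Python 草图示意请求体的大致形态(提示词内容为假设,并非应用实际使用的提示词):

```python
def translation_request(api_base: str, model: str,
                        segments: list[str], target_lang: str) -> tuple[str, dict]:
    # 所有 OpenAI 兼容服务都接受此端点与请求体结构
    url = f"{api_base.rstrip('/')}/chat/completions"
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": f"Translate each numbered line into {target_lang}."},
            {"role": "user",
             "content": "\n".join(f"{i + 1}. {s}" for i, s in enumerate(segments))},
        ],
    }
    return url, payload
```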
### 高级设置
- **Whisper 模型路径** — 指定本地 ggml 模型文件路径
- **VAD 模型路径** — 指定本地 Silero VAD ONNX 模型路径
- **批大小 (Batch Size)** — 每批翻译的片段数 (10-15)
- **上下文 (Context Size)** — 翻译时参考的上下文片段数 (0-5)
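「批大小」与「上下文」的配合方式可以这样示意:每批携带前面 `context_size` 个片段作为只读上下文,供翻译时参考(仅为示意草图,非项目实际实现):

```python
def translation_batches(segments: list[str],
                        batch_size: int = 10, context_size: int = 2) -> list[dict]:
    # 按固定批大小滑动切分转录结果;每批前附最多 context_size 个
    # 已处理片段,让 LLM 看到前后文
    batches = []
    for start in range(0, len(segments), batch_size):
        context = segments[max(0, start - context_size):start]
        batches.append({"context": context,
                        "to_translate": segments[start:start + batch_size]})
    return batches
```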
## 开发
### 环境要求
- [Rust](https://www.rust-lang.org/) 工具链
- [Node.js](https://nodejs.org/) (18+)
- [FFmpeg](https://ffmpeg.org/)(需在命令行中可用)
- [CMake](https://cmake.org/)(编译 whisper-rs 需要)
### 本地开发
```bash
# 克隆仓库
git clone https://github.com/AndySkaura/crosssubtitle-ai.git
cd crosssubtitle-ai
# 安装前端依赖
npm install
# 启动开发模式
npm run tauri-dev
```
### 构建
```bash
# macOS DMG 构建
npm run tauri-build-dmg
# Windows NSIS 构建
npm run tauri-build-windows
```
## 技术栈
| 层级 | 技术 |
|:---|:---|
| 桌面框架 | [Tauri v2](https://v2.tauri.app/) |
| 前端框架 | [Vue 3](https://vuejs.org/) + [TypeScript](https://www.typescriptlang.org/) |
| 状态管理 | [Pinia](https://pinia.vuejs.org/) |
| 样式 | [Tailwind CSS](https://tailwindcss.com/) |
| 国际化 | [vue-i18n](https://vue-i18n.intlify.dev/) |
| 语音识别 | [whisper-rs](https://github.com/tazz4843/whisper-rs) (Whisper) |
| 语音检测 | [ort](https://github.com/pykeio/ort) (Silero VAD ONNX) |
| 音频处理 | FFmpeg |
| LLM 翻译 | OpenAI-compatible API |
## 项目结构
```
src/ Vue 前端界面
components/ 组件 (TaskQueue, SubtitleEditor)
stores/ Pinia 状态管理
locales/ 国际化文件 (zh-CN, en)
lib/ 类型定义
src-tauri/ Rust 后端
src/
audio.rs 音频抽取与 WAV 读取
vad.rs Silero VAD 语音活动检测
whisper.rs Whisper 语音识别接口
translate.rs OpenAI 兼容翻译接口
subtitle.rs SRT / VTT / ASS 导出
task.rs 任务编排与事件广播
state.rs 应用状态
```
## 开源协议
本项目基于 [MIT](./LICENSE) 协议开源。
## 致谢
- [whisper.cpp](https://github.com/ggerganov/whisper.cpp) — 高性能 Whisper 推理实现
- [Silero VAD](https://github.com/snakers4/silero-vad) — 高精度语音活动检测
- [Tauri](https://tauri.app/) — 轻量级桌面应用框架
- 所有贡献者和用户
---
<div align="center">
<a href="https://kuraa.cc">kuraa</a> 制作
</div>

Binary file not shown.

Size: 387 KiB

BIN
readme/screenshot-main.png Normal file

Binary file not shown.

Size: 1.1 MiB