prep for win 11

README.md (new file, +112 lines)

# Audiobook Generator — Windows 11 Setup Guide

This guide is written for someone who has never used Python or the command line.
Follow the steps in order and you'll be generating audiobook chapters with a gaming GPU.

---

## What you'll need

| Requirement | Why |
|---|---|
| Windows 11 PC with a modern NVIDIA GPU | Fast audio generation using CUDA |
| ~5 GB free disk space | Python, PyTorch, and the TTS model |
| Internet connection (first-time only) | Downloads packages and the AI voice model |

---

## Step 1 — Install Python

1. Go to **https://www.python.org/downloads/**
2. Click the big yellow **"Download Python 3.11.x"** button
3. Run the installer
4. **IMPORTANT:** On the first screen, tick the box that says **"Add Python to PATH"** before you click Install Now

If you skipped that checkbox, uninstall Python and reinstall with the box ticked.
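
To double-check the install (optional), open a terminal (`Win + R`, type `cmd`, press Enter), run `python`, and paste this quick sanity check (generic Python, nothing project-specific):

```python
import sys

# Prints True when the interpreter is Python 3.11 or newer
print(sys.version_info >= (3, 11))
```

Press `Ctrl + Z` then Enter to leave the Python prompt.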

---

## Step 2 — Get the project files

You should have a folder (e.g. `voice_model`) containing the project. Make sure it contains:

```
setup_windows.bat
run_gui.bat
run_audiobook.bat
requirements.txt
gui_proper_noun_player.py
create_audiobook_lightbringer.py
Audio Text for Novel Lightbringer\   ← your text files go here
```

---

## Step 3 — Run Setup (one time only)

1. Open the `voice_model` folder in File Explorer
2. Double-click **`setup_windows.bat`**
3. A black terminal window will open and run through 5 steps:
   - Checks Python is installed
   - Creates a private Python environment
   - Downloads PyTorch with GPU (CUDA) support — **~2.5 GB, be patient**
   - Installs the remaining packages
   - Downloads the Kokoro AI voice model — **~330 MB**
4. When it says **"Setup complete!"**, press any key to close

You only need to do this once.

---

## Step 4 — Launch the GUI (Proper Noun Player)

1. Double-click **`run_gui.bat`**
2. The Proper Noun Player window opens
3. Use it to review and fix how proper nouns are pronounced before generating audio

**Controls:**

- Click a word in the Review list to hear it
- Type a phonetic spelling in the box at the bottom and press Enter to save a fix
- Press Enter without changing anything to mark the word as Correct
- Press Space to replay the current word
- Click "Apply Fixes to Text" when done to save a pronunciation-corrected text file

---

## Step 5 — Create the Audiobook

1. Double-click **`run_audiobook.bat`**
2. A menu appears:
   - **1** — Generate ALL chapters (this can take many hours — leave it running overnight)
   - **2** — Just list what chapters were detected (safe, instant)
   - **3** — Generate a short preview clip of each chapter (quick test)
   - **4** — Generate specific chapter numbers only
3. Choose an option and press Enter
4. When finished, the `.wav` files will be in the `output_audiobook_lightbringer` folder
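
If you want to check the results without opening File Explorer, a few lines of Python will list them (illustrative only; the folder name is the one used throughout this guide):

```python
from pathlib import Path

out_dir = Path("output_audiobook_lightbringer")
# glob() simply yields nothing if the folder doesn't exist yet
for wav in sorted(out_dir.glob("*.wav")):
    size_mb = wav.stat().st_size / 1_048_576
    print(f"{wav.name}  ({size_mb:.1f} MB)")
```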

---

## Troubleshooting

**"Python was not found"**
→ Python is not installed, or you forgot to tick "Add Python to PATH". Reinstall Python.

**The window opens and immediately closes**
→ Right-click the `.bat` file → "Run as administrator", or open a new terminal window first:
press `Win + R`, type `cmd`, press Enter, then drag the `.bat` file into that window and press Enter.

**Audio generation is very slow**
→ The GPU (CUDA) version of PyTorch may not have installed correctly. Re-run `setup_windows.bat`.
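
A quick way to confirm which PyTorch build is active (run it with the project's Python after Step 3; the `torch` import only exists once setup has finished):

```python
# Hedged check: reports the CUDA status, or a hint if PyTorch is absent
try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch is not installed in this environment")
```

If it prints `CUDA available: False`, the CPU-only build was installed and generation will be slow.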

**"No .txt files found in Audio Text for Novel Lightbringer"**
→ Make sure your chapter text files are placed in the `Audio Text for Novel Lightbringer` subfolder.

---

## Output files

| Folder | Contents |
|---|---|
| `output_audiobook_lightbringer\` | One `.wav` file per chapter |
| `output_proper_nouns\` | Pronunciation fix data (JSON) |
| `proper_nouns_audio\` | Cached audio for each proper noun |

create_audiobook_lightbringer.py (new file, +306 lines)

"""
|
||||||
|
create_audiobook_lightbringer.py
|
||||||
|
─────────────────────────────────
|
||||||
|
Generate the "A Darkness Rising" audiobook — one file per chapter/prologue.
|
||||||
|
|
||||||
|
Reads all .txt files from NOVEL_DIR, detects Prologue + Chapter headings,
|
||||||
|
and writes one .wav per chapter into OUTPUT_DIR.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python create_audiobook_lightbringer.py # all chapters
|
||||||
|
python create_audiobook_lightbringer.py --list # list detected chapters
|
||||||
|
python create_audiobook_lightbringer.py 0 1 2 # prologue + ch1 + ch2
|
||||||
|
python create_audiobook_lightbringer.py --preview # short preview clips
|
||||||
|
|
||||||
|
Output filenames:
|
||||||
|
chapter_00_prologue.wav
|
||||||
|
chapter_01_homecoming.wav
|
||||||
|
chapter_02_the_anhuil_ehlar.wav
|
||||||
|
...
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import re
|
||||||
|
import time
|
||||||
|
import numpy as np
|
||||||
|
import soundfile as sf
|
||||||
|
import torch
|
||||||
|
from pathlib import Path
|
||||||
|
from kokoro import KPipeline
|
||||||
|
|
||||||
|
# ── Config ─────────────────────────────────────────────────────────────────────
|
||||||
|
NOVEL_DIR = Path("Audio Text for Novel Lightbringer")
|
||||||
|
OUTPUT_DIR = Path("output_audiobook_lightbringer")
|
||||||
|
SAMPLE_RATE = 24000
|
||||||
|
SPEED = 1.0
|
||||||
|
LANG_CODE = "a" # American English
|
||||||
|
VOICE = "am_onyx" # default narrator voice
|
||||||
|
|
||||||
|
# Regex that matches a chapter/prologue heading line (case-insensitive).
|
||||||
|
# Group 1 captures the chapter number (or None for Prologue).
|
||||||
|
# Group 2 captures the optional subtitle after " - ".
|
||||||
|
_HEADING_RE = re.compile(
|
||||||
|
r"^(?:Chapter\s+(\d+)\s*(?:-\s*(.+))?|(Prologue))\s*$",
|
||||||
|
re.IGNORECASE,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# ── Helpers ────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def _slug(text: str) -> str:
|
||||||
|
"""Convert title text to a filesystem-safe slug."""
|
||||||
|
text = text.lower()
|
||||||
|
text = re.sub(r"[^a-z0-9]+", "_", text)
|
||||||
|
return text.strip("_")
|
||||||
|
|
||||||
|
|
||||||
|
def load_all_chapters(novel_dir: Path) -> list[dict]:
|
||||||
|
"""
|
||||||
|
Read all .txt files in *novel_dir* in sorted order, detect Prologue /
|
||||||
|
Chapter headings, and return a list of chapter dicts:
|
||||||
|
{
|
||||||
|
"num": int, # 0 = Prologue
|
||||||
|
"title": str, # subtitle portion, e.g. "Homecoming"
|
||||||
|
"label": str, # human label, e.g. "Chapter 1 - Homecoming"
|
||||||
|
"slug": str, # e.g. "chapter_01_homecoming"
|
||||||
|
"text": str, # full body text of the chapter
|
||||||
|
}
|
||||||
|
Chapters from multiple files are concatenated in sorted-filename order.
|
||||||
|
"""
|
||||||
|
txt_files = sorted(novel_dir.glob("*.txt"))
|
||||||
|
if not txt_files:
|
||||||
|
raise FileNotFoundError(f"No .txt files found in '{novel_dir}'")
|
||||||
|
|
||||||
|
# Collect (chapter_num, title_line, body_lines) across all files
|
||||||
|
raw: list[tuple[int, str, list[str]]] = [] # (num, heading_text, body)
|
||||||
|
current_num: int | None = None
|
||||||
|
current_heading: str = ""
|
||||||
|
current_body: list[str] = []
|
||||||
|
|
||||||
|
def _flush():
|
||||||
|
if current_num is not None:
|
||||||
|
raw.append((current_num, current_heading, list(current_body)))
|
||||||
|
|
||||||
|
for fpath in txt_files:
|
||||||
|
lines = fpath.read_text(encoding="utf-8").splitlines()
|
||||||
|
for line in lines:
|
||||||
|
m = _HEADING_RE.match(line.strip())
|
||||||
|
if m:
|
||||||
|
_flush()
|
||||||
|
if m.group(3): # Prologue
|
||||||
|
current_num = 0
|
||||||
|
current_heading = "Prologue"
|
||||||
|
else: # Chapter N
|
||||||
|
current_num = int(m.group(1))
|
||||||
|
subtitle = (m.group(2) or "").strip()
|
||||||
|
current_heading = f"Chapter {current_num}" + (f" - {subtitle}" if subtitle else "")
|
||||||
|
current_body = [line] # keep heading inside text
|
||||||
|
else:
|
||||||
|
if current_num is not None:
|
||||||
|
current_body.append(line)
|
||||||
|
_flush()
|
||||||
|
|
||||||
|
# Build chapter dicts, deduplicated and sorted by number
|
||||||
|
seen: set[int] = set()
|
||||||
|
chapters: list[dict] = []
|
||||||
|
for num, heading, body in sorted(raw, key=lambda x: x[0]):
|
||||||
|
if num in seen:
|
||||||
|
continue
|
||||||
|
seen.add(num)
|
||||||
|
# Derive subtitle / slug
|
||||||
|
subtitle = ""
|
||||||
|
sm = re.match(r"Chapter\s+\d+\s*-\s*(.+)", heading, re.IGNORECASE)
|
||||||
|
if sm:
|
||||||
|
subtitle = sm.group(1).strip()
|
||||||
|
elif heading.lower() == "prologue":
|
||||||
|
subtitle = "Prologue"
|
||||||
|
|
||||||
|
num_str = f"{num:02d}"
|
||||||
|
if subtitle:
|
||||||
|
slug = f"chapter_{num_str}_{_slug(subtitle)}"
|
||||||
|
else:
|
||||||
|
slug = f"chapter_{num_str}"
|
||||||
|
|
||||||
|
chapters.append({
|
||||||
|
"num": num,
|
||||||
|
"title": subtitle or heading,
|
||||||
|
"label": heading,
|
||||||
|
"slug": slug,
|
||||||
|
"text": "\n".join(body),
|
||||||
|
})
|
||||||
|
|
||||||
|
return chapters
|
||||||
|
|
||||||
|
|
||||||
|
def clean_text(text: str) -> str:
|
||||||
|
"""Strip formatting artifacts and normalise whitespace for TTS."""
|
||||||
|
# Remove horizontal-rule lines (underscores / asterisks / dashes)
|
||||||
|
text = re.sub(r"^[_\-\*\s]{3,}\s*$", "", text, flags=re.MULTILINE)
|
||||||
|
# Collapse 3+ blank lines to 2
|
||||||
|
text = re.sub(r"\n{3,}", "\n\n", text)
|
||||||
|
return text.strip()
|
||||||
|
|
||||||
|
|
||||||
|
def _fmt_duration(seconds: float) -> str:
|
||||||
|
if seconds >= 60:
|
||||||
|
m, s = divmod(int(seconds), 60)
|
||||||
|
return f"{m}m {s:02d}s"
|
||||||
|
return f"{seconds:.0f}s"
|
||||||
|
|
||||||
|
|
||||||
|
def generate_audio(pipeline: KPipeline, text: str, voice: str,
|
||||||
|
output_path: Path) -> float:
|
||||||
|
"""Generate audio and return wall-clock seconds elapsed."""
|
||||||
|
t0 = time.monotonic()
|
||||||
|
chunks = []
|
||||||
|
for _, _, chunk_audio in pipeline(text, voice=voice, speed=SPEED):
|
||||||
|
if hasattr(chunk_audio, "numpy"):
|
||||||
|
chunk_audio = chunk_audio.cpu().numpy()
|
||||||
|
chunk_audio = np.atleast_1d(chunk_audio.squeeze())
|
||||||
|
if chunk_audio.size > 0:
|
||||||
|
chunks.append(chunk_audio)
|
||||||
|
|
||||||
|
elapsed = time.monotonic() - t0
|
||||||
|
if chunks:
|
||||||
|
audio = np.concatenate(chunks, axis=0)
|
||||||
|
sf.write(str(output_path), audio, SAMPLE_RATE)
|
||||||
|
duration = len(audio) / SAMPLE_RATE
|
||||||
|
print(f" ✓ Saved '{output_path.name}' "
|
||||||
|
f"({duration:.1f}s audio | {elapsed:.1f}s wall-clock)")
|
||||||
|
else:
|
||||||
|
print(f" ✗ No audio produced for voice='{voice}'")
|
||||||
|
return elapsed
|
||||||
|
|
||||||
|
|
||||||
|
# ── Main ───────────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Generate 'A Darkness Rising' audiobook, one file per chapter."
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"chapters", nargs="*", type=int,
|
||||||
|
help="Chapter numbers to generate (0 = Prologue). Default: all.",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--list", action="store_true",
|
||||||
|
help="Print detected chapters and exit.",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--voice", default=VOICE,
|
||||||
|
help=f"Kokoro voice to use (default: {VOICE}).",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--preview", nargs="?", const=3000, type=int, metavar="CHARS",
|
||||||
|
help="Generate short preview clips (default: 3000 chars). "
|
||||||
|
"Output filenames get a _preview suffix.",
|
||||||
|
)
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
print("Loading chapters …")
|
||||||
|
all_chapters = load_all_chapters(NOVEL_DIR)
|
||||||
|
|
||||||
|
if args.list:
|
||||||
|
print(f"\nDetected {len(all_chapters)} chapters:\n")
|
||||||
|
print(f" {'#':>4} {'Label':<45} {'Chars':>8} {'Output filename'}")
|
||||||
|
print(f" {'─'*4} {'─'*45} {'─'*8} {'─'*30}")
|
||||||
|
for ch in all_chapters:
|
||||||
|
chars = len(clean_text(ch["text"]))
|
||||||
|
print(f" {ch['num']:>4} {ch['label']:<45} {chars:>8,} {ch['slug']}.wav")
|
||||||
|
return
|
||||||
|
|
||||||
|
# Filter to requested subset
|
||||||
|
if args.chapters:
|
||||||
|
requested = set(args.chapters)
|
||||||
|
run_chapters = [ch for ch in all_chapters if ch["num"] in requested]
|
||||||
|
missing = requested - {ch["num"] for ch in run_chapters}
|
||||||
|
if missing:
|
||||||
|
print(f"⚠ Chapter(s) not found: {sorted(missing)}")
|
||||||
|
else:
|
||||||
|
run_chapters = all_chapters
|
||||||
|
|
||||||
|
if not run_chapters:
|
||||||
|
print("No chapters selected. Use --list to see available chapters.")
|
||||||
|
return
|
||||||
|
|
||||||
|
voice = args.voice
|
||||||
|
device = "cuda" if torch.cuda.is_available() else "cpu"
|
||||||
|
print(f"Device: {device}")
|
||||||
|
if device == "cuda":
|
||||||
|
print(f"GPU: {torch.cuda.get_device_name(0)}")
|
||||||
|
print(f"Voice: {voice}")
|
||||||
|
|
||||||
|
OUTPUT_DIR.mkdir(exist_ok=True)
|
||||||
|
|
||||||
|
# Pre-compute char counts
|
||||||
|
chapter_chars = {ch["num"]: len(clean_text(ch["text"])) for ch in run_chapters}
|
||||||
|
|
||||||
|
preview_note = (f" ⚡ PREVIEW MODE — capped at {args.preview:,} chars/chapter\n"
|
||||||
|
if args.preview else "")
|
||||||
|
print(f"\n{preview_note}{'─'*65}")
|
||||||
|
print(f" {'#':>4} {'Label':<40} {'Chars':>8}")
|
||||||
|
print(f" {'─'*4} {'─'*40} {'─'*8}")
|
||||||
|
for ch in run_chapters:
|
||||||
|
print(f" {ch['num']:>4} {ch['label']:<40} {chapter_chars[ch['num']]:>8,}")
|
||||||
|
print(f" {'─'*55}")
|
||||||
|
total_chars = sum(chapter_chars.values())
|
||||||
|
print(f" {'TOTAL':<45} {total_chars:>8,}\n")
|
||||||
|
|
||||||
|
print("Initialising Kokoro pipeline …")
|
||||||
|
pipeline = KPipeline(lang_code=LANG_CODE)
|
||||||
|
|
||||||
|
chars_per_sec: float | None = None
|
||||||
|
timing_rows: list[tuple[str, int, float]] = []
|
||||||
|
|
||||||
|
for ch in run_chapters:
|
||||||
|
text = clean_text(ch["text"])
|
||||||
|
if not text:
|
||||||
|
print(f"\n[{ch['label']}] ⚠ Empty text — skipping")
|
||||||
|
continue
|
||||||
|
|
||||||
|
preview_chars = args.preview
|
||||||
|
if preview_chars and len(text) > preview_chars:
|
||||||
|
cut = text.rfind(" ", 0, preview_chars)
|
||||||
|
text = text[: cut if cut > 0 else preview_chars]
|
||||||
|
|
||||||
|
chars = len(text)
|
||||||
|
preview_tag = "_preview" if args.preview else ""
|
||||||
|
out_path = OUTPUT_DIR / f"{ch['slug']}{preview_tag}.wav"
|
||||||
|
|
||||||
|
if chars_per_sec is not None:
|
||||||
|
eta_str = _fmt_duration(chars / chars_per_sec)
|
||||||
|
print(f"\n[{ch['label']}] voice={voice} → {out_path.name} (est. {eta_str})")
|
||||||
|
else:
|
||||||
|
print(f"\n[{ch['label']}] voice={voice} → {out_path.name} (calibration run)")
|
||||||
|
|
||||||
|
elapsed = generate_audio(pipeline, text, voice, out_path)
|
||||||
|
timing_rows.append((ch["label"], chars, elapsed))
|
||||||
|
|
||||||
|
total_done = sum(c for _, c, _ in timing_rows)
|
||||||
|
total_elapsed_done = sum(e for _, _, e in timing_rows)
|
||||||
|
if total_elapsed_done > 0:
|
||||||
|
chars_per_sec = total_done / total_elapsed_done
|
||||||
|
print(f" ⏱ Calibration: {chars_per_sec:.0f} chars/sec")
|
||||||
|
|
||||||
|
# Summary
|
||||||
|
print("\n" + "─" * 65)
|
||||||
|
print(f" {'Chapter':<35} {'Chars':>7} {'Actual':>8} {'Est':>8}")
|
||||||
|
print("─" * 65)
|
||||||
|
for i, (label, chars, elapsed) in enumerate(timing_rows):
|
||||||
|
actual_str = _fmt_duration(elapsed)
|
||||||
|
prior_chars = sum(c for _, c, _ in timing_rows[:i])
|
||||||
|
prior_elapsed = sum(e for _, _, e in timing_rows[:i])
|
||||||
|
if prior_elapsed > 0:
|
||||||
|
est_str = _fmt_duration(chars / (prior_chars / prior_elapsed))
|
||||||
|
else:
|
||||||
|
est_str = "(first)"
|
||||||
|
print(f" {label:<35} {chars:>7,} {actual_str:>8} {est_str:>8}")
|
||||||
|
total_elapsed = sum(e for _, _, e in timing_rows)
|
||||||
|
print("─" * 65)
|
||||||
|
print(f" {'TOTAL':<35} {sum(c for _,c,_ in timing_rows):>7,} "
|
||||||
|
f"{_fmt_duration(total_elapsed):>8}")
|
||||||
|
print("\nDone.")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|

@@ -33,8 +33,12 @@ SPEED = 1.0
 LANG_CODE = "a"  # 'a' = American English
 
 # ── Available Kokoro voices (American English, lang_code='a') ──────────────────
+# af_bella   – American female [downloaded]
 # af_heart   – warm American female [downloaded]
 # af_nicole  – American female [downloaded]
+# af_river   – American female [downloaded]
+# af_sarah   – American female [downloaded]
+# af_sky     – American female [downloaded]
 # am_adam    – American male (deep) [downloaded]
 # am_echo    – American male [downloaded]
 # am_eric    – American male [downloaded]
@@ -56,7 +60,7 @@ LANG_CODE = "a"  # 'a' = American English
 BOOKS = [
     # label            (start_line1, start_line2)                               voice         output_wav
     ("Introduction", ("Introduction", "The Book of the Nem"), "af_heart", "00_introduction.wav"),
-    ("Book of Hagoth", ("THE BOOK OF HAGOTH", "THE SON OF HAGMENI,"), "am_fenrir", "01_hagoth.wav"),
+    ("Book of Hagoth", ("THE BOOK OF HAGOTH", "THE SON OF HAGMENI,"), "am_santa", "01_hagoth.wav"),
     ("Shi-Tugo I", ("THE FIRST BOOK OF SHI-TUGO", "FORMER WARRIOR, AMMONITE"), "am_eric", "02_shi_tugo_1.wav"),
     ("Sanempet", ("THE BOOK OF SANEMPET", "THE SON OF HAGMENI,"), "am_liam", "03_sanempet.wav"),
     ("Oug", ("THE BOOK OF OUG", "THE SON OF SANEMPET"), "am_michael", "04_oug.wav"),
@@ -65,7 +69,7 @@ BOOKS = [
     ("Samuel the Lamanite I", ("THE FIRST BOOK", "OF SAMUEL THE LAMANITE"), "am_echo", "07_samuel_lamanite_1.wav"),
     ("Samuel the Lamanite II", ("THE SECOND BOOK", "OF SAMUEL THE LAMANITE"), "am_echo", "08_samuel_lamanite_2.wav"),
     ("Manti", ("THE BOOK OF MANTI", "THE SON OF OUG"), "am_onyx", "09_manti.wav"),
-    ("Pa Nat I", ("THE FIRST BOOK OF PA NAT", "THE DAUGHTER OF SHIMLEI"), "af_nicole", "10_pa_nat_1.wav"),
+    ("Pa Nat I", ("THE FIRST BOOK OF PA NAT", "THE DAUGHTER OF SHIMLEI"), "af_bella", "10_pa_nat_1.wav"),
     ("Moroni I", ("THE FIRST BOOK OF MORONI", "THE SON OF MORMON,"), "am_adam", "11_moroni_1.wav"),
     ("Moroni II", ("THE SECOND BOOK OF MORONI", "THE SON OF MORMON,"), "am_adam", "12_moroni_2.wav"),
     ("Moroni III", ("THE THIRD BOOK OF MORONI", "THE SON OF MORMON,"), "am_adam", "13_moroni_3.wav"),
@@ -183,6 +187,11 @@ def main() -> None:
         "--list", action="store_true",
         help="Print all enabled book labels and exit."
     )
+    parser.add_argument(
+        "--preview", nargs="?", const=3000, type=int, metavar="CHARS",
+        help="Generate a short preview clip per book (default: 3000 chars). "
+             "Output filenames get a _preview suffix."
+    )
     args = parser.parse_args()
 
     enabled_labels = [label for label, _, _, _ in BOOKS]
@@ -230,7 +239,8 @@ def main() -> None:
     }
 
     # Print char count summary before starting
-    print(f"\n{'─' * 52}")
+    preview_note = f" ⚡ PREVIEW MODE — capped at {args.preview:,} chars/book\n" if args.preview else ""
+    print(f"\n{preview_note}{'─' * 52}")
     print(f"  {'Section':<30} {'Chars':>8}")
     print(f"{'─' * 52}")
     for label, _, _, wav_name in run_books:
@@ -253,7 +263,14 @@ def main() -> None:
             print(f"\n[{label}] ⚠ Empty text — skipping")
             continue
 
-        chars = section_chars[label]
+        # Preview mode: truncate to requested char limit at a word boundary
+        preview_chars = args.preview
+        if preview_chars:
+            if len(text) > preview_chars:
+                cut = text.rfind(" ", 0, preview_chars)
+                text = text[: cut if cut > 0 else preview_chars]
+
+        chars = len(text)
 
         # Print ETA once we have a calibration rate
         if chars_per_sec is not None:
@@ -264,7 +281,8 @@ def main() -> None:
             print(f"\n[{label}] voice={voice} → {wav_name} (timing calibration run)")
 
         stem, ext = wav_name.rsplit(".", 1)
-        out_path = OUTPUT_DIR / f"{stem}_{voice}.{ext}"
+        preview_tag = "_preview" if preview_chars else ""
+        out_path = OUTPUT_DIR / f"{stem}_{voice}{preview_tag}.{ext}"
         elapsed = generate_audio(pipeline, text, voice, out_path)
         timing_rows.append((label, chars, elapsed))
 

create_temple_voices.py (new file, +352 lines)

"""
|
||||||
|
create_temple_voices.py
|
||||||
|
────────────────────────
|
||||||
|
Generate the "Sacred Temple Writings" section of the Nem audiobook using one
|
||||||
|
distinct Microsoft Edge neural TTS voice per character (NOT Kokoro).
|
||||||
|
|
||||||
|
Uses the free edge-tts library which streams Microsoft Azure neural voices.
|
||||||
|
Audio is stitched into a single WAV and saved to OUTPUT_DIR.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python create_temple_voices.py # full render
|
||||||
|
python create_temple_voices.py --preview 40 # first 40 segments only
|
||||||
|
python create_temple_voices.py --print-segments # inspect parsed segments
|
||||||
|
python create_temple_voices.py --list-voices # list available en voices
|
||||||
|
|
||||||
|
Voice assignments live in CHARACTER_VOICES below — easy to customise.
|
||||||
|
Run --list-voices to discover all available edge-tts voice names.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import asyncio
|
||||||
|
import re
|
||||||
|
import subprocess
|
||||||
|
import time
|
||||||
|
from collections import Counter
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import soundfile as sf
|
||||||
|
import edge_tts
|
||||||
|
|
||||||
|
# ── File / output config ───────────────────────────────────────────────────────
|
||||||
|
_FIXED_FILE = Path("Audio Master Nem Full (TTS Fixed).txt")
|
||||||
|
_ORIG_FILE = Path("Audio Master Nem Full.txt")
|
||||||
|
SOURCE_FILE = _FIXED_FILE if _FIXED_FILE.exists() else _ORIG_FILE
|
||||||
|
|
||||||
|
OUTPUT_DIR = Path("output_temple_voices")
|
||||||
|
OUTPUT_FILE = "sacred_temple_writings_multivoice.wav"
|
||||||
|
|
||||||
|
SAMPLE_RATE = 24_000 # Hz — final WAV sample rate
|
||||||
|
PAUSE_SAME = 350 # ms silence between same-speaker segments
|
||||||
|
PAUSE_CHANGE = 650 # ms silence between different-speaker segments
|
||||||
|
|
||||||
|
# ── Section boundary markers (match create_audiobook_nem.py BOOKS order) ──────
|
||||||
|
# Sacred Temple Writings starts at "THE SACRED" / "TEMPLE WRITINGS"
|
||||||
|
# and ends just before "THE FIRST BOOK" / "OF SAMUEL THE LAMANITE"
|
||||||
|
_SEC_START_L1 = "THE SACRED"
|
||||||
|
_SEC_START_L2 = "TEMPLE WRITINGS"
|
||||||
|
_SEC_END_L1 = "THE FIRST BOOK"
|
||||||
|
_SEC_END_L2 = "OF SAMUEL THE LAMANITE"
|
||||||
|
|
||||||
|
# ── Character → edge-tts voice ────────────────────────────────────────────────
|
||||||
|
# Run python create_temple_voices.py --list-voices to see all available voices.
|
||||||
|
# Keys must match the speaker labels exactly as they appear in the source file.
|
||||||
|
CHARACTER_VOICES: dict[str, str] = {
|
||||||
|
# ── Celestial beings ───────────────────────────────────────────────────────
|
||||||
|
"Narrator": "en-US-GuyNeural", # calm neutral narrator
|
||||||
|
"Elohim Heavenly Mother": "en-US-JennyNeural", # warm, wise matriarch
|
||||||
|
"Elohim Heavenly Father": "en-US-AndrewMultilingualNeural", # expressive, authoritative
|
||||||
|
"Jehovah": "en-US-AndrewNeural", # clear, gentle divine
|
||||||
|
"Angel of the Lord": "en-US-BrianNeural", # ethereal divine messenger
|
||||||
|
"Holy Ghost": "en-US-EricNeural", # quiet, inward, spiritual
|
||||||
|
"Holy Ghost Elders": "en-US-BrianNeural", # measured elder council
|
||||||
|
|
||||||
|
# ── Dark beings ────────────────────────────────────────────────────────────
|
||||||
|
"Lucifer": "en-CA-LiamNeural", # smooth, persuasive tempter
|
||||||
|
"Satan": "en-US-SteffanNeural", # cold, commanding adversary
|
||||||
|
|
||||||
|
# ── Mortal / earth characters ──────────────────────────────────────────────
|
||||||
|
"Michael": "en-US-RogerNeural", # noble warrior archangel
|
||||||
|
"Adam": "en-US-ChristopherNeural", # earnest first man
|
||||||
|
"Eve": "en-US-AriaNeural", # curious, warm first woman
|
||||||
|
|
||||||
|
# ── Apostles ───────────────────────────────────────────────────────────────
|
||||||
|
"Peter": "en-GB-RyanNeural", # firm British apostle
|
||||||
|
"James": "en-AU-WilliamMultilingualNeural", # steady Australian voice
|
||||||
|
"John": "en-IE-ConnorNeural", # gentle Irish apostle
|
||||||
|
|
||||||
|
# ── Other roles ────────────────────────────────────────────────────────────
|
||||||
|
"Preacher": "en-US-AvaNeural", # bold emphatic preacher
|
||||||
|
"Mob": "en-US-MichelleNeural", # crowd / multitude voice
|
||||||
|
"The Voice of the Mob": "en-US-MichelleNeural", # alias used in some editions
|
||||||
|
}
|
||||||
|
|
||||||
|
# Voice used when a speaker label isn't found in CHARACTER_VOICES
|
||||||
|
FALLBACK_VOICE = "en-US-GuyNeural"
|
||||||
|
|
||||||
|
# Lines/patterns that are ceremony stage-directions → read by Narrator
|
||||||
|
_STAGE_NARRATOR = re.compile(
|
||||||
|
r"^(Break for Instruction|Resume Session|All\s+arise|"
|
||||||
|
r"CHAPTER\s*\d*|________________+|────+)",
|
||||||
|
re.IGNORECASE,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Lines to skip entirely (decorative / empty)
|
||||||
|
_SKIP_RE = re.compile(r"^[—\-_\s\u2014\u2013]*$")
|
||||||
|
|
||||||
|
|
||||||
|
# ── Section extraction ─────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
def extract_section(source: Path) -> str:
|
||||||
|
"""Return text of the Sacred Temple Writings section."""
|
||||||
|
lines = source.read_text(encoding="utf-8").splitlines()
|
||||||
|
in_sec = False
|
||||||
|
out: list[str] = []
|
||||||
|
|
||||||
|
for i, line in enumerate(lines):
|
||||||
|
s = line.strip()
|
||||||
|
if not in_sec:
|
||||||
|
if (s.upper() == _SEC_START_L1 and
|
||||||
|
i + 1 < len(lines) and
|
||||||
|
lines[i + 1].strip().upper().startswith(_SEC_START_L2)):
|
||||||
|
in_sec = True
|
||||||
|
else:
|
||||||
|
# End just before the next section
|
||||||
|
if (s.upper() == _SEC_END_L1 and
|
||||||
|
i + 1 < len(lines) and
|
||||||
|
lines[i + 1].strip().upper().startswith(_SEC_END_L2)):
|
||||||
|
break
|
||||||
|
out.append(line)
|
||||||
|
|
||||||
|
if not out:
|
||||||
|
raise RuntimeError(
|
||||||
|
f"Could not locate 'Sacred Temple Writings' in '{source}'.\n"
|
||||||
|
"Ensure the source file has a line exactly matching "
|
||||||
|
f"'{_SEC_START_L1}' followed by '{_SEC_START_L2}'."
|
||||||
|
)
|
||||||
|
return "\n".join(out)


# ── Segment parser ─────────────────────────────────────────────────────────────

def _speaker_regex(characters: list[str]) -> re.Pattern:
    """Regex matching [optional-number] CharacterName: text."""
    # Sort longest-first so "Holy Ghost Elders" matches before "Holy Ghost"
    names = sorted(characters, key=len, reverse=True)
    pat = "|".join(re.escape(n) for n in names)
    return re.compile(r"^\d*\s*(" + pat + r")\s*:\s*(.*)", re.IGNORECASE)


def parse_segments(text: str) -> list[tuple[str, str]]:
    """
    Convert section text into a list of (normalised_speaker, spoken_text) tuples.

    Non-attributed prose becomes Narrator lines.
    """
    char_re = _speaker_regex(list(CHARACTER_VOICES.keys()))

    # Build a quick lowercase→canonical lookup for speaker name normalisation
    canon: dict[str, str] = {k.lower(): k for k in CHARACTER_VOICES}

    segments: list[tuple[str, str]] = []
    cur_speaker = "Narrator"
    buf: list[str] = []

    def flush() -> None:
        combined = " ".join(l.strip() for l in buf if l.strip())
        if combined:
            segments.append((cur_speaker, combined))
        buf.clear()

    for raw in text.splitlines():
        line = raw.strip()

        if not line or _SKIP_RE.match(line):
            continue

        # Stage direction → Narrator reads it
        if _STAGE_NARRATOR.match(line):
            flush()
            cur_speaker = "Narrator"
            buf.append(line)
            continue

        # "The words of Jehovah … are in blue." — formatting note, skip
        if re.search(r"are in blue|words of jehovah", line, re.IGNORECASE):
            continue

        m = char_re.match(line)
        if m:
            flush()
            raw_name = m.group(1)
            cur_speaker = canon.get(raw_name.lower(), raw_name)
            spoken = m.group(2).strip()
            if spoken:
                buf.append(spoken)
        else:
            # Continuation of current speaker (or unattributed narrator prose)
            buf.append(line)

    flush()
    return segments
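An aside for anyone adapting the parser: the longest-first sort in `_speaker_regex` is what keeps compound names intact. A minimal standalone sketch (the character names here are illustrative, not the script's full cast):

```python
import re

# Standalone copy of the _speaker_regex logic, with illustrative names.
def speaker_regex(characters: list[str]) -> re.Pattern:
    # Longest-first, so "Holy Ghost Elders" is tried before "Holy Ghost"
    names = sorted(characters, key=len, reverse=True)
    pat = "|".join(re.escape(n) for n in names)
    return re.compile(r"^\d*\s*(" + pat + r")\s*:\s*(.*)", re.IGNORECASE)

rx = speaker_regex(["Holy Ghost", "Holy Ghost Elders", "Narrator"])
m = rx.match("12 Holy Ghost Elders: Come forth.")
print(m.group(1))  # Holy Ghost Elders  (not the shorter "Holy Ghost")
print(m.group(2))  # Come forth.
```

Because regex alternation is first-match rather than longest-match, putting the shorter name first would capture "Holy Ghost" and leave "Elders:" stranded in the spoken text.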


# ── Audio generation ───────────────────────────────────────────────────────────

async def _tts_bytes(text: str, voice: str) -> bytes:
    """Stream edge-tts and return raw MP3 bytes."""
    communicate = edge_tts.Communicate(text, voice)
    data = bytearray()
    async for chunk in communicate.stream():
        if chunk["type"] == "audio":
            data.extend(chunk["data"])
    return bytes(data)


def _mp3_to_numpy(mp3: bytes) -> np.ndarray:
    """Decode MP3 bytes → mono float32 numpy array at SAMPLE_RATE using ffmpeg."""
    cmd = [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", "pipe:0",            # read MP3 from stdin
        "-f", "f32le",             # raw 32-bit little-endian float PCM
        "-acodec", "pcm_f32le",
        "-ac", "1",                # mono
        "-ar", str(SAMPLE_RATE),   # resample to target rate
        "pipe:1",                  # write PCM to stdout
    ]
    result = subprocess.run(cmd, input=mp3, capture_output=True, check=True)
    return np.frombuffer(result.stdout, dtype=np.float32).copy()


def _silence(ms: int) -> np.ndarray:
    return np.zeros(int(SAMPLE_RATE * ms / 1000), dtype=np.float32)


async def render(
    segments: list[tuple[str, str]],
    preview: int | None = None,
) -> np.ndarray:
    """Generate and stitch all segment audio; return concatenated float32 array."""
    if preview is not None:
        segments = segments[:preview]

    parts: list[np.ndarray] = []
    last_speaker: str | None = None
    t0 = time.monotonic()

    for idx, (speaker, text) in enumerate(segments, 1):
        voice = CHARACTER_VOICES.get(speaker, FALLBACK_VOICE)
        marker = "⚠" if speaker not in CHARACTER_VOICES else " "
        print(f"  {marker}[{idx:>4}/{len(segments)}] {speaker:<28} {voice}")

        try:
            mp3 = await _tts_bytes(text, voice)
        except Exception as exc:
            print(f"    ↳ ERROR with '{voice}': {exc} — falling back to {FALLBACK_VOICE}")
            mp3 = await _tts_bytes(text, FALLBACK_VOICE)

        audio = _mp3_to_numpy(mp3)

        if parts:
            gap = PAUSE_SAME if speaker == last_speaker else PAUSE_CHANGE
            parts.append(_silence(gap))
        parts.append(audio)
        last_speaker = speaker

    elapsed = time.monotonic() - t0
    print(f"\n  ✓ {len(segments)} segments in {elapsed:.0f}s")
    return np.concatenate(parts) if parts else np.array([], dtype=np.float32)
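The pause rule in `render` (a short gap between consecutive lines of the same speaker, a longer one on a speaker change) can be isolated into a small sketch. The rate and pause values below are placeholders, not the script's actual `SAMPLE_RATE` / `PAUSE_*` constants, which are defined earlier in the file:

```python
import numpy as np

# Placeholder constants for illustration only.
SAMPLE_RATE = 24000                  # Hz
PAUSE_SAME, PAUSE_CHANGE = 250, 600  # milliseconds

def silence(ms: int) -> np.ndarray:
    return np.zeros(int(SAMPLE_RATE * ms / 1000), dtype=np.float32)

def stitch(clips: list[tuple[str, np.ndarray]]) -> np.ndarray:
    """Concatenate clips, inserting a speaker-dependent gap between them."""
    parts, last = [], None
    for speaker, audio in clips:
        if parts:
            parts.append(silence(PAUSE_SAME if speaker == last else PAUSE_CHANGE))
        parts.append(audio)
        last = speaker
    return np.concatenate(parts) if parts else np.array([], dtype=np.float32)

clip = np.ones(1000, dtype=np.float32)
out = stitch([("Narrator", clip), ("Narrator", clip), ("Jehovah", clip)])
print(len(out))  # 3*1000 + 6000 (same-speaker gap) + 14400 (change gap) = 23400
```

Generating silence as zero samples rather than a second ffmpeg call keeps the gaps exact and free, since the arrays are all at the same rate and dtype.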


# ── Voice listing ──────────────────────────────────────────────────────────────

async def _list_voices_async() -> None:
    voices = await edge_tts.list_voices()
    english = sorted(
        (v for v in voices if v["Locale"].startswith("en-")),
        key=lambda v: (v["Locale"], v["ShortName"]),
    )
    print(f"\n  {'Locale':<12} {'Short Name':<45} Gender")
    print("  " + "─" * 68)
    for v in english:
        print(f"  {v['Locale']:<12} {v['ShortName']:<45} {v['Gender']}")
    print(f"\n  {len(english)} English voices total.")


# ── CLI / main ─────────────────────────────────────────────────────────────────

def main() -> None:
    ap = argparse.ArgumentParser(
        description="Render Sacred Temple Writings with per-character edge-tts voices."
    )
    ap.add_argument("--list-voices", action="store_true",
                    help="Print all available English edge-tts voices and exit.")
    ap.add_argument("--print-segments", action="store_true",
                    help="Print parsed (speaker, text) segments and exit.")
    ap.add_argument("--preview", type=int, metavar="N",
                    help="Render only the first N segments (quick test).")
    args = ap.parse_args()

    if args.list_voices:
        asyncio.run(_list_voices_async())
        return

    # ── Extract & parse ────────────────────────────────────────────────────────
    print(f"Source : {SOURCE_FILE}")
    text = extract_section(SOURCE_FILE)
    print(f"Section: {len(text):,} chars extracted\n")

    segments = parse_segments(text)

    if args.print_segments:
        print(f"Parsed {len(segments)} segments:\n")
        for i, (spkr, txt) in enumerate(segments, 1):
            snippet = txt[:90] + ("…" if len(txt) > 90 else "")
            voice = CHARACTER_VOICES.get(spkr, f"{FALLBACK_VOICE} ⚠")
            print(f"  {i:>4}. [{spkr}] ({voice})\n        {snippet}\n")
        return

    # ── Summary table ──────────────────────────────────────────────────────────
    counts = Counter(s for s, _ in segments)
    unrecognised = {s for s in counts if s not in CHARACTER_VOICES}

    print(f"Parsed {len(segments)} segments across {len(counts)} speakers:\n")
    print(f"  {'Speaker':<28} {'Segs':>5} {'Voice'}")
    print(f"  {'─'*28} {'─'*5} {'─'*45}")
    for spkr, voice in CHARACTER_VOICES.items():
        if counts[spkr]:
            print(f"  {spkr:<28} {counts[spkr]:>5} {voice}")
    for spkr in sorted(unrecognised):
        print(f"  {spkr:<28} {counts[spkr]:>5} {FALLBACK_VOICE} ⚠ unrecognised")

    total_chars = sum(len(t) for _, t in segments)
    print(f"\n  Total chars: {total_chars:,}")
    if args.preview:
        print(f"  ⚡ PREVIEW MODE — rendering first {args.preview} segments only")

    # ── GPU note ───────────────────────────────────────────────────────────────
    # edge-tts is cloud-based (Microsoft Azure neural, free) — GPU not used.
    print("\nNote: edge-tts uses Microsoft's servers (free, no API key needed).\n"
          "      Render speed depends on your internet connection.\n")

    # ── Render ─────────────────────────────────────────────────────────────────
    OUTPUT_DIR.mkdir(exist_ok=True)
    out_path = OUTPUT_DIR / (
        f"sacred_temple_writings_preview{args.preview}.wav"
        if args.preview else OUTPUT_FILE
    )

    print("Rendering segments …\n")
    audio = asyncio.run(render(segments, args.preview))

    if audio.size > 0:
        sf.write(str(out_path), audio, SAMPLE_RATE)
        dur = len(audio) / SAMPLE_RATE
        m, s = divmod(int(dur), 60)
        print(f"\n✓ Saved '{out_path}' ({m}m {s:02d}s audio | {SAMPLE_RATE} Hz)")
    else:
        print("✗ No audio produced — check parsing with --print-segments")


if __name__ == "__main__":
    main()
@@ -1,4 +1,7 @@
 [
+    "Hagar",
+    "Ammonite",
+    "Seth",
     "Ninety-Two",
     "Gilgal",
     "Nat",
@@ -107,7 +110,6 @@
     "Ninety",
     "Nemenha",
     "Nem",
-    "Lord'S",
     "Levitical",
     "Obedience",
     "Consecration",

@@ -6,19 +6,30 @@
     "Lehis": "Leehis",
     "Lehies": "Leehis",
     "Liahona": "Leeahona",
-    "Alma": "Al-ma",
     "Gadiantons": "Gadeeantuns",
     "Laban": "Layban",
     "Mosiah": "Moziah",
     "Nehors": "Kneehores",
     "Tarry": "Tarery",
-    "Nephihah": "Kneefihah",
-    "Nephihet": "Kneefihet",
-    "Nephite": "Kneefite",
     "Nephites": "Kneefites",
-    "Nephi-Im": "Kneefi-Im",
-    "Nephitish": "Kneefitish",
-    "Zenephi": "Zekneefi",
-    "Moroni": "Mor-oh-nye",
-    "Nephi": "Knee-fye"
+    "Anti-Nephi-Lehies": "Anti-Kneef-eye-Leehis",
+    "Lamanite": "Laymanite",
+    "Lamanites": "Laymanites",
+    "Lamb'S": "Lamb's",
+    "Sarai": "Sa-rye",
+    "Telestial": "Tea-lestial",
+    "Lord'S": "Lord's",
+    "Helaman": "He-la-mun",
+    "Alma": "Al-ma",
+    "Nephihah": "Kneef-eyehah",
+    "Nephihet": "Kneef-eyehet",
+    "Nephite": "Kneefight",
+    "Nephi-Im": "Kneef-eye-Im",
+    "Zenephi": "Ze-kneef-eye",
+    "Nephitish": "Kneefight-ish",
+    "Moroni": "Moh-roh-nye",
+    "Nephi": "Knee-fye",
+    "Hagar": "Hag-ar",
+    "Oug": "Ohg",
+    "Ougan": "Ohgan"
 }
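The map above pairs proper nouns with phonetic respellings the TTS engine can pronounce. The code that applies it lives elsewhere in the repo; a minimal word-boundary substitution sketch, using two entries taken from the map above (the helper name is hypothetical):

```python
import re

# Two entries from the pronunciation map above; the real map is loaded from JSON.
PRONUNCIATIONS = {"Nephi-Im": "Kneef-eye-Im", "Nephi": "Knee-fye"}

def apply_pronunciations(text: str) -> str:
    # Longest keys first, so hyphenated names like "Nephi-Im" are rewritten
    # before the bare "Nephi" can match their first half ("-" is a word
    # boundary, so \bNephi\b would otherwise hit it). \b also keeps matches
    # on whole words only.
    for word in sorted(PRONUNCIATIONS, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(word)}\b", PRONUNCIATIONS[word], text)
    return text

print(apply_pronunciations("Nephi spoke of Nephi-Im."))
# Knee-fye spoke of Kneef-eye-Im.
```

This also explains why the map keeps separate entries for "Nephi", "Nephite", "Nephites", and so on: each inflection needs its own whole-word respelling.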
42 run_audiobook.bat Normal file
@@ -0,0 +1,42 @@
@echo off
setlocal EnableDelayedExpansion
title Create Audiobook

:: Change to the folder this .bat file lives in
cd /d "%~dp0"

:: Check setup has been run
if not exist .venv\Scripts\python.exe (
    echo ERROR: Setup has not been run yet.
    echo Please double-click setup_windows.bat first.
    pause
    exit /b 1
)

echo ============================================================
echo   Audiobook Creator
echo ============================================================
echo.
echo Options:
echo   1 - Generate ALL chapters (may take many hours)
echo   2 - List detected chapters only
echo   3 - Generate a short PREVIEW of each chapter
echo   4 - Generate specific chapters (enter numbers next)
echo.
set /p CHOICE="Enter choice (1/2/3/4): "

if "%CHOICE%"=="1" (
    .venv\Scripts\python create_audiobook_lightbringer.py
) else if "%CHOICE%"=="2" (
    .venv\Scripts\python create_audiobook_lightbringer.py --list
) else if "%CHOICE%"=="3" (
    .venv\Scripts\python create_audiobook_lightbringer.py --preview
) else if "%CHOICE%"=="4" (
    set /p CHAPTERS="Enter chapter numbers separated by spaces (e.g. 0 1 2): "
    :: !CHAPTERS! (delayed expansion) is required here: %CHAPTERS% would be
    :: expanded when the whole block is parsed, before the prompt runs.
    .venv\Scripts\python create_audiobook_lightbringer.py !CHAPTERS!
) else (
    echo Invalid choice.
)

echo.
echo Done. Output files are in the output_audiobook_lightbringer folder.
pause
21 run_gui.bat Normal file
@@ -0,0 +1,21 @@
@echo off
title Proper Noun GUI

:: Change to the folder this .bat file lives in
cd /d "%~dp0"

:: Check setup has been run
if not exist .venv\Scripts\python.exe (
    echo ERROR: Setup has not been run yet.
    echo Please double-click setup_windows.bat first.
    pause
    exit /b 1
)

echo Starting Proper Noun Player GUI...
.venv\Scripts\python gui_proper_noun_player.py
if errorlevel 1 (
    echo.
    echo The application closed with an error. See message above.
    pause
)
86 setup_windows.bat Normal file
@@ -0,0 +1,86 @@
@echo off
setlocal EnableDelayedExpansion
title Audiobook Setup

echo ============================================================
echo   Audiobook Setup for Windows 11
echo ============================================================
echo.

:: ── 1. Check Python ──────────────────────────────────────────────────────────
echo [1/5] Checking Python installation...
python --version >nul 2>&1
if errorlevel 1 (
    echo.
    echo ERROR: Python was not found.
    echo.
    echo Please install Python 3.11 from https://www.python.org/downloads/
    echo IMPORTANT: On the installer, tick "Add Python to PATH" before clicking Install.
    echo.
    echo After installing, close this window and double-click setup_windows.bat again.
    pause
    exit /b 1
)

for /f "tokens=2 delims= " %%v in ('python --version 2^>^&1') do set PY_VER=%%v
echo Found Python %PY_VER%
echo.

:: ── 2. Create virtual environment ────────────────────────────────────────────
echo [2/5] Creating virtual environment (.venv)...
if exist .venv (
    echo .venv already exists, skipping creation.
) else (
    python -m venv .venv
    if errorlevel 1 (
        echo ERROR: Failed to create virtual environment.
        pause
        exit /b 1
    )
    echo Virtual environment created.
)
echo.

:: ── 3. Install PyTorch with CUDA (for gaming GPU) ────────────────────────────
echo [3/5] Installing PyTorch with CUDA 12.4 support (this may take a while)...
echo       Downloading ~2.5 GB — please be patient.
echo.
.venv\Scripts\pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
if errorlevel 1 (
    echo.
    echo WARNING: CUDA build failed. Falling back to CPU-only PyTorch.
    echo          Audio generation will be slower but will still work.
    .venv\Scripts\pip install torch
)
echo.

:: ── 4. Install remaining packages ────────────────────────────────────────────
echo [4/5] Installing remaining packages (kokoro, soundfile, sounddevice)...
.venv\Scripts\pip install -r requirements.txt
if errorlevel 1 (
    echo ERROR: Package installation failed. Check your internet connection.
    pause
    exit /b 1
)
echo.

:: ── 5. Download the Kokoro TTS model ─────────────────────────────────────────
echo [5/5] Downloading the Kokoro TTS model (hexgrad/Kokoro-82M, ~330 MB)...
echo       This only happens once.
echo.
.venv\Scripts\python -c "from kokoro import KPipeline; KPipeline(lang_code='a', repo_id='hexgrad/Kokoro-82M'); print('Model ready.')"
if errorlevel 1 (
    echo.
    echo WARNING: Model download failed. It will retry the first time you run the app.
    echo          Make sure you have an internet connection on first launch.
)

echo.
echo ============================================================
echo   Setup complete!
echo.
echo   To launch the GUI:       double-click run_gui.bat
echo   To create the audiobook: double-click run_audiobook.bat
echo ============================================================
echo.
pause