Compare commits: f0e0adf24b...main (14 commits)

Commits: e9ddbb586a, 894144c84a, 69639342e3, 125cb25cf8, 8a1362fe0b, 0d00176a18, 3c2c3d241e, 224f97d0c6, 6e2e0f9af7, c1301fee18, 6781efe3f3, 44bc757f3f, 6cefc3c862, 949bd7c203
.envrc (new file, 2 lines)

@@ -0,0 +1,2 @@
export VIRTUAL_ENV="$PWD/.venv"
export PATH="$VIRTUAL_ENV/bin:$PATH"
.gitignore (vendored, 7 lines changed)

@@ -3,6 +3,9 @@ __pycache__/
*.pyc
*.pyo
.venv/
build/
dist/
*.spec

# Audio files
*.wav

@@ -14,6 +17,10 @@ proper_nouns_audio/
# Generated data (JSON files in output_proper_nouns/ are tracked)
output_proper_nouns/remaining_review.txt

# Generated PDFs and LaTeX files
*.pdf
*.tex

# Text files (except proper_nouns.txt)
*.txt
!proper_nouns.txt
.vscode/settings.json (new file, vendored, 4 lines)

@@ -0,0 +1,4 @@
{
    "python.defaultInterpreterPath": ".venv/bin/python",
    "python.terminal.activateEnvironment": true
}
README.md (new file, 125 lines)

@@ -0,0 +1,125 @@
# Audiobook Creator

AI-powered audiobook generator using the [Kokoro TTS](https://github.com/hexgrad/kokoro) model.
Generates high-quality narrated `.wav` files from plain-text novels, with a GUI tool for auditing and fixing proper noun pronunciations per book.

---

## Features
- **Multi-book support** — each book's proper nouns, fixes, and audio are fully isolated
- **Proper Noun GUI** — hear every extracted name, mark it correct or type a phonetic fix
- **Audiobook generation** — one `.wav` per chapter, GPU-accelerated via CUDA
- **In-GUI extraction** — click one button to run NLP extraction and generate audio, no separate scripts needed
- **Apply Fixes** — writes a TTS-ready copy of the source text with all phonetic substitutions applied

---
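The Apply Fixes step described above boils down to whole-word substitution of each flagged name. A minimal sketch of the idea (`apply_fixes` is an illustrative helper, not the GUI's actual code):

```python
import re

def apply_fixes(text: str, fixes: dict[str, str]) -> str:
    """Replace each proper noun with its phonetic spelling, whole words only."""
    for word, phonetic in fixes.items():
        # \b anchors keep "Nephi" from matching inside e.g. "Nephite"
        text = re.sub(rf"\b{re.escape(word)}\b", phonetic, text)
    return text

print(apply_fixes("Nephi spoke to Nephi.", {"Nephi": "Kneephi"}))
# → Kneephi spoke to Kneephi.
```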
## Project structure

```
Audio Text for Novel Lightbringer/   ← multi-file book (chapters as .txt)
Audio Master Nem Full.txt            ← single-file book

gui_proper_noun_player.py            ← proper noun auditing GUI
create_audiobook_lightbringer.py     ← generate Lightbringer audiobook chapters
create_audiobook_nem.py              ← generate Nem audiobook chapters

output_audiobook_lightbringer/       ← chapter WAV output
output_audiobook/                    ← Nem WAV output
output_proper_nouns/<book>/          ← manifest + JSON fix data per book
proper_nouns_audio/<book>/           ← word audio + replacements cache per book

requirements.txt
setup_windows.bat                    ← one-click Windows setup
run_gui.bat                          ← launch GUI on Windows
run_audiobook.bat                    ← generate audiobook on Windows
```

---
## Setup (Windows - Easiest for Non-Tech Users)

1. **Download** the project as a ZIP file from GitHub
2. **Extract** the ZIP to a folder on your computer (e.g., `C:\audiobook-creator`)
3. **Double-click** `setup_windows.bat` and wait for it to finish installing everything (may take 10-20 minutes)
4. **Double-click** `run_gui.bat` to launch the Proper Noun Player GUI
5. **Double-click** `run_audiobook.bat` to generate audiobook chapters

That's it! The setup script handles the virtual environment and all dependencies automatically (it will tell you if Python itself still needs installing).

---
## Setup (Linux / Mac)

```bash
python3.12 -m venv .venv
source .venv/bin/activate
pip install torch --index-url https://download.pytorch.org/whl/cu124  # CUDA 12.4
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

> For CPU-only: replace the torch line with `pip install torch`
---

## Setup (Windows)

See [SETUP_WINDOWS.md](SETUP_WINDOWS.md) for a step-by-step guide aimed at non-technical users.

---
## Usage

### Proper Noun GUI

```bash
.venv/bin/python gui_proper_noun_player.py
```

1. Select a book from the dropdown
2. Click **⚙ Extract & Generate Audio** — extracts proper nouns via spaCy and generates a TTS clip for each one
3. Click words in the Review list to hear them; press Enter to mark a word correct, or type a phonetic replacement and then press Enter to save a fix
4. Click **⇄ Apply Fixes to Text** to write a pronunciation-corrected copy of the source file
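The review state is stored as small JSON files on disk (the Output section below lists them), so recording a fix amounts to updating one word-to-spelling map. A sketch under that assumption (`save_fix` is a hypothetical helper, not the GUI's code):

```python
import json
from pathlib import Path

def save_fix(book_dir: Path, word: str, phonetic: str) -> None:
    """Record a phonetic fix in the book's pronunciation_fixes.json."""
    path = book_dir / "pronunciation_fixes.json"
    # Load the existing map if present, otherwise start fresh
    fixes = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    fixes[word] = phonetic
    path.write_text(json.dumps(fixes, indent=2, ensure_ascii=False), encoding="utf-8")
```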
### Generate Audiobook

```bash
# All chapters
.venv/bin/python create_audiobook_lightbringer.py

# List chapters only
.venv/bin/python create_audiobook_lightbringer.py --list

# Preview clips
.venv/bin/python create_audiobook_lightbringer.py --preview

# Specific chapters
.venv/bin/python create_audiobook_lightbringer.py 0 1 2
```

---
## Dependencies

| Package | Purpose |
|---|---|
| `kokoro` | Kokoro-82M TTS model |
| `torch` | GPU inference |
| `soundfile` / `sounddevice` | Audio I/O |
| `numpy` | Audio array operations |
| `spacy` + `en_core_web_sm` | Proper noun extraction (NER + PROPN) |
| `wordfreq` | Common-word filter during extraction |

---
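The `wordfreq` filter works on the idea that real proper nouns are rare in everyday English while false positives ("River", "Summer") are common. A sketch of that filtering step (the function, the injected callable, and the 3.5 cutoff are all illustrative assumptions, not the project's actual code):

```python
def filter_rare(words: list[str], zipf_of, threshold: float = 3.5) -> list[str]:
    """Keep only words rarer than the Zipf threshold (likely genuine proper nouns).

    `zipf_of` is a callable such as wordfreq's zipf_frequency(word, "en");
    common English words score high (around 5) and invented names near 0.
    """
    return [w for w in words if zipf_of(w) < threshold]

# With wordfreq installed you would pass its real frequency function:
#   from wordfreq import zipf_frequency
#   rare = filter_rare(candidates, lambda w: zipf_frequency(w, "en"))
```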
## Output

| Path | Contents |
|---|---|
| `output_audiobook_lightbringer/` | `chapter_01_homecoming.wav`, … |
| `output_proper_nouns/<book>/manifest.json` | Word → WAV filename map |
| `output_proper_nouns/<book>/pronunciation_fixes.json` | `{"Nephi": "Kneephi", …}` |
| `output_proper_nouns/<book>/correct_words.json` | Words confirmed correct |
| `proper_nouns_audio/<book>/` | Per-word audio clips |
| `proper_nouns_audio/<book>/replacements_cache/` | Cached phonetic fix clips |
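Since the manifest is a plain word-to-filename map, looking up a word's cached clip is a two-step read. A sketch under the file layout in the table above (`clip_for` is a hypothetical helper, and paths are resolved relative to the project root):

```python
import json
from pathlib import Path

def clip_for(book: str, word: str) -> Path:
    """Resolve the cached audio clip for a word via the book's manifest."""
    manifest_path = Path("output_proper_nouns") / book / "manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    return Path("proper_nouns_audio") / book / manifest[word]
```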
SETUP_WINDOWS.md (new file, 134 lines)

@@ -0,0 +1,134 @@
# Audiobook Creator — Windows 11 Setup Guide

This guide is written for someone who has never used Python or the command line.
Follow the steps in order and you will be generating audiobook chapters with your gaming GPU.

---

## What you will need

| Requirement | Why |
|---|---|
| Windows 11 PC with a modern NVIDIA GPU | Fast audio generation using CUDA |
| ~5 GB free disk space | Python, PyTorch, and the AI voice model |
| Internet connection (first-time only) | Downloads packages and the Kokoro voice model |

---
## Step 1 — Install Python

1. Go to **https://www.python.org/downloads/**
2. Click the big yellow **"Download Python 3.12.x"** button
3. Run the installer
4. **IMPORTANT:** On the very first screen of the installer, tick the checkbox that says **"Add Python to PATH"** before clicking Install Now

> If you missed that checkbox, uninstall Python from Windows Settings and reinstall it with the box ticked.

---
## Step 2 — Get the project files

You should have a folder called `audiobook_creator` (or similar) containing the project files. Make sure it includes these files:

```
setup_windows.bat
run_gui.bat
run_audiobook.bat
requirements.txt
gui_proper_noun_player.py
create_audiobook_lightbringer.py
Audio Text for Novel Lightbringer\   ← your chapter text files go here
```

If you received a ZIP file, extract it first so the folder is not inside another folder.

---
## Step 3 — Run Setup (one time only)

1. Open the project folder in File Explorer
2. Double-click **`setup_windows.bat`**
3. A black terminal window opens and runs through these steps automatically:
   - Checks Python is installed
   - Creates a private Python environment (`.venv` folder)
   - Downloads PyTorch with GPU (CUDA) support — **about 2.5 GB, this takes several minutes**
   - Installs the remaining packages (kokoro, spaCy, etc.)
   - Downloads the spaCy English language model
   - Downloads the Kokoro AI voice model — **about 330 MB**
4. When it says **"Setup complete!"**, press any key to close the window

You only need to do this once. If you run it again it will safely skip anything already installed.

---
## Step 4 — Review Proper Noun Pronunciations (GUI)

Before generating the audiobook, it helps to check how unusual names are pronounced.

1. Double-click **`run_gui.bat`**
2. The Proper Noun Pronunciation Auditor window opens
3. Select your book from the dropdown at the top
4. Click **⚙ Extract & Generate Audio** — this scans the text and creates a short audio clip for every proper noun found (takes a few minutes the first time)
5. Click any word in the **To Review** list to hear how it sounds
6. If it sounds wrong, type the phonetic spelling in the box at the bottom and press **Enter** to save a fix
   - Example: type `Kneephi` instead of `Nephi`
7. If it sounds correct, just press **Enter** without changing anything
8. When you are done reviewing, click **⇄ Apply Fixes to Text** to save a corrected copy of the source text

**Keyboard shortcuts:**

| Key | Action |
|---|---|
| Space | Replay current word |
| Enter | Mark correct (or save fix if text was changed) |
| Escape | Reset the fix box, go back to word list |
| s | Stop audio |
| ↑ / ↓ | Navigate the word list from the fix box |
| Delete | Move a word back to Review from Correct or Fixes |

---
## Step 5 — Generate the Audiobook

1. Double-click **`run_audiobook.bat`**
2. A menu appears — type the number of your choice and press Enter:

| Option | What it does |
|---|---|
| 1 | Generate **all chapters** — can take many hours, safe to leave running overnight |
| 2 | **List** detected chapters only — instant, nothing is generated |
| 3 | Generate a short **preview clip** of each chapter — quick sanity check |
| 4 | Generate **specific chapters** — enter chapter numbers separated by spaces |

3. When finished, `.wav` files will be in the `output_audiobook_lightbringer` folder

---
## Troubleshooting

**"Python was not found"**
→ Python is not installed, or you forgot to tick "Add Python to PATH" during installation. Uninstall and reinstall Python from https://www.python.org/downloads/, making sure to tick that box.

**The black window opens and immediately closes**
→ There was an error. To see it: press `Win + R`, type `cmd`, press Enter, then drag the `.bat` file into that black window and press Enter. The error message will stay visible.

**Audio generation is very slow (taking hours per chapter)**
→ The GPU version of PyTorch may not have installed correctly. Re-run `setup_windows.bat` — it will reinstall just that part.
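To check whether the GPU build of PyTorch is actually active, a small script like this (run with the project's `.venv\Scripts\python.exe`) can help; `cuda_status` is an illustrative helper, not part of the project:

```python
def cuda_status() -> str:
    """Report whether the CUDA build of PyTorch is usable on this machine."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        # Name of the first (and usually only) GPU PyTorch can see
        return f"CUDA OK: {torch.cuda.get_device_name(0)}"
    return "CPU-only PyTorch (or driver problem): re-run setup_windows.bat"

print(cuda_status())
```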
**"No .txt files found in Audio Text for Novel Lightbringer"**
→ Make sure your chapter `.txt` files are inside the `Audio Text for Novel Lightbringer` subfolder, not loose in the main project folder.

**The GUI says "No manifest yet"**
→ You need to click **⚙ Extract & Generate Audio** first for that book.

**Antivirus blocks the .bat files**
→ Right-click the `.bat` file, choose Properties, and click "Unblock" at the bottom. Then try again.

---
## Output files

| Folder | Contents |
|---|---|
| `output_audiobook_lightbringer\` | One `.wav` file per chapter |
| `output_proper_nouns\<book>\` | Pronunciation data (JSON files) |
| `proper_nouns_audio\<book>\` | Cached word audio clips |
create_audiobook.py (new file, 402 lines)

@@ -0,0 +1,402 @@
"""
create_audiobook.py
-------------------
Generic audiobook generator for text files that contain chapter headings.

Supported heading formats (single-line headings):
    - Prologue
    - Chapter 12
    - Chapter 12 - Chapter Name
    - Chapter - 12
    - Chapter - 12 - Chapter Name

Features:
    - Parses chapters from one or more input files/directories
    - Caches parsed chapter data for faster re-runs when source files are unchanged
    - Warns about missing chapter numbers (example: found 1,2,4 -> warns about 3)
    - Generates one .wav per chapter with Kokoro

Examples:
    python create_audiobook.py --input "Audio Text for Novel Lightbringer"
    python create_audiobook.py --input novel.txt --list
    python create_audiobook.py --input novel.txt 0 1 2 --voice am_michael
    python create_audiobook.py --input novel.txt --preview 3000
"""

from __future__ import annotations

import argparse
import hashlib
import json
import re
import time
from pathlib import Path

import numpy as np
import soundfile as sf
import torch
from kokoro import KPipeline

SAMPLE_RATE = 24000
SPEED = 1.0
LANG_CODE = "a"
VOICE = "am_onyx"
CACHE_VERSION = 1

PROLOGUE_RE = re.compile(r"^\s*Prologue\s*$", re.IGNORECASE)
CHAPTER_RE_1 = re.compile(r"^\s*Chapter\s*-\s*(\d+)(?:\s*-\s*(.+))?\s*$", re.IGNORECASE)
CHAPTER_RE_2 = re.compile(r"^\s*Chapter\s+(\d+)(?:\s*-\s*(.+))?\s*$", re.IGNORECASE)
RULE_RE = re.compile(r"^[_\-*\s]{3,}\s*$")
def _slug(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^a-z0-9]+", "_", text)
    return text.strip("_")


def _clean_text(text: str) -> str:
    text = RULE_RE.sub("", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def _fmt_duration(seconds: float) -> str:
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    if h > 0:
        return f"{h}h {m:02d}m {s:02d}s"
    if m > 0:
        return f"{m}m {s:02d}s"
    return f"{s}s"


def _chapter_heading(line: str) -> tuple[int, str, str] | None:
    stripped = line.strip()
    if PROLOGUE_RE.match(stripped):
        return (0, "Prologue", "Prologue")

    m = CHAPTER_RE_1.match(stripped)
    if not m:
        m = CHAPTER_RE_2.match(stripped)
    if not m:
        return None

    num = int(m.group(1))
    title = (m.group(2) or "").strip()
    label = f"Chapter {num}" + (f" - {title}" if title else "")
    return (num, title, label)


def _resolve_txt_files(inputs: list[str]) -> list[Path]:
    txt_files: list[Path] = []
    for raw in inputs:
        path = Path(raw)
        if path.is_file():
            if path.suffix.lower() == ".txt":
                txt_files.append(path)
            continue
        if path.is_dir():
            txt_files.extend(sorted(path.glob("*.txt")))

    deduped = sorted({p.resolve() for p in txt_files})
    return deduped
def _signature_for_files(files: list[Path]) -> list[dict]:
    sig = []
    for p in files:
        st = p.stat()
        sig.append({
            "path": str(p),
            "size": st.st_size,
            "mtime_ns": st.st_mtime_ns,
        })
    return sig


def _cache_path(output_dir: Path, files: list[Path]) -> Path:
    cache_dir = output_dir / ".cache"
    digest = hashlib.sha256("\n".join(str(p) for p in files).encode("utf-8")).hexdigest()[:12]
    return cache_dir / f"parse_{digest}.json"


def _load_cached_chapters(cache_file: Path, file_sig: list[dict]) -> list[dict] | None:
    if not cache_file.exists():
        return None

    try:
        data = json.loads(cache_file.read_text(encoding="utf-8"))
    except Exception:
        return None

    if data.get("version") != CACHE_VERSION:
        return None
    if data.get("file_signature") != file_sig:
        return None

    chapters = data.get("chapters")
    if not isinstance(chapters, list):
        return None
    return chapters


def _save_cached_chapters(cache_file: Path, file_sig: list[dict], chapters: list[dict]) -> None:
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "version": CACHE_VERSION,
        "file_signature": file_sig,
        "chapters": chapters,
    }
    cache_file.write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")
def _parse_chapters(files: list[Path]) -> tuple[list[dict], set[int]]:
    chapters: list[dict] = []
    duplicates: set[int] = set()
    seen: set[int] = set()
    current: dict | None = None

    def flush_current() -> None:
        if current is not None:
            current["text"] = "".join(current.pop("lines"))
            num = current["num"]
            if num in seen:
                duplicates.add(num)
                return
            seen.add(num)
            chapters.append(current)

    for fpath in files:
        with fpath.open("r", encoding="utf-8") as fh:
            for line in fh:
                info = _chapter_heading(line)
                if info is not None:
                    flush_current()
                    num, title, label = info
                    num_str = f"{num:02d}"
                    if num == 0:
                        slug = "chapter_00_prologue"
                    elif title:
                        slug = f"chapter_{num_str}_{_slug(title)}"
                    else:
                        slug = f"chapter_{num_str}"
                    current = {
                        "num": num,
                        "title": title,
                        "label": label,
                        "slug": slug,
                        "lines": [line],
                    }
                elif current is not None:
                    current["lines"].append(line)

    flush_current()
    chapters.sort(key=lambda c: c["num"])
    return chapters, duplicates


def load_all_chapters_with_cache(
    inputs: list[str], output_dir: Path, force_reparse: bool = False
) -> tuple[list[dict], bool, set[int], list[Path]]:
    files = _resolve_txt_files(inputs)
    if not files:
        raise FileNotFoundError("No .txt files found in --input paths")

    file_sig = _signature_for_files(files)
    cache_file = _cache_path(output_dir, files)

    if not force_reparse:
        cached = _load_cached_chapters(cache_file, file_sig)
        if cached is not None:
            return cached, True, set(), files

    chapters, duplicates = _parse_chapters(files)
    _save_cached_chapters(cache_file, file_sig, chapters)
    return chapters, False, duplicates, files


def warn_missing_chapters(chapters: list[dict]) -> None:
    nums = sorted(ch["num"] for ch in chapters if ch["num"] > 0)
    if not nums:
        return
    missing = [n for n in range(nums[0], nums[-1] + 1) if n not in set(nums)]
    if missing:
        print(f"WARNING: missing chapter numbers detected: {missing}")
def generate_audio(pipeline: KPipeline, text: str, voice: str, output_path: Path) -> float:
    t0 = time.monotonic()
    chunks = []
    for _, _, chunk_audio in pipeline(text, voice=voice, speed=SPEED):
        if hasattr(chunk_audio, "numpy"):
            chunk_audio = chunk_audio.cpu().numpy()
        chunk_audio = np.atleast_1d(chunk_audio.squeeze())
        if chunk_audio.size > 0:
            chunks.append(chunk_audio)

    elapsed = time.monotonic() - t0
    if chunks:
        audio = np.concatenate(chunks, axis=0)
        sf.write(str(output_path), audio, SAMPLE_RATE)
        duration = len(audio) / SAMPLE_RATE
        print(
            f"  OK saved '{output_path.name}' "
            f"({_fmt_duration(duration)} audio | {_fmt_duration(elapsed)} wall-clock)"
        )
    else:
        print(f"  ERROR no audio produced for voice='{voice}'")
    return elapsed
def main() -> None:
    parser = argparse.ArgumentParser(description="Generate an audiobook from chapterized text files.")
    parser.add_argument(
        "chapters",
        nargs="*",
        type=int,
        help="Chapter numbers to generate (0 = Prologue). Default: all.",
    )
    parser.add_argument(
        "--input",
        nargs="+",
        required=True,
        help="One or more .txt files and/or directories containing .txt files.",
    )
    parser.add_argument(
        "--output",
        default="output_audiobook",
        help="Output directory for generated chapter audio.",
    )
    parser.add_argument("--list", action="store_true", help="Print detected chapters and exit.")
    parser.add_argument("--voice", default=VOICE, help=f"Kokoro voice to use (default: {VOICE}).")
    parser.add_argument(
        "--preview",
        nargs="?",
        const=3000,
        type=int,
        metavar="CHARS",
        help="Generate short preview clips capped at CHARS (default: 3000).",
    )
    parser.add_argument(
        "--reparse",
        action="store_true",
        help="Ignore cache and re-parse chapters from source files.",
    )
    args = parser.parse_args()

    output_dir = Path(args.output)
    output_dir.mkdir(parents=True, exist_ok=True)

    print("Loading chapters...")
    chapters, used_cache, duplicates, files = load_all_chapters_with_cache(
        args.input, output_dir, force_reparse=args.reparse
    )

    print(f"Input files: {len(files)}")
    print(f"Parse cache: {'HIT' if used_cache else 'MISS'}")

    if duplicates:
        print(f"WARNING: duplicate chapter numbers were found and ignored: {sorted(duplicates)}")

    if not chapters:
        print("WARNING: no chapters found.")
        print("Expected headings like: 'Prologue' or 'Chapter 12 - Name' or 'Chapter - 12'")
        return

    warn_missing_chapters(chapters)

    if args.list:
        print(f"\nDetected {len(chapters)} chapters:\n")
        print(f"  {'#':>4} {'Label':<45} {'Chars':>8} {'Output filename'}")
        print(f"  {'-' * 4} {'-' * 45} {'-' * 8} {'-' * 30}")
        for ch in chapters:
            chars = len(_clean_text(ch["text"]))
            print(f"  {ch['num']:>4} {ch['label']:<45} {chars:>8,} {ch['slug']}.wav")
        return

    if args.chapters:
        requested = set(args.chapters)
        run_chapters = [ch for ch in chapters if ch["num"] in requested]
        missing_req = sorted(requested - {ch["num"] for ch in run_chapters})
        if missing_req:
            print(f"WARNING: requested chapter(s) not found: {missing_req}")
    else:
        run_chapters = chapters

    if not run_chapters:
        print("No chapters selected. Use --list to see available chapters.")
        return

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Device: {device}")
    if device == "cuda":
        print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Voice: {args.voice}")

    chapter_chars = {ch["num"]: len(_clean_text(ch["text"])) for ch in run_chapters}
    total_chars = sum(chapter_chars.values())

    preview_note = f"PREVIEW MODE: capped at {args.preview:,} chars/chapter" if args.preview else ""
    if preview_note:
        print(preview_note)

    print("\nPlan:")
    for ch in run_chapters:
        print(f"  {ch['num']:>3} {ch['label']} ({chapter_chars[ch['num']]:,} chars)")
    print(f"  TOTAL: {total_chars:,} chars\n")

    print("Initializing Kokoro pipeline...")
    pipeline = KPipeline(lang_code=LANG_CODE)

    chars_per_sec: float | None = None
    timing_rows: list[tuple[str, int, float]] = []

    for ch in run_chapters:
        text = _clean_text(ch["text"])
        if not text:
            print(f"[{ch['label']}] WARNING empty text, skipping")
            continue

        if args.preview and len(text) > args.preview:
            cut = text.rfind(" ", 0, args.preview)
            text = text[: cut if cut > 0 else args.preview]

        chars = len(text)
        preview_tag = "_preview" if args.preview else ""
        out_path = output_dir / f"{ch['slug']}{preview_tag}.wav"

        if chars_per_sec is not None:
            eta = _fmt_duration(chars / chars_per_sec)
            print(f"\n[{ch['label']}] -> {out_path.name} (est. {eta})")
        else:
            print(f"\n[{ch['label']}] -> {out_path.name} (calibration run)")

        elapsed = generate_audio(pipeline, text, args.voice, out_path)
        timing_rows.append((ch["label"], chars, elapsed))

        done_chars = sum(c for _, c, _ in timing_rows)
        done_elapsed = sum(e for _, _, e in timing_rows)
        if done_elapsed > 0:
            chars_per_sec = done_chars / done_elapsed
            remaining = total_chars - done_chars
            eta_total = _fmt_duration(remaining / chars_per_sec) if remaining > 0 else "0s"
            print(f"  Speed: {chars_per_sec:.0f} chars/sec | Estimated remaining: {eta_total}")

    print("\nSummary:")
    print(f"  {'Chapter':<35} {'Chars':>7} {'Actual':>8} {'Est':>8}")
    print("  " + "-" * 65)
    for i, (label, chars, elapsed) in enumerate(timing_rows):
        actual_str = _fmt_duration(elapsed)
        prior_chars = sum(c for _, c, _ in timing_rows[:i])
        prior_elapsed = sum(e for _, _, e in timing_rows[:i])
        est_str = _fmt_duration(chars / (prior_chars / prior_elapsed)) if prior_elapsed > 0 else "(first)"
        print(f"  {label:<35} {chars:>7,} {actual_str:>8} {est_str:>8}")

    total_elapsed = sum(e for _, _, e in timing_rows)
    total_done_chars = sum(c for _, c, _ in timing_rows)
    print("  " + "-" * 65)
    print(f"  {'TOTAL':<35} {total_done_chars:>7,} {_fmt_duration(total_elapsed):>8}")
    print("\nDone.")


if __name__ == "__main__":
    main()
create_audiobook_lightbringer.py (new file, 311 lines)

@@ -0,0 +1,311 @@
"""
create_audiobook_lightbringer.py
────────────────────────────────
Generate the "A Darkness Rising" audiobook — one file per chapter/prologue.

Reads all .txt files from NOVEL_DIR, detects Prologue + Chapter headings,
and writes one .wav per chapter into OUTPUT_DIR.

Usage:
    python create_audiobook_lightbringer.py              # all chapters
    python create_audiobook_lightbringer.py --list       # list detected chapters
    python create_audiobook_lightbringer.py 0 1 2        # prologue + ch1 + ch2
    python create_audiobook_lightbringer.py --preview    # short preview clips

Output filenames:
    chapter_00_prologue.wav
    chapter_01_homecoming.wav
    chapter_02_the_anhuil_ehlar.wav
    ...
"""

import argparse
import re
import time
from pathlib import Path

import numpy as np
import soundfile as sf
import torch
from kokoro import KPipeline

# ── Config ─────────────────────────────────────────────────────────────────────
NOVEL_DIR = Path("Audio Text for Novel Lightbringer")
OUTPUT_DIR = Path("output_audiobook_lightbringer")
SAMPLE_RATE = 24000
SPEED = 1.0
LANG_CODE = "a"       # American English
VOICE = "am_onyx"     # default narrator voice

# Regex that matches a chapter/prologue heading line (case-insensitive).
# Group 1 captures the chapter number (or None for Prologue).
# Group 2 captures the optional subtitle after " - ".
_HEADING_RE = re.compile(
    r"^(?:Chapter\s+(\d+)\s*(?:-\s*(.+))?|(Prologue))\s*$",
    re.IGNORECASE,
)


# ── Helpers ────────────────────────────────────────────────────────────────────
def _slug(text: str) -> str:
    """Convert title text to a filesystem-safe slug."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9]+", "_", text)
    return text.strip("_")


def load_all_chapters(novel_dir: Path) -> list[dict]:
    """
    Read all .txt files in *novel_dir* in sorted order, detect Prologue /
    Chapter headings, and return a list of chapter dicts:

        {
            "num": int,      # 0 = Prologue
            "title": str,    # subtitle portion, e.g. "Homecoming"
            "label": str,    # human label, e.g. "Chapter 1 - Homecoming"
            "slug": str,     # e.g. "chapter_01_homecoming"
            "text": str,     # full body text of the chapter
        }

    Chapters from multiple files are concatenated in sorted-filename order.
    """
    txt_files = sorted(novel_dir.glob("*.txt"))
    if not txt_files:
        raise FileNotFoundError(f"No .txt files found in '{novel_dir}'")

    # Collect (chapter_num, title_line, body_lines) across all files
    raw: list[tuple[int, str, list[str]]] = []  # (num, heading_text, body)
    current_num: int | None = None
    current_heading: str = ""
    current_body: list[str] = []

    def _flush():
        if current_num is not None:
            raw.append((current_num, current_heading, list(current_body)))

    for fpath in txt_files:
        lines = fpath.read_text(encoding="utf-8").splitlines()
        for line in lines:
            m = _HEADING_RE.match(line.strip())
            if m:
                _flush()
                if m.group(3):  # Prologue
                    current_num = 0
                    current_heading = "Prologue"
                else:  # Chapter N
                    current_num = int(m.group(1))
                    subtitle = (m.group(2) or "").strip()
                    current_heading = f"Chapter {current_num}" + (f" - {subtitle}" if subtitle else "")
                current_body = [line]  # keep heading inside text
            else:
                if current_num is not None:
                    current_body.append(line)
    _flush()

    # Build chapter dicts, deduplicated and sorted by number
    seen: set[int] = set()
    chapters: list[dict] = []
    for num, heading, body in sorted(raw, key=lambda x: x[0]):
        if num in seen:
            continue
        seen.add(num)
        # Derive subtitle / slug
        subtitle = ""
        sm = re.match(r"Chapter\s+\d+\s*-\s*(.+)", heading, re.IGNORECASE)
        if sm:
            subtitle = sm.group(1).strip()
        elif heading.lower() == "prologue":
            subtitle = "Prologue"

        num_str = f"{num:02d}"
        if subtitle:
            slug = f"chapter_{num_str}_{_slug(subtitle)}"
        else:
            slug = f"chapter_{num_str}"

        chapters.append({
            "num": num,
            "title": subtitle or heading,
            "label": heading,
            "slug": slug,
            "text": "\n".join(body),
        })

    return chapters
def clean_text(text: str) -> str:
    """Strip formatting artifacts and normalise whitespace for TTS."""
    # Remove horizontal-rule lines (underscores / asterisks / dashes)
    text = re.sub(r"^[_\-\*\s]{3,}\s*$", "", text, flags=re.MULTILINE)
    # Collapse 3+ blank lines to 2
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def _fmt_duration(seconds: float) -> str:
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    if h > 0:
        return f"{h}h {m:02d}m {s:02d}s"
    if m > 0:
        return f"{m}m {s:02d}s"
    return f"{s}s"
def generate_audio(pipeline: KPipeline, text: str, voice: str,
|
||||
output_path: Path) -> float:
|
||||
"""Generate audio and return wall-clock seconds elapsed."""
|
||||
t0 = time.monotonic()
|
||||
chunks = []
|
||||
for _, _, chunk_audio in pipeline(text, voice=voice, speed=SPEED):
|
||||
if hasattr(chunk_audio, "numpy"):
|
||||
chunk_audio = chunk_audio.cpu().numpy()
|
||||
chunk_audio = np.atleast_1d(chunk_audio.squeeze())
|
||||
if chunk_audio.size > 0:
|
||||
chunks.append(chunk_audio)
|
||||
|
||||
elapsed = time.monotonic() - t0
|
||||
if chunks:
|
||||
audio = np.concatenate(chunks, axis=0)
|
||||
sf.write(str(output_path), audio, SAMPLE_RATE)
|
||||
duration = len(audio) / SAMPLE_RATE
|
||||
print(f" ✓ Saved '{output_path.name}' "
|
||||
f"({_fmt_duration(duration)} audio | {_fmt_duration(elapsed)} wall-clock)")
|
||||
else:
|
||||
print(f" ✗ No audio produced for voice='{voice}'")
|
||||
return elapsed
|


# ── Main ───────────────────────────────────────────────────────────────────────

def main() -> None:
    parser = argparse.ArgumentParser(
        description="Generate 'A Darkness Rising' audiobook, one file per chapter."
    )
    parser.add_argument(
        "chapters", nargs="*", type=int,
        help="Chapter numbers to generate (0 = Prologue). Default: all.",
    )
    parser.add_argument(
        "--list", action="store_true",
        help="Print detected chapters and exit.",
    )
    parser.add_argument(
        "--voice", default=VOICE,
        help=f"Kokoro voice to use (default: {VOICE}).",
    )
    parser.add_argument(
        "--preview", nargs="?", const=3000, type=int, metavar="CHARS",
        help="Generate short preview clips (default: 3000 chars). "
             "Output filenames get a _preview suffix.",
    )
    args = parser.parse_args()
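The `--preview` flag combines `nargs="?"` with `const`, which gives it three behaviours: absent, bare, and with an explicit value. A standalone sketch of that argparse pattern:

```python
import argparse

p = argparse.ArgumentParser()
p.add_argument("--preview", nargs="?", const=3000, type=int, metavar="CHARS")

print(p.parse_args([]).preview)                    # flag absent → None
print(p.parse_args(["--preview"]).preview)         # bare flag → const, 3000
print(p.parse_args(["--preview", "500"]).preview)  # explicit value → 500
```

This is why later code can treat `args.preview` as both a boolean ("preview mode on?") and a character budget.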

    print("Loading chapters …")
    all_chapters = load_all_chapters(NOVEL_DIR)

    if args.list:
        print(f"\nDetected {len(all_chapters)} chapters:\n")
        print(f" {'#':>4} {'Label':<45} {'Chars':>8} {'Output filename'}")
        print(f" {'─'*4} {'─'*45} {'─'*8} {'─'*30}")
        for ch in all_chapters:
            chars = len(clean_text(ch["text"]))
            print(f" {ch['num']:>4} {ch['label']:<45} {chars:>8,} {ch['slug']}.wav")
        return

    # Filter to requested subset
    if args.chapters:
        requested = set(args.chapters)
        run_chapters = [ch for ch in all_chapters if ch["num"] in requested]
        missing = requested - {ch["num"] for ch in run_chapters}
        if missing:
            print(f"⚠ Chapter(s) not found: {sorted(missing)}")
    else:
        run_chapters = all_chapters

    if not run_chapters:
        print("No chapters selected. Use --list to see available chapters.")
        return

    voice = args.voice
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Device: {device}")
    if device == "cuda":
        print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Voice: {voice}")

    OUTPUT_DIR.mkdir(exist_ok=True)

    # Pre-compute char counts
    chapter_chars = {ch["num"]: len(clean_text(ch["text"])) for ch in run_chapters}

    preview_note = (f" ⚡ PREVIEW MODE — capped at {args.preview:,} chars/chapter\n"
                    if args.preview else "")
    print(f"\n{preview_note}{'─'*65}")
    print(f" {'#':>4} {'Label':<40} {'Chars':>8}")
    print(f" {'─'*4} {'─'*40} {'─'*8}")
    for ch in run_chapters:
        print(f" {ch['num']:>4} {ch['label']:<40} {chapter_chars[ch['num']]:>8,}")
    print(f" {'─'*55}")
    total_chars = sum(chapter_chars.values())
    print(f" {'TOTAL':<45} {total_chars:>8,}\n")

    print("Initialising Kokoro pipeline …")
    pipeline = KPipeline(lang_code=LANG_CODE)

    chars_per_sec: float | None = None
    timing_rows: list[tuple[str, int, float]] = []

    for ch in run_chapters:
        text = clean_text(ch["text"])
        if not text:
            print(f"\n[{ch['label']}] ⚠ Empty text — skipping")
            continue

        preview_chars = args.preview
        if preview_chars and len(text) > preview_chars:
            cut = text.rfind(" ", 0, preview_chars)
            text = text[: cut if cut > 0 else preview_chars]
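The preview truncation cuts at the last space before the limit so a word is never split, falling back to a hard cut if the text has no space in range. As a standalone helper:

```python
def truncate_at_word(text: str, limit: int) -> str:
    """Cut text to at most `limit` chars, preferring the last word boundary."""
    if len(text) <= limit:
        return text
    cut = text.rfind(" ", 0, limit)      # last space strictly before `limit`
    return text[: cut if cut > 0 else limit]

print(truncate_at_word("the quick brown fox", 12))  # → "the quick"
print(truncate_at_word("short", 10))                # → "short"
```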

        chars = len(text)
        preview_tag = "_preview" if args.preview else ""
        out_path = OUTPUT_DIR / f"{ch['slug']}{preview_tag}.wav"

        if chars_per_sec is not None:
            eta_str = _fmt_duration(chars / chars_per_sec)
            print(f"\n[{ch['label']}] voice={voice} → {out_path.name} (est. {eta_str})")
        else:
            print(f"\n[{ch['label']}] voice={voice} → {out_path.name} (calibration run)")

        elapsed = generate_audio(pipeline, text, voice, out_path)
        timing_rows.append((ch["label"], chars, elapsed))

        total_done = sum(c for _, c, _ in timing_rows)
        total_elapsed_done = sum(e for _, _, e in timing_rows)
        if total_elapsed_done > 0:
            chars_per_sec = total_done / total_elapsed_done
            remaining = total_chars - total_done
            eta_overall = _fmt_duration(remaining / chars_per_sec) if remaining > 0 else "0s"
            print(f" ⏱ Speed: {chars_per_sec:.0f} chars/sec | Est. overall remaining: {eta_overall}")
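The ETA logic is a cumulative average: total characters finished divided by total wall-clock seconds gives a chars/sec rate, and the remaining characters divided by that rate gives the estimate. The arithmetic in isolation:

```python
def eta_seconds(done_chars: int, elapsed_s: float, remaining_chars: int) -> float:
    """Cumulative-average rate over everything finished so far."""
    rate = done_chars / elapsed_s        # chars per second
    return remaining_chars / rate

# 12,000 chars took 60 s → 200 chars/s, so 6,000 chars ≈ 30 s more.
print(eta_seconds(12000, 60.0, 6000))  # → 30.0
```

Updating the rate after every chapter means early over- or under-estimates wash out as more chapters finish.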

    # Summary
    print("\n" + "─" * 65)
    print(f" {'Chapter':<35} {'Chars':>7} {'Actual':>8} {'Est':>8}")
    print("─" * 65)
    for i, (label, chars, elapsed) in enumerate(timing_rows):
        actual_str = _fmt_duration(elapsed)
        prior_chars = sum(c for _, c, _ in timing_rows[:i])
        prior_elapsed = sum(e for _, _, e in timing_rows[:i])
        if prior_elapsed > 0:
            est_str = _fmt_duration(chars / (prior_chars / prior_elapsed))
        else:
            est_str = "(first)"
        print(f" {label:<35} {chars:>7,} {actual_str:>8} {est_str:>8}")
    total_elapsed = sum(e for _, _, e in timing_rows)
    print("─" * 65)
    print(f" {'TOTAL':<35} {sum(c for _,c,_ in timing_rows):>7,} "
          f"{_fmt_duration(total_elapsed):>8}")
    print("\nDone.")


if __name__ == "__main__":
    main()
@@ -4,13 +4,19 @@ audiobook_nem.py
Generate the Book of the Nem audiobook — one unique voice per book/section.

Usage:
    python audiobook_nem.py
    python create_audiobook_nem.py               # all enabled books
    python create_audiobook_nem.py --list        # list available book labels
    python create_audiobook_nem.py Introduction
    python create_audiobook_nem.py "Book of Hagoth"
    python create_audiobook_nem.py Introduction "Book of Hagoth"

To skip a section, comment out its entry in BOOKS below.
To permanently skip a section, comment out its entry in BOOKS below.
Output .wav files are written to OUTPUT_DIR (created automatically).
"""

import argparse
import re
import time
import numpy as np
import soundfile as sf
import torch
@@ -27,8 +33,12 @@ SPEED = 1.0
LANG_CODE = "a"  # 'a' = American English

# ── Available Kokoro voices (American English, lang_code='a') ──────────────────
# af_heart  – warm American female [downloaded]
# af_bella  – American female [downloaded]
# af_heart  – warm American female [downloaded]
# af_nicole – American female [downloaded]
# af_river  – American female [downloaded]
# af_sarah  – American female [downloaded]
# af_sky    – American female [downloaded]
# am_adam   – American male (deep) [downloaded]
# am_echo   – American male [downloaded]
# am_eric   – American male [downloaded]
@@ -40,30 +50,30 @@ LANG_CODE = "a"  # 'a' = American English
# am_santa  – American male [downloaded] (not used)

# ── Book definitions ───────────────────────────────────────────────────────────
# Format: (label, start_marker, voice, output_wav)
# start_marker – exact text of the FIRST line of the section header in the source
#                (leading/trailing whitespace is ignored when matching)
# Format: (label, (start_line1, start_line2), voice, output_wav)
# start_line1 – exact text of the FIRST line of the section header
# start_line2 – prefix of the SECOND line (used together for unambiguous matching)
# voice       – Kokoro voice name
# output_wav  – filename saved inside OUTPUT_DIR
#
# Comment out any line to skip that section entirely.
BOOKS = [
    # label                  start_marker                 voice         output_wav
    ("Introduction",           "Introduction",              "af_heart",   "00_introduction.wav"),
    ("Book of Hagoth",         "THE BOOK OF HAGOTH",        "am_fenrir",  "01_hagoth.wav"),
    ("Shi-Tugo I",             "THE FIRST BOOK OF SHI-TUGO", "am_eric",   "02_shi_tugo_1.wav"),
    ("Sanempet",               "THE BOOK OF SANEMPET",      "am_liam",    "03_sanempet.wav"),
    ("Oug",                    "THE BOOK OF OUG",           "am_michael", "04_oug.wav"),
    ("Temple Writings of Oug", "THE BOOK OF",               "am_michael", "05_temple_writings_oug.wav"),
    ("Sacred Temple Writings", "THE SACRED",                "am_michael", "06_sacred_temple_writings.wav"),
    ("Samuel the Lamanite I",  "THE FIRST BOOK",            "am_echo",    "07_samuel_lamanite_1.wav"),
    ("Samuel the Lamanite II", "THE SECOND BOOK",           "am_echo",    "08_samuel_lamanite_2.wav"),
    ("Manti",                  "THE BOOK OF MANTI",         "am_onyx",    "09_manti.wav"),
    ("Pa Nat I",               "THE FIRST BOOK OF PA NAT",  "af_nicole",  "10_pa_nat_1.wav"),
    ("Moroni I",               "THE FIRST BOOK OF MORONI",  "am_adam",    "11_moroni_1.wav"),
    ("Moroni II",              "THE SECOND BOOK OF MORONI", "am_adam",    "12_moroni_2.wav"),
    ("Moroni III",             "THE THIRD BOOK OF MORONI",  "am_adam",    "13_moroni_3.wav"),
    ("Shioni",                 "THE BOOK OF SHIONI",        "am_puck",    "14_shioni.wav"),
    # label                  (start_line1, start_line2)                               voice         output_wav
    ("Introduction",           ("Introduction", "The Book of the Nem"),                 "af_heart",   "00_introduction.wav"),
    ("Book of Hagoth",         ("THE BOOK OF HAGOTH", "THE SON OF HAGMENI,"),           "am_santa",   "01_hagoth.wav"),
    ("Shi-Tugo I",             ("THE FIRST BOOK OF SHI-TUGO", "FORMER WARRIOR, AMMONITE"), "am_eric", "02_shi_tugo_1.wav"),
    ("Sanempet",               ("THE BOOK OF SANEMPET", "THE SON OF HAGMENI,"),         "am_liam",    "03_sanempet.wav"),
    ("Oug",                    ("THE BOOK OF OUG", "THE SON OF SANEMPET"),              "am_michael", "04_oug.wav"),
    ("Temple Writings of Oug", ("THE BOOK OF", "THE TEMPLE WRITINGS"),                  "am_michael", "05_temple_writings_oug.wav"),
    ("Sacred Temple Writings", ("THE SACRED", "TEMPLE WRITINGS"),                       "am_michael", "06_sacred_temple_writings.wav"),
    ("Samuel the Lamanite I",  ("THE FIRST BOOK", "OF SAMUEL THE LAMANITE"),            "am_echo",    "07_samuel_lamanite_1.wav"),
    ("Samuel the Lamanite II", ("THE SECOND BOOK", "OF SAMUEL THE LAMANITE"),           "am_echo",    "08_samuel_lamanite_2.wav"),
    ("Manti",                  ("THE BOOK OF MANTI", "THE SON OF OUG"),                 "am_onyx",    "09_manti.wav"),
    ("Pa Nat I",               ("THE FIRST BOOK OF PA NAT", "THE DAUGHTER OF SHIMLEI"), "af_bella",   "10_pa_nat_1.wav"),
    ("Moroni I",               ("THE FIRST BOOK OF MORONI", "THE SON OF MORMON,"),      "am_adam",    "11_moroni_1.wav"),
    ("Moroni II",              ("THE SECOND BOOK OF MORONI", "THE SON OF MORMON,"),     "am_adam",    "12_moroni_2.wav"),
    ("Moroni III",             ("THE THIRD BOOK OF MORONI", "THE SON OF MORMON,"),      "am_adam",    "13_moroni_3.wav"),
    ("Shioni",                 ("THE BOOK OF SHIONI", "THE SON OF MORONI"),             "am_puck",    "14_shioni.wav"),
]

# ── Helpers ────────────────────────────────────────────────────────────────────
@@ -71,23 +81,36 @@ BOOKS = [

def load_and_split(source: Path, books: list) -> dict[str, str]:
    """
    Read the source file and split it into sections keyed by label.
    Each section starts at its start_marker line and ends just before the
    next section's start_marker.
    Each section starts at its (start_line1, start_line2) marker pair and
    ends just before the next section's marker.

    Marker positions are always detected from the *original* unmodified file
    (_ORIG_FILE) when it exists, so that phonetic fixes applied to section
    headings in the TTS-fixed file can never break section detection. The
    line numbers are identical in both files because word-level replacements
    never add or remove lines.
    """
    raw_lines = source.read_text(encoding="utf-8").splitlines()
    # Use the original (un-fixed) file for marker detection so phonetic
    # changes to heading lines don't break matching.
    marker_source = _ORIG_FILE if _ORIG_FILE.exists() else source
    marker_lines = marker_source.read_text(encoding="utf-8").splitlines()

    # Build a mapping: marker_text → index in BOOKS
    markers = [(label, marker.strip()) for label, marker, _, _ in books]
    # The content to actually return comes from `source` (may be fixed file).
    content_lines = source.read_text(encoding="utf-8").splitlines()

    # Find the line index of each marker's first occurrence
    # Build a mapping: (label, line1, line2) for each book
    markers = [(label, m[0].strip(), m[1].strip()) for label, m, _, _ in books]

    # Find the line index of each marker's first occurrence (two-line match)
    marker_positions: list[tuple[int, int]] = []  # (line_idx, books_idx)
    for book_idx, (label, marker) in enumerate(markers):
        for line_idx, line in enumerate(raw_lines):
            if line.strip() == marker:
    for book_idx, (label, m1, m2) in enumerate(markers):
        for line_idx, line in enumerate(marker_lines[:-1]):
            if (line.strip().upper() == m1.upper() and
                    marker_lines[line_idx + 1].strip().upper().startswith(m2.upper())):
                marker_positions.append((line_idx, book_idx))
                break
        else:
            print(f" ⚠ Marker not found for '{label}': '{marker}' — skipping")
            print(f" ⚠ Marker not found for '{label}': '{m1}' / '{m2}' — skipping")

    marker_positions.sort(key=lambda x: x[0])

@@ -97,8 +120,8 @@ def load_and_split(source: Path, books: list) -> dict[str, str]:
        if rank + 1 < len(marker_positions):
            end_line = marker_positions[rank + 1][0]
        else:
            end_line = len(raw_lines)
        text = "\n".join(raw_lines[line_idx:end_line]).strip()
            end_line = len(content_lines)
        text = "\n".join(content_lines[line_idx:end_line]).strip()
        sections[label] = text

    return sections
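The new two-line marker match pairs an exact (case-insensitive) first line with a prefix test on the second line, which disambiguates headers like "THE FIRST BOOK" that recur across sections. A standalone sketch of that matching rule, with a made-up document:

```python
def find_marker(lines: list[str], l1: str, l2: str) -> int:
    """Index of the first line equal to l1 (case-insensitive) whose
    successor starts with l2; -1 if no such pair exists."""
    for i, line in enumerate(lines[:-1]):
        if (line.strip().upper() == l1.upper() and
                lines[i + 1].strip().upper().startswith(l2.upper())):
            return i
    return -1

doc = ["intro", "THE FIRST BOOK", "OF SAMUEL THE LAMANITE, verse 1", "text"]
print(find_marker(doc, "The First Book", "of samuel"))  # → 1
```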
@ -118,8 +141,21 @@ def clean_text(text: str) -> str:
|
||||
return text.strip()
|
||||
|
||||
|
||||
def _fmt_duration(seconds: float) -> str:
|
||||
"""Format seconds as 'Xh Ym Zs', 'Xm Ys', or 'Xs'."""
|
||||
h, rem = divmod(int(seconds), 3600)
|
||||
m, s = divmod(rem, 60)
|
||||
if h > 0:
|
||||
return f"{h}h {m:02d}m {s:02d}s"
|
||||
if m > 0:
|
||||
return f"{m}m {s:02d}s"
|
||||
return f"{s}s"
|
||||
|
||||
|
def generate_audio(pipeline: KPipeline, text: str, voice: str,
                   output_path: Path) -> None:
                   output_path: Path) -> float:
    """Generate audio and return wall-clock seconds elapsed."""
    t0 = time.monotonic()
    chunks = []
    for _, _, chunk_audio in pipeline(text, voice=voice, speed=SPEED):
        if hasattr(chunk_audio, "numpy"):
@@ -131,15 +167,55 @@ def generate_audio(pipeline: KPipeline, text: str, voice: str,
    if chunks:
        audio = np.concatenate(chunks, axis=0)
        sf.write(str(output_path), audio, SAMPLE_RATE)
        elapsed = time.monotonic() - t0
        duration = len(audio) / SAMPLE_RATE
        print(f" ✓ Saved '{output_path.name}' ({duration:.1f}s)")
        print(f" ✓ Saved '{output_path.name}' ({_fmt_duration(duration)} audio | {_fmt_duration(elapsed)} wall-clock)")
    else:
        elapsed = time.monotonic() - t0
        print(f" ✗ No audio produced for voice='{voice}'")
    return elapsed


# ── Main ───────────────────────────────────────────────────────────────────────

def main() -> None:
    # ── CLI ────────────────────────────────────────────────────────────
    parser = argparse.ArgumentParser(description="Generate Nem audiobook sections.")
    parser.add_argument(
        "books", nargs="*",
        help="Labels of sections to generate (default: all enabled books). "
             "Use --list to see available labels."
    )
    parser.add_argument(
        "--list", action="store_true",
        help="Print all enabled book labels and exit."
    )
    parser.add_argument(
        "--preview", nargs="?", const=3000, type=int, metavar="CHARS",
        help="Generate a short preview clip per book (default: 3000 chars). "
             "Output filenames get a _preview suffix."
    )
    args = parser.parse_args()

    enabled_labels = [label for label, _, _, _ in BOOKS]

    if args.list:
        print("Enabled books:")
        for label in enabled_labels:
            print(f"  {label}")
        return

    # Filter to requested subset, preserving BOOKS order
    if args.books:
        unknown = [b for b in args.books if b not in enabled_labels]
        if unknown:
            print(f"Unknown book label(s): {', '.join(unknown)}")
            print("Run with --list to see available labels.")
            return
        run_books = [b for b in BOOKS if b[0] in args.books]
    else:
        run_books = list(BOOKS)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Device: {device}")
    if device == "cuda":
@@ -150,25 +226,95 @@ def main() -> None:
    print(f"\nSource: '{SOURCE_FILE}'"
          + (" ✓ (TTS fixed)" if SOURCE_FILE == _FIXED_FILE else
             " ⚠ (original — run 'Apply Fixes to Text' in the GUI to use phonetic fixes)"))
    # Always split using ALL books for correct section boundaries,
    # but only generate for run_books.
    sections = load_and_split(SOURCE_FILE, BOOKS)
    print(f" Found {len(sections)} sections.\n")
    print(f" Found {len(sections)} sections ({len(run_books)} selected).\n")

    print("Initialising Kokoro pipeline …")
    pipeline = KPipeline(lang_code=LANG_CODE)

    for label, marker, voice, wav_name in BOOKS:
        if label not in sections:
            continue  # marker was not found; warning already printed
    # Pre-compute char counts for all sections so we can estimate ETAs
    section_chars: dict[str, int] = {
        label: len(clean_text(sections[label]))
        for label, _, _, _ in run_books
        if label in sections
    }

        print(f"\n[{label}] voice={voice} → {wav_name}")
        text = clean_text(sections[label])
        if not text:
            print(" ⚠ Empty text — skipping")
    # Print char count summary before starting
    preview_note = f" ⚡ PREVIEW MODE — capped at {args.preview:,} chars/book\n" if args.preview else ""
    print(f"\n{preview_note}{'─' * 52}")
    print(f" {'Section':<30} {'Chars':>8}")
    print(f"{'─' * 52}")
    for label, _, _, wav_name in run_books:
        if label in section_chars:
            print(f" {label:<30} {section_chars[label]:>8,}")
    print(f"{'─' * 52}")
    total_chars = sum(section_chars.values())
    print(f" {'TOTAL':<30} {total_chars:>8,}")
    print()

    chars_per_sec: float | None = None  # derived from the first book that finishes
    timing_rows: list[tuple[str, int, float]] = []  # (label, chars, elapsed)

    for label, _marker, voice, wav_name in run_books:
        if label not in sections:
            continue

        out_path = OUTPUT_DIR / wav_name
        generate_audio(pipeline, text, voice, out_path)
        text = clean_text(sections[label])
        if not text:
            print(f"\n[{label}] ⚠ Empty text — skipping")
            continue

        # Preview mode: truncate to requested char limit at a word boundary
        preview_chars = args.preview
        if preview_chars:
            if len(text) > preview_chars:
                cut = text.rfind(" ", 0, preview_chars)
                text = text[: cut if cut > 0 else preview_chars]

        chars = len(text)

        # Print ETA once we have a calibration rate
        if chars_per_sec is not None:
            eta_sec = chars / chars_per_sec
            eta_str = _fmt_duration(eta_sec)
            print(f"\n[{label}] voice={voice} → {wav_name} (est. {eta_str})")
        else:
            print(f"\n[{label}] voice={voice} → {wav_name} (timing calibration run)")

        stem, ext = wav_name.rsplit(".", 1)
        preview_tag = "_preview" if preview_chars else ""
        out_path = OUTPUT_DIR / f"{stem}_{voice}{preview_tag}.{ext}"
        elapsed = generate_audio(pipeline, text, voice, out_path)
        timing_rows.append((label, chars, elapsed))

        # Update calibration as a cumulative average after every book
        total_chars_done = sum(c for _, c, _ in timing_rows)
        total_elapsed_done = sum(e for _, _, e in timing_rows)
        if total_elapsed_done > 0:
            chars_per_sec = total_chars_done / total_elapsed_done
            remaining = total_chars - total_chars_done
            eta_overall = _fmt_duration(remaining / chars_per_sec) if remaining > 0 else "0s"
            print(f" ⏱ Speed: {chars_per_sec:.0f} chars/sec | Est. overall remaining: {eta_overall}")

    # ── Summary ────────────────────────────────────────────────────────────────
    print("\n" + "─" * 60)
    print(f" {'Section':<30} {'Chars':>7} {'Actual':>8} {'Est':>8}")
    print("─" * 60)
    for i, (label, chars, elapsed) in enumerate(timing_rows):
        actual_str = _fmt_duration(elapsed)
        # Estimate using the cumulative rate *before* this book was added
        prior_chars = sum(c for _, c, _ in timing_rows[:i])
        prior_elapsed = sum(e for _, _, e in timing_rows[:i])
        if prior_elapsed > 0:
            est_str = _fmt_duration(chars / (prior_chars / prior_elapsed))
        else:
            est_str = "(first run)"
        print(f" {label:<30} {chars:>7,} {actual_str:>8} {est_str:>8}")
    total_elapsed = sum(e for _, _, e in timing_rows)
    print("─" * 60)
    print(f" {'TOTAL':<30} {sum(c for _,c,_ in timing_rows):>7,} {_fmt_duration(total_elapsed):>8}")
    print("\nDone.")

352 create_temple_voices.py Normal file
@@ -0,0 +1,352 @@
"""
create_temple_voices.py
────────────────────────
Generate the "Sacred Temple Writings" section of the Nem audiobook using one
distinct Microsoft Edge neural TTS voice per character (NOT Kokoro).

Uses the free edge-tts library which streams Microsoft Azure neural voices.
Audio is stitched into a single WAV and saved to OUTPUT_DIR.

Usage:
    python create_temple_voices.py                   # full render
    python create_temple_voices.py --preview 40      # first 40 segments only
    python create_temple_voices.py --print-segments  # inspect parsed segments
    python create_temple_voices.py --list-voices     # list available en voices

Voice assignments live in CHARACTER_VOICES below — easy to customise.
Run --list-voices to discover all available edge-tts voice names.
"""

import argparse
import asyncio
import re
import subprocess
import time
from collections import Counter
from pathlib import Path

import numpy as np
import soundfile as sf
import edge_tts

# ── File / output config ───────────────────────────────────────────────────────
_FIXED_FILE = Path("Audio Master Nem Full (TTS Fixed).txt")
_ORIG_FILE = Path("Audio Master Nem Full.txt")
SOURCE_FILE = _FIXED_FILE if _FIXED_FILE.exists() else _ORIG_FILE

OUTPUT_DIR = Path("output_temple_voices")
OUTPUT_FILE = "sacred_temple_writings_multivoice.wav"

SAMPLE_RATE = 24_000   # Hz — final WAV sample rate
PAUSE_SAME = 350       # ms silence between same-speaker segments
PAUSE_CHANGE = 650     # ms silence between different-speaker segments

# ── Section boundary markers (match create_audiobook_nem.py BOOKS order) ──────
# Sacred Temple Writings starts at "THE SACRED" / "TEMPLE WRITINGS"
# and ends just before "THE FIRST BOOK" / "OF SAMUEL THE LAMANITE"
_SEC_START_L1 = "THE SACRED"
_SEC_START_L2 = "TEMPLE WRITINGS"
_SEC_END_L1 = "THE FIRST BOOK"
_SEC_END_L2 = "OF SAMUEL THE LAMANITE"

# ── Character → edge-tts voice ────────────────────────────────────────────────
# Run python create_temple_voices.py --list-voices to see all available voices.
# Keys must match the speaker labels exactly as they appear in the source file.
CHARACTER_VOICES: dict[str, str] = {
    # ── Celestial beings ───────────────────────────────────────────────────────
    "Narrator": "en-US-GuyNeural",                               # calm neutral narrator
    "Elohim Heavenly Mother": "en-US-JennyNeural",               # warm, wise matriarch
    "Elohim Heavenly Father": "en-US-AndrewMultilingualNeural",  # expressive, authoritative
    "Jehovah": "en-US-AndrewNeural",                             # clear, gentle divine
    "Angel of the Lord": "en-US-BrianNeural",                    # ethereal divine messenger
    "Holy Ghost": "en-US-EricNeural",                            # quiet, inward, spiritual
    "Holy Ghost Elders": "en-US-BrianNeural",                    # measured elder council

    # ── Dark beings ────────────────────────────────────────────────────────────
    "Lucifer": "en-CA-LiamNeural",                               # smooth, persuasive tempter
    "Satan": "en-US-SteffanNeural",                              # cold, commanding adversary

    # ── Mortal / earth characters ──────────────────────────────────────────────
    "Michael": "en-US-RogerNeural",                              # noble warrior archangel
    "Adam": "en-US-ChristopherNeural",                           # earnest first man
    "Eve": "en-US-AriaNeural",                                   # curious, warm first woman

    # ── Apostles ───────────────────────────────────────────────────────────────
    "Peter": "en-GB-RyanNeural",                                 # firm British apostle
    "James": "en-AU-WilliamMultilingualNeural",                  # steady Australian voice
    "John": "en-IE-ConnorNeural",                                # gentle Irish apostle

    # ── Other roles ────────────────────────────────────────────────────────────
    "Preacher": "en-US-AvaNeural",                               # bold emphatic preacher
    "Mob": "en-US-MichelleNeural",                               # crowd / multitude voice
    "The Voice of the Mob": "en-US-MichelleNeural",              # alias used in some editions
}

# Voice used when a speaker label isn't found in CHARACTER_VOICES
FALLBACK_VOICE = "en-US-GuyNeural"

# Lines/patterns that are ceremony stage-directions → read by Narrator
_STAGE_NARRATOR = re.compile(
    r"^(Break for Instruction|Resume Session|All\s+arise|"
    r"CHAPTER\s*\d*|________________+|────+)",
    re.IGNORECASE,
)

# Lines to skip entirely (decorative / empty)
_SKIP_RE = re.compile(r"^[—\-_\s\u2014\u2013]*$")


# ── Section extraction ─────────────────────────────────────────────────────────

def extract_section(source: Path) -> str:
    """Return text of the Sacred Temple Writings section."""
    lines = source.read_text(encoding="utf-8").splitlines()
    in_sec = False
    out: list[str] = []

    for i, line in enumerate(lines):
        s = line.strip()
        if not in_sec:
            if (s.upper() == _SEC_START_L1 and
                    i + 1 < len(lines) and
                    lines[i + 1].strip().upper().startswith(_SEC_START_L2)):
                in_sec = True
        else:
            # End just before the next section
            if (s.upper() == _SEC_END_L1 and
                    i + 1 < len(lines) and
                    lines[i + 1].strip().upper().startswith(_SEC_END_L2)):
                break
            out.append(line)

    if not out:
        raise RuntimeError(
            f"Could not locate 'Sacred Temple Writings' in '{source}'.\n"
            "Ensure the source file has a line exactly matching "
            f"'{_SEC_START_L1}' followed by '{_SEC_START_L2}'."
        )
    return "\n".join(out)

# ── Segment parser ─────────────────────────────────────────────────────────────

def _speaker_regex(characters: list[str]) -> re.Pattern:
    """Regex matching [optional-number] CharacterName: text"""
    # Sort longest-first so "Holy Ghost Elders" matches before "Holy Ghost"
    names = sorted(characters, key=len, reverse=True)
    pat = "|".join(re.escape(n) for n in names)
    return re.compile(r"^\d*\s*(" + pat + r")\s*:\s*(.*)", re.IGNORECASE)


def parse_segments(text: str) -> list[tuple[str, str]]:
    """
    Convert section text into a list of (normalised_speaker, spoken_text) tuples.
    Non-attributed prose becomes Narrator lines.
    """
    char_re = _speaker_regex(list(CHARACTER_VOICES.keys()))

    # Build a quick lowercase→canonical lookup for speaker name normalisation
    canon: dict[str, str] = {k.lower(): k for k in CHARACTER_VOICES}

    segments: list[tuple[str, str]] = []
    cur_speaker = "Narrator"
    buf: list[str] = []

    def flush() -> None:
        combined = " ".join(l.strip() for l in buf if l.strip())
        if combined:
            segments.append((cur_speaker, combined))
        buf.clear()

    for raw in text.splitlines():
        line = raw.strip()

        if not line or _SKIP_RE.match(line):
            continue

        # Stage direction → Narrator reads it
        if _STAGE_NARRATOR.match(line):
            flush()
            cur_speaker = "Narrator"
            buf.append(line)
            continue

        # "The words of Jehovah … are in blue." — formatting note, skip
        if re.search(r"are in blue|words of jehovah", line, re.IGNORECASE):
            continue

        m = char_re.match(line)
        if m:
            flush()
            raw_name = m.group(1)
            cur_speaker = canon.get(raw_name.lower(), raw_name)
            spoken = m.group(2).strip()
            if spoken:
                buf.append(spoken)
        else:
            # Continuation of current speaker (or unattributed narrator prose)
            buf.append(line)

    flush()
    return segments
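The longest-first sort in `_speaker_regex` matters because regex alternation is first-match-wins: with "Holy Ghost" listed before "Holy Ghost Elders", the shorter name would always win on its shared prefix. A standalone sketch with two of the labels:

```python
import re

def speaker_regex(names):
    # Longest-first so "Holy Ghost Elders" is tried before the "Holy Ghost" prefix.
    pat = "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
    return re.compile(r"^\d*\s*(" + pat + r")\s*:\s*(.*)", re.IGNORECASE)

rx = speaker_regex(["Holy Ghost", "Holy Ghost Elders"])
m = rx.match("12 Holy Ghost Elders: Hear us.")
print(m.group(1), "|", m.group(2))  # → Holy Ghost Elders | Hear us.
```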


# ── Audio generation ───────────────────────────────────────────────────────────

async def _tts_bytes(text: str, voice: str) -> bytes:
    """Stream edge-tts and return raw MP3 bytes."""
    communicate = edge_tts.Communicate(text, voice)
    data = bytearray()
    async for chunk in communicate.stream():
        if chunk["type"] == "audio":
            data.extend(chunk["data"])
    return bytes(data)


def _mp3_to_numpy(mp3: bytes) -> np.ndarray:
    """Decode MP3 bytes → mono float32 numpy array at SAMPLE_RATE using ffmpeg."""
    cmd = [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", "pipe:0",           # read MP3 from stdin
        "-f", "f32le",            # raw 32-bit little-endian float PCM
        "-acodec", "pcm_f32le",
        "-ac", "1",               # mono
        "-ar", str(SAMPLE_RATE),  # resample to target rate
        "pipe:1",                 # write PCM to stdout
    ]
    result = subprocess.run(cmd, input=mp3, capture_output=True, check=True)
    return np.frombuffer(result.stdout, dtype=np.float32).copy()


def _silence(ms: int) -> np.ndarray:
    return np.zeros(int(SAMPLE_RATE * ms / 1000), dtype=np.float32)
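`_silence` converts a gap in milliseconds to a sample count at the configured rate. The arithmetic on its own, without the numpy buffer:

```python
SAMPLE_RATE = 24_000  # Hz, matching the script's config

def silence_samples(ms: int) -> int:
    """Milliseconds of silence → number of zero-valued samples at SAMPLE_RATE."""
    return int(SAMPLE_RATE * ms / 1000)

print(silence_samples(350), silence_samples(650))  # → 8400 15600
```

So the 350 ms same-speaker gap is 8,400 samples and the 650 ms speaker-change gap is 15,600.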


async def render(
    segments: list[tuple[str, str]],
    preview: int | None = None,
) -> np.ndarray:
    """Generate and stitch all segment audio; return concatenated float32 array."""
    if preview is not None:
        segments = segments[:preview]

    parts: list[np.ndarray] = []
    last_speaker: str | None = None
    t0 = time.monotonic()

    for idx, (speaker, text) in enumerate(segments, 1):
        voice = CHARACTER_VOICES.get(speaker, FALLBACK_VOICE)
        marker = "⚠" if speaker not in CHARACTER_VOICES else " "
        print(f" {marker}[{idx:>4}/{len(segments)}] {speaker:<28} {voice}")

        try:
            mp3 = await _tts_bytes(text, voice)
        except Exception as exc:
            print(f"   ↳ ERROR with '{voice}': {exc} — falling back to {FALLBACK_VOICE}")
            mp3 = await _tts_bytes(text, FALLBACK_VOICE)

        audio = _mp3_to_numpy(mp3)

        if parts:
            gap = PAUSE_SAME if speaker == last_speaker else PAUSE_CHANGE
            parts.append(_silence(gap))
        parts.append(audio)
        last_speaker = speaker

    elapsed = time.monotonic() - t0
    print(f"\n ✓ {len(segments)} segments in {elapsed:.0f}s")
    return np.concatenate(parts) if parts else np.array([], dtype=np.float32)
||||
|
||||
|
||||
# ── Voice listing ──────────────────────────────────────────────────────────────

async def _list_voices_async() -> None:
    voices = await edge_tts.list_voices()
    english = sorted(
        (v for v in voices if v["Locale"].startswith("en-")),
        key=lambda v: (v["Locale"], v["ShortName"]),
    )
    print(f"\n  {'Locale':<12} {'Short Name':<45} Gender")
    print("  " + "─" * 68)
    for v in english:
        print(f"  {v['Locale']:<12} {v['ShortName']:<45} {v['Gender']}")
    print(f"\n  {len(english)} English voices total.")


# ── CLI / main ─────────────────────────────────────────────────────────────────

def main() -> None:
    ap = argparse.ArgumentParser(
        description="Render Sacred Temple Writings with per-character edge-tts voices."
    )
    ap.add_argument("--list-voices", action="store_true",
                    help="Print all available English edge-tts voices and exit.")
    ap.add_argument("--print-segments", action="store_true",
                    help="Print parsed (speaker, text) segments and exit.")
    ap.add_argument("--preview", type=int, metavar="N",
                    help="Render only the first N segments (quick test).")
    args = ap.parse_args()

    if args.list_voices:
        asyncio.run(_list_voices_async())
        return

    # ── Extract & parse ────────────────────────────────────────────────────────
    print(f"Source : {SOURCE_FILE}")
    text = extract_section(SOURCE_FILE)
    print(f"Section: {len(text):,} chars extracted\n")

    segments = parse_segments(text)

    if args.print_segments:
        print(f"Parsed {len(segments)} segments:\n")
        for i, (spkr, txt) in enumerate(segments, 1):
            snippet = txt[:90] + ("…" if len(txt) > 90 else "")
            voice = CHARACTER_VOICES.get(spkr, f"{FALLBACK_VOICE} ⚠")
            print(f"  {i:>4}. [{spkr}] ({voice})\n        {snippet}\n")
        return

    # ── Summary table ──────────────────────────────────────────────────────────
    counts = Counter(s for s, _ in segments)
    unrecognised = {s for s in counts if s not in CHARACTER_VOICES}

    print(f"Parsed {len(segments)} segments across {len(counts)} speakers:\n")
    print(f"  {'Speaker':<28} {'Segs':>5}  {'Voice'}")
    print(f"  {'─'*28} {'─'*5}  {'─'*45}")
    for spkr, voice in CHARACTER_VOICES.items():
        if counts[spkr]:
            print(f"  {spkr:<28} {counts[spkr]:>5}  {voice}")
    for spkr in sorted(unrecognised):
        print(f"  {spkr:<28} {counts[spkr]:>5}  {FALLBACK_VOICE} ⚠ unrecognised")

    total_chars = sum(len(t) for _, t in segments)
    print(f"\n  Total chars: {total_chars:,}")
    if args.preview:
        print(f"  ⚡ PREVIEW MODE — rendering first {args.preview} segments only")

    # ── GPU note ───────────────────────────────────────────────────────────────
    # edge-tts is cloud-based (Microsoft Azure neural, free) — GPU not used.
    print("\nNote: edge-tts uses Microsoft's servers (free, no API key needed).\n"
          "      Render speed depends on your internet connection.\n")

    # ── Render ─────────────────────────────────────────────────────────────────
    OUTPUT_DIR.mkdir(exist_ok=True)
    out_path = OUTPUT_DIR / (
        f"sacred_temple_writings_preview{args.preview}.wav"
        if args.preview else OUTPUT_FILE
    )

    print("Rendering segments …\n")
    audio = asyncio.run(render(segments, args.preview))

    if audio.size > 0:
        sf.write(str(out_path), audio, SAMPLE_RATE)
        dur = len(audio) / SAMPLE_RATE
        m, s = divmod(int(dur), 60)
        print(f"\n✓ Saved '{out_path}' ({m}m {s:02d}s audio | {SAMPLE_RATE} Hz)")
    else:
        print("✗ No audio produced — check parsing with --print-segments")


if __name__ == "__main__":
    main()
@@ -18,6 +18,25 @@ from collections import defaultdict
 from pathlib import Path
 
 import spacy
+from wordfreq import top_n_list
+
+# ── Top 10 000 most-frequent English words ──────────────────────────
+TOP_10K_ENGLISH: frozenset[str] = frozenset(top_n_list("en", 10_000))
+
+# Words in the top-10k list that are genuine proper nouns in this text —
+# keep them despite the frequency filter.
+PROPER_NOUN_WHITELIST: frozenset[str] = frozenset({
+    # Biblical names
+    "aaron", "abel", "abraham", "adam", "cain", "eden", "egypt",
+    "elijah", "ephraim", "eve", "gad", "ham", "isaac", "israel",
+    "jacob", "james", "jehovah", "john", "joseph", "judah",
+    "laban", "lehi", "levi", "micah", "michael", "moses", "noah",
+    "peter", "pharaoh", "samuel", "sarah", "sarai", "seth", "simeon",
+    "timothy", "zion",
+    # Book-specific names that happen to match English words
+    "alma", "ether", "gideon", "limhi", "mormon", "moroni", "mulek",
+    "mosiah", "nephi", "satan", "sidon",
+})
 
 SOURCE = Path("Audio Master Nem Full.txt")
 OUTPUT = Path("proper_nouns.txt")
@@ -35,12 +54,29 @@ ORG_LABELS = {"ORG", "NORP"}
 OTHER_LABELS = {"EVENT", "WORK_OF_ART", "LAW", "PRODUCT", "LANGUAGE"}
 
-# All-caps lines are section headers, not spoken names — skip them.
-# Also skip very short tokens that are likely artefacts.
-SKIP_PATTERNS = re.compile(
-    r"^(THE|A|AN|AND|OF|IN|TO|FOR|BY|AT|IS|WAS|BE|HE|SHE|IT|"
-    r"CHAPTER|VERSE|YEA|BEHOLD|LORD|GOD|CHRIST|HOLY|GHOST)$"
-)
+# ── Noise filters ──────────────────────────────────────────────────────────────
+# Common English words that should be dropped when splitting multi-word entities.
+STOP_WORDS: set[str] = {
+    "A", "AN", "AND", "AS", "AT", "BE", "BUT", "BY",
+    "DO", "DID", "DOTH",
+    "EVEN", "FOR", "FROM",
+    "HAD", "HAS", "HAVE", "HATH", "HE", "HER", "HIS", "HOW",
+    "I", "IN", "IS", "IT", "ITS",
+    "MAY", "ME", "MORE", "MY",
+    "NAY", "NO", "NOT", "NOW",
+    "OF", "OR", "OUR",
+    "SHALL", "SHE", "SO", "SOME",
+    "THAT", "THE", "THEE", "THEIR", "THEN", "THERE", "THESE", "THEY",
+    "THIS", "THOSE", "THOU", "THUS", "THY", "TO",
+    "UP", "UPON", "US",
+    "WAS", "WE", "WHEN", "WHERE", "WHICH", "WHO", "WILL", "WITH",
+    "YE", "YEA", "YET", "YOU", "YOUR",
+    # Book-specific common words not worth flagging
+    "BEHOLD", "CHAPTER", "CHRIST", "GOD", "GHOST", "HOLY", "LORD", "VERSE",
+    # Generic nouns that slip through NER
+    "CITY", "DAYS", "DAY", "GREAT", "LAND", "MAN", "MEN", "NEW",
+    "PEOPLE", "SON", "TIME",
+}
 
 
 def is_noise(text: str) -> bool:
     t = text.strip()
@@ -48,9 +84,12 @@ def is_noise(text: str) -> bool:
         return True
     if t.isupper() and len(t) > 4:  # all-caps section header word
         return True
-    if SKIP_PATTERNS.match(t.upper()):
+    if t.upper() in STOP_WORDS:
         return True
-    if re.search(r"[^a-zA-Z\-' ]", t):  # contains digits or symbols
+    if re.search(r"[^a-zA-Z\-']", t):   # contains digits, spaces, or symbols
         return True
+    # Drop common English words (no hyphens) unless whitelisted as proper nouns.
+    if "-" not in t and t.lower() in TOP_10K_ENGLISH and t.lower() not in PROPER_NOUN_WHITELIST:
+        return True
     return False
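The new frequency filter can be exercised in isolation; a minimal sketch with a stand-in word list (the real script draws `TOP_10K_ENGLISH` from wordfreq's `top_n_list`, and the whitelist is far larger):

```python
TOP_10K_ENGLISH = {"the", "city", "great", "israel"}  # stand-in for wordfreq data
PROPER_NOUN_WHITELIST = {"israel"}                    # frequent words kept as names


def is_common(word: str) -> bool:
    """True when a hyphen-free word is frequent English and not whitelisted."""
    w = word.lower()
    return "-" not in word and w in TOP_10K_ENGLISH and w not in PROPER_NOUN_WHITELIST


print(is_common("City"))      # → True  (dropped as a common word)
print(is_common("Israel"))    # → False (whitelisted proper noun)
print(is_common("Nephihah"))  # → False (not a frequent English word)
```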
@@ -60,6 +99,11 @@ def canonical(text: str) -> str:
     return " ".join(text.split()).title()
 
 
+def split_words(phrase: str) -> list[str]:
+    """Split a phrase on spaces; hyphenated words are kept as one token."""
+    return phrase.split()
+
+
 # ── Read and process ───────────────────────────────────────────────────────────
 print(f"Reading '{SOURCE}' …")
 raw_text = SOURCE.read_text(encoding="utf-8")
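The hyphen behaviour of `split_words` falls out of plain `str.split`, which only breaks on whitespace:

```python
def split_words(phrase: str) -> list[str]:
    """Split a phrase on spaces; hyphenated words stay as one token."""
    return phrase.split()


print(split_words("Anti-Nephi-Lehi And Sidon"))  # → ['Anti-Nephi-Lehi', 'And', 'Sidon']
```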
@@ -71,20 +115,23 @@ doc = nlp(raw_text)
 buckets: dict[str, set[str]] = defaultdict(set)
 
 # 1. NER pass — trust spaCy's entity labels
+#    Multi-word entities (e.g. "Peter James John") are split into individual
+#    words; hyphenated words (e.g. "Anti-Nephi-Lehi") stay as one token.
 for ent in doc.ents:
-    name = canonical(ent.text)
-    if is_noise(name):
-        continue
-    if ent.label_ in PERSON_LABELS:
-        buckets["People & Characters"].add(name)
-    elif ent.label_ in PLACE_LABELS:
-        buckets["Places & Lands"].add(name)
-    elif ent.label_ in ORG_LABELS:
-        buckets["Groups & Nations"].add(name)
-    elif ent.label_ in OTHER_LABELS:
-        buckets["Other Named Things"].add(name)
-    else:
-        buckets["Other Named Things"].add(name)
+    phrase = canonical(ent.text)
+    for word in split_words(phrase):
+        if is_noise(word):
+            continue
+        if ent.label_ in PERSON_LABELS:
+            buckets["People & Characters"].add(word)
+        elif ent.label_ in PLACE_LABELS:
+            buckets["Places & Lands"].add(word)
+        elif ent.label_ in ORG_LABELS:
+            buckets["Groups & Nations"].add(word)
+        elif ent.label_ in OTHER_LABELS:
+            buckets["Other Named Things"].add(word)
+        else:
+            buckets["Other Named Things"].add(word)
 
 # 2. PROPN pass — catch names spaCy didn't recognise as entities
 #    Only include tokens that are inside a sentence (not at position 0)
@@ -97,13 +144,13 @@ for token in doc:
         continue  # skip all-caps
     if token.i == token.sent.start:
         continue  # skip sentence-initial (could be any word)
-    name = canonical(text)
-    if is_noise(name):
+    word = canonical(text)
+    if is_noise(word):
         continue
     # Only add if not already captured by NER
-    already_captured = any(name in s for s in buckets.values())
+    already_captured = any(word in s for s in buckets.values())
     if not already_captured:
-        buckets["Unclassified Proper Nouns"].add(name)
+        buckets["Unclassified Proper Nouns"].add(word)
 
 # ── Write output ───────────────────────────────────────────────────────────────
 GROUP_ORDER = [
801 format_scripture.py Normal file
@@ -0,0 +1,801 @@
#!/usr/bin/env python3
"""
format_scripture.py
═══════════════════
Convert the Book of the Nem plain-text file into two scripture-style PDFs:

    nem_kindle.pdf – single-column, sized for e-readers (4.5" × 6.5")
    nem_paper.pdf  – two-column, Book of Mormon style (5.5" × 8.5")

Requirements (Debian/Ubuntu):
    sudo apt-get install texlive-latex-extra texlive-fonts-recommended

The key packages used are:
    extsizes  – for 9 pt document class (paper format)
    tgpagella – TeX Gyre Pagella (Palatino-clone) font
    multicol  – two-column layout without hard page breaks
    microtype – improved text justification and hyphenation
    fancyhdr  – running headers and footers
    needspace – prevent orphaned headings

Usage:
    python format_scripture.py
    python format_scripture.py --input "Audio Master Nem Full.txt"
    python format_scripture.py --kindle-only
    python format_scripture.py --paper-only
    python format_scripture.py --output-dir ./pdfs
    python format_scripture.py --keep-tex      # keep .tex files for debugging
"""

import argparse
import re
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# ── Default paths ──────────────────────────────────────────────────────────────
INPUT_FILE = Path("Audio Master Nem Full.txt")
OUTPUT_DIR = Path("output_pdf")

# ══════════════════════════════════════════════════════════════════════════════
# LaTeX helper
# ══════════════════════════════════════════════════════════════════════════════

_LATEX_TRANS = str.maketrans({
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
    "\u2014": "---",        # em dash
    "\u2013": "--",         # en dash
    "\u2018": "`",          # left single quote
    "\u2019": "'",          # right single quote
    "\u201c": "``",         # left double quote
    "\u201d": "''",         # right double quote
    "\u2026": r"\ldots{}",  # ellipsis
    "\u00e9": r"\'e",
    "\u00e8": r"\`e",
    "\u00ea": r"\^e",
    "\u00e0": r"\`a",
    "\u00e2": r"\^a",
    "\u00f3": r"\'o",
    "\u00ed": r"\'{\i}",
})


def esc(text: str) -> str:
    """Escape special LaTeX characters in a string."""
    return text.translate(_LATEX_TRANS)

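`str.maketrans` accepts a dict keyed by single characters, each mapping to a replacement string, so one `translate` call applies every escape at once; a trimmed illustration of the table above:

```python
# Trimmed version of the translation table for demonstration
_TRANS = str.maketrans({
    "&": r"\&", "%": r"\%", "#": r"\#", "_": r"\_",
    "\u2014": "---",  # em dash → LaTeX em dash
})


def esc(text: str) -> str:
    return text.translate(_TRANS)


print(esc("Rock & Roll — 100%"))  # → Rock \& Roll --- 100\%
```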
# ══════════════════════════════════════════════════════════════════════════════
# Document element types
# ══════════════════════════════════════════════════════════════════════════════

@dataclass
class TitlePage:
    lines: list


@dataclass
class BookHeader:
    """One or more heading lines that introduce a new book/section."""
    lines: list  # list of str


@dataclass
class Chapter:
    num: int
    subtitle: Optional[str] = None


@dataclass
class SectionHeading:
    """Short heading within a chapter (e.g. MARRIAGE, BAPTISM)."""
    text: str


@dataclass
class Verse:
    num: int
    text: str


@dataclass
class Paragraph:
    text: str


# ══════════════════════════════════════════════════════════════════════════════
# Parser
# ══════════════════════════════════════════════════════════════════════════════

_RE_VERSE = re.compile(r"^\s*(\d+)\s+(.*)")
_RE_CHAPTER = re.compile(r"^\s*CHAPTER\s+(\d+)\s*$", re.IGNORECASE)
_RE_DIVIDER = re.compile(r"^_{4,}")

# Lines longer than this are treated as body paragraphs rather than headings
MAX_HEADING_LEN = 120


def _is_verse(line: str) -> bool:
    """Line starts with a verse number followed by text."""
    m = _RE_VERSE.match(line)
    return bool(m) and int(m.group(1)) > 0


def _is_chapter(line: str) -> bool:
    return bool(_RE_CHAPTER.match(line.strip()))


def _is_divider(line: str) -> bool:
    return bool(_RE_DIVIDER.match(line.strip()))


def _is_allcaps(line: str) -> bool:
    s = line.strip()
    return bool(s) and s == s.upper() and any(c.isalpha() for c in s)
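A quick check of how the line classifiers above carve up input (same regexes, reproduced here so the snippet stands alone):

```python
import re

_RE_VERSE = re.compile(r"^\s*(\d+)\s+(.*)")
_RE_CHAPTER = re.compile(r"^\s*CHAPTER\s+(\d+)\s*$", re.IGNORECASE)
_RE_DIVIDER = re.compile(r"^_{4,}")


def classify(line: str) -> str:
    """Rough line classification mirroring the parser's precedence."""
    if _RE_DIVIDER.match(line.strip()):
        return "divider"
    if _RE_CHAPTER.match(line.strip()):
        return "chapter"
    m = _RE_VERSE.match(line)
    if m and int(m.group(1)) > 0:
        return "verse"
    return "other"


print(classify("____________"))            # → divider
print(classify("Chapter 3"))               # → chapter (case-insensitive)
print(classify("12 And it came to pass"))  # → verse
print(classify("MARRIAGE"))                # → other (all-caps subheading)
```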
def parse(text: str) -> list:
    """Parse the scripture text into a list of Element objects."""
    lines = text.splitlines()
    elements = []
    n = len(lines)
    i = 0

    # ── Title page: short lines before the first divider ──────────────────────
    # Short lines (≤80 chars) are the actual title. Long prose before the first
    # divider is ignored so it does not duplicate the later labeled Introduction.
    title_lines = []
    while i < n and not _is_divider(lines[i]):
        title_lines.append(lines[i])
        i += 1
    actual_title = []
    for l in title_lines:
        s = l.strip()
        if not s:
            continue
        if len(s) <= 80:
            actual_title.append(s)
    if actual_title:
        elements.append(TitlePage(lines=actual_title))

    # ── Main pass ─────────────────────────────────────────────────────────────
    after_divider = False

    while i < n:
        raw = lines[i]
        line = raw.strip()

        # ── Divider ───────────────────────────────────────────────────────────
        if _is_divider(raw):
            after_divider = True
            i += 1
            continue

        # ── Blank line ────────────────────────────────────────────────────────
        if not line:
            i += 1
            continue

        # ── After a divider: collect section/book header ───────────────────
        # Collect all short non-verse non-chapter lines immediately following
        # the divider. Stop as soon as we hit a long prose line or body content.
        if after_divider:
            after_divider = False
            header_lines = []
            j = i
            while j < n:
                s = lines[j].strip()
                if not s:  # blank: keep scanning
                    j += 1
                    continue
                if _is_verse(lines[j]) or _is_chapter(lines[j]):
                    break  # reached verse/chapter body
                if len(s) > MAX_HEADING_LEN:
                    break  # long prose line: stop here
                header_lines.append(s)
                j += 1
            if header_lines:
                elements.append(BookHeader(lines=header_lines))
                i = j
                continue

        # ── Chapter heading ────────────────────────────────────────────────
        m = _RE_CHAPTER.match(line)
        if m:
            num = int(m.group(1))
            # Look ahead for an optional subtitle (short non-verse line)
            j = i + 1
            subtitle = None
            while j < n and not lines[j].strip():
                j += 1
            if j < n:
                ns = lines[j].strip()
                if (ns
                        and not _is_verse(lines[j])
                        and not _is_chapter(lines[j])
                        and not _is_divider(lines[j])
                        and len(ns) <= MAX_HEADING_LEN):
                    subtitle = ns
                    i = j + 1
                else:
                    i += 1
            else:
                i += 1
            elements.append(Chapter(num=num, subtitle=subtitle))
            continue

        # ── All-caps lines: either a BookHeader cluster or a SectionHeading ─
        # If the cluster of consecutive all-caps lines is followed (after any
        # blanks) by a CHAPTER heading, treat the whole cluster as a BookHeader.
        # Otherwise treat only the first line as a SectionHeading.
        if _is_allcaps(line) and len(line) <= MAX_HEADING_LEN and not _is_verse(raw):
            # Gather consecutive all-caps lines (blanks skipped)
            j = i
            caps_block = []
            while j < n:
                s = lines[j].strip()
                if not s:
                    j += 1
                    continue
                if (_is_allcaps(s)
                        and len(s) <= MAX_HEADING_LEN
                        and not _is_verse(lines[j])
                        and not _is_chapter(lines[j])
                        and not _is_divider(lines[j])):
                    caps_block.append(s)
                    j += 1
                else:
                    break
            # Look past any blanks to see if a chapter heading follows
            k = j
            while k < n and not lines[k].strip():
                k += 1
            if k < n and _is_chapter(lines[k]):
                # This cluster is a book/section header
                elements.append(BookHeader(lines=caps_block))
                i = j
            else:
                # Single inline section subheading (MARRIAGE, BAPTISM, etc.)
                elements.append(SectionHeading(text=caps_block[0] if caps_block else line))
                i = i + 1
            continue

        # ── Verse ─────────────────────────────────────────────────────────
        if _is_verse(raw):
            mfull = _RE_VERSE.match(raw)
            elements.append(Verse(num=int(mfull.group(1)), text=mfull.group(2).strip()))
            i += 1
            continue

        # ── Paragraph ─────────────────────────────────────────────────────
        elements.append(Paragraph(text=line))
        i += 1

    return elements


# ══════════════════════════════════════════════════════════════════════════════
# LaTeX generation
# ══════════════════════════════════════════════════════════════════════════════

_PREAMBLE_SHARED = r"""
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{tgpagella}
\usepackage{microtype}
\usepackage{fancyhdr}
\usepackage{needspace}
\setlength{\headheight}{14pt}
\addtolength{\topmargin}{-2pt}
\usepackage[hidelinks]{hyperref}
"""


def _hrule() -> str:
    return r"\noindent\rule{\linewidth}{0.3pt}"

# ── Kindle (single-column, e-reader sized) ────────────────────────────────────

def build_kindle_latex(elements: list) -> str:
    """Build a single-column LaTeX document sized for e-readers."""
    out = []
    # extarticle (from extsizes) gives us 11pt; plain article also supports it
    out.append(r"\documentclass[11pt]{extarticle}")
    out.append(r"""
\usepackage[paperwidth=4.5in,paperheight=6.5in,
            top=0.08in,bottom=0.5in,
            inner=0.42in,outer=0.38in,
            headheight=12pt,headsep=6pt,
            includehead]{geometry}""")
    out.append(_PREAMBLE_SHARED)
    out.append(r"""
\pagestyle{fancy}
\fancyhf{}
\fancyhead[C]{\small\itshape\nouppercase{\leftmark}}
\fancyfoot[C]{\small\thepage}
\renewcommand{\headrulewidth}{0.3pt}

\setlength{\parindent}{0pt}
\setlength{\parskip}{3pt plus 1pt minus 1pt}

\begin{document}
""")
    # Handle title page separately so we can insert TOC after it
    title_els = [e for e in elements if isinstance(e, TitlePage)]
    body_els = [e for e in elements if not isinstance(e, TitlePage)]
    if title_els:
        out.append(r"\clearpage")
        out.append(r"\thispagestyle{empty}")
        out.append(r"\vspace*{1.3in}")
        out.append(r"\begin{center}")
        for j, tl in enumerate(title_els[0].lines):
            s = tl.strip()
            if not s:
                continue
            if j < 3:
                out.append(r"{\LARGE\bfseries " + esc(s) + r"} \\[8pt]")
            else:
                out.append(r"{\large " + esc(s) + r"} \\[4pt]")
        out.append(r"\end{center}")
        out.append(r"\clearpage")
    out.append(r"\renewcommand{\contentsname}{Table of Contents}")
    out.append(r"\tableofcontents")
    out.append(r"\clearpage")
    _emit_elements(out, body_els, kindle=True)
    out.append(r"\end{document}")
    return "\n".join(out)


# ── Paper / BOM style (two-column) ────────────────────────────────────────────

def build_paper_latex(elements: list) -> str:
    """Build a two-column, Book of Mormon-style LaTeX document."""
    out = []
    # extarticle (from extsizes) for 9pt support
    out.append(r"\documentclass[9pt,twoside]{extarticle}")
    out.append(r"""
\usepackage[paperwidth=5.5in,paperheight=8.5in,
            top=0.08in,bottom=0.55in,
            inner=0.5in,outer=0.42in,
            headheight=10pt,headsep=5pt,
            includehead]{geometry}""")
    out.append(_PREAMBLE_SHARED)
    out.append(r"""
\usepackage{multicol}
\setlength{\columnsep}{0.22in}
\setlength{\columnseprule}{0.3pt}

\pagestyle{fancy}
\fancyhf{}
\fancyhead[LE]{\footnotesize\itshape\nouppercase{\leftmark}}
\fancyhead[RO]{\footnotesize\itshape\nouppercase{\rightmark}}
\fancyfoot[C]{\scriptsize\thepage}
\renewcommand{\headrulewidth}{0.3pt}

\setlength{\parindent}{0pt}
\setlength{\parskip}{1pt}

\begin{document}
""")

    # Emit the title page outside multicols (single-column block)
    title_els = [e for e in elements if isinstance(e, TitlePage)]
    body_els = [e for e in elements if not isinstance(e, TitlePage)]

    if title_els:
        out.append(r"\begin{center}")
        for j, tl in enumerate(title_els[0].lines):
            s = tl.strip()
            if not s:
                continue
            if j < 3:
                out.append(r"{\large\bfseries " + esc(s) + r"} \\[3pt]")
            else:
                out.append(r"{\small " + esc(s) + r"} \\[1pt]")
        out.append(r"\end{center}")
        out.append(r"\medskip")

    out.append(r"\renewcommand{\contentsname}{Table of Contents}")
    out.append(r"\tableofcontents")
    out.append(r"\clearpage")

    # Skip any leading front-matter paragraphs before the first section header.
    # For paper output, the intro should begin at the labeled "Introduction"
    # section rather than repeating the pre-divider prose block.
    first_section = next(
        (i for i, el in enumerate(body_els) if isinstance(el, BookHeader)),
        len(body_els),
    )
    paper_body_els = body_els[first_section:]

    # Split intro (before first real book) from main body.
    # A "real book" is a BookHeader that is followed by at least one Chapter
    # before the next BookHeader. "Introduction" and similar preamble sections
    # are BookHeaders too but have no chapters, so they stay in the intro.
    first_book = len(paper_body_els)
    for i, el in enumerate(paper_body_els):
        if isinstance(el, BookHeader):
            # Check if a Chapter follows before the next BookHeader
            for j in range(i + 1, len(paper_body_els)):
                if isinstance(paper_body_els[j], Chapter):
                    first_book = i
                    break
                if isinstance(paper_body_els[j], BookHeader):
                    break
            if first_book < len(paper_body_els):
                break
    intro_els = paper_body_els[:first_book]
    main_els = paper_body_els[first_book:]

    if intro_els:
        _emit_elements(out, intro_els, kindle=True, compact_headers=True)
        out.append(r"\clearpage")

    out.append(r"\begin{multicols}{2}")
    _emit_elements(out, main_els, kindle=False)
    out.append(r"\end{multicols}")
    out.append(r"\end{document}")
    return "\n".join(out)


# ── Body emitter ──────────────────────────────────────────────────────────────

def _emit_elements(
    out: list,
    elements: list,
    kindle: bool,
    indent: bool = False,
    compact_headers: bool = False,
) -> None:
    """Translate parsed Element objects into LaTeX markup."""

    for el in elements:

        # ── Title page (kindle only; paper handles it before multicols) ──────
        if isinstance(el, TitlePage):
            if kindle:
                out.append(r"\clearpage")
                out.append(r"\thispagestyle{empty}")
                out.append(r"\vspace*{1.3in}")
                out.append(r"\begin{center}")
                for j, tl in enumerate(el.lines):
                    s = tl.strip()
                    if not s:
                        continue
                    if j < 3:
                        out.append(r"{\LARGE\bfseries " + esc(s) + r"} \\[8pt]")
                    else:
                        out.append(r"{\large " + esc(s) + r"} \\[4pt]")
                out.append(r"\end{center}")
                out.append(r"\clearpage")

        # ── Book / section header ────────────────────────────────────────────
        elif isinstance(el, BookHeader):
            lines = el.lines

            if kindle:
                # Start a new page for each major book
                out.append(r"\clearpage")
                out.append(r"\phantomsection\addcontentsline{toc}{section}{" + esc(lines[0]) + r"}")
                out.append(r"\vspace*{0pt}" if compact_headers else r"\vspace*{0.1in}")
                out.append(r"\begin{center}")
                out.append(_hrule())
                out.append(r"\\[6pt]")
                out.append(r"{\bfseries\large " + esc(lines[0]) + r"}")
                for ln in lines[1:]:
                    out.append(r"\\ [3pt]{\normalsize\itshape " + esc(ln) + r"}")
                out.append(r"\\[6pt]")
                out.append(_hrule())
                out.append(r"\end{center}")
                out.append(r"\markboth{" + esc(lines[0]) + r"}{" + esc(lines[0]) + r"}")
                out.append(r"\vspace{5pt}")

            else:
                # Inline heading within the two-column flow
                # Refuse to start a new book in the bottom half of a column
                out.append(r"\needspace{0.5\textheight}")
                out.append(r"\phantomsection\addcontentsline{toc}{section}{" + esc(lines[0]) + r"}")
                out.append(r"\begin{center}")
                out.append(_hrule())
                out.append(r"\\[2pt]")
                out.append(r"{\bfseries " + esc(lines[0]) + r"}")
                for ln in lines[1:]:
                    out.append(r"\\ {\small\itshape " + esc(ln) + r"}")
                out.append(r"\\[2pt]")
                out.append(_hrule())
                out.append(r"\end{center}")
                out.append(r"\markboth{" + esc(lines[0]) + r"}{" + esc(lines[0]) + r"}")
                out.append(r"\vspace{2pt}")

        # ── Chapter heading ──────────────────────────────────────────────────
        elif isinstance(el, Chapter):
            label = f"CHAPTER {el.num}"

            if kindle:
                out.append(r"\phantomsection\addcontentsline{toc}{subsection}{" + esc(label) + r"}")
                out.append(r"\needspace{4\baselineskip}")
                out.append(r"\vspace{14pt}")
                out.append(r"\begin{center}")
                out.append(r"{\bfseries\large " + esc(label) + r"}")
                if el.subtitle:
                    out.append(r"\\ [3pt]{\normalsize\itshape " + esc(el.subtitle) + r"}")
                out.append(r"\end{center}")
                out.append(r"\markright{" + esc(label) + r"}")
                out.append(r"\vspace{6pt}")

            else:
                out.append(r"\phantomsection\addcontentsline{toc}{subsection}{" + esc(label) + r"}")
                out.append(r"\needspace{2\baselineskip}")
                out.append(r"\vspace{3pt}")
                out.append(r"\begin{center}")
                out.append(r"{\bfseries " + esc(label) + r"}")
                if el.subtitle:
                    out.append(r"\\ {\small\itshape " + esc(el.subtitle) + r"}")
                out.append(r"\end{center}")
                out.append(r"\markright{" + esc(label) + r"}")
                out.append(r"\vspace{1pt}")

        # ── Section subheading (MARRIAGE, BAPTISM, etc.) ────────────────────
        elif isinstance(el, SectionHeading):
            if kindle:
                out.append(r"\vspace{8pt}")
                out.append(r"\begin{center}{\bfseries " + esc(el.text) + r"}\end{center}")
                out.append(r"\vspace{4pt}")
            else:
                out.append(r"\vspace{3pt}")
                out.append(
                    r"\begin{center}{\bfseries\small " + esc(el.text) + r"}\end{center}"
                )
                out.append(r"\vspace{1pt}")

        # ── Verse ────────────────────────────────────────────────────────────
        elif isinstance(el, Verse):
            body = esc(el.text)
            if kindle:
                # Bold inline number (not superscript) for readability on screen
                vnum = r"\textbf{" + str(el.num) + r"}"
                out.append(r"\noindent " + vnum + r"~" + body)
                out.append(r"\par\smallskip")
            else:
                vnum = r"\textbf{" + str(el.num) + r"}"
                out.append(r"\noindent " + vnum + r"~" + body + r"\par")

        # ── Paragraph (prose intro, commentary, etc.) ───────────────────────
        elif isinstance(el, Paragraph):
            body = esc(el.text)
            if kindle:
                out.append(r"\noindent " + body)
                out.append(r"\par\smallskip")
            elif indent:
                out.append(body + r"\par\medskip")
            else:
                out.append(r"\noindent " + body + r"\par")


# ══════════════════════════════════════════════════════════════════════════════
# Utility: book limiter
# ══════════════════════════════════════════════════════════════════════════════

def truncate_to_books(elements: list, max_books: int) -> list:
    """Return only the first *max_books* BookHeader sections (and their content).

    Title-page and front-matter paragraphs before the first BookHeader are
    always kept.
    """
    if max_books <= 0:
        return elements
    count = 0
    result = []
    for el in elements:
        if isinstance(el, BookHeader):
            count += 1
            if count > max_books:
                break
        result.append(el)
    return result


# ══════════════════════════════════════════════════════════════════════════════
|
||||
# PDF compilation
|
||||
# ══════════════════════════════════════════════════════════════════════════════
|
||||
|
||||
def _find_compiler() -> tuple:
|
||||
"""Return (compiler_path, compiler_type) or (None, None) if none found."""
|
||||
import shutil
|
||||
# Also probe common absolute paths in case the dir isn't on $PATH
|
||||
candidates = {
|
||||
"pdflatex": ["/usr/bin/pdflatex", "/usr/local/bin/pdflatex"],
|
||||
"tectonic": ["/usr/bin/tectonic", "/usr/local/bin/tectonic"],
|
||||
}
|
||||
for cmd, extra_paths in candidates.items():
|
||||
found = shutil.which(cmd)
|
||||
if found:
|
||||
return found, cmd
|
||||
for p in extra_paths:
|
||||
if Path(p).exists():
|
||||
return p, cmd
|
||||
return None, None
|
||||
|
||||
|
||||
def compile_pdf(tex_src: str, output_pdf: Path,
|
||||
keep_tex: bool = False,
|
||||
compiler_path: str = "/usr/bin/pdflatex",
|
||||
compiler_type: str = "pdflatex") -> bool:
|
||||
"""
|
||||
Write *tex_src* into a temp directory, run the LaTeX compiler, and copy
|
||||
the resulting PDF to *output_pdf*. Supports ``pdflatex`` and ``tectonic``.
|
||||
Returns True on success.
|
||||
"""
|
||||
with tempfile.TemporaryDirectory() as tmp:
|
||||
tmp_path = Path(tmp)
|
||||
tex_file = tmp_path / "document.tex"
|
||||
tex_file.write_text(tex_src, encoding="utf-8")
|
||||
|
||||
if compiler_type == "tectonic":
|
||||
# Tectonic compiles in one pass and downloads missing packages.
|
||||
passes = 1
|
||||
cmd_base = [compiler_path, "document.tex"]
|
||||
else:
|
||||
# pdflatex needs two passes to get page headers right.
|
||||
passes = 2
|
||||
cmd_base = [compiler_path, "-interaction=nonstopmode",
|
||||
"-halt-on-error", "document.tex"]
|
||||
|
||||
for pass_num in range(1, passes + 1):
|
||||
result = subprocess.run(
|
||||
cmd_base, cwd=tmp, capture_output=True, text=True,
|
||||
)
|
||||
if result.returncode != 0:
|
||||
print(f" [compiler error on pass {pass_num}]", file=sys.stderr)
|
||||
print(result.stdout[-3000:], file=sys.stderr)
|
||||
if result.stderr:
|
||||
print(result.stderr[-1000:], file=sys.stderr)
|
||||
if keep_tex:
|
||||
dest = output_pdf.with_suffix(".tex")
|
||||
dest.write_text(tex_src, encoding="utf-8")
|
||||
print(f" TeX source saved to: {dest}", file=sys.stderr)
|
||||
return False
|
||||
|
||||
pdf_out = tmp_path / "document.pdf"
|
||||
if pdf_out.exists():
|
||||
output_pdf.parent.mkdir(parents=True, exist_ok=True)
|
||||
output_pdf.write_bytes(pdf_out.read_bytes())
|
||||
if keep_tex:
|
||||
dest = output_pdf.with_suffix(".tex")
|
||||
dest.write_text(tex_src, encoding="utf-8")
|
||||
return True
|
||||
|
||||
print(" [compiler ran but document.pdf was not produced]", file=sys.stderr)
|
||||
return False
|
||||
|
||||
|
||||
# ══════════════════════════════════════════════════════════════════════════════
|
||||
# Main
|
||||
# ══════════════════════════════════════════════════════════════════════════════
|
||||
|
||||
_INSTALL_INSTRUCTIONS = """
|
||||
No LaTeX compiler found. Install one of the following:
|
||||
|
||||
Arch / CachyOS / Manjaro:
|
||||
sudo pacman -S texlive-basic texlive-latex texlive-latexrecommended \\
|
||||
texlive-latexextra texlive-fontsrecommended
|
||||
|
||||
Debian / Ubuntu:
|
||||
sudo apt-get install texlive-latex-extra texlive-fonts-recommended
|
||||
|
||||
--- OR --- (self-contained, downloads packages on first use)
|
||||
sudo pacman -S tectonic
|
||||
# or: cargo install tectonic
|
||||
"""
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Generate scripture-style PDFs from the Book of the Nem text.",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog=__doc__,
|
||||
)
|
||||
parser.add_argument(
|
||||
"--input", type=Path, default=INPUT_FILE,
|
||||
help=f"Input plain-text file (default: {INPUT_FILE})",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--output-dir", type=Path, default=OUTPUT_DIR,
|
||||
help=f"Output directory (default: {OUTPUT_DIR})",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--kindle-only", action="store_true",
|
||||
help="Generate only the Kindle (single-column) PDF.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--paper-only", action="store_true",
|
||||
help="Generate only the paper (two-column) PDF.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--keep-tex", action="store_true",
|
||||
help="Save the intermediate .tex files alongside each PDF.",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--max-books", type=int, default=0, metavar="N",
|
||||
help="Limit output to the first N book sections (0 = no limit).",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--tex-only", action="store_true",
|
||||
help="Write .tex files only — do not attempt PDF compilation. "
|
||||
"Useful when a LaTeX compiler is not available.",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
src_path: Path = args.input
|
||||
if not src_path.exists():
|
||||
sys.exit(f"ERROR: Input file not found: {src_path}")
|
||||
|
||||
print(f"Reading: {src_path}")
|
||||
text = src_path.read_text(encoding="utf-8", errors="replace")
|
||||
|
||||
elements = parse(text)
|
||||
if args.max_books > 0:
|
||||
elements = truncate_to_books(elements, args.max_books)
|
||||
print(f" Limiting to first {args.max_books} book(s).")
|
||||
books = sum(1 for e in elements if isinstance(e, BookHeader))
|
||||
chapters = sum(1 for e in elements if isinstance(e, Chapter))
|
||||
verses = sum(1 for e in elements if isinstance(e, Verse))
|
||||
print(f" Parsed: {books} books/sections, {chapters} chapters, {verses} verses")
|
||||
|
||||
out_dir: Path = args.output_dir
|
||||
out_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Locate compiler (unless --tex-only)
|
||||
compiler_path, compiler_type = None, None
|
||||
if not args.tex_only:
|
||||
compiler_path, compiler_type = _find_compiler()
|
||||
if not compiler_path:
|
||||
print(_INSTALL_INSTRUCTIONS, file=sys.stderr)
|
||||
print("Falling back to --tex-only mode: .tex files will be written "
|
||||
"but not compiled.", file=sys.stderr)
|
||||
args.tex_only = True
|
||||
else:
|
||||
print(f" Using compiler: {compiler_path}")
|
||||
|
||||
def _write_or_compile(tex: str, pdf_path: Path, label: str):
|
||||
if args.tex_only or args.keep_tex:
|
||||
tex_path = pdf_path.with_suffix(".tex")
|
||||
tex_path.write_text(tex, encoding="utf-8")
|
||||
print(f" ✓ TeX saved: {tex_path}")
|
||||
if args.tex_only:
|
||||
return
|
||||
print(f" Compiling {label} PDF …")
|
||||
ok = compile_pdf(tex, pdf_path, keep_tex=args.keep_tex,
|
||||
compiler_path=compiler_path,
|
||||
compiler_type=compiler_type)
|
||||
if ok:
|
||||
print(f" ✓ {pdf_path}")
|
||||
else:
|
||||
print(f" ✗ {label} PDF failed — see errors above.")
|
||||
|
||||
# ── Kindle PDF ────────────────────────────────────────────────────────────
|
||||
if not args.paper_only:
|
||||
print(f"\nKindle PDF (single-column, 4.5\"×6.5\") …")
|
||||
tex = build_kindle_latex(elements)
|
||||
_write_or_compile(tex, out_dir / "nem_phone.pdf", "Kindle")
|
||||
|
||||
# ── Paper / BOM-style PDF ────────────────────────────────────────────────
|
||||
if not args.kindle_only:
|
||||
print(f"\nPaper PDF (two-column BOM style, 5.5\"×8.5\") …")
|
||||
tex = build_paper_latex(elements)
|
||||
_write_or_compile(tex, out_dir / "nem_paper.pdf", "Paper")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
File diff suppressed because it is too large
@ -0,0 +1,778 @@
|
||||
{
|
||||
"Aaagast": "aaagast.wav",
|
||||
"Abby": "abby.wav",
|
||||
"Abigail": "abigail.wav",
|
||||
"Abodey": "abodey.wav",
|
||||
"Abriyyah": "abriyyah.wav",
|
||||
"Abyss": "abyss.wav",
|
||||
"Adamantine": "adamantine.wav",
|
||||
"Addobes": "addobes.wav",
|
||||
"Adobbes": "adobbes.wav",
|
||||
"Aedrick": "aedrick.wav",
|
||||
"Aegis": "aegis.wav",
|
||||
"Aegrir": "aegrir.wav",
|
||||
"Afire": "afire.wav",
|
||||
"Agatha": "agatha.wav",
|
||||
"Agony": "agony.wav",
|
||||
"Agrarian": "agrarian.wav",
|
||||
"Aheer": "aheer.wav",
|
||||
"Ahman": "ahman.wav",
|
||||
"Ailondel": "ailondel.wav",
|
||||
"Airk": "airk.wav",
|
||||
"Al-Astan": "al_astan.wav",
|
||||
"Alchemist": "alchemist.wav",
|
||||
"Alvrin": "alvrin.wav",
|
||||
"Amarantha": "amarantha.wav",
|
||||
"Amaryllis": "amaryllis.wav",
|
||||
"Ananduil": "ananduil.wav",
|
||||
"Anaudriel": "anaudriel.wav",
|
||||
"Andrahel": "andrahel.wav",
|
||||
"Anhuil": "anhuil.wav",
|
||||
"Anhuil-Ehlar": "anhuil_ehlar.wav",
|
||||
"Anhuil-Elhar": "anhuil_elhar.wav",
|
||||
"Anjeer": "anjeer.wav",
|
||||
"Ankh": "ankh.wav",
|
||||
"Annalise": "annalise.wav",
|
||||
"Anointing": "anointing.wav",
|
||||
"Anoush": "anoush.wav",
|
||||
"Anuil": "anuil.wav",
|
||||
"Anvilhammer": "anvilhammer.wav",
|
||||
"Ara": "ara.wav",
|
||||
"Aragast": "aragast.wav",
|
||||
"Aragst": "aragst.wav",
|
||||
"Aralon": "aralon.wav",
|
||||
"Aran": "aran.wav",
|
||||
"Arans": "arans.wav",
|
||||
"Arashan": "arashan.wav",
|
||||
"Arbiter": "arbiter.wav",
|
||||
"Archmage": "archmage.wav",
|
||||
"Archwizard": "archwizard.wav",
|
||||
"Ardrick": "ardrick.wav",
|
||||
"Argast": "argast.wav",
|
||||
"Armbrook": "armbrook.wav",
|
||||
"Armory": "armory.wav",
|
||||
"Arn": "arn.wav",
|
||||
"Arn-Del": "arn_del.wav",
|
||||
"Asheer": "asheer.wav",
|
||||
"Aske": "aske.wav",
|
||||
"Aster": "aster.wav",
|
||||
"Astor": "astor.wav",
|
||||
"Astral": "astral.wav",
|
||||
"Astride": "astride.wav",
|
||||
"Astute": "astute.wav",
|
||||
"Avery": "avery.wav",
|
||||
"Avorein": "avorein.wav",
|
||||
"Await": "await.wav",
|
||||
"Awww": "awww.wav",
|
||||
"Axehammer": "axehammer.wav",
|
||||
"Ayana": "ayana.wav",
|
||||
"Ayron": "ayron.wav",
|
||||
"Azuremoon": "azuremoon.wav",
|
||||
"Badlands": "badlands.wav",
|
||||
"Baelen": "baelen.wav",
|
||||
"Bah": "bah.wav",
|
||||
"Ballista": "ballista.wav",
|
||||
"Bancroft": "bancroft.wav",
|
||||
"Baras": "baras.wav",
|
||||
"Barek": "barek.wav",
|
||||
"Barge": "barge.wav",
|
||||
"Barrik": "barrik.wav",
|
||||
"Battlelord": "battlelord.wav",
|
||||
"Bazaar": "bazaar.wav",
|
||||
"Bearas": "bearas.wav",
|
||||
"Bearasagain": "bearasagain.wav",
|
||||
"Bearasand": "bearasand.wav",
|
||||
"Bearasasked": "bearasasked.wav",
|
||||
"Bearasat": "bearasat.wav",
|
||||
"Bearasbegan": "bearasbegan.wav",
|
||||
"Bearasbowed": "bearasbowed.wav",
|
||||
"Bearascan": "bearascan.wav",
|
||||
"Bearasdown": "bearasdown.wav",
|
||||
"Bearasemerged": "bearasemerged.wav",
|
||||
"Bearasfelt": "bearasfelt.wav",
|
||||
"Bearasfor": "bearasfor.wav",
|
||||
"Bearashad": "bearashad.wav",
|
||||
"Bearashas": "bearashas.wav",
|
||||
"Bearasheld": "bearasheld.wav",
|
||||
"Bearashesitantly": "bearashesitantly.wav",
|
||||
"Bearasin": "bearasin.wav",
|
||||
"Bearasleading": "bearasleading.wav",
|
||||
"Bearasmust": "bearasmust.wav",
|
||||
"Bearasnodded": "bearasnodded.wav",
|
||||
"Bearasperplexed": "bearasperplexed.wav",
|
||||
"Bearasquickly": "bearasquickly.wav",
|
||||
"Bearasreleased": "bearasreleased.wav",
|
||||
"Bearassaid": "bearassaid.wav",
|
||||
"Bearassat": "bearassat.wav",
|
||||
"Bearassimply": "bearassimply.wav",
|
||||
"Bearasslowly": "bearasslowly.wav",
|
||||
"Bearassome": "bearassome.wav",
|
||||
"Bearasspeaks": "bearasspeaks.wav",
|
||||
"Bearassteeled": "bearassteeled.wav",
|
||||
"Bearasstood": "bearasstood.wav",
|
||||
"Bearasthat": "bearasthat.wav",
|
||||
"Bearasthen": "bearasthen.wav",
|
||||
"Bearasto": "bearasto.wav",
|
||||
"Bearastrailed": "bearastrailed.wav",
|
||||
"Bearaswandered": "bearaswandered.wav",
|
||||
"Bearaswho": "bearaswho.wav",
|
||||
"Bearaswith": "bearaswith.wav",
|
||||
"Beldvorth": "beldvorth.wav",
|
||||
"Belegast": "belegast.wav",
|
||||
"Berstag": "berstag.wav",
|
||||
"Beydell": "beydell.wav",
|
||||
"Blackfeather": "blackfeather.wav",
|
||||
"Blackroot": "blackroot.wav",
|
||||
"Blargh": "blargh.wav",
|
||||
"Bledvorth": "bledvorth.wav",
|
||||
"Blessings": "blessings.wav",
|
||||
"Bloodstone": "bloodstone.wav",
|
||||
"Bloodtone": "bloodtone.wav",
|
||||
"Bogard": "bogard.wav",
|
||||
"Boldar": "boldar.wav",
|
||||
"Bolton": "bolton.wav",
|
||||
"Bon": "bon.wav",
|
||||
"Boomer": "boomer.wav",
|
||||
"Bouldershaun": "bouldershaun.wav",
|
||||
"Boulevarde": "boulevarde.wav",
|
||||
"Brahma": "brahma.wav",
|
||||
"Bramble": "bramble.wav",
|
||||
"Brambleburr": "brambleburr.wav",
|
||||
"Brambleburrs": "brambleburrs.wav",
|
||||
"Branson": "branson.wav",
|
||||
"Bravado": "bravado.wav",
|
||||
"Brax": "brax.wav",
|
||||
"Braz": "braz.wav",
|
||||
"Brazen": "brazen.wav",
|
||||
"Brazenclaw": "brazenclaw.wav",
|
||||
"Brazenclaws": "brazenclaws.wav",
|
||||
"Breeches": "breeches.wav",
|
||||
"Brendan": "brendan.wav",
|
||||
"Brethren": "brethren.wav",
|
||||
"Brickhorn": "brickhorn.wav",
|
||||
"Caldwell": "caldwell.wav",
|
||||
"Calico": "calico.wav",
|
||||
"Caller": "caller.wav",
|
||||
"Camels": "camels.wav",
|
||||
"Canals": "canals.wav",
|
||||
"Captains": "captains.wav",
|
||||
"Caravan": "caravan.wav",
|
||||
"Caswold": "caswold.wav",
|
||||
"Causeway": "causeway.wav",
|
||||
"Cavalier": "cavalier.wav",
|
||||
"Cavern": "cavern.wav",
|
||||
"Cherrytree": "cherrytree.wav",
|
||||
"Chieftain": "chieftain.wav",
|
||||
"Chivalrous": "chivalrous.wav",
|
||||
"Chun": "chun.wav",
|
||||
"Citadel": "citadel.wav",
|
||||
"Clarn": "clarn.wav",
|
||||
"Claw": "claw.wav",
|
||||
"Cleric": "cleric.wav",
|
||||
"Cobblestone": "cobblestone.wav",
|
||||
"Contessa": "contessa.wav",
|
||||
"Corporal": "corporal.wav",
|
||||
"Cotswold": "cotswold.wav",
|
||||
"Councillor": "councillor.wav",
|
||||
"Councilman": "councilman.wav",
|
||||
"Councilmen": "councilmen.wav",
|
||||
"Councilor": "councilor.wav",
|
||||
"Crimson": "crimson.wav",
|
||||
"Crismon": "crismon.wav",
|
||||
"Cylan": "cylan.wav",
|
||||
"Dai": "dai.wav",
|
||||
"Dalthanis": "dalthanis.wav",
|
||||
"Dank": "dank.wav",
|
||||
"Dayr": "dayr.wav",
|
||||
"Dedric": "dedric.wav",
|
||||
"Delgra": "delgra.wav",
|
||||
"Delic": "delic.wav",
|
||||
"Denizen": "denizen.wav",
|
||||
"Denizens": "denizens.wav",
|
||||
"Deric": "deric.wav",
|
||||
"Derrbane": "derrbane.wav",
|
||||
"Derro": "derro.wav",
|
||||
"Derrobane": "derrobane.wav",
|
||||
"Dibble": "dibble.wav",
|
||||
"Diblon": "diblon.wav",
|
||||
"Dire": "dire.wav",
|
||||
"Dis": "dis.wav",
|
||||
"Dobson": "dobson.wav",
|
||||
"Dorian": "dorian.wav",
|
||||
"Dorza": "dorza.wav",
|
||||
"Dragonbane": "dragonbane.wav",
|
||||
"Dragonsbane": "dragonsbane.wav",
|
||||
"Drakor": "drakor.wav",
|
||||
"Draygon": "draygon.wav",
|
||||
"Drefan": "drefan.wav",
|
||||
"Ducan": "ducan.wav",
|
||||
"Duggan": "duggan.wav",
|
||||
"Dulak": "dulak.wav",
|
||||
"Dunca": "dunca.wav",
|
||||
"Dune": "dune.wav",
|
||||
"Dur": "dur.wav",
|
||||
"Dur-Hakan": "dur_hakan.wav",
|
||||
"Durgane": "durgane.wav",
|
||||
"Durthaim": "durthaim.wav",
|
||||
"Durthrim": "durthrim.wav",
|
||||
"Dwarf": "dwarf.wav",
|
||||
"Dwarven": "dwarven.wav",
|
||||
"Earlson": "earlson.wav",
|
||||
"Eastward": "eastward.wav",
|
||||
"Effigius": "effigius.wav",
|
||||
"Ehlar": "ehlar.wav",
|
||||
"El-Ran": "el_ran.wav",
|
||||
"El-Shen": "el_shen.wav",
|
||||
"Elan": "elan.wav",
|
||||
"Elessel": "elessel.wav",
|
||||
"Elf": "elf.wav",
|
||||
"Elhar": "elhar.wav",
|
||||
"Elishan": "elishan.wav",
|
||||
"Eliza": "eliza.wav",
|
||||
"Elliswan": "elliswan.wav",
|
||||
"Elliwsan": "elliwsan.wav",
|
||||
"Elodea": "elodea.wav",
|
||||
"Elshan": "elshan.wav",
|
||||
"Elven": "elven.wav",
|
||||
"Elvenkind": "elvenkind.wav",
|
||||
"Elves": "elves.wav",
|
||||
"Elvrathas": "elvrathas.wav",
|
||||
"Elysium": "elysium.wav",
|
||||
"Emaleen": "emaleen.wav",
|
||||
"Eminence": "eminence.wav",
|
||||
"Emissary": "emissary.wav",
|
||||
"Emporium": "emporium.wav",
|
||||
"Enaru": "enaru.wav",
|
||||
"Endaleth": "endaleth.wav",
|
||||
"Envoy": "envoy.wav",
|
||||
"Eppres": "eppres.wav",
|
||||
"Eradication": "eradication.wav",
|
||||
"Eru": "eru.wav",
|
||||
"Eshela": "eshela.wav",
|
||||
"Ethereal": "ethereal.wav",
|
||||
"Eushon": "eushon.wav",
|
||||
"Eushownava": "eushownava.wav",
|
||||
"Everdark": "everdark.wav",
|
||||
"Everytime": "everytime.wav",
|
||||
"Eylana": "eylana.wav",
|
||||
"Eylanan": "eylanan.wav",
|
||||
"Ezrin": "ezrin.wav",
|
||||
"F-Fine": "f_fine.wav",
|
||||
"F-Forgive": "f_forgive.wav",
|
||||
"Faerie": "faerie.wav",
|
||||
"Fairik": "fairik.wav",
|
||||
"Fargus": "fargus.wav",
|
||||
"Fark": "fark.wav",
|
||||
"Farraj": "farraj.wav",
|
||||
"Farush": "farush.wav",
|
||||
"Feasthall": "feasthall.wav",
|
||||
"Featherstone": "featherstone.wav",
|
||||
"Felaria": "felaria.wav",
|
||||
"Feliq": "feliq.wav",
|
||||
"Felnck": "felnck.wav",
|
||||
"Felnick": "felnick.wav",
|
||||
"Felnicks": "felnicks.wav",
|
||||
"Felnik": "felnik.wav",
|
||||
"Fenaya": "fenaya.wav",
|
||||
"Feneya": "feneya.wav",
|
||||
"Ferrus": "ferrus.wav",
|
||||
"Fey": "fey.wav",
|
||||
"Firebane": "firebane.wav",
|
||||
"Fireshard": "fireshard.wav",
|
||||
"Foomwairma": "foomwairma.wav",
|
||||
"Forger": "forger.wav",
|
||||
"Frandor": "frandor.wav",
|
||||
"Friarsdai": "friarsdai.wav",
|
||||
"Fumairma": "fumairma.wav",
|
||||
"Fumwairma": "fumwairma.wav",
|
||||
"Galantholas": "galantholas.wav",
|
||||
"Galathorn": "galathorn.wav",
|
||||
"Galen": "galen.wav",
|
||||
"Galonti": "galonti.wav",
|
||||
"Garb": "garb.wav",
|
||||
"Gareth": "gareth.wav",
|
||||
"Garvek": "garvek.wav",
|
||||
"Gaunt": "gaunt.wav",
|
||||
"Gavin": "gavin.wav",
|
||||
"Geez": "geez.wav",
|
||||
"Ghurauk": "ghurauk.wav",
|
||||
"Gilandras": "gilandras.wav",
|
||||
"Gilard": "gilard.wav",
|
||||
"Gilchis": "gilchis.wav",
|
||||
"Gilchris": "gilchris.wav",
|
||||
"Gilding": "gilding.wav",
|
||||
"Gilrick": "gilrick.wav",
|
||||
"Glades": "glades.wav",
|
||||
"Glanthalas": "glanthalas.wav",
|
||||
"Glantholas": "glantholas.wav",
|
||||
"Glimmerwyn": "glimmerwyn.wav",
|
||||
"Gloomstone": "gloomstone.wav",
|
||||
"Gnaum": "gnaum.wav",
|
||||
"Gnomish": "gnomish.wav",
|
||||
"Goblinkin": "goblinkin.wav",
|
||||
"Goldsheen": "goldsheen.wav",
|
||||
"Gorath": "gorath.wav",
|
||||
"Gore": "gore.wav",
|
||||
"Gorg": "gorg.wav",
|
||||
"Gorlyn": "gorlyn.wav",
|
||||
"Gorstad": "gorstad.wav",
|
||||
"Gotto": "gotto.wav",
|
||||
"Graces": "graces.wav",
|
||||
"Graffel": "graffel.wav",
|
||||
"Grandmaster": "grandmaster.wav",
|
||||
"Granitestone": "granitestone.wav",
|
||||
"Gratzel": "gratzel.wav",
|
||||
"Graystrom": "graystrom.wav",
|
||||
"Greathaven": "greathaven.wav",
|
||||
"Gregarious": "gregarious.wav",
|
||||
"Gregor": "gregor.wav",
|
||||
"Griffon": "griffon.wav",
|
||||
"Grimbold": "grimbold.wav",
|
||||
"Gripp": "gripp.wav",
|
||||
"Grizzled": "grizzled.wav",
|
||||
"Grog": "grog.wav",
|
||||
"Grogg": "grogg.wav",
|
||||
"Grotto": "grotto.wav",
|
||||
"Gruff": "gruff.wav",
|
||||
"Gruul": "gruul.wav",
|
||||
"Guardarm": "guardarm.wav",
|
||||
"Gustafson": "gustafson.wav",
|
||||
"Guza": "guza.wav",
|
||||
"Gylis": "gylis.wav",
|
||||
"Habani": "habani.wav",
|
||||
"Hagatha": "hagatha.wav",
|
||||
"Hakan": "hakan.wav",
|
||||
"Hallowed": "hallowed.wav",
|
||||
"Halthessala": "halthessala.wav",
|
||||
"Hammerhaft": "hammerhaft.wav",
|
||||
"Har": "har.wav",
|
||||
"Harbrim": "harbrim.wav",
|
||||
"Harbrin": "harbrin.wav",
|
||||
"Hardrock": "hardrock.wav",
|
||||
"Harrik": "harrik.wav",
|
||||
"Hauberk": "hauberk.wav",
|
||||
"Hazards": "hazards.wav",
|
||||
"Headmaster": "headmaster.wav",
|
||||
"Heed": "heed.wav",
|
||||
"Hells": "hells.wav",
|
||||
"Henceforth": "henceforth.wav",
|
||||
"Hendel": "hendel.wav",
|
||||
"Heshbani": "heshbani.wav",
|
||||
"Hesta": "hesta.wav",
|
||||
"Hestra": "hestra.wav",
|
||||
"Heykingygladtomeetyouireallylikeithereitremindsmeofmyhome": "heykingygladtomeetyouireallylikeithereitremindsmeofmyhome.wav",
|
||||
"Highlands": "highlands.wav",
|
||||
"Highlord": "highlord.wav",
|
||||
"Hillsfar": "hillsfar.wav",
|
||||
"Hmmm": "hmmm.wav",
|
||||
"Homecoming": "homecoming.wav",
|
||||
"Horblaster": "horblaster.wav",
|
||||
"Horde": "horde.wav",
|
||||
"Horgard": "horgard.wav",
|
||||
"Hornblade": "hornblade.wav",
|
||||
"Hornblaster": "hornblaster.wav",
|
||||
"Horned": "horned.wav",
|
||||
"Hrumph": "hrumph.wav",
|
||||
"Huen": "huen.wav",
|
||||
"Hylan": "hylan.wav",
|
||||
"Illuminant": "illuminant.wav",
|
||||
"Illuminated": "illuminated.wav",
|
||||
"Illumination": "illumination.wav",
|
||||
"Ilrodel": "ilrodel.wav",
|
||||
"Imp": "imp.wav",
|
||||
"Inquisitor": "inquisitor.wav",
|
||||
"Ironblade": "ironblade.wav",
|
||||
"Ironbound": "ironbound.wav",
|
||||
"Ironguard": "ironguard.wav",
|
||||
"Ironhold": "ironhold.wav",
|
||||
"Ironspear": "ironspear.wav",
|
||||
"Irontree": "irontree.wav",
|
||||
"Iston": "iston.wav",
|
||||
"Jabari": "jabari.wav",
|
||||
"Jabbed": "jabbed.wav",
|
||||
"Jacob": "jacob.wav",
|
||||
"Jad": "jad.wav",
|
||||
"Janson": "janson.wav",
|
||||
"Jasyen": "jasyen.wav",
|
||||
"Jayden": "jayden.wav",
|
||||
"Jaylan": "jaylan.wav",
|
||||
"Jaysen": "jaysen.wav",
|
||||
"Jewel": "jewel.wav",
|
||||
"Jors": "jors.wav",
|
||||
"Jovially": "jovially.wav",
|
||||
"Kaash": "kaash.wav",
|
||||
"Kah": "kah.wav",
|
||||
"Kalzaduum": "kalzaduum.wav",
|
||||
"Karnak": "karnak.wav",
|
||||
"Kaspar": "kaspar.wav",
|
||||
"Kassie": "kassie.wav",
|
||||
"Keldris": "keldris.wav",
|
||||
"Kelshard": "kelshard.wav",
|
||||
"Kelvesh": "kelvesh.wav",
|
||||
"Kelvin": "kelvin.wav",
|
||||
"Kelwane": "kelwane.wav",
|
||||
"Kev": "kev.wav",
|
||||
"Khaki": "khaki.wav",
|
||||
"Kihee": "kihee.wav",
|
||||
"Kihee-Uust": "kihee_uust.wav",
|
||||
"Kiiri": "kiiri.wav",
|
||||
"Kin": "kin.wav",
|
||||
"Kirri": "kirri.wav",
|
||||
"Kisleth": "kisleth.wav",
|
||||
"Knelt": "knelt.wav",
|
||||
"Knight-Corporal": "knight_corporal.wav",
|
||||
"Knight-Lieutenant": "knight_lieutenant.wav",
|
||||
"Knight-Major": "knight_major.wav",
|
||||
"Knight-Sergeant": "knight_sergeant.wav",
|
||||
"Knighthand": "knighthand.wav",
|
||||
"Knighthood": "knighthood.wav",
|
||||
"Knowin": "knowin.wav",
|
||||
"Kodan": "kodan.wav",
|
||||
"Kor": "kor.wav",
|
||||
"Kor-Roth": "kor_roth.wav",
|
||||
"Kordan": "kordan.wav",
|
||||
"Koreth": "koreth.wav",
|
||||
"Korin": "korin.wav",
|
||||
"Kraelheimgar": "kraelheimgar.wav",
|
||||
"Kraven": "kraven.wav",
|
||||
"Kris": "kris.wav",
|
||||
"Krisleth": "krisleth.wav",
|
||||
"Kronlin": "kronlin.wav",
|
||||
"Kudah": "kudah.wav",
|
||||
"Kuerana": "kuerana.wav",
|
||||
"Kunah": "kunah.wav",
|
||||
"Kwenal": "kwenal.wav",
|
||||
"Kyfurn": "kyfurn.wav",
|
||||
"Kylic": "kylic.wav",
|
||||
"Ladell": "ladell.wav",
|
||||
"Laird": "laird.wav",
|
||||
"Leng": "leng.wav",
|
||||
"Lesik": "lesik.wav",
|
||||
"Lightbinger": "lightbinger.wav",
|
||||
"Lightbrigner": "lightbrigner.wav",
|
||||
"Lightbringer": "lightbringer.wav",
|
||||
"Lightbringers": "lightbringers.wav",
|
||||
"Lightrbinger": "lightrbinger.wav",
|
||||
"Liu": "liu.wav",
|
||||
"Lon": "lon.wav",
|
||||
"Lon-Ell": "lon_ell.wav",
|
||||
"Longsword": "longsword.wav",
|
||||
"Lordship": "lordship.wav",
|
||||
"Lumisha": "lumisha.wav",
|
||||
"Lyceum": "lyceum.wav",
|
||||
"Macabress": "macabress.wav",
|
||||
"Madam": "madam.wav",
|
||||
"Magician": "magician.wav",
|
||||
"Magister": "magister.wav",
|
||||
"Magistry": "magistry.wav",
|
||||
"Magorian": "magorian.wav",
|
||||
"Majesties": "majesties.wav",
|
||||
"Maldrood": "maldrood.wav",
|
||||
"Malrood": "malrood.wav",
|
||||
"Manchu": "manchu.wav",
|
||||
"Marches": "marches.wav",
|
||||
"Marlee": "marlee.wav",
|
||||
"Masta": "masta.wav",
|
||||
"Matriarch": "matriarch.wav",
|
||||
"Matriarchs": "matriarchs.wav",
|
||||
"Meknathar": "meknathar.wav",
|
||||
"Menthal": "menthal.wav",
|
||||
"Ming": "ming.wav",
|
||||
"Minotaur": "minotaur.wav",
|
||||
"Minotaurs": "minotaurs.wav",
|
||||
"Mister": "mister.wav",
|
||||
"Misty": "misty.wav",
|
||||
"Mithral": "mithral.wav",
|
||||
"Mithrin": "mithrin.wav",
|
||||
"Mitral": "mitral.wav",
|
||||
"Mmmm": "mmmm.wav",
|
||||
"Moans": "moans.wav",
|
||||
"Molgol": "molgol.wav",
|
||||
"Monarchy": "monarchy.wav",
|
||||
"Morther": "morther.wav",
|
||||
"Motioning": "motioning.wav",
|
||||
"Mustaches": "mustaches.wav",
|
||||
"Mutters": "mutters.wav",
|
||||
"Mylee": "mylee.wav",
|
||||
"Nahzim": "nahzim.wav",
|
||||
"Nefaleem": "nefaleem.wav",
|
||||
"Nestor": "nestor.wav",
|
||||
"Nesven": "nesven.wav",
|
||||
"Neverthoughtidseeyouprancingaroundwithabunchofelfgirls": "neverthoughtidseeyouprancingaroundwithabunchofelfgirls.wav",
|
||||
"Nijel": "nijel.wav",
|
||||
"Nik": "nik.wav",
|
||||
"Nimbly": "nimbly.wav",
|
||||
"Nimgalad": "nimgalad.wav",
|
||||
"Nirvana": "nirvana.wav",
|
||||
"Noivebeenhereandtherelookingformykinrumoredtodwellhereinthisforest": "noivebeenhereandtherelookingformykinrumoredtodwellhereinthisforest.wav",
|
||||
"Nollon": "nollon.wav",
|
||||
"Nomadic": "nomadic.wav",
|
||||
"Nook": "nook.wav",
|
||||
"Nurn": "nurn.wav",
|
||||
"Nym": "nym.wav",
|
||||
"Oakheart": "oakheart.wav",
|
||||
"Oakleaf": "oakleaf.wav",
|
||||
"Odie": "odie.wav",
|
||||
"Odo": "odo.wav",
|
||||
"Ododrian": "ododrian.wav",
|
||||
"Odoiran": "odoiran.wav",
|
||||
"Odorain": "odorain.wav",
|
||||
"Odoriain": "odoriain.wav",
|
||||
"Odorian": "odorian.wav",
|
||||
"Odorians": "odorians.wav",
|
||||
"Ody": "ody.wav",
|
||||
"Off-Worlder": "off_worlder.wav",
|
||||
"Ogrin": "ogrin.wav",
|
||||
"Olde": "olde.wav",
|
||||
"Onas": "onas.wav",
|
||||
"Ooo": "ooo.wav",
|
||||
"Oorian": "oorian.wav",
|
||||
"Oranoc": "oranoc.wav",
|
||||
"Orbs": "orbs.wav",
|
||||
"Orehand": "orehand.wav",
|
||||
"Orgrin": "orgrin.wav",
|
||||
"Orin": "orin.wav",
|
||||
"Orkosh": "orkosh.wav",
|
||||
"Oroset": "oroset.wav",
|
||||
"Orson": "orson.wav",
|
||||
"Oslagil": "oslagil.wav",
|
||||
"Overlord": "overlord.wav",
|
||||
"Paladin": "paladin.wav",
|
||||
"Paladin-King": "paladin_king.wav",
|
||||
"Patriarch": "patriarch.wav",
|
||||
"Patriarchs": "patriarchs.wav",
|
||||
"Penance": "penance.wav",
|
||||
"Penelope": "penelope.wav",
|
||||
"Periwinkle": "periwinkle.wav",
|
||||
"Pilgrim": "pilgrim.wav",
|
||||
"Pinnacle": "pinnacle.wav",
|
||||
"Pricilla": "pricilla.wav",
|
||||
"Priestess": "priestess.wav",
|
||||
"Primer": "primer.wav",
|
||||
"Priscilla": "priscilla.wav",
|
||||
"Prologue": "prologue.wav",
|
||||
"Prudent": "prudent.wav",
|
||||
"Quartzhand": "quartzhand.wav",
|
||||
"Racah": "racah.wav",
|
||||
"Rachelle": "rachelle.wav",
|
||||
"Radiant": "radiant.wav",
|
||||
"Rah'Zi": "rah_zi.wav",
|
||||
"Rasheer": "rasheer.wav",
|
||||
"Raslan": "raslan.wav",
|
||||
"Ravenburg": "ravenburg.wav",
|
||||
"Ravenhill": "ravenhill.wav",
|
||||
"Ravensburg": "ravensburg.wav",
|
||||
"Razentia": "razentia.wav",
|
||||
"Realms": "realms.wav",
|
||||
"Redhorn": "redhorn.wav",
|
||||
"Reflexively": "reflexively.wav",
|
||||
"Reinys": "reinys.wav",
|
||||
"Retort": "retort.wav",
|
||||
"Roc": "roc.wav",
|
||||
"Rockport": "rockport.wav",
|
||||
"Rolands": "rolands.wav",
|
||||
"Rolden": "rolden.wav",
|
||||
"Rooks": "rooks.wav",
|
||||
"Roth": "roth.wav",
|
||||
"Rothsholm": "rothsholm.wav",
|
||||
"Rouge": "rouge.wav",
|
||||
"Rustigar": "rustigar.wav",
|
||||
"Sarnel": "sarnel.wav",
|
||||
"Satyrsdai": "satyrsdai.wav",
|
||||
"Scaly": "scaly.wav",
|
||||
"Scepter": "scepter.wav",
|
||||
"Seagull": "seagull.wav",
|
||||
"Sedition": "sedition.wav",
|
||||
"Seeker": "seeker.wav",
|
||||
"Sehlaba": "sehlaba.wav",
|
||||
"Seker": "seker.wav",
|
||||
"Seker-Ankh": "seker_ankh.wav",
|
||||
"Selna": "selna.wav",
|
||||
"Senica": "senica.wav",
|
||||
"Sentinel": "sentinel.wav",
|
||||
"Septuigen": "septuigen.wav",
|
||||
"Sergeant-Major": "sergeant_major.wav",
|
||||
"Serk": "serk.wav",
|
||||
"Sgt": "sgt.wav",
|
||||
"Shadeem": "shadeem.wav",
|
||||
"Shae": "shae.wav",
|
||||
"Shal": "shal.wav",
|
||||
"Shalahz": "shalahz.wav",
|
||||
"Shalaz": "shalaz.wav",
|
||||
"Shalazah": "shalazah.wav",
|
||||
"Shambhu": "shambhu.wav",
|
||||
"Shambu": "shambu.wav",
|
||||
"Shanay": "shanay.wav",
|
||||
"Shatterdawn": "shatterdawn.wav",
|
||||
"Shdeem": "shdeem.wav",
|
||||
"Shelna": "shelna.wav",
|
||||
"Shen": "shen.wav",
|
||||
"Shrouded": "shrouded.wav",
|
||||
"Shyrra": "shyrra.wav",
|
||||
"Sigil": "sigil.wav",
|
||||
"Silverbane": "silverbane.wav",
|
||||
"Silvernote": "silvernote.wav",
|
||||
"Silvervein": "silvervein.wav",
|
||||
"Silverwind": "silverwind.wav",
|
||||
"Sirjif": "sirjif.wav",
|
||||
"Sis": "sis.wav",
|
||||
"Skeptically": "skeptically.wav",
|
||||
"Slagg": "slagg.wav",
|
||||
"Slaver": "slaver.wav",
|
||||
"Slavers": "slavers.wav",
|
||||
"Slick": "slick.wav",
|
||||
"Solstice": "solstice.wav",
|
||||
"Soren": "soren.wav",
|
||||
"Sorrow": "sorrow.wav",
|
||||
"Sosa": "sosa.wav",
|
||||
"Soulseeker": "soulseeker.wav",
|
||||
"Soulsinger": "soulsinger.wav",
|
||||
"Sparks": "sparks.wav",
|
||||
"Spellbooks": "spellbooks.wav",
|
||||
"Spikehorn": "spikehorn.wav",
|
||||
"Stairwell": "stairwell.wav",
|
||||
"Stalker": "stalker.wav",
|
||||
"Stealthy": "stealthy.wav",
|
||||
"Steelaxe": "steelaxe.wav",
|
||||
"Steelclaw": "steelclaw.wav",
|
||||
"Steelhorn": "steelhorn.wav",
|
||||
"Steward": "steward.wav",
|
||||
"Stiletto": "stiletto.wav",
|
||||
"Stonefirger": "stonefirger.wav",
|
||||
"Stoneforger": "stoneforger.wav",
|
||||
"Stonehelm": "stonehelm.wav",
|
||||
"Stonehold": "stonehold.wav",
|
||||
"Stoner": "stoner.wav",
|
||||
"Sunder": "sunder.wav",
|
||||
"Surly": "surly.wav",
|
||||
"Swung": "swung.wav",
|
||||
"Symphonic": "symphonic.wav",
|
||||
"Ta-Lar": "ta_lar.wav",
|
||||
"Taeriel": "taeriel.wav",
|
||||
"Tailor": "tailor.wav",
|
||||
"Talaer": "talaer.wav",
|
||||
"Tallspear": "tallspear.wav",
|
||||
"Targoth": "targoth.wav",
|
||||
"Tarnen": "tarnen.wav",
|
||||
"Tathan": "tathan.wav",
|
||||
"Tavern": "tavern.wav",
|
||||
"Tellin": "tellin.wav",
  "Thane": "thane.wav",
  "Thanes": "thanes.wav",
  "Theocratic": "theocratic.wav",
  "Therak": "therak.wav",
  "Therondil": "therondil.wav",
  "Thorn": "thorn.wav",
  "Thranis": "thranis.wav",
  "Throgg": "throgg.wav",
  "Thunderstrike": "thunderstrike.wav",
  "Tien": "tien.wav",
  "Tillborne": "tillborne.wav",
  "Tinbreaker": "tinbreaker.wav",
  "Tome": "tome.wav",
  "Torak": "torak.wav",
  "Toren": "toren.wav",
  "Torgath": "torgath.wav",
  "Torgoth": "torgoth.wav",
  "Traitor": "traitor.wav",
  "Triesse": "triesse.wav",
  "Tumark": "tumark.wav",
  "Tumbler": "tumbler.wav",
  "Turcan": "turcan.wav",
  "Turog": "turog.wav",
  "Twinsdai": "twinsdai.wav",
  "Twyleen": "twyleen.wav",
  "Tyrant": "tyrant.wav",
  "Udda": "udda.wav",
  "Uhrn": "uhrn.wav",
  "Ulagra": "ulagra.wav",
  "Ulrik": "ulrik.wav",
  "Umbrin": "umbrin.wav",
  "Umfray": "umfray.wav",
  "Undwin": "undwin.wav",
  "Unison": "unison.wav",
  "Urhn": "urhn.wav",
  "Uryna": "uryna.wav",
  "Uust": "uust.wav",
  "Vagrant": "vagrant.wav",
  "Valdarin": "valdarin.wav",
  "Valeth": "valeth.wav",
  "Valindar": "valindar.wav",
  "Valinor": "valinor.wav",
  "Valis": "valis.wav",
  "Vanessa": "vanessa.wav",
  "Varann": "varann.wav",
  "Varsis": "varsis.wav",
  "Varu": "varu.wav",
  "Vedra": "vedra.wav",
  "Velicia": "velicia.wav",
  "Velvet": "velvet.wav",
  "Vendar": "vendar.wav",
  "Venessa": "venessa.wav",
  "Vengeance": "vengeance.wav",
  "Vermin": "vermin.wav",
  "Verness": "verness.wav",
  "Verr": "verr.wav",
  "Verr-": "verr.wav",
  "Verr-Asses": "verr_asses.wav",
  "Veya": "veya.wav",
  "Viscount": "viscount.wav",
  "Vizier": "vizier.wav",
  "Vlainor": "vlainor.wav",
  "Volan": "volan.wav",
  "Volstan": "volstan.wav",
  "Vorann": "vorann.wav",
  "Vorgak": "vorgak.wav",
  "Vorum": "vorum.wav",
  "Vuhnalya": "vuhnalya.wav",
  "Vyn": "vyn.wav",
  "Wallbreaker": "wallbreaker.wav",
  "Wanton": "wanton.wav",
  "Warfrost": "warfrost.wav",
  "Wargog": "wargog.wav",
  "Warstar": "warstar.wav",
  "Warthog": "warthog.wav",
  "Weaving": "weaving.wav",
  "Weee": "weee.wav",
  "Wettstein": "wettstein.wav",
  "Wh": "wh.wav",
  "Wha": "wha.wav",
  "Whatchya": "whatchya.wav",
  "Wheni": "wheni.wav",
  "Whitehand": "whitehand.wav",
  "Whoah": "whoah.wav",
  "Williamsburg": "williamsburg.wav",
  "Willowbrook": "willowbrook.wav",
  "Windrift": "windrift.wav",
  "Windsdai": "windsdai.wav",
  "Witchwyrd": "witchwyrd.wav",
  "Witchwyrds": "witchwyrds.wav",
  "Wolfclaw": "wolfclaw.wav",
  "Woodlan": "woodlan.wav",
  "Woodland": "woodland.wav",
  "Wooo": "wooo.wav",
  "Worlder": "worlder.wav",
  "Wrath": "wrath.wav",
  "Wuzy": "wuzy.wav",
  "Wynshorn": "wynshorn.wav",
  "Wyren": "wyren.wav",
  "Yahnig": "yahnig.wav",
  "Yan": "yan.wav",
  "Yar": "yar.wav",
  "Yer": "yer.wav",
  "Yolan": "yolan.wav",
  "Yoos": "yoos.wav",
  "Yurik": "yurik.wav",
  "Zalrek": "zalrek.wav",
  "Zeb": "zeb.wav",
  "Zelph": "zelph.wav",
  "Zha": "zha.wav",
  "Zhong": "zhong.wav",
  "Zhong-Goo": "zhong_goo.wav",
  "Zinger": "zinger.wav",
  "Zirak": "zirak.wav",
  "Zurn": "zurn.wav",
  "Zyzaren": "zyzaren.wav",
  "Zyzarn": "zyzarn.wav",
  "Zyzren": "zyzren.wav"
}
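For context, a per-book `manifest.json` like the one above maps each extracted proper noun to the `.wav` file holding its generated pronunciation (filenames are lowercased, with hyphens mapped to underscores). A minimal sketch of how such a manifest could be consumed, assuming Python; the function name and directory here are illustrative, not the project's actual API:

```python
import json

# Tiny excerpt of a hypothetical manifest (keys: proper nouns, values: wav files)
manifest = json.loads('{"Thane": "thane.wav", "Zhong-Goo": "zhong_goo.wav"}')

def audio_for(noun: str, audio_dir: str = "proper_nouns_audio/mybook"):
    """Return the audio path for a proper noun, or None if not yet generated."""
    filename = manifest.get(noun)
    return f"{audio_dir}/{filename}" if filename else None

print(audio_for("Zhong-Goo"))  # proper_nouns_audio/mybook/zhong_goo.wav
print(audio_for("Unknown"))    # None
```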
@ -0,0 +1,20 @@
{
  "Anhuil-Elhar": "An-WHEEL AY-Lar",
  "Anhuil-Ehlar": "An-WHEEL AY-Lar",
  "Aegrir": "Ay-Greer",
  "Baras": "BARE-iss",
  "Emaleen": "EMMA-lean",
  "Eushownava": "You-SHOWN-Eh-Vah",
  "Graffel": "Gra-FELL",
  "Greathaven": "GREAT-Haven",
  "Jaylan": "JAY-Lin",
  "Neverthoughtidseeyouprancingaroundwithabunchofelfgirls": "Never thought I'd see you prancing around with a bunch of elf girls",
  "Nijel": "NYE-jell",
  "Noivebeenhereandtherelookingformykinrumoredtodwellhereinthisforest": "No I've been here and there looking for my kin rumored to dwell here in this forest",
  "Odoiran": "Oh-DORIAN",
  "Ody": "Oh-Dee",
  "Seker-Ankh": "Seker-Ahnk",
  "Rasheer": "Raw-SHEAR",
  "Valinor": "Vala-nor",
  "Varsis": "Ver-Asis"
}
File diff suppressed because it is too large
File diff suppressed because it is too large
@ -1,28 +1,35 @@
{
  "Gadianton Robbers": "Gadeeantun Robbers",
  "Gadianton": "Gadeeantun",
  "Coriantumr": "Coryantomer",
  "Laman": "Layman",
  "Lehi And Nephi": "Leehi And Nephi",
  "Lehi": "Leehi",
  "Lehi Mathonihah": "Leehi Mathonihah",
  "Lehis": "Leehis",
  "Lehies": "Leehis",
  "Liahona": "Leeahona",
  "Alma": "Al-ma",
  "Gadiantons": "Gadeeantuns",
  "Laban": "Layban",
  "Mosiah": "Moziah",
  "Mosiah The King": "Moziah The King",
  "Nehors": "Kneehores",
  "Samuel The Lamanite": "Samuel The Laymanite",
  "Tarry": "Tarery",
  "The Lamanite Twins": "The Laymanite Twins",
  "The Lamanites Of Ammon": "The Laymanites Of Ammon",
  "The Lamanites Of The Land Of Zarahemla": "The Laymanites Of The Land Of Zarahemla",
  "The Lamanites Of The Land Southward": "The Laymanites Of The Land Southward",
  "The Lamanites Of The People Of Ammon": "The Laymanites Of The People Of Ammon",
  "The Lamb'S Book Of Life": "The Lamb's Book Of Life",
  "The Land Of Nephi": "The Land Of Kneefi",
  "Nephites": "Kneefites",
  "Anti-Nephi-Lehies": "Anti-Kneef-eye-Leehis",
  "Lamanite": "Laymanite",
  "Lamanites": "Laymanites",
  "Lamb'S": "Lamb's",
  "Sarai": "Sa-rye",
  "Telestial": "Tea-lestial",
  "Lord'S": "Lord's",
  "Helaman": "He-la-mun",
  "Nephihah": "Kneef-eyehah",
  "Nephihet": "Kneef-eyehet",
  "Nephite": "Kneefight",
  "Nephi-Im": "Kneef-eye-Im",
  "Zenephi": "Ze-kneef-eye",
  "Nephitish": "Kneefight-ish",
  "Moroni": "Moh-roh-nye",
  "Nephi": "Knee-fye",
  "Hagar": "Hag-ar",
  "Oug": "Ohg",
  "Ougan": "Ohgan"
}
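A fixes file like the one above maps a proper noun's canonical spelling to a phonetic respelling. A minimal sketch of how such fixes could be applied to chapter text before it is handed to the TTS engine; the `apply_fixes` function is an assumption about usage, not the project's actual code:

```python
import re

# Two real entries from the fixes file above, used as sample data
fixes = {
    "Nephi": "Knee-fye",
    "Lamanites": "Laymanites",
}

def apply_fixes(text: str, fixes: dict) -> str:
    # Longest keys first so longer names win over overlapping shorter ones,
    # and word boundaries so "Nephi" does not fire inside "Nephites".
    for noun in sorted(fixes, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(noun)}\b", fixes[noun], text)
    return text

print(apply_fixes("Nephi spoke to the Lamanites.", fixes))
# Knee-fye spoke to the Laymanites.
```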
30 output_proper_nouns/visions_glory_canada/manifest.json Normal file
@ -0,0 +1,30 @@
{
  "Adam": "adam.wav",
  "Adam-Ondi-Ahman": "adam_ondi_ahman.wav",
  "Ahman": "ahman.wav",
  "Alma": "alma.wav",
  "Apostles": "apostles.wav",
  "Brethren": "brethren.wav",
  "Cardston": "cardston.wav",
  "Ephraim": "ephraim.wav",
  "Evolving": "evolving.wav",
  "Holies": "holies.wav",
  "Israel": "israel.wav",
  "Joseph": "joseph.wav",
  "Knelt": "knelt.wav",
  "Lehi": "lehi.wav",
  "Liahona": "liahona.wav",
  "Millennium": "millennium.wav",
  "Mormon": "mormon.wav",
  "Moroni": "moroni.wav",
  "Mosiah": "mosiah.wav",
  "Nauvoo": "nauvoo.wav",
  "Quorum": "quorum.wav",
  "Rachael": "rachael.wav",
  "Savior": "savior.wav",
  "Thummim": "thummim.wav",
  "Urim": "urim.wav",
  "Vignette": "vignette.wav",
  "Zachary": "zachary.wav",
  "Zion": "zion.wav"
}
18 projects.json Normal file
@ -0,0 +1,18 @@
[
  {
    "name": "Audio Text for Novel Lightbringer",
    "source_paths": [
      "/home/dillon/_code/voice_model/Audio Text for Novel Lightbringer/Audio Text for Novel Lightbringer.txt"
    ],
    "proper_nouns_output_dir": "output_proper_nouns/audio_text_for_novel_lightbringer",
    "proper_nouns_audio_dir": "proper_nouns_audio/audio_text_for_novel_lightbringer"
  },
  {
    "name": "visions glory canada",
    "source_paths": [
      "/home/dillon/_code/voice_model/Visions of Glory_ Zion in Canada pg 162-193.txt"
    ],
    "proper_nouns_output_dir": "output_proper_nouns/visions_glory_canada",
    "proper_nouns_audio_dir": "proper_nouns_audio/visions_glory_canada"
  }
]
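Each entry in `projects.json` isolates one book's source text and output directories, which is what keeps multi-book support working. A hypothetical sketch of loading it, assuming Python; the loader function itself is an assumption, though the schema matches the file above:

```python
import json

# Inline copy of one projects.json entry (same schema as the file above)
projects_json = """
[
  {
    "name": "visions glory canada",
    "source_paths": ["/home/dillon/_code/voice_model/Visions of Glory_ Zion in Canada pg 162-193.txt"],
    "proper_nouns_output_dir": "output_proper_nouns/visions_glory_canada",
    "proper_nouns_audio_dir": "proper_nouns_audio/visions_glory_canada"
  }
]
"""

def load_projects(text: str) -> dict:
    """Index projects by name so a GUI can look one up directly."""
    return {p["name"]: p for p in json.loads(text)}

projects = load_projects(projects_json)
print(projects["visions glory canada"]["proper_nouns_audio_dir"])
# proper_nouns_audio/visions_glory_canada
```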
1345 proper_nouns.txt
File diff suppressed because it is too large
42 run_audiobook.bat Normal file
@ -0,0 +1,42 @@
@echo off
setlocal EnableDelayedExpansion
title Create Audiobook

:: Change to the folder this .bat file lives in
cd /d "%~dp0"

:: Check setup has been run
if not exist .venv\Scripts\python.exe (
    echo ERROR: Setup has not been run yet.
    echo Please double-click setup_windows.bat first.
    pause
    exit /b 1
)

echo ============================================================
echo   Audiobook Creator
echo ============================================================
echo.
echo Options:
echo   1 - Generate ALL chapters (may take many hours)
echo   2 - List detected chapters only
echo   3 - Generate a short PREVIEW of each chapter
echo   4 - Generate specific chapters (enter numbers next)
echo.
set /p CHOICE="Enter choice (1/2/3/4): "

if "%CHOICE%"=="1" (
    .venv\Scripts\python create_audiobook_lightbringer.py
) else if "%CHOICE%"=="2" (
    .venv\Scripts\python create_audiobook_lightbringer.py --list
) else if "%CHOICE%"=="3" (
    .venv\Scripts\python create_audiobook_lightbringer.py --preview
) else if "%CHOICE%"=="4" (
    set /p CHAPTERS="Enter chapter numbers separated by spaces (e.g. 0 1 2): "
    rem Delayed expansion so the value just read is used; %CHAPTERS% would
    rem expand at parse time, before the prompt runs, and be empty or stale.
    .venv\Scripts\python create_audiobook_lightbringer.py !CHAPTERS!
) else (
    echo Invalid choice.
)

echo.
echo Done. Output files are in the output_audiobook_lightbringer folder.
pause
21 run_gui.bat Normal file
@ -0,0 +1,21 @@
@echo off
title Proper Noun GUI

:: Change to the folder this .bat file lives in
cd /d "%~dp0"

:: Check setup has been run
if not exist .venv\Scripts\python.exe (
    echo ERROR: Setup has not been run yet.
    echo Please double-click setup_windows.bat first.
    pause
    exit /b 1
)

echo Starting Proper Noun Player GUI...
.venv\Scripts\python gui_proper_noun_player.py
if errorlevel 1 (
    echo.
    echo The application closed with an error. See message above.
    pause
)
93 setup_windows.bat Normal file
@ -0,0 +1,93 @@
@echo off
setlocal EnableDelayedExpansion
title Audiobook Setup

echo ============================================================
echo   Audiobook Setup for Windows 11
echo ============================================================
echo.

:: ── 1. Check Python ──────────────────────────────────────────────────────────
echo [1/5] Checking Python installation...
python --version >nul 2>&1
if errorlevel 1 (
    echo.
    echo ERROR: Python was not found.
    echo.
    echo Please install Python 3.12 from https://www.python.org/downloads/
    echo IMPORTANT: On the installer, tick "Add Python to PATH" before clicking Install.
    echo.
    echo After installing, close this window and double-click setup_windows.bat again.
    pause
    exit /b 1
)

for /f "tokens=2 delims= " %%v in ('python --version 2^>^&1') do set PY_VER=%%v
echo Found Python %PY_VER%
echo.

:: ── 2. Create virtual environment ────────────────────────────────────────────
echo [2/5] Creating virtual environment (.venv)...
if exist .venv (
    echo .venv already exists, skipping creation.
) else (
    python -m venv .venv
    if errorlevel 1 (
        echo ERROR: Failed to create virtual environment.
        pause
        exit /b 1
    )
    echo Virtual environment created.
)
echo.

:: ── 3. Install PyTorch with CUDA (for gaming GPU) ────────────────────────────
echo [3/5] Installing PyTorch with CUDA 12.4 support (this may take a while)...
echo Downloading ~2.5 GB — please be patient.
echo.
.venv\Scripts\pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
if errorlevel 1 (
    echo.
    echo WARNING: CUDA build failed. Falling back to CPU-only PyTorch.
    echo Audio generation will be slower but will still work.
    .venv\Scripts\pip install torch
)
echo.

:: ── 4. Install remaining packages ────────────────────────────────────────────
echo [4/5] Installing remaining packages (kokoro, soundfile, sounddevice, spacy, wordfreq)...
.venv\Scripts\pip install -r requirements.txt
if errorlevel 1 (
    echo ERROR: Package installation failed. Check your internet connection.
    pause
    exit /b 1
)

echo Downloading spaCy English language model (en_core_web_sm, ~15 MB)...
.venv\Scripts\python -m spacy download en_core_web_sm
if errorlevel 1 (
    echo WARNING: spaCy model download failed. Proper noun extraction will not work
    echo until you re-run: .venv\Scripts\python -m spacy download en_core_web_sm
)
echo.

:: ── 5. Download the Kokoro TTS model ─────────────────────────────────────────
echo [5/5] Downloading the Kokoro TTS model (hexgrad/Kokoro-82M, ~330 MB)...
echo This only happens once.
echo.
.venv\Scripts\python -c "from kokoro import KPipeline; KPipeline(lang_code='a', repo_id='hexgrad/Kokoro-82M'); print('Model ready.')"
if errorlevel 1 (
    echo.
    echo WARNING: Model download failed. It will retry the first time you run the app.
    echo Make sure you have an internet connection on first launch.
)

echo.
echo ============================================================
echo   Setup complete!
echo.
echo   To launch the GUI: double-click run_gui.bat
echo   To create the audiobook: double-click run_audiobook.bat
echo ============================================================
echo.
pause