What is this

Tools for manipulating PPP voice and text data, plus an attempt to build combined synthetic training data for pony-related LLMs. Made by a codelet.

ppp2.py

New and only slightly improved version of ppp.py. Example:

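from ppp2 import PPPDataset2, FolderSpec, ExportSpec

# Build a dataset from the master file folder, restricted to one character.
dataset = PPPDataset2.from_file(
    folder_specs = [FolderSpec(
        path = r'D:\MLP_Samples\AIData\Master file'
    )],
    characters=['Twilight'])

# Export the audio plus a file list in the layout used here for GPT-SoVITS.
dataset.export(
    specs = [
        ExportSpec(
            export_path = r'D:\Code\GPT-SoVITS\Twilight_data',
            list_path = r'D:\Code\GPT-SoVITS\filelist_Twilight.list'
        )
    ],
    # Name of each exported audio file.
    filename_formatter = lambda parse: f'{parse.process_idx}.wav',
    # One line of the file list: path|character|language|transcript.
    fileline_formatter = lambda parse: f'{parse.out_path}|{parse.char}|en|{parse.line}'
)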
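To make the two formatter callbacks concrete, here is a minimal sketch of what they return for a single record. `FakeParse` is a hypothetical stand-in that carries only the four attributes the example touches (`process_idx`, `out_path`, `char`, `line`); the real object passed by ppp2.py may hold more, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class FakeParse:
    """Hypothetical stand-in for the record ppp2.py hands to the formatters."""
    process_idx: int
    out_path: str
    char: str
    line: str


# The same lambdas as in the example above.
filename_formatter = lambda parse: f'{parse.process_idx}.wav'
fileline_formatter = lambda parse: f'{parse.out_path}|{parse.char}|en|{parse.line}'

# Made-up sample record, for illustration only.
sample = FakeParse(
    process_idx=0,
    out_path='Twilight_data/0.wav',
    char='Twilight',
    line='Hello, Spike.',
)

filename = filename_formatter(sample)
fileline = fileline_formatter(sample)
print(filename)  # 0.wav
print(fileline)  # Twilight_data/0.wav|Twilight|en|Hello, Spike.
```

Swapping in different lambdas is how the same export call can target another trainer's file list layout.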

ppp.py

Point it at your Sliced Dialogue directory. Can be used to reformat datasets for training voice models (example is for PITS). Also has an updated version of horsewords.clean for ARPAbet substitutions.
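As an illustration of what an ARPAbet substitution can look like, here is a minimal sketch: words found in a pronunciation dictionary are replaced by their phoneme string in curly braces. The dictionary line format (`WORD  PH1 PH2 ...`, as in cmudict), the curly-brace convention, the sample pronunciations, and the helper names `load_dict` and `substitute` are all assumptions made for this example; they are not taken from horsewords.clean or ppp.py.

```python
import re

# Assumed cmudict-style entries. The pronunciations are illustrative only.
SAMPLE_DICT = """\
EQUESTRIA  IH0 K W EH1 S T R IY0 AH0
CELESTIA  S AH0 L EH1 S T IY0 AH0
"""


def load_dict(text):
    """Parse 'WORD  PHONEMES' lines into a {WORD: 'PHONEMES'} mapping."""
    entries = {}
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw or raw.startswith(';;;'):  # skip blanks and cmudict comments
            continue
        word, phonemes = raw.split(maxsplit=1)
        entries[word.upper()] = phonemes
    return entries


def substitute(sentence, entries):
    """Replace every known word with its phonemes wrapped in curly braces."""
    def repl(match):
        word = match.group(0)
        phonemes = entries.get(word.upper())
        return '{' + phonemes + '}' if phonemes else word
    return re.sub(r"[A-Za-z']+", repl, sentence)


entries = load_dict(SAMPLE_DICT)
result = substitute('Welcome to Equestria!', entries)
print(result)  # Welcome to {IH0 K W EH1 S T R IY0 AH0}!
```

Words missing from the dictionary pass through unchanged, so the substitution can be applied to whole transcripts.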

FiMFiction Tools

Various tools for trimming/sampling the fimficOmegaV3 dataset.
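One generic way to sample from a dataset too large to hold in memory is reservoir sampling, sketched below. This is not necessarily how fimfic_sampler.py works; the function name `reservoir_sample` and the stand-in records are assumptions made for illustration, and the on-disk format of fimficOmegaV3 is not assumed at all.

```python
import random


def reservoir_sample(records, k, seed=0):
    """Return k records chosen uniformly from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample


# Stand-in for a stream of stories read one at a time.
stories = (f'story {n}' for n in range(1000))
picked = reservoir_sample(stories, k=5)
print(picked)
```

Because the input is consumed one record at a time, the same function works on a file iterator without loading the whole dataset.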

fimfarchive.py

A tool for accessing and manipulating a downloaded copy of the fimfarchive archives.

Text training data

Tools for scraping sources like the FiMFiction wiki for episode summaries, episode titles/transcripts, and Wikipedia episode synopses, as well as experiments in using LLMs via the oobabooga API to create synthetic training data.
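A minimal sketch of calling a local LLM to generate synthetic text from a scraped summary. It assumes the OpenAI-compatible chat endpoint that oobabooga's text-generation-webui exposes when started with `--api`; the URL, the prompt wording, the sampling parameters, and the helper names `build_request` and `generate` are assumptions, and the scripts in this repo may talk to the API differently.

```python
import json
import urllib.request

# Assumed local endpoint. Adjust host and port to your own setup.
API_URL = 'http://127.0.0.1:5000/v1/chat/completions'


def build_request(summary, url=API_URL):
    """Build a POST request asking the model to write a scene from a summary."""
    payload = {
        'messages': [
            {'role': 'system', 'content': 'You write short scenes based on episode summaries.'},
            {'role': 'user', 'content': f'Write a short scene based on this summary:\n{summary}'},
        ],
        'max_tokens': 200,   # illustrative values, not tuned
        'temperature': 0.7,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def generate(summary):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(summary)) as resp:
        body = json.load(resp)
    return body['choices'][0]['message']['content']


# Building the request does not touch the network.
req = build_request('Twilight moves to Ponyville.')
print(req.full_url)
```

Keeping request construction separate from sending makes the prompt easy to inspect before spending generation time on it.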

Folders and files

pics
tier0
tier1
PPPData_1000s.png
PPPData_1000s_table.png
PPPData_100s.png
PPPData_100s_table.png
PPPData_10s.png
PPPData_10s_table.png
PPPData_1s.png
PPPData_1s_table.png
PPPData_dur_10000s.png
PPPData_dur_1000s.png
PPPData_dur_100s.png
PPPData_dur_10s.png
PPPData_dur_1s.png
README.md
autotrainer_rvc.py
autotrainer_svc5.py
characters_tally.ipynb
cmudict-0.7b.txt
data_realigner.ipynb
episodes_demucs.ipynb
episodes_labels_index.json
extras_labels_index.json
fim_movie_auto.txt
fim_rainbow_roadtrip_auto.txt
fimfarchive.py
fimfic_ranker.py
fimfic_sampler.py
fimfic_trimmer.py
fimfic_trimmer2.py
g2p_utils.py
horsewords.clean
horsewords_llm.ipynb
horsewords_raw_output.json
horsewords_raw_output_thinking.json
lewd.py
match3.txt
new_horsewords.clean
ppp.py
ppp2.ipynb
ppp2.py
prettyprinter.py
requirements.txt
sfx.py
util.py

effusiveperiscope/PPPDataset

Folders and files

Latest commit

History

Repository files navigation

What is this

ppp2.py

ppp.py

FiMFiction Tools

fimfarchive.py

Text training data

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages