Convert english phrases into phonetic japanese kana approximations; also known as Englishru.

Luigi-Pizzolito/English2KanaTransliteration

English2KanaTransliteration

Convert English phrases into phonetic Japanese kana approximations; also known as Englishru. This module does not translate English into Japanese; it transliterates English words into their approximate pronunciations in Japanese kana.

Based on the English to Katakana transcription code written in Python by Yoko Harada (@yokolet). Please see that repo for details on the phonetic conversion.

English to phoneme conversion is based on CMUDict. Kanji to Katakana conversion is based on KANJIDIC2. Thanks to JMDict and kana. Please refer to their licenses for non-free implementations.

This module is a Go port with some additional functions:

  • Filtering functions to split, parse, and rejoin sentences which contain punctuation or improper contractions.
  • Also accepts Japanese input: converts any Kanji characters into their most common Hiragana pronunciation, converts Hiragana into Katakana, and leaves Katakana as is.
  • Also accepts Romaji input.
  • Strict input cleaning mode for use with TTS engines that do not understand punctuation and other special characters. See the note on Japanese-only TTS below.

Usage Example

Below is an example Go file to test this module. It reads lines from stdin, converts each one into its Japanese transliteration, and prints the result to stdout.

package main

import (
	"bufio"
	"fmt"
	"os"

	// The package is named kanatrans, so alias the import explicitly
	kanatrans "github.com/Luigi-Pizzolito/English2KanaTransliteration"
)

func main() {
	// Create an instance of AllToKana
	allToKana := kanatrans.NewAllToKana()

	// Read stdin line by line until EOF or a read error
	reader := bufio.NewReader(os.Stdin)
	for {
		line, err := reader.ReadString('\n')
		if err != nil {
			break // Exit loop on EOF or error
		}

		// Convert the line that was just read
		result := allToKana.Convert(line)

		// Output the result
		fmt.Println(result)
	}
}

Sample Output:

❯ go run .
Hello there.
ヘロー ゼアー。
With this program, you can make Japanese text to speech speak in English!
ウィズ ジス プローラ、 ユー キャン メイク ジャーンイーズ テックスト ツー スピーチ スピーク イン イングシュ!
watashi wa miku desu~
ワタシ ワ ミク デス〜
Hello! こんにちは~ ヘロー, miki松原。
ヘロー! コンニチハ〜 ヘロー、 ミキショウゲン。

Using individual modules

All2Katakana

// Create an instance of AllToKana
allToKana := kanatrans.NewAllToKana()
// Usage
kana := allToKana.Convert("Hello! watashiwa 初音ミク.")
// -> ヘロー! ワタシワ ショオンミク。

Eng2Katakana

// Create an instance of EngToKana
engToKana := kanatrans.NewEngToKana()
// Usage
kana := engToKana.TranscriptSentence("Hello World!")
// -> ヘローワールド

Kanji2Katakana

// Create an instance of KanjiToKana
kanjiToKana := kanatrans.NewKanjiToKana()
// Usage
kana := kanjiToKana.Convert("初音")
// -> ショオン

This needs some work: it takes the most common pronunciation of each Kanji instead of the correct one for the context. Pull requests are welcome!
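
One possible direction for the context problem is to try a longest-match lookup in a table of compound words before falling back to per-character readings. The sketch below is illustrative only and is not how kanatrans works: `wordReadings`, `charReadings` and `readKanji` are hypothetical names, and the two tiny tables are toy data standing in for a real dictionary.

```go
package main

import (
	"fmt"
	"strings"
)

// Toy tables for illustration; a real implementation would load these
// from a dictionary. All names here are hypothetical.
var wordReadings = map[string]string{"初音": "ハツネ"}
var charReadings = map[rune]string{'初': "ショ", '音': "オン"}

// readKanji prefers the longest known compound starting at each position
// and falls back to the single-character reading otherwise.
func readKanji(s string) string {
	runes := []rune(s)
	var out strings.Builder
	for i := 0; i < len(runes); {
		matched := false
		// Try multi-character candidates, longest first.
		for j := len(runes); j > i+1; j-- {
			if kana, ok := wordReadings[string(runes[i:j])]; ok {
				out.WriteString(kana)
				i = j
				matched = true
				break
			}
		}
		if matched {
			continue
		}
		if kana, ok := charReadings[runes[i]]; ok {
			out.WriteString(kana)
		} else {
			out.WriteRune(runes[i]) // Unknown character: pass through
		}
		i++
	}
	return out.String()
}

func main() {
	fmt.Println(readKanji("初音")) // ハツネ (compound match)
	fmt.Println(readKanji("音初")) // オンショ (per-character fallback)
}
```

A compound table only helps for words it contains; names and rare readings would still need a larger dictionary or a morphological analyser.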

Hiragana2Katakana

// Create an instance of HiraganaToKana
hiraganaToKana := kanatrans.NewHiraganaToKana()
// Usage
kana := hiraganaToKana.Convert("こんにちは")
// -> コンニチハ
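
For readers curious how such a conversion can work, the sketch below relies on the fact that the Unicode Hiragana and Katakana blocks are laid out in parallel, 0x60 code points apart. It is a standalone illustration, not the library's implementation; `hiraToKata` is a hypothetical helper name.

```go
package main

import "fmt"

// hiraToKata is a hypothetical helper, not part of kanatrans.
// Hiragana ぁ (U+3041) through ゖ (U+3096) map onto katakana
// ァ (U+30A1) through ヶ (U+30F6) at a fixed offset of 0x60.
func hiraToKata(s string) string {
	out := make([]rune, 0, len(s))
	for _, r := range s {
		if r >= 0x3041 && r <= 0x3096 {
			r += 0x60
		}
		out = append(out, r) // Everything else is left unchanged
	}
	return string(out)
}

func main() {
	fmt.Println(hiraToKata("こんにちは")) // コンニチハ
	fmt.Println(hiraToKata("ミクです"))  // ミクデス
}
```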

Romaji2Katakana

// Create an instance of RomajiToKana
romajiToKana := kanatrans.NewRomajiToKana()
// Usage
kana := romajiToKana.Convert("kita kita desu")
// -> キタ キタ デス

ConvertPunctuation

// Usage
japanesePunctuation := kanatrans.ConvertToJapanesePunctuation("Hello, World!")
// -> Hello、 World!
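
The mapping seen in the outputs above (comma to 、, full stop to 。, tilde to 〜, other characters untouched) can be approximated with the standard library alone. This is a minimal sketch under that assumption; `toJapanesePunct` is a hypothetical name and the real function may handle more cases.

```go
package main

import (
	"fmt"
	"strings"
)

// punctReplacer holds an assumed mapping inferred from the sample output;
// it is not copied from the kanatrans source.
var punctReplacer = strings.NewReplacer(
	",", "、",
	".", "。",
	"~", "〜",
)

// toJapanesePunct is a hypothetical stand-in for illustration.
func toJapanesePunct(s string) string {
	return punctReplacer.Replace(s)
}

func main() {
	fmt.Println(toJapanesePunct("Hello, World!")) // Hello、 World!
}
```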

Note for using with Japanese-only text-to-speech (TTS)

This module is intended to let TTS engines that only support Japanese (such as AquesTalk, Softalk, etc.) speak English. These engines usually restrict which punctuation may appear in the input: only commas and full stops are interpreted as pauses, and any other punctuation causes an error.

To use this module for such TTS input, you may enable strict input cleaning mode (only the Japanese comma and full stop appear in the output) by passing a bool to the constructors of EngToKana, RomajiToKana and AllToKana:

// Create an instance of AllToKana with strict punctuation output
allToKana := kanatrans.NewAllToKana(true)
// Create an instance of EngToKana with strict punctuation output
engToKana := kanatrans.NewEngToKana(true)
// Create an instance of RomajiToKana with strict punctuation output
romajiToKana := kanatrans.NewRomajiToKana(true)

You may also use the function kanatrans.ConvertToJapanesePunctuationRestricted instead of kanatrans.ConvertToJapanesePunctuation.
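
To make the idea of strict cleaning concrete, here is a standalone sketch of one possible policy: commas and full stops become 、 and 。, and every other punctuation mark or symbol is dropped. The policy and the name `cleanStrict` are assumptions for illustration; the library's restricted mode may treat some characters differently.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// cleanStrict is a hypothetical helper, not part of kanatrans.
func cleanStrict(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r == ',' || r == '、':
			b.WriteRune('、')
		case r == '.' || r == '。':
			b.WriteRune('。')
		case unicode.IsPunct(r) || unicode.IsSymbol(r):
			// Assumed policy: silently drop anything the TTS may reject
		default:
			b.WriteRune(r) // Letters, kana, digits and spaces pass through
		}
	}
	return b.String()
}

func main() {
	fmt.Println(cleanStrict("Hello, World!")) // Hello、 World
	fmt.Println(cleanStrict("ヘロー! ゼアー."))  // ヘロー ゼアー。
}
```

The long vowel mark ー is classified by Unicode as a letter, not punctuation, so it survives this filter.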

Custom callbacks to process Kanji, Kana, English & Punctuation

Internally, AllToKana uses a KanjiSplitter to pass each part of the input to a func(string) string callback; there is one callback each for Kanji, Kana, English and Punctuation:

// Create an instance of KanjiSplitter with processing callbacks
kanjiSplitter := kanatrans.NewKanjiSplitter(
	kanjiToKana.Convert,                    // Kanji callback
	hiraganaToKana.Convert,                 // Hiragana & Katakana callback
	engToKana.TranscriptSentence,           // English callback
	kanatrans.ConvertToJapanesePunctuation, // Punctuation callback
)

If required, you may use a KanjiSplitter with custom callback functions to provide different processing.
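
The dispatch idea can be illustrated without the library: split the input into runs of the same script and hand each run to the matching callback. The sketch below uses only the standard `unicode` range tables; `classify`, `splitAndApply` and `demoHandlers` are hypothetical names, and the real KanjiSplitter may segment text differently (for example, this sketch treats spaces as "other" and so splits English word by word).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

type kind int

const (
	kanji kind = iota
	kana
	latin
	other
)

// classify assigns a rune to one of four coarse categories.
func classify(r rune) kind {
	switch {
	case unicode.Is(unicode.Han, r):
		return kanji
	case unicode.Is(unicode.Hiragana, r), unicode.Is(unicode.Katakana, r), r == 'ー':
		return kana
	case unicode.Is(unicode.Latin, r):
		return latin
	}
	return other
}

// splitAndApply walks the input, groups consecutive runes of the same
// kind into runs, and passes each run to the handler for that kind.
func splitAndApply(s string, handlers [4]func(string) string) string {
	var out strings.Builder
	var run []rune
	cur := other
	flush := func() {
		if len(run) > 0 {
			out.WriteString(handlers[cur](string(run)))
			run = run[:0]
		}
	}
	for _, r := range s {
		if k := classify(r); k != cur {
			flush() // Emit the finished run before switching kind
			cur = k
		}
		run = append(run, r)
	}
	flush()
	return out.String()
}

// demoHandlers only mark up each run so the dispatch is visible.
var demoHandlers = [4]func(string) string{
	kanji: func(s string) string { return "<" + s + ">" },
	kana:  func(s string) string { return "(" + s + ")" },
	latin: strings.ToUpper,
	other: func(s string) string { return s },
}

func main() {
	fmt.Println(splitAndApply("Hello 初音ミク!", demoHandlers)) // HELLO <初音>(ミク)!
}
```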