Major changes in HotwordDetector in engine.py and added Mycroft wakeword
TheSeriousProgrammer committed Dec 31, 2021
1 parent f7a6867 commit 62162a6
Showing 14 changed files with 113 additions and 37 deletions.
49 changes: 32 additions & 17 deletions README.md
@@ -95,19 +95,18 @@ The pathname of the generated wakeword needs to passed to the HotwordDetector de
```python
HotwordDetector(
hotword="hello",
reference_file = "/full/path/name/of/hello_ref.json")
reference_file = "/full/path/name/of/hello_ref.json",
activation_count = 3 #2 by default
)
```

The library ships predefined embeddings for a few wakewords, such as **Google**, **Firefox**, **Alexa**, **Mobile**, and **Siri**, in its installation directory; the path is available in the following variable:
The library ships predefined embeddings for a few wakewords, such as **Mycroft**, **Google**, **Firefox**, **Alexa**, **Mobile**, and **Siri**, in its installation directory; the path is available in the following variable:

```python
from eff_word_net import samples_loc
```
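
The bundled reference files used in the examples appear to follow a `<hotword>_ref.json` naming pattern (for example `alexa_ref.json` and `mycroft_ref.json`). A minimal sketch of building such a path, assuming that pattern holds: `bundled_reference_path` is a hypothetical helper that is not part of the library, and `samples_dir` stands in for `samples_loc`.

```python
import os


def bundled_reference_path(samples_dir: str, hotword: str) -> str:
    """Hypothetical helper: build the path of a bundled reference file.

    Assumes the `<hotword>_ref.json` naming pattern seen in the examples.
    """
    return os.path.join(samples_dir, f"{hotword.lower()}_ref.json")


# In real use, `samples_dir` would be `eff_word_net.samples_loc`.
print(bundled_reference_path("sample_refs", "Mycroft"))
```

The resulting string is what would be passed as `reference_file` to `HotwordDetector`; the constructor asserts that the file exists, so a wrong guess at the name fails early.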

<br>


## Try your first single hotword detection script

```python
@@ -116,18 +115,19 @@ from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector
from eff_word_net import samples_loc

alexa_hw = HotwordDetector(
hotword="Alexa",
reference_file = os.path.join(samples_loc,"alexa_ref.json"),
mycroft_hw = HotwordDetector(
hotword="Mycroft",
reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
activation_count=3
)

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say Alexa ")
print("Say Mycroft ")
while True :
frame = mic_stream.getFrame()
result = alexa_hw.checkFrame(frame)
result = mycroft_hw.checkFrame(frame)
if(result):
print("Wakeword uttered")

@@ -145,6 +145,7 @@ of running `checkFrame()` of each wakeword individually
import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net import samples_loc
print(samples_loc)

alexa_hw = HotwordDetector(
hotword="Alexa",
@@ -153,31 +154,44 @@ alexa_hw = HotwordDetector(

siri_hw = HotwordDetector(
hotword="Siri",
reference_file = os.path.join(samples_loc,"siri_ref.json")
)
reference_file = os.path.join(samples_loc,"siri_ref.json"),
)

google_hw = HotwordDetector(
hotword="Google",
reference_file = os.path.join(samples_loc,"google_ref.json")
mycroft_hw = HotwordDetector(
hotword="mycroft",
reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
activation_count=3
)

multi_hw_engine = MultiHotwordDetector(
detector_collection = [alexa_hw,siri_hw,google_hw]
) # Efficient multi hotword detector
detector_collection = [
alexa_hw,
siri_hw,
mycroft_hw,
],
)

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say Google / Alexa / Siri")
print("Say Mycroft / Alexa / Siri")

while True :
frame = mic_stream.getFrame()
result = multi_hw_engine.findBestMatch(frame)
if(None not in result):
print(result[0],f",Confidence {result[1]:0.4f}")

```
<br>
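
Based on the `engine.py` change in this commit, `findBestMatch` in continuous mode seems to skip detectors that did not trigger on the current frame and then keep the highest-scoring one. A minimal standalone sketch of that selection step follows. `pick_best_match` and the `(name, score, triggered)` tuples are hypothetical stand-ins for the detector objects, and returning `(None, None)` when nothing triggers is an assumption chosen to match the `None not in result` check above.

```python
from typing import List, Optional, Tuple


def pick_best_match(
    candidates: List[Tuple[str, float, bool]],
) -> Tuple[Optional[str], Optional[float]]:
    """Return (name, score) of the highest-scoring triggered candidate.

    Each candidate is (name, score, triggered); `triggered` stands in for
    `detector.is_it_a_trigger()` after the frame has been scored.
    """
    best_name: Optional[str] = None
    best_score = 0.0
    for name, score, triggered in candidates:
        if not triggered:
            # Continuous mode: only detectors that fired on this frame count.
            continue
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None:
        return None, None  # assumed "no match" value
    return best_name, best_score


frame_results = [("Alexa", 0.93, False), ("Siri", 0.95, True), ("Mycroft", 0.97, True)]
print(pick_best_match(frame_results))
```

Note that a high score alone is not enough in this mode: `"Alexa"` above is ignored because it did not trigger, whatever its score.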

The library's documentation is available here: https://ant-brain.github.io/EfficientWord-Net/


## About `activation_count` in `HotwordDetector`
Documentation with a detailed explanation of the `activation_count` parameter in `HotwordDetector` is in the making. For now, a value of 3 is advisable for long hotwords and 2 for shorter ones. If the detector fires multiple triggers for a single utterance, try increasing `activation_count`. When experimenting, begin with smaller values. The default value is 2.
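
Until that documentation exists, the counting logic added to `getMatchScoreVector` in this commit can be re-created in isolation to see what `activation_count` does. The sketch below is a standalone approximation, not the library's class: `ActivationCounter` and `relaxation_steps` are illustrative names, with defaults taken from this commit (threshold 0.9, activation count 2, 4 relaxation steps).

```python
class ActivationCounter:
    """Standalone re-creation of the counting logic added to engine.py."""

    def __init__(self, threshold=0.9, activation_count=2, relaxation_steps=4):
        self.threshold = threshold
        self.activation_count = activation_count
        self.relaxation_steps = relaxation_steps
        self.repeat_count = 0

    def update(self, score: float) -> bool:
        """Feed one similarity score; return True when a trigger fires."""
        if self.repeat_count < 0:
            # Cooling down after a trigger: scores are ignored for a few windows.
            self.repeat_count += 1
        elif score > self.threshold:
            if self.repeat_count == self.activation_count - 1:
                # Enough above-threshold windows seen: fire and start the cooldown.
                self.repeat_count = -self.relaxation_steps
                return True
            self.repeat_count += 1
        elif self.repeat_count > 0:
            # A below-threshold window decays the running count.
            self.repeat_count -= 1
        return False


scores = [0.95, 0.96, 0.97, 0.95, 0.2]
for count in (2, 3):
    counter = ActivationCounter(activation_count=count)
    print(count, [counter.update(s) for s in scores])
```

With these scores, `activation_count=2` fires on the second window and `activation_count=3` on the third, after which the cooldown suppresses further triggers. This suggests why a larger value helps when one utterance produces several triggers.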


## FAQ :
* **Hotword performance is bad**: if you run into this issue, feel free to ask about it in [discussions](https://github.com/Ant-Brain/EfficientWord-Net/discussions/4)

@@ -189,6 +203,7 @@ Access documentation of the library from here : https://ant-brain.github.io/Effi

* Add audio file handler in streams. PR's are welcome.
* Remove librosa requirement to encourage generating reference files directly in edge devices
* Add more detailed documentation explaining the sliding window concept

## SUPPORT US:
Our hotword detector's performance is notably lower than Porcupine's. We have been thinking about better NN architectures for the engine and hope to outperform Porcupine. This has been our undergrad project, so your support and encouragement will motivate us to keep developing the engine. If you loved this project, recommend it to your peers, give us a 🌟 on GitHub and a clap 👏 on [Medium](https://link.medium.com/yMBmWGM03kb).
Binary file removed dist/EfficientWord-Net-0.0.1.tar.gz
Binary file not shown.
Binary file removed dist/EfficientWord_Net-0.0.1-py3-none-any.whl
Binary file not shown.
97 changes: 79 additions & 18 deletions eff_word_net/engine.py
@@ -14,7 +14,14 @@ class HotwordDetector :
EfficientWord based HotwordDetector Engine implementation class
"""

def __init__(self,hotword:str,reference_file:str,threshold:float=0.85):
def __init__(
self,
hotword:str,
reference_file:str,
threshold:float=0.9,
activation_count=2,
continuous=True,
verbose = False):
"""
Initializes hotword detector instance
@@ -28,6 +35,8 @@ def __init__(self,hotword:str,reference_file:str,threshold:float=0.85):
threshold: float value between 0 and 1 , min similarity score
required for a match
continuous: bool value, True if the HotwordDetector operates on a single continuous stream, else False
"""
assert isfile(reference_file), \
"Reference File Path Invalid"
@@ -43,10 +52,21 @@ def __init__(self,hotword:str,reference_file:str,threshold:float=0.85):

self.hotword = hotword
self.threshold = threshold
self.continuous = continuous

self.__repeat_count = 0
self.__activation_count = activation_count
self.verbose = verbose

self.__relaxation_time_step = 4 #number of cycles to prevent recall after a trigger
self.__is_it_a_trigger = False

def __repr__(self):
return f"Hotword: {self.hotword}"

def is_it_a_trigger(self):
return self.__is_it_a_trigger

def getMatchScoreVector(self,inp_vec:np.array) -> float :
"""
**Use this directly only if you know what you are doing**
@@ -71,8 +91,24 @@ def getMatchScoreVector(self,inp_vec:np.array) -> float :
for i in top3 :
out+= (1-out) * i

return out
#assert self.redundancy_count>0 , "redundancy_count count can only be greater than 0"

self.__is_it_a_trigger = False

if self.__repeat_count < 0 :
self.__repeat_count += 1

elif out > self.threshold :
if self.__repeat_count == self.__activation_count -1 :
self.__repeat_count = - self.__relaxation_time_step
self.__is_it_a_trigger = True
else:
self.__repeat_count +=1

elif self.__repeat_count > 0:
self.__repeat_count -= 1

return out

def checkVector(self,inp_vec:np.array) -> bool:
"""
@@ -85,7 +121,12 @@ def checkVector(self,inp_vec:np.array) -> bool:
assert inp_vec.shape == (1,128), \
"Inp vector should be of shape (1,128)"

return self.getMatchScoreVector(inp_vec) > self.threshold
score = self.getMatchScoreVector(inp_vec)

return self.is_it_a_trigger() if self.continuous else score >= self.threshold

def get_repeat_count(self)-> int :
return self.__repeat_count

def getMatchScoreFrame(
self,
@@ -110,6 +151,7 @@ def getMatchScoreFrame(
"""

"""
if(not unsafe):
upperPoint = max(
(
@@ -118,6 +160,7 @@ def getMatchScoreFrame(
)
if(upperPoint > 0.2):
return False
"""

assert inp_audio_frame.shape == (RATE,), \
f"Audio frame needs to be a 1 sec {RATE}Hz sampled vector"
@@ -126,7 +169,7 @@ def getMatchScoreFrame(
audioToVector(
inp_audio_frame
)
)
)


def checkFrame(self,inp_audio_frame:np.array,unsafe:bool = False) -> bool :
@@ -152,6 +195,7 @@ def checkFrame(self,inp_audio_frame:np.array,unsafe:bool = False) -> bool :
assert inp_audio_frame.shape == (RATE,), \
f"Audio frame needs to be a 1 sec {RATE}Hz sampled vector"

"""
if(not unsafe):
upperPoint = max(
(
@@ -160,8 +204,10 @@ def checkFrame(self,inp_audio_frame:np.array,unsafe:bool = False) -> bool :
)
if(upperPoint > 0.2):
return False
"""
score = self.getMatchScoreFrame(inp_audio_frame)

return self.getMatchScoreFrame(inp_audio_frame) > self.threshold
return self.is_it_a_trigger() if self.continuous else score >= self.threshold

HotwordDetectorArray = List[HotwordDetector]
MatchInfo = Tuple[HotwordDetector,float]
@@ -176,6 +222,7 @@ class MultiHotwordDetector :
def __init__(
self,
detector_collection:HotwordDetectorArray,
continuous=True
):
"""
Inp Parameters:
@@ -190,6 +237,7 @@ def __init__(
"Mixed Array received, send HotwordDetector only array"

self.detector_collection = detector_collection
self.continous = continuous

def findBestMatch(
self,
@@ -218,6 +266,7 @@ def findBestMatch(
assert inp_audio_frame.shape == (RATE,), \
f"Audio frame needs to be a 1 sec {RATE}Hz sampled vector"

"""
if(not unsafe):
upperPoint = max(
(
@@ -226,16 +275,21 @@ def findBestMatch(
)
if(upperPoint > 0.2):
return None , None

"""
embedding = audioToVector(inp_audio_frame)

best_match_detector:str = None
best_match_score:float = 0.0

for detector in self.detector_collection :
score = detector.getMatchScoreVector(embedding)
if(score<detector.threshold):
continue
if(self.continous):
if(not detector.is_it_a_trigger()):
continue
else:
if(score < detector.threshold):
continue

if(score>best_match_score):
best_match_score = score
best_match_detector = detector
@@ -282,7 +336,7 @@ def findAllMatches(
embedding = audioToVector(inp_audio_frame)

matches:MatchInfoArray = []

best_match_score = 0.0
for detector in self.detector_collection :
score = detector.getMatchScoreVector(embedding)
@@ -301,29 +355,36 @@ def findAllMatches(
from eff_word_net.streams import SimpleMicStream
from eff_word_net import samples_loc
print(samples_loc)

alexa_hw = HotwordDetector(
hotword="Alexa",
reference_file = os.path.join(samples_loc,"alexa_ref.json"),
)

siri_hw = HotwordDetector(
hotword="Siri",
reference_file = os.path.join(samples_loc,"siri_ref.json")
)
reference_file = os.path.join(samples_loc,"siri_ref.json"),
)

google_hw = HotwordDetector(
hotword="Google",
reference_file = os.path.join(samples_loc,"google_ref.json")
)
mycroft_hw = HotwordDetector(
hotword="mycroft",
reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
activation_count=3
)

multi_hw_engine = MultiHotwordDetector(
detector_collection = [alexa_hw,siri_hw,google_hw]
)
detector_collection = [
alexa_hw,
siri_hw,
mycroft_hw,
],
)

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say Google / Alexa / Siri")
print("Say Mycroft / Alexa / Siri")

while True :
frame = mic_stream.getFrame()
result = multi_hw_engine.findBestMatch(frame)
1 change: 1 addition & 0 deletions eff_word_net/sample_refs/mycroft_ref.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion setup.py
@@ -3,7 +3,7 @@

setup(
name = 'EfficientWord-Net',
version = '0.0.1',
version = '0.1.1',
description = 'Few Shot Learning based Hotword Detection Engine',
long_description = open("./README.md",'r').read(),
long_description_content_type = 'text/markdown',
1 change: 0 additions & 1 deletion wakewords/mobile_ref.json

This file was deleted.

Binary file not shown.
Binary file added wakewords/mycroft/mycroft_en-GB_JamesV3Voice.mp3
Binary file not shown.
Binary file added wakewords/mycroft/mycroft_en-GB_KateV3Voice.mp3
Binary file not shown.
Binary file not shown.
Binary file added wakewords/mycroft/mycroft_en-US_HenryV3Voice.mp3
Binary file not shown.
Binary file not shown.
Binary file added wakewords/mycroft/mycroft_en-US_OliviaV3Voice.mp3
Binary file not shown.
