Major changes in HotwordDetector in engine.py and added Mycroft wakeword

Ant-Brain · Dec 31, 2021 · commit 62162a6
1 parent f7a6867

Showing 14 changed files with 113 additions and 37 deletions.
diff --git a/README.md b/README.md
@@ -95,19 +95,18 @@ The pathname of the generated wakeword needs to passed to the HotwordDetector de
 ```python
 HotwordDetector(
         hotword="hello",
-        reference_file = "/full/path/name/of/hello_ref.json")
+        reference_file = "/full/path/name/of/hello_ref.json",
+        activation_count = 3 #2 by default
 )
 ```
-
-Few wakewords such as **Google**, **Firefox**, **Alexa**, **Mobile**, **Siri** the library has predefined embeddings readily available in the library installation directory, its path is readily available in the following variable
+The library ships predefined embeddings for a few wakewords, such as **Mycroft**, **Google**, **Firefox**, **Alexa**, **Mobile** and **Siri**, in the library installation directory. The path to that directory is available in the following variable:
 
 ```python
 from eff_word_net import samples_loc
 ```
 
 <br>
 
-
 ## Try your first single hotword detection script
 
 ```python
@@ -116,18 +115,19 @@ from eff_word_net.streams import SimpleMicStream
 from eff_word_net.engine import HotwordDetector
 from eff_word_net import samples_loc
 
-alexa_hw = HotwordDetector(
-        hotword="Alexa",
-        reference_file = os.path.join(samples_loc,"alexa_ref.json"),
+mycroft_hw = HotwordDetector(
+        hotword="Mycroft",
+        reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
+        activation_count=3
     )
 
 mic_stream = SimpleMicStream()
 mic_stream.start_stream()
 
-print("Say Alexa ")
+print("Say Mycroft ")
 while True :
     frame = mic_stream.getFrame()
-    result = alexa_hw.checkFrame(frame)
+    result = mycroft_hw.checkFrame(frame)
     if(result):
         print("Wakeword uttered")
 
@@ -145,6 +145,7 @@ of running `checkFrame()` of each wakeword individually
 import os
 from eff_word_net.streams import SimpleMicStream
 from eff_word_net import samples_loc
+print(samples_loc)
 
 alexa_hw = HotwordDetector(
         hotword="Alexa",
@@ -153,31 +154,44 @@ alexa_hw = HotwordDetector(
 
 siri_hw = HotwordDetector(
         hotword="Siri",
-        reference_file = os.path.join(samples_loc,"siri_ref.json")
-        )
+        reference_file = os.path.join(samples_loc,"siri_ref.json"),
+    )
 
-google_hw = HotwordDetector(
-        hotword="Google",
-        reference_file = os.path.join(samples_loc,"google_ref.json")
+mycroft_hw = HotwordDetector(
+        hotword="mycroft",
+        reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
+        activation_count=3
     )
 
 multi_hw_engine = MultiHotwordDetector(
-        detector_collection = [alexa_hw,siri_hw,google_hw]
-    ) # Efficient multi hotword detector
+        detector_collection = [
+            alexa_hw,
+            siri_hw,
+            mycroft_hw,
+        ],
+    )
 
 mic_stream = SimpleMicStream()
 mic_stream.start_stream()
 
-print("Say Google / Alexa / Siri")
+print("Say Mycroft / Alexa / Siri")
+
 while True :
     frame = mic_stream.getFrame()
     result = multi_hw_engine.findBestMatch(frame)
     if(None not in result):
         print(result[0],f",Confidence {result[1]:0.4f}")
+
 ```
 <br>
 
 Access documentation of the library from here : https://ant-brain.github.io/EfficientWord-Net/
+
+
+## About `activation_count` in `HotwordDetector`
+Documentation with a detailed explanation of the `activation_count` parameter of `HotwordDetector` is in the making. For now, note that 3 is advisable for long hotwords and 2 for shorter hotwords. If the detector gives out multiple triggers for a single utterance, try increasing `activation_count`. When experimenting, begin with smaller values. The default value is 2.
+
+
 ## FAQ :
 * **Hotword performance is bad**: if you are having an issue like this, feel free to ask about it in [discussions](https://github.com/Ant-Brain/EfficientWord-Net/discussions/4)
 
@@ -189,6 +203,7 @@ Access documentation of the library from here : https://ant-brain.github.io/Effi
 
 * Add audio file handler in streams. PR's are welcome.
 * Remove librosa requirement to encourage generating reference files directly in edge devices
+* Add more detailed documentation explaining the sliding window concept
 
 ## SUPPORT US:
 Our hotword detector's performance is notably low when compared to Porcupine. We have thought about better NN architectures for the engine and hope to outperform Porcupine. This has been our undergrad project. Hence your support and encouragement will motivate us to develop the engine. If you loved this project recommend this to your peers, give us a 🌟 in Github and a clap 👏 in [medium](https://link.medium.com/yMBmWGM03kb).
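
The README advice on `activation_count` can be tried out without a microphone. The following is a minimal sketch that copies the counter logic this commit adds to `getMatchScoreVector` in `engine.py` into a stand-alone class. The class name `ActivationCounter`, the helper `count_triggers` and the score sequence are hypothetical and exist only for illustration; the default threshold (0.9) and the relaxation step (4) are taken from the diff.

```python
class ActivationCounter:
    """Stand-alone copy of the trigger counter from this commit (illustrative only)."""

    def __init__(self, threshold=0.9, activation_count=2, relaxation_time_step=4):
        self.threshold = threshold
        self.activation_count = activation_count
        self.relaxation_time_step = relaxation_time_step
        self.repeat_count = 0

    def update(self, score):
        """Feed one similarity score; return True only on the frame that triggers."""
        triggered = False
        if self.repeat_count < 0:
            # Relaxation period after a trigger: the score is ignored.
            self.repeat_count += 1
        elif score > self.threshold:
            if self.repeat_count == self.activation_count - 1:
                # Enough matching frames seen: trigger and start relaxing.
                self.repeat_count = -self.relaxation_time_step
                triggered = True
            else:
                self.repeat_count += 1
        elif self.repeat_count > 0:
            # A non-matching frame decays the count by one.
            self.repeat_count -= 1
        return triggered


def count_triggers(scores, activation_count):
    """Count how many frames in `scores` produce a trigger."""
    counter = ActivationCounter(activation_count=activation_count)
    return sum(counter.update(s) for s in scores)


# Hypothetical scores for one long utterance: eight consecutive matching frames.
scores = [0.2, 0.95, 0.96, 0.97, 0.95, 0.96, 0.97, 0.95, 0.96, 0.3]

print(count_triggers(scores, activation_count=2))
print(count_triggers(scores, activation_count=3))
```

With this made-up sequence, the lower setting fires twice for the same utterance while the higher setting fires once, which is the situation the README suggests fixing by increasing `activation_count`.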

diff --git a/dist/EfficientWord-Net-0.0.1.tar.gz b/dist/EfficientWord-Net-0.0.1.tar.gz
diff --git a/dist/EfficientWord_Net-0.0.1-py3-none-any.whl b/dist/EfficientWord_Net-0.0.1-py3-none-any.whl
diff --git a/eff_word_net/engine.py b/eff_word_net/engine.py
@@ -14,7 +14,14 @@ class HotwordDetector :
     EfficientWord based HotwordDetector Engine implementation class
     """
 
-    def __init__(self,hotword:str,reference_file:str,threshold:float=0.85):
+    def __init__(
+            self,
+            hotword:str,
+            reference_file:str,
+            threshold:float=0.9,
+            activation_count=2,
+            continuous=True,
+            verbose = False):
         """
         Initializes hotword detector instance
 
@@ -28,6 +35,8 @@ def __init__(self,hotword:str,reference_file:str,threshold:float=0.85):
             threshold: float value between 0 and 1 , min similarity score
             required for a match
 
+            continuous: bool, True if the HotwordDetector is operating on a single continuous stream, else False
+
         """
         assert isfile(reference_file), \
             "Reference File Path Invalid"
@@ -43,10 +52,21 @@ def __init__(self,hotword:str,reference_file:str,threshold:float=0.85):
 
         self.hotword = hotword
         self.threshold = threshold
+        self.continuous = continuous
+
+        self.__repeat_count = 0
+        self.__activation_count = activation_count
+        self.verbose = verbose
+
+        self.__relaxation_time_step = 4 #number of cycles to prevent recall after a trigger
+        self.__is_it_a_trigger = False
 
     def __repr__(self):
         return f"Hotword: {self.hotword}"
 
+    def is_it_a_trigger(self):
+        return self.__is_it_a_trigger
+
     def getMatchScoreVector(self,inp_vec:np.array) -> float :
         """
         **Use this directly only if you know what you are doing**
@@ -71,8 +91,24 @@ def getMatchScoreVector(self,inp_vec:np.array) -> float :
         for i in top3 :
             out+= (1-out) * i
 
-        return out
+        #assert self.redundancy_count>0 , "redundancy_count count can only be greater than 0"
+
+        self.__is_it_a_trigger = False
+
+        if self.__repeat_count < 0 :
+            self.__repeat_count += 1
 
+        elif out > self.threshold :
+            if self.__repeat_count == self.__activation_count -1 :
+                self.__repeat_count = - self.__relaxation_time_step
+                self.__is_it_a_trigger = True
+            else:
+                self.__repeat_count +=1
+
+        elif self.__repeat_count > 0:
+            self.__repeat_count -= 1
+
+        return out
 
     def checkVector(self,inp_vec:np.array) -> bool:
         """
@@ -85,7 +121,12 @@ def checkVector(self,inp_vec:np.array) -> bool:
         assert inp_vec.shape == (1,128), \
             "Inp vector should be of shape (1,128)"
 
-        return self.getMatchScoreVector(inp_vec) > self.threshold
+        score = self.getMatchScoreVector(inp_vec)
+
+        return self.is_it_a_trigger() if self.continuous else score >= self.threshold
+
+    def get_repeat_count(self)-> int :
+        return self.__repeat_count
 
     def getMatchScoreFrame(
             self,
@@ -110,6 +151,7 @@ def getMatchScoreFrame(
 
         """
 
+        """
         if(not unsafe):
             upperPoint = max(
                 (
@@ -118,6 +160,7 @@ def getMatchScoreFrame(
             )
             if(upperPoint > 0.2):
                 return False
+        """
 
         assert inp_audio_frame.shape == (RATE,), \
             f"Audio frame needs to be a 1 sec {RATE}Hz sampled vector"
@@ -126,7 +169,7 @@ def getMatchScoreFrame(
             audioToVector(
                 inp_audio_frame
             )
-        )
+            )
 
 
     def checkFrame(self,inp_audio_frame:np.array,unsafe:bool = False) -> bool :
@@ -152,6 +195,7 @@ def checkFrame(self,inp_audio_frame:np.array,unsafe:bool = False) -> bool :
         assert inp_audio_frame.shape == (RATE,), \
             f"Audio frame needs to be a 1 sec {RATE}Hz sampled vector"
 
+        """
         if(not unsafe):
             upperPoint = max(
                 (
@@ -160,8 +204,10 @@ def checkFrame(self,inp_audio_frame:np.array,unsafe:bool = False) -> bool :
             )
             if(upperPoint > 0.2):
                 return False
+        """
+        score = self.getMatchScoreFrame(inp_audio_frame)
 
-        return self.getMatchScoreFrame(inp_audio_frame) > self.threshold
+        return self.is_it_a_trigger() if self.continuous else score >= self.threshold
 
 HotwordDetectorArray = List[HotwordDetector]
 MatchInfo = Tuple[HotwordDetector,float]
@@ -176,6 +222,7 @@ class MultiHotwordDetector :
     def __init__(
         self,
         detector_collection:HotwordDetectorArray,
+        continuous=True
     ):
         """
         Inp Parameters:
@@ -190,6 +237,7 @@ def __init__(
                 "Mixed Array received, send HotwordDetector only array"
 
         self.detector_collection = detector_collection
+        self.continous = continuous
 
     def findBestMatch(
             self,
@@ -218,6 +266,7 @@ def findBestMatch(
         assert inp_audio_frame.shape == (RATE,), \
             f"Audio frame needs to be a 1 sec {RATE}Hz sampled vector"
 
+        """
         if(not unsafe):
             upperPoint = max(
                 (
@@ -226,16 +275,21 @@ def findBestMatch(
             )
             if(upperPoint > 0.2):
                 return None , None
-
+        """
         embedding = audioToVector(inp_audio_frame)
 
         best_match_detector:str = None
         best_match_score:float = 0.0
 
         for detector in self.detector_collection :
             score = detector.getMatchScoreVector(embedding)
-            if(score<detector.threshold):
-                continue
+            if(self.continous):
+                if(not detector.is_it_a_trigger()):
+                    continue
+            else:
+                if(score < detector.threshold):
+                    continue
+
             if(score>best_match_score):
                 best_match_score = score
                 best_match_detector = detector
@@ -282,7 +336,7 @@ def findAllMatches(
         embedding = audioToVector(inp_audio_frame)
 
         matches:MatchInfoArray = []
-        
+
         best_match_score = 0.0
         for detector in self.detector_collection :
             score = detector.getMatchScoreVector(embedding)
@@ -301,29 +355,36 @@ def findAllMatches(
     from eff_word_net.streams import SimpleMicStream
     from eff_word_net import samples_loc
     print(samples_loc)
+
     alexa_hw = HotwordDetector(
             hotword="Alexa",
             reference_file = os.path.join(samples_loc,"alexa_ref.json"),
         )
 
     siri_hw = HotwordDetector(
             hotword="Siri",
-            reference_file = os.path.join(samples_loc,"siri_ref.json")
-            )
+            reference_file = os.path.join(samples_loc,"siri_ref.json"),
+        )
 
-    google_hw = HotwordDetector(
-            hotword="Google",
-            reference_file = os.path.join(samples_loc,"google_ref.json")
-            )
+    mycroft_hw = HotwordDetector(
+            hotword="mycroft",
+            reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
+            activation_count=3
+        )
 
     multi_hw_engine = MultiHotwordDetector(
-            detector_collection = [alexa_hw,siri_hw,google_hw]
-            )
+            detector_collection = [
+                alexa_hw,
+                siri_hw,
+                mycroft_hw,
+            ],
+        )
 
     mic_stream = SimpleMicStream()
     mic_stream.start_stream()
 
-    print("Say Google / Alexa / Siri")
+    print("Say Mycroft / Alexa / Siri")
+
     while True :
         frame = mic_stream.getFrame()
         result = multi_hw_engine.findBestMatch(frame)
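
For context on the score that is compared against `threshold`, the loop `out += (1-out) * i` over `top3` in `getMatchScoreVector` can be read as a probabilistic OR of the individual scores. The sketch below assumes that `out` starts at 0.0 and that `top3` holds the three highest similarity scores, each between 0 and 1; neither detail is visible in the hunk, and the function name `combine_top_scores` is hypothetical.

```python
def combine_top_scores(top_scores):
    """Combine similarity scores the way the loop in getMatchScoreVector does."""
    out = 0.0  # assumed starting value; the initialisation is outside the hunk
    for s in top_scores:
        # Each score closes a fraction `s` of the remaining gap to 1.0.
        out += (1 - out) * s
    return out


def one_minus_product(top_scores):
    """Closed form of the same combination: 1 - (1 - s1)(1 - s2)..."""
    remaining = 1.0
    for s in top_scores:
        remaining *= (1 - s)
    return 1 - remaining


example = [0.6, 0.5, 0.4]  # hypothetical scores
print(combine_top_scores(example), one_minus_product(example))
```

Under these assumptions the combined score is never lower than the largest individual score, so several moderate matches can together cross a threshold that none of them crosses alone.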

diff --git a/eff_word_net/sample_refs/mycroft_ref.json b/eff_word_net/sample_refs/mycroft_ref.json
diff --git a/setup.py b/setup.py
@@ -3,7 +3,7 @@
 
 setup(
     name = 'EfficientWord-Net',
-    version = '0.0.1',
+    version = '0.1.1',
     description = 'Few Shot Learning based Hotword Detection Engine',
     long_description = open("./README.md",'r').read(),
     long_description_content_type = 'text/markdown',

diff --git a/wakewords/mobile_ref.json b/wakewords/mobile_ref.json
diff --git a/wakewords/mycroft/mycroft_en-GB_CharlotteV3Voice.mp3 b/wakewords/mycroft/mycroft_en-GB_CharlotteV3Voice.mp3
diff --git a/wakewords/mycroft/mycroft_en-GB_JamesV3Voice.mp3 b/wakewords/mycroft/mycroft_en-GB_JamesV3Voice.mp3
diff --git a/wakewords/mycroft/mycroft_en-GB_KateV3Voice.mp3 b/wakewords/mycroft/mycroft_en-GB_KateV3Voice.mp3
diff --git a/wakewords/mycroft/mycroft_en-US_AllisonV3Voice.mp3 b/wakewords/mycroft/mycroft_en-US_AllisonV3Voice.mp3
diff --git a/wakewords/mycroft/mycroft_en-US_HenryV3Voice.mp3 b/wakewords/mycroft/mycroft_en-US_HenryV3Voice.mp3
diff --git a/wakewords/mycroft/mycroft_en-US_MichaelV3Voice.mp3 b/wakewords/mycroft/mycroft_en-US_MichaelV3Voice.mp3
diff --git a/wakewords/mycroft/mycroft_en-US_OliviaV3Voice.mp3 b/wakewords/mycroft/mycroft_en-US_OliviaV3Voice.mp3