Commit before v0.7
Rapfff committed Sep 27, 2022
1 parent cc0f12b commit 930be46
Showing 22 changed files with 247 additions and 129 deletions.
3 changes: 2 additions & 1 deletion docs/source/conf.py
@@ -32,7 +32,8 @@
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.autosummary'
'sphinx.ext.autosummary',
'sphinx.ext.intersphinx'
]
autoclass_content = 'both'
# Add any paths that contain templates here, relative to this directory.
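
Note: ``sphinx.ext.intersphinx`` is normally paired with an ``intersphinx_mapping`` dictionary in ``conf.py``. That mapping is not part of this diff; the following is only a sketch of the usual configuration.

.. code-block:: python

    # Hypothetical mapping -- not shown in this commit.
    # Lets Sphinx resolve cross-references against external
    # documentation inventories.
    intersphinx_mapping = {
        'python': ('https://docs.python.org/3', None),
        'numpy': ('https://numpy.org/doc/stable/', None),
    }
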
23 changes: 23 additions & 0 deletions docs/source/formal.rst
@@ -0,0 +1,23 @@
Formalism
=========
This page contains the formal presentation of most of the learning algorithms and models implemented in Jajapy.

* HMM

  * Baum-Welch for HMMs :download:`pdf <pdfs/BW_for_HMM.pdf>`
  * A Tutorial on HMMs and Selected Applications in Speech Recognition (L.R. Rabiner) :download:`pdf <https://web.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf>`

* MC

  * Baum-Welch for MCs :download:`pdf <pdfs/BW_for_MC.pdf>`
  * Learning Stochastic Regular Grammars by Means of a State Merging Method (R. Carrasco and J. Oncina) :download:`pdf <https://grfia.dlsi.ua.es/repositori/grfia/pubs/57/icgi1994.pdf>`

* MDP

  * Baum-Welch for MDPs :download:`pdf <pdfs/BW_for_MDP.pdf>`
  * Active Learning of Markov Decision Processes using Baum-Welch algorithm (G. Bacci et al.) :download:`pdf <https://arxiv.org/pdf/2110.03014.pdf>`
  * Learning Deterministic Probabilistic Automata from a Model Checking Perspective (H. Mao et al.) :download:`pdf <https://people.cs.aau.dk/~tdn/papers/ML_Hua.pdf>`

* GoHMM

  * Baum-Welch for GoHMMs :download:`pdf <pdfs/BW_for_GoHMM.pdf>`
  * A New Improved Baum-Welch Algorithm for Unsupervised Learning for Continuous-Time HMM Using Spark :download:`pdf <http://www.inass.org/2020/2020022920.pdf>`

* MGoHMM

  * Baum-Welch for MGoHMMs :download:`pdf <pdfs/BW_for_MGoHMM.pdf>`
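
All of the Baum-Welch variants listed above are invoked through a ``fit`` method with a common shape. A minimal sketch, assuming the package-level exports and a signature analogous to the ``BW_CTMC.fit`` updated later in this commit (class names for the other variants are assumptions):

.. code-block:: python

    import jajapy as ja

    # Sketch only: load a saved training set and run Baum-Welch from a
    # random initial hypothesis with 5 states.
    training_set = ja.loadSet("training_set.txt")
    output_model = ja.BW_CTMC().fit(training_set, nb_states=5,
                                    epsilon=0.01, max_it=100, verbose=True)
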
2 changes: 1 addition & 1 deletion docs/source/help.rst
@@ -123,7 +123,7 @@ To learn a model with jajapy, you need to have a training set. And to evaluate i
these sets are given, but one can imagine a situation where you have the original model. Then you first need to generate the sets, and then you can
use them.

Once a model is learnt, we can directly translate it to a *stormpy sparse model* and model check some properties (see the examples :ref:`stormpy-example`).
Once a model is learnt, we can directly translate it to a *stormpy sparse model* and model check some properties (see the examples :ref:`_stormpy-example`).


5. Examples
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -40,6 +40,7 @@ Content

help
tuto
formal
References


Binary file added docs/source/pdfs/BW_for_GoHMM.pdf
Binary file not shown.
Binary file added docs/source/pdfs/BW_for_HMM.pdf
Binary file not shown.
Binary file added docs/source/pdfs/BW_for_MC.pdf
Binary file not shown.
Binary file added docs/source/pdfs/BW_for_MDP.pdf
Binary file not shown.
Binary file added docs/source/pdfs/BW_for_MGoHMM.pdf
Binary file not shown.
24 changes: 19 additions & 5 deletions docs/source/tuto.rst
@@ -4,7 +4,7 @@ Tutorial
1. A simple example with HMMs
-----------------------------

`python file <https://github.com/Rapfff/jajapy/tree/main/examples/01-hmms.py>`_
:download:`python file <https://github.com/Rapfff/jajapy/tree/main/examples/01-hmms.py>`

In this example, we will:

@@ -105,7 +105,7 @@ If ``quality`` is positive then we are overfitting.
2. An example with MC: random restart
-------------------------------------

`python file <https://github.com/Rapfff/jajapy/tree/main/examples/02-mcs.py>`_
:download:`python file <https://github.com/Rapfff/jajapy/tree/main/examples/02-mcs.py>`


This time we will try to learn the `Reber grammar <https://cnl.salk.edu/~schraudo/teach/NNcourse/reber.html>`_.
@@ -224,7 +224,7 @@ for a better readability.

3. An example with MDP: active learning
---------------------------------------
`python file <https://github.com/Rapfff/jajapy/tree/main/examples/03-mds.py>`_
:download:`python file <https://github.com/Rapfff/jajapy/tree/main/examples/03-mdps.py>`

Here, we will learn an MDP representing the following grid world:

@@ -249,7 +249,7 @@ First we create the original model.
def modelMDP_gridworld():
alphabet = ['S','M','G','C','W',"done"]
actions = list("nsew")
nb_states = 12
nb_states = 9
s0 = ja.MDP_state({'n': [(0,'W',1.0)],
's': [(3,'M',0.6),(4,'G',0.4)],
'e': [(1,'M',0.6),(4,'G',0.4)],
@@ -332,10 +332,24 @@ scheduler with probability 0.25.
.. _stormpy-example:

Now, one could ask for the scheduler which minimizes the number of steps before reaching our objective,
the bottom-right state. For this, we can use stormpy:

.. code-block:: python

    # MODEL CHECKING
    #---------------
    storm_model = ja.modeltoStorm(output_model)
    properties = stormpy.parse_properties("Rmax=? [ F \"done\" ]")
    result = stormpy.check_model_sparse(storm_model, properties[0], extract_scheduler=True)
    scheduler = result.scheduler
    print(result)
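
The extracted scheduler can then be inspected state by state. A short sketch, mirroring the loop added to ``examples/03-mdps.py`` in this same commit:

.. code-block:: python

    # Print the action chosen by the optimal scheduler in each state.
    for state in storm_model.states:
        choice = scheduler.get_choice(state)
        action = choice.get_deterministic_choice()
        print("In state {} choose action {}".format(state, output_model.actions[action]))
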
4. An advanced example with MC and model checking
-------------------------------------------------

`python file <https://github.com/Rapfff/jajapy/tree/main/examples/04-mcs_with_stormpy.py>`_
:download:`python file <https://github.com/Rapfff/jajapy/tree/main/examples/04-mcs_with_stormpy.py>`


.. image:: pictures/knuthdie.png
18 changes: 16 additions & 2 deletions examples/03-mdps.py
@@ -1,10 +1,11 @@
import jajapy as ja
from numpy import array
import stormpy

def modelMDP_gridworld():
alphabet = ['S','M','G','C','W',"done"]
actions = list("nsew")
nb_states = 12
nb_states = 9
s0 = ja.MDP_state({'n': [(0,'W',1.0)],
's': [(3,'M',0.6),(4,'G',0.4)],
'e': [(1,'M',0.6),(4,'G',0.4)],
@@ -56,7 +57,7 @@ def modelMDP_gridworld():
def example_3():
original_model = modelMDP_gridworld()
# SETS GENERATION
#------------------------
#----------------
# We generate 1000 sequences of 10 observations for each set
scheduler = ja.UniformScheduler(original_model.getActions())
training_set = original_model.generateSet(1000,10,scheduler)
@@ -73,6 +74,19 @@ def example_3():
print(output_model)
print(output_quality)

# MODEL CHECKING
#---------------
storm_model = ja.modeltoStorm(output_model)
print(storm_model)
formula_str = "Rmax=? [ F \"done\" ]"
properties = stormpy.parse_properties(formula_str)
result = stormpy.check_model_sparse(storm_model, properties[0], extract_scheduler=True)
scheduler = result.scheduler
print(result)
for state in storm_model.states:
choice = scheduler.get_choice(state)
action = choice.get_deterministic_choice()
print("In state {} choose action {}".format(state, output_model.actions[action]))

if __name__ == '__main__':
example_3()
53 changes: 44 additions & 9 deletions jajapy/base/BW.py
@@ -1,6 +1,6 @@
from sys import platform
from multiprocessing import cpu_count, Pool
from numpy import array, dot, append, zeros, ones
from numpy import array, dot, append, zeros, ones, longdouble, float64
from datetime import datetime
from .Set import Set

@@ -34,7 +34,7 @@ def h_tau(self,s1: int,s2: int,obs: str) -> float:
"""
return self.h.tau(s1,s2,obs)

def computeAlphas(self,sequence: list) -> array:
def computeAlphas(self, sequence: list, dtype=False) -> array:
"""
Compute the alpha values for ``sequence`` under the current BW
hypothesis.
@@ -43,6 +43,9 @@
----------
sequence : list of str
sequence of observations.
dtype : numpy.scalar
If it is set, the output will be a numpy array of this type,
otherwise it is a numpy array of float64.
Returns
-------
@@ -53,13 +56,17 @@
init_arr = self.h.initial_state
zero_arr = zeros(shape=(len_seq*self.nb_states,))
alpha_matrix = append(init_arr,zero_arr).reshape(len_seq+1,self.nb_states)
if dtype != False:
alpha_matrix = alpha_matrix.astype(dtype)
else:
dtype = float64
for k in range(len_seq):
for s in range(self.nb_states):
p = array([self.h_tau(ss,s,sequence[k]) for ss in range(self.nb_states)])
p = array([self.h_tau(ss,s,sequence[k]) for ss in range(self.nb_states)],dtype=dtype)
alpha_matrix[k+1,s] = dot(alpha_matrix[k],p)
return alpha_matrix.T

def computeBetas(self,sequence: list) -> array:
def computeBetas(self, sequence: list, dtype=False) -> array:
"""
Compute the beta values for ``sequence`` under the current BW
hypothesis.
@@ -68,6 +75,9 @@
----------
sequence : list of str
sequence of observations.
dtype : numpy.scalar
If it is set, the output will be a numpy array of this type,
otherwise it is a numpy array of float64.
Returns
-------
@@ -78,9 +88,13 @@
init_arr = ones(self.nb_states)
zero_arr = zeros(shape=(len_seq*self.nb_states,))
beta_matrix = append(zero_arr,init_arr).reshape(len_seq+1,self.nb_states)
if dtype != False:
beta_matrix = beta_matrix.astype(dtype)
else:
dtype = float64
for k in range(len(sequence)-1,-1,-1):
for s in range(self.nb_states):
p = array([self.h_tau(s,ss,sequence[k]) for ss in range(self.nb_states)])
p = array([self.h_tau(s,ss,sequence[k]) for ss in range(self.nb_states)],dtype=dtype)
beta_matrix[k,s] = dot(beta_matrix[k+1],p)
return beta_matrix.T

@@ -101,8 +115,17 @@ def _runProcesses(self,training_set):
return [res.get() for res in tasks if res.get() != False]
else:
return [self._processWork(seq, times) for seq,times in zip(training_set.sequences,training_set.times)]

def _endPrint(self,it,rt,ll):
print()
print("---------------------------------------------")
print("Learning finished")
print("Iterations:\t ",it)
print("Running time:\t ",rt)
print("Loglikelihood:\t ",ll)
print("---------------------------------------------")
print()

def fit(self,training_set: Set,initial_model,output_file: str,epsilon: float, pp: str):
def fit(self,training_set: Set,initial_model,output_file: str,epsilon: float, max_it: int, pp: str, verbose: bool):
"""
Fits the model according to ``traces``.
@@ -120,21 +143,28 @@ def fit(self,training_set: Set,initial_model,output_file: str,epsilon: float, pp
loglikelihood of the training set under the last two hypotheses is
lower than ``epsilon``. The lower this value the better the output,
but the longer the running time.
max_it: int
Maximal number of iterations. The algorithm will stop after `max_it`
iterations.
Default is infinity.
pp : str
Will be printed at each iteration.
verbose: bool
Whether to print a small recap at the end of the learning.
Returns
-------
Model
fitted model.
"""
start_time = datetime.now()
self.h = initial_model
self.nb_states = self.h.nb_states

counter = 0
prevloglikelihood = 10
nb_traces = sum(training_set.times)
while True:
while counter < max_it:
print(pp, datetime.now(),counter, prevloglikelihood/nb_traces,end='\r')
temp = self._runProcesses(training_set)
self.hhat, currentloglikelihood = self._generateHhat(temp)
@@ -144,8 +174,13 @@
break
else:
prevloglikelihood = currentloglikelihood


running_time = datetime.now()-start_time

if output_file:
self.h.save(output_file)
print()

if verbose:
self._endPrint(counter,running_time,currentloglikelihood)

return self.h
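
The new ``dtype`` parameter of ``computeAlphas``/``computeBetas`` is motivated by numerical underflow: the alpha and beta recursions multiply many probabilities smaller than 1, which drives ``float64`` to zero on long sequences, while ``longdouble`` offers extra range on most platforms. A self-contained illustration, independent of Jajapy:

.. code-block:: python

    import numpy as np

    # Repeated multiplication by small probabilities, as in the alpha
    # recursion, underflows float64 long before longdouble does.
    p = 1e-5
    alpha64 = np.float64(1.0)
    alphaLD = np.longdouble(1.0)
    for _ in range(80):
        alpha64 *= np.float64(p)
        alphaLD *= np.longdouble(p)
    print(alpha64)  # 0.0: 1e-400 is below float64's smallest subnormal
    print(alphaLD)  # still nonzero where longdouble is 80-bit extended
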
12 changes: 12 additions & 0 deletions jajapy/base/Set.py
@@ -224,3 +224,15 @@ def loadSet(file_path: str) -> Set:
f.close()
return Set(res_set[0],res_set[1],from_MDP)

def setFromList(l: list) -> Set:
"""
Convert a list of sequences of observations to a set.
Parameters
----------
l : list
list of sequences of observations.

Returns
-------
Set
the set containing the given sequences.
"""
s = Set([],[])
s.setFromList(l)
return s
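
A hedged usage sketch of the new ``setFromList`` helper (assuming it is exported at the package level, like ``loadSet``):

.. code-block:: python

    import jajapy as ja

    # Sketch: turn raw observation sequences into a jajapy Set,
    # ready to be used as a training set for a fit method.
    sequences = [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'c', 'b']]
    training_set = ja.setFromList(sequences)
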
13 changes: 10 additions & 3 deletions jajapy/ctmc/BW_CTMC.py
@@ -1,7 +1,7 @@
from .CTMC import *
from ..base.BW import BW
from ..base.Set import Set
from numpy import array, zeros, dot, append, ones, log
from numpy import array, zeros, dot, append, ones, log, inf


class BW_CTMC(BW):
@@ -148,7 +148,7 @@ def computeBetas_nontimed(self,sequence: list) -> array:
def fit(self, traces: Set, initial_model: CTMC=None, nb_states: int=None,
random_initial_state: bool=False, min_exit_rate_time : int=1.0,
max_exit_rate_time: int=10.0, self_loop: bool = True,
output_file: str=None, epsilon: float=0.01, pp: str=''):
output_file: str=None, epsilon: float=0.01, max_it: int= inf, pp: str='', verbose: bool = True):
"""
Fits the model according to ``traces``.
@@ -185,8 +185,15 @@ def fit(self, traces: Set, initial_model: CTMC=None, nb_states: int=None,
loglikelihood of the training set under the two last hypothesis is
lower than ``epsilon``. The lower this value the better the output,
but the longer the running time. By default 0.01.
max_it: int
Maximal number of iterations. The algorithm will stop after `max_it`
iterations.
Default is infinity.
pp : str, optional
Will be printed at each iteration. By default ''
verbose: bool, optional
Whether to print a small recap at the end of the learning.
Default is True.
Returns
-------
Expand All @@ -202,7 +209,7 @@ def fit(self, traces: Set, initial_model: CTMC=None, nb_states: int=None,
min_exit_rate_time, max_exit_rate_time,
self_loop, random_initial_state)
self.alphabet = initial_model.getAlphabet()
return super().fit(traces, initial_model, output_file, epsilon, pp)
return super().fit(traces, initial_model, output_file, epsilon, max_it, pp, verbose)


def splitTime(self,sequence: list) -> tuple:
13 changes: 10 additions & 3 deletions jajapy/ctmc/MM_CTMC_Composition.py
@@ -11,8 +11,8 @@ def fit(self, traces: Set, initial_model_1: CTMC=None, nb_states_1: int=None,
max_exit_rate_time_1: int=10.0, initial_model_2: CTMC=None,
nb_states_2: int=None, random_initial_state_2: bool=False,
min_exit_rate_time_2 : int=1.0, max_exit_rate_time_2: int=10.0,
output_file: str=None, epsilon: float=0.01, pp: str='',
to_update: int=None):
output_file: str=None, epsilon: float=0.01, max_it: int= inf, pp: str='',
to_update: int=None, verbose: bool = True):
"""
Fits the model according to ``traces``.
@@ -64,13 +64,20 @@ def fit(self, traces: Set, initial_model_1: CTMC=None, nb_states_1: int=None,
loglikelihood of the training set under the two last hypothesis is
lower than ``epsilon``. The lower this value the better the output,
but the longer the running time. By default 0.01.
max_it: int
Maximal number of iterations. The algorithm will stop after `max_it`
iterations.
Default is infinity.
pp : str, optional
Will be printed at each iteration. By default ''.
to_update: int, optional
If set to 1, only the first hypothesis will be updated;
if set to 2, only the second hypothesis will be updated;
if set to None, both hypotheses will be updated.
Default is None.
verbose: bool, optional
Whether to print a small recap at the end of the learning.
Default is True.
Returns
-------
@@ -100,7 +107,7 @@ def fit(self, traces: Set, initial_model_1: CTMC=None, nb_states_1: int=None,
self.alphabets = [None,self.hs[1].getAlphabet(),self.hs[2].getAlphabet()]
self.disjoints_alphabet = len(set(self.alphabets[1]).intersection(set(self.alphabets[2]))) == 0
self.to_update = to_update
super().fit(traces, initial_model, output_file, epsilon, pp)
super().fit(traces, initial_model, output_file, epsilon, max_it, pp, verbose)
if output_file:
self.hs[1].save(output_file+"_1.txt")
self.hs[2].save(output_file+"_2.txt")
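
A hedged sketch of ``MM_CTMC_Composition.fit`` with the parameters added here (export name assumed; ``traces`` is a timed training set):

.. code-block:: python

    import jajapy as ja

    # Sketch: update only the second hypothesis of the composition,
    # cap Baum-Welch at 50 iterations, skip the final recap, and save
    # both components (written to learned_1.txt and learned_2.txt).
    ja.MM_CTMC_Composition().fit(traces, nb_states_1=3, nb_states_2=4,
                                 to_update=2, max_it=50, verbose=False,
                                 output_file="learned")
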