Future todo items #95

Open
csim063 opened this issue May 12, 2021 · 5 comments

Comments

@csim063
Collaborator

csim063 commented May 12, 2021

This issue is a list of the items we would like to implement in the future: things that are not fundamental to the package's usage but would be nice to have.

@mspngnbrg
Collaborator

mspngnbrg commented Nov 8, 2021

Improve finding of min_conf sites in the "saturation phase"? Running 100k iterations showed that the RCE still decreased after 50k, but quite slowly. Finding the few "important" sites worth optimizing may take a while with our random site-selection approach. I speculate that, in the saturation phase with only a few "important" sites, repeatedly processing all sites in a row (in random order) would improve the chances of finding the "important" sites earlier. So rather than (A) "find random site & find min_conf_species", for the saturation phase I would suggest (B): put all sites in random order and, one site after another, find min_conf_species. When one cycle is done, start at (B) again. To summarize: if, for example, only site #42 out of 100 sites has potential to be improved, (B) would find it within 100 steps, but (A) could need much longer.
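A minimal sketch of the two site-selection strategies, written in Python purely for illustration (it is not spectre code, and the function names are hypothetical). It only counts how many steps each strategy needs before it first visits one particular site. With (B) the wait is at most 100 steps and about 50 on average; with (A) the wait is geometrically distributed with a mean of 100 and no upper bound.

```python
import random


def random_sites(n_sites, rng):
    """Strategy (A): draw one site uniformly at random at every step."""
    while True:
        yield rng.randrange(n_sites)


def shuffled_sweeps(n_sites, rng):
    """Strategy (B): visit all sites in random order, then reshuffle and repeat."""
    order = list(range(n_sites))
    while True:
        rng.shuffle(order)
        yield from order


def steps_until(site_stream, wanted_site):
    """Count how many steps a stream needs before it first yields wanted_site."""
    for step, site in enumerate(site_stream, start=1):
        if site == wanted_site:
            return step


rng = random.Random(1)
n_sites, wanted = 100, 42  # the "only site #42 of 100 can improve" example
steps_a = [steps_until(random_sites(n_sites, rng), wanted) for _ in range(1000)]
steps_b = [steps_until(shuffled_sweeps(n_sites, rng), wanted) for _ in range(1000)]
print("A: mean", sum(steps_a) / len(steps_a), "max", max(steps_a))
print("B: mean", sum(steps_b) / len(steps_b), "max", max(steps_b))
```

The sketch says nothing about whether a few "important" sites actually exist in the saturation phase; it only illustrates the difference in waiting time if they do.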

@bitbacchus
Member

bitbacchus commented Nov 10, 2021

I am not sure

(1) if I understood your idea completely ;-)
(2) if it really would help: there is likely no such thing as "a few most important sites" left in the saturation phase, because they are only "important" because the other sites fit very well... I would love to find a shortcut but I doubt we'll find one...

Btw. after 50k iterations, the improvement was quite visible in the error plot:

[Image: error plot, "Screenshot from 2021-10-30 22-54-20 (1)"]

@mspngnbrg
Collaborator

A number of suggestions:

  1. rename max_iterations to optimization_epochs? We do not really iterate...
  2. rename target to objective for consistency?
  3. rename plot() to plot_error() or something else more distinguishable from the base::plot() function?

@csim063
Collaborator Author

csim063 commented Dec 5, 2022

A suggestion from Karel Mokany: "...non-random selection of a site in which to change presence-absences", e.g. select the sites that contribute the greatest prediction error more often.
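One way to read this suggestion is error-weighted sampling: each site is drawn with probability proportional to its current error. The Python sketch below is illustrative only. The per-site error vector `site_errors` and the function name are assumptions, and the thread does not say how a per-site error would be computed in spectre.

```python
import random


def pick_site_by_error(site_errors, rng):
    """Pick a site index with probability proportional to its error.

    site_errors is a hypothetical per-site error vector (non-negative numbers).
    Falls back to a uniform draw when every site has zero error.
    """
    if sum(site_errors) <= 0:
        return rng.randrange(len(site_errors))
    return rng.choices(range(len(site_errors)), weights=site_errors, k=1)[0]


rng = random.Random(0)
site_errors = [0.0, 1.0, 1.0, 8.0]  # made-up values for illustration
counts = [0] * len(site_errors)
for _ in range(10_000):
    counts[pick_site_by_error(site_errors, rng)] += 1
print(counts)  # site 3 is drawn most often, site 0 never
```

Note that a site with zero error is never selected under this scheme, so an implementation may want to add a small constant to every weight to keep all sites reachable.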

@mspngnbrg
Collaborator

Add stopping criterion: I suggest a rule like: spectre stops after no improvement in the last N iterations. Another option (less preferred by me) would be: spectre stops if the improvement in the last N iterations was less than M%. The value of M should be relative, since the magnitude of the error [= sum(abs(target - prediction))] depends on the number of sites and species. Is that easy to implement, @bitbacchus?
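Both rules can be expressed as one check on the recorded error history. The Python sketch below is illustrative, not spectre's implementation. `should_stop`, `patience` (N) and `min_rel_improvement` (M, as a fraction) are hypothetical names, and the improvement is measured relative to the best error seen before the last N iterations so that it does not depend on the number of sites and species.

```python
def should_stop(error_history, patience, min_rel_improvement=0.0):
    """Decide whether to stop, given the error recorded after each iteration.

    patience            -- N: how many recent iterations to look back over
    min_rel_improvement -- M as a fraction (0.01 == 1 %); the default 0.0
                           means "stop only if there was no improvement"
    """
    if len(error_history) <= patience:
        return False  # not enough history yet
    reference = min(error_history[:-patience])    # best error before the window
    recent_best = min(error_history[-patience:])  # best error inside the window
    if reference == 0:
        return True  # already a perfect fit, nothing left to improve
    rel_improvement = (reference - recent_best) / reference
    return rel_improvement <= min_rel_improvement


# Rule 1: no improvement in the last 3 iterations.
print(should_stop([10, 9, 8, 8, 8, 8], patience=3))
# Rule 2: improvement over the last 2 iterations did not exceed 1 %.
print(should_stop([100, 99.9, 99.8, 99.7], patience=2, min_rel_improvement=0.01))
```

Using the minimum over each window rather than the last value keeps the check meaningful even if the recorded error is not strictly non-increasing.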


3 participants