6. Licence, Release Notes & Additional Information

cherhaus edited this page Sep 2, 2013 · 1 revision

Troubleshooting

  • In general, if something does not work as expected, first rerun with the option -v. The scripts are quite verbose, and in most cases the output allows a solid guess at what went wrong.
  • Most issues occur while generating persistent memory objects, e.g. due to lack of memory. In that case, fragmented semaphore arrays or memory segments may prevent the generation of further memory objects. If this happens, remove all existing memory segments with fp2mem.pl -destroy, use ipcs -a to list the remaining semaphore arrays and memory segments, and then remove the leftover segments with ipcrm -m and the leftover arrays with ipcrm -s, passing the corresponding shared memory or semaphore IDs.
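
The manual cleanup described above can be sketched as a small dry-run helper. This is only an illustration, not part of ParaSim: the function names are hypothetical, and the parser assumes the column layout printed by the Linux (util-linux) versions of ipcs -m and ipcs -s, where each data row starts with a hex key followed by the ID and the owner. It only prints the ipcrm commands, so they can be reviewed before anything is removed.

```python
import getpass
import subprocess


def parse_ipcs_ids(ipcs_output, owner):
    """Return the IDs of all IPC objects in `ipcs_output` owned by `owner`.

    Assumes util-linux style rows: key, id, owner, perms, ...
    Header and separator lines are skipped because they do not start
    with a hex key.
    """
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("0x") and fields[1].isdigit():
            if fields[2] == owner:
                ids.append(int(fields[1]))
    return ids


def build_ipcrm_commands(shm_output, sem_output, owner):
    """Build (but do not run) one ipcrm call per leftover segment or array."""
    commands = [["ipcrm", "-m", str(i)] for i in parse_ipcs_ids(shm_output, owner)]
    commands += [["ipcrm", "-s", str(i)] for i in parse_ipcs_ids(sem_output, owner)]
    return commands


def print_cleanup_commands():
    """Query ipcs for the current user and print the matching ipcrm calls."""
    owner = getpass.getuser()
    shm = subprocess.run(["ipcs", "-m"], capture_output=True, text=True, check=True).stdout
    sem = subprocess.run(["ipcs", "-s"], capture_output=True, text=True, check=True).stdout
    for command in build_ipcrm_commands(shm, sem, owner):
        print(" ".join(command))  # dry run: nothing is removed here
```

Calling print_cleanup_commands() after fp2mem.pl -destroy lists what is still left for the current user. Filtering by owner is a deliberate choice so that segments belonging to other users or programs are not touched; note that ipcs may truncate long user names, in which case the filter would need adjusting.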

Version Info

V 0.05:

  • Added an option to report an additional (concatenated) data column together with the ParaSim results (e.g. for Smiles output)
  • Added an option for rdkit2parasim.py and Molecule2ParaSim.xml to generate an additional Smiles column for ParaSim reference files
  • The file format no longer requires a mandatory second column BITCOUNT. If the column does not exist, bitcounts are calculated while the file is read.
  • Eliminated a major bug which slowed down ParaSim during startup
  • Harmonized interfaces of rdkit2parasim.py with the other tools
  • For rdkit2parasim.py and Molecule2ParaSim.xml, an ID parameter is no longer mandatory when reading from Smiles. If not set, a column 'Index' will be created.
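
To illustrate the optional BITCOUNT column mentioned in this release, here is a minimal sketch of a reader that falls back to counting bits itself. It is not ParaSim's actual parser: the tab-separated layout, the column names ID and FP, and the encoding of fingerprints as strings of 0 and 1 are assumptions made for this example; only the column name BITCOUNT comes from the notes above.

```python
import csv
import io


def read_fingerprints(handle, id_column="ID", fp_column="FP"):
    """Read (id, bitcount, fingerprint) records from a tab-separated file.

    If a BITCOUNT column is present and filled, its value is used;
    otherwise the bitcount is computed from the fingerprint string.
    """
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        fingerprint = row[fp_column]
        if row.get("BITCOUNT"):
            bitcount = int(row["BITCOUNT"])
        else:
            # Assumed encoding: one character per bit, '1' for a set bit.
            bitcount = fingerprint.count("1")
        records.append((row[id_column], bitcount, fingerprint))
    return records


# Hypothetical input without a BITCOUNT column.
example = io.StringIO("ID\tFP\nmol1\t010110\nmol2\t000000\n")
records = read_fingerprints(example)
```

Using the header to detect the column, rather than counting fields per row, keeps the sketch working when further optional columns (such as Smiles) are present.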

V 0.04:

  • This is an important bugfix release. The persistent memory segment size is no longer static but is adapted to the memory actually used, to avoid depletion of segment addresses.
  • rdkit2parasim.py, Molecule2ParaSim.xml and simsearch.pl now accept not only filenames as query input parameters but also valid Smiles strings.

V 0.03:

  • Allow non-integer structure IDs
  • Architecture: Externalize shared procedures (i.e. parsers) into module
  • Determine and control fingerprint length from fingerprint itself (no option -l)
  • Check query vs. reference fingerprint types to avoid mismatches

V 0.02:

  • Proof of concept

Development Roadmap

  • Run a ParaSim process for zero response times
  • Add a separate tool for statistical analyses of datasets (e.g. full histograms)
  • Accept hex format for fingerprints
  • Optionally report progress if output is redirected to file
  • Read fingerprints as blocks
  • Avoid manual recompilation for different processor architectures
  • Additional similarity indexes
  • Try a Windows version using Win32::MMF for shared memory and OpenMP for multithreading
  • Different input (FPS) and output formats

ParaSim vs. ChemFP

Andrew Dalke of Dalke Scientific develops and provides ChemFP, an open-source fingerprint toolbox optimized for fast similarity searches, which is currently about two to five times faster than ParaSim (see http://code.google.com/p/chem-fingerprints/). Nevertheless, ParaSim has continued to be developed as a separate project with the specific goal of using persistent memory objects for frequently repeated large-scale similarity searches. At a later stage of ParaSim's development, an attempt will presumably be made to integrate ChemFP function calls into ParaSim. If ChemFP itself should one day make use of persistent memory objects, further development of ParaSim may become obsolete.


Acknowledgements

Algorithms in the current version of ParaSim are inspired by, and with kind permission contain, concepts for speed-optimized bitcount calculations presented by Andrew Dalke of Dalke Scientific (http://www.dalkescientific.com; see the detailed documentation).

Thanks to Thomas Fahle (http://www.thomas-fahle.de) for the introduction to the concept of IPC::ShareLite.


Licence

To allow the use of ParaSim in different collaboration scenarios with academic or industrial partners, the source code of the programme itself and of all present and future supporting scripts and material is released under the GNU General Public Licence v3.