SPIKE is a tool box, which natural interface is python. You will find here some advices on using SPIKE.
SPIKE mostly defines a dataset NPKData
structure (usually called d
in the doc), which is composed of:
-
a large n-dimensional real
numpy
nparray object that contains the actual spectroscopic data- modified by all the processing methods, called
buffer
and accessed with thed.get_buffer()
andd.set_buffer()
methods
- modified by all the processing methods, called
-
one Axis() object per dimension, which contains all the characteristics along the given axis (calibration, data-type, size, etc…)
d.axis1
for a 1D spectrumd.axis1
andd.axis2
for a 2D spectrum - with thed.axes(i)
convenient method
-
some attibutes general to the dataset, in particular the
d.params
which contains a copy of the acquisition parameters, structured as a dictionnary -
a large (and expandable) set of methods that act on the dataset
the same organization is used regardless we are handling NMR Orbitrap or FTICR Data, however in each case, the Axis() object is just overloaded with the corresponding version for each analytical method
These several kind of NPK dataset are defined in spike.NMR
spike.FTICR
and spike.Orbitrap
which are typically imported in the beginning of a program.
These codes defines the basic objects NMRData
FTICRData
OrbiData
The initialisation methods of these object contain several options (creating from a file, a buffer, etc...), check the documentation.
An even easier way of creating a dataset is by importing from a spectroscopic data-format.
The formats handled by the program are defined in the File
folder, with format from NMR, FTICR and Orbitrap.
The whole program is organized to spare as much as possible memory space, so that the program is fast, and permits to handle large datasets in-memory. Because of this, the methods acting on the data (for instance Fourier Transform, or any other modification) actually modifies the data in-place.
Also, all methods attached to the NPK dataset returns the dataset itself, after modification (or no modification, even .display()
returns the dataset).
This permits a "pipeline" way of processing datasets which is extremely convenient.
so if you want to do:
d = Import_1D(myfile) # use the specific importer (for instance)
d.hanning() # apodisation
d.zf(2) # zero-filling
d.rfft() # real fast Fourier transform
d.modulus() # take the modulus
d.display() # show the results
you can do as well
d = Import_1D(myfile).hanning().zf(2).rfft().modulus().display() # as a pipeline
if you are just interested in the picture, you don't even have to store the result
Another very typical usage is to Import the dataset and just test some feature without modifying it.
For this, the .copy()
which duplicates the whole objet, comes very handy:
d = Import_1D(myfile) # use the specific importer to load the dataset
d.copy().hamming().rfft().display() # and use copy() in the pipeline to look at it without modifying d
This construct uses a minimal amount of memory, and does not clutter your variable space.
Along with the data held in the dataset, there is an Axis
object for each dimension of the dataset.
They hold all the metadata needed to characterize the unit in which the data are presented.
They are typed, and all heritate from the basic Axis()
object and there are several types:
NMRAxis
to handle NMR experiments, knows unit second(s) Hertz(h) and ppm(p)FTMSAxis
to handle FTMS experiments, knows unit second(s) Hertz(h) and m/z(mz)FTICRAxis
to handle FTICR experimentsOrbiAxis
to handle Orbitrap experimentsTimeAxis
to handle time evolution (for instance for LC-MS or LC-NMR) knows msec(ms), sec(s), min(m), and hours(h)LaplaceAxis
to handle Laplace transform, (for instance DOSY), knows damping(d)
all axes also know about index(i) which is the position in the buffer and about current(c) which is the currently selected unit with the .set_unit()
method, or with the .currentunit
attribute
The axes permit to convert from one unit system to another freely and back, with methods called .ytox(val)
which computes the value of val
from unit y
into unit x
, and .xtoy()
which performs the opposite.
All these methods use the internal calibration defined by appropriate attributes.
For instance in NMR:
d.axis1.ptoi(5.5) # returns the index at which the 5.5 ppm value is located
d.axis2.itoh(1234) # returns the frequency at index 1234 along F2 for a 2D
d.axis1.htop(1000) # returns the ppm value located at 1000 Hz
or in FTMS (FTICR as well as Orbitrap)
d.axis1.mztoi(138.85) # returns the index at which the m/z 138.85 is located
d.axis2.itos(1234) # returns the t1 time of the transient at index 1234 along F2 for a 2D
d.axis1.htomz(1000) # returns the mz value located at 1000 Hz
These are just a few examples.
Note that the i
(index) unit does not have to be integer. So if you want to access the buffer at this value, you have to do int(d.axis1.ptoi(5.5))
or even better int(round(d.axis1.ptoi(5.5)))
In FTICR, the ML1
ML2
ML3
parameters in Bruker are imported as CalibA
, CalibB
, CalibC
attribute of a FTICR Axis()
(with the caveat that Spike uses a homogeneous convention, while Bruker inverses ML2
depending on the fact that ML3
is 0 or non-null.
Display of 1D and 2D experiment is performed with the .display()
method.
It has many optional parameters.
I copy here the documentation:
display(self, scale=1.0, autoscalethresh=3.0, absmax=None, show=False, label=None, new_fig=True, axis=None,
zoom=None, xlabel="_def_", ylabel="_def_", title=None, figure=None,
linewidth=1, color=None, mpldic={}, NbMaxVect=None)
scale allows to increase the vertical scale of display,
in 2D if "auto" will compute a scale so the first level is located at at autoscalethresh sigma
autoscalethresh used for scale="auto"
absmax overwrite the value for the largest point, which will not be computed
display is scaled so that the largest point is first computed (and stored in _absmax),
and then the value at _bsmax/scale is set full screen
show will call plot.show() at the end, allowing every declared display to be shown on-screen
useless in ipython
label add a label text to plot
xlabel, ylabel : axes label (default is self.currentunit - use None to remove)
axis used as axis if present, axis length should match experiment length
in 2D, should be a pair (xaxis,yaxis)
new_fig will create a new window if set to True (default) (active only is figure==None)
if new_fig is a dict, it will be passed as is to plt.figure()
figure if not None, will be used directly to display instead of using its own
figure is actually a "matplotlib.axes" object, not a "matplotlib.figure"
zoom is a tuple defining the zoom window (left,right) or ((F1_limits),(F2_limits))
defined in the current axis unit (points, ppm, m/z etc ....)
linewidth: linewidth for the plots
mpldic: a dictionnary passed as is to the plot command
NbMaxVect: if set to a number, will limit the number of displayed vectors to that number by decimating the data (in 1D only so far)
can actually be called without harm, even if no graphic is available, it will just do nothing.
Datasets can be stored to disk in several formats:
save_txt(name) .load_txt(name)
creates a plain text file with no metadata - available only of 1D datasetsave_csv(name) .load_csv(name)
creates a csv file with the axis metadata - available only of 1D datasetsave(name) .load(name)
creates the historic Gifa / NPK compatible file (1D or 2D) with all axes metada (used for NMR)save_msh5(name)
creates a msh5 (HDF5 format) file for 1D or 2D MS experiment - load is performed at creation
msh5 and Gifa file, can be also loaded with the creator.
In addition to a large set of methods, it is possible to do dataset arithmetics:
say you have two datasets called d1
and d2
you can do the following operations:
# these commands can be pipelined
d1.mult(2) # multiplies in place the content of d1
d1.add(-1.2) # adds in place 1.2 to the content of d1
d1.add(d2) # adds in place the content of d2 to d1
# will use the numpy rules for datasets of different dimensions
d1.mult(1+2j) # multiplies in place the content of d1 by the complex number 1+2j
# note that spikes keep track of the complex state of the dataset,
# that last command will fail if d1 is real
# these cannot
d3 = 2*d1 # creates a new dataset d3 which content is the double of d1
d3 = d1 - 1 # creates a new dataset d3 which content is d1 - 1
# you can even access single points:
x = d1[123] # for a 1D
y = d2[123,456] # for a 2D
d2[654,321] = y # etc.
Fourier transform
# time to frequency
.fft() .rfft() .fftr() # complex to complex / real to complex / complex to real
# frequency to time inverse Fourier transform
.ifft() .irfft() .ifftr() # and their inverse
The real/complex state of the dataset is maintained by spike, and should not changed. In 2D, the FT methods ask for the axis on which to act.
Note that d.ifftr().phase(90,0).fftr()
performs the Hilbert transform of d
And many other things - check the documentation.
Besides the arithmetic, the main buffer is accessible through the .get_buffer()
and .set_buffer()
methods.
.get_buffer()
will actually return a plain numpy array typed relative to the axis .type
so that for a complex data-set, this command returns a complex array, whereas for a real data-set the array is real.
.get_buffer()
has an optional argument copy
which is False
by default. When False
, only an Image of the data is returned, with no memory location being actually copied, so it does not take memory space.
Because of this mechanism, operations like
data.set_buffer( 2*data.get_buffer() )
is actually performed in-place in the main buffer, and does not use any additional memory space. However, with data being typed real, doing:
data.set_buffer( 2j*data.get_buffer() )
would create a new buffer, as a 2j
if the complex number
The main buffer is actually stored in the attribute .buffer
of an NPKData. Thus .buffer
is a standard numpy array, so you can actually play with it:
x = data.buffer[i]
If the data are complex along one axis, then buffer is still real, but real and imaginary values are interleaved.
The state real / complex is conserved in the attributes d.axis1.itype
(0/1)
Using directly buffer
is strongly discouraged, because the data-set is not checked. This syntax is used in some parts of the internal code for optimization reasons.
For a 2D experiment, the complex state of the data is handled independently along each axis.
This is conserved in the attributes d.axis1.itype
and d.axis2.itype
Being complex along both axes means that the dataset is actually in hypercomplex mode (check DOI: 10.1016/0022-2364(88)90036-4 for details).
This permits to apply independent phase corrections along each axes.
SPIKE is an open-source program, this means that external contributions are welcomed.
If you believe your improvement is useful for other people, please submit a pull request
.
If you consider adding some new feature, it is probably a good idea to implement it as a plugin.
The code contains already quite a few plugins, some are quite sophisticated - see Peaks.py
for instance which implements a 1D and 2D peak picker, as well as a centroid evaluation and a full listing capability.
You can check also fastclean.py
for a very simple plugin, or wavelet.py
for a plugin relying on an external library which has to be installed.
- Spike contains many tools, most of the basic function for data interaction are found in the
NPKData.py
master file; utilities are also scattered in theutil
module. Use them, life will be easier for the other users. - Please write tests, even for the plugins ! We use standard python
unittest
, so nothing fancy. All the tests are run automatically every night (code isTests.py
), so it will detect rapidly all potential problem.
The main program is NPKData.py
, which defines NPKData object on which everything is built.
Spectroscopies are defined in the NMR.py
, FTICR.py
and Orbitrap.py
code, which sub class NPKData.
Many programs contain routines tests (in an object unittest) that also serve as an example of use.
The code goes through extensive tests daily, using the unittest
Python library. However, many tests rely on a set of tests data-sets which is more than 1Go large, and not distributed here.
a small description of the files:
-
NPKData.py the main library, allows all processing for any kind of experiments (1D, 2D and 3D) to be used as a library, in a stand-alone program or in IPython interactive session
-
NMR.py The NPKData library adapted to NMR processing
-
FTMS.py an extension of NPKData generic to Orbitrap and FTICR
-
FTICR.py an extension of FTMS for processing FT-ICR datasets (1D and 2D)
-
Orbitrap.py an extension of FTMS for processing Orbitrap datasets (1D)
-
processing.py a stand alone program, written on the top of FTICR.py, allowing the efficient processing of FT-ICR 2D datasets, with no limit on the size of the final file Produces multi-resolution files syntax :
python -m spike.processing param_file.mscf
-
Algo
contains algorithms to process data-sets (MaxEnt, Laplace, etc...) not everything active ! -
Display
a small utility to choose either for regular Matplotlib display of fake no-effect display (for tests) -
File
Importers for various file format for spectrometry, as well as the HDF5 SPIKE native format. -
plugins
Tools automatically plugged in NPK kernel : display utilities, urQRd algorithm and various other tools. -
Miscellaneous
"en vrac" -
util
set of low-level tools used all over in the code -
v1
a library implementing a partial compatibility with the NPKV_V1 program -
example of configuration files
- process_eg.mscf
- test.mscf
-
and various utilities
- NPKConfigParser.py
reads .mscf files - NPKError.py
generates error msg - QC.py
Quality Check - Tests.py
runs all tests - dev_setup.py rolls a new version
- version.py
defines version number
- NPKConfigParser.py
Current Active authors for SPIKE are:
- Marc-André Delsuc
madelsuc -at- unistra.fr
- Laura Duciel
laura.duciel -at- casc4de.eu
Previous authors:
- Christian Rolando
christian.rolando -at- univ-lille1.fr
- Lionel Chiron
Lionel.Chiron -at- casc4de.eu
- Petar Markov
petar.markov -at- igbmc.fr
- Marie-Aude Coutouly .
Marie-Aude.COUTOULY - at- datastorm.fr
Covered code is provided under this license on an "as is" basis, without warranty of any kind, either expressed or implied, including, without limitation, warranties that the covered code is free of defects. The entire risk as to the quality and performance of the covered code is with you. Should any covered code prove defective in any respect, you (not the initial developer or any other contributor) assume the cost of any necessary servicing, repair or correction.
Downloading code and datasets from this page signifies acceptance of the hereunder License Agreement. The code distributed here is covered under the CeCILL license : http://www.cecill.info/index.en.html