forked from hoffmangroup/segway
-
Notifications
You must be signed in to change notification settings - Fork 0
/
NEWS
739 lines (622 loc) · 32.1 KB
/
NEWS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
3.0.4:
* segway: fix selection of too many potential virtual evidence windows by 1bp
* segway: fix --recover option for annotate task
* segway: add SegRNA documentation
3.0.3:
* segway: fixed posterior label output not working with resolutions greater than 1bp
* segway: fixed bug when using virtual evidence with higher resolutions
* segway: enable virtual evidence for posterior tasks
3.0.2:
* segway: added readthedocs configuration file
* segway: added conda environment file
* segway: updated GMTK download locations in documentation
* segway: change GMTK to a runtime only dependency
3.0.1:
* segway: fixed regression with the prior-strength option
* segway: added manifest file for easier installation on older systems with Python
3.0:
* segway: added modular interface for each task. subtasks include init, run, finish,
and run-round for training specifically
* segway: added cluster specific max job polling times (thanks to Rachel Chan
for diagnosis)
* segway: added virtual evidence as an option to the Segway model
2.0.5:
* segway: fixed bug causing identify to fail with archives of different track numbers
* segway: fixed python 3 encoding issue arising when --BigBed is selected
* segway: removed sleeping on main segway process for multiple training instances
* segway: when running locally segway will no longer initially sleep training threads
* segway: fix RestartableJob priority sorting when memory requirements match
2.0.4:
* segway: gaussian mixtures no longer split with only one component
* segway: priors and mixture components are now derived from previous EM
iterations instead of from input master
* segway: the DPMF table is now stored as constants instead of a fixed table
allowing splitting and vanishing components to resize correctly
* segway: updated test touchstones for newer GLIBC versions
* segway: SLURM: added support for 18.08 (https://github.com/natefoo/slurm-drmaa)
2.0.3:
* segway: fix fatal TypeError during --dry-run
* segway: fix temporary file creation of observations for multiple instances
* segway: Added compatibility for python 3
* segway: No longer requires forked-path, now needs path.py>=10
2.0.2:
* segway: fix window selection when using multiple genomedata archives
* segway: add window.bed output to cross reference window index to chromomsome
region
* segway: add instance specific job log files to avoid log overwrites
* segway: fix regression of stdout not being captured from training jobs
* segway: update support for versions of numpy > 1.14 by adding legacy scalar
and vector text printing
2.0.1:
* segway: fix training results not being parsed properly due to validation
defaults
2.0:
* segway: Add ability to use validation set to choose winner for training
(activate using --validation-coords or --validation-fraction options)
(thanks to Rachel Chan)
1.4.4:
* segway: Add mixture of Gaussians and variance floor functionality to Segway
(thanks to Max Libbrecht, Oscar Rodriguez, and Rachel Chan)
1.4.3:
* segway: fix "bad interpreter: Text file busy" errors reported by submitted
jobs
1.4.2:
* segway: training is now submitted as a segway-task
* segway: all observations are now written out to temporary storage. As a
result, the --observation option has been disabled
* segway: disabled the erroneous --dinucleotide option
* segway: Change Segway to raise warning if running Segway in identify mode
with non-zero verbosity.
1.4.1:
* segway: fix --bigBed for assemblies with alt chromosomes not being sorted
correctly
* segway: --num-minibatch-windows can now be seeded with SEGWAY_RAND_SEED and
will select the same windows with when given the same seed
(thanks to Rachel Chan)
* segway: fix --prior-strength option
* segway: local job submission now appends error logs instead of rewriting them
* segway: add resolution functionality for semisupervised mode
(thanks to Rachel Chan)
* segway: add 'annotate' as an alias for 'identify' (thanks to Rachel Chan)
* segway: fix memory leak due to DRMAA job templates not being deleted
* segway: change queueing functionality such that segway queues a shell file
containing the to-be-queued arguments, instead of queueing the arguments
themselves (thanks to Rachel Chan)
* segway: change output message to specify whether the job is queued or
run locally (thanks to Rachel Chan)
* segway: work around a GMTK bug when indexing a genomic region with a number
greater than 9999 (thanks to Rachel Chan)
1.4.0:
* segway: added support for --num-minibatch-windows, which causes Segway
to train on random subset of windows at each iteration in order to
enable training to touch every genomic position while maintaining fast
iterations. (thanks to Max Libbrecht)
* segway: remove optparse in favour of using argparse instead
* segway: fix --split-sequences specifying where to split GMTK frames instead
of by base pair length
1.3.4:
* segway: set minimum segment length default to ruler length
* segway: remove simpleresubmit test from being tested with all tests by
default due to memory requirement differences between platforms
* segway: change tests to use $testdir instead of $TMPDIR to avoid setting new
base directory for segway-wrapper.sh
* segway: add and fix better job handling for segway-task (identify)
* segway: add drmaa handling for "IBM Platform LSF"
* segway: fix not using user specified ruler_scale when set to 10
1.3.2:
* segway: add GMTK minimum version checking (1.4.2) to support better GMTK job
handling
* segway: add a job resubmission test due to out of memory issues
* segway: add a smarter job resubmission due to out of memory issues
* segway: add a better error message for out of memory issues vs actual process
issues
* segway: update documentation on debugging GMTK job issues
* segway: add a minimum job attempt of 2 before stopping all jobs
* segway: add a new bad input testcase for proper job termination
* segway: add a new simple semisupervision test case
* segway: clarify documentation on resolution support
* segway: simplify installation instructions
* segway: fix help not printing when no arguments are given
* segway: add resolution test case
* segway: clarify and add GPLV2 license
* segway: add better test result capturing
1.3.1:
* segway: fix numpy array casting bug in posterior with newer versions of
numpy (>= 1.10)
* segway: fixed case where a track was specified that existed in more than 1
genomedata
* segway: ruler default is now 10x resolution. Default resolution is still 1,
so default ruler is 10
* segway: helpful error messages when segment table boundaries or the resolution
is not divisible by the ruler
1.3.0:
* segway: fixed improper version reporting from --version option
* segway: added semi-supervsied mode support for soft assignment. See documentation
for more details.
* segway: added the ability to use multiple genomedata archives
* segway: Added support for the SEGWAY_NUM_LOCAL_JOBS environment variable,
which controls the number of concurrent processes used when
SEGWAY_CLUSTER=local is set (default: 32). (thanks to Max Libbrecht)
1.2.2:
* segway: updated test benchmark for better compatability with Torque/PBS
* segway: fixed a bug when the 0-index training instance wins and fails to
copy the result. Segway will now store the resulting winning likelihood file
and likelihood tab file without the instance index as a suffix.
* segway: removed exported bash functions in for job environment submission
after the shellshock patch (functions starting with "BASH_FUNC") since it can
cause syntax errors when trying to import.
* segway: added useful error messages for unimplemented features of posterior
task
1.2.1:
* segway: added new 'data' test baseline with GMTK 1.1.0
* segway: added new 'simplesubseg' test to test subsegment labels
* segway-layer: fixed bigBed conversion when blocksize==0 and added
segway_layer_bigbed test (thanks to Jay Hesselberth)
* segway: updated documentation to clarify --clobber option in relation to
model and structure command line options
1.2.0:
* segway: made matching against unmodified vs. modified partitions output from
gmtk more lenient to match against
* KNOWN ISSUE: Segway will produce incorrect segmentations if the
tracks are not specified in the order they appear in the genomedata
archive providing the data.
* now requires a new version of GMTK (>= 1.0.0 80c1d70cea9d
(ticket161-2)) for increased speed
* now requires Python >=2.6 (released 3.5 years ago) and <=3.0
(although we will switch to Python 2.7 when we need it, so if you
are upgrading, go all the way now)
* now requires Genomedata >=1.3.1
* default setuptools version now 0.6c11
* segway: now has experimental support for running without any cluster
system or drmaa. By default if nothing is available, or use
SEGWAY_CLUSTER (thanks to Max Libbrecht)
* segway: now has experimental Torque support. PBS and PBS Pro will
probably work with some changes--please e-mail Michael if you want
to help with identifying the necessary changes (thanks to Jay Hesselberth)
* segway: SEGWAY_CLUSTER environment variable allows manual
specification of a cluster system (e.g. export SEGWAY_CLUSTER=local)
* segway: new escaping method from Genomedata tracknames to GMTK
tracknames allows any arbitrary Genomedata trackname, but this means
that previous files will be incompatible, since "_" is now replaced
by "_5F" and "." by "_2E" (formerly both of these were "_", which
was ambiguous)
* segway: allow concatenated segmentation of tracks with --trackname
track1,track2, which gives you only one set of parameters. you may
need to adjust existing train.tab files to use track_specs instead
of include_tracknames because of other necessary changes
* segway: LSF: fix bug where wrapper script sometimes did not delete
temporary observation files after a crash
* segway: LSF: no longer generates mktemp warning messages
* segway: more informative error message when supervision labels overlap each other
* segway: segway.sh has cd command in addition to segway command
* segway: MC/MX/REAL_MAT direct parameters are no longer trained
(MEAN/COVAR/DENSECPT parameters are)
* segway-layer: add --bigBed option, which works if you have
bedToBigBed in your path
* segway: add --bigBed option which calls segway-layer --bigBed
* segway: removes PYTHONINSPECT from environment for sub-tasks, which
caused identify mode failures before
* segway: some changes in the way structure files work
* segway-layer: fix issue #19. Now doesn't override previous colors
when mnemonics aren't supplied (thanks to Jay Hesselberth)
* segway-layer: add --no-recolor option which doesn't override colors at all
* segway-winner: fix issue #21. Now works on Python 2.5.1
* installation: always require drmaa. setuptools extras_require wasn't that useful in practice
* installation: include SLURM support if you have drivers/slurm.py installed
* test: run pyflakes first
* test: use fixed random seed to generate input.master every time
* test: when files differ, output is in a form that makes it easier to diff
* test: update for new GMTK
* prereqs: hgtools is now a prerequisite
* segway: add --output-label option to control segmentation output when using sublabels
* segway: semi-supervised mode no longer causes assertion error
* segway: posterior task now compatible with sublabel output mode
1.1.0:
* segway: now again includes posterior output in bedGraph format
(thanks to Avinash Sahu)
* segway: --old-directory renamed to --recover
* segway: --recover now supports identify as well as train. less
clunky and more consistent than the old recipe for training recovery
included in the docs (posterior still not supported)
* segway: train.tab is now optional, so you should be able to run
identify on pre-Segway 1.0.0 training directories that don't have
it, as long as you specify all the same options as you did before
* segway-winner: new command to pick winning parameters in interrupted training run
* segway: train.tab now includes input master filename in multi-instance training
* segway: changed memory progression edge error to be more easily understandable
* segway: when some error kills one training instance, there is now a more detailed error at end
* segway: remove --keep-going, which hasn't done anything in a few versions
* test: filenames are sorted before comparisons
* test: now includes posterior test
* test/README: added description of test system
* test: training now only goes through two rounds to speed the tests
1.0.2:
* segway: train.tab is now sorted for easier test comparison
* segway: eliminate reporting of "info criterion" (which was flaky) in likelihood.tab
* test: automated testing framework in test/test.sh
1.0.1:
* segway: fix various option setup errors
* setup: add description
* setup: fix download URL
* docs: fix quickstart errors
1.0.0:
For the first official release, I have improved the user interface in
a number of ways. Segway should be much simpler to run. However, the
command-line interfaces have changed, so your scripts will have to
change as well.
* segway: change command-line signature. Usage is now
segway train GENOMEDATA TRAINDIR
segway identify GENOMEDATA TRAINDIR IDENTIFYDIR
segway posterior GENOMEDATA TRAINDIR IDENTIFYDIR
segway identify+posterior GENOMEDATA TRAINDIR IDENTIFYDIR
Accordingly, have removed --no-train, --no-identify, --no-posterior, --directory
* segway: change --random-starts to --iterations
* segway: training now writes important file locations hyperparameters
into TRAINDIR/train.tab
* segway: idenfication now automatically runs segway-layer as well
* segway: logs/run.sh is no longer flushed immediately. UUID is available from logs/segway.sh
* docs changed accordingly
* docs: add quick start tutorial to top
0.2.7:
* all: display hint when trying to run on cluster
* segway-layer: add --track-line-set option
* segway-layer: produce layers for every specified mnemonic, even if empty
* segway-layer: reverse previous ordering, so lines show up in genome
browser in same order as mnemonics file
* segway-layer: ignores lines that start with #
0.2.6:
* segway: SGE: fix bug where Viterbi jobs would immediately crash with
a pthreads error when using recent versions of PyTables/numexpr
0.2.5:
* segway: fix bug where aborted jobs were not returning maxvmem/vmem info
* segway: make RuntimeError for edge of memory usage progression print
troubleshooting information below traceback
* docs: add more troubleshooting info
0.2.4:
* segway: print error filename when failing due to end of memory progression
* segway: LSF: add -E /bin/true to submissions to avoid trivial job
failures on broken hosts
* docs: fix a mistake on description of explicit length distribution
0.2.3:
* segway: new --tracks-from option allows specification of a filename
* segway: LSF: now selects and declares tmp disk space usage. This
perhaps could be extended to SGE with some cluster administrator effort.
* segway-layer: fix bug where not all segment labels were in an input file
* segway: with --distribution=asinh_normal, starting covariance
parameters are now asinh(variance). This fixes a bug with negative
variances
* segway: fix bug where --split-sequences was ignored
* segway: use Genomedata-reported metadata for whole genome rather
than accumulating by chromosome
* prereqs: optbuild must now be >= 0.1.10
0.2.2:
* increase task wrapper robustness
* bug fixes
* docs fixes
* setup.py changes
0.2.1:
* docs: fix errors
* add sge_setup.py
* installation: add explicit PyTables installer
0.2.0:
* segway: add hierarchical segmentation
* segway: fix bug where only one random start is allowed
* segway: change "chunk" to "window" to avoid confusion with GMTK usage of chunk
* segway: updated system check to consider GE* _and SGE*_ systems as SGE.
* segway: removed dependence upon genomedata._util
* everything: anything that supported gzipped I/O now may support "-" to mean stdin/stdout
* segway-layer: now supports no arguments, which means use stdin/stdout
* segway-layer: now supports input files with no track line
* segway-layer: now supports non-integer segment labels
* GMTK-installation: fixed default shell to be /bin/bash as 'declare' call in
installation script caused GMTK installation to fail.
* installation: move from path.py to forked-path
0.1.21:
* segway: add --segtransition-weight-scale
0.1.20:
* segway: limits the number of jobs dumped into the queuing system at
once to avoid overloading it
* segway: increase robustness by adding a ulimit wrapper script to
explicitly limit the memory used for each job and erase temporary files
* segway: add --old-directory option for identify mode recovery of a
previous partial run
* segway-task: bugfix: won't throw a nested exception when temporary
files were not created
0.1.19:
* segway: now set CARD_FRAMEINDEX dynamically to allow for
--split-sequences to change between train and identify
* segway: --num-segs now called --num-labels
* LSF: fix bug related to: now appends to output/error instead of overwriting
0.1.18:
* LSF: now appends to output/error instead of overwriting
* recommit dumpnames.list removal (bugfix)
0.1.17:
* now requires genomedata >= 0.1.6
* remove dumpnames.list
* by default check all jobs every 30 min instead of every 54 min
0.1.16:
* now requires GMTK version 20091016
* segway: a change in the semantics of --trainable-params for
training. If it is specified and the file exists, then it will be
used as a parameter file in the first round of training. Before it
was clobbered (with --clobber), or training was disabled.
* segway: allow multiple sets of parameters with --trainable-params,
one for each random start. This together with the previous change
will allow for the easy continuation of a multithread training run.
* segway: changed error text when wrong number of arguments provided
* segway: fixed bug that causes failure in command-line logging during
identify phase
0.1.15:
* now requires GMTK version 20091016
* segway: more informative error on memory progression failure
* segway: now works on LSF even when LSF_UNIT_FOR_LIMITS has been changed
* segway: add jobname to "queued" diagnostic messages
* segway: log/res_usage.tab becomes log/jobs.tab
* segway: add jobid, jobname to jobs.tab
* segway: eliminate core dumps of jobs in LSF
* segway: make job abnormal error handling more robust
* segway: add log/details.sh to give all details of queued commands
* segway: queue longest jobs first
* segway: changed way GMTK model is specified for speedup
* segway: LSF: select only hosts with sufficient memory
* segway: eliminate 10 MB memory guard for declared usage versus crash usage
* docs updates
0.1.14:
* there is now a Segway issue tracker at http://code.google.com/p/segway-genome/
file your bugs and feature requests there
* now requires GMTK version 20091004
* now requires drmaa-python >= 0.4a3
* segway: off-load identify file creation to the subtasks
* segway: --split-sequences now called --max-frames, reenabled for all tasks
* segway: by default, split sequences instead of throwing errors on long sequences
* segway: CARD_FRAMEINDEX is now based on --max-frames value
* segway: supervision and dinucleotide tracks are temporarily disabled
* segway: block e-mail report creation by cluster
* segway: fixes for LSF support
* segway: disabled a lot of diagnostic noise
* segway: default distribution is now asinh_norm
* segway: will now resubmit jobs in a cohort before all of the other
jobs are done. this increases the load on the cluster manager, so
job completion is only checked every 60 s per thread
* segway-res-usage: remove
0.1.13:
* segway: adding switching weights
* segway: restoring weighting to global number of datapoints in a track
* segway: now supports Platform LSF with FedStage DRMAA for LSF
* segway: moved SGE-specific code into modular driver architecture
* segway: --resolution option allows downsampling for speed
* segway: --ruler-scale option allows specification of ruler scale
(formerly always 10)
* segway: start quoting words with spaces in log/run.sh
* genomedata >0.1.2 is now a prerequisite
* segway: identify BED output now names tracks segway.<uuid> in order
to prevent conflicts
* segway-layer: automatically recolor in groups defined by period
(such as 1.3) as well as alphabetical characters (A3)
* segway: fix bug: --dry-run causes traceback during training
* segway: --distribution=arcsinh_norm becomes --distribution=asinh_norm
* segway: now creates log/segway.sh to report arguments from which Segway was called
* segway: now prints out Segway distribution version to log files
* segway: with only one random start, now runs in a single-threaded mode
0.1.12:
* segway: LEN_SEG_EXPECTED becomes 100000
* segway: P(segTransition(0) | segCountDown(-1), seg(-1)) is now
weighted with scale which can be set through
-DSEGTRANSITION_WEIGHT_SCALE (not exposed to user, but you can hack
include files)
* segway: fixed --semisupervised bugs
* segway-layer: add --mnemonic-file option
* BED parsing now handles empty lines by ignoring them instead of
generating an error
* docs changes
0.1.11:
* optbuild >0.1.6 is now a prerequisite
* totally different memory management regime that eliminates old
prediction stuff, and instead relies on trying at 2 GiB, 4 GiB, etc.
until failure. globally shares minimum values for each thread
* temporary: minimum segment length is always at least 10
* temporarily disabled --split-sequences
* new prereq: switch from old DRMAA module to new drmaa module
* new prereq: optplus 0.1.1
* remove old "state" random variable from default structure
* BIC is now defined as -(2/n) ln L + k ln N where n is num of bases
and N is num of sequences
* new segway-layer program changes the format of segway.bed.gz to
something that is more intuitive for large numbers of labels, using
thick/thin lines
* segway: remove now-unused --resource-profile option
* segway: add --seg-table option to describe segment hyperparameters,
including different minimum and maximum segment lengths for each label
* remove --min-seg-len
* added some robustness to determining executable paths for queue
submission, eliminating calls to bash for sub-jobs
* segway: add --exclude-coords option, works like --include-coords
* segway: add arcsinh-norm distribution
* segway: disabled track weighting
* segway: add --semisupervised=FILE option to get semisupervision
labels from a BED file
* segway: add --drm-opt=OPTION option to allow specification of an
arbitrary distributed resource manager option (e.g. for SGE -p sets
priority, -l sets a resource requirement, etc.)
* segway: log memory and CPU usage to log/res_usage.tab
* segway: segway.inc now referenced with more relative paths
0.1.10:
* fix off-by-one error in declaring CARD_SEGCOUNTDOWN
0.1.9:
* replace state min seg len regime with one based on a ruler and segCountDown
* default: don't train any deterministic CPT
* stop creating input.master file before training when you have
multiple random starts, and are going to be creating input.0.master,
etc.
* segway.inc is changed so that CARD_SEG is now overridable. If you
have an old segway.inc you must replace the CARD_SEG bits with
the #ifndef CARD_SEG segment of a new segway.inc
* new classmethod segway.run.Runner.fromoptions() to make it easier to
subclass while keeping the same command-line interface.
* fix bug in calculating BIC, reported in log likelihood logs
* default triangulation filenames now include the number of labels in the filename
* triangulation filenames now go into triangulation/ directory
* fixed a regression in 0.1.8: now copies final params file to
params.params again
* segway now implements supports for a range argument to --num-segs
0.1.8:
* colorbrewer package is now a prerequisite
* segway: now produces BED9 output instead of BED4 output, with RGB
colors from the ColorBrewer Dark2 scheme (Colors from
www.ColorBrewer.org by Cynthia A. Brewer, Geography, Pennsylvania
State University.)
* segway: fixed error when attempting to copy likelihood files in the
case of only one random start
* segway: make error more explicit if you forget to set --num-segs=N
if N is greater than 3 in the supplied model file
* packaging: will not install as zipped egg as a temporary measure to enhance debugging
0.1.7:
* segway: add --dont-train option to allow specification of a file with list of variables not to train
0.1.6:
* no changes, this is to fix a packaging/tagging problem
0.1.5:
Currently broken
* segway: --num-segs for a range does not work
New features/bugfixes:
* segway: allow posterior with --num-segs greater than 2
* segway: hard maximum of 2,000,000 for sequence length.
* segway: --split-sequences will split sequences below 5,000,000
* segway: if --num-segs was used and segway generated the input.master
file, then use the Bayesian Information Criterion instead of likelihood to optimize parameters
* segway: now copies selected params.params and input.master rather than moving
* segway: bugfix to allow training to run
* segway: enhancement of the memory prediction algorithm
* segway: now spins off segway-task for Viterbi jobs, which produces BED output in parallel
(the segway-task interface is undocumented and subject to change)
Documentation:
* there is now some documentation in doc/segway.rst (in
reStructuredText format, a lightweight markup format)
0.1.4:
* genomedata>0.1.0 package is now a prerequisite
* optplus package is now a prerequisite
* 20090302 version of GMTK is now a prerequisite
* genomedata-load-data, genomedata-load-seq, genomedata-name-tracks, genomedata-save-metadata: moved to genomedata package
* segway: changed args from a number of h5filenames to a single genomedata directory
* segway: added extra discrete variable called "state" to the default
model. seg is now a child of this. transitions all occur state to
state rather than seg to seg
* segway: start_seg -> state_state conditional probability table is no longer generated
randomly, now reflects a geometric distribution with expected
segment length 10000
* segway: start_seg -> start_state is no longer trained
* segway: seg_seg -> start_state is no longer generated randomly, instead starting
probability defaults to 1/NUM_SEGS for each segment
* segway: forces a hard minimum segment limit of 10
* segway: new --num-segs option: allows more than 2 segment labels. even allows a range of number of segments
* segway: change some visual formatting of posterior track
* segway: bugfix: actually use default res usage file by default
* segway: don't create a posterior triangulation when --dry-run used
* segway: memory usage new requested in 1G increments instead of 1M
* segway: always uses island mode to save memory (at the cost of speed)
* segway: use ulimit to make a hard max on memory consumption
0.1.3:
* segway: weights probability from different tracks by the number of
datapoints in each track. useful for dealing with data on
heterogeneous scales
* seg.inc is now segway.inc
* seg.str is now segway.str
* segway-res-usage: now produces a tab-delimited output file that can
be used to specify Segway's memory management
* segway-res-usage: calibrates resource usage of training bundle step, posterior, identify
* segway: uses the new resource usage file by default
* segway: new --resource-profile option to use another resource usage file
* segway: by default, abort if an input sequence is going to take more
than 15G of RAM to process using any of the commands being run
* segway-res-usage: try to guess which tracks are going to be way too huge and skip them
* segway: write observations/observations.tab which gives information
about what exactly is in each observation file
* segway-res-usage, segway: identify unique runs with uuid.uuid1() instead of os.getpid()
* segway: fixed a bug regarding the setting of initial parameter
values with the --track/-t option
* segway: when rewriting regions due to --include-region or
--split-sequences, ensure that the coordinates are in a monotonically
increasing order
0.1.2:
* segway bugfixes: do not create segway.bed.gz or
posterior.seg0.wig.gz when --no-identify or --no-posterior are
enabled, respectively
0.1.1:
* Now requires optbuild 0.1.6
* new segway-res-usage program calibrates memory usage
* segway: changed -b option: It now specifies a BED filename instead of a directory where a file
named segway.bed.gz is found
* --force/-f is now --clobber/-c
* segway: now will use a params.params file during identify
implicitly, if it is in the directory specified with -d
* segway: method for generating starting parameters is somewhat different now
* segway: now generates a lot of directories for working files instead of dumping lots of stuff in one directory
* _load_data: ignore invalid chromosome specifications instead of
triggering a fatal error
* genomedata-save-metadata: can now handle empty columns
* stop creating lots of auxiliary output files during Viterbi identify
* there is no directory named "out" created by default
* segway: now arranges some working files into subdirectories, some name changes
* runs posterior decoding during identify, creates wig files
* segway: no option --no-posterior disables posterior decoding
* segway: figure out fully-qualified path to GMTK executables and
queue that. This increases robustness to minor configuration
differences on parallel nodes.
* add segway-res-usage program to calibrate resource usage, which involved some refactoring of run.py
* adding different methods to create initial parameters. interface may be exposed in a future version
* jt_info.txt is now created in work directory instead of current directory
* fix bug: don't repeat track line multiple times in BED file
* performance: greatly improve genomedata-save-metadata
0.1.0:
* new segway option: --distribution: choices are norm, gamma
* default distribution is now norm
* _load_data: add support for wigFix with step, span values set
* _util.py: mark some functions for replacement with genomedata
0.1.0a4:
* Now requires optbuild>0.1.4
* Understandable structure files (e.g. the name for observation track "obs0" becomes "h3k27me3")
* Save log likelihood files
* Rename segway-load-seq -> genomedata-load-seq
* Rename segway-load-data -> genomedata-load-data
* Rename segway-name-tracks -> genomedata-name-tracks
* New command genomedata-save-metadata
* New command: _load_data: load data quickly (in C)
(These five commands will eventually be split out into a new genomedata package)
* New command: segway-calc-distance: calculate distance between identified segments and an external BED feature file
* New command: gtf2bed: convert GTF to BED format
* New segway option --track: use only a particular track
* Generate a "dinucleotide" track when specified with --track
* New segway option --prior-strength: uses dirichlet tables on seg_seg transition CPT
* segway --help now uses option groups
* New h5histogram options: --include-identify, --identify-label: now
allows getting histogram per identified segment
* New h5histogram option: --num-bins: allow specification of number of histogram bins
* identification output files are now gzipped bed rather than wig
* rename "nonmissing" to "presence" throughout
* Move from normal distribution to gamma distribution
* Calculate starting parameters based on something close to the maximum likelihood estimation plus jitter
* Hacky (and wrong) support for memory requirements for up to six
observation tracks, will be redone when GMTK memory optimization is
done
* better handling of errors and KeyboardInterrupts during parallel operation of segway
* Stops using deprecated -inputTrainableParameters option to GMTK
* Checking for unset metadata before using genomedata files
* Corrections to README
* Bugfixes
0.1.0a3:
Files from previous versions are no longer compatible.
* Binary output of observations for GMTK. This means it takes a few
minutes to write out observations for the whole genome before
training or Viterbi rather than many hours.
* Output of segment lengths for diagnostic purposes.
* Changed observation storage from float32 to float64.
* Faster conversion of Viterbi output files to wiggle tracks.
* Observation files now have more descriptive names.
0.1.0a2:
Files from previous versions are no longer compatible.
* Efficient multiple random start EM training queue submission
* Addition of h5histogram command
* Limiting data analyzed to a particular region with --include-region
* More efficient data loading
* Some interface and file format changes
* Performance improvements
* Bugfixes