
Feat/generate trainingsets #205

Open · wants to merge 55 commits into main
Conversation

@M3ssman (Contributor) commented Nov 18, 2020

Adds generation of training data sets from OCR output formats such as ALTO V3, PAGE 2013 and PAGE 2019, plus image files (TIFF, JPEG).

@kba (Collaborator) left a comment:

I haven't tested it yet, but looks very promising. I particularly appreciate the unit tests 👍

generate_sets.py Outdated
"-m",
"--minchars",
required=False,
help="Minimum chars for a line")
Collaborator:

An explicit default value would be better. In sets/training_sets.py there is a constant DEFAULT_MIN_CHARS that is used in generate_sets.py but the min_chars kwarg to TrainingSets.create is 8.

Also, why 8/16? It's very common to have valid shorter lines, like the last word of a sentence on a new line, lines in narrow columns, dramas etc.
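For illustration, an explicit default wired into --help could look like the sketch below; the value 8 is just the constant mentioned above, not a recommendation:

import argparse

DEFAULT_MIN_CHARS = 8  # assumed to mirror the constant in sets/training_sets.py

parser = argparse.ArgumentParser()
parser.add_argument(
    "-m",
    "--minchars",
    type=int,
    default=DEFAULT_MIN_CHARS,
    help=f"Minimum chars for a line (default: {DEFAULT_MIN_CHARS})")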

@M3ssman (Author):

Historical note: this originates from a newspaper digitisation project, where a common text line (no ads) usually has more than 20 chars. In fact, we only took care of lines with at least 32 chars, since I thought these lines were more valuable for training than shorter ones, simply because they carry more characters to learn from.
But personally I'm completely open about this, so what about, say, 4 chars?

Collaborator:

It makes sense to skip short lines for training, but the fact that there is a minimum number of chars, and what that minimum is, should be clearly communicated to the user, so he/she isn't surprised why some lines are skipped.

And yes, probably something low like 4, documented in ./generate_sets.py --help, would be best IMHO.

@stweil (Collaborator), Dec 4, 2020:

It probably makes sense not to skip short lines for training. Tesseract was initially trained only with artificial long lines, and the standard models have problems with short lines (typically page numbers, but also short lines ending paragraphs). We know that there exist valid lines with only a single character, e.g. page numbers (1 … 9, a … z, A … Z). Why should we skip lines with one or two characters as long as they are valid?

@M3ssman (Author):

This originates from the decision to prefer longer lines because they provide more characters. I thought more characters means more training material, and more material increases pattern recognition accuracy. But this doesn't pay much attention to a character's context. In newspaper advertisements I've seen many lines way shorter than 8 chars, containing only abbreviations and the like. Maybe this focus on "regular article lines" is another reason why Tesseract (4.1.1) usually performs rather poorly in this realm, compared to single-column layouts.

@stweil Do you suggest making the minchars arg completely optional, or setting the default value to "1", to skip lines that contain only non-printable characters?

Collaborator:

Setting minchars to 1 sounds reasonable. I cannot imagine what a line containing only non-printable characters would look like.

Collaborator:

I agree with Stefan: we should make minchars optional and try to make Tesseract learn short lines well. I am not sure how the LSTM implementation here unrolls, but short lines should create fewer weight updates, so characters would still contribute "democratically"; there is just more incentive to get a better transition from the initial state.

"""

if self.revert:
return reduce(lambda c, p: p + ' ' + c, self.text_words)
Collaborator:

Since we already require python-bidi, it would probably be more robust to use it for handling the inversion, cf. https://github.com/MeirKriheli/python-bidi/blob/master/bidi/algorithm.py / https://github.com/MeirKriheli/python-bidi#api
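A minimal sketch of the suggested python-bidi approach, which runs the full Unicode bidi algorithm instead of naively reversing the word order (sample words made up):

from bidi.algorithm import get_display

# Words in logical (storage) order, as parsed from the OCR file.
logical = ' '.join(['مرحبا', 'ABC', '123'])
# get_display() applies the Unicode bidi algorithm, so mixed RTL/LTR
# segments keep their correct visual order, unlike a plain reversal.
visual = get_display(logical)
print(visual)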

@M3ssman (Author):

Thanks for the hint, I'll take a look!

@M3ssman (Author):

Hm, I guess we need to go without bidi for now: it looks like the output for mixed Arabic + Latin lines turns them from RTL to LTR. Lines with only Arabic chars and Indic numbers seem to work pretty well with bidi, but mixed ones don't.

I'm no expert on that, though. Please take a look yourself. I've added the bidi import and adapted the line content generation (the commented section). Feel free to switch implementations. (In your preferred IDE, place a breakpoint in test_create_sets_from_page2013_and_jpg to inspect the temporary test files written.)

@M3ssman (Author):

Do I get it right that bidi works on character level? If so, I don't think it is useful in this scenario. The only Arabic output I know is the (rather poor) word-based output generated by Tesseract itself.

@kba (Collaborator) commented Nov 18, 2020:

I've tested it now, unit tests pass and I managed to extract image-text pairs from the kant_aufklaerung_1784 sample in assets:

$ python3 ./generate_sets.py -d ../assets/data/kant_aufklaerung_1784/data/OCR-D-GT-PAGE/PAGE_0017_PAGE.xml -i ../assets/data/kant_aufklaerung_1784/data/OCR-D-IMG/INPUT_0017.tif 
[SUCCESS] created '20' training data sets, please review

It would be useful to make -o required or at least print the output directory as part of the SUCCESS message.

Could the -i argument be optional and by default be derived from imageFilename (PAGE) / sourceImageInformation/filename (ALTO)?

We also need a section on at least the CLI usage in the README.md
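The -i derivation suggested above could look roughly like this hedged sketch; the namespace URIs are the published PAGE 2013 and ALTO v3 ones, while the helper name is made up:

import lxml.etree as etree

NS = {
    'page': 'http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15',
    'alto': 'http://www.loc.gov/standards/alto/ns-v3#',
}

def derive_image_path(xml_path):
    """Hypothetical helper: read the image reference out of the OCR file."""
    root = etree.parse(xml_path).getroot()
    page = root.find('.//page:Page', NS)
    if page is not None:
        return page.get('imageFilename')   # PAGE
    alto_file = root.find('.//alto:sourceImageInformation/alto:fileName', NS)
    return alto_file.text if alto_file is not None else None   # ALTO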

@M3ssman requested a review from @kba, Nov 19, 2020
@M3ssman (Author) commented Nov 19, 2020:

For the Arabic text that is included as test resource (288652) and is causing trouble with bidi, please see the attached original (binarized) image.

@kba mentioned this pull request, Nov 26, 2020
@Shreeshrii (Collaborator):

@kba Do you know of any Devanagari or other Indic-language datasets in PAGE XML format? I only have scanned page images and their ground truth in text format. I don't think those will work with this PR.

@kba (Collaborator) commented Dec 7, 2020:

> @kba Do you know of any Devanagari or other Indic-language datasets in PAGE XML format? I only have scanned page images and their ground truth in text format. I don't think those will work with this PR.

Sorry, I do not. But maybe you have OCR results in Devanagari to test the mechanics of this PR? What problems do you foresee with Devanagari?

@Shreeshrii (Collaborator) commented Dec 8, 2020:

> What problems do you foresee with Devanagari?

I don't foresee any, but wanted to test with complex scripts, just in case there is any difference in processing.

> maybe you have OCR results in Devanagari to test the mechanics of this PR?

Good idea. I can test using ALTO output from tesseract.

> Devanagari or any other Indic language datasets in Page XML format

I found a set of files at https://github.com/ramayanaocr/ocr-comparison/tree/master/Transkribus/Input, which has the PNG files as well as the XML files (generated by Transkribus, I guess). I tested with one of those files; while the console messages reported success, the files were not created. The summary option created a file, but it contained empty lines.

 tesstrain-extract-gt  /home/ubuntu/ocr-comparison/Transkribus/Input/page/ram110.xml -i /home/ubuntu/ocr-comparison/Transkribus/Input/ram110.png
[INFO   ] generate trainingsets of '/home/ubuntu/ocr-comparison/Transkribus/Input/page/ram110.xml' with '/home/ubuntu/ocr-comparison/Transkribus/Input/ram110.png' (min: 1, sum: False, reorder: False)
[SUCCESS] created '24' training data sets in 'training_data_ram110', please review

I tested with the Arabic image shared earlier in this thread, together with its XML file from the test resources, just to make sure that I had the PR installed correctly. That worked, i.e. the files were created. I haven't looked at the text within them.

tesstrain-extract-gt /home/ubuntu/tesstrain/tests/resources/xml/288652.xml -i /home/ubuntu/pagedeva/288652.png -o /home/ubuntu/pagedeva/output -s
[INFO   ] generate trainingsets of '/home/ubuntu/tesstrain/tests/resources/xml/288652.xml' with '/home/ubuntu/pagedeva/288652.png' (min: 1, sum: True, reorder: False)
[SUCCESS] created '33' training data sets in '/home/ubuntu/pagedeva/output', please review

Is there a compatibility issue with transkribus generated PAGE files?

@Shreeshrii (Collaborator) commented Dec 12, 2020:

I tested just now with ALTO output from tesseract and get the following warnings:

 tesstrain-extract-gt /home/ubuntu/tesstrain-San/test/iast/sandocs_2.xml -i /home/ubuntu/tesstrain-San/test/iast/sandocs_2.png -s
[INFO   ] generate trainingsets of '/home/ubuntu/tesstrain-San/test/iast/sandocs_2.xml' with '/home/ubuntu/tesstrain-San/test/iast/sandocs_2.png' (min: 1, sum: True, reorder: False)
/home/ubuntu/miniforge3/lib/python3.7/site-packages/numpy/core/_methods.py:234: RuntimeWarning: Degrees of freedom <= 0 for slice
  keepdims=keepdims)
/home/ubuntu/miniforge3/lib/python3.7/site-packages/numpy/core/_methods.py:195: RuntimeWarning: invalid value encountered in true_divide
  arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
/home/ubuntu/miniforge3/lib/python3.7/site-packages/numpy/core/_methods.py:226: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
[SUCCESS] created '5' training data sets in 'training_data_sandocs_2', please review

EDIT: Earlier error with ALTO was because of typo in filename.

@M3ssman (Author) commented Dec 13, 2020:

@Shreeshrii Thanks for pointing to PAGE files that lack `Word` elements entirely! That was the cause of the missing results in the provided Devanagari sample. I tried to fix this and integrated the file as a new test resource. Unfortunately, I can't judge the textual outcome, so please update the PR and have a look again ...

@Shreeshrii (Collaborator) commented Dec 14, 2020:

@M3ssman I tried just now but am getting the same result as before.

 git log -3
commit 3fb94996ac42818b302850080a6f2535db12251e (HEAD -> pagesets)
Author: M3ssman <uwe.hartwig@bitsrc.info>
Date:   Sun Dec 13 10:44:47 2020 +0100

    [app][fix] handle page without word elements

commit 2f3566bc23a848e3df7801b2fa1a6ce1d417e7bc
Author: M3ssman <uwe.hartwig@bitsrc.info>
Date:   Mon Dec 7 14:19:58 2020 +0100

    [app][fix] filter invalid lines

commit 57ba229ace0c9ae74afb889916cba3555ef7b4d0
Author: M3ssman <uwe.hartwig@bitsrc.info>
Date:   Mon Dec 7 13:18:48 2020 +0100

    [app][test] fix test imports
 tesstrain-extract-gt  /home/ubuntu/ocr-comparison/Transkribus/Input/page/ram110.xml -i /home/ubuntu/ocr-comparison/Transkribus/Input/ram110.png -s
[INFO   ] generate trainingsets of '/home/ubuntu/ocr-comparison/Transkribus/Input/page/ram110.xml' with '/home/ubuntu/ocr-comparison/Transkribus/Input/ram110.png' (min: 1, sum: True, reorder: False)
[SUCCESS] created '24' training data sets in 'training_data_ram110', please review

However, only the summary file was created in 'training_data_ram110'. The file is attached.

ram110_summary.gt.txt

PS: I looked at the XML file and the Devanagari text in it has errors, so it is probably raw OCRed text and not corrected text for groundtruth.

@Shreeshrii (Collaborator) commented Dec 14, 2020:

I also tried with the ALTO 4.1 XML referenced in the issue I opened at OCR-D/ocrd_fileformat#23
That fails with the following messages:

(base) ubuntu@tesseract-ocr-1:~/tesstrain-pagesets$ tesstrain-extract-gt /home/ubuntu/OCR_GS_Data/TypeFaces/persian_watts_typeface/data/ahsan_at_tavarikh_31.xml -i /home/ubuntu/OCR_GS_Data/TypeFaces/persian_watts_typeface/data/ahsan_at_tavarikh_31.png -s
[INFO   ] generate trainingsets of '/home/ubuntu/OCR_GS_Data/TypeFaces/persian_watts_typeface/data/ahsan_at_tavarikh_31.xml' with '/home/ubuntu/OCR_GS_Data/TypeFaces/persian_watts_typeface/data/ahsan_at_tavarikh_31.png' (min: 1, sum: True, reorder: False)
Traceback (most recent call last):
  File "/home/ubuntu/miniforge3/bin/tesstrain-extract-gt", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniforge3/lib/python3.7/site-packages/generate_sets/cli.py", line 74, in main
    reorder=REORDER)
  File "/home/ubuntu/miniforge3/lib/python3.7/site-packages/generate_sets/training_sets.py", line 351, in create
    self.xml_data, min_len=min_chars, reorder=reorder)
  File "/home/ubuntu/miniforge3/lib/python3.7/site-packages/generate_sets/training_sets.py", line 184, in text_line_factory
    ns_prefix = _determine_namespace(xml_data)
  File "/home/ubuntu/miniforge3/lib/python3.7/site-packages/generate_sets/training_sets.py", line 223, in _determine_namespace
    return [k for (k, v) in XML_NS.items() if v == root_tag][0]
IndexError: list index out of range

@M3ssman (Author) commented Dec 14, 2020:

@Shreeshrii Thanks for pointing towards ALTO V4. I had missed this before, since we're using the latest official stable release, Tesseract 4.1, which doesn't create this kind of ALTO data. I've added the ALTO V4 namespace declaration and it worked fine. Still, the ALTO V4 data from OpenITI you pointed out looks quite unfamiliar to me, with a String CONTENT spanning a complete text line. I've never seen this before. Where does this data come from?
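The fix presumably boils down to one more entry in the XML_NS table that _determine_namespace() (see the traceback above) matches the root tag against. A sketch with the published namespace URIs; the dict keys and the simplified signature are made up:

XML_NS = {
    'alto3': 'http://www.loc.gov/standards/alto/ns-v3#',
    'alto4': 'http://www.loc.gov/standards/alto/ns-v4#',   # the added declaration
    'page2013': 'http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15',
    'page2019': 'http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15',
}

def _determine_namespace(root_tag):
    matches = [k for (k, v) in XML_NS.items() if v == root_tag]
    if not matches:   # a clear error instead of the IndexError in the traceback
        raise ValueError(f'unknown OCR namespace: {root_tag}')
    return matches[0]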

Regarding the Devanagari issue: your git log looks fine, the version matches. Maybe the tesstrain-extract-gt in your currently active environment is outdated, so please drop it and do a fresh install afterwards. You can also run pytest -v to execute the test cases included so far (with their test datasets) and check the temporary outputs in your local /tmp/pytest-of-<account> dir.

(Three comments from @Shreeshrii were minimized.)

@Shreeshrii (Collaborator):

> the ALTO V4 data from OpenITI you pointed out looks quite unfamiliar to me, with a String CONTENT spanning a complete text line. I've never seen this before. Where does this data come from?

I do not know more than the info available online. Please see
https://github.com/OpenITI/RELEASE
and
https://zenodo.org/record/4075046#.X9hC0dgzaUk

@M3ssman (Author) commented Dec 15, 2020:

@Shreeshrii Please note that the test images are created on the fly, with a library that out of the box can only render a very small subset of UTF-8 chars, I guess only ASCII: neither Arabic, Persian, Devanagari nor old German Fraktur letters. This was introduced to keep the test data small and free from binary image blobs. It only gives you a hint whether the lines would match the "words".
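Presumably something along these lines; PIL and its built-in bitmap font (which only covers basic Latin) are an assumption about the library used:

from PIL import Image, ImageDraw

def render_test_line(text, width=640, height=48):
    """Render a synthetic line image so tests carry no binary fixtures.
    The default bitmap font cannot draw Arabic, Persian, Devanagari or
    Fraktur glyphs, so non-Latin test lines come out as placeholders."""
    img = Image.new('L', (width, height), color=255)   # white canvas
    ImageDraw.Draw(img).text((4, 4), text, fill=0)     # black text, default font
    return img

render_test_line('the quick brown fox').save('/tmp/line_0001.tif')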

@M3ssman (Author) commented Dec 15, 2020:

@Shreeshrii Regarding the latest version: currently there's only a pre-beta version (0.0.1) annotated in setup.py. Usually this would be the place to follow versioning. I don't know how to pull in some sort of repository information at this point. Maybe @kba can give us a hint?

@Shreeshrii (Collaborator) commented Dec 15, 2020:

@M3ssman Thanks for the explanations regarding test files.

> Maybe the tesstrain-extract-gt in your currently active environment is outdated, so please drop it and do a fresh install afterwards.

You were right about this.

I removed tesstrain-extract-gt from the bin directories and reinstalled it in the environment where ocrd is installed. It works now. All the .tif and .gt.txt files were created for the Transkribus Devanagari file.

The ALTO 4.1 Persian file also generates line images and text (I haven't checked the RTL issue yet).

This is great!! Thank you.

@M3ssman (Author) commented Dec 15, 2020:

@Shreeshrii You're welcome!

... Sorry for the confusion regarding RTL ... it finally turned out that the -r flag aims at something different from real RTL, which can be handled with py-bidi. If active, it only re-arranges word tokens by their top-left corner in descending order, starting from the right margin; therefore I renamed it to --reorder. It doesn't reverse characters. I had to deal with Arabic PAGE XML exported from Transkribus, with inconsistent reading orders and display artifacts, which almost made me go crazy.

Since this relies on individual coordinates for each token, I'm afraid it will have no effect on test resources like the ones gathered from OpenITI, which only have a single String@CONTENT element representing a text line in total (or at least more than just one word). Reordering this way requires proper coordinates below text line level: we can't just chop up the lines and reorder tokens, since the source order of elements in a plain text line is certainly not always reliable.
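A minimal sketch of that behaviour, assuming word tokens carry their top-left x coordinate (sample data made up):

# (word, top_left_x) pairs as they might be parsed from PAGE Word coords
tokens = [('token_a', 120), ('token_b', 410), ('token_c', 270)]
# --reorder as described: sort tokens by top-left corner, descending
# from the right margin; characters inside each token stay untouched
line = ' '.join(w for w, x in sorted(tokens, key=lambda t: t[1], reverse=True))
print(line)   # token_b token_c token_a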

stale bot commented Jan 15, 2021:

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

(stale bot added the 'stale' label, Jan 15, 2021)
@Shreeshrii (Collaborator):

This should not be closed. It needs review by someone familiar with RTL languages.

(stale bot removed the 'stale' label, Jan 18, 2021)
lgtm-com bot commented Jan 18, 2021:

This pull request introduces 4 alerts when merging f3e73e4 into fa57d61 - view on LGTM.com

new alerts:

  • 3 for __init__ method calls overridden method
  • 1 for 'import *' may pollute namespace

@M3ssman (Author) commented Jan 19, 2021:

I've been talking with https://github.com/galdring, a colleague, about this review, and he's going to find us somebody.

@zdenop (Contributor) commented Jan 9, 2023:

@M3ssman: can you please update your PR to the current git code? (The Python code is now in src; see 'Migrate Python code to a dedicated package'.)

@M3ssman (Author) commented Jan 26, 2023:

@zdenop Sorry for the late reply.

Which layout do you prefer: <project_root>/src/extract_sets, or integrating training_sets.py into <project_root>/src as part of <project_root>/src/tesstrain?

@stefan6419846 (Contributor):

If I understood @zdenop correctly, the final goal is to make everything available through the tesstrain Python package. As you provide a dedicated entry point, src/tesstrain sounds like the appropriate package.

Nevertheless, I am not sure about the external dependencies. They should probably be made optional (extras_require).

@M3ssman (Author) commented Jan 26, 2023:

@stefan6419846 Thanks for your reply! Do you suggest pushing these dependencies into setuptools.setup(extras_require=...)?

@stefan6419846 (Contributor):

@M3ssman If you are going to integrate the training set generator into the existing Python package, I would suggest yes. At least to me they appear to be overkill for most users who just want the basic artificial training functionality.


`tesstrain-extract-sets` currently supports OCR data in ALTO V3, PAGE 2013 and PAGE 2019, as well as TIFF, JPEG and PNG images.

Output is written as UTF-8 encoded plain text files and TIFF images. The image frame is produced from the text line coordinates in the OCR data, so please ensure the geometrical information is properly annotated. Additionally, the tool can add a fixed synthetic padding around the text line or store it binarized (`--binarize`).
Collaborator:

Isn't padding for raw images going to be a disaster? I'd recommend disallowing this combination in the CLI right away.



By default, several sanitize actions are performed at image line level, like deskewing or removement of top-bottom intruders. To disable this, add flag `--no-sanitze`.
Collaborator:

Suggested change:
- By default, several sanitize actions are performed at image line level, like deskewing or removement of top-bottom intruders. To disable this, add flag `--no-sanitze`.
+ By default, several optimization actions are performed at image line level, like deskewing or removal of top-bottom intruders. To disable this, add flag `--no-sanitize`.


import exifread
import lxml.etree as etree
import numpy as np
Collaborator:

I don't understand. Why would you want to remove this import, which is clearly required, @stweil? And why do you say it's WIP, @M3ssman?

* drawing artificial border
* collect only contours that touch this
* get contours that are specific ratio to close to the edge
* fill those with specific grey tone
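Read as an OpenCV pipeline, the quoted steps sketch roughly like this; the edge test approximates the border trick via bounding boxes, and margin and grey tone are placeholders, not the PR's actual values:

import cv2

def mask_edge_intruders(line_img, margin=2, grey=127):
    """Grey-fill connected components that touch the top/bottom edge
    of a grayscale line image."""
    h = line_img.shape[0]
    _, binary = cv2.threshold(line_img, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, cw, ch = cv2.boundingRect(c)
        if y <= margin or y + ch >= h - margin:   # touches top or bottom
            cv2.drawContours(line_img, [c], -1, grey, cv2.FILLED)
    return line_img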
Collaborator:

I don't think this operation will be helpful for raw images. For binarized ones it may help, but a flat, untextured grey fill is certainly going to irritate the pixel pipeline (it introduces artificial edges etc.). It's also not realistic (it won't be seen at inference), so forcing the models to learn this is not a good idea.

Collaborator:

Also, didn't you write a textured fill (grey_canvas IIRC) for that very purpose (but for synthetic training) already?

only if so, enhance img to prevent rotation
black area artifacts with constant padding
* rotate
* slice rotation result due previous padding
Collaborator:

Doing all this on the line-level image is asking for trouble:

  • skew detection via Hough transform is much less reliable than on the region level
  • derotation introduces white corners, which you then have to fill in; again, detrimental to raw/RGB images
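For reference, the pad-rotate-slice dance the quoted comments describe comes out roughly as follows; padding size and fill colour are placeholders:

import cv2

def deskew_line(img, angle, pad=16):
    """Pad first so rotation cannot clip into black corners, rotate,
    then slice the padding off again."""
    padded = cv2.copyMakeBorder(img, pad, pad, pad, pad,
                                cv2.BORDER_CONSTANT, value=255)
    h, w = padded.shape[:2]
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(padded, matrix, (w, h), borderValue=255)
    return rotated[pad:h - pad, pad:w - pad]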

@bertsky (Collaborator) commented Jan 26, 2023:

> Nevertheless, I am not sure about the external dependencies. They should probably be made optional (extras_require).

> At least to me they appear to be overkill for most users who just want the basic artificial training functionality.

I disagree with that assessment. The package for synthetic training is, IMO, no more relevant than a way to import from the widely used file formats (ALTO, PAGE) for real GT training. So if the trainingsets extension is adopted at all, its dependencies should not be moved to extras_require.

fhdl.writelines(contents)


def calculate_grayscale(low=168, neighbourhood=32, in_data=None):

CodeQL notice: Explicit returns mixed with implicit (fall-through) returns. Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
return tuple(map(lambda c: sum(c) / len(c), zip(*point_pairs)))


def to_center_coords(elem, namespace, vertical=False):

CodeQL notice: Explicit returns mixed with implicit (fall-through) returns. Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
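The generic fix for this class of notice is to make the fall-through explicit, e.g.:

def safe_mean(values):
    """Every path returns explicitly, so an empty input yields a
    deliberate None rather than a silent fall-through."""
    if not values:
        return None
    return sum(values) / len(values)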
self.set_id()
self.set_text()
if self.valid:
self.reorder = reorder

CodeQL warning: Overwriting attribute in super-class or sub-class. Assignment overwrites attribute reorder, which was previously defined in superclass TextLine.
@@ -5,6 +5,8 @@

ROOT_DIRECTORY = Path(__file__).parent.resolve()

installation_requirements = open('requirements.txt', encoding='utf-8').read().split('\n')

CodeQL warning: File is not always closed. File is opened but is not closed.
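The usual fix, sketched; splitlines() also avoids the trailing empty entry that split('\n') leaves behind:

# A context manager closes the handle even if reading fails.
with open('requirements.txt', encoding='utf-8') as reqs:
    installation_requirements = reqs.read().splitlines()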
do_opt = args.sanitize
intrusion_ratio = args.intrusion_ratio
if isinstance(intrusion_ratio, str) and ',' in intrusion_ratio:
intrusion_ratio = [float(n) for n in intrusion_ratio.split(',')]

CodeQL warning: Variable defined multiple times. This assignment to 'intrusion_ratio' is unnecessary, as it is redefined before this value is used.
if isinstance(intrusion_ratio, str) and ',' in intrusion_ratio:
intrusion_ratio = [float(n) for n in intrusion_ratio.split(',')]
else:
intrusion_ratio = float(intrusion_ratio)

CodeQL warning: Variable defined multiple times. This assignment to 'intrusion_ratio' is unnecessary, as it is redefined before this value is used.
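Both warnings disappear if the branch is decided once, e.g. (assuming args as in the excerpt above):

# One decision, one assignment per path; the earlier duplicate
# assignment flagged above simply goes away.
intrusion_ratio = args.intrusion_ratio
if isinstance(intrusion_ratio, str) and ',' in intrusion_ratio:
    intrusion_ratio = [float(n) for n in intrusion_ratio.split(',')]
else:
    intrusion_ratio = float(intrusion_ratio)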
Labels: pinned (eternal issues which are safe from becoming stale)
Projects: none yet
8 participants