Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Viterbi - operate on an alphabet based numbers: currently, only on bits or whole chars possible #191

Open
simlei opened this issue Oct 5, 2018 · 0 comments

Comments

@simlei
Copy link
Member

simlei commented Oct 5, 2018

Screenshots of the Viterbi Plugin:
image

The Viterbi analysis does not have the concept of an alphabet w.r.t. the modular addition encryption mode. The user can choose between two modes, "XOR" and "modular addition" as encryption scheme and basis for the analysis in the other tabs.
The implementation of the analysis calls these schemes appropriately Combination, an abstract interface which provides a common interface which in turn is implemented by the concrete classes BitwiseXOR and ModularAddition. However, the atoms on which the abstract interface is defined is, in part, raw character values.

Interface:
https://github.com/jcryptool/crypto/blob/develop/org.jcryptool.analysis.viterbi/src/org/jcryptool/analysis/viterbi/algorithm/Combination.java

Implementations:
https://github.com/jcryptool/crypto/blob/develop/org.jcryptool.analysis.viterbi/src/org/jcryptool/analysis/viterbi/algorithm/ModularAddition.java
https://github.com/jcryptool/crypto/blob/develop/org.jcryptool.analysis.viterbi/src/org/jcryptool/analysis/viterbi/algorithm/BitwiseXOR.java

Accordingly, the modular addition is done on the domain of char values modulus Character.MAX_VALUE:

public class ModularAddition implements Combination {
/**
* Combines two strings by combining the characters at the same position.
* the combination is done by adding the integer-values of the chars modulo
* Character.MAX_VALUE. If the strings are not of equal length the shorter
* one will be repeated to fit the longer one.
*
* @param a
* addend
* @param b
* addend
* @return the result of the modular addition
*/
public String add(String a, String b) {

From the viewpoint of the other mode, XOR combination, this makes sense since as well this form of modular addition as XOR is an operation defined on bitstreams rather than the more abstract concept of "characters in an alphabet". An alphabet is defining a subset of the plain- and cipher data domain which implies changing the encryption as well as the analysis model -- generation of graph candidates and n-gram statistics).

Many classical encryption schemes however work on an alphabet-masked subset of the character domain, especially instructive examples. One example is this vernam encryption, using an upper case latin alphabet ABC...XYZ as domain with N=26

THATSONESMALLSTEPFORAMANONEGIANTLEAPFORMANKIND
WEAREONENATIONWEAREONEPEOPLEANDOURTIMEFORCHANGEHASCOME
-------------------------------------------------
PLAKWCAIFMTTZFPIPWSFNQPRCCPKINQHFVTXRSWARPRIAJ

The plug-in does not know about this form of adding characters, and recreating this example in tab 1 as well as analyzing is is not possible with the current view on cipher input and output. Encrzption in tab 1 as well as analysis in tab 2 and 3 only work on raw character values, rather than the concept of "indices of an alphabet which translates to valid character representations for the human eye".

Thankfully, the combinations XOR and ModularAddition have already been centralized into single classes and one can see where they are referenced e.g. via eclipse refactoring tools. However, the analysis routines and UI have to be adapted to introduce the concept of an alphabet, which includes:

  • w.r.t. UI controls
    • parametrize ModularAddition with an alphabet selection box
    • user input validation (reject non-alphabetic characters)
  • w.r.t. analysis
    • normalize existing n-gram statistics to the active alphabet; in this plug-in they are based on predefined text files, the user has the two options "english" and "german. The relevant class "NGramsProvider" has to be adapted to take n-grams into account. The relevant lines where user choice takes effect:
      if (de.getSelection()) {
      path.append("data/ngrams_de.txt"); //$NON-NLS-1$
      } else {
      path.append("data/ngrams_en.txt"); //$NON-NLS-1$
      }
      int nGramSize = Integer.parseInt(nGramDrop.getText());
      int prunningNumber = Integer.parseInt(pathDrop.getText());
      Map<String, Integer> ngrams;
      // reading ngrams from a textfile
      NGramProvider provider = new NGramProvider(path.toString());
  • Look into analysis to be sure all character-based values are translated through the alphabet.
  • Accomodate as basis for calculations both character-representations as bit values (as seen currently) and alphabet-based representations.

the nitty-gritty details that have to be considered:

  • bitstream- and alphabet-based representations of input cannot be handled completely the same. If we were to unify it completely and use a "catch-all alphabet" including every possible character, that would, with the UTF8 character set which takes up to 6 bytes per character, lead to the n-gram table to explode.
  • Displaying the analysis details in the UI as characters would require a clean approach that separates between 1) "readable string" and 2) "alphabet-based numerical data on which the algorithm actually operates". The latter is the form which currently the char type represents and which is also directly printed into text fields etc in "analysis" and "analysis details" tabs. Working on 2) algorithmically and displaying 1) to the user's eyeballs will need much of the data-handling code with relation to the UI to be refactored and may have cascading effects into much of the codebase.

In my opinion, we should not attempt to unify bitstream-based and character-based approaches, but rather leave the existing methods alone as well as can be done (relabeling them "bitwise ...") and introducing a third, "alphabet-based addition" scheme. It's nontrivial to find an effective way to integrate that existing code and the to-be-introduced alphabet-based code, however.

Some secondary thoughts:

  • "display as hexadecimal" radio buttons should then get a "display as alphabet-based"
  • In the second tab, "Viterbi analysis", The user cannot directly set the scheme on which the analysis will be based: He has to complete the first tab for that, i.e. perform an encryption with the desired scheme (XOR or Modular Addition). This is quite unintuitive if the user already has got a ciphertext encrypted with a Modular addition scheme, because he has to encrypt an unrelated example with the desired scheme to be able to perform analysis on his text (loading it in tab 2 and therefore overwriting the unrelated ciphertext from tab 1)
@simlei simlei changed the title Viterbi - operate on an alphabet Viterbi - operate on an alphabet based numbers: currently, only on bits or whole chars possible Oct 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant