Kolmogorov Complexity Approximation Alternative to (Error, Complexity) Pareto Frontier #715

jabowery · 2024-09-12T18:29:27Z

jabowery
Sep 12, 2024

Since any finite dataset will have finite precision, it is apparent that a given symbolic formula's residual errors will have finite precision. That being the case, error residuals can be treated as additive program literals in a expression that approximates the Kolmogorov Complexity of the dataset: An optimal lossless compression of the data.

This would approximate the gold standard for generalization in machine learning: Solomonoff Induction.

It seems the only real problems to solve in defining such a loss function are:

a principled way of assigning complexity to each library function
ignoring model-computed digits beyond the data's precision

An example of 1) would be assigning a complexity measure to, say cosine vs arccosine vs arctan vs sqrt in a manner that is justifiable in terms of Kolmogorov Complexity. This, it seems is a previously solved problem in computer engineering where known machine language algorithms, of minimal length, are known -- taking into account that math libraries must incorporate all of such algorithms in a manner that minimizes the length of the total library. Pedantry regarding "arbitrary" choice of machine instruction set may be ignored for the simple reason that there is no such "arbitrary" choice of axiomatic basis for arithmetic in reality. No one serious bothers to even talk to pedants who insist that some random choice of symbols should be considered as principled as, say, Peano's.

An example of 2) would be, say, detecting when a rational number's repeating digit pattern -- which goes on for infinity -- must be truncated to not exceed the precision of the data-point being computed. In that case the error residual would not go on for infinity but be similarly truncated.

Finally consider that any finite computation system is a finite state machine which means it is a state space model.

pukpr · 2024-09-12T18:50:23Z

pukpr
Sep 12, 2024

A recurring misunderstanding is that Kolmogorov Complexity can't be applied to data, only to a model of the data. Since that's the case, one still needs an error term that quantifies the discrepancy of the model from the data.

Apart from that, the idea behind Kolmogorov Complexity is that a seemingly simple expression can have hidden complexity, often extending into fractal or chaotic regimes. This is good in a way in that it's the entire idea behind symbolic regression. Isn't it?

0 replies

jabowery · 2024-09-13T18:11:37Z

jabowery
Sep 13, 2024
Author

Rather argue against your authoritatively postured confusion, I'll just ask one question that is similar to one I use to test large language models:

What is the shortest python program you can come up with that outputs the following string:
000000000100010000110010000101001100011101000010010101001011011000110101110011111000010001100101001110100101011011010111110001100111010110111110011101111101111111111

Now, to illustrate your confusion about "error terms", lets examine a possible "model":

print(''.join([f'{xint:0{5}b}' for xint in range(32)]))

Note that the output is almost a Kolmogorov Complexity approximation of the string as it outputs:
0000000001000100001100100001010011000111010000100101010010110110001101011100111110000100011001010011101001010110110101111100011001110101101111100111011111011111

It fails because it leaves off the trailing '11111'. But it becomes a Kolmogorov Complexity approximation, if one modifies the program to, instead, be:

print(''.join([f'{xint:0{5}b}' for xint in range(32)])+'11111')

Note that by adhering to the definition of Kolmogorov Complexity (ie: lossless compression of the data) the loss function is 72 rather than 64 + some ill defined "error term".

2 replies

pukpr Sep 14, 2024

I have to apologize. For some reason I mixed up Kolmogorov complexity with the Lyapunov exponent.
I usually do a sanity check, but this time I lazily used ChatGPT with too leading a prompt and of course it didn't point out my mistake.

So, never mind, and good luck.

jabowery Sep 14, 2024
Author

Apology accepted and thanks for the Lyapunov exponent reference. I was unfamiliar with that critical formalism although I understood its importance at an intuitive level. If I'd known of it back in 2020, there is a chance the pandemic might have taken a different turn.

At the outset of the 2020 pandemic I approached a contact at the University of Oxford's pandemic task force with the proposal that the Algorithmic Information Criterion be applied to epidemiology model selection. The response was pedantic -- in essence implying that statistical models are the only way to model anything because dynamical systems are chaotic. I know how ignorant that sounds but when couched in pedantic terms, obstructionist ignoramuses can do enormous amounts of damage to the world when ensconced in key positions. If I'd been able to refer to a formal measure of what he was throwing up as a "yes/no" qualitative distinction, I might have gotten him to think.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kolmogorov Complexity Approximation Alternative to (Error, Complexity) Pareto Frontier #715

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments 2 replies

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

Select a reply

Kolmogorov Complexity Approximation Alternative to (Error, Complexity) Pareto Frontier #715

jabowery Sep 12, 2024

Replies: 2 comments · 2 replies

pukpr Sep 12, 2024

jabowery Sep 13, 2024 Author

pukpr Sep 14, 2024

jabowery Sep 14, 2024 Author

jabowery
Sep 12, 2024

Replies: 2 comments 2 replies

pukpr
Sep 12, 2024

jabowery
Sep 13, 2024
Author

jabowery Sep 14, 2024
Author