Sure, this String part can be optimized, but not as easily as you think, and not with these BaseString interfaces. As for "There's also a thing: currently toString and parse(Type) are also implemented in Java, so randomly switching between Java implementation and JS implementation can be inconsistent", please note this very carefully: when something does not behave according to the spec, that is not inconsistency! Inconsistency is when the same number is printed one way on some occasions and another way on others. Even the String.split optimization can't be done that easily. Even figuring out that only a literal is passed to a method is not trivial. And many cases that look "trivial" from the user's standpoint (say, someone wrote a method that only calls String.split) won't work as expected. Anyway, there are tons of places where TeaVM can be optimized, and I see no reason why this string stuff is the most important of them.
Motivation.
TeaVM is considered a performant library/compiler. There is even a benchmark that compares the speed of TeaVM and GWT.
Still, there is one performance bottleneck in TeaVM. Unfortunately, it's in one of the most used classes: String.
I have a pet project that does lots of DOM generation and manipulation using the JavaScript API. I noticed that when I load the same page generated statically (the page has looooots of DOM elements) and dynamically (get data from a websocket, parse JSON, generate DOM, append it), the dynamic version loads 3-4 times slower. I understand that the bottleneck could also be in other places. But today, when I improved String.toLowerCase, I got a good understanding of one of the bottlenecks.
Simplified example:
What do we have here? Literals were successfully inlined, but
Even the first case was not inlined. In the second case we have lots of transformations. First, we generate a constant pool consisting of artificial strings. Then we append the integer and "px" to a StringBuilder, and in the end we convert the builder to a String and then convert that back to a JavaScript string. It all remains the same after minification. All of this instead of just having (yes, it's correct every time, assuming that $var8 is an int).
Inside a big loop, this obviously degrades performance where it isn't necessary.
It would be silly to criticize without offering a solution, but today, while I was in the swimming pool, I came up with an idea, which I'm sharing.
Assumptions.
Here are assumptions that are confirmed by the documentation:
and so on...
Solution.
Let's consider the following interface:
JSString directly extends this interface.
The second class added is
TString becomes
Constructors: when we don't deal with charsets, everything is easy; when we deal with a charset other than UTF-16, we use the emulated code.
Using this approach, we can use the most efficient methods of the JS platform string, and the others will remain the same. The String methods most Java developers use overlap roughly 90% with the methods that Java and JS strings have in common. So performance will rise significantly.
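A minimal sketch of the delegation described above, assuming a plain JVM so that it runs anywhere. BaseString is the name used in the reply at the top of this thread; HostString (a stand-in for JSString that simply wraps java.lang.String), CharArrayString (the emulated backend) and TStringSketch (the delegating TString) are hypothetical names chosen for illustration, not TeaVM classes.

```java
import java.util.Locale;

final class StringBackends {
    // Hypothetical interface: only operations that both a native JS string
    // and an emulated string can implement cheaply.
    interface BaseString {
        int length();
        char charAt(int index);
        int indexOf(char ch);
        BaseString toLowerCase();
    }

    // Stand-in for a native JS string (JSString in the proposal).
    static final class HostString implements BaseString {
        private final String value;
        HostString(String value) { this.value = value; }
        public int length() { return value.length(); }
        public char charAt(int index) { return value.charAt(index); }
        public int indexOf(char ch) { return value.indexOf(ch); }
        public BaseString toLowerCase() { return new HostString(value.toLowerCase(Locale.ROOT)); }
        @Override public String toString() { return value; }
    }

    // Emulated backend backed by a char[].
    static final class CharArrayString implements BaseString {
        private final char[] chars;
        CharArrayString(char[] chars) { this.chars = chars.clone(); }
        public int length() { return chars.length; }
        public char charAt(int index) { return chars[index]; }
        public int indexOf(char ch) {
            for (int i = 0; i < chars.length; i++) {
                if (chars[i] == ch) return i;
            }
            return -1;
        }
        public BaseString toLowerCase() {
            // Per-char mapping; ignores context-sensitive cases that
            // String.toLowerCase handles (e.g. Greek final sigma).
            char[] out = new char[chars.length];
            for (int i = 0; i < chars.length; i++) out[i] = Character.toLowerCase(chars[i]);
            return new CharArrayString(out);
        }
        @Override public String toString() { return new String(chars); }
    }

    // Hypothetical TString facade: every call goes to whichever backend
    // the string was created with, so callers never see the difference.
    static final class TStringSketch {
        private final BaseString base;
        private TStringSketch(BaseString base) { this.base = base; }
        static TStringSketch ofNative(String s) { return new TStringSketch(new HostString(s)); }
        static TStringSketch ofChars(char[] chars) { return new TStringSketch(new CharArrayString(chars)); }
        // Analogous to "$var1.base": hand the backend to JS without copying.
        BaseString base() { return base; }
        int length() { return base.length(); }
        char charAt(int index) { return base.charAt(index); }
        int indexOf(char ch) { return base.indexOf(ch); }
        TStringSketch toLowerCase() { return new TStringSketch(base.toLowerCase()); }
        @Override public String toString() { return base.toString(); }
    }

    public static void main(String[] args) {
        TStringSketch a = TStringSketch.ofNative("Hello");
        TStringSketch b = TStringSketch.ofChars("Hello".toCharArray());
        System.out.println(a.toLowerCase() + " " + b.toLowerCase()); // prints: hello hello
    }
}
```

The design choice here is that the facade never branches on the backend type; each backend answers only the operations it can do efficiently, and anything outside the interface would still go through the emulated code.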
Bonuses.
becomes
If we have a double/float/object, we can evaluate their String representations and then join natively.
I reviewed a few articles about optimizing with array.join(''), but some say join is faster and some say plain + is faster.
Here is an example. Anyway, for large concatenations or concatenations with an unknown number of arguments, StringBuilder can still be used.
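A sketch of how a compiler could lower a string concatenation directly to JS +, assuming it already knows the static type of each operand. Operand, lit, intVar, otherVar and the runtime helper toJsString (which would produce the Java string form of a double, float or object) are hypothetical names, not part of TeaVM. It uses plain + rather than array.join(''), since the articles above disagree on which is faster.

```java
import java.util.List;
import java.util.StringJoiner;

final class ConcatLowering {
    enum Kind { STRING_LITERAL, INT_VALUE, OTHER_VALUE }

    static final class Operand {
        final Kind kind;
        final String text; // literal contents, or a JS variable name such as $var8
        Operand(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
        // True if the emitted JS expression is already a string.
        boolean isStringValued() {
            return kind != Kind.INT_VALUE;
        }
    }

    static Operand lit(String value) { return new Operand(Kind.STRING_LITERAL, value); }
    // int/short/byte: JS prints these numbers exactly like Integer.toString.
    static Operand intVar(String jsName) { return new Operand(Kind.INT_VALUE, jsName); }
    // double/float/object: converted by the hypothetical toJsString helper.
    static Operand otherVar(String jsName) { return new Operand(Kind.OTHER_VALUE, jsName); }

    static String lower(List<Operand> ops) {
        StringJoiner js = new StringJoiner(" + ");
        // JS evaluates + left to right: 1 + 2 + "px" is "3px", while appending
        // the same operands one by one to a StringBuilder gives "12px".
        // Prepend "" unless one of the first two operands is already a string.
        boolean needsPrefix = ops.stream().limit(2).noneMatch(Operand::isStringValued);
        if (needsPrefix) {
            js.add("\"\"");
        }
        for (Operand op : ops) {
            switch (op.kind) {
                case STRING_LITERAL:
                    js.add(quote(op.text));
                    break;
                case INT_VALUE:
                    js.add(op.text);
                    break;
                default:
                    js.add("toJsString(" + op.text + ")");
                    break;
            }
        }
        return js.toString();
    }

    // Minimal JS string-literal escaping (backslash, double quote, newline).
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(lower(List.of(intVar("$var8"), lit("px")))); // prints: $var8 + "px"
    }
}
```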
2. At compile time, it's possible to resolve some calls to a simpler implementation. Example:
It's a real case. I was very surprised when I found that this code increased the code size by 100 KB and pulled in the full regexp implementation.
If we resolve it at compile time, we can do the following trick: if the argument is a constant, its length is 1, and it's not a special regexp character, just use the native split, or emulate it with something like this (an example from my codebase, which I use to avoid regexps):
An implementation that returns String[] can easily be written and substituted for this call, saving the developer from including regexps (this optimization would cover 70-80% of cases).
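A sketch of the literal split fast path, assuming the compiler has already proven that the argument is a constant. It mirrors String.split(regex) with the default limit of 0 (trailing empty strings are dropped). The class and method names are hypothetical. OpenJDK's own String.split contains a similar single-character check, which is where the list of regex metacharacters comes from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitFastPath {
    // A one-char pattern made of anything else matches only itself.
    private static final String REGEX_META = ".$|()[{^?*+\\";

    // True when split(regex) can be answered without the regex engine.
    static boolean isLiteralSingleChar(String regex) {
        if (regex.length() != 1) return false;
        char c = regex.charAt(0);
        return REGEX_META.indexOf(c) < 0 && !Character.isSurrogate(c);
    }

    // Same result as s.split(String.valueOf(sep)) for a literal separator.
    static String[] split(String s, char sep) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == sep) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        if (parts.isEmpty()) {
            // No separator found: the result is the input itself, even "".
            return new String[] { s };
        }
        parts.add(s.substring(start));
        // limit == 0 semantics: drop trailing empty strings.
        int size = parts.size();
        while (size > 0 && parts.get(size - 1).isEmpty()) {
            size--;
        }
        return parts.subList(0, size).toArray(new String[0]);
    }

    // A compiler would make this decision at compile time, so when the
    // argument qualifies, the regex fallback (and engine) is never referenced.
    static String[] splitFast(String s, String regex) {
        return isLiteralSingleChar(regex) ? split(s, regex.charAt(0)) : s.split(regex);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitFast("a,b,,c,,", ","))); // prints: [a, b, , c]
    }
}
```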
3. JS interaction becomes very easy. In our case we just need to invoke $var1.base to pass an emulated string to the JS context, and new TString(jsString) to pass JS strings back, instead of converting them back and forth between contexts.
4. We can get rid of the string pool or just replace it with a native JS string array. The $rt_s implementation then just gets the required native string by index.
5. Integer.parseInt and Integer.toString (and their Short and Byte counterparts) can be simplified in the JS case to fast and efficient native conversions. Float and double can continue using the emulated methods.
6. Code size will shrink significantly for applications that manipulate the DOM or work with text in other ways.
Future.
If String templates (JEP 430) become part of the standard in an upcoming stable JDK, native strings would let us reuse the power of JS string interpolation.
Implementation.
I understand that TeaVM is a mature project used in production by different companies. I suggest the following approach:
@konsoletyper What do you think?