Tim Armstrong edited this page Jan 28, 2015 · 43 revisions

Cross-cutting Issues

This section is for global decisions that affect multiple functions.

  • camelCase is the standard for library functions. Non-camelCase functions may be retained for compatibility.
  • In Swift/T, library functions that do not have a return value typically return a void output that allows explicit dependence on it running.
  • In Swift/T, we annotate some functions with properties like @pure that are used by the optimizer. Do we need to include this info in the docs?
  • Implemented functions are annotated with 🍊 for Swift/T and 🐨 for Swift/K once they exactly match the functionality described here.

Math

❓ Question: Do we support constants?

  • Why not? - Tim
  • Just asking - Mihael
  • The Swift/K constant and global support is a little different from Swift/T, but we should be able to have a set of constants as part of the library in both, I think... - Tim

This mostly reflects pre-1.5 java.lang.Math.


Trig functions

float sin(float x)

float cos(float x)

float tan(float x)

float asin(float x)

float acos(float x)

float atan(float x)

float atan2(float y, float x)

Exponentials/Powers

float exp(float x) 🍊

float ln(float x)

float log(float x, float base)

float log10(float x)

float pow(float base, float exponent)

float sqrt(float x) 🍊

float cbrt(float x)

Rounding

float ceil(float x) 🍊

float floor(float x) 🍊

float round(float x) 🍊

Misc

int min(int a, int b)

float min(float a, float b)

int max(int a, int b)

float max(float a, float b)

int abs(int z)

float abs(float x)

boolean isNaN(float x)

❓ Do we need to document details of floating point behavior such as invalid values, etc?

Random numbers

This probably needs some discussion.

❗ As primitives we should provide something simpler that doesn't depend on lazy arrays. I have a number of implementation concerns for Swift/T.

💦 Unfortunately I'm not sure how to keep it deterministic so that restart logs would work - Mihael

🎅 Maybe we should avoid having this be the canonical way for now? I'm not confident about implementing it in T because it seems to depend on running arbitrary code lazily when an array is read. How about if we had, as a lowest common denominator, a type random_state and functions random_state seed_random(int seed), (random_state, int) next_random(random_state state), etc. You could use this to fill an array if needed. It would be deterministic. Maybe one downside is that programmers might accidentally bifurcate the RNG.
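The explicit-state proposal can be modelled in a few lines. The following is a Python sketch, not Swift code; the RandomState class, the linear congruential generator, and the fill helper are illustrative assumptions and not part of the proposal. It also shows the bifurcation hazard: reusing an old state replays the same value.

```python
from typing import List, NamedTuple, Tuple


class RandomState(NamedTuple):
    """Immutable generator state; stands in for the proposed random_state type."""
    value: int


# 64-bit linear congruential generator; constants commonly attributed to
# Knuth's MMIX generator. Any deterministic generator would do here.
_MULTIPLIER = 6364136223846793005
_INCREMENT = 1442695040888963407
_MASK = (1 << 64) - 1


def seed_random(seed: int) -> RandomState:
    """Build an initial state from an integer seed."""
    return RandomState(seed & _MASK)


def next_random(state: RandomState) -> Tuple[RandomState, int]:
    """Return the successor state and a 31-bit pseudo-random integer."""
    new_value = (state.value * _MULTIPLIER + _INCREMENT) & _MASK
    return RandomState(new_value), new_value >> 33


def fill(seed: int, n: int) -> List[int]:
    """Fill a list with n values by threading the state through each call."""
    state = seed_random(seed)
    out = []
    for _ in range(n):
        state, x = next_random(state)
        out.append(x)
    return out
```

Because next_random is a pure function of its argument, calling it twice on the same state returns the same pair. That is what makes restarts reproducible, and it is also how a program could accidentally fork the sequence.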

int[] random(int seed, int min, int max) Returns a (lazy) deterministic uniform random sequence. ❗ The implementation will have to be inefficient unless the values are accessed in order. Well, maybe not. A crypto-hash of some function of the seed and the index could probably work as a democratically-slow random-access array.
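The hash-based idea can be sketched as follows. This is a Python model under stated assumptions: random_at is a hypothetical helper, SHA-256 is one arbitrary choice of hash, and treating max as inclusive is a guess, since the signature does not say. Each element costs one hash regardless of access order.

```python
import hashlib


def random_at(seed: int, index: int, lo: int, hi: int) -> int:
    """Element `index` of a deterministic pseudo-random sequence in [lo, hi].

    The value depends only on (seed, index), so elements can be read in
    any order at the same cost.
    """
    digest = hashlib.sha256(f"{seed}:{index}".encode("ascii")).digest()
    raw = int.from_bytes(digest[:8], "big")  # 64 bits taken from the hash
    # The modulo step introduces a small bias when the range does not
    # divide 2**64; a production version might use rejection sampling.
    return lo + raw % (hi - lo + 1)
```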

float[] random(int seed, float min, float max)

Same, but with floats.

🎅 I'm not sure how I feel about the overloading here. It seems surprising to me but I'm not sure why. - Tim

float[] gaussian(int seed)

Normally distributed random with mu = 0 and sigma = 1.

Stats

❗ Maybe. We should discuss.

float sum(float[] a) ❓ Guarantees about how the sum is computed, given that floating point addition is not associative?

  • Don't make guarantees. The only reasonable guarantee would be serial summation from index 0, but parallel addition of partitions would be useful for huge arrays. So leave the order of addition undefined. - Mike
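A small Python illustration of why the order matters: the same three values summed serially and summed by partitions differ in the last bit. Both helper names are hypothetical, and the partitioned version only mimics what a parallel reduction might do.

```python
from typing import List


def sum_serial(a: List[float]) -> float:
    """Add elements left to right, starting from index 0."""
    total = 0.0
    for x in a:
        total += x
    return total


def sum_pairwise(a: List[float]) -> float:
    """Split the array in halves and add the partial sums."""
    if len(a) == 0:
        return 0.0
    if len(a) == 1:
        return a[0]
    mid = len(a) // 2
    return sum_pairwise(a[:mid]) + sum_pairwise(a[mid:])


values = [0.1, 0.2, 0.3]
serial = sum_serial(values)       # (0.1 + 0.2) + 0.3
partitioned = sum_pairwise(values)  # 0.1 + (0.2 + 0.3)
```

The two results agree to within rounding error but are not bit-identical, so a test that compares sum() output exactly would be tied to one evaluation order.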

int sum(int[] a)

  • Do we want to add - perhaps elsewhere: string sum(string[] a) ?

float avg(float[] a)

float avg(int[] a)

float moment(float[] a, int n, float center)

Returns the n-th moment of an array about a value (center). For example, the mean would be moment(a, 1, 0.0), while the variance is moment(a, 2, avg(a)); the standard deviation is the square root of the latter.
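A Python sketch of the moment computation, assuming the moment is normalised by the array length (the population moment); the proposal does not say which normalisation is intended. Note that the second moment about the mean is the variance, so the standard deviation needs an extra square root.

```python
import math
from typing import List


def moment(a: List[float], n: int, center: float) -> float:
    """n-th moment of `a` about `center`, divided by len(a) (an assumption)."""
    return sum((x - center) ** n for x in a) / len(a)


data = [1.0, 2.0, 3.0, 4.0]
mean = moment(data, 1, 0.0)       # first moment about zero
variance = moment(data, 2, mean)  # second moment about the mean
std_dev = math.sqrt(variance)
```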

float moment(int[] a, int n, float center)

❓ Do we need shift and other bitwise operators? One could potentially use division/multiplication to emulate them, but there are subtleties with rounding and signs that might make it tricky.

  • I think if there is a use case common enough that people will need them, we should support them. Emulation seems impractical. -Tim
  • Right, does anybody know of a use case? -Mihael
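The rounding subtlety is easy to see for negative operands. In this Python sketch, truncating_div is a hypothetical helper that models division rounding toward zero, as in C or Java; whether Swift integer division behaves that way is an assumption here.

```python
def truncating_div(a: int, b: int) -> int:
    """Integer division that rounds toward zero, as in C or Java."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


shifted = -5 >> 1                   # arithmetic shift rounds toward minus infinity
truncated = truncating_div(-5, 2)   # truncating division rounds toward zero
floored = -5 // 2                   # Python's own // rounds toward minus infinity
```

For non-negative values the two approaches agree, which is why the mismatch tends to go unnoticed until a negative input appears.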

Conversion functions

int toInt(float x) 🍊

float toFloat(int z)

int parseInt(string s, int base = 10)

float parseFloat(string s)

string toString(int z)

string toString(float x)

string toString(boolean b)

❗ Swift/T includes a function repr that dumps any data type to a string in some implementation-specified form. This may be useful too. Also array_repr which is essentially map repr.

❗ isInt/isFloat/etc functions can be very useful.

String functions

string strcat(string... s) 🍊

❗ I like Python's use of negative indices to signify an index relative to the end of the string. However, that doesn't work well with exclusive end indices (like Java's). Perhaps, a compromise solution would be to only use -1 to signify length(str).

  • Why not? Python uses exclusive end indices. - Tim
  • Because if s.charAt(-1) is the last character then in an end-exclusive convention, s.substring(0, 0) would represent the whole string. If -1 represents the length, then s == s.substring(0, -1), but the last character is s.charAt(-2). - Mihael
  • In an end-exclusive convention, s.substring(0, 0) should be an empty string though. s.substring(0, 1) is the first character only. I tested this in both Python and Java. There is an issue though, that Python's slice syntax treats a non-existent end index differently from -1. s[0:] is all n characters of the string, s[0:0] is the empty string, s[0:-1] is the first n - 1 characters of the string. Maybe the crux of it is that we can't emulate Python's negative indices by mapping an absent end index to -1? - Tim
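The Python behaviour described in this thread, next to a sketch of the compromise in which -1 stands only for the string length. The substring function here is a hypothetical model of that compromise, not an agreed definition.

```python
s = "abcdef"

whole = s[0:]            # absent end index: the slice runs to the end
empty = s[0:0]           # end-exclusive, so equal start and end is empty
all_but_last = s[0:-1]   # -1 counts from the end, dropping the last character
last_char = s[-1]        # a negative index addresses from the end


def substring(text: str, start: int, end: int = -1) -> str:
    """Hypothetical substring() in which -1 means len(text) and nothing else."""
    if end == -1:
        end = len(text)
    return text[start:end]
```

Under the compromise, substring(s, 0, -1) returns the whole string, whereas the Python slice s[0:-1] drops the last character. That difference is the mismatch being discussed.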

int length(string s) 🍊

string[] split(string s, string delimiter, int max = -1)

string[] splitRe(string s, string regexp, int max = -1)

like split(), except the delimiter is a regular expression

string trim(string s)

string substring(string s, int start, int end = -1)

❗ I frequently make the mistake of writing subString in Java. Maybe that is a choice we should consider.

  • I think substring is fine, personally - Tim

string toUpper(string s)

string toLower(string s)

string join(string[] sa, string delimiter) ❗ Overloading with the array join operation may be confusing. stringJoin?

string replaceAll(string s, string find, string replacement, int start = 0, int end = -1)

string replaceAllRe(string s, string findRe, string replacementRe, int start = 0, int end = -1)

❗ This is meant to allow the use of capture groups to, for example, do replacements of the sort "key1=v1, key2=v2,..." → "rep1=v1, rep2=v2,...".
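For comparison, this is how the capture-group replacement looks with Python's re module. The pattern and the backreference syntax are Python's; the syntax replaceAllRe would accept is not specified here.

```python
import re

text = "key1=v1, key2=v2"

# Group 1 captures the key number and group 2 the value; the replacement
# string refers back to them as \1 and \2.
result = re.sub(r"key(\d+)=(\w+)", r"rep\1=\2", text)
```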

int indexOf(string s, string find, int start = 0, int end = -1)

int lastIndexOf(string s, string find, int start = -1, int end = 0)

string format(string spec, any... args)

boolean matches(string s, string re)

string[] findAllRe(string s, string re)

❓ Not sure about this one, but because we would not otherwise offer a comprehensive regexp library, this could be used to return an array with all capture groups in a regexp. I do not see how this would be possible with any of the other functions.
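One possible reading of this function, sketched in Python: return the capture groups of the first match as an array of strings. Using only the first match, and returning an empty array when nothing matches, are both assumptions made for illustration.

```python
import re
from typing import List


def find_all_re(s: str, pattern: str) -> List[str]:
    """Capture groups of the first match of `pattern` in `s`, or [] if none."""
    match = re.search(pattern, s)
    if match is None:
        return []
    return list(match.groups())


parts = find_all_re("run-42.dat", r"(\w+)-(\d+)\.(\w+)")
```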

❓ do we need reverse?

💦 It's not very common as far as I can tell. Is there at least one use case in the whole history of Swift? -Mihael

Array functions

T[K] slice(T[K] a, int start, int end)

T[int][K] split(T[K], int n)

Splits an array into chunks of size n. The last element of the returned array could have fewer than n elements.

❓ if the array is sparse or not zero-based, are these indices the keys or the physical indices?

💦 The [int] keys label the chunks. The [K] indices are the actual keys. In other words it splits a hashtable into multiple hashtables without losing the mappings. -Mihael
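The key-preserving behaviour can be modelled with Python dictionaries standing in for Swift arrays. The name split_array is hypothetical, and walking the input in insertion order is an assumption, since the proposal does not fix an iteration order.

```python
from typing import Dict, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
T = TypeVar("T")


def split_array(a: Dict[K, T], n: int) -> Dict[int, Dict[K, T]]:
    """Split `a` into chunks of at most n entries, keeping the original keys."""
    chunks: Dict[int, Dict[K, T]] = {}
    for position, (key, value) in enumerate(a.items()):
        # position // n labels the chunk; the inner key is left untouched.
        chunks.setdefault(position // n, {})[key] = value
    return chunks


sparse = {10: "a", 20: "b", 30: "c", 40: "d", 50: "e"}
chunks = split_array(sparse, 2)
```

With a sparse input like this one, the outer keys are 0, 1 and 2 while the inner dictionaries still map 10, 20, 30, 40 and 50 to their values, and the last chunk holds a single entry.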

T[int] join(T[K1][K2] a)

❗ Joins a number of arrays into a single array. If an ordering exists on K1 and/or K2, it should be preserved.

T[int] join(T[K]... arrays)

❗ The overloading is ambiguous. In the second one T could be an array type.

💦 True.

T[int] compact(T[K] a)

❗ Returns an int-indexed array containing all elements of T[K]. The exact mapping between the keys in the initial array and the integer keys is not specified, but it is guaranteed that the same input will always return the same output.

❓ Why not guarantee that the order is stable?

💦 What would "stable" mean here? -Mihael

🎅 If (k1, v1) and (k2, v2) are key/value pairs in the input and (k1', v1), (k2', v2) are key/value pairs in the output, then k1 < k2 <=> k1' < k2' . I.e. the order is preserved. It might be more practical to give the implementation leeway, but I feel like some users might expect this behaviour.

💦 Sure. I think the problem is when there is no clear ordering on the keys. At least in theory, one can have struct types as keys, but I agree that if there is an ordering on K, then it should be "preserved" -Mihael
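A Python model of the order-preserving variant, with a dictionary standing in for the array. It assumes the keys are mutually comparable; for key types with no ordering this approach does not apply and the mapping would stay implementation-defined.

```python
from typing import Any, Dict


def compact(a: Dict[Any, Any]) -> Dict[int, Any]:
    """Renumber entries 0..len(a)-1 in ascending key order."""
    return {new_key: a[old_key] for new_key, old_key in enumerate(sorted(a))}


compacted = compact({30: "c", 10: "a", 20: "b"})
```

Here k1 < k2 in the input implies k1' < k2' in the output, which is the property described above.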

T[K][int] zip(T[K] a, T[K] b)

❓ This is based on pairing up keys right? What is behaviour if matching keys not present? Discard them or raise an error?
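Both candidate behaviours can be put side by side in a Python model. The name zip_arrays and the strict flag are hypothetical and only illustrate the two options; neither is a decision.

```python
from typing import Any, Dict


def zip_arrays(a: Dict[Any, Any], b: Dict[Any, Any],
               strict: bool = False) -> Dict[Any, Dict[int, Any]]:
    """Pair up values by key; result[k] is {0: a[k], 1: b[k]}.

    strict=False discards keys present in only one input.
    strict=True raises KeyError if the key sets differ.
    """
    if strict and a.keys() != b.keys():
        raise KeyError("zip_arrays: inputs have different key sets")
    return {k: {0: a[k], 1: b[k]} for k in a if k in b}


left = {"x": 1, "y": 2, "z": 3}
right = {"x": 10, "y": 20}
paired = zip_arrays(left, right)  # "z" is silently dropped
```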

boolean contains(V A[K] array, K key) 🍊

Returns true if the array contains the key once the array is closed.

❗ Currently this doesn't return until the array is closed, even if the element is assigned. This seems bad.

boolean exists(V A[K] array, K key) 🍊

Returns true if the array contains the key whenever the function runs (maybe before the array is closed).

❗ Non-deterministic.

I/O

any read(file f, string format="None")

❗ Reads data from a file. A minimum of the following formats should be supported:

  • "None" - read the entire file as a string
  • "FieldAndValue" - field = value, one per line

Other possible/desired formats:

  • "CSV" - read a CSV file (for backwards compatibility with K's readData)
  • "JSON"
    • specify readData() compatibility more clearly. May want to leave readData() for a while to ease transition? - Mike

❗ Should format be an enum or set of constants? - Tim
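A minimal Python sketch of what parsing the "FieldAndValue" format might involve. Skipping blank lines, trimming whitespace, splitting on the first "=" only, and returning every value as a string are all assumptions; the format is described above only as field = value, one per line.

```python
from typing import Dict


def parse_field_and_value(text: str) -> Dict[str, str]:
    """Parse lines of the form `field = value` into a dictionary of strings."""
    fields: Dict[str, str] = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        name, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"line {line_number}: expected 'field = value'")
        fields[name.strip()] = value.strip()
    return fields


parsed = parse_field_and_value("name = run1\nsteps=100\n\npath = a=b\n")
```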

file write(any data, string format="None")

The inverse of read(), with relevant constraints on what "None" and "CSV" can do.

string getEnv(string name)

❓ should we disambiguate between an undefined environment variable and an environment variable set to the empty string, e.g. with an extra output argument? 99% of the time the difference isn't important, but some applications may want to handle them differently. May not be worth the hassle, or we could provide a separate function to check if it was defined.
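The separate-function option could look like this Python model. Both get_env and is_env_defined are hypothetical names; the sketch only shows that an unset variable and an empty one are indistinguishable through the first function and distinguishable through the second.

```python
import os


def get_env(name: str) -> str:
    """Value of the variable, or the empty string if it is not defined."""
    return os.environ.get(name, "")


def is_env_defined(name: str) -> bool:
    """True if the variable exists, even when its value is empty."""
    return name in os.environ
```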

Blobs

Swift/T supports a range of functions for working with binary blobs. I've included the major functions here for reference.

❗ This is a very low priority to port to Swift/K.

(int o) blobSize(blob b)

Size in bytes

(blob o) blobNull()

Zero-length blob

(blob o) blobFromString(string s)

Conversion function. Includes null terminator in blob.

(string o) stringFromBlob(blob b)

Conversion function. Expects null terminator in blob.
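The null-terminator convention for the two string conversions can be modelled with Python bytes. UTF-8 encoding and raising an error on a missing terminator are assumptions; the text above only says the terminator is included and expected.

```python
def blob_from_string(s: str) -> bytes:
    """Encode the string and append a null terminator."""
    return s.encode("utf-8") + b"\x00"


def string_from_blob(b: bytes) -> str:
    """Decode a blob that ends in a null terminator."""
    if not b.endswith(b"\x00"):
        raise ValueError("blob does not end with a null terminator")
    return b[:-1].decode("utf-8")


def blob_size(b: bytes) -> int:
    """Size in bytes, including any terminator."""
    return len(b)
```

One consequence is that the blob for a three-character ASCII string has size 4, and the empty string maps to a one-byte blob and not to the zero-length blob.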

(blob o) blobFromFloats(float f[])

(blob o) blobFromInts(int i[])

(float f[]) floatsFromBlob(blob b)

(blob o) blobRead(file f)

(file f) blobWrite(blob b)

(blob o) blobZeroesFloat(int n)

Assertions

Assertions may not be to everyone's taste, but they can be very useful, especially in tests.

Input argument types are tricky. assertEqual requires equality comparison to be supported. assertLT/assertLTE require the type to be one with a logical order. Currently may be best to support a limited set of primitive types.

(void o) assert(boolean condition, string msg="assertion failed")

(void o) assertEqual(string|int|float|boolean v1, string|int|float|boolean v2, string msg="assertion failed")

(void o) assertLT(string|int|float|boolean v1, string|int|float|boolean v2, string msg="assertion failed")

(void o) assertLTE(string|int|float|boolean v1, string|int|float|boolean v2, string msg="assertion failed")
