(require '[tech.v2.datatype :as dtype])
Support for the standard C datatypes and conversions between them.
user> (require '[tech.v2.datatype.casting :as casting])
nil
user> casting/base-datatypes
#{:int32 :int16 :float32 :float64 :int64 :uint64 :uint16 :int8 :uint32 :boolean :object
:uint8}
user> (casting/cast -1 :int8)
-1
user> (casting/cast -1 :uint8)
Execution error (ExceptionInfo) at tech.v2.datatype.casting/fn (casting.clj:266).
Value out of range for uint8: -1
user> (casting/unchecked-cast -1 :uint8)
255
Make any container you like and move data in between containers fluidly.
user> (dtype/make-container :typed-buffer :float32 5)
{:datatype :float32, :backing-store [0.0, 0.0, 0.0, 0.0, 0.0]}
user> (dtype/make-container :typed-buffer :float32 (range 5))
{:datatype :float32, :backing-store [0.0, 1.0, 2.0, 3.0, 4.0]}
user> (dtype/make-container :native-buffer :float32 (range 5))
#object[java.nio.DirectFloatBufferU 0x6156b247 "java.nio.DirectFloatBufferU[pos=0 lim=5 cap=5]"]
user> (dtype/make-container :sparse :float32 [1 0 1 0 1])
19-04-22 19:50:08 chrisn-lt-2 INFO [tech.jna.timbre-log:8] - Library c found at [:system "c"]
{:b-offset 0,
:b-elem-count 5,
:sparse-value 0.0,
:indexes [0 2 4],
:data [1.0 1.0 1.0],
:buffer-datatype :float32}
user> (dtype/as-list (dtype/make-container :typed-buffer :uint8 5))
[0 0 0 0 0]
user> (type *1)
it.unimi.dsi.fastutil.bytes.ByteArrayList
user> (dtype/as-nio-buffer (dtype/make-container :typed-buffer :uint8 5))
#object[java.nio.HeapByteBuffer 0xac2f857 "java.nio.HeapByteBuffer[pos=0 lim=5 cap=5]"]
(dtype/->sub-array (dtype/make-container :typed-buffer :uint8 5))
{:offset 0, :length 5, :java-array [0, 0, 0, 0, 0]}
user> (type (:java-array *1))
[B
user> (dtype/set-value! (dtype/make-container :typed-buffer :uint8 5) 2 138)
{:datatype :uint8, :backing-store [0, 0, -118, 0, 0]}
user> (dtype/set-value! (dtype/make-container :typed-buffer :int8 5) 2 120)
{:datatype :int8, :backing-store [0, 0, 120, 0, 0]}
user> (dtype/get-value (dtype/make-container :typed-buffer :int8 5) 2)
0
user> (dtype/get-value (dtype/make-container :typed-buffer :int8 (range 10)) 2)
2
Heavily optimized copying for bulk transfers. Use this also for completing lazy operations.
;; You will not see major efficiency gains until both sizes of the copy
;; operation are backed by convertible-to-nio storage. But copy
;; will still work.
user> (dtype/copy! (range 10) (float-array 10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
user> (type *1)
[F
;; Faster yet
user> (dtype/copy! (range 10) (float-array 10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
;; Fastest (by factor of 100 discounting array creation)
user> (dtype/copy! (float-array (range 10)) (float-array 10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
;; Also extremely fast! native->jvm-array transfer is an optimized
;; operation by the jna base system.
user> (dtype/copy! (dtype/make-container :native-buffer :float32 (range 10)) (float-array 10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
;;Not nearly as fast but allows marshalling between types...safely unless requested otherwise
user> (dtype/copy! (dtype/make-container :native-buffer :float32 [-1 0 1])
(dtype/make-container :list :uint8 3))
Execution error (ExceptionInfo) at tech.v2.datatype.writer$fn$reify__35539/write (writer.clj:141).
Value out of range for uint8: -1
user> (dtype/copy! (dtype/make-container :native-buffer :float32 [-1 0 1]) 0
(dtype/make-container :list :uint8 3) 0
3
{:unchecked? true})
{:datatype :uint8, :backing-store [-1 0 1]}
;; The repl doesn't know about unsigned types, but readers do:
user> (dtype/->reader *1)
[255 0 1]
;; An important form of copying is when you have a heterogeneous source of
;; data, like a sequence of sequences. This is called copy-raw->item!
user> (def start-data (partition 3 (range 9)))
#'user/start-data
user> start-data
((0 1 2) (3 4 5) (6 7 8))
user> (float-array start-data)
Execution error (ClassCastException) at user/eval53760 (form-init6919172943829999960.clj:111).
clojure.lang.LazySeq cannot be cast to java.lang.Number
;; Copy raw->item! returns the item first, and the final offset second so you can check that it did
;; indeed copy the amount you intended it to.
user> (dtype/copy-raw->item! start-data (float-array 9))
[[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0] 9]
user>
A small sample of what is available. Basic elementwise operations along with some statistics and a fixed rolling windowing facility.
Rules are applied in order.
- If the arguments contain an iterable, an iterable is returned.
- If the arguments contain a reader, a reader is returned.
- Else a scalar is returned.
- All arguments are casted to the widest datatype present.
- Readers implement List and RandomAccess so they look like persistent vectors in the repl.
user> (require '[tech.v2.datatype.functional :as dtype-fn])
nil
user> (def test-data (dtype-fn/+ (range 10 0 -1) 5))
#'user/test-data
user> test-data
(15 14 13 12 11 10 9 8 7 6)
user> (dtype-fn/argsort test-data)
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
;; Make sequence a vector to allow indexing
user> (dtype-fn/indexed-reader *1 (vec test-data))
[6 7 8 9 10 11 12 13 14 15]
;; Inline declare a fully typed function to perform a thing producing
;; a new reader.
user> (dtype-fn/unary-reader :float32 (* x 3) *1)
[18.0 21.0 24.0 27.0 30.0 33.0 36.0 39.0 42.0 45.0]
user> (dtype-fn/binary-reader :float32 (+ x y) *1 *1)
[36.0 42.0 48.0 54.0 60.0 66.0 72.0 78.0 84.0 90.0]
;; Readers all support creating a reader of another type from
;; the current reader:
user> (dtype/->reader *1 :int32)
[36 42 48 54 60 66 72 78 84 90]
;; Kixi.stats is used for stats. We also include apache commons-math3
;; so it is available for anything kixi doesn't provide.
user> (dtype-fn/mean *1)
63.0
;; All stats are lifted into the functional namespace so you don't need to
;; include this file specifically.
user> (->> (ns-publics 'tech.v2.datatype.statistics)
(map first)
(sort-by name))
(geometric-mean
harmonic-mean
kendalls-correlation
kurtosis
kurtosis-population
mean
median
pearsons-correlation
skewness
skewness-population
spearmans-correlation
standard-deviation
standard-deviation-population
standard-error
variance
variance-population)
user> (dtype-fn/variance *3)
9.166666666666666
user> (dtype-fn/fixed-rolling-window 3 dtype-fn/min test-data)
(14 13 12 11 10 9 8 7 6 6)
user> (dtype-fn/fixed-rolling-window 3 dtype-fn/max test-data)
(15 15 14 13 12 11 10 9 8 7)
user> (dtype-fn/fixed-rolling-window 3 #(apply + %) test-data)
(44 42 39 36 33 30 27 24 21 19)
All of the constructors for the types are in tech.v2.datatype.datetime
. All your
favorite java.time
types are there. If you are unfamiliar with these types
please check out the datetime docs. Also in that namespace are
scalar conversions between the types and pack/unpack operations if you want to
store some of your types in data somewhere as integers.
All of the operations you can do once you have one or more of these types are
under tech.v2.datatype.datetime.operations
. You can do things like
get-day
and plus-days
in both scalar and vectorized formats:
user> (require '[tech.v2.datatype.datetime :as dtype-dt])
nil
user> (require '[tech.v2.datatype.datetime.operations :as dtype-dt-ops])
nil
user> (dtype-dt/zoned-date-time)
#object[java.time.ZonedDateTime 0x1474b1 "2020-03-31T15:36:03.063-06:00[America/Denver]"]
user> (dtype-dt/local-date)
#object[java.time.LocalDate 0x72875089 "2020-03-31"]
user> (def ld (dtype-dt/local-date))
#'user/ld
user> (dtype-dt-ops/plus-days ld 1)
#object[java.time.LocalDate 0x5df0f8e4 "2020-04-01"]
user> (dtype-dt-ops/>= *1 ld)
true
user> (dtype-dt-ops/plus-days ld (range 10))
[#object[java.time.LocalDate 0x3a3eec48 "2020-03-31"] #object[java.time.LocalDate 0x4e09d01c "2020-04-01"] #object[java.time.LocalDate 0x5149b47 "2020-04-02"] #object[java.time.LocalDate 0x1ecd0a66 "2020-04-03"] #object[java.time.LocalDate 0x535ce95b "2020-04-04"] #object[java.time.LocalDate 0x53d55a1c "2020-04-05"] #object[java.time.LocalDate 0x72ba3d82 "2020-04-06"] #object[java.time.LocalDate 0x4d28789 "2020-04-07"] #object[java.time.LocalDate 0x3070a079 "2020-04-08"] #object[java.time.LocalDate 0x498c09d1 "2020-04-09"]]
user> (dtype-dt-ops/get-day-of-week *1)
[2 3 4 5 6 7 1 2 3 ]
user> (dtype-dt-ops/get-epoch-milliseconds ld)
1585690858819
user> (dtype-dt-ops/get-epoch-milliseconds
(dtype-dt-ops/plus-days ld (range 10)))
[1585690877710 1585777277710 1585863677711
1585950077711 1586036477711 1586122877711
1586209277711 1586295677711 1586382077711
1586468477712]
;; You can also make containers of these objects
user> (dtype/make-container :java-array :local-time 2)
[#object[java.time.LocalTime 0x67a430c1 "15:50:30.971"],
#object[java.time.LocalTime 0x4498f90d "15:50:30.971"]]
user> (dtype/make-container :java-array :instant 2)
[#object[java.time.Instant 0xb9b429d "2020-03-31T21:50:36.517Z"],
#object[java.time.Instant 0x6e907408 "2020-03-31T21:50:36.517Z"]]
Generic N-dimensional support built on top of readers. The combination of (something convertible to) a reader/writer and a dimension object gets you a tensor.
user> (require '[tech.v2.tensor :as tens])
nil
;; Make a tensor out of raw data. The default datatype is double.
user> (println (tens/->tensor (partition 3 (range 9))))
#tech.v2.tensor<float64>[3 3]
[[0.000 1.000 2.000]
[3.000 4.000 5.000]
[6.000 7.000 8.000]]
nil
;; Make a tensor out of a reader
user> (println (tens/reshape (vec (range 9)) [3 3]))
#tech.v2.tensor<object>[3 3]
[[0 1 2]
[3 4 5]
[6 7 8]]
;; Cloning allows you to efficiently complete a tensor reader sequence
user> (def added-tens (dtype-fn/+ (tens/reshape (vec (range 9)) [3 3]) 2))
#'user/added-tens
;; added-tens in this case just contains a reader that will lazily evaluate the
;; above expression when necessary.
user> (println added-tens)
#tech.v2.tensor<object>[3 3]
[[2 3 4]
[5 6 7]
[8 9 10]]
nil
;; Because it is pure lazy reader, you can't write to this tensor.
user> (tens/mutable? added-tens)
false
;; But now we clone, which creates a new tensor and forces the operation.
;; There is also tensor-force which does different things for dense or sparse.
user> (def completed (tens/clone added-tens))
#'user/completed
user> (tens/mutable? completed)
true
user> (def forced (tens/tensor-force added-tens))
#'user/forced
user> (tens/mutable? forced)
true
user> (def test-tens (tens/->tensor (partition 3 (range 9))))
#'user/test-tens
user> (println (tens/transpose test-tens [1 0]))
#tech.v2.tensor<float64>[3 3]
[[0.000 3.000 6.000]
[1.000 4.000 7.000]
[2.000 5.000 8.000]]
nil
user> (println (tens/rotate test-tens [1 0]))
#tech.v2.tensor<float64>[3 3]
[[6.000 7.000 8.000]
[0.000 1.000 2.000]
[3.000 4.000 5.000]]
nil
user> (println (tens/select test-tens [1 0] :all))
#tech.v2.tensor<float64>[2 3]
[[3.000 4.000 5.000]
[0.000 1.000 2.000]]
nil
user> (println (dtype-fn/+ 2 (tens/select test-tens [1 0] :all)))
#tech.v2.tensor<float64>[2 3]
[[5.000 6.000 7.000]
[2.000 3.000 4.000]]
user> (tens/->jvm test-tens)
[[0.0 1.0 2.0] [3.0 4.0 5.0] [6.0 7.0 8.0]]
user> (tens/->jvm test-tens :datatype :int32
:base-storage :java-array)
[[0, 1, 2] [3, 4, 5] [6, 7, 8]]
user> (tens/->jvm test-tens :datatype :int32
:base-storage :persistent-vector)
[[0 1 2] [3 4 5] [6 7 8]]
;; matrix multiply uses blas via jna. Anything dense will go the blas path
;; as the cost to force or complete somthing is roughly O(N) while the cost
;; of the matrix multiply is O(N*M) or roughly O(N^2). So we can afford
;; to copy the data in order to make the matmul efficient.
;; That being said, if you are going to do a lot of these it would be wise
;; to force the inputs outside of the matrix multiply method.
user> (println (tens/matrix-multiply test-tens test-tens))
19-04-22 20:46:12 chrisn-lt-2 INFO [tech.jna.timbre-log:8] - Library blas found at [:system "blas"]
#tech.v2.tensor<float64>[3 3]
[[15.000 18.000 21.000]
[42.000 54.000 66.000]
[69.000 90.000 111.000]]
nil
;; matrix-multiply takes an option scalar alpha that is multiplied into the result.
user> (println (tens/matrix-multiply test-tens test-tens 3.0))
#tech.v2.tensor<float64>[3 3]
[[ 45.000 54.000 63.000]
[126.000 162.000 198.000]
[207.000 270.000 333.000]]
nil