forked from snowplow/snowplow
-
Notifications
You must be signed in to change notification settings - Fork 0
/
CHANGELOG
763 lines (713 loc) · 39.3 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
Version 0.9.6 (2014-07-26)
--------------------------
Java Tracker: bumped git submodule to 0.4.0 (#892)
EmrEtlRunner: bumped to 0.9.0
EmrEtlRunner: passed etl_tstamp into Hadoop Enrich as an argument (#396)
EmrEtlRunner: removed enrichment-specific code (#811)
EmrEtlRunner: removed enrichment-specific parameters from config.yml.sample (#809)
EmrEtlRunner: replaced enrichment-specific arguments from EmrEtlRunner (#808)
EmrEtlRunner: removed %3D code following Scalding upgrade (#849)
EmrEtlRunner: fixed contract on partition_by_run (#894)
EmrEtlRunner: updated Bash script to support enrichments path (#916)
StorageLoader: bumped to 0.3.1
StorageLoader: now looking in eu-west-1 region for s3://snowplow-hosted-assets (#895)
StorageLoader: updated combined Bash script to support enrichments path (#917)
Scala Hadoop Enrich: bumped to 0.6.0
Scala Hadoop Enrich: bumped Scala to 2.10.4 (#912)
Scala Hadoop Enrich: bumped Scalding to 0.11.1 (#911)
Scala Hadoop Enrich: bumped Hadoop to 1.2.1 (#913)
Scala Hadoop Enrich: bumped to Scala Common Enrich 0.5.0 (#788)
Scala Hadoop Enrich: passed etl_tstamp into Scala Common Enrich (#817)
Scala Hadoop Enrich: removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#835)
Scala Hadoop Enrich: removed %3D handling for compatibility with old Scalding Args (#850)
Scala Hadoop Enrich: added ability to download additional MaxMind databases (#885)
Scala Hadoop Enrich: added runHadoop and Tool.main tests (#914)
Scala Common Enrich: bumped to 0.5.0
Scala Common Enrich: bumped user-agent-utils version, thanks @pkallos! (#662)
Scala Common Enrich: bumped referer-parser to 0.2.2 (#864)
Scala Common Enrich: bumped httpclient to 4.3.3 (#897)
Scala Common Enrich: bumped scala-maxmind-geoip to scala-maxmind-iplookups 0.1.0 (#882)
Scala Common Enrich: stored etl_tstamp in new field in CanonicalOutput (#818)
Scala Common Enrich: removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#836)
Scala Common Enrich: made referer parsing configurable with list of internal domains (#857)
Scala Common Enrich: migrated configurable enrichments to new EnrichmentRegistry (#858)
Scala Common Enrich: added validation of enrichments JSON (#807)
Scala Common Enrich: replaced "anon_ip_quartets" with "anon_ip_octets" everywhere (#547)
Scala Common Enrich: added ability to extract event_id from querystring (#723)
Scala Common Enrich: extracted CanonicalInput's userId as network_userid, thanks @pkallos! (#855)
Scala Common Enrich: added MaxMind region_name field (#873)
Scala Common Enrich: added IP -> ISP lookup (#861)
Scala Common Enrich: added IP -> organization lookup (#887)
Scala Common Enrich: added IP -> domain lookup (#886)
Scala Common Enrich: added IP -> net speed lookup (#889)
Scala Common Enrich: added validation for transaction ID (#428)
Scala Common Enrich: renamed Tests to Specs for consistency (#618)
Scala Hadoop Shred: bumped to 0.2.0
Scala Hadoop Shred: bumped to Scala Common Enrich 0.5.0 (#918)
Scala Hadoop Shred: trailing empty fields no longer cause shredding for that row to fail (#921)
Scala Hadoop Shred: updated column offsets for enriched events TSV (#915)
Redshift: bumped table-def to 0.4.0
Redshift: migration script added for 0.3.0 to 0.4.0
Redshift: added etl_tstamp to atomic.events (#819)
Redshift: removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#834)
Redshift: added new MaxMind fields (#871)
Redshift: applied runlength encoding to all fields keyed off IP address (#883)
Redshift: migration script added for 0.3.0 to 0.4.0 (#838)
Postgres: bumped table-def to 0.3.0
Postgres: migration script added for 0.2.0 to 0.3.0
Postgres: added etl_tstamp to atomic.events (#820)
Postgres: removed event_vendor and ue_name and renamed ue_properties to unstruct_event (#833)
Postgres: added new MaxMind fields (#871)
Postgres: migration script added for 0.2.0 to 0.3.0 (#837)
Version 0.9.5 (2014-07-09)
--------------------------
Ruby Tracker: added git submodule. Version 0.1.0 (#645)
Java Tracker: added git submodule. Version 0.2.0 (#843)
JavaScript Tracker: bumped git submodule to 2.0.0 (#635)
Python Tracker: bumped Python Tracker git submodule to 0.4.0 (#634)
Scala Hadoop Shred: added. Version 0.1.0
EmrEtlRunner: bumped to 0.8.0
EmrEtlRunner: updated S3DistCp steps to use new S3DistCpStep from Elasticity (#629)
EmrEtlRunner: added --skip s3distcp option (#313)
EmrEtlRunner: added ability to start Lingual in EmrEtlRunner (#623)
EmrEtlRunner: added ability to start HBase in EmrEtlRunner (#622)
EmrEtlRunner: improved load performance by switching ETL to write out to HDFS (#278)
EmrEtlRunner: now invoking Scala Hadoop Shredder after main job (#644)
EmrEtlRunner: added :iglu: section to config.yml for Scala Hadoop Shred (#814)
EmrEtlRunner: updated to run Scala Hadoop Shred following Hadoop Enrich (#815)
EmrEtlRunner: added --skip shred option (#659)
StorageLoader: bumped to 0.3.0
StorageLoader: bumped Sluice to 0.2.1 (#881)
StorageLoader: added initial Ruby.contracts support (#391)
StorageLoader: updated config.yml to support shredding (#897)
StorageLoader: added ACCEPTINVCHARS to StorageLoader (#411)
StorageLoader: wrote JSON Path files for ad_* events (#642)
StorageLoader: wrote JSON Path file for link_click (#599)
StorageLoader: wrote JSON Path file for screen_view (#643)
StorageLoader: wrote JSON Path file for schema.org's WebPage (#772)
StorageLoader: added :jsonpath_assets: setting for StorageLoader (#606)
StorageLoader: added ability to load custom tables using JSON Paths (#607)
StorageLoader: added --skip shred option (#660)
StorageLoader: added :in: hint on StorageLoader configuration, thanks @joaolcorreia! (#755)
Redshift: added Redshift DDL for ad_* events (#639)
Redshift: added Redshift DDL for link_click events (#600)
Redshift: added Redshift DDL for screen_view events (#640)
Redshift: added Redshift DDL for schema.org's WebPage (#771)
Looker Analytics: wrote LookML for ad_* events (#605)
Looker Analytics: wrote LookML for screen_view events (#637)
Looker Analytics: wrote LookML for link_click events (#636)
Looker Analytics: wrote LookML for schema.org's WebPage (#770)
Looker Analytics: updated LookML to use liquid templating (#851)
Version 0.9.4 (2014-05-30)
---------------------------
Redshift: added reference_data.country_codes (#779)
Postgres: added reference_data.country_codes (#781)
Looker Analytics: New 'traffic_pulse' dashboard with globally configurable drill-down variables (#765)
Looker Analytics: Snowplow website specific dimensions and metrics removed: base model is now company-generic (#764)
Looker Analytics: cleaner joining of data sets in Looker model (#763)
Looker Analytics: dimensions and metrics renamed to make it clearer for an analyst getting started with the data (#761)
Looker Analytics: added distkeys and sortkeys to derived tables to speed up query times (#696)
Looker Analytics: derived tables now auto-generated when new data is loaded into atomic.events (#688)
Looker Analytics: 'visits' renamed to 'sessions' (#762)
Looker Analytics: LookML models versioned using SchemaVer (#766)
Version 0.9.3 (2014-05-21)
--------------------------
EmrEtlRunner: bumped to 0.7.0
EmrEtlRunner: bumped Sluice to 0.2.1 (#405)
EmrEtlRunner: bumped Elasticity to 3.0.4 (#665)
EmrEtlRunner: replaced hadoop_version setting with ami_version setting (#701)
EmrEtlRunner: fixed handling of region, placement and ec2_subnet_id (#754)
EmrEtlRunner: fixed regression where 0 files staged still kicks off EMR (#409)
EmrEtlRunner: stopped Sluice file operation threads being killed by folders (#401)
EmrEtlRunner: fixed disabling of Cascading error catching (#721)
EmrEtlRunner: renamed Clojure Collector log files in processing bucket to support multiple instances (#717)
EmrEtlRunner: added initial Ruby.contracts support into EmrEtlRunner (#392)
EmrEtlRunner: updated to use the Ruby Logger (#194)
EmrEtlRunner: updated so it's embeddable in other applications (#128)
EmrEtlRunner: added ability to bundle as a JRuby fat jar (#674)
EmrEtlRunner: added initial unit tests (#672)
Clojure Collector: bumped to 0.6.0
Clojure Collector: load balancer IP address getting stored in logs (#719)
Documentation: removed all Snowplow tracking from READMEs, thanks @acinader! (#720)
Documentation: fixed EmrEtlRunner documentation is (slightly) inconsistent, thanks @pvdb! (#749)
Version 0.9.2 (2014-04-30)
--------------------------
Scala Hadoop Enrich: bumped to 0.5.0
Scala Hadoop Enrich: bumped to Scala Common Enrich 0.4.0 (#699)
Scala Hadoop Enrich: bumped SBT to 0.13.2 (#702)
Scala Hadoop Enrich: bumped to using using sbt-assembly 0.11.2 (#704)
Scala Common Enrich: bumped to 0.4.0
Scala Common Enrich: upgraded to support new and future CloudFront file formats (#698)
Scala Common Enrich: bumped SBT to 0.13.2 (#703)
Scala Hadoop Bad Rows: added. Version 0.1.0
Hive Storage: bumped table-def.q to 0.1.0
Hive Storage: added new unstructured fields to Hive table definition (#709)
Hive Storage: added raw page_url and page_referrer into Hive table (#710)
Hive Storage: added name_tracker field to Hive table (#711)
Version 0.9.1 (2014-04-11)
--------------------------
Scala Hadoop Enrich: bumped to 0.4.0
Scala Hadoop Enrich: bumped to Scala Common Enrich 0.3.0 (#497)
Scala Hadoop Enrich: renamed AnonQuartets to AnonOctets (#498)
Scala Hadoop Enrich: renamed all Snowplow Hadoop Tests to Specs (#515)
Scala Hadoop Enrich: added page_url and page_referrer back into ETL's output (#483)
Scala Common Enrich: bumped to 0.3.0
Scala Common Enrich: bumped Argonaut to 6.0.3 (#620)
Scala Common Enrich: added app and mob as valid platform codes, thanks @kinabalu! (#524)
Scala Common Enrich: added support for remaining platform codes (#516)
Scala Common Enrich: updated POJO in Scalding ETL to include new unstructured fields (#362)
Scala Common Enrich: updated POJO in Scalding ETL to include name_tracker field (#595)
Scala Common Enrich: extract evn from Tracker Protocol (#604)
Scala Common Enrich: extract tna from Tracker Protocol (#616)
Scala Common Enrich: extract and validate unstructured events (#142)
Scala Common Enrich: extract and validate custom contexts (#426)
Scala Common Enrich: reformat incoming event and context JSONs (#589)
Scala Common Enrich: make sure to error a JSON if > length (#567)
EmrEtlRunner: bumped to 0.6.0
EmrEtlRunner: bumped Elasticity to 3.0.2 (#587)
EmrEtlRunner: allowed AWS VPC selection in EmrEtlRunner (#581)
EmrEtlRunner: set :visible_to_all_users to true for EMR jobs, thanks @smugryan! (#560)
Redshift: atomic-def script bumped to 0.3.0
Redshift: migration script added for 0.2.2 to 0.3.0
Redshift: added new unstructured fields to Redshift table definition (#361)
Redshift: changed distkey to be event_id, not domain_userid (#584)
Redshift: added raw page_url and page_referrer into Redshift table (#591)
Redshift: added name_tracker field to Redshift table (#594)
Redshift: converted Redshift varchar(38) for event IDs to char(36) (#282)
Postgres: atomic-def script bumped to 0.2.0
Postgres: migration script added for 0.1.x to 0.2.0
Postgres: added new unstructured fields to Postgres table definition (#359)
Postgres: added raw page_url and page_referrer into Postgres table (#592)
Postgres: added name_tracker field to Postgres table (#593)
Postgres: converted varchar(36) for event IDs to char(36) (#596)
StorageLoader: bumped to 0.2.0
StorageLoader: added TIMEFORMAT 'auto' to StorageLoader to handle outlier dvce_timestamps (#427)
JavaScript Tracker: bumped git submodule to 1.0.1 (#585)
Python Tracker: added git submodule pointing to 0.1.0 (#586)
Version 0.9.0 (2014-02-04)
--------------------------
Thrift Raw Event: added. Version 0.1.0
Thrift Raw Event: specified Thrift IDL for new raw event schema (#430)
Scala Stream Collector: added. Version 0.1.0
Scala Stream Collector: implemented new spray-can (Akka Http) Scala stream collector (#432)
Scala Kinesis Enrich: added. Version 0.1.0
Scala Kinesis Enrich: implemented initial Kinesis-based enrichment (#460)
Scala Common Enrich: bumped to 0.2.0
Scala Common Enrich: added Thrift SnowplowRawEvent as a dependency to common-enrich (#475)
Scala Common Enrich: added ability to read Thrift SnowplowRawEvent (Thrift) (#462)
Scala Common Enrich: renamed CloudFront to Cloudfront in code (#495)
Scala Common Enrich: renamed AnonQuartets to AnonOctets (#491)
Scala Common Enrich: added raw -> CanonicalInput tests (#484)
Scala Common Enrich: updated GET payload extraction to handle empty payloads (#502)
Git submodules: changed git:// protocol in .gitmodules to https:// (#512)
NodeJS Collector: removed contrib-nodejs-collector from 2-collectors (#474)
JavaScript Tracker: bumped JS Tracker submodule to 0.13.1 release (#511)
Version 0.8.13 (2014-01-08)
---------------------------
Looker Analytics: added 0.1.0
Looker Analytics: created Snowplow metadata model for Looker BI (www.looker.com) (#472)
Version 0.8.12 (2014-01-07)
---------------------------
Hadoop ETL: bumped to 0.3.6
Hadoop ETL: bumped to SBT 0.13.0 (#404)
Hadoop ETL: bumped to using sbt-assembly 0.10.1 (#421)
Hadoop ETL: bumped to Scala 2.10.3 (#423)
Hadoop ETL: bumped to Scalding 0.8.11 (#422)
Hadoop ETL: upgraded useragent utils to 1.11 & moved to Maven dependency (#416)
Hadoop ETL: added test running back into sbt-assembly step (#420)
Hadoop ETL: updated copyright messages to be Snowplow not SnowPlow, and to 2014 not 2013 (#419)
Hadoop ETL: added ValidatedString as a type to package.scala (#328)
Hadoop ETL: added missing validation to stringToJByte (#408)
Hadoop ETL: missing page URI no longer interpreted as bad row (#399)
Hadoop ETL: updated CfRegex to reflect Cfcs(Cookie) can be empty (#410)
Hadoop ETL: numeric fields in tr_ and ti_ now parsed to doubles, not madeTsvSafe strings (#400)
Hadoop ETL: moved ETL core into separate project scala-enrich-common (#417)
Scala Common Enrich: updated ETL versioning to include host and common versions (#448)
Postgres: bumped cube-pages.sql to 0.1.1
Postgres: minor fix: cube_pages.complete referenced non-existent table cube_pages.basic, thanks @mrwalker! (#414)
Version 0.8.11 (2013-10-22)
---------------------------
Hadoop ETL: bumped to 0.3.5
Hadoop ETL: added Argonaut 6.0 as a dependency (#342)
Hadoop ETL: added fromTimestamp to EventEnrichments (#340)
Hadoop ETL: added makeTsvSafe to ConversionUtils (#338)
Hadoop ETL: added JsonUtils (#323)
Hadoop ETL: added support for 3 and 4 return values from MapTransformer (#324)
Hadoop ETL: updated GetJsonPayload to use Argonaut and renamed to JsonPayload (#339)
Hadoop ETL: added ability to mask IP addresses in ETL (#309)
Hadoop ETL: refr_ and page_ fields now stored raw (#374)
Hadoop ETL: defensively fixed raw spaces in page and referer URLs (#346)
Hadoop ETL: fixed regression, single-encoded %s logic didn't account for % itself (#347)
Hadoop ETL: added unit tests for fixTabsNewlines (#332)
Hadoop ETL: tests now report the failing CanonicalOutput field (#325)
Hadoop ETL: now handling all fields double-encoded as per CloudFront post-14-September (#348)
Hadoop ETL: added support for 21 Oct CloudFront access log format (#384)
Hadoop ETL: added truncation to refr_term (#379)
Hadoop ETL: added truncation to se_label (#394)
Hadoop ETL: made all prior ME.identity fields TSV-safe (#395)
EmrEtlRunner: bumped to 0.5.0
EmrEtlRunner: bumped Sluice to 0.1.5 (#96)
EmrEtlRunner: bumped Elasticity to 2.6 (#345)
EmrEtlRunner: enabled EMR Job Flow debugging for easier access to logs (#279)
EmrEtlRunner: ETL job no longer fails if there's no data for last run period (#296)
EmrEtlRunner: empty processing dir check now works if dir contains 1 file (#326)
EmrEtlRunner: added ability to mask IP addresses in ETL (#309)
EmrEtlRunner: made the examples match what you get from git out of the box, thanks @shermozle (#331)
StorageLoader: bumped to 0.1.1
StorageLoader: bumped Sluice to 0.1.5 (#96)
StorageLoader: fixed "\" in fields acts as an escape character for Postgres, thanks @kingo55 (#329)
StorageLoader: added ability to --skip analyze (#335)
StorageLoader: moved VACUUM SORT ONLY to a --include step (#321)
StorageLoader: added COMPROWS to config and --include compupdate option (#344)
StorageLoader: changed Postgres VACUUM FULL to VACUUM (#357)
StorageLoader: added TRUNCATECOLUMNS for Redshift load (#360)
StorageLoader: added FILLRECORD to our Redshift COPY command (#380)
Postgres: fixed error in `recipes_basic.technology_mobile` recipe (#397)
Version 0.8.10 (2013-10-18)
---------------------------
Redshift: bumped table-def to 0.2.2
Redshift: moved events table to a new atomic schema in atomic-def.sql (#301)
Redshift: added migration script for 0.2.1 to 0.2.2
Redshift: added SQL DDL to define Redshift recipes (#297)
Redshift: added SQL DDL to define Redshift cubes (#298)
Postgres: bumped table-def to 0.1.1
Postgres: renamed table-def file to atomic-def.sql
Postgres: added migration script for 0.1.0 to 0.1.1
Postgres: moved NOT NULL constraint on event field to event_vendor field (#318)
Postgres: added SQL DDL to define Postgres recipes (#303)
Postgres: added SQL DDL to define Postgres cubes (#302)
Documentation: fixed wrong path to no-js-tracker subdirectory, thanks @gregakespret (#343)
Documentation: improved "Find out more" table in README, thanks @dideler (#353)
Version 0.8.9 (2013-09-05)
--------------------------
Hadoop ETL: bumped to 0.3.4
Hadoop ETL: updated to handle singly-encoded %s in CloudFront querystring field (#333)
Version 0.8.8 (2013-08-04)
--------------------------
JavaScript Tracker: moved into own repo (#277)
Hadoop ETL: bumped to 0.3.3
Hadoop ETL: URL-decodes "%3D" to "=" to allow Hive-style directory names as arguments (#305)
Hadoop ETL: bumped referer-parser to 0.1.1 to fix java.lang.NullPointerException (#314)
EmrEtlRunner: bumped to 0.4.0
EmrEtlRunner: bumped Sluice to 0.0.7 (#299)
EmrEtlRunner: removed :snowplow: section from config.yml.sample (#289)
EmrEtlRunner: simplified EmrEtlRunner and its config (#287)
EmrEtlRunner: added run= to timestamped ETL folder names (#294)
EmrEtlRunner: updated "Jobflow started" stdout message to include jobflow ID (#315)
Hive ETL: removed folder 3-enrich/hive-etl as no longer supported (#286)
Hive storage: updated hive-storage scripts to work with current Redshift-format flatfile (#290)
Infobright: removed folder 4-storage/infobright as not currently supported (#285)
Postgres: add Postgres table definition in atomic schema (#160)
StorageLoader: bumped to 0.1.0
StorageLoader: bumped Sluice 0.0.7 (#300)
StorageLoader: removed code to delete Hive ETL's empty event files (#306)
StorageLoader: fixed bug where download path has to be set (even when using Redshift) (#280)
StorageLoader: optimized ANALYZE and VACUUM commands (#283)
StorageLoader: added MAXERROR as StorageLoader configuration value for Redshift (#273)
StorageLoader: added support for loading Postgres (#161)
StorageLoader: removed Infobright loading capability (#307)
StorageLoader: added support for loading into multiple storage targets (#311)
Version 0.8.7 (2013-07-07)
--------------------------
JavaScript Tracker: bumped to 0.12.0
JavaScript Tracker: fixed document reference to use documentAlias (#247)
JavaScript Tracker: fixed bug with setCustomUrl (#267)
JavaScript Tracker: changed ev_ to se_ for structured events (#197)
JavaScript Tracker: fixed Firefox failure when "Always ask" set for cookies (#163)
JavaScript Tracker: fixed bug in page ping functionality detected in IE 8 (#260)
JavaScript Tracker: replaced forEach as not supported in IE 6-8 (#295)
EmrEtlRunner: fixed bug in config.yml.sample (#291)
Arduino tracker: added git submodule link (#292)
Version 0.8.6 (2013-06-03)
--------------------------
Hadoop ETL: bumped to 0.3.2
Hadoop ETL: bumped Scalding to 0.8.5
Hadoop ETL: bumped Scala version to 2.10.0
Hadoop ETL: bumped scala-maxmind-geoip to 0.0.5 to work with Scala 2.10.0
Hadoop ETL: bumped SBT from 0.12.1 to 0.12.3
Hadoop ETL: bumped Specs2 to 1.14
Hadoop ETL: replaced Bytes in CanonicalOutput with JBytes (#254)
Hadoop ETL: disabled "corruption" detection in ETL overriding custom URLs with longer collector referer URLs (#268)
EmrEtlRunner: bumped to 0.3.0
EmrEtlRunner: updated config.yml.sample to support spot task instances
EmrEtlRunner: let EmrEtlRunner use spot task instances (#193)
EmrEtlRunner: consolidate small files prior to running ETL job (#207)
Version 0.8.5 (2013-05-24)
--------------------------
Hadoop ETL: bumped to 0.3.1
Hadoop ETL: now supports downloading GeoLiteCity.dat from public S3 URL if needed, thanks @petervanwesep (part of #258)
Hadoop ETL: added Twitter Maven Repo as a resolution repo, thanks @rgabo (#239)
Hadoop ETL: stripping control characters in addition to tabs and newlines (#259)
Hadoop ETL: fixed issue with large values for se_value (#263)
Hadoop ETL: renamed ev_ fields in CanonicalOutput to se_
Hadoop ETL: extractResolution renamed and fails gracefully if view dimensions exceed Integer max size (#264)
EmrEtlRunner: bumped to 0.2.1
EmrEtlRunner: returns public S3 URL to GeoLiteCity.dat file if hosted by Snowplow, thanks @petervanwesep (part of #258)
Redshift: table-def script bumped to 0.2.1
Redshift: migration script added for 0.2.0 to 0.2.1
Redshift: bumped se_value from a float to a double
Redshift: increased size of `_urlport` fields, thanks @petervanwesep (#266)
Infobright: bumped setup_ and verify_infobright.sql to 0.0.9
Infobright: added migration script 0.0.8->0.0.9
Infobright: increased size of `_urlport` fields, thanks @petervanwesep (#266)
Version 0.8.4 (2013-05-16)
--------------------------
Hadoop ETL: bumped to 0.3.0
Hadoop ETL: added geo-ip lookup to Scalding ETL
Hadoop ETL: bumped referer-parser from 0.1.0-M6 to to 0.1.0
Hadoop ETL: removed truncation of page_referrer (#236)
Hadoop ETL: added truncation of referer path/qs/fragment (#235)
Hadoop ETL: removing tabs found in referer search terms (#234)
Hadoop ETL: fixed client timestamp so it's not incorrectly localised - thanks @rgabo (#238)
Hadoop ETL: added parsing of collector version `cv` (#243)
Hadoop ETL: bumped Scalaz from 7.0.0-M9 to 7.0.0
Hadoop ETL: removed .gets from extractPageUri (#249)
EmrEtlRunner: bumped to 0.2.0
EmrEtlRunner: now passes MaxMind .dat file into Scalding ETL (#213)
EmrEtlRunner: improve messages when ETL job starts and fails (#230)
Redshift: table-def script bumped to 0.2.0
Redshift: migration script added for 0.1.0 to 0.2.0
Redshift: added geo-ip fields to Redshift table definition (#226)
Redshift: rename ev_ fields to se_ for structured events (#227)
Version 0.8.3 (2013-05-14)
--------------------------
JavaScript Tracker: bumped to 0.11.2
JavaScript Tracker: added unstructured events, thanks @rgabo, @tarsolya, @lackac (#198)
JavaScript Tracker: remove leading ampersand in querystring (#188)
Clojure Collector: bumped to 0.5.0
Clojure Collector: upgraded to use Tomcat AccessLogValve 0.0.4 (#240)
Clojure Collector: now logging Clojure Collector and Tomcat AccessLogValve versions (#239)
Common: completed splitting custom event type into: unstructured and structured events (#133)
Version 0.8.2 (2013-05-08)
--------------------------
Clojure Collector: bumped to 0.4.0
Clojure Collector: remove duplicate of wrap-request-logging in middleware.clj (#221)
Clojure Collector: check/potentially bump lein-ring dependency in project.clj (#222)
Clojure Collector: simplify building Clojure Collector, thanks @butlermh (#223, #225)
Clojure Collector: fix Tomcat log bug of missing cs(Referer) (#220)
Version 0.8.1 (2013-04-12)
--------------------------
Hadoop ETL: bumped to 0.2.0
Hadoop ETL: break referer_url into constituent parts (part of #175)
Hadoop ETL: remove raw referrer_url (as no space in Redshift table defn) (part of #175)
Hadoop ETL: added referer parsing (#176)
Redshift: table-def script bumped to 0.1.0
Redshift: migration script added for 0.0.1 to 0.1.0
Redshift: add/update referer fields in Redshift table definition (#204)
Redshift: fix bug where mkt_source and mkt_medium are getting swapped around (#215)
Common: replaced embedded architecture images with CloudFront-hosted images
Common: completed rename of 3-etl to 3-enrich (#99)
Common: "SnowPlow" -> "Snowplow" in 1st and 2nd level READMEs
Version 0.8.0 (2013-04-03)
--------------------------
Hadoop ETL: added. Version 0.1.0 (#177)
Hadoop ETL: truncate 6 "high risk" fields for Redshift (raw useragent, page title etc) (#192)
Hadoop ETL: ev_value now extracted as a float (#201)
EmrEtlRunner: bumped to 0.1.0
EmrEtlRunner: updated to work with new config.yml fields (part of #178)
EmrEtlRunner: added support for Hadoop ETL (part of #178)
EmrEtlRunner: added run ID and human-friendly job name (#100)
EmrEtlRunner: added run IDs to output folders (Hadoop ETL only) (#79)
EmrEtlRunner: changed .rvmrc to .ruby-version, thanks @richo (part of #190)
StorageLoader: changed .rvmrc to .ruby-version, thanks @richo (part of #190)
StorageLoader: added final missing /Gemfile to BUNDLE_GEMFILE in Bash script, thanks @frutik (#206)
Common: started rename of 3-etl to 3-enrich (part of #99)
Version 0.7.6 (2013-03-03)
--------------------------
HiveQL: redshift-etl.q added. Version 0.0.1 (#174)
HiveQL: hive-rolling-etl.q renamed to hive-etl.q and bumped to 0.5.7
HiveQL: non-hive-rolling-etl.q renamed to mysql-infobright-etl.q and bumped to 0.0.8 (part of #172)
EmrEtlRunner: bumped to 0.0.9
EmrEtlRunner: renamed :snowplow: variable names and added new Redshift one in config.yml (part of #172)
EmrEtlRunner: updated to support Redshift as a storage format (#173)
EmrEtlRunner: added missing /Gemfile to BUNDLE_GEMFILE in Bash script
StorageLoader: bumped to 0.0.5
StorageLoader: added Redshift-specific fields to config.yml (part of #159)
StorageLoader: added Redshift load support into StorageLoader (part of #159)
StorageLoader: added missing /Gemfile to BUNDLE_GEMFILE in Bash scripts
Redshift: table-def.sql script added. Version 0.0.1 (#158)
Infobright: bumped setup_ and verify_infobright.sql to 0.0.8
Infobright: widened useragent field (#184)
Infobright: added migration script 0.0.7->0.0.8
Serde: fixed and enabled broken tests (#14). Version unchanged
Version 0.7.5 (2013-02-25)
--------------------------
JavaScript Tracker: bumped to 0.11.1
JavaScript Tracker: fixed bug with cookie secure flag killing user ID cookies (#181)
Version 0.7.4 (2013-02-22)
--------------------------
JavaScript Tracker: bumped to 0.11.0
JavaScript Tracker: introduced setAppId() and deprecated setSiteId() (#168)
JavaScript Tracker: 1st party user ID now transmitted as duid (domain uid) (part of #150)
JavaScript Tracker: now sends dtm - the client timestamp (#149)
JavaScript Tracker: deprecated and disabled attachUserId()
JavaScript Tracker: deprecated getVisitorId() and getVisitorInfo() - use getDomainUserId() and getDomainUserInfo() instead
JavaScript Tracker: add setUserId which sets the uid field (#167)
JavaScript Tracker: SnowPlow cookies no longer tied to site ID (#148)
Clojure Collector: bumped to 0.3.0
Clojure Collector: now append nuid (network aka 3rd party) user ID, not uid (#150)
Serde: bumped to 0.5.5
Serde: renamed tstamp field to dtm
Serde: dt and tm split into dvce_x and collector_x (#149)
Serde: extract new nuid and duid fields (#150)
Serde: renamed visit_id to domain_sessionidx (#171)
HiveQL: hive-rolling-etl.q bumped to 0.5.6
HiveQL: non-hive-rolling-etl.q bumped to 0.0.7
HiveQL: dt and tm split into dvce_x and collector_x (#149)
HiveQL: now extracts uid, nuid and duid (#150)
HiveQL: renamed visit_id to domain_sessionidx (#171)
Infobright: bumped setup_infobright.sql to 0.0.7
Infobright: renamed dt and tm to dvce_x and collector_x (#149)
Infobright: now supports uid, nuid and duid (#150)
Infobright: renamed visit_id to domain_sessionidx (#171)
Infobright: added migration script 0.0.6 CloudFront collector -> 0.0.7
Infobright: added migration script 0.0.6 Clojure collector -> 0.0.7
Version 0.7.3 (2013-02-15)
--------------------------
JavaScript Tracker: bumped to 0.10.0
JavaScript Tracker: updated copyright notices
JavaScript Tracker: removed deprecated setAccount(), setTracker(), setHeartBeatTimer() - BREAKING CHANGE (#86)
JavaScript Tracker: added document charset to querystring (#138)
JavaScript Tracker: page ping no longer killed by 1 heartbeat w/o activity (#132)
JavaScript Tracker: added document & viewport dimensions (#94)
JavaScript Tracker: introduced trackStructEvent and deprecated trackEvent (#143)
JavaScript Tracker: cleaned up getRequest code to use improved requestStringBuilder
JavaScript Tracker: fixed logImpression (was using wrong argument names) (#162)
JavaScript Tracker: added scroll offsets to page ping (#127)
Serde: bumped to 0.5.4
Serde: updated copyright notices
Serde: structured events now logged as "struct" not "custom" - DATA CHANGE
Serde: added setting of new event_vendor field (to com.snowplowanalytics) (#144)
Serde: added extraction of doc charset (#138)
Serde: added extraction of document & viewport dimensions (#94)
Serde: added extraction of scroll offsets for enhanced page ping (#127)
Serde: added extraction of URL components (#105)
HiveQL: hive-rolling-etl.q bumped to 0.5.5
HiveQL: non-hive-rolling-etl.q bumped to 0.0.6
HiveQL: updated copyright notices
HiveQL: now supports charset, document & viewport, URL components, event_vendor and enhanced page ping
Infobright: bumped setup_infobright.sql to 0.0.6
Infobright: updated copyright notices
Infobright: added migration scripts (0.0.4->.6; 0.0.5->.6)
Infobright: added charset, document & viewport, URL components, event_vendor enhanced page ping
Version 0.7.2 (2013-01-29)
--------------------------
No-JavaScript Tracker: added. Version 0.1.0
JavaScript Tracker: bumped to 0.9.1
JavaScript Tracker: fixed bug where secure flag not being set on cookies sent via HTTPS
Clojure Collector: bumped to 0.2.0
Clojure Collector: fixed Tomcat config issue of times being recorded in 12-hour clock
Serde: added NoJsTrackerTest
Serde: fixed CljTomcatFormatTest
Version 0.7.1 (2013-01-22)
--------------------------
EmrEtlRunner: bumped to 0.0.8
EmrEtlRunner: updated copyright notices
EmrEtlRunner: added .rvmrc file (part of #121, #84)
EmrEtlRunner: removed .gemspec file
EmrEtlRunner: added dependencies to Gemfile and re-generated Gemfile.lock
StorageLoader: bumped to 0.0.4
StorageLoader: updated copyright notices
StorageLoader: added .rvmrc file (part of #121, #84)
StorageLoader: removed .gemspec file
StorageLoader: added dependencies to Gemfile and re-generated Gemfile.lock
Documentation: updated to use `bundle install` (#122)
Version 0.7.0 (2013-01-04)
--------------------------
Clojure Collector: added. Version 0.1.0
HiveQL: hive-rolling-etl.q bumped to 0.5.4
HiveQL: non-hive-rolling-etl.q bumped to 0.0.5
HiveQL: v_collector now set via Hive variable, not Serde (#118)
EmrEtlRunner: bumped to 0.0.7
EmrEtlRunner: bumped to using Sluice 0.0.6
EmrEtlRunner: added "Complete" message at end of run (part of #97)
EmrEtlRunner: validates "clj-tomcat" as collector format (#119)
EmrEtlRunner: passes collector format through to HiveQL (#119)
EmrEtlRunner: support for log files generated by Clojure Collector on Tomcat (#117)
Serde: added broken CljTomcatFormatTest
StorageLoader: bumped to 0.0.3
StorageLoader: bumped to using Sluice 0.0.6
StorageLoader: added "Complete" message at end of run (part of #97)
StorageLoader: --skip argument now supports a list (#81)
Infobright: bumped setup_infobright.sql to 0.0.5
Infobright: added migration script (0.0.4 -> 0.0.5)
Infobright: user_id field widened to 38 chars to support UUID
Version 0.6.5 (2012-12-26)
--------------------------
JavaScript Tracker: bumped to 0.9.0
JavaScript Tracker: each event now sent with an event type `e` (#63)
JavaScript Tracker: refactoring of event definition code
JavaScript Tracker: added attachUserId(boolean) method (#92)
JavaScript Tracker: removed configCustomData from logImpression (#115)
JavaScript Tracker: cleaned up activity tracking (page pings)
JavaScript Tracker: added a combine only option to snowpak.sh
Serde: bumped to 0.5.3
Serde: now extracts event type (`e`) from querystring (#63)
Serde: now attaches UUID event_id to each event (#89)
Serde: added support for IP address override in querystring (#90)
Serde: no longer dies on corrupted querystring (#114)
HiveQL: hive-rolling-etl.q bumped to 0.5.3
HiveQL: non-hive-rolling-etl.q bumped to 0.0.4
HiveQL: event and event_id now extracted from Serde (#63, #89)
EmrEtlRunner: updated config file template
Version 0.6.4 (2012-12-20)
--------------------------
HiveQL: renamed table-def.q to non-hive-format-table-def.q
HiveQL: added hive-format-table-def.q (#111)
Infobright: bumped setup_infobright.sql to 0.0.4
Infobright: added migration script (0.0.3 -> 0.0.4)
Infobright: now supports long br_langs and urls (#107)
Infobright: removed lookup from fields which slow a large load (#107)
Version 0.6.3 (2012-12-18)
--------------------------
JavaScript Tracker: bumped to 0.8.2
JavaScript Tracker: fixed regressions from splitting JS into multiple files (#103)
HiveQL: hive-rolling-etl.q bumped to 0.5.2
HiveQL: addded missing comma in hive-rolling-etl.q (#112)
Version 0.6.2 (2012-11-29)
--------------------------
JavaScript Tracker: bumped to 0.8.1
JavaScript Tracker: fixed bug with trailing comma (#102)
JavaScript Tracker: removed console.log when not debugging (#101)
JavaScript Tracker: removed minified sp.js from version control (added .gitignore to keep it out)
SnowCannon: bumped submodule to latest shermozle/SnowCannon commit
Version 0.6.1 (2012-11-28)
--------------------------
JavaScript Tracker: bumped to 0.8.0
JavaScript Tracker: rename ice.png to i - BREAKING CHANGE (#29)
JavaScript Tracker: added setCollectorCf() and deprecated setAccount() (#32)
JavaScript Tracker: Tracker constructor now supports Cf or Url (part of #44)
JavaScript Tracker: getTrackerCf() and -Url() added, getTracker() deprecated (part of #44)
JavaScript Tracker: added tracker version (`tv`) to querystring (#41)
JavaScript Tracker: added color depth tracking (part of #69)
JavaScript Tracker: added timezone tracking (part of #69)
JavaScript Tracker: added user fingerprinting (#70)
JavaScript Tracker: broke out .js into multiple files (#55)
EmrEtlRunner: bumped to 0.0.6
EmrEtlRunner: --skip takes multiple args (part of #83, supercedes #80)
EmrEtlRunner: add --process-bucket to process a bucket directly (part of #83)
StorageLoader: bumped to 0.0.2
StorageLoader: changed the data file encloser to NULL (#88)
Serde: bumped to 0.5.2
Serde: now extracts color depth, timezone and fingerprint fields
Serde: added useragent into ETL (#68)
Serde: now extracts platform field
HiveQL: hive-rolling-etl.q bumped to 0.5.1
HiveQL: non-hive-rolling-etl.q bumped to 0.0.3
HiveQL: now extracts color depth, timezone and fingerprint fields
HiveQL: now includes raw useragent as a separate field (#68)
HiveQL: platform field no longer a placeholder
HiveQL: event_name field renamed to event (prep for #89)
HiveQL: added event_id as a placeholder
Infobright: bumped setup_infobright.sql to 0.0.3
Infobright: added migration script (0.0.1/2 -> 0.0.3)
Infobright: now includes color depth, timezone and fingerprint fields
Infobright: now includes raw useragent (#68)
Infobright: event_name field renamed to event
Infobright: added event_id as a placeholder (prep for #89)
Version 0.6.0 (2012-11-12)
--------------------------
EmrEtlRunner: bumped to 0.0.5
EmrEtlRunner: bumped gem dependencies to match StorageLoader (including Sluice 0.0.4)
EmrEtlRunner: renamed snowplow-emr-etl.sh to snowplow-emr-etl-runner.sh
StorageLoader: added. Ruby app to load SnowPlow events into local databases etc
Serde: bumped to 0.5.1
Serde: changed all Booleans to Bytes for non-Hive output
HiveQL: bumped non-hive-rolling-etl.q to 0.0.2
HiveQL: changed non-hive-rolling-etl.q to use the two _bt Byte fields
Infobright: bumped setup_infobright.sql to 0.0.2
Infobright: changed booleans to tinyint(1)s (non-breaking change)
Version 0.5.2 (2012-11-05)
--------------------------
EmrEtlRunner: bump to 0.0.4
EmrEtlRunner: fixed reference to old version of Hive deserializer in config.yml (fixes #71)
EmrEtlRunner: fixed bug using sub-folders with the Processing Bucket (fixes #72)
EmrEtlRunner: can now skip move-files-to-Processing-Bucket or EMR stages (fixes #58)
EmrEtlRunner: S3 filecopy code now moved to Sluice, an external Ruby gem
Version 0.5.1 (2012-10-31)
--------------------------
Data model: stubbed new event_name and platform fields
Infobright: added setup scripts and docs into 4-storage/infobright (fixes #57)
Infobright: added version handling (v_tracker, v_collector, v_etl)
HiveQL: removed hive-exact-etl.q as no longer supported
HiveQL: added non-hive-rolling-etl.q for Infobright- (and other db-)friendly event file format
HiveQL: added version handling (v_tracker, v_collector, v_etl) (fixes #42)
Serde: bumped to 0.5.0
Serde: updated to avoid throwing exceptions on a bad field, fixes #52 (thanks @mtibben!)
Serde: moved some exception handling closer to the throw point, pull req #66 (thanks @mtibben!)
Serde: added continue_on_unexpected_error (thanks @mtibben!)
Serde: tabs are changed to 4 spaces, fixes #61
Serde: browser features are now also available as individual fields, for non-hive-rolling-etl.q to use
Serde: added version handling (v_tracker, v_collector, v_etl)
EmrEtlRunner: bumped to 0.0.3
EmrEtlRunner: moved 3 .rb files in lib/ into lib/snowplow-emr-etl-runner
EmrEtlRunner: added/updated configuration options (:etl: section and hiveql versioning params)
Version 0.5.0 (2012-10-24)
--------------------------
Tidied up folder structure inside 3-etl/
Serde: assembles to /target, not to /upload any more (and jars won't be committed to Git)
EmrEtlRunner: added. Ruby application to run Hive ETL process on Amazon EMR
Version 0.4.10 (2012-10-10)
---------------------------
SnowCannon: bumped submodule to latest shermozle/SnowCannon commit
HiveQL: moved app_id to end of table for backwards compatibility
HiveQL: fixed bug where pointing to serde 0.4.8 NOT new serde 0.4.9
Version 0.4.9 (2012-10-01)
--------------------------
Serde: fixed bug where row not nulled if a critical field un-parseable
Serde: added support for new application ID (#33)
Serde: added deserialization of ecommerce fields, plus tests (#34, #51)
Serde: test suite enhancements (adding Scala helper objects)
Serde: added tests including #13 and #10
snowplow.js: bumped to 0.7.0
snowplow.js: renamed said to aid for application ID
Version 0.4.8 (2012-09-14)
--------------------------
Serde: added support for /i as well as /ice.png (issue #35)
Serde: added support for new (2012-09-12) CloudFront format
Serde: handles Cf bucket with Forward Query String = yes (issue #39)
Serde: made marketing attribution parsing more robust
Version 0.4.7 (2012-09-05)
--------------------------
snowplow.js: bumped to version 0.6
snowplow.js: added setSiteId functionality
snowplow.js: added ecommerce tracking
Version 0.4.6 (2012-08-18)
--------------------------
snowplow.js: added setCollectorUrl functionality
Version 0.4.5 (2012-08-03)
--------------------------
Serde: upgraded httpclient and tweaked URL code (issue #15)
Serde: now extracting our 5 marketing fields (issue #12)
Serde: added support for client-timestamp (issue #18)
Serde: now stripping line breaks (issue #23)
Version 0.4.4 (2012-07-28)
--------------------------
Restructured into 5 sub-systems
Updated README to explain sub-systems
Version 0.4.3 (2012-07-02)
--------------------------
Removed status code checks from Serde
Serde now outputs into /upload folder (to be uploaded by SnowPlow::Etl Ruby gem)
Version 0.4.2 (2012-06-19)
--------------------------
Moved serde into /hive from own repo
Version 0.4.1 (2012-06-16)
--------------------------
Updated serde to 0.4.4
Moved documentation to wiki
Version 0.4.0 (2012-05-30)
--------------------------
Improved names of querystring params
Added page-url to QS as fallback
Added Hive Deserializer as submodule
Documentation updates
Version 0.3.0 (2012-05-18)
--------------------------
Mostly documentation
Version 0.2.0
-------------
Formalised minification process
Version 0.1.0
-------------
Initial release of SnowPlow.js