Merge pull request #61 from MSanKeys963/add_meeting_notes
Add meeting notes
MSanKeys963 authored Jun 28, 2024
2 parents 3a3a6e2 + ad875ae commit dc6cc91
Showing 13 changed files with 502 additions and 3 deletions.
2 changes: 1 addition & 1 deletion meetings/2024/2024-01-11.md
@@ -1,7 +1,7 @@
---
layout: default
title: 11th January
-description: ZEPs Meeting Notes for 2023-01-11
+description: ZEPs Meeting Notes for 2024-01-11
grand_parent: ZEP meetings
parent: 2024 meetings
nav_order: 1
2 changes: 1 addition & 1 deletion meetings/2024/2024-01-25.md
@@ -1,7 +1,7 @@
---
layout: default
title: 25th January
-description: ZEPs Meeting Notes for 2023-01-25
+description: ZEPs Meeting Notes for 2024-01-25
grand_parent: ZEP meetings
parent: 2024 meetings
nav_order: 2
2 changes: 1 addition & 1 deletion meetings/2024/2024-02-08.md
@@ -1,7 +1,7 @@
---
layout: default
title: 8th February
-description: ZEPs Meeting Notes for 2023-02-08
+description: ZEPs Meeting Notes for 2024-02-08
grand_parent: ZEP meetings
parent: 2024 meetings
nav_order: 3
70 changes: 70 additions & 0 deletions meetings/2024/2024-02-22.md
@@ -0,0 +1,70 @@
---
layout: default
title: 22nd February
description: ZEPs Meeting Notes for 2024-02-22
grand_parent: ZEP meetings
parent: 2024 meetings
nav_order: 4
---

# 2024-02-22

**Attending:** Sanket Verma (SV), Ward Fisher (WF), Josh Moore (JM), Martin Durant (MD), Tom Nicholas (TN)

## TL;DR:

The meeting covered the use of LLMs in training, feedback on the Zarr-Specs website redesign, progress on the V3 refactor, and discussions on integrating kerchunk into Zarr, focusing on chunk manifest standardization and virtualized array concatenation.

**Updates:**

- HTTP Extension meeting: <https://docs.google.com/document/d/14TJfrjbfU1R2REjrZ35GjV74MJ18j_m6geWdj0oB83Y/edit?usp=sharing>

**Meeting Notes:**

- LLMs and how WF is using them in trainings
- Feedback for new design for Zarr-Specs website (combines ZEP and Zarr-Specs together)
- Link: <https://docs-test-sanket.readthedocs.io/en/latest/>
- MD: How's V3 refactor work going on these days?
- JM: Quite good progress taking place these days
- SV: V3 PRs can be found here - <https://github.com/zarr-developers/zarr-python/pulls?q=is%3Apr+is%3Aopen+label%3AV3>
- MD: <https://zarr.dev/zeps/draft/ZEP0003.html>
- TN: Been discussing → <https://github.com/zarr-developers/zarr-specs/issues/287>
- interested in integrating kerchunk into zarr, especially two ZEPs
- (1) chunk manifest (Joe) - standardizing what chunk json files do
- (2) concatenation - <https://github.com/zarr-developers/zarr-specs/issues/288>
- 1. manifest: opinion that it's an incredible idea that is very popular
- fsspec relationship makes things complicated
- move to the zarr spec for other implementations?
- goal is readable in any language
- difficult position
- three things to think about
- read byte ranges
- write JSON
- combine module
- roadmap:
- standardize json for the chunks. manifest file?
- JM: storage in zarr array itself
- JM: log file anytime you read a full file into memory
- JM: virtual zarr (access pattern)
- 2. concatenation
- multi-zarr-to-zarr leads to a loop
- more sense to think of concat of virtualized arrays objects
- see kerchunk array notebook
- read in byte ranges with kerchunk. array class which only stores byte-offset arrays in memory
- can be done in xarray. concat-classes can be put into xarray and can use higher-order API
- JM: store that xarray as a zarr :smile: (but need additional metadata for realizing the array)
- TN: part of notebook that isn't done. exactly.
- common case in geo. multiple NC files, concat those arrays.
- possibly compression options change over time.
- prevents it from being one zarr array
- JM: or just always serialize to the chunk manifest
- JM: i.e. where do we stop? (when does Zarr become Turing Complete?)
- TN: thought of concat (clear use case). but Jeremy thought indexing (also clear use case)
- JM: starting to sound like transforms (<https://github.com/ome/ngff/pull/138#issuecomment-1948424000>)
- WF: periodically get requests for operations on the data
- no one has come close to making the argument for adding that into the storage
- so many math libraries that would do it better
- TN: no computation since you don't need the values. can do some subset of concat & indexing without values.
- TN: have now become a zarr producer :tada:
- JM: cross-language motivation
- SV: pyramiding ZEP discussions
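
The point that some concatenation can be done without the values can be sketched roughly as below. This is only an illustration, not kerchunk's or any ZEP's actual design: `ChunkRef`, `VirtualArray` and `concat` are hypothetical names, and the sketch assumes every input shares one chunk shape, one codec configuration, and a length along the concat axis that is a whole number of chunks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Where one chunk's bytes live (hypothetical layout, not a spec)."""
    path: str
    offset: int
    length: int


@dataclass
class VirtualArray:
    """An array described only by its chunk grid and byte-range references."""
    chunk_grid: tuple[int, ...]  # number of chunks along each axis
    chunks: dict[tuple[int, ...], ChunkRef]


def concat(arrays, axis=0):
    """Concatenate along `axis` by shifting chunk indices; no chunk bytes are read."""
    first = arrays[0]
    merged = {}
    shift = 0
    for arr in arrays:
        # Every axis other than the concat axis must have the same chunk count.
        for ax, (a, b) in enumerate(zip(first.chunk_grid, arr.chunk_grid)):
            if ax != axis and a != b:
                raise ValueError("chunk grids differ off the concat axis")
        for idx, ref in arr.chunks.items():
            new_idx = idx[:axis] + (idx[axis] + shift,) + idx[axis + 1:]
            merged[new_idx] = ref
        shift += arr.chunk_grid[axis]
    grid = first.chunk_grid[:axis] + (shift,) + first.chunk_grid[axis + 1:]
    return VirtualArray(grid, merged)


# Two made-up NetCDF files holding consecutive time steps.
jan = VirtualArray((2, 1), {(0, 0): ChunkRef("jan.nc", 1024, 400),
                            (1, 0): ChunkRef("jan.nc", 1424, 400)})
feb = VirtualArray((1, 1), {(0, 0): ChunkRef("feb.nc", 1024, 400)})
combined = concat([jan, feb], axis=0)
```

If compression options change between files, as noted above, a single set of array metadata no longer describes every chunk, which is where this simple index shifting stops being enough.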
67 changes: 67 additions & 0 deletions meetings/2024/2024-03-07.md
@@ -0,0 +1,67 @@
---
layout: default
title: 7th March
description: ZEPs Meeting Notes for 2024-03-07
grand_parent: ZEP meetings
parent: 2024 meetings
nav_order: 5
---

# 2024-03-07

**Attending:** Sanket Verma (SV), Ward Fisher (WF), Davis Bennett (DB), Josh Moore (JM), Thomas Nicholas (TN), Jeremy Maitin-Shepard (JMS)

## TL;DR:

The meeting discussed enhancing Zarr store browsing, pushing Kerchunk functionality into Zarr, and the potential for a chunk manifest as a ZEP. Additionally, the group explored ideas for revising ZEP0, creating a Zarr specification IETF standard, and shared updates on upcoming Zarr HTTP Extension and SEA conference events.

**Updates:**

- Zarr HTTP Extension Meeting next week
- Check here: <https://zarr.dev/community-calls>
- TN: <https://hackmd.io/t9Myqt0HR7O0nq6wiHWCDA?view>
- Had a conversation with folks over at Development Seed, NASA, Earthmover

**Meeting Minutes:**

- TN: Jed wants to have nice ways to browse the Zarr stores - they have nice ways to browse `.tiff` files already
- Wants to propose an extension to add more information in the metadata
- The end result would look more like an Xarray HTML wrapper
- TN: <https://hackmd.io/t9Myqt0HR7O0nq6wiHWCDA?view>
- Had a conversation with folks over at Development Seed, NASA, Earthmover
- DB: Pushing Kerchunk functionality into Zarr stores
- DB: Whether the feature could be file format agnostic?
- TN: Argues that it should be a ZEP - and can be read by every Zarr implementation
- JM: Having same thing implemented in FSSPEC
- DB: Would ZEP
- WF: HDF5 group may be open to a conversation
- SV: <https://zarr.dev/zeps/meetings/2023/2023-08-10.html> might have some useful information
- TN: _recaps the conversation for JMS_
- TN: Should concatenation be a part of the current ZEP?
- DB: Any reason you don't want to concatenate HDF5 and other file formats?
- TN: Chunk manifest would point inside the arrays - chunk manifest could let you create a Zarr store over other formats as well
- DB: This would make Zarr an API/access pattern
- TN: Can be created and tested fairly separately from Zarr - personally think chunk manifest is a neat feature - an implementation can support it or not
- DB: Array mutation can break the concatenation - having guidelines for archival arrays would help
- TN: Currently we're thinking about read-only case
- TN: Virtualisation in Kerchunk is a spotlight feature
- JMS: Manifest is a good idea and keeping it separate would be a minor difference - needs to align with Kerchunk
- JM: report/ZEP idea (time permitting)
- <https://w3id.org/ro/crate/>
- JM: Putting RO-Crate inside Zarr - or making Zarr specification an IETF standard
- JM: Would probably go ahead and write a convention in NGFF space
- <https://fairdo.org/>
- JM: Have a mechanism for going up/down the hierarchy - useful for the HTTP extension discussions
- Revising ZEP0
- <https://github.com/zarr-developers/zeps/pull/59> - comments/feedback welcome
- DB: :+1:
- DB: Would be easy to have a single PR for my ZEP
- JMS: Putting narrative document in PR description
- JM: Weird for commenting on the PR description and for the public visibility
- JMS: Rationale can be put down as a footnote
- JMS: Having numeric numbering is something Python follows
- JMS: The actual specification change can also serve as a ZEP narrative
- SV: We can pick out certain sections out of the ZEP narrative document
- JMS: Having a PR template similar to ZEP's narrative could also help us
- WF: <https://sea.ucar.edu/conference/2024>
- In-person and virtual registrations are available
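
A rough sketch of the read-only chunk manifest idea discussed above: a JSON document maps Zarr chunk keys to byte ranges inside an existing file, so a reader only needs to seek and read. The field names `path`, `offset` and `length`, the key format, and the helper `read_chunk` are assumptions made for illustration, not the wording of any accepted ZEP.

```python
import json
import os
import tempfile


def read_chunk(manifest, key):
    """Return the raw (still encoded) bytes for one chunk key."""
    entry = manifest[key]
    with open(entry["path"], "rb") as f:
        f.seek(entry["offset"])
        return f.read(entry["length"])


# Stand-in for an existing HDF5/NetCDF file: 16 header bytes, then two 8-byte chunks.
tmpdir = tempfile.mkdtemp()
source = os.path.join(tmpdir, "legacy.bin")
with open(source, "wb") as f:
    f.write(b"H" * 16 + b"AAAAAAAA" + b"BBBBBBBB")

# The manifest itself is plain JSON, so any language can parse it.
manifest_json = json.dumps({
    "0.0": {"path": source, "offset": 16, "length": 8},
    "0.1": {"path": source, "offset": 24, "length": 8},
})
manifest = json.loads(manifest_json)
chunk = read_chunk(manifest, "0.1")
```

Because the manifest only points at bytes, mutating the source file would silently invalidate it, which matches the concern about array mutation and the focus on the read-only case.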
41 changes: 41 additions & 0 deletions meetings/2024/2024-03-21.md
@@ -0,0 +1,41 @@
---
layout: default
title: 21st March
description: ZEPs Meeting Notes for 2024-03-21
grand_parent: ZEP meetings
parent: 2024 meetings
nav_order: 6
---

# 2024-03-21

**Attending:** Sanket Verma (SV), Thomas Nicholas (TN), Ward Fisher (WF)

## TL;DR:

**Updates:**

- Join ZulipChat: <https://ossci.zulipchat.com/>
- HTTP Extension meeting took place on 3/14
- Trying to figure out the best way forward, i.e. a ZEP or not
- Gauging interest and use cases from others in the community

**Meeting Minutes:**

- HTTP Extension
- WF: Can see the shape of it, and I think it would be useful
- SV: Existing thread: <https://ossci.zulipchat.com/#narrow/stream/423692-Zarr/topic/HTTP.20Extension>
- TN: Tom's company may have a use case for the HTTP work
- Showing [VirtualiZarr](https://github.com/TomNicholas/VirtualiZarr) (related to the "chunk manifest" ZEP)
- TN: Been working on the packages for the last 2 weeks - could potentially replace Kerchunk
- TN: _code walkthrough via screen sharing_
- TN: Storing the virtual Zarr manifests, not the actual array values
- TN: Could move `class ManifestArray` to Zarr-Python - arguments in favour and against it
- TN: Could see donating VirtualiZarr to zarr-developers
- SV: **Action items**
- TN to create a topic for VirtualiZarr to gather feedback/comments
- SV to try VirtualiZarr
- TN and SV to work on ZEP Extension proposal for virtual Zarr manifest and formally present it for broader feedback
- TABLED
- Revising ZEP0
- <https://github.com/zarr-developers/zeps/pull/59>
28 changes: 28 additions & 0 deletions meetings/2024/2024-04-04.md
@@ -0,0 +1,28 @@
---
layout: default
title: 4th April
description: ZEPs Meeting Notes for 2024-04-04
grand_parent: ZEP meetings
parent: 2024 meetings
nav_order: 7
---

# 2024-04-04

**Attending:** Sanket Verma (SV), Josh Moore (JM), Ward Fisher (WF)

## TL;DR:

**Updates:**

- CZI EOSS6 Application not funded

**Meeting Minutes:**

- NASA Grant (WF)
- <https://nspires.nasaprs.com/external/solicitations/summary.do?solId=%7b910CC61E-4616-9958-C26F-F8D9BC5AB8D9%7d&path=&method=init>
- Townhall meeting slides: <https://docs.google.com/presentation/d/14g5UPUQFsk4QW3gqwB4gtNSHm8vcdUVAmwzSFjtdVN4/edit?usp=sharing>
- Looking towards sustaining the already established open source software
- NetCDF is looking for collaboration for their application
- JM: Collaborators in US could be NF, OpenCollective, NVIDIA, Columbia etc.
- JM: Will reach out to NF for their NASA grants' experience
58 changes: 58 additions & 0 deletions meetings/2024/2024-04-18.md
@@ -0,0 +1,58 @@
---
layout: default
title: 18th April
description: ZEPs Meeting Notes for 2024-04-18
grand_parent: ZEP meetings
parent: 2024 meetings
nav_order: 8
---

# 2024-04-18

**Attending:** Josh Moore (JM), Vincent Immler (VI), Sanket Verma (SV), Ward Fisher (WF), Altay Sansal (AS), Jeremy Maitin-Shepard (JMS)

## TL;DR:

The meeting covered the proposal to remove implicit groups in Zarr, progress on ZEP4 and ZEP3, and updates on V3 implementation. Additionally, discussions included async read optimizations for Zarr and the impact on performance, especially concerning large datasets and parallel data ingestion.

**Updates:**

- Davis wants to remove implicit groups: <https://github.com/zarr-developers/zarr-specs/pull/292>
- Activity going-on at ZEP4 Review PR

**Meeting Minutes:**

- Introductions w/ last gift you got
- Sanket - cologne and clothes
- Vincent - wooden board forged with family crescent
- Ward - camping tent
- Josh - pecan nuts
- Altay - lead data scientist - lego
- Removing Implicit groups
- JM: Discussed at community meeting - needs to go back to root node to figure out the group
- JM: Tensorstore doesn't use Zarr groups at all
- WF: Supposition from my side
- WF: Dennis completed the V3 implementation!
- JM: Are we closer to parity in V3 work - a question for Dennis!
- VI: How do implicit groups affect performance?
- JM: No, implicit groups means performance improvement
- VI: Working on a new software implementation for students
- JMS: No experience in working with groups
- JM: Lot of callbacks
- JMS: You'd definitely want to remove the looking upward
- AS: Couldn't see a use-case for parallel creation of groups
- JMS: You're ingesting a lot of data in S3 and they read group metadata and have implicit groups
- AS: `.zattrs` would have race condition?
- JMS: Kind of a niche use-case
- AS: Do multi-processing locks concern metadata?
- JMS: Multiple machines can leverage this!
- AS: Removing would be a good idea!
- AS: ZEP4 and ZEP3 progress
- SV: AS, are you using V2 or V3?
- AS: Using V2 and would love to move to V3 - have 20-30 PB data
- AS: Want to work on `dimension_names` - what would be the best time to do it?
- SV: After V3 release
- VI: _explains GSoC application_
- AS: Hacked Zarr to submit reads in an async manner to the machine to circumvent the problem
- AS: Zarr V3 is going to be fully async, so it helps alleviate the problem
- VI: Would be good to have a way to improve the read speeds for Zarr
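
The async read idea mentioned above can be sketched roughly as follows. This is not Zarr-Python's V3 code: `fetch_chunk` is a hypothetical stand-in for a storage request, and the key names and concurrency limit are made up for illustration.

```python
import asyncio


async def fetch_chunk(key, latency=0.05):
    """Hypothetical stand-in for one storage request (e.g. a single object GET)."""
    await asyncio.sleep(latency)
    return key, f"bytes-of-{key}".encode()


async def read_chunks(keys, max_concurrency=8):
    """Issue all chunk reads concurrently, capped by a semaphore."""
    limit = asyncio.Semaphore(max_concurrency)

    async def bounded(key):
        async with limit:
            return await fetch_chunk(key)

    # gather() keeps results in the same order as the submitted keys.
    results = await asyncio.gather(*(bounded(k) for k in keys))
    return dict(results)


keys = [f"c/0/{i}" for i in range(16)]
chunks = asyncio.run(read_chunks(keys))
```

With requests in flight at the same time, per-request latencies overlap instead of adding up, which is the effect the workaround above was after; the semaphore is one simple way to avoid flooding the store.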
52 changes: 52 additions & 0 deletions meetings/2024/2024-05-02.md
@@ -0,0 +1,52 @@
---
layout: default
title: 2nd May
description: ZEPs Meeting Notes for 2024-05-02
grand_parent: ZEP meetings
parent: 2024 meetings
nav_order: 9
---

# 2024-05-02

**Attending:** Josh Moore (JM), Sanket Verma (SV), Ward Fisher (WF), Jeremy Maitin-Shepard (JMS), Thomas Nicholas (TN)

## TL;DR:

The meeting discussed the upcoming Zarr V3 release candidate, status and integration of the chunk manifest ZEP, and potential revisions for combining ZEPs. They also covered progress on ZEP0 and plans to move ZEP2 to "Active" status, while tabling the discussion on removing implicit groups.

**Updates:**

**Meeting Minutes:**

- WF: Dennis has a PR coming up to revise the Zarr V3 - 6 months long work - release candidate coming soon!
- TN: Status of unfinished ZEP w.r.t. chunk manifest?
- E.g. Have a sharded chunk manifest?
- JMS: Chunk manifest could refer to entire shard - use case might not be clear
- JMS: For viz tools you would not load entire shard at once
- TN: Changing chunk manifest ZEP or accommodate chunk manifest in sharding codec?
- JMS: Don't really need to change
- JM: Maybe there's a way to re-write the ZEP in a way which the existing ZEPs are composable - basically how extensions would interact with each other
- TN:
- JMS: There might be cases where combination of codecs may not work well
- TN: Combining codecs seems straightforward compared to variable chunking which specifies what is allowed and what not
- JMS: Proposals which change the data model are tricky - wanted to add non-zero origin
- TN: Interested in variable chunking
- SV: Would you be interested in contributing to ZEP3?
- JM: We could also start thinking about ZEP2+ZEP3, ZEP3+ZEP4, ZEP4+ZEP2...
- JMS: Any reason for not using Kerchunk?
- TN: Chunk manifest is a clearly defined Zarr store compared to kerchunk (which kinda looks like Zarr) - reference file-systems are not defined - there's value in getting chunk manifest into Zarr specification as Kerchunk is more than Zarr
- JMS: The actual implementation of the file-system would be same across the various libraries
- TN: Relying on single maintainer code is not an ideal situation
- SV: ZP V3 implementation was outdated which led to creation of Zarrita and then finally re-using Zarrita for ZP V3 refactor
- TN: Working on nit-picking Xarray for VirtualiZarr
- TN: Zarr arrays are kind-of lazy arrays - when you index into Z-arrays they provide you with bytes not the actual Zarr arrays - Xarray has lazy-loading hidden inside in codebase and there has been discussion to make it a standalone library
- JMS: We could have two sizes for chunks - stored size and actual size for variable chunking strategy
- Move ZEP2 from `Accepted` to `Active`
- JM: Would be good to move ZEP1 and ZEP2 both at the same time
- SV: ZP V3 refactor would be a good time to move ZEP1 to active
- Finalise ZEP0 revisions
- <https://github.com/zarr-developers/zeps/pull/59>
- Re-start the conversation and finalise it
- **TABLED**
- Removing implicit groups - <https://github.com/zarr-developers/zarr-specs/pull/292>
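
For the variable chunking interest noted above, a minimal sketch of the lookup a reader would need along one axis: given per-chunk lengths, find which chunk holds an index and where inside it. The representation (a plain list of chunk lengths) and the helper `locate` are assumptions for illustration, not how ZEP3 encodes the grid.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(chunk_lengths, index):
    """Map an index along one axis to (chunk number, offset inside that chunk)."""
    ends = list(accumulate(chunk_lengths))  # exclusive end of each chunk
    if not 0 <= index < ends[-1]:
        raise IndexError(index)
    chunk = bisect_right(ends, index)
    start = ends[chunk] - chunk_lengths[chunk]
    return chunk, index - start


# Example: daily data chunked by calendar month (Jan, Feb, Mar, Apr of a non-leap year).
lengths = [31, 28, 31, 30]
first_of_feb = locate(lengths, 31)
last_of_apr = locate(lengths, 119)
```

With regular chunks this lookup is a single integer division; with variable chunks it becomes a search over cumulative lengths, which is part of why proposals that change the data model need more care.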
49 changes: 49 additions & 0 deletions meetings/2024/2024-05-16.md
@@ -0,0 +1,49 @@
---
layout: default
title: 16th May
description: ZEPs Meeting Notes for 2024-05-16
grand_parent: ZEP meetings
parent: 2024 meetings
nav_order: 10
---

# 2024-05-16

**Attending:** Dennis Heimbigner (DH), Sanket Verma (SV), Josh Moore (JM), Jeremy Maitin-Shepard (JMS)

## TL;DR:

The meeting discussed the release of Zarr-Python 2.18.0, a new blog post by Joe Hamman, and updates on the sharding support in the Rust implementation. They also covered the implementation of manifest storage transformers, standardizing URLs for Zarr, and the removal of implicit groups in Zarr-Python V3.

**Updates:**

**Meeting Minutes:**

- Zarr-Python 2.18.0 out now: <https://github.com/zarr-developers/zarr-python/releases/tag/v2.18.0>
- One of the last few releases for Zarr Spec 2 - if there's anything you want to get in, please reply/tag us in the PRs/issues
- New blog post by Joe Hamman: <https://zarr.dev/blog/zarr-python-v3-update/>
- Zarr-Python developers meeting new schedule - check here: <https://zarr.dev/community-calls/>
- Lachlan Deakin added support for sharding in his Rust implementation: <https://github.com/LDeakin/zarrs/releases/tag/v0.13.1>

**Open agenda (add here 👇🏻):**

- JM: In sharding you can recurse and browse through the chunks - something like _chunks([x, y])_
- DH: Treating sub-chunks as regular chunks - like what we decided during the storage transformers proposal -
- _DH understands this proposal better and favours it_
- The relevant issue: <https://github.com/zarr-developers/zarr-specs/issues/220>
- DH: Time to get hands dirty with the implementation and figure out any problems we have
- JMS: Using storage transformers and codecs in Neuroglancer to achieve sharding
- DH:
- SV: Manifest storage transformers - <https://github.com/zarr-developers/zarr-specs/issues/287> - defines and implements on top of the storage transformer in V3 core spec - discussion 👇🏻
- JMS: Good to define the `JSON` and add other formats later on
- DH: FSSPEC interprets the URL in Kerchunk
- DH: Having complete key values in URL would help in the long run - DAP made a mistake earlier and we fixed it - having a complete URL is a better option and you can replace the contents within it later on
- JM: Having a complete URL in manifest storage transformer for Zarr would help us but there's a question of backward compatibility
- JMS: <https://github.com/zarr-developers/zeps/pull/48/> standardise the URL
- DH: URL spec defines the format and correct way of defining a URL - if you consider things other than FSSPEC you should have a more standardised URL
- DH: Conforming to the [URL Spec](https://www.w3.org/Addressing/URL/url-spec.txt) should be avoided actively
- JM: Having URL defined in the storage transformer would help - currently not defined
- Fix typo - <https://github.com/zarr-developers/zarr-specs/pull/294> - **MERGED**
- Updated Zarr-Specs license to CC-BY-4.0 - <https://github.com/zarr-developers/zarr-specs/pull/295>
- Implicit groups removed in Zarr-Python V3 via <https://github.com/zarr-developers/zarr-python/pull/1827>
- Corresponding PR in Zarr-Specs - <https://github.com/zarr-developers/zarr-specs/pull/292>
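
On the complete-URL discussion in the open agenda above, a small sketch of how a reader might tell complete URLs from relative locations in a manifest and resolve the latter. The helper names and the example locations are hypothetical, and nothing here is defined by the storage transformer spec; it uses only Python's standard `urllib.parse`.

```python
from urllib.parse import urljoin, urlsplit


def is_complete_url(location):
    """True when `location` carries its own scheme (https://, s3://, file://, ...)."""
    return bool(urlsplit(location).scheme)


def resolve(location, manifest_url):
    """Leave complete URLs alone; resolve relative ones against the manifest's URL."""
    if is_complete_url(location):
        return location
    return urljoin(manifest_url, location)


manifest_url = "https://example.org/data/manifest.json"
absolute = resolve("s3://bucket/file.nc", manifest_url)
relative = resolve("chunks/file.nc", manifest_url)
```

Storing complete URLs keeps the interpretation independent of any one library, while relative locations depend on the reader agreeing on a base, which is where the backward compatibility question comes in.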