Add all reference maps (#17)

* Add logo
* Add blog link to README
* Add reference map for base
* Add all reference maps
* Use observed values for non-observed value match for group_data instead of NAs, which might change the dtype.
* 0.2.1
pwwang · Jun 23, 2021 · 35156bb
1 parent ba7cf58
commit 35156bb
Showing 19 changed files with 763 additions and 198 deletions.
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -31,6 +31,7 @@ jobs:
           cp ../README.md index.md
           cp ../example.png example.png
           cp ../example2.png example2.png
+          cp ../logo.png logo.png
           cd ..
           mkdocs build
         if : success()

diff --git a/README.md b/README.md
@@ -2,12 +2,15 @@
 
 Port of [dplyr][2] and other related R packages in python, using [pipda][3].
 
-Unlike other similar packages in python that just mimic the piping sign, `datar` follows the API designs from the original packages as much as possible. So that minimal effort is needed for those who are familar with those R packages to transition to python.
-
 <!-- badges -->
 [![Pypi][6]][7] [![Github][8]][9] ![Building][10] [![Docs and API][11]][5] [![Codacy][12]][13] [![Codacy coverage][14]][13]
 
-[Documentation][5] | [Reference Maps][15] | [Notebook Examples][16] | [API][17]
+[Documentation][5] | [Reference Maps][15] | [Notebook Examples][16] | [API][17] | [Blog][18]
+
+<img width="30%" style="margin: 10px 10px 10px 30px" align="right" src="logo.png">
+
+Unlike other similar packages in python that just mimic the piping sign, `datar` follows the API designs from the original packages as much as possible. So that minimal effort is needed for those who are familar with those R packages to transition to python.
+
 
 ## Installtion
 
@@ -69,7 +72,7 @@ df >> mutate(z=if_else(f.x>1, 1, 0)) >> filter(f.z==1)
 
 ```python
 # works with plotnine
-# works with plotnine
+# example grabbed from https://github.com/has2k1/plydata
 import numpy
 from datar.base import sin, pi
 from plotnine import ggplot, aes, geom_line, theme_classic
@@ -115,3 +118,4 @@ iris >> pull(f.Sepal_Length) >> dist_plot()
 [15]: https://pwwang.github.io/datar/reference-maps/ALL/
 [16]: https://pwwang.github.io/datar/notebooks/across/
 [17]: https://pwwang.github.io/datar/api/datar/
+[18]: https://pwwang.github.io/datar-blog
diff --git a/README.rst b/README.rst
@@ -7,8 +7,6 @@ datar
 
 Port of `dplyr <https://dplyr.tidyverse.org/index.html>`_ and other related R packages in python, using `pipda <https://github.com/pwwang/pipda>`_.
 
-Unlike other similar packages in python that just mimic the piping sign, ``datar`` follows the API designs from the original packages as much as possible. So that minimal effort is needed for those who are familar with those R packages to transition to python.
-
 :raw-html-m2r:`<!-- badges -->`
 `
 .. image:: https://img.shields.io/pypi/v/datar?style=flat-square
@@ -36,7 +34,11 @@ Unlike other similar packages in python that just mimic the piping sign, ``datar
    :alt: Codacy coverage
  <https://app.codacy.com/gh/pwwang/datar>`_
 
-`Documentation <https://pwwang.github.io/datar/>`_ | `Reference Maps <https://pwwang.github.io/datar/reference-maps/ALL/>`_ | `Notebook Examples <https://pwwang.github.io/datar/notebooks/across/>`_ | `API <https://pwwang.github.io/datar/api/datar/>`_
+`Documentation <https://pwwang.github.io/datar/>`_ | `Reference Maps <https://pwwang.github.io/datar/reference-maps/ALL/>`_ | `Notebook Examples <https://pwwang.github.io/datar/notebooks/across/>`_ | `API <https://pwwang.github.io/datar/api/datar/>`_ | `Blog <https://pwwang.github.io/datar-blog>`_
+
+:raw-html-m2r:`<img width="30%" style="margin: 10px 10px 10px 30px" align="right" src="logo.png">`
+
+Unlike other similar packages in python that just mimic the piping sign, ``datar`` follows the API designs from the original packages as much as possible. So that minimal effort is needed for those who are familar with those R packages to transition to python.
 
 Installtion
 -----------
@@ -101,7 +103,7 @@ Example usage
 .. code-block:: python
 
    # works with plotnine
-   # works with plotnine
+   # example grabbed from https://github.com/has2k1/plydata
    import numpy
    from datar.base import sin, pi
    from plotnine import ggplot, aes, geom_line, theme_classic

diff --git a/datar/__init__.py b/datar/__init__.py
@@ -4,4 +4,4 @@
 from .core import frame_format_patch as _
 from .core.defaults import f
 
-__version__ = '0.2.0'
+__version__ = '0.2.1'
diff --git a/datar/core/grouped.py b/datar/core/grouped.py
@@ -93,7 +93,7 @@ def _compute_single_var_groups(self):
 
     def _compute_multiple_var_groups(self):
         """Compute groups for multiple vars"""
-        from ..base import NA
+        from ..base import unique
 
         dtypes = {}
         groupings = []
@@ -120,21 +120,35 @@ def _compute_multiple_var_groups(self):
         # pandas not including unused categories for multiple variables
         # even with observed=False
         # #
-        # This is a simplied version to include those unobserved values
-        # Find a better way to implement the dplyr way?
         if not self.attrs['_group_drop']:
-            unobserved = [
-                self[gvar].values.categories.difference(self[gvar])
-                if is_categorical(dtype)
-                else []
-                for gvar, dtype in dtypes.items()
-            ]
-            maxlen = max((len(unobs) for unobs in unobserved))
-            if maxlen > 0:
-                unobserved = [
-                    [NA] if len(unobs) == 0 else unobs
-                    for unobs in unobserved
-                ]
+            # unobserved = [
+            #     self[gvar].values.categories.difference(self[gvar])
+            #     if is_categorical(dtype)
+            #     else []
+            #     for gvar, dtype in dtypes.items()
+            # ]
+            # maxlen = max((len(unobs) for unobs in unobserved))
+            # if maxlen > 0:
+            #     unobserved = [
+            #         ## Simply adding NAs would change dtype
+            #         [NA] if len(unobs) == 0 else unobs
+            #         for unobs in unobserved
+            #     ]
+            #     for row in product(*unobserved):
+            #         groups[row] = []
+            unobserved = []
+            insert_unobs = False
+            for gvar, dtype in dtypes.items():
+                if is_categorical(dtype):
+                    unobs = self[gvar].values.categories.difference(self[gvar])
+                    if len(unobs) > 0:
+                        unobserved.append(unobs)
+                        insert_unobs = True
+                    else:
+                        unobserved.append(unique(self[gvar]))
+                else:
+                    unobserved.append(unique(self[gvar]))
+            if insert_unobs:
                 for row in product(*unobserved):
                     groups[row] = []
 
@@ -285,8 +299,8 @@ def _groups_to_group_data(
             out[gvar] = na_if_safe(out[gvar], dtype=dtype)
         else:
             try:
-                same_dtype = out[gvar].dtype != dtype
-            except TypeError:
+                same_dtype = out[gvar].dtype == dtype
+            except TypeError: # pragma: no cover
                 # Cannot interpret 'CategoricalDtype(categories=[1, 2],
                 # ordered=False)' as a data type
                 same_dtype = False
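
The change to `_compute_multiple_var_groups` above can be illustrated with a minimal standalone sketch in plain pandas. This is not datar's implementation: the helper `unobserved_group_keys` and the sample frame are hypothetical, and the sketch only mirrors the loop shown in the diff. It assumes the goal is to build group keys for unobserved categories without padding the other grouping columns with NA, since a missing value would upcast an integer column to float.

```python
from itertools import product

import pandas as pd

# Hypothetical sample data: category "c" of column "x" is never observed.
df = pd.DataFrame({
    "x": pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"]),
    "y": [1, 2, 1],
})

# Padding an integer column with a missing value upcasts it to float64,
# which is the dtype change the commit message refers to.
padded = pd.Series([1, 2, float("nan")])


def unobserved_group_keys(frame, gvars):
    """Return group keys that combine unobserved categories with the
    observed values of the remaining grouping variables (sketch only)."""
    candidates = []
    insert_unobs = False
    for gvar in gvars:
        col = frame[gvar]
        if isinstance(col.dtype, pd.CategoricalDtype):
            observed = set(col.dropna().tolist())
            unobs = [c for c in col.cat.categories if c not in observed]
            if unobs:
                candidates.append(unobs)
                insert_unobs = True
                continue
        # Non-categorical column, or categorical with nothing unobserved:
        # fall back to the values that actually occur, not to NA.
        candidates.append(col.unique().tolist())
    return list(product(*candidates)) if insert_unobs else []


keys = unobserved_group_keys(df, ["x", "y"])
```

Because the keys for `y` come from values already present in the column, rebuilding `y` from `keys` keeps an integer dtype, whereas `padded` ends up as `float64`.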

diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
@@ -1,3 +1,7 @@
+## 0.2.1
+- Use observed values for non-observsed value match for group_data instead of NAs, which might change the dtype.
+- Fix tibble recycling values too early
+
 ## 0.2.0
 Added:
 - Add `base.which`, `base.bessel`, `base.special`, `base.trig_hb` and `base.string` modules