
Update pandas to 0.20.2 #209

Closed
pyup-bot wants to merge 1 commit from the pyup-update-pandas-0.19.0-to-0.20.2 branch

Conversation

pyup-bot (Contributor) commented Jun 5, 2017

There's a new version of pandas available.
You are currently using 0.19.0. I have updated it to 0.20.2

These links might come in handy: PyPI | Changelog | Homepage

Changelog

0.20.2


This is a minor bug-fix release in the 0.20.x series and includes some small regression fixes,
bug fixes and performance improvements.
We recommend that all users upgrade to this version.

.. contents:: What's new in v0.20.2
:local:
:backlinks: none

.. _whatsnew_0202.enhancements:

Enhancements

  • Unblocked access to additional compression types supported in pytables: 'blosc:blosclz', 'blosc:lz4', 'blosc:lz4hc', 'blosc:snappy', 'blosc:zlib', 'blosc:zstd' (:issue:14478); a short sketch follows this list
  • Series provides a to_latex method (:issue:16180)
  • A new groupby method :meth:~pandas.core.groupby.GroupBy.ngroup,
    parallel to the existing :meth:~pandas.core.groupby.GroupBy.cumcount,
    has been added to return the group order (:issue:11642); see
    :ref:here <groupby.ngroup>.
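
A minimal sketch of using one of the newly unblocked compressors with HDFStore; the file name and compression level are illustrative, and a blosc build with lz4 support is assumed:

.. ipython:: python

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 3), columns=list('ABC'))
# 'blosc:lz4' is one of the compressors unblocked in this release
df.to_hdf('data.h5', 'df', format='table', complib='blosc:lz4', complevel=9)
pd.read_hdf('data.h5', 'df').head()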

.. _whatsnew_0202.performance:

Performance Improvements

  • Performance regression fix when indexing with a list-like (:issue:16285)
  • Performance regression fix for MultiIndexes (:issue:16319, :issue:16346)
  • Improved performance of .clip() with scalar arguments (:issue:15400)
  • Improved performance of groupby with categorical groupers (:issue:16413)
  • Improved performance of MultiIndex.remove_unused_levels() (:issue:16556); a small example of this method follows the list
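
For reference, a small sketch of what MultiIndex.remove_unused_levels() does (the index values are illustrative):

.. ipython:: python

import pandas as pd

mi = pd.MultiIndex.from_product([[0, 1], ['a', 'b']])
sliced = mi[2:]                   # the first level still carries the now-unused value 0
sliced.remove_unused_levels()     # returns a MultiIndex with levels [[1], ['a', 'b']]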

.. _whatsnew_0202.bug_fixes:

Bug Fixes

  • Silenced a warning on some Windows environments about "tput: terminal attributes: No such device or address" when
    detecting the terminal size. This fix only applies to python 3 (:issue:16496)
  • Bug in using pathlib.Path or py.path.local objects with io functions (:issue:16291)
  • Bug in Index.symmetric_difference() on two equal MultiIndex objects, resulting in a TypeError (:issue:13490)
  • Bug in DataFrame.update() with overwrite=False and NaN values (:issue:15593)
  • Passing an invalid engine to :func:read_csv now raises an informative
    ValueError rather than UnboundLocalError. (:issue:16511)
  • Bug in :func:unique on an array of tuples (:issue:16519)
  • Bug in :func:cut when labels are set, resulting in incorrect label ordering (:issue:16459)
  • Fixed a compatibility issue with IPython 6.0's tab completion showing deprecation warnings on Categoricals (:issue:16409)

Conversion
^^^^^^^^^^

  • Bug in :func:to_numeric in which empty data inputs were causing a segfault of the interpreter (:issue:16302)
  • Silence numpy warnings when broadcasting DataFrame to Series with comparison ops (:issue:16378, :issue:16306)

Indexing
^^^^^^^^

  • Bug in DataFrame.reset_index(level=) with single level index (:issue:16263)
  • Bug in partial string indexing with a monotonic, but not strictly-monotonic, index incorrectly reversing the slice bounds (:issue:16515)
  • Bug in MultiIndex.remove_unused_levels() that would not return a MultiIndex equal to the original. (:issue:16556)

I/O
^^^

  • Bug in :func:read_csv when comment is passed in a space delimited text file (:issue:16472)
  • Bug in :func:read_csv not raising an exception with nonexistent columns in usecols when it had the correct length (:issue:14671)
  • Bug that would force importing of the clipboard routines unnecessarily, potentially causing an import error on startup (:issue:16288)
  • Bug that raised IndexError when HTML-rendering an empty DataFrame (:issue:15953)
  • Bug in :func:read_csv in which tarfile object inputs were raising an error in Python 2.x for the C engine (:issue:16530)
  • Bug where DataFrame.to_html() ignored the index_names parameter (:issue:16493)
  • Bug where pd.read_hdf() returns numpy strings for index names (:issue:13492)
  • Bug in HDFStore.select_as_multiple() where start/stop arguments were not respected (:issue:16209)

Plotting
^^^^^^^^

  • Bug in DataFrame.plot with a single column and a list-like color (:issue:3486)
  • Bug in plot where NaT in DatetimeIndex results in Timestamp.min (:issue:12405)
  • Bug in DataFrame.boxplot where figsize keyword was not respected for non-grouped boxplots (:issue:11959)

Groupby/Resample/Rolling
^^^^^^^^^^^^^^^^^^^^^^^^

  • Bug in creating a time-based rolling window on an empty DataFrame (:issue:15819)
  • Bug in rolling.cov() with offset window (:issue:16058)
  • Bug in .resample() and .groupby() when aggregating on integers (:issue:16361)

Sparse
^^^^^^

  • Bug in construction of SparseDataFrame from scipy.sparse.dok_matrix (:issue:16179)

Reshaping
^^^^^^^^^

  • Bug in DataFrame.stack with unsorted levels in MultiIndex columns (:issue:16323)
  • Bug in pd.wide_to_long() where no error was raised when i was not a unique identifier (:issue:16382)
  • Bug in Series.isin(..) with a list of tuples (:issue:16394)
  • Bug in construction of a DataFrame with mixed dtypes including an all-NaT column. (:issue:16395)
  • Bug in DataFrame.agg() and Series.agg() with aggregating on non-callable attributes (:issue:16405)

Numeric
^^^^^^^

  • Bug in .interpolate(), where limit_direction was not respected when limit=None (default) was passed (:issue:16282)

Categorical
^^^^^^^^^^^

  • Fixed comparison operations considering the order of the categories when both categoricals are unordered (:issue:16014)

Other
^^^^^

  • Bug in DataFrame.drop() with an empty-list with non-unique indices (:issue:16270)

.. _whatsnew_0200:

0.20.1


This is a major release from 0.19.2 and includes a number of API changes, deprecations, new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

Highlights include:

  • New .agg() API for Series/DataFrame similar to the groupby-rolling-resample API's, see :ref:here <whatsnew_0200.enhancements.agg>
  • Integration with the feather-format, including a new top-level pd.read_feather() and DataFrame.to_feather() method, see :ref:here <io.feather>.
  • The .ix indexer has been deprecated, see :ref:here <whatsnew_0200.api_breaking.deprecate_ix>
  • Panel has been deprecated, see :ref:here <whatsnew_0200.api_breaking.deprecate_panel>
  • Addition of an IntervalIndex and Interval scalar type, see :ref:here <whatsnew_0200.enhancements.intervalindex>
  • Improved user API when grouping by index levels in .groupby(), see :ref:here <whatsnew_0200.enhancements.groupby_access>
  • Improved support for UInt64 dtypes, see :ref:here <whatsnew_0200.enhancements.uint64_support>
  • A new orient for JSON serialization, orient='table', that uses the Table Schema spec and that gives the possibility for a more interactive repr in the Jupyter Notebook, see :ref:here <whatsnew_0200.enhancements.table_schema>
  • Experimental support for exporting styled DataFrames (DataFrame.style) to Excel, see :ref:here <whatsnew_0200.enhancements.style_excel>
  • Window binary corr/cov operations now return a MultiIndexed DataFrame rather than a Panel, as Panel is now deprecated, see :ref:here <whatsnew_0200.api_breaking.rolling_pairwise>
  • Support for S3 handling now uses s3fs, see :ref:here <whatsnew_0200.api_breaking.s3>
  • Google BigQuery support now uses the pandas-gbq library, see :ref:here <whatsnew_0200.api_breaking.gbq>

.. warning::

Pandas has changed the internal structure and layout of the codebase.
This can affect imports that are not from the top-level pandas.* namespace, please see the changes :ref:here <whatsnew_0200.privacy>.

Check the :ref:API Changes <whatsnew_0200.api_breaking> and :ref:deprecations <whatsnew_0200.deprecations> before updating.

.. note::

This is a combined release for 0.20.0 and 0.20.1.
Version 0.20.1 contains one additional change for backwards-compatibility with downstream projects using pandas' utils routines. (:issue:16250)

.. contents:: What's new in v0.20.0
:local:
:backlinks: none

.. _whatsnew_0200.enhancements:

New features

.. _whatsnew_0200.enhancements.agg:

agg API for DataFrame/Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Series & DataFrame have been enhanced to support the aggregation API. This is a familiar API
from groupby, window operations, and resampling. This allows aggregation operations in a concise way
by using :meth:~DataFrame.agg and :meth:~DataFrame.transform. The full documentation
is :ref:here <basics.aggregate> (:issue:1623).

Here is a sample

.. ipython:: python

df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
index=pd.date_range('1/1/2000', periods=10))
df.iloc[3:7] = np.nan
df

One can operate using string function names, callables, lists, or dictionaries of these.

Using a single function is equivalent to .apply.

.. ipython:: python

df.agg('sum')

Multiple aggregations with a list of functions.

.. ipython:: python

df.agg(['sum', 'min'])

Using a dict provides the ability to apply specific aggregations per column.
You will get a matrix-like output of all of the aggregators. The output has one column
per unique function; functions that are not applied to a particular column will be NaN:

.. ipython:: python

df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})

The API also supports a .transform() function for broadcasting results.

.. ipython:: python
:okwarning:

df.transform(['abs', lambda x: x - x.min()])

When presented with mixed dtypes that cannot be aggregated, .agg() will only take the valid
aggregations. This is similar to how groupby .agg() works. (:issue:15015)

.. ipython:: python

df = pd.DataFrame({'A': [1, 2, 3],
'B': [1., 2., 3.],
'C': ['foo', 'bar', 'baz'],
'D': pd.date_range('20130101', periods=3)})
df.dtypes

.. ipython:: python

df.agg(['min', 'sum'])

.. _whatsnew_0200.enhancements.dataio_dtype:

dtype keyword for data IO
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The 'python' engine for :func:read_csv, as well as the :func:read_fwf function for parsing
fixed-width text files and :func:read_excel for parsing Excel files, now accept the dtype keyword argument for specifying the types of specific columns (:issue:14295). See the :ref:io docs <io.dtypes> for more information.

.. ipython:: python
:suppress:

from pandas.compat import StringIO

.. ipython:: python

data = "a b\n1 2\n3 4"
pd.read_fwf(StringIO(data)).dtypes
pd.read_fwf(StringIO(data), dtype={'a':'float64', 'b':'object'}).dtypes

.. _whatsnew_0200.enhancements.datetime_origin:

.to_datetime() has gained an origin parameter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:to_datetime has gained a new parameter, origin, to define a reference date
from where to compute the resulting timestamps when parsing numerical values with a specific unit specified. (:issue:11276, :issue:11745)

For example, with 1960-01-01 as the starting date:

.. ipython:: python

pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01'))

The default is origin='unix', i.e. an origin of 1970-01-01 00:00:00, commonly called the
'unix epoch' or POSIX time. This was the previous default, so this is a backward compatible change.

.. ipython:: python

pd.to_datetime([1, 2, 3], unit='D')

.. _whatsnew_0200.enhancements.groupby_access:

Groupby Enhancements
^^^^^^^^^^^^^^^^^^^^

Strings passed to DataFrame.groupby() as the by parameter may now reference either column names or index level names. Previously, only column names could be referenced. This makes it easy to group by a column and an index level at the same time. (:issue:5677)

.. ipython:: python

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
'B': np.arange(8)},
index=index)
df

df.groupby(['second', 'A']).sum()

.. _whatsnew_0200.enhancements.compressed_urls:

Better support for compressed URLs in read_csv
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The compression code was refactored (:issue:12688). As a result, reading
dataframes from URLs in :func:read_csv or :func:read_table now supports
additional compression methods: xz, bz2, and zip (:issue:14570).
Previously, only gzip compression was supported. By default, the compression of
URLs and paths is now inferred from their file extensions. Additionally,
support for bz2 compression in the python 2 C-engine was improved (:issue:14874).

.. ipython:: python

url = 'https://github.com/{repo}/raw/{branch}/{path}'.format(
repo = 'pandas-dev/pandas',
branch = 'master',
path = 'pandas/tests/io/parser/data/salaries.csv.bz2',
)
df = pd.read_table(url, compression='infer')  # default, infer compression
df = pd.read_table(url, compression='bz2')    # explicitly specify compression
df.head(2)

.. _whatsnew_0200.enhancements.pickle_compression:

Pickle file I/O now supports compression
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:read_pickle, :meth:DataFrame.to_pickle and :meth:Series.to_pickle
can now read from and write to compressed pickle files. Compression methods
can be an explicit parameter or be inferred from the file extension.
See :ref:the docs here <io.pickle.compression>.

.. ipython:: python

df = pd.DataFrame({
'A': np.random.randn(1000),
'B': 'foo',
'C': pd.date_range('20130101', periods=1000, freq='s')})

Using an explicit compression type

.. ipython:: python

df.to_pickle("data.pkl.compress", compression="gzip")
rt = pd.read_pickle("data.pkl.compress", compression="gzip")
rt.head()

The default is to infer the compression type from the extension (compression='infer'):

.. ipython:: python

df.to_pickle("data.pkl.gz")
rt = pd.read_pickle("data.pkl.gz")
rt.head()
df["A"].to_pickle("s1.pkl.bz2")
rt = pd.read_pickle("s1.pkl.bz2")
rt.head()

.. ipython:: python
:suppress:

import os
os.remove("data.pkl.compress")
os.remove("data.pkl.gz")
os.remove("s1.pkl.bz2")

.. _whatsnew_0200.enhancements.uint64_support:

UInt64 Support Improved
^^^^^^^^^^^^^^^^^^^^^^^

Pandas has significantly improved support for operations involving unsigned,
or purely non-negative, integers. Previously, handling these integers would
result in improper rounding or data-type casting, leading to incorrect results.
Notably, a new numerical index, UInt64Index, has been created (:issue:14937)

.. ipython:: python

idx = pd.UInt64Index([1, 2, 3])
df = pd.DataFrame({'A': ['a', 'b', 'c']}, index=idx)
df.index

  • Bug in converting object elements of array-like objects to unsigned 64-bit integers (:issue:4471, :issue:14982)
  • Bug in Series.unique() in which unsigned 64-bit integers were causing overflow (:issue:14721)
  • Bug in DataFrame construction in which unsigned 64-bit integer elements were being converted to objects (:issue:14881)
  • Bug in pd.read_csv() in which unsigned 64-bit integer elements were being improperly converted to the wrong data types (:issue:14983)
  • Bug in pd.unique() in which unsigned 64-bit integers were causing overflow (:issue:14915)
  • Bug in pd.value_counts() in which unsigned 64-bit integers were being erroneously truncated in the output (:issue:14934)

.. _whatsnew_0200.enhancements.groupy_categorical:

GroupBy on Categoricals
^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, .groupby(..., sort=False) would fail with a ValueError when grouping on a categorical series with some categories not appearing in the data. (:issue:13179)

.. ipython:: python

chromosomes = np.r_[np.arange(1, 23).astype(str), ['X', 'Y']]
df = pd.DataFrame({
'A': np.random.randint(100),
'B': np.random.randint(100),
'C': np.random.randint(100),
'chromosomes': pd.Categorical(np.random.choice(chromosomes, 100),
categories=chromosomes,
ordered=True)})
df

Previous Behavior:

.. code-block:: ipython

In [3]: df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum()

ValueError: items in new_categories are not the same as in old categories

New Behavior:

.. ipython:: python

df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum()

.. _whatsnew_0200.enhancements.table_schema:

Table Schema Output
^^^^^^^^^^^^^^^^^^^

The new orient 'table' for :meth:DataFrame.to_json
will generate a Table Schema_ compatible string representation of
the data.

.. ipython:: python

df = pd.DataFrame(
{'A': [1, 2, 3],
'B': ['a', 'b', 'c'],
'C': pd.date_range('2016-01-01', freq='d', periods=3),
}, index=pd.Index(range(3), name='idx'))
df
df.to_json(orient='table')

See :ref:Table Schema <io.table_schema> in the IO docs for more information.

Additionally, the repr for DataFrame and Series can now publish
this JSON Table schema representation of the Series or DataFrame if you are
using IPython (or another frontend like nteract_ using the Jupyter messaging
protocol).
This gives frontends like the Jupyter notebook and nteract_
more flexibility in how they display pandas objects, since they have
more information about the data.
You must enable this by setting the display.html.table_schema option to True.
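
A minimal sketch of opting in (the option is off by default):

.. ipython:: python

import pandas as pd

pd.set_option('display.html.table_schema', True)
pd.Series(range(3))  # the repr now also publishes a Table Schema JSON payload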

.. _Table Schema: http://specs.frictionlessdata.io/json-table-schema/
.. _nteract: http://nteract.io/

.. _whatsnew_0200.enhancements.scipy_sparse:

SciPy sparse matrix from/to SparseDataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pandas now supports creating sparse dataframes directly from scipy.sparse.spmatrix instances.
See the :ref:documentation <sparse.scipysparse> for more information. (:issue:4343)

All sparse formats are supported, but matrices that are not in :mod:COOrdinate <scipy.sparse> format will be converted, copying data as needed.

.. ipython:: python

from scipy.sparse import csr_matrix
arr = np.random.random(size=(1000, 5))
arr[arr < .9] = 0
sp_arr = csr_matrix(arr)
sp_arr
sdf = pd.SparseDataFrame(sp_arr)
sdf

To convert a SparseDataFrame back to sparse SciPy matrix in COO format, you can use:

.. ipython:: python

sdf.to_coo()

.. _whatsnew_0200.enhancements.style_excel:

Excel output for styled DataFrames
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Experimental support has been added to export DataFrame.style formats to Excel using the openpyxl engine. (:issue:15530)

For example, after running the following, styled.xlsx renders as below:

.. ipython:: python
:okwarning:

np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.RandomState(24).randn(10, 4),
columns=list('BCDE'))],
axis=1)
df.iloc[0, 2] = np.nan
df
styled = (df.style
          .applymap(lambda val: 'color: %s' % 'red' if val < 0 else 'black')
          .highlight_max())
styled.to_excel('styled.xlsx', engine='openpyxl')

.. image:: _static/style-excel.png

.. ipython:: python
:suppress:

import os
os.remove('styled.xlsx')

See the :ref:Style documentation <style.ipynb#Export-to-Excel> for more detail.

.. _whatsnew_0200.enhancements.intervalindex:

IntervalIndex
^^^^^^^^^^^^^

pandas has gained an IntervalIndex with its own dtype, interval, as well as the Interval scalar type. These allow first-class support for interval
notation, specifically as a return type for the categories in :func:cut and :func:qcut. The IntervalIndex allows some unique indexing, see the
:ref:docs <indexing.intervallindex>. (:issue:7640, :issue:8625)

.. warning::

These indexing behaviors of the IntervalIndex are provisional and may change in a future version of pandas. Feedback on usage is welcome.

Previous behavior:

The returned categories were strings, representing Intervals

.. code-block:: ipython

In [1]: c = pd.cut(range(4), bins=2)

In [2]: c
Out[2]:
[(-0.003, 1.5], (-0.003, 1.5], (1.5, 3], (1.5, 3]]
Categories (2, object): [(-0.003, 1.5] < (1.5, 3]]

In [3]: c.categories
Out[3]: Index(['(-0.003, 1.5]', '(1.5, 3]'], dtype='object')

New behavior:

.. ipython:: python

c = pd.cut(range(4), bins=2)
c
c.categories

Furthermore, this allows one to bin other data with these same bins, with NaN representing a missing
value similar to other dtypes.

.. ipython:: python

pd.cut([0, 3, 5, 1], bins=c.categories)

An IntervalIndex can also be used in Series and DataFrame as the index.

.. ipython:: python

df = pd.DataFrame({'A': range(4),
'B': pd.cut([0, 3, 1, 1], bins=c.categories)}
).set_index('B')
df

Selecting via a specific interval:

.. ipython:: python

df.loc[pd.Interval(1.5, 3.0)]

Selecting via a scalar value that is contained in the intervals.

.. ipython:: python

df.loc[0]

.. _whatsnew_0200.enhancements.other:

Other Enhancements
^^^^^^^^^^^^^^^^^^

  • DataFrame.rolling() now accepts the parameter closed='right'|'left'|'both'|'neither' to choose the rolling window-endpoint closedness. See the :ref:documentation <stats.rolling_window.endpoints> (:issue:13965)
  • Integration with the feather-format, including a new top-level pd.read_feather() and DataFrame.to_feather() method, see :ref:here <io.feather>.
  • Series.str.replace() now accepts a callable as the replacement, which is passed to re.sub (:issue:15055); see the sketch after this list
  • Series.str.replace() now accepts a compiled regular expression as a pattern (:issue:15446)
  • Series.sort_index accepts parameters kind and na_position (:issue:13589, :issue:14444)
  • DataFrame and DataFrame.groupby() have gained a nunique() method to count the distinct values over an axis (:issue:14336, :issue:15197).
  • DataFrame has gained a melt() method, equivalent to pd.melt(), for unpivoting from a wide to long format (:issue:12640).
  • pd.read_excel() now preserves sheet order when using sheetname=None (:issue:9930)
  • Multiple offset aliases with decimal points are now supported (e.g. 0.5min is parsed as 30s) (:issue:8419)
  • .isnull() and .notnull() have been added to Index object to make them more consistent with the Series API (:issue:15300)
  • New UnsortedIndexError (subclass of KeyError) raised when indexing/slicing into an
    unsorted MultiIndex (:issue:11897). This allows differentiation between errors due to lack
    of sorting or an incorrect key. See :ref:here <advanced.unsorted>
  • MultiIndex has gained a .to_frame() method to convert to a DataFrame (:issue:12397)
  • pd.cut and pd.qcut now support datetime64 and timedelta64 dtypes (:issue:14714, :issue:14798)
  • pd.qcut has gained the duplicates='raise'|'drop' option to control whether to raise on duplicated edges (:issue:7751)
  • Series provides a to_excel method to output Excel files (:issue:8825)
  • The usecols argument in pd.read_csv() now accepts a callable function as a value (:issue:14154)
  • The skiprows argument in pd.read_csv() now accepts a callable function as a value (:issue:10882)
  • The nrows and chunksize arguments in pd.read_csv() are supported if both are passed (:issue:6774, :issue:15755)
  • DataFrame.plot now prints a title above each subplot if subplots=True and title is a list of strings (:issue:14753)
  • DataFrame.plot can pass the matplotlib 2.0 default color cycle as a single string as color parameter, see here <http://matplotlib.org/2.0.0/users/colors.html#cn-color-selection>__. (:issue:15516)
  • Series.interpolate() now supports timedelta as an index type with method='time' (:issue:6424)
  • Addition of a level keyword to DataFrame/Series.rename to rename
    labels in the specified level of a MultiIndex (:issue:4160).
  • DataFrame.reset_index() will now interpret a tuple index.name as a key spanning across levels of columns, if this is a MultiIndex (:issue:16164)
  • Timedelta.isoformat method added for formatting Timedeltas as an ISO 8601 duration_. See the :ref:Timedelta docs <timedeltas.isoformat> (:issue:15136)
  • .select_dtypes() now allows the string datetimetz to generically select datetimes with tz (:issue:14910)
  • The .to_latex() method will now accept multicolumn and multirow arguments to use the accompanying LaTeX enhancements
  • pd.merge_asof() gained the option direction='backward'|'forward'|'nearest' (:issue:14887)
  • Series/DataFrame.asfreq() have gained a fill_value parameter, to fill missing values (:issue:3715).
  • Series/DataFrame.resample.asfreq have gained a fill_value parameter, to fill missing values during resampling (:issue:3715).
  • :func:pandas.util.hash_pandas_object has gained the ability to hash a MultiIndex (:issue:15224)
  • Series/DataFrame.squeeze() have gained the axis parameter. (:issue:15339)
  • DataFrame.to_excel() has a new freeze_panes parameter to turn on Freeze Panes when exporting to Excel (:issue:15160)
  • pd.read_html() will parse multiple header rows, creating a MultiIndex header. (:issue:13434).
  • HTML table output skips colspan or rowspan attribute if equal to 1. (:issue:15403)
  • :class:pandas.io.formats.style.Styler template now has blocks for easier extension, :ref:see the example notebook <style.ipynb#Subclassing> (:issue:15649)
  • :meth:Styler.render() <pandas.io.formats.style.Styler.render> now accepts **kwargs to allow user-defined variables in the template (:issue:15649)
  • Compatibility with Jupyter notebook 5.0; MultiIndex column labels are left-aligned and MultiIndex row-labels are top-aligned (:issue:15379)
  • TimedeltaIndex now has a custom date-tick formatter specifically designed for nanosecond level precision (:issue:8711)
  • pd.api.types.union_categoricals gained the ignore_ordered argument to allow ignoring the ordered attribute of unioned categoricals (:issue:13410). See the :ref:categorical union docs <categorical.union> for more information.
  • DataFrame.to_latex() and DataFrame.to_string() now allow optional header aliases. (:issue:15536)
  • Re-enable the parse_dates keyword of pd.read_excel() to parse string columns as dates (:issue:14326)
  • Added .empty property to subclasses of Index. (:issue:15270)
  • Enabled floor division for Timedelta and TimedeltaIndex (:issue:15828)
  • pandas.io.json.json_normalize() gained the option errors=&#39;ignore&#39;|&#39;raise&#39;; the default is errors=&#39;raise&#39; which is backward compatible. (:issue:14583)
  • pandas.io.json.json_normalize() with an empty list will return an empty DataFrame (:issue:15534)
  • pandas.io.json.json_normalize() has gained a sep option that accepts str to separate joined fields; the default is ".", which is backward compatible. (:issue:14883)
  • :meth:MultiIndex.remove_unused_levels has been added to facilitate :ref:removing unused levels <advanced.shown_levels>. (:issue:15694)
  • pd.read_csv() will now raise a ParserError error whenever any parsing error occurs (:issue:15913, :issue:15925)
  • pd.read_csv() now supports the error_bad_lines and warn_bad_lines arguments for the Python parser (:issue:15925)
  • The display.show_dimensions option can now also be used to specify
    whether the length of a Series should be shown in its repr (:issue:7117).
  • parallel_coordinates() has gained a sort_labels keyword argument that sorts class labels and the colors assigned to them (:issue:15908)
  • Options added to allow one to turn on/off using bottleneck and numexpr, see :ref:here <basics.accelerate> (:issue:16157)
  • DataFrame.style.bar() now accepts two more options to further customize the bar chart. Bar alignment is set with align='left'|'mid'|'zero', the default is "left", which is backward compatible; you can now pass a list of color=[color_negative, color_positive]. (:issue:14757)
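
A short sketch of the two new Series.str.replace() capabilities noted in the list above (the sample strings are illustrative):

.. ipython:: python

import re
import pandas as pd

s = pd.Series(['foo 123', 'bar 45'])

# callable replacement: each match object is handed to re.sub
s.str.replace(r'\d+', lambda m: m.group(0)[::-1])

# a compiled regular expression as the pattern
pat = re.compile(r'\s+')
s.str.replace(pat, '_')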

.. _ISO 8601 duration: https://en.wikipedia.org/wiki/ISO_8601#Durations

.. _whatsnew_0200.api_breaking:

Backwards incompatible API changes

.. _whatsnew.api_breaking.io_compat:

Possible incompatibility for HDF5 formats created with pandas < 0.13.0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

pd.TimeSeries was officially deprecated in 0.17.0, though it has been an alias since 0.13.0. It has
been dropped in favor of pd.Series. (:issue:15098).

This may cause HDF5 files that were created in prior versions to become unreadable if pd.TimeSeries
was used. This is most likely to be the case for pandas < 0.13.0. If you find yourself in this situation,
you can use a recent prior version of pandas to read in your HDF5 files,
then write them out again after applying the procedure below.

.. code-block:: ipython

In [2]: s = pd.TimeSeries([1,2,3], index=pd.date_range('20130101', periods=3))

In [3]: s
Out[3]:
2013-01-01 1
2013-01-02 2
2013-01-03 3
Freq: D, dtype: int64

In [4]: type(s)
Out[4]: pandas.core.series.TimeSeries

In [5]: s = pd.Series(s)

In [6]: s
Out[6]:
2013-01-01 1
2013-01-02 2
2013-01-03 3
Freq: D, dtype: int64

In [7]: type(s)
Out[7]: pandas.core.series.Series

.. _whatsnew_0200.api_breaking.index_map:

Map on Index types now return other Index types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

map on an Index now returns an Index, not a numpy array (:issue:12766)

.. ipython:: python

idx = pd.Index([1, 2])
idx
mi = pd.MultiIndex.from_tuples([(1, 2), (2, 4)])
mi

Previous Behavior:

.. code-block:: ipython

In [5]: idx.map(lambda x: x * 2)
Out[5]: array([2, 4])

In [6]: idx.map(lambda x: (x, x * 2))
Out[6]: array([(1, 2), (2, 4)], dtype=object)

In [7]: mi.map(lambda x: x)
Out[7]: array([(1, 2), (2, 4)], dtype=object)

In [8]: mi.map(lambda x: x[0])
Out[8]: array([1, 2])

New Behavior:

.. ipython:: python

idx.map(lambda x: x * 2)
idx.map(lambda x: (x, x * 2))

mi.map(lambda x: x)

mi.map(lambda x: x[0])

map on a Series with datetime64 values may return int64 dtypes rather than int32

.. ipython:: python

s = pd.Series(pd.date_range('2011-01-02T00:00', '2011-01-02T02:00', freq='H').tz_localize('Asia/Tokyo'))
s

Previous Behavior:

.. code-block:: ipython

In [9]: s.map(lambda x: x.hour)
Out[9]:
0 0
1 1
2 2
dtype: int32

New Behavior:

.. ipython:: python

s.map(lambda x: x.hour)

.. _whatsnew_0200.api_breaking.index_dt_field:

Accessing datetime fields of Index now return Index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The datetime-related attributes (see :ref:here <timeseries.components>
for an overview) of DatetimeIndex, PeriodIndex and TimedeltaIndex previously
returned numpy arrays. They will now return a new Index object, except
in the case of a boolean field, where the result will still be a boolean ndarray. (:issue:15022)

Previous behaviour:

.. code-block:: ipython

In [1]: idx = pd.date_range("2015-01-01", periods=5, freq='10H')

In [2]: idx.hour
Out[2]: array([ 0, 10, 20, 6, 16], dtype=int32)

New Behavior:

.. ipython:: python

idx = pd.date_range("2015-01-01", periods=5, freq='10H')
idx.hour

This has the advantage that specific Index methods are still available on the
result. On the other hand, this might have backward incompatibilities: e.g.
compared to numpy arrays, Index objects are not mutable. To get the original
ndarray, you can always convert explicitly using np.asarray(idx.hour).

.. _whatsnew_0200.api_breaking.unique:

pd.unique will now be consistent with extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In prior versions, using :meth:Series.unique and :func:pandas.unique on Categorical and tz-aware
data-types would yield different return types. These are now made consistent. (:issue:15903)

  • Datetime tz-aware

Previous behaviour:

.. code-block:: ipython

# Series
In [5]: pd.Series([pd.Timestamp('20160101', tz='US/Eastern'),
                   pd.Timestamp('20160101', tz='US/Eastern')]).unique()
Out[5]: array([Timestamp('2016-01-01 00:00:00-0500', tz='US/Eastern')], dtype=object)

In [6]: pd.unique(pd.Series([pd.Timestamp('20160101', tz='US/Eastern'),
                             pd.Timestamp('20160101', tz='US/Eastern')]))
Out[6]: array(['2016-01-01T05:00:00.000000000'], dtype='datetime64[ns]')

# Index
In [7]: pd.Index([pd.Timestamp('20160101', tz='US/Eastern'),
                  pd.Timestamp('20160101', tz='US/Eastern')]).unique()
Out[7]: DatetimeIndex(['2016-01-01 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)

In [8]: pd.unique([pd.Timestamp('20160101', tz='US/Eastern'),
                   pd.Timestamp('20160101', tz='US/Eastern')])
Out[8]: array(['2016-01-01T05:00:00.000000000'], dtype='datetime64[ns]')

New Behavior:

.. ipython:: python

# Series, returns an array of Timestamp tz-aware
pd.Series([pd.Timestamp('20160101', tz='US/Eastern'),
           pd.Timestamp('20160101', tz='US/Eastern')]).unique()
pd.unique(pd.Series([pd.Timestamp('20160101', tz='US/Eastern'),
                     pd.Timestamp('20160101', tz='US/Eastern')]))

# Index, returns a DatetimeIndex
pd.Index([pd.Timestamp('20160101', tz='US/Eastern'),
          pd.Timestamp('20160101', tz='US/Eastern')]).unique()
pd.unique(pd.Index([pd.Timestamp('20160101', tz='US/Eastern'),
                    pd.Timestamp('20160101', tz='US/Eastern')]))
  • Categoricals

Previous behaviour:

.. code-block:: ipython

In [1]: pd.Series(list('baabc'), dtype='category').unique()
Out[1]:
[b, a, c]
Categories (3, object): [b, a, c]

In [2]: pd.unique(pd.Series(list('baabc'), dtype='category'))
Out[2]: array(['b', 'a', 'c'], dtype=object)

New Behavior:

.. ipython:: python

# returns a Categorical
pd.Series(list('baabc'), dtype='category').unique()
pd.unique(pd.Series(list('baabc'), dtype='category'))

.. _whatsnew_0200.api_breaking.s3:

S3 File Handling
^^^^^^^^^^^^^^^^

pandas now uses s3fs <http://s3fs.readthedocs.io/>_ for handling S3 connections. This shouldn't break
any code. However, since s3fs is not a required dependency, you will need to install it separately, like boto
in prior versions of pandas. (:issue:11915).
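
A minimal sketch, assuming s3fs is installed (e.g. pip install s3fs); the bucket and key are hypothetical:

.. code-block:: python

import pandas as pd

# s3fs handles the connection behind the scenes
df = pd.read_csv('s3://my-bucket/path/to/data.csv')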

.. _whatsnew_0200.api_breaking.partial_string_indexing:

Partial String Indexing Changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:ref:DatetimeIndex Partial String Indexing <timeseries.partialindexing> now works as an exact match, provided that string resolution coincides with index resolution, including a case when both are seconds (:issue:14826). See :ref:Slice vs. Exact Match <timeseries.slice_vs_exact_match> for details.

.. ipython:: python

df = pd.DataFrame({'a': [1, 2, 3]}, pd.DatetimeIndex(['2011-12-31 23:59:59',
                                                      '2012-01-01 00:00:00',
                                                      '2012-01-01 00:00:01']))

Previous Behavior:

.. code-block:: ipython

In [4]: df['2011-12-31 23:59:59']
Out[4]:
a
2011-12-31 23:59:59 1

In [5]: df['a']['2011-12-31 23:59:59']
Out[5]:
2011-12-31 23:59:59 1
Name: a, dtype: int64

New Behavior:

.. code-block:: ipython

In [4]: df['2011-12-31 23:59:59']
KeyError: '2011-12-31 23:59:59'

In [5]: df['a']['2011-12-31 23:59:59']
Out[5]: 1

.. _whatsnew_0200.api_breaking.concat_dtypes:

Concat of different float dtypes will not automatically upcast
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, concat of multiple objects with different float dtypes would automatically upcast results to a dtype of float64.
Now the smallest acceptable dtype will be used (:issue:13247)

.. ipython:: python

df1 = pd.DataFrame(np.array([1.0], dtype=np.float32, ndmin=2))
df1.dtypes

df2 = pd.DataFrame(np.array([np.nan], dtype=np.float32, ndmin=2))
df2.dtypes

Previous Behavior:

.. code-block:: ipython

In [7]: pd.concat([df1, df2]).dtypes
Out[7]:
0 float64
dtype: object

New Behavior:

.. ipython:: python

pd.concat([df1, df2]).dtypes

.. _whatsnew_0200.api_breaking.gbq:

Pandas Google BigQuery support has moved
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

pandas has split off Google BigQuery support into a separate package, pandas-gbq. You can conda install pandas-gbq -c conda-forge or
pip install pandas-gbq to get it. The functionality of :func:read_gbq and :meth:DataFrame.to_gbq remains the same with the
currently released version of pandas-gbq=0.1.4. Documentation is now hosted here <https://pandas-gbq.readthedocs.io/>__ (:issue:15347)
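
A minimal sketch, assuming pandas-gbq is installed; the project id, dataset, and query are placeholders:

.. code-block:: python

import pandas as pd

df = pd.read_gbq("SELECT COUNT(*) AS n FROM mydataset.mytable",
                 project_id='my-project')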

.. _whatsnew_0200.api_breaking.memory_usage:

Memory Usage for Index is more Accurate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, showing .memory_usage() on a pandas structure that has an index would only include the actual index values and not include structures that facilitated fast indexing. This will generally differ for Index and MultiIndex and less so for other index types. (:issue:15237)

Previous Behavior:

.. code-block:: ipython

In [8]: index = Index(['foo', 'bar', 'baz'])

In [9]: index.memory_usage(deep=True)
Out[9]: 180

In [10]: index.get_loc('foo')
Out[10]: 0

In [11]: index.memory_usage(deep=True)
Out[11]: 180

New Behavior:

.. code-block:: ipython

In [8]: index = Index(['foo', 'bar', 'baz'])

In [9]: index.memory_usage(deep=True)
Out[9]: 180

In [10]: index.get_loc('foo')
Out[10]: 0

In [11]: index.memory_usage(deep=True)
Out[11]: 260

.. _whatsnew_0200.api_breaking.sort_index:

DataFrame.sort_index changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In certain cases, calling .sort_index() on a MultiIndexed DataFrame would return the same DataFrame without seeming to sort.
This would happen with lexsorted, but non-monotonic, levels. (:issue:15622, :issue:15687, :issue:14015, :issue:13431, :issue:15797)

This is unchanged from prior versions, but shown for illustration purposes:

.. ipython:: python

df = pd.DataFrame(np.arange(6), columns=['value'],
                  index=pd.MultiIndex.from_product([list('BA'), range(3)]))
df

.. ipython:: python

df.index.is_lexsorted()
df.index.is_monotonic

Sorting works as expected

.. ipython:: python

df.sort_index()

.. ipython:: python

df.sort_index().index.is_lexsorted()
df.sort_index().index.is_monotonic

However, this example, which has a non-monotonic 2nd level,
doesn't behave as desired.

.. ipython:: python

df = pd.DataFrame(
{'value': [1, 2, 3, 4]},
index=pd.MultiIndex(levels=[['a', 'b'], ['bb', 'aa']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]]))
df

Previous Behavior:

.. code-block:: python

In [11]: df.sort_index()
Out[11]:
      value
a bb      1
  aa      2
b bb      3
  aa      4

In [14]: df.sort_index().index.is_lexsorted()
Out[14]: True

In [15]: df.sort_index().index.is_monotonic
Out[15]: False

New Behavior:

.. ipython:: python

df.sort_index()
df.sort_index().index.is_lexsorted()
df.sort_index().index.is_monotonic

.. _whatsnew_0200.api_breaking.groupby_describe:

Groupby Describe Formatting
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The output formatting of groupby.describe() now labels the describe() metrics in the columns instead of the index.
This format is consistent with groupby.agg() when applying multiple functions at once. (:issue:4792)

Previous Behavior:

.. code-block:: ipython

In [1]: df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})

In [2]: df.groupby('A').describe()
Out[2]:
         B
A
1 count  2.000000
  mean   1.500000
  std    0.707107
  min    1.000000
  25%    1.250000
  50%    1.500000
  75%    1.750000
  max    2.000000
2 count  2.000000
  mean   3.500000
  std    0.707107
  min    3.000000
  25%    3.250000
  50%    3.500000
  75%    3.750000
  max    4.000000

In [3]: df.groupby('A').agg([np.mean, np.std, np.min, np.max])
Out[3]:
     B
  mean       std amin amax
A
1  1.5  0.707107    1    2
2  3.5  0.707107    3    4

New Behavior:

.. ipython:: python

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})

df.groupby('A').describe()

df.groupby('A').agg([np.mean, np.std, np.min, np.max])

.. _whatsnew_0200.api_breaking.rolling_pairwise:

Window Binary Corr/Cov operations return a MultiIndex DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A binary window operation, like .corr() or .cov(), when operating on a .rolling(..), .expanding(..), or .ewm(..) object,
will now return a 2-level MultiIndexed DataFrame rather than a Panel, as Panel is now deprecated,
see :ref:here <whatsnew_0200.api_breaking.deprecate_panel>. These are equivalent in function,
but a MultiIndexed DataFrame enjoys more support in pandas.
See the section on :ref:Windowed Binary Operations <stats.moments.binary> for more information. (:issue:15677)

.. ipython:: python

np.random.seed(1234)
df = pd.DataFrame(np.random.rand(100, 2),
columns=pd.Index(['A', 'B'], name='bar'),
index=pd.date_range('20160101',
periods=100, freq='D', name='foo'))
df.tail()

Previous Behavior:

.. code-block:: ipython

In [2]: df.rolling(12).corr()
Out[2]:
<class 'pandas.core.panel.Panel'>
Dimensions: 100 (items) x 2 (major_axis) x 2 (minor_axis)
Items axis: 2016-01-01 00:00:00 to 2016-04-09 00:00:00
Major_axis axis: A to B
Minor_axis axis: A to B

New Behavior:

.. ipython:: python

res = df.rolling(12).corr()
res.tail()

Retrieving a correlation matrix for a cross-section

.. ipython:: python

df.rolling(12).corr().loc['2016-04-07']

.. _whatsnew_0200.api_breaking.hdfstore_where:

HDFStore where string comparison
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, most types could be compared to a string column in an HDFStore,
usually resulting in an invalid comparison that returned an empty result frame. These comparisons will now raise a
TypeError (:issue:15492)

.. ipython:: python

df = pd.DataFrame({'unparsed_date': ['2014-01-01', '2014-01-01']})
df.to_hdf('store.h5', 'key', format='table', data_columns=True)
df.dtypes

Previous Behavior:

.. code-block:: ipython

In [4]: pd.read_hdf('store.h5', 'key', where='unparsed_date > ts')
File "<string>", line 1
(unparsed_date > 1970-01-01 00:00:01.388552400)
^
SyntaxError: invalid token

New Behavior:

.. code-block:: ipython

In [18]: ts = pd.Timestamp('2014-01-01')

In [19]: pd.read_hdf('store.h5', 'key', where='unparsed_date > ts')
TypeError: Cannot compare 2014-01-01 00:00:00 of
type <class 'pandas.tslib.Timestamp'> to string column

.. ipython:: python
:suppress:

import os
os.remove('store.h5')

.. _whatsnew_0200.api_breaking.index_order:

Index.intersection and inner join now preserve the order of the left Index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:Index.intersection now preserves the order of the calling Index (left)
instead of the other Index (right) (:issue:15582). This affects inner
joins, :meth:DataFrame.join and :func:merge, and the .align method.

  • Index.intersection

.. ipython:: python

left = pd.Index([2, 1, 0])
left
right = pd.Index([1, 2, 3])
right

Previous Behavior:

.. code-block:: ipython

In [4]: left.intersection(right)
Out[4]: Int64Index([1, 2], dtype='int64')

New Behavior:

.. ipython:: python

left.intersection(right)

  • DataFrame.join and pd.merge

.. ipython:: python

left = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
left
right = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])
right

Previous Behavior:

.. code-block:: ipython

In [4]: left.join(right, how='inner')
Out[4]:
    a    b
1  10  100
2  20  200

New Behavior:

.. ipython:: python

left.join(right, how='inner')

.. _whatsnew_0200.api_breaking.pivot_table:

Pivot Table always returns a DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The documentation for :meth:pivot_table states that a DataFrame is always returned. Here a bug
is fixed that allowed this to return a Series under certain circumstances. (:issue:4386)

.. ipython:: python

df = pd.DataFrame({'col1': [3, 4, 5],
                   'col2': ['C', 'D', 'E'],
                   'col3': [1, 3, 9]})
df

Previous Behavior:

.. code-block:: ipython

In [2]: df.pivot_table('col1', index=['col3', 'col2'], aggfunc=np.sum)
Out[2]:
col3  col2
1     C       3
3     D       4
9     E       5
Name: col1, dtype: int64

New Behavior:

.. ipython:: python

df.pivot_table('col1', index=['col3', 'col2'], aggfunc=np.sum)

.. _whatsnew_0200.api:

Other API Changes
^^^^^^^^^^^^^^^^^

  • numexpr version is now required to be >= 2.4.6 and it will not be used at all if this requisite is not fulfilled (:issue:15213).
  • CParserError has been renamed to ParserError in pd.read_csv() and will be removed in the future (:issue:12665)
  • SparseArray.cumsum() and SparseSeries.cumsum() will now always return SparseArray and SparseSeries respectively (:issue:12855)
  • DataFrame.applymap() with an empty DataFrame will return a copy of the empty DataFrame instead of a Series (:issue:8222)
  • Series.map() now respects default values of dictionary subclasses with a __missing__ method, such as collections.Counter (:issue:15999); see the sketch after this list
  • .loc has compat with .ix for accepting iterators, and NamedTuples (:issue:15120)
  • interpolate() and fillna() will raise a ValueError if the limit keyword argument is not greater than 0. (:issue:9217)
  • pd.read_csv() will now issue a ParserWarning whenever there are conflicting values provided by the dialect parameter and the user (:issue:14898)
  • pd.read_csv() will now raise a ValueError for the C engine if the quote character is larger than one byte (:issue:11592)
  • inplace arguments now require a boolean value, else a ValueError is thrown (:issue:14189)
  • pandas.api.types.is_datetime64_ns_dtype will now report True on a tz-aware dtype, similar to pandas.api.types.is_datetime64_any_dtype
  • DataFrame.asof() will return a null filled Series instead of the scalar NaN if a match is not found (:issue:15118)
  • Specific support for copy.copy() and copy.deepcopy() functions on NDFrame objects (:issue:15444)
  • Series.sort_values() accepts a one element list of bool for consistency with the behavior of DataFrame.sort_values() (:issue:15604)
  • .merge() and .join() on category dtype columns will now preserve the category dtype when possible (:issue:10409)
  • SparseDataFrame.default_fill_value will be 0, previously was nan in the return from pd.get_dummies(..., sparse=True) (:issue:15594)
  • The default behaviour of Series.str.match has changed from extracting
    groups to matching the pattern. The extracting behaviour was deprecated
    since pandas version 0.13.0 and can be done with the Series.str.extract
    method (:issue:5224). As a consequence, the as_indexer keyword is
    ignored (no longer needed to specify the new behaviour) and is deprecated.
  • NaT will now correctly report False for datetimelike boolean operations such as is_month_start (:issue:15781)
  • NaT will now correctly return np.nan for Timedelta and Period accessors such as days and quarter (:issue:15782)
  • NaT will now return NaT for the tz_localize and tz_convert
    methods (:issue:15830)
  • DataFrame and Panel constructors with invalid input will now raise ValueError rather than pandas.core.common.PandasError, if called with scalar inputs and not axes; The exception PandasError is removed as well. (:issue:15541)
  • The exception pandas.core.common.AmbiguousIndexError is removed as it is not referenced (:issue:15541)
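
As a small illustration of the Series.map() change noted in the list above, using collections.Counter (whose __missing__ defaults to 0); the values are illustrative:

.. ipython:: python

import collections
import pandas as pd

counter = collections.Counter({'cat': 10})
s = pd.Series(['cat', 'dog'])
s.map(counter)  # 'cat' -> 10; 'dog' falls back to __missing__ -> 0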

.. _whatsnew_0200.privacy:

Reorganization of the library: Privacy Changes

.. _whatsnew_0200.privacy.extensions:

Modules Privacy Has Changed
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some formerly public python/c/c++/cython extension modules have been moved and/or renamed. These are all removed from the public API.
Furthermore, the pandas.core, pandas.compat, and pandas.util top-level modules are now considered to be PRIVATE.
Where indicated below, a deprecation warning will be issued if you reference these modules. (:issue:12588)

.. csv-table::
:header: "Previous Location", "New Location", "Deprecated"
:widths: 30, 30, 4

"pandas.lib", "pandas._libs.lib", "X"
"pandas.tslib", "pandas._libs.tslib", "X"
"pandas.computation", "pandas.core.computation", "X"
"pandas.msgpack", "pandas.io.msgpack", ""
"pandas.index", "pandas._libs.index", ""
"pandas.algos", "pandas._libs.algos", ""
"pandas.hashtable", "pandas._libs.hashtable", ""
"pandas.indexes", "pandas.core.indexes", ""
"pandas.json", "pandas._libs.json / pandas.io.json", "X"
"pandas.parser", "pandas._libs.parsers", "X"
"pandas.formats", "pandas.io.formats", ""
"pandas.sparse", "pandas.core.sparse", ""
"pandas.tools", "pandas.core.reshape", "X"
"pandas.types", "pandas.core.dtypes", "X"
"pandas.io.sas.saslib", "pandas.io.sas._sas", ""
"pandas._join", "pandas._libs.join", ""
"pandas._hash", "pandas._libs.hashing", ""
"pandas._period", "pandas._libs.period", ""
"pandas._sparse", "pandas._libs.sparse", ""
"pandas._testing", "pandas._libs.testing", ""
"pandas._window", "pandas._libs.window", ""

Some new subpackages are created with public functionality that is not directly
exposed in the top-level namespace: pandas.errors, pandas.plotting and
pandas.testing (more details below). Together with pandas.api.types and
certain functions in the pandas.io and pandas.tseries submodules,
these are now the public subpackages.
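
A minimal sketch of what this looks like in practice; this assumes the 0.20 shim modules warn on attribute access, as the "Deprecated" column of the table above indicates:

.. code-block:: python

import warnings
import pandas as pd

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    ts_type = pd.tslib.Timestamp  # deprecated location; pd.Timestamp is the public name

ts_type is pd.Timestamp  # True -- same object, reached through a deprecated path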

Further changes:

  • The function :func:~pandas.api.types.union_categoricals is now importable from pandas.api.types, formerly from pandas.types.concat (:issue:15998)
  • The type import pandas.tslib.NaTType is deprecated and can be replaced by using type(pandas.NaT) (:issue:16146); see the sketch after this list
  • The public functions in pandas.tools.hashing are deprecated from that location, but are now importable from pandas.util (:issue:16223)
  • The modules in pandas.util: decorators, print_versions, doctools, validators, depr_module are now private. Only the functions exposed in pandas.util itself are public (:issue:16223)
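
A small sketch of the NaTType replacement mentioned in the list above:

.. code-block:: python

import pandas as pd

NaTType = type(pd.NaT)       # the documented replacement for pandas.tslib.NaTType
isinstance(pd.NaT, NaTType)  # True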

.. _whatsnew_0200.privacy.errors:

pandas.errors
^^^^^^^^^^^^^^^^^

We are adding a standard public module for all pandas exceptions and warnings, pandas.errors (:issue:14800). Previously
these exceptions and warnings could be imported from pandas.core.common or pandas.io.common. These exceptions and warnings
will be removed from the *.common locations in a future release. (:issue:15541)

The following are now part of this API:

.. code-block:: python

['DtypeWarning',
'EmptyDataError',
'OutOfBoundsDatetime',
'ParserError',
'ParserWarning',
'PerformanceWarning',
'UnsortedIndexError',
'UnsupportedFunctionCall']

.. _whatsnew_0200.privacy.testing:

pandas.testing
^^^^^^^^^^^^^^^^^^

We are adding a standard module that exposes the public testing functions in pandas.testing (:issue:9895). Those functions can be used when writing tests for functionality using pandas objects.

The following testing functions are now part of this API:

  • :func:testing.assert_frame_equal
  • :func:testing.assert_series_equal
  • :func:testing.assert_index_equal
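
A short usage sketch of these functions:

.. ipython:: python

import pandas as pd
from pandas import testing

left = pd.Series([1, 2, 3])
right = pd.Series([1, 2, 3])
# passes silently when the objects are equal; raises AssertionError otherwise
testing.assert_series_equal(left, right)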

.. _whatsnew_0200.privacy.plotting:

pandas.plotting
^^^^^^^^^^^^^^^^^^^

A new public pandas.plotting module has been added that holds plotting functionality that was previously in either pandas.tools.plotting or in the top-level namespace. See the :ref:deprecations section <whatsnew_0200.privacy.deprecate_plotting> for more details.

.. _whatsnew_0200.privacy.development:

Other Development Changes
^^^^^^^^^^^^^^^^^^^^^^^^^

  • Building pandas for development now requires cython >= 0.23 (:issue:14831)
  • Require at least 0.23 version of cython to avoid problems with character encodings (:issue:14699)
  • Switched the test framework to use pytest <http://doc.pytest.org/en/latest>__ (:issue:13097)
  • Reorganization of tests directory layout (:issue:14854, :issue:15707).

.. _whatsnew_0200.deprecations:

Deprecations

.. _whatsnew_0200.api_breaking.deprecate_ix:

Deprecate .ix
^^^^^^^^^^^^^^^^^

The .ix indexer is deprecated, in favor of the more strict .iloc and .loc indexers. .ix offers a lot of magic on the inference of what the user wants to do. To wit, .ix can decide to index positionally OR via labels, depending on the data type of the index. This has caused quite a bit of user confusion over the years. The full indexing documentation is :ref:here <indexing>. (:issue:14218)

The recommended methods of indexing are:

  • .loc if you want to label index
  • .iloc if you want to positionally index.

Using .ix will now show a DeprecationWarning with a link to some examples of how to convert code :ref:here <indexing.deprecate_ix>.

.. ipython:: python

df = pd.DataFrame({'A': [1, 2, 3],
'B': [4, 5, 6]},
index=list('abc'))

df

Previous Behavior, where you wish to get the 0th and the 2nd elements from the index in the 'A' column.

.. code-block:: ipython

In [3]: df.ix[[0, 2], 'A']
Out[3]:
a 1
c 3
Name: A, dtype: int64

Using .loc. Here we will select the appropriate indexes from the index, then use label indexing.

.. ipython:: python

df.loc[df.index[[0, 2]], 'A']

Using .iloc. Here we will get the location of the 'A' column, then use positional indexing to select things.

.. ipython:: python

df.iloc[[0, 2], df.columns.get_loc('A')]

.. _whatsnew_0200.api_breaking.deprecate_panel:

Deprecate Panel
^^^^^^^^^^^^^^^

Panel is deprecated and will be removed in a future version. The recommended way to represent 3-D data is
with a MultiIndex on a DataFrame via the :meth:~Panel.to_frame method or with the xarray package <http://xarray.pydata.org/en/stable/>__. Pandas
provides a :meth:~Panel.to_xarray method to automate this conversion. For more details see the :ref:Deprecate Panel <dsintro.deprecate_panel> documentation. (:issue:13563).

.. ipython:: python
:okwarning:

import pandas.util.testing as tm
p = tm.makePanel()
p

Convert to a MultiIndex DataFrame

.. ipython:: python

p.to_frame()

Convert to an xarray DataArray

.. ipython:: python

p.to_xarray()

.. _whatsnew_0200.api_breaking.deprecate_group_agg_dict:

Deprecate groupby.agg() with a dictionary when renaming
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The .groupby(..).agg(..), .rolling(..).agg(..), and .resample(..).agg(..) syntax can accept a variety of inputs, including scalars,
lists, and a dict of column names to scalars or lists. This provides a useful syntax for constructing multiple
(potentially different) aggregations.

However, .agg(..) can also accept a dict that allows 'renaming' of the result columns. This is a complicated and confusing syntax, and is not consistent
between Series and DataFrame. We are deprecating this 'renaming' functionality.

  • We are deprecating passing a dict to a grouped/rolled/resampled Series. This allowed
    one to rename the resulting aggregation, but this had a completely different
    meaning than passing a dictionary to a grouped DataFrame, which accepts column-to-aggregations.
  • We are deprecating passing a dict-of-dicts to a grouped/rolled/resampled DataFrame in a similar manner.

This is an illustrative example:

.. ipython:: python

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
'B': range(5),
'C': range(5)})
df

Here is a typical useful syntax for computing different aggregations for different columns. This
is a natural and useful syntax. We aggregate from the dict-to-list by taking the specified
columns and applying the list of functions. This returns a MultiIndex for the columns (this is not deprecated).

.. ipython:: python

df.groupby('A').agg({'B': 'sum', 'C': 'min'})

Here's an example of the first deprecation, passing a dict to a grouped Series. This
is a combination aggregation & renaming:

.. code-block:: ipython

In [6]: df.groupby('A').B.agg({'foo': 'count'})
FutureWarning: using a dict on a Series for aggregation
is deprecated and will be removed in a future version

Out[6]:
   foo
A
1    3
2    2

You can accomplish the same operation more idiomatically by:

.. ipython:: python

df.groupby('A').B.agg(['count']).rename(columns={'count': 'foo'})

Here's an example of the second deprecation, passing a dict-of-dict to a grouped DataFrame:

.. code-block:: python

In [23]: (df.groupby('A')
    ...:     .agg({'B': {'foo': 'sum'}, 'C': {'bar': 'min'}})
    ...: )
FutureWarning: using a dict with renaming is deprecated and
will be removed in a future version

Out[23]:
     B   C
   foo bar
A
1    3   0
2    7   3

You can accomplish nearly the same by:

.. ipython:: python

(df.groupby('A')
.agg({'B': 'sum', 'C': 'min'})
.rename(columns={'B': 'foo', 'C': 'bar'})
)

.. _whatsnew_0200.privacy.deprecate_plotting:

Deprecate .plotting
^^^^^^^^^^^^^^^^^^^

The pandas.tools.plotting module has been deprecated, in favor of the top level pandas.plotting module. All the public plotting functions are now available
from pandas.plotting (:issue:12548).

Furthermore, the top-level pandas.scatter_matrix and pandas.plot_params are deprecated.
Users can import these from pandas.plotting as well.

Previous script:

.. code-block:: python

pd.tools.plotting.scatter_matrix(df)
pd.scatter_matrix(df)

Should be changed to:

.. code-block:: python

pd.plotting.scatter_matrix(df)

.. _whatsnew_0200.deprecations.other:

Other Deprecations
^^^^^^^^^^^^^^^^^^

  • SparseArray.to_dense() has deprecated the fill parameter, as that parameter was not being respected (:issue:14647)
  • SparseSeries.to_dense() has deprecated the sparse_only parameter (:issue:14647)
  • Series.repeat() has deprecated the reps parameter in favor of repeats (:issue:12662)
  • The Series constructor and .astype method have deprecated accepting timestamp dtypes without a frequency (e.g. np.datetime64) for the dtype parameter (:issue:15524)

@pyup-bot pyup-bot mentioned this pull request Jun 5, 2017

pyup-bot commented Jul 7, 2017

Closing this in favor of #222

@pyup-bot pyup-bot closed this Jul 7, 2017
@MightySCollins MightySCollins deleted the pyup-update-pandas-0.19.0-to-0.20.2 branch July 7, 2017 21:53