From 3e647aa7811d0b97984b7c8ef76c5b6e87bf41ed Mon Sep 17 00:00:00 2001 From: Fiona Pigott Date: Thu, 24 Aug 2017 15:12:00 -0600 Subject: [PATCH] Fix setup script (#18) * minor changes to documentation HTML * readme to rst, remove package data * remove setup.cfg, not necessary * removing package data inclusion * fix link * fix bullets * Update README.rst * editing doc_build * symbolic link * fix readme * messing with include --- MANIFEST.in | 3 - README.md | 132 -------------------------------- README.rst | 166 +++++++++++++++++++++++++++++++++++++++++ doc_build.sh | 6 +- docs/source/README.rst | 149 +----------------------------------- setup.cfg | 2 - setup.py | 19 ++--- 7 files changed, 177 insertions(+), 300 deletions(-) delete mode 100644 MANIFEST.in delete mode 100644 README.md create mode 100644 README.rst delete mode 100644 setup.cfg diff --git a/MANIFEST.in b/MANIFEST.in deleted file mode 100644 index f450a5b..0000000 --- a/MANIFEST.in +++ /dev/null @@ -1,3 +0,0 @@ -include test/tweet_payload_examples/activity_streams_examples.json -include test/tweet_payload_examples/original_format_examples.json -include test/tweet_payload_examples/broken_and_unsupported_payloads/*.json diff --git a/README.md b/README.md deleted file mode 100644 index 196b9a4..0000000 --- a/README.md +++ /dev/null @@ -1,132 +0,0 @@ -# Tweet Parser -Authors: [Fiona Pigott](https://github.com/fionapigott), [Jeff Kolb](https://github.com/jeffakolb), -[Josh Montague](https://github.com/jrmontag), [Aaron Gonzales](https://github.com/binaryaaron) - -## Goal: -Allow reliable parsing of Tweets delivered by the Gnip platform, in both -activity-streams and original formats. - -## Status: -This package can be installed by cloning the repo and using `pip install -e .`, -or by using `pip install tweet_parser`. First probably-bug-free release is -1.0.3. As of version 1.0.5, the package works with Python 2 and 3, -and the API should be relatively stable. Recommended to use the more recent release. -Current release (As of 8/14/2017) is 1.0.6. - -Currently, this parser does not explicitly support Public API Twitter data. - -## Usage: -This package is intended to be used as a Python module inside your other -Tweet-related code. An example Python program (after pip installing the -package) would be: - -``` -from tweeet_parser.tweet import Tweet -from tweet_parser.tweet_parser_errors import NotATweetError -import fileinput -import json - -for line in fileinput.FileInput("gnip_tweet_data.json"): - try: - tweet_dict = json.loads(line) - tweet = Tweet(tweet_dict) - except (json.JSONDecodeError,NotATweetError): - pass - print(tweet.created_at_string, tweet.all_text) -``` - -I've also added simple command-line utility: - -``` -python tools/parse_tweets.py -f"gnip_tweet_data.json" -c"created_at_string,all_text" -``` - -## Testing: -A Python `test_tweet_parser.py` package exists in `test/`. - -The most important thing that it tests is the equivalence of outputs when -comparing both activity-streams input and original-format input. Any new getter -will be tested by running `test$ python test_tweet_parser.py`, as the test -checks every method attached to the Tweet object, for every test tweet stored -in `test/tweet_payload_examples`. For any cases where it is expected that the -outputs are different (e.g., outputs that depend on poll options), conditional -statements should be added to this test. - -An option also exists for run-time checking of Tweet payload formats. 
This -compares the set of all Tweet field keys to a superset of all possible keys, as -well as a minimum set of all required keys, to make sure that each newly loaded -Tweet fits those parameters. This shouldn't be run every time you load Tweets -(for one, it's slow), but is implemented to use as a periodic check against -Tweet format changes. This option is enabled with `--do_format_validation` on the -command line, and by setting the keyword argument `do_format_validation` to -`True` when initializing a `Tweet` object. - - -## Contributing - -Submit bug reports or feature requests through GitHub Issues, with -self-contained minimum working examples where appropriate. - -To contribute code, fork this repo, create your own local feature branch, make -your changes, test them, and submit a pull request to the master branch. -The contribution guidelines specified in the [`pandas` documentation]( -http://pandas.pydata.org/pandas-docs/stable/contributing.html#working-with-the-code) -are a great reference. - -When you submit a change, change the version number. -For most minor, non-breaking changes (fix a bug, add a getter, -package naming/structure remains the same), increment the last -number (X.Y.Z -> X.Y.Z+1) in `setup.py`. - -### Guidelines for new getters -A _getter_ is a method in the Tweet class and the accompanying code in the `getter_methods` -module. A getter for some property should: -- be named ``, a method in `Tweet` decorated with `@lazy_property` -- have a corresponding method named `get_(tweet)` in the `getter_methods` module -that implements the logic, nested uner the appropriate submodule (a text property -probably lives under the `getter_methods.tweet_text` submodule) -- provide the exact same output for original format and activity-streams format Tweet input, -except in the case where certain information is unavailable (see `get_poll_options`). - -In general, prefer that the `get_` work on a simple Tweet dictionary as well as a -Tweet object (this makes unit testing easier). This means that you might use -`is_original_format(tweet)` rather than `tweet.is_original_format` to check format inside of a getter. - -Adding unit tests for your getter in the docstrings in the "Example" section is helpful. -See existing getters for examples. - -In general, make detailed docstrings with examples in `get_`, and more concise -dosctrings in `Tweet`, with a reference for where to find the `get_` getter that -implements the logic. - -### Style -Adhere to the PEP8 style. Using a Python linter (like flake8) is reccomended. - -For documentation style, use [Google-style docstrings](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html). -Refer to the [Python docstest documentation](https://docs.python.org/3/library/doctest.html) for doctest guidelines. - -### Testing -Create an isolated virtual environment for testing (there are currently no external -dependencies for this library). - -Test your new feature by reinstalling the library in your virtual environment -and running the test script as shown below. Fix any issues until all tests -pass. - -``` -(env) [tweet_parser]$ pip install -e . -(env) [tweet_parser]$ cd test/; python test_tweet_parser.py; cd - -``` - -Furthermore, if contributing a new accessor or getter method for payload -elements, verify the code works as you intended by running the -`parse_tweets.py` script with your new field, as shown below. Check that both -input types produce the intended output. - -``` -(env) [tweet_parser]$ pip install -e . 
-(env) [tweet_parser]$ python tools/parse_tweets.py -f test/tweet_payload_examples/activity_streams_examples.json -c
-```
-
-And lastly, if you've added new docstrings and doctests, from the `docs` directory,
-run `make html` (to check docstring formatting) and `make doctest` to run the doctests.
diff --git a/README.rst b/README.rst
new file mode 100644
index 0000000..f8efaa6
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,166 @@
+Tweet Parser
+============
+
+Authors: `Fiona Pigott <https://github.com/fionapigott>`__, `Jeff
+Kolb <https://github.com/jeffakolb>`__, `Josh
+Montague <https://github.com/jrmontag>`__, `Aaron
+Gonzales <https://github.com/binaryaaron>`__
+
+Goal:
+-----
+
+Allow reliable parsing of Tweets delivered by the Gnip platform, in both
+activity-streams and original formats.
+
+Status:
+-------
+
+This package can be installed by cloning the repo and using
+``pip install -e .``, or by using ``pip install tweet_parser``. The first
+probably-bug-free release is 1.0.3. As of version 1.0.5, the package
+works with Python 2 and 3, and the API should be relatively stable.
+Using the most recent release is recommended. The current release (as of
+8/14/2017) is 1.0.6.
+
+Currently, this parser does not explicitly support Public API Twitter
+data.
+
+Usage:
+------
+
+This package is intended to be used as a Python module inside your other
+Tweet-related code. An example Python program (after pip installing the
+package) would be:
+
+::
+
+    from tweet_parser.tweet import Tweet
+    from tweet_parser.tweet_parser_errors import NotATweetError
+    import fileinput
+    import json
+
+    for line in fileinput.FileInput("gnip_tweet_data.json"):
+        try:
+            tweet_dict = json.loads(line)
+            tweet = Tweet(tweet_dict)
+        except (json.JSONDecodeError,NotATweetError):
+            pass
+        print(tweet.created_at_string, tweet.all_text)
+
+I've also added a simple command-line utility:
+
+::
+
+    python tools/parse_tweets.py -f"gnip_tweet_data.json" -c"created_at_string,all_text"
+
+Testing:
+--------
+
+A Python ``test_tweet_parser.py`` package exists in ``test/``.
+
+The most important thing that it tests is the equivalence of outputs
+when comparing both activity-streams input and original-format input.
+Any new getter will be tested by running
+``test$ python test_tweet_parser.py``, as the test checks every method
+attached to the Tweet object, for every test tweet stored in
+``test/tweet_payload_examples``. For any cases where it is expected that
+the outputs are different (e.g., outputs that depend on poll options),
+conditional statements should be added to this test.
+
+An option also exists for run-time checking of Tweet payload formats.
+This compares the set of all Tweet field keys to a superset of all
+possible keys, as well as a minimum set of all required keys, to make
+sure that each newly loaded Tweet fits those parameters. This shouldn't
+be run every time you load Tweets (for one, it's slow), but is
+implemented to use as a periodic check against Tweet format changes.
+This option is enabled with ``--do_format_validation`` on the command
+line, and by setting the keyword argument ``do_format_validation`` to
+``True`` when initializing a ``Tweet`` object.
+
+Contributing
+------------
+
+Submit bug reports or feature requests through GitHub Issues, with
+self-contained minimum working examples where appropriate.
+
+To contribute code, fork this repo, create your own local feature
+branch, make your changes, test them, and submit a pull request to the
+master branch. The contribution guidelines specified in the ``pandas``
+`documentation <http://pandas.pydata.org/pandas-docs/stable/contributing.html#working-with-the-code>`__
+are a great reference.
+
+When you submit a change, change the version number. For most minor,
+non-breaking changes (fix a bug, add a getter, package naming/structure
+remains the same), increment the last number (X.Y.Z -> X.Y.Z+1) in
+``setup.py``.
+
+Guidelines for new getters
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A *getter* is a method in the Tweet class and the accompanying code in
+the ``getter_methods`` module. A getter for some property should:
+
+- be named ````, a method in ``Tweet`` decorated with
+  ``@lazy_property``
+- have a corresponding method named
+  ``get_(tweet)`` in the ``getter_methods`` module that
+  implements the logic, nested under the appropriate submodule (a text
+  property probably lives under the ``getter_methods.tweet_text``
+  submodule)
+- provide the exact same output for original format and
+  activity-streams format Tweet input, except in the case where certain
+  information is unavailable (see ``get_poll_options``).
+
+In general, prefer that the ``get_`` work on a simple Tweet
+dictionary as well as a Tweet object (this makes unit testing easier).
+This means that you might use ``is_original_format(tweet)`` rather than
+``tweet.is_original_format`` to check format inside of a getter.
+
+Adding unit tests for your getter in the docstrings in the "Example"
+section is helpful. See existing getters for examples.
+
+In general, write detailed docstrings with examples in
+``get_``, and more concise docstrings in ``Tweet``, with a
+reference for where to find the ``get_`` getter that
+implements the logic.
+
+Style
+~~~~~
+
+Adhere to PEP8 style. Using a Python linter (like flake8) is
+recommended.
+
+For documentation style, use `Google-style
+docstrings <http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html>`__.
+Refer to the `Python doctest
+documentation <https://docs.python.org/3/library/doctest.html>`__ for
+doctest guidelines.
+
+Testing
+~~~~~~~
+
+Create an isolated virtual environment for testing (there are currently
+no external dependencies for this library).
+
+Test your new feature by reinstalling the library in your virtual
+environment and running the test script as shown below. Fix any issues
+until all tests pass.
+
+::
+
+    (env) [tweet_parser]$ pip install -e .
+    (env) [tweet_parser]$ cd test/; python test_tweet_parser.py; cd -
+
+Furthermore, if contributing a new accessor or getter method for payload
+elements, verify the code works as you intended by running the
+``parse_tweets.py`` script with your new field, as shown below. Check
+that both input types produce the intended output.
+
+::
+
+    (env) [tweet_parser]$ pip install -e .
+    (env) [tweet_parser]$ python tools/parse_tweets.py -f test/tweet_payload_examples/activity_streams_examples.json -c
+
+And lastly, if you've added new docstrings and doctests, from the
+``docs`` directory, run ``make html`` (to check docstring formatting)
+and ``make doctest`` to run the doctests.
diff --git a/doc_build.sh b/doc_build.sh
index 811c16c..44461e0 100644
--- a/doc_build.sh
+++ b/doc_build.sh
@@ -22,13 +22,13 @@ rm -rf *.egg-info
 git pull origin gh-pages
 rm -r *.html *.js
 touch .nojekyll
-git checkout $BRANCH_NAME docs tweet_parser README.md
+git checkout $BRANCH_NAME docs tweet_parser README.rst
 # need to do this step because the readme will be overwritten
-pandoc -i README.md -o docs/source/README.rst
+# pandoc -i README.md -o docs/source/README.rst
 mv docs/* .
make html mv -fv build/html/* ./ -rm -r tweet_parser docs build Makefile source README.md __pycache__/ +rm -r tweet_parser docs build Makefile source __pycache__/ echo "--------------------------------------------------------------------" echo " docs built; please review these changes and then run the following:" echo "--------------------------------------------------------------------" diff --git a/docs/source/README.rst b/docs/source/README.rst index 32eefcd..72a3355 100644 --- a/docs/source/README.rst +++ b/docs/source/README.rst @@ -1,148 +1 @@ -tweet\_parser -============= - -Authors: Fiona Pigott, Jeff Kolb, Josh Montague, Aaron Gonzales - -Goal: ------ - -Allow reliable parsing of Tweets delivered by the Gnip platform, in both -activity-streams and original formats. - -Status: -------- - -This package can be installed by cloning the repo and using -``pip install -e .``, or by using ``pip install tweet_parser``. First -probably-bug-free release is 1.0.3 (current as of 8/7/2017). No -promises. - -Currently, this parser does not explicitly support Public API Twitter -data. - -Usage: ------- - -This package is intended to be used as a Python module inside your other -Tweet-related code. An example Python program (after pip installing the -package) would be: - -:: - - from tweeet_parser.tweet import Tweet - from tweet_parser.tweet_parser_errors import NotATweetError - import fileinput - import json - - for line in fileinput.FileInput("gnip_tweet_data.json"): - try: - tweet_dict = json.loads(line) - tweet = Tweet(tweet_dict) - except (json.JSONDecodeError,NotATweetError): - pass - print(tweet.created_at_string, tweet.all_text) - -I've also added simple command-line utility: - -:: - - python tools/tweet_parser.py -f"gnip_tweet_data.json" -c"created_at_string,all_text" - -Testing: --------- - -A Python ``test_tweet_parser.py`` package exists in ``test/``. - -The most important thing that it tests is the equivalence of outputs -when comparing both activity-streams input and original-format input. -Any new getter will be tested by running -``test$ python test_tweet_parser.py``, as the test checks every method -attached to the Tweet object, for every test tweet stored in -``test/tweet_payload_examples``. For any cases where it is expected that -the outputs are different (e.g., outputs that depend on poll options), -conditional statements should be added to this test. - -An option also exists for run-time checking of Tweet payload formats. -This compares the set of all Tweet field keys to a superset of all -possible keys, as well as a minimum set of all required keys, to make -sure that each newly loaded Tweet fits those parameters. This shouldn't -be run every time you load Tweets (for one, it's slow), but is -implemented to use as a periodic check against Tweet format changes. -This option is enabled with ``--do_format_checking`` on the command -line, and by setting the keyword argument ``do_format_checking`` to -``True`` when initializing a ``Tweet`` object. - -Documentation -------------- - -We are using Sphinx with Google-style docstrings to build our -documentation. If you don't have sphinx installed, it's a quick -``pip install sphinx``. To build the docs locally, follow: - -Setup -~~~~~ - -.. code:: shell - - pip install sphinx - pip install sphinx_bootstrap_theme - -Build -~~~~~ - -.. code:: shell - - cd tweet_parser/docs - make clean - make html - -Deploying to github pages -~~~~~~~~~~~~~~~~~~~~~~~~~ - -From the root of the repo run: - -.. 
code:: shell - - bash doc_build.sh - -where ```` is the name of the branch you'll be building -from, most likely master. The script will change to the ``gh-pages`` -branch, clean out the olds docs, pull your changes from the relevant -branch, build them, and give you instructions for review and commands -for deployment. - -Contributing ------------- - -Submit bug reports or feature requests through GitHub Issues, with -self-contained minimum working examples where appropriate. - -To contribute code, the guidelines specified in the ```pandas`` -documentation `__ -are a great reference. Fork this repo, create your own local feature -branch, and create an isolated virtual environment (there are currently -no external dependencies for this library). Using a Python linter is -recommened. - -Test your new feature by reinstalling the library in your virtual -environment and running the test script as shown below. Fix any issues -until all tests pass. - -.. code:: bash - - (env) [tweet_parser]$ pip install -e . - (env) [tweet_parser]$ cd test/; python test_tweet_parser.py; cd - - -Furthermore, if contributing a new accessor or getter method for payload -elements, verify the code works as you intended by running the -``parse_tweets.py`` script with your new field, as shown below. Check -that both input types produce the intended output. - -.. code:: bash - - (env) [tweet_parser]$ pip install -e . - (env) [tweet_parser]$ python tools/parse_tweets.py -f test/tweet_payload_examples/activity_streams_examples.json -c - -Change the version number. For most minor, non-breaking changes (fix a -bug, add a getter, package naming/structure remains the same), simply -update the last number (Z of X.Y.Z) in ``setup.py``. +.. include:: ../README.rst diff --git a/setup.cfg b/setup.cfg deleted file mode 100644 index b88034e..0000000 --- a/setup.cfg +++ /dev/null @@ -1,2 +0,0 @@ -[metadata] -description-file = README.md diff --git a/setup.py b/setup.py index 6d29bc6..2d49f53 100644 --- a/setup.py +++ b/setup.py @@ -1,19 +1,14 @@ from setuptools import setup, find_packages setup(name='tweet_parser', - packages=find_packages(), - scripts=["tools/parse_tweets.py"], - version='1.0.6', - license='MIT', - author='Fiona Pigott', - author_email='fpigott@twitter.com', description="Tools for Tweet parsing", url='https://github.com/tw-ddis/tweet_parser', + author='Fiona Pigott, Jeff Kolb, Josh Montague, Aaron Gonzales', + long_description=open('README.rst', 'r').read(), + author_email='fpigott@twitter.com', + license='MIT', + version='1.0.6', + packages=find_packages(), + scripts=["tools/parse_tweets.py"], install_requires=[], - include_package_data=True, - package_data={ - 'tweet_parser': ['test/tweet_payload_examples/activity_streams_examples.json', - 'test/tweet_payload_examples/original_format_examples.json', - 'test/tweet_payload_examples/broken_and_unsupported_payloads/*.json'], - }, )
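Reviewer note (not part of the patch): ``setup.py`` now reads ``README.rst`` at
build time for ``long_description``, so it is worth confirming that the new file
is valid reStructuredText; otherwise the PyPI project page will fall back to
rendering it as plain text. Below is a minimal sketch of such a check, assuming
docutils is installed. The helper name and the ``halt_level`` value are
illustrative only, and the script is not included in this patch.

::

    # check_readme.py -- hypothetical helper, not part of this patch.
    # Parses README.rst with docutils and fails loudly on reST errors,
    # i.e. the kind of problem that breaks long_description rendering.
    from docutils.core import publish_string
    from docutils.utils import SystemMessage


    def readme_is_valid_rst(path="README.rst"):
        """Return True if `path` parses as reST without warnings or errors."""
        with open(path, "r") as f:
            source = f.read()
        try:
            # halt_level=2 promotes reST warnings and errors to exceptions
            publish_string(source, settings_overrides={"halt_level": 2})
        except SystemMessage as err:
            print("README.rst failed to parse: {}".format(err))
            return False
        return True


    if __name__ == "__main__":
        print("README.rst OK" if readme_is_valid_rst() else "README.rst has errors")

Running ``python setup.py check --restructuredtext --strict`` (with docutils
installed) performs a comparable validation as part of the normal packaging
workflow.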