Non Data-Related Python Stuff

It's no secret that one of the writers (@bmcguirk) of this repository really likes Python. He wants you to, as well, because it has improved his life since he started learning it. We also need more people in State government fluent in this kind of thing.

To get started, we recommend the Anaconda distribution of Python. It comes pre-packed with 1,400+ great modules that are pre-compiled and pre-configured for your operating system. These modules are geared towards doing science in Python, but we think it's worth it to get the full kit and caboodle. So what if you never use Astropy in your day-to-day work for the State? There are still hundreds of useful modules in there.

Training

Web / HTML

  • BeautifulSoup - HTML parsing made super easy. But it needs to be fed HTML. You could use Python's built-in urllib or http.client (called httplib in Python 2), but what you really want to use is...
  • requests, which describes itself as "HTTP for Humans." Instead of the 8 or so lines of code it would otherwise take to grab the content of a website, you just import requests and then call requests.get('https://github.com/401ode/tools'). Now the whole content of that response is a Pythonic object, which you can then feed to BeautifulSoup to parse and do whatever you want with. For example, if you want to get all the links out of a given page, you just do:
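import requests
from bs4 import BeautifulSoup as bs

# Fetch the page; the raw response body is available as odetools.content.
odetools = requests.get('https://github.com/401ode/tools')
# Parse the HTML with Python's built-in parser.
odetoolsoup = bs(odetools.content, 'html.parser')

# Print the href attribute of every <a> tag on the page.
for link in odetoolsoup.find_all('a'):
    print(link.get('href'))
# The output looks something like this (illustrative URLs, not real results):
# http://example.com/brian
# http://example.com/ryan
# http://example.com/shawn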
  • Scrapy - An alternative to everything I just wrote above, this package is designed to intelligently scrape whatever site you point it at. Seems super-promising.

Code Formatting

  • Less a tool and more the Bible for how to format/style Python code is PEP 8. PEP stands for Python Enhancement Proposal. This particular PEP was co-written by the creator of Python himself, Guido van Rossum, along with Barry Warsaw and Nick Coghlan. It opens with one of Guido's key insights: "code is read much more often than it is written." As such, code should be readable by humans.
  • yapf - Google's opinionated code-formatter. To wit: 'Most of the current formatters for Python --- e.g., autopep8, and pep8ify --- are made to remove lint errors from code. This has some obvious limitations. For instance, code that conforms to the PEP 8 guidelines may not be reformatted. But it doesn't mean that the code looks good... YAPF takes a different approach. It's based off of clang-format, developed by Daniel Jasper. In essence, the algorithm takes the code and reformats it to the best formatting that conforms to the style guide, even if the original code didn't violate the style guide. The idea is also similar to the gofmt tool for the Go programming language: end all holy wars about formatting...'
  • black is an extremely-opinionated code formatter. To quote the project's site, "Black reformats entire files in place. It is not configurable. It doesn’t take previous formatting into account." But it's actually got a great aesthetic taste. Worth your time. Also has plugins to autoformat your code in a bunch of different editors.
  • isort - Will intelligently and elegantly sort the module imports at the top of your .py file. Compatible with Black.
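
To make the formatter idea concrete, here is a minimal sketch using only the standard library. Both snippets are hand-written for illustration (they are not actual yapf or Black output), and same_behavior is a hypothetical helper name. It compares the parsed syntax trees of the two versions to show that reformatting changes the layout of the code, not its meaning.

```python
import ast

# Cramped but valid code, hand-written for illustration.
messy = "def area(w,h): return w*h\nx=area( 2,3 )\n"

# The same code laid out roughly the way PEP 8 recommends
# (hand-formatted, not actual formatter output).
tidy = (
    "def area(w, h):\n"
    "    return w * h\n"
    "\n"
    "\n"
    "x = area(2, 3)\n"
)


def same_behavior(source_a, source_b):
    """Return True when both sources parse to the same syntax tree."""
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


print(same_behavior(messy, tidy))  # True
```

Whitespace and line breaks disappear once the code is parsed, which is why a formatter can be as opinionated as it likes about layout without touching what the program does.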

Image Manipulation

Image Manipulation Modules

  • pillow - A fork of an old-school, but incredibly powerful, project called Python Imaging Library or PIL, this is much easier to use and just as powerful.
  • OpenCV for Python - An old-school, but incredibly powerful, "Computer Vision" system. It takes a bit of work to get it installed on Windows (see this guide: https://www.learnopencv.com/install-opencv3-on-windows/), but then it offers some very powerful ways to get your computer to "see."

Image Manipulation Guides

Cool Built-In Modules

  • re - Excellent and flexible use of regular expressions.
  • collections - A bunch of different collection objects. New favorite one is Counter, where you can feed in elements and it'll keep track of how often each one occurs.
  • random - "This module implements pseudo-random number generators for various distributions." They note, however, that "the pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the secrets module."
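
A quick sketch of the three modules above working together, assuming a made-up sentence as input. The variable names are just for illustration.

```python
import random
import re
import secrets
from collections import Counter

text = "the cat sat on the mat near the other cat"

# re: pull out every run of lowercase letters as a word.
words = re.findall(r"[a-z]+", text)

# collections.Counter: tally how often each word occurs.
counts = Counter(words)
print(counts.most_common(2))  # [('the', 3), ('cat', 2)]

# random: fine for sampling and simulations; a seed makes runs repeatable.
rng = random.Random(42)
sample = rng.choice(words)

# secrets: use this instead of random for tokens, passwords, and the like.
token = secrets.token_hex(16)  # 16 random bytes as 32 hexadecimal characters
```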

Distribution

  • Making a PEX from a Python Script - Lovely guide for getting around the "that script is great but I don't have python installed on my computer" issue. Compiling a PEX will take your script and all of its dependencies and pack them into a one-shot executable file. One caveat: by default, a PEX still relies on a compatible Python interpreter being available on the target machine.

Reference

  • strftime.org - A handy little site for getting you exactly what you need. In this instance, a reminder of how to format the % sign and a bunch of letters to generate a date in the proper format. For example, %Y-%m-%d, in the right command, will generate 2018-07-24.
  • Comprehensive Python Cheat Sheet - Holy moly, comprehensive is right. Here's the link to the GitHub repo.
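
For the strftime example above, "the right command" is typically the strftime method on a date or datetime object. A minimal sketch using the date mentioned above:

```python
from datetime import date, datetime

day = date(2018, 7, 24)

# strftime turns a date into text using the % format codes.
print(day.strftime("%Y-%m-%d"))  # 2018-07-24
print(day.strftime("%d/%m/%Y"))  # 24/07/2018

# strptime goes the other way: it parses text into a datetime
# using the same format codes.
parsed = datetime.strptime("2018-07-24", "%Y-%m-%d")
print(parsed.date() == day)  # True
```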

Testing

  • Hypothesis - "Hypothesis is a Python library for creating unit tests which are simpler to write and more powerful when run, finding edge cases in your code you wouldn’t have thought to look for. It is stable, powerful and easy to add to any existing test suite... It works by generating arbitrary data matching your specification and checking that your guarantee still holds in that case. If it finds an example where it doesn’t, it takes that example and cuts it down to size, simplifying it until it finds a much smaller example that still causes the problem. It then saves that example for later, so that once it has found a problem with your code it will not forget it in the future." Very cool.
  • MutMut - Mutation testing for Python. What's Mutation Testing? It's incredibly clever and you can read about it in this article at opensource.com or this original introductory article at hackernoon. This is a good summary, though:

Mutation testing is a way to be reasonably certain your code actually tests the full behavior of your code. Not just touches all lines like a coverage report will tell you, but actually tests all behavior, and all weird edge cases. It does this by changing the code in one place at a time, as subtly as possible, and running the test suite. If the test suite succeeds it counts as a failure, because it could change the code and your tests are blissfully unaware that anything is amiss.

  • pytest - The OG Python testing framework. It's so the OG that both Hypothesis and mutmut are basically built to work with it.
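
The mutation testing idea quoted above can be sketched by hand in a few lines. This is only an illustration of the concept, not how mutmut is implemented: the mutant is written manually, and every function name here is hypothetical. A real tool generates the mutants and runs your actual test suite for you.

```python
def is_adult(age):
    """Original code under test."""
    return age >= 18


def is_adult_mutant(age):
    """Hand-made mutant: '>=' subtly changed to '>'."""
    return age > 18


def weak_suite(func):
    """Touches every line of the function but never checks the boundary."""
    return func(30) is True and func(5) is False


def strong_suite(func):
    """Same checks, plus the boundary value age == 18."""
    return weak_suite(func) and func(18) is True


def mutant_survives(suite):
    """A mutant survives when the suite still passes against mutated code."""
    return suite(is_adult_mutant)


print(mutant_survives(weak_suite))    # True: the weak suite misses the change
print(mutant_survives(strong_suite))  # False: the boundary check kills the mutant
```

Both suites pass against the original is_adult and both give full line coverage, but only the one that checks the boundary notices the mutation. That gap is what a mutation testing report surfaces.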

Random