Merging develop (v1.2) to master (#9)
* Updated README.md

* Base extractor - Initial cut (#1)

* Initial commit of base for extractors

* Fixed dockerfile

* Added README

* Changed extractor to transformer in readme

* Updated README

* Fleshing out functionality

* Updates for file list support

* Variable name clarification

* Bug fixes

* Optimizing Dockerfile

* Fixed typo

* Merging into Develop (#2)

* Initial commit of base for extractors

* Fixed dockerfile

* Added README

* Changed extractor to transformer in readme

* Updated README

* Fleshing out functionality

* Updates for file list support

* Variable name clarification

* Bug fixes

* Optimizing Dockerfile

* Fixed typo

* Debugging

* Removed TERRA REF left-over commented out code

* Adding command line parameter storage

* Added text about return codes

* Fix up transformer declared types (#4)

* Changed list to tuple

* type declaration correction and additional commenting

* Pylint changes

* Test development (#7)

* Adding test files

* Was missing the 'test_transformer.py' file and made some changes to 'test_transformer_class.py'

* adding the initial travis yml

* Filled out the yml file

* Changed the tests from base unittest to pytest

* Changed to pytest over basic unittest

* Moved tests to own folder

* Removed duplicate files

* Moved the yml file out of base-image

* modified travis yml to move directories for testing

* Added -v to pytest run for more clarity

* Ignoring a test for testing purposes

* Reverted git ignore changes and temporarily removed a test

* Placed test_entrypoint.py back into folder

* Pushing from new local

* Changed file naming conventions to be more conformant

* removing duplicates

* Fixed a test

* Added pylint installation to travis yml

* Added test doc as a link in main readme.

* fixed a formatting issue

* Initial test folder readme draft

* Clarified testing process

* added links to helpful documentation

* Clarified type of testing being done

* Adding pylint file

* renamed pylintrc file

* changes to pylint file

* fixed pylint call

* trying to include all files from test folder

* moved pylint file

* testing pylint

* Fixed yml file

* fixing yml

* I think I figured out the problem

* I'll separate them and make two calls

* Made separate calls

* misspelled directory name

* made some pylint recommended adjustments

* now pulling pylintrc from repo

* adding organization repo

* Hopefully this works

* Updated readme

*  Added some new lines

* Trying again

* hopefully this is it

* please

* done

* Minor readme changes

* Added pylint protocol link in readmes

* Pylint changes

* made some pylint recommended changes

* Helping pylint along

* Directory issues

* Resetting python path

* adding another line

* adding another line

* adding another line

* added some ignore lines

* adding another line

* Moved files around

* Finding pylint rc

* Missed a quotation mark

* fixed pylint call

* pylint disables

* Update base-image/test-files/README.md

Co-Authored-By: Chris Schnaufer <schnaufer@email.arizona.edu>

* Update base-image/test-files/README.md

Co-Authored-By: Chris Schnaufer <schnaufer@email.arizona.edu>

* Update base-image/test-files/README.md

Co-Authored-By: Chris Schnaufer <schnaufer@email.arizona.edu>

* Update base-image/test-files/README.md

Co-Authored-By: Chris Schnaufer <schnaufer@email.arizona.edu>

* Update base-image/test-files/README.md

Co-Authored-By: Chris Schnaufer <schnaufer@email.arizona.edu>

* Incorporated Chris' suggestions

* Fixed a mistake

* Made changes recommended by Chris

* missed a change

* Loading multiple metadata files & pylint changes (#6)

* Update .gitignore, allowing multiple --metadata arguments, spelling & pylint changes

* Logging what metadata files are loading

* Added ability to download files after transformer gives the go-ahead (#8)

* Update .gitignore, allowing multiple --metadata arguments, spelling & pylint changes

* Logging what metadata files are loading

* Adding call to download files

Co-authored-by: Jorge Barrios <jorgebarrios@email.arizona.edu>
Chris-Schnaufer and Sithyphus authored Jan 22, 2020
1 parent 7593f1e commit 23045ba
Showing 11 changed files with 414 additions and 46 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -102,3 +102,6 @@ venv.bak/

# mypy
.mypy_cache/

# pycharm
.idea
20 changes: 20 additions & 0 deletions .travis.yml
@@ -0,0 +1,20 @@
language: python

python:
- "3.7"

cache: pip

install:
- "pip install pytest"
- "pip install pylint"

script:
- "git clone https://github.com/AgPipeline/Organization-info.git"
- mv base-image/test-files/*.py base-image
- cd base-image
- rm -r test-files
- cd ..
- pylint --rcfile=Organization-info/pylint.rc base-image/*.py
- "python -m pytest -v"

3 changes: 3 additions & 0 deletions README.md
@@ -22,3 +22,6 @@ Be sure to read the [organization documentation](https://github.com/AgPipeline/O
Every folder in this repo must have a README.md clearly explaining the interface for derived images, how to create a derived image, and other information on how to use the images created.
Providing a quick start guide with links to more detailed information is a good approach in some situations.
The goal is to provide documentation for users of these base images that makes them easy to use.

## Testing
The testing modules and readme may be found in the [testing folder](https://github.com/AgPipeline/base-docker-support/tree/test-development/base-image/test-files).
15 changes: 11 additions & 4 deletions base-image/README.md
@@ -87,10 +87,17 @@ If the transformer.py file has a function named `check_continue` it will be called
The return from the check_continue() function is used to determine if processing should continue.
If the function is not defined, processing will continue automatically.

5. Processing:
The `perform_process` function in transformer.py is called and passed the transformer_class.Transformer instance and any parameters previously defined.
5. Retrieve Files:
If the transformer_class.Transformer instance has a method named `retrieve_files`, it will be called and passed the dictionary returned by `transformer_class.Transformer.get_transformer_params()` (see step 3) and the loaded metadata.
This allows data to be downloaded once the transformer has determined processing can proceed (see step 4).
If this method is not defined, processing will continue automatically.

6. Result Handling:
6. Processing:
The `perform_process` function in transformer.py is called and passed the transformer_class.Transformer instance and any parameters previously defined (see step 3).
This performs the processing of the data.
It's important to note that the dictionary returned in step 3 is used to populate the parameter list of the `perform_process` call.

7. Result Handling:
The result of the above steps may produce warnings, errors, or successful results.
These results can be stored in a file, printed to standard output, and/or returned to the caller of `do_work`.
In the default case that we're exploring here, the return value from do_work is ignored.
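The flow in steps 4 through 7 can be sketched as a minimal transformer.py module. This is a sketch only: the hook names `check_continue` and `perform_process` come from the steps above, while the bodies and return values shown are illustrative placeholders.

```python
"""Minimal transformer.py sketch for the lifecycle described above.

The entrypoint only calls these hooks if they are defined; the bodies
here are illustrative placeholders, not the project's real transformer.
"""

def check_continue(transformer, **kwargs) -> tuple:
    # Step 4: return a (code, message) tuple; a code of 0 means
    # processing continues and files may be retrieved (step 5)
    return (0, "prerequisites satisfied")

def perform_process(transformer, **kwargs) -> dict:
    # Step 6: perform the work and return a result dict; a negative
    # 'code' plus an 'error' key would signal failure (step 7)
    return {'code': 0, 'message': "processing complete"}
```

Because the entrypoint checks `hasattr` before each call, `check_continue` can be omitted entirely and processing continues with the defaults described above.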
@@ -102,7 +109,7 @@ The following command line parameters are defined for all transformers.
* -h: (optional parameter) display help message (automatically defined by argparse)
* --info, -i: (optional parameter) enable info level logging messages
* --result: (optional parameter) how to handle the result of processing; one or more comma-separated strings of: all, file, print
* --metadata: mandatory path to file containing JSON metadata
* --metadata: mandatory path to file containing JSON metadata; can be specified multiple times
* --working_space: path to folder to use as a working space and file store
* the "file_list" argument contains all additional parameters (which are assumed to be file names but may not be)
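The repeatable `--metadata` flag relies on argparse's `append` action; a standalone sketch of that behavior follows (the argument values here are examples only):

```python
import argparse

# action='append' gathers every --metadata occurrence into a single list,
# which the entrypoint then loads one file at a time
parser = argparse.ArgumentParser()
parser.add_argument('--metadata', type=str, action='append')

args = parser.parse_args(['--metadata', 'plot.json', '--metadata', 'sensor.json'])
print(args.metadata)  # → ['plot.json', 'sensor.json']
```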

169 changes: 132 additions & 37 deletions base-image/entrypoint.py
@@ -6,6 +6,7 @@
import os
import json
import logging
from typing import Optional

import transformer_class

@@ -22,7 +23,7 @@ def __init__(self):

@staticmethod
def handle_error(code: int, message: str) -> dict:
"""Handles logging and return values when an error ocurrs. Implies processing
"""Handles logging and return values when an error occurs. Implies processing
will stop.
Arguments:
code: return code related to the error
@@ -35,14 +36,12 @@ def handle_error(code: int, message: str) -> dict:
code = -1
if not message:
logging.warning("An error has occurred without a message, setting default message")
message = "An error has ocurred with error code (%s)" % str(code)
message = "An error has occurred with error code (%s)" % str(code)

logging.error(message)
logging.error("Stopping processing")

result = {}
result['error'] = message
result['code'] = code
result = {'error': message, 'code': code}

return result

@@ -52,13 +51,13 @@ def load_metadata(metadata_path: str) -> dict:
Arguments:
metadata_path: path to the metadata file
Return:
Returns a dict containing the loaded metadata. If an error ocurrs, the dict
Returns a dict containing the loaded metadata. If an error occurs, the dict
won't contain metadata but will contain an error message under an 'error' key.
"""
try:
with open(metadata_path, 'r') as in_file:
md_load = json.load(in_file)
if not md_load is None:
if md_load is not None:
md_return = {'metadata': md_load}
else:
msg = 'Invalid JSON specified in metadata file "%s"' % metadata_path
@@ -72,6 +71,83 @@

return md_return

@staticmethod
def check_params_result_error(transformer_params: dict) -> Optional[dict]:
"""Checks the transformer parameter results for an error
Arguments:
transformer_params: the dictionary to check for errors
Return:
An error dict if errors were found and None if not
Notes:
See handle_error() function
"""
if 'code' in transformer_params:
if 'error' in transformer_params:
error = transformer_params['error']
else:
error = "Error returned from get_transformer_params with code: %s" % transformer_params['code']
return __internal__.handle_error(-104, error)

return None

@staticmethod
def check_retrieve_results_error(transformer_retrieve: tuple) -> Optional[dict]:
"""Checks the results of the transformer_class retrieving files
Arguments:
transformer_retrieve: the results of the retrieve
Return:
An error dict if errors were found and None if not
Notes:
See handle_error() function
"""
if not transformer_retrieve:
return None

code = 0
message = None
retrieve_len = len(transformer_retrieve)
if retrieve_len > 0:
code = transformer_retrieve[0]
if retrieve_len > 1:
message = transformer_retrieve[1]
else:
message = "Retrieving files returned a code of %s" % str(code)

# Check for an error code
if code < 0:
return __internal__.handle_error(code, message)

# Log the message if we get one returned to us
if retrieve_len > 1:
logging.info(transformer_retrieve[1])

return None
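A transformer class satisfying the tuple convention checked above might look like this. This is a sketch: only the method name `retrieve_files` and the (code, message) tuple shape are taken from this commit; the arguments passed and the body are illustrative.

```python
class Transformer:
    """Sketch of a transformer class exposing the optional retrieve_files hook."""

    def retrieve_files(self, args, metadata) -> tuple:
        # Return a (code, message) tuple; a negative code is treated as
        # an error by the entrypoint, and the message is logged
        try:
            # a real transformer would download or stage its files here
            return (0, "all files already available locally")
        except OSError as ex:
            return (-1, "unable to retrieve files: %s" % str(ex))
```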

@staticmethod
def load_metadata_files(metadata_files: list) -> dict:
"""Loads the specified metadata files
Arguments:
metadata_files: list of metadata files to load
Returns:
Returns a dict containing the loaded metadata as an array. If an error occurs, the dict
won't contain metadata but will contain an error message under an 'error' key.
"""
metadata = []
result = {'metadata': metadata}
for metadata_file in metadata_files:
if not os.path.exists(metadata_file):
result = __internal__.handle_error(-2, "Unable to access metadata file '%s'" % metadata_file)
break
logging.info("Loading metadata from file: '%s'", metadata_file)
md_loaded = __internal__.load_metadata(metadata_file)
if 'metadata' in md_loaded:
metadata.append(md_loaded['metadata'])
else:
result = __internal__.handle_error(-3, md_loaded['error'])
break

return result

@staticmethod
def parse_continue_result(result) -> tuple:
"""Parses the result of calling transformer.check_continue and returns
@@ -129,7 +205,31 @@ def handle_check_continue(transformer_instance: transformer_class.Transformer, t
return result

@staticmethod
def perform_processing(transformer_instance, args: argparse.ArgumentParser, metadata: dict) -> dict:
def handle_retrieve_files(transformer_instance: transformer_class.Transformer, args: argparse.Namespace, metadata: dict) ->\
Optional[dict]:
"""Handles calling the transformer class to retrieve files
Arguments:
transformer_instance: the current transformer environment
args: the command line arguments
metadata: the loaded metadata
Return:
A dict containing error information if a problem occurs and None if no problems are found.
Note:
A side effect of this function is a information message logged if the transformer class instance does not have
a 'retrieve_files' function declared.
"""
if hasattr(transformer_instance, 'retrieve_files'):
transformer_retrieve = transformer_instance.retrieve_files(args, metadata)
retrieve_results = __internal__.check_retrieve_results_error(transformer_retrieve)
if retrieve_results:
return retrieve_results
else:
logging.info("Transformer class doesn't have function named 'retrieve_files'")

return None

@staticmethod
def perform_processing(transformer_instance: transformer_class.Transformer, args: argparse.Namespace, metadata: dict) -> dict:
"""Makes the calls to perform the processing
Arguments:
transformer_instance: instance of transformer class
@@ -144,16 +244,14 @@ def perform_processing(transformer_instance, args: argparse.ArgumentParser, meta
if hasattr(transformer_instance, 'get_transformer_params'):
transformer_params = transformer_instance.get_transformer_params(args, metadata)
if not isinstance(transformer_params, dict):
return __internal__.handle_error(-101, \
"Invalid return from getting transformer parameters from transformer class instance")
if 'code' in transformer_params:
if 'error' in transformer_params:
error = transformer_params['error']
else:
error = "Error returned from get_transformer_params with code: %s" % transformer_params['code']
return __internal__.handle_error(-101, error)
return __internal__.handle_error(-101,
"Invalid return from getting transformer parameters from transformer class instance")

params_result = __internal__.check_params_result_error(transformer_params)
if params_result:
return params_result
else:
logging.debug("Transformer class instance does not have get_transformer_params method")
logging.info("Transformer class instance does not have get_transformer_params method")
transformer_params = {}

# First check if the transformer thinks everything is in place
@@ -162,31 +260,30 @@ def perform_processing(transformer_instance, args: argparse.ArgumentParser, meta
if 'code' in result and result['code'] < 0 and 'error' not in result:
result['error'] = "Unknown error returned from check_continue call"
else:
logging.debug("transformer module doesn't have a function named 'check_continue'")
logging.info("Transformer module doesn't have a function named 'check_continue'")

# Retrieve additional files if indicated by return code from the check
if not 'error' in result and 'code' in result and result['code'] == 0:
# TODO: Fetch additional data for processing
pass
if 'error' not in result and 'code' in result and result['code'] == 0:
result = __internal__.handle_retrieve_files(transformer_instance, args, metadata)

# Next make the call to perform the processing
if not 'error' in result:
if 'error' not in result:
if hasattr(transformer, 'perform_process'):
result = transformer.perform_process(transformer_instance, **transformer_params)
else:
logging.debug("transformer module is missing function named 'perform_process'")
logging.debug("Transformer module is missing function named 'perform_process'")
return __internal__.handle_error(-102, "Transformer perform_process interface is not available " +
"for processing data")

return result

@staticmethod
def handle_result(result_types: str, result_file_path: str, result: dict) -> None:
def handle_result(result: dict, result_types: str = None, result_file_path: str = None) -> dict:
"""Handles the results of processing as dictated by the arguments passed in.
Arguments:
result_types: comma separated string containing one or more of: all, file, print
result_file_path: location to place result file
result: the dictionary of result information
result_types: optional, comma separated string containing one or more of: all, file, print
result_file_path: optional, location to place result file
Return:
Returns the result parameter
Notes:
@@ -224,7 +321,7 @@ def add_parameters(parser: argparse.ArgumentParser, transformer_instance) -> Non
parser.add_argument('--result', nargs='?', default='all',
help='Direct the result of a run to one or more of (all is default): "all,file,print"')

parser.add_argument('--metadata', type=str, help='The path to the source metadata')
parser.add_argument('--metadata', type=str, action='append', help='The path to the source metadata')

parser.add_argument('--working_space', type=str, help='the folder to use use as a workspace and ' +
'for storing results')
@@ -240,7 +337,7 @@ def add_parameters(parser: argparse.ArgumentParser, transformer_instance) -> Non
# Assume the rest of the arguments are the files
parser.add_argument('file_list', nargs=argparse.REMAINDER, help='additional files for transformer')

def do_work(parser: argparse.ArgumentParser, **kwargs) -> None:
def do_work(parser: argparse.ArgumentParser, **kwargs) -> dict:
"""Function to prepare and execute work unit
Arguments:
parser: an instance of argparse.ArgumentParser
@@ -252,7 +349,7 @@ def do_work(parser: argparse.ArgumentParser, **kwargs) -> None:
transformer_instance = transformer_class.Transformer(**kwargs)
if not transformer_instance:
result = __internal__.handle_error(-100, "Unable to create transformer class instance for processing")
return __internal__.handle_result(None, None, result)
return __internal__.handle_result(result, None, None)

add_parameters(parser, transformer_instance)
args = parser.parse_args()
@@ -262,22 +359,20 @@ def do_work(parser: argparse.ArgumentParser, **kwargs) -> None:

# Check that we have mandatory metadata
if not args.metadata:
result = __internal__.handle_error(-1, "No metadata path was specified.")
elif not os.path.exists(args.metadata):
result = __internal__.handle_error(-2, "Unable to access metadata file '%s'" % args.metadata)
result = __internal__.handle_error(-1, "No metadata paths were specified.")
else:
md_result = __internal__.load_metadata(args.metadata)
if 'metadata' in md_result:
result = __internal__.perform_processing(transformer_instance, args, md_result['metadata'])
md_results = __internal__.load_metadata_files(args.metadata)
if 'metadata' in md_results:
result = __internal__.perform_processing(transformer_instance, args, md_results['metadata'])
else:
result = __internal__.handle_error(-3, md_result['error'])
result = __internal__.handle_error(-3, md_results['error'])

if args.working_space:
result_path = os.path.join(args.working_space, 'result.json')
else:
result_path = None

__internal__.handle_result(args.result, result_path, result)
__internal__.handle_result(result, args.result, result_path)
return result

if __name__ == "__main__":
38 changes: 38 additions & 0 deletions base-image/test-files/README.md
@@ -0,0 +1,38 @@
# Transformer Unit Tests
This folder holds the unit tests that [TravisCI](https://travis-ci.org/) will run in its build.
These are basic functional unit tests that check output formatting and typing. All tests are written using [pytest](https://docs.pytest.org/en/latest/).
In addition, [pylint](https://www.pylint.org/) will also be run.
See our organization's repository for our [pylint protocols](https://github.com/AgPipeline/Organization-info).

### Running the tests
Upon submitting a pull request, Travis will automatically build and run the test modules and return a report of passing and failing code.

### Running the tests before submitting a pull request
Should you wish to test your code before submitting a pull request, follow these steps:
1) Clone, pull, copy, or otherwise acquire the pylintrc file located in this [repo](https://github.com/AgPipeline/Organization-info)

2) From the command line, run the following commands (with base-docker-support as the current directory)
```sh
pylint --rcfile=<path-to-pylint.rc> base-image/*py
pylint --rcfile=<path-to-pylint.rc> base-image/**/*py
```
3) Once the previous commands have run, pylint will list the changes that should be made to bring the code up to standard
4) From the command line run the following command while the current working directory is base-image
```sh
python -m pytest -v
```
or
```sh
python3 -m pytest -v
```


### Requirements

There are no additional requirements or dependencies if these tests are not run locally; however, if they are to be run before deploying to Travis, the following are required:

python 3 \
pylint \
pytest

All three may be installed using pip, conda, or another preferred method.
