Currently, local mode does not work for `PySparkProcessor` because YARN is not configured correctly for local setups. To enable local development, we created an enhanced version of `PySparkProcessor` that overrides the underlying functionality of the SageMaker SDK and runs Spark in local mode instead of on YARN. The enhanced version preserves the interface of the original `PySparkProcessor`. Note that this project is intended only as a stop-gap solution until local mode is natively supported in the SageMaker SDK.
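The practical difference between the two setups comes down to the Spark master URL: `yarn` hands the job to a YARN ResourceManager, while `local[*]` runs Spark in a single local JVM with one worker thread per logical core. The sketch below only illustrates that difference with a hypothetical helper, `build_spark_submit_cmd`, which is not part of this package or the SageMaker SDK. It does not show how the enhanced processor actually overrides the SDK internals. The script name `preprocess.py` and its arguments are likewise made up.

```python
from typing import List, Optional


def build_spark_submit_cmd(
    app: str, master: str = "local[*]", app_args: Optional[List[str]] = None
) -> List[str]:
    """Hypothetical helper: build a spark-submit command for a given master URL.

    "local[*]" runs Spark locally with one worker thread per logical core;
    "yarn" submits to a YARN cluster, which needs a working YARN configuration.
    """
    cmd = ["spark-submit", "--master", master, app]
    if app_args:
        # Anything after the application file is passed to the application itself.
        cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    # Local mode (the default here) versus the YARN-based setup.
    print(" ".join(build_spark_submit_cmd("preprocess.py", app_args=["--input", "data/"])))
    # prints: spark-submit --master local[*] preprocess.py --input data/
    print(" ".join(build_spark_submit_cmd("preprocess.py", master="yarn")))
    # prints: spark-submit --master yarn preprocess.py
```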
To install:

```bash
pip install git+https://github.com/aws-samples/enhanced-pyspark-processor
```
Please refer to the notebook example for usage patterns.
The following versions have been tested for compatibility.
| SageMaker SDK | Spark | Compatible? |
|---|---|---|
| sagemaker >= 2.22.0, <= 2.61.0 | 2.4 | ✔️ |
| sagemaker >= 2.22.0, <= 2.61.0 | 3.0 | ✔️ |
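To check an environment against the tested combinations in the table, something like the following minimal sketch could be used. The helper names (`parse_version`, `is_tested_combination`) are hypothetical and not provided by this package. A `False` result only means the combination is outside the tested range, not that it is known to fail. Version parsing is deliberately simple and ignores pre-release suffixes.

```python
import re
from typing import Tuple

# Tested combinations, copied from the compatibility table (bounds inclusive).
SAGEMAKER_MIN: Tuple[int, int, int] = (2, 22, 0)
SAGEMAKER_MAX: Tuple[int, int, int] = (2, 61, 0)
TESTED_SPARK_MINORS = {(2, 4), (3, 0)}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading "major.minor[.patch]" of a version string into ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. "2.4") is treated as 0.
    return int(major), int(minor), int(patch or 0)


def is_tested_combination(sagemaker_version: str, spark_version: str) -> bool:
    """Return True if both versions fall inside a tested row of the table."""
    sm = parse_version(sagemaker_version)
    # Spark is listed by major.minor only, so compare just those two parts.
    spark_major_minor = parse_version(spark_version)[:2]
    return SAGEMAKER_MIN <= sm <= SAGEMAKER_MAX and spark_major_minor in TESTED_SPARK_MINORS
```

The installed SDK version string can be obtained with `importlib.metadata.version("sagemaker")` (Python 3.8+) and passed as `sagemaker_version`.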
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.