- SPARK-15777 (Partial fix) - Catalog federation
- Make ExternalCatalog configurable beyond the built-in in-memory and Hive implementations
- FileIndex for catalog tables is provided by the external catalog instead of the default implementation (see the sketch after this item)
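
Upstream's `spark.sql.catalogImplementation` accepts only `in-memory` or `hive`. Below is a minimal sketch of the kind of wiring this patch enables, assuming a hypothetical key `spark.sql.externalCatalog.class` and a user-supplied catalog class (both names are illustrative, not the fork's actual ones):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.catalyst.catalog.ExternalCatalog

// Hypothetical key; the patch's actual configuration name may differ.
val conf = new SparkConf()
  .set("spark.sql.externalCatalog.class", "com.example.FederatedCatalog")

// Sketch of the reflective wiring: instantiate whatever ExternalCatalog
// implementation the user configured instead of a hard-coded one.
val catalog = Class.forName(conf.get("spark.sql.externalCatalog.class"))
  .getDeclaredConstructor()
  .newInstance()
  .asInstanceOf[ExternalCatalog]
```
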
- SPARK-33089 - Enhance ExecutorPlugin API to include callbacks on task start and end events (sketch of the upstream API below)
- Merged upstream; remove once we migrate to Spark 3.1
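
Since this is now upstream, the callbacks look as follows in Spark 3.1's `org.apache.spark.api.plugin` API; the plugin class name here is illustrative and would be enabled with `spark.plugins=com.example.TaskEventsPlugin`:

```scala
import java.util.{Map => JMap}

import org.apache.spark.TaskFailedReason
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

class TaskEventsPlugin extends SparkPlugin {
  // No driver-side component is needed for this sketch.
  override def driverPlugin(): DriverPlugin = null

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit =
      println(s"task-events plugin started on executor ${ctx.executorID()}")

    // The callbacks added by this change: fired around each task attempt.
    override def onTaskStart(): Unit = println("task started")
    override def onTaskSucceeded(): Unit = println("task succeeded")
    override def onTaskFailed(reason: TaskFailedReason): Unit =
      println(s"task failed: ${reason.toErrorString}")
  }
}
```
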
- SPARK-18079 - CollectLimitExec.executeToIterator should perform per-partition limits (reproduction sketch below)
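
For reference, the pattern this fix targets: `limit(...).toLocalIterator()` is backed by `CollectLimitExec.executeToIterator`, and without per-partition limits whole partitions could be shipped to the driver:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// With the fix, each of the 100 partitions applies the limit before
// rows are collected, rather than streaming entire partitions.
val it = spark.range(0L, 1000000L, 1L, 100).limit(1000).toLocalIterator()
println(it.next())
```
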
- SPARK-20952 - ParquetFileFormat should forward TaskContext to its ForkJoinPool
- SPARK-26626 - Limit the maximum size of repeatedly substituted aliases
- SPARK-25200 - Allow setting HADOOP_CONF_DIR as a Spark configuration property (see the sketch below)
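
A sketch of the intended usage; `spark.hadoop.conf.dir` is a hypothetical stand-in for the property the patch actually defines:

```scala
import org.apache.spark.SparkConf

// Hypothetical key: point Spark at the Hadoop configuration directory
// through its own config instead of exporting HADOOP_CONF_DIR.
val conf = new SparkConf()
  .set("spark.hadoop.conf.dir", "/etc/hadoop/conf")
```
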
- SafeLogging implemented for the following files (see the logging sketch after this list):
- core: Broadcast, CoarseGrainedExecutorBackend, CoarseGrainedSchedulerBackend, Executor, MapOutputTracker (partial), MemoryStore, SparkContext, TorrentBroadcast
- kubernetes: ExecutorPodsAllocator, ExecutorPodsLifecycleManager, ExecutorPodsPollingSnapshotSource, ExecutorPodsSnapshot, ExecutorPodsWatchSnapshotSource, KubernetesClusterSchedulerBackend
- yarn: YarnClusterSchedulerBackend, YarnSchedulerBackend
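
For context, a sketch of the safe-arg logging style these changes adopt, shown with the `com.palantir.logsafe` arguments over a plain slf4j logger; the fork's actual `SafeLogging` helpers may differ in name and shape:

```scala
import com.palantir.logsafe.{SafeArg, UnsafeArg}
import org.slf4j.LoggerFactory

object SafeLoggingExample {
  private val log = LoggerFactory.getLogger(getClass)

  // Safe args (IDs, counts) can be collected by log pipelines; unsafe args
  // (paths, user-controlled values) are marked for redaction.
  def onBlockStored(blockId: String, sizeBytes: Long, path: String): Unit =
    log.info(
      "Stored block {} ({} bytes) at {}",
      SafeArg.of("blockId", blockId),
      SafeArg.of("sizeBytes", sizeBytes),
      UnsafeArg.of("path", path))
}
```
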
- SPARK-20001 (SPARK-13587) - Support PythonRunner executing inside a Conda env (and R)
- SPARK-21195 - Automatically register new metrics from sources and wire up the default registry
- spark.sql.parquet.outputTimestampType defaults to INT64 (TIMESTAMP_MICROS); this can be overridden per session, as shown below
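
`spark.sql.parquet.outputTimestampType` is a standard SQL config (upstream defaults to `INT96`); only the default changes in this fork, and it can still be set per session:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Restore the upstream INT96 behavior, e.g. for readers that
// do not understand INT64 timestamps:
spark.conf.set("spark.sql.parquet.outputTimestampType", "INT96")
```
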
- palantir/spark#381 Gradle plugin to easily create custom Docker images for use with K8s
- palantir/spark#521 K8s local file mounting
- palantir/spark#600 K8s local deploy mode
- Support Arrow serialization of Python 2 strings (#679)
- palantir/spark#678 TODO: revert #679 once we move off of Python 2
- Filter rLibDir entries by existence so that daemon.R references the correct file (#460)
- Add pre-installed Conda configuration and use it to find the R library directory (#700)