- Ingest from the CLI
- Ingest from filesystem -> GeoWave
- Pointer: LocalToGeowaveCommand.java
- Notes: Tested, believed to work.
- Ingest from filesystem -> HDFS -> GeoWave
- Pointer: LocalToMapReduceToGeowaveCommand.java
- Notes: Not tested, status unknown.
- Stage from filesystem -> HDFS
- Pointer: LocalToHdfsCommand.java
- Notes: Not tested, status unknown.
- Stage from filesystem -> Kafka
- Pointer: LocalToKafkaCommand.java
- Notes: Not tested, status unknown.
- Ingest from Kafka -> GeoWave
- Pointer: KafkaToGeowaveCommand.java
- Notes: Not tested, status unknown.
- Ingest from HDFS -> GeoWave
- Pointer: MapReduceToGeowaveCommand.java
- Notes: Not tested, status unknown.
- Pointer: GeoWaveMain.java
- Notes: Requires a plugin for each input file format; the plugin jars must be on the classpath. The source code for those formats can be found in the extensions/formats directory.
- Ingest Using the API
- Bulk
- Pointer: AccumuloKeyValuePairGenerator.java
- Notes: For data already stored on HDFS, given an appropriate InputFormat, it is possible to create a class derived from org.apache.hadoop.mapreduce.Mapper that can be used to bulk-insert the data into Accumulo (with appropriate GeoWave keys and values).
- Piecemeal
- Pointer: IndexWriter.java
- Notes: Given an appropriate DataStore, Adapter, and Index, it is possible to produce an IndexWriter that can be used to write items one by one into the DataStore, using the Adapter and according to the Index.
- File Formats Supported
- avro
- gdelt
- geolife
- geotools-raster (GeoTools-supported raster data)
- geotools-vector (GeoTools-supported vector data)
- gpx
- stanag4676
- tdrive
- Via Extensions:
- HBase
- Accumulo
- mrgeo (reading)
- GeoTrellis (prospective) (reading and writing)
- Via C++ bindings
- PDAL (reading and writing)
- mapnik (reading)
- Numerical
- Temporal
- Textual
- Pointer: TextSecondaryIndexConfiguration.java
- User Defined
- Pointer: SimpleFeatureUserDataConfiguration.java
- Examples:
- TimeDescriptionConfiguration allows indexing over intervals
- VisibilityConfiguration.java
- StatsConfigurationCollection.java
- No Cost-Based Optimization
- k-means
- CLI
- Pointer: KmeansParallelCommand.java
- Map-Reduce
- Pointer: KMeansMapReduce.java
- Notes: Means are supported; medians and centers appear not to be.
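For orientation, here is a minimal single-process sketch of one k-means iteration (assign each point to its nearest center, then replace each center with the mean of its points). It is plain Java and does not use GeoWave or MapReduce; the class and method names are hypothetical, and keeping an empty cluster's old center is an assumption of this sketch, not a statement about KMeansMapReduce.java.

```java
// Illustrative only: names are hypothetical and not part of GeoWave.
final class KMeansStepSketch {

    // Index of the center closest to x by squared Euclidean distance.
    static int nearest(double[] x, double[][] centers) {
        int best = 0;
        double bestD2 = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double d2 = 0.0;
            for (int i = 0; i < x.length; i++) {
                double diff = x[i] - centers[c][i];
                d2 += diff * diff;
            }
            if (d2 < bestD2) {
                bestD2 = d2;
                best = c;
            }
        }
        return best;
    }

    // One assignment-and-update iteration; returns the new centers.
    static double[][] step(double[][] points, double[][] centers) {
        int k = centers.length;
        int p = centers[0].length;
        double[][] sums = new double[k][p];
        int[] counts = new int[k];
        for (double[] x : points) {
            int best = nearest(x, centers);
            counts[best]++;
            for (int i = 0; i < p; i++) {
                sums[best][i] += x[i];
            }
        }
        double[][] next = new double[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                // Assumption of this sketch: an empty cluster keeps its old center.
                next[c] = centers[c].clone();
                continue;
            }
            next[c] = new double[p];
            for (int i = 0; i < p; i++) {
                next[c][i] = sums[c][i] / counts[c];
            }
        }
        return next;
    }
}
```

Calling `step` repeatedly until the centers stop moving gives the usual iterative procedure.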
- Jump Method (k-discovery)
- CLI
- Pointer: KmeansJumpCommand.java
- Map-Reduce
- Pointer: KMeansDistortionMapReduce.java
- Notes: Uses the approach given in Catherine A. Sugar and Gareth M. James (2003), "Finding the number of clusters in a data set: An information theoretic approach", Journal of the American Statistical Association 98 (January): 750–763, with the common covariance matrix set to the identity matrix.
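The jump method cited above can be sketched in a few lines. With the covariance matrix set to the identity, the distortion for a given k reduces to the per-dimension average squared Euclidean distance from each point to its nearest center. The sketch below assumes the transformation power Y = p/2 suggested in the paper; whether KMeansDistortionMapReduce.java uses that exact power is not confirmed here, and all names are hypothetical.

```java
// Illustrative only: names are hypothetical and not part of GeoWave.
final class JumpMethodSketch {

    // Per-dimension average squared Euclidean distance to the nearest center
    // (the distortion when the covariance matrix is the identity).
    static double distortion(double[][] points, double[][] centers) {
        int p = points[0].length;
        double total = 0.0;
        for (double[] x : points) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] c : centers) {
                double d2 = 0.0;
                for (int i = 0; i < p; i++) {
                    double diff = x[i] - c[i];
                    d2 += diff * diff;
                }
                best = Math.min(best, d2);
            }
            total += best;
        }
        return total / (points.length * (double) p);
    }

    // distortions[k - 1] holds the distortion obtained with k clusters.
    // Returns the k whose transformed distortion shows the largest jump.
    static int chooseK(double[] distortions, int p) {
        double y = p / 2.0;    // assumed transformation power Y = p/2
        double previous = 0.0; // transformed distortion for k = 0 is defined as 0
        int bestK = 1;
        double bestJump = Double.NEGATIVE_INFINITY;
        for (int k = 1; k <= distortions.length; k++) {
            double transformed = Math.pow(distortions[k - 1], -y);
            double jump = transformed - previous;
            if (jump > bestJump) {
                bestJump = jump;
                bestK = k;
            }
            previous = transformed;
        }
        return bestK;
    }
}
```

For example, with p = 2 and distortions 10.0, 2.0, 1.8, 1.7 for k = 1 to 4, the transformed values are roughly 0.1, 0.5, 0.556, 0.588, so the largest jump occurs at k = 2.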
- Sampling
- Map-Reduce
- Pointer: KSamplerMapReduce.java
- Notes: Chooses k random features, either from the overall collection of features or from one or more groups of features.
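One common way to choose k items uniformly at random in a single pass is reservoir sampling. The sketch below shows that technique in plain Java; it is an illustration of the idea and not a description of how KSamplerMapReduce.java is implemented. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: names are hypothetical and not part of GeoWave.
final class ReservoirSampleSketch {

    // Returns up to k items chosen uniformly at random from a single pass over items.
    static <T> List<T> sample(Iterable<T> items, int k, Random rng) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        for (T item : items) {
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k items.
                reservoir.add(item);
            } else {
                // Keep the new item with probability k / seen by replacing a random slot.
                long j = (long) (rng.nextDouble() * seen); // uniform in [0, seen)
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }
}
```

Sampling within groups could be handled by keeping one such reservoir per group key.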
- Kernel Density Estimation
- CLI
- Pointer: KdeCommand.java
- Map-Reduce
- Pointer: AccumuloKDEReducer.java
- Notes: General background can be found on the Wikipedia page. This implementation uses Gaussian basis functions.
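As a small illustration of kernel density estimation with Gaussian basis functions, the sketch below evaluates a 2-D estimate at a single location. The bandwidth parameter `h` and all names are assumptions made for this example; the GeoWave job's own parameters and tiling scheme are not reproduced here.

```java
// Illustrative only: names are hypothetical and not part of GeoWave.
final class GaussianKdeSketch {

    // Density estimate at (x, y) from 2-D samples, using an isotropic Gaussian
    // kernel with bandwidth h: f = 1 / (n * 2*pi*h^2) * sum exp(-d^2 / (2*h^2)).
    static double density(double[][] samples, double x, double y, double h) {
        double sum = 0.0;
        for (double[] s : samples) {
            double dx = x - s[0];
            double dy = y - s[1];
            sum += Math.exp(-(dx * dx + dy * dy) / (2.0 * h * h));
        }
        return sum / (samples.length * 2.0 * Math.PI * h * h);
    }
}
```

Evaluating `density` at the center of every cell of a grid yields a raster-style density surface.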
- Nearest Neighbors
- CLI
- Pointer: NearestNeighborCommand.java
- Map-Reduce
- Pointer: NNMapReduce.java
- Notes: Appears to use partitioned direct search.
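To make "partitioned direct search" concrete, here is one way such a search can be sketched: points are bucketed into square cells whose side equals the maximum distance, and each point is compared directly only against points in its own and the eight adjacent cells. This is an assumption about the general technique, not a description of NNMapReduce.java; all names are hypothetical, and `maxDist` must be positive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: names are hypothetical and not part of GeoWave.
final class PartitionedNeighborSketch {

    // Index of the grid cell containing coordinate v for the given cell size.
    static long cell(double v, double size) {
        return (long) Math.floor(v / size);
    }

    static String key(long cx, long cy) {
        return cx + ":" + cy;
    }

    // Returns index pairs {i, j} with i < j whose 2-D points lie within maxDist.
    static List<int[]> neighborPairs(double[][] pts, double maxDist) {
        // Partition step: bucket point indices by grid cell.
        Map<String, List<Integer>> cells = new HashMap<>();
        for (int i = 0; i < pts.length; i++) {
            String k = key(cell(pts[i][0], maxDist), cell(pts[i][1], maxDist));
            cells.computeIfAbsent(k, unused -> new ArrayList<>()).add(i);
        }
        // Direct search step: compare only against the 3 x 3 block of nearby cells.
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < pts.length; i++) {
            long cx = cell(pts[i][0], maxDist);
            long cy = cell(pts[i][1], maxDist);
            for (long dx = -1; dx <= 1; dx++) {
                for (long dy = -1; dy <= 1; dy++) {
                    List<Integer> bucket = cells.get(key(cx + dx, cy + dy));
                    if (bucket == null) {
                        continue;
                    }
                    for (int j : bucket) {
                        double ex = pts[i][0] - pts[j][0];
                        double ey = pts[i][1] - pts[j][1];
                        if (j > i && ex * ex + ey * ey <= maxDist * maxDist) {
                            pairs.add(new int[] {i, j});
                        }
                    }
                }
            }
        }
        return pairs;
    }
}
```

Because two points within `maxDist` of each other can differ by at most one cell index along each axis, restricting the comparison to adjacent cells does not miss any pair.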
- Clustering
- Map-Reduce
- Convex Hulls of Clusters
- Map-Reduce
- Pointer: ConvexHullMapReduce.java
- DBSCAN
- Map-Reduce
- Pointer: DBScanMapReduce.java
- Notes: See the Wikipedia page for general background. This implementation deviates from the standard approach. See this code comment for details.
- Spark Support
- Pointer: analytics/spark/src/main/scala/mil/nga/giat/geowave/analytics/spark
- Pointer: AnalyticRecipes.scala
- Notes: The analytic recipes file provides tidbits useful for addressing clustering-related questions with GeoWave and Spark.
- GeoServer Plugin
- Pointer: GeoserverServiceImpl.java
- Notes: This can be used to view GeoWave layers in GeoServer. Using an SLD such as this one allows large datasets to be shown interactively by subsampling at the pixel level.
- Query
- RDD
- Pointer: GeoWaveInputFormat.java
- Notes: One can use the GeoWaveInputFormat class to perform a query that returns an RDD of key-value pairs. To do so, construct GeoWave Query and QueryOptions objects, insert them into an org.apache.hadoop.conf.Configuration object using static methods on GeoWaveInputFormat, then pass that configuration object to the newAPIHadoopRDD method.
- Iterator
- Pointer: DataStore.java
- Notes: It is possible to perform a query which returns an iterator of values. Given a DataStore, a Query, and a QueryOptions, one uses the query method on the DataStore class.