
Commit 117c65a

Merge branch 'master' into dataframe

fe2s committed Nov 8, 2018
2 parents 57bb869 + 78ff03d
Showing 3 changed files with 51 additions and 21 deletions.
4 changes: 2 additions & 2 deletions doc/dataframe.md
@@ -75,7 +75,7 @@ It is used by spark-redis internally when reading DataFrame back to Spark memory

### Specifying Redis key

-By default, spark-redis generates UUID identifier for each row to ensure
+By default spark-redis generates UUID identifier for each row to ensure
their uniqueness. However, you can also provide your own column as a key. This is controlled with `key.column` option:

```scala
@@ -157,7 +157,7 @@ df.write

### Persistence model

-By default, DataFrames are persisted as Redis Hashes. It allows to write data with Spark and query from non-Spark environment.
+By default DataFrames are persisted as Redis Hashes. It allows to write data with Spark and query from non-Spark environment.
It also enables projection query optimization when only a small subset of columns are selected. On the other hand, there is currently
a limitation with Hash model - it doesn't support nested DataFrame schema. One option to overcome it is making your DataFrame schema flat.
If it is not possible due to some constraints, you may consider using Binary persistence model.
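If nested schemas make the Hash model unsuitable, the Binary model can be selected on write. The snippet below is a sketch based on the `model` option this doc describes; the `people` table name is illustrative:

```scala
// Persist a DataFrame using the Binary model instead of the default Hash model.
// Binary rows support nested schemas, but are not readable from non-Spark
// clients and do not benefit from projection query optimization.
df.write
  .format("org.apache.spark.sql.redis")
  .option("table", "people")
  .option("model", "binary")
  .save()
```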
44 changes: 27 additions & 17 deletions doc/getting-started.md
@@ -20,25 +20,21 @@ cd spark-redis
mvn clean package -DskipTests
```

-## Using the library
-Add Spark-Redis to Spark with the `--jars` command line option. For example, use it from spark-shell, include it in the following manner:
+### Using the library with spark shell
+Add Spark-Redis to Spark with the `--jars` command line option.

-```
+```bash
$ bin/spark-shell --jars <path-to>/spark-redis-<version>-jar-with-dependencies.jar
```

-Welcome to
-      ____              __
-     / __/__  ___ _____/ /__
-    _\ \/ _ \/ _ `/ __/ '_/
-   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
-      /_/
-
-Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101)
+By default it connects to `localhost:6379` without any password, you can change the connection settings in the following manner:
+
+```bash
+$ bin/spark-shell --jars <path-to>/spark-redis-<version>-jar-with-dependencies.jar --conf "spark.redis.host=localhost" --conf "spark.redis.port=6379" --conf "spark.redis.auth=passwd"
+```

The following sections contain code snippets that demonstrate the use of Spark-Redis. To use the sample code, you'll need to replace `your.redis.server` and `6379` with your Redis database's IP address or hostname and port, respectively.

-### Configuring Connections to Redis using SparkConf
+### Configuring connection to Redis in a self-contained application

Below is an example configuration of SparkContext with redis configuration:

@@ -47,21 +47,33 @@ import com.redislabs.provider.redis._

...

-sc = new SparkContext(new SparkConf()
+val sc = new SparkContext(new SparkConf()
.setMaster("local")
.setAppName("myApp")

// initial redis host - can be any node in cluster mode
.set("spark.redis.host", "localhost")

// initial redis port
.set("spark.redis.port", "6379")

// optional redis AUTH password
-    .set("spark.redis.auth", "")
+    .set("spark.redis.auth", "passwd")
)
```

The SparkSession can be configured in a similar manner:

```scala
val spark = SparkSession
.builder()
.appName("myApp")
.master("local[*]")
.config("spark.redis.host", "localhost")
.config("spark.redis.port", "6379")
.config("spark.redis.auth", "passwd")
.getOrCreate()

val sc = spark.sparkContext
```
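Once the session is configured, a DataFrame round-trip against Redis can be sketched as follows (a sketch, assuming a reachable Redis instance; the `people` table name is illustrative):

```scala
import spark.implicits._

// Write a small DataFrame to Redis hashes under the "people" table ...
val df = Seq(("John", 30), ("Peter", 45)).toDF("name", "age")

df.write
  .format("org.apache.spark.sql.redis")
  .option("table", "people")
  .save()

// ... and read it back.
val loadedDf = spark.read
  .format("org.apache.spark.sql.redis")
  .option("table", "people")
  .load()
loadedDf.show()
```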

### Create RDD

```scala
@@ -83,6 +91,8 @@ df.write
### Create Stream

```scala
import com.redislabs.provider.redis._

val ssc = new StreamingContext(sc, Seconds(1))
val redisStream = ssc.createRedisStream(Array("foo", "bar"),
storageLevel = StorageLevel.MEMORY_AND_DISK_2)
24 changes: 22 additions & 2 deletions doc/python.md
@@ -8,9 +8,16 @@ Here is an example:
1. Run `pyspark` providing the spark-redis jar file

```bash
-$ ./bin/pyspark --jars /your/path/to/spark-redis-<version>-jar-with-dependencies.jar
+$ ./bin/pyspark --jars <path-to>/spark-redis-<version>-jar-with-dependencies.jar
```

By default it connects to `localhost:6379` without any password, you can change the connection settings in the following manner:

```bash
$ bin/pyspark --jars <path-to>/spark-redis-<version>-jar-with-dependencies.jar --conf "spark.redis.host=localhost" --conf "spark.redis.port=6379" --conf "spark.redis.auth=passwd"
```


2. Read DataFrame from json, write/read from Redis:
```python
df = spark.read.json("examples/src/main/resources/people.json")
Expand All @@ -19,7 +26,7 @@ loadedDf = spark.read.format("org.apache.spark.sql.redis").option("table", "peop
loadedDf.show()
```

-2. Check the data with redis-cli:
+3. Check the data with redis-cli:

```bash
127.0.0.1:6379> hgetall people:Justin
@@ -29,3 +36,16 @@ loadedDf.show()
4) "Justin"
```

The self-contained application can be configured in the following manner:

```python
spark = SparkSession\
    .builder\
    .appName("myApp")\
    .config("spark.redis.host", "localhost")\
    .config("spark.redis.port", "6379")\
    .config("spark.redis.auth", "passwd")\
    .getOrCreate()
```

