river and feeder mode, bulk fixes, examples added

jprante · May 10, 2014 · 9de2af7
1 parent 771490b
Showing 144 changed files with 3,628 additions and 6,729 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -1 +1,3 @@
 language: java
+jdk:
+  - oraclejdk8
diff --git a/README.rst b/README.rst
@@ -2,15 +2,14 @@
 
 Image by `icons8 <http://www.iconsdb.com/icons8/?icon=database>`_ Creative Commons Attribution-NoDerivs 3.0 Unported.
 
-Elasticsearch JDBC river
-========================
+JDBC plugin for Elasticsearch
+=============================
+.. image:: https://travis-ci.org/jprante/elasticsearch-river-jdbc.png
 
-The Java Database Connection (JDBC) `river <http://www.elasticsearch.org/guide/reference/river/>`_  allows to fetch data from JDBC sources for indexing into `Elasticsearch <http://www.elasticsearch.org>`_.
+The Java Database Connectivity (JDBC) plugin fetches data from JDBC sources for indexing into `Elasticsearch <http://www.elasticsearch.org>`_.
 
 It is implemented as an `Elasticsearch plugin <http://www.elasticsearch.org/guide/reference/modules/plugins.html>`_.
 
-The relational data is internally transformed into structured JSON objects for the schema-less indexing model in Elasticsearch.
-
 Creating a JDBC river is easy. Install the plugin. Download a JDBC driver jar from your vendor's site (here MySQL) and put the jar into the folder of the plugin `$ES_HOME/plugins/river-jdbc`.
 Then issue this simple command::
 
@@ -26,36 +25,58 @@ Then issue this simple command::
         }
     }'
 
-Installation
-------------
+Plugin works as a river or a feeder
+-----------------------------------
 
-.. image:: https://travis-ci.org/jprante/elasticsearch-river-jdbc.png
+The plugin can operate as a river in "pull mode" or as a feeder in "push mode". In feeder mode, the plugin
+runs in a separate JVM and can connect to a remote Elasticsearch cluster.
 
-Prerequisites
+.. image:: ../../../elasticsearch-river-jdbc/raw/master/src/site/resources/jdbc-river-feeder-architecture.png
 
-  - a JDBC driver jar for your database (download from vendor site and put into JDBC river plugin folder)
+The relational data is internally transformed into structured JSON objects for the schema-less indexing model
+of Elasticsearch documents.
 
-=============  ===========  =================  =============================================================
-ES version     Plugin       Release date       Command
--------------  -----------  -----------------  -------------------------------------------------------------
-0.90.3         0.90.3.1     Jan 31, 2014       ./bin/plugin -install river-jdbc -url http://bit.ly/1emqDH9
-0.90.10        0.90.10.2    Jan 31, 2014       ./bin/plugin -install river-jdbc -url http://bit.ly/1a8Mcve
-1.0.0          1.0.0.2      Mar 31, 2014       ./bin/plugin --install river-jdbc --url http://bit.ly/1gIk4jW
-1.1.0          1.1.0.0      Apr 5, 2014        ./bin/plugin --install river-jdbc --url http://bit.ly/1iadfnF
-=============  ===========  =================  =============================================================
+.. image:: ../../../elasticsearch-river-jdbc/raw/master/src/site/resources/simple-tabular-json-data.png
 
-Do not forget to restart the node after installing.
+Both ends are scalable. The plugin can fetch data from different RDBMS sources in parallel, and multithreaded
+bulk mode ensures high throughput when indexing into Elasticsearch.
 
-Project docs
+.. image:: ../../../elasticsearch-river-jdbc/raw/master/src/site/resources/tabular-json-data.png
+
+Versions
+--------
+
+=============  ===========  =================
+ES version     Plugin       Release date
+-------------  -----------  -----------------
+1.1.0          1.1.0.1      May 10, 2014
+=============  ===========  =================
+
+Prerequisites
+-------------
+
+- a JDBC driver jar for your database (download it from the vendor's site and put it into the JDBC river plugin folder)
+
+Installation
 ------------
 
-The Maven project site is available at `Github <http://jprante.github.io/elasticsearch-river-jdbc>`_
+    ./bin/plugin --install river-jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.1.0.1/elasticsearch-river-jdbc-1.1.0.1-plugin.zip
 
-Binaries
+Do not forget to restart the node after installing.
+
+Checksum
 --------
 
-Binaries are available at `Bintray <https://bintray.com/pkg/show/general/jprante/elasticsearch-plugins/elasticsearch-river-jdbc>`_
+===========================================   ========================================
+File                                          SHA1
+-------------------------------------------   ----------------------------------------
+elasticsearch-river-jdbc-1.1.0.1-plugin.zip   1065a30897beddd4e37cb63ca40500a02319dbe7
+===========================================   ========================================
+
+Project docs
+------------
 
+The Maven project site is available at `GitHub <http://jprante.github.io/elasticsearch-river-jdbc>`_.
 
 Documentation
 -------------
@@ -89,7 +110,7 @@ License
 
 Elasticsearch JDBC River Plugin
 
-Copyright (C) 2012,2013 Jörg Prante
+Copyright (C) 2012-2014 Jörg Prante
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
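
The README pairs the install URL with a SHA1 checksum. A minimal sketch of checking the downloaded zip before installing it, assuming either `sha1sum` (GNU coreutils) or `shasum` (macOS) is on the PATH; the helper names `sha1_of` and `verify_sha1` are hypothetical and not part of the plugin:

```sh
# sha1_of FILE: print the SHA1 hex digest of FILE.
# Uses sha1sum where available (Linux), falling back to shasum (macOS).
sha1_of() {
  if command -v sha1sum >/dev/null 2>&1; then
    sha1sum "$1" | cut -d' ' -f1
  else
    shasum -a 1 "$1" | cut -d' ' -f1
  fi
}

# verify_sha1 FILE EXPECTED: print OK and succeed only if FILE's digest
# equals EXPECTED; otherwise report the mismatch on stderr and fail.
verify_sha1() {
  actual=$(sha1_of "$1")
  if [ "$actual" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 (got $actual, expected $2)" >&2
    return 1
  fi
}
```

Usage with the values from the Checksum table: download the zip from the URL in the Installation section (e.g. `curl -O <url>`), run `verify_sha1 elasticsearch-river-jdbc-1.1.0.1-plugin.zip 1065a30897beddd4e37cb63ca40500a02319dbe7`, and only on success install the local copy with `./bin/plugin --install river-jdbc --url file:///path/to/elasticsearch-river-jdbc-1.1.0.1-plugin.zip`.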

diff --git a/bin/create-postgresql-river.sh b/bin/create-postgresql-river.sh
diff --git a/bin/create-postgresql-river.sh~ b/bin/create-postgresql-river.sh~
diff --git a/bin/feeder/h2/create.sh b/bin/feeder/h2/create.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+
+java="/usr/bin/java"
+#java="/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java"
+#java="/usr/java/jdk1.8.0/bin/java"
+
+echo '
+{
+    "concurrency" : 1,
+    "elasticsearch" : "es://localhost:9300?es.cluster.name=elasticsearch",
+    "client" : "ingest",
+    "index" : "myh2",
+    "type" : "myh2",
+    "jdbc" : [
+      {
+        "url" : "jdbc:h2:test",
+        "user" : "",
+        "password" : "",
+        "sql" : [
+          {
+            "statement" : "select *, created as _id, \"myjdbc\" as _index, \"mytype\" as _type from \"orders\""
+          }
+        ]
+      }
+    ]
+}
+' | ${java} \
+    -cp $(pwd):$(pwd)/\*:$(pwd)/../../lib/\* \
+    org.xbib.elasticsearch.plugin.feeder.Runner \
+    org.xbib.elasticsearch.plugin.feeder.jdbc.JDBCFeeder
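
All feeder scripts in this commit pipe a JSON definition on stdin into the same `org.xbib.elasticsearch.plugin.feeder.Runner` / `JDBCFeeder` invocation. A sketch of factoring that into one function, assuming it is sourced and run from the plugin folder like the scripts; `feeder_cp`, `run_feeder` and the `JAVA` override variable are hypothetical names, not part of the plugin:

```sh
# feeder_cp DIR: build the classpath the feeder scripts use: DIR itself,
# every jar in DIR, and every jar in DIR/../../lib (the Elasticsearch lib
# folder when DIR is $ES_HOME/plugins/<plugin>). The literal "*" is left
# unexpanded on purpose, for java's own classpath wildcard handling.
feeder_cp() {
  printf '%s:%s/*:%s/../../lib/*' "$1" "$1" "$1"
}

# run_feeder: read a feeder JSON definition on stdin and hand it to the
# feeder Runner, like the "echo '...' | ${java} ..." pipelines do.
# JAVA may point at another java binary; it defaults to /usr/bin/java.
run_feeder() {
  "${JAVA:-/usr/bin/java}" \
    -cp "$(feeder_cp "$(pwd)")" \
    org.xbib.elasticsearch.plugin.feeder.Runner \
    org.xbib.elasticsearch.plugin.feeder.jdbc.JDBCFeeder
}
```

A quoted here-document (`run_feeder <<'EOF'`, the JSON, then a closing `EOF` line) can replace `echo '...'`. Because the shell leaves its body untouched, the SQL can use single-quoted string literals, which matters for H2: there, as in standard SQL, double quotes delimit identifiers (as in `\"orders\"` above) and single quotes delimit strings, e.g. `select *, created as _id, 'myjdbc' as _index, 'mytype' as _type from \"orders\"`.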
diff --git a/bin/feeder/mysql/create.sh b/bin/feeder/mysql/create.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+java="/usr/bin/java"
+#java="/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java"
+#java="/usr/java/jdk1.8.0/bin/java"
+
+echo '
+{
+    "concurrency" : 2,
+    "elasticsearch" : "es://localhost:9300?es.cluster.name=elasticsearch",
+    "client" : "bulk",
+    "jdbc" : [
+      {
+            "url" : "jdbc:mysql://localhost:3306/test",
+            "user" : "",
+            "password" : "",
+            "sql" : [
+                {
+                    "statement" : "select *, created as _id, \"myjdbc\" as _index, \"mytype\" as _type from orders"
+                }
+            ],
+            "index" : "myjdbc",
+            "type" : "mytype",
+            "index_settings" : {
+                "index" : {
+                    "number_of_shards" : 1
+                }
+            }
+      },
+      {
+            "url" : "jdbc:mysql://localhost:3306/test",
+            "user" : "",
+            "password" : "",
+            "sql" : [
+                {
+                    "statement" : "select *, name as _id, \"myproducts\" as _index, \"myproducts\" as _type from products"
+                }
+            ]
+      }
+    ]
+}
+' | ${java} \
+    -cp $(pwd):$(pwd)/\*:$(pwd)/../../lib/\* \
+    org.xbib.elasticsearch.plugin.feeder.Runner \
+    org.xbib.elasticsearch.plugin.feeder.jdbc.JDBCFeeder
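
The MySQL definition above feeds two sources with `concurrency` 2: the first into `myjdbc`/`mytype`, the second into `myproducts` (both named in the SQL via `_index`). A rough way to confirm that both arrived is the Elasticsearch `_count` API. The sketch assumes a node on `localhost:9200` and `curl` on the PATH; `doc_count` is a hypothetical helper that extracts the number with `sed` rather than a real JSON parser:

```sh
# doc_count INDEX: print the number of documents in INDEX, taken from the
# compact {"count":N,...} response of the _count API.
# The node address defaults to localhost:9200; override with ES=host:port.
doc_count() {
  curl -s "${ES:-localhost:9200}/$1/_count" |
    sed -n 's/.*"count":\([0-9][0-9]*\).*/\1/p'
}
```

Once the feeder has finished: `for index in myjdbc myproducts; do echo "$index: $(doc_count "$index")"; done`.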
diff --git a/bin/feeder/mysql/geo.sh b/bin/feeder/mysql/geo.sh
@@ -0,0 +1,75 @@
+#!/bin/sh
+
+# A complete, minimal geo "push" (feeder) example: MySQL geo data -> Elasticsearch geo search
+
+# - install MySQL in /usr/local/mysql
+# - start MySQL on localhost:3306 (default)
+# - prepare a 'test' database in MySQL
+# - create empty user '' with empty password ''
+# - load the SQL in "geo.dump": /usr/local/mysql/bin/mysql test < src/test/resources/geo.dump
+# - then run this script from $ES_HOME/plugins/river-jdbc: bash bin/feeder/mysql/geo.sh
+
+curl -XDELETE 'localhost:9200/myjdbc'
+
+java="/usr/bin/java"
+#java="/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java"
+#java="/usr/java/jdk1.8.0/bin/java"
+
+echo '
+{
+    "elasticsearch" : "es://localhost:9300?es.cluster.name=elasticsearch",
+    "jdbc" : {
+        "url" : "jdbc:mysql://localhost:3306/test",
+        "user" : "",
+        "password" : "",
+        "locale" : "en_US",
+        "sql" : [
+            {
+                "statement" : "select \"myjdbc\" as _index, \"mytype\" as _type, name as _id, city, zip, address, lat as \"location.lat\", lon as \"location.lon\" from geo"
+            }
+        ],
+        "index" : "myjdbc",
+        "type" : "mytype",
+        "index_settings" : {
+            "index" : {
+                "number_of_shards" : 1
+            }
+        },
+        "type_mapping": {
+            "mytype" : {
+                "properties" : {
+                    "location" : {
+                        "type" : "geo_point"
+                    }
+                }
+            }
+        }
+    }
+}
+' | ${java} \
+    -cp $(pwd):$(pwd)/\*:$(pwd)/../../lib/\* \
+    org.xbib.elasticsearch.plugin.feeder.Runner \
+    org.xbib.elasticsearch.plugin.feeder.jdbc.JDBCFeeder
+
+curl -XGET 'localhost:9200/myjdbc/_refresh'
+
+curl -XPOST 'localhost:9200/myjdbc/_search?pretty' -d '
+{
+  "query": {
+     "filtered": {
+       "query": {
+          "match_all": {
+           }
+       },
+       "filter": {
+           "geo_distance" : {
+               "distance" : "20km",
+               "location" : {
+                    "lat" : 51.0,
+                    "lon" : 7.0
+                }
+            }
+        }
+     }
+   }
+}'
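
geo.sh hard-codes the search center (51.0, 7.0) and the radius (20km). A sketch of a parametrized version of the same `filtered` query with a `geo_distance` filter on the `location` field, which the script maps as `geo_point`; the `geo_query` function name is hypothetical:

```sh
# geo_query LAT LON DISTANCE: print the search body used by geo.sh,
# with the center point and radius taken from the arguments.
geo_query() {
  printf '{
  "query" : {
    "filtered" : {
      "query" : { "match_all" : {} },
      "filter" : {
        "geo_distance" : {
          "distance" : "%s",
          "location" : { "lat" : %s, "lon" : %s }
        }
      }
    }
  }
}\n' "$3" "$1" "$2"
}
```

Usage: `curl -XPOST 'localhost:9200/myjdbc/_search?pretty' -d "$(geo_query 51.0 7.0 20km)"`.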
diff --git a/bin/feeder/oracle/create.sh b/bin/feeder/oracle/create.sh
@@ -0,0 +1,30 @@
+#!/bin/sh
+
+java="/usr/bin/java"
+#java="/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java"
+#java="/usr/java/jdk1.8.0/bin/java"
+
+echo '
+{
+    "concurrency" : 1,
+    "elasticsearch" : "es://localhost:9300?es.cluster.name=elasticsearch",
+    "client" : "bulk",
+    "jdbc" : {
+        "url" : "jdbc:oracle:thin:@//host:1521/service_name",
+        "user" : "user",
+        "password" : "password",
+        "sql" : "select or_id as \"_id\", or_tan as \"tan\" from orders",
+        "index" : "myoracle",
+        "type" : "myoracle",
+        "index_settings" : {
+            "index" : {
+                "number_of_shards" : 1,
+                "number_of_replicas" : 0
+            }
+        }
+    }
+}
+' | ${java} \
+    -cp $(pwd):$(pwd)/\*:$(pwd)/../../lib/\* \
+    org.xbib.elasticsearch.plugin.feeder.Runner \
+    org.xbib.elasticsearch.plugin.feeder.jdbc.JDBCFeeder
diff --git a/bin/river/mysql/create.sh b/bin/river/mysql/create.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+
+curl -XPUT '0:9200/_river/my_mysql_river/_meta' -d '{
+    "type" : "jdbc",
+    "jdbc" : {
+        "url" : "jdbc:mysql://localhost:3306/test",
+        "user" : "",
+        "password" : "",
+        "sql" :  "select *, created as _id from orders",
+        "maxbulkactions" : 10
+    }
+}'
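
In river ("pull") mode the definition is stored in the `_river` index instead of being piped on stdin. A sketch of inspecting and removing the river created above, assuming the Elasticsearch 1.x river API on `localhost:9200`, where deleting `_river/<name>/` stops and removes a river; `river_meta` and `river_delete` are hypothetical helper names:

```sh
# The node address defaults to localhost:9200; override with ES=host:port.

# river_meta NAME: show the stored _meta definition of river NAME.
river_meta() {
  curl -s -XGET "${ES:-localhost:9200}/_river/$1/_meta?pretty"
}

# river_delete NAME: delete river NAME, which stops it on the cluster.
river_delete() {
  curl -s -XDELETE "${ES:-localhost:9200}/_river/$1/"
}
```

For example, `river_meta my_mysql_river` to check what create.sh stored, and `river_delete my_mysql_river` before re-creating the river with a changed definition.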