This repository has been archived by the owner on May 22, 2019. It is now read-only.

ValueError: RDD is empty #320

Open
sakalouski opened this issue Oct 3, 2018 · 4 comments
@sakalouski

I installed the module as suggested and ran the command:
srcml preprocrepos -m 50G,50G,50G -r siva --output ./test
where siva is the directory containing all the siva files. The memory parameters do not change anything.
My Spark is very old (1.3); could that be the reason? Is it runnable with the latest pyspark?

/usr/local/lib64/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
INFO:spark:Starting preprocess_repos-424fe007-f0db-48b7-863b-5a5b90ce5f63 on local[*]
Ivy Default Cache set to: /home/b7066789/.ivy2/cache
The jars for the packages stored in: /home/b7066789/.ivy2/jars
:: loading settings :: url = jar:file:/home/b7066789/.local/lib/python3.6/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
tech.sourced#engine added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found tech.sourced#engine;0.6.4 in central
found io.netty#netty-all;4.1.17.Final in central
found org.eclipse.jgit#org.eclipse.jgit;4.9.0.201710071750-r in central
found com.jcraft#jsch;0.1.54 in central
found com.googlecode.javaewah#JavaEWAH;1.1.6 in central
found org.apache.httpcomponents#httpclient;4.3.6 in central
found org.apache.httpcomponents#httpcore;4.3.3 in central
found commons-logging#commons-logging;1.1.3 in central
found commons-codec#commons-codec;1.6 in central
found org.slf4j#slf4j-api;1.7.2 in central
found tech.sourced#siva-java;0.1.3 in central
found org.bblfsh#bblfsh-client;1.8.2 in central
found com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 in central
found com.thesamet.scalapb#lenses_2.11;0.7.0-test2 in central
found com.lihaoyi#fastparse_2.11;1.0.0 in central
found com.lihaoyi#fastparse-utils_2.11;1.0.0 in central
found com.lihaoyi#sourcecode_2.11;0.1.4 in central
found com.google.protobuf#protobuf-java;3.5.0 in central
found commons-io#commons-io;2.5 in central
found io.grpc#grpc-netty;1.10.0 in central
found io.grpc#grpc-core;1.10.0 in central
found io.grpc#grpc-context;1.10.0 in central
found com.google.code.gson#gson;2.7 in central
found com.google.guava#guava;19.0 in central
found com.google.errorprone#error_prone_annotations;2.1.2 in central
found com.google.code.findbugs#jsr305;3.0.0 in central
found io.opencensus#opencensus-api;0.11.0 in central
found io.opencensus#opencensus-contrib-grpc-metrics;0.11.0 in central
found io.netty#netty-codec-http2;4.1.17.Final in central
found io.netty#netty-codec-http;4.1.17.Final in central
found io.netty#netty-codec;4.1.17.Final in central
found io.netty#netty-transport;4.1.17.Final in central
found io.netty#netty-buffer;4.1.17.Final in central
found io.netty#netty-common;4.1.17.Final in central
found io.netty#netty-resolver;4.1.17.Final in central
found io.netty#netty-handler;4.1.17.Final in central
found io.netty#netty-handler-proxy;4.1.17.Final in central
found io.netty#netty-codec-socks;4.1.17.Final in central
found com.thesamet.scalapb#scalapb-runtime-grpc_2.11;0.7.1 in central
found io.grpc#grpc-stub;1.10.0 in central
found io.grpc#grpc-protobuf;1.10.0 in central
found com.google.protobuf#protobuf-java;3.5.1 in central
found com.google.protobuf#protobuf-java-util;3.5.1 in central
found com.google.api.grpc#proto-google-common-protos;1.0.0 in central
found io.grpc#grpc-protobuf-lite;1.10.0 in central
found org.rogach#scallop_2.11;3.0.3 in central
found org.apache.commons#commons-pool2;2.4.3 in central
found tech.sourced#enry-java;1.6.3 in central
found org.xerial#sqlite-jdbc;3.21.0 in central
found com.groupon.dse#spark-metrics;2.0.0 in central
found io.dropwizard.metrics#metrics-core;3.1.2 in central
:: resolution report :: resolve 1148ms :: artifacts dl 44ms
:: modules in use:
com.google.api.grpc#proto-google-common-protos;1.0.0 from central in [default]
com.google.code.findbugs#jsr305;3.0.0 from central in [default]
com.google.code.gson#gson;2.7 from central in [default]
com.google.errorprone#error_prone_annotations;2.1.2 from central in [default]
com.google.guava#guava;19.0 from central in [default]
com.google.protobuf#protobuf-java;3.5.1 from central in [default]
com.google.protobuf#protobuf-java-util;3.5.1 from central in [default]
com.googlecode.javaewah#JavaEWAH;1.1.6 from central in [default]
com.groupon.dse#spark-metrics;2.0.0 from central in [default]
com.jcraft#jsch;0.1.54 from central in [default]
com.lihaoyi#fastparse-utils_2.11;1.0.0 from central in [default]
com.lihaoyi#fastparse_2.11;1.0.0 from central in [default]
com.lihaoyi#sourcecode_2.11;0.1.4 from central in [default]
com.thesamet.scalapb#lenses_2.11;0.7.0-test2 from central in [default]
com.thesamet.scalapb#scalapb-runtime-grpc_2.11;0.7.1 from central in [default]
com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 from central in [default]
commons-codec#commons-codec;1.6 from central in [default]
commons-io#commons-io;2.5 from central in [default]
commons-logging#commons-logging;1.1.3 from central in [default]
io.dropwizard.metrics#metrics-core;3.1.2 from central in [default]
io.grpc#grpc-context;1.10.0 from central in [default]
io.grpc#grpc-core;1.10.0 from central in [default]
io.grpc#grpc-netty;1.10.0 from central in [default]
io.grpc#grpc-protobuf;1.10.0 from central in [default]
io.grpc#grpc-protobuf-lite;1.10.0 from central in [default]
io.grpc#grpc-stub;1.10.0 from central in [default]
io.netty#netty-all;4.1.17.Final from central in [default]
io.netty#netty-buffer;4.1.17.Final from central in [default]
io.netty#netty-codec;4.1.17.Final from central in [default]
io.netty#netty-codec-http;4.1.17.Final from central in [default]
io.netty#netty-codec-http2;4.1.17.Final from central in [default]
io.netty#netty-codec-socks;4.1.17.Final from central in [default]
io.netty#netty-common;4.1.17.Final from central in [default]
io.netty#netty-handler;4.1.17.Final from central in [default]
io.netty#netty-handler-proxy;4.1.17.Final from central in [default]
io.netty#netty-resolver;4.1.17.Final from central in [default]
io.netty#netty-transport;4.1.17.Final from central in [default]
io.opencensus#opencensus-api;0.11.0 from central in [default]
io.opencensus#opencensus-contrib-grpc-metrics;0.11.0 from central in [default]
org.apache.commons#commons-pool2;2.4.3 from central in [default]
org.apache.httpcomponents#httpclient;4.3.6 from central in [default]
org.apache.httpcomponents#httpcore;4.3.3 from central in [default]
org.bblfsh#bblfsh-client;1.8.2 from central in [default]
org.eclipse.jgit#org.eclipse.jgit;4.9.0.201710071750-r from central in [default]
org.rogach#scallop_2.11;3.0.3 from central in [default]
org.slf4j#slf4j-api;1.7.2 from central in [default]
org.xerial#sqlite-jdbc;3.21.0 from central in [default]
tech.sourced#engine;0.6.4 from central in [default]
tech.sourced#enry-java;1.6.3 from central in [default]
tech.sourced#siva-java;0.1.3 from central in [default]
:: evicted modules:
com.google.protobuf#protobuf-java;3.5.0 by [com.google.protobuf#protobuf-java;3.5.1] in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 51 | 0 | 0 | 1 || 50 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 50 already retrieved (0kB/18ms)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/10/03 15:50:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/03 15:50:55 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
18/10/03 15:50:58 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
INFO:engine:Initializing engine on siva
INFO:ParquetSaver:Ignition -> DzhigurdaFiles -> UastExtractor -> Moder -> FieldsSelector -> ParquetSaver
Traceback (most recent call last):
File "/home/b7066789/.local/bin/srcml", line 11, in <module>
sys.exit(main())
File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/main.py", line 354, in main
return handler(args)
File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/utils/engine.py", line 87, in wrapped_pause
return func(cmdline_args, *args, **kwargs)
File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/cmd/preprocess_repos.py", line 24, in preprocess_repos
.link(ParquetSaver(save_loc=args.output))
File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/transformers/transformer.py", line 114, in execute
head = node(head)
File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/transformers/basic.py", line 292, in __call__
rdd.toDF().write.parquet(self.save_loc)
File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/sql/session.py", line 58, in toDF
return sparkSession.createDataFrame(self, schema, sampleRatio)
File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/sql/session.py", line 582, in createDataFrame
rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/sql/session.py", line 380, in _createFromRDD
struct = self._inferSchema(rdd, samplingRatio)
File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/sql/session.py", line 351, in _inferSchema
first = rdd.first()
File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/rdd.py", line 1364, in first
raise ValueError("RDD is empty")
ValueError: RDD is empty
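The trace shows the failure originates in schema inference: `toDF()` calls `SparkSession._inferSchema`, which calls `rdd.first()`, and pyspark's `RDD.first()` raises `ValueError("RDD is empty")` when the pipeline produced no rows. A minimal pure-Python sketch of that behavior and a possible guard (the real pyspark check would be `rdd.isEmpty()` before `rdd.toDF()`; `to_df_safe` is a hypothetical name, not a sourced-ml function):

```python
def first(rows):
    """Mimic pyspark's RDD.first(): return the first element,
    or raise the same ValueError that appears in the traceback."""
    for row in rows:
        return row
    raise ValueError("RDD is empty")


def to_df_safe(rows):
    """Hypothetical guarded conversion: check for emptiness before
    attempting schema inference, analogous to calling rdd.isEmpty()
    before rdd.toDF() in pyspark."""
    rows = list(rows)
    if not rows:
        return None  # or raise a clearer, actionable error
    return {"first_row": first(rows), "count": len(rows)}
```

An empty result at this point usually means no upstream data survived the pipeline, which is why the Spark version mismatch discussed below is a plausible culprit.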

@vmarkovtsev
Collaborator

@zurk Can you please have a look?

@zurk
Contributor

zurk commented Oct 4, 2018

@sakalouski, just to be sure, I tested srcml preprocrepos, and it works with the proper Spark version, which is 2.2.1 (https://github.com/src-d/jgit-spark-connector#pre-requisites). So yes, I think the problem is your old Spark; we do not test against 1.x versions.

@sakalouski
Author

sakalouski commented Oct 4, 2018 via email

@vmarkovtsev
Collaborator

We don't support 2.3.x because they changed the API in a non-backward-compatible way.
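Taken together, the comments pin the supported range to the 2.2.x series: 1.x is untested and 2.3.x broke API compatibility. A hedged pre-flight check along those lines (`spark_version_supported` is a hypothetical helper, not part of sourced-ml) might look like:

```python
def spark_version_supported(version: str) -> bool:
    """Return True only for the 2.2.x Spark series that the comments
    above describe as supported: 1.x is too old, and 2.3.x changed
    the API in a non-backward-compatible way."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) == (2, 2)
```

One could run this against `pyspark.__version__` before launching an expensive preprocessing job, failing fast with a clear message instead of a late "RDD is empty".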
