Unable to mount volumes for pod Learner #152
Hi, sorry to bother you again. I recently deployed FfDL on Google Cloud again, but one of the pods fails. Its log:
+ DRIVER_LOCATION=/host/usr/libexec/kubernetes/kubelet-plugins/volume/exec/ibm~ibmc-s3fs
+ KUBELET_SVC_CONFIG=/host/lib/systemd/system/kubelet.service
+ apt-get -y update
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [83.2 kB]
Hit:2 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Get:5 http://security.ubuntu.com/ubuntu bionic-security/universe Sources [32.0 kB]
Get:6 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [133 kB]
Get:7 http://archive.ubuntu.com/ubuntu bionic-updates/universe Sources [167 kB]
Get:8 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [1367 B]
Get:9 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [281 kB]
Get:10 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [900 kB]
Get:11 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [6931 B]
Get:12 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [599 kB]
Get:13 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [10.7 kB]
Get:14 http://archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [3655 B]
Fetched 2381 kB in 1s (1754 kB/s)
Reading package lists...
+ apt-get -y install s3fs
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
ca-certificates file fuse libasn1-8-heimdal libcurl3-gnutls libfuse2
libgssapi3-heimdal libhcrypto4-heimdal libheimbase1-heimdal
libheimntlm0-heimdal libhx509-5-heimdal libicu60 libkrb5-26-heimdal
libldap-2.4-2 libldap-common libmagic-mgc libmagic1 libnghttp2-14 libpsl5
libroken18-heimdal librtmp1 libsasl2-2 libsasl2-modules libsasl2-modules-db
libsqlite3-0 libssl1.1 libwind0-heimdal libxml2 mime-support openssl
publicsuffix xz-utils
Suggested packages:
libsasl2-modules-gssapi-mit | libsasl2-modules-gssapi-heimdal
libsasl2-modules-ldap libsasl2-modules-otp libsasl2-modules-sql
The following NEW packages will be installed:
ca-certificates file fuse libasn1-8-heimdal libcurl3-gnutls libfuse2
libgssapi3-heimdal libhcrypto4-heimdal libheimbase1-heimdal
libheimntlm0-heimdal libhx509-5-heimdal libicu60 libkrb5-26-heimdal
libldap-2.4-2 libldap-common libmagic-mgc libmagic1 libnghttp2-14 libpsl5
libroken18-heimdal librtmp1 libsasl2-2 libsasl2-modules libsasl2-modules-db
libsqlite3-0 libssl1.1 libwind0-heimdal libxml2 mime-support openssl
publicsuffix s3fs xz-utils
0 upgraded, 33 newly installed, 0 to remove and 33 not upgraded.
Need to get 13.3 MB of archives.
After this operation, 52.2 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 libssl1.1 amd64 1.1.0g-2ubuntu4.3 [1130 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 openssl amd64 1.1.0g-2ubuntu4.3 [532 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic/main amd64 ca-certificates all 20180409 [151 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 libmagic-mgc amd64 1:5.32-2ubuntu0.1 [184 kB]
Get:5 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 libmagic1 amd64 1:5.32-2ubuntu0.1 [68.4 kB]
Get:6 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 file amd64 1:5.32-2ubuntu0.1 [22.1 kB]
Get:7 http://archive.ubuntu.com/ubuntu bionic/main amd64 libicu60 amd64 60.2-3ubuntu3 [8054 kB]
Get:8 http://archive.ubuntu.com/ubuntu bionic/main amd64 libsqlite3-0 amd64 3.22.0-1 [496 kB]
Get:9 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 libxml2 amd64 2.9.4+dfsg1-6.1ubuntu1.2 [663 kB]
Get:10 http://archive.ubuntu.com/ubuntu bionic/main amd64 mime-support all 3.60ubuntu1 [30.1 kB]
Get:11 http://archive.ubuntu.com/ubuntu bionic/main amd64 xz-utils amd64 5.2.2-1.3 [83.8 kB]
Get:12 http://archive.ubuntu.com/ubuntu bionic/main amd64 libfuse2 amd64 2.9.7-1ubuntu1 [80.9 kB]
Get:13 http://archive.ubuntu.com/ubuntu bionic/main amd64 fuse amd64 2.9.7-1ubuntu1 [24.5 kB]
Get:14 http://archive.ubuntu.com/ubuntu bionic/main amd64 libpsl5 amd64 0.19.1-5build1 [41.8 kB]
Get:15 http://archive.ubuntu.com/ubuntu bionic/main amd64 publicsuffix all 20180223.1310-1 [97.6 kB]
Get:16 http://archive.ubuntu.com/ubuntu bionic/main amd64 libroken18-heimdal amd64 7.5.0+dfsg-1 [41.3 kB]
Get:17 http://archive.ubuntu.com/ubuntu bionic/main amd64 libasn1-8-heimdal amd64 7.5.0+dfsg-1 [175 kB]
Get:18 http://archive.ubuntu.com/ubuntu bionic/main amd64 libheimbase1-heimdal amd64 7.5.0+dfsg-1 [29.3 kB]
Get:19 http://archive.ubuntu.com/ubuntu bionic/main amd64 libhcrypto4-heimdal amd64 7.5.0+dfsg-1 [85.9 kB]
Get:20 http://archive.ubuntu.com/ubuntu bionic/main amd64 libwind0-heimdal amd64 7.5.0+dfsg-1 [47.8 kB]
Get:21 http://archive.ubuntu.com/ubuntu bionic/main amd64 libhx509-5-heimdal amd64 7.5.0+dfsg-1 [107 kB]
Get:22 http://archive.ubuntu.com/ubuntu bionic/main amd64 libkrb5-26-heimdal amd64 7.5.0+dfsg-1 [206 kB]
Get:23 http://archive.ubuntu.com/ubuntu bionic/main amd64 libheimntlm0-heimdal amd64 7.5.0+dfsg-1 [14.8 kB]
Get:24 http://archive.ubuntu.com/ubuntu bionic/main amd64 libgssapi3-heimdal amd64 7.5.0+dfsg-1 [96.5 kB]
Get:25 http://archive.ubuntu.com/ubuntu bionic/main amd64 libsasl2-modules-db amd64 2.1.27~101-g0780600+dfsg-3ubuntu2 [14.8 kB]
Get:26 http://archive.ubuntu.com/ubuntu bionic/main amd64 libsasl2-2 amd64 2.1.27~101-g0780600+dfsg-3ubuntu2 [49.2 kB]
Get:27 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 libldap-common all 2.4.45+dfsg-1ubuntu1.1 [16.6 kB]
Get:28 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 libldap-2.4-2 amd64 2.4.45+dfsg-1ubuntu1.1 [155 kB]
Get:29 http://archive.ubuntu.com/ubuntu bionic/main amd64 libnghttp2-14 amd64 1.30.0-1ubuntu1 [77.8 kB]
Get:30 http://archive.ubuntu.com/ubuntu bionic/main amd64 librtmp1 amd64 2.4+20151223.gitfa8646d.1-1 [54.2 kB]
Get:31 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 libcurl3-gnutls amd64 7.58.0-2ubuntu3.5 [212 kB]
Get:32 http://archive.ubuntu.com/ubuntu bionic/main amd64 libsasl2-modules amd64 2.1.27~101-g0780600+dfsg-3ubuntu2 [48.7 kB]
Get:33 http://archive.ubuntu.com/ubuntu bionic/universe amd64 s3fs amd64 1.82-1 [200 kB]
debconf: delaying package configuration, since apt-utils is not installed
Fetched 13.3 MB in 2s (8077 kB/s)
Selecting previously unselected package libssl1.1:amd64.
(Reading database ... 4458 files and directories currently installed.)
Preparing to unpack .../00-libssl1.1_1.1.0g-2ubuntu4.3_amd64.deb ...
Unpacking libssl1.1:amd64 (1.1.0g-2ubuntu4.3) ...
Selecting previously unselected package openssl.
Preparing to unpack .../01-openssl_1.1.0g-2ubuntu4.3_amd64.deb ...
Unpacking openssl (1.1.0g-2ubuntu4.3) ...
Selecting previously unselected package ca-certificates.
Preparing to unpack .../02-ca-certificates_20180409_all.deb ...
Unpacking ca-certificates (20180409) ...
Selecting previously unselected package libmagic-mgc.
Preparing to unpack .../03-libmagic-mgc_1%3a5.32-2ubuntu0.1_amd64.deb ...
Unpacking libmagic-mgc (1:5.32-2ubuntu0.1) ...
Selecting previously unselected package libmagic1:amd64.
Preparing to unpack .../04-libmagic1_1%3a5.32-2ubuntu0.1_amd64.deb ...
Unpacking libmagic1:amd64 (1:5.32-2ubuntu0.1) ...
Selecting previously unselected package file.
Preparing to unpack .../05-file_1%3a5.32-2ubuntu0.1_amd64.deb ...
Unpacking file (1:5.32-2ubuntu0.1) ...
Selecting previously unselected package libicu60:amd64.
Preparing to unpack .../06-libicu60_60.2-3ubuntu3_amd64.deb ...
Unpacking libicu60:amd64 (60.2-3ubuntu3) ...
Selecting previously unselected package libsqlite3-0:amd64.
Preparing to unpack .../07-libsqlite3-0_3.22.0-1_amd64.deb ...
Unpacking libsqlite3-0:amd64 (3.22.0-1) ...
Selecting previously unselected package libxml2:amd64.
Preparing to unpack .../08-libxml2_2.9.4+dfsg1-6.1ubuntu1.2_amd64.deb ...
Unpacking libxml2:amd64 (2.9.4+dfsg1-6.1ubuntu1.2) ...
Selecting previously unselected package mime-support.
Preparing to unpack .../09-mime-support_3.60ubuntu1_all.deb ...
Unpacking mime-support (3.60ubuntu1) ...
Selecting previously unselected package xz-utils.
Preparing to unpack .../10-xz-utils_5.2.2-1.3_amd64.deb ...
Unpacking xz-utils (5.2.2-1.3) ...
Selecting previously unselected package libfuse2:amd64.
Preparing to unpack .../11-libfuse2_2.9.7-1ubuntu1_amd64.deb ...
Unpacking libfuse2:amd64 (2.9.7-1ubuntu1) ...
Selecting previously unselected package fuse.
Preparing to unpack .../12-fuse_2.9.7-1ubuntu1_amd64.deb ...
Unpacking fuse (2.9.7-1ubuntu1) ...
Selecting previously unselected package libpsl5:amd64.
Preparing to unpack .../13-libpsl5_0.19.1-5build1_amd64.deb ...
Unpacking libpsl5:amd64 (0.19.1-5build1) ...
Selecting previously unselected package publicsuffix.
Preparing to unpack .../14-publicsuffix_20180223.1310-1_all.deb ...
Unpacking publicsuffix (20180223.1310-1) ...
Selecting previously unselected package libroken18-heimdal:amd64.
Preparing to unpack .../15-libroken18-heimdal_7.5.0+dfsg-1_amd64.deb ...
Unpacking libroken18-heimdal:amd64 (7.5.0+dfsg-1) ...
Selecting previously unselected package libasn1-8-heimdal:amd64.
Preparing to unpack .../16-libasn1-8-heimdal_7.5.0+dfsg-1_amd64.deb ...
Unpacking libasn1-8-heimdal:amd64 (7.5.0+dfsg-1) ...
Selecting previously unselected package libheimbase1-heimdal:amd64.
Preparing to unpack .../17-libheimbase1-heimdal_7.5.0+dfsg-1_amd64.deb ...
Unpacking libheimbase1-heimdal:amd64 (7.5.0+dfsg-1) ...
Selecting previously unselected package libhcrypto4-heimdal:amd64.
Preparing to unpack .../18-libhcrypto4-heimdal_7.5.0+dfsg-1_amd64.deb ...
Unpacking libhcrypto4-heimdal:amd64 (7.5.0+dfsg-1) ...
Selecting previously unselected package libwind0-heimdal:amd64.
Preparing to unpack .../19-libwind0-heimdal_7.5.0+dfsg-1_amd64.deb ...
Unpacking libwind0-heimdal:amd64 (7.5.0+dfsg-1) ...
Selecting previously unselected package libhx509-5-heimdal:amd64.
Preparing to unpack .../20-libhx509-5-heimdal_7.5.0+dfsg-1_amd64.deb ...
Unpacking libhx509-5-heimdal:amd64 (7.5.0+dfsg-1) ...
Selecting previously unselected package libkrb5-26-heimdal:amd64.
Preparing to unpack .../21-libkrb5-26-heimdal_7.5.0+dfsg-1_amd64.deb ...
Unpacking libkrb5-26-heimdal:amd64 (7.5.0+dfsg-1) ...
Selecting previously unselected package libheimntlm0-heimdal:amd64.
Preparing to unpack .../22-libheimntlm0-heimdal_7.5.0+dfsg-1_amd64.deb ...
Unpacking libheimntlm0-heimdal:amd64 (7.5.0+dfsg-1) ...
Selecting previously unselected package libgssapi3-heimdal:amd64.
Preparing to unpack .../23-libgssapi3-heimdal_7.5.0+dfsg-1_amd64.deb ...
Unpacking libgssapi3-heimdal:amd64 (7.5.0+dfsg-1) ...
Selecting previously unselected package libsasl2-modules-db:amd64.
Preparing to unpack .../24-libsasl2-modules-db_2.1.27~101-g0780600+dfsg-3ubuntu2_amd64.deb ...
Unpacking libsasl2-modules-db:amd64 (2.1.27~101-g0780600+dfsg-3ubuntu2) ...
Selecting previously unselected package libsasl2-2:amd64.
Preparing to unpack .../25-libsasl2-2_2.1.27~101-g0780600+dfsg-3ubuntu2_amd64.deb ...
Unpacking libsasl2-2:amd64 (2.1.27~101-g0780600+dfsg-3ubuntu2) ...
Selecting previously unselected package libldap-common.
Preparing to unpack .../26-libldap-common_2.4.45+dfsg-1ubuntu1.1_all.deb ...
Unpacking libldap-common (2.4.45+dfsg-1ubuntu1.1) ...
Selecting previously unselected package libldap-2.4-2:amd64.
Preparing to unpack .../27-libldap-2.4-2_2.4.45+dfsg-1ubuntu1.1_amd64.deb ...
Unpacking libldap-2.4-2:amd64 (2.4.45+dfsg-1ubuntu1.1) ...
Selecting previously unselected package libnghttp2-14:amd64.
Preparing to unpack .../28-libnghttp2-14_1.30.0-1ubuntu1_amd64.deb ...
Unpacking libnghttp2-14:amd64 (1.30.0-1ubuntu1) ...
Selecting previously unselected package librtmp1:amd64.
Preparing to unpack .../29-librtmp1_2.4+20151223.gitfa8646d.1-1_amd64.deb ...
Unpacking librtmp1:amd64 (2.4+20151223.gitfa8646d.1-1) ...
Selecting previously unselected package libcurl3-gnutls:amd64.
Preparing to unpack .../30-libcurl3-gnutls_7.58.0-2ubuntu3.5_amd64.deb ...
Unpacking libcurl3-gnutls:amd64 (7.58.0-2ubuntu3.5) ...
Selecting previously unselected package libsasl2-modules:amd64.
Preparing to unpack .../31-libsasl2-modules_2.1.27~101-g0780600+dfsg-3ubuntu2_amd64.deb ...
Unpacking libsasl2-modules:amd64 (2.1.27~101-g0780600+dfsg-3ubuntu2) ...
Selecting previously unselected package s3fs.
Preparing to unpack .../32-s3fs_1.82-1_amd64.deb ...
Unpacking s3fs (1.82-1) ...
Setting up libicu60:amd64 (60.2-3ubuntu3) ...
Setting up libnghttp2-14:amd64 (1.30.0-1ubuntu1) ...
Setting up mime-support (3.60ubuntu1) ...
Setting up libldap-common (2.4.45+dfsg-1ubuntu1.1) ...
Setting up libpsl5:amd64 (0.19.1-5build1) ...
Setting up libfuse2:amd64 (2.9.7-1ubuntu1) ...
Setting up libsasl2-modules-db:amd64 (2.1.27~101-g0780600+dfsg-3ubuntu2) ...
Setting up libsasl2-2:amd64 (2.1.27~101-g0780600+dfsg-3ubuntu2) ...
Setting up libroken18-heimdal:amd64 (7.5.0+dfsg-1) ...
Setting up librtmp1:amd64 (2.4+20151223.gitfa8646d.1-1) ...
Setting up libxml2:amd64 (2.9.4+dfsg1-6.1ubuntu1.2) ...
Setting up libmagic-mgc (1:5.32-2ubuntu0.1) ...
Setting up libmagic1:amd64 (1:5.32-2ubuntu0.1) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Setting up publicsuffix (20180223.1310-1) ...
Setting up libssl1.1:amd64 (1.1.0g-2ubuntu4.3) ...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/x86_64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7.)
debconf: falling back to frontend: Teletype
Setting up xz-utils (5.2.2-1.3) ...
update-alternatives: using /usr/bin/xz to provide /usr/bin/lzma (lzma) in auto mode
update-alternatives: warning: skip creation of /usr/share/man/man1/lzma.1.gz because associated file /usr/share/man/man1/xz.1.gz (of link group lzma) doesn't exist
update-alternatives: warning: skip creation of /usr/share/man/man1/unlzma.1.gz because associated file /usr/share/man/man1/unxz.1.gz (of link group lzma) doesn't exist
update-alternatives: warning: skip creation of /usr/share/man/man1/lzcat.1.gz because associated file /usr/share/man/man1/xzcat.1.gz (of link group lzma) doesn't exist
update-alternatives: warning: skip creation of /usr/share/man/man1/lzmore.1.gz because associated file /usr/share/man/man1/xzmore.1.gz (of link group lzma) doesn't exist
update-alternatives: warning: skip creation of /usr/share/man/man1/lzless.1.gz because associated file /usr/share/man/man1/xzless.1.gz (of link group lzma) doesn't exist
update-alternatives: warning: skip creation of /usr/share/man/man1/lzdiff.1.gz because associated file /usr/share/man/man1/xzdiff.1.gz (of link group lzma) doesn't exist
update-alternatives: warning: skip creation of /usr/share/man/man1/lzcmp.1.gz because associated file /usr/share/man/man1/xzcmp.1.gz (of link group lzma) doesn't exist
update-alternatives: warning: skip creation of /usr/share/man/man1/lzgrep.1.gz because associated file /usr/share/man/man1/xzgrep.1.gz (of link group lzma) doesn't exist
update-alternatives: warning: skip creation of /usr/share/man/man1/lzegrep.1.gz because associated file /usr/share/man/man1/xzegrep.1.gz (of link group lzma) doesn't exist
update-alternatives: warning: skip creation of /usr/share/man/man1/lzfgrep.1.gz because associated file /usr/share/man/man1/xzfgrep.1.gz (of link group lzma) doesn't exist
Setting up libheimbase1-heimdal:amd64 (7.5.0+dfsg-1) ...
Setting up openssl (1.1.0g-2ubuntu4.3) ...
Setting up libsqlite3-0:amd64 (3.22.0-1) ...
Setting up libsasl2-modules:amd64 (2.1.27~101-g0780600+dfsg-3ubuntu2) ...
Setting up ca-certificates (20180409) ...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (Can't locate Term/ReadLine.pm in @INC (you may need to install the Term::ReadLine module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/x86_64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base) at /usr/share/perl5/Debconf/FrontEnd/Readline.pm line 7.)
debconf: falling back to frontend: Teletype
Updating certificates in /etc/ssl/certs...
133 added, 0 removed; done.
Setting up fuse (2.9.7-1ubuntu1) ...
Setting up libwind0-heimdal:amd64 (7.5.0+dfsg-1) ...
Setting up libasn1-8-heimdal:amd64 (7.5.0+dfsg-1) ...
Setting up libhcrypto4-heimdal:amd64 (7.5.0+dfsg-1) ...
Setting up file (1:5.32-2ubuntu0.1) ...
Setting up libhx509-5-heimdal:amd64 (7.5.0+dfsg-1) ...
Setting up libkrb5-26-heimdal:amd64 (7.5.0+dfsg-1) ...
Setting up libheimntlm0-heimdal:amd64 (7.5.0+dfsg-1) ...
Setting up libgssapi3-heimdal:amd64 (7.5.0+dfsg-1) ...
Setting up libldap-2.4-2:amd64 (2.4.45+dfsg-1ubuntu1.1) ...
Setting up libcurl3-gnutls:amd64 (7.58.0-2ubuntu3.5) ...
Setting up s3fs (1.82-1) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Processing triggers for ca-certificates (20180409) ...
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
+ cp /root/bin/s3fs /host/usr/local/bin/
cp: cannot create regular file '/host/usr/local/bin/': Not a directory
I was taking a look at
GKE should use Container-Optimized OS underneath (cf. https://cloud.google.com/container-optimized-os/), and the open-source driver may not work on it without modification. If you want to deploy to GKE, you first have to make sure https://github.com/IBM/ibmcloud-object-storage-plugin works there. Since I don't have access to GKE, I cannot test or fix this for you. As for the general FfDL setup, it should deploy cleanly against IBM Cloud. Unfortunately, I have two hard deadlines at the end of the week, so I cannot look into deployment on Minikube and DIND right now. DIND 1.10 worked a while ago; I briefly tried to deploy against 1.12 not too long ago and also ran into problems.
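For anyone hitting the same `cp` error: `cp` reports "Not a directory" when the destination ends in `/` but no such directory exists, so a first check is whether the install target is present and writable on the node image. Below is a minimal sketch of such a check. The function name `check_install_dir` is made up for illustration and is not part of the driver's install script, and the demo call defaults to `/tmp` rather than the real host path.

```shell
#!/bin/sh
# check_install_dir DIR
# Hypothetical helper: reports whether DIR can receive a copied binary.
# Returns 0 if DIR exists and is writable, 1 if it is not a directory,
# 2 if it exists but is not writable by the current user.
check_install_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir is not a directory"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "read-only: $dir is not writable"
        return 2
    fi
    echo "ok: $dir"
    return 0
}

# Demo call. Set TARGET_DIR=/host/usr/local/bin inside the installer pod
# to check the path from the log above; /tmp is only a stand-in default.
check_install_dir "${TARGET_DIR:-/tmp}" || true
```

If the check prints `missing` or `read-only` for the host path, the install script would need a different target directory on that node image before the driver can work.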
Thanks for the advice on locating the issue. I'm still working on it.
@JunFugithub For minikube, could you look for the statefulset that is created and do a
apiVersion: apps/v1
kind: StatefulSet
metadata:
  creationTimestamp: 2018-12-20T14:53:36Z
  generation: 2
  labels:
    service: dlaas-learner
    training_id: training-pTOewHymR
    user_id: test-user
  name: learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1
  namespace: default
  resourceVersion: "22353"
  selfLink: /apis/apps/v1/namespaces/default/statefulsets/learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1
  uid: 0785179e-0467-11e9-a165-c2aacdd61c5f
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      service: dlaas-learner
      training_id: training-pTOewHymR
      user_id: test-user
  serviceName: learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/nvidiaGPU: '{ "AllocationPriority": "Dense"
          }'
        scheduler.alpha.kubernetes.io/tolerations: '[ { "key": "dedicated", "operator":
          "Equal", "value": "gpu-task" } ]'
      creationTimestamp: null
      labels:
        service: dlaas-learner
        training_id: training-pTOewHymR
        user_id: test-user
    spec:
      automountServiceAccountToken: false
      containers:
      - command:
        - bash
        - -c
        - "export PATH=/usr/local/bin/:$PATH; cp /entrypoint-files/*.sh /usr/local/bin/;
          chmod +x /usr/local/bin/*.sh;\n\t\t\tif [ ! -f /job/load-model.exit ]; then\n\t\t\t\twhile
          [ ! -f /job/load-model.start ]; do sleep 2; done ;\n\t\t\t\tdate \"+%s%N\"
          | cut -b1-13 > /job/load-model.start_time ;\n\t\t\t\t\n\t\t\techo \"Starting
          Training $TRAINING_ID\"\n\t\t\tmkdir -p \"$MODEL_DIR\" ;\n\t\t\tpython -m
          zipfile -e $RESULT_DIR/_submitted_code/model.zip $MODEL_DIR ;\n\t\t\t\techo
          $? > /job/load-model.exit ;\n\t\t\tfi\n\t\t\techo \"Done load-model\" ;\n\t\t\tif
          [ ! -f /job/learner.exit ]; then\n\t\t\t\twhile [ ! -f /job/learner.start
          ]; do sleep 2; done ;\n\t\t\t\tdate \"+%s%N\" | cut -b1-13 > /job/learner.start_time
          ;\n\t\t\t\t\n\t\t\tfor i in ${!ALERTMANAGER*} ${!DLAAS*} ${!ETCD*} ${!GRAFANA*}
          ${!HOSTNAME*} ${!KUBERNETES*} ${!MONGO*} ${!PUSHGATEWAY*}; do unset $i;
          done;\n\t\t\texport LEARNER_ID=$((${DOWNWARD_API_POD_NAME##*-} + 1)) ;\n\t\t\tmkdir
          -p $RESULT_DIR/learner-$LEARNER_ID ;\n\t\t\tmkdir -p $CHECKPOINT_DIR ;bash
          -c 'train.sh >> $JOB_STATE_DIR/latest-log 2>&1 ; exit ${PIPESTATUS[0]}'
          ;\n\t\t\t\techo $? > /job/learner.exit ;\n\t\t\tfi\n\t\t\techo \"Done learner\"
          ;\n\t\t\tif [ ! -f /job/store-logs.exit ]; then\n\t\t\t\twhile [ ! -f /job/store-logs.start
          ]; do sleep 2; done ;\n\t\t\t\tdate \"+%s%N\" | cut -b1-13 > /job/store-logs.start_time
          ;\n\t\t\t\t\n\t\t\techo Calling copy logs.\n\t\t\tmv -nf $LOG_DIR/* $RESULT_DIR/learner-$LEARNER_ID
          ;\n\t\t\tERROR_CODE=$? ;\n\t\t\techo $ERROR_CODE > $RESULT_DIR/learner-$LEARNER_ID/.log-copy-complete
          ;\n\t\t\tbash -c 'exit $ERROR_CODE' ;\n\t\t\t\techo $? > /job/store-logs.exit
          ;\n\t\t\tfi\n\t\t\techo \"Done store-logs\" ;\n\t\twhile true; do sleep
          2; done ;"
        env:
        - name: DATA_DIR
          value: /mnt/data/tf_training_data
        - name: LOG_DIR
          value: /job/logs
        - name: RESULT_DIR
          value: /mnt/results/tf_trained_model/training-pTOewHymR
        - name: MODEL_DIR
          value: /job/model-code
        - name: TRAINING_COMMAND
          value: 'python3 convolutional_network.py --trainImagesFile ${DATA_DIR}/train-images-idx3-ubyte.gz --trainLabelsFile
            ${DATA_DIR}/train-labels-idx1-ubyte.gz --testImagesFile ${DATA_DIR}/t10k-images-idx3-ubyte.gz --testLabelsFile
            ${DATA_DIR}/t10k-labels-idx1-ubyte.gz --learningRate 0.001 --trainingIters
            2000 '
        - name: TRAINING_ID
          value: training-pTOewHymR
        - name: GPU_COUNT
          value: "0.000000"
        - name: DOWNWARD_API_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: DOWNWARD_API_POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: LEARNER_NAME_PREFIX
          value: learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1
        - name: TRAINING_ID
          value: training-pTOewHymR
        - name: NUM_LEARNERS
          value: "1"
        - name: JOB_STATE_DIR
          value: /job
        - name: CHECKPOINT_DIR
          value: /mnt/results/tf_trained_model/_wml_checkpoints
        - name: RESULT_BUCKET_DIR
          value: /mnt/results/tf_trained_model
        image: tensorflow/tensorflow:1.5.0-py3
        imagePullPolicy: IfNotPresent
        name: learner
        ports:
        - containerPort: 22
          protocol: TCP
        - containerPort: 2222
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 1048576k
            nvidia.com/gpu: "0"
          requests:
            cpu: 500m
            memory: 1048576k
            nvidia.com/gpu: "0"
        securityContext:
          capabilities:
            drop:
            - CHOWN
            - DAC_OVERRIDE
            - FOWNER
            - FSETID
            - KILL
            - SETPCAP
            - NET_RAW
            - MKNOD
            - SETFCAP
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /mnt/data/tf_training_data
          name: cosinputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1
        - mountPath: /mnt/results/tf_trained_model
          name: cosoutputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1
        - mountPath: /job
          name: jobdata
          subPath: training-pTOewHymR
        - mountPath: /entrypoint-files
          name: learner-entrypoint-files
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: regcred
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Equal
        value: gpu-task
      volumes:
      - flexVolume:
          driver: ibm/ibmc-s3fs
          options:
            bucket: tf_training_data
            cache-size-gb: "0"
            chunk-size-mb: "52"
            curl-debug: "false"
            debug-level: warn
            endpoint: http://192.168.64.25:31971
            ensure-disk-free: "0"
            kernel-cache: "true"
            multireq-max: "20"
            parallel-count: "5"
            region: us-standard
            s3fs-fuse-retry-count: "30"
            stat-cache-size: "100000"
            tls-cipher-suite: DEFAULT
          secretRef:
            name: cossecretdata-3a77bbc9-7418-44d6-7797-e697a1d43fd1
        name: cosinputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1
      - flexVolume:
          driver: ibm/ibmc-s3fs
          options:
            bucket: tf_trained_model
            cache-size-gb: "0"
            chunk-size-mb: "52"
            curl-debug: "false"
            debug-level: warn
            endpoint: http://192.168.64.25:31971
            ensure-disk-free: "2048"
            kernel-cache: "false"
            multireq-max: "20"
            parallel-count: "2"
            region: us-standard
            s3fs-fuse-retry-count: "30"
            stat-cache-size: "100000"
            tls-cipher-suite: DEFAULT
          secretRef:
            name: cossecretresults-3a77bbc9-7418-44d6-7797-e697a1d43fd1
        name: cosoutputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1
      - configMap:
          defaultMode: 420
          name: learner-entrypoint-files
        name: learner-entrypoint-files
      - name: jobdata
        persistentVolumeClaim:
          claimName: learner-1
  updateStrategy:
    type: OnDelete
status:
  collisionCount: 0
  currentRevision: learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1-7df856b884
  observedGeneration: 2
  replicas: 1
  updateRevision: learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1-5dc4cfdf78
  updatedReplicas: 1
$ kubectl describe po learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m default-scheduler Successfully assigned learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1-0 to minikube
Normal SuccessfulMountVolume 9m kubelet, minikube MountVolume.SetUp succeeded for volume "learner-entrypoint-files"
Normal SuccessfulMountVolume 9m kubelet, minikube MountVolume.SetUp succeeded for volume "hostpathtest"
Warning FailedMount 5m (x2 over 7m) kubelet, minikube Unable to mount volumes for pod "learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1-0_default(06ed9d78-0529-11e9-a165-c2aacdd61c5f)": timeout expired waiting for volumes to attach or mount for pod "default"/"learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1-0". list of unmounted volumes=[cosinputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1 cosoutputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1]. list of unattached volumes=[cosinputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1 cosoutputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1 learner-entrypoint-files jobdata]
Warning FailedMount 3m (x11 over 9m) kubelet, minikube MountVolume.SetUp failed for volume "cosinputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1" : mount command failed, status: Failure, reason: Error mounting volume: s3fs mount failed:
Warning FailedMount 3m (x11 over 9m) kubelet, minikube MountVolume.SetUp failed for volume "cosoutputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1" : mount command failed, status: Failure, reason: Error mounting volume: s3fs mount failed:
# truncated, same results over and over again
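Since the same FailedMount events repeat, it can help to reduce the `kubectl describe` output to just the names of the volumes whose setup failed. The following is a small sketch using only `grep`, `sed`, and `sort`; the function name `failed_volumes` is made up for illustration, and it assumes the event text has the `MountVolume.SetUp failed for volume "<name>"` shape shown above.

```shell
#!/bin/sh
# failed_volumes
# Hypothetical helper: reads `kubectl describe po ...` output on stdin and
# prints the unique names of volumes whose MountVolume.SetUp failed.
failed_volumes() {
    # Keep only the matching fragment, then strip everything except the
    # quoted volume name, and de-duplicate the result.
    grep -o 'MountVolume.SetUp failed for volume "[^"]*"' |
        sed 's/.*volume "\(.*\)"/\1/' |
        sort -u
}

# Demo on two sample lines in the same shape as the events above.
printf '%s\n' \
    'Warning FailedMount 3m kubelet, minikube MountVolume.SetUp failed for volume "cosinputmount-example" : mount command failed' \
    'Normal SuccessfulMountVolume 9m kubelet, minikube MountVolume.SetUp succeeded for volume "learner-entrypoint-files"' |
    failed_volumes
```

In practice the input would come from something like `kubectl describe po <pod-name> | failed_volumes`. For the pod above, only the two flexVolume mounts (`cosinputmount-...` and `cosoutputmount-...`) should be listed, which points at the `ibm/ibmc-s3fs` driver rather than the configMap or PVC volumes.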
$ minikube logs
Dec 21 14:06:29 minikube kubelet[17241]: E1221 14:06:29.648527 17241 driver-call.go:258] mount command failed, status: Failure, reason: Error mounting volume: s3fs mount failed:
Dec 21 14:06:29 minikube kubelet[17241]: E1221 14:06:29.649222 17241 nestedpendingoperations.go:267] Operation for "\"flexvolume-ibm/ibmc-s3fs/06ed9d78-0529-11e9-a165-c2aacdd61c5f-cosinputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1\" (\"06ed9d78-0529-11e9-a165-c2aacdd61c5f\")" failed. No retries permitted until 2018-12-21 14:08:31.649187316 +0000 UTC m=+474.175452305 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"cosinputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1\" (UniqueName: \"flexvolume-ibm/ibmc-s3fs/06ed9d78-0529-11e9-a165-c2aacdd61c5f-cosinputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1\") pod \"learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1-0\" (UID: \"06ed9d78-0529-11e9-a165-c2aacdd61c5f\") : mount command failed, status: Failure, reason: Error mounting volume: s3fs mount failed: "
Dec 21 14:06:29 minikube kubelet[17241]: E1221 14:06:29.953671 17241 driver-call.go:258] mount command failed, status: Failure, reason: Error mounting volume: s3fs mount failed:
Dec 21 14:06:29 minikube kubelet[17241]: E1221 14:06:29.954160 17241 nestedpendingoperations.go:267] Operation for "\"flexvolume-ibm/ibmc-s3fs/06ed9d78-0529-11e9-a165-c2aacdd61c5f-cosoutputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1\" (\"06ed9d78-0529-11e9-a165-c2aacdd61c5f\")" failed. No retries permitted until 2018-12-21 14:08:31.954118575 +0000 UTC m=+474.480383265 (durationBeforeRetry 2m2s). Error: "MountVolume.SetUp failed for volume \"cosoutputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1\" (UniqueName: \"flexvolume-ibm/ibmc-s3fs/06ed9d78-0529-11e9-a165-c2aacdd61c5f-cosoutputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1\") pod \"learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1-0\" (UID: \"06ed9d78-0529-11e9-a165-c2aacdd61c5f\") : mount command failed, status: Failure, reason: Error mounting volume: s3fs mount failed: "
Dec 21 14:06:30 minikube kubelet[17241]: W1221 14:06:30.227089 17241 kubelet_pods.go:878] Unable to retrieve pull secret default/regcred for default/ffdl-trainer-858b8ccf95-fpttp due to secrets "regcred" not found. The image pull may not succeed.
Dec 21 14:06:34 minikube kubelet[17241]: E1221 14:06:34.923885 17241 kubelet.go:1635] Unable to mount volumes for pod "learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1-0_default(06ed9d78-0529-11e9-a165-c2aacdd61c5f)": timeout expired waiting for volumes to attach or mount for pod "default"/"learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1-0". list of unmounted volumes=[cosinputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1 cosoutputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1]. list of unattached volumes=[cosinputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1 cosoutputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1 learner-entrypoint-files jobdata]; skipping pod
Dec 21 14:06:34 minikube kubelet[17241]: E1221 14:06:34.924016 17241 pod_workers.go:186] Error syncing pod 06ed9d78-0529-11e9-a165-c2aacdd61c5f ("learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1-0_default(06ed9d78-0529-11e9-a165-c2aacdd61c5f)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"learner-3a77bbc9-7418-44d6-7797-e697a1d43fd1-0". list of unmounted volumes=[cosinputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1 cosoutputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1]. list of unattached volumes=[cosinputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1 cosoutputmount-3a77bbc9-7418-44d6-7797-e697a1d43fd1 learner-entrypoint-files jobdata]
Dec 21 14:06:42 minikube kubelet[17241]: W1221 14:06:42.227372 17241 kubelet_pods.go:878] Unable to retrieve pull secret default/regcred for default/ffdl-ui-55f5754ffb-d8msw due to secrets "regcred" not found. The image pull may not succeed.
Dec 21 14:07:19 minikube kubelet[17241]: W1221 14:07:19.227941 17241 kubelet_pods.go:878] Unable to retrieve pull secret default/regcred for default/jobmonitor-3a77bbc9-7418-44d6-7797-e697a1d43fd1-6c7d4d484942m5x due to secrets "regcred" not found. The image pull may not succeed.
Dec 21 14:07:19 minikube kubelet[17241]: W1221 14:07:19.231865 17241 kubelet_pods.go:878] Unable to retrieve pull secret default/regcred for default/lhelper-3a77bbc9-7418-44d6-7797-e697a1d43fd1-f7c7d96c5-4qqhh due to secrets "regcred" not found. The image pull may not succeed.
Dec 21 14:07:20 minikube kubelet[17241]: W1221 14:07:20.229235 17241 kubelet_pods.go:878] Unable to retrieve pull secret default/regcred for default/ffdl-restapi-6fc48bd5b5-wdwbr due to secrets "regcred" not found. The image pull may not succeed.
Dec 21 14:07:21 minikube kubelet[17241]: W1221 14:07:21.227647 17241 kubelet_pods.go:878] Unable to retrieve pull secret default/regcred for default/ffdl-lcm-6d96b5767b-g2nn6 due to secrets "regcred" not found. The image pull may not succeed.
Dec 21 14:07:23 minikube kubelet[17241]: W1221 14:07:23.226763 17241 kubelet_pods.go:878] Unable to retrieve pull secret default/regcred for default/ffdl-trainingdata-c57f5cddd-bsfm4 due to secrets "regcred" not found. The image pull may not succeed.
Dec 21 14:07:31 minikube kubelet[17241]: W1221 14:07:31.227217 17241 kubelet_pods.go:878] Unable to retrieve pull secret default/regcred for default/ffdl-trainer-858b8ccf95-fpttp due to secrets "regcred" not found. The image pull may not succeed.
Dec 21 14:07:47 minikube kubelet[17241]: W1221 14:07:47.226982 17241 kubelet_pods.go:878] Unable to retrieve pull secret default/regcred for default/ffdl-ui-55f5754ffb-d8msw due to secrets "regcred" not found. The image pull may not succeed.
Dec 21 14:08:23 minikube kubelet[17241]: W1221 14:08:23.233435 17241 kubelet_pods.go:878] Unable to retrieve pull secret default/regcred for default/lhelper-3a77bbc9-7418-44d6-7797-e697a1d43fd1-f7c7d96c5-4qqhh due to secrets "regcred" not found. The image pull may not succeed.
Dec 21 14:08:25 minikube kubelet[17241]: W1221 14:08:25.230748 17241 kubelet_pods.go:878] Unable to retrieve pull secret default/regcred for default/ffdl-trainingdata-c57f5cddd-bsfm4 due to secrets "regcred" not found. The image pull may not succeed.
Dec 21 14:08:27 minikube kubelet[17241]: W1221 14:08:27.226867 17241 kubelet_pods.go:878] Unable to retrieve pull secret default/regcred for default/ffdl-restapi-6fc48bd5b5-wdwbr due to secrets "regcred" not found. The image pull may not succeed.

Steps to reproduce:

$ minikube start --insecure-registry 9.0.0.0/8 --insecure-registry 10.0.0.0/8 \
    --cpus 4 \
    --memory 4096 --disk-size=40g \
    --vm-driver=hyperkit --apiserver-ips 127.0.0.1 --apiserver-name localhost --logtostderr
$ make deploy-plugin
$ make quickstart-deploy
$ make test-push-data-s3
$ make test-job-submit
$ kubectl logs ffdl-restapi-7f5c57c77d-lp4k2
time="2018-12-21T14:31:06Z" level=debug msg="Log level set to 'debug'"
time="2018-12-21T14:31:06Z" level=debug msg="Milli CPU is: 60"
time="2018-12-21T14:31:06Z" level=info msg="GetTrainingDataMemInMB() returns 300"
time="2018-12-21T14:31:06Z" level=debug msg="Training Data Mem in MB is: 300"
time="2018-12-21T14:31:06Z" level=debug msg="No config file 'config-dev.yml' found. Using environment variables only."
{"level":"info","msg":"DLaaS REST API v1 serving on :8080","time":"2018-12-21T14:31:10Z"}
{"level":"info","method":"POST","msg":"Started handling request","remote":"127.0.0.1:40906","request":"/v1/models?version=2017-02-13","time":"2018-12-21T14:43:46Z"}
{"level":"debug","msg":"Enter into auth handler","time":"2018-12-21T14:43:46Z"}
{"level":"debug","msg":"request: \u0026{Method:POST URL:/v1/models?version=2017-02-13 Proto:HTTP/1.1 ProtoMajor:1 ProtoMinor:1 Header:map[Accept:[application/json] Authorization:[Basic dGVzdC11c2VyOnRlc3Q=] Content-Type:[multipart/form-data; boundary=79f1ce044563b1c04bbc0fef5a4af5484d5361472883bc5af5b39e48168e] X-Watson-Userinfo:[bluemix-instance-id=test-user] Accept-Encoding:[gzip] User-Agent:[Go-http-client/1.1]] Body:0xc420374e00 GetBody:\u003cnil\u003e ContentLength:-1 TransferEncoding:[chunked] Close:false Host:localhost:32605 Form:map[] PostForm:map[] MultipartForm:\u003cnil\u003e Trailer:map[] RemoteAddr:127.0.0.1:40906 RequestURI:/v1/models?version=2017-02-13 TLS:\u003cnil\u003e Cancel:\u003cnil\u003e Response:\u003cnil\u003e ctx:0xc420374e40}","time":"2018-12-21T14:43:46Z"}
{"level":"debug","msg":"Writing to header in callBefore \"Access-Control-Allow-Origin: *\"","time":"2018-12-21T14:43:46Z"}
{"level":"debug","msg":"wmlTenantID: ","time":"2018-12-21T14:43:46Z"}
{"level":"debug","msg":"X-DLaaS-UserID: test-user","time":"2018-12-21T14:43:46Z"}
{"Accept":["application/json"],"Accept-Encoding":["gzip"],"Authorization":["Basic dGVzdC11c2VyOnRlc3Q="],"Content-Type":["multipart/form-data; boundary=79f1ce044563b1c04bbc0fef5a4af5484d5361472883bc5af5b39e48168e"],"User-Agent":["Go-http-client/1.1"],"X-Dlaas-Userid":["test-user"],"X-Watson-Userinfo":["bluemix-instance-id=test-user"],"level":"debug","msg":"Request headers:","time":"2018-12-21T14:43:46Z"}
{"caller_info":"server/models_impl.go:63 postModel -","level":"debug","model_filename":"manifest_testrun.yml","module":"rest-api","msg":"postModel invoked: map[Accept:[application/json] Authorization:[Basic dGVzdC11c2VyOnRlc3Q=] Content-Type:[multipart/form-data; boundary=79f1ce044563b1c04bbc0fef5a4af5484d5361472883bc5af5b39e48168e] X-Watson-Userinfo:[bluemix-instance-id=test-user] Accept-Encoding:[gzip] X-Dlaas-Userid:[test-user] User-Agent:[Go-http-client/1.1]]","time":"2018-12-21T14:43:46Z","user_id":"test-user"}
{"caller_info":"server/models_impl.go:59 postModel -","level":"debug","model_filename":"manifest_testrun.yml","module":"rest-api","msg":"Loading Manifest","time":"2018-12-21T14:43:46Z","user_id":"test-user"}
{"level":"info","msg":"dialing to target with scheme: \"\"","time":"2018-12-21T14:43:47Z"}
{"level":"info","msg":"ccResolverWrapper: sending new addresses to cc: [{ffdl-trainer.default.svc.cluster.local:80 0 \u003cnil\u003e}]","time":"2018-12-21T14:43:47Z"}
{"level":"info","msg":"ClientConn switching balancer to \"pick_first\"","time":"2018-12-21T14:43:47Z"}
{"level":"info","msg":"pickfirstBalancer: HandleSubConnStateChange: 0xc420281ea0, CONNECTING","time":"2018-12-21T14:43:47Z"}
{"level":"info","msg":"pickfirstBalancer: HandleSubConnStateChange: 0xc420281ea0, READY","time":"2018-12-21T14:43:47Z"}
{"caller_info":"server/manifest.go:237 manifest2TrainingRequest -","level":"debug","model_filename":"manifest_testrun.yml","module":"rest-api","msg":"EMExtractionSpec ImageTag: ","time":"2018-12-21T14:43:47Z","user_id":"test-user"}
{"caller_info":"server/models_impl.go:117 postModel -","error":"rpc error: code = Canceled desc = context canceled","level":"error","model_filename":"manifest_testrun.yml","module":"rest-api","msg":"Trainer service call failed","time":"2018-12-21T14:43:56Z","user_id":"test-user"}
{"caller_info":"server/models_impl.go:857 error500 -","level":"error","model_filename":"manifest_testrun.yml","module":"rest-api","msg":"Returning 500 error: ","time":"2018-12-21T14:43:56Z","user_id":"test-user"}
{"level":"info","measure#rest-api.latency":9943356700,"method":"POST","msg":"Completed handling request","remote":"127.0.0.1:40906","request":"/v1/models?version=2017-02-13","status":500,"text_status":"Internal Server Error","time":"2018-12-21T14:43:56Z","took":9943356700}

Yesterday @fplk mentioned the version of dind. I downloaded dind 1.10.9, but gave up on it because the installation failed.

FYI, there are two more things I'd like to mention. I left the environment variable SHARED_VOLUME_STORAGE_CLASS empty under both the minikube and dind VMs; I hope that is unrelated to this problem. The S3 service itself works well: I checked the S3 buckets, and they do contain the training data after the command

@sboagibm Thanks a lot for any suggestions.
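For anyone triaging similar kubelet output, the volumes the kubelet is still waiting on can be pulled out of the "list of unmounted volumes=[...]" field seen in the log lines above. The following is only a minimal sketch; the function name `unmounted_volumes` is made up for illustration, and it assumes the log format matches the lines quoted in this issue.

```shell
#!/bin/sh
# Sketch: read kubelet log lines on stdin and print each distinct volume name
# found in a "list of unmounted volumes=[...]" field, one per line.
# Assumption: the field looks exactly like the kubelet lines quoted in this issue.
unmounted_volumes() {
  grep -o 'list of unmounted volumes=\[[^]]*]' \
    | sed -e 's/^list of unmounted volumes=\[//' -e 's/]$//' \
    | tr -s ' ' '\n' \
    | sed '/^$/d' \
    | sort -u
}
```

It could be fed from something like `minikube logs | unmounted_volumes`. In the log above this would list only the two COS mounts, which suggests the problem lies with the S3 volume plugin rather than with the `learner-entrypoint-files` or `jobdata` volumes.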
I apologize for the delay due to the holidays. I think I can reproduce the error you encountered and have been able to get it working. A couple of things:

a) The scripts in https://github.com/IBM/FfDL/tree/master/bin/dind_scripts should largely work, with the exception that you need to update DIND in launch_kubernetes.sh. RawGit is deprecated and 1.9 is old, so I successfully used https://github.com/kubernetes-sigs/kubeadm-dind-cluster/releases/download/v0.1.0/dind-cluster-v1.13.sh (just replace all occurrences in the file accordingly).

b) Here are my steps:

ssh root@<machine>.sl.cloud9.ibm.com
apt install -y git software-properties-common
mkdir -p /home/ffdlr/go/src/github.com/IBM/ && cd $_ && git clone https://github.com/IBM/FfDL.git && cd FfDL
# Replace DIND version as explained in (a)
cd bin/dind_scripts/
chmod +x create_user.sh
. create_user.sh
# Enter new password and get kicked out
ssh ffdlr@<machine>.sl.cloud9.ibm.com
cd /home/ffdlr/go/src/github.com/IBM/FfDL/bin/dind_scripts/
sudo chmod +x experimental_master.sh
. experimental_master.sh

Build own manifest with:

Run with

c) This should work, but is currently not the most user-friendly way of setting things up. I can try to push some changes to improve usability. From the outside it looks like the manifest creation gets garbled up somewhere, but until then this should get you running.

d) Minikube is a suboptimal environment due to unfixed storage bugs on their side. FfDL should be able to run against GKE in principle, but I'm not sure the open source S3 driver will work against that. I think the storage team supports DIND and IBM Cloud, and their architecture should work against any cloud provider, but you would have to test the driver manually and PR minor changes if it does not work against your target provider out of the box. Or use a different driver.
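The replacement described in (a) could be scripted roughly as follows. This is only a sketch under assumptions: the function name `update_dind_reference` is hypothetical, and it assumes the old references in launch_kubernetes.sh are either a URL ending in `dind-cluster-v<version>.sh` or a bare `dind-cluster-v<version>.sh` file name. Check the file by hand if it refers to the script in some other way.

```shell
#!/bin/sh
# Sketch: point a launch script at the dind-cluster v1.13 release script.
# Assumption: old references look like "<some URL>/dind-cluster-v<version>.sh"
# or a bare "dind-cluster-v<version>.sh" file name.

NEW_URL='https://github.com/kubernetes-sigs/kubeadm-dind-cluster/releases/download/v0.1.0/dind-cluster-v1.13.sh'
NEW_NAME='dind-cluster-v1.13.sh'

update_dind_reference() {
  _file=$1
  _tmp=$(mktemp) || return 1
  # Rewrite full URLs first, then any remaining bare file names.
  # The result is copied back with cat so the file keeps its permissions.
  sed -E \
    -e "s|https?://[^[:space:]]*dind-cluster-v[0-9.]+\.sh|${NEW_URL}|g" \
    -e "s|dind-cluster-v[0-9.]+\.sh|${NEW_NAME}|g" \
    "$_file" > "$_tmp" && cat "$_tmp" > "$_file"
  _status=$?
  rm -f "$_tmp"
  return $_status
}
```

Usage would be something like `update_dind_reference bin/dind_scripts/launch_kubernetes.sh`, followed by `git diff` to review what changed before running the script.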
OK, with #158 I can deploy FfDL with the typical 4 commands from the README against DIND on macOS and Linux as well as IBM Cloud. Please let me know if it helps.
What happened:
Hi there, thanks a lot for your work; it's impressive. I tried to deploy FfDL on a local MINIKUBE and a local DIND, but neither of them worked properly. I have been stuck on this issue for a few days, so I'd like to ask you for help. By chance I found something similar to my issue in your docs, but under different conditions, namely:
minikube
Encountered the issue recorded in DIND-TRAING: all pods worked as expected except the learner pod, which stays in Pending status indefinitely because of the following warning.
And here are the details of pod learner-x
dind
Encountered a FAILED error with no hint while training. All the pods were running, but there were no jobmonitor, learner, or lhelper pods.
What you expected to happen:
Make FfDL work properly on either local DIND or MINIKUBE.
Environment:
OS: Darwin local 17.4.0 Darwin Kernel Version 17.4.0:
MINIKUBE:
How to reproduce it (as minimally and precisely as possible):
I just followed README.md and ran several make commands.
Anything else we need to know?:
In situation 2, I followed the above-mentioned steps exactly.
In situation 1, it first reported an NFS error. I remembered reading in one of the docs that, for persistent volumes, MINIKUBE supports only the
hostpath
type, so I created a PV and a PVC; here are the details.
Thanks in advance for any advice, and have a good day.