Make sure you enclose your code in triple backticks. Use this syntax, noticing the three backticks enclosing the code block:
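(any code or error output can go between the fences; the line below is just a placeholder)

````
```
your code or error output here
```
````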
For example, here is a posted traceback rendered with that formatting:
```
~/.conda/envs/tf-gpu/lib/python3.6/multiprocessing/popen_fork.py in __init__(self, process_obj)
     18         sys.stderr.flush()
     19         self.returncode = None
---> 20         self._launch(process_obj)
     21
     22     def duplicate_for_child(self, fd):
~/.conda/envs/tf-gpu/lib/python3.6/multiprocessing/popen_fork.py in _launch(self, process_obj)
     65         code = 1
     66         parent_r, child_w = os.pipe()
---> 67         self.pid = os.fork()
     68         if self.pid == 0:
     69             try:
OSError: [Errno 12] Cannot allocate memory
```
🔴 NOTE: Do NOT put your Jupyter Notebook under the /data/ directory! Here's the link for why.
The default location is under the dl1 folder, wherever you've cloned the repo on your GPU machine.

my example:
```
(fastai) paperspace@psgyqmt1m:~$ ls
anaconda3  data  downloads  fastai
```
- Paperspace: /home/paperspace/fastai/courses/dl1
- AWS: /home/ubuntu/fastai/courses/dl1
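To launch Jupyter from there (a typical workflow, assuming the standard Paperspace layout above):
```
cd ~/fastai/courses/dl1
jupyter notebook
```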
If you change the default location of your notebook, you'll need to update your .bashrc file. Add the path to where you've cloned the fastai GitHub repo:
- for me, my notebooks are in a "projects" directory: ~/projects
- my fastai repo is cloned at the root level, so it is here: ~/fastai

In the .bashrc file, add this path:
```
export PYTHONPATH=$PYTHONPATH:~/fastai
```
Reminder: don't forget to run (or source) your .bashrc file:
- add the path where the fastai repo is to .bashrc
- save and exit
- source it:
```
source ~/.bashrc
```
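To confirm Python now finds the library, a quick sanity check (assuming the repo is cloned at ~/fastai):
```
source ~/.bashrc
python -c "import fastai; print(fastai.__file__)"
```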
Note that if you did pip install, you don't need to specify the path (as in option 2), nor put your notebooks in the courses folder (as in option 1). However, fastai is still being updated, so there is a delay in the library being available directly via pip. You can try:
```
pip install https://github.com/fastai/fastai/archive/master.zip
```
my path:
```
PATH = "/home/ubuntu/data/dogscats/"
```
looking at my directory structure:
```
!tree {PATH} -d
/home/ubuntu/data/dogscats/
├── models
├── sample
│   ├── models
│   ├── tmp
│   ├── train
│   │   ├── cats
│   │   └── dogs
│   └── valid
│       ├── cats
│       └── dogs
├── test
├── train
│   ├── cats
│   └── dogs
└── valid
    ├── cats
    └── dogs
```
- models directory: created automatically
- sample directory: you create this with a small sub-sample, for testing code
- test directory: put any test data there if you have it
- train/test directories: you create these and separate the data using your own data sample
- tmp directory: if you have this, it was automatically created after running models
- fastai / keras code automatically picks up the labels of your categories based on your folders; hence, in this example, the two labels are dogs and cats (see the sketch after this list)
- the folder names themselves are not important; you can name them whatever you want
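As a rough illustration of that label pick-up, here is lesson-1 style (fastai 0.7) code for this layout; treat it as a sketch, with the sz and arch values just following the course notebook:
```
from fastai.conv_learner import *  # fastai 0.7 course-style import

PATH = "/home/ubuntu/data/dogscats/"
sz = 224          # image size for the pretrained model
arch = resnet34   # architecture used in lesson 1

# from_paths reads train/ and valid/, one subfolder per class
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
print(data.classes)  # ['cats', 'dogs'] -- inferred from the folder names
```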
looking at file counts:
```
# print the number of files in each folder
# (grep ^[^dt] drops ls -l's "total" line and any directory entries)
print("training data: cats")
!ls -l {PATH}train/cats | grep ^[^dt] | wc -l
print("training data: dogs")
!ls -l {PATH}train/dogs | grep ^[^dt] | wc -l
print("validation data: cats")
!ls -l {PATH}valid/cats | grep ^[^dt] | wc -l
print("validation data: dogs")
!ls -l {PATH}valid/dogs | grep ^[^dt] | wc -l
print("test data")
!ls -l {PATH}test1 | grep ^[^dt] | wc -l
```
my output:
```
training data: cats
11501
training data: dogs
11501
validation data: cats
1001
validation data: dogs
1001
test data
12501
```
For a train/validation split:
- can do 80/20

If you have or are creating a 'test' split, use these ratios for train/validation/test:
- can do 80/15/5
- can do 70/20/10
- can do 60/20/20
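If you need to create the validation split yourself, here is a minimal sketch that moves a random 20% of each class from train/ to valid/ (make_valid_split is a hypothetical helper, not from the course repo):
```
import os, random, shutil

def make_valid_split(path, label, frac=0.2, seed=42):
    """Move a random `frac` of train/<label> files into valid/<label>."""
    files = os.listdir(f"{path}train/{label}")
    random.seed(seed)
    picked = random.sample(files, int(len(files) * frac))
    os.makedirs(f"{path}valid/{label}", exist_ok=True)
    for name in picked:
        shutil.move(f"{path}train/{label}/{name}", f"{path}valid/{label}/{name}")

for label in ("cats", "dogs"):
    make_valid_split("/home/ubuntu/data/dogscats/", label)
```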
Note: Depending on who the instructor is, they use various naming conventions:
- train/test and then validation for holdout data
- train/validation and then test for holdout data
It's important to understand that:
- in the case of train/test, the test set is used to test for generalization
- the holdout data is a second test set
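As a hedged illustration of the holdout-as-second-test-set idea (the variable names here are generic, not from the course), scikit-learn's train_test_split can carve out the holdout first and then split the remainder:
```
from sklearn.model_selection import train_test_split

filenames = [f"img_{i}.jpg" for i in range(100)]  # stand-in for your file list

# carve out 20% as the holdout/test set first
trainval, test = train_test_split(filenames, test_size=0.2, random_state=42)
# then split the rest 75/25, giving 60/20/20 overall
train, valid = train_test_split(trainval, test_size=0.25, random_state=42)
print(len(train), len(valid), len(test))  # 60 20 20
```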
Instructions on using the scp command to transfer files between platforms