An implementation of the code from the paper.
However, the paper is ambiguous about some architectural settings, such as the number of filters in each Inception module, so I took those settings from here. I also did not implement Label Smoothing Regularization. The reported gain is small (a 0.2% reduction in top-1 error), and more experiments would be needed to show that this regularization actually helps against overfitting.
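Label Smoothing Regularization is not part of this implementation. For reference, the sketch below shows the usual formulation: the one-hot target for label y is mixed with a uniform distribution over the K classes, q'(k) = (1 - ε)·δ(k, y) + ε/K, and the cross-entropy is computed against this smoothed target. It is a minimal pure-Python illustration, assuming a uniform prior u(k) = 1/K and a default smoothing factor ε = 0.1; the function names are hypothetical and not taken from this repo.

```python
import math


def smooth_labels(label, num_classes, epsilon=0.1):
    """Smoothed target: q'(k) = (1 - epsilon) * [k == label] + epsilon / K."""
    return [
        (1.0 - epsilon) * (1.0 if k == label else 0.0) + epsilon / num_classes
        for k in range(num_classes)
    ]


def log_softmax(logits):
    """Numerically stable log-softmax (subtract the max before exponentiating)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def lsr_cross_entropy(logits, label, epsilon=0.1):
    """Cross-entropy between the smoothed target and the softmax of the logits."""
    targets = smooth_labels(label, len(logits), epsilon)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


if __name__ == "__main__":
    # With 100 classes and epsilon = 0.1, the true class gets 0.9 + 0.001
    # and every other class gets 0.001.
    target = smooth_labels(3, 100)
    print(target[3], target[0])
    print(lsr_cross_entropy([2.0, 0.5, -1.0], 0))
```

Setting `epsilon=0.0` recovers the ordinary one-hot cross-entropy, which makes it easy to compare runs with and without the regularization.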
The accuracy and loss (on 100 classes) are shown below: