Running into an issue when using the AdaNet `TPUEstimator`. Say, for example, the estimator is configured with `max_iteration_steps=500`, and we want to evaluate the model's performance during training after every 100 training steps (i.e. `steps_per_evaluation=100`) for 2 complete AdaNet iterations.
To achieve this, `estimator.train(input_fn=train_input, max_steps=max_steps)` followed by `estimator.evaluate(input_fn=eval_input)` are run in a loop, incrementing `max_steps` by `steps_per_evaluation` at the end of each pass, until `max_steps=1000` is reached (i.e. corresponding to 2 complete AdaNet iterations).
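A minimal sketch of that loop, assuming `estimator` is an already-constructed `adanet.TPUEstimator` and `train_input` / `eval_input` are standard Estimator `input_fn`s (the constant names and the eval step count here are illustrative, not from the original setup):

```python
# Illustrative constants; adjust to the actual configuration.
STEPS_PER_EVALUATION = 100
TOTAL_MAX_STEPS = 1000  # 2 AdaNet iterations x max_iteration_steps=500

max_steps = STEPS_PER_EVALUATION
while max_steps <= TOTAL_MAX_STEPS:
    # Train up to the next evaluation point. max_steps is a global
    # step count, so each call resumes from the latest checkpoint.
    estimator.train(input_fn=train_input, max_steps=max_steps)
    # TPUEstimator generally needs an explicit eval step count unless
    # eval_input raises OutOfRangeError on its own; 100 is a placeholder.
    metrics = estimator.evaluate(input_fn=eval_input, steps=100)
    print("Evaluated at step {}: {}".format(max_steps, metrics))
    max_steps += STEPS_PER_EVALUATION
```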
When running in local mode (i.e. `use_tpu=False`), training proceeds as expected: steps 0 to 500 for the first AdaNet iteration and steps 500 to 1000 for the second, with evaluation every 100 steps. However, when running on Cloud TPU (i.e. `use_tpu=True`), training reaches `max_steps=1000` without ever progressing to a second iteration. On the other hand, a single call of `estimator.train(input_fn=train_input, max_steps=1000)` on Cloud TPU, without the interleaved `estimator.evaluate`, completes 2 AdaNet iterations as expected, as in the sketch below.
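For reference, the working single-call baseline, under the same assumed names as above:

```python
# One uninterrupted train call with no interleaved evaluation; on
# Cloud TPU this completes both AdaNet iterations (steps 0-500 and
# 500-1000) as expected.
estimator.train(input_fn=train_input, max_steps=1000)
```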
This makes me think the issue lies with the evaluation call. What could the issue be? And if this is a `TPUEstimator`-specific issue, am I constrained to the standard `Estimator` if I want this kind of train-evaluate loop configuration?
@nicholasbreckwoldt: We just released `adanet==0.9.0`, which includes better TPU and TF 2 support. Please try installing it, and let us know if it resolves your issue.
@cweill Thanks for the update! I am running into a new issue with the upgrade to TF 2.2 and `adanet==0.9.0`, which has so far prevented me from establishing whether the evaluation issue above has been resolved. I've filed a description of the new issue as #157.