add "datahub not loading" error
lillianw101 committed Mar 20, 2024
1 parent bb61b61 commit 9eb78a9
Showing 6 changed files with 19 additions and 4 deletions.
Binary file modified docs/Data-100-Debugging-Guide.pdf
Binary file not shown.
5 changes: 5 additions & 0 deletions docs/jupyter_datahub/jupyter_datahub.html
@@ -171,6 +171,7 @@ <h2 id="toc-title">Table of contents</h2>
<li><a href="#click-here-to-download-zip-file-is-not-working" id="toc-click-here-to-download-zip-file-is-not-working" class="nav-link" data-scroll-target="#click-here-to-download-zip-file-is-not-working">“Click <u>here</u> to download zip file” is not working</a></li>
<li><a href="#i-cant-export-my-assignment-as-a-pdf-due-to-a-latexfailed-error" id="toc-i-cant-export-my-assignment-as-a-pdf-due-to-a-latexfailed-error" class="nav-link" data-scroll-target="#i-cant-export-my-assignment-as-a-pdf-due-to-a-latexfailed-error">I can’t export my assignment as a PDF due to a <code>LatexFailed</code> error</a></li>
<li><a href="#i-cant-open-jupyter-http-error-431" id="toc-i-cant-open-jupyter-http-error-431" class="nav-link" data-scroll-target="#i-cant-open-jupyter-http-error-431">I can’t open Jupyter: <code>HTTP ERROR 431</code></a></li>
<li><a href="#datahub-is-not-loading" id="toc-datahub-is-not-loading" class="nav-link" data-scroll-target="#datahub-is-not-loading">Datahub is not loading</a></li>
</ul>
</nav>
</div>
@@ -268,6 +269,10 @@ <h2 class="anchored" data-anchor-id="i-cant-export-my-assignment-as-a-pdf-due-to
<section id="i-cant-open-jupyter-http-error-431" class="level2">
<h2 class="anchored" data-anchor-id="i-cant-open-jupyter-http-error-431">I can’t open Jupyter: <code>HTTP ERROR 431</code></h2>
<p>If this happens, try <a href="https://support.google.com/accounts/answer/32050?hl=en&amp;co=GENIE.Platform%3DDesktop">clearing your browser cache</a> or opening Datahub in an incognito window.</p>
</section>
<section id="datahub-is-not-loading" class="level2">
<h2 class="anchored" data-anchor-id="datahub-is-not-loading">Datahub is not loading</h2>
<p>If your Datahub link is not loading, go to <a href="https://data100.datahub.berkeley.edu/hub/home">https://data100.datahub.berkeley.edu/hub/home</a> and restart your server.</p>


</section>
2 changes: 1 addition & 1 deletion docs/projA2/projA2.html
@@ -294,7 +294,7 @@ <h3 class="anchored" data-anchor-id="typeerror-nonetype-is-not-subscriptable"><c
<p>This error occurs when a <code>None</code> value is indexed like a <code>DataFrame</code> or list, for example <code>None["some_column"]</code>. It may be difficult to identify where the <code>NoneType</code> is coming from, but here are some possible causes:</p>
<ul>
<li>Check that your helper functions always end with a <code>return</code> statement and that the result is expected!</li>
<li><code>panda</code>s <code>inplace=</code> argument allows us to simplify code; instead of reassigning <code>df = df.an_operation(inplace=False)</code>, you can choose to shorten the operation as <code>df.an_operation(inplace=True)</code>. Note that any <code>inplace=True</code> argument modifies the <code>DataFrame</code> and <em>returns nothing</em>. Both <code>df = df.an_operation(inplace=True)</code> and <code>df.an_operation(inplace=True).another_operation()</code> will result in this <code>TypeError</code>.</li>
<li><code>pandas</code>’ <code>inplace=</code> argument allows us to simplify code; instead of reassigning <code>df = df.an_operation(inplace=False)</code>, you can choose to shorten the operation as <code>df.an_operation(inplace=True)</code>. Note that any <code>inplace=True</code> argument modifies the <code>DataFrame</code> and <em>returns nothing</em> (read more about it in <a href="https://stackoverflow.com/questions/45570984/in-pandas-is-inplace-true-considered-harmful-or-not">this Stack Overflow post</a>). Both <code>df = df.an_operation(inplace=True)</code> and <code>df.an_operation(inplace=True).another_operation()</code> will result in this <code>TypeError</code>.</li>
</ul>
<p>We suggest adding print statements to your function to find the <code>None</code> values.</p>
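The `inplace=True` pitfall above is easy to reproduce in a few lines. A minimal sketch, using `sort_values` as a stand-in for the guide's placeholder `an_operation`:

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]})

# Default (inplace=False): the operation returns a new DataFrame
out = df.sort_values("a")
print(type(out).__name__)  # DataFrame

# inplace=True: df itself is modified and the call returns None
out = df.sort_values("a", inplace=True)
print(out)  # None

# So chaining or subscripting after inplace=True fails:
# df.sort_values("a", inplace=True)["a"]
# TypeError: 'NoneType' object is not subscriptable
```

Printing `type(...)` of each intermediate result, as suggested above, is usually the fastest way to find which step produced the `None`.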
</section>
9 changes: 8 additions & 1 deletion docs/search.json
@@ -104,6 +104,13 @@
"section": "I can’t open Jupyter: HTTP ERROR 431",
"text": "I can’t open Jupyter: HTTP ERROR 431\nIf this happens, try clearing your browser cache or opening Datahub in an incognito window."
},
{
"objectID": "jupyter_datahub/jupyter_datahub.html#datahub-is-not-loading",
"href": "jupyter_datahub/jupyter_datahub.html#datahub-is-not-loading",
"title": "Jupyter / Datahub",
"section": "Datahub is not loading",
    "text": "Datahub is not loading\nIf your Datahub link is not loading, go to https://data100.datahub.berkeley.edu/hub/home and restart your server."
},
{
"objectID": "autograder_gradescope/autograder_gradescope.html#autograder",
"href": "autograder_gradescope/autograder_gradescope.html#autograder",
@@ -221,7 +228,7 @@
"href": "projA2/projA2.html#questions-5d-and-5f",
"title": "Project A2 Common Questions",
"section": "Questions 5d and 5f",
"text": "Questions 5d and 5f\n\nGeneral Debugging Tips\nQuestion 5 is a challenging question that mirrors a lot of data science work in the real world: cleaning, exploring, and transforming data; fitting a model, working with a pre-defined pipeline and evaluating your model’s performance. Here are some general debugging tips to make the process easier:\n\nSeparate small tasks into helper functions, especially if you will execute them multiple times. For example, a helper function that one-hot encodes a categorical variable may be helpful as you could perform it on multiple such columns. If you’re parsing a column with RegEx, it also might be a good idea to separate it to a helper function. This allows you to verify that you’re not making errors in these small tasks and prevents unknown bugs from appearing.\nFeel free to make new cells to play with the data! As long as you delete them afterward, it will not affect the autograder.\nThe feature_engine_final function looks daunting at first, but start small. First, try and implement a model with a single feature to get familiar with how the pipeline works, then slowly experiment with adding one feature at a time and see how that affects your training RMSE.\n\n\n\nMy training RMSE is low, but my validation/test RMSE is high\nYour model is likely overfitting to the training data and does not generalize to the test set. Recall the bias-variance tradeoff discussed in lecture. As you add more features and make your model more complex, it is expected that your training error will decrease. Your validation and test error may also decrease initially, but if your model is too complex, you end up with high validation and test RMSE.\n\n\n\nTo decrease model complexity, consider visualizing the relationship between the features you’ve chosen with the (Log) Sale Price and removing features that are not highly correlated. Removing outliers can also help your model generalize better and prevent it from fitting to noise in the data. 
Methods like cross-validation allow you to get a better sense of where you lie along the validation error curve. Feel free to take a look at the code used in Lecture 16 if you’re confused on how to implement cross-validation.\n\n\nValueError: Per-column arrays must each be 1-dimensional\nIf you’re passing the tests for question 5d but getting this error in question 5f, then your Y variable is likely a DataFrame, not a Series. sklearn models like LinearRegression expect X to be a 2D datatype (ie. DataFrame, 2D NumPy array) and Y to be a 1D datatype (ie. Series, 1D NumPy array).\n\n\nKeyError: 'Sale Price'/KeyError: 'Log Sale Price'\nKeyErrors are raised when a column name does not exist in your DataFrame. You could be getting this error because:\n\nThe test set does not contain a \"(Log) Sale Price\" as that’s what we’re trying to predict. Make sure you only reference the \"(Log) Sale Price\" column when working with training data (is_test_set=False).\nYou dropped the \"Sale Price\" column twice in your preprocessing code.\n\n\n\nValue Error: could not convert string to float\nThis error occurs if your final design matric contains non-numeric columns. For example, if you simply run X = data.drop(columns = [\"Log Sale Price\", \"Sale Price\"]), all the non-numeric columns of data are still included in X and you will see this error message. The fit function of a lm.LinearRegression object can take a pandas DataFrame as the X argument, but requires that the DataFrame is only composed of numeric values.\n\n\nValueError: Input X contains infinity or a value too large for dtype('float64')\nThe reason why your X data contains infinity is likely because you are taking the logarithm of 0 somewhere in your code. To prevent this, try:\n\nAdding a small number to the features that you want to perform the log transformation on so that all values are positive and greater than 0. 
Note that whatever value you add to your train data should also be added to your test data.\nRemoving zeroes before taking the logarithm. Note that this is only possible on the training data as you cannot drop rows from the test set.\n\n\n\nValueError: Input X contains NaN\nThe reason why your design matrix X contains NaN values is likely because you take the log of a negative number somewhere in your code. To prevent this, try:\n\nShifting the range of values for features that you want to perform the logging operation on to positive values greater than 0. Note that whatever value you add to your train data should also be added to your test data.\nRemoving negative values before taking the log. Note that this is only possible on the training data as you cannot drop rows from the test set.\n\n\n\nValueError: The feature names should match those that were passed during fit\nThis error is followed by one or both of the following:\nFeature names unseen at fit time: \n- FEATURE NAME 1\n- FEATURE NAME 2\n ...\n\nFeature names seen at fit time, yet now missing\n- FEATURE NAME 1\n- FEATURE NAME 2\n ...\nThis error occurs if the columns/features you’re passing in for the test dataset aren’t the same as the features you used to train the model. sklearn’s models expect the testing data’s column names to match the training data’s. The features listed under Feature names unseen at fit time are columns that were present in the training data but not the testing data, and features listed under Feature names seen at fit time, yet now missing were present in the testing data but not the training data.\nPotential causes for this error:\n\nYour preprocessing for X is different for training and testing. Double-check your code in feature_engine_final! Besides removing any references to 'Sale Price' and code that would remove rows from the test set, your preprocessing should be the same.\nSome one-hot-encoded categories are present in training but not in testing (or vice versa). 
For example, let’s say that the feature \"Data100\" has categories “A”, “B”, “C”, and “D”. If “A”, “B”, and “C” are present in the training data, but “B”, “C”, and “D” are present in the testing data, you will get this error:\nThe feature names should match those that were passed during fit. Feature names unseen at fit time: \n- Data100_D\n ...\n\nFeature names seen at fit time, yet now missing\n- Data100_A\n\n\n\nValueError: operands could not be broadcast together with shapes ...\nThis error occurs when you attempt to perform an operation on two NumPy arrays with mismatched dimensions. For example, np.ones(100000) - np.ones(1000000) is not defined since you cannot perform elementwise addition on arrays with different lengths. Use the error traceback to identify which line is erroring, and print out the shape of the arrays on the line before using .shape.\n\n\nTypeError: NoneType is not subscriptable\nThis error occurs when a NoneType variable is being accessed like a class, for example None.some_function(). It may be difficult to identify where the NoneType is coming from, but here are some possible causes:\n\nCheck that your helper functions always end with a return statement and that the result is expected!\npanda’s inplace= argument allows us to simplify code; instead of reassigning df = df.an_operation(inplace=False), you can choose to shorten the operation as df.an_operation(inplace=True). Note that any inplace=True argument modifies the DataFrame and returns nothing. Both df = df.an_operation(inplace=True) and df.an_operation(inplace=True).another_operation() will result in this TypeError.\n\nWe suggest adding print statements to your function to find the None values."
"text": "Questions 5d and 5f\n\nGeneral Debugging Tips\nQuestion 5 is a challenging question that mirrors a lot of data science work in the real world: cleaning, exploring, and transforming data; fitting a model, working with a pre-defined pipeline and evaluating your model’s performance. Here are some general debugging tips to make the process easier:\n\nSeparate small tasks into helper functions, especially if you will execute them multiple times. For example, a helper function that one-hot encodes a categorical variable may be helpful as you could perform it on multiple such columns. If you’re parsing a column with RegEx, it also might be a good idea to separate it to a helper function. This allows you to verify that you’re not making errors in these small tasks and prevents unknown bugs from appearing.\nFeel free to make new cells to play with the data! As long as you delete them afterward, it will not affect the autograder.\nThe feature_engine_final function looks daunting at first, but start small. First, try and implement a model with a single feature to get familiar with how the pipeline works, then slowly experiment with adding one feature at a time and see how that affects your training RMSE.\n\n\n\nMy training RMSE is low, but my validation/test RMSE is high\nYour model is likely overfitting to the training data and does not generalize to the test set. Recall the bias-variance tradeoff discussed in lecture. As you add more features and make your model more complex, it is expected that your training error will decrease. Your validation and test error may also decrease initially, but if your model is too complex, you end up with high validation and test RMSE.\n\n\n\nTo decrease model complexity, consider visualizing the relationship between the features you’ve chosen with the (Log) Sale Price and removing features that are not highly correlated. Removing outliers can also help your model generalize better and prevent it from fitting to noise in the data. 
Methods like cross-validation allow you to get a better sense of where you lie along the validation error curve. Feel free to take a look at the code used in Lecture 16 if you’re confused on how to implement cross-validation.\n\n\nValueError: Per-column arrays must each be 1-dimensional\nIf you’re passing the tests for question 5d but getting this error in question 5f, then your Y variable is likely a DataFrame, not a Series. sklearn models like LinearRegression expect X to be a 2D datatype (i.e. DataFrame, 2D NumPy array) and Y to be a 1D datatype (i.e. Series, 1D NumPy array).\n\n\nKeyError: 'Sale Price'/KeyError: 'Log Sale Price'\nKeyErrors are raised when a column name does not exist in your DataFrame. You could be getting this error because:\n\nThe test set does not contain a \"(Log) Sale Price\" as that’s what we’re trying to predict. Make sure you only reference the \"(Log) Sale Price\" column when working with training data (is_test_set=False).\nYou dropped the \"Sale Price\" column twice in your preprocessing code.\n\n\n\nValueError: could not convert string to float\nThis error occurs if your final design matrix contains non-numeric columns. For example, if you simply run X = data.drop(columns = [\"Log Sale Price\", \"Sale Price\"]), all the non-numeric columns of data are still included in X and you will see this error message. The fit function of a lm.LinearRegression object can take a pandas DataFrame as the X argument, but requires that the DataFrame is only composed of numeric values.\n\n\nValueError: Input X contains infinity or a value too large for dtype('float64')\nThe reason why your X data contains infinity is likely because you are taking the logarithm of 0 somewhere in your code. To prevent this, try:\n\nAdding a small number to the features that you want to perform the log transformation on so that all values are positive and greater than 0. 
Note that whatever value you add to your train data should also be added to your test data.\nRemoving zeroes before taking the logarithm. Note that this is only possible on the training data as you cannot drop rows from the test set.\n\n\n\nValueError: Input X contains NaN\nThe reason why your design matrix X contains NaN values is likely because you take the log of a negative number somewhere in your code. To prevent this, try:\n\nShifting the range of values for features that you want to perform the logging operation on to positive values greater than 0. Note that whatever value you add to your train data should also be added to your test data.\nRemoving negative values before taking the log. Note that this is only possible on the training data as you cannot drop rows from the test set.\n\n\n\nValueError: The feature names should match those that were passed during fit\nThis error is followed by one or both of the following:\nFeature names unseen at fit time: \n- FEATURE NAME 1\n- FEATURE NAME 2\n ...\n\nFeature names seen at fit time, yet now missing\n- FEATURE NAME 1\n- FEATURE NAME 2\n ...\nThis error occurs if the columns/features you’re passing in for the test dataset aren’t the same as the features you used to train the model. sklearn’s models expect the testing data’s column names to match the training data’s. The features listed under Feature names unseen at fit time are columns that were present in the training data but not the testing data, and features listed under Feature names seen at fit time, yet now missing were present in the testing data but not the training data.\nPotential causes for this error:\n\nYour preprocessing for X is different for training and testing. Double-check your code in feature_engine_final! Besides removing any references to 'Sale Price' and code that would remove rows from the test set, your preprocessing should be the same.\nSome one-hot-encoded categories are present in training but not in testing (or vice versa). 
For example, let’s say that the feature \"Data100\" has categories “A”, “B”, “C”, and “D”. If “A”, “B”, and “C” are present in the training data, but “B”, “C”, and “D” are present in the testing data, you will get this error:\nThe feature names should match those that were passed during fit. Feature names unseen at fit time: \n- Data100_D\n ...\n\nFeature names seen at fit time, yet now missing\n- Data100_A\n\n\n\nValueError: operands could not be broadcast together with shapes ...\nThis error occurs when you attempt to perform an operation on two NumPy arrays with mismatched dimensions. For example, np.ones(100000) - np.ones(1000000) is not defined since you cannot perform elementwise addition on arrays with different lengths. Use the error traceback to identify which line is erroring, and print out the shape of the arrays on the line before using .shape.\n\n\nTypeError: NoneType is not subscriptable\nThis error occurs when a NoneType variable is being accessed like a class, for example None.some_function(). It may be difficult to identify where the NoneType is coming from, but here are some possible causes:\n\nCheck that your helper functions always end with a return statement and that the result is expected!\npandas’ inplace= argument allows us to simplify code; instead of reassigning df = df.an_operation(inplace=False), you can choose to shorten the operation as df.an_operation(inplace=True). Note that any inplace=True argument modifies the DataFrame and returns nothing (read more about it in this stack overflow post). Both df = df.an_operation(inplace=True) and df.an_operation(inplace=True).another_operation() will result in this TypeError.\n\nWe suggest adding print statements to your function to find the None values."
},
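The log-transform fixes described in the guide text above (shift before logging, apply the same shift to train and test) can be sketched in a few lines. A minimal sketch: the column name `area` and the shift constant `1.0` are illustrative assumptions, not part of the assignment:

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({"area": [0.0, 150.0, 2300.0]})

# np.log(0) is -inf, which later surfaces in sklearn as
# "ValueError: Input X contains infinity ..."
naive = np.log(train["area"])
print(np.isinf(naive).any())  # True

# Shift every value so it is strictly positive before logging;
# the SAME shift must be applied to the test data
shift = 1.0
train["log_area"] = np.log(train["area"] + shift)
print(np.isinf(train["log_area"]).any())  # False
```

The same pattern applies to the NaN variant of the error: `np.log` of a negative number produces NaN, so shift the feature's range above zero before transforming.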
{
"objectID": "projA2/projA2.html#question-6",
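The one-hot mismatch behind "The feature names should match those that were passed during fit" can be reproduced with the guide's "Data100" example. A minimal sketch: aligning the test matrix to the training columns with `reindex` is one common fix, not necessarily the staff solution:

```python
import pandas as pd

train = pd.DataFrame({"Data100": ["A", "B", "C"]})
test = pd.DataFrame({"Data100": ["B", "C", "D"]})

# One-hot encode each split independently -> mismatched columns
train_ohe = pd.get_dummies(train, columns=["Data100"])
test_ohe = pd.get_dummies(test, columns=["Data100"])
print(list(train_ohe.columns))  # ['Data100_A', 'Data100_B', 'Data100_C']
print(list(test_ohe.columns))   # ['Data100_B', 'Data100_C', 'Data100_D']

# Align test to the columns seen at fit time: unseen categories
# ('D') are dropped, missing ones ('A') become all-zero columns
test_aligned = test_ohe.reindex(columns=train_ohe.columns, fill_value=0)
print(list(test_aligned.columns))  # ['Data100_A', 'Data100_B', 'Data100_C']
```

Feeding `test_aligned` instead of `test_ohe` to a fitted sklearn model avoids the feature-name error, since the column names and order now match the training design matrix.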