This repository has been archived by the owner on Aug 5, 2024. It is now read-only.

housker-columbia/CryptoClustering


Columbia AI Module 11 Challenge: CryptoClustering (Unsupervised Learning)

Requirements

Find the Best Value for k Using the Original Scaled DataFrame (15 points)

  • Code the elbow method algorithm to find the best value for k. Use a range from 1 to 11. (5 points)
  • Visually identify the optimal value for k by plotting a line chart of all the inertia values computed with the different values of k. (5 points)
  • Answer the following question: What’s the best value for k? (5 points)
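The elbow-method steps above can be sketched in Python as follows. This is a minimal sketch, not the author's notebook. It assumes scikit-learn's `KMeans` and `StandardScaler` (the README does not name the scaler). It also builds a small synthetic DataFrame as a hypothetical stand-in for the cryptocurrency data, so the printed inertia values are not the challenge's results. Python's `range(1, 11)` yields k = 1 through 10. Use `range(1, 12)` if 11 should be included.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the market data (random numbers, NOT the real
# dataset). Only the 24h and 7d column names come from this README; the rest are assumed.
COLUMNS = ["price_change_percentage_24h", "price_change_percentage_7d",
           "price_change_percentage_30d", "price_change_percentage_1y"]
rng = np.random.default_rng(0)
index = pd.Index([f"coin_{i}" for i in range(40)], name="coin_id")
market_df = pd.DataFrame(rng.normal(size=(40, len(COLUMNS))), columns=COLUMNS, index=index)

# Scale every feature to zero mean and unit variance, keeping the coin_id index.
scaled_df = pd.DataFrame(StandardScaler().fit_transform(market_df),
                         columns=COLUMNS, index=index)


def compute_inertia(data, k_values):
    """Fit one K-means model per k and return a DataFrame of k and its inertia."""
    inertia = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=1)
        model.fit(data)
        # inertia_ = sum of squared distances from each point to its nearest centroid
        inertia.append(model.inertia_)
    return pd.DataFrame({"k": list(k_values), "inertia": inertia})


k_values = range(1, 11)  # k = 1 through 10
elbow_df = compute_inertia(scaled_df, k_values)
print(elbow_df)

# Line chart of inertia against k; look for the "elbow" where the curve flattens.
ax = elbow_df.plot.line(x="k", y="inertia", title="Elbow Curve (scaled data)",
                        xticks=list(k_values))
```

The helper `compute_inertia` (a hypothetical name) is written once so the same function can be reused for the PCA data, in line with the DRY requirement.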

Cluster Cryptocurrencies with K-Means Using the Original Scaled Data (10 points)

  • Initialize the K-means model with the best value for k (four clusters). (1 point)
  • Fit the K-means model by using the original scaled data. (1 point)
  • Predict the clusters for grouping the cryptocurrencies by using the original scaled data. Review the resulting array of cluster values. (3 points)
  • Create a copy of the original scaled data, and then add a new column for the predicted clusters. (1 point)
  • Using pandas’ plot, create a scatter plot by setting x="price_change_percentage_24h" and y="price_change_percentage_7d". (4 points)
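A minimal sketch of the clustering steps above follows. It assumes scikit-learn's `KMeans` and uses a hypothetical synthetic DataFrame in place of the real scaled data, so the clusters it prints are not the challenge's results. The `crypto_cluster` column name and the `viridis` coloring are illustrative choices, not requirements from the README.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the scaled market data (not the real dataset).
COLUMNS = ["price_change_percentage_24h", "price_change_percentage_7d",
           "price_change_percentage_30d", "price_change_percentage_1y"]
rng = np.random.default_rng(0)
index = pd.Index([f"coin_{i}" for i in range(40)], name="coin_id")
market_df = pd.DataFrame(rng.normal(size=(40, len(COLUMNS))), columns=COLUMNS, index=index)
scaled_df = pd.DataFrame(StandardScaler().fit_transform(market_df),
                         columns=COLUMNS, index=index)

# Initialize K-means with k = 4, then fit it on the scaled data.
model = KMeans(n_clusters=4, n_init=10, random_state=1)
model.fit(scaled_df)

# Predict a cluster label (0 to 3) for every coin and review the array.
clusters = model.predict(scaled_df)
print(clusters)

# Copy the scaled data so it stays untouched, then add the predicted labels.
clustered_df = scaled_df.copy()
clustered_df["crypto_cluster"] = clusters  # hypothetical column name

# Scatter plot of 24h vs. 7d price change, colored by predicted cluster.
ax = clustered_df.plot.scatter(
    x="price_change_percentage_24h",
    y="price_change_percentage_7d",
    c="crypto_cluster",
    colormap="viridis",
    title="Crypto clusters (scaled data)",
)
```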

Optimize the Clusters with Principal Component Analysis (10 points)

  • Create a PCA model instance, and set n_components=3. (1 point)
  • Use the PCA model to reduce the features to three principal components, then review the first five rows of the resulting data. (2 points)
  • Get the explained variance ratio to determine how much of the total variance can be attributed to each principal component. (2 points)
  • Answer the following question: What’s the total explained variance of the three principal components? (3 points)
  • Create a new DataFrame with the PCA data. Be sure to set the coin_id index from the original DataFrame as the index for the new DataFrame. Review the resulting DataFrame. (2 points)
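The PCA steps can be sketched with scikit-learn's `PCA`, again on a hypothetical synthetic stand-in rather than the real dataset. `explained_variance_ratio_` gives the fraction of the total variance captured by each component, so its sum answers the total-explained-variance question for whatever data is passed in.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the scaled market data (not the real dataset).
COLUMNS = ["price_change_percentage_24h", "price_change_percentage_7d",
           "price_change_percentage_30d", "price_change_percentage_1y"]
rng = np.random.default_rng(0)
index = pd.Index([f"coin_{i}" for i in range(40)], name="coin_id")
market_df = pd.DataFrame(rng.normal(size=(40, len(COLUMNS))), columns=COLUMNS, index=index)
scaled_df = pd.DataFrame(StandardScaler().fit_transform(market_df),
                         columns=COLUMNS, index=index)

# Create the PCA model with three components and project the scaled features.
pca = PCA(n_components=3)
pca_data = pca.fit_transform(scaled_df)
print(pca_data[:5])  # first five rows of the reduced data

# Fraction of total variance captured by each component, and their sum.
explained = pca.explained_variance_ratio_
print(explained)
print(f"Total explained variance: {explained.sum():.2%}")

# Wrap the components in a DataFrame that reuses the coin_id index.
pca_df = pd.DataFrame(pca_data, columns=["PC1", "PC2", "PC3"], index=scaled_df.index)
print(pca_df.head())
```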

Find the Best Value for k by Using the PCA Data (10 points)

  • Code the elbow method algorithm, and use the PCA data to find the best value for k. Use a range from 1 to 11. (2 points)
  • Visually identify the optimal value for k by plotting a line chart of all the inertia values computed with the different values of k. (5 points)
  • Answer the following questions: What’s the best value for k when using the PCA data? Does it differ from the best value for k that you found by using the original data? (3 points)
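One way to cover both the PCA elbow step and the DRY convention is to run a single inertia helper over both datasets and plot the two curves together, which also makes the comparison in the last bullet direct. This is a sketch on hypothetical synthetic data, assuming scikit-learn. The helper `compute_inertia` and the columns `inertia_scaled` and `inertia_pca` are hypothetical names.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the scaled market data (not the real dataset).
COLUMNS = ["price_change_percentage_24h", "price_change_percentage_7d",
           "price_change_percentage_30d", "price_change_percentage_1y"]
rng = np.random.default_rng(0)
index = pd.Index([f"coin_{i}" for i in range(40)], name="coin_id")
market_df = pd.DataFrame(rng.normal(size=(40, len(COLUMNS))), columns=COLUMNS, index=index)
scaled_df = pd.DataFrame(StandardScaler().fit_transform(market_df),
                         columns=COLUMNS, index=index)
pca_df = pd.DataFrame(PCA(n_components=3).fit_transform(scaled_df),
                      columns=["PC1", "PC2", "PC3"], index=index)


def compute_inertia(data, k_values):
    """Return the K-means inertia for each k, fitting one model per k."""
    return [KMeans(n_clusters=k, n_init=10, random_state=1).fit(data).inertia_
            for k in k_values]


k_values = list(range(1, 11))  # k = 1 through 10
elbow_df = pd.DataFrame({
    "k": k_values,
    "inertia_scaled": compute_inertia(scaled_df, k_values),
    "inertia_pca": compute_inertia(pca_df, k_values),
})
print(elbow_df)

# Both elbow curves on one chart to compare where each one flattens.
ax = elbow_df.plot.line(x="k", y=["inertia_scaled", "inertia_pca"],
                        xticks=k_values, title="Elbow Curves: scaled vs. PCA data")
```

Because the PCA data keeps only part of the total variance, its inertia at k = 1 cannot exceed that of the scaled data. Where each elbow falls still has to be read off the chart.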

Cluster the Cryptocurrencies with K-Means by Using the PCA Data (10 points)

  • Initialize the K-means model with the best value for k (four clusters). (1 point)
  • Fit the K-means model by using the PCA data. (1 point)
  • Predict the clusters for grouping the cryptocurrencies by using the PCA data. Review the resulting array of cluster values. (3 points)
  • Create a copy of the DataFrame with the PCA data, and then add a new column to store the predicted clusters. (1 point)
  • Using pandas’ plot, create a scatter plot by setting x="PC1" and y="PC2". (4 points)
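A sketch of clustering on the PCA data follows. It assumes scikit-learn and the same hypothetical synthetic stand-in, so the clusters shown are not the challenge's results. `crypto_cluster` is a hypothetical column name.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the scaled market data (not the real dataset).
COLUMNS = ["price_change_percentage_24h", "price_change_percentage_7d",
           "price_change_percentage_30d", "price_change_percentage_1y"]
rng = np.random.default_rng(0)
index = pd.Index([f"coin_{i}" for i in range(40)], name="coin_id")
market_df = pd.DataFrame(rng.normal(size=(40, len(COLUMNS))), columns=COLUMNS, index=index)
scaled_df = pd.DataFrame(StandardScaler().fit_transform(market_df),
                         columns=COLUMNS, index=index)
pca_df = pd.DataFrame(PCA(n_components=3).fit_transform(scaled_df),
                      columns=["PC1", "PC2", "PC3"], index=index)

# Initialize K-means with k = 4 and fit it on the PCA data.
model = KMeans(n_clusters=4, n_init=10, random_state=1)
model.fit(pca_df)

# Predict a cluster label for every coin and review the array.
pca_clusters = model.predict(pca_df)
print(pca_clusters)

# Copy the PCA DataFrame and store the predicted clusters in a new column.
pca_clustered_df = pca_df.copy()
pca_clustered_df["crypto_cluster"] = pca_clusters  # hypothetical column name

# Scatter plot of the first two principal components, colored by cluster.
ax = pca_clustered_df.plot.scatter(x="PC1", y="PC2", c="crypto_cluster",
                                   colormap="viridis", title="Crypto clusters (PCA data)")
```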

Determine the Weights of Each Feature on Each Principal Component (15 points)

  • Create a DataFrame that shows the weights of each feature (column) for each principal component by using the columns from the original scaled DataFrame as the index. (10 points)
  • Answer the following question: Which features have the strongest positive or negative influence on each component? (5 points)
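In scikit-learn, the feature weights live in a fitted PCA model's `components_` attribute, an array of shape `(n_components, n_features)`. Transposing it puts the features on the rows, as the first bullet asks. Below is a minimal sketch on hypothetical synthetic data. The overall sign of each principal component is arbitrary, so "positive" and "negative" are only meaningful relative to the other features within the same component.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic stand-in for the scaled market data (not the real dataset).
COLUMNS = ["price_change_percentage_24h", "price_change_percentage_7d",
           "price_change_percentage_30d", "price_change_percentage_1y"]
rng = np.random.default_rng(0)
index = pd.Index([f"coin_{i}" for i in range(40)], name="coin_id")
market_df = pd.DataFrame(rng.normal(size=(40, len(COLUMNS))), columns=COLUMNS, index=index)
scaled_df = pd.DataFrame(StandardScaler().fit_transform(market_df),
                         columns=COLUMNS, index=index)

pca = PCA(n_components=3).fit(scaled_df)

# components_ has shape (n_components, n_features); transpose so features are rows.
weights_df = pd.DataFrame(pca.components_.T, columns=["PC1", "PC2", "PC3"],
                          index=scaled_df.columns)
print(weights_df)

# For each component, the feature with the largest positive and most negative weight.
strongest = pd.DataFrame({"most_positive": weights_df.idxmax(),
                          "most_negative": weights_df.idxmin()})
print(strongest)
```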

Coding Conventions and Formatting (10 points)

  • Place imports at the top of the file, just after any module comments and docstrings, and before module globals and constants. (3 points)
  • Name functions and variables with lowercase characters, with words separated by underscores. (2 points)
  • Follow DRY (Don't Repeat Yourself) principles, creating maintainable and reusable code. (3 points)
  • Use concise logic and creative engineering where possible. (2 points)

Deployment and Submission (10 points)

  • Submit a link to a GitHub repository that’s cloned to your local machine and that contains your files. (4 points)
  • Use the command line to add your files to the repository. (3 points)
  • Write appropriate commit messages for your commits. (3 points)

Code Comments (10 points)

  • Code should be well commented, with concise, relevant notes that other developers can understand. (10 points)
