EthanNorton/SWEskills

Data Science and Software Engineering Foundations

To excel as a data scientist and effectively manage code, it's crucial to develop strong software engineering (SWE) skills. Here are the skills I am developing.

Goals and Coding Principles

  1. Ensure code is easily interpretable by all team members.
  2. Prioritize reproducibility, cleanliness, and thorough documentation.
  3. Aim for scalable code and understand system design principles.

Libraries and Principles

1. ETL Principles

  • Extract, Transform, Load:
    • Understand the principles of ETL processes for data integration and manipulation.
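Below is a minimal ETL sketch that uses only the Python standard library. It assumes a small in-memory CSV of orders; the column names, the cleaning rules, and the `orders` table are all hypothetical. Each stage is a separate function, so it can be tested and swapped independently.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: one row has a missing amount, currency codes are lowercase.
RAW_CSV = """order_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,eur
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[int, float, str]]:
    """Transform: drop rows with no amount, cast types, upper-case currency codes."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        clean.append((int(row["order_id"]), float(row["amount"]), row["currency"].upper()))
    return clean


def load(records: list[tuple[int, float, str]], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()
```

Running `load(transform(extract(RAW_CSV)), sqlite3.connect(":memory:"))` leaves two rows in `orders`, because the transform step drops the row with the missing amount.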

2. Deep Learning

  • Libraries:
    • TensorFlow, PyTorch, Keras
  • Topics:
    • Neural networks, convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers
  • Practice:
    • Apply deep learning techniques to various data science tasks.
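The sketch below shows what the deep learning libraries above automate. It trains a one-hidden-layer neural network on XOR with hand-written backpropagation in NumPy. NumPy stands in here for illustration only, and the hidden size, learning rate, and epoch count are arbitrary assumptions. In PyTorch, TensorFlow, or Keras, automatic differentiation would compute the backward pass instead.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_xor(hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a tanh hidden layer + sigmoid output on XOR with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = rng.normal(scale=1.0, size=(2, hidden))
    b1 = np.zeros((1, hidden))
    W2 = rng.normal(scale=1.0, size=(hidden, 1))
    b2 = np.zeros((1, 1))

    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        eps = 1e-12  # avoid log(0)
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(loss)

        # Backward pass: gradients of mean binary cross-entropy
        dz2 = (p - y) / len(X)
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0, keepdims=True)
        dz1 = (dz2 @ W2.T) * (1 - h ** 2)  # tanh'(x) = 1 - tanh(x)^2
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0, keepdims=True)

        # Gradient descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    preds = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
    return losses, preds
```

`losses, preds = train_xor()` returns the per-epoch loss history and the thresholded predictions. Plotting `losses` gives a quick check that training is making progress.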

3. Design Principles

  • Clean Code:
    • Understand Python code optimization, SOLID principles, and PEP 20 (the Zen of Python).
  • Design Patterns:
    • Learn common design patterns for solving recurring design problems, and practice problem solving by working through the NeetCode list of common LeetCode problems.
  • System Design:
    • Study architectural patterns for scalability and performance.
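The following sketch illustrates one design pattern: Strategy. It also follows the dependency-inversion principle from SOLID, because `Pipeline` depends on an abstract `Scaler` interface rather than on a concrete class. All class names are hypothetical and are not scikit-learn's classes of the same name.

```python
from typing import Protocol


class Scaler(Protocol):
    """Hypothetical interface: any object with this `transform` method can be plugged in."""

    def transform(self, values: list[float]) -> list[float]: ...


class MinMaxScaler:
    """Strategy 1: rescale values to the [0, 1] range."""

    def transform(self, values: list[float]) -> list[float]:
        lo, hi = min(values), max(values)
        span = hi - lo
        if span == 0:
            return [0.0 for _ in values]
        return [(v - lo) / span for v in values]


class ZScoreScaler:
    """Strategy 2: center at 0 with unit (population) standard deviation."""

    def transform(self, values: list[float]) -> list[float]:
        n = len(values)
        mean = sum(values) / n
        std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
        if std == 0:
            return [0.0 for _ in values]
        return [(v - mean) / std for v in values]


class Pipeline:
    """Depends only on the Scaler abstraction, so strategies can be swapped freely."""

    def __init__(self, scaler: Scaler) -> None:
        self.scaler = scaler

    def run(self, values: list[float]) -> list[float]:
        return self.scaler.transform(values)
```

`Pipeline(MinMaxScaler()).run([2.0, 4.0, 6.0])` returns `[0.0, 0.5, 1.0]`. Swapping in `ZScoreScaler()` changes the behavior without editing `Pipeline`.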

4. Libraries Knowledge

  • ML Library Development:
    • Develop ML libraries and learn the types of objects tracked in MLflow (runs, parameters, metrics, artifacts, and models).
  • Big Data Processing:
    • Explore Spark, Dask, and DAG workflows for scalable data processing.
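Spark and Dask both represent work as a directed acyclic graph (DAG) of tasks and execute those tasks in dependency order. The sketch below shows that idea with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It assumes a hypothetical five-step pipeline in which each task reads the results of its upstream tasks.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DAG = {
    "extract": set(),
    "clean": {"extract"},
    "features": {"clean"},
    "train": {"features"},
    "report": {"train", "clean"},
}

# Hypothetical task bodies; each receives the dict of results computed so far.
TASKS = {
    "extract": lambda r: [3, 1, None, 2],
    "clean": lambda r: [x for x in r["extract"] if x is not None],
    "features": lambda r: [x * 10 for x in r["clean"]],
    "train": lambda r: sum(r["features"]) / len(r["features"]),
    "report": lambda r: f"model={r['train']:.1f} rows={len(r['clean'])}",
}


def run_dag(dag, tasks):
    """Run tasks sequentially in a topological order of the DAG."""
    results = {}
    for name in TopologicalSorter(dag).static_order():
        results[name] = tasks[name](results)
    return results
```

`run_dag(DAG, TASKS)["report"]` yields `"model=20.0 rows=3"`. This runner is sequential. `TopologicalSorter` also offers `prepare()`, `get_ready()`, and `done()` for handing independent ready tasks to a worker pool, which is closer to how real schedulers operate.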

5. Math and Algorithms

  • Optimization:
    • Learn performance profiling and optimization techniques.
  • Concurrency and Parallelism:
    • Understand multi-threading, parallel processing, and asynchronous programming.
  • Simulation:
    • Study simulation techniques for modeling complex systems.
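As a small simulation example, the following Monte Carlo sketch estimates π. It samples random points in the unit square and counts how many land inside the quarter circle. A private, seeded `random.Random` instance keeps runs reproducible. The function name and defaults are illustrative assumptions.

```python
import random


def estimate_pi(n_samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi: 4 x (fraction of uniform points in the unit
    square that satisfy x^2 + y^2 <= 1)."""
    rng = random.Random(seed)  # private seeded generator, independent of global state
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples
```

The error shrinks roughly in proportion to 1/√n, so 100× more samples buys about one more correct digit. To see where a longer run spends its time, `python -m cProfile -s cumtime your_script.py` (with `your_script.py` standing in for your own file) prints a profile sorted by cumulative time. Independent seeds could also run in parallel with `concurrent.futures.ProcessPoolExecutor`.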

Additional Resources and Practices

  • Version Control:
    • Master Git and understand best practices for version control.
  • Testing:
    • Write unit tests, integration tests, and practice test-driven development (TDD).
  • CI/CD:
    • Familiarize myself with continuous integration and continuous deployment pipelines, and continuously update my repositories.
  • Database Optimization:
    • Learn indexing, query optimization, and database normalization.
  • Community and Forums:
    • Engage with classmates (such as those in my MSDS class), YouTube channels, and forums to stay updated and collaborate on projects.
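Below is a minimal unit-testing sketch with the standard library's `unittest`. It assumes a hypothetical `normalize_currency` helper. In a test-driven workflow, you would write the test class first and run it to confirm it fails. Then you would implement the function until the tests pass.

```python
import unittest


def normalize_currency(code: str) -> str:
    """Hypothetical helper under test: trim whitespace and upper-case a 3-letter currency code."""
    cleaned = code.strip().upper()
    if len(cleaned) != 3 or not cleaned.isalpha():
        raise ValueError(f"invalid currency code: {code!r}")
    return cleaned


class TestNormalizeCurrency(unittest.TestCase):
    def test_uppercases_and_strips(self):
        self.assertEqual(normalize_currency(" usd "), "USD")

    def test_rejects_wrong_length(self):
        with self.assertRaises(ValueError):
            normalize_currency("us")

    def test_rejects_digits(self):
        with self.assertRaises(ValueError):
            normalize_currency("U5D")
```

Saved as a file such as `test_currency.py` (a hypothetical name), the tests run with `python -m unittest test_currency`. A CI pipeline can run the same command on every push.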

Relevant Videos and Resources
