Microsoft Malware Prediction #795

Open
somaiaahmed opened this issue Jun 14, 2024 · 11 comments
Labels
Status: Up for Grabs Up for grabs issue.

Comments

@somaiaahmed

🔴 Project Title: Microsoft Malware Prediction Challenge

🔴 Aim: Develop predictive models using data science techniques to anticipate malware attacks on machines, thereby preventing potential damage to Microsoft's vast user base.

🔴 Dataset: Use the malware dataset that Microsoft released for the Kaggle "Microsoft Malware Prediction" competition, which was published to encourage open-source progress on malware prediction techniques.

🔴 Approach: Perform exploratory data analysis (EDA) on the malware dataset to understand its structure and characteristics. Implement 3-4 machine learning algorithms such as Random Forest, XGBoost, Neural Networks, and others. Compare these algorithms based on their performance metrics such as accuracy, precision, and recall to identify the most effective model for predicting malware occurrences.
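The comparison step of the approach can be sketched in plain Python. This is a minimal illustration, not project code: it assumes binary labels where 1 means malware was detected (the Kaggle dataset's target column is `HasDetections`), and the function names `classification_metrics` and `compare_models`, as well as the toy labels and predictions, are hypothetical.

```python
# Minimal sketch: score several models on the same held-out labels using
# accuracy, precision, and recall for the positive class (1 = malware detected).


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def classification_metrics(y_true, y_pred):
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    # Guard against division by zero when a model predicts no positives
    # or the labels contain no positives.
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


def compare_models(y_true, predictions_by_model, key="accuracy"):
    """Return (model_name, metrics) pairs sorted by the chosen metric, best first."""
    results = {
        name: classification_metrics(y_true, preds)
        for name, preds in predictions_by_model.items()
    }
    return sorted(results.items(), key=lambda item: item[1][key], reverse=True)


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    preds = {
        "model_a": [1, 0, 1, 0, 0, 0, 1, 1],
        "model_b": [1, 1, 1, 1, 0, 1, 1, 0],
    }
    for name, m in compare_models(y_true, preds, key="recall"):
        print(name, {k: round(v, 2) for k, v in m.items()})
```

With the toy data, both models reach 0.75 accuracy, but `model_b` has recall 1.0 and precision of about 0.67 while `model_a` scores 0.75 on both. This is why comparing several metrics, rather than accuracy alone, is useful when picking a model.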


📍 Follow the Guidelines to Contribute in the Project:

  • Create a separate folder named "Microsoft Malware Prediction" under the main repository.
  • Inside the "Microsoft Malware Prediction" folder, include the following components:
    • Images: For any necessary visualizations or diagrams related to EDA or model comparisons.
    • Dataset: Provide information about the malware dataset and its source.
    • Model: Implement machine learning models using the malware dataset.
    • requirements.txt: List required packages/libraries for project replication.
  • Inside the Model folder, ensure the README.md file is filled with visualizations, conclusions, and model performance details.

🔴🟡 Points to Note:

  • Issues are assigned on a first-come, first-served basis; 1 Issue == 1 Pull Request (PR).
  • Issue Title and PR Title should be identical, including the issue number.
  • Follow Contributing Guidelines & Code of Conduct before starting to contribute.

To be mentioned when taking the issue:

  • Full name: Somaia Ahmed
  • GitHub Profile Link: https://github.com/somaiaahmed
  • Email ID: somaia.ahmed03@gmail.com
  • Participant ID (if applicable): [NA or mention if applicable]
  • Approach for this Project: Perform EDA, implement Random Forest, XGBoost, Neural Networks, and other models, compare their performance using metrics like accuracy, precision, and recall.
  • What is your participant role?: GSSoC'24 | Contributor

Happy Contributing! 🚀

All the best. Enjoy your open source journey ahead. 😎


Thank you for creating this issue! We'll look into it as soon as possible. Your contributions are highly appreciated! 😊

@somaiaahmed
Author

@abhisheks008 👋 Hey bro, can you please assign me this issue under GSSoC'24 with an appropriate level tag?

@Nidhi-Satyapriya

@abhisheks008, kindly assign this issue to me with an appropriate level tag.

@abhisheks008
Owner

What are the models you are planning for this problem statement? Mention at least 3-4 models for this dataset.

@somaiaahmed
Author

@abhisheks008 I'm planning to use Gradient Boosting Machines (GBM).

For tabular data like the one in this malware prediction challenge, tree-based ensemble methods (XGBoost, LightGBM, CatBoost) are often the most effective. These methods can handle the complexity and variability in the data well.
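For readers unfamiliar with how gradient boosting works on tabular data, here is a from-scratch NumPy sketch. It illustrates the technique only and is not how XGBoost, LightGBM, or CatBoost are implemented: the weak learners are depth-1 trees (stumps), the loss is binary log-loss, and each leaf value is the plain mean of the residuals rather than the second-order (Newton) leaf value those libraries compute. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_stump(X, residuals):
    """Find the feature/threshold split whose two leaf means best fit the residuals (least squares)."""
    best = None
    for j in range(X.shape[1]):
        for threshold in np.unique(X[:, j]):
            left = X[:, j] <= threshold
            if left.all():  # the split must send samples to both sides
                continue
            left_value = residuals[left].mean()
            right_value = residuals[~left].mean()
            error = np.sum((residuals - np.where(left, left_value, right_value)) ** 2)
            if best is None or error < best[0]:
                best = (error, j, threshold, left_value, right_value)
    if best is None:  # every feature is constant: fall back to a single leaf
        mean = residuals.mean()
        return 0, np.inf, mean, mean
    return best[1:]


def predict_stump(stump, X):
    j, threshold, left_value, right_value = stump
    return np.where(X[:, j] <= threshold, left_value, right_value)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_gbm(X, y, n_rounds=20, learning_rate=0.1):
    """Gradient boosting for binary log-loss, with decision stumps as weak learners."""
    p = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    base_score = np.log(p / (1 - p))  # start from the log-odds of the positive rate
    raw = np.full(len(y), base_score)
    stumps = []
    for _ in range(n_rounds):
        # The negative gradient of log-loss w.r.t. the raw score is y - sigmoid(raw).
        residuals = y - sigmoid(raw)
        stump = fit_stump(X, residuals)
        raw += learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return {"base_score": base_score, "stumps": stumps, "learning_rate": learning_rate}


def predict_proba(model, X):
    """Sum the shrunken stump outputs onto the base score, then map to a probability."""
    raw = np.full(X.shape[0], model["base_score"])
    for stump in model["stumps"]:
        raw += model["learning_rate"] * predict_stump(stump, X)
    return sigmoid(raw)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic tabular data (hypothetical): the label depends only on feature 0.
    X = rng.normal(size=(100, 3))
    y = (X[:, 0] > 0).astype(float)
    model = fit_gbm(X, y)
    accuracy = np.mean((predict_proba(model, X) >= 0.5) == y)
    print(f"training accuracy: {accuracy:.2f}")
```

Each round fits a small tree to what the current ensemble still gets wrong and adds a shrunken copy of it; the libraries named in the comment add deeper trees, regularization, and native handling of missing and categorical values on top of this idea.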

@abhisheks008
Owner

Hi @somaiaahmed, thanks for the approach. However, this project repository requires deep learning models rather than classical machine learning models, so could you please update your approach and get back to this issue?

@somaiaahmed
Author

@abhisheks008 OK, I can build a CNN model. Please assign it to me.

@abhisheks008
Owner

Can you elaborate on the planned models? A CNN alone will not work here, as you need to implement at least 2-3 models for any project.

@Basma2423
Contributor

Basma2423 commented Jun 26, 2024

@abhisheks008, I can start working on this once you approve my solution for the Micromobility-Lane-Recognition issue.

Full name: Basma Mahmoud
GitHub Profile Link: Basma2423
Email ID: mayarbasma2423@gmail.com

Approach for this Project:

  1. Data Loading and Preprocessing
  2. EDA
  3. Models:
    3.1 Multiple deep learning approaches suitable for tabular data, e.g., FNN, TabNet, and entity embeddings for categorical variables.
    3.2 Possibly pre-trained models or AutoML tools, e.g., pretrained TabNet, PyCaret, and AutoGluon.
  4. Models Assessment.
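Entity embeddings, listed in the plan above, replace one-hot encoding of each categorical column with a learned dense vector. Below is a minimal NumPy sketch of the forward pass only, assuming integer-encoded categories with id 0 reserved for unseen or missing values. Training (backpropagating into the embedding tables) would normally be done in a deep learning framework such as PyTorch or Keras and is omitted here. The class and function names, the column values, and the embedding-size heuristic are illustrative assumptions.

```python
import numpy as np


def build_vocab(values):
    """Map each distinct category to an integer id; id 0 is reserved for unseen or missing values."""
    vocab = {}
    for v in values:
        if v is not None and v not in vocab:
            vocab[v] = len(vocab) + 1
    return vocab


def encode(values, vocab):
    """Turn raw category values into integer ids, sending anything unknown to id 0."""
    return np.array([vocab.get(v, 0) for v in values], dtype=np.int64)


def embedding_dim(cardinality):
    # Rule-of-thumb embedding size (an assumption, not a fixed rule): min(50, (n + 1) // 2).
    return min(50, (cardinality + 1) // 2)


class EntityEmbeddingNet:
    """Forward pass of a small feed-forward net with one embedding table per categorical column."""

    def __init__(self, cardinalities, n_numeric, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # One extra row per table for the reserved id 0.
        self.tables = [rng.normal(0.0, 0.05, size=(c + 1, embedding_dim(c))) for c in cardinalities]
        in_dim = sum(t.shape[1] for t in self.tables) + n_numeric
        self.w1 = rng.normal(0.0, 0.1, size=(in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, size=(hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, cat_ids, numeric):
        """cat_ids: (n_samples, n_cat_columns) int ids; numeric: (n_samples, n_numeric) floats."""
        # Look up one dense vector per categorical column and concatenate with numeric features.
        embedded = [table[cat_ids[:, i]] for i, table in enumerate(self.tables)]
        x = np.concatenate(embedded + [numeric], axis=1)
        h = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU hidden layer
        logits = (h @ self.w2 + self.b2)[:, 0]
        return 1.0 / (1.0 + np.exp(-logits))  # estimated probability of the positive class


if __name__ == "__main__":
    # Hypothetical categorical columns, for illustration only.
    os_col = ["win10", "win7", "win10", None]
    region_col = ["eu", "us", "us", "apac"]
    vocabs = [build_vocab(os_col), build_vocab(region_col)]
    cat_ids = np.stack([encode(os_col, vocabs[0]), encode(region_col, vocabs[1])], axis=1)
    numeric = np.array([[0.2], [1.5], [-0.3], [0.0]])
    net = EntityEmbeddingNet([len(v) for v in vocabs], n_numeric=1)
    print(net.forward(cat_ids, numeric))  # random, untrained weights: outputs are not meaningful yet
```

Reserving id 0 lets the model handle category values that appear at prediction time but not in the training data, which matters for high-cardinality columns.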

What is your participant role? (Mention the Open Source program): GSSoC-2024 participant

Can you add the label for GSSoC, please?
Thanks.

@abhisheks008
Owner

As this issue was raised by another contributor, I can't assign it to you.

@Basma2423
Contributor

@abhisheks008 no probs.

@abhisheks008 added the Status: Up for Grabs label on Aug 11, 2024.