A comprehensive collection of data science projects showcasing exploratory data analysis, predictive modeling, and machine learning techniques. This portfolio includes three distinct projects with end-to-end analysis from data preprocessing to model evaluation.
Project File: Project_Covid_19_Analysis.ipynb
An exploratory data analysis of COVID-19 confirmed cases and deaths using the Johns Hopkins dataset. This project involves:
- Data aggregation and preprocessing
- Time-series visualization of confirmed cases by country
- Statistical analysis and trend identification
- Global and country-specific insights
Dataset:
- covid19_Confirmed_dataset.csv - Confirmed cases data
- covid19_deaths_dataset.csv - Deaths data
Key Techniques: EDA, Data Aggregation, Time-Series Analysis, Data Visualization
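The aggregation step listed above might look roughly like the sketch below. It assumes the confirmed-cases file uses the common wide layout (one row per province, the columns `Province/State`, `Country/Region`, `Lat`, `Long`, then one column per date). Those column names and the helper `aggregate_by_country` are assumptions for illustration, not taken from the notebook. A tiny inline frame with made-up values stands in for `covid19_Confirmed_dataset.csv` so the example runs on its own.

```python
import pandas as pd


def aggregate_by_country(confirmed: pd.DataFrame) -> pd.DataFrame:
    """Sum province-level rows into one row of daily totals per country."""
    # Drop the columns that are not daily counts (assumed column names).
    counts = confirmed.drop(columns=["Province/State", "Lat", "Long"])
    # Result: one row per country, one column per date.
    return counts.groupby("Country/Region").sum()


# Stand-in for pd.read_csv("covid19_Confirmed_dataset.csv"); values are made up.
confirmed = pd.DataFrame({
    "Province/State": ["Province 1", "Province 2", None],
    "Country/Region": ["Country A", "Country A", "Country B"],
    "Lat": [30.0, 40.0, 41.0],
    "Long": [112.0, 116.0, 12.0],
    "1/22/20": [10, 2, 0],
    "1/23/20": [15, 3, 1],
})

by_country = aggregate_by_country(confirmed)

# Day-over-day change in cumulative cases for one country.
daily_new_a = by_country.loc["Country A"].diff()
```

With the real file, `by_country.loc[country]` gives a date-indexed series that can be passed straight to a plotting call for the per-country time-series view.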
Project File: Project_Rainfall_Prediction.ipynb
A predictive modeling project using linear regression to forecast rainfall in Austin, Texas. This project demonstrates:
- Weather data preprocessing and feature engineering
- Linear regression model development
- Model performance evaluation
- Weather pattern analysis
Dataset: austin_weather.csv - Austin weather data with historical measurements
Key Techniques: Linear Regression, Feature Engineering, Model Evaluation, Data Cleaning
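As a rough illustration of the regression workflow described above, the sketch below fits a scikit-learn `LinearRegression` and scores it on a held-out split. The feature and target column names (`TempAvgF`, `HumidityAvgPercent`, `PrecipitationSumInches`) are assumptions about `austin_weather.csv`, and the data is synthetic, so the resulting metrics say nothing about the notebook's actual results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for austin_weather.csv; column names are assumptions.
rng = np.random.default_rng(0)
n = 200
weather = pd.DataFrame({
    "TempAvgF": rng.uniform(40, 100, n),
    "HumidityAvgPercent": rng.uniform(20, 100, n),
})
# Made-up linear relationship plus noise, clipped so rainfall is never negative.
weather["PrecipitationSumInches"] = (
    0.02 * weather["HumidityAvgPercent"]
    - 0.005 * weather["TempAvgF"]
    + rng.normal(0, 0.1, n)
).clip(lower=0)

X = weather[["TempAvgF", "HumidityAvgPercent"]]
y = weather["PrecipitationSumInches"]

# Hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Two common regression metrics on the held-out rows.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
```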
Project File: Project_Tumor_Detection (1).ipynb
A machine learning classification project for tumor detection with comprehensive data preprocessing and exploratory analysis. This project includes:
- Data preprocessing and feature engineering
- Exploratory data analysis
- Classification model development
- Model evaluation and performance metrics
Dataset: Tumor_Detection.csv - Tumor detection dataset with medical indicators
Key Techniques: Classification, Data Preprocessing, EDA, Feature Selection, Model Evaluation
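The README does not say which classifier the notebook uses, so the sketch below picks logistic regression purely as an assumed example of the preprocess, train, and evaluate steps listed above. The data comes from scikit-learn's `make_classification` rather than `Tumor_Detection.csv`, so the scores are only illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for the tumor dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Stratify so both classes keep their proportions in the test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale features, then fit the classifier (an assumed model choice).
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
```

Wrapping the scaler and the model in one pipeline keeps the scaling statistics fitted on the training split only, which avoids leaking test data into preprocessing.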
DataScienceProject/
├── README.md                            # Project documentation
├── app1.py                              # Streamlit web application
├── requirements.txt                     # Python dependencies
│
├── Project_Covid_19_Analysis.ipynb      # COVID-19 analysis notebook
├── Project_Rainfall_Prediction.ipynb    # Rainfall prediction notebook
├── Project_Tumor_Detection (1).ipynb    # Tumor detection notebook
│
├── covid19_Confirmed_dataset.csv        # COVID-19 confirmed cases data
├── covid19_deaths_dataset.csv           # COVID-19 deaths data
├── austin_weather.csv                   # Austin weather data
├── Tumor_Detection.csv                  # Tumor detection data
└── worldwide_happiness_report.csv       # Additional dataset
- Python 3.x
- Data Processing: pandas, NumPy
- Visualization: Matplotlib, Seaborn
- Machine Learning: scikit-learn
- Web Framework: Streamlit
- Jupyter Notebook
- Python 3.7 or higher
- pip or conda package manager
cd DataScienceProject
pip install -r requirements.txt
# Run individual notebooks
jupyter notebook Project_Covid_19_Analysis.ipynb
jupyter notebook Project_Rainfall_Prediction.ipynb
jupyter notebook "Project_Tumor_Detection (1).ipynb"
streamlit run app1.py

This launches an interactive web application showcasing all three projects with:
- Project selection via sidebar navigation
- Interactive data exploration
- Real-time visualizations
- Multi-select country comparison (COVID-19)
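The contents of `app1.py` are not shown here, so the following is only a sketch of how sidebar navigation and a multi-select country comparison could be wired up in Streamlit. The names `PROJECTS`, `select_countries`, and `main`, as well as the placeholder totals, are hypothetical.

```python
from typing import List

import pandas as pd

PROJECTS = ["COVID-19 Analysis", "Rainfall Prediction", "Tumor Detection"]


def select_countries(totals: pd.DataFrame, countries: List[str]) -> pd.DataFrame:
    """Return a date-indexed frame with one column per selected country."""
    # totals has one row per country and one column per date,
    # so transpose it into the shape a line chart expects.
    return totals.loc[countries].T


def main() -> None:
    # Imported here so the helper above can be used without Streamlit installed.
    import streamlit as st

    # Sidebar navigation between the projects.
    project = st.sidebar.radio("Project", PROJECTS)
    st.title(project)

    if project == "COVID-19 Analysis":
        # Placeholder totals; a real app would load and aggregate the CSV instead.
        totals = pd.DataFrame(
            {"1/22/20": [12, 0], "1/23/20": [18, 1]},
            index=["Country A", "Country B"],
        )
        options = list(totals.index)
        chosen = st.multiselect("Countries", options, default=options[:1])
        if chosen:
            st.line_chart(select_countries(totals, chosen))
    else:
        st.write("This page is not sketched here.")
```

To try it, save the block to a file, add a final `main()` call, and start it with `streamlit run`.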
Access all three projects through our unified web application:
This single application provides:
- Interactive navigation between all 3 projects
- Real-time data visualizations
- Multi-select country comparison (COVID-19)
- Rainfall predictions and weather analysis
- Tumor detection analysis and insights
- Full exploratory data analysis
- Global case tracking
- Country-wise comparison
- Time-series trends
- Death statistics
- Interactive filtering
- Weather data preprocessing
- Linear regression modeling
- Performance metrics visualization
- Weather pattern analysis
- Prediction accuracy evaluation
- Medical data preprocessing
- Feature correlation analysis
- Classification model development
- Performance evaluation
- Data insights visualization
- Start with individual Jupyter notebooks to understand the analysis step-by-step
- Review comments and documentation in each cell
- Experiment with parameters and visualizations
- Set up a Streamlit Cloud account
- Connect your GitHub repository
- Deploy app1.py for an interactive dashboard
- Share the deployed link in the Demo section above
- Update datasets as needed
- Modify parameters in notebooks or app1.py
- Add new visualizations or analysis
- Create additional project notebooks following the same structure
- COVID-19: Line charts showing case progression, statistical summaries, data tables
- Rainfall: Regression plots, performance metrics, weather insights
- Tumor Detection: Classification reports, feature importance, confusion matrices
Feel free to:
- Fork and enhance these projects
- Add new analysis or models
- Improve visualizations
- Expand datasets
- Create additional notebooks
- Ensure all CSV files are in the same directory as the Jupyter notebooks and app1.py
- For the Streamlit app, run from the project root directory
- Update file paths if reorganizing project structure
- Check that all dependencies are installed before running
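A quick way to confirm the layout before launching anything is to check that the expected files sit in the current directory. The sketch below uses only the standard library. The helper name `missing_files` is hypothetical, and the file list is copied from the project structure above.

```python
from pathlib import Path
from typing import Iterable, List

# File names taken from the project structure listing.
REQUIRED_FILES = [
    "covid19_Confirmed_dataset.csv",
    "covid19_deaths_dataset.csv",
    "austin_weather.csv",
    "Tumor_Detection.csv",
    "requirements.txt",
]


def missing_files(root: Path, names: Iterable[str] = REQUIRED_FILES) -> List[str]:
    """Return the names from `names` that do not exist directly under `root`."""
    return [name for name in names if not (root / name).is_file()]


# Check the directory the script is run from (expected: the project root).
missing = missing_files(Path.cwd())
if missing:
    print("Missing from", Path.cwd(), "->", ", ".join(missing))
else:
    print("All expected files found.")
```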
For questions or issues:
- Review notebook comments and documentation
- Check dataset descriptions in respective notebooks
- Refer to library documentation (pandas, scikit-learn, Streamlit)
This project is open for educational and personal use. Feel free to use and modify these projects for learning purposes.
Last Updated: June 2024
Happy Analyzing!