Data Science Projects I Have Completed
About Data Science
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from noisy, structured, and unstructured data. It is related to data mining, machine learning, and big data.
Data science is a "concept to unify statistics, data analysis, informatics, and their related methods" in order to "understand and analyze actual phenomena" with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge.
However, data science is different from computer science and information science. Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational, and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the accompanying data deluge.
Stock Market Price Predictor Using Machine Learning
A stock market price predictor was built with Streamlit for the frontend and a Recurrent Neural Network (RNN) for the backend.
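The exact network architecture is not described. As a reference point, a simple (Elman) RNN reads a window of $T$ past prices $x_1, \dots, x_T$ one step at a time, updates a hidden state $h_t$ (usually starting from $h_0 = 0$), and maps the final state to a prediction of the next price. The symbols $W_x$, $W_h$, $W_y$, $b_h$, $b_y$ are generic notation, not taken from the project; LSTM or GRU layers, which are common for price series, replace the single $\tanh$ update with gated updates.

$$
h_t = \tanh\left(W_x x_t + W_h h_{t-1} + b_h\right), \quad t = 1, \dots, T, \qquad \hat{x}_{T+1} = W_y h_T + b_y
$$

Training then adjusts the weights to minimize the squared error between $\hat{x}_{T+1}$ and the actual next price over all training windows.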
The model was trained on historical data from the Dow Jones and other indices and achieved a Mean Absolute Error (MAE) of 0.0015 and a Mean Squared Error (MSE) of 0.00005. These low errors indicate that the predictions track actual prices closely, although MAE and MSE are expressed in the units of the target, so they should be read relative to the scale of the price data used (for example, whether prices were normalized before training). Streamlit provides a user-friendly interface to the model, while the RNN analyzes and processes historical price sequences, showcasing proficiency in both interactive web applications and advanced machine learning techniques.
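Because MAE and MSE are in the units of the target, identical predictions can produce very different numbers depending on how prices are scaled. The sketch below assumes closing prices that are min-max scaled to [0, 1] (the project's actual preprocessing is not stated). It builds the kind of sliding windows an RNN is trained on and scores a naive "tomorrow equals today" baseline, used here as a stand-in for the RNN's output, on both scaled and raw prices. The price values and the helper names `make_windows`, `min_max_scale`, `mae`, and `mse` are hypothetical.

```python
import numpy as np


def make_windows(series: np.ndarray, window: int):
    """Split a 1-D series into (input window, next value) pairs."""
    X = np.stack([series[i : i + window] for i in range(len(series) - window)])
    y = series[window:]  # the value right after each window
    return X, y


def min_max_scale(series: np.ndarray) -> np.ndarray:
    """Scale values into [0, 1]; errors computed afterwards are in these units."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo)


def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))


def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean((y_true - y_pred) ** 2))


# Hypothetical closing prices, for illustration only.
prices = np.array([100.0, 101.5, 102.0, 101.0, 103.5, 104.0, 105.5, 104.5])

# Windows on scaled prices (what a model would typically be trained on).
X, y = make_windows(min_max_scale(prices), window=3)
y_pred = X[:, -1]  # naive baseline: predict the last value in each window

# The same windows and baseline on raw prices.
raw_X, raw_y = make_windows(prices, window=3)
raw_pred = raw_X[:, -1]

print(f"scaled: MAE={mae(y, y_pred):.4f}  MSE={mse(y, y_pred):.4f}")
print(f"raw:    MAE={mae(raw_y, raw_pred):.4f}  MSE={mse(raw_y, raw_pred):.4f}")
```

On these made-up prices the baseline's raw MAE is 1.3 price units, while the same predictions score about 0.236 after scaling, so a reported error is only comparable to another error computed on the same scale.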
This project demonstrates my ability to combine RNNs and Streamlit to build an accurate, interactive stock market price predictor.
Credit Card Fraud Detector
A credit card fraud detection system was developed by implementing and comparing several machine learning algorithms: Decision Tree, K-Nearest Neighbors, Logistic Regression, Support Vector Machine, Random Forest, and XGBoost.
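The system achieved an accuracy of 0.9995 with K-Nearest Neighbors and 0.9991 with Logistic Regression. Because fraudulent transactions are usually a small fraction of all transactions, accuracy alone can be high even for a model that misses much of the fraud, so precision and recall on the fraud class are useful companions to these figures.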
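The comparison described above can be sketched with scikit-learn. This is an illustration, not the project's code: it uses a synthetic dataset from `make_classification` with 1% positive ("fraud") labels in place of real transactions, covers four of the listed algorithms (XGBoost and SVM are left out to keep it short), and reports precision and recall alongside accuracy. A baseline that labels every transaction legitimate is included for reference.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, heavily imbalanced stand-in for transaction data (1% positives, no label noise).
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.99], flip_y=0, random_state=0
)
# Stratify so the test set keeps the same fraud ratio as the full data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        # zero_division=0 avoids a warning if a model never predicts fraud.
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
    }

# Accuracy of a "model" that always predicts legitimate (class 0); its recall is 0.
baseline_accuracy = 1 - y_test.mean()

for name, scores in results.items():
    print(f"{name:<20} " + "  ".join(f"{k}={v:.4f}" for k, v in scores.items()))
print(f"{'Always legitimate':<20} accuracy={baseline_accuracy:.4f}  recall=0.0000")
```

If a model's accuracy barely beats the always-legitimate baseline while its recall is low, it is mostly learning the class imbalance rather than the fraud patterns.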
The project used scikit-learn, XGBoost, NumPy, Matplotlib, and termcolor to implement the models and visualize the results, showcasing proficiency with popular data science libraries for building and evaluating fraud detection systems.
This project demonstrates my ability to use various machine learning algorithms and data science tools to develop an accurate and efficient credit card fraud detection system.
Fake News Detector
A fake news detection system was developed using NLTK, a TF-IDF vectorizer, and a Multinomial Naive Bayes classifier. It detected fake news with an accuracy of 0.9591 on the training data and 0.9511 on the test data; the small gap between the two suggests the model is not heavily overfitting.
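For reference, scikit-learn's `TfidfVectorizer` with its default settings (`smooth_idf=True`, `sublinear_tf=False`, `norm="l2"`) weights a term $t$ in document $d$ as shown below, where $\mathrm{tf}(t, d)$ is the raw count, $n$ is the number of documents, and $\mathrm{df}(t)$ is the number of documents containing $t$; each document vector is then scaled to unit L2 norm. Whether the project used these defaults is not stated. `MultinomialNB` then picks the class $c$ that maximizes the log-posterior, given feature values $x_t$ and additive smoothing $\alpha$ ($\alpha = 1$ is Laplace smoothing).

$$
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \left( \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1 \right)
$$

$$
\hat{c} = \arg\max_{c} \left[ \log P(c) + \sum_{t} x_t \log \hat{\theta}_{ct} \right], \qquad \hat{\theta}_{ct} = \frac{N_{ct} + \alpha}{N_c + \alpha V}
$$

Here $N_{ct}$ is the sum of the feature values of term $t$ over training documents of class $c$, $N_c = \sum_t N_{ct}$, and $V$ is the vocabulary size.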
The project applied natural language processing (NLP) techniques to preprocess and clean the text data, then used TF-IDF vectorization and Naive Bayes classification to train and test the model.
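A minimal version of such a pipeline in scikit-learn might look like the following. It is a sketch, not the project's code: the six-headline corpus is invented, and the NLTK preprocessing is replaced by a hypothetical regex `clean` function plus scikit-learn's built-in English stop-word list so the example runs without downloading NLTK data.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def clean(text: str) -> str:
    """Lowercase and replace non-letters with spaces (simple stand-in for NLTK cleaning)."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


# Tiny hypothetical corpus: 1 = fake, 0 = real.
texts = [
    "Scientists confirm the moon is made of cheese!!!",
    "Miracle pill cures every disease overnight, doctors shocked",
    "Celebrity secretly replaced by a robot, insiders claim",
    "Central bank holds interest rates steady this quarter",
    "City council approves budget for new public library",
    "Researchers publish study on regional rainfall patterns",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF features feed directly into a Multinomial Naive Bayes classifier.
model = make_pipeline(
    TfidfVectorizer(preprocessor=clean, stop_words="english"),
    MultinomialNB(alpha=1.0),  # alpha=1.0 is Laplace smoothing
)
model.fit(texts, labels)

preds = model.predict(["Doctors shocked by miracle cure", "Council approves new budget"])
print(preds)
```

Wrapping the vectorizer and classifier in one pipeline means the TF-IDF vocabulary and IDF weights are learned from the training data only, which keeps test-set statistics from leaking into the model during evaluation.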
Additionally, Matplotlib, NumPy, and pandas were used to handle and visualize the CSV data, showcasing proficiency in data manipulation and visualization.
This project demonstrates my ability to use advanced machine learning techniques and natural language processing tools to develop an efficient and accurate fake news detection system.