Hello
I'm a Data Science student at Aalto University expecting to graduate in Sep 2024. My journey in Data Science started 4 years ago, when I initially worked on Computer Vision models for biomedicine and later specialized in Online Learning algorithms. I'm always curious and very open to learning new Data Science technologies to broaden my experience in this field.
Projects
-
LightRiver - Master Thesis
Online Machine Learning in Rust
- Ported an Online Machine Learning model from Python to Rust, achieving a 97% reduction in execution time and attaining a processing capability of 10,000 records per second for IoT devices.
- Optimized memory usage to a minimal 20MB, facilitating deployment on robots with stringent memory limits and resulting in a 55% enhancement in positioning accuracy.
- Specialized the implementation of Mondrian Forests, streamlining the model for real-time learning in performance-sensitive applications.
-
NYCTaxis
Big Data project with Apache Spark
- Analyzed taxi tipping behavior using Apache Spark to process a large-scale dataset (67 GB of Yellow Taxi trip records) and assess the impact of ride area, time, distance, and traffic on tipping.
- Employed Spark for data aggregation and analysis, generating insights into spatial and temporal patterns of taxi rides and their correlation with tipping trends.
- Visualized findings using Python libraries (Matplotlib and Seaborn), creating heatmaps and time-series graphs to depict variations in tipping across different NYC zones and times of the day.
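
The kind of aggregation described above can be sketched in plain Python. This is a minimal illustration and not the project's Spark code: the record fields (`pickup_hour`, `fare`, `tip`) and the sample values are hypothetical, and a real pipeline would express the same group-by in Spark.

```python
from collections import defaultdict


def mean_tip_rate_by_hour(trips):
    """Return {hour: mean tip/fare ratio}, ignoring trips with a non-positive fare."""
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of ratios, trip count]
    for trip in trips:
        if trip["fare"] <= 0:
            continue  # the tip ratio is undefined for these records
        bucket = totals[trip["pickup_hour"]]
        bucket[0] += trip["tip"] / trip["fare"]
        bucket[1] += 1
    return {hour: ratio_sum / count for hour, (ratio_sum, count) in totals.items()}


# Hypothetical sample records, for illustration only.
sample_trips = [
    {"pickup_hour": 8, "fare": 10.0, "tip": 2.0},
    {"pickup_hour": 8, "fare": 20.0, "tip": 2.0},
    {"pickup_hour": 23, "fare": 40.0, "tip": 10.0},
    {"pickup_hour": 23, "fare": 0.0, "tip": 0.0},
]
rates = mean_tip_rate_by_hour(sample_trips)
```

Grouping by pickup zone instead of hour would follow the same pattern with a different key.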
-
MT-SMAC
Multi-Target Hyperparameter Optimization
- Implemented improvements to a multi-target Hyperparameter Optimization model, extending and enhancing the work of a PhD project.
- The project, known as MT-SMAC, focuses on the Sequential Model-based Algorithm Configuration (SMAC) for optimizing highly parameterized algorithms.
- Conducted empirical analysis using the YAHPO gym for benchmarking, providing insights into the efficiency and robustness of the model.
-
CourseMatch
NLP Project with Transformer Models
- Developed a Course Recommender System in a team of 3 students using a Transformer-based Large Language Model (LLM).
- Utilized the BERT model from the HuggingFace library to generate embeddings for both the query and the corpus.
- Conducted comprehensive experiments on a custom-built database and evaluated the models using various similarity techniques.
- Recognized as the Best Project in the 2023 Natural Language Processing Course, surpassing 26 other projects.
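
As a rough sketch of how embedding-based recommendation can work, the snippet below ranks courses by cosine similarity to a query vector. Cosine similarity is assumed here as one possible similarity technique; the course names and the 3-dimensional vectors are hypothetical stand-ins for real BERT embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_courses(query_vec, course_vecs):
    """Return (course name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in course_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for model embeddings (illustration only).
toy_courses = {
    "Deep Learning": [0.9, 0.1, 0.0],
    "Databases": [0.1, 0.9, 0.2],
    "Statistics": [0.5, 0.5, 0.1],
}
ranking = rank_courses([1.0, 0.0, 0.0], toy_courses)
```

In a real system the vectors would come from the embedding model, and the top-ranked entries would be returned as recommendations.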
-
SmartElevator
AI system to analyze and predict faults in your elevator
- Built a fault prevention system for elevators in Python, improving upon simple scikit-learn methods.
- Ranked 3rd for the best project out of 10 teams.
- Graded 29/30 in AI for Innovation master course.
-
Medical Imaging
FBK Research Project on Bioimaging
- Engaged in a medical image segmentation project utilizing advanced bioimaging techniques.
- Analyzed 3-D structures to extract information about the progression of specific illnesses in individual patients.
- Implemented the project using a deep neural network in PyTorch, with data analysis performed using pandas and Scikit-learn.
-
Nowcasting - Bachelor Thesis
Comparative Analysis of Prediction Models for Short-Term Forecasting
- Together with a researcher, trained an image-based time-series forecasting system to predict extreme precipitation and alert the Italian civil protection.
- Implemented large-scale data collection in PostgreSQL, developed in Python using the Telegram APIs.
- Deployed on Azure Functions with Docker containerization and tested with GitLab CI.
-
Generate link to edit events in Google Calendar
After downloading an ICS file from Google Calendar, how do I generate the URL to edit a specific event? If that's the question you are asking, this is the right article for you!
-
Calendar Analyzer
Interactive dashboard for Google Calendar Event Analysis
- Analyzed time-series data gathered from Google Calendar using Python, with pandas for data frame manipulation.
- Implemented a Streamlit dashboard in Python using the Altair data visualization framework.
- Deployed in a Docker container on a Raspberry Pi running Linux, with continuous deployment through GitHub Actions.
-
DeepGalaxies
ML Course Project on Galaxy Classification
- Designed and implemented an image recognition system for galaxy classification.
- Conducted comparisons with state-of-the-art methods using feature extraction and dimensionality reduction techniques.
- Implemented the network using the PyTorch framework, with NumPy for data manipulation and seaborn for data visualization.