ID2223 Project Showcase 2024
This page showcases some of the Serverless ML Systems developed by the students in the Scalable ML and Deep Learning master's-level course (ID2223 at KTH). The main requirements for the project were to build a complete ML system that includes:
In practice, these projects followed the feature-training-inference pipeline architecture for building ML systems. The programs developed included most of the following:
Nearly all projects run on free serverless ML infrastructure in the cloud, although a couple of projects with custom UIs used Node.js and virtual machines on KTH’s free cloud service. Most of the projects used some combination of free serverless services such as Hopsworks, Modal, GitHub Actions, and Hugging Face.
Table of Contents
Stockholm Transit Authority Interactive Delay Predictor
DeLight - Flight Delay Predictor for Stockholm Arlanda
Sentiment analysis for stocks using articles from the last 7 days
World News Summary using Transformers
Surf Height Prediction on Huntington Beach
Finance Commodity Price Predictor
Predict risk for developing diabetes
Planet discovery - celestial object classification
Tag suggestions for audio files
Upvote predictor for HackerNews
Predict UK river flooding events
League of Legends Pre-game Win Predictor
Predict soccer player transfer prices
Sentiment Analysis for YouTube Comments
Music Genre Classifier (private KTH git repo)
Predict Whether you have Heart Disease
Swedish House Price Prediction (Hemnet)
Deal sourcing for NYC real-estate
Geolocalization of Photos in Stockholm
Time-series prediction with Prophet
Waiting time for Stockholm traffic incidents
Do you want to know if your red-line subway in Stockholm is delayed? Or do you want to know exactly where your bus in Stockholm is right now? If yes, this project is for you.
This project has a super cool interactive JavaScript map UI of Stockholm, where the positions of buses, trains, trams, and the metro are updated every 3 seconds!
It uses the TrafikLab API and scrapes 20m readings per month (for free).
It updates features every 3 seconds! They wrote their own Rust stream processing engine, as Spark Streaming was too clunky, and used Redis as a message bus (easier operationally than Kafka). Predictions are made for delays on the red subway line using an LSTM.
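The kind of streaming feature computation described here can be sketched in a few lines. The project used a custom Rust engine with Redis as the message bus; the Python sketch below swaps both for an in-memory list of events and is only an illustration. The event fields (`vehicle_id`, `ts`, `delay_s`), the vehicle name, and the 60-second window are assumptions, not details from the project.

```python
from collections import defaultdict, deque

WINDOW_S = 60  # assumed window length in seconds (hypothetical)


class RollingDelay:
    """Keeps a per-vehicle sliding window of delay readings."""

    def __init__(self, window_s=WINDOW_S):
        self.window_s = window_s
        # vehicle_id -> deque of (timestamp, delay in seconds)
        self.readings = defaultdict(deque)

    def update(self, vehicle_id, ts, delay_s):
        """Add one reading and return the mean delay over the window."""
        window = self.readings[vehicle_id]
        window.append((ts, delay_s))
        # Evict readings that have fallen out of the window.
        while window and window[0][0] <= ts - self.window_s:
            window.popleft()
        return sum(d for _, d in window) / len(window)


# Stand-in for messages consumed from the bus (made-up values).
events = [
    ("red-14", 0, 30.0),
    ("red-14", 3, 60.0),
    ("red-14", 6, 90.0),
    ("red-14", 66, 120.0),
]

features = RollingDelay()
latest = {}
for vehicle_id, ts, delay_s in events:
    # The latest feature value per vehicle is what a model would read.
    latest[vehicle_id] = features.update(vehicle_id, ts, delay_s)
```

Keeping only a bounded window per vehicle means memory stays constant no matter how long the stream runs, which matters when readings arrive every few seconds.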
See the UI here (sometimes it is down).
The LSTM architecture
Have you ever wanted an estimate of whether your flight will be delayed or not (and you will travel through Stockholm Arlanda)? Then use this ML system to estimate delays for your flight.
Batch ML System with Dashboard
The data sources are SMHI (weather), Swedavia (flight info) and Zyla API (historical flight info).
A feature pipeline scrapes data from the APIs and writes to Hopsworks. A model is trained with XGBoost from the feature store. A batch inference pipeline reads inference data from the feature store. A HuggingFace UI allows interactive queries, see below.
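The three pipelines can be sketched as three functions that only communicate through a shared store. This is a minimal illustration, not the project's code: a plain dictionary stands in for the Hopsworks feature store, a one-feature threshold rule stands in for XGBoost, and the field names (`wind`, `delay_min`) and the 15-minute delay cutoff are assumptions.

```python
# In-memory stand-in for a feature store (the project used Hopsworks).
feature_store = {"flights": []}


def feature_pipeline(raw_rows):
    """Turn raw API rows into features and write them to the store."""
    for row in raw_rows:
        delay = row.get("delay_min")
        feature_store["flights"].append({
            "wind_mps": float(row["wind"]),
            "labelled": delay is not None,
            # Assumed label: a flight counts as delayed after 15 minutes.
            "delayed": delay is not None and delay > 15,
        })


def training_pipeline():
    """Fit a one-feature threshold model (a stand-in for XGBoost)."""
    rows = [r for r in feature_store["flights"] if r["labelled"]]
    delayed = [r["wind_mps"] for r in rows if r["delayed"]]
    on_time = [r["wind_mps"] for r in rows if not r["delayed"]]
    # Place the threshold halfway between the two class means.
    mean_delayed = sum(delayed) / len(delayed)
    mean_on_time = sum(on_time) / len(on_time)
    return {"wind_threshold": (mean_delayed + mean_on_time) / 2}


def batch_inference_pipeline(model):
    """Score every unlabelled row in the store."""
    rows = [r for r in feature_store["flights"] if not r["labelled"]]
    return [r["wind_mps"] > model["wind_threshold"] for r in rows]


# Historical flights with known delays, then upcoming flights without.
feature_pipeline([
    {"wind": 2, "delay_min": 0},
    {"wind": 4, "delay_min": 5},
    {"wind": 12, "delay_min": 40},
    {"wind": 14, "delay_min": 60},
])
feature_pipeline([
    {"wind": 3, "delay_min": None},
    {"wind": 15, "delay_min": None},
])

model = training_pipeline()
predictions = batch_inference_pipeline(model)
```

Because each pipeline reads from and writes to the store rather than calling the others, each one can be scheduled, tested, and rerun independently.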
Do you invest in a stock and want to check in weekly on how people feel about it?
This interactive ML System with a Streamlit UI is built with Yahoo Finance and News APIs. You enter a stock ticker and the number of headlines, and it looks up articles from the previous 7 days. It uses a teacher-student model.
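A teacher-student setup can be illustrated with a toy example: a rule-based teacher labels unlabelled headlines, and a student model is trained on those labels so it can score text the teacher has no rule for. The source does not say which teacher or student the project used; the tiny lexicon, the naive Bayes student, and the headlines below are all assumptions made for illustration.

```python
import math
from collections import Counter

# Toy sentiment lexicon for the teacher (hypothetical).
POSITIVE = {"surge", "beats", "record"}
NEGATIVE = {"plunge", "misses", "lawsuit"}


def teacher(text):
    """Rule-based teacher: label by lexicon hits, or None if undecided."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score == 0:
        return None
    return "pos" if score > 0 else "neg"


def train_student(texts):
    """Fit a naive Bayes student on the labels produced by the teacher."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text in texts:
        label = teacher(text)
        if label is None:
            continue  # the teacher could not label this example
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab


def student_predict(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        log_prob = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:
                # Laplace smoothing avoids zero probabilities.
                log_prob += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = log_prob
    return max(scores, key=scores.get)


headlines = [
    "shares surge after strong earnings",
    "profit beats forecast on strong earnings",
    "shares plunge after weak earnings",
    "company misses forecast on weak demand",
]
model = train_student(headlines)
```

Here the teacher has no rule for "weak quarter", but the student has learned from the teacher's labels that "weak" tends to appear in negative headlines, so `student_predict(model, "weak quarter")` returns a negative label.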
Do you want to find out what is happening in world news in a summary? Look no further.
This is an interactive ML System with a Gradio UI. It runs once per day and uses BERT to summarize news stories. This one is well built under the covers: it uses PyTorch, a classifier, and the pretrained model.
Find your celebrity twin.
Upload images of your face and find the closest match in the database. It uses a fine-tuned ResNet-50 with a dataset of 165 celebrities and 150x150 image sizes. It is an interactive ML System with a Gradio UI and a celebrity image database. It logs user requests to the GCS feature store.
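The source does not say how the closest match is computed. One common approach, assumed here purely for illustration, is to compare an embedding of the uploaded face against stored embeddings using cosine similarity. The three-dimensional vectors and celebrity names below are made up; real embeddings from a network such as ResNet-50 would have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_match(query, database):
    """Return the (name, similarity) pair with the highest similarity."""
    return max(
        ((name, cosine_similarity(query, emb)) for name, emb in database.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical stored embeddings, one per celebrity.
database = {
    "celebrity_a": [0.9, 0.1, 0.0],
    "celebrity_b": [0.1, 0.9, 0.2],
    "celebrity_c": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the uploaded photo.
query = [0.2, 0.8, 0.1]
name, score = closest_match(query, database)
```

With these vectors the query points in nearly the same direction as `celebrity_b`, so that entry is returned as the match.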
I found my celebrity twin - it’s Alan Ruck (Cameron from Ferris Bueller’s Day Off). Now, go find yours!
Do you, like me, like to surf? Then you want to know if there are waves without travelling to the beach or learning how to read NOAA’s buoy forecasts.
This is a batch ML System with a Dashboard. It predicts the height of waves at Huntington Beach using historical observations of wave height and NOAA’s buoy forecasts for the buoy just off the coast. It updates data daily in Hopsworks with a feature pipeline, trains models in a training pipeline, and has a batch inference pipeline for predictions.
How are my investments going to do today? I know the answer (because I don’t have any), but maybe you have some and want to know. Then find out here.
This is an interactive ML System with a Gradio UI and Dashboard. It updates the feature store with new data daily in the feature pipeline. You can get predictions for interest rates, gold, and the S&P 500 for up to 90 days. It predicts prices with an LSTM.
Do you want to know your risk for developing diabetes using AI? If yes, read on…
This is an interactive ML System with a Gradio UI. It logs user requests and predictions to create new feature data. It has a UI with SHAP for explainability. And it has a monitoring UI. Nice, we think so.
Discover (and name) your own planet with AI!
This is both a batch ML System with a Dashboard (that predicts the type of celestial object for one new sample daily) and an interactive ML System with a Gradio UI. It is based on the SDSS Data Release 17 dataset that is updated daily. The model was trained on 100k samples, and a random forest outperformed an MLP. In the interactive UI, you can enter details on the celestial object, and the system classifies it as a star or planet or galaxy.
Did you ever have a .ogg file hanging around and wonder what type of sound is in it? Then wait no more.
This is an interactive ML System with a Gradio UI. You can upload a sound file and it will tell you the type of sound in it - like a thud, bang, etc. It converts the sound’s spectrogram to a PNG image and uses a CNN for training and inference. It also runs daily, pulling down a random sound and classifying it for your amusement.
Have you ever wanted to submit an article on HN that will make the front page? Then try this out before you submit that article to predict its likelihood of doing so.
It is an Interactive ML System with Gradio UI that has a feature pipeline that runs daily.
The prediction problem is hard due to imbalanced data with time-series properties. This project used a fine-tuned BERT model.
Have you ever made a movie and wanted an estimate of how much box office revenue it would pull in? If yes, this project is for you!
This is a batch ML System with a Dashboard. It predicts expected box office revenue for one new movie added each day. It uses many features from the TMDB dataset - budget, popularity of crew.
Do you live in a house in the UK at risk of flooding? If so, this project is for you!
It is a batch ML System with a Dashboard that uses the UK Govt Environment Data, updated daily.
Do you play League of Legends and want to be a winner? If so, this project is for you!
This is a custom interactive ML System with a JavaScript UI. It uses the public Riot Games API and updates data on-demand and daily.
Be an even better fantasy soccer player with this batch ML System and Dashboard.
It will help you predict the transfer price of players. It is based on data from Transfermarkt - but its data source is only updated infrequently.
Find out what the sentiment of comments is on your 15-minutes-of-fame YouTube video.
This is an interactive ML System with a Gradio UI. It uses the YouTube API to find comments and logs user requests as a feature pipeline. It performs sentiment analysis on YouTube comments with a teacher-student model that uses VADER for sentiment. It uses Hopsworks with Modal, and the data is game reviews from YouTube. Good, but not excellent.
Did you ever have a song and you and your friend couldn’t agree if it was rap or rock (maybe ‘Walk This Way’ by Run-DMC/Aerosmith)? Wait no more.
This is an interactive ML System with a Gradio UI. It uses the HF Music dataset. The training data consists of 8k tracks, each 30 seconds long. It classifies music as one of: Electronic, Experimental, Folk, Hip-Hop, Instrumental, International, Pop, Rock.
This is an interactive ML System with a Gradio UI. The modeling is quite good in the project, and it is architected with a feature pipeline, training pipeline, and online inference pipeline.
It uses Hopsworks feature store and compares random forests, feedforward NNs, and linear regression. It includes a monitoring UI - see below.
This is a batch ML System with a Dashboard that does what it says on the tin. It uses the Smartcast weather API and the ML pipelines run daily.
Do you like eating hamburgers and fries? If yes, this is not the project for you!
Given a patient's information, this ML system predicts whether the patient has a risk of heart disease or not. This tool could allow the early detection of heart disease and allow patients to seek medical help early before any serious complications arise. Thus saving lives and improving the quality of life of patients.
It is an interactive ML System with a Gradio UI. The feature pipeline is the UI - it logs user requests to create new feature data for training. It includes a monitoring UI and SHAP for explainability. The modeling problem includes imbalanced data and requires a weighted F1 score. It’s bootstrapped with 320k rows of training data.
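To see why a weighted F1 score is more informative than accuracy on imbalanced data, here is a small from-scratch sketch. It computes F1 per class and averages the results with weights proportional to each class's support, which is the usual definition of the weighted average. The labels below are made up for illustration and are not from the project's data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    total = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * count / len(y_true)
    return total


# Made-up labels: 8 healthy patients (0) and 2 with heart disease (1).
y_true = [0] * 8 + [1] * 2

# A model that always predicts "healthy" is 80% accurate...
always_healthy = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, always_healthy)) / len(y_true)

# ...but its weighted F1 is lower, because F1 for class 1 is zero.
score = weighted_f1(y_true, always_healthy)
```

The always-healthy model never finds a single patient with heart disease, and the weighted F1 score reflects that, while plain accuracy does not.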
Do you live in the San Fernando fault region? If yes, this project will help you sleep better.
It is an interactive ML System with a Gradio UI with a feature pipeline, training pipeline, and the UI as the online inference pipeline, as well as a monitoring UI to evaluate model performance. The dataset for this project was sourced from ANSS Comprehensive Earthquake Catalog, which is updated daily.
Have you wondered whether a certain place is a factory farm or not? Now you can use AI and a location (longitude, latitude) to determine if a building is likely a factory farm or not. But please, no graffiti.
This is an interactive ML System with a Gradio UI that uses CNNs for classification.
Tired of negative news? Get positive news only with this very nice project.
It is a Batch ML System that includes a pre-trained transformer for sentiment analysis and:
Will that volcano eruption cause your bus company in Iceland to pay fines for delays? If yes, read on.
This is a batch ML System with a Dashboard custom built with its own ML infrastructure. It uses the Icelandic public bus system API, and apparently the data isn’t that clean, so caveat emptor.
Are you like most middle-aged Swedish people and talk constantly about house prices? If yes, this is for you. Paste in the URL of a property you find on Hemnet, and an XGBoost model will estimate the true value of that property. It is an interactive ML System with a custom (beautiful) JavaScript UI. The data source is Hemnet: a feature pipeline runs daily and scrapes nearly all house listings in Sweden from the Hemnet website.
Comparison with Booli predictions
Have you ever found yourself frustrated when deciding which property to buy in New York? This ML system separates the wheat from the chaff for you - narrowing down potential candidate property deals from a pool of thousands available. It is an interactive ML system with a Gradio UI. It includes a feature pipeline and training pipeline and uses XGBoost.
Have you ever found yourself mysteriously transported to an unknown location in Stockholm, where your camera phone and Internet work, but your GPS doesn’t (those Russians!)? If yes, then you can now use this interactive ML System with a Gradio UI to take a photo of your location and it will tell you where you are in the city.
The dataset is Google Street View and feature pipelines run daily scraping new images for locations.
What is cool about this project is that it stores many photos for the same location (locations look different in summer, spring, autumn, and winter). So when you upload a photo of your location in Stockholm, it is very accurate at geolocalization.
Do you have a load of PDFs you would like to ask questions about their contents?
This project took all the slides from ID2223, put them in a Google Drive, and has a feature pipeline that chunks them and stores them in the feature store (after also generating question/answer pairs with a teacher model (GPT-4) for supervised fine-tuning). Then the chunks are indexed in a vector DB to be used by RAG. An interactive Gradio UI is included to allow you to ask questions - the answers include a reference to the PDF document and the page number of the answer. This is an LLM Chatbot extraordinaire.
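The chunk-index-retrieve flow can be sketched without any external services. In this illustration, a bag-of-words cosine similarity stands in for the embedding model and vector DB, and the file names, page numbers, slide text, chunk size, and overlap are all assumptions. Each chunk keeps its document and page so an answer can cite its source.

```python
import math
from collections import Counter


def chunk_words(text, size=8, overlap=2):
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def bow(text):
    """Bag-of-words vector (a stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(pages):
    """Chunk every page and keep (doc, page) metadata with each chunk."""
    return [
        {"doc": doc, "page": page, "text": chunk}
        for (doc, page), text in pages.items()
        for chunk in chunk_words(text)
    ]


def retrieve(question, index, k=1):
    """Return the k chunks most similar to the question."""
    q = bow(question)
    ranked = sorted(index, key=lambda e: cosine(q, bow(e["text"])), reverse=True)
    return ranked[:k]


# Hypothetical slide text keyed by (file name, page number).
pages = {
    ("lecture1.pdf", 3): "a feature store manages features for training and inference",
    ("lecture2.pdf", 7): "gradient descent updates model weights using the loss gradient",
}
index = build_index(pages)
top = retrieve("what is a feature store", index)[0]
```

The retrieved chunk, together with its `doc` and `page` fields, would then be placed in the prompt of the language model so the answer can point back to the slide it came from.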
Do you lie awake at night wondering whether your bitcoins are making you happy or sad (the bitcoin market doesn’t close at night in contrast to the stock market)? If yes, this project is for you.
It is an interactive ML System with a Gradio UI and the data source is the Binance API. The feature pipeline runs daily, storing data in Hopsworks.
Do you want to know how much your gold chains are worth? If yes, this one's for you.
It predicts the daily high of the gold price with Hopsworks and Modal, using Meta’s Prophet library for time-series prediction. It is an interactive ML System with a Gradio UI.
You just got stuck in traffic due to a crash near Stockholm. How long should you expect to wait for the incident to be resolved?
This project uses the TomTom API to scrape data that it writes to the Hopsworks feature store.
The project is an interactive ML System with Gradio UI, showing incidents and predicted waiting time on a map.