ID2223 Project Showcase 2024

This page showcases some of the Serverless ML Systems developed by the students in the Scalable ML and Deep Learning masters level course (ID2223 at KTH university). The main requirements for the project were to build a complete ML system that includes:

In practice, these projects followed the feature-training-inference pipeline architecture for building ML systems. The programs developed included most of the following:

Nearly all projects run on free serverless ML infrastructure in the cloud, although a couple of projects with custom UIs used Node.js and virtual machines on KTH’s free cloud service. Most of the projects used some combination of free serverless services such as Hopsworks, Modal, GitHub Actions, and Hugging Face.
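
As a rough illustration of the feature-training-inference split mentioned above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory list and dict stand in for a real feature store and model registry, and the temperature/delay data is made up. It is not taken from any of the student projects.

```python
from statistics import mean

# Hypothetical in-memory stand-ins for a feature store and a model registry.
feature_store = []
model_registry = {}

def feature_pipeline(raw_rows):
    """Turn raw (temperature, delay) readings into feature rows and store them."""
    for temp_c, delay_min in raw_rows:
        feature_store.append({"temp": float(temp_c), "delay": float(delay_min)})

def training_pipeline():
    """Fit a least-squares line delay = a * temp + b on the stored features."""
    xs = [row["temp"] for row in feature_store]
    ys = [row["delay"] for row in feature_store]
    x_bar, y_bar = mean(xs), mean(ys)
    variance = sum((x - x_bar) ** 2 for x in xs)
    a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / variance
    model_registry["delay_model"] = (a, y_bar - a * x_bar)

def inference_pipeline(temp_c):
    """Load the registered model and predict a delay for a new reading."""
    a, b = model_registry["delay_model"]
    return a * temp_c + b

feature_pipeline([(-10, 12), (0, 7), (10, 2)])  # made-up readings
training_pipeline()
prediction = inference_pipeline(5)
```

The three functions only communicate through the shared store and registry, which is what lets each pipeline run on its own schedule.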

Table of Contents

Stockholm Transit Authority Interactive Delay Predictor

DeLight - Flight Delay Predictor for Stockholm Arlanda

Sentiment analysis for stocks using articles from the last 7 days

World News Summary using Transformers

(Celebrity) Twin Matcher

Surf Height Prediction on Huntington Beach

Finance Commodity Price Predictor

Predict risk for developing diabetes

Planet discovery - celestial object classification

Tag suggestions for audio files

Upvote predictor for HackerNews

Box office revenue prediction

Predict UK river flooding events

League of Legends Pre-game Win Predictor

Predict soccer player transfer prices

Sentiment Analysis for YouTube Comments

Music Genre Classifier (private KTH git repo)

Flight delay predictor for US

Weather forecast

Predict Whether you have Heart Disease

Earthquake Predictor

Factory Farms Locator

Positive News Daily

Iceland Bus Delay Predictions

Swedish House Price Prediction (Hemnet)

Deal sourcing for NYC real-estate

Geolocalization of Photos in Stockholm

Query your PDFs from an LLM

BTC Price Prediction

Time-series prediction with Prophet

Waiting time for Stockholm traffic incidents

Stockholm Transit Authority Interactive Delay Predictor

Do you want to know if your red-line subway in Stockholm is delayed? Or do you want to know exactly where your bus in Stockholm is right now? If yes, this project is for you.

This project has a super cool interactive JavaScript map UI of Stockholm, where positions of buses, trains, trams, and the metro are updated every 3 seconds!

It uses the TrafikLab API and scrapes 20m readings per month (for free).

It updates features every 3 seconds! The team wrote their own Rust stream processing engine, as Spark Streaming was too clunky, and used Redis as a message bus (easier operationally than Kafka). Predictions are made for delays on the red subway line using an LSTM.
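
To give a feel for what a streaming feature update can look like, here is a small Python sketch of a sliding-window mean over delay readings. It is only an assumption about the kind of feature such an engine might compute: the project itself used Rust and Redis, and the class name, window size, and readings below are hypothetical.

```python
from collections import deque

class SlidingWindowMean:
    """Mean of the delay readings seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, delay_seconds), oldest first
        self.total = 0.0

    def add(self, timestamp, delay_seconds):
        self.events.append((timestamp, delay_seconds))
        self.total += delay_seconds
        self._evict(timestamp)

    def _evict(self, now):
        # Drop readings that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_delay = self.events.popleft()
            self.total -= old_delay

    def value(self):
        return self.total / len(self.events) if self.events else 0.0

window = SlidingWindowMean(window_seconds=9)
for t, delay in [(0, 30), (3, 60), (6, 90), (9, 120)]:  # one reading every 3 seconds
    window.add(t, delay)
mean_delay = window.value()
```

Keeping a running total means each new reading is processed in amortized constant time, which matters when readings arrive every few seconds for many vehicles.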

See the UI here (sometimes it is down).

The LSTM architecture

DeLight - Flight Delay Predictor for Stockholm Arlanda

Have you ever wanted an estimate of whether your flight will be delayed or not (and you will travel through Stockholm Arlanda)? Then use this ML system to estimate delays for your flight.

Batch ML System with Dashboard

The data sources are SMHI (weather), Swedavia (flight info) and Zyla API (historical flight info).

A feature pipeline scrapes data from the APIs and writes to Hopsworks. A model is trained with XGBoost from the feature store. A batch inference pipeline reads inference data from the feature store. A Hugging Face UI allows interactive queries, see below.

Sentiment analysis for stocks using articles from the last 7 days

Do you invest in a stock and want to check in weekly on how people feel about it?

This interactive ML System with a Streamlit UI is built with the Yahoo Finance and News APIs. You enter a stock ticker and a number of headlines, and it looks up articles from the previous 7 days. It uses a teacher-student model.
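
The teacher-student idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not the project's implementation: the teacher here is a hypothetical word-list scorer, the student is a small naive Bayes classifier, and the headlines are made up. The point is only that the student learns from labels produced by the teacher rather than from human annotation.

```python
import math
from collections import Counter

# Hypothetical sentiment word lists used by the toy teacher.
POSITIVE = {"surge", "beat", "record", "gain"}
NEGATIVE = {"drop", "miss", "lawsuit", "loss"}

def teacher_label(headline):
    """Teacher: label a headline by counting positive and negative words."""
    words = headline.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "pos" if score >= 0 else "neg"

def train_student(headlines):
    """Student: naive Bayes word counts, trained on the teacher's labels."""
    counts = {"pos": Counter(), "neg": Counter()}
    docs = Counter()
    for headline in headlines:
        label = teacher_label(headline)
        docs[label] += 1
        counts[label].update(headline.lower().split())
    return counts, docs

def student_predict(model, headline):
    counts, docs = model
    vocab_size = len(set(counts["pos"]) | set(counts["neg"]))
    scores = {}
    for label in ("pos", "neg"):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))
        for word in headline.lower().split():
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log((counts[label][word] + 1) / (total + vocab_size))
        scores[label] = score
    return max(scores, key=scores.get)

model = train_student([
    "Shares surge after record quarter",
    "Profits beat forecasts as sales gain",
    "Shares drop after earnings miss",
    "Lawsuit leads to heavy loss",
])
label = student_predict(model, "earnings miss leads to loss")
```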

World News Summary using Transformers

Do you want to find out what is happening in world news in a summary? Look no further.

This is an interactive ML System with a Gradio UI. It runs once per day and uses BERT to summarize news stories. This one is good under the covers: it uses PyTorch, a classifier, and the pretrained model.

(Celebrity) Twin Matcher

Find your celebrity twin.

Upload images of your face and find the closest match in the database. It uses a fine-tuned ResNet-50 with a dataset of 165 celebrities and 150x150 image sizes. It is an interactive ML System with a Gradio UI and a celebrity images database. It logs user requests to the GCS feature store.

I found my celebrity twin - it’s Alan Ruck (Cameron from Ferris Bueller’s Day Off). Now, go find yours!

   

Surf Height Prediction on Huntington Beach

Do you, like me, like to surf? Then you want to know if there are waves without travelling to the beach or learning how to read NOAA’s buoy forecasts.

This is a batch ML System with a Dashboard. It predicts the height of waves at Huntington Beach using historical observations of wave height and NOAA’s buoy forecasts for the buoy just off the coast. It updates data daily in Hopsworks with a feature pipeline, trains models in a training pipeline, and has a batch inference pipeline for predictions.

Finance Commodity Price Predictor

How are my investments going to do today? I know the answer (because I don’t have any), but maybe you have some and want to know. Then find out here.

This is an interactive ML System with a Gradio UI and Dashboard. It updates the feature store with new data daily in the feature pipeline. You can get predictions for interest rates, gold, and the S&P 500 for up to 90 days. It predicts prices with an LSTM.
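
Sequence models such as LSTMs are usually trained on fixed-length windows of past values. The Python sketch below shows one common way to build such windows from a price series. It is an assumption about the preprocessing, not the project's code, and the function name, lookback length, and prices are made up.

```python
def make_windows(series, lookback, horizon=1):
    """Split a series into (past window, future target) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - lookback - horizon + 1):
        inputs.append(series[start:start + lookback])
        # The target is the value `horizon` steps after the window ends.
        targets.append(series[start + lookback + horizon - 1])
    return inputs, targets

prices = [100, 101, 103, 102, 105, 107]  # made-up daily prices
X, y = make_windows(prices, lookback=3)
```

Each row of X holds three consecutive prices and the matching entry of y is the price on the following day. A larger horizon would move the target further into the future.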

Predict risk for developing diabetes

Do you want to know your risk for developing diabetes using AI? If yes, read on…

This is an interactive ML System with a Gradio UI. It logs user requests and predictions to create new feature data. It has a UI with SHAP for explainability. And it has a monitoring UI. Nice, we think so.

Planet discovery - celestial object classification

Discover (and name) your own planet with AI!  

This is both a batch ML System with a Dashboard (that predicts the type of celestial object for one new sample daily) as well as an interactive ML System with a Gradio UI. It is based on the SDSS Data Release 17 dataset that is updated daily. The model was trained on 100k samples, and a random forest outperformed an MLP. In the interactive UI, you can enter details on the celestial object, and the system classifies it as a star or planet or galaxy.

Tag suggestions for audio files

Did you ever have a .ogg file hanging around and wonder what type of sound is in it? Then wait no more.

This is an interactive ML System with a Gradio UI. You can upload a sound file and it will tell you the type of sound in it - like a thud, bang, etc. It converts the sound’s spectrogram to a PNG image and uses a CNN for training and inference. It also runs daily, pulling down a random sound and classifying it for your amusement.

Upvote predictor for HackerNews

Have you ever wanted to submit an article on HN that will make the front page? Then try this out before you submit that article to predict its likelihood of doing so.

It is an interactive ML System with a Gradio UI and a feature pipeline that runs daily.

The prediction problem is hard due to imbalanced data with time-series properties. This project used a fine-tuned BERT model.

Box office revenue prediction

Have you ever made a movie and wanted an estimate of how much box office revenue it would pull in? If yes, this project is for you!

This is a batch ML System with a Dashboard. It predicts expected box office revenue for one new movie added each day. It uses many features from the TMDB dataset, such as budget and popularity of the crew.

Predict UK river flooding events

Do you live in a house in the UK at risk of flooding? If so, this project is for you!

It is a batch ML System with a Dashboard that uses the UK Govt Environment Data, updated daily.

League of Legends Pre-game Win Predictor

Do you play League of Legends and want to be a winner? If so, this project is for you!

This is a custom interactive ML System with a JavaScript UI. It uses the public Riot Games API and updates data on-demand and daily.

Predict soccer player transfer prices

Be an even better fantasy soccer player with this batch ML System and Dashboard.

It will help you predict the transfer price of players. It is based on data from Transfermarkt - but its data source is only updated infrequently.

Sentiment Analysis for YouTube Comments

Find out what the sentiment of the comments is on your 15-minutes-of-fame YouTube video.

This is an interactive ML System with a Gradio UI. It uses the YouTube API to find comments and logs user requests as a feature pipeline. It performs sentiment analysis on YouTube comments using a teacher-student model, with VADER for sentiment. Hopsworks is used together with Modal. The data is game reviews from YouTube. Good, but not excellent.

Music Genre Classifier (private KTH git repo)

Did you ever have a song and you and your friend couldn’t agree if it was rap or rock (maybe ‘Walk This Way’ by Run-DMC/Aerosmith)? Wait no more.

This is an interactive ML System with a Gradio UI. It uses the HF Music dataset. The training data consists of 8k tracks, each 30s long. It classifies music as one of: Electronic, Experimental, Folk, Hip-Hop, Instrumental, International, Pop, Rock.

Flight delay predictor for US

This is an interactive ML System with a Gradio UI. The modeling is quite good in the project, and it is architected with a feature pipeline, training pipeline, and online inference pipeline.

It uses Hopsworks feature store and compares random forests, feedforward NNs, and linear regression. It includes a monitoring UI - see below.

Weather forecast

This is a batch ML System with a Dashboard that does what it says on the tin. It uses the Smartcast weather API and the ML pipelines run daily.

Predict Whether you have Heart Disease

Do you like eating hamburgers and fries? If yes, this is not the project for you!

Given a patient's information, this ML system predicts whether the patient has a risk of heart disease or not. This tool could allow the early detection of heart disease and allow patients to seek medical help early before any serious complications arise, thus saving lives and improving the quality of life of patients.

It is an interactive ML System with a Gradio UI. The feature pipeline is the UI - it logs user requests to create new feature data for training. It includes a monitoring UI and SHAP for explainability. The modeling problem includes imbalanced data and requires a weighted F1 score. It’s bootstrapped with 320k rows of training data.
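
For readers unfamiliar with the weighted F1 score, here is a small worked example in plain Python. It follows the usual definition (per-class F1, averaged with each class weighted by its number of true samples). The labels are made up and the function is a sketch, not the project's evaluation code.

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    score = 0.0
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        support = tp + fn  # number of true samples of this class
        score += f1 * support / len(y_true)
    return score

# Made-up imbalanced data: 8 healthy (0), 2 at risk (1).
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10  # a model that always predicts "healthy"
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = weighted_f1(y_true, y_pred)
```

In this example the always-healthy model reaches 80% accuracy, while its weighted F1 is lower because the at-risk class contributes an F1 of zero.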

Earthquake Predictor

Do you live in the San Fernando fault region? If yes, this project will help you sleep better.

It is an interactive ML System with a Gradio UI with a feature pipeline, training pipeline, and the UI as the online inference pipeline, as well as a monitoring UI to evaluate model performance. The dataset for this project was sourced from ANSS Comprehensive Earthquake Catalog, which is updated daily.

Factory Farms Locator

Have you wondered whether a certain place is a factory farm or not? Now you can use AI and a location (longitude, latitude) to determine if a building is likely a factory farm or not. But please, no graffiti.

This is an interactive ML System with a Gradio UI that uses CNNs for classification.

Positive News Daily

Tired of negative news? Get positive news only with this very nice project.

It is a Batch ML System that includes a pre-trained transformer for sentiment analysis and:

Iceland Bus Delay Predictions

Will that volcano eruption cause your bus company in Iceland to pay fines for delays? If yes, read on.

This is a batch ML System with a Dashboard custom built with its own ML infrastructure. It uses the Icelandic public bus system API, and apparently the data isn’t that clean, so caveat emptor.

Swedish House Price Prediction (Hemnet)

Are you like most middle-aged Swedish people and talk constantly about house prices? If yes, this is for you. Paste in the URL of a property you find on Hemnet, and an XGBoost model will estimate the true value of that property. It is an interactive ML System with a custom (beautiful) JavaScript UI. The data source is Hemnet, and a feature pipeline scrapes data daily. It scrapes nearly all house listings in Sweden from the Hemnet website. It provides a nice UI for getting an estimate of your house price with a strong XGBoost model behind it.

Comparison with Booli predictions

Deal sourcing for NYC real-estate

Have you ever found yourself frustrated when deciding which property to buy in New York? This ML system separates the wheat from the chaff for you - narrowing down potential candidate property deals from a pool of thousands available. It is an interactive ML system with a Gradio UI. It includes a feature pipeline and training pipeline and uses XGBoost.

Geolocalization of Photos in Stockholm

Have you ever found yourself mysteriously transported to an unknown location in Stockholm, where your camera phone and Internet work, but your GPS doesn’t (those Russians!)? If yes, then you can now use this interactive ML System with a Gradio UI to take a photo of your location and it will tell you where you are in the city.

The dataset is Google Street View and feature pipelines run daily scraping new images for locations.

What is cool about this project is that it stores many photos for the same location (locations look different in summer, spring, autumn, and winter). So when you upload a photo of your location in Stockholm, it is very accurate at geolocalization.

Query your PDFs from an LLM

Do you have a load of PDFs whose contents you would like to ask questions about?

This project took all the slides from ID2223, put them in a Google Drive, and has a feature pipeline that chunks them and stores them in the feature store (after also generating question/answer pairs with a teacher model (GPT-4) for supervised fine-tuning). Then the chunks are indexed in a vector DB to be used by RAG. An interactive Gradio UI is included to allow you to ask questions - the answers include a reference to the PDF document and the page number of the answer. This is an LLM Chatbot extraordinaire.
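
The chunk, index, and retrieve steps can be illustrated with a toy Python sketch. It makes several simplifying assumptions: word counts stand in for learned embeddings, a plain list stands in for the vector DB, and the file name and page texts are made up. It only shows how a retrieved chunk can carry its document name and page number back to the answer.

```python
import math
from collections import Counter

def chunk_pages(doc_name, pages, max_words=40):
    """Split each page into chunks, keeping the document and page number."""
    chunks = []
    for page_no, text in enumerate(pages, start=1):
        words = text.split()
        for i in range(0, len(words), max_words):
            chunk_text = " ".join(words[i:i + max_words])
            chunks.append({"doc": doc_name, "page": page_no, "text": chunk_text})
    return chunks

def embed(text):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(index, question, k=1):
    """Return the k chunks most similar to the question."""
    query = embed(question)
    return sorted(index, key=lambda c: cosine(query, c["vector"]), reverse=True)[:k]

pages = [  # made-up slide text, one string per page
    "A feature store manages features for training and inference",
    "Gradient boosting builds an ensemble of decision trees",
]
index = chunk_pages("lecture01.pdf", pages)  # hypothetical file name
for chunk in index:
    chunk["vector"] = embed(chunk["text"])
top = retrieve(index, "what is a feature store")[0]
```

In a full RAG system, the text of the retrieved chunks would be added to the LLM prompt, and the doc and page fields would be shown alongside the answer as the reference.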

BTC Price Prediction

Do you lie awake at night wondering whether your bitcoins are making you happy or sad (the bitcoin market doesn’t close at night in contrast to the stock market)? If yes, this project is for you.

It is an interactive ML System with a Gradio UI and the data source is the Binance API. The feature pipeline runs daily, storing data in Hopsworks.

 

Time-series prediction with Prophet

Do you want to know how much your gold chains are worth? If yes, this one's for you.

It predicts the daily high of the gold price with Hopsworks and Modal, using Meta’s Prophet library for time-series prediction. It is an interactive ML System with a Gradio UI.

Waiting time for Stockholm traffic incidents

You just got stuck in traffic due to a crash near Stockholm. How long should you expect to wait for the incident to be resolved?

This project uses the TomTom API to scrape data that it writes to the Hopsworks feature store.

The project is an interactive ML System with Gradio UI, showing incidents and predicted waiting time on a map.