Hi, I am Kim Mathews.

Master's in Data Science graduate from Uppsala University

I'm a data-driven professional with a Master’s in Data Science from Uppsala University, specializing in data engineering, machine learning, and analytics. With over 7 years of experience in SQL, Python, and cloud-based data solutions, I have built scalable data pipelines, optimized database systems, and leveraged machine learning to extract valuable insights.

I am passionate about solving real-world challenges through data-driven decision-making. Whether it's designing robust ETL pipelines, developing predictive models, or building data visualizations, I enjoy translating complex datasets into actionable insights.

I'm an active investor and trader in the stock market, fascinated by the intersection of machine learning and finance. I explore how AI-driven analytics can be applied to predictive modeling, risk assessment, and algorithmic trading to gain deeper market insights and improve decision-making.

Beyond finance, I'm passionate about uncovering hidden patterns in data and transforming raw information into actionable insights. I'm always looking for opportunities to leverage data to make informed decisions and drive impactful solutions.

Mail: kim.kmathews@gmail.com

Experience

Data Engineer & Python Developer

at IQVIA IMS Health Analytics, July 2021 - July 2022

Designed and implemented ETL pipelines to automate data ingestion, transformation, and integration across multiple sources, improving data accessibility for analytics. Optimized SQL queries and indexing strategies for large-scale datasets, enhancing processing efficiency. Developed Python-based automation scripts to streamline workflows and reduce manual effort. Built interactive Power BI dashboards, transforming raw data into actionable insights for business decision-making.

Python
SQL
ETL Pipelines
Data Engineering
Power BI
Snowflake
Data Warehousing
Data Automation
Exploratory Data Analysis

Oracle Database Developer & Python Data Engineer

at Tata Consultancy Services, April 2018 - July 2021

Developed and maintained high-performance Oracle PL/SQL packages for data management in large-scale systems. Migrated data processes from PL/SQL to Python, improving efficiency and enabling advanced data processing techniques. Assisted in building data pipelines for machine learning workflows, ensuring structured and clean data for predictive modeling. Automated post-release validation checks, enhancing data integrity and model performance.

Oracle PL/SQL
Python
Data Modeling
Shell Scripting
Machine Learning Pipelines
ETL
Feature Engineering
Git
Bitbucket
Jira
Agile Development

Oracle PL/SQL Developer and Oracle Database Admin

at Prudent Technologies Private Limited, December 2014 - April 2018

Led database architecture design and optimization, implementing indexing, partitioning, and performance tuning techniques for large-scale datasets. Developed and managed PL/SQL procedures, functions, and triggers to automate critical data processes. Migrated and optimized Oracle databases, ensuring high availability and performance. Monitored database health, performing backups and fine-tuning queries to maintain system efficiency.

Oracle PL/SQL
Database Administration
Database Design
Performance Optimization
Query Tuning
Indexing
Backup & Recovery
Database Migration
Shell Scripting

Projects

Machine Learning model for best strategies in Corner Kick Taking in Football

Master's Thesis

This project analyzed football data to predict the outcome of corner kicks. Using machine learning techniques such as logistic regression and gradient boosting, the study aimed to identify the key factors influencing shot and goal outcomes.
The goal was to provide valuable insights for teams to optimize their corner kick strategies.

Football Analytics
Machine Learning
Python
Data Analysis
Scikit-Learn
Statsmodels

ChatGPT Football Commentary

Project in Data Science

This project uses AI to create a more engaging and informative football commentary experience.
By analyzing game data, the system can provide real-time insights and commentary, going beyond basic play-by-play descriptions.
It aims to capture the excitement of live football and offer unique perspectives on the game.

Football Analytics
Prompt Engineering
Data Analysis
Data Visualization
Clustering

Stock Market Predictor App - Streamlit

Personal Project

This project developed a stock price prediction app for the Indian stock market (NSE).
It utilized historical stock data and machine learning techniques, primarily regression models, to predict future price movements.
Key features included technical indicators, time-series analysis, and model evaluation metrics like RMSE, MAE, MAPE, and R-squared.
Future work aims to incorporate LSTM models for improved long-term predictions.
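
The evaluation metrics named above can be illustrated with a minimal sketch. The function name `regression_metrics` and the sample prices are hypothetical and are not taken from the app itself; the formulas are the standard definitions, and MAPE assumes that no actual value is zero.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MAE, MAPE (in percent) and R-squared for two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    # Sum of squared errors, reused for both RMSE and R-squared.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    # MAPE divides by the actual values, so it assumes none of them is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    # R-squared compares the model against always predicting the mean.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Hypothetical closing prices and model predictions.
metrics = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
print(metrics)
```

RMSE and MAE are in the same units as the price, while MAPE and R-squared are scale-free, which makes them easier to compare across stocks.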

Machine Learning
Feature Engineering
Time Series Analysis
Model Evaluation
Streamlit
Quantitative Analysis (Quant)

Stock Portfolio Analysis & Optimization Tool

Personal Project

A Streamlit app to analyze and optimize stock portfolios using historical data from the Indian Stock Market (NSE).
It provides insights through risk metrics, historical performance, and portfolio optimization (maximizing Sharpe Ratio), with interactive visualizations like cumulative returns and efficient frontiers.
It offers a data-driven approach that goes beyond traditional chart and fundamental analysis.
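
As a rough illustration of maximizing the Sharpe Ratio, the sketch below computes an annualized Sharpe ratio and searches a grid of weights for a two-asset portfolio. The function names, the 252-trading-day convention, the zero risk-free rate, and the grid search itself are assumptions made for this example; the app's actual optimizer is not described here and may work differently.

```python
import math


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of periodic returns."""
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)  # sample variance
    std = math.sqrt(variance)
    if std == 0:
        raise ValueError("returns have zero volatility; Sharpe ratio is undefined")
    excess = mean - risk_free_rate / periods_per_year
    return excess / std * math.sqrt(periods_per_year)


def portfolio_returns(asset_returns, weights):
    """Combine per-asset return series into one weighted portfolio series."""
    periods = len(asset_returns[0])
    return [
        sum(w * series[t] for w, series in zip(weights, asset_returns))
        for t in range(periods)
    ]


def max_sharpe_two_assets(returns_a, returns_b, steps=100):
    """Grid-search the weight of asset A (0..1) that maximizes the Sharpe ratio."""
    best_weight, best_sharpe = None, None
    for i in range(steps + 1):
        w = i / steps
        series = portfolio_returns([returns_a, returns_b], [w, 1 - w])
        s = sharpe_ratio(series)
        if best_sharpe is None or s > best_sharpe:
            best_weight, best_sharpe = w, s
    return best_weight, best_sharpe


# Hypothetical daily returns for two stocks.
stock_a = [0.010, -0.020, 0.015, 0.005, 0.012]
stock_b = [0.002, 0.004, -0.001, 0.003, 0.001]
weight_a, best = max_sharpe_two_assets(stock_a, stock_b)
print(f"weight in A: {weight_a:.2f}, Sharpe: {best:.2f}")
```

A grid search is only practical for a handful of assets; for larger portfolios a numerical optimizer over the weight vector is the usual choice.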

Python
Streamlit
Plotly
Portfolio Optimization
Data Visualization

Data-Driven Football Player Analysis

Personal Project

This tool uses data science techniques to analyze football players. It compares players based on their key statistics, like goals, assists, and passing accuracy, to find players with similar playing styles.
This can be helpful for coaches, scouts, and fans who want to learn more about a player or find potential replacements.
The tool uses data from the top 5 European football leagues (England, Spain, Italy, France, and Germany) from the 2020-2024 seasons to provide comprehensive and insightful analysis.
Future plans include expanding to more leagues and exploring clustering approaches for further player analysis.

Python
Football Analytics
Streamlit
Data Science
Data Visualization

Real-Time Twitter Sentiment Pipeline

Personal Project

This project builds a real-time data pipeline to process tweets using AWS services, analyzing sentiment and extracting hashtags from 10,000 tweets with a 50/50 sentiment split.
It leverages S3 for storage, Lambda for processing, DynamoDB for persistence, and includes sentiment analysis with plans for ML model integration and Flask deployment on EC2.

AWS Lambda
Amazon S3
DynamoDB
Python
Data Pipeline

Movie Recommendation System with Hugging Face Embeddings

Personal Project

A modular movie recommendation system built with Python, Streamlit, and Hugging Face embeddings, using the MovieLens dataset.
It provides content-based movie recommendations based on tag similarity and user ratings, featuring a user-friendly interface with filtering options for ratings and release years.

Hugging Face
Sentence Transformers
Content-Based Filtering
Python
Streamlit
Data Preprocessing

LSTM Stock Market Prediction Tool

Personal Project

In this project, I used LSTM models for both classification and regression to predict stock market movements and compared the results.
The app fetches historical data via yfinance, trains models, and provides interactive visualizations for performance analysis.
It supports both price direction classification and future price regression, with MLflow for experiment tracking, offering a robust tool for financial forecasting.

Deep Learning
Time Series Analysis
Quantitative Analysis
MLflow
Streamlit
LSTM

GitHub Analytics System using a Streaming Framework

Data Engineering Project

This project built a streaming framework to analyze data from GitHub and answer user queries.
It fetches data on repository updates, programming languages used, and development approaches (TDD/DevOps).
The system provides insights into popular languages, frequently updated repositories, and the correlation between languages and development practices.

Apache Pulsar
MongoDB
Flask
Statistical Analysis
Data Visualization

Keyword mining of Reddit comments using Hadoop and Spark

Data Engineering Project

This project leverages the power of big data and machine learning to analyze Reddit comments.
By utilizing Apache Spark and Hadoop Distributed File System (HDFS), we efficiently processed and analyzed large-scale datasets.
Through experiments with different cluster configurations, we optimized performance and identified the optimal number of workers for efficient processing.

Apache Spark
Hadoop
Cloud Computing
Docker
Data Visualization

Forest Cover Type Prediction Project

Personal Project

This project predicts forest cover types in Roosevelt National Forest using the Covertype Dataset, achieving up to 0.96 accuracy with a tuned XGBoost model.
It involves data preprocessing, class imbalance handling, and model comparison (Random Forest, XGBoost, Neural Network), with visualizations for EDA and evaluation.

EDA
Scikit-Learn
XGBoost
TensorFlow
Data Visualization

Comparative Analysis of Apriori Implementations for Association Rule Mining in Retail Data

Data Mining Project

This project leverages data mining techniques to uncover patterns in customer purchasing behavior.
By analyzing a large dataset of online transactions, we employed the Apriori algorithm to identify frequent itemsets and association rules.
Metrics such as support, confidence, and lift were used to evaluate the strength of these rules.
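
For readers unfamiliar with support, confidence, and lift, here is a minimal sketch of how they are computed for a single rule. The function `rule_metrics` and the toy baskets are hypothetical and written in plain Python for illustration; the project itself compared library implementations of Apriori.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence and lift for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    baskets = [set(t) for t in transactions]
    n = len(baskets)

    count_antecedent = sum(1 for b in baskets if antecedent <= b)
    count_consequent = sum(1 for b in baskets if consequent <= b)
    count_both = sum(1 for b in baskets if (antecedent | consequent) <= b)

    # Support: share of all baskets that contain every item in the rule.
    support = count_both / n
    # Confidence: of the baskets with the antecedent, the share that also hold the consequent.
    confidence = count_both / count_antecedent
    # Lift: confidence relative to how common the consequent is overall.
    lift = confidence / (count_consequent / n)

    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
print(rule_metrics(baskets, {"bread"}, {"milk"}))
```

A lift above 1 suggests the items appear together more often than chance would predict; in this toy data the lift for bread -> milk is below 1, so the rule is not informative despite its fairly high confidence.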

Data Mining
Association Rule Mining
Apriori Algorithm
mlxtend
apyori

Holiday Recommendation System Using Markov Chains

Artificial Intelligence Project

This project developed a holiday recommendation system using Markov Chains.
It analyzes each user's history, age, and spending habits to predict future preferences and suggest personalized travel destinations.
The effectiveness of the system was evaluated using a test dataset, and the results demonstrated promising accuracy in aligning recommendations with user behavior.
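
The core idea of a Markov chain recommender can be sketched as follows. This simplified example is a first-order chain over past destinations only and ignores age and spending habits; the function names and trip histories are hypothetical and are not the project's actual code or data.

```python
from collections import Counter, defaultdict


def fit_transitions(histories):
    """Estimate first-order transition probabilities from sequences of visited destinations."""
    counts = defaultdict(Counter)
    for trips in histories:
        # Count each consecutive pair (current destination, next destination).
        for current, nxt in zip(trips, trips[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }


def recommend_next(transitions, last_destination, top_n=1):
    """Return the most likely next destinations after last_destination."""
    options = transitions.get(last_destination, {})
    # Sort by probability (highest first), breaking ties alphabetically.
    ranked = sorted(options.items(), key=lambda item: (-item[1], item[0]))
    return [destination for destination, _ in ranked[:top_n]]


# Hypothetical travel histories, one list per user.
histories = [
    ["Rome", "Paris", "Lisbon"],
    ["Rome", "Paris", "Berlin"],
    ["Rome", "Madrid"],
    ["Paris", "Lisbon"],
]
transitions = fit_transitions(histories)
print(recommend_next(transitions, "Paris", top_n=2))
```

User attributes such as age group or budget could be folded in by fitting a separate transition table per user segment, which is one possible design among several.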

Markov Chains
Recommendation Systems
Data Mining
Data Analysis

IPO Performance Analysis in Indian Stock Market

Personal Project

This project aimed to predict the success of Initial Public Offerings (IPOs) in the Indian stock market using machine learning.
Key features analyzed included subscription data from Qualified Institutional Buyers (QIBs), High Net Worth Individuals (HNIs), and Retail Investors.
The project explored the potential of machine learning models, such as Gradient Boosting and Logistic Regression, to forecast whether an IPO would result in a positive listing gain.

Quant Finance
Financial Data Analysis
Data Analysis
Machine Learning

Gender classification in Movie Screenplays using statistical machine learning methods

Statistical Machine Learning Project

This project analyzed a dataset of movie screenplays to investigate gender bias in Hollywood films.
Machine learning models, including logistic regression, discriminant analysis, KNN, and tree-based methods, were employed to predict the gender of lead characters.
The performance of these models was evaluated using metrics such as accuracy, precision, recall, and F1-score.

Machine Learning
Scikit-Learn
Feature Engineering
Model Evaluation
Data Analysis

Skills

Python
SQL
Oracle PL/SQL
Machine Learning
Pandas
Seaborn
Scikit-Learn
TensorFlow
PyTorch
Streamlit
AWS
Microsoft Azure
Google Cloud Platform
Data Pipelines
ETL
dbt
Snowflake
Spark
Power BI
Docker
Kubernetes
SQL Server
Data Warehousing Concepts
Agile Methodologies
Git