Hi, I am Kim Mathews.
Master's in Data Science graduate from Uppsala University
I'm a data-driven professional with a Master’s in Data Science from Uppsala University, specializing in data engineering, machine learning, and analytics. With over 7 years of experience in SQL, Python, and cloud-based data solutions, I have built scalable data pipelines, optimized database systems, and leveraged machine learning to extract valuable insights.
I am passionate about solving real-world challenges through data-driven decision-making. Whether it's designing robust ETL pipelines, developing predictive models, or building data visualizations, I enjoy translating complex datasets into actionable insights.
I'm an active investor and trader in the stock market, deeply fascinated by the intersection of machine learning and finance. I actively explore how AI-driven analytics can be applied to predictive modeling, risk assessment, and algorithmic trading to gain deeper market insights and enhance decision-making strategies.
Beyond finance, I'm passionate about uncovering hidden patterns in data and transforming raw information into actionable insights. I'm always looking for opportunities to leverage data to make informed decisions and drive impactful solutions.
Mail: kim.kmathews@gmail.com
Experience
Data Engineer & Python Developer
at IQVIA IMS Health Analytics, July 2021 - July 2022
Designed and implemented ETL pipelines to automate data ingestion, transformation, and integration across multiple sources, improving data accessibility for analytics. Optimized SQL queries and indexing strategies for large-scale datasets, enhancing processing efficiency. Developed Python-based automation scripts to streamline workflows and reduce manual effort. Built interactive Power BI dashboards, transforming raw data into actionable insights for business decision-making.
- Python
- SQL
- ETL Pipelines
- Data Engineering
- Power BI
- Snowflake
- Data Warehousing
- Data Automation
- Exploratory Data Analysis
Oracle Database Developer & Python Data Engineer
at Tata Consultancy Services, April 2018 - July 2021
Developed and maintained high-performance Oracle PL/SQL packages for data management in large-scale systems. Migrated data processes from PL/SQL to Python, improving efficiency and enabling advanced data processing techniques. Assisted in building data pipelines for machine learning workflows, ensuring structured and clean data for predictive modeling. Automated post-release validation checks, enhancing data integrity and model performance.
- Oracle PL/SQL
- Python
- Data Modeling
- Shell Scripting
- Machine Learning Pipelines
- ETL
- Feature Engineering
- Git
- Bitbucket
- Jira
- Agile Development
Oracle PL/SQL Developer and Oracle Database Admin
at Prudent Technologies Private Limited, Dec 2014 - April 2018
Led database architecture design and optimization, implementing indexing, partitioning, and performance tuning techniques for large-scale datasets. Developed and managed PL/SQL procedures, functions, and triggers to automate critical data processes. Migrated and optimized Oracle databases, ensuring high availability and performance. Monitored database health, performing backups and fine-tuning queries to maintain system efficiency.
- Oracle PL/SQL
- Database Administration
- Database Design
- Performance Optimization
- Query Tuning
- Indexing
- Backup & Recovery
- Database Migration
- Shell Scripting
Projects
Machine Learning model for best strategies in Corner Kick Taking in Football
Master's Thesis
This project analyzed football data to predict the outcome of corner kicks. Using logistic regression and gradient boosting, the study aimed to identify the key factors influencing shot and goal outcomes.
The goal was to provide valuable insights for teams to optimize their corner kick strategies.
- Football Analytics
- Machine Learning
- Python
- Data Analysis
- Scikit-Learn
- Statsmodels
ChatGPT Football Commentary
Project in Data Science
This project uses AI to create a more engaging and informative football commentary experience.
By analyzing game data, the system can provide real-time insights and commentary, going beyond basic play-by-play descriptions.
It aims to capture the excitement of live football and offer unique perspectives on the game.
- Football Analytics
- Prompt Engineering
- Data Analysis
- Data Visualization
- Clustering
Stock Market Predictor App - Streamlit
Personal Project
This project developed a stock price prediction app for the Indian stock market (NSE).
It utilized historical stock data and machine learning techniques, primarily regression models, to predict future price movements.
Key features included technical indicators, time-series analysis, and model evaluation metrics like RMSE, MAE, MAPE, and R-squared.
Future work aims to incorporate LSTM models for improved long-term predictions.
- Machine Learning
- Feature Engineering
- Time Series Analysis
- Model Evaluation
- Streamlit
- Quantitative Analysis (Quant)
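The evaluation metrics named above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the app's actual code: the function name `regression_metrics` and the sample prices are hypothetical, and it assumes all actual prices are positive (MAPE is undefined otherwise) and not all identical (R-squared is undefined otherwise).

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MAE, MAPE (in percent), and R-squared for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    # Root mean squared error penalizes large misses more heavily than MAE.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # MAPE divides by the actual value, so actual prices are assumed non-zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    # R-squared compares the model's squared error with that of predicting the mean.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Hypothetical closing prices and model predictions.
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

Reporting several metrics side by side is useful because RMSE and MAE are in price units, while MAPE and R-squared are scale-free and easier to compare across stocks.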
Stock Portfolio Analysis & Optimization Tool
Personal Project
A Streamlit app to analyze and optimize stock portfolios using historical data from the Indian Stock Market (NSE).
It provides insights through risk metrics, historical performance, and portfolio optimization (maximizing Sharpe Ratio), with interactive visualizations like cumulative returns and efficient frontiers.
Offers a data-driven approach beyond traditional chart and fundamental analysis.
- Python
- Streamlit
- Plotly
- Portfolio Optimization
- Data Visualization
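As a rough illustration of maximizing the Sharpe Ratio, the sketch below computes an annualized Sharpe ratio for a fixed-weight portfolio and grid-searches the split between two assets. It is not the tool's actual optimizer: the function names, the 252-day annualization factor, the zero default risk-free rate, and the sample returns are all assumptions made for the example.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization factor for daily returns


def portfolio_sharpe(asset_returns, weights, risk_free_rate=0.0):
    """Annualized Sharpe ratio of a fixed-weight portfolio.

    asset_returns: one list of daily returns per asset, all the same length.
    weights: one weight per asset, expected to sum to 1.
    risk_free_rate: annual risk-free rate (assumed 0 unless given).
    """
    days = len(asset_returns[0])
    # Daily portfolio return is the weighted sum of the asset returns for that day.
    daily = [sum(w * r[d] for w, r in zip(weights, asset_returns)) for d in range(days)]
    mean_annual = statistics.mean(daily) * TRADING_DAYS
    # A portfolio with constant daily returns has zero volatility and no defined Sharpe ratio.
    vol_annual = statistics.stdev(daily) * math.sqrt(TRADING_DAYS)
    return (mean_annual - risk_free_rate) / vol_annual


def best_two_asset_split(returns_a, returns_b, steps=100):
    """Grid-search the weight of asset A in [0, 1] that maximizes the Sharpe ratio."""
    best_weight, best_sharpe = 0.0, -math.inf
    for i in range(steps + 1):
        w = i / steps
        sharpe = portfolio_sharpe([returns_a, returns_b], [w, 1 - w])
        if sharpe > best_sharpe:
            best_weight, best_sharpe = w, sharpe
    return best_weight, best_sharpe


# Hypothetical daily returns for two stocks.
returns_a = [0.01, -0.005, 0.02, 0.0]
returns_b = [-0.002, 0.012, -0.004, 0.006]
weight_a, sharpe = best_two_asset_split(returns_a, returns_b)
```

A grid search is only practical for a couple of assets; with a larger portfolio, a numerical optimizer over the weight vector would be the more usual choice.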
Data-Driven Football Player Analysis
Personal Project
This tool uses data science techniques to analyze football players. It compares players based on their key statistics, like goals, assists, and passing accuracy, to find players with similar playing styles.
This can be helpful for coaches, scouts, and fans who want to learn more about a player or find potential replacements.
The tool uses data from the top 5 European football leagues (England, Spain, Italy, France, and Germany) from the 2020-2024 seasons to provide comprehensive and insightful analysis.
Future plans include expanding to more leagues and exploring clustering approaches for further player analysis.
- Python
- Football Analytics
- Streamlit
- Data Science
- Data Visualization
Real-Time Twitter Sentiment Pipeline
Personal Project
This project builds a real-time data pipeline to process tweets using AWS services, analyzing sentiment and extracting hashtags from 10,000 tweets with a 50/50 sentiment split.
It leverages S3 for storage, Lambda for processing, DynamoDB for persistence, and includes sentiment analysis with plans for ML model integration and Flask deployment on EC2.
- AWS Lambda
- Amazon S3
- DynamoDB
- Python
- Data Pipeline
Movie Recommendation System with Hugging Face Embeddings
Personal Project
A modular movie recommendation system built with Python, Streamlit, and Hugging Face embeddings, using the MovieLens dataset.
It provides content-based movie recommendations based on tag similarity and user ratings, featuring a user-friendly interface with filtering options for ratings and release years.
- Hugging Face
- Sentence Transformers
- Content-Based Filtering
- Python
- Streamlit
- Data Preprocessing
LSTM Stock Market Prediction Tool
Personal Project
This is a personal project where I used LSTM models to experiment with classification and regression techniques to predict stock market movements and compare the results.
The app fetches historical data via yfinance, trains models, and provides interactive visualizations for performance analysis.
It supports both price direction classification and future price regression, with MLflow for experiment tracking, offering a robust tool for financial forecasting.
- Deep Learning
- Time Series Analysis
- Quantitative Analysis
- MLflow
- Streamlit
- LSTM
GitHub Analytics System using a Streaming Framework
Data Engineering Project
This project built a streaming framework to analyze data from GitHub and answer user queries.
It fetches data on repository updates, programming languages used, and development approaches (TDD/DevOps).
The system provides insights into popular languages, frequently updated repositories, and the correlation between languages and development practices.
- Apache Pulsar
- MongoDB
- Flask
- Statistical Analysis
- Data Visualization
Keyword mining of Reddit comments using Hadoop and Spark
Data Engineering Project
This project leverages the power of big data and machine learning to analyze Reddit comments.
By utilizing Apache Spark and Hadoop Distributed File System (HDFS), we efficiently processed and analyzed large-scale datasets.
Through experiments with different cluster configurations, we optimized performance and identified the optimal number of workers for efficient processing.
- Apache Spark
- Hadoop
- Cloud Computing
- Docker
- Data Visualization
Forest Cover Type Prediction Project
Personal Project
This project predicts forest cover types in Roosevelt National Forest using the Covertype Dataset, achieving up to 0.96 accuracy with a tuned XGBoost model.
It involves data preprocessing, class imbalance handling, and model comparison (Random Forest, XGBoost, Neural Network), with visualizations for EDA and evaluation.
- EDA
- Scikit-Learn
- XGBoost
- TensorFlow
- Data Visualization
Comparative Analysis of Apriori Implementations for Association Rule Mining in Retail Data
Data Mining Project
This project leverages data mining techniques to uncover patterns in customer purchasing behavior.
By analyzing a large dataset of online transactions, we employed the Apriori algorithm to identify frequent itemsets and association rules.
Key metrics such as support, confidence, and lift were used to evaluate the strength of these rules.
- Data Mining
- Association Rule Mining
- Apriori Algorithm
- mlxtend
- apyori
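For readers unfamiliar with support, confidence, and lift, here is a minimal plain-Python sketch of how they are computed for a single rule. It does not use mlxtend or apyori and is not the project's code; the function name `rule_metrics` and the toy baskets are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    baskets = [set(t) for t in transactions]
    n = len(baskets)

    # Count baskets containing each side of the rule, and both sides together.
    count_a = sum(antecedent <= b for b in baskets)
    count_c = sum(consequent <= b for b in baskets)
    count_both = sum((antecedent | consequent) <= b for b in baskets)

    # Support: share of all baskets that contain every item in the rule.
    support = count_both / n
    # Confidence: share of antecedent baskets that also contain the consequent.
    # Assumes the antecedent occurs at least once.
    confidence = count_both / count_a
    # Lift: confidence relative to the consequent's baseline frequency.
    # Assumes the consequent occurs at least once.
    lift = confidence / (count_c / n)
    return support, confidence, lift


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
support, confidence, lift = rule_metrics(baskets, ["bread"], ["milk"])
```

A lift above 1 suggests the items are bought together more often than their individual frequencies would predict, while a lift below 1 suggests the opposite.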
Holiday Recommendation System Using Markov Chains
Artificial Intelligence Project
This project developed a holiday recommendation system using Markov Chains.
The system analyzes user history, age, and spending habits to predict future preferences and suggest personalized travel destinations.
The effectiveness of the system was evaluated using a test dataset, and the results demonstrated promising accuracy in aligning recommendations with user behavior.
- Markov Chains
- Recommendation Systems
- Data Mining
- Data Analysis
IPO Performance Analysis in Indian Stock Market
Personal Project
This project aimed to predict the success of Initial Public Offerings (IPOs) in the Indian stock market using machine learning.
Key features analyzed included subscription data from Qualified Institutional Buyers (QIBs), High Net Worth Individuals (HNIs), and Retail Investors.
The project explored the potential of machine learning models, such as Gradient Boosting and Logistic Regression, to forecast whether an IPO would result in a positive listing gain.
- Quant Finance
- Financial Data Analysis
- Data Analysis
- Machine Learning
Gender classification in Movie Screenplays using statistical machine learning methods
Statistical Machine Learning Project
This project analyzed a dataset of movie screenplays to investigate gender bias in Hollywood films.
Machine learning models, including logistic regression, discriminant analysis, KNN, and tree-based methods, were employed to predict the gender of lead characters.
The performance of these models was evaluated using metrics such as accuracy, precision, recall, and F1-score.
- Machine Learning
- Scikit-Learn
- Feature Engineering
- Model Evaluation
- Data Analysis
Skills
- Python
- SQL
- Oracle PL/SQL
- Machine Learning
- Pandas
- Seaborn
- Scikit-Learn
- TensorFlow
- PyTorch
- Streamlit
- AWS
- Microsoft Azure
- Google Cloud Platform
- Data Pipelines
- ETL
- dbt
- Snowflake
- Spark
- Power BI
- Docker
- Kubernetes
- SQL Server
- Data Warehousing Concepts
- Agile Methodologies
- Git