Hi, I am Kim Mathews.
Gen AI Lead | Master's in Data Science, Uppsala University
I am a results-driven Generative AI Engineer and Lead with over 8 years of experience spanning data engineering,
machine learning, and production Gen AI systems. Currently, I serve as the Gen AI Lead at Focaloid Technologies,
where I specialize in designing multi-agent LLM systems, agentic workflows, and RAG-based retrieval pipelines.
My technical stack heavily features tools like LangChain, LangGraph, and DeepAgents to build context-aware,
scalable AI solutions.
Backed by a Master's in Data Science from Uppsala University, I combine a strong foundation in scalable data
engineering, cloud platforms, and Python with a proven track record of leading development initiatives and
delivering end-to-end intelligent systems.
I am an active investor and trader in the Indian stock market, deeply fascinated by the intersection of
artificial intelligence and finance. To bridge these domains, I develop agentic AI solutions tailored for
comprehensive stock market analysis, automated screening, and intelligent reporting. By leveraging
frameworks like LangChain and LangGraph to process live market data, I actively explore how multi-agent
workflows can enhance predictive modeling, risk assessment, and algorithmic trading.
Beyond the financial markets, my core passion lies in uncovering hidden patterns within complex datasets and
transforming raw information into actionable intelligence. Whether I am architecting advanced conversational
agents or engineering scalable data pipelines, I am constantly seeking opportunities to leverage data-driven AI
to make informed decisions and build impactful, real-world solutions.
Mail: kim.kmathews@gmail.com
Experience
Gen AI Lead
at Focaloid Technologies, Sept 2025 - Present
Built production multi-agent Gen AI systems for research automation and risk monitoring using LangChain, LangGraph, and DeepAgents. Developed conversational research agents featuring persistent state and context-aware RAG retrieval. Consolidated fragmented monitoring into unified intelligence pipelines, reducing infrastructure by 50% and boosting efficiency by 30%. Led development initiatives and enforced engineering standards to drive agile delivery and maintain system momentum.
- LangChain
- LangGraph
- DeepAgents
- Multi-Agent Systems
- Prompt Engineering
- Retrieval-Augmented Generation (RAG)
- LangSmith
- Claude SDK
Data Engineer & Python Developer
at IQVIA IMS Health Analytics, July 2021 - July 2022
Designed and implemented ETL pipelines to automate data ingestion, transformation, and integration across multiple sources, improving data accessibility for analytics. Optimized SQL queries and indexing strategies for large-scale datasets, enhancing processing efficiency. Developed Python-based automation scripts to streamline workflows and reduce manual effort. Built interactive Power BI dashboards, transforming raw data into actionable insights for business decision-making.
- Python
- SQL
- ETL Pipelines
- Data Engineering
- Power BI
- Snowflake
- Data Warehousing
- Data Automation
- Exploratory Data Analysis
Oracle Database Developer & Python Data Engineer
at Tata Consultancy Services, April 2018 - July 2021
Developed and maintained high-performance Oracle PL/SQL packages for data management in large-scale systems. Migrated data processes from PL/SQL to Python, improving efficiency and enabling advanced data processing techniques. Assisted in building data pipelines for machine learning workflows, ensuring structured and clean data for predictive modeling. Automated post-release validation checks, enhancing data integrity and model performance.
- Oracle PL/SQL
- Python
- Data Modeling
- Shell Scripting
- Machine Learning Pipelines
- ETL
- Feature Engineering
- Git
- Bitbucket
- Jira
- Agile Development
Oracle PL/SQL Developer and Oracle Database Admin
at Prudent Technologies Private Limited, Dec 2014 - April 2018
Led database architecture design and optimization, implementing indexing, partitioning, and performance tuning techniques for large-scale datasets. Developed and managed PL/SQL procedures, functions, and triggers to automate critical data processes. Migrated and optimized Oracle databases, ensuring high availability and performance. Monitored database health, performing backups and fine-tuning queries to maintain system efficiency.
- Oracle PL/SQL
- Database Administration
- Database Design
- Performance Optimization
- Query Tuning
- Indexing
- Backup & Recovery
- Database Migration
- Shell Scripting
PROJECTS
Machine Learning model for best strategies in Corner Kick Taking in Football
Master Thesis
This project focused on analyzing football data to predict the outcome of corner kicks. By using advanced
machine learning techniques, such as logistic regression and gradient boosting, the study aimed to identify
key factors influencing shot and goal outcomes.
The goal was to provide valuable insights for teams to optimize their corner kick strategies.
- Football Analytics
- Machine Learning
- Python
- Data Analysis
- Scikit-Learn
- Statsmodels
Multi-Format YouTube Content Engine
Personal Project
This project features an asynchronous multi-agent system designed to automatically transform YouTube videos
into diverse written content formats.
Leveraging LangGraph and DeepAgents, the pipeline extracts video transcripts and utilizes a centralized
coordinator to spawn specialized sub-agents.
The system intelligently generates timestamped chapters, educational long-form blogs, and tailored social
media copy optimized for specific audiences and tones.
- LangGraph
- DeepAgents
- LangChain
- Generative AI
- Streamlit
Travel Planner Multi-Agent System
Personal Project
A multi-agent system built using the deepagents framework and LangChain, utilizing Google Gemini models to automatically research and compile detailed, budget-conscious travel itineraries based on natural language queries.
It features a conversational interface with session management, a Supervisor-Worker architecture for task delegation (weather forecasting and itinerary research), and structured output for clean, visually appealing summaries.
- LangChain
- DeepAgents
- Generative AI
- Python
- Multi-Agent Systems
ChatGPT Football Commentary
Project in Data Science
This project uses AI to create a more engaging and informative football commentary experience.
By analyzing game data, the system can provide real-time insights and commentary, going beyond basic
play-by-play descriptions.
It aims to capture the excitement of live football and offer unique perspectives on the game.
- Football Analytics
- Prompt Engineering
- Data Analysis
- Data Visualization
- Clustering
Stock Market Predictor App - Streamlit
Personal Project
This project developed a stock price prediction app for the Indian stock market (NSE).
It utilized historical stock data and machine learning techniques, primarily regression models, to predict
future price movements.
Key features included technical indicators, time-series analysis, and model evaluation metrics like RMSE,
MAE, MAPE, and R-squared.
Future work aims to incorporate LSTM models for improved long-term predictions.
- Machine Learning
- Feature Engineering
- Time Series Analysis
- Model Evaluation
- Streamlit
- Quantitative Analysis (Quant)
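The evaluation metrics named above (RMSE, MAE, MAPE, and R-squared) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the app's actual code: the function name `regression_metrics` and the sample prices are hypothetical, and prices are assumed to be strictly positive so that MAPE is defined.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MAE, MAPE (in percent), and R-squared for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero; prices are assumed positive.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Hypothetical closing prices and model predictions, for illustration only.
actual = [100.0, 102.0, 104.0, 106.0]
predicted = [101.0, 101.0, 105.0, 105.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE penalizes large misses more heavily than MAE, while MAPE makes errors comparable across stocks trading at very different price levels.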
Stock Portfolio Analysis & Optimization Tool
Personal Project
A Streamlit app to analyze and optimize stock portfolios using historical data from the Indian Stock Market
(NSE).
It provides insights through risk metrics, historical performance, and portfolio optimization (maximizing
Sharpe Ratio), with interactive visualizations like cumulative returns and efficient frontiers.
Offers a data-driven approach beyond traditional chart and fundamental analysis.
- Python
- Streamlit
- Plotly
- Portfolio Optimization
- Data Visualization
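To make the idea of maximizing the Sharpe ratio concrete, here is a minimal sketch for a two-asset, long-only portfolio using a simple grid search over weights. It is an assumption-laden illustration, not the tool's implementation: the function names and return series are hypothetical, the risk-free rate is taken as zero, and a real optimizer would typically handle many assets with a numerical solver.

```python
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def best_two_asset_weight(returns_a, returns_b, steps=100):
    """Grid-search the weight on asset A (between 0 and 1) that maximizes the Sharpe ratio."""
    best_w, best_sharpe = 0.0, float("-inf")
    for i in range(steps + 1):
        w = i / steps
        portfolio = [w * a + (1 - w) * b for a, b in zip(returns_a, returns_b)]
        s = sharpe_ratio(portfolio)
        if s > best_sharpe:
            best_w, best_sharpe = w, s
    return best_w, best_sharpe


# Hypothetical periodic returns for two stocks, for illustration only.
returns_a = [0.02, -0.01, 0.03, 0.01]
returns_b = [0.01, 0.02, -0.01, 0.02]
best_w, best_sharpe = best_two_asset_weight(returns_a, returns_b)
print(f"weight on A: {best_w:.2f}, Sharpe ratio: {best_sharpe:.3f}")
```

Because the grid includes the weights 0 and 1, the selected portfolio is never worse, by this measure, than holding either stock alone.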
Data-Driven Football Player Analysis
Personal Project
This tool uses data science techniques to analyze football players. It compares players based on their key
statistics, like goals, assists, and passing accuracy, to find players with similar playing styles.
This can be helpful for coaches, scouts, and fans who want to learn more about a player or find potential
replacements.
The tool uses data from the top 5 European football leagues (England, Spain, Italy, France, and Germany)
from the 2020-2024 seasons to provide comprehensive and insightful analysis.
Future plans include expanding to more leagues and exploring clustering approaches for further player
analysis.
- Python
- Football Analytics
- Streamlit
- Data Science
- Data Visualization
Real-Time Twitter Sentiment Pipeline
Personal Project
This project builds a real-time data pipeline to process tweets using AWS services, analyzing sentiment and
extracting hashtags from 10,000 tweets with a 50/50 sentiment split.
It leverages S3 for storage, Lambda for processing, DynamoDB for persistence, and includes sentiment
analysis with plans for ML model integration and Flask deployment on EC2.
- AWS Lambda
- Amazon S3
- DynamoDB
- Python
- Data Pipeline
Movie Recommendation System with Hugging Face Embeddings
Personal Project
A modular movie recommendation system built with Python, Streamlit, and Hugging Face embeddings, using the
MovieLens dataset.
It provides content-based movie recommendations based on tag similarity and user ratings, featuring a
user-friendly interface with filtering options for ratings and release years.
- Hugging Face
- Sentence Transformers
- Content-Based Filtering
- Python
- Streamlit
- Data Preprocessing
LSTM Stock Market Prediction Tool
Personal Project
This is a personal project where I used LSTM models to experiment with classification and regression
techniques to predict stock market movements and compare the results.
The app fetches historical data via yfinance, trains models, and provides interactive visualizations for
performance analysis.
It supports both price direction classification and future price regression, with MLflow for experiment
tracking, offering a robust tool for financial forecasting.
- Deep Learning
- Time Series Analysis
- Quantitative Analysis
- MLflow
- Streamlit
- LSTM
GitHub Analytics System Using a Streaming Framework
Data Engineering Project
This project built a streaming framework to analyze data from GitHub and answer user queries.
It fetches data on repository updates, programming languages used, and development approaches
(TDD/DevOps).
The system provides insights into popular languages, frequently updated repositories, and the correlation
between languages and development practices.
- Apache Pulsar
- MongoDB
- Flask
- Statistical Analysis
- Data Visualization
Keyword mining of Reddit comments using Hadoop and Spark
Data Engineering Project
This project leverages the power of big data and machine learning to analyze Reddit comments.
By utilizing Apache Spark and Hadoop Distributed File System (HDFS), we efficiently processed and analyzed
large-scale datasets.
Through experiments with different cluster configurations, we optimized performance and identified the
optimal number of workers for efficient processing.
- Apache Spark
- Hadoop
- Cloud Computing
- Docker
- Data Visualization
Forest Cover Type Prediction Project
Personal Project
This project predicts forest cover types in Roosevelt National Forest using the Covertype Dataset, achieving
up to 0.96 accuracy with a tuned XGBoost model.
It involves data preprocessing, class imbalance handling, and model comparison (Random Forest, XGBoost,
Neural Network), with visualizations for EDA and evaluation.
- EDA
- Scikit-Learn
- XGBoost
- TensorFlow
- Data Visualization
Comparative Analysis of Apriori Implementations for Association Rule Mining in Retail Data
Data Mining Project
This project leverages data mining techniques to uncover patterns in customer purchasing behavior.
By analyzing a large dataset of online transactions, we employed the Apriori algorithm to identify
frequent itemsets and association rules.
Key metrics such as support, confidence, and lift were used to evaluate the strength of these
relationships.
- Data Mining
- Association Rule Mining
- Apriori Algorithm
- mlxtend
- apyori
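Support, confidence, and lift for a single association rule can be computed directly from a list of transactions. The sketch below is a plain-Python illustration with a hypothetical toy basket dataset and hypothetical function names. It does not use the mlxtend or apyori APIs and skips the frequent-itemset search that Apriori itself performs.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    s_both = support(transactions, set(antecedent) | set(consequent))
    s_antecedent = support(transactions, antecedent)
    s_consequent = support(transactions, consequent)
    confidence = s_both / s_antecedent  # estimate of P(consequent | antecedent)
    lift = confidence / s_consequent    # above 1 suggests a positive association
    return {"support": s_both, "confidence": confidence, "lift": lift}


# Hypothetical toy baskets, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule = rule_metrics(transactions, {"bread"}, {"milk"})
print(rule)
```

In this toy data the lift of bread -> milk is below 1, so buying bread makes milk slightly less likely than its baseline frequency would suggest.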
Holiday Recommendation System Using Markov Chains
Artificial Intelligence Project
This project developed a holiday recommendation system using Markov Chains.
The system suggests personalized travel destinations based on user history, age, and spending habits.
The system analyzes user data to predict future preferences and recommends relevant destinations.
The effectiveness of the system was evaluated using a test dataset, and the results demonstrated promising
accuracy in aligning recommendations with user behavior.
- Markov Chains
- Recommendation Systems
- Data Mining
- Data Analysis
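A first-order Markov chain recommender can be sketched as follows: count how often each destination follows another in past trip sequences, normalize the counts into transition probabilities, and recommend the most probable next destinations. This is a simplified assumption of how such a system might work. The destination names and function names are hypothetical, and the age and spending features mentioned above are not modeled.

```python
from collections import Counter, defaultdict


def fit_transitions(histories):
    """Estimate first-order transition probabilities from sequences of visited destinations."""
    counts = defaultdict(Counter)
    for trips in histories:
        for current, nxt in zip(trips, trips[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for state, nexts in counts.items()
    }


def recommend(transitions, last_destination, k=2):
    """Return up to k destinations most likely to follow the last one visited."""
    probs = transitions.get(last_destination, {})
    return sorted(probs, key=probs.get, reverse=True)[:k]


# Hypothetical travel histories, for illustration only.
histories = [
    ["Goa", "Kerala", "Jaipur"],
    ["Goa", "Kerala", "Goa"],
    ["Goa", "Jaipur"],
]
transitions = fit_transitions(histories)
print(recommend(transitions, "Goa", k=1))
```

A destination that never appears as a starting point in the histories yields an empty recommendation list, so a production system would need a fallback such as overall popularity.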
IPO Performance Analysis in Indian Stock Market
Personal Project
This project aimed to predict the success of Initial Public Offerings (IPOs) in the Indian stock market using
machine learning.
Key features analyzed included subscription data from Qualified Institutional Buyers (QIBs), High Net
Worth Individuals (HNIs), and Retail Investors.
The project explored the potential of machine learning models, such as Gradient Boosting and Logistic
Regression, to forecast whether an IPO would result in a positive listing gain.
- Quant Finance
- Financial Data Analysis
- Data Analysis
- Machine Learning
Gender classification in Movie Screenplays using statistical machine learning methods
Statistical Machine Learning Project
This project analyzed a dataset of movie screenplays to investigate gender bias in Hollywood films.
Machine learning models, including logistic regression, discriminant analysis, KNN, and tree-based
methods, were employed to predict the gender of lead characters.
The performance of these models was evaluated using metrics such as accuracy, precision, recall, and
F1-score.
- Machine Learning
- Scikit-Learn
- Feature Engineering
- Model Evaluation
- Data Analysis
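The evaluation metrics listed above can be computed from true and predicted labels as in the minimal sketch below. The labels and the function name `classification_metrics` are hypothetical, and one class is treated as the positive class. In practice, Scikit-Learn's metrics module provides equivalent functions.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1-score, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical lead-character labels, for illustration only.
y_true = ["F", "F", "M", "M", "F", "M"]
y_pred = ["F", "M", "M", "F", "F", "M"]
scores = classification_metrics(y_true, y_pred, positive="F")
print(scores)
```

Reporting precision and recall alongside accuracy matters when the classes are imbalanced, since a model can reach high accuracy by favoring the majority class.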
Skills
- Python
- SQL
- Oracle PL/SQL
- Machine Learning
- Pandas
- Seaborn
- Scikit-Learn
- TensorFlow
- PyTorch
- Streamlit
- AWS
- Microsoft Azure
- Google Cloud Platform
- Data Pipelines
- ETL
- dbt
- Snowflake
- Spark
- Power BI
- Docker
- Kubernetes
- SQL Server
- Data Warehousing Concepts
- Agile Methodologies
- Git