Hi, I am Kim Mathews.

Gen AI Lead | Master's in Data Science, Uppsala University

I am a results-driven Generative AI Engineer and Lead with over 8 years of experience spanning data engineering, machine learning, and production Gen AI systems. Currently, I serve as the Gen AI Lead at Focaloid Technologies, where I specialize in designing multi-agent LLM systems, agentic workflows, and RAG-based retrieval pipelines. My technical stack heavily features tools like LangChain, LangGraph, and DeepAgents to build context-aware, scalable AI solutions.

Backed by a Master's in Data Science from Uppsala University, I combine a strong foundation in scalable data engineering, cloud platforms, and Python with a proven track record of leading development initiatives and delivering end-to-end intelligent systems.

I am an active investor and trader in the Indian stock market, deeply fascinated by the intersection of artificial intelligence and finance. To bridge these domains, I develop agentic AI solutions tailored for comprehensive stock market analysis, automated screening, and intelligent reporting. Using frameworks like LangChain and LangGraph to process live market data, I actively explore how multi-agent workflows can enhance predictive modeling, risk assessment, and algorithmic trading.

Beyond the financial markets, my core passion lies in uncovering hidden patterns within complex datasets and transforming raw information into actionable intelligence. Whether I am architecting advanced conversational agents or engineering scalable data pipelines, I am constantly seeking opportunities to leverage data-driven AI to make informed decisions and build impactful, real-world solutions.

Mail: kim.kmathews@gmail.com

Experience

Gen AI Lead

at Focaloid Technologies, Sept 2025 - Present

Built production multi-agent Gen AI systems for research automation and risk monitoring using LangChain, LangGraph, and DeepAgents. Developed conversational research agents featuring persistent state and context-aware RAG retrieval. Consolidated fragmented monitoring into unified intelligence pipelines, reducing infrastructure by 50% and boosting efficiency by 30%. Led development initiatives and enforced engineering standards to drive agile delivery and maintain system momentum.

LangChain
LangGraph
DeepAgents
Multi-Agent Systems
Prompt Engineering
Retrieval-Augmented Generation (RAG)
LangSmith
Claude SDK

Data Engineer & Python Developer

at IQVIA IMS Health Analytics, July 2021 - July 2022

Designed and implemented ETL pipelines to automate data ingestion, transformation, and integration across multiple sources, improving data accessibility for analytics. Optimized SQL queries and indexing strategies for large-scale datasets, enhancing processing efficiency. Developed Python-based automation scripts to streamline workflows and reduce manual effort. Built interactive Power BI dashboards, transforming raw data into actionable insights for business decision-making.

Python
SQL
ETL Pipelines
Data Engineering
Power BI
Snowflake
Data Warehousing
Data Automation
Exploratory Data Analysis

Oracle Database Developer & Python Data Engineer

at Tata Consultancy Services, April 2018 - July 2021

Developed and maintained high-performance Oracle PL/SQL packages for data management in large-scale systems. Migrated data processes from PL/SQL to Python, improving efficiency and enabling advanced data processing techniques. Assisted in building data pipelines for machine learning workflows, ensuring structured and clean data for predictive modeling. Automated post-release validation checks, enhancing data integrity and model performance.

Oracle PL/SQL
Python
Data Modeling
Shell Scripting
Machine Learning Pipelines
ETL
Feature Engineering
Git
Bitbucket
Jira
Agile Development

Oracle PL/SQL Developer and Oracle Database Admin

at Prudent Technologies Private Limited, Dec 2014 - April 2018

Led database architecture design and optimization, implementing indexing, partitioning, and performance tuning techniques for large-scale datasets. Developed and managed PL/SQL procedures, functions, and triggers to automate critical data processes. Migrated and optimized Oracle databases, ensuring high availability and performance. Monitored database health, performing backups and fine-tuning queries to maintain system efficiency.

Oracle PL/SQL
Database Administration
Database Design
Performance Optimization
Query Tuning
Indexing
Backup & Recovery
Database Migration
Shell Scripting

Projects

Machine Learning model for best strategies in Corner Kick Taking in Football

Master Thesis

This project focused on analyzing football data to predict the outcome of corner kicks. By using advanced machine learning techniques, such as logistic regression and gradient boosting, the study aimed to identify key factors influencing shot and goal outcomes.
The goal was to provide valuable insights for teams to optimize their corner kick strategies.

Football Analytics
Machine Learning
Python
Data Analysis
Scikit-Learn
Statsmodels

Multi-Format YouTube Content Engine

Personal Project

This project features an asynchronous multi-agent system designed to automatically transform YouTube videos into diverse written content formats.
Leveraging LangGraph and DeepAgents, the pipeline extracts video transcripts and utilizes a centralized coordinator to spawn specialized sub-agents.
The system intelligently generates timestamped chapters, educational long-form blogs, and tailored social media copy optimized for specific audiences and tones.

LangGraph
DeepAgents
LangChain
Generative AI
Streamlit

Travel Planner Multi-Agent System

Personal Project

A multi-agent system built using the deepagents framework and LangChain, utilizing Google Gemini models to automatically research and compile detailed, budget-conscious travel itineraries based on natural language queries.
It features a conversational interface with session management, a Supervisor-Worker architecture for task delegation (weather forecasting and itinerary research), and structured output for clean, visually appealing summaries.

LangChain
DeepAgents
Generative AI
Python
Multi-Agent Systems

ChatGPT Football Commentary

Project in Data Science

This project uses AI to create a more engaging and informative football commentary experience.
By analyzing game data, the system can provide real-time insights and commentary, going beyond basic play-by-play descriptions.
It aims to capture the excitement of live football and offer unique perspectives on the game.

Football Analytics
Prompt Engineering
Data Analysis
Data Visualization
Clustering

Stock Market Predictor App - Streamlit

Personal Project

This project developed a stock price prediction app for the Indian stock market (NSE).
It utilized historical stock data and machine learning techniques, primarily regression models, to predict future price movements.
Key features included technical indicators, time-series analysis, and model evaluation metrics like RMSE, MAE, MAPE, and R-squared.
Future work aims to incorporate LSTM models for improved long-term predictions.

Machine Learning
Feature Engineering
Time Series Analysis
Model Evaluation
Streamlit
Quantitative Analysis (Quant)
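
The evaluation metrics named above can be illustrated with a minimal sketch. This is not the app's actual code: the function name `regression_metrics` and the sample prices are hypothetical, and MAPE is computed on the assumption that no actual value is zero.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MAE, MAPE (in percent), and R-squared for two sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Assumption: no actual value is zero, otherwise MAPE is undefined.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r_squared = 1 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "mape": mape, "r_squared": r_squared}


# Hypothetical closing prices and model predictions, for illustration only.
metrics = regression_metrics([100, 110, 120, 130], [102, 108, 123, 129])
print(metrics)
```

RMSE penalizes large misses more heavily than MAE, while MAPE expresses the error relative to the price level, which is why the three are usually reported together.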

Stock Portfolio Analysis & Optimization Tool

Personal Project

A Streamlit app to analyze and optimize stock portfolios using historical data from the Indian Stock Market (NSE).
It provides insights through risk metrics, historical performance, and portfolio optimization (maximizing Sharpe Ratio), with interactive visualizations like cumulative returns and efficient frontiers.
It offers a data-driven approach beyond traditional chart and fundamental analysis.

Python
Streamlit
Plotly
Portfolio Optimization
Data Visualization
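
As a rough illustration of Sharpe Ratio maximization, the sketch below scores a two-asset portfolio over a grid of weights. It is a simplification under stated assumptions, not the tool's implementation: the function names, the 252-day annualization factor, the zero risk-free rate, the grid search itself, and the sample returns are all hypothetical.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualization factor


def portfolio_returns(asset_returns, weights):
    """Combine per-asset daily returns (one tuple per day) into portfolio returns."""
    return [sum(w * r for w, r in zip(weights, day)) for day in asset_returns]


def sharpe_ratio(returns, risk_free_daily=0.0):
    """Annualized Sharpe Ratio: mean excess return divided by its volatility."""
    excess = [r - risk_free_daily for r in returns]
    volatility = stdev(excess)
    if volatility == 0:
        raise ValueError("Sharpe Ratio is undefined for zero volatility")
    return mean(excess) / volatility * math.sqrt(TRADING_DAYS)


def best_two_asset_weights(asset_returns, steps=20):
    """Grid-search the weight of the first asset; the second gets the remainder."""
    best = None
    for i in range(steps + 1):
        w = i / steps
        weights = (w, 1 - w)
        score = sharpe_ratio(portfolio_returns(asset_returns, weights))
        if best is None or score > best[1]:
            best = (weights, score)
    return best


# Hypothetical daily returns for two stocks, for illustration only.
daily = [(0.010, 0.002), (-0.005, 0.004), (0.012, -0.001), (0.003, 0.003), (-0.002, 0.001)]
weights, score = best_two_asset_weights(daily)
print(weights, round(score, 2))
```

A grid search only scales to a handful of assets; for larger portfolios a numerical optimizer over the weight vector is the usual choice.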

Data-Driven Football Player Analysis

Personal Project

This tool uses data science techniques to analyze football players. It compares players based on their key statistics, like goals, assists, and passing accuracy, to find players with similar playing styles.
This can be helpful for coaches, scouts, and fans who want to learn more about a player or find potential replacements.
The tool uses data from the top 5 European football leagues (England, Spain, Italy, France, and Germany) from the 2020-2024 seasons to provide comprehensive and insightful analysis.
Future plans include expanding to more leagues and exploring clustering approaches for further player analysis.

Python
Football Analytics
Streamlit
Data Science
Data Visualization

Real-Time Twitter Sentiment Pipeline

Personal Project

This project builds a real-time data pipeline to process tweets using AWS services, analyzing sentiment and extracting hashtags from 10,000 tweets with a 50/50 sentiment split.
It leverages S3 for storage, Lambda for processing, DynamoDB for persistence, and includes sentiment analysis with plans for ML model integration and Flask deployment on EC2.

AWS Lambda
Amazon S3
DynamoDB
Python
Data Pipeline

Movie Recommendation System with Hugging Face Embeddings

Personal Project

A modular movie recommendation system built with Python, Streamlit, and Hugging Face embeddings, using the MovieLens dataset.
It provides content-based movie recommendations based on tag similarity and user ratings, featuring a user-friendly interface with filtering options for ratings and release years.

Hugging Face
Sentence Transformers
Content-Based Filtering
Python
Streamlit
Data Preprocessing

LSTM Stock Market Prediction Tool

Personal Project

This is a personal project where I used LSTM models to experiment with classification and regression techniques to predict stock market movements and compare the results.
The app fetches historical data via yfinance, trains models, and provides interactive visualizations for performance analysis.
It supports both price direction classification and future price regression, with MLflow for experiment tracking, offering a robust tool for financial forecasting.

Deep Learning
Time Series Analysis
Quantitative Analysis
MLflow
Streamlit
LSTM

GitHub Analytics System using a Streaming Framework

Data Engineering Project

This project built a streaming framework to analyze data from GitHub and answer user queries.
It fetches data on repository updates, programming languages used, and development approaches (TDD/DevOps).
The system provides insights into popular languages, frequently updated repositories, and the correlation between languages and development practices.

Apache Pulsar
MongoDB
Flask
Statistical Analysis
Data Visualization

Keyword mining of Reddit comments using Hadoop and Spark

Data Engineering Project

This project leverages the power of big data and machine learning to analyze Reddit comments.
By utilizing Apache Spark and Hadoop Distributed File System (HDFS), we efficiently processed and analyzed large-scale datasets.
Through experiments with different cluster configurations, we optimized performance and identified the optimal number of workers for efficient processing.

Apache Spark
Hadoop
Cloud Computing
Docker
Data Visualization

Forest Cover Type Prediction Project

Personal Project

This project predicts forest cover types in Roosevelt National Forest using the Covertype Dataset, achieving up to 0.96 accuracy with a tuned XGBoost model.
It involves data preprocessing, class imbalance handling, and model comparison (Random Forest, XGBoost, Neural Network), with visualizations for EDA and evaluation.

EDA
Scikit-Learn
XGBoost
TensorFlow
Data Visualization

Comparative Analysis of Apriori Implementations for Association Rule Mining in Retail Data

Data Mining Project

This project leverages data mining techniques to uncover patterns in customer purchasing behavior.
By analyzing a large dataset of online transactions, we employed the Apriori algorithm to identify frequent itemsets and association rules.
Key metrics such as support, confidence, and lift were used to evaluate the strength of these rules.

Data Mining
Association Rule Mining
Apriori Algorithm
mlxtend
apyori
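
To make the three rule metrics concrete, here is a minimal sketch that computes them by hand for a single rule. It does not use mlxtend or apyori and is not the project's code; the function name `rule_metrics` and the toy baskets are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence, and lift for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)

    count_antecedent = sum(1 for t in transactions if antecedent <= t)
    count_consequent = sum(1 for t in transactions if consequent <= t)
    count_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = count_both / n                   # share of baskets containing both sides
    confidence = count_both / count_antecedent  # P(consequent | antecedent)
    lift = confidence / (count_consequent / n)  # confidence relative to baseline frequency
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical shopping baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
print(rule_metrics(baskets, {"bread"}, {"milk"}))
```

A lift below 1, as in this toy example, means the two items co-occur less often than they would if they were independent, so the rule is not useful even when its confidence looks high.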

Holiday Recommendation System Using Markov Chains

Artificial Intelligence Project

This project developed a holiday recommendation system using Markov Chains.
The system suggests personalized travel destinations based on user history, age, and spending habits.
The system analyzes user data to predict future preferences and recommends relevant destinations.
The effectiveness of the system was evaluated using a test dataset, and the results demonstrated promising accuracy in aligning recommendations with user behavior.

Markov Chains
Recommendation Systems
Data Mining
Data Analysis
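
The core idea of a Markov Chain recommender can be sketched as follows. This is a simplified first-order chain over travel history only; it leaves out age and spending habits, and the function names and destination sequences are hypothetical rather than taken from the project.

```python
from collections import Counter, defaultdict


def fit_transitions(histories):
    """Estimate first-order transition probabilities between consecutive destinations."""
    counts = defaultdict(Counter)
    for trips in histories:
        for current, following in zip(trips, trips[1:]):
            counts[current][following] += 1

    transitions = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        transitions[state] = {dest: c / total for dest, c in next_counts.items()}
    return transitions


def recommend(transitions, last_destination, top_n=1):
    """Return the most probable next destinations after the last one visited."""
    following = transitions.get(last_destination, {})
    ranked = sorted(following.items(), key=lambda item: item[1], reverse=True)
    return [dest for dest, _ in ranked[:top_n]]


# Hypothetical travel histories, one list per user, for illustration only.
histories = [
    ["Goa", "Kerala", "Goa", "Manali"],
    ["Goa", "Kerala", "Jaipur"],
    ["Kerala", "Goa", "Kerala"],
]
model = fit_transitions(histories)
print(recommend(model, "Goa"))
```

User attributes such as age or budget could be folded in by fitting a separate transition table per user segment, which keeps the chain itself unchanged.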

IPO Performance Analysis in Indian Stock Market

Personal Project

This project aimed to predict the success of Initial Public Offerings (IPOs) in the Indian stock market using machine learning.
Key features analyzed included subscription data from Qualified Institutional Buyers (QIBs), High Net Worth Individuals (HNIs), and Retail Investors.
The project explored the potential of machine learning models, such as Gradient Boosting and Logistic Regression, to forecast whether an IPO would result in a positive listing gain.

Quant Finance
Financial Data Analysis
Data Analysis
Machine Learning

Gender classification in Movie Screenplays using statistical machine learning methods

Statistical Machine Learning Project

This project analyzed a dataset of movie screenplays to investigate gender bias in Hollywood films.
Machine learning models, including logistic regression, discriminant analysis, KNN, and tree-based methods, were employed to predict the gender of lead characters.
The performance of these models was evaluated using metrics such as accuracy, precision, recall, and F1-score.

Machine Learning
Scikit-Learn
Feature Engineering
Model Evaluation
Data Analysis

Skills

Python
SQL
Oracle PL/SQL
Machine Learning
Pandas
Seaborn
Scikit-Learn
TensorFlow
PyTorch
Streamlit
AWS
Microsoft Azure
Google Cloud Platform
Data Pipelines
ETL
dbt
Snowflake
Spark
Power BI
Docker
Kubernetes
SQL Server
Data Warehousing Concepts
Agile Methodologies
Git