Max Bahar

About

Hi, I'm Max!

I'm a Data Science MS candidate at Harvard (graduating May 2026) with experience in machine learning and data analytics across the fintech and geospatial domains. I'm targeting data science roles in Singapore starting July 2026.

Most recently, I interned at GoTo Financial (GoPay) in Jakarta, where I built features for a QRIS scam detection model processing 3M+ transactions per day and contributed to a 58.6% reduction in blocked transactions while sustaining detection performance.

Previously, I spent three years at Caliper Corporation, developing geospatial data pipelines and conducting spatial analyses for Fortune 500 clients including Amazon, Optum, and Assa Abloy.

Skills

Languages: Python, SQL, JavaScript, Stata
ML & Modeling: XGBoost, TensorFlow, scikit-learn, SHAP, Random Forest, LASSO, BiLSTM, SMOTE
Data Engineering: Pandas, NumPy, Dask, GeoPandas
Visualization: Altair, D3.js, Leaflet.js
Platforms: Alibaba MaxCompute, Maptitude, Streamlit

★ Featured

QRIS Scam Detection at GoTo Financial

August 2025

Developed a large-scale feature set for QRIS transaction scam detection (3M+ transactions/day) at GoTo Financial (GoPay), integrating transactional, behavioral, geographic, and identity verification signals in Python and SQL (Alibaba MaxCompute).

Applied multi-method feature selection (variance threshold, LASSO, random forest, XGBoost) to identify a high-signal feature subset, then extended the production XGBoost model via incremental learning to incorporate new features while retaining prior model knowledge.
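
A minimal sketch of multi-method selection, assuming a variance-threshold filter followed by a consensus vote: each method contributes a ranked feature list (stand-ins for LASSO coefficients, random forest importances, and XGBoost gain, assumed to be computed elsewhere), and a feature is kept if enough methods rank it near the top. The feature names, `top_k`, and `min_votes` are hypothetical; this is not the production selection rule.

```python
from statistics import pvariance


def variance_filter(columns, threshold=0.0):
    """Keep features whose population variance exceeds `threshold` (drops constants)."""
    return {name for name, values in columns.items() if pvariance(values) > threshold}


def consensus_select(rankings, candidates, top_k, min_votes):
    """Keep candidates that at least `min_votes` methods place in their top_k."""
    votes = {name: 0 for name in candidates}
    for ranked in rankings.values():
        # Restrict each method's ranking to features that survived the variance filter.
        top = [f for f in ranked if f in candidates][:top_k]
        for f in top:
            votes[f] += 1
    return sorted(f for f, v in votes.items() if v >= min_votes)


# Hypothetical toy data: four transactions, four candidate features.
columns = {
    "txn_amount": [10.0, 250.0, 40.0, 900.0],
    "merchant_age_days": [3, 400, 12, 7],
    "is_domestic": [1, 1, 1, 1],  # constant, so the variance filter drops it
    "device_changes_24h": [0, 2, 0, 5],
}
# Hypothetical rankings (most important first), as each method might produce them.
rankings = {
    "lasso": ["device_changes_24h", "txn_amount", "merchant_age_days", "is_domestic"],
    "random_forest": ["txn_amount", "device_changes_24h", "is_domestic", "merchant_age_days"],
    "xgboost": ["device_changes_24h", "merchant_age_days", "txn_amount", "is_domestic"],
}

kept = variance_filter(columns)
selected = consensus_select(rankings, kept, top_k=2, min_votes=2)
print(selected)  # ['device_changes_24h', 'txn_amount']
```

Requiring agreement across methods is one way to hedge against any single selector's quirks, for example LASSO's tendency to keep only one of several correlated features.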

Conducted precision-recall tradeoff analysis across model configurations and presented threshold recommendations to the business team, contributing to a 58.6% reduction in blocked transactions while sustaining scam detection performance.

Python XGBoost SQL Alibaba MaxCompute
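
The precision-recall threshold analysis above can be illustrated with a small sketch: sweep candidate thresholds over model scores and, among thresholds whose recall stays at or above a floor, pick the one that blocks the fewest transactions. The scores, labels, and recall floor are made up for illustration; they are not GoPay data or the business team's actual decision rule.

```python
def pr_at_threshold(scores, labels, threshold):
    """Precision, recall, and number of blocked transactions when blocking scores >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    blocked = tp + fp
    precision = tp / blocked if blocked else 1.0  # convention when nothing is blocked
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall, blocked


def pick_threshold(scores, labels, candidates, min_recall):
    """Among thresholds meeting the recall floor, return the one that blocks the fewest transactions."""
    feasible = []
    for t in candidates:
        precision, recall, blocked = pr_at_threshold(scores, labels, t)
        if recall >= min_recall:
            # Sort key: fewest blocks first, then higher precision.
            feasible.append((blocked, -precision, t))
    return min(feasible)[2] if feasible else None


# Hypothetical model scores and scam labels (1 = scam) for eight transactions.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

for t in (0.50, 0.65, 0.85):
    p, r, blocked = pr_at_threshold(scores, labels, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f} blocked={blocked}")

print(pick_threshold(scores, labels, [0.50, 0.65, 0.85], min_recall=1.0))  # 0.65
```

In this toy case, raising the threshold from 0.50 to 0.65 cuts blocked transactions from 5 to 4 with no loss of recall; relaxing the recall floor would admit 0.85 and fewer blocks, at the cost of one missed scam.
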
Land Data Quality and Governance at Alam Sutera Realty
January 2026

Partnered with the heads of the mapping and legal divisions to establish the company's first data governance framework for land parcel data, and built an automated data health report that reduced missing data from ~80% to ~10% across a ~20,000-record database. Presented a data maturity assessment and infrastructure recommendations to company directors as part of a digital transformation initiative.

Python Leaflet.js
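
An automated health report like the one described above can start as a per-field completeness summary. The sketch below, a minimal illustration rather than the actual report, treats `None`, blank strings, and absent keys as missing; the field names (`parcel_id`, `certificate_no`, `area_m2`) and records are hypothetical placeholders, not the company's schema.

```python
def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def health_report(records, fields):
    """Return ({field: missing_rate}, overall_missing_rate) across all records."""
    n = len(records)
    report, total_missing = {}, 0
    for field in fields:
        # r.get() returns None for absent keys, so absent fields count as missing too.
        missing = sum(1 for r in records if is_missing(r.get(field)))
        total_missing += missing
        report[field] = missing / n if n else 0.0
    overall = total_missing / (n * len(fields)) if n and fields else 0.0
    return report, overall


# Hypothetical parcel records; field names are placeholders, not the real schema.
records = [
    {"parcel_id": "A-001", "certificate_no": "C-12", "area_m2": 450},
    {"parcel_id": "A-002", "certificate_no": "", "area_m2": None},
    {"parcel_id": "A-003", "certificate_no": None, "area_m2": 300},
    {"parcel_id": "A-004", "certificate_no": "C-7"},
]
report, overall = health_report(records, ["parcel_id", "certificate_no", "area_m2"])
for field, rate in report.items():
    print(f"{field:<15} {rate:.0%} missing")
print(f"{'overall':<15} {overall:.0%} missing")
```

Run on a schedule, a report like this turns "data quality" into a number per field that owners can track down over time.
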
Space Object Identification with Neural Networks (CHESTER)
May 2025

Designed CHESTER, a hierarchical TensorFlow neural network that combines a feedforward State Model with a Bidirectional LSTM to classify space objects as payloads, rocket bodies, or debris. Trained on Space-Track.org orbital elements, using SMOTE to oversample underrepresented classes and a staged freeze-and-fine-tune procedure, achieving 93.66% test accuracy across the three object categories.

View repository →

Python TensorFlow
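
SMOTE, mentioned above, synthesizes minority-class samples by interpolating between a minority point and one of its nearest minority-class neighbors. The pure-Python sketch below shows only that core idea on toy 2-D points; in practice one would likely use a library implementation such as imbalanced-learn's `SMOTE`, and the points and parameters here are hypothetical, not CHESTER's features or settings.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples, SMOTE-style.

    Each sample lies on the segment between a random minority point and one of
    its k nearest minority-class neighbors (Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (nb - b) for b, nb in zip(base, neighbor)))
    return synthetic


# Hypothetical 2-D minority-class points (e.g., two scaled orbital features).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=5, k=2)
for point in new_points:
    print(tuple(round(v, 3) for v in point))
```

Because every synthetic point is a convex combination of two real minority points, oversampling stays inside the minority class's region instead of duplicating rows verbatim.
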
From Wearable Data to Actionable Stress Insights
May 2025

Built an interactive Streamlit visualization tool that combines wearable biometric signals with user-annotated stress events to help users identify personal stress patterns, using Pandas for data cleaning and Altair for interactive time-series visualizations.

View repository →

Python Streamlit Altair
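
Combining biometric samples with annotated stress events is essentially an interval join. The plain-Python sketch below flags each sample that falls inside any annotated window and compares mean heart rate inside versus outside; the timestamps, heart-rate values, and window are hypothetical, and with Pandas this would more likely be a vectorized interval lookup (for example via `pandas.IntervalIndex`).

```python
from datetime import datetime, timedelta
from statistics import fmean


def label_stress(samples, events):
    """Flag each (timestamp, value) sample that falls inside any [start, end] stress window."""
    return [
        (ts, value, any(start <= ts <= end for start, end in events))
        for ts, value in samples
    ]


def compare_means(labeled):
    """Mean value inside vs. outside stress windows (None if a group is empty)."""
    inside = [v for _, v, flag in labeled if flag]
    outside = [v for _, v, flag in labeled if not flag]
    return (fmean(inside) if inside else None, fmean(outside) if outside else None)


# Hypothetical heart-rate samples every 5 minutes and one annotated stress window.
t0 = datetime(2025, 3, 1, 9, 0)
samples = [(t0 + timedelta(minutes=5 * i), hr) for i, hr in enumerate([72, 74, 95, 101, 98, 76, 73])]
events = [(t0 + timedelta(minutes=10), t0 + timedelta(minutes=20))]

labeled = label_stress(samples, events)
print(compare_means(labeled))  # (98.0, 73.75)
```
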
Voter Turnout and Demographics in Massachusetts
December 2024

Built an end-to-end ML pipeline on 2022 Massachusetts voter data: Census block-group demographics processed with GeoPandas, LASSO feature selection (38 to 13 predictors), a tuned random forest regressor (R² = 0.86 on the test set), and SHAP analysis identifying income, language, ethnicity, and age as key drivers of turnout. Deployed as an interactive D3.js data product with choropleth maps and county-level turnout prediction.

View interactive tool →

Python scikit-learn SHAP D3.js
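
To show how LASSO prunes predictors, the sketch below implements cyclic coordinate descent for the objective (1/2n)||y - Xb||² + alpha·||b||₁, the same objective scikit-learn's `Lasso` minimizes, on standardized features and a centered target. It is a from-scratch illustration, not the project's pipeline: the toy predictors are hypothetical and chosen to be mutually orthogonal so the effect of `alpha` is easy to see.

```python
from statistics import fmean, pstdev


def standardize(col):
    """Center to mean 0 and scale to population std 1."""
    mu, sd = fmean(col), pstdev(col)
    return [(v - mu) / sd for v in col]


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(columns, y, alpha, n_iter=100):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + alpha * ||b||_1.

    Features are standardized and y is centered, so no intercept is fitted and
    each coordinate update is a single soft-threshold step.
    """
    X = [standardize(c) for c in columns]
    y_mean = fmean(y)
    resid = [v - y_mean for v in y]  # residual y - Xb, with b = 0 initially
    n, b = len(resid), [0.0] * len(X)
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(xj[i] * (resid[i] + xj[i] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, alpha)  # (1/n) * sum(xj**2) == 1 here
            delta = new_bj - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= xj[i] * delta
                b[j] = new_bj
    return b


# Hypothetical toy predictors, mutually orthogonal after centering.
names = ["strong", "irrelevant", "weak"]
strong = [1, 2, 3, 4, 5, 6, 7, 8]
irrelevant = [1, -1, -1, 1, 1, -1, -1, 1]
weak = [1, 1, -1, -1, -1, -1, 1, 1]
y = [3 * s + 0.5 * w for s, w in zip(strong, weak)]

for alpha in (0.1, 1.0):
    coefs = lasso_cd([strong, irrelevant, weak], y, alpha)
    selected_names = [name for name, c in zip(names, coefs) if c != 0.0]
    print(alpha, selected_names)
```

The L1 penalty sets coefficients exactly to zero once a feature's correlation with the residual falls below `alpha`, which is what makes LASSO usable as a selection step before a nonlinear model such as a random forest.
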
Simulating Tropical Cyclone Potential Intensity
December 2024

Modeled climate change impacts on hurricane intensity using CMIP6 data loaded via intake-esm, fitting generalized extreme value (GEV) distributions and scoring the fits with Kolmogorov-Smirnov statistics across climate scenarios. Developed the tc_potential and tc_extremes Python library modules, using Dask for large-scale parallel processing of climate data.

View repository →

Python Dask
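
GEV fitting with Kolmogorov-Smirnov scoring comes down to comparing a sample's empirical CDF against a fitted GEV CDF. The sketch below assumes the GEV parameters (location `mu`, scale `sigma`, shape `xi`) were already fitted elsewhere and computes the KS statistic in pure Python. Note that SciPy's `genextreme` uses the opposite sign convention for the shape parameter (c = -ξ). This is an illustration, not the tc_extremes code; the sample values and parameters are hypothetical.

```python
import math
from functools import partial


def gev_cdf(x, mu, sigma, xi):
    """CDF of the generalized extreme value distribution (xi = 0 gives Gumbel)."""
    s = (x - mu) / sigma
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0) or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def ks_statistic(sample, cdf):
    """Two-sided one-sample KS statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Hypothetical annual-maximum sample and two candidate parameter sets.
sample = [61.2, 58.4, 66.9, 63.1, 59.8, 70.4, 62.5, 64.8, 60.3, 67.7]
candidate_a = partial(gev_cdf, mu=61.5, sigma=3.5, xi=-0.1)
candidate_b = partial(gev_cdf, mu=55.0, sigma=3.5, xi=-0.1)
print(f"D (candidate A): {ks_statistic(sample, candidate_a):.3f}")
print(f"D (candidate B): {ks_statistic(sample, candidate_b):.3f}")
```

A smaller D means closer agreement between the fitted distribution and the data, which makes it a simple score for comparing fits across scenarios.
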
Maptitude Banking Compliance Data
June 2023

Processed a 16M+ record, 99-field FFIEC HMDA mortgage loan dataset, aggregating loan-level data to Census tract geographies and integrating multi-source CRA compliance layers to produce analysis-ready geospatial datasets for a commercial banking analytics product.

View product →

Python Maptitude
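
Aggregating loan-level records to Census tracts is essentially a group-by on the tract identifier. The sketch below assumes each record carries an 11-digit tract GEOID, a loan amount, and an `originated` flag; these field names and values are hypothetical simplifications, not the actual HMDA schema, and a production pipeline over 16M+ rows would more likely use a dataframe library.

```python
from collections import defaultdict


def aggregate_by_tract(loans):
    """Roll loan-level records up to one summary row per Census tract GEOID."""
    summary = defaultdict(lambda: {"applications": 0, "originations": 0, "originated_amount": 0})
    for loan in loans:
        row = summary[loan["tract_geoid"]]
        row["applications"] += 1
        if loan["originated"]:
            row["originations"] += 1
            row["originated_amount"] += loan["loan_amount"]
    # Derived rate column; every tract in the summary has at least one application.
    for row in summary.values():
        row["origination_rate"] = row["originations"] / row["applications"]
    return dict(summary)


# Hypothetical loan records (GEOID = 2-digit state + 3-digit county + 6-digit tract).
loans = [
    {"tract_geoid": "25025010100", "loan_amount": 350_000, "originated": True},
    {"tract_geoid": "25025010100", "loan_amount": 420_000, "originated": False},
    {"tract_geoid": "25025010100", "loan_amount": 275_000, "originated": True},
    {"tract_geoid": "25017350100", "loan_amount": 510_000, "originated": True},
]
summary = aggregate_by_tract(loans)
for geoid, row in summary.items():
    print(geoid, row)
```

Keying on the full 11-digit GEOID, rather than the tract code alone, avoids collisions between tracts that share a number in different counties.
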
Albertsons & Kroger Geographic Market Analysis
October 2022

Joined store location and demographic datasets across ~40 Albertsons and Kroger subsidiary brands to analyze the impact of the proposed merger. Radius and drive-time analyses estimated that 56% of the US mainland population lives within 5 miles of a combined-company store, and geodemographic profiling characterized store overlap across major US cities.

View blog post →

Maptitude
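
A simplified version of the radius analysis can be written with the haversine formula: sum the population of areas whose centroid lies within 5 miles of any store. The sketch assumes population is represented by area centroids (a common approximation, not necessarily how Maptitude computes it), uses a brute-force distance check instead of a spatial index, and runs on made-up coordinates and populations.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def share_within_radius(areas, stores, radius_miles=5.0):
    """Fraction of total population whose area centroid is within radius_miles of any store."""
    covered = total = 0
    for lat, lon, pop in areas:
        total += pop
        if any(haversine_miles(lat, lon, s_lat, s_lon) <= radius_miles for s_lat, s_lon in stores):
            covered += pop
    return covered / total if total else 0.0


# Hypothetical store and area centroids (lat, lon, population).
stores = [(40.0, -75.0)]
areas = [(40.0, -75.0, 1000), (40.05, -75.0, 500), (40.2, -75.0, 1500)]
share = share_within_radius(areas, stores)
print(f"{share:.0%} of population within 5 miles")  # 50% of population within 5 miles
```

Drive-time analysis replaces the straight-line distance with travel time over a road network, which this sketch does not attempt.
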
Long-term Effects of Parental Migration on Income: Evidence from Indonesia
May 2021

Econometric study using 21 years of panel data from the Indonesian Family Life Survey (IFLS; 22,000 individuals). Applied instrumental variable regression, with region-level migration rates as the instrument, to estimate the long-term causal impact of parental migration on children's income.

View full paper →

Stata
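
With one endogenous regressor and one instrument, the IV estimator reduces to the ratio beta_IV = Cov(z, y) / Cov(z, x), which equals two-stage least squares in the just-identified case. The sketch below compares it with naive OLS on simulated data that has an unobserved confounder; the data-generating process and variable roles are hypothetical, and the study's actual specification (controls, fixed effects, standard errors) is not reproduced.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Simple-regression OLS slope Cov(x, y) / Var(x)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV (same as 2SLS with one instrument): Cov(z, y) / Cov(z, x)."""
    return cov(z, y) / cov(z, x)


rng = random.Random(42)
n = 20_000
true_effect = 0.5
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument (stand-in for a regional migration rate)
u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]  # endogenous treatment
y = [true_effect * xi + 1.5 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]  # outcome

ols_est = ols_slope(x, y)
iv_est = iv_slope(z, x, y)
print(f"OLS slope: {ols_est:.3f}")  # biased upward by the confounder u
print(f"IV slope:  {iv_est:.3f}")   # close to the true effect of 0.5
```

Because the confounder raises both treatment and outcome, OLS overstates the effect (its probability limit here is about 1.07), while the instrument, which moves treatment but is unrelated to the confounder, recovers the true value.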