Open to opportunities

Working, Learning and Researching in Applied AI, Fintech Infrastructure, and Data Analytics

Three disciplines. Perscrutating, Ratiocinating.

Engineering, financial technology, and data analytics: learned at Penn State, applied across three U.S. internships, defended in code that ships.

Siddharth Shah

130+

credits at Penn State across AI, Computer Engineering, Data Sciences, IST

3

U.S. internships across enterprise analytics, fintech infrastructure, and university research

3+ years

in leadership roles. Sustained interests in mathematics, economics, geopolitics, and data analytics.

EDUCATION

Pennsylvania State University (Penn State)

B.S. Information Sciences and Technology

with a special focus in Data Sciences and a minor in Computer Engineering

Domains studied:

Machine Learning · Distributed Systems · Calculus & Linear Algebra · Probability Theory · Microeconomics & Macroeconomics · Computational Statistics · Data Structures · Database Systems · Computer Vision · Network Security · Cloud Architecture · Operating Systems · Computer Architecture · Privacy, Security and Ethics in Data Science

Sustained personal study:

Geopolitics · International Fiscal Policy and Economics · NLP and Financial Market Analytics · Calculus and Probability Theory · Writing and Rhetoric

SECTION 02 / Developing;)

Three projects taking off soon; one about to land.

Architecture committed; coding my way to accomplishments. The status badges are accurate at the moment they're read and stale by the time they're acted upon. That's the nature of work.

media · in flight

Geopolitical News-Sentiment Atlas

Where the world's attention concentrates, mapped in real time.

The Arc

  • GDELT 2.0 ingest pipeline is humming
  • Geocoding + sentiment pass — currently writing
  • First static map render for the README
  • Interactive D3 globe layer
  • Architecture write-up (LinkedIn long-form)

Target metrics ──────

  • ~12k events ingested daily
  • ~85% sentiment classifier F1 (held-out test set)
GitHub ↗
finance · in flight

Graph-AML Detection

Money-laundering detection that earns its complexity over the tabular baseline.

The Arc

  • Repo and README scaffolded
  • IBM AML synthetic dataset loaded; XGBoost baseline trained
  • Graph construction in NetworkX — first pass
  • GNN training and benchmark vs baseline
  • Reproducible notebook + writeup

Target metrics ──────

  • Beat tabular XGBoost baseline by ≥5 F1 points
  • Inference latency under 50ms per transaction graph (1000-node neighborhood)
GitHub ↗
core · in flight

LLM Audit-Chain

Cryptographic provenance for every LLM call your organisation makes.

The Arc

  • Architecture documented in README
  • Python middleware decorator
  • SQLite hash-chain logging
  • Verification CLI
  • Documentation, examples, and architecture post

Target metrics ──────

  • <2ms logging overhead per LLM call
  • Tamper detection in O(log n) with linked hashes
GitHub ↗
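
The decorator, SQLite hash-chain log, and verification step listed in the arc above could fit together roughly as follows. This is a minimal sketch, not the project's actual code: the `audit_log` table, its columns, the all-zero genesis hash, and the names `open_log`, `append_entry`, `audited`, and `verify_chain` are all assumptions made for illustration. Verification here walks every row, so it is linear in the length of the log and does not attempt the O(log n) target.

```python
import functools
import hashlib
import json
import sqlite3
import time

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def open_log(path=":memory:"):
    """Open the (hypothetical) audit_log table, creating it if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "prev_hash TEXT NOT NULL, "
        "entry_hash TEXT NOT NULL)"
    )
    return conn


def _digest(prev_hash, payload):
    """SHA-256 over the previous entry's hash followed by this payload."""
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(conn, record):
    """Append one record, linking it to the hash of the latest entry."""
    payload = json.dumps(record, sort_keys=True, default=str)
    row = conn.execute(
        "SELECT entry_hash FROM audit_log ORDER BY id DESC LIMIT 1"
    ).fetchone()
    prev_hash = row[0] if row else GENESIS
    conn.execute(
        "INSERT INTO audit_log (payload, prev_hash, entry_hash) VALUES (?, ?, ?)",
        (payload, prev_hash, _digest(prev_hash, payload)),
    )
    conn.commit()


def audited(conn):
    """Decorator factory: log the prompt and response of each wrapped call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(prompt, *args, **kwargs):
            response = fn(prompt, *args, **kwargs)
            append_entry(conn, {
                "fn": fn.__name__,
                "prompt": prompt,
                "response": response,
                "ts": time.time(),
            })
            return response
        return wrapper
    return decorator


def verify_chain(conn):
    """Return the id of the first inconsistent entry, or None if intact."""
    expected_prev = GENESIS
    rows = conn.execute(
        "SELECT id, payload, prev_hash, entry_hash FROM audit_log ORDER BY id"
    )
    for row_id, payload, prev_hash, entry_hash in rows:
        if prev_hash != expected_prev or _digest(prev_hash, payload) != entry_hash:
            return row_id
        expected_prev = entry_hash
    return None
```

Wrapping any function that takes a prompt and returns a response with `@audited(conn)` records one row per call; editing a stored payload afterwards makes `verify_chain(conn)` return that row's id, because the recomputed digest no longer matches the stored `entry_hash`.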
finance · in flight

Complalg

An AI-native Compliance Control Room — file Reg-M, surveil trades, and stay audit-ready in one loop.

The Arc

  • Three months of user research with compliance officers and control-room operators
  • Email → Copilot → Reg-M → autofill TN loop shipped end-to-end
  • Chrome MV3 extension live; 14+ FINRA TN fields autofill with preview / commit / undo
  • Demo to a 20+-year industry insider — partnership and pilot trials offered
  • Applied to Y Combinator — Winter 2026 batch
  • Wiring Restricted/Watch List objects into the same Copilot recipes
  • Private pilot with first design partners — January 15-31, 2026
  • Surveillance brief module + Syndicate & Position tracker (greenshoe, CNS FTD close-outs)

Target metrics ──────

  • Minutes-not-hours per RPN/NOI/TN filing — measured from mandate email to filed form with audit trail
  • False-positive noise in surveillance briefs collapsed into a clearable morning brief
  • Field-level provenance: every autofilled value bound to source data, undoable in one click
GitHub ↗

SECTION 03 / The Slate

core

Multi-Agent Research Synthesiser

Three coordinated agents — searcher, critic, writer — that produce a defensible literature review.

Synthesises a ≥10-paper review in <5 min

GH ↗
finance

FinTech Filings Fine-Tune

A 7B model that matches GPT-4 on regulatory-filing question answering.

89% F1 on 500-query held-out set

GH ↗
core

Edge AI Benchmarking Leaderboard

Honest tokens-per-second-per-watt numbers for sub-7B models on consumer hardware.

≥5 models on the public leaderboard

GH ↗
finance

Quant Strategy Replication

Six published trading strategies, replicated. Most are paper tigers.

documented survivor count after frictions

GH ↗
finance

ESG Earnings-Call NLP

What S&P 500 firms say about climate vs. what they spend on it.

quantified talk-walk gap, sectoral cut

GH ↗
media

Parliamentary Speech Tracker

Lok Sabha debates, indexed and searchable for journalists.

Aiming for 1,000+ hours of speech indexed

GH ↗
media

Source-Credibility Graph

When two newspapers cite each other, who sources whom?

Aiming for 50+ outlets in the citation graph

GH ↗
core

Predictive Cardio Health

Supervised ML flagging early cardiovascular risk from vitals/labs — Flask REST service with automated Python+SQL ingestion, validation, and feature pipelines.

~85% classification accuracy · cross-validation + HP tuning for generalisation

GH ↗
core

AI Forensic Image Analysis

CNN for object/trace detection hardened with error-level analysis, augmentation, and noise-robust preprocessing. Led a 10-person build to a notebook → report workflow analysts could re-run.

~87% detection accuracy · packaged for repeatable analyst handoff

GH · TBD

SECTION 04 / The Foundation

What the credit hours actually built.

Numerous courses. I am filling this list deliberately, with only the ones whose work outlived the semester.

SECTION 05 / Journey Onward

The leadership tracks, and the recognitions that came with showing up.

A measured pause here. The roles I've held and the recognitions worth listing are deliberately unmentioned until I've had a chance to write them in my own voice.

SECTION 06 / Mind & Margin

I read, I write, I observe; I thrive.

My mind reasons, my heart expresses, and I put it in letters. Few live; more wait. On McCandless and Rousseau. On India's fiscal architecture.

Read them at Mind & Margin →

Get In Touch

Coffee, then code. In that order.

Reach out, let's talk. The work is the part I'm here for; everything else is logistics.

Looking for opportunities in applied AI engineering, financial-data infrastructure, and analytical journalism. I can join immediately; remote, hybrid, or on-site after relocation all work. The point is to learn fast, ship things that defend themselves, and find the rooms where the next problems get framed.

Siddharth Shah · © 2026