AI · ML Engineer · M.S. CS · GPA 4.0 · RANLP 2025 Co-Author · California, US
Shazan Ansar Mohammed

Building intelligent systems that work in the real world. End-to-end ML pipelines, multilingual NLP, and production-grade data platforms.

32%
BLEU Score Improvement
40%
Data Latency Reduction
4.0
GPA · M.S. CS
40+
Projects Built
Python · PyTorch · AWS · Apache Spark · Kafka · Airflow · Delta Lake · HuggingFace · FastAPI · Docker · MLflow · LangChain · FAISS · SQL · TensorFlow · scikit-learn · Pandas · Tableau · NLLB-200 · mBART · SageMaker · ECS · Grafana · dbt · Kubernetes

About

Engineer. Researcher. Builder.

I'm a Data and Machine Learning Engineer based in California, specializing in end-to-end ML pipelines, multilingual NLP systems, and production-grade data platforms.

Currently completing my M.S. in Computer Science at Cal State Dominguez Hills (GPA: 4.0), where I co-authored a peer-reviewed paper at RANLP 2025 on English-Nepali clinical machine translation.

Actively seeking full-time roles in AI Engineering, ML Engineering, Data Science, or Data Engineering — anywhere I can ship systems with real impact.

Education
M.S. Computer Science
Cal State Dominguez Hills · 2024–Present
GPA: 4.0 / 4.0
Location
California, United States
Contact
shazanansar@gmail.com
+1 747-352-9855
Open to Roles
AI Engineer · ML Engineer · Data Scientist · Data Engineer · MLOps Engineer · Applied Scientist · DevOps Engineer · NLP Researcher · Future PhD Candidate

Experience

Where I've Worked

California State University Dominguez Hills
Current · Research Assistant (Survey Paper) · Jan 2026 – Present
  • Leading a comprehensive survey on the impact of LLMs on neural machine translation (ongoing research)
  • Synthesizing 100+ papers across multilingual NLP, LLM-based NMT, and low-resource translation
  • Analyzing transformer architectures, fine-tuning strategies, and evaluation frameworks
  • Collaborating with faculty advisors on structured literature review methodology
PyTorch · HuggingFace · mBART · NLLB-200 · Python · LaTeX
California State University Dominguez Hills
Past · Research Assistant / AI & ML Engineer · Feb 2025 – Aug 2025
  • Designed a medical-domain translation pipeline using mBART and NLLB-200 (32% BLEU score improvement)
  • Curated and cleaned a domain-specific corpus (70,000+ entries processed)
  • Built visualization dashboards for BLEU, METEOR, BERTScore, and COMET evaluation
  • Conducted error profiling and ethical assessment for low-resource NMT in clinical contexts
  • Co-authored a peer-reviewed paper accepted at RANLP 2025
PyTorch · mBART · NLLB-200 · AWS · Python · HuggingFace
Code for India Foundation
Past · Data Science Intern · Mar 2023 – Nov 2023
  • Developed and evaluated sentiment analysis models (15% accuracy improvement)
  • Designed EDA workflows to uncover trends, anomalies, and data quality issues
  • Built automated data preprocessing pipelines (20% latency reduction)
  • Translated analytical insights into actionable recommendations for stakeholders
Python · SQL · scikit-learn · Pandas · Tableau
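The corpus curation mentioned in the Research Assistant role (70,000+ entries) can be illustrated with a minimal filtering sketch. It is not the exact cleaning procedure used in that work. It assumes the corpus is a list of (English, Nepali) sentence pairs and applies common heuristics: Unicode normalization, dropping empty pairs and exact duplicates, and a token length-ratio filter. The function name `clean_parallel_corpus` and its thresholds are hypothetical.

```python
import unicodedata


def clean_parallel_corpus(pairs, max_len_ratio=3.0, max_tokens=200):
    """Filter (source, target) sentence pairs with simple heuristics.

    Hypothetical helper: the thresholds are illustrative, not tuned values.
    """
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        # Normalize Unicode to NFC and collapse runs of whitespace.
        src = " ".join(unicodedata.normalize("NFC", src).split())
        tgt = " ".join(unicodedata.normalize("NFC", tgt).split())
        # Drop pairs where either side is empty after normalization.
        if not src or not tgt:
            continue
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop overly long segments.
        if max(n_src, n_tgt) > max_tokens:
            continue
        # Drop pairs whose token counts differ too much (likely misaligned).
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_len_ratio:
            continue
        # Drop exact duplicate pairs, keeping the first occurrence.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned


pairs = [
    ("The patient has a fever.", "बिरामीलाई ज्वरो छ।"),
    ("The patient has a fever.", "बिरामीलाई ज्वरो छ।"),  # duplicate pair
    ("", "खाली"),  # empty source side
    ("Take  one tablet daily.", "दिनमा एक चक्की लिनुहोस्।"),  # extra space
]
print(clean_parallel_corpus(pairs))  # keeps the first and last pairs
```

Whitespace token counts are a rough proxy for length. A production pipeline would more likely compare subword counts from the model's own tokenizer.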

Projects

What I've Built

40 projects

Data Engineering
Real-Time Data Lakehouse for Analytics & ML
Medallion-style lakehouse enabling near real-time processing and reducing data latency.
40% latency reduction
Apache Spark · Delta Lake · AWS S3 · Airflow · Python
RAG & LLM
LLM-Powered Analytics Assistant
RAG pipeline for accurate analytics Q&A with automated evaluation and cost optimization.
35% latency reduction
LangChain · OpenAI · Vector DB · AWS · Python
MLOps
End-to-End ML Experimentation Framework
Reproducible ML framework with data versioning, config management, and metric tracking.
MLflow · PyTorch · Docker · AWS · Python
RAG & LLM
RAG System for Domain Knowledge
Production-grade RAG pipeline combining fine-tuned LLMs with vector-based retrieval.
30% higher answer precision
Python · FAISS · HuggingFace · FastAPI
NLP
Multilingual LLM Fine-Tuning & Evaluation Platform
Scalable fine-tuning pipelines for multilingual transformers with automated evaluation.
32% BLEU improvement
PyTorch · mBART · NLLB-200 · HuggingFace · AWS
MLOps
LLM Monitoring, Reliability & Cost Optimization
Monitoring pipelines tracking latency, token usage, and failure rates across LLM workloads.
45% faster detection
AWS CloudWatch · Python · Grafana · Docker
RAG & LLM
Secure Enterprise LLM API & Guardrails
LLM inference API with auth, rate limiting, and policy-based output guardrails.
FastAPI · Python · Docker · AWS Lambda
MLOps
Automated Model Retraining & Drift Pipeline
MLOps pipeline detecting data drift and triggering controlled model retraining.
40% fewer degradations
Airflow · Python · Docker · AWS · scikit-learn
NLP
End-to-End ML Pipeline for Multilingual NLP
Full pipeline from data ingestion through training, evaluation, and error analysis.
40% faster iterations
PyTorch · Python · AWS · HuggingFace
Real-Time Systems
Streaming Feature Engineering Platform
Real-time feature pipelines with versioning and validation for ML inference.
35% fewer drift incidents
Kafka · Apache Spark · Python · AWS S3
MLOps
ML Monitoring, Cost & Reliability Framework
Dashboards for ML pipeline health, model performance, and compute cost.
20% cost savings
Python · AWS CloudWatch · Tableau · Grafana
Data Engineering
Enterprise Feature Engineering & Analytics Platform
ETL/ELT pipelines transforming raw operational data into ML-ready feature tables.
25% cost reduction
Apache Spark · PySpark · Airflow · AWS S3 · SQL
Cloud
Cloud-Native Cost & Observability Pipeline
Pipelines aggregating cloud usage metrics for centralized cost and reliability visibility.
50% faster MTTD
AWS · Spark · Python · CloudWatch
Data Engineering
Enterprise Medallion Data Lake
Bronze/Silver/Gold architecture on S3 for analytics and ML workloads at scale.
35% faster analytics
AWS S3 · Apache Spark · Delta Lake · Airflow · SQL
Analytics
Experimentation Framework for NLP Model Selection
Statistically rigorous framework for comparing NLP model variants reproducibly.
35% fewer false promotions
Python · SQL · scipy · MLflow
Analytics
SHAP-Based Feature Impact Analysis
Feature impact analysis to quantify and validate ML feature contributions.
20% accuracy improvement
Python · SHAP · scikit-learn · Pandas
MLOps
Data Quality, Bias & Reliability Platform
Pipelines monitoring quality, schema changes, and subgroup fairness across ML datasets.
40% faster detection
Python · Great Expectations · SQL · Airflow
Analytics
KPI-Driven Decision Analytics for NLP
Business-aligned KPIs bridging offline ML metrics with real-world operational impact.
Python · Tableau · SQL
Cloud
AWS VPC Multi-Tier Network Architecture
Complete VPC with public/private subnets, NAT gateways, route tables, and security groups.
VPC · Subnets · NAT Gateway · Security Groups · Route Tables
Cloud
EC2 Production Web Server Deployment
Launched and configured an EC2 instance with Apache, an SSH key pair, and an Elastic IP.
EC2 · Apache · SSH · Elastic IP · IAM
Cloud
S3 Static Site with CloudFront CDN
Static site deployed to S3 with CloudFront distribution, custom domain, and HTTPS via ACM.
S3 · CloudFront · ACM · Route 53
Cloud
IAM Least Privilege Security Setup
IAM users, groups, roles, and policies following AWS least-privilege best practices.
IAM · AWS CLI · MFA · SCP
Cloud
Aurora MySQL Multi-AZ Database
Aurora MySQL with read replicas, multi-AZ failover, and automated snapshots.
Aurora MySQL · RDS · Multi-AZ · VPC
Cloud
DynamoDB NoSQL Schema Design & Queries
DynamoDB tables with partition/sort keys, GSIs, and optimized query patterns.
DynamoDB · GSI · PartiQL · Lambda · SDK
Cloud
AWS Lambda Serverless Functions
Event-driven Lambda functions with S3 triggers, API Gateway, and CloudWatch logging.
Lambda · API Gateway · S3 · CloudWatch · Python
Cloud
Auto Scaling Groups & Load Balancer
ALB with target groups, health checks, and ASG policies for elastic compute scaling.
ALB · Auto Scaling · Launch Templates · EC2
Cloud
CloudFormation Infrastructure as Code
Complete AWS stack templated with CloudFormation including nested stacks.
CloudFormation · YAML · Nested Stacks · Parameters
Cloud
CI/CD Pipeline with CodePipeline & CodeBuild
End-to-end CI/CD from GitHub to EC2/ECS using AWS native DevOps tools.
CodePipeline · CodeBuild · CodeDeploy · GitHub
Cloud
Containerized App on ECS Fargate
Dockerized web app deployed to ECS Fargate with ECR and service discovery.
ECS Fargate · ECR · Docker · ALB
Cloud
VPC Peering & Transit Gateway
Connected multiple VPCs across accounts with Transit Gateway for centralized routing.
VPC Peering · Transit Gateway · CIDR · Route Tables
Cloud
SNS & SQS Event-Driven Architecture
Decoupled microservices using SNS fan-out and SQS queues with dead-letter queue handling.
SNS · SQS · DLQ · Lambda · EventBridge
Cloud
CloudWatch Dashboards & Alarms
Custom dashboards, metric alarms, Logs Insights queries, and SNS notifications.
CloudWatch · Metrics · Logs Insights · SNS
Data Engineering
AWS Glue ETL Data Pipeline
Serverless ETL with Glue crawlers, Data Catalog, and Spark jobs writing to S3.
AWS Glue · Spark · S3 · Athena · Data Catalog
Real-Time Systems
Kinesis Real-Time Data Ingestion
Real-time clickstream ingestion with Kinesis Data Streams, Firehose, and Lambda processing.
Kinesis · Firehose · Lambda · S3 · Athena
Analytics
Athena + S3 Serverless Analytics
Queried partitioned S3 data lake with Athena and cost-optimized columnar formats.
Athena · S3 · Parquet · Glue Catalog · SQL
Analytics
QuickSight Business Intelligence Dashboard
Interactive QuickSight dashboards from RDS/S3 with SPICE for sub-second querying.
QuickSight · SPICE · RDS · S3
Cloud
AWS WAF & Shield Security Hardening
WAF rules, managed rule groups, Shield Standard, and IP reputation filtering.
WAF · Shield · CloudFront · ALB
Cloud
Secrets Manager & Parameter Store Integration
Rotated RDS credentials with Secrets Manager and injected configuration into Lambda via Parameter Store.
Secrets Manager · Parameter Store · Lambda · IAM
MLOps
Step Functions Workflow Orchestration
Multi-step ML data prep workflow with error handling, retries, and parallel states.
Step Functions · Lambda · S3 · DynamoDB
MLOps
SageMaker Model Training & Endpoint
Trained and deployed an ML model using SageMaker with hyperparameter tuning.
SageMaker · S3 · IAM · Boto3 · Python
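Several projects above detect distribution shift before acting on it, notably the drift and retraining pipeline and the data quality platform. The Population Stability Index (PSI) is one common statistic for this. The sketch below is a stdlib-only illustration, not the logic those projects actually use. It assumes numeric feature values and equal-width bins over the reference sample. The helper `should_retrain` and its 0.2 threshold are hypothetical.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index (PSI) between a reference and a current sample.

    Bins are equal-width over the reference sample's range; current values
    outside that range are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant data

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # eps keeps the logarithm defined when a bin is empty.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = proportions(expected), proportions(actual)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def should_retrain(expected, actual, threshold=0.2):
    """Hypothetical trigger: request retraining when PSI exceeds the threshold.

    0.2 is a widely used rule of thumb, adopted here as an assumption.
    """
    return population_stability_index(expected, actual) > threshold


reference = [i / 100 for i in range(100)]      # training-time feature values
shifted = [0.5 + i / 200 for i in range(100)]  # current values, shifted upward
print(should_retrain(reference, reference))  # False: identical distributions
print(should_retrain(reference, shifted))    # True: mass moved to upper bins
```

Every PSI term has the form (c - r) * ln(c / r), which is never negative. The score is therefore zero only when the binned distributions match, and it grows as probability mass moves between bins.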

Skills

Technical Expertise

Programming & Data
Python · SQL · Java · PySpark · Bash · R
ML & AI
PyTorch · TensorFlow · scikit-learn · NLP · Transformer Models · RAG · LLM Fine-tuning · Embeddings · Prompt Engineering · BERTScore · COMET
Data Engineering
Apache Spark · Kafka · Airflow · Delta Lake · Medallion Architecture · ETL / ELT · Feature Pipelines · Streaming Systems
Cloud & MLOps
AWS (EC2, S3, Lambda, EMR, Glue) · GCP (Vertex AI, BigQuery) · Docker · CI/CD · Model Serving · Drift Detection · Model Monitoring
Analytics & Viz
Pandas · NumPy · Tableau · A/B Testing · Experiment Tracking · Dashboard Design · Metric Design

Proficiency

Python / PySpark
95%
AWS Cloud
90%
PyTorch / ML
88%
Data Engineering
85%
NLP / LLMs
87%
SQL / Analytics
92%
MLOps / DevOps
80%
Docker / CI-CD
78%

Research

Peer-Reviewed Paper

RANLP 2025 · Recent Advances in Natural Language Processing
2025

Advancing Clinical Translation in Nepali through Fine-Tuned Multilingual Models

Benyamin Ahmadnia, Sumaiya Shaikh, Bibek Poudel, Shazan Ansar, Sahar Hooshmand
Department of Computer Science, California State University, Dominguez Hills, Carson, USA

Low-resource Neural Machine Translation (NMT) remains a major challenge in high-stakes domains such as healthcare. This paper presents a domain-adapted pipeline for English-Nepali medical translation leveraging mBART and NLLB-200. Translation fidelity is assessed through BLEU, CHRF++, METEOR, BERTScore, COMET, and perplexity. NLLB-200 consistently outperforms mBART, achieving higher accuracy and lower hallucination rates in clinical settings.

Key Contributions
1. A Nepali-English parallel corpus for the medical domain, compiled from diverse domain-specific sources
2. Fine-tuned mBART and NLLB-200, evaluated in a unified framework across lexical and semantic metrics
3. Error analysis, hallucination detection, and ethical assessment of domain-specific term accuracy
NLLB-200 outperforms mBART, with lower hallucination rates in clinical settings
Clinical NLP · Low-Resource NMT · mBART · NLLB-200 · English-Nepali · Healthcare AI
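BLEU, the first metric in the evaluation above, scores clipped n-gram overlap with the reference and applies a brevity penalty. The sketch below computes corpus-level BLEU under simplifying assumptions: whitespace tokenization, one reference per sentence, and no smoothing. Published scores are normally produced with standardized tooling such as sacreBLEU, so values from this simplified version will not match reported numbers exactly. The function names are illustrative.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    Simplifications (assumptions): whitespace tokenization and no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clipped matches: a hypothesis n-gram is credited at most as many
            # times as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    # Geometric mean of the modified n-gram precisions.
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)


hyps = ["the patient has a high fever today"]
refs = ["the patient has a fever"]
print(round(corpus_bleu(hyps, refs), 1))  # 43.5
```

Match counts and lengths are summed over the whole corpus before the precisions and brevity penalty are computed. For that reason, corpus BLEU is generally not the same as the average of sentence-level BLEU scores.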
First Page Preview
Proceedings of RANLP 2025, Varna, Bulgaria
Advancing Clinical Translation in Nepali through Fine-Tuned Multilingual Models
Benyamin Ahmadnia, Sumaiya Shaikh, Bibek Poudel, Shazan Ansar, Sahar Hooshmand
Dept. of Computer Science, CSUDH, Carson, USA
Abstract
Low-resource Neural Machine Translation (NMT) remains a major challenge in high-stakes domains such as healthcare. This paper presents a domain-adapted pipeline for English-Nepali medical translation leveraging mBART and NLLB-200. NLLB-200 consistently outperforms mBART across BLEU, CHRF++, METEOR, BERTScore, COMET, and perplexity...
1 Introduction
NMT has brought significant advancements, offering more fluent and accurate translations. Nepali, a low-resource language, presents unique challenges particularly in specialized domains such as healthcare...

Certifications

Verified Credentials

AWS Certified Solutions Architect – Associate
Amazon Web Services · 2024
AWS Certified Cloud Practitioner
Amazon Web Services · 2024
Google Data Analytics Certificate
Google · 2023

Contact

Let's Connect

Open to full-time roles, research collaborations, and conversations about AI, ML, and data engineering. Based in California; available for remote or on-site work.

Shazan Ansar Mohammed
California, US
Thesis Research Survey

Conducting a survey for my thesis on NLP, AI, curated agents, and AI in finance. Share your expert knowledge and insights. Your input shapes real research.

Share Your Knowledge →
LinkedIn · shazan-ansar
GitHub · SHAZAN01
Email · shazanansar@gmail.com
Medium · @shazanansar
Instagram · shazan_ansar
Discord · insightshazan
Shazan Ansar Mohammed
AI · ML · Data Engineer · California, US
Open to Work · Full Time