Shazan Ansar Mohammed — AI/ML & Data Engineer

Python · PyTorch · AWS · Apache Spark · Kafka · Airflow · Delta Lake · HuggingFace · FastAPI · Docker · MLflow · LangChain · FAISS · SQL · TensorFlow · scikit-learn · Pandas · Tableau · NLLB-200 · mBART · SageMaker · ECS · Grafana · dbt · Kubernetes

About

Engineer. Researcher. Builder.

I'm a Data and Machine Learning Engineer based in California, specializing in end-to-end ML pipelines, multilingual NLP systems, and production-grade data platforms.

Currently completing my M.S. in Computer Science at Cal State Dominguez Hills (GPA: 4.0), where I co-authored a peer-reviewed paper on clinical machine translation, accepted at RANLP 2025.

Actively seeking full-time roles in AI Engineering, ML Engineering, Data Science, or Data Engineering — anywhere I can ship systems with real impact.

Education

M.S. Computer Science

Cal State Dominguez Hills · 2024–Present

GPA: 4.0 / 4.0

Location

California, United States

Contact

shazanansar@gmail.com

+1 747-352-9855

Open to Roles

AI Engineer · ML Engineer · Data Scientist · Data Engineer · MLOps Engineer · Applied Scientist · DevOps Engineer · NLP Researcher · Future PhD Candidate

Experience

Where I've Worked

California State University Dominguez Hills

Current · Research Assistant, Survey Paper · Jan 2026 – Present

CSUDH Research ↗

› Leading a comprehensive survey on the impact of LLMs on neural machine translation (ongoing research)
› Synthesizing 100+ papers across multilingual NLP, LLM-based NMT, and low-resource translation
› Analyzing transformer architectures, fine-tuning strategies, and evaluation frameworks
› Collaborating with faculty advisors on structured literature review methodology

PyTorch · HuggingFace · mBART · NLLB-200 · Python · LaTeX

California State University Dominguez Hills

Past · Research Assistant / AI & ML Engineer · Feb 2025 – Aug 2025

RANLP 2025 Paper ↗

› Designed a medical-domain translation pipeline using mBART and NLLB-200 (32% BLEU score improvement)
› Curated and cleaned a domain-specific corpus (70,000+ entries processed)
› Built visualization dashboards for BLEU, METEOR, BERTScore, and COMET evaluation
› Conducted error profiling and ethical assessment for low-resource NMT in clinical contexts
› Co-authored a peer-reviewed paper accepted at RANLP 2025

PyTorch · mBART · NLLB-200 · AWS · Python · HuggingFace

Code for India Foundation

Past · Data Science Intern · Mar 2023 – Nov 2023

› Developed and evaluated sentiment analysis models (15% accuracy improvement)
› Designed EDA workflows to uncover trends, anomalies, and data quality issues
› Built automated data preprocessing pipelines (20% latency reduction)
› Translated analytical insights into actionable recommendations for stakeholders

Python · SQL · scikit-learn · Pandas · Tableau

Projects

What I've Built

40 projects

Data Engineering

Real-Time Data Lakehouse for Analytics & ML

Medallion-style lakehouse enabling near real-time processing and reducing data latency.

↑ 40% latency reduction

Apache Spark · Delta Lake · AWS S3 · Airflow · Python

RAG & LLM

LLM-Powered Analytics Assistant

RAG pipeline for accurate analytics Q&A with automated evaluation and cost optimization.

↑ 35% latency reduction

LangChain · OpenAI · Vector DB · AWS · Python

MLOps

End-to-End ML Experimentation Framework

Reproducible ML framework with data versioning, config management, and metric tracking.

MLflow · PyTorch · Docker · AWS · Python

RAG & LLM

RAG System for Domain Knowledge

Production-grade RAG pipeline combining fine-tuned LLMs with vector-based retrieval.

↑ 30% answer precision

Python · FAISS · HuggingFace · FastAPI
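
To illustrate the vector-based retrieval step that a RAG pipeline like this relies on, here is a minimal sketch. It is not the project's code: the toy three-dimensional vectors, the document IDs, and the function names `cosine` and `top_k` are assumptions made for illustration, and a real system would use learned embeddings with an index such as FAISS rather than a brute-force scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, index, k=2):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical toy index: document id -> embedding vector.
toy_index = {
    "dosage": [0.9, 0.1, 0.0],
    "billing": [0.0, 0.2, 0.9],
    "symptoms": [0.7, 0.6, 0.1],
}

retrieved = top_k([1.0, 0.0, 0.0], toy_index, k=2)
```

The retrieved documents would then be passed to the language model as context for answer generation.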

NLP

Multilingual LLM Fine-Tuning & Evaluation Platform

Scalable fine-tuning pipelines for multilingual transformers with automated evaluation.

↑ 32% BLEU improvement

PyTorch · mBART · NLLB-200 · HuggingFace · AWS

MLOps

LLM Monitoring, Reliability & Cost Optimization

Monitoring pipelines for latency, token usage, failure rates across LLM workloads.

↑ 45% faster detection

AWS CloudWatch · Python · Grafana · Docker

RAG & LLM

Secure Enterprise LLM API & Guardrails

LLM inference API with auth, rate limiting, and policy-based output guardrails.

FastAPI · Python · Docker · AWS Lambda
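
Rate limiting for an inference API is often done with a token bucket. The sketch below shows that technique using only the standard library; it is an assumption about one reasonable approach, not the project's implementation. The class name `TokenBucket`, its parameters, and the injectable `clock` are hypothetical.

```python
import time


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable so tests can control time
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Example: at most 2 requests in a burst, refilled at 1 request per second.
limiter = TokenBucket(rate=1.0, capacity=2)
first_allowed = limiter.allow()
```

In an API, one bucket would typically be kept per authenticated client, and a rejected call would map to an HTTP 429 response.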

MLOps

Automated Model Retraining & Drift Pipeline

MLOps pipeline detecting data drift and triggering controlled model retraining.

↑ 40% fewer degradations

Airflow · Python · Docker · AWS · scikit-learn
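
One common way to detect data drift is the Population Stability Index (PSI), which compares how a feature is distributed in a reference window against a current window. The sketch below is illustrative only and is not taken from the project: the function names, the equal-width binning, and the 0.2 retraining threshold (a widely used rule of thumb) are all assumptions.

```python
import math


def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        eps = 1e-6  # smoothing so empty bins do not produce log(0)
        return [(c + eps) / (len(values) + bins * eps) for c in counts]

    ref_frac = fractions(reference)
    cur_frac = fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


def should_retrain(reference, current, threshold=0.2):
    """Flag retraining when drift exceeds the (assumed) PSI threshold."""
    return psi(reference, current) > threshold


baseline = [float(i) for i in range(100)]
shifted = [v + 50.0 for v in baseline]
drift_detected = should_retrain(baseline, shifted)
```

In an orchestrated pipeline, a check like this would usually run as a scheduled task whose result gates the retraining job.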

NLP

End-to-End ML Pipeline for Multilingual NLP

Full pipeline from data ingestion through training, evaluation, and error analysis.

↑ 40% faster iterations

PyTorch · Python · AWS · HuggingFace

Real-Time Systems

Streaming Feature Engineering Platform

Real-time feature pipelines with versioning and validation for ML inference.

↑ 35% fewer drift incidents

Kafka · Apache Spark · Python · AWS S3

MLOps

ML Monitoring, Cost & Reliability Framework

Dashboards for ML pipeline health, model performance, and compute cost.

↑ 20% cost savings

Python · AWS CloudWatch · Tableau · Grafana

Data Engineering

Enterprise Feature Engineering & Analytics Platform

ETL/ELT pipelines transforming raw operational data into ML-ready feature tables.

↑ 25% cost reduction

Apache Spark · PySpark · Airflow · AWS S3 · SQL

Cloud

Cloud-Native Cost & Observability Pipeline

Pipelines aggregating cloud usage metrics for centralized cost and reliability visibility.

↑ 50% faster MTTD

AWS · Spark · Python · CloudWatch

Data Engineering

Enterprise Medallion Data Lake

Bronze/Silver/Gold architecture on S3 for analytics and ML workloads at scale.

↑ 35% faster analytics

AWS S3 · Apache Spark · Delta Lake · Airflow · SQL

Analytics

Experimentation Framework for NLP Model Selection

Statistically rigorous framework for comparing NLP model variants reproducibly.

↑ 35% fewer false promotions

Python · SQL · scipy · MLflow
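
A standard way to compare two NLP model variants on the same test set is paired bootstrap resampling. The sketch below illustrates that technique with the standard library only; it is an assumption about one suitable method, not the framework's actual code, and the function name, resample count, and example scores are hypothetical.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Fraction of bootstrap resamples in which model B beats model A.

    scores_a and scores_b are per-example scores on the same test items.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same examples")
    rng = random.Random(seed)  # fixed seed keeps the comparison reproducible
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    wins = 0
    for _ in range(n_resamples):
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) > 0:
            wins += 1
    return wins / n_resamples


# Hypothetical per-sentence scores for a baseline and a candidate model.
baseline_scores = [0.41, 0.55, 0.38, 0.62, 0.47, 0.51]
candidate_scores = [0.44, 0.53, 0.45, 0.66, 0.49, 0.58]
win_rate = paired_bootstrap(baseline_scores, candidate_scores)
```

A win rate close to 1.0 suggests the candidate's advantage is stable across resamples; a promotion rule might require it to exceed a preset level such as 0.95.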

Analytics

SHAP-Based Feature Impact Analysis

Feature impact analysis to quantify and validate ML feature contributions.

↑ 20% accuracy improvement

Python · SHAP · scikit-learn · Pandas

MLOps

Data Quality, Bias & Reliability Platform

Pipelines monitoring quality, schema changes, and subgroup fairness across ML datasets.

↑ 40% faster detection

Python · Great Expectations · SQL · Airflow

Analytics

KPI-Driven Decision Analytics for NLP

Business-aligned KPIs bridging offline ML metrics with real-world operational impact.

Python · Tableau · SQL

Cloud

AWS VPC Multi-Tier Network Architecture

Complete VPC with public/private subnets, NAT gateways, route tables, and security groups.

VPC · Subnets · NAT Gateway · Security Groups · Route Tables

Cloud

EC2 Production Web Server Deployment

Launched and configured EC2 with Apache, SSH key pair, and Elastic IP.

EC2 · Apache · SSH · Elastic IP · IAM

Cloud

S3 Static Site with CloudFront CDN

Static site deployed to S3 with CloudFront distribution, custom domain, and HTTPS via ACM.

S3 · CloudFront · ACM · Route 53

Cloud

IAM Least Privilege Security Setup

IAM users, groups, roles, and policies following AWS least-privilege best practices.

IAM · AWS CLI · MFA · SCP

Cloud

Aurora MySQL Multi-AZ Database

Aurora MySQL with read replicas, multi-AZ failover, and automated snapshots.

Aurora MySQL · RDS · Multi-AZ · VPC

Cloud

DynamoDB NoSQL Schema Design & Queries

DynamoDB tables with partition/sort keys, GSIs, and optimized query patterns.

DynamoDB · GSI · PartiQL · Lambda · SDK

Cloud

AWS Lambda Serverless Functions

Event-driven Lambda functions with S3 triggers, API Gateway, and CloudWatch logging.

Lambda · API Gateway · S3 · CloudWatch · Python

Cloud

Auto Scaling Groups & Load Balancer

ALB with target groups, health checks, and ASG policies for elastic compute scaling.

ALB · Auto Scaling · Launch Templates · EC2

Cloud

CloudFormation Infrastructure as Code

Complete AWS stack templated with CloudFormation including nested stacks.

CloudFormation · YAML · Nested Stacks · Parameters

Cloud

CI/CD Pipeline with CodePipeline & CodeBuild

End-to-end CI/CD from GitHub to EC2/ECS using AWS native DevOps tools.

CodePipeline · CodeBuild · CodeDeploy · GitHub

Cloud

Containerized App on ECS Fargate

Dockerized web app deployed to ECS Fargate with ECR and service discovery.

ECS Fargate · ECR · Docker · ALB

Cloud

VPC Peering & Transit Gateway

Connected multiple VPCs across accounts with Transit Gateway for centralized routing.

VPC Peering · Transit Gateway · CIDR · Route Tables

Cloud

SNS & SQS Event-Driven Architecture

Decoupled microservices using SNS fan-out and SQS queues with dead-letter queue handling.

SNS · SQS · DLQ · Lambda · EventBridge

Cloud

CloudWatch Dashboards & Alarms

Custom dashboards, metric alarms, Log Insights queries, and SNS notifications.

CloudWatch · Metrics · Log Insights · SNS

Data Engineering

AWS Glue ETL Data Pipeline

Serverless ETL with Glue crawlers, Data Catalog, and Spark jobs writing to S3.

AWS Glue · Spark · S3 · Athena · Data Catalog

Real-Time Systems

Kinesis Real-Time Data Ingestion

Real-time clickstream ingestion with Kinesis Streams, Firehose, and Lambda processing.

Kinesis · Firehose · Lambda · S3 · Athena

Analytics

Athena + S3 Serverless Analytics

Queried partitioned S3 data lake with Athena and cost-optimized columnar formats.

Athena · S3 · Parquet · Glue Catalog · SQL

Analytics

QuickSight Business Intelligence Dashboard

Interactive QuickSight dashboards from RDS/S3 with SPICE for sub-second querying.

QuickSight · SPICE · RDS · S3

Cloud

AWS WAF & Shield Security Hardening

WAF rules, managed rule groups, Shield Standard, and IP reputation filtering.

WAF · Shield · CloudFront · ALB

Cloud

Secrets Manager & Parameter Store Integration

Rotated RDS credentials and injected config via Parameter Store into Lambda.

Secrets Manager · Parameter Store · Lambda · IAM

MLOps

Step Functions Workflow Orchestration

Multi-step ML data prep workflow with error handling, retries, and parallel states.

Step Functions · Lambda · S3 · DynamoDB

MLOps

SageMaker Model Training & Endpoint

Trained and deployed an ML model using SageMaker with hyperparameter tuning.

SageMaker · S3 · IAM · Boto3 · Python

Skills

Technical Expertise

Programming & Data

Python · SQL · Java · PySpark · Bash · R

ML & AI

PyTorch · TensorFlow · scikit-learn · NLP · Transformer Models · RAG · LLM Fine-tuning · Embeddings · Prompt Engineering · BERTScore · COMET

Data Engineering

Apache Spark · Kafka · Airflow · Delta Lake · Medallion Architecture · ETL / ELT · Feature Pipelines · Streaming Systems

Cloud & MLOps

AWS (EC2, S3, Lambda, EMR, Glue) · GCP (Vertex AI, BigQuery) · Docker · CI/CD · Model Serving · Drift Detection · Model Monitoring

Analytics & Viz

Pandas · NumPy · Tableau · A/B Testing · Experiment Tracking · Dashboard Design · Metric Design

Proficiency

Python / PySpark: 95%

AWS Cloud: 90%

PyTorch / ML: 88%

Data Engineering: 85%

NLP / LLMs: 87%

SQL / Analytics: 92%

MLOps / DevOps: 80%

Docker / CI-CD: 78%

Research

Peer-Reviewed Paper

RANLP 2025: Recent Advances in Natural Language Processing

2025

Advancing Clinical Translation in Nepali through Fine-Tuned Multilingual Models

Benyamin Ahmadnia, Sumaiya Shaikh, Bibek Poudel, Shazan Ansar, Sahar Hooshmand

Department of Computer Science, California State University, Dominguez Hills, Carson, USA

Low-resource Neural Machine Translation (NMT) remains a major challenge in high-stakes domains such as healthcare. This paper presents a domain-adapted pipeline for English-Nepali medical translation leveraging mBART and NLLB-200. Translation fidelity is assessed through BLEU, CHRF++, METEOR, BERTScore, COMET, and perplexity. NLLB-200 consistently outperforms mBART, achieving higher accuracy and lower hallucination rates in clinical settings.

Key Contributions

1. A Nepali-English parallel corpus tailored to the medical domain, built from diverse domain-specific sources

2. Fine-tuned mBART and NLLB-200 models, evaluated under a unified framework spanning lexical and semantic metrics

3. Error analysis, hallucination detection, and ethical assessment focused on domain-specific term accuracy

↑ NLLB-200 outperforms mBART — lower hallucination rates in clinical settings

Clinical NLP · Low-Resource NMT · mBART · NLLB-200 · English-Nepali · Healthcare AI
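
For readers unfamiliar with BLEU, the first metric listed in the abstract, the sketch below shows a simplified sentence-level version: clipped n-gram precision combined with a brevity penalty. It is an illustration only, not the paper's evaluation code. The function names are hypothetical, tokenization is plain whitespace splitting, and no smoothing is applied, so reported results should come from an established scoring toolkit instead.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis, reference, max_n=4):
    """Simplified single-reference, sentence-level BLEU without smoothing."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # unsmoothed: any zero precision gives a zero score
        log_precisions.append(math.log(clipped / total))
    # The brevity penalty discourages hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


exact_match = simple_bleu("the patient has a fever", "the patient has a fever")
truncated = simple_bleu("the patient has a", "the patient has a fever")
```

An exact match scores 1.0, while the truncated hypothesis is penalized only by the brevity term because all of its n-grams appear in the reference.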

First Page Preview

Proceedings of RANLP 2025, Varna, Bulgaria

Dept. of Computer Science, CSUDH, Carson, USA

Abstract

Low-resource Neural Machine Translation (NMT) remains a major challenge in high-stakes domains such as healthcare. This paper presents a domain-adapted pipeline for English-Nepali medical translation leveraging mBART and NLLB-200. NLLB-200 consistently outperforms mBART across BLEU, CHRF++, METEOR, BERTScore, COMET, and perplexity...

1 Introduction

NMT has brought significant advancements, offering more fluent and accurate translations. Nepali, a low-resource language, presents unique challenges particularly in specialized domains such as healthcare...

Certifications

Verified Credentials

AWS

AWS Certified Solutions Architect – Associate

Amazon Web Services · 2024

✓

AWS

AWS Certified Cloud Practitioner

Amazon Web Services · 2024

✓

Google Data Analytics Certificate

Google · 2023

✓

Contact

Let's Connect

Open to full-time roles, research collaborations, and conversations about AI, ML, and data engineering. Based in California, available remote or on-site.

Shazan Ansar

California, US

Thesis Research Survey

Conducting a survey for my thesis on NLP, AI, curated agents, and AI in finance. Share your expert knowledge and insights. Your input shapes real research.

Share Your Knowledge →

shazan-ansar

GitHub

SHAZAN01

shazanansar@gmail.com

Open to Work — Full Time