Skylar Ahn | AI Dataset Quality Portfolio

Focus Areas

Dataset QA
Annotation Validation
Review Queue
Pre-labeling

BBox checks · Distribution analysis · Robotics data curation

Featured Projects

Dataset QA systems, not random spot checks.

MVP 01

Dataset Quality Management Toolkit

A COCO-format dataset QA toolkit for validating annotations, analyzing dataset distribution, and prioritizing suspicious samples for human review.
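As a rough illustration of the checks described above, the sketch below validates a COCO-style dictionary and counts annotations per category. It assumes the standard COCO keys (`images`, `annotations`, `categories`) with `bbox` stored as `[x, y, width, height]`. The function names `validate_coco` and `category_distribution`, the `max_aspect_ratio` threshold, and the sample data are hypothetical and are not the toolkit's actual API.

```python
from collections import Counter


def validate_coco(coco, max_aspect_ratio=20.0):
    """Return a list of (annotation_id, issue) tuples for a COCO-style dict."""
    images = {img["id"]: img for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}
    issues = []
    for ann in coco.get("annotations", []):
        ann_id = ann.get("id")
        image = images.get(ann.get("image_id"))
        if image is None:
            # Without a valid image we cannot check bounds, so stop here.
            issues.append((ann_id, "unknown image_id"))
            continue
        if ann.get("category_id") not in category_ids:
            issues.append((ann_id, "unknown category_id"))
        bbox = ann.get("bbox")
        if not bbox or len(bbox) != 4:
            issues.append((ann_id, "missing or malformed bbox"))
            continue
        x, y, w, h = bbox
        if w <= 0 or h <= 0:
            issues.append((ann_id, "non-positive bbox size"))
            continue
        if x < 0 or y < 0 or x + w > image["width"] or y + h > image["height"]:
            issues.append((ann_id, "bbox out of bounds"))
        # Assumed heuristic: very thin boxes are worth a second look.
        if max(w / h, h / w) > max_aspect_ratio:
            issues.append((ann_id, "abnormal aspect ratio"))
    return issues


def category_distribution(coco):
    """Count annotations per category name."""
    names = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}
    counts = Counter(ann.get("category_id") for ann in coco.get("annotations", []))
    return {names.get(cat_id, f"unknown:{cat_id}"): n for cat_id, n in counts.items()}


# Made-up sample data for illustration only.
sample = {
    "images": [{"id": 1, "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "gripper"}, {"id": 2, "name": "cup"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 50]},
        {"id": 11, "image_id": 1, "category_id": 2, "bbox": [600, 400, 80, 40]},
        {"id": 12, "image_id": 1, "category_id": 2, "bbox": [0, 0, 500, 2]},
        {"id": 13, "image_id": 99, "category_id": 1, "bbox": [0, 0, 10, 10]},
    ],
}
issues = validate_coco(sample)
distribution = category_distribution(sample)
```

Returning issues as plain tuples keeps the output easy to feed into a review queue or a Pandas table later.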

Human-in-the-Loop Pre-labeling Pipeline

A pre-labeling workflow using Grounding DINO, SAM, and CLIP to generate initial labels, filter uncertain predictions, and route high-risk samples to human reviewers.
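The routing step can be sketched without calling any model. The example below assumes each pre-label already carries a detector confidence and an image-text agreement score. The field names `det_score` and `clip_score`, the `route` function, and both thresholds are hypothetical placeholders, not values taken from the actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class PreLabel:
    sample_id: str
    label: str
    det_score: float   # hypothetical: detector confidence in [0, 1]
    clip_score: float  # hypothetical: image-text agreement in [0, 1]


def route(pred, accept_at=0.8, reject_below=0.3):
    """Return 'auto_accept', 'human_review', or 'discard' for one prediction."""
    # Accept only when both signals agree that the label is reliable.
    weakest = min(pred.det_score, pred.clip_score)
    if weakest >= accept_at:
        return "auto_accept"
    # Very weak detections are dropped instead of spending reviewer time.
    if pred.det_score < reject_below:
        return "discard"
    # Everything in between is uncertain and goes to a human.
    return "human_review"


# Made-up predictions for illustration only.
predictions = [
    PreLabel("img_001", "cup", 0.92, 0.88),
    PreLabel("img_002", "cup", 0.85, 0.41),
    PreLabel("img_003", "gripper", 0.12, 0.50),
]
routes = {p.sample_id: route(p) for p in predictions}
```

Taking the weaker of the two scores is one conservative choice: a confident box with a doubtful label still reaches a reviewer.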

Review queue preview

A compact issue tracker view for deciding which samples need human attention first.
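One simple way to decide which samples come first is to sum a severity weight per flagged signal and sort by the total. The sketch below assumes findings arrive as `(sample_id, signal)` pairs. The `SEVERITY` weights and the `build_review_queue` name are hypothetical and only illustrate the idea.

```python
# Hypothetical weights: higher means the issue is more urgent to review.
SEVERITY = {"schema": 3, "leakage": 3, "bbox": 2, "boundary": 1}


def build_review_queue(findings):
    """Rank samples by summed severity, highest first, ties broken by ID."""
    scores = {}
    for sample_id, signal in findings:
        scores[sample_id] = scores.get(sample_id, 0) + SEVERITY.get(signal, 1)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up findings for illustration only.
findings = [
    ("img_007", "bbox"),
    ("img_002", "leakage"),
    ("img_007", "boundary"),
    ("img_007", "schema"),
    ("img_010", "boundary"),
]
queue = build_review_queue(findings)
```

Breaking ties by sample ID keeps the ordering deterministic, so two reviewers looking at the same findings see the same queue.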

Quality Signals

Dataset health panel

Live Model

schema · PASS · References and required fields are valid.

bbox · REVIEW · Possible out-of-bounds or abnormal aspect ratio.

boundary · AMBIGUOUS · Task transition point needs evidence check.

leakage · CHECK · Near-duplicate sample may cross splits.
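The leakage signal can be approximated by comparing image fingerprints across splits. The sketch below assumes each sample already has a precomputed 64-bit perceptual hash stored as an integer. How that hash is produced, the `max_distance` threshold, and the function names are assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def cross_split_near_duplicates(train, val, max_distance=4):
    """Flag (train_id, val_id, distance) pairs whose hashes nearly match.

    `train` and `val` map sample_id -> 64-bit hash (int). This compares
    every pair, so it suits small datasets or a first pass only.
    """
    flagged = []
    for (t_id, t_hash), (v_id, v_hash) in product(train.items(), val.items()):
        distance = hamming(t_hash, v_hash)
        if distance <= max_distance:
            flagged.append((t_id, v_id, distance))
    return flagged


# Made-up hashes for illustration only.
train = {"train_001": 0xFFFF0000FFFF0000, "train_002": 0x0123456789ABCDEF}
val = {"val_001": 0xFFFF0000FFFF0003, "val_002": 0xF0F0F0F0F0F0F0F0}
flagged = cross_split_near_duplicates(train, val)
```

Flagged pairs are candidates, not verdicts: a reviewer still decides whether the two samples are close enough to count as leakage.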

Field Experience

Robotics Dataset QA Field Experience

Robotics startup · Part-time field experience

Worked on robot-arm teleoperation, task data generation, annotation, and QA for learning-oriented manipulation datasets.

Core Work

Generated robot manipulation task data through teleoperation.
Annotated subtask boundaries in manipulation trajectories.
Reviewed task data for labeling quality and ambiguous transition points.

Self-initiated Quality Work

Introduced multimodal cross-checking with camera and 3D views.
Used joint-state graphs as supporting evidence.
Grouped recurring edge cases with a Frequent Issue Number system.

How This Connects

Annotation quality depends on evidence, context, and shared criteria, not just individual carefulness.
The goal is to move from high-quality annotation review to systems that make label quality consistent, traceable, and scalable.

Skills & Tools

Built around practical dataset work.

Used / Building With

Python
Pandas
NumPy
JSON
pathlib
Git/GitHub
Grounding DINO
SAM

Exploring / Planned Integration

CVAT
Label Studio
Docker
DVC or Git LFS

Learning map

AI data quality · computer vision datasets · data engineering basics

Technical Notes

Short notes from learning and project-building.

Understanding COCO Annotation JSON · Draft

Why Bounding Box Validation Matters · Coming Soon

Why Auto-labeling Still Needs Human Review · Coming Soon

About

B.S. in Mathematics.
Dataset QA focused.

I'm Skylar Ahn, a mathematics graduate focused on AI training dataset quality. My work connects robotics labeling experience with annotation validation, review queues, and human-in-the-loop QA workflows.

I have experience in teaching, data labeling, and robotics teleoperation / annotation QA.

Contact / Links

Resume, GitHub, and contact.

Email: skylarjya11@gmail.com
GitHub: github.com/Skylar-Ahn
Resume: /resume.pdf

AI Dataset Quality & Human-in-the-Loop QA