Applied mathematics · research systems · evidence design
Author / Researcher: Li Yunyi
Build what data can become.
I work across statistical software, predictive modeling, and environmental communication. The common thread is practical: make data work traceable, make models easier to inspect, and make research findings clear enough to use.
01 systems engineering · 02 predictive intelligence · 03 environmental insight
SYSTEMS
01 / RESEARCH SYSTEMS
Models that move.
These projects focus on the full path from input to explanation: data access, parameter settings, computation, visual output, and audit trails. The aim is not simply to display a result, but to preserve the steps that produced it. Click “Research details” to enter the expanded scientific pages.
A full-stack platform for data-source integration, parameterized statistical models, task scheduling, and browser-based rendering. Each model configuration can be saved, compared, run as a task, and mapped to a visual rule, keeping analysis and presentation in the same workflow.
Data Source Integration: MySQL · Redis · files · schema preview
Parameter Workbench: variables · model settings · version control
STATISTICAL ENGINE → task queue → state transitions
Rendering Template Studio: rules · geometry · color gradients · trajectories
Designed for industrial inspection, spectral analysis, and vibration-signal workflows. It accepts CSV, MAT, XLSX, and Parquet files; uses a directed acyclic graph to organize feature steps; and combines dimensionality reduction, clustering, scatter plots, heatmaps, and report generation in one reproducible workspace.
DAG pipeline · PCA / UMAP · Scatter / Heatmap · CSV · MAT · XLSX · Parquet
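As a minimal sketch of how a directed acyclic graph can order feature steps, the example below uses Python's standard `graphlib` module. The step names (`load`, `clean`, `scale`, and so on) are hypothetical and are not taken from the platform itself; only the idea of dependency-ordered steps comes from the description above.

```python
from graphlib import TopologicalSorter

# Hypothetical feature steps: each key maps to the steps it depends on.
FEATURE_STEPS = {
    "load": set(),
    "clean": {"load"},
    "scale": {"clean"},
    "pca": {"scale"},
    "cluster": {"pca"},
    "heatmap": {"scale"},
    "report": {"cluster", "heatmap"},
}


def execution_order(steps):
    """Return one valid run order; graphlib raises CycleError on a cycle."""
    return list(TopologicalSorter(steps).static_order())


order = execution_order(FEATURE_STEPS)
print(order)
```

Because the graph is declared as data, the same structure can be stored with a run and replayed later, which is one plausible way to keep a workspace reproducible.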
A spatial analysis system for event monitoring, geometry checks, layer management, data import, and scenario analysis. It keeps event states, coordinate validation, permissions, and audit records visible so that spatial decisions can be reviewed later.
Event state machine · Geometry validation · Layer composition · Audit logging
5 business centers plus a situation-overview workspace
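Coordinate validation of the kind mentioned above might look like the following sketch. It assumes longitude/latitude pairs in degrees and a closed polygon ring; the actual coordinate system and rules used by the system are not stated here, so `validate_ring` and its messages are illustrative only.

```python
def validate_ring(coords):
    """Return a list of problems found in a polygon ring of (lon, lat) pairs.

    Assumption: coordinates are degrees, longitude in [-180, 180] and
    latitude in [-90, 90]. An empty list means no problem was found.
    """
    problems = []
    for i, (lon, lat) in enumerate(coords):
        if not -180.0 <= lon <= 180.0:
            problems.append(f"point {i}: longitude {lon} out of range")
        if not -90.0 <= lat <= 90.0:
            problems.append(f"point {i}: latitude {lat} out of range")
    if len(coords) < 4:
        problems.append("ring needs at least 4 points, with first == last")
    elif coords[0] != coords[-1]:
        problems.append("ring is not closed")
    return problems


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
print(validate_ring(square))       # no problems expected
print(validate_ring(square[:-1]))  # the ring is left open
```

Returning a list of messages rather than raising on the first failure keeps every finding available for an audit record.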
RESTful APIs move validated configuration and model-state data between the interface, services, and storage. Each request returns a clear status and trace identifier, which makes failures easier to diagnose and key actions easier to follow.
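One way to give every response a clear status and a trace identifier is a small response envelope. The sketch below is an assumption about shape, not the platform's actual API: `ApiResponse`, `save_config`, and the required field names are hypothetical.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ApiResponse:
    """Hypothetical envelope: a status plus a trace id on every response."""
    status: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    data: Optional[dict] = None
    error: Optional[str] = None


def save_config(payload: dict) -> ApiResponse:
    """Validate a configuration payload before it would be stored."""
    missing = [key for key in ("model", "params") if key not in payload]
    if missing:
        return ApiResponse(status="error",
                           error="missing fields: " + ", ".join(missing))
    return ApiResponse(status="ok", data={"model": payload["model"]})


ok = save_config({"model": "pca", "params": {"n_components": 2}})
bad = save_config({"model": "pca"})
print(asdict(ok))
print(asdict(bad))
```

Logging the same `trace_id` on the service side would let a failed request be followed from the interface to storage.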
Real-time behavior
Long-running jobs report progress back to the interface. Queues, task states, failures, and completed render payloads remain visible while work continues in the background.
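The task states and progress reporting described above can be sketched as a small state machine. The state names and allowed transitions below are assumptions chosen for illustration; the real scheduler may use a different set.

```python
from enum import Enum


class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Hypothetical transition table: terminal states allow no further moves.
ALLOWED = {
    TaskState.QUEUED: {TaskState.RUNNING, TaskState.FAILED},
    TaskState.RUNNING: {TaskState.SUCCEEDED, TaskState.FAILED},
    TaskState.SUCCEEDED: set(),
    TaskState.FAILED: set(),
}


class Task:
    def __init__(self, name):
        self.name = name
        self.state = TaskState.QUEUED
        self.progress = 0.0
        self.history = [TaskState.QUEUED]  # kept so the path stays visible

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state
        self.history.append(new_state)

    def report(self, fraction):
        """Record progress in [0, 1]; progress never moves backwards."""
        if self.state is not TaskState.RUNNING:
            raise ValueError("progress can only be reported while running")
        self.progress = min(1.0, max(self.progress, fraction))


task = Task("render-heatmap")
task.move_to(TaskState.RUNNING)
task.report(0.4)
task.report(1.0)
task.move_to(TaskState.SUCCEEDED)
print([s.value for s in task.history], task.progress)
```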
Research utility
Parameter settings are stored as part of the research record. Users can compare versions, reproduce a configuration, and map model outputs to geometry, color, trajectories, or chart states without losing the link to the original assumption.
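A simple way to make a parameter configuration reproducible and comparable is to fingerprint its canonical form and diff two versions key by key. This is a sketch under assumptions: the configuration keys are invented, and the platform's real versioning scheme is not described here.

```python
import hashlib
import json


def fingerprint(config):
    """Short, order-independent identifier for a configuration dict."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def diff_configs(old, new):
    """Map each changed key to its (old, new) values; None marks absence."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Hypothetical versions of one model configuration.
v1 = {"model": "pca", "n_components": 2, "scale": True}
v2 = {"n_components": 3, "model": "pca", "scale": True, "color_map": "viridis"}

print(fingerprint(v1), fingerprint(v2))
print(diff_configs(v1, v2))
```

Storing the fingerprint next to each rendered output is one way to keep a chart linked to the assumptions that produced it.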
PREDICT
02 / KAGGLE HOUSE PRICES
Prediction, under a microscope.
The Kaggle House Prices work is presented as a small, repeatable modeling study. It records how raw fields become features, how validation tests assumptions, how model families are compared, and how residuals inform the next revision.
Read Research Article ↗ Expanded paper: audit, modeling, validation, and residual diagnostics.
End-to-end modeling protocol
feature → fit → diagnose
01
Audit the raw table
Separate numeric and categorical fields, inspect missingness, flag high-leverage observations, and record the preprocessing rules. Before fitting a model, make sure the table itself is coherent.
02
Engineer the feature space
Encode categorical fields, derive a few composite indicators, and inspect skewed variables. Keep every feature traceable to an original column so the model still has an interpretable story.
03
Transform the target
Inspect the target distribution and compare raw and transformed target scales when useful. The aim is a steadier learning signal and residuals that are easier to read, not transformation for its own sake.
04
Compare model families
Move from a transparent linear baseline to regularized and gradient-boosting models. The baseline exposes direct structure; regularization checks feature discipline; boosting tests nonlinear interactions.
05
Validate, tune, and read residuals
Use cross-validation, bounded parameter search, and error slices rather than optimizing a single training split. Residuals point to missing representation: sparse neighborhoods, extreme homes, or interactions that need a better feature.
Analysis principle: every score is treated as evidence of a modeling decision, not as the final story.
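The protocol above can be illustrated with a deliberately small sketch: a one-feature least-squares baseline, a `log1p`-transformed target, and k-fold cross-validation. Everything here is an assumption made for illustration. The data is synthetic rather than Kaggle rows, the helper names are hypothetical, and the study itself compares richer model families.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def kfold_rmse(xs, ys, k=5, seed=0):
    """Return one held-out RMSE per fold for the linear baseline."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        sq = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out]
        scores.append(math.sqrt(sum(sq) / len(sq)))
    return scores


# Synthetic stand-in data: living area and a right-skewed price.
rng = random.Random(42)
area = [rng.uniform(50, 300) for _ in range(60)]
price = [1000 * a * math.exp(rng.gauss(0, 0.2)) for a in area]
log_price = [math.log1p(p) for p in price]  # transformed target

scores = kfold_rmse(area, log_price)
print([round(s, 3) for s in scores])
```

Reading the spread across folds, rather than a single number, is what makes a score usable as evidence about one modeling decision.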
House-price inference lab
A compact visual record of a tabular-regression workflow: inspect, encode, compare, validate, and review residuals.
Method view · no score claim
Illustrative normalized observations · least-squares fit
Model comparison lens
Linear baseline: interpretable anchor
Start with a transparent regression model to surface direct relationships, expose data issues, and anchor later comparisons.
Method note: this panel compares modeling roles, not measured leaderboard scores.
Overall quality: structural rating
Living area: size signal
Year built: temporal context
Garage capacity: utility proxy
Neighborhood: location encoding
Basement area: latent capacity
Research note: this panel visualizes the analytical method and decision logic rather than claiming an unpublished score or ranking.
Decision trail: begin with a reliable baseline, then ask whether a new feature, a transformed target, or a more flexible learner improves validation behavior for a reason that can be explained. A better score matters only when its error pattern is understood.
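Understanding an error pattern usually means slicing residuals by a grouping field. The sketch below assumes records of the form `(group, actual, predicted)` on a log scale; the neighborhood labels and values are invented for illustration and are not measured results.

```python
from collections import defaultdict
from statistics import mean


def residual_slices(records):
    """Summarize residuals (actual - predicted) per group."""
    by_group = defaultdict(list)
    for group, actual, predicted in records:
        by_group[group].append(actual - predicted)
    return {
        group: {
            "n": len(res),
            "mean_residual": mean(res),            # signed bias
            "mean_abs_residual": mean([abs(r) for r in res]),
        }
        for group, res in by_group.items()
    }


# Hypothetical validation rows: (neighborhood, actual, predicted).
rows = [
    ("A", 12.0, 11.9),
    ("A", 12.2, 12.3),
    ("B", 13.0, 12.5),
    ("B", 13.4, 12.8),
]

for group, stats in residual_slices(rows).items():
    print(group, stats)
```

In this invented example group B is consistently under-predicted, which is the kind of pattern that would motivate a better location feature rather than a more flexible learner.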
EARTH
03 / ENVIRONMENTAL RESEARCH
Nature, made legible.
Environmental work connects descriptive statistics, time-series thinking, uncertainty, and public communication. The goal is to interpret a changing system carefully and then present the evidence in a form that people can understand and discuss.
Read Research Article ↗ Expanded paper: observation, uncertainty, interpretation, and public communication.
Environmental data as a living system
The work begins with monitoring signals, trend structure, and uncertainty. It then turns those findings into plain visual explanations and interactive exhibits, so audiences can engage with evidence instead of receiving it as a one-way message.
01
Distribution lens
Describe variability, concentration, and outlier structure before making causal claims.
02
Temporal lens
Trace trend, seasonality, anomaly, and persistence across environmental time series.
03
Public lens
Use visual explanation and interaction design to turn evidence into attention and action.
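The distribution and temporal lenses can be sketched with the standard library alone. The readings below are invented, and the 1.5 × IQR outlier rule and the 3-point window are common conventions assumed here, not choices documented in the research.

```python
from statistics import quantiles


def distribution_summary(values):
    """Median, interquartile range, and points outside 1.5 * IQR fences."""
    q1, q2, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "median": q2,
        "iqr": iqr,
        "outliers": [v for v in values if v < low or v > high],
    }


def rolling_mean(series, window):
    """Trailing moving average; the output is shorter by window - 1."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Hypothetical monitoring readings with one unusual value.
readings = [4.1, 4.3, 4.0, 4.2, 4.4, 9.8, 4.1, 4.3]

print(distribution_summary(readings))
print(rolling_mean(readings, 3))
```

Describing spread and flagging the unusual reading first keeps the later trend view from being read as a causal claim.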
74% of 100 valid respondents attended once a year or less
50% of 100 valid respondents reported satisfaction
83% of 100 valid respondents would participate again
Illustrative signal field · water / motion / pattern
Interactive environmental science communication
An art-based public science initiative used discarded garments, plastic bottles, and interactive exhibits to make water-resource protection tangible. Water footprint, pollution, and circular-use themes became exhibits; audience feedback then tested whether visual artistry, novelty, and interaction improved attention to the science.
94.67% satisfaction
75 returned surveys: visitors most valued visual artistry, interaction, and novelty. The next design priorities were exhibit quantity, spatial layout, and thematic depth.
50 offline visitors
104 online viewers
75 survey responses
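As a worked check on the survey arithmetic, a satisfaction rate of 94.67% from 75 returned surveys is consistent with 71 satisfied respondents, since 71 / 75 ≈ 0.9467. The count of 71 is inferred from the published figures, not stated in the source. The sketch also adds a Wilson score interval, which is an illustrative way to express sampling uncertainty for a small survey and is not part of the reported results.

```python
import math


def percent(count, total):
    """Share of total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


def wilson_interval(count, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for about 95%)."""
    p = count / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half


satisfied, returned = 71, 75  # 71 is an inferred count, see the note above
print(percent(satisfied, returned))
low, high = wilson_interval(satisfied, returned)
print(round(low, 3), round(high, 3))
```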
EVIDENCE
Research profile
One quantitative language. Multiple frontiers.
Across these projects, the working method stays consistent: define the system, test it with data, and present the evidence clearly. Whether the unit is a software module, a validation fold, a geometric layer, or an exhibition visitor, the reasoning should remain visible.
01
Software systems
Three documented applications spanning data integration, statistical modeling, feature extraction, visual analytics, geometric expression, access control, and auditability. Together they show an interest in research infrastructure as well as research outputs.
engineering
02
Predictive analytics
A Kaggle House Prices workflow centered on feature engineering, regression, model comparison, validation, and residual interpretation. It is presented as a reusable method rather than a leaderboard claim.
machine learning
03
Environmental research
Data-informed communication research combining survey signals, interactive art, water-resource science, material reuse, and measurable audience feedback. It expands quantitative practice toward questions of attention, understanding, and action.