Complete transparency on data sources, algorithms, and statistical methods used throughout SmileAccess AI
SmileAccess AI combines real-world data from government and open databases with statistical modeling and machine learning algorithms to provide accurate dental access information. This page documents our complete methodology so that users, researchers, and policymakers can understand exactly how we arrive at our numbers.
Source: Centers for Medicare & Medicaid Services (CMS)
URL: https://npiregistry.cms.hhs.gov/
What we use: Provider names, practice names, addresses, phone numbers, specializations, taxonomy codes
Update frequency: Real-time API access
Accuracy: 99%+ (official government database maintained by CMS)
Source: Centers for Disease Control and Prevention (CDC)
URL: https://www.cdc.gov/fluoridation/statistics/
What we use: State-level fluoride concentrations (mg/L), population coverage percentages
Data year: 2022 (most recent available)
Accuracy: 95%+ (direct measurements from water utilities)
Source: U.S. Census Bureau, American Community Survey (ACS)
What we use: Population density, median household income, insurance coverage rates
Data year: 2021 5-Year Estimates
Accuracy: 90%+ (sample-based survey with margin of error)
Source: OpenStreetMap Foundation
What we use: Geocoding (converting addresses to coordinates), distance calculations
Accuracy: 95%+ for U.S. addresses
We calculate a match score (0-100%) for each dentist based on multiple weighted factors:
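As an illustration of how a weighted score of this kind can be computed, here is a minimal sketch. The factor names and weights are hypothetical placeholders chosen for the example, not SmileAccess AI's actual factors or values.

```python
# Hypothetical factors and weights, for illustration only.
HYPOTHETICAL_WEIGHTS = {
    "distance": 0.4,
    "insurance": 0.3,
    "availability": 0.2,
    "specialty": 0.1,
}


def match_score(factor_scores, weights=HYPOTHETICAL_WEIGHTS):
    """Combine per-factor scores (each in [0, 1]) into a 0-100% match score."""
    for name in weights:
        if not 0.0 <= factor_scores[name] <= 1.0:
            raise ValueError(f"score for {name!r} must be between 0 and 1")
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * factor_scores[name] for name in weights)
    # Dividing by the total weight keeps the result in range even if the
    # weights do not sum exactly to 1.
    return 100.0 * weighted_sum / total_weight


example = match_score(
    {"distance": 0.9, "insurance": 1.0, "availability": 0.5, "specialty": 1.0}
)
```

Normalizing by the total weight means the weights can be retuned without having to rescale the result.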
We use Monte Carlo simulation to estimate wait times for new patient appointments:
Note: Monte Carlo simulations provide probabilistic estimates, not guarantees. Actual wait times may vary based on factors we cannot model (e.g., provider vacation schedules, sudden demand spikes).
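To make the idea concrete, the following is a minimal sketch of a Monte Carlo wait-time estimate. The queue model (a fixed number of patients ahead, with a uniformly random number of slots opening each day) and all function and parameter names are assumptions made for this example; they are not the model SmileAccess AI actually uses.

```python
import random
import statistics


def simulate_wait_days(patients_ahead, mean_slots_per_day, rng):
    """One trial: days until a slot is left over for the new patient."""
    remaining = patients_ahead
    days = 0
    # The new patient is seen once more slots have opened than there are
    # patients ahead in the queue.
    while remaining >= 0:
        days += 1
        # Hypothetical daily capacity: uniform on 0..2*mean, so its mean
        # equals mean_slots_per_day.
        remaining -= rng.randint(0, 2 * mean_slots_per_day)
    return days


def estimate_wait(patients_ahead, mean_slots_per_day, trials=10_000, seed=0):
    """Run many trials and summarize the simulated wait-time distribution."""
    if mean_slots_per_day < 1:
        raise ValueError("mean_slots_per_day must be at least 1")
    rng = random.Random(seed)  # seeded so results are reproducible
    samples = [
        simulate_wait_days(patients_ahead, mean_slots_per_day, rng)
        for _ in range(trials)
    ]
    cuts = statistics.quantiles(samples, n=20)  # cut points at 5%, 10%, ..., 95%
    return {"median": statistics.median(samples), "p5": cuts[0], "p95": cuts[-1]}


summary = estimate_wait(patients_ahead=20, mean_slots_per_day=2, trials=2000, seed=1)
```

The output is a distribution summary rather than a single number, which is what makes the estimate probabilistic.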
Our Coverage Map uses Bayesian inference to calculate dental access risk:
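As a generic illustration of Bayesian updating, the sketch below applies Bayes' rule to a single binary hypothesis (for example, "this area has poor dental access") given one piece of evidence. The function name and every probability in it are hypothetical; the priors and likelihoods used by the Coverage Map are not assumed here.

```python
def posterior_probability(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a binary hypothesis H given evidence E.

    prior               = P(H)
    likelihood_if_true  = P(E | H)
    likelihood_if_false = P(E | not H)
    Returns P(H | E).
    """
    evidence = prior * likelihood_if_true + (1.0 - prior) * likelihood_if_false
    if evidence == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return prior * likelihood_if_true / evidence


# Hypothetical numbers: a 20% prior, with evidence that is seven times more
# likely when the hypothesis is true than when it is false.
risk = posterior_probability(prior=0.2, likelihood_if_true=0.7, likelihood_if_false=0.1)
```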
Risk Categories:
We use the Haversine formula to calculate great-circle distances between two points on Earth:
This provides accurate "as-the-crow-flies" distances. Actual driving distances may be 10-20% longer.
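The Haversine calculation can be sketched as follows. The function name and the choice of kilometers are assumptions for this example; the formula itself is the standard one and treats the Earth as a sphere with the mean radius.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # a = sin^2(dphi / 2) + cos(phi1) * cos(phi2) * sin^2(dlambda / 2)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1 to guard against floating-point rounding near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# A quarter of the way around the equator.
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
```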
All probabilistic estimates (wait times, risk scores) are reported with 90% confidence intervals, computed with the percentile bootstrap method on the Monte Carlo simulation output. A 90% interval is constructed so that, under the model's assumptions, intervals built this way contain the true value about 90% of the time.
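A percentile bootstrap interval of this kind can be sketched as below. The function name, the default statistic (the mean), and the resample count are assumptions made for the example rather than the production settings.

```python
import random
import statistics


def percentile_bootstrap_ci(samples, stat=statistics.mean, level=0.90,
                            resamples=2000, seed=0):
    """Percentile bootstrap confidence interval for stat(samples)."""
    rng = random.Random(seed)  # seeded so results are reproducible
    n = len(samples)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(samples, k=n)) for _ in range(resamples)
    )
    alpha = (1.0 - level) / 2.0
    # Read the interval off the empirical percentiles of the estimates.
    lower = estimates[int(alpha * resamples)]
    upper = estimates[int((1.0 - alpha) * resamples) - 1]
    return lower, upper


# Hypothetical simulated wait times, in days.
waits = [8, 9, 9, 10, 10, 10, 11, 11, 12, 12, 13, 14, 15, 15, 17, 20]
low, high = percentile_bootstrap_ci(waits)
```

For a 90% interval the bounds are the 5th and 95th percentiles of the resampled statistic.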
We validate all data inputs through multiple checks:
We acknowledge the following limitations:
Released: January 2025