What's the Most Accurate Wearable Data? A 2024-2025 Study Breakdown by Device

Ryan - Kygo Health
Jan 27
Last Updated: March 22, 2026

[Image: a smiling smartwatch surrounded by heart rate, sleep, bar chart, and metrics ring icons, summarizing the different data health wearables provide.] Kygo Health App connects multiple wearable devices to utilize the best metrics from each.

The most accurate wearable depends on what you’re tracking. We analyzed peer-reviewed studies from 2024–2025 comparing Oura Ring, Apple Watch, WHOOP, Garmin, Fitbit, and others against gold-standard medical measurements across sleep staging, HRV, heart rate, SpO2, step counting, VO2 max, and more. Below is everything we found—organized by metric, with study funding flagged so you can evaluate the data for yourself.

We also built a free interactive comparison tool based on this research that lets you pick your devices and the metrics you care about to see them side by side: http://kygo.app/tools/wearable-accuracy

Most Accurate Wearable: Master Summary by Metric

This table compiles findings across all peer-reviewed studies analyzed. Each metric section below includes the full data, study details, and funding disclosures.

Biometric	🥇 Winner	🥈 Second	🥉 Third	Worst
Sleep Staging (Oura-funded)	Oura (κ=0.65)	Apple Watch (κ=0.60)	Fitbit (κ=0.55)	—
Sleep Staging (Independent)	Apple Watch (κ=0.53)	Fitbit Sense (κ=0.42)	Fitbit Charge 5 (κ=0.41)	Garmin (κ=0.21)
Deep Sleep Detection (Independent)	WHOOP (69.6%)	Apple Watch (50.7%)	Fitbit Sense (48.3%)	Withings (29.8%)
REM Detection (Independent)	Apple Watch (68.6%)	WHOOP (62.0%)	Fitbit Sense (55.5%)	Garmin (28.7%)
Wake Detection (Independent)	Apple Watch (52.2%)	Fitbit Charge 5 (42.7%)	Fitbit Sense (39.2%)	Garmin (27.6%)
Nocturnal HRV	Oura Gen 4 (MAPE 5.96%)	WHOOP (8.17%)	Garmin (10.52%)	Polar (16.32%)
Resting Heart Rate	Oura Gen 4 (CCC 0.98)	Oura Gen 3 (0.97)	WHOOP (0.91)	Polar (0.86)
Active Heart Rate	Apple Watch (86.3%)	Fitbit (73.6%)	Garmin (67.7%)	—
HR Correlation vs ECG	Polar Chest Strap (r=0.99)	Apple Watch (r=0.80)	Garmin (r=0.52)	—
SpO2	Apple Watch (MAE 2.2%)	Garmin Fenix (~4.5%)	Withings (~4.8%)	Garmin Venu (5.8%)
Step Count	Garmin (82.6%)	Apple Watch (81.1%)	Fitbit (77.3%)	Oura (poor)
Calories/Energy	Apple Watch (71%)	Fitbit (65.6%)	—	Garmin (48%)
VO2 Max	Garmin Forerunner 245 (MAPE 5.7%)	Garmin Fenix 6 (7.05%)	Apple Watch (13–16%)	—
Skin Temperature	Oura (r²>0.99 lab)	—	—	—

Sleep Staging Accuracy (4-Stage Classification)

Sleep staging is the most studied—and most contested—metric in wearable accuracy research. Three major studies from 2023–2025 produced different rankings, and study funding is a factor worth noting.
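
For readers unfamiliar with κ (Cohen's kappa), the statistic these studies report, here is a minimal sketch of how it can be computed from epoch-by-epoch stage labels. The function name and the sample epochs are hypothetical illustrations, not data or code from any study cited here.

```python
from collections import Counter

def cohens_kappa(reference, device):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(reference) != len(device) or not reference:
        raise ValueError("sequences must be non-empty and the same length")
    n = len(reference)
    # Observed agreement: share of epochs where both sources give the same stage.
    observed = sum(r == d for r, d in zip(reference, device)) / n
    # Agreement expected by chance, from each source's overall stage frequencies.
    ref_counts = Counter(reference)
    dev_counts = Counter(device)
    expected = sum(ref_counts[s] * dev_counts[s] for s in ref_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical 30-second epochs (invented for illustration only).
psg    = ["wake", "light", "light", "deep", "deep", "rem", "light", "wake"]
device = ["light", "light", "light", "deep", "light", "rem", "light", "wake"]
print(round(cohens_kappa(psg, device), 3))  # 0.636
```

In this toy example the two sources agree on 6 of 8 epochs (75%), yet κ is about 0.64, because some of that agreement would be expected by chance alone.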

Brigham and Women’s Hospital Study (2024) — Oura-Funded

Robbins et al. compared Oura Ring Gen 3, Fitbit Sense 2, and Apple Watch Series 8 against polysomnography (PSG) across 36 participants over multiple nights.

Device	Overall (κ)	Deep Sleep Sensitivity	Deep Sleep Bias
Oura Ring Gen 3	0.65 (Substantial)	79.5%	No significant bias
Apple Watch Series 8	0.60 (Moderate)	50.5%	-43 min (underestimates)
Fitbit Sense 2	0.55 (Moderate)	61.7%	-15 min (underestimates)

⚠️ Funding: This study was funded by Oura Ring Inc. Lead author Dr. Rebecca Robbins is an Oura scientific advisor.

University of Antwerp Study (2025) — Independent

Schyvens et al. tested six devices against PSG in 62 adults. Funded by VLAIO (Flanders Innovation & Entrepreneurship)—no device manufacturer funding. Oura was not included in this study.

Device	κ	TST Bias	Deep Sleep	REM	Wake	Light Sleep	Notes
Apple Watch 8	0.53	+19.6 min	50.7%	68.6%	52.2%	84.5%	Best κ
Fitbit Sense	0.42	+6.3 min	48.3%	55.5%	39.2%	76.2%	Lowest bias
Fitbit Charge 5	0.41	+11.1 min	43.3%	47.5%	42.7%	73.8%	—
WHOOP 4.0	0.37	+24.5 min	69.6%	62.0%	32.5%	60.9%	Best deep sleep
Withings ScanWatch	0.22	+39.9 min	29.8%	36.5%	29.4%	73.5%	—
Garmin Vivosmart 4	0.21	+38.4 min	32.1%	28.7%	27.6%	72.2%	Oldest hardware

Note: All six devices tended to misclassify wake, deep sleep, and REM epochs as light sleep, a conservative pattern seen in every tracker in this study. All devices also significantly underestimated Wake After Sleep Onset (WASO), by 12–48 minutes.

Korean Multicenter Study (2023) — Independent

Park et al. tested 11 devices in 75 participants across 2 centers (349,114 epochs analyzed). No industry funding disclosed.

Device	Cohen’s Kappa (κ)
Google Pixel Watch	0.4–0.6 (Moderate)
Galaxy Watch 5	0.4–0.6 (Moderate)
Fitbit Sense 2	0.4–0.6 (Moderate)
Apple Watch 8	0.2–0.4 (Fair)
Oura Ring 3	0.2–0.4 (Fair)

Note: This study produced different rankings than the Brigham study. Oura scored lower here. Different study populations, methodologies, and PSG protocols can affect results.

Deep Sleep Detection Sensitivity

Deep sleep sensitivity data comes from two studies with different device lineups:

From Robbins et al. (2024) — Oura-funded:
- Oura Ring Gen 3: 79.5%, Fitbit Sense 2: 61.7%, Apple Watch Series 8: 50.5%.
From Schyvens et al. (2025) — Independent:
- WHOOP 4.0: 69.6%, Apple Watch: 50.7%, Fitbit Sense: 48.3%, Fitbit Charge 5: 43.3%, Garmin Vivosmart 4: 32.1%, Withings: 29.8%. Oura was not tested in this study.
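
As a rough illustration of what these sensitivity percentages mean, the sketch below counts how many of the reference (PSG) epochs for a given stage the device labeled the same way. The function name and sample labels are hypothetical, and the studies' actual scoring pipelines may differ.

```python
def stage_sensitivity(reference, device, stage):
    """Share of reference epochs of `stage` that the device also labeled `stage`."""
    device_labels = [d for r, d in zip(reference, device) if r == stage]
    if not device_labels:
        raise ValueError(f"no reference epochs labeled {stage!r}")
    return sum(d == stage for d in device_labels) / len(device_labels)

# Hypothetical epochs (invented for illustration only).
psg    = ["light", "deep", "deep", "deep", "deep", "rem", "light", "wake"]
device = ["light", "deep", "deep", "light", "deep", "rem", "light", "light"]

print(stage_sensitivity(psg, device, "deep"))  # 0.75
print(stage_sensitivity(psg, device, "wake"))  # 0.0
```

Here the device catches 3 of 4 deep sleep epochs and labels the missed one, along with the single wake epoch, as light sleep.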

Nocturnal HRV (Heart Rate Variability) Accuracy

An Ohio State University / Air Force Research Lab study (Dial et al., 2025) validated nocturnal HRV across 13 participants and 536 nights using a Polar H10 ECG chest strap as reference. No industry funding disclosed.

Device	CCC	MAPE	Rating
Oura Gen 4	0.99	5.96% ± 5.12%	Nearly Perfect
Oura Gen 3	0.97	7.15% ± 5.48%	Substantial
WHOOP 4.0	0.94	8.17% ± 10.49%	Moderate
Garmin Fenix 6	0.87	10.52% ± 8.63%	Poor
Polar Grit X Pro	0.82	16.32% ± 24.39%	Poor

CCC Scale: >0.99 = Nearly Perfect, 0.95–0.99 = Substantial, 0.90–0.95 = Moderate, <0.90 = Poor

Note: Garmin Fenix 6 is 2+ generations behind current hardware. The study authors acknowledged this limitation—current Garmin devices may perform differently. Sample size was 13 participants, though 536 total nights of data were collected.
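
To make the two columns above concrete, here is a minimal sketch of Lin's concordance correlation coefficient (CCC) and mean absolute percentage error (MAPE) for paired nightly readings. The function names and sample values are hypothetical, and this is not the study's own analysis code.

```python
from statistics import fmean

def concordance_ccc(reference, device):
    """Lin's CCC: penalizes both scatter and systematic offset from the reference."""
    n = len(reference)
    if n != len(device) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_ref, mean_dev = fmean(reference), fmean(device)
    var_ref = sum((x - mean_ref) ** 2 for x in reference) / n
    var_dev = sum((y - mean_dev) ** 2 for y in device) / n
    cov = sum((x - mean_ref) * (y - mean_dev) for x, y in zip(reference, device)) / n
    return 2 * cov / (var_ref + var_dev + (mean_ref - mean_dev) ** 2)

def mape(reference, device):
    """Mean absolute percentage error of the device relative to the reference."""
    return 100 * fmean(abs(d - r) / abs(r) for r, d in zip(reference, device))

# Hypothetical nightly HRV values in ms (invented): the device reads 2 ms high.
ecg_hrv    = [40, 50, 60, 70]
device_hrv = [42, 52, 62, 72]

print(round(concordance_ccc(ecg_hrv, device_hrv), 3))  # 0.984
print(round(mape(ecg_hrv, device_hrv), 2))             # 3.8
```

A constant 2 ms offset leaves the ordinary correlation at 1.0 but pulls CCC below 1, which is why CCC is a stricter test of agreement than correlation alone.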

Resting Heart Rate Accuracy

From the same Ohio State study (Dial et al., 2025):

Device	CCC	MAPE	Rating
Oura Gen 4	0.98	1.94% ± 2.51%	Nearly Perfect
Oura Gen 3	0.97	1.67% ± 1.54%	Substantial
WHOOP 4.0	0.91	3.00% ± 2.15%	Moderate
Polar Grit X Pro	0.86	2.71% ± 2.75%	Poor

Note: Garmin Fenix 6 was excluded from RHR analysis due to timestamp reporting issues that prevented alignment with the Polar H10 reference data.

Active Heart Rate Accuracy

Active heart rate data comes from the WellnessPulse Meta-Analysis (2025) and aggregate PubMed Central studies:

Device	Accuracy	Correlation vs ECG
Polar Chest Strap	—	r = 0.99
Apple Watch	86.31%	r = 0.80
Fitbit	73.56%	—
Garmin	67.73%	r = 0.52
TomTom	67.63%	—

Blood Oxygen (SpO2) Accuracy

Garmin Venu 2s underestimated SpO2 in 67.4% of readings. None of these SpO2 features are FDA-cleared for medical use—they are classified as wellness features.

Device	MAE	MDE	Within Range	Missing Data
Apple Watch Series 7	2.2%	-0.4%	58.3%	11%
Garmin Fenix 6 Pro	~4.5%	—	~44%	28%
Withings ScanWatch	~4.8%	—	~38%	31%
Garmin Venu 2s	5.8%	5.5%	18.5%	14%

Sources: PLOS, Nature, various validation studies.
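
The MAE and MDE columns answer different questions: how large the errors are, and which direction they lean. The sketch below assumes MDE stands for mean directional (signed) error and uses a device-minus-reference sign convention; both are assumptions, since conventions vary between studies. The readings are invented.

```python
from statistics import fmean

def mean_absolute_error(reference, device):
    """Average size of the error, ignoring direction."""
    return fmean(abs(d - r) for r, d in zip(reference, device))

def mean_directional_error(reference, device):
    """Average signed error (device minus reference); negative means reading low."""
    return fmean(d - r for r, d in zip(reference, device))

# Hypothetical SpO2 readings in percent (invented for illustration only).
oximeter = [97, 96, 98, 95]
watch    = [95, 97, 96, 94]

print(mean_absolute_error(oximeter, watch))     # 1.5
print(mean_directional_error(oximeter, watch))  # -1.0
```

Errors in opposite directions partly cancel in the signed average, so a device can show a small MDE while still having a noticeably larger MAE.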

Step Count Accuracy

Device	Accuracy	MAPE (where available)
Garmin	82.58%	Vivoactive 4: <2%
Apple Watch	81.07%	—
Fitbit	77.29%	Sense: ~8%
Jawbone	57.91%	—
Polar	53.21%	—
Oura Ring	Poor (50.3% error real-world, 4.8% controlled)	—

Source: WellnessPulse Meta-Analysis (2025)

Energy Expenditure (Calories) Accuracy

All wearables are weak at calorie estimation. Accuracy decreases during high-intensity or multi-modal exercise.

Device	Accuracy
Oura Ring	~87% (13% avg error)
Apple Watch	71.02%
Fitbit	65.57%
Polar	~50–65%
Garmin	48.05%

Source: WellnessPulse Meta-Analysis (2025).

Note: None should be treated as precise calorie counters.

VO2 Max Estimation Accuracy

All devices tend to underestimate VO2 max in highly fit individuals and overestimate in sedentary/lower fitness populations.

Device	MAPE	MAE	Notes
Garmin Forerunner 245	5.7%	—	Acceptable for runners
Garmin Fenix 6	7.05%	—	CCC=0.73 for 30s avg
Apple Watch Series 7	15.79%	6.07 ml/kg/min	Underestimates
Apple Watch (2025 study)	13.31%	6.92 ml/kg/min	Mixed bias

Sources: Caserman et al. (2024), Lambe et al. (2025), Garmin validation (2025).

Skin Temperature Accuracy

Oura’s internal validation study (2024) tested temperature sensing across 16 individuals over 1 week (93,571 data points):

r² > 0.99 in lab conditions, r² > 0.92 in real-world conditions, with precision of ±0.13°C per minute.

⚠️ Funding: This is Oura’s own study, not independently peer-reviewed. However, Oura’s temperature data has been validated in independent menstrual cycle tracking studies (Maijala et al., 2019). Apple Watch, Garmin, WHOOP, and Samsung all track skin temperature, but limited independent comparative data exists.

FDA-Cleared Features

Most wearable metrics are wellness estimates. A few features have earned FDA authorization:

Feature	Device	Status
ECG / Atrial Fibrillation Detection	Apple Watch (Series 4+)	FDA Cleared
ECG / Atrial Fibrillation Detection	Samsung Galaxy Watch (4+)	FDA Cleared
Sleep Apnea Notification	Apple Watch (Series 9+, Ultra 2)	FDA Authorized
Sleep Apnea Detection	Samsung Galaxy Watch	FDA Authorized (Feb 2024)
Blood Oxygen (SpO2)	Apple Watch	Wellness feature (NOT FDA cleared)
Irregular Rhythm Notification	Fitbit	FDA Cleared

Important Caveats

Before drawing conclusions from any of this data, keep these limitations in mind:

No single device wins everywhere. The best device depends on which metric matters most to you.
Study funding matters. The primary sleep study (Robbins et al.) was Oura-funded. Independent studies (Park, Schyvens) found different rankings. We’ve flagged funding sources throughout so you can decide for yourself.
Device generations matter. Studies often test older hardware. Garmin Fenix 6 and Vivosmart 4 are 2+ generations behind current devices. Results may not apply to current models.
Small sample sizes. The HRV/RHR study had 13 participants (536 nights). Antwerp had 62 participants, 1 night each. Brigham had 36 participants over multiple nights.
All wearables are estimates. None are medical devices (except specific FDA-cleared features listed above). Data should inform, not diagnose.
Individual variation. Accuracy can vary based on skin tone, tattoos, BMI, wrist fit, and activity level.
Skin tone bias. PPG sensor accuracy is affected by skin pigmentation. Most validation studies have predominantly Caucasian participants—a critical research gap.
PSG is imperfect too. The “gold standard” polysomnography has interrater reliability of κ≈0.75, so even human experts do not fully agree on sleep staging. Because κ is chance-corrected, this is not the same as experts disagreeing on exactly 25% of epochs.
Common device failure mode. All consumer devices tend to misclassify wake, deep sleep, and REM as light sleep—a conservative algorithmic approach that inflates light sleep totals.

Why Accuracy Matters for Understanding Food-Biometric Patterns

If you’re trying to understand how nutrition affects your sleep, recovery, or energy levels, the accuracy of your wearable data is the foundation everything else builds on. When measurement error is high, real patterns between what you eat and how your body responds get harder to detect. When accuracy is high, the data can surface connections—like how meal timing affects your overnight heart rate, or whether a supplement is actually changing your HRV—that you’d never spot manually.

This is part of the reason we built Kygo Health to integrate with multiple wearable platforms. Different devices bring different strengths. Connecting them to nutrition data in one place gives you a more complete picture to work with.

Using Multiple Wearables Together

Many people in the biohacking and quantified self communities wear multiple devices simultaneously to draw on each device’s strengths: Oura Ring for sleep plus Apple Watch for workouts, or WHOOP plus Garmin for different contexts.

The challenge is getting that data to talk to each other. We wrote a detailed guide on this: How to Centralize Health Data from Multiple Devices.

If you’re specifically using Oura for sleep and want to connect that with food tracking, check out: How to Combine Oura Ring with Food Tracking.
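
For a sense of what combining devices involves, here is a minimal sketch that merges daily exports from two devices, taking each metric from whichever device you trust most for it. The device names, metric keys, preference map, and numbers are all hypothetical; this is not how Kygo Health or any vendor API works.

```python
# Hypothetical preference map: which device to trust for each metric.
PREFERRED_SOURCE = {
    "sleep_minutes": "ring",
    "hrv_ms": "ring",
    "steps": "watch",
}

def merge_daily(records_by_device, preferred=PREFERRED_SOURCE):
    """Build one record per day, taking each metric from its preferred device."""
    merged = {}
    for metric, device in preferred.items():
        for day, values in records_by_device.get(device, {}).items():
            if metric in values:
                merged.setdefault(day, {})[metric] = values[metric]
    return merged

# Hypothetical exports keyed by ISO date (invented numbers).
exports = {
    "ring": {"2026-03-01": {"sleep_minutes": 431, "hrv_ms": 52, "steps": 6100}},
    "watch": {"2026-03-01": {"sleep_minutes": 455, "steps": 9840}},
}

print(merge_daily(exports))
# {'2026-03-01': {'sleep_minutes': 431, 'hrv_ms': 52, 'steps': 9840}}
```

Sleep and HRV come from the ring and steps from the watch, so overlapping readings from the less trusted device are ignored rather than averaged in.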

Want to compare devices yourself? Explore all the data from these studies in our free Wearable Accuracy Comparison Tool.

Ready to see how your nutrition connects to the biometric data your wearable tracks? Download the Kygo Health app for iOS or Android and start exploring the patterns in your own data.

Sources

Robbins R, et al. (2024). “Accuracy of Three Commercial Wearable Devices for Sleep Tracking in Healthy Adults.” Sensors, 24(20), 6532. DOI: 10.3390/s24206532 — Funded by Oura Ring Inc.
Schyvens AM, et al. (2025). “Performance of six consumer sleep trackers in comparison with polysomnography in healthy adults.” Sleep Advances, 6(1), zpaf016. DOI: 10.1093/sleepadvances/zpaf016 — Independent (VLAIO-funded)
Dial MB, et al. (2025). “Validation of nocturnal resting heart rate and heart rate variability in consumer wearables.” Physiological Reports, 13(16), e70527. DOI: 10.14814/phy2.70527 — Independent
Park et al. (2023). “Accuracy of 11 Wearable, Nearable, and Airable Consumer Sleep Trackers.” JMIR mHealth and uHealth, 11, e50983. DOI: 10.2196/50983 — Independent
WellnessPulse Meta-Analysis (2025). Accuracy of Fitness Trackers — Aggregate data
Caserman P, et al. (2024). “Validity of Apple Watch Series 7 VO2 Max Estimation.” JMIR Biomedical Engineering, 9, e54023.
Lambe RF, et al. (2025). “Validation of Apple Watch VO2 max estimates.” PLOS One, 20(2), e0318498.
Christakis et al. (2025). “A guide to consumer-grade wearables in cardiovascular clinical care.” npj Cardiovascular Health, 2, 82.
Khodr R, et al. (2024). “Accuracy, Utility and Applicability of the WHOOP Wearable Monitoring Device.” medRxiv. DOI: 10.1101/2024.01.04.24300784
Oura Internal Validation (2024). Temperature sensor validation study. 16 participants, 93,571 data points.
Maijala et al. (2019). “Nocturnal finger skin temperature in menstrual cycle tracking.” BMC Women’s Health, 19, 150.
Lanfranchi et al. (2024). Samsung Galaxy Watch SpO2 validation. Journal of Clinical Sleep Medicine.

FAQ: Wearable Accuracy Questions

Which wearable is the most accurate for sleep tracking?

It depends on the study. In the Oura-funded Brigham study (2024), Oura led with κ=0.65 and 79.5% deep sleep sensitivity. In the independent Antwerp study (2025), Apple Watch led overall (κ=0.53) while WHOOP led deep sleep detection (69.6%). The independent Korean study (2023) ranked Google Pixel Watch and Galaxy Watch highest. Study funding, population, and methodology all affect results.

How accurate is Oura Ring HRV compared to medical devices?

Oura Gen 4 achieved a 0.99 concordance correlation coefficient with Polar H10 ECG in an independent 536-night study (Dial et al., 2025). This is the highest HRV accuracy among consumer wearables tested in that study.

Is WHOOP accurate for HRV tracking?

WHOOP 4.0 showed a CCC of 0.94 and MAPE of 8.17% in the Dial et al. (2025) study—rated “Moderate” on the concordance scale.

Does skin tone affect wearable accuracy?

PPG sensor accuracy is affected by skin pigmentation. Most validation studies have predominantly Caucasian participants, which is a known research gap. Accuracy data may not generalize equally across all skin tones.

Can I use multiple wearables together?

Yes. Many people wear multiple devices to capture different metrics from each device’s strengths. The challenge is correlating data across platforms, which typically requires a third-party platform or manual comparison.

Which wearable is best for tracking how food affects sleep?

For nutrition-sleep correlation analysis, you need accurate sleep and HRV data paired with consistent food logging. The studies above show which devices perform strongest for each metric—the best choice depends on which specific metrics you prioritize.

Are wearable calorie estimates reliable?

No wearable tracks calories with high precision. Among the watches and bands compared above, the highest reported accuracy is Apple Watch at 71%. All devices should be treated as rough estimates. Accuracy decreases further during high-intensity or multi-modal exercise.

Why do different studies show different accuracy rankings?

Study funding, sample size, population demographics, device firmware version, number of nights tested, and PSG scoring protocols all affect results. This is why we include multiple studies and flag funding sources throughout this article.

Disclaimer: Kygo Health LLC is a personal data aggregation and insights platform designed for informational purposes only. The information provided does not constitute medical advice, diagnosis, or treatment. Always consult a licensed healthcare provider with any questions regarding medical conditions.

Have questions about wearable accuracy or data you think should be included? Reach out directly at Ryan@kygo.app. If you have sources or credible data that isn’t listed here, share it and we’ll review and update accordingly.