Journal of Clinical Sleep Medicine, Vol. 14, No. 10 October 15, 2018
YK Choi, G Demiris, SY Lin, et al. Review: Smartphone Applications to Support Sleep Self-Management
their quality. To date no studies have systematically assessed
the quality of commercially available apps to support sleep
self-management. To address this gap in the literature, we con-
ducted a thorough review of commercially available mHealth
apps focused specifically on sleep self-management. Our objectives were to (1) identify the current landscape of commercially available sleep self-management related mHealth apps;
(2) describe their characteristics; (3) identify the extent to
which available apps have been rigorously tested; (4) rate the
quality of the apps based on existing rating scales; and (5) pro-
vide recommendations for the design and implementation for
future apps to support sleep self-management.
METHODS
Systematic Search and Selection Criteria
In April 2017, we conducted a thorough review of mHealth apps
across four leading web-based mobile app stores: Apple iTunes
App Store, Android Google Play store, Amazon Appstore, and
Microsoft Appstore. The following search terms were used in
each app store: “sleep,” “sleep management,” “sleep monitor-
ing,” and “sleep tracking.” In the first round of screening, the
duplicate apps identified from multiple search terms in each
app store were excluded. In the second round, two members of
the research team (YC, SL) conducted a preliminary screening
based on app titles, full market descriptions, and screenshots of
the potential apps to evaluate relevance. Inclusion criteria for
the apps were as follows: (1) focus on sleep self-management
based on user generated data (eg, monitoring or tracking us-
ers’ sleep patterns, providing guidance to improve sleep based
on user generated data); (2) must be able to be used without
the assistance of a healthcare provider; (3) must be currently
available on the public market; and (4) must be in English.
Because our focus was to assess apps that use user-generated
data, apps that only provided sleep education, tips, or relax-
ation techniques without the use of user-generated data were
not part of this review. More specific exclusion criteria can be
found in Figure 1.
Any discrepancies in ratings of inclusion and exclusion cri-
teria between the team members were discussed until consen-
sus was reached.
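For illustration, the first-round deduplication step can be sketched as follows. This is a minimal sketch, not the team's actual workflow: the app identifiers and titles are hypothetical, and the key assumption is that each store exposes a stable per-app identifier that can be used to collapse hits returned by the four search terms.

```python
# Hypothetical search hits: (search_term, app_id, title). The same app
# often surfaces under several of the four search terms within a store.
hits = [
    ("sleep", "app.alpha", "App Alpha"),
    ("sleep tracking", "app.alpha", "App Alpha"),
    ("sleep monitoring", "app.beta", "App Beta"),
    ("sleep management", "app.beta", "App Beta"),
    ("sleep", "app.gamma", "App Gamma"),
]

# First round of screening: keep the first occurrence of each app_id;
# later hits for the same id are duplicates and are excluded.
seen = set()
unique_apps = []
for _, app_id, title in hits:
    if app_id not in seen:
        seen.add(app_id)
        unique_apps.append((app_id, title))

print([title for _, title in unique_apps])  # ['App Alpha', 'App Beta', 'App Gamma']
```

The same first-seen logic extends to the later cross-platform pass, keyed on app name rather than a store identifier.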
The remaining apps were downloaded and reviewed (using
the following platforms: iOS on iPhone 6; Android on Nexus
5x; Amazon Fire Tablet; Windows Phone 8.1 on Lumia 435).
Approximately 25% of the apps were independently evaluated
and rated by both reviewers, and interrater reliability was
high. For the remainder, apps in the Apple iTunes App Store
were reviewed by one member (SL) and apps on the other platforms were reviewed by another (YC).
During this full review, an additional 75 apps that, upon closer examination, did not meet the inclusion criteria were excluded. The
preliminary list of apps to be included was reviewed for duplicated apps identified across multiple platforms and for highly
similar versions of the same app (eg, “lite” or “pro” versions) to
produce the final list of unique apps (see Figure 1 flowchart).
A Microsoft Excel spreadsheet was used to characterize
each app as to its required platform (eg, iOS, Android, etc.),
country developed, cost to download, number of downloads,
rating and number of reviewers contributing to the rating, date
of last update, and primary features. Additionally, a data ex-
traction form was developed using a Research Electronic Data
Capture (REDCap) survey that included the two rating scales
(described in the next paragraph). In the REDCap survey, we
also measured the sleep tracking method of each app (eg, “au-
tomatic,” “manual entry,” “both”). Additionally, we conducted
PubMed searches using the app name of each of the included
apps as the search term to identify peer-reviewed publications
reporting on app credibility (eg, development using an evidence-based intervention, efficacy testing).
Rating Tools
To systematically assess and appraise the apps, the reviewers
used two different rating tools: (1) the Mobile Application Rating Scale (MARS) quality score13 and (2) the IMS Institute for Healthcare Informatics app functionality score.14
The MARS
rating tool is a 23-item scale developed to systematically as-
sess the quality of mHealth apps (Table 1). The MARS instrument includes an objective app quality section with 19 items
divided into 4 subscales (engagement, functionality, esthetics,
and information quality) and one subjective quality section
with 4 items evaluating the users’ overall satisfaction. Each
MARS item is rated on a 5-point Likert scale (1 = inadequate,
2 = poor, 3 = acceptable, 4 = good, and 5 = excellent). For this
review, we did not rate MARS item 19, pertaining to app
credibility, because a PubMed search identified only three
validation studies among the included apps.15–17
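The MARS scoring arithmetic can be illustrated with the following minimal sketch. It assumes that each subscale score is the mean of its item ratings and that the overall objective quality score is the mean of the four subscale means; the item counts per subscale below are illustrative only, not the actual MARS item allocation.

```python
from statistics import mean

# Hypothetical 1-5 Likert ratings for one app, grouped by MARS
# objective-quality subscale (item counts here are illustrative).
ratings = {
    "engagement":    [4, 3, 4, 3, 4],
    "functionality": [5, 4, 4, 4],
    "esthetics":     [3, 4, 3],
    "information":   [4, 3, 4, 3, 4, 3],  # credibility item omitted, as in this review
}

# Each subscale score is the mean of its item ratings.
subscale_scores = {name: mean(items) for name, items in ratings.items()}

# The overall objective quality score is the mean of the subscale means.
overall_quality = mean(subscale_scores.values())

print(round(overall_quality, 2))  # 3.67
```

Averaging subscale means (rather than pooling all items) keeps each subscale equally weighted regardless of how many items it contains.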
The MARS
rating tool has been previously applied to evaluate diverse
mHealth apps, including those for mindfulness,18 weight management,19 smoking cessation,20 heart failure symptom monitoring,21 and blood alcohol calculation.22
The IMS Institute for Healthcare Informatics mobile app
functionality score consists of 7 functionality criteria and 4
functional subcategories14 (Table 2). Each app was evaluated
to assess whether each of the 11 functionalities exists, and a functionality score (0 to 11) was calculated accordingly. The IMS
functionality score differs from the MARS functionality
score in that it focuses solely on the availability of the functionality (inform, record, display, guide, remind, and communicate),
whereas the MARS functionality score measures the quality
of performance, ease of use, navigation, and gestural design of
the app on a 5-point Likert scale.
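By contrast with the MARS Likert ratings, the IMS score reduces to a count of present functionalities. The sketch below assumes a breakdown of 7 criteria plus 4 “record” subcategories totaling 11 items; the checklist labels are paraphrases for illustration, not the IMS report’s exact wording.

```python
# Hypothetical IMS-style checklist for one app: True if the functionality
# is present. Seven criteria plus four assumed "record" subcategories
# each contribute one point, for a 0-11 total.
checklist = {
    "inform": True,
    "instruct": False,
    "record": True,
    "record_collect_data": True,
    "record_share_data": False,
    "record_evaluate_data": True,
    "record_intervene": False,
    "display": True,
    "guide": False,
    "remind_alert": True,
    "communicate": False,
}

# Unlike MARS, the IMS score only counts whether a functionality exists,
# not how well it performs.
ims_score = sum(checklist.values())
print(ims_score)  # 6 of 11 functionalities present
```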
Additionally, we classified each app’s data recording method into
three categories: automatic tracking (eg, using embedded sensors such as an accelerometer), manual tracking (ie, manual
logging by the user), or both.
Data Analysis
Two reviewers (YC and SL) were trained in the use of the
MARS scale13 following the steps presented in the YouTube
training tutorial.23
Both reviewers rated 25% of randomly selected apps to eval-
uate interrater reliability for both MARS and the IMS func-
tionality scores. The intraclass correlation coefficients were
calculated on all MARS subscales and total score, as well as
for the IMS Institute for Healthcare Informatics functionality