a Not available.
b SpO2: peripheral arterial oxygen saturation.
c AVPU: Alert, Voice, Pain, Unresponsive.
d GCS: Glasgow coma scale.
The stepped-wedge study comprised 2 arms, a control arm in which vital signs were recorded using paper observation charts and an intervention arm where the digital EWS system, SEND, was used. The EWS and escalation protocol were identical in both arms.
The study consisted of 20 clusters (and 21 steps). We defined a cluster as a group of between 1 and 5 eligible wards that implemented SEND simultaneously. All wards that were due to switch to the SEND system were eligible for inclusion in a cluster; we defined these as “study wards.” Study wards included all adult wards across the Trust, except for the obstetric wards, emergency departments, day units, high dependency units, ICUs, and investigation suites, which were excluded as they did not use standard hospital observation recording and escalation policies. We also excluded the 3 wards where the SEND system was initially developed and piloted, as the control condition, paper charting, was no longer used at the commencement of the study.
Clusters of wards were determined by pragmatic considerations related to the safe conduct of the rollout. For example, each cluster only contained wards from an individual hospital. The sequence of study clusters was predetermined by the system rollout strategy and was therefore not randomized.
The rollout schedule is depicted in Figure 2 . The time period between the start of each step was typically 2 weeks. The period was occasionally lengthened to account for project management issues such as reduced staffing over the Christmas holidays (exact dates are provided in Multimedia Appendix 2 ). The final period, which occurred after SEND was fully deployed to all wards, lasted 3 months. The extended period was designed to capture any delayed effects caused by wards adapting to the new system.
Each study ward admitted multiple patients during each step. Data for this study was obtained at an individual patient level. A patient’s data belonged to only 1 step, that is, each cluster and period contained data pertaining to different people. We included all patient admissions to the study wards during the study period rather than censoring data from repeated admissions. Therefore, some patients could potentially contribute data to multiple steps on different admissions. We treated multiple episodes within the same patient as independent, reasoning that the primary outcome was unlikely to be causally related to patient characteristics. We excluded data from admissions where patients crossed study arms (ie, the ward moved from paper to digital EWS) during their admission.
Data from the control arm were collected by 7 research assistants, who transcribed data from paper charts on each study ward into a bespoke electronic form. This was a resource-intensive process, making it unfeasible to collect data from all clusters simultaneously for the duration of the study. We therefore commenced control-arm data collection at the start of the rollout to each hospital site and limited it to the site where SEND was actively being rolled out (illustrated in Figure 2). To make this tractable, we further split the largest hospital (Hospital D) into 2 sites (Main Wing and Second Wing). Intervention-arm data collection continued even after the rollout at a given hospital was complete, so patients at earlier hospitals contributed more intervention-arm data than those at subsequent hospitals. In summary, data collection may be considered as 5 separate stepped wedges, one per site, with varying lengths of postintervention data.
For each patient admission within each study cluster, we collected patient characteristics (age, gender, Charlson score, admission type, and admitting specialty), the date and times of admission to the ward; first observation with CEWS ≥3 and the immediate subsequent observation; hospital discharge; hospital mortality; transfer to ICU; cardiac arrest call; and theatre admission.
The primary outcome measure was the time to next observation (TTNO), defined as the time between a patient’s first triggering observations set (CEWS score ≥3) and the subsequent observations set. To address potential confounding by length of ward stay, analysis of the primary outcome measure was restricted to triggering observation sets that occurred within 48 hours of transfer to the first study ward of an admission.
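As a minimal illustration of this definition (the list-of-tuples representation and function below are assumptions for the sketch, not the study's actual data pipeline):

```python
from datetime import datetime, timedelta

def time_to_next_observation(obs, ward_arrival, trigger_threshold=3,
                             window=timedelta(hours=48)):
    """Return the TTNO: the interval between the first triggering
    observation set (CEWS >= threshold) within `window` of ward
    arrival and the immediately subsequent observation set.
    `obs` is a time-ordered list of (timestamp, cews_score) tuples.
    Returns None if there is no eligible trigger or no subsequent set."""
    for i, (t, score) in enumerate(obs):
        if score >= trigger_threshold and t - ward_arrival <= window:
            if i + 1 < len(obs):
                return obs[i + 1][0] - t
            return None  # the trigger was the final recorded set
    return None

arrival = datetime(2015, 3, 1, 8, 0)
observations = [
    (datetime(2015, 3, 1, 10, 0), 1),
    (datetime(2015, 3, 1, 14, 0), 4),  # first triggering set
    (datetime(2015, 3, 1, 16, 8), 2),
]
print(time_to_next_observation(observations, arrival))  # → 2:08:00
```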
Secondary outcome measures were time to death in the hospital, time to unplanned ICU admission, time to cardiac arrest call, and hospital length of stay (LOS). In each case, the start time was the time of the initial triggering set of observations.
We reported these outcomes for the subgroup included in the analysis of the primary outcome measure (ie, those patients who had a CEWS score ≥3), in line with our causal hypothesis. We also reported the secondary outcomes for all eligible admissions. In these analyses, we used the time of admission to the study ward as the start time.
Finally, we reported system usability to provide further context. System usability was measured using the system usability scale, a validated 10-item questionnaire that is used to generate a score between 0 and 100 [ 22 ]. We delivered the questionnaire electronically to all users of the digital system. The questionnaire is included in Multimedia Appendix 3 .
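The system usability scale has a standard scoring rule: odd-numbered (positively worded) items contribute their response minus 1, even-numbered (negatively worded) items contribute 5 minus their response, and the sum is scaled by 2.5 to give a 0-100 score. A small sketch of that rule:

```python
def sus_score(responses):
    """Compute the System Usability Scale score (0-100) from ten item
    responses, each on a 1-5 Likert scale. Odd-numbered items are
    positively worded; even-numbered items are negatively worded."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses in the range 1-5")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# All-neutral responses (3 across the board) score the midpoint:
print(sus_score([3] * 10))  # → 50.0
```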
The upper bound on the number of patient admissions included in the study was determined by the pragmatic roll-out schedule of the intervention. To determine whether this would be sufficient, we initially undertook a power calculation for steps 1-8, using unpublished pilot data from the Computer Alerting Monitoring System 2 study [ 23 ]. We assumed that the proportion of patients who have a further observation within 3 hours of recording an EWS ≥3 would be 0.5 in the paper arm and 0.6 in the electronic arm, that there would be an average of 11 patients with an initial CEWS ≥3 per cluster, and, conservatively, that the intracluster correlation would be 0.15. The power was then estimated to be 79.3% for a 5% α level. While the calculation depended on statistics estimated from limited pilot data, it indicated that the inclusion of all steps would be sufficiently powered to detect a difference of 10% in the primary outcome between groups. Full details of this calculation are provided in Multimedia Appendix 4 .
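The study's exact calculation is in Multimedia Appendix 4; as a simplified illustration of the general approach, clustered comparisons of two proportions are often approximated by inflating the variance with the design effect 1 + (m − 1) × ICC. The sketch below uses that approximation and is not the stepped-wedge formula used in the study:

```python
from math import sqrt, erf

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_proportions_clustered(p1, p2, n_clusters_per_arm,
                                    cluster_size, icc, alpha=0.05):
    """Approximate power of a two-proportion comparison, inflating the
    variance by the design effect 1 + (m - 1) * ICC to account for
    clustering. A simplification of stepped-wedge power formulas."""
    deff = 1 + (cluster_size - 1) * icc
    n_eff = n_clusters_per_arm * cluster_size / deff  # effective n per arm
    se = sqrt(p1 * (1 - p1) / n_eff + p2 * (1 - p2) / n_eff)
    z_alpha = 1.959964  # two-sided 5% critical value
    return norm_cdf(abs(p2 - p1) / se - z_alpha)

# Proportions and ICC from the text; the cluster count is illustrative.
power = power_two_proportions_clustered(0.5, 0.6, 10, 11, 0.15)
print(round(power, 3))
```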
The primary outcome, the difference in TTNO between arms, was analyzed using a mixed-effects Cox model with a random intercept for cluster and a fixed effect for time as described by Hussey and Hughes [ 24 ]. The model included in-hospital death, ICU admission, theatre admission, and cardiac arrest calls as competing events.
We conducted a sensitivity analysis using 5 variants of the basic Hussey and Hughes model, as originally proposed by Hemming et al [ 25 ]. The five variants were: (1) time by strata interaction (fixed effects), (2) time by cluster interaction (random effects), (3) treatment by strata interaction (fixed effects), (4) treatment by cluster interaction (random effects), and (5) treatment by time interaction (fixed effects). Secondary outcomes were analyzed using the same method.
To aid interpretation, we calculated the average TTNO in each arm as the mean of the median (IQR) TTNO within each unit of the stepped wedge cluster.
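This summary can be sketched as follows (assuming TTNO values, in minutes, grouped into one list per cluster-period cell; the helper name is illustrative):

```python
from statistics import median, mean

def summary_ttno(ttno_by_cell):
    """Average TTNO across the stepped wedge: take the median TTNO
    within each cluster-period cell, then average those medians so
    that large cells do not dominate the summary."""
    return mean(median(cell) for cell in ttno_by_cell if cell)

# Three illustrative cells of TTNO values in minutes.
cells = [[120, 130, 250], [90, 140], [128]]
print(round(summary_ttno(cells), 1))  # medians 130, 115, 128 → 124.3
```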
We reported baseline descriptive statistics on patient characteristics, including age and sex, by study arm. We also reported these data for each time period to help understand whether trends in baseline characteristics differed between the control and intervention arms.
We conducted the study between January 2015 and September 2016, concluding after the rollout of SEND was complete. During this time, there were 90,262 admissions to the study wards. For 2927 (3%) admissions, vital signs were recorded on both the paper and SEND systems; these admissions were excluded. Of the remaining 87,335 admissions, 40,885 (47%) had vital signs recorded exclusively on paper (control arm) and 46,450 (53%) had vital signs recorded exclusively using SEND (intervention arm). Of the admissions in the control arm, 11,597 occurred during the implementation period and were available for data capture. In total, 12,802 admissions with a triggering observation within 48 hours of arrival on their first study ward were entered into the analysis: 1084 in the control arm and 11,718 in the intervention arm (Figure 3).
Admission characteristics for the control and intervention arms are presented in Table 2. Admissions in the intervention arm tended to be slightly older (median age 70 vs 65 years), more likely to be male (49.3% vs 45.6%), and to have more comorbidities (median Charlson score 4 vs 3).
| Characteristics | Control (paper) | Intervention (SEND) |
|---|---|---|
| Admissions, n | 1084 | 11,718 |
| Patients, n | 1048 | 10,708 |
| Age (years), median (IQR) | 65 (49-79) | 70 (54-81) |
| Sex (male), n (%) | 494 (45.6) | 5777 (49.3) |
| Charlson score, median (IQR) | 3 (0-10) | 4 (0-12) |
| Admission type, n (%) | | |
| Elective | 392 (36.2) | 4281 (36.5) |
| Emergency | 692 (63.9) | 7427 (63.4) |
| Other | 0 (0) | 10 (0.1) |
| Admitting specialty, n (%) | | |
| Medical | 430 (40.0) | 5618 (47.9) |
| Surgical | 645 (59.5) | 5894 (50.3) |
| Other | 9 (0.8) | 206 (1.8) |
a SEND: system for electronic notification and documentation.
The proportion of male to female admissions was similar in both study arms across all steps, apart from cluster 1, in which there was only a small number of paper admissions (n=10). There were no males in cluster 20, a cluster that contained only obstetrics and gynecology wards. Proportions of elective and emergency admissions, and of medical and surgical admissions, were similar for each study arm across all clusters.
There was no significant difference in the TTNO between the 2 arms after adjustment for competing events (Table 3). The median TTNO in the control arm was 128 (IQR 73-218) minutes; the median TTNO in the intervention arm was 131 (IQR 73-223) minutes. The hazard ratio comparing the TTNO using paper charting with the TTNO using SEND was 0.99 (95% CI 0.91-1.07; P=.73). All model variants in the sensitivity analysis gave results consistent with the Hussey and Hughes model primary analysis. The numbers of each type of competing event in each arm are shown in Table 4.
| Model | Hazard ratio (95% CI) | P value |
|---|---|---|
| Hussey and Hughes (base model) | 0.99 (0.91-1.07) | .73 |
| Time by strata interaction (FE) | Does not fit | — |
| Time by cluster interaction (RE) | 0.98 (0.91-1.07) | .72 |
| Treatment by strata interaction (FE) | 0.96 (0.83-1.12) | .63 |
| Treatment by cluster interaction (RE) | 0.99 (0.90-1.07) | .73 |
| Treatment by time interaction (FE) | Does not fit | — |
a FE: fixed effects.
b Not available.
c RE: random effects.
| Competing events | Control (paper), n (%) | Intervention (SEND), n (%) |
|---|---|---|
| Death | 50 (5) | 826 (7) |
| ICU admission | 22 (2) | 237 (2) |
| Theatre admission | 181 (14) | 1508 (12) |
| Arrest call | 4 (<1) | 44 (<1) |
b ICU: intensive care unit.
Figure 4 shows the TTNO for each step during the study. Confidence intervals for the intervention arm were much narrower than those for the control arm because more electronic data were available (collected after the initial intervention rollout period). There was marked variation in the TTNO between clusters (Figure 4); the introduction of the digital system did not reduce this variance. There was insufficient power to determine whether the intervention had an impact at a cluster level. However, we note that there appeared to be a large reduction in TTNO for cluster 12, which comprised acute general medicine wards.
The introduction of SEND had no significant effect on time to death in hospital, LOS, or time to unplanned ICU admission for the cohort included in the primary analysis ( Table 5 ). There were only 48 cardiac arrest calls across the 2 arms of the study, therefore, there were insufficient events to model this outcome. The findings were consistent irrespective of modeling assumptions. Sensitivity analyses are reported in Multimedia Appendix 5 .
| Outcome | Hazard ratio (95% CI) | P value |
|---|---|---|
| Time to death in hospital | 0.96 (0.68-1.36) | .84 |
| Time to ICU admission | 1.85 (0.98-3.49) | .06 |
| Hospital length of stay | 0.99 (0.65-1.51) | .97 |
a ICU: intensive care unit.
We also calculated the same secondary measures for the entire patient population (11,597 control and 46,450 intervention admissions), including all those who did not score a CEWS ≥3 within the first 48 hours of admission (Multimedia Appendix 6). For this population, the start time was taken to be the time of admission to the study ward. In this group, there were no significant differences in time to death or LOS. However, there was a borderline reduction in time to ICU admission in the intervention arm (hazard ratio 1.25, 95% CI 1.02-1.54).
System usability scores were only available from Hospital A. The feedback questionnaire was sent to 1891 users, of whom 208 (11%) responded. The system usability score was 77.6.
In this large, stepped wedge trial conducted across 4 hospital sites of the same National Health Service trust, the introduction of a digital charting system did not affect the frequency of vital signs recording, nor was it associated with changes in hospital mortality, cardiac arrest rates, or hospital LOS within the subgroup of patients who had a triggering EWS.
Our findings contrast with previous studies of digital vital signs charting. Jones et al [ 26 ] reported a reduction in the mean LOS from 9.7 to 6.9 days following the introduction of Patientrack (Alcidion Group Ltd). Schmidt et al [ 15 ] reported a reduction in hospital mortality following the introduction of VitalPAC (System C Healthcare Ltd).
The differences between our findings and those of previous researchers may be related to trial design and statistical analysis. A significant strength of our work is the use of a stepped-wedge trial design and a large data set, in line with international recommendations regarding digital health evaluation [ 27 ]. Furthermore, we did not institute any new clinical workflows when implementing SEND, which would have confounded the results.
Beyond issues related to design and analysis, 4 other hypotheses could explain our findings. First, it might be that the design or usability of SEND meant that nurses did not engage with the system. However, the system has previously been shown to be more efficient than the charting on paper and the score of 77.6 on the system usability scale is representative of good usability [ 28 , 29 ].
A second possibility is that, although the system was well liked by staff, advice was not presented at the right time or in the right context and was therefore ineffective in reminding nurses to recheck vital signs [ 30 ]. Advice from the hospital protocol was presented at the time of observation recording, but there was no mechanism for automatically notifying staff that the next set of observations was due, and our implementation did not display the time to the next observations on a dedicated screen at the nursing station. How digital systems influence behavior remains poorly understood.
A third possibility is that the system was effective in reminding nurses to recheck observations more frequently, but that the reminder alone was insufficient to trigger behavior change. Behavior change requires a combination of capability, opportunity, and motivation [ 31 ]. Even if a digital charting system positively alters motivation (through user prompts) and capability (through increased efficiency), these influences may be nullified by competing demands.
Finally, there is the possibility that, even with an effective reminder and supportive context, nurses were exercising clinical judgment and deliberately choosing to deviate from the hospital protocol. The gap between hospital protocols ("work as imagined") and routine clinical practice ("work as done") is well recognized and is often an essential adaptation that ensures hospitals continue to function [ 32 ]. While the hospital protocol recommended the same frequency of monitoring for all patients with an EWS greater than or equal to 3, our results showed that nurses increased the frequency of vital signs monitoring with the EWS score. It is possible that increasing the frequency of vital signs recording would not improve patient outcomes; rather than nurses changing practice to match the hospital protocol, the protocol should perhaps be changed to match nursing practice more closely.
An unexpected finding was that when including all patients, irrespective of whether they had a triggering observation, the time to ICU admission in the intervention arm was less than in the control arm. Similar reductions in time to ICU transfer have recently been observed in a pre and postintervention study of a digital EWS system that used the electronic Cardiac Arrest Risk Triage EWS [ 33 ]. The difference was observed without any difference in the primary outcome measure, which might be explained in 2 ways. Either the result may not correspond to a true effect (which is consistent with the associated wide confidence intervals), or else SEND may be exerting effects via a mechanism other than increased frequency of patient observations.
The primary limitation of the study design was that clusters were not randomized but were instead determined by the predetermined phased rollout plan for SEND. Lack of randomization is a potential problem because the estimate of the treatment effect may be biased if secular trends exist. To mitigate this, we included a large number of clusters and explored a variety of analysis methods to examine the possibility of a secular trend. The stepped approach nonetheless retains advantages over a simple before-after design: the presence of a control group throughout the study period means that system-level changes may be detected.
A further limitation was the relatively small number of secondary end points. This led to instances in which some clusters had zero secondary end point events. Therefore, conclusions from the secondary outcome analysis ought to be interpreted with caution.
Caution is also required in interpreting the usability survey results. In our original study protocol, we had intended to obtain system usability score data from all new users of the system at the end of roll-out to each hospital site. However, flaws in our survey administration procedures inhibited us from identifying new users versus clinical users who worked in multiple hospitals. Therefore, we only surveyed users of the first site. It is possible that they were not representative of all users. Furthermore, there may be responder bias associated with the low response rate. However, the results obtained in this study are consistent with the findings of questionnaires from staff on pilot wards during the SEND development process [ 28 ].
Although data in this study were collected in 2016, we emphasize that the findings remain highly relevant to both the United Kingdom and international health care providers. In the United Kingdom, digital EWS systems are not yet ubiquitous and have been implemented at multiple hospital Trusts in the last year [ 34 , 35 ]. Internationally, the use of both EWS and an accompanying digital system is an emerging practice [ 36 ]. More pertinently, the effectiveness of EWS and the mechanism by which any potential benefits are obtained is still an open question. Indeed, a recent pre- and postevaluation of a digital sepsis score system highlighted the ongoing need for understanding how the use of alert systems evolves over time and impacts clinical workflow [ 37 ].
Finally, the findings presented here likely underestimate the true overall benefit of the system. We examined the effects of SEND using only a single measure of observation recording practice, the time between observations, which primarily reflects the impact of the system on nursing processes. We did not examine the impact of SEND on other clinical processes or the benefits of secondary use of the data for clinical governance and research.
The introduction of a digital vital signs charting system had no effect on the frequency of vital signs observation or the time to ICU admission, hospital LOS, and hospital mortality in patients with a high EWS. Our findings stand in contrast to previous claims that the introduction of a digital vital signs charting system is associated with significant improvement in clinical outcomes. Future research should continue to investigate the mechanisms by which digital vital signs charting systems alter staff behaviors and improve patient outcomes.
The authors thank Soubera Yousefi, Samuel Wilson, Alan Dodge, David Vallance, Simon Kerr, Deolyn Makoni, and Giovanni Rizzo for transcribing information from paper observation charts during the study. This study was supported by the National Institute for Health and Care Research (NIHR) Biomedical Research Centre, Oxford. System for electronic notification and documentation (SEND) was developed and implemented with funding from the National Health Service England Safer Wards Safer Hospitals Fund. PJW is employed by the OUH Foundation National Health Service Trust. PJW, TB, and DC-W were supported by the National Institute for Health and Care Research Biomedical Research Centre, Oxford.
The data sets generated and analyzed during this study are not publicly available as they are recorded at the patient level, such that it might be possible to reidentify individuals. They are available from the corresponding author on reasonable request.
TB, SG, DC-W, JB, and PJW have substantially contributed to the design of the study and the writing of this manuscript. Statistical analysis was undertaken by SG and JB. All authors read and approved the final manuscript. The funders have not been involved in the study design or reporting.
TB, DC-W, and PJW were part of the team that developed the system for electronic notification and documentation (SEND). Sensyne Health has since purchased the sole license for SEND. DC-W has previously undertaken consultancy for Sensyne Health. PJW was previously employed part-time and held shares in Sensyne Health. SG and JB declare that they have no competing interests.
EWS chart and escalation protocol.
Dates of steps.
System Usability Scale questionnaire adapted for SEND.
Power calculation.
Sensitivity analysis.
Secondary outcomes for the entire patient population (11,597 control and 46,450 intervention), including all those who did not score a CEWS≥3 within the first 48 h of admission.
CEWS: centile early warning score
EWS: early warning score
ICU: intensive care unit
LOS: length of stay
OUH: Oxford University Hospitals Foundation NHS Trust
SEND: system for electronic notification and documentation
TTNO: time to next observation
Edited by A Mavragani; submitted 21.02.23; peer-reviewed by SB Ho, D Barra, C Subbe; comments to author 30.10.23; revised version received 17.11.23; accepted 08.04.24; published 20.06.24.
©David Chi-Wai Wong, Timothy Bonnici, Stephen Gerry, Jacqueline Birks, Peter J Watkinson. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 20.06.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
At the 2024 Worldwide Developers Conference, we introduced Apple Intelligence, a personal intelligence system integrated deeply into iOS 18, iPadOS 18, and macOS Sequoia.
Apple Intelligence comprises multiple highly capable generative models that are specialized for our users’ everyday tasks, and can adapt on the fly for their current activity. The foundation models built into Apple Intelligence have been fine-tuned for user experiences such as writing and refining text, prioritizing and summarizing notifications, creating playful images for conversations with family and friends, and taking in-app actions to simplify interactions across apps.
In the following overview, we will detail how two of these models — a ~3 billion parameter on-device language model, and a larger server-based language model available with Private Cloud Compute and running on Apple silicon servers — have been built and adapted to perform specialized tasks efficiently, accurately, and responsibly. These two foundation models are part of a larger family of generative models created by Apple to support users and developers; this includes a coding model to build intelligence into Xcode, as well as a diffusion model to help users express themselves visually, for example, in the Messages app. We look forward to sharing more information soon on this broader set of models.
Apple Intelligence is designed with our core values at every step and built on a foundation of groundbreaking privacy innovations.
Additionally, we have created a set of Responsible AI principles to guide how we develop AI tools, as well as the models that underpin them:
These principles are reflected throughout the architecture that enables Apple Intelligence, connects features and tools with specialized models, and scans inputs and outputs to provide each feature with the information needed to function responsibly.
In the remainder of this overview, we provide details on decisions such as: how we develop models that are highly capable, fast, and power-efficient; how we approach training these models; how our adapters are fine-tuned for specific user needs; and how we evaluate model performance for both helpfulness and unintended harm.
Our foundation models are trained on Apple's AXLearn framework , an open-source project we released in 2023. It builds on top of JAX and XLA, and allows us to train the models with high efficiency and scalability on various training hardware and cloud platforms, including TPUs and both cloud and on-premise GPUs. We used a combination of data parallelism, tensor parallelism, sequence parallelism, and Fully Sharded Data Parallel (FSDP) to scale training along multiple dimensions such as data, model, and sequence length.
We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot. Web publishers have the option to opt out of the use of their web content for Apple Intelligence training with a data usage control.
We never use our users’ private personal data or user interactions when training our foundation models, and we apply filters to remove personally identifiable information like social security and credit card numbers that are publicly available on the Internet. We also filter profanity and other low-quality content to prevent its inclusion in the training corpus. In addition to filtering, we perform data extraction and deduplication, and apply a model-based classifier to identify high-quality documents.
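Apple's actual filtering pipeline is not public; as a toy sketch of what pattern-based PII redaction looks like (the two regexes below are illustrative only and would miss many real-world formats):

```python
import re

# Illustrative patterns only; production filters are far more extensive.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def redact_pii(text):
    """Replace likely social security and credit card numbers with
    placeholder tokens before a document enters a training corpus."""
    text = SSN_RE.sub("[SSN]", text)
    text = CARD_RE.sub("[CARD]", text)
    return text

print(redact_pii("Call 555-0101; SSN 123-45-6789 on file."))
# → Call 555-0101; SSN [SSN] on file.
```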
We find that data quality is essential to model success, so we utilize a hybrid data strategy in our training pipeline, incorporating both human-annotated and synthetic data, and conduct thorough data curation and filtering procedures. We have developed two novel algorithms in post-training: (1) a rejection sampling fine-tuning algorithm with teacher committee, and (2) a reinforcement learning from human feedback (RLHF) algorithm with mirror descent policy optimization and a leave-one-out advantage estimator. We find that these two algorithms lead to significant improvement in the model’s instruction-following quality.
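The post names the rejection sampling algorithm but does not publish its details. A schematic sketch of the idea, under assumed interfaces (`generate` and the teacher scorers below are stand-ins, not Apple's components):

```python
def rejection_sample_pairs(prompts, generate, committee, threshold, k=4):
    """Keep (prompt, response) pairs where the best of k sampled
    candidates clears an average teacher-committee score threshold;
    the survivors form the fine-tuning set."""
    kept = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(k)]
        scored = [(sum(score(prompt, c) for score in committee) / len(committee), c)
                  for c in candidates]
        best_score, best = max(scored)
        if best_score >= threshold:
            kept.append((prompt, best))
    return kept

# Stand-in generator and teachers purely for demonstration.
gen = lambda p: p + "!"
teachers = [lambda p, c: len(c) / 10, lambda p, c: 1.0]
print(rejection_sample_pairs(["hi", "hello"], gen, teachers, 0.7, k=2))
# → [('hello', 'hello!')]
```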
In addition to ensuring our generative models are highly capable, we have used a range of innovative techniques to optimize them on-device and on our private cloud for speed and efficiency. We have applied an extensive set of optimizations for both first token and extended token inference performance.
Both the on-device and server models use grouped-query-attention. We use shared input and output vocab embedding tables to reduce memory requirements and inference cost. These shared embedding tensors are mapped without duplications. The on-device model uses a vocab size of 49K, while the server model uses a vocab size of 100K, which includes additional language and technical tokens.
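Tying the input and output tables saves one full copy of the embedding matrix. Back-of-envelope arithmetic (the hidden dimension below is a hypothetical value, not a disclosed figure):

```python
def embedding_bytes(vocab, dim, bytes_per_param=2):
    """Size of one vocab-by-dim embedding table at 16-bit precision."""
    return vocab * dim * bytes_per_param

# Hypothetical hidden dimension; sharing input and output embeddings
# saves one full duplicate of this table.
dim = 2048
saved = embedding_bytes(49_000, dim)
print(round(saved / 1e6, 1), "MB")  # → 200.7 MB
```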
For on-device inference, we use low-bit palletization, a critical optimization technique that achieves the necessary memory, power, and performance requirements. To maintain model quality, we developed a new framework using LoRA adapters that incorporates a mixed 2-bit and 4-bit configuration strategy — averaging 3.5 bits-per-weight — to achieve the same accuracy as the uncompressed models.
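The stated 3.5 bits-per-weight average constrains the mix of precisions. A quick check (the 25%/75% split is inferred from the average, not a published figure):

```python
def average_bits_per_weight(frac_2bit):
    """Average bits-per-weight for a mix of 2-bit and 4-bit weight
    groups; frac_2bit is the fraction of weights stored at 2 bits."""
    return 2 * frac_2bit + 4 * (1 - frac_2bit)

# A 3.5 bpw average implies one quarter of the weights at 2 bits:
print(average_bits_per_weight(0.25))  # → 3.5
# At 3.5 bpw, a ~3B-parameter model needs roughly this many GB of weights:
print(3e9 * 3.5 / 8 / 1e9, "GB")  # → 1.3125 GB
```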
Additionally, we use an interactive model latency and power analysis tool, Talaria , to better guide the bit rate selection for each operation. We also utilize activation quantization and embedding quantization, and have developed an approach to enable efficient Key-Value (KV) cache update on our neural engines.
With this set of optimizations, on iPhone 15 Pro we are able to reach time-to-first-token latency of about 0.6 millisecond per prompt token, and a generation rate of 30 tokens per second. Notably, this performance is attained before employing token speculation techniques, from which we see further enhancement on the token generation rate.
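These two figures combine into a rough end-to-end latency estimate. A sketch that ignores token speculation and any fixed overheads (the example prompt and reply lengths are arbitrary):

```python
def response_latency(prompt_tokens, output_tokens,
                     ms_per_prompt_token=0.6, gen_tokens_per_s=30):
    """Rough latency in seconds: prompt prefill at a fixed cost per
    token plus generation at a fixed token rate."""
    prefill_s = prompt_tokens * ms_per_prompt_token / 1000
    generate_s = output_tokens / gen_tokens_per_s
    return prefill_s + generate_s

# A 500-token prompt with a 60-token reply:
print(response_latency(500, 60))  # → 2.3 (0.3 s prefill + 2.0 s generation)
```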
Our foundation models are fine-tuned for users’ everyday activities, and can dynamically specialize themselves on the fly for the task at hand. We utilize adapters, small neural network modules that can be plugged into various layers of the pre-trained model, to fine-tune our models for specific tasks. For our models we adapt the attention matrices, the attention projection matrix, and the fully connected layers in the point-wise feedforward networks for a suitable set of the decoding layers of the transformer architecture.
By fine-tuning only the adapter layers, the original parameters of the base pre-trained model remain unchanged, preserving the general knowledge of the model while tailoring the adapter layers to support specific tasks.
We represent the values of the adapter parameters using 16 bits, and for the ~3 billion parameter on-device model, the parameters for a rank 16 adapter typically require 10s of megabytes. The adapter models can be dynamically loaded, temporarily cached in memory, and swapped — giving our foundation model the ability to specialize itself on the fly for the task at hand while efficiently managing memory and guaranteeing the operating system's responsiveness.
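The "tens of megabytes" figure is easy to sanity-check with LoRA parameter counting (the matrix shapes and layer count below are hypothetical, chosen only to illustrate the arithmetic):

```python
def lora_adapter_bytes(adapted_shapes, rank=16, bytes_per_param=2):
    """Storage for a LoRA adapter: each adapted (d_out, d_in) weight
    matrix gains two low-rank factors, A (rank x d_in) and
    B (d_out x rank), i.e. rank * (d_in + d_out) parameters,
    stored here at 16 bits (2 bytes) each."""
    params = sum(rank * (d_in + d_out) for d_out, d_in in adapted_shapes)
    return params * bytes_per_param

# Hypothetical example: four 2048x2048 attention matrices and two
# 2048<->8192 feedforward matrices adapted in each of 24 decoder layers.
layer = [(2048, 2048)] * 4 + [(8192, 2048), (2048, 8192)]
size = lora_adapter_bytes(layer * 24)
print(round(size / 1e6, 1), "MB")  # → 28.3 MB, i.e. tens of megabytes
```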
To facilitate the training of the adapters, we created an efficient infrastructure that allows us to rapidly retrain, test, and deploy adapters when either the base model or the training data gets updated. The adapter parameters are initialized using the accuracy-recovery adapter introduced in the Optimization section.
Our focus is on delivering generative models that can enable users to communicate, work, express themselves, and get things done across their Apple products. When benchmarking our models, we focus on human evaluation as we find that these results are highly correlated to user experience in our products. We conducted performance evaluations on both feature-specific adapters and the foundation models.
To illustrate our approach, we look at how we evaluated our adapter for summarization. As product requirements for summaries of emails and notifications differ in subtle but important ways, we fine-tune accuracy-recovery low-rank (LoRA) adapters on top of the palletized model to meet these specific requirements. Our training data is based on synthetic summaries generated from bigger server models, filtered by a rejection sampling strategy that keeps only the high quality summaries.
To evaluate product-specific summarization, we use a set of 750 responses carefully sampled for each use case. These evaluation datasets emphasize the diverse set of inputs our product features are likely to face in production and include a stratified mixture of single and stacked documents of varying content types and lengths. Because these are product features, it was important to evaluate performance against datasets representative of real use cases. We find that our models with adapters generate better summaries than a comparable model.
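Stratified sampling of this kind can be sketched as fixing a quota per cell of the stratification grid. The axes below (structure, content type, length bucket) follow the description above, but the actual dataset design is not public:

```python
import random
from itertools import product

def build_eval_set(pool, per_stratum=5, seed=0):
    """Stratified sample over (structure, content type, length bucket) so the
    evaluation set mirrors the mix of inputs a product feature sees."""
    rng = random.Random(seed)
    strata = {}
    for ex in pool:
        key = (ex["structure"], ex["kind"], ex["length"])
        strata.setdefault(key, []).append(ex)
    sample = []
    for key in sorted(strata):          # deterministic cell order
        items = strata[key][:]
        rng.shuffle(items)
        sample.extend(items[:per_stratum])
    return sample

# Toy pool covering every combination of the three stratification axes.
pool = [{"structure": s, "kind": k, "length": l, "id": i}
        for i, (s, k, l) in enumerate(product(
            ["single", "stacked"], ["email", "notification"], ["short", "long"]))]
subset = build_eval_set(pool, per_stratum=1)
assert len(subset) == 8   # one example per stratum: 2 * 2 * 2 cells
```

Capping each cell at a fixed quota prevents any one input type (for example, short single emails) from dominating the evaluation.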
As part of responsible development, we identified and evaluated specific risks inherent to summarization. For example, summaries occasionally remove important nuance or other details in ways that are undesirable. However, we found that the summarization adapter did not amplify sensitive content in over 99% of targeted adversarial examples. We continue to adversarially probe to identify unknown harms and expand our evaluations to help guide further improvements.
In addition to evaluating the feature-specific performance powered by foundation models and adapters, we evaluate both the on-device and server-based models’ general capabilities. We use a comprehensive evaluation set of real-world prompts to test general model capabilities. These prompts span a range of difficulty levels and cover major categories such as brainstorming, classification, closed question answering, coding, extraction, mathematical reasoning, open question answering, rewriting, safety, summarization, and writing.
We compare our models with both open-source models (Phi-3, Gemma, Mistral, DBRX) and commercial models of comparable size (GPT-3.5-Turbo, GPT-4-Turbo) [1]. We find that our models are preferred by human graders over most comparable competitor models. On this benchmark, our on-device model, with ~3B parameters, outperforms larger models including Phi-3-mini, Mistral-7B, and Gemma-7B. Our server model compares favorably to DBRX-Instruct, Mixtral-8x22B, and GPT-3.5-Turbo while being highly efficient.
We use a set of diverse adversarial prompts to test the model performance on harmful content, sensitive topics, and factuality. We measure the violation rates of each model as evaluated by human graders on this evaluation set, with a lower number being desirable. Both the on-device and server models are robust when faced with adversarial prompts, achieving violation rates lower than open-source and commercial models.
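The violation-rate metric described above is straightforward to compute from grader judgments. The pair-per-judgment interface below is an illustrative sketch, not the internal grading tooling:

```python
from collections import defaultdict

def violation_rates(graded):
    """graded: iterable of (model, is_violation) pairs from human graders.

    Returns each model's violation rate on the adversarial prompt set;
    lower is better.
    """
    counts = defaultdict(lambda: [0, 0])   # model -> [violations, total]
    for model, bad in graded:
        counts[model][0] += bad            # True counts as 1
        counts[model][1] += 1
    return {m: v / n for m, (v, n) in counts.items()}

rates = violation_rates([("ours", False), ("ours", False),
                         ("baseline", True), ("baseline", False)])
assert rates["ours"] == 0.0 and rates["baseline"] == 0.5
```

Because every model is graded on the same fixed adversarial set, the rates are directly comparable across models.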
Our models are preferred by human graders as safe and helpful over competitor models for these prompts. However, considering the broad capabilities of large language models, we understand the limitation of our safety benchmark. We are actively conducting both manual and automatic red-teaming with internal and external teams to continue evaluating our models' safety.
To further evaluate our models, we use the Instruction-Following Eval (IFEval) benchmark to compare their instruction-following capabilities with models of comparable size. The results suggest that both our on-device and server model follow detailed instructions better than the open-source and commercial models of comparable size.
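IFEval works by attaching programmatically verifiable constraints to each prompt and checking the response against all of them. A minimal sketch in that style — the constraint set and strict all-must-pass scoring are illustrative of the benchmark's approach, not its actual code:

```python
import re

# A few IFEval-style verifiable constraints (illustrative, not the benchmark's own).
CHECKS = {
    "max_words": lambda resp, n: len(resp.split()) <= n,
    "include":   lambda resp, kw: kw.lower() in resp.lower(),
    "no_commas": lambda resp, _: "," not in resp,
    "n_bullets": lambda resp, n: len(re.findall(r"^[-*] ", resp, re.M)) == n,
}

def follows_instructions(response, constraints):
    """Strict scoring: every constraint attached to the prompt must hold."""
    return all(CHECKS[name](response, arg) for name, arg in constraints)

resp = "- use Swift\n- ship adapters"
assert follows_instructions(resp, [("n_bullets", 2), ("include", "swift"),
                                   ("max_words", 10), ("no_commas", None)])
```

Because each check is deterministic, instruction-following accuracy can be computed without human graders, which is what makes this benchmark complementary to the human-preference evaluations above.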
We evaluate our models’ writing ability on our internal summarization and composition benchmarks, consisting of a variety of writing instructions. These results do not refer to our feature-specific adapter for summarization (seen in Figure 3), nor do we have an adapter focused on composition.
The Apple foundation models and adapters introduced at WWDC24 underlie Apple Intelligence, the new personal intelligence system that is integrated deeply into iPhone, iPad, and Mac, and enables powerful capabilities across language, images, actions, and personal context. Our models have been created with the purpose of helping users do everyday activities across their Apple products, and developed responsibly at every stage and guided by Apple’s core values. We look forward to sharing more information soon on our broader family of generative models, including language, diffusion, and coding models.
[1] We compared against the following model versions: gpt-3.5-turbo-0125, gpt-4-0125-preview, Phi-3-mini-4k-instruct, Mistral-7B-Instruct-v0.2, Mixtral-8x22B-Instruct-v0.1, Gemma-1.1-2B, and Gemma-1.1-7B. The open-source and Apple models are evaluated in bfloat16 precision.
AI for the rest of us.
Coming in beta this fall *
Built into your iPhone, iPad, and Mac to help you write, express yourself, and get things done effortlessly.
Draws on your personal context while setting a brand-new standard for privacy in AI.
Apple Intelligence powers new Writing Tools, which help you find just the right words virtually everywhere you write. With enhanced language capabilities, you can summarize an entire lecture in seconds, get the short version of a long group thread, and minimize unnecessary distractions with prioritized notifications.
Transform how you communicate using intelligent Writing Tools that can proofread your text, rewrite different versions until the tone and wording are just right, and summarize selected text with a tap. Writing Tools are available nearly everywhere you write, including third-party apps.
Priority notifications appear at the top of the stack, letting you know what to pay attention to at a glance. And notifications are summarized, so you can scan them faster.
Priority messages in Mail elevate time-sensitive messages to the top of your inbox — like an invitation that has a deadline today or a check-in reminder for your flight this afternoon.
Tap to reveal a summary of a long email in the Mail app and cut to the chase. You can also view summaries of email right from your inbox.
Just hit record in the Notes or Phone apps to capture audio recordings and transcripts. Apple Intelligence generates summaries of your transcripts, so you can get to the most important information at a glance.
Reduce Interruptions is an all-new Focus that understands the content of your notifications and shows you the ones that might need immediate attention, like a text about picking up your child from daycare later today.
Use a Smart Reply in Mail to quickly draft an email response with all the right details. Apple Intelligence can identify questions you were asked in an email and offer relevant selections to include in your response. With a few taps you’re ready to send a reply with key questions answered.
Apple Intelligence enables delightful new ways to express yourself visually. Create fun, original images and brand-new Genmoji that are truly personal to you. Turn a rough sketch into a related image that complements your notes with Image Wand. And make a custom memory movie based on the description you provide.
Produce fun, original images in seconds with the Image Playground experience right in your apps. Create an entirely new image based on a description, suggested concepts, and even a person from your Photos library. You can easily adjust the style and make changes to match a Messages thread, your Freeform board, or a slide in Keynote.
Experiment with different concepts and try out image styles like animation, illustration, and sketch in the dedicated Image Playground app. Create custom images to share with friends in other apps or on social media.
Make a brand-new Genmoji right in the keyboard to match any conversation. Provide a description to see a preview, and adjust your description until it’s perfect. You can even pick someone from your Photos library and create a Genmoji that looks like them.
Image Wand can transform your rough sketch into a related image in the Notes app. Use your finger or Apple Pencil to draw a circle around your sketch, and Image Wand will analyze the content around it to produce a complementary visual. You can even circle an empty space, and Image Wand will use the surrounding context to create a picture.
Create a custom memory movie of the story you want to see, right in Photos. Enter a description, and Apple Intelligence finds the best photos and videos that match. It then crafts a storyline with unique chapters based on themes it identifies and arranges your photos into a movie with its own narrative arc.
Search for photos and videos in the Photos app simply by describing what you’re looking for. Apple Intelligence can even find a particular moment in a video clip that fits your search description and take you right to it.
Remove distractions in your photos with the Clean Up tool in the Photos app. Apple Intelligence identifies background objects so you can remove them with a tap and perfect your shot — while staying true to the original image.
Siri draws on Apple Intelligence for all-new superpowers. With an all-new design, richer language understanding, and the ability to type to Siri whenever it’s convenient for you, communicating with Siri is more natural than ever. Equipped with awareness of your personal context, the ability to take action in and across apps, and product knowledge about your devices’ features and settings, Siri will be able to assist you like never before.
Siri has an all-new design that’s even more deeply integrated into the system experience, with an elegant, glowing light that wraps around the edge of your screen.
With a double tap on the bottom of your iPhone or iPad screen, you can type to Siri from anywhere in the system when you don’t want to speak out loud.
Tap into the expansive product knowledge Siri has about your devices’ features and settings. You can ask questions when you’re learning how to do something new on your iPhone, iPad, and Mac, and Siri can give you step-by-step directions in a flash.
Siri, set an alarm for — oh wait no, set a timer for 10 minutes. Actually, make that 5.
Richer language understanding and an enhanced voice make communicating with Siri even more natural. And when you refer to something you mentioned in a previous request, like the location of a calendar event you just created, and ask “What will the weather be like there?” Siri knows what you’re talking about.
Apple Intelligence empowers Siri with onscreen awareness, so it can understand and take action with things on your screen. If a friend texts you their new address, you can say “Add this address to their contact card,” and Siri will take care of it.
Awareness of your personal context enables Siri to help you in ways that are unique to you. Can’t remember if a friend shared that recipe with you in a note, a text, or an email? Need your passport number while booking a flight? Siri can use its knowledge of the information on your device to help find what you’re looking for, without compromising your privacy.
Seamlessly take action in and across apps with Siri. You can make a request like “Send the email I drafted to April and Lilly” and Siri knows which email you’re referencing and which app it’s in. And Siri can take actions across apps, so after you ask Siri to enhance a photo for you by saying “Make this photo pop,” you can ask Siri to drop it in a specific note in the Notes app — without lifting a finger.
Apple Intelligence is designed to protect your privacy at every step. It’s integrated into the core of your iPhone, iPad, and Mac through on-device processing. So it’s aware of your personal information without collecting your personal information. And with groundbreaking Private Cloud Compute, Apple Intelligence can draw on larger server-based models, running on Apple silicon, to handle more complex requests for you while protecting your privacy.
With ChatGPT from OpenAI integrated into Siri and Writing Tools, you get even more expertise when it might be helpful for you — no need to jump between tools. Siri can tap into ChatGPT for certain requests, including questions about photos or documents. And with Compose in Writing Tools, you can create and illustrate original content from scratch.
You control when ChatGPT is used and will be asked before any of your information is shared. Anyone can access ChatGPT for free, without creating an account. ChatGPT subscribers can connect accounts to access paid features within these experiences.
New App Intents, APIs, and frameworks make it incredibly easy for developers to integrate system-level features like Siri, Writing Tools, and Image Playground into your favorite apps.
Learn more about developing for Apple Intelligence
Apple Intelligence is free to use and will initially be available in U.S. English. Coming in beta this fall. *
Figure 1: Modeling overview for the Apple foundation models.

Pre-Training

Our foundation models are trained on Apple's AXLearn framework, an open-source project we released in 2023. It builds on top of JAX and XLA, and allows us to train the models with high efficiency and scalability on various training hardware and cloud platforms, including TPUs and both cloud and on-premise GPUs.