White Paper on Amazon Alexa Digital Forensics – Approaches
Amazon’s Echo product has been in the news for both useful and controversial reasons. Amazon Echo devices connect to the cloud-based voice service, the Alexa Voice Service (AVS), which is built on artificial intelligence. With Alexa as a voice-activated intelligent virtual assistant, the Echo is capable of doing various things, such as managing to-do lists, playing music, controlling other smart devices, etc.
This paper provides details on how Alexa works and its relevance to digital forensics. It will allow readers to understand the technology and the types of digital evidence that can be extracted from Alexa. As part of the IoT, Alexa is spreading into many areas of our lives, so it is important to focus on it. Amazon Alexa digital forensics supports the analysis of this technology when a crime has occurred and may help an investigator find potential evidence and missing links in a case. Here we discuss Alexa forensics in detail, along with its most demanded applications and various case studies.
Information in this paper is segmented into three sections: an overview of the Alexa device system and how it communicates with the Amazon cloud; a description of data retention, that is, what information is stored; and, finally, what evidence can be found by analyzing it. Overall, this is a study of the Amazon Alexa ecosystem: we explore digital forensics approaches to analyze and evaluate the Amazon Alexa service and to extract possible digital evidence.
Evaluate and Understand Digital Forensic Approaches for Amazon Alexa Ecosystem – The Study in Brief
What is the Alexa System?
Alexa is an IPA (intelligent personal assistant) developed by Amazon. It aids users in their day-to-day activities, as it is capable of voice interaction, playing audiobooks, making to-do lists, streaming podcasts, music playback, setting alarms, and providing weather, traffic, sports, and other real-time information such as news. It can also control several smart devices, acting as a home automation system. Amazon allows device manufacturers to integrate Alexa voice capabilities into their products by using the Alexa Voice Service (AVS). AVS is a cloud-based service that provides APIs (application programming interfaces) to interface with Alexa. AVS also provides cloud-based automatic speech recognition (ASR) and natural language understanding (NLU).
The Alexa system comprises the hardware and software with which users directly interact, such as the Echo devices (Echo Plus, Echo 2nd generation, Echo Dot 2nd generation, Echo Show, Echo Spot, etc.), and a cloud component. The cloud component is mainly responsible for the intelligence of the device, as it actively provides automatic speech recognition, natural language understanding, and responses. Some responses are provided by third-party services through “skills.” The third parties that write and publish those skills are responsible for their skill’s behavior.
A long short-term memory (LSTM) artificial neural network is responsible for generating Amazon Alexa’s voice. Amazon retains users’ past voice recordings in its cloud services in order to improve responses to future questions. Voice recordings associated with a user’s account can be deleted by the user. Digital recordings of users’ audio spoken after the “wake word” are retained by Amazon.
Figure: Overview of the Echo and the Alexa System
Amazon Alexa Ecosystem
The Amazon Echo provides an interface for communicating with the cloud-based service, Alexa. All of its operations follow a common pattern: they are inseparable from the cloud services, which provide interoperability with compatible devices and companion clients for the convenience of the user.
The Amazon Alexa ecosystem comprises various components:
- One or more Alexa-enabled devices are required for talking to the Alexa cloud service.
- Alexa represents all Amazon cloud platforms supporting the operation of the ecosystem. So, it includes various cloud services for authentication, logging, data management, and Alexa Voice Service.
- From a digital forensics point of view, the companion clients are essential, as they manage the overall operating environment through access to the cloud server. Any personal device used to run an Alexa companion application, such as the Amazon Alexa app for Fire OS, Android, and iOS, is referred to as a companion client.
Although there are no official applications for a PC, users can still utilize web browsers to access the cloud. Therefore, any kind of digital device with web browsing capabilities may contain potential digital evidence relating to the use of the Alexa ecosystem. Alexa can also be extended by adding skills (third-party applications) and by connecting compatible IoT devices for various services.
Figure: Amazon Alexa Ecosystem
Automatic Speech Recognition (ASR)
The Alexa system is constructed in such a way that ASR is the first to receive the data. Its basic mechanism is that it takes the audio stream and turns it into a set of possible text strings that are then forwarded to the Natural Language Understanding (NLU) system. These strings and their corresponding confidence scores are used to improve speech recognition. The string with the highest score (i.e., the one that Alexa acts on) is also stored and is displayed to the user in the Alexa application.
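This mechanism can be pictured with a small, hedged sketch in Python: an n-best list of candidate transcripts with confidence scores, from which the highest-scoring string (the one Alexa acts on and displays in the app) is selected. The class and field names here are illustrative assumptions, not Amazon's internal representation.

```python
# Illustrative sketch only: a simplified n-best list of ASR hypotheses with
# confidence scores, and selection of the top-scoring transcript.
# Structure and field names are assumptions, not Amazon's internal format.
from dataclasses import dataclass


@dataclass
class AsrHypothesis:
    text: str          # candidate transcript
    confidence: float  # score in [0.0, 1.0]


def pick_top_hypothesis(hypotheses: list[AsrHypothesis]) -> AsrHypothesis:
    """Return the transcript Alexa would act on: the highest-confidence string."""
    return max(hypotheses, key=lambda h: h.confidence)


if __name__ == "__main__":
    n_best = [
        AsrHypothesis("play some jazz", 0.93),
        AsrHypothesis("play sam's jazz", 0.41),
    ]
    print(pick_top_hypothesis(n_best).text)  # -> "play some jazz"
```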
Natural Language Understanding (NLU)
The Natural Language Understanding (NLU) component interprets the recognition result and produces an intent. The NLU performs:
- Intent classification
- Entity recognition
- Slot resolution
The service looks at the intent and routes the request to the proper application (Skill) with the slots filled with the provided information. The data about the chosen intent and information related to entity recognition and slot resolution is stored for machine learning purposes.
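To make the intent/slot flow concrete, here is a minimal, hedged sketch of how a classified intent with resolved slots might be routed to a handler. The intent names, slot names, and dictionary layout are hypothetical examples, not the Alexa service's internal format.

```python
# Minimal routing sketch under an assumed, simplified intent/slot structure.
# Intent and slot names are hypothetical examples for illustration only.
def route_request(nlu_result: dict) -> str:
    intent = nlu_result["intent"]   # e.g. "GetWeatherIntent"
    slots = nlu_result["slots"]     # resolved slot values
    handlers = {
        "GetWeatherIntent": lambda s: f"Weather lookup for {s.get('city', 'current location')}",
        "AddToListIntent": lambda s: f"Add '{s.get('item')}' to the shopping list",
    }
    handler = handlers.get(intent)
    if handler is None:
        return "No matching skill for this intent"
    return handler(slots)


if __name__ == "__main__":
    result = {"intent": "GetWeatherIntent", "slots": {"city": "Seattle"}}
    print(route_request(result))
```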
Skills
Skills allow the Alexa device to extend its capability to process requests. The device does not do this on its own; the task is delegated to third-party developers. The programs are designed so that only a very limited amount of information is shared with the third-party developers of those skills. For example, voice recordings are not shared with skills. The skill takes the input and retrieves the appropriate information from the designated data source, which returns the needed data. The skill then formulates its response: it takes the raw data and constructs a textual response formatted in Speech Synthesis Markup Language (SSML), which tells the TTS system what to say next. Once the response is generated, it is sent to the response system. A customer’s personal information (e.g. name, address) is not released to the third party unless the customer specifically requests that it be shared.
A permission framework similar to that of mobile phones is used: only if customers grant permission to share certain data with the skill developers is that information shared. Each time a customer talks to a skill, the skill receives the same token for that user. Each third-party skill developer has its own policies concerning storage and retention of skill-specific data. Customers can read the third-party skill developers’ privacy policies, which are provided on the skill detail page of the Amazon website.
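The skill flow above can be illustrated with a short, hedged sketch: a hypothetical weather handler that takes resolved slots, stands in for a call to its designated data source, and wraps the textual answer in SSML for the TTS stage. The response layout is a simplification rather than the full Alexa Skills Kit response schema.

```python
# A minimal sketch of how a skill might turn raw data into an SSML response.
# The handler and data source are hypothetical; the response layout is a
# simplification, not the full Alexa Skills Kit response schema.
def build_ssml(city: str, temperature_c: int) -> str:
    """Wrap a textual answer in Speech Synthesis Markup Language (SSML)."""
    return (
        "<speak>"
        f"The current temperature in {city} is {temperature_c} degrees Celsius."
        '<break time="300ms"/>'
        "Would you like the forecast for tomorrow?"
        "</speak>"
    )


def handle_weather_request(slots: dict) -> dict:
    # In a real skill the data would come from the skill's designated data source.
    city = slots.get("city", "your location")
    temperature_c = 21  # placeholder standing in for fetched data
    return {"outputSpeech": {"type": "SSML", "ssml": build_ssml(city, temperature_c)}}


if __name__ == "__main__":
    print(handle_weather_request({"city": "Seattle"}))
```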
Responses (TTS)
The SSML produced by the skill is taken by the response system. TTS (text-to-speech) is used to generate the audio speech file, and this audio is streamed to the appropriate device. For many skills, this ends the interaction. Some skills are interactive and ask follow-up questions. Echo devices are designed so that the blue ring on the device is lit while the device is waiting for a response to a question that Alexa has asked. The text of the response is stored by the Alexa system so that users of personal devices can review past answers using the Alexa app. The response can be used by the Amazon team that built the specific skill to ensure that Alexa is providing relevant answers to queries and that the TTS system is properly translating the text to speech.
How Does an Echo Detect its Wake Word and Send Audio to the Alexa Cloud?
Amazon Echo devices are designed to use on-device keyword spotting to detect the wake word and only the wake word. Unless the microphone is turned off, this technology inspects acoustic patterns in the room to detect when the wake word has been spoken using a short, on-device buffer that is continuously overwritten. Multiple algorithms are running on the Echo device looking for the specified wake word. At this point, no audio is sent to the Alexa cloud. If the algorithms do not detect the wake word, then the Echo device continues to wait for the wake word, continuously overwriting the contents of the small internal audio buffer.
Importantly, Echo devices do not keep local records of audio; they keep only a small amount of audio to detect the wake word. This on-device buffer exists in temporary memory (RAM); audio is not recorded to any on-device storage. When the wake word is detected or the action button is pressed, a connection to the cloud is opened up. The Echo device turns on the blue ring and starts streaming the audio, starting with a fraction of a second of audio before the wake word and continuing until the Alexa system in the cloud turns off the audio stream. Echo devices use a signal processing technique called beamforming to emphasize the user’s speech from the desired direction while suppressing audio interference (like conversations outside the room) from other directions. Customers see beamforming take place on an Echo device with a visual cue – the lightest blue color on the light indicator points towards the source of the audio that is being recorded.
If Alexa is activated using the wake word, the first step that occurs when the stream reaches the cloud is that the audio is re-analyzed using the more powerful processing capabilities of the cloud to verify the wake word was spoken. These additional algorithms are in the cloud, and not on the device, for reasons including requiring more processing power than the Echo device has available or using machine-learning derived models based on recent learnings. The on-device algorithms are automatically updated regularly. If this cloud software verification is unable to confirm the wake word was spoken, the Alexa system stops processing the audio. If the wake word is verified (or if Alexa was activated using the action button), our ASR and NLU systems process the customer’s request so Alexa can respond appropriately. As our speech recognition system analyzes the audio stream, the system continually attempts to determine when the customer’s request to Alexa has ended and then immediately ends the audio stream. The light ring then typically flashes blue/light blue until the response is ready for playback. It then sends the response (the blue ring pulses while Alexa is speaking), and the Echo device returns to monitoring for its wake word.
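The on-device behaviour described here, a small, continuously overwritten buffer and streaming that begins only once the wake word is detected (including a short pre-roll), can be sketched conceptually as follows. Frame size, buffer length, and the keyword check are placeholders; this is an illustration, not Amazon's firmware or algorithms.

```python
# Conceptual sketch: audio frames are written into a small ring buffer that is
# continuously overwritten, and nothing leaves the device until the wake word
# is detected. Illustration only; not Amazon's actual implementation.
from collections import deque

FRAMES_PER_SECOND = 50                       # assumed frame rate for this sketch
PRE_ROLL_FRAMES = FRAMES_PER_SECOND // 2     # ~0.5 s of audio kept before the wake word

ring_buffer: deque = deque(maxlen=PRE_ROLL_FRAMES)  # old frames fall off automatically


def wake_word_detected(frame: bytes) -> bool:
    """Placeholder for the on-device keyword-spotting algorithms."""
    return b"alexa" in frame  # purely illustrative check


def process_frame(frame: bytes, stream_to_cloud) -> None:
    ring_buffer.append(frame)
    if wake_word_detected(frame):
        # Stream the buffered pre-roll first; subsequent frames would follow
        # until the cloud-side system closes the stream.
        for buffered in list(ring_buffer):
            stream_to_cloud(buffered)
        ring_buffer.clear()
```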
What Information Does Alexa Store (Data Retention)?
Amazon Alexa uses the power of machine learning and the cloud to retain the data and use it. It is designed in such a way that Alexa gets smarter every day with every input of data or information. That is, the more a user utilizes Alexa, the more it adapts to their patterns of speech, vocabulary, and personal preferences.
Different types of data are used and stored by the Alexa system to provide the Alexa service. Configuration parameters are set by the user either on the device or using the Alexa app. These parameters include such things as the device location (set by the administrator or user), preferred time zone and unit measures, volume level, and other preferences.
Audio and text inputs are the core piece of Alexa data. Voice recordings are processed through speech-to-text algorithms and then through natural language processing algorithms to extract the user’s intent and the parameters of the Alexa query. These systems use machine learning techniques to continuously improve themselves with each input.
Data Retention
Data is stored in multiple forms and for multiple purposes in various Amazon services, such as S3 and DynamoDB (under the control of the Alexa service). Each data type has an associated retention policy and access policy. Sensitive customer data in the Alexa system (such as voice recordings) is stored in databases and encrypted at rest and in transit, using Amazon’s internal key management systems.
Some system-level data is also stored in log files, for either service troubleshooting purposes, or security incident resolution. Troubleshooting logs contain information necessary for developers to troubleshoot the Alexa system, but do not contain customer voice recordings or data derived from customer voice recordings, such as slot values or the TTS response. Access to these logs is restricted to teams needing access to this data to perform their business functions. Troubleshooting logs are encrypted and their access is audited. Security logs are retained for purposes of audits and are restricted to those operating in security incident roles. They contain data that describes:
- When systems or users authenticated themselves to the system
- Which systems and users accessed which data, and when.
These logs are encrypted and the data in them is used to ensure that system use complies with applicable policies. Metrics are stored in databases. Metrics are used for internal business processes, to direct system improvements, for systems performance analysis and reporting, and customer reports. Access to metrics is restricted to the teams and individuals that need this data to perform their work.
The speech recognition and natural language understanding in the Alexa system are based on machine learning (ML) algorithms. Data sets from real use cases are fed into the various ML systems to build new algorithms and improve existing algorithms. Again, access to speech and derived data in Amazon’s ML systems is strictly controlled and audited. Third-party skill developers store data they receive through customers’ use of their skills according to their privacy policies.
What Evidence can be Found by Analyzing Amazon Alexa?
Devices like the Amazon Echo are undoubtedly a great source of potential digital evidence due to their always-on mode of operation and their increasingly widespread use. Under such circumstances, a lot of data is produced in real time in response to user behaviour. Such products are closely connected with both local and cloud systems, so it is possible to conduct an integrated analysis of forensically meaningful data from both, provided the target device’s ecosystem is taken into account. This is where understanding Amazon Alexa digital forensics approaches and their real-time applications becomes important.
There are many IoT, wearable and personal assistant devices that are constantly recording an abundance of personal data from their users, which can also be used by law enforcement to investigate and prosecute crimes.
Android WebView Cache: Amazon Alexa is a web-based application; it uses the WebView class to display online content on Android. Thus, there is a chance that cloud-native artifacts are cached by the WebView. The cache directory may contain multiple cache files. Each WebView cache file (Android 4.4.2) simply consists of a string with the original URL and a data stream. The internal format of the WebView cache consists of:
- An 8-byte fixed header and footer, plus a 4-byte field storing the length of the string with the original URL.
- A data stream; for example, a file cached after calling the phoenix API holds gzip-compressed data, so it is necessary to decompress it to obtain the original JSON (see the parsing sketch after this list).
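As a rough aid, the following sketch parses a cache file under the simplified layout just described: an 8-byte header, a 4-byte field with the URL length, the URL string, a gzip-compressed data stream, and an 8-byte footer. The field order, offsets, and little-endian encoding are assumptions that may differ across Android and WebView versions, so they should be verified against a real sample first.

```python
# Parsing sketch for the simplified WebView cache layout described above.
# Offsets, field order, and endianness are assumptions; verify on real samples.
import gzip
import json
import struct


def parse_webview_cache(path: str) -> tuple[str, dict]:
    with open(path, "rb") as f:
        raw = f.read()
    url_len = struct.unpack_from("<I", raw, 8)[0]        # 4-byte URL length after the header
    url = raw[12:12 + url_len].decode("utf-8", errors="replace")
    body = raw[12 + url_len:len(raw) - 8]                # data stream, excluding the footer
    decompressed = gzip.decompress(body)                 # cached stream is gzip-compressed
    return url, json.loads(decompressed)


if __name__ == "__main__":
    url, artifact = parse_webview_cache("webview_cache_entry")  # hypothetical file name
    print(url)
    print(json.dumps(artifact, indent=2)[:500])
```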
Chrome Web Cache: In the Chrome cache system, the data stream with Alexa native artifacts is usually stored inside the data block files (data_#), as the amounts of data are small. Thus, it is necessary to parse all cache entries and verify their data streams inside the data block files. A data stream may be stored as a separate file, but this is unlikely because it is a JSON string compressed with gzip. Alexa-related cache entries, like any other cache entries, have two data streams: one for the HTTP headers and one for the actual cached data (a simple carving sketch follows). Both the Android WebView and Chrome web caches are potential sources of digital evidence. Although these caches are very helpful for understanding user behaviors, especially when valid user credentials are not available or some native artifacts have been deleted from the cloud, they also have inevitable limitations. For example, caches are only created when users click menus that trigger Alexa APIs, and they can also be deleted or overwritten at any time.
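Rather than implementing a full parser for Chrome's block-file format, a quick triage sketch can scan the data_# files for gzip magic bytes and attempt decompression at each candidate offset to recover compressed JSON strings. This is a heuristic carving approach, not a complete Chrome cache parser, which would instead walk the index file and cache entry records.

```python
# Heuristic carving of gzip-compressed streams from Chrome cache block files.
# Triage sketch only; a complete parser would follow index and entry records.
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def carve_gzip_streams(path: str) -> list[bytes]:
    with open(path, "rb") as f:
        raw = f.read()
    results = []
    offset = raw.find(GZIP_MAGIC)
    while offset != -1:
        try:
            # wbits=47 accepts gzip streams and tolerates trailing data.
            results.append(zlib.decompressobj(wbits=47).decompress(raw[offset:]))
        except zlib.error:
            pass  # false positive; not a valid gzip stream
        offset = raw.find(GZIP_MAGIC, offset + 1)
    return results


if __name__ == "__main__":
    for blob in carve_gzip_streams("data_1"):  # Chrome cache block file
        print(blob[:200])
```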
A lot of effort has been put into identifying forensic artifacts from IoT products. For example, a research team presented their findings on artifacts saved while users used Android mobile applications related to IoT products. They considered various applications and, specifically regarding the Amazon Echo, identified SQLite databases and web cache files which include forensically significant information, such as accounts and interactions associated with Alexa.
Based on the data analysis results, each API can be placed in one of seven categories: account, customer setting, Alexa-enabled device, compatible device, skill, user activity, and others. It is possible to acquire forensically significant artifacts from Alexa, such as registered user accounts, linked Google calendars, Alexa-enabled devices, saved Wi-Fi settings (including unencrypted passwords), and lists of installed skills that may be used to interact with other cloud services. A large amount of data with timestamps can be found. More specifically, the JSON data returned by APIs such as cards, activities, media, notifications, phoenix, and to-dos contains values with UNIX timestamps. All of this may act as a potential source of evidence and may help the investigator reconstruct a user’s activities, with the time zone identified by the device-preference API. There are also interesting values returned by cards, activities, and to-dos that include the rear part of a URL pointing to a user’s voice file in the cloud. Thus, it is possible to download the voice file using the utterance API if necessary.
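A hedged sketch of turning such activity-style JSON artifacts into a simple timeline is shown below. The field names ("creationTimestamp", "description") and the assumption that timestamps are UNIX epoch values in milliseconds are illustrative choices, not a documented Amazon schema; they should be adjusted to the actual JSON returned by the cards, activities, or to-dos APIs.

```python
# Normalising activity-style JSON artifacts into a sorted timeline.
# Field names and the epoch-milliseconds assumption are illustrative only.
import json
from datetime import datetime, timezone


def to_timeline(activities_json: str) -> list[tuple[str, str]]:
    records = json.loads(activities_json)
    timeline = []
    for record in records:
        ts_ms = record["creationTimestamp"]  # assumed: epoch milliseconds
        when = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
        timeline.append((when.isoformat(), record.get("description", "")))
    return sorted(timeline)


if __name__ == "__main__":
    sample = json.dumps([
        {"creationTimestamp": 1514808000000, "description": "Alexa, what's the weather?"},
        {"creationTimestamp": 1514811600000, "description": "Alexa, add milk to my list"},
    ])
    for when, text in to_timeline(sample):
        print(when, text)
```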
On the Android system, the application uses two SQLite files: map_data_storage.db and DataStore.db. The first database contains token information about the user who is currently logged in; all data in this database is deleted when a user signs out. There is a chance that part of the deleted records can still be recovered from unused areas of the SQLite database and its journal file. In addition, the other file includes to-do and shopping lists. These lists can also be acquired from the cloud using the to-dos API.
On the iOS system, the application manages one SQLite file titled LocalData.sqlite. Like DataStore.db on Android, this file also includes to-do and shopping lists (Benson). In short, some information is stored locally on companion devices by the application.
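A minimal sketch of examining these companion-app databases with Python's standard sqlite3 module follows. The database file names are taken from the text above; rather than assuming table names, the sketch enumerates the schema first, since the actual layout should be confirmed on a real extraction.

```python
# Enumerate tables and row counts in the companion-app SQLite databases.
# File names are from the text; table layouts must be confirmed per extraction.
import sqlite3


def dump_tables(db_path: str) -> None:
    con = sqlite3.connect(db_path)
    try:
        # Enumerate the actual schema before assuming any table names.
        tables = [row[0] for row in
                  con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
        print(db_path, "tables:", tables)
        for table in tables:
            count = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
            print(f"  {table}: {count} rows")
    finally:
        con.close()


if __name__ == "__main__":
    for db in ("map_data_storage.db", "DataStore.db", "LocalData.sqlite"):
        dump_tables(db)
```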
Cloud forensics plays a key role in approaching the Amazon Alexa ecosystem as a source of digital evidence. According to current studies, there are two proposed perspectives: client-based cloud forensics and cloud-native forensics. During the initial stage of research on cloud forensics, several researchers performed client-based cloud forensics, acquiring and analyzing data that was locally saved by applications or web browsers related to the use of popular cloud services, including but not limited to Amazon S3, Google Docs, Dropbox, Evernote, and ownCloud. More recent research has examined cloud-native forensics to overcome a fundamental limitation of the client-based approach: a large amount of data is never stored on local storage devices, or is stored only in temporary caches. Cloud drive acquisition has been performed by applying forensic approaches to the native artifacts of Google Drive, Microsoft OneDrive, Dropbox, and Box using APIs supported by the cloud services.
The Cloud Extractor Tool Unlocks a World of Information that Includes the Following:
- Account details: Information about the account holder
- Information on all the connected devices
- Contact list
- User-created lists such as Shopping list, Travel Itinerary, etc.
- User Activity
- Inbound and Outbound calls and messages
- Calendar
- Notifications
- Preferences
- Stored voice commands
- WiFi Configurations
Therefore, the tool allows investigators to gain an insight into the user’s schedule, everyday activity, calls and messages and voice commands that could serve as evidence in trials.
Types of Analysis & Forensic Artifacts Obtained
Hardware Analysis: Each Alexa-enabled device needs to be disassembled for hardware-level analysis. Hardware analysis involves reverse engineering the device through approaches such as eMMC root, JTAG, and debug ports. There are further possible methods for gaining access to the internal data, including working directly with the soldered memory chips.
Network Analysis: Alexa-enabled devices and companion clients communicate with Alexa over the internet. Traffic analysis using the Charles web debugging proxy (XK72) confirmed that most of the traffic associated with forensically significant artifacts is transferred over an encrypted connection after a session has been created with a valid user ID and password. With this network analysis, cloud-native and client-centric artifacts can be identified efficiently.
Cloud Analysis: The Alexa cloud service is a core component of the target ecosystem. Like other cloud services, Alexa operates using pre-defined APIs to send and receive data, but the available API list is not officially open to the public. A few studies have been conducted to reveal unofficial APIs used by Alexa and to acquire cloud-native artifacts that support investigations.
Lastly, managing and setting up an Alexa-enabled device requires at least one companion client. For example, users can configure environment settings, enable/disable skills, and review previous conversations with Alexa using a mobile app or web browser. During this entire process, a large amount of Alexa-associated data is naturally stored on companion clients. This makes it necessary to acquire these client-centric artifacts and analyze them along with the cloud-native artifacts.