When it comes to their own devices, many consumers take the security of Internet of Things (IoT) products for granted. This can be problematic when products such as digital assistants or video cameras have access to highly privacy-sensitive areas of the consumer’s life. These devices can remotely record audio and video streams and therefore, if compromised by an attacker, could be used to look or listen in on the user’s home.
Unfortunately, IoT is a field in which customer expectations and real-life implementation are often significantly disconnected. Many devices are marketed as secure but do not withstand any scrutiny when examined in detail. At UL, we occasionally sample devices on the market, reviewing their security in a completely black-box manner, without any information provided by the product vendor, i.e., with the same knowledge and tools as a consumer. This technical report describes some of our findings when testing a consumer CCTV camera, namely the VTech Telecommunications Ltd. Baby Monitor RM5752.
When engineering a CCTV camera device, the design brief should require that the camera is able to send the video stream to an authorized smartphone that then displays the video in real time. The next step in the design process would be to look at the communication that must occur between the camera and the smartphone, i.e., in the simplest terms, the following communication path:
Now, consider that a consumer scenario would typically involve Wi-Fi cameras. That is, the consumer integrates the camera into their home Wi-Fi network, typically via their internet router. If the home network is securely configured, this should not be problematic.
However, in the illustration above, note that the router for the Wi-Fi routes between two distinct networks — the local area network (LAN) and the wide area network (WAN), which is typically the internet. Any traffic that occurs between devices in the same LAN obviously does not need to be routed to the internet at all. Ideally, and sensibly, this data is only routed locally.
This type of design is common, but it does have one major drawback: as soon as the smartphone leaves the residential Wi-Fi, it can no longer reach the camera, and the consumer is unable to view the video stream. Therefore, although the local-only design is compellingly simple, many vendors opt to offer remote viewing as an added feature by employing a type of rendezvous server that sits on the internet and allows the smartphone to find its paired camera, regardless of which networks the two are in. Video data is simply relayed through the back-end system, much in the same way physical mail is routed through a central mail-processing service before arriving at its destination.
While such a system has the benefit of allowing remote viewing of the camera, it also introduces many new issues. First, traffic that travels through the internet must be well protected against eavesdropping and manipulation. Second, a strong authentication mechanism — by which the smartphone proves to the rendezvous server that it actually is authorized to view the video stream — must be in place. Third, depending on the actual key management solution employed, the back-end system may be able to access the clear video stream.
Imagine, for example, that traffic between the camera and the rendezvous server is well protected, and the traffic between the rendezvous server and smartphone is equally well protected. Such protection is typically realized using the Transport Layer Security (TLS) protocol. This would mean that anyone listening in on the internet traffic, regardless of the communication path, would be unable to peek into the video data. However, it would also imply that, if the rendezvous server itself were to be compromised — e.g., by an external hacker, law enforcement or a malicious employee operating the server — all of the video streams could be exposed.
So, from a security perspective, having true end-to-end encryption is desirable, in which the keys to encrypt video data are only stored locally (both on the phone and the camera) and never become known to the vendor. In such a scenario, even if the rendezvous server were to get hacked, all video data would remain secure. Note that, confusingly, the former variant — in which TLS is used for both connections — is often also referred to as end-to-end encryption because the rendezvous server is viewed as an ultimately trusted communication end.
Also note that, in practice, a more sophisticated way of communicating data is often used: the rendezvous server serves only to establish the connection, while the bulk of the data is communicated directly between the two devices. One way to achieve this is firewall hole punching, which is the routing variant used in the product we will look at next. For the purposes of this discussion, however, this detail does not matter and has been omitted here.
When testing an IoT device, you should look at the product itself and review any documentation and claims made therein. Then, operate the product as intended while passively monitoring the network traffic. The UL Cyber Assurance Team bought the VTech RM5752.
The device includes a camera, a monitor unit and a smartphone app that can be used to view the camera video stream. We also looked at the website that accompanies the product, focusing on technical information that could be helpful during our investigation. We found numerous technical statements in the marketing materials stating that the product employs strong encryption using the AES-128 algorithm and protects the video stream data in an end-to-end fashion.
The statement that the product uses AES-128 encryption is technical in nature and unusually specific in its level of detail. Our initial assumption was that this algorithm is indeed what the product is using. After all, the statement could be easily disproven if untrue. In fact, we show in the following that the product we tested does not use AES-128 to protect customer data. Strong AES-128 encryption, if applied correctly, could be the basis of a state-of-the-art camera security system. However, as soon as we operated the product and looked at the passively captured data transmitted over Wi-Fi, we immediately noticed data patterns that appeared to contradict the claim that AES-128 encryption was applied. For example, the capture below lists network communications during an ongoing viewing session that likely contain video data:
We found this unusual, as the supposedly encrypted packet seemed to end in cleartext, one time ending in “Charlie is t” and another time “Charlie.” The reason this is so obviously incorrect is that AES-128 is a block cipher that has near-ideal operating characteristics. This means that anything that is, in fact, AES-128 encrypted looks like random bytes with a uniform distribution; i.e., every ciphertext byte value appears with approximately the same probability. It is highly unlikely to find 12 consecutive characters in that stream that are text-only. Let’s say, for example, we want to determine the probability that only lowercase letters, uppercase letters and the space character are present. That makes 26 + 26 + 1 = 53 out of 256 possibilities, or an approximately 21% chance for any single byte. The probability of this occurring 12 times in a row is about one in 161 million. In comparison, the odds of being struck by lightning are about one in a million per year, making the described event roughly 161 times less likely than getting struck by lightning in a given year. In other words, it is exceptionally unlikely that this is a coincidence.
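The arithmetic behind this estimate can be checked in a few lines. The following is a minimal sketch; the function name is ours, and it assumes the idealized model used above, in which ciphertext bytes are independent and uniformly distributed. Evaluating (53/256)^12 this way gives roughly one in 161 million.

```python
from fractions import Fraction


def text_run_probability(allowed: int = 53, alphabet: int = 256, run: int = 12) -> Fraction:
    """Probability that `run` consecutive uniform random bytes all fall
    into a set of `allowed` values out of `alphabet` possible values."""
    return Fraction(allowed, alphabet) ** run


p = text_run_probability()
one_in = float(1 / p)

print(f"single byte is a letter or space: {53 / 256:.1%}")
print(f"12 such bytes in a row: about one in {one_in:,.0f}")
```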
It should also be noted that, in the second packet capture, we observed many consecutive 16-byte blocks that were nearly identical. This is atypical for AES-128 encryption. It is conceivable that it could happen in some stream modes, but only if those modes are implemented incorrectly. What we would expect for properly encrypted AES-128 traffic, even when the weakest mode of operation, ECB, is used, is to see either entirely identical ciphertext blocks (which occurs when identical plaintext blocks are repeatedly encrypted to the same ciphertext blocks) or entirely different ciphertext blocks. A single flipped bit in the plaintext would cause, on average, 50% of the ciphertext bits to flip. We should never see two blocks that are almost identical, as was the case in the trace we observed. Therefore, the data patterns we observed were a fundamental giveaway that AES-128 encryption is not used in this scenario and that something else must be going on. This is clear simply from looking at the transmitted data that is supposed to be encrypted, even to an inexperienced security tester.
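The block-level check described above is easy to automate. The following is a minimal sketch, not the tooling used for this report: it compares adjacent 16-byte blocks and reports pairs that differ in at least one bit but in far fewer bits than the roughly 64 (half of 128) expected for a block cipher. The function names and the threshold of 16 bits are our own, arbitrary choices.

```python
def hamming_bits(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equally long byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def near_identical_blocks(data: bytes, block: int = 16, max_bits: int = 16):
    """Return (offset, differing_bits) for each pair of adjacent blocks that
    is almost, but not exactly, identical. Exactly identical pairs are
    skipped because ECB can legitimately produce them."""
    hits = []
    for off in range(0, len(data) - 2 * block + 1, block):
        first = data[off:off + block]
        second = data[off + block:off + 2 * block]
        diff = hamming_bits(first, second)
        if 0 < diff <= max_bits:
            hits.append((off, diff))
    return hits


# Synthetic example: block 1 differs from block 0 in a single bit,
# block 2 is an exact copy of block 1.
block0 = bytes(range(16))
block1 = block0[:-1] + b"\x0e"
sample = block0 + block1 + block1
hits = near_identical_blocks(sample)
print(hits)
```

Any hit in supposedly AES-encrypted payload data would be a strong hint that something other than a sound block cipher is at work.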
A Closer Look
To understand what is happening here, we need to take a closer look. An excellent source of information can be obtained by physically disassembling the device and tapping into the board-level electronics directly.
After opening the camera, we immediately identified the internal memory storage device, a flash ROM IC, as the Winbond W25N01GV (a 1 Gibit NAND flash IC). One of the test points on the printed circuit board (PCB), a location used to validate the system’s correct operation during manufacturing, showed a logic high level (3.3 V) during normal operation, with intermittent edges at bootup. This is typical for a connected UART, a communications interface used in embedded systems such as this. We attempted to decode the signal as UART traffic and saw readable characters when configuring the decoder for 115200 baud, 8N1, which is also quite typical for serial consoles.
A bit more probing allowed us to identify the correct interface signals and gave us our first significant step in extracting data using a U-Boot serial console:
At this stage, we knew extraction of all data was already feasible over the UART line. However, the achievable data rate would be somewhere around 6 KiB/s (hex-encoded over a 115200 baud serial line), so extracting the 128 MiB image would take roughly six hours.
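These figures can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes 10 bits on the wire per character (8N1 framing: one start bit, eight data bits, one stop bit) and exactly two hex characters per payload byte. Real console dump output also contains addresses and whitespace, so the numbers are a best case; the function name is ours.

```python
def uart_dump_estimate(image_bytes: int, baud: int = 115200,
                       bits_per_char: int = 10, chars_per_byte: int = 2):
    """Return (payload bytes per second, hours) for a hex dump over UART."""
    chars_per_second = baud / bits_per_char          # 8N1: 10 bits per character
    payload_rate = chars_per_second / chars_per_byte  # two hex digits per byte
    hours = image_bytes / payload_rate / 3600
    return payload_rate, hours


rate, hours = uart_dump_estimate(128 * 1024 * 1024)
print(f"{rate / 1024:.1f} KiB/s, about {hours:.1f} hours")
```

Under these assumptions the result lands in the same range as the rough estimate given above: a little under 6 KiB/s and somewhat more than six hours.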
Having that console, we could dump the firmware easily using netcat and tar. Essentially, we used the already-established Wi-Fi connection and piped the output of “tar c /” into nc, listening on the peer for the root file system data to pour in.
Root File System Analysis
Once we had the firmware, it was time for a detailed analysis. First, we identified the main binary and associated libraries that provide the actual functionality. This was straightforward using “ps” in our root console and subsequently looking at the dependencies of the single monolithic binary. Of course, given the marketing claims, we attempted to find out if AES code was in fact present and, if so, where it would be found.
An AES implementation has numerous characteristics. For example, a specific data structure called an “S-Box” is typically found for both encryption and decryption, and routines are named accordingly to perform key scheduling, encrypting and decrypting operations. To our surprise, we found code that did in fact appear to do exactly that in the libIOTCAPIs.so library file. This library is a proprietary implementation by ThroughTek Co. Ltd., also known as TUTK. Our first step was to analyze the library in Ghidra and look at the AES functions:
It was clear that the function intended to handle an AES key schedule, named “AesGenKeySched,” was not doing so by any means. It was simply copying 16 bytes from source to destination. This is indicative of an implementation stub, since the actual AES key schedule is missing. Consequently, the AES implementation that follows would have to perform its own key schedule, because no round keys have been derived in the function intended to derive them. This is highly atypical.
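For comparison, the following sketch shows what a genuine AES-128 key schedule does according to FIPS 197: it expands the 16-byte key into 11 round keys (176 bytes) using the S-box and round constants. The function `stub_key_schedule` is only our model of the behavior described above (copying 16 bytes), not decompiled vendor code, and all names are ours.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def make_sbox() -> list:
    """Build the AES S-box: multiplicative inverse, then affine transform."""
    sbox = []
    for value in range(256):
        inv = next((c for c in range(1, 256) if gf_mul(value, c) == 1), 0)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = make_sbox()


def expand_key_128(key: bytes) -> bytes:
    """AES-128 key expansion: 16-byte key in, 176 bytes of round keys out."""
    if len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte key")
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]         # RotWord
            temp = [SBOX[b] for b in temp]     # SubWord
            temp[0] ^= rcon                    # round constant
            rcon = gf_mul(rcon, 2)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return bytes(b for word in words for b in word)


def stub_key_schedule(key: bytes) -> bytes:
    """Model of the observed stub: just copy 16 bytes, derive nothing."""
    return bytes(key[:16])


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS 197 example key
round_keys = expand_key_128(key)
print(len(round_keys), round_keys[16:32].hex())
print(len(stub_key_schedule(key)))
```

A real key schedule therefore leaves a clear footprint in the binary (S-box lookups, round constants, a 176-byte output buffer), none of which a plain 16-byte copy exhibits.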
To make some sense of our findings, we searched for public files related to libIOTCAPIs.so and found them in the ThroughTek GitHub repository itself. The header file IOTCAPIs.h gave us an interesting lead:
One of the authors of the library was named Charlie, exactly the string we found numerous times in the ciphertext data stream. Our final clue came in reviewing both the name and, again, the binary, which revealed a function called “ReverseTransCodePartial.” This gave us the answer to why we found the name in the ciphertext:
Concretely, what we were seeing was a makeshift crypto implementation of a cipher with exceptionally bad characteristics. Furthermore, it used a static key: the 32-byte-long ASCII string “Charlie is the designer of P2P!!”. Our suspicion, therefore, was that the dysfunctional AES implementation, although broken anyway, was simply never used and that only this “transcoding” was applied to transmitted data.
Reverse Transcoding the Ciphertext
To test our theory, the most straightforward way was to run some of the captured ciphertext data through the transcoding algorithm and see what came out the other end.
In our search for the TUTK SDK, we found a build of libIOTCAPIs.so that was compiled for x86_64. This simplified our work, as that library also contained the function in question. Instead of reimplementing the function from the Ghidra decompiler output, we simply compiled a small program and linked it against the official binary, making the library do nearly all the work for us.
The ReverseTransCodePartial function was marked as static, meaning that it was not exported by default. Simply declaring the (guessed) prototype would therefore be insufficient, because linking would fail for symbol visibility reasons. This is not a problem at all, however, since we can simply choose any exported symbol, compute the offset between the two functions in the binary, and then reference that in our C code. Obviously, absolute addresses cannot be used, because address space layout randomization (ASLR) means that the library is loaded at a different location in the virtual address space of our process on each run.
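The address arithmetic involved is simple and can be sketched as follows. This is an illustration only: the function name and all numeric values are hypothetical. In practice the two file offsets would be read from a disassembler or symbol table, and the runtime address of the exported symbol would be obtained from the dynamic loader.

```python
def hidden_function_address(exported_runtime_addr: int,
                            exported_offset: int,
                            hidden_offset: int) -> int:
    """Runtime address of a non-exported function, derived from an exported
    symbol in the same library. Only the difference between the two offsets
    matters, so the result is correct wherever ASLR places the library."""
    load_base = exported_runtime_addr - exported_offset
    return load_base + hidden_offset


# Hypothetical values: two runs with different load addresses.
run1 = hidden_function_address(0x7F1234567A10, 0x7A10, 0x5C40)
run2 = hidden_function_address(0x7FAB00007A10, 0x7A10, 0x5C40)
print(hex(run1), hex(run2))
```

The absolute addresses differ between runs, but the distance between the exported and the hidden function stays constant, which is what makes the approach robust against ASLR.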
We then ran this process on some of our captured data, and it was clearly successful, as the following decrypted packet shows:
This was one of the packets that appeared early in the stream, and it appears to contain a username/password authentication in plain text.
Extracting the Video Stream
To demonstrate that our reverse transcoding ability was sufficient to view the video stream, we applied it to all the data and looked at the plain text. Some of the frames clearly contained what appeared to be compressed data, such as the following:
The header in that file was reminiscent of an H.264 transport stream. To that end, we looked up the ITU-T specification (ITU-T H.264 Series H: Audiovisual and Multimedia Systems, Infrastructure of Audiovisual Services - Coding of Moving Video) and the contained Network Abstraction Layer Units (NALUs) to identify where we would need to start extracting data. We then pointed the CCTV camera toward a smartphone, which played back a YouTube video so that we would have lots of moving data. Finally, we wrote a script that would reverse transcode the stream data and concatenate it so we could successfully play it back using MPlayer:
This meant we could positively verify that the data transmitted over the internet in a non-local scenario was not encrypted, only obfuscated using the Transcode function. Anyone eavesdropping on the communication stream could not only extract the login information of the camera, but also view and modify the video stream itself. Contrary to the marketing claim made by VTech, at no point did we identify AES-128 as being used to encrypt data for the firmware version we have analyzed.
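Locating the NAL units in the recovered plaintext can be sketched as follows. This is not the script used for this report; it assumes the stream uses the H.264 Annex B byte stream format, in which each NAL unit is preceded by a 00 00 01 start code and the NAL unit type is carried in the low five bits of the first header byte. The function name and the sample bytes are ours.

```python
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "sequence parameter set",
    8: "picture parameter set",
    9: "access unit delimiter",
}


def find_nal_units(data: bytes):
    """Return (offset of the 00 00 01 start code, nal_unit_type) pairs."""
    units = []
    start_code = b"\x00\x00\x01"
    pos = data.find(start_code)
    while pos != -1 and pos + 3 < len(data):
        header = data[pos + 3]
        units.append((pos, header & 0x1F))  # low 5 bits: nal_unit_type
        pos = data.find(start_code, pos + 3)
    return units


# Synthetic sample: SPS and PPS with 4-byte start codes, IDR slice with a 3-byte one.
sample = (b"\x00\x00\x00\x01\x67\x42"
          b"\x00\x00\x00\x01\x68\xce"
          b"\x00\x00\x01\x65\x88")
units = find_nal_units(sample)
for offset, nal_type in units:
    print(offset, nal_type, NAL_TYPE_NAMES.get(nal_type, "other"))
```

A decoder typically needs the sequence and picture parameter sets before the first IDR slice, so these are natural points at which to start extracting data.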
We have disclosed the issue to VTech following UL’s internal responsible disclosure process.
Sensitive data must be protected using appropriate means if the data is to be kept secure. Custom cryptography, such as the obfuscation layer we saw in this product, might look secure, but it does not hold up to any level of scrutiny. As we have demonstrated, it is possible, with relatively low effort and in a completely black-box scenario, to identify what kind of custom obfuscation is used and how to reverse it to reveal the data. We strongly encourage our customers to use proven and tested means of encryption and authentication of data, such as TLS 1.3, and, ideally, to protect data in a true end-to-end fashion, so that even a compromised back end does not affect the confidentiality and integrity of customer data.