OIT Network Systems

Android 2.1 - 2.3, 3.0 - 3.1 Allows DHCP Lease to Expire, Keeps Using IP Address

This document provides details for a group of DHCP bugs exhibited by the Android operating system on a variety of devices. We provide this information for individuals who would like detailed technical information about the issues.

The bugs can cause the Android device to disrupt service to other devices on the network.

Princeton University has reported the bugs to Google, the vendor responsible for the Android operating system. As far as we know, no fix has been made available by Google.

Contents

  1. What is DHCP?
  2. Which Versions of Android Are Affected?
  3. What are the Issues?
  4. How Has Princeton Handled These Issues?
  5. Have We Banned Android Devices at Princeton?
  6. Why Was Princeton The Only Site to Report This Particular Issue (at First)?

What is DHCP?

DHCPv4, the Dynamic Host Configuration Protocol for IPv4, allows a device attached to the network to automatically learn some or all of its network configuration, including its IPv4 (Internet) address. Most operating systems include DHCP client software.

A device that uses DHCPv4 runs DHCP client software on a network (e.g., Ethernet or Wireless) interface. The DHCP client software contacts DHCP servers to obtain network configuration; in particular, it usually obtains a lease (a loan) of an IP address.

For example, the DHCP client tells the DHCP server "I am network interface 0:1:2:3:4:5; please lease an IP address to me." The DHCP server might respond "You may use IP address 192.168.1.2 for the next six hours; if you would like to continue using that address, please renew it when three hours have elapsed." When three hours have elapsed, the DHCP client contacts the DHCP server which granted the lease; the client asks that server to renew the lease. Typically the DHCP server responds to the client: "You may use IP address 192.168.1.2 for the next six hours; if you would like to continue using that address, please renew it again when three hours have elapsed." (If the DHCP client is unable to contact the DHCP server to renew its unexpired lease, it will retry from time to time, and is permitted to continue using the IP address until the lease is due to expire.)
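
The timing in this example can be sketched in a few lines of Python. This is only an illustration: the Lease class and its field names are hypothetical, and the renewal point is assumed to fall at half the lease duration, which matches the six-hour/three-hour example above and the default renewal time in RFC 2131.

```python
from dataclasses import dataclass

HOUR = 3600.0  # seconds


@dataclass(frozen=True)
class Lease:
    """Hypothetical record of one DHCP lease as the client sees it."""
    address: str
    granted_at: float  # time the lease was granted, in seconds
    duration: float    # lease length, in seconds

    @property
    def renew_at(self) -> float:
        # Assumption: renewal is due halfway through the lease.
        return self.granted_at + self.duration / 2

    @property
    def expires_at(self) -> float:
        return self.granted_at + self.duration

    def renewed(self, now: float) -> "Lease":
        # A successful renewal starts a fresh lease period at `now`.
        return Lease(self.address, now, self.duration)


lease = Lease("192.168.1.2", granted_at=0.0, duration=6 * HOUR)
second = lease.renewed(now=lease.renew_at)
print(lease.renew_at / HOUR, lease.expires_at / HOUR)    # 3.0 6.0
print(second.renew_at / HOUR, second.expires_at / HOUR)  # 6.0 9.0
```

Each successful renewal pushes the expiration time forward; a client that stops renewing has no claim to the address once expires_at has passed.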

Assuming the DHCP client successfully renews the lease before it expires, this repeats periodically until the device goes offline. Once the device is offline, it no longer contacts the DHCP server to renew the lease, so eventually the last lease renewal expires. Once the last lease renewal has expired, the DHCP server is free to lease the IP address to another client.

If the device goes offline, when it later comes back online it broadcasts a DHCP request for a new lease. It may choose to request a brand-new lease, or (if it believes the old lease has not yet expired) may request a new lease on the old IP address.
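
The choice the client makes when it comes back online can be sketched as follows. The function and parameter names are hypothetical; INIT and INIT-REBOOT are client state names from the DHCP specification (RFC 2131), where INIT-REBOOT is the state in which a client asks for an address it was previously assigned.

```python
from typing import Optional


def startup_state(now: float, remembered_expiry: Optional[float]) -> str:
    """Return the DHCP state a client starts in when it comes back online.

    remembered_expiry is the expiry time of the lease the client remembers,
    or None if it remembers no lease.
    """
    if remembered_expiry is not None and now < remembered_expiry:
        # The client believes the old lease is still valid, so it asks
        # for that same address again.
        return "INIT-REBOOT"
    # No remembered lease, or the remembered lease has expired.
    return "INIT"


print(startup_state(now=100.0, remembered_expiry=200.0))
print(startup_state(now=300.0, remembered_expiry=200.0))
```

Note that the decision depends on what the client believes about the expiry time; a client with a wrong idea of that time will choose the wrong branch.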


Which Versions of Android Are Affected?

We have seen these issues on a range of models from a variety of vendors; the bugs are not confined to one vendor or device model.

Based upon information provided by Princeton University customers, one or more of these issues are present in (at least) Android versions 2.1 - 2.3 and 3.0 - 3.1. These are the Android versions our customers report running on those devices we have detected exhibiting one or more of the issues.

It is possible that these bugs are also present in other versions of Android. Not all customers with malfunctioning devices tell us the version of Android they are running. Additionally, we may not detect every malfunctioning device.


What are the Issues?

Under some circumstances, a device running one of the affected versions of Android stops renewing its DHCP lease, yet continues (or resumes) using the IP address after the lease expires. Although the owner of the device may not realize there is a problem, this interferes with service to others on the network.

We've only observed Android devices exhibit these issues when connected via their Wi-Fi network interfaces. We do not know whether these issues also affect Android devices connected via Ethernet network interfaces; Android devices with Ethernet interfaces appear to be rare at this time.

DHCP Behavior

We have observed a number of issues from affected Android devices. Some devices exhibit only a subset of these issues; this is likely because the issues are due to more than one bug.

Issue 1

After the lease has expired, the device continues using the IP address.

That is, it continues to respond to ARP Requests, claiming to own that IP address. It responds to ICMP Echo Requests for that IP address. It responds to UDP and TCP traffic sent to that IP address. It initiates traffic to the Internet from that IP address.

It may do so for hours after the lease has expired.

This issue is common to all the Android devices exhibiting the bugs.

Eventually the device uses DHCP (in the INIT or INIT-REBOOT state) to request a new lease, which usually ends the incident. One situation that appears to trigger this is that the device disconnects from the wireless network (or loses its connection), for example, as a result of leaving the ESSID's coverage area; when it next connects, it starts in the DHCP INIT state. We assume there are other circumstances that also will trigger a return to the DHCP INIT state.

Issue 2

Sometimes the malfunctioning device exhibits this issue in addition to issue #1.

While the device is continuing to use the IP address after the lease has expired, sometimes it unicasts DHCPREQUEST packets to the DHCP server for that lease, asking to renew the lease. It may do this for a few minutes, or for hours.

That makes no sense, as the lease has already expired. (A DHCP client wishing to renew a lease must renew it before the lease expires, not afterwards.)

(Our DHCP server refuses to renew the lease if the lease was for a dynamically-assigned IP address, because the lease has already expired.)

The client clearly believes that its DHCP lease has not yet expired.

A check of one affected client confirms that the problem is not due to the client's clock going backwards in time since it obtained the lease. The clock on the client seems to be counting forwards just fine.
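
The server-side rule behind issue #2 (a renewal is honored only while the lease is still running) can be sketched as below. This is a toy model, not our DHCP server's code: the DynamicLeaseTable class, its fields, and the idea of keeping a list of refused renewals are all assumptions made for illustration, and it covers dynamically-assigned addresses only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DynamicLeaseTable:
    """Toy lease table for dynamically-assigned addresses (illustration only)."""
    lease_seconds: float
    # ip -> (client id, expiry time in seconds)
    leases: Dict[str, Tuple[str, float]] = field(default_factory=dict)
    # (time, ip, client id) of every refused renewal, kept for monitoring
    refused: List[Tuple[float, str, str]] = field(default_factory=list)

    def grant(self, now: float, ip: str, client: str) -> None:
        self.leases[ip] = (client, now + self.lease_seconds)

    def renew(self, now: float, ip: str, client: str) -> bool:
        holder = self.leases.get(ip)
        if holder is not None and holder[0] == client and now < holder[1]:
            # Lease is still running: extend it from the current time.
            self.leases[ip] = (client, now + self.lease_seconds)
            return True
        # Unknown lease, wrong client, or lease already expired: refuse.
        self.refused.append((now, ip, client))
        return False


HOUR = 3600.0
table = DynamicLeaseTable(lease_seconds=6 * HOUR)
table.grant(0.0, "192.168.1.2", "0:1:2:3:4:5")
on_time = table.renew(3 * HOUR, "192.168.1.2", "0:1:2:3:4:5")    # before expiry
too_late = table.renew(10 * HOUR, "192.168.1.2", "0:1:2:3:4:5")  # after expiry
print(on_time, too_late, len(table.refused))
```

In this sketch the on-time renewal moves the expiry to hour 9, so the attempt at hour 10 is refused and recorded.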

Issue 3

This issue appears independently of issues #1 and #2. Some devices exhibit this issue alone; others exhibit this issue along with issue #1, while others exhibit this issue along with issues #1 and #2.

As described above, eventually the device chooses to return to the DHCP INIT or INIT-REBOOT state. The device asks for a new lease, and obtains one. Sometimes this new lease is for a different IP address than the old lease. The device begins using the IP address from the new lease. This is normal.

But sometimes, the device continues (or resumes) using the IP address from the old expired lease as well. That is, the device is now using both IP addresses simultaneously. (It answers IP ARP Request packets for both IP addresses.)

Over time, the device may use DHCP to obtain a series of leases, but from time to time, it resumes using the IP address from that earlier expired lease as well. Some devices simply use the old expired lease all of the time (in addition to whatever other IP address they have leased), rather than doing so only from time to time.

Sometimes the device resumes using the IP address from an old expired lease "out of the blue" (at the same time it is using another IP address for which it has an unexpired DHCP lease). We have seen devices do so hours, days, even months after the old lease has expired. (We've even seen an example of this over 16 months after the original lease expired.) The device may have slept and awoken many times (or perhaps even rebooted) since the time the original lease expired.

We sometimes see devices simultaneously use multiple IP addresses from expired DHCP leases well after the leases have expired, as if the device were accumulating the expired leases.

Sometimes, the malfunctioning device exhibits issues #1, #2, and #3 simultaneously. For example, the original lease on IP address 'a' expires, and the device continues using that IP address after lease expiration. Later, the device enters DHCP INIT and obtains a lease on IP address 'b'. The device now uses both IP addresses 'a' and 'b' simultaneously. While doing so, the device also tries to renew the expired lease on IP address 'a'.

Why are these DHCP Behaviors a Problem?

When a device continues to use an IP address from an expired DHCP lease after that lease expires, this can interfere with service to other devices. Once the malfunctioning device has allowed its DHCP lease to expire, the DHCP server may lease the same IP address to another client.

If two devices on the same IP network try to use the same IP address at the same time, one or both can experience difficulties using IP.

The DHCP servers try to reduce the impact of these malfunctioning clients. Before offering a client a new lease for a dynamically-assigned IP address, the servers perform a quick PING test to determine whether the IP address is unexpectedly in use. (For example, is some device "stealing" the IP address?) This quick test helps, but does not entirely work around the problem caused by the malfunctioning clients. For example, sometimes the malfunctioning device may not respond to PING at the time the DHCP server checks before leasing the IP address to another client. In some DHCP server implementations, the DHCP server may have limited time to perform the test, as other clients are waiting for responses from the DHCP server. And when a device exhibiting issue #3 resumes using an IP address from an expired lease "out of the blue," that IP address may have already been leased to another client; this makes it impossible for the DHCP server to discover the malfunctioning client at the time the server leases the IP address to a soon-to-be victim.
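
The PING test and one of its blind spots can be sketched as follows. The function name and the injected answers_ping callable are hypothetical, and the sketch assumes the server simply skips any candidate address that answers the probe; it does not send real ICMP traffic.

```python
from typing import Callable, Iterable, Optional


def pick_address_to_offer(
    candidates: Iterable[str],
    answers_ping: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate that does not answer the probe, else None."""
    for address in candidates:
        if not answers_ping(address):
            return address
    return None


# A device squatting on 192.168.1.2 answers the probe, so that address is skipped.
squatters_awake = {"192.168.1.2"}
offer = pick_address_to_offer(
    ["192.168.1.2", "192.168.1.3"], lambda a: a in squatters_awake
)

# If the squatting device is silent at probe time, the test cannot see it.
missed = pick_address_to_offer(["192.168.1.2", "192.168.1.3"], lambda a: False)
print(offer, missed)
```

The second call shows why the test is only a partial defense: an address whose squatter happens to be silent looks free and is offered to the next client.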

Circumstances Leading to the Issues

We have observed that one situation in which the device can exhibit issue #1 and issue #2 is for the device to choose to remain attached to a Wi-Fi network while the device is asleep. If the DHCP lease comes due for renewal while the device is asleep, the device doesn't renew the lease. If the device remains asleep through the time that the lease expires, the device allows the lease to expire. The device continues to behave as if it believes the lease has not yet expired; it continues to use the IP address, and in some cases, tries to renew the lease after the expiration time. We have observed that it makes no difference whether the device is plugged into a power source throughout this period.
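
For contrast, the check a well-behaved client could make on waking from sleep might look like the sketch below. This is not Android's code and is not a diagnosis of the bug; the function name, parameters, and returned strings are hypothetical, and all times are assumed to come from a clock that keeps counting while the device sleeps.

```python
def action_on_wake(now: float, renew_at: float, expires_at: float) -> str:
    """Decide what a client should do with its lease after waking from sleep."""
    if now >= expires_at:
        # The lease ran out during sleep: the address is no longer the client's.
        return "stop using address and restart in INIT"
    if now >= renew_at:
        # Renewal came due during sleep, but the lease is still valid.
        return "renew immediately"
    return "nothing to do"


HOUR = 3600.0
print(action_on_wake(now=2 * HOUR, renew_at=3 * HOUR, expires_at=6 * HOUR))
print(action_on_wake(now=4 * HOUR, renew_at=3 * HOUR, expires_at=6 * HOUR))
print(action_on_wake(now=7 * HOUR, renew_at=3 * HOUR, expires_at=6 * HOUR))
```

The affected devices behave as if the first branch were never taken: they keep using the address, and sometimes attempt a renewal, after the expiration time has passed.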

Some of the information from Google suggests that the cause (or one cause) of issue #3 may be a known bug in the Broadcom firmware supporting an Android device's wireless interface. Google has indicated that a bug in that firmware's "ARP Offload" feature can cause a device to claim IP addresses from expired leases. (They indicate the ARP Offload feature is used to allow the device to respond to ARP requests while the device is asleep, without fully waking the device.) We do not know if the problematic firmware is used on all Android devices, or only those with wireless hardware made by Broadcom. We do not know if the version of Android provided by some vendors for some devices might be customized to disable that feature in the Broadcom firmware, or to replace the problematic version of the Broadcom firmware with a fixed version. Any of these could explain why not all devices exhibit issue #3.


How Has Princeton Handled These Issues?

Princeton recognized this as a pattern involving Android devices during the Summer of 2010.

We first saw an Android device attached to our network exhibit the problem in February 2010, another in April 2010, one more in June 2010, nine more in July 2010, and ten more in August 2010. As students returned to campus during Fall 2010, we saw the numbers of malfunctioning Android devices grow rapidly.

Nearly all of the devices we've detected exhibiting the bugs have malfunctioned repeatedly. Often the device will malfunction in this way several times per day.

To help us better understand which Android platforms malfunction in this way, our customer support organization collected from owners of the malfunctioning devices the following information: Android version, device make and model. While only a small fraction of our Android customers responded, the data we collected indicates that the problem is widespread, present in Android versions 2.1 through 3.1 running on different device models from different vendors.

We collected data showing malfunctioning Android devices' DHCP behavior and IP address use, and determined that the devices were all exhibiting a set of bugs described above.

On September 14 2010, we filed bug report #11236 with Google, the vendor of the Android operating system.

On September 14 2010, we published the first version of the document you are presently reading. A day later, OIT added a pointer to this information to its KnowledgeBase, used by both Princeton University customers and support staff.

In the following months, we continued updating our bug report at Google with more information demonstrating the problems, and showing how they affected a wide variety of Android devices.

There was no response from Google until after an April 19 2011 mention of our bug report on Slashdot.

On April 20 2011, an engineer at Google updated our bug report to say that Google had identified a couple of the causes for the issue we reported. The engineer indicated that Google had identified multiple bugs causing these behaviors; this is not just a single bug. They found bugs associated with the way the device renews DHCP leases with respect to the way the device sleeps. And they reported there is a bug in the firmware for Broadcom Wi-Fi hardware, causing its ARP offload feature to claim old IP addresses after the DHCP lease has expired. The engineer indicated that they have fixes for the bugs they have identified, and would soon be releasing those fixes.

Through March 26 2012, we have not received word from Google that any Android fixes for these bugs have been released.

During late May 2011 through late July 2011, we tested a workaround proposed by a Princeton University customer. For each Android device previously identified as exhibiting these bugs, as well as those identified during the test period, we contacted the customer associated with the device (where that person was known). We invited these customers to participate in this test. Of the 730 malfunctioning devices identified through that time, it was practical to contact the customers associated with 205 of these devices. The remaining 525 devices belonged mostly to anonymous visitors; a few belonged to customers who were impractical to contact. Of the 205 customers contacted, 52 chose to participate in the test. The test ran for ten weeks. Some of the devices participated for the entire period; most joined as the test proceeded. The typical device participated for about a month. We found that the proposed workaround was effective for 75-80% of the devices exhibiting issues #1, #2, and/or #3. It was ineffective for the remaining devices; all three issues were represented among the failure cases.

Based upon the test above, on July 29 2011 we published the procedure as a Partial Workaround for "Android 2.1 - 2.3, 3.0 - 3.1 Allows DHCP Lease to Expire, Keeps Using IP Address" Bugs. While that procedure does not fix the bugs, it allows some of the malfunctioning Android devices to be used on a Wi-Fi network without disrupting service to others. We began including a pointer to that procedure in the information we provide to affected Android customers.

If a device malfunctions in a similar manner after the customer advises us that s/he had adopted the partial workaround, we take that as final indication that the partial workaround has proven not effective for that particular customer's device. We block the device, advise the customer, and then keep the block in place permanently (or until a fix for the device is available from Google). We do not allow the device to be unblocked and have "another try" to use the partial workaround, even if the customer believes the reason for the malfunction was that the customer didn't apply the partial workaround properly (or unwittingly removed the workaround). This is because we have experienced so many Android devices on our network exhibiting these bugs that it is impractical for us to allow each one to interfere with service repeatedly. Each device gets one opportunity to try the partial workaround.

During July 29 2011 - September 1 2011, we noticed that some of the test participants previously counted as successes malfunctioned again. We therefore updated our results on September 2 2011 to reflect that the partial workaround was effective for 70% of the tested devices exhibiting issues #1, #2, and/or #3.

On December 30 2011 we reviewed incident records for all devices which had attempted to use the partial workaround to-date. That data shows that over time, more of these devices eventually malfunction again in the same way. We therefore updated our results to reflect that the partial workaround has been effective for 61% of the devices exhibiting issues #1, #2, and/or #3.

We continue to encounter a growing number of Android devices exhibiting these bugs, disrupting network service for others on a daily basis. The partial workaround has been ineffective for a significant fraction of Android devices.

Through May 13 2012, we have seen well over 1500 Android devices malfunction in this way while attached to our campus network.

We have continued updating our bug report at Google, showing how the bugs have remained present in newer Android releases across a variety of devices.

We have heard nothing from Google regarding these bugs since April 26 2011.

We have no further information from Google about whether these bugs in Android will be fixed.

If Google does fix these bugs in a future version of Android, it is not clear to us that owners of devices running prior versions of Android will be able to obtain such bug fixes. Google's distribution model for Android updates does not appear to result in timely updates for most owners. Often Android updates for existing devices are simply not available.

We will continue to update this document when we have additional information.


Have We Banned Android Devices at Princeton?

We have not banned the use of Android devices at Princeton. Each Android device is welcome on our network, unless or until that device malfunctions in such a way as to disrupt or degrade service. Only those that are detected malfunctioning in this way are blocked from using the network. However, it is an unfortunate fact that most Android devices running the affected versions of the operating system do malfunction in this way, ultimately resulting in us blocking each of those devices, one at a time.

Once an individual Android device exhibits this bug, we contact the customer to advise him or her of the problem. We advise the customer that if the device interferes with service a second time in this way, network service for the device will be blocked.

If the same device exhibits the problem more than once, we block that individual device from our network. (Most affected Android devices malfunction so frequently that we often detect the device malfunctioning several times in the same day, and so we block the device at the same time we first contact the owner.)

Once blocked, if the device has a cellular network interface, the device can still be used with the customer's cellular network provider, of course.

If it is not practical to contact the customer (for example, because the device is using our visitor wireless service and the owner is anonymous), we block that individual device from our network the first time it exhibits this bug. If at a later time it becomes practical to contact the customer (for example, because the customer has registered the device in the University's Host Database), we contact the customer to advise him or her of the problem.

Beginning with the availability of the partial workaround during Summer 2011, if the owner of a blocked malfunctioning Android device chooses to adopt the partial workaround, we unblock the device, allowing it to resume using the campus network. Because the partial workaround is not fully effective, some of these devices will continue to disrupt service. When we detect one of these devices again disrupting service, we block the device from the network and contact the customer again. In the absence of a fully-effective workaround or a fix from Google, these devices remain blocked from our network.

This is similar to how we handle other malfunctioning devices which disrupt or degrade service. We typically do not ban entire classes of devices. We have not singled out Android devices for special handling. We block individual devices after they actually disrupt or degrade service. In most cases, we unblock such devices when the owner takes acceptable action to address the issue. (Lacking a fix from Google, for those Android devices where the partial workaround is not fully effective, there is nothing those customers can do to address this particular issue at this time.) Device Blocking Policies describes these policies in greater detail.


Why Was Princeton The Only Site to Report This Particular Issue (at First)?

Some may wonder why Princeton was the only site to report this problem at first. Some may believe that because other sites did not report the problem at first, the problem must be due to a problem with Princeton's network.

Princeton detected this issue because we take a very pro-active stance to monitor for certain kinds of common network problems which interfere with service or degrade service, including this one.

Our network monitoring includes comparing actual IP address usage to DHCP server lease assignments on a daily basis. Specifically, we compare our IP router ARP cache data to our DHCP server logs. This allows us to detect some devices using IP addresses not assigned for their use. This is a degree of monitoring that many sites do not perform. As many sites place client devices -- especially wireless clients -- behind NATs, performing such monitoring may be difficult for most sites. At the time we first encountered these Android issues, our client networks were not behind NATs, making such monitoring somewhat easier for us.
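
The comparison of ARP cache data against DHCP lease records can be sketched as follows. This is a simplified illustration rather than our monitoring software: the record types, field names, and the unleased_use function are hypothetical, and the sketch ignores statically-assigned addresses and differences between clocks.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class LeaseRecord:
    ip: str
    mac: str
    start: float  # time the lease (or renewal) began, in seconds
    end: float    # time the lease expires, in seconds


@dataclass(frozen=True)
class ArpSighting:
    ip: str
    mac: str
    seen_at: float  # time a router saw this MAC claiming this IP


def unleased_use(
    sightings: Iterable[ArpSighting], leases: Iterable[LeaseRecord]
) -> List[ArpSighting]:
    """Return sightings not covered by a lease for the same IP and MAC."""
    lease_list = list(leases)
    flagged = []
    for s in sightings:
        covered = any(
            l.ip == s.ip and l.mac == s.mac and l.start <= s.seen_at < l.end
            for l in lease_list
        )
        if not covered:
            flagged.append(s)
    return flagged


HOUR = 3600.0
leases = [LeaseRecord("192.168.1.2", "0:1:2:3:4:5", 0.0, 6 * HOUR)]
sightings = [
    ArpSighting("192.168.1.2", "0:1:2:3:4:5", 2 * HOUR),  # inside the lease
    ArpSighting("192.168.1.2", "0:1:2:3:4:5", 8 * HOUR),  # after expiry
]
flagged = unleased_use(sightings, leases)
print(flagged)
```

Only the sighting at hour 8 is flagged, since the device was seen claiming the address two hours after its lease ended.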

We also monitor our DHCP servers very closely for any problems they detect, including when they see DHCP-leased IP addresses in-use when they should not be, or when a client tries to SELECT an offer that was not made to it, or when a client tries to renew or rebind an IP address after the client's lease on that IP address has already expired. We have instrumented our DHCP server software to make it (somewhat) easier to see such events. Our monitoring also reports DHCP clients which are the source of excessive transactions; occasionally these are victims of malfunctioning Android devices "stealing" IP addresses.
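
Reporting the clients responsible for excessive transactions can be sketched as a simple count over a log. The input format (pairs of client identifier and message type), the function name, and the threshold value are assumptions made for illustration; they do not describe our DHCP server's actual log format.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def busiest_clients(
    transactions: Iterable[Tuple[str, str]], threshold: int
) -> List[Tuple[str, int]]:
    """Return (client, count) pairs for clients with more than `threshold`
    transactions, busiest first."""
    counts = Counter(client for client, _message in transactions)
    return [(c, n) for c, n in counts.most_common() if n > threshold]


log = [("0:1:2:3:4:5", "DHCPDISCOVER")] * 5 + [("a:b:c:d:e:f", "DHCPREQUEST")] * 2
report = busiest_clients(log, threshold=3)
print(report)
```

A client that appears in such a report is not necessarily at fault; as noted above, it may be a victim whose address keeps being taken by a malfunctioning device.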

As a result of the close monitoring we perform to detect DHCP issues, Princeton tends to learn about some kinds of bugs in DHCP client implementations sooner and more often than do many other sites.

We could choose instead to not take a pro-active stance to these kinds of issues. A more common approach is to ignore the kinds of problems caused by devices using IP addresses not leased to them, allowing such malfunctioning devices to cause sporadic mysterious network problems for customers as their IP addresses are "stolen". Sites that use that approach may take action only when a victim of a malfunctioning device chooses to complain. Most victims probably don't complain because these kinds of problems appear random and short-lived to each victim, and often go away when they "try again."

We feel that the stance we take ultimately benefits our customers, as it results in more reliable network service to the customers. It reduces the frequency that our customers experience network disruptions due to others' malfunctioning devices.

As a side note, this pro-active stance has also resulted in our discovering DHCP client issues a number of times over the years for a variety of common platforms. Typically we've provided technical details of these issues to the DHCP client vendors, which has helped the vendors to fix bugs and improve DHCP client behavior. Although identifying issues in vendors' DHCP client software is not our goal -- our mission is to provide excellent network service to Princeton University customers -- it does speak to the technical accuracy of the bugs we've discovered.

In the time since we reported this issue, a small number of other sites have indicated that they too are seeing one or more of the bugs described above.

Sites that monitor for these problems closely are less likely to notice issue #1 if they assign DHCP leases with long expiration times, for example, on the order of days. Princeton's wireless services rely on DHCP leases in the 1-3 hour range. (Shorter leases allow us to recover unused IP addresses rapidly, in turn permitting us to assign globally-routable IP addresses to clients without requiring Princeton to impose a NAT between wireless clients and the Internet.) Expiration times so long that the Android device is likely to be woken from sleep by the customer before the lease expires might hide issue #1 in some cases, but we have found that even waking an Android device exhibiting issue #1 will not always cause the device to use DHCP to obtain a fresh lease. And issue #3 is not hidden by using long lease times, even on the order of months.


A service of OIT Network Systems
The Office of Information Technology,
Princeton University
Last Updated: May 14 2012