OIT Networking & Monitoring Services

Running Multiple Simultaneous DHCP Clients on a Single Network Interface Can Interfere With Service

DHCPv4, the Dynamic Host Configuration Protocol for IPv4, allows a device attached to the network to automatically learn some or all of its network configuration, including its IPv4 (Internet) address. Most operating systems include DHCP client software.

A device that uses DHCPv4 runs a DHCP client instance on a network (e.g. Ethernet or Wireless) interface. The DHCP client contacts DHCP servers to obtain network configuration; in particular, it usually obtains a lease (a loan) of an IP address. The DHCP client instance identifies itself using the network interface's hardware address. For example, the DHCP client tells the DHCP server "I am network interface 0:1:2:3:4:5; please lease an IP address to me." The DHCP server responds "You may use IP address 192.168.1.2 for 6 hours; if you would like to continue using that address, please renew it when 3 hours have elapsed."
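The timing in the exchange above can be worked out with a little arithmetic. The following is a minimal Python sketch, assuming the default timer fractions from RFC 2131 (renewal at 50% of the lease, rebinding at 87.5%); the function name and the grant time are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta

def lease_timers(granted_at, lease_seconds):
    """Return (renew_at, rebind_at, expires_at) for a lease.

    Assumes the RFC 2131 default fractions: T1 = 0.5 and T2 = 0.875
    of the lease duration. A server may override these with options.
    """
    lease = timedelta(seconds=lease_seconds)
    renew_at = granted_at + lease * 0.5      # T1: client starts renewing
    rebind_at = granted_at + lease * 0.875   # T2: client starts rebinding
    expires_at = granted_at + lease          # lease ends if never renewed
    return renew_at, rebind_at, expires_at

# Hypothetical grant time; the 6-hour lease matches the example in the text.
granted = datetime(2009, 3, 1, 12, 0, 0)
renew, rebind, expires = lease_timers(granted, 6 * 3600)
print("renew at ", renew)
print("rebind at", rebind)
print("expires  ", expires)
```

With a 6-hour lease, the renewal time falls 3 hours after the grant, which is the figure the DHCP server quotes in the example.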

If a device has more than one network interface, each interface runs a separate DHCP client instance and uses a unique DHCP Client Identifier (which at Princeton is the network interface's hardware type and hardware address). Each client instance is leased a different IP address by the DHCP server. So if a device has two physical network interfaces (Ethernet and Wireless, for example), each runs a DHCP client instance, each is identified uniquely, and each is leased its own IP address.
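To make the identifier concrete, here is a minimal Python sketch of an identifier built from a hardware type and hardware address, as described above. It assumes hardware type 1 (the standard value for Ethernet); the function name and the second hardware address are hypothetical.

```python
ETHERNET = 1  # standard hardware type value for Ethernet

def client_identifier(hw_type, hw_addr):
    """Build an identifier: one hardware-type byte followed by the address bytes."""
    octets = bytes(int(part, 16) for part in hw_addr.split(":"))
    return bytes([hw_type]) + octets

# Two different interfaces (the second address is made up for illustration)
# yield two different identifiers, so the server treats them as two clients.
wired = client_identifier(ETHERNET, "0:1:2:3:4:5")
wireless = client_identifier(ETHERNET, "0:1:2:3:4:6")
print(wired.hex(), wireless.hex())

# Two client instances on the SAME interface yield the same identifier,
# so the server cannot tell them apart.
instance_a = client_identifier(ETHERNET, "0:1:2:3:4:5")
instance_b = client_identifier(ETHERNET, "0:1:2:3:4:5")
print(instance_a == instance_b)
```

The last comparison is the heart of the problem this document describes: the server sees one client, while the device behaves as two.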

A device that runs two (or more) simultaneous independent DHCPv4 (DHCP for IPv4) clients on the same network interface, in such a way that the instances all identify themselves as the same client (that is, using the same explicit or implied DHCP Client Identifier), is not operating properly.

In most cases, malfunctioning in this way interferes with service to other devices attached to the campus network. This is because the device sometimes uses not only the single IP address currently assigned to it via DHCP, but also other IP addresses that were previously assigned to it and are no longer assigned for its use. We see such malfunctioning devices using two or more IP addresses simultaneously on the same network interface. When the device uses an IP address no longer assigned for its use, it will often interfere with service to another device which has been assigned that IP address.

To stop a device malfunctioning in this way from continuing to interfere with service to others, we mark the malfunctioning device ineligible for certain campus network services.

Contents

  1. What Can Cause a Device to Malfunction In This Way?
  2. Properly Functioning DHCP Client Software is Needed to Use Some Popular Campus Services
  3. OIT Marks a Malfunctioning Device Ineligible for Certain Services
  4. Large Increase from iPhone, iPod Touch, and Macintosh Computers Starting September 2008
  5. Technical Details
    1. A Timeline
    2. Discussion
    3. The "Lax Server Behavior" Bandage
    4. The "Client Steering" Bandage
    5. Summary of Bandages

What Can Cause a Device to Malfunction In This Way?

There are a variety of ways a device might be misconfigured to run multiple DHCPv4 clients simultaneously on the same physical interface in such a way that all the clients identify themselves as the same client. Some include:


Properly Functioning DHCP Client Software is Needed to Use Some Popular Campus Services

A number of popular campus network services rely on the customer's device having properly functioning DHCP client software:

All of the network services above are ones that OIT provides beyond the more basic OIT Static IP Service. OIT Static IP Service can be used by Ethernet devices even if they lack working DHCP client software. But to use any of the added services above, a device needs to have properly functioning DHCP client software.


OIT Marks a Malfunctioning Device Ineligible for Certain Services

We usually discover this particular problem as a result of the device interfering with service to others. Often we only detect it after it's been doing so for a number of days. Due to the nature of the problem, sometimes it can take weeks to discover.

If the device is one that is registered in the Princeton University Host Database, when we detect it malfunctioning in this way, we mark the device ineligible for OIT Mobile IP Service, and for all wireless services provided by OIT. OIT then contacts a person responsible for the device.

If the device is not registered in the Host Database, we mark the device's hardware address(es) ineligible for a variety of services, including (for example) all wireless services provided by OIT, OIT Mobile IP Service, Temporary Unregistered Dormnet (TUD) IP Address Service, and Visitor IP (VIP) Service. Since the device is not registered in the Host Database, OIT often will be unable to contact the person responsible for the device. If later the device is registered in the Host Database, the blocks will be updated as necessary, and the customer will be contacted.

We install those blocks to prevent the malfunctioning device from continuing to interfere with service to others. These are the most conservative steps we can take to do so.

Indeed, while marked ineligible for these services, a device that is registered in the Host Database might still be able to use OIT Static IP Service. To be able to use OIT Static IP Service, it would have to have an Ethernet interface, the Ethernet interface would need to be assigned (in the Host Database) an OIT Static IP Address appropriate for the device's "home" network, and the Ethernet interface would need to be connected to the campus network on that "home" IP subnet.

Note that when we block service to a malfunctioning iPod as described above, it remains a functioning music player; the customer just cannot use the University's network to obtain network service on the iPod.

Note that when we block service to a malfunctioning iPhone as described above, it remains a functioning cell phone and even a functioning Internet device (using the cell phone provider's services); the customer just has to rely on the cell provider for network service instead of the (faster) University network.

If a registered device (or just its hardware addresses) is removed from the Princeton University Host Database while the blocks described above are still in place, the blocks will be updated as necessary to follow the device's hardware addresses.

If the customer wants those blocks removed, the problem with the device needs to be corrected. Once OIT has been advised that the problem with the device has been corrected, we will remove the blocks.


Large Increase from iPhone, iPod Touch, and Macintosh Computers Starting September 2008

Prior to September 2008, OIT saw a small number of incidents of this nature each year. (During the preceding year, we saw 37 incidents.) The incidents involved a variety of different kinds of devices. (The breakdown was: 16 Macintosh, 0 iPod/iPhone, 18 Other, and 3 Unknown.) In nearly all cases, the cause was found to be some identifiable misconfiguration of the device.

Starting in September 2008, we began seeing a large increase in the number of these incidents. The large increase is coming from Apple iPhones, Apple iPod Touches, and Apple Macintosh computers. (During September 1 2008 - February 25 2009, we saw 163 incidents. The breakdown was: 63 Macintosh, 76 iPod/iPhone, 20 Other, and 4 Unknown.)

Only a fraction of the iPhones, iPod Touches, and Macintosh computers on campus exhibited this problem.

While we continued to see other kinds of devices exhibit this particular problem, we saw no sudden jump in the number of malfunctions of this nature from those other devices; they seemed to exhibit "reasonable" growth year-to-year. Those other devices continued to exhibit this particular problem infrequently, and when they did, there was nearly always an identifiable misconfiguration causing it.

OIT support staff tried to troubleshoot many of the malfunctioning Apple iPhones, Apple iPod Touches, and Apple Macintosh computers:

Some have asked why Princeton saw this but others did not report it. As the technical details below explain, for the problem to happen there must be multiple DHCP servers willing to serve the malfunctioning client, with each DHCP server offering the client a lease on a different IP address. The more DHCP servers willing to serve the client, the worse the problem caused by the malfunctioning clients becomes; Princeton operates four DHCP servers. Many sites operate with fewer DHCP servers; some may operate in such a way that only one of their DHCP servers serves a given client most of the time. Our DHCP service is also unusual in that it reclaims leases when a client asks for a new lease; it doesn't expect a single client's network interface to retain and use more than one lease at a time.

Even among those sites that meet the requirements to experience this problem, many may operate in such a way that they don't realize the problem is happening. Our DHCP service is unusual in that it alerts us every time it realizes that an IP address is being "stolen". Furthermore, we closely monitor usage of dynamically-assigned IP addresses, reconciling daily usage against DHCP leases; that helps us to discover anomalies like this. It appears many other sites don't monitor for such issues. If other sites collect the data needed to monitor this, they too may observe the malfunctioning devices absorbing multiple IP addresses (limited by the number of local DHCP servers), placing stress on the IP pools.
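The kind of reconciliation mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data structures, the function name, and the sample records are all hypothetical, and it does not describe how OIT's monitoring is actually implemented.

```python
def find_unleased_usage(observed, leases):
    """Report (hardware address, IP) pairs where a device used an IP not leased to it.

    observed: iterable of (hw_addr, ip) pairs seen in use on the network.
    leases:   dict mapping ip -> hw_addr of the current lease holder.
    """
    return sorted(
        (hw, ip) for hw, ip in set(observed) if leases.get(ip) != hw
    )

# Hypothetical sample data: device 0:1:2:3:4:5 holds a lease on .100 only,
# but is also seen using .2, which is now leased to a second device.
observed = [
    ("0:1:2:3:4:5", "192.168.1.2"),
    ("0:1:2:3:4:5", "192.168.1.100"),
    ("0:a:b:c:d:e", "192.168.1.2"),
]
leases = {
    "192.168.1.100": "0:1:2:3:4:5",
    "192.168.1.2": "0:a:b:c:d:e",
}
suspects = find_unleased_usage(observed, leases)
print(suspects)
```

A real reconciliation would also need to account for lease start and end times; this sketch compares a single snapshot only.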

Faced with the problems caused by the growing numbers of iPhone OS 2.x and Mac OS X 10.5.x systems behaving this way, at Princeton we modified our DHCP service to "hide" the problem. Specifically, in mid-March 2009 OIT modified our DHCP service to adopt the "Lax Server Behavior" Bandage. That only hides the problem, so that the iPhones, iPods, and Macs caching multiple unexpired DHCP leases stop interfering with other devices; it doesn't address the underlying issue.

The "Lax Server Behavior" bandage introduces a second problem: it causes IP addresses to be wasted. In fact, it could cause so many IP addresses to be wasted that the University could run out of IP addresses for certain services. To try to reduce the number of IP addresses wasted by the "lax" server behavior, in late March 2009 OIT began testing another change to our DHCP service: The "Client Steering" Bandage. It's essentially a "bandage on a bandage."

Tests of "Client Steering" showed that it is ineffective in steering Apple clients, due to an (unrelated) DHCP behavior exhibited by those clients. As a result, the Apple clients continued to consume too many IP addresses. This put excessive pressure on the IP address pools. It seemed likely we would exhaust our available IP address space unless something else changed.

We are not able to publish information about Apple's response to our report of the "client lease caching" issue, as Apple's response is protected by a "non-disclosure" clause.

However, without violating that non-disclosure clause, we can say that if one compares the source code Apple has published for the Mac OS X 10.5.8 and Mac OS X 10.6 clients, one of the changes is that in the newer code, the OS attempts to retain only a single IP address per IP network at a time. (More accurately, if two leases have the same router IP and hardware address, Mac OS X 10.6 retains only one of the leases. At sites where DHCP clients on an IP network might use differing router IP or hardware addresses, this might not help. But at Princeton, all DHCP clients on an IP network use a single router IP and hardware address, so the result is that the OS retains only a single IP address on an IP network at one time.) That change suggests that the problem may be resolved starting in Mac OS X 10.6. That would also imply it may be addressed starting in iPhone OS 3.0.

Indeed, our experience to date is that the changes made in Mac OS X 10.6 and iPhone OS 3.0 do appear to have addressed the issue for clients that have upgraded to those versions. (We cannot verify this because we are still running the "bandages", which tend to obscure the problem; we cannot remove the bandages as long as any older clients are still present.)


Technical Details

In these sections we provide greater technical details to illustrate why it is a problem when a device runs multiple simultaneous DHCP clients on a single network interface in such a way that all the DHCP clients identify themselves as the same client.

A Timeline

Imagine we have a device running two DHCP client instances, both running on Ethernet interface 0:1:2:3:4:5. Both DHCP clients identify themselves to the DHCP server as having hardware address 0:1:2:3:4:5. (And if they identify themselves to the DHCP server with a "DHCP Client Identifier" option, the value of the "DHCP Client Identifier" option also corresponds to Ethernet address 0:1:2:3:4:5.)

Follow what happens:

  1. DHCP Client instance 'A' broadcasts a DHCPDISCOVER message saying "I am 0:1:2:3:4:5. Please give me a new lease." The message reaches DHCP servers 'foo' and 'bar'.

  2. One DHCP server 'foo' responds with a DHCPOFFER message saying "Client 0:1:2:3:4:5, I offer to you a 6-hour lease on IP address 192.168.1.2." Another DHCP server 'bar' responds with a DHCPOFFER saying "Client 0:1:2:3:4:5, I offer to you a 6-hour lease on IP address 192.168.1.100."

  3. DHCP Client instance 'A' decides to select the offer from DHCP server 'foo'. It broadcasts a DHCPREQUEST message saying "I'd like to accept the offer from DHCP server 'foo' for a lease on IP address 192.168.1.2." The message reaches all the DHCP servers.

  4. DHCP server 'bar' sees that its offer has not been selected; it makes IP address 192.168.1.100 available for re-assignment. DHCP server 'foo' sees that its offer has been selected; it responds with a DHCPACK message awarding the lease to the client.

  5. DHCP Client instance 'A' receives the DHCPACK message, and causes the device's network interface to begin using IP address 192.168.1.2.

    Some time may pass, but then DHCP Client instance 'B' gets into the act...

  6. DHCP Client instance 'B' broadcasts a DHCPDISCOVER message saying "I am 0:1:2:3:4:5. Please give me a new lease." The message reaches several DHCP servers.

  7. When DHCP server 'foo' sees the DHCPDISCOVER, it determines that the client must be in the DHCP INIT state. Based upon the DHCP client state transition diagram in RFC 2131 section 4.4 (part of the DHCP specification), the DHCP server knows that client 0:1:2:3:4:5 has abandoned its unexpired lease on IP address 192.168.1.2. DHCP server 'foo' terminates the unexpired lease it was holding on IP address 192.168.1.2 for client 0:1:2:3:4:5, making the IP address available for re-use.

    This is because based on the client state transition diagram, there is no way for the client to transition from the INIT state to another state where it is permitted to use that IP address (without being awarded a new lease on that IP address by a DHCP server).

  8. DHCP server 'foo' responds with a DHCPOFFER saying "Client 0:1:2:3:4:5, I offer to you a 6-hour lease on IP address 192.168.1.2". Another DHCP server 'bar' responds with a DHCPOFFER saying "Client 0:1:2:3:4:5, I offer to you a 6-hour lease on IP address 192.168.1.100." (In each case, a DHCP server tries to offer the client the same IP address the server last leased or offered to the client, if possible; otherwise the server picks another IP address.)

  9. DHCP Client instance 'B' decides to select the offer from DHCP server 'bar'. It broadcasts a DHCPREQUEST message saying "I'd like to accept the offer from DHCP server 'bar' for a lease on IP address 192.168.1.100."

  10. DHCP server 'foo' sees that its offer for IP address 192.168.1.2 has not been selected by the client; after a short time, DHCP server 'foo' makes IP address 192.168.1.2 available for re-assignment.

    DHCP server 'bar' sees that its offer for IP address 192.168.1.100 has been selected by the client; DHCP server 'bar' responds with a DHCPACK message awarding the lease to the client.

  11. DHCP Client instance 'B' receives the DHCPACK message, and causes the device's network interface to also begin using IP address 192.168.1.100.

  12. At this time, the device's network interface 0:1:2:3:4:5 is using both IP addresses 192.168.1.2 and 192.168.1.100.

This is a problem. As far as the DHCP servers are concerned, the device 0:1:2:3:4:5 currently has a lease only on IP address 192.168.1.100. (In fact, the DHCP servers will try to lease IP address 192.168.1.2 to a second device. When that happens, the malfunctioning first device will interfere with service to the second device.)

Assuming the malfunctioning device remains online long enough, the following will also happen:

  1. DHCP Client instance 'A' still believes it has a lease on IP address 192.168.1.2 from DHCP server 'foo'. After some time has passed (for example, when the 6-hour lease period is half over), DHCP Client instance 'A' sends a DHCPREQUEST message to DHCP server 'foo' to ask that its lease be renewed.

  2. DHCP server 'foo' has no lease for client 0:1:2:3:4:5; that lease was terminated as a result of the activity from DHCP client instance 'B' using the same hardware address. DHCP server 'foo' responds with a DHCPNAK message, telling the client that it may not renew the lease because the lease doesn't exist.

  3. DHCP Client instance 'A' ignores the DHCPNAK message; it believes it still has time remaining on its lease. The device continues to use IP address 192.168.1.2 (in addition to 192.168.1.100). The problem continues.
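The two timelines above can be replayed with a toy model. The following Python sketch is a simplification under stated assumptions, not the campus DHCP server software: the Server class, its method names, the pool contents beyond the two addresses in the text, and the second device's hardware address are all hypothetical, and lease expiry and offer reservation are not modeled.

```python
class Server:
    """Toy DHCP server that terminates a client's lease when it sees a DHCPDISCOVER."""

    def __init__(self, name, pool):
        self.name = name
        self.pool = list(pool)
        self.leases = {}   # client id -> currently leased IP
        self.last = {}     # client id -> IP last leased or offered

    def discover(self, client):
        # A DHCPDISCOVER implies the client is in INIT state, so any
        # unexpired lease held for that client is terminated.
        self.leases.pop(client, None)
        taken = set(self.leases.values())
        ip = self.last.get(client)
        if ip is None or ip in taken:
            ip = next(addr for addr in self.pool if addr not in taken)
        self.last[client] = ip
        return ip          # the address carried in the DHCPOFFER

    def request(self, client, ip):
        self.leases[client] = ip
        return "DHCPACK"

    def renew(self, client, ip):
        return "DHCPACK" if self.leases.get(client) == ip else "DHCPNAK"


client = "0:1:2:3:4:5"
foo = Server("foo", ["192.168.1.2", "192.168.1.3"])
bar = Server("bar", ["192.168.1.100", "192.168.1.101"])
device_ips = set()   # addresses the device's interface is actually using

# Steps 1-5: instance 'A' discovers, selects the offer from 'foo'.
offers = {s.name: s.discover(client) for s in (foo, bar)}
foo.request(client, offers["foo"])
device_ips.add(offers["foo"])

# Steps 6-11: instance 'B' discovers, selects the offer from 'bar'.
offers = {s.name: s.discover(client) for s in (foo, bar)}
bar.request(client, offers["bar"])
device_ips.add(offers["bar"])

# Step 12: what the device uses versus what the servers believe.
leased = {ip for s in (foo, bar) for c, ip in s.leases.items() if c == client}
print("device is using:", sorted(device_ips))
print("servers have leased:", sorted(leased))

# A second device (hypothetical address) is then offered the "stolen" IP.
other_ip = foo.discover("0:a:b:c:d:e")
foo.request("0:a:b:c:d:e", other_ip)
print("second device is leased:", other_ip)

# Later, instance 'A' tries to renew its lease with 'foo'.
renewal = foo.renew(client, "192.168.1.2")
print("renewal answer to instance 'A':", renewal)
```

In this model the device ends up using two addresses while the servers record a lease on only one of them, and the other address is handed to a second device.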

Discussion

A key factor in this problem is that there are multiple DHCP servers present, and the IP address a DHCP client instance may obtain may vary depending on which DHCP server the client instance happens to use.

The campus network currently has four DHCP servers (two located at each data center) to provide a very high level of reliability for the DHCP service. The service is designed to continue functioning reliably even when one server is down for maintenance, and to function acceptably even in an emergency that affects a data center.

All the DHCP servers are authoritative for the IP addresses that are assigned statically to clients. When a client is attached to the campus network via its Ethernet interface on its "home" subnet, it is awarded its statically-assigned IP address, which is appropriate for use for that subnet. Regardless of the DHCP server from which the device obtains its IP address, the IP address will be the same.

When a client is attached to the campus network on a subnet other than its "home" subnet, it is awarded a dynamically-assigned IP address appropriate for use on that subnet. Each DHCP server is authoritative for a separate pool of IP addresses for dynamic assignment. Because the DHCP servers are independent of each other, and it would be an error for two servers to assign the same dynamic IP address to two different clients at the same time, the servers cannot share a single pool of dynamically-assigned IP addresses. The servers' pools do not overlap.

As a result, when a client is attached to the network in such a way as to need a dynamically-assigned IP address, the IP address offered to it by each DHCP server will be different. The DHCP client is responsible for choosing one from among the offers.
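As an illustration of non-overlapping pools, here is a minimal Python sketch that divides one subnet's host addresses among four servers. The subnet, the equal-sized split, and the function name are assumptions made for the example; the actual pool layout on the campus network is not described in this document.

```python
import ipaddress

def split_pool(network, servers):
    """Give each server an equal, disjoint slice of the subnet's host addresses."""
    hosts = list(ipaddress.ip_network(network).hosts())
    size = len(hosts) // len(servers)   # any leftover addresses stay unassigned
    return {
        name: hosts[i * size:(i + 1) * size]
        for i, name in enumerate(servers)
    }

pools = split_pool("192.168.1.0/24", ["foo", "bar", "baz", "shazam"])
for name, addrs in pools.items():
    print(name, addrs[0], "-", addrs[-1], f"({len(addrs)} addresses)")

# The first address each server would offer a new client differs per server.
first_offers = {name: str(addrs[0]) for name, addrs in pools.items()}
print(first_offers)
```

Because no address appears in more than one pool, two independent servers can never lease the same dynamic address to different clients, and a client asking all four servers receives four different offers.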

If the device were to run multiple DHCP clients on a single network interface (all identifying themselves as the same client), each DHCP client might select a lease offered by a different DHCP server. When the leases are for dynamically-assigned IP addresses, each IP address will be unique. So each of the multiple DHCP client instances may end up using a different IP address.

This explains why the problem is hidden when the device is attached in such a way that it uses OIT Static IP Service. All the DHCP servers offer the device the same IP address, so the multiple DHCP client instances all use the same IP address, even if they select from different DHCP servers.

This also explains why the problem happens more often when the malfunctioning device uses wireless service instead of Ethernet service. While Ethernet-attached devices at Princeton use dynamically-assigned IP addresses while attached outside their home subnet, wireless-attached devices always use a dynamically-assigned IP address. Furthermore, wireless-attached devices perform DHCP far more often than Ethernet-attached devices, because their network interfaces are disconnecting and reconnecting much more frequently. The problem isn't a wireless problem; it's simply far more likely that a malfunctioning device will demonstrate the problem when using wireless.

Each DHCP server is designed so that if it has an unexpired lease for the client, it will prefer to offer that same IP address to the client. In fact, even if the server's last lease to the client has expired, the server will still prefer to offer the same IP address to the client if that IP address is not currently leased to another client, and it's still reasonable for the client to use given the subnet to which the client is currently attached.

Given that, you can see that when a device is malfunctioning in the way described here, the number of IP addresses it might simultaneously use is limited to the lesser of the number of DHCP client instances running on the device and the number of DHCP servers. As Princeton has four DHCP servers, that's the most IP addresses a device malfunctioning in this way will use at the same time on one network interface.

Another key factor is that the campus DHCP servers terminate an existing lease they may have for a client when they see a DHCP client broadcast a DHCPDISCOVER to obtain a new lease. They do this because they expect the DHCP client to behave as described in the DHCP client state transition diagram in RFC 2131 section 4.4. If the campus DHCP servers did not terminate the existing lease, it would indeed prevent the DHCP servers from awarding the "stolen" IP address to other clients. In fact, that is apparently how the DHCP server software at many other sites operates. That prevents other sites from seeing the client "steal" an IP address that has since been assigned to another device. But this "lax" DHCP server behavior doesn't solve the underlying problem: the single device is still consuming too many IP addresses.

The "Lax Server Behavior" Bandage

In March 2009 OIT modified our DHCP service to adopt the "lax" DHCP server behavior mentioned above. We did that as a "bandage" to allow us to provide service to the growing number of Apple devices behaving as if they are running multiple DHCP clients.

Specifically, our DHCP service no longer terminates an unexpired DHCP lease for a client upon receipt of a DHCPDISCOVER from that client (as long as the server does not choose to offer the client a lease on a different IP address).

In the timeline above, in step 7 DHCP server 'foo' will not terminate the unexpired lease it was holding on IP address 192.168.1.2 for client 0:1:2:3:4:5. As a result, following step 12, DHCP server 'foo' still has an unexpired lease for IP address 192.168.1.2 for client 0:1:2:3:4:5, and DHCP server 'bar' has an unexpired lease for IP address 192.168.1.100 for client 0:1:2:3:4:5. When another client comes along to request a lease, server 'foo' will not lease IP address 192.168.1.2 to it. There will be no IP address conflict. And continuing in the timeline above, when DHCP Client instance 'A' later asks to renew its lease, DHCP server 'foo' responds with a DHCPACK rather than a DHCPNAK, since it still has an unexpired lease for the client.

This "lax" server behavior "hides" the problem of the client running multiple DHCP clients, rather than truly fixing it.

While the lax DHCP server behavior insulates other clients from "duplicate IP address" problems, it forces the DHCP servers to maintain multiple DHCP leases for a single client. This wastes IP address space. In fact, since the campus network is served by four DHCP servers, eventually each problematic client will consume four IP addresses. That's wasting three addresses. As the number of problematic clients grows, we will exhaust our available IP address space.
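The arithmetic behind that concern is simple. In the Python sketch below, the function name and the count of 100 problematic clients are hypothetical; the figure of four servers comes from the text.

```python
def address_usage(problem_clients, servers=4):
    """Return (addresses consumed, addresses wasted) in the worst case.

    Assumes every problematic client eventually holds one lease on
    every server, and that each client really needs only one address.
    """
    consumed = problem_clients * servers
    wasted = consumed - problem_clients
    return consumed, wasted

print(address_usage(1))     # a single problematic client
print(address_usage(100))   # hypothetical: 100 problematic clients
```

The waste grows linearly with the number of problematic clients, so a fixed-size address pool is eventually exhausted.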

In fact, the "lax" DHCP server behavior required to hide this problem results in even more address waste than described above. It wastes IP addresses even when confronted with clients that don't behave as if they are running multiple DHCP client instances. (We'll refer to these as "well-behaved clients" below.) This is because the lax DHCP server behavior fails to prune unneeded leases when a well-behaved client decides to abandon a lease from one server and select a lease from a different server. Here's what happens:

  1. A well-behaved client obtains a lease from DHCP server 'foo'.

  2. At some time before that lease expires, the client finds it must bring down and later bring up DHCP; for example, its network connection goes down and comes up. (This happens very frequently with wireless connections.)

  3. If the well-behaved client chooses to start DHCP in the INIT state (instead of INIT-REBOOT), it broadcasts a DHCPDISCOVER message.

  4. When it receives the DHCPDISCOVER message, DHCP server 'foo' does not terminate the unexpired lease for the client, based on the "lax" DHCP server behavior.

  5. All the DHCP servers offer the client a new lease.

  6. Imagine the DHCP client chooses an offer from a server 'bar', rather than from 'foo'.

  7. The DHCP client is BOUND to server 'bar'.

At this point the well-behaved client is using only the IP address it obtained from server 'bar'. But there is a DHCP lease for this client on both server 'foo' and on server 'bar', each for a different IP address. The presence of the lease on server 'foo' ties up an IP address that could otherwise be used for other clients. This wastes IP address space.

It's quite possible (common on a wireless network) for the client to return to the INIT state again before either of those two leases would expire. If the well-behaved client now chooses an offer from server 'baz', there will be three DHCP leases sitting in the servers for this client. Eventually (if the client has enough link down/up events before the leases expire), we can end up with a lease for this client on every DHCP server, even though this well-behaved client only needs one lease.
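The accumulation of leases for a well-behaved client can be sketched in Python as follows. This is a simplified model with a hypothetical function name. It assumes that no lease expires during the sequence, that every DHCPDISCOVER reaches every server, and that each server keeps re-offering the client the same address.

```python
def holding_servers(selections, lax):
    """Return the set of servers still holding a lease for one client.

    selections: the server whose offer the client accepts each time
                it returns to the INIT state and broadcasts a DHCPDISCOVER.
    lax:        True for the "lax" behavior (leases survive a DHCPDISCOVER),
                False for the original behavior (leases are terminated).
    """
    holding = set()
    for chosen in selections:
        if not lax:
            holding.clear()    # every server drops its lease for this client
        holding.add(chosen)    # the selected server awards a lease
    return holding

picks = ["foo", "bar", "baz"]
print(sorted(holding_servers(picks, lax=False)))
print(sorted(holding_servers(picks, lax=True)))
```

Under the original behavior only the most recently selected server holds a lease; under the lax behavior every server the client has selected still holds one.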

The address waste caused by the "lax DHCP server behavior" bandage led OIT to introduce the "Client Steering" bandage, described below.

The "Client Steering" Bandage

As described above, the "Lax Server Behavior" Bandage to hide the Apple DHCP client issue causes the DHCP service to waste IP addresses. In late March 2009 OIT modified our DHCP service to try to reduce the number of IP addresses wasted. This modification we refer to as "Client Steering". We currently consider this modification "under test."

With Client Steering, our DHCP servers attempt to "steer" each DHCP client to a particular DHCP server so that when using a dynamically-assigned IP address, it is more likely to choose one offered by a single DHCP server each time, rather than to select one from a different DHCP server each time.

For each client, Client Steering assigns a preference order to the multiple campus DHCP servers. We base this on the client's hardware address. For example, client 0:1:2:3:4:5 may be assigned to prefer server 'foo', followed by server 'bar', followed by server 'baz', followed by server 'shazam'.

When the client starts in DHCP INIT state and broadcasts a DHCPDISCOVER message, the message reaches all the DHCP servers. The message contains a seconds field specified by the client, intended to represent the number of seconds since the client started booting (that is, trying to use DHCP to obtain an IP address). Ideally, the client will set the seconds field to 0 in the first DHCPDISCOVER it sends. Because the request's seconds field is 0, only this client's most-preferred server ('foo') will respond to the client with a DHCPOFFER. The other servers will not respond because the seconds field is too low. If the client doesn't hear a response within a few seconds (perhaps the DHCP server's response was lost on its way to the client, or perhaps server 'foo' is unavailable), the DHCP client will eventually time out and retransmit its DHCPDISCOVER message, increasing the value of the seconds field. When the request's seconds field is large enough, this client's second-most preferred server will also chime in; both servers 'foo' and 'bar' will respond to the client. If the client still hears no response, it will time out and retransmit, again increasing the value of the seconds field. Larger seconds values will result in server 'baz' (and later server 'shazam') also chiming in.

Most of the time, the client will hear a response (from its most-preferred server) to its first request. As a result, most of the time the client will accept the offer from its most-preferred DHCP server. The client has been "steered" to prefer one server most of the time. This will tend to reduce the likelihood that there will be a lease for the client on more than one DHCP server as a result of the "lax" DHCP server behavior described above.

Client Steering is not expected to be a perfect solution. Different clients set the seconds field differently; some don't always start with the field set to 0, and some do not increment the field as one might expect. Different clients use different timeout/retransmission strategies. So the thresholds we use to determine when each additional server will chime in will not be perfect in all cases. Some clients will set their seconds high enough from the outset that they will receive offers from more than one server, and so will (over time) end up selecting offers from different servers. And any clients that never increment the seconds field beyond 0 will not receive DHCP service when their preferred DHCP server is unavailable.
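The decision each server makes under Client Steering can be sketched in Python as follows. This is an illustration only: the document says the preference order is based on the hardware address but does not say how, so the hash-based rotation here is an assumption, and the threshold values are hypothetical placeholders rather than the values used in production.

```python
import hashlib

SERVERS = ["foo", "bar", "baz", "shazam"]

# Hypothetical thresholds: the minimum value of the seconds field at which
# the client's 1st, 2nd, 3rd and 4th preferred server will respond.
THRESHOLDS = [0, 4, 8, 12]

def preference_order(hw_addr):
    """Derive a stable per-client server ordering from the hardware address.

    Assumption: a hash of the address picks the rotation of the server list.
    """
    digest = hashlib.sha256(hw_addr.encode()).digest()
    start = digest[0] % len(SERVERS)
    return SERVERS[start:] + SERVERS[:start]

def responders(hw_addr, secs):
    """Return the servers that would send a DHCPOFFER for this DHCPDISCOVER."""
    order = preference_order(hw_addr)
    return [srv for srv, minimum in zip(order, THRESHOLDS) if secs >= minimum]

client = "0:1:2:3:4:5"
print("preference order:", preference_order(client))
for secs in (0, 4, 8, 12):
    print(f"secs={secs}: offers from {responders(client, secs)}")
```

A client that sends its first DHCPDISCOVER with the seconds field at 0 hears from exactly one server. A client that starts with a large value hears from several servers at once, which defeats the steering.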

The DHCP servers do not apply Client Steering when they send a DHCPOFFER for a statically-assigned IP address. When a client is attached to the network in such a way that it will be assigned a statically-assigned IP address, all the DHCP servers will assign the same IP address to the client. As a result, there's no IP address waste even with the "lax" DHCP server behavior. Since there's no benefit from Client Steering when a client is offered a static IP address, we don't apply Client Steering in that situation. We only apply Client Steering when the client is offered a dynamically-assigned IP address, since the address it would be offered by each DHCP server is different.

Unfortunately, initial results from testing "Client Steering" have not been encouraging. We find it is ineffective in steering one particularly large class of clients. As it happens, those clients are Apple Mac OS X 10.5.x, iPhones, and iPods. While this is not directly related to the issue of those devices caching multiple leases, it's particularly unfortunate, as it means we cannot steer those clients to reduce the number of leases they cache.

The reason that Client Steering is ineffective for Apple Mac OS X 10.5.x, iPhones, and iPods is that when they enter the DHCP INIT-REBOOT state (as often happens when connected via wireless), if they do not believe they received a DHCPACK in response to their DHCPREQUEST, after several retries they switch to DHCP INIT state but do not reset the bootp_seconds value to 0. As a result, their DHCPDISCOVER packets typically have bootp_seconds set to 8 or 9. That value is high enough that the client will not be steered by Client Steering. (It's not apparent why these clients believe they received no response to their DHCPREQUEST broadcasts; they typically send four such broadcasts and are sent four DHCPACKs in response. It's also unfortunate that they enter DHCP INIT state at that point instead of simply entering the DHCP BOUND state, which is permitted by the protocol.)

Summary of Bandages

To summarize where things stand: The "lax" DHCP server behavior introduced in mid-March 2009 is a "bandage" to prevent the Apple clients caching multiple IP addresses from interfering with other clients.

The "lax" DHCP server behavior results in many devices being leased multiple IP addresses when only one was necessary, wasting IP address space.

We began testing "Client Steering" in late March 2009 to try to reduce the number of wasted IP addresses; it's a "bandage on a bandage." Initial results indicate that "Client Steering" is ineffective in steering Apple clients.

Since "Client Steering" is ineffective for such a large class of clients, and that class of clients is the same one that already is putting excessive pressure on the IP address pools due to their lease caching behavior, it seems likely we will exhaust our available IP address space unless something else changes.

Resolution

We are not able to publish information about Apple's response to our report of the "client lease caching" issue, as Apple's response is protected by a "non-disclosure" clause.

However, without violating that non-disclosure clause, we can say that if one compares the source code Apple has published for the Mac OS X 10.5.8 and Mac OS X 10.6 clients, one of the changes is that in the newer code, the OS attempts to retain only a single IP address per IP network at a time. (More accurately, if two leases have the same router IP and hardware address, Mac OS X 10.6 retains only one of the leases. At sites where DHCP clients on an IP network might use differing router IP or hardware addresses, this might not help. But at Princeton, all DHCP clients on an IP network use a single router IP and hardware address, so the result is that the OS retains only a single IP address on an IP network at one time.) That change suggests that the problem may be resolved starting in Mac OS X 10.6. That would also imply it may be addressed starting in iPhone OS 3.0.

Indeed, our experience to date is that the changes made in Mac OS X 10.6 and iPhone OS 3.0 do appear to have addressed the issue for clients that have upgraded to those versions. (We cannot verify this because we are still running the "bandages", which tend to obscure the problem; we cannot remove the bandages as long as any older clients are still present.)


A service of OIT Networking & Monitoring Services
The Office of Information Technology,
Princeton University
Last Updated: April 14 2010