IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 3, NO. 3, JULY 2006 309

Robust Supervisory Control for Production Systems With Multiple Resource Failures

Song Foh Chew, Student Member, IEEE, and Mark A. Lawley, Member, IEEE

Abstract—Supervisory control for deadlock-free resource allocation has been an active area of manufacturing systems research. To date, most work assumes that allocated resources do not fail. Little research has addressed allocating resources that may fail.

In our previous work, we assumed a single unreliable resource and developed supervisory controllers to ensure robust deadlock-free operation in the event of resource failure. In this paper, we assume that several unreliable resources may fail simultaneously. In this case, a controller must guarantee that a set of resource failures does not propagate through blocking to stall other portions of the system. That is, the controller must ensure that every part type not requiring any of the failed resources should continue to produce smoothly without disruption.

To do this, the controller must constrain the system to states that serve as feasible initial states for: 1) a reduced system when resource failures occur and 2) an upgraded system when failed resources are repaired. We develop the properties that such a controller must possess.

We developed supervisory controllers to guarantee this requirement for these systems. In this paper, we extend the previous results to a more general class of systems where there are multiple unreliable resources. We establish a set of desired properties that the supervisory controller must possess in order to guarantee robust operation for these systems, and then develop a number of controllers that satisfy these properties.

Index Terms—Deadlock avoidance, fault tolerance, flexible manufacturing systems, robust control.

Manuscript received October 25, 2004; revised February 22, 2005 and August 30, 2005. This paper was recommended for publication by Associate Editor M. Zhou and Editor N. Viswanadham upon evaluation of the reviewers' comments. This work was supported by the National Science Foundation under Grant GOALI 0085047.

The authors are with the School of Industrial Engineering, Purdue University, West Lafayette, IN 47907-1287 USA (e-mail: malawley@ecn.purdue.edu; chew@ecn.purdue.edu).

Digital Object Identifier 10.1109/TASE.2005.861397

NOMENCLATURE

Part type.
Processing stage of a part type.
Residual vector of stages from a given stage.
Route of a part type.
Residual route of a part-type stage.
Set of reliable resources.
Set of unreliable resources.
Set of unreliable resources required by a part type.
Failure-dependent resources for an unreliable resource.
Neighborhood of a resource that is failure dependent on an unreliable resource.
Maximal strongly connected set of neighborhoods.

I. INTRODUCTION

DEADLOCK-FREE resource allocation has been an active area of manufacturing research for the past decade, and researchers have made tremendous progress in understanding the computational, structural, and control aspects of the deadlock-avoidance problem [4]–[11], [13]–[23], [26]–[29], and [32]. In most of this work, resources are assumed not to fail.

However, as any practicing manufacturing engineer knows, downtime due to resource failure is a common problem on the plant floor. Downtime results from a variety of causes, including tool breakage, defective parts, faulty sensors, and so forth. Automated manufacturing systems are often conservatively designed and controlled so that a single fault or failure causes system shutdown. This is a major problem that the manufacturing research community needs to begin addressing. As stated in [30], only a few researchers have addressed the control implications of resource failure [24], [25], [27], [29], [33]. (Reference [30] contrasts the approach presented here with that of [24], [25], [27], and [33].) In [30] and [31], we develop controllers that allocate resources so that the failure of a single resource does not propagate to stall larger portions of the system. Our concern is not with repairing the failed resource. Rather, we investigate the potential effects the failure has on the rest of the system and develop controllers that keep the system running in the face of the resource failure. Our objective is to control the system so that the unexpected failure of an unreliable resource does not affect the processing of part types that do not require that resource. We note that when a resource fails, all parts in the system requiring that resource for future processing are unable to be completed until the resource is repaired. Because these parts must occupy some of the system's buffer space, they can pose an impediment to the smooth flow and production of the part types not requiring the failed resource.

Fig. 1. Example system with two unreliable resources.

Thus, one objective of our control theory is to ensure that when an unreliable resource fails, the buffer space allocation of those parts requiring that unreliable resource can be reconfigured so that parts not requiring that resource are not blocked. This reconfiguration must ensure, however, that when the failed unreliable resource is repaired, the system is not in an unsafe state. We refer to a supervisory control policy that guarantees these properties as being robust to the failure of unreliable resources. We will provide examples of these phenomena in the following sections.

While the work presented in [30] and [31] develops robust supervisors for systems with a single unreliable resource, the work presented here develops supervisors for systems with multiple unreliable resources. Also, the policies presented here provide more allocation flexibility than the approaches presented in both [30] and [31]. The remainder of this paper is organized as follows. Section II formally defines the system and the problem that we address in this paper. Section III motivates properties for robust supervisory control. Section IV develops and synthesizes supervisors that satisfy these properties.

Finally, Section V offers some concluding remarks and directions for future research. The Appendix, which contains the theoretical development and detailed proofs, appears at the end of the paper.

II. DISCRETE EVENT SYSTEM

We model our systems using the approach of Ramadge and Wonham [3]. This is necessary to define the properties that we want our supervisors to enforce. The following model is similar to that found in [30], but differs in that we now have more complex failure scenarios and, thus, some of the underlying formalism has to be generalized. Fig. 1 provides an example for the following development.

The system is defined as a nine-tuple S. In S, R is the set of system resource types, which comprises the set of reliable resource types, not subject to failure, and the set of unreliable resource types, subject to failure. Each system resource type has an associated buffer space with a given capacity. The set P of part types is produced by the system, with each part type representing an ordered set of processing stages. For each processing stage of a part type, we also define the residual part stages, that is, the vector of stages from that stage onward, and we use separate notation to represent a part instance of a stage. A function returns the resource required by each part-type stage. Thus, the route of a part type is the sequence of resources required by its stages, and the residual route of a stage is the sequence of resources required by its residual stages. Finally, each resource is associated with the set of part-type stages that it supports.

We assume our resource types to be workstations with buffer space for staging and storing parts and a processor or server for operating on parts. We will use the standard assumption from queuing theory that the server is not idle so long as there are unfinished parts in a workstation's buffer space. The resource units that we are concerned with allocating are instances of the workstation's buffer space.
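As a minimal sketch of the data that the system definition carries, the following Python fragment stores buffer capacities, the unreliable resource set, and one route per part type, and derives residual routes and the stages supported by each resource. The class name `ProductionSystem`, its field and method names, and the three-resource example are illustrative assumptions; the example is not the system of Fig. 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionSystem:
    """Hypothetical container for resources, capacities, and routes."""
    capacity: dict         # resource name -> buffer capacity
    unreliable: frozenset  # resource names subject to failure
    routes: dict           # part type -> tuple of resources, one per stage

    def resource_of(self, part_type, stage):
        """Resource required by the given stage (stages are 1-indexed)."""
        return self.routes[part_type][stage - 1]

    def residual_route(self, part_type, stage):
        """Resources required from the given stage to the end of the route."""
        return self.routes[part_type][stage - 1:]

    def stages_at(self, resource):
        """Set of (part type, stage) pairs supported by a resource."""
        return {(p, k)
                for p, route in self.routes.items()
                for k, r in enumerate(route, start=1)
                if r == resource}


# Illustrative system only (assumed values, not taken from Fig. 1).
example = ProductionSystem(
    capacity={"r1": 2, "r2": 1, "r3": 2},
    unreliable=frozenset({"r2"}),
    routes={"p1": ("r1", "r2", "r3"), "p2": ("r3", "r1")},
)
```

Keeping the route as an ordered tuple makes the residual route a simple slice, which is the quantity the later failure-dependency definitions rely on.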

The controllers that we design are not intended to allocate the server among parts waiting at the workstation. We assume this to be done by some local queuing discipline. Workstation failure will imply the failure of the workstation's server, not any of its buffer space. We will assume that when the server of a workstation fails, we can continue to allocate its buffer space up to capacity, but that none of the waiting parts can be processed and, thus, cannot proceed along their respective routes until the server is repaired.

We further assume that server failure does not prevent finished parts occupying the workstation's buffer space from being moved away from the workstation and proceeding along their respective routes. Finally, we assume that server failure does not damage or destroy the part being processed and that failure can only occur when the server is working.

We are now in a position to define the system states and events. Let Q represent the set of system states. A state records, for each workstation i, the status of its server (0 if failed, 1 if operational) and, for each part-type stage, the number of unfinished units (parts waiting or in process) and the number of finished units located in the buffer space of the workstation required by that stage. The set of initial states contains the state in which no resources are allocated and all servers are operational.

The set of events consists of controllable and uncontrollable events. A controllable event represents the allocation of a resource to a part instance; that is, it is the event that a part instance of a part type advances into the buffer space of the workstation that will perform its next operation. A further controllable event represents a finished part leaving the system. We assume that the supervisor controls the occurrences of these events through resource allocation decisions. The uncontrollable events represent the completion of service for a part and the failure and repair of the server of an unreliable resource. Service completions, failures, and repairs are assumed to be beyond the controller's influence.

A function returns, for a given state, the set of enabled events. This function is defined for a state as follows.
1) Events that release new parts into the system are enabled when space is available on the first required workstation in the route.
2) If a part is at service, then the corresponding service completion event is enabled.
3) If the server is busy with a part, then the corresponding failure event is enabled.
4) If the server fails, the corresponding repair event is enabled and the corresponding service completion events are disabled.
5) When a part finishes its current operation and buffer space becomes available at the next required workstation in its route, the event corresponding to advancing the part is enabled.
6) If a part has finished all of its operations, the event corresponding to unloading it from the system is enabled.

The state transition function is now defined as follows. The transition function is a partial function from the cross product of the state set Q and the event set to the set Q of system states. Specifically, it covers four cases: advancing a part, service completion of a part, failure of a server, and repair of a server. Each case is expressed using standard unit vectors whose components corresponding to the affected state variables are 1. Note that the vector associated with a raw part is the zero vector of the same dimension, where a raw part is a part of a given type waiting to be released into the system.

In this work, we assume that any subset of the unreliable resources can be in a failed state simultaneously. Thus, if any such subset is down, we want the remaining resources to continue producing parts not requiring any of the failed resources without human intervention to remove or rearrange the parts requiring failed resources. Further, when one of the failed resources is repaired, we want production of parts requiring that resource to resume. The next section will motivate and develop properties that a supervisory controller must possess in order to accomplish this.
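The six enabling rules of Section II can be sketched in Python as follows. This is only an illustration that follows the verbal rules; the event tuples (`"advance"`, `"complete"`, `"fail"`, `"repair"`, `"unload"`), the function name `enabled_events`, and the dictionary-based state encoding are assumptions introduced here, not the paper's notation.

```python
def enabled_events(routes, capacity, unreliable, server_up, unfinished, finished):
    """Return the set of enabled events for one state (hypothetical encoding).

    routes:     part type -> tuple of resources (one per stage)
    capacity:   resource -> buffer capacity
    unreliable: set of resources whose servers may fail
    server_up:  resource -> True if the server is operational
    unfinished: (part type, stage) -> parts waiting or in process
    finished:   (part type, stage) -> parts finished at that stage
    """
    def occupancy(resource):
        return sum(unfinished.get((p, k), 0) + finished.get((p, k), 0)
                   for p, route in routes.items()
                   for k, r in enumerate(route, start=1) if r == resource)

    def has_space(resource):
        return occupancy(resource) < capacity[resource]

    events = set()
    for p, route in routes.items():
        # Rule 1: release a new part when the first workstation has space.
        if has_space(route[0]):
            events.add(("advance", p, 1))
        for k, r in enumerate(route, start=1):
            # Rules 2 and 4: completion needs a part at service and a working server.
            if unfinished.get((p, k), 0) > 0 and server_up[r]:
                events.add(("complete", p, k))
            if finished.get((p, k), 0) > 0:
                if k < len(route):
                    # Rule 5: advance when the next workstation has buffer space.
                    if has_space(route[k]):
                        events.add(("advance", p, k + 1))
                else:
                    # Rule 6: unload a part that has finished all operations.
                    events.add(("unload", p))
    for r in unreliable:
        if not server_up[r]:
            # Rule 4: a failed server can be repaired.
            events.add(("repair", r))
        elif any(unfinished.get((p, k), 0) > 0
                 for p, route in routes.items()
                 for k, res in enumerate(route, start=1) if res == r):
            # Rule 3: a server busy with a part can fail.
            events.add(("fail", r))
    return events
```

Note that advancing into a workstation whose server has failed stays enabled as long as buffer space remains, which matches the assumption that a failed workstation's buffer can still be allocated up to capacity.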

III. MOTIVATING EXAMPLES FOR PROPERTIES FOR ROBUST SUPERVISORY CONTROL

This section motivates a set of desired properties for a robust supervisory controller based upon an example production system. Fig. 1 presents an example manufacturing system with two unreliable resources. The stages, routes, and resource capacities are given, as is the complete discrete event model. This model enumerates the resources, capacities, events, and so forth, and will be used to illustrate the concepts defined in Appendix A. For now, we will constrain our discussion to the system states presented in Figs. 2–4. We recall that, by definition, a resource allocation state is safe if, starting from that state, there exists a sequence of resource allocations/deallocations that completes all parts and takes the system to the empty state, the state in which no resources are allocated. Our underlying assumption is that if a resource allocation state is safe, then, under correct supervision and starting from that state, it is possible to produce all part types indefinitely. We have several control objectives for the system of Fig. 1.

First, we desire that the controller guarantee deadlock-free operation (i.e., that it keeps the system producing all part types). Second, in the event that one of the two unreliable resources fails, we want to continue producing part types not requiring that resource, without having to intervene by clearing the system of parts requiring it. Similarly, in the event that the other unreliable resource fails, we want to continue producing part types not requiring it, again without having to intervene by clearing the system of parts requiring it. Further, if both unreliable resources are in the failed state, we want to continue producing part types not requiring either of them, again without explicit intervention.

Fig. 2. Undesirable system state since unreliable resource r may fail while processing part p.

Fig. 3. Undesirable system state since unreliable resource r may fail while processing part p.

Consider, for example, the state given in Fig. 2. This state is safe; however, if the unreliable resource fails in this state, the production of two part types will be blocked by two parts held at another resource. Note that if we advance one of these parts to a different resource, then production of one of the blocked part types can proceed. However, production of the other will now be blocked. Thus, this state does not satisfy our condition that, after the failure, we should be able to continue producing both part types.

As another example, consider the state of Fig. 3. Again, we see that this state is safe. However, if the unreliable resource fails in this state, production of two part types will be blocked by a part held at another resource, although the production of a third part type is unaffected. Thus, these examples illustrate that parts requiring a failed resource can prevent the system from producing parts not requiring the failed resource through propagation of blocking.

Our objective is to develop supervisory controllers that avoid this by guaranteeing that if an unreliable resource fails, it is possible to redistribute the parts requiring that resource so that part types not requiring that resource can continue to produce.

For the third objective, consider the state of Fig. 4. Again, we see that the state is safe. If the unreliable resource fails, production of one part type is blocked by a part at one resource. Further, production of another part type is blocked by a part at a second resource. We note that by advancing the blocking part from its current resource to its next required resource, these blockages can now be resolved and, thus, the system can continue producing both part types as desired.

However, when the failed resource is repaired, the system is no longer safe since two resources are now involved in deadlock. This illustrates that our controller must guarantee that any redistribution of parts requiring the failed resource does not result in system deadlock when the resource is repaired.

Fig. 4. Undesirable system state since r may fail while processing part p.

The above discussion lays a foundation for a robust supervisory controller. In summary, a supervisory controller is said to be robust to failures of the unreliable resources if the supervisory controller satisfies Property 3.1.

Property 3.1:
1) The supervisory controller ensures continuing production of part types not requiring failed resources, given that additional failures/repairs do not occur.
2) The supervisory controller allows only those states that serve as feasible initial states if an additional resource failure occurs.
3) The supervisory controller allows only those states that serve as feasible initial states if a failed resource is repaired and becomes operational.

We say that a state is a feasible initial state if, starting from that state, it is possible to produce all part types not requiring failed resources. Appendix A formally develops and defines this property using language theory. The objective of this research is to develop supervisory controllers that satisfy Property 3.1 (expressed more formally in Appendix A). Such a controller will be capable of ensuring robust operation in the face of multiple resource failures or repairs without manual intervention.

IV. ROBUST SUPERVISORY CONTROL

This section develops controllers that satisfy Property 3.1 above, while maintaining polynomial complexity. Each controller is a conjunction of a modified deadlock avoidance policy and a set of neighborhood constraints. The deadlock avoidance policy guarantees deadlock-free operation, while the neighborhood constraints control the distribution of parts that require unreliable resources. Section IV-A develops the neighborhood constraints, NHC. Section IV-B develops a supervisor based on a modified Banker's Algorithm and NHC, while Section IV-C develops a supervisor based on single-step look-ahead (SSL) and NHC. Appendices B–D establish that these controllers guarantee Property 3.1.

A. Neighborhood Policy

In this subsection, we discuss neighborhood constraints based on the notion of failure dependency. Informally, a resource is failure dependent if every part that enters its buffer space requires some future processing on a given unreliable workstation. Thus, all unreliable resources are failure dependent. Some reliable resources may also be failure dependent if they only process parts that require future processing on a given unreliable resource. This is defined more precisely later. For each failure-dependent resource, we generate a neighborhood. The neighborhood of a failure-dependent resource is a virtual space of finite capacity that is used to control the distribution of parts requiring that failure-dependent resource. Again, this is formalized in the following, where we extend the neighborhood concepts presented in [30] for systems with multiple unreliable resources. We first discuss and illustrate neighborhood concepts, and then illustrate how neighborhood constraints are constructed for failure-dependent resources.

Recall that the system S has a set of unreliable resources and that each resource supports a set of part-type stages. A resource is said to be failure dependent on an unreliable resource if every part-type stage that it supports has that unreliable resource in its residual route. In other words, a resource is failure dependent on an unreliable resource if every part that enters its buffer requires future processing on that unreliable resource (note that an unreliable resource is failure dependent on itself). For each unreliable resource, we collect the set of resources that are failure dependent on it, and we also consider the union of these sets over all unreliable resources.

For each failure-dependent resource of an unreliable resource, we construct a neighborhood. The neighborhood of a failure-dependent resource is defined as the set of part-type stages that require that resource now or later in their processing and have no intervening failure-dependent resources of the same unreliable resource. We collect the neighborhoods of each unreliable resource into a set, and likewise consider the set of all neighborhoods.

For example, the system of Fig. 1 has two unreliable resources. Whenever a particular resource appears in a route, one of the unreliable resources appears later in that route and, thus, the former is failure dependent on the latter. Likewise, whenever either of two resources appears in a route, the remaining unreliable resource appears later in the route, so both are failure dependent on it. To understand the construction of the neighborhoods, note that each part-type stage is assigned to the neighborhood of the first failure-dependent resource encountered in its residual route. Although all parts supported by one of the remaining resources later need an unreliable resource, that resource serves parts bound for both unreliable resources and, thus, it is not failure dependent on either. This implies that failure-dependent sets are disjoint. Furthermore, we observe that no part stage is in more than one neighborhood. These and other important neighborhood properties are established in Appendix B.

We restrict the number of parts allowed in a neighborhood. Our intention is to guarantee that every part in the neighborhood of a failure-dependent resource has capacity reserved at that resource. That is, we want to be able to advance every part requiring an unreliable resource into its associated failure-dependent resource in the event of a resource failure so that it will not block production of parts not requiring the failed resource. In the example, for a permissible state, we want every part in a neighborhood to have a reserved unit of buffer at the corresponding failure-dependent resource. As a consequence, we will reject a state if this constraint is violated. For instance, a state is not admissible if the number of parts in the neighborhood of a resource with a single unit of capacity exceeds one. To see this, note that if the associated unreliable resource fails at this inadmissible state, at least one part of the neighborhood must reside at a resource other than the failure-dependent one. Although the part types processed there do not require the failed resource in their processing, this distribution of parts may, in turn, block production of some of these part types. Our objective is to develop supervisory controllers capable of rejecting these undesirable states.

We now construct neighborhood constraints to enforce the above intention. The constraint for a neighborhood is an inequality whose left-hand side sums, over the part-type stages in the neighborhood, the number of finished instances and the number of unfinished instances located in the buffers of the resources they hold, and whose right-hand side is the capacity of the failure-dependent resource. A neighborhood is said to be capacitated if its constraint holds with equality and over-capacitated if the left-hand side exceeds the capacity. The set of all such neighborhood constraints is defined with respect to each unreliable resource. Constraints of this set ensure that no neighborhood becomes overcapacitated.

As discussed in [30], these constraints may induce deadlock among failure-dependent resources, since if all neighborhoods are capacitated, parts cannot move from one neighborhood to another without over-capacitating a neighborhood. In the example, a state may capacitate two neighborhoods at once. But a part moving from one of these neighborhoods to the other must overcapacitate the other neighborhood. To resolve this dilemma, we develop an additional set of constraints.

It is first necessary to compute the set of strongly connected neighborhoods for each unreliable resource. To do this, for each unreliable resource, we construct a directed graph whose nodes are its neighborhoods and whose arcs join one neighborhood to another whenever, in operation, there will be part flow from the first to the second. We then compute the set of strong components of this graph using standard polynomial graph algorithms, such as those found in [12]. (Recall that a set of nodes in a directed graph is strongly connected if each pair of nodes in the set is mutually reachable. A singleton is, by definition, strongly connected. A strong component is a maximal set of mutually reachable nodes.) For example, two neighborhoods in the system of Fig. 1 are strongly connected, since, in operation, there is flow from each to the other. We collect the strongly connected components of this graph for each unreliable resource.
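The failure-dependency test, the neighborhood construction, and the capacity check described in this subsection can be sketched as below. This is a minimal illustration under stated assumptions: the function names, the `(part type, stage)` encoding, and the four-resource example are hypothetical, and the sketch does not cover the additional constraints over strongly connected components.

```python
def failure_dependent_sets(routes, unreliable):
    """Map each unreliable resource u to the resources failure dependent on u.

    A resource qualifies when every stage it supports has u in its residual route.
    """
    resources = {r for route in routes.values() for r in route}
    return {
        u: {r for r in resources
            if all(u in route[i:]
                   for route in routes.values()
                   for i, res in enumerate(route) if res == r)}
        for u in unreliable
    }


def neighborhoods(routes, fd_u):
    """Assign each stage to the first resource of fd_u in its residual route."""
    nh = {g: set() for g in fd_u}
    for p, route in routes.items():
        for k in range(1, len(route) + 1):
            first = next((r for r in route[k - 1:] if r in fd_u), None)
            if first is not None:
                nh[first].add((p, k))
    return nh


def neighborhood_capacity_ok(nh, capacity, parts_at):
    """True if no neighborhood holds more parts than its resource's capacity."""
    return all(sum(parts_at.get(s, 0) for s in stages) <= capacity[g]
               for g, stages in nh.items())


# Illustrative routes only (assumed, not the system of Fig. 1).
routes = {"p1": ("r1", "r2", "r3"), "p2": ("r3", "r4"), "p3": ("r4", "r1", "r2")}
fd = failure_dependent_sets(routes, {"r2"})
nh = neighborhoods(routes, fd["r2"])
```

In this example `r1` is failure dependent on `r2` because both stages it supports lead to `r2`, while `r4` is not, since part type `p2` finishes at `r4` without visiting `r2`.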
Then, is stated as follows: Hence, for every strongly connected component of guarantees that, at most, one neighborhood CHEW AND LAWLEY: ROBUST SUPERVISORY CONTROL FOR PRODUCTION SYSTEMS 315 can be capacitated at a time. In the example, we have the following: pendent part-type stages (those that do require failure-dependent resources in the residual route). Algorithm A1: Query: Is state admissible?

Input: state ; Output: ACCEPT/REJECT STEP 1 INITIALIZATION For For ; End For End For For End For For For End For For For For For For For End For For End For STEP 2 TEST AND EVALUATION While Find such that for all If no such exists Return REJECT Else guarantees that and are not simultaneously capacitated. guarantees that no To summarize, neighborhood is overcapacitated, and that neighborhoods with mutual ? ow dependencies are not simultaneously capacitated. The complete set of neighborhood constraints is de? ned as and Note that in the worst case, we generate one constraint for each . air of resources and, thus, the size of NHC is of Appendix B establishes several important properties of NHC. These properties are required to establish robustness of the two supervisors that we develop later. We now modify two deadlock avoidance policies that we use in conjunction with NHC to develop robust supervisors. B. Banker’s Algorithm In this subsection, we con? gure Banker’s Algorithm [1] (BA) to work with NHC. BA is perhaps the most widely known deadlock avoidance policy (DAP), and its underlying concepts have in? uenced the thinking of numerous researchers.

BA is a suboptimal DAP in the sense that it achieves computational tractability by sacri? cing some safe states. BA avoids deadlock by allowing an allocation only if the requesting processes can be ordered so that the terminal resource needs for the th process in the ordering can be met by pooling available resources and those re. The ordering leased by completed processes is essentially a sequence in which all processes in the system can complete successfully. BA is of O(mnlog n) where m is the number of resource types and n is the number of requests. Other manufacturing-related work that uses BA includes [21], [26], and [32].

Our modi? cations (algorithm A1) are straightforward and are a generalization of those found in [30]. Our objective is to search for an ordering of parts that advances failure-dependent parts (those requiring unreliable resources) into the resource of their current neighborhood, and nonfailure-dependent parts (those not requiring unreliable resources) out of the system. Again, the ordering is such that the resources required by the ? rst part are all available, those required by the second part are all available after the ? rst part has ? nished and released the resources held by the part, and so forth.

If the system can be cleared in this way (all failure-dependent parts are advanced into failure-dependent resources and all nonfailure-dependent parts are advanced out of the system), then we can guarantee that if any unreliable resource fails, the system can continue producing parts that do not require this failed resource. be the set of partIn the following, let type stages instantiated in whose parts hold nonfailure-depenis the set of nonfailure-dependent dent resources, where part-type stages (those that do not require failure-dependent reis the set of failure-desources in the residual route) and

End For End For End For and If End For End If End for End If End While Return ACCEPT Step 1 of the algorithm con? gures the data structures required. For every part-type stage represented in the system, these capture the current resource holding and the future processing need. These structures also capture the resource availability of resources in the state being tested. Three additional comments regarding the algorithm are in order. First, the need for every failure-dependent resource is explicitly set to zero, so this version looks only at the availability of nonfailure-dependent resources.

Second, for nonfailure-dependent part-type stages (those not requiring unreliable resources), the need for every resource in the residual route (except the one held) is set to one. Finally, for failure-dependent part-type stages (those requiring unreliable resources), the need for every resource in the residual route (except the one held) up to the one 316 IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 3, NO. 3, JULY 2006 Fig. 5. Admissible system state by supervisor 1. yields a robust superAppendix C formally establishes that visor for systems where every part type requires in its route, at most, one unreliable resource.

C. Single-Step Look Ahead Policy It is well known that certain system structures, such as a central buffer, input/output bins, and nonunit buffer capacities, eliminate the possibility of deadlock-free unsafe states [28]. In these systems, every state is either deadlock or safe and, therefore, a single-step look-ahead policy (SSL) is a correct and optimal deadlock avoidance policy. Further, it is of polynomial complexity and, thus, ideal for runtime applications in real systems. In the following, we will modify the SSL presented in [23] so that it works with systems with multiple unreliable resources.

A resource allocation graph (RAG) is a digraph that encodes the resource requests and allocations of parts , where [23]. For our purposes, let is the set of system nonfailure-dependent resource types and and is with . A subdigraph holding a part , is induced when and of RAG, say . A subdigraph, forms a knot in RAG if , where is the set of all nodes reachable from in RAG. In other words, a set of nodes forms a knot in RAG when, for every node in , the set of nodes reachable along arcs in RAG is exactly . Further, we de? ne a capacitated knot to be a knot in which every resource in the knot is ? led to capacity with parts requesting other resources in the knot. It is commonly known that a capacitated knot in RAG is a necessary and suf? cient condition for deadlock in these types of sequential resource allocation systems. We now provide an algorithm to detect a . This algorithm has capacitated knot in the same polynomial complexity as that given in [23]. Algorithm A2: . Input: Output: DEADLOCK, NO DEADLOCK. immediately preceding the ? rst encountered failure-dependent resource is set to one, all others are set to zero.

Note that these are the resources such a part will need to advance into the failure-dependent resource of its current neighborhood. Step 2 then executes the usual Banker’s logic. A1 is not correct by itself, since it does not handle allocation of failure-dependent resources (for detailed examples, the reader is referred to [30]). However, A1 and NHC together form a robust controller, that is, if we allow the system to visit only those states acceptable to both A1 and NHC, then the system operation will satisfy the requirements of De? nition 3. 1.

The detailed proofs for this are given in Appendix C. The supervisor is de? ned as follows: De? nition 4. 1: Supervisor . Supervisor permits a system state that satis? es both A1 and NHC in runtime. Consider Fig. 5, which illustrates a state, holds and ; and holds . It is say , in which clear, at , that there exists an admissible sequence by A1; that is, can advance into failure-dependent resource can can advance into failure-dependent resource ; and, ? nally, be advanced out of the system. In addition, satis? es NHC since there is one there is one Therefore, is an admissible state by .

Supervisor will prohibit, at , advancing into (where it becomes because and will block, causing the resulting state to into at is violate A1 although not NHC. Loading a since the resulting state violates , also precluded by although not A1. However, advancing one step into or into will result in an admissible state. loading a is of polynomial complexity since both A1 Supervisor and NHC require polynomial time for runtime implementation. CHEW AND LAWLEY: ROBUST SUPERVISORY CONTROL FOR PRODUCTION SYSTEMS 317 Step 1: Compute the set of strongly connected components . of RAG: such that Step 2:

Construct digraph and and for . Step 3: For every strongly connected component such that If is a capacitated knot Return DEADLOCK. End If End For Step 4: Return NO DEADLOCK. We note that, for our present work, this version of deadlock detection algorithm operates only on nonfailure-dependent resources and parts held by these resources. In A2, Step 1 computes the set of strongly connected components in RAG. As mentioned earlier, this is a standard digraph operation. Step 2 constructs a digraph that de? nes the reachability relationship between these components. Step 3 looks for a component with no outgoing arc.

If such a component is filled to capacity with parts requesting other resources in the component, then it is a capacitated knot, and deadlock exists. If no such capacitated knot exists, then the RAG is deadlock-free. Note that A2 is not correct by itself since it considers only the nonfailure-dependent resources. Failure-dependent resources can easily deadlock themselves. However, when A2 is taken in conjunction with NHC, it guarantees Property 3.1 and, thus, ensures that the system will continue to operate even when multiple unreliable resources are down.

Definition 4.2: Supervisor . Supervisor accepts a system state that contains no deadlock and satisfies NHC.

For example, in Fig. 1, suppose that every nonfailure-dependent resource has nonunit capacity; that is, . Then, A2 permits any state in which no subset of parts residing on is deadlocked on . If the state also satisfies NHC, then the properties of Definition 3.1 are guaranteed. Note that is suited for real-time implementation since A2 and NHC are both of polynomial complexity. Appendix D formally establishes that yields a robust supervisor for systems where every part type requires in its route, at most, one unreliable resource.

V. CONCLUSION

In this paper, we developed supervisory controllers to allocate resources that may fail so that robust deadlock-free operation is possible at all times in flexibly automated manufacturing systems. In other words, every part type that does not require any of the failed resources in its processing may continue to be produced at any time. We established a set of desired properties for a supervisory controller robust to resource failures. This controller constrains a system to states that will result in feasible initial states from which deadlock-free operation is possible for a perturbed system if a failure or a repair event occurs.

We developed a modified version of Banker’s Algorithm, A1, and combined it with a set of neighborhood constraints, NHC, as a policy to restrict the number and distribution of parts in a system. The conjunction of these two policies was shown to satisfy the desired properties and, hence, is capable of ensuring robust deadlock-free operation given that every part requires in its processing, at most, one unreliable resource. We also developed another robust control policy for this subclass of systems, the conjunction of a single-step look-ahead policy A2 and NHC.

Future work may look into ensuring robust deadlock-free operation for systems where every part type may require multiple unreliable resources. Another direction may consider systems where resources degrade over time.

APPENDIX

A. Definition of Formal Properties

This section formally develops a set of desired properties for a supervisory controller that is robust to multiple resource failures. Recall that our system is S . Let S be the uncontrolled language generated by S. Further, for a string S and an event , let be the score (number of occurrences) of in .

The state transition function is extended in the usual way; that is, for S leading from state to , and , we have . The controlled language S represents the behavior exhibited by the system S under the control of supervisor . Here, is a function mapping S to the power set of . Specifically, for S such that with , S is the control action for S at state . S is only allowed to execute an event of . Hence, under , is “admissible,” while is “inadmissible.” Note that must be in since it is assumed that ; that is, is not allowed to disable any enabled uncontrollable events at state .

Now, let be the set of unreliable resources such that each is failed in the current state (in ). Then, the event set of S will have been reconfigured; in fact, we will say that we now have a modified events generator S , where S bounds the occurrences of events in the set . The set represents the part moves and completions for all stages of part types requiring a failed resource of . Let S be the uncontrolled language of S ; that is, the language of S given that resource failures or repairs do not occur. Then, S S is the controlled language of S under supervisor , thus . Let be the set of system states generated by S . Note that is the set of initial states for S . To understand what is, for a failed resource set , define and as subsets of that satisfy the following properties: and and . In words, is a subset of unreliable resources that contains and has exactly one additional failed resource, while is a subset of and has exactly one less failed resource. Thus, if a system S has a failed resource set with , and the repair event occurs in state , then the resulting state will be the initial state of the system S .

Similarly, if a system S has a failed resource set with , and the failure event occurs in state , then the resulting state will be the initial state of the system S . Thus, we write the following: Note that . The following desired properties of supervisory controller can now be defined. The first property is that a supervisory controller must ensure safety for the system S . That is, given that further resource failures or repairs do not occur, must guarantee that S continues to produce all part types not requiring failed resources.

More formally, must guarantee that S and (natural numbers), u S such that is a prefix of and , then we have . In short, in the absence of any further resource failures or repairs, must ensure that S can continue to produce every part type not requiring any of the resources of . Thus, in the example of Fig. 1, S must be able to execute indefinitely all events in . S must be able to execute indefinitely all events in . S must be able to execute indefinitely all events in . And, finally, S must be able to execute indefinitely all events in .

The second property is that a supervisory controller must at all times constrain S to states that serve as feasible initial states for a reduced system if a failure event occurs. Suppose fails while processing a part. Let S such that (with , and ). Then supervisory controller must ensure safety for the system starting from state . Note that Fig. 2 illustrates a safe state from which S is unable to indefinitely execute events in , while Fig. 3 illustrates a safe state from which S is unable to indefinitely execute events in .

The third property is that a supervisory controller must constrain the system to states that will result in feasible initial states for an upgraded system if a repair event occurs. Suppose that a failed unreliable resource is repaired. Let S such that (with , and ). Then, supervisory controller must ensure safety for the system S starting from state . Fig. 4 illustrates a state for which S cannot indefinitely execute events in , namely , when the repair event for occurs in S .

The following definition of a robust supervisory controller is now possible.

Definition A.1: Supervisory controller is operationally robust to failures/repairs of unreliable resources of in system S if:
A.1.1) S and S such that is a prefix of and , we have ;
A.1.2) such that must guarantee A.1.1 for S starting from ;
A.1.3) such that must guarantee A.1.1 for S starting from state .

The above definition can be summarized as follows: A.1.1 requires that the supervisory controller ensure continuing production of part types not requiring failed resources, given that additional failures/repairs do not occur; A.1.2 requires that the supervisory controller allow only those states that serve as feasible initial states if an additional resource failure occurs; and A.1.3 requires that the supervisory controller allow only those states that serve as feasible initial states if a failed resource is repaired and becomes operational. The objective of this research is to develop supervisory controllers that satisfy these properties. Such a controller will be capable of ensuring robust operation in the face of multiple resource failures or repairs without manual intervention. We note that we require these policies to be polynomial in resources and part types.

B. Properties of NHC

We now develop properties of NHC that are needed to establish that NHC together with some appropriately configured DAP will be robust to failure of any subset of the unreliable resources. In the remainder of the work, we will assume that part types require no more than one unreliable resource. This is necessary to guarantee that neighborhoods do not intersect and, thus, the advancement of a part between neighborhoods cannot simultaneously violate the constraints of more than one. We now present and briefly discuss NHC properties. Property B.1 simply states that if is in some neighborhood, then , when it exists, will also be in some neighborhood.

Property B.1: If with , then .
Proof: Suppose for , then . Suppose , then .

Property B.2 states that if is not in any neighborhood, then , when it exists, will not be in any neighborhood.

Property B.2: If with , then .
Proof: Suppose that . Then, by Property B.1. But this contradicts the hypothesis.

Properties B.3, B.4, and B.5 establish that if every part type requires at most one unreliable resource, then all neighborhoods are nonintersecting.

Property B.3: Let be the set of unreliable resources required by . If , then .
Proof: Let , and let . Since , it follows that , which implies . That is, the route of contains both unreliable resources and , a contradiction.

Property B.4: Suppose and . If .
Proof: Let . Since , for some , there exists a sequence with and . Since , for some , there exists a sequence with and . Without loss of generality, let . Thus, , which implies .

Property B.5: Suppose , and . If and .
Proof: Let . Since , for some , there exists a sequence with and . Since , for some , there exists a sequence with and . Without loss of generality, let . Thus, , which implies .

Properties B.6-B.9 establish conditions under which advancing parts do not cause violations of NHC. Property B.6 establishes that if the system is in a state that satisfies NHC, and we advance a part not requiring any unreliable resource, then the new state also satisfies NHC.

Property B.6: Suppose , and let be a state that satisfies NHC such that for . Then, the state also satisfies NHC.
Proof: If , then , the variables and appear in no constraint of NHC. Thus, decrementing by one and incrementing by one (the difference between states and ) does not affect the constraints of NHC.

Property B.7 establishes that for systems where every part type requires, at most, one unreliable resource, if the system is in a state that satisfies NHC, and we advance a part within a neighborhood, then the resulting state does not violate NHC.

Property B.7: Suppose and . Let be a state that satisfies NHC such that . Then, the state also satisfies NHC.
Proof: Let . Then, the corresponding state variables of appear in the constraint and, for some m, in the constraints of , and nowhere else in NHC. Since , executing the event does not alter the left-hand sum of or of any constraint of . Thus, if satisfies NHC, so does .

Property B.8 establishes conditions for advancing parts between neighborhoods that have mutual flow dependencies (that is, they occur in the same strongly connected component of ). The result is that if all neighborhoods in the strongly connected component are under capacitated, then advancing a part from one to another will not violate NHC. If one is capacitated (as is allowed by NHC), then advancing a part out of it and into another (which, by NHC, must be under capacitated) will not violate NHC.

Property B.8: Suppose and such that and with . Let be a state that satisfies NHC such that . If either (1) or (2) , then satisfies NHC.
Proof: Note that the execution of decrements by one and increments by one. Also, if satisfies NHC, then either (1) , or (2) such that and . Finally, if violates NHC, then at , we have either (i) such that > or (ii) such that and . Suppose (1) is true in . (i) cannot be true in , since incrementing by one cannot yield . Further, (ii) cannot be true in since incrementing by one cannot yield two filled neighborhoods where none existed in . Suppose (2) is true in . (i) cannot be true in since in . (ii) cannot be true since in . Since all cases have been examined, the property holds.
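The two constraint sets that Property B.8 reasons about can be checked mechanically. The following sketch is illustrative only. The function `satisfies_nhc`, the per-neighborhood part counts, and the grouping of neighborhoods by strongly connected component are assumed data structures, not notation from the paper. It encodes the two conditions as stated in the text: no neighborhood is over capacity, and within any group of neighborhoods with mutual flow dependencies at most one is filled to capacity.

```python
def satisfies_nhc(count, capacity, groups):
    """Check the two NHC constraint sets on a state.

    count:    neighborhood -> number of parts currently in it (assumed layout)
    capacity: neighborhood -> maximum number of parts it may hold
    groups:   iterable of sets of neighborhoods with mutual flow dependencies
    """
    # First set: no neighborhood is ever over capacity.
    if any(count[n] > capacity[n] for n in capacity):
        return False
    # Second set: within each group, at most one neighborhood is filled.
    for group in groups:
        filled = sum(1 for n in group if count[n] == capacity[n])
        if filled > 1:
            return False
    return True


def advance(count, source, target):
    """Return the part counts after moving one part from source to target."""
    moved = dict(count)
    moved[source] -= 1
    moved[target] += 1
    return moved
```

For two mutually dependent neighborhoods of capacity two holding two parts and one part, moving a part out of the filled neighborhood into the other keeps the state admissible, which matches case (2) of Property B.8. Moving a part in the opposite direction would overfill the first neighborhood and is rejected.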

Property B.9 establishes that advancing a part out of a neighborhood will not violate NHC if the next stage of the part is not in any neighborhood.

Property B.9: Suppose and . Let be a state that satisfies NHC such that . Then, the state satisfies NHC.
Proof: Executing the event decrements and increments . does not appear in any constraint of NHC, so incrementing cannot violate NHC. Clearly, decrementing does not cause a violation of NHC.

We have now developed the constraints of NHC and established several important properties. To summarize, NHC constraints come in two sets. The first set ensures that no neighborhood is ever overcapacitated and the second ensures that within any set of neighborhoods with mutual flow dependencies, only one will be capacitated at a time. After developing these constraints, we established that when assuming each part type requires, at most, one unreliable resource, neighborhoods do not intersect. We also established that if a state satisfies NHC, then certain types of event executions (part moves) lead to other states that again satisfy NHC.

C. Properties of Supervisor

This section establishes that satisfies Definition A.1. To begin, let such that . Note that is a dimensional vector that encodes the number of units of each resource type allocated in state . We refer to as terminal if . Let TQ be the set of terminal states and TQ be the set of terminal states admitted by NHC. If , then is a state for which all parts in the system reside in failure-dependent resources. If , then also satisfies NHC.

Lemma C.1: If , then satisfies A1.
Proof: For such that in A1. Thus, A1 is trivially satisfied.

Lemma C.2: Assume that . Let S , such that .

Then, such that and . Further, .
Proof: Let represent the set of parts instantiated in , where is the set of parts in neighborhoods, and is the set of parts not in any neighborhoods. Since and satisfies both A1 and NHC, there exists a sequence of events, say , that advances all parts in out of the system and all parts in into failure-dependent resources. Thus, . Note that since every state encountered in the execution of satisfies A1 trivially, and every state encountered in the execution of satisfies NHC by Properties B.6 (for parts not in neighborhoods) and B.7 (for parts in neighborhoods). Finally, note that need not have failures or repairs and, thus, is possible.

Lemma C.3: Assume that . Let S such that with . Then, such that and S.
Proof: Recall that is the set of failed resources in . We illustrate as follows: First, note that if , then (the repair event for is enabled). Then, for , let . Thus, in the sequence we are illustrating, all repair events occur first. Now, recall that is a directed graph with and if , and is the set of strongly connected components of this digraph. Consider the digraph where . Note that, by definition of strongly connected components, is acyclic and can therefore be topologically ordered.

That is, we can impose a complete ordering that satisfies the following: if is reachable from in , then . Let be the th ordered component of (that is, all nodes in reachable from have an order exceeding , and all nodes in from which is reachable have order less than ). Select the smallest such that has parts occupying its neighborhoods in , but for any , has no part occupying its neighborhoods in . There are two possibilities for :

First, if has a capacitated neighborhood, advance any part in that neighborhood one step in its route. To see that this is possible, suppose that it is not. Then, no part of the capacitated neighborhood can advance, each being blocked by a capacitated failure-dependent resource since . In a terminal state, parts only occupy failure-dependent resources and, thus, these capacitated resources represent capacitated neighborhoods. These neighborhoods are not in , since this would contradict the assumption that, at most, one neighborhood of is capacitated. These neighborhoods are not in , since this contradicts the topological ordering of . Thus, these neighborhoods must be in . But this contradicts our choice of ; that is, we chose the smallest such that holds parts and for , holds no part in .

Second, if has no capacitated neighborhood, advance any part in any neighborhood of one step in its route. This is possible by reasoning similar to that above.

Now let be the part-type stage of the part advanced. There are three cases for : Case 1) ; Case 2) ; Case 3) (recall ). If , then Property B.8 ensures that does not violate NHC. If , does not violate NHC since no neighborhood of holds parts in and, thus, neighborhoods of hold only one part in . If is in no neighborhood, by Property B.9, the instance of can be advanced one step without violating NHC.

In Cases 1 and 2, if , advance the instance of until it reaches . This is possible since are all available in and since is not capacitated in . By Property B.7, all states visited during this sequence of events, call it , satisfy NHC, and the final state satisfies . For Case 3, every resource of is available in since by Property B.2 and, thus, the instance of can be advanced out of S. By Property B.6, all states visited during this sequence of events, again call it , satisfy NHC, and the final state satisfies . Then, for all cases, by Lemma C.1, satisfies A1, and if is any prefix of , satisfies A1 since . After advancing one part through these steps, we have the event sequence .

We can repeat these steps a finite number of times and complete all parts without any violation of either A1 or NHC. Let be the resulting string of events. Then, and, by construction, S.

We now show that satisfies the properties of Definition A.1 for systems where every part type has, at most, one unreliable resource in its route (the frequency of the unreliable resource in a part type’s route is not restricted). First, Lemma C.4 establishes that enforces property A.1.1.

Lemma C.4: Assume that . Then, S and S such that is a prefix of and , we have .
Proof: Suppose that S is in state , and satisfies . Recall that is the set of failed resources of S in . Also, recall that S is the set of all possible event sequences generated by S , starting from states in and under the supervision of , that do not contain repair or failure events. We need to establish that any sequence of events in S can be extended in S to include any desired number of occurrences of events in , where (i.e., part advancement and service completion events associated with resources that are operational). We will show this in two steps.

In Step 1), we show a sequence leading to a terminal state and in Step 2), we extend this sequence to satisfy Definition A.1.1.

Step 1) Let and S such that . By Lemma C.2, such that and S . Note that (recall that is the set of parts in that are not in neighborhoods). Further, , where is the set of parts requiring operational unreliable resources and is the set of parts requiring failed unreliable resources. By the logic of Lemma C.3, we can legally advance all parts in out of S . Let the corresponding sequence of events be . Clearly, S ; and, in , every resource in is empty and idle.

Step 2) Select any natural number, say n. Select an event of , say . Execute the following sequence of events times . Note that represents loading a single instance of into the system and advancing it to completion. This is clearly allowed under and requires only events in . Then, our original string is extended under to S (where is repeated times). The score of in this string is no less than . Repeat this for all other events of and call the resulting string . Clearly, S , is a prefix of , and ; then we have .

Next, Lemma C.5 establishes that enforces property A.1.2; that is, constrains a system to states that will result in feasible initial states for a further reduced system.

Lemma C.5: Assume that . If such that ( enables the failure event, , for ), then is a feasible initial state for S ; that is, guarantees Property A.1.1 for S , starting from state .
Proof: Since satisfies , also satisfies since neither A1 nor NHC is affected by the status of unreliable resources. Thus, Step 1) of Lemma C.4 remains valid, yielding . Then, letting , we apply Step 2) of Lemma C.4.

Finally, Lemma C.6 establishes that enforces property A.1.3; that is, constrains a system to states that will result in feasible initial states for an upgraded system.

Lemma C.6: Assume that . If such that ( enables the repair event, , for ), then is a feasible initial state for S ; that is, guarantees Property A.1.1 for S , starting from .
Proof: Since satisfies , also satisfies since neither A1 nor NHC is affected by the status of unreliable resources. Thus, Step 1 of Lemma C.4 remains valid, yielding . Then, letting , we apply Step 2 of Lemma C.4.

The following theorem is now obvious.

Theorem C.1: Suppose that . Supervisor is robust to failure of .
Proof: Follows directly from Definition A.1 and Lemmas C.4–C.6.

Theorem C.1 establishes that is robust to resource failures in the system given that every part type requires, at most, one unreliable resource in its route. This ensures that if any subset of unreliable resources fails, the system can continue to produce those part types not requiring failed resources. Also, when a failed resource is repaired, the system will be in a feasible state to begin producing those additional part types that it supports.

No system reconfiguration is required and no special sequences have to be executed. In fact, computes sequences just to show their existence, not to force the system to follow. These results generalize those presented in [30], which considers only a single unreliable resource.

D. Properties of Supervisor

The following lemmas establish that is robust to failure of multiple unreliable resources. These lemmas follow the structure of those for .

Lemma D.1: If , then satisfies SSL.
Proof: For such that , every nonfailure-dependent resource is empty and idle. Thus, SSL is trivially satisfied.

Lemma D.2: Assume that . Let such that . Then, such that and . Further, .
Proof: Recall that represents the set of parts instantiated in , where is the set of parts in neighborhoods, and is the set of parts not in any neighborhoods. Since satisfies both SSL and NHC, there is no deadlock involving parts on nonfailure-dependent resources. Thus, there exists a part, say , held by a nonfailure-dependent resource that can be advanced one step in its route, such that satisfies both SSL (by the optimality of SSL) and NHC by Property B.7. It is obvious that a finite number of iterations of the above step eventually moves every part of out of S and moves every part of into failure-dependent resources. Suppose that the corresponding sequence of events is . Clearly, we have such that and .

Lemma D.3: Assume that . Let such that with . Then, such that and .
Proof: The proof of Lemma C.3 holds with only minor modifications; in other words, in the next-to-last paragraph of the proof of Lemma C.3, replace the reference to Lemma C.1 with Lemma D.1.

We now show that satisfies the properties of Definition A.1 for systems where SSL is a correct deadlock avoidance policy and each part type has, at most, one unreliable resource in its route (again, the frequency of the unreliable resource in a part type’s route is not restricted).

First, Lemma D.4 establishes that enforces property A.1.1.

Lemma D.4: Assume that . Then, S and S such that is a prefix of and , we have .
Proof: The proof of Lemma C.4 holds with only minor modifications; that is, in Step 1, replace the reference to Lemma C.2 with Lemma D.2.

Next, Lemma D.5 establishes that enforces property A.1.2; that is, constrains a system to states that will result in feasible initial states for a further reduced system.

Lemma D.5: Assume that . If such that ( enables the failure event, , for ), then is a feasible initial state for ; that is, guarantees Property A.1.2 for , starting from state .
Proof: The proof of Lemma C.5 holds with minor, obvious modifications.

Finally, Lemma D.6 establishes that enforces property A.1.3; that is, constrains a system to states that will result in feasible initial states for an upgraded system.

Lemma D.6: Assume that . If such that ( enables the repair event , for ), then is a feasible initial state for S ; that is, guarantees Property A.1.3 for , starting from .
Proof: The proof of Lemma C.6 holds with minor, obvious modifications.

The following theorem is now readily available.

Theorem D.1: Suppose that . Supervisor is robust to failure of R .
Proof: Follows directly from Definition A.1 and Lemmas D.4–D.6.

We have now established that is a robust supervisor for systems with multiple unreliable resources, under the assumption that every part type requires, at most, one unreliable resource in its route.

REFERENCES

[1] A. Habermann, “Prevention of system deadlocks,” Commun. ACM, vol. 12, pp. 373–377, 1969.
[2] R. Holt, “Some deadlock properties of computer systems,” ACM Comput. Surv., vol. 4, pp. 179–196, 1972.
[3] P. Ramadge and W. Wonham, “Supervisory control of a class of discrete event processes,” SIAM J. Control Optim., vol. 25, no. 1, pp. 206–230, 1987.
[4] Z. Banaszak and E. Roszkowska, “Deadlock avoidance in pipeline concurrent processes,” Podstawy Sterowania (Foundations of Control), vol. 18, pp. 3–17, 1988.
[5] Z. Banaszak and B. Krogh, “Deadlock avoidance in flexible manufacturing systems with concurrently competing process flows,” IEEE Trans. Robot. Autom., vol. 6, no. 6, pp. 724–734, Dec. 1990.
[6] N. Viswanadham, Y. Narahari, and T. Johnson, “Deadlock prevention and deadlock avoidance in flexible manufacturing systems using petri net models,” IEEE Trans. Robot. Autom., vol. 6, no. 6, pp. 713–723, Dec. 1990.
[7] R. Wysk, N. Yang, and S. Joshi, “Detection of deadlocks in flexible manufacturing cells,” IEEE Trans. Robot. Autom., vol. 7, no. 6, pp. 853–859, Dec. 1991.
[8] Y. Leung and G. Sheen, “Resolving deadlocks in flexible manufacturing cells,” J. Manuf. Syst., vol. 12, no. 4, pp. 291–304, 1993.
[9] E. Roszkowska and J. Jentink, “Minimal restrictive deadlock avoidance in FMS’s,” in Proc. Eur. Control Conf., vol. 2, 1993, pp. 530–534.
[10] T. Kumaran, W. Chang, H. Cho, and R. Wysk, “A structured approach to deadlock detection, avoidance and resolution in flexible manufacturing systems,” Int. J. Prod. Res., vol. 32, no. 10, pp. 2361–2379, 1994.
[11] F. Hsieh and S. Chang, “Dispatching-driven deadlock avoidance controller synthesis for flexible manufacturing systems,” IEEE Trans. Robot. Autom., vol. 10, no. 2, pp. 196–209, Apr. 1994.
[12] T. Cormen, C. Leiserson, and R. Rivest, Introduction to Algorithms. New York: McGraw-Hill, 1994.
[13] J. Ezpeleta, J. Colom, and J. Martinez, “A Petri-net based deadlock prevention policy for flexible manufacturing systems,” IEEE Trans. Robot. Autom., vol. 11, no. 2, pp. 173–185, Apr. 1995.
[14] M. Zhou, “Deadlock avoidance methods for a distributed robotic system: Petri net modeling and analysis,” J. Robot. Syst., vol. 12, no. 3, pp. 177–187, 1995.
[15] K. Xing, B. Hu, and H. Chen, “Deadlock avoidance policy for Petri net modeling of flexible manufacturing systems with shared resources,” IEEE Trans. Autom. Control, vol. 41, no. 2, pp. 289–296, Feb. 1996.
[16] M. Lawley, S. Reveliotis, and P. Ferreira, “Design guidelines for deadlock handling strategies in flexible manufacturing systems,” Int. J. Flexible Manuf. Syst., vol. 9, pp. 5–30, 1997.
[17] M. Fanti, B. Maione, S. Mascolo, and B. Turchiano, “Event-based feedback control for deadlock avoidance in flexible production systems,” IEEE Trans. Robot. Autom., vol. 13, no. 3, pp. 347–734, Jun. 1997.
[18] M. Lawley, S. Reveliotis, and P. Ferreira, “FMS structural control and the neighborhood policy: Parts 1 and 2,” IIE Trans., vol. 29, no. 10, pp. 877–899, 1997.
[19] S. Reveliotis, M. Lawley, and P. Ferreira, “Polynomial complexity deadlock avoidance policies for sequential resource allocation systems,” IEEE Trans. Autom. Control, vol. 42, no. 10, pp. 1344–1357, Oct. 1997.
[20] M. Fanti, B. Maione, and B. Turchiano, “Event control for deadlock avoidance in production systems with multiple capacity resources,” Studies Inf. Control, vol. 7, no. 4, pp. 343–364, Dec. 1998.
[21] M. Lawley, S. Reveliotis, and P. Ferreira, “Application and evaluation of Banker’s algorithm for deadlock-free buffer space allocation in flexible manufacturing systems,” Int. J. Flexible Manuf. Syst., vol. 10, pp. 73–100, 1998.
[22] M. Lawley, S. Reveliotis, and P. Ferreira, “A correct and scalable deadlock avoidance policy for flexible manufacturing systems,” IEEE Trans. Robot. Autom., vol. 14, no. 5, pp. 796–809, Oct. 1998.
[23] M. Lawley, “Deadlock avoidance for production systems with flexible routing,” IEEE Trans. Robot. Autom., vol. 15, no. 3, pp. 497–510, Jun. 1999.
[24] S. Reveliotis, “Accommodating FMS operational contingencies through routing flexibility,” IEEE Trans. Robot. Autom., vol. 15, no. 1, pp. 3–19, Feb. 1999.
[25] S. Park and J. Lim, “Fault-tolerant robust supervisor for discrete event systems with model uncertainty and its application to a workcell,” IEEE Trans. Robot. Autom., vol. 15, no. 2, pp. 386–391, Apr. 1999.
[26] S. A. Reveliotis, “Conflict resolution in AGV systems,” IIE Trans., vol. 32, no. 7, pp. 647–659, 2000.
[27] F. Hsieh, “Reconfigurable fault tolerant deadlock avoidance controller synthesis for assembly production processes,” in Proc. IEEE Conf. Man, Syst. Cybern., Nashville, TN, 2000, pp. 3045–3050.
[28] M. Lawley and S. Reveliotis, “Deadlock avoidance for sequential resource allocation systems: Hard and easy cases,” Int. J. Flexible Manuf. Syst., vol. 13, no. 4, pp. 385–404, Oct. 2001.
[29] N. Wu and M. Zhou, “Avoiding deadlock and reducing starvation and blocking in automated manufacturing systems,” IEEE Trans. Robot. Autom., vol. 17, no. 5, pp. 658–669, Oct. 2001.
[30] M. Lawley and W. Sulistyono, “Robust supervisory control policies for manufacturing systems with unreliable resources,” IEEE Trans. Robot. Autom., vol. 18, no. 3, pp. 346–359, Jun. 2002.
[31] M. Lawley, “Control of deadlock and blocking for production systems with unreliable resources,” Int. J. Prod. Res., vol. 40, no. 17, pp. 4563–4582, 2002.
[32] J. Ezpeleta, F. Tricas, F. Garcia-Valles, and