# dissertation available

Subject: dissertation available
From: Wei Zhang (zhangw@redwood.rt.cs.boeing.com)
Date: Tue Apr 21 1998 - 19:45:56 MET DST

Dear colleagues,

My dissertation on scheduling now is available at neuroprose:

Here are the title and abstract. Thanks.

Reinforcement Learning for Job-Shop Scheduling

Wei Zhang
Oregon State University
Department of Computer Science

May 1996

173 pages, double side.

Abstract.

This dissertation studies applying reinforcement learning algorithms
to discover good domain-specific heuristics automatically for job-shop
scheduling. It focuses on the NASA space shuttle payload processing
problem. The problem involves scheduling a set of tasks to satisfy a
set of temporal and resource constraints while also seeking to
minimize the total length (makespan) of the schedule.

The approach described in the dissertation employs a repair-based
scheduling problem space that starts with a critical-path schedule and
incrementally repairs constraint violations with the goal of finding a
short conflict-free schedule. The temporal difference (TD) learning
algorithm $TD(\lambda)$ is applied to train a neural network to learn
a heuristic evaluation function for choosing repair actions over
schedules. This learned evaluation function is used by a one-step
lookahead search procedure to find solutions to new scheduling
problems.

Several important issues that affect the success and the efficiency of
learning have been identified and deeply studied. These issues
include schedule representation, network architectures, and learning
strategies. A number of modifications to the $TD(\lambda)$ algorithm
are developed to improve learning performance. Learning is
investigated based on both hand-engineered features and raw features.
For learning from raw features, a time-delay neural network
architecture is developed to extract features from irregular-length
schedules. The learning approach is evaluated on synthetic problems
and on problems from a NASA space shuttle payload processing task. The
evaluation function is learned on small problems and then applied to
solve larger problems. Both learning-based schedulers (using
hand-engineered features and raw features respectively) perform better
than the best existing algorithm for this task---Zweben's iterative
repair method.

It is important to understand why TD learning works in this
application. Several performance measures are employed to investigate
learning behavior. We verified that TD learning works properly in
capturing the evaluation function. It is concluded that TD learning
along with a set of good features and a proper neural network is the
key to this success. The success shows that reinforcement learning
methods have the potential for quickly finding high-quality solutions
to other combinatorial optimization problems.

#=====================================================================#
| Dr. Wei Zhang | ___ ___ ___ ___ ___ |
| Computer Scientist | /__// //__ / /\ // _ |
| Adaptive Systems | /__//__//__ _/_ / //__/ |
| Applied Research & Technology | |
| | P.O. Box 3707, M/S 7L-66 |
| Voice: (425) 865-2602 | Seattle, WA 98124-2207 |
| FAX: (425) 865-2964 | -- or for ground express mail -- |
| | 2710 160th Ave. S.E., Bldg. 33-07 |
| zhangw@redwood.rt.cs.boeing.com | Bellevue, WA 98008 |
#=====================================================================#

This archive was generated by hypermail 2b25 : Fri Mar 03 2000 - 16:17:14 MET