ARIANE 5 Failure - Full Report

Paris, 19 July 1996

ARIANE 5

Flight 501 Failure

Report by the Inquiry Board

The Chairman of the Board :

Prof. J. L. LIONS

FOREWORD

On 4 June 1996, the maiden flight of the Ariane 5 launcher ended in a failure. Only about 40 seconds after initiation of the flight sequence, at an altitude of about 3700 m, the launcher veered off its flight path, broke up and exploded. Engineers from the Ariane 5 project teams of CNES and Industry immediately started to investigate the failure. Over the following days, the Director General of ESA and the Chairman of CNES set up an independent Inquiry Board and nominated the following members :

- Prof. Jacques-Louis Lions (Chairman), Académie des Sciences (France)
- Dr. Lennart Lübeck (Vice-Chairman), Swedish Space Corporation (Sweden)
- Mr. Jean-Luc Fauquembergue, Délégation Générale pour l'Armement (France)
- Mr. Gilles Kahn, Institut National de Recherche en Informatique et en Automatique (INRIA), (France)
- Prof. Dr. Ing. Wolfgang Kubbat, Technical University of Darmstadt (Germany)
- Dr. Ing. Stefan Levedag, Daimler Benz Aerospace (Germany)
- Dr. Ing. Leonardo Mazzini, Alenia Spazio (Italy)
- Mr. Didier Merle, Thomson CSF (France)
- Dr. Colin O'Halloran, Defence Evaluation and Research Agency (DERA), (U.K.)

The terms of reference assigned to the Board requested it

The Board started its work on 13 June 1996. It was assisted by a Technical Advisory Committee composed of :

- Dr Mauro Balduccini (BPD)
- Mr Yvan Choquer (Matra Marconi Space)
- Mr Remy Hergott (CNES)
- Mr Bernard Humbert (Aerospatiale)
- Mr Eric Lefort (ESA)

In accordance with its terms of reference, the Board concentrated its investigations on the causes of the failure, the systems supposed to be responsible, any failures of similar nature in similar systems, and events that could be linked to the accident. Consequently, the recommendations made by the Board are limited to the areas examined. The report contains the analysis of the failure, the Board's conclusions and its recommendations for corrective measures, most of which should be undertaken before the next flight of Ariane 5. There is in addition a report for restricted circulation in which the Board's findings are documented in greater technical detail. Although it consulted the telemetry data recorded during the flight, the Board has not undertaken an evaluation of those data. Nor has it made a complete review of the whole launcher and all its systems.

This report is the result of a collective effort by the Commission, assisted by the members of the Technical Advisory Committee.

We have all worked hard to present a very precise explanation of the reasons for the failure and to make a contribution towards the improvement of Ariane 5 software. This improvement is necessary to ensure the success of the programme.

The Board's findings are based on thorough and open presentations from the Ariane 5 project teams, and on documentation which has demonstrated the high quality of the Ariane 5 programme as regards engineering work in general and completeness and traceability of documents.

Chairman of the Board

1. THE FAILURE

1.1 GENERAL DESCRIPTION

On the basis of the documentation made available and the information presented to the Board, the following has been observed:

The weather at the launch site at Kourou on the morning of 4 June 1996 was acceptable for a launch that day, and presented no obstacle to the transfer of the launcher to the launch pad. In particular, there was no risk of lightning since the strength of the electric field measured at the launch site was negligible. The only uncertainty concerned fulfilment of the visibility criteria.

The countdown, which also comprises the filling of the core stage, went smoothly until H0-7 minutes when the launch was put on hold since the visibility criteria were not met at the opening of the launch window (08h35 local time). Visibility conditions improved as forecast and the launch was initiated at H0 = 09h 33mn 59s local time (= 12h 33mn 59s UT). Ignition of the Vulcain engine and the two solid boosters was nominal, as was lift-off. The vehicle performed a nominal flight until approximately H0 + 37 seconds. Shortly after that time, it suddenly veered off its flight path, broke up, and exploded. A preliminary investigation of flight data showed:

The origin of the failure was thus rapidly narrowed down to the flight control system and more particularly to the Inertial Reference Systems, which obviously ceased to function almost simultaneously at around H0 + 36.7 seconds.

1.2 INFORMATION AVAILABLE

The information available on the launch includes:

The whole of the telemetry data received in Kourou was transferred to CNES/Toulouse where the data were converted into parameter over time plots. CNES provided a copy of the data to Aerospatiale, which carried out analyses concentrating mainly on the data concerning the electrical system.

1.3 RECOVERY OF MATERIAL

The self-destruction of the launcher occurred near to the launch pad, at an altitude of approximately 4000 m. Therefore, all the launcher debris fell back onto the ground, scattered over an area of approximately 12 km² east of the launch pad. Recovery of material proved difficult, however, since this area is nearly all mangrove swamp or savanna.

Nevertheless, it was possible to retrieve from the debris the two Inertial Reference Systems. Of particular interest was the one which had worked in active mode and stopped functioning last, and for which, therefore, certain information was not available in the telemetry data (provision for transmission to ground of this information was confined to whichever of the two units might fail first). The results of the examination of this unit were very helpful to the analysis of the failure sequence.

1.4 UNRELATED ANOMALIES OBSERVED

Post-flight analysis of telemetry has shown a number of anomalies which have been reported to the Board. They are mostly of minor significance and such as to be expected on a demonstration flight.

One anomaly which was brought to the particular attention of the Board was the gradual development, starting at H0 + 22 seconds, of variations in the hydraulic pressure of the actuators of the main engine nozzle. These variations had a frequency of approximately 10 Hz.

There are some preliminary explanations as to the cause of these variations, which are now under investigation.

After consideration, the Board has formed the opinion that this anomaly, while significant, has no bearing on the failure of Ariane 501.

2. ANALYSIS OF THE FAILURE

2.1 CHAIN OF TECHNICAL EVENTS

In general terms, the Flight Control System of the Ariane 5 is of a standard design. The attitude of the launcher and its movements in space are measured by an Inertial Reference System (SRI). It has its own internal computer, in which angles and velocities are calculated on the basis of information from a "strap-down" inertial platform, with laser gyros and accelerometers. The data from the SRI are transmitted through the databus to the On-Board Computer (OBC), which executes the flight program and controls the nozzles of the solid boosters and the Vulcain cryogenic engine, via servovalves and hydraulic actuators.

In order to improve reliability there is considerable redundancy at equipment level. There are two SRIs operating in parallel, with identical hardware and software. One SRI is active and one is in "hot" stand-by, and if the OBC detects that the active SRI has failed it immediately switches to the other one, provided that this unit is functioning properly. Likewise there are two OBCs, and a number of other units in the Flight Control System are also duplicated.
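As an illustration only, the switch-over between the active and the stand-by SRI described above might be sketched as follows in Python. The names Sri, select_sri and feed_same_input, and the numeric values, are hypothetical and are not taken from the flight software. The sketch simply shows why duplicated equipment protects against a fault in one unit but not against a fault that both units share.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sri:
    """Hypothetical stand-in for one Inertial Reference System unit."""
    name: str
    healthy: bool = True


def select_sri(active: Sri, standby: Sri) -> Optional[Sri]:
    """Return the unit whose data should be used, or None if both have failed."""
    if active.healthy:
        return active
    if standby.healthy:
        return standby
    return None


def feed_same_input(units: List[Sri], value: float, limit: float) -> None:
    """Model identical software: every unit stops on the same out-of-range input."""
    for unit in units:
        if abs(value) > limit:
            unit.healthy = False


# Case 1: a random fault hits one unit only, so the stand-by takes over.
active, standby = Sri("active"), Sri("stand-by")
active.healthy = False
print(select_sri(active, standby).name)  # stand-by

# Case 2: both units run the same software and see the same input.
active, standby = Sri("active"), Sri("stand-by")
feed_same_input([active, standby], value=40000.0, limit=32767.0)
print(select_sri(active, standby))  # None
```

In the second case no unit remains to switch to, which is the situation that identical hardware and software cannot exclude.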

The design of the Ariane 5 SRI is practically the same as that of an SRI which is presently used on Ariane 4, particularly as regards the software.

Based on the extensive documentation and data on the Ariane 501 failure made available to the Board, the following chain of events, their inter-relations and causes have been established, starting with the destruction of the launcher and tracing back in time towards the primary cause.

The SRI internal events that led to the failure have been reproduced by simulation calculations. Furthermore, both SRIs were recovered during the Board's investigation and the failure context was precisely determined from memory readouts. In addition, the Board has examined the software code which was shown to be consistent with the failure scenario. The results of these examinations are documented in the Technical Report.

Therefore, it is established beyond reasonable doubt that the chain of events set out above reflects the technical causes of the failure of Ariane 501.

2.2 COMMENTS ON THE FAILURE SCENARIO

In the failure scenario, the primary technical causes are the Operand Error when converting the horizontal bias variable BH, and the lack of protection of this conversion which caused the SRI computer to stop.

It has been stated to the Board that not all the conversions were protected because a maximum workload target of 80% had been set for the SRI computer. To determine the vulnerability of unprotected code, an analysis was performed on every operation which could give rise to an exception, including an Operand Error. In particular, the conversion of floating point values to integers was analysed and operations involving seven variables were at risk of leading to an Operand Error. This led to protection being added to four of the variables, evidence of which appears in the Ada code. However, three of the variables were left unprotected. No reference to justification of this decision was found directly in the source code. Given the large amount of documentation associated with any industrial application, the assumption, although agreed, was essentially obscured, though not deliberately, from any external review.

The reason for the three remaining variables, including the one denoting horizontal bias, being unprotected was that further reasoning indicated that they were either physically limited or that there was a large margin of safety, a reasoning which in the case of the variable BH turned out to be faulty. It is important to note that the decision to protect certain variables but not others was taken jointly by project partners at several contractual levels.
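The difference between an unprotected and a protected conversion can be illustrated by a minimal Python sketch. It assumes a 16-bit signed integer as the target type and uses saturation as the protection. Both are assumptions made for the example: this section does not state the integer width, nor how the four protected variables were actually guarded in the Ada code. The names OperandError, to_int16_unprotected and to_int16_protected are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # assumed target range (16-bit signed)


class OperandError(ArithmeticError):
    """Raised when a value does not fit the target integer type."""


def to_int16_unprotected(value: float) -> int:
    """Convert and raise if the result is out of range, as a checked conversion would."""
    result = int(value)  # truncates towards zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value} does not fit in a 16-bit signed integer")
    return result


def to_int16_protected(value: float) -> int:
    """Clamp to the representable range first, so the conversion cannot raise."""
    clamped = max(float(INT16_MIN), min(float(INT16_MAX), value))
    return int(clamped)


print(to_int16_protected(123.9))    # 123
print(to_int16_protected(40000.0))  # 32767

try:
    to_int16_unprotected(40000.0)
except OperandError as error:
    print("Operand Error:", error)
```

Saturation removes the exception but still delivers a wrong value when the input is out of range, so it does not replace the analysis of whether the input can exceed the range in the first place.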

There is no evidence that any trajectory data were used to analyse the behaviour of the unprotected variables, and it is even more important to note that it was jointly agreed not to include the Ariane 5 trajectory data in the SRI requirements and specification.

Although the source of the Operand Error has been identified, this in itself did not cause the mission to fail. The specification of the exception-handling mechanism also contributed to the failure. In the event of any kind of exception, the system specification stated that: the failure should be indicated on the databus, the failure context should be stored in an EEPROM memory (which was recovered and read out for Ariane 501), and finally, the SRI processor should be shut down.

It was the decision to cease the processor operation which finally proved fatal. Restart is not feasible since attitude is too difficult to re-calculate after a processor shutdown; therefore the Inertial Reference System becomes useless. The reason behind this drastic action lies in the culture within the Ariane programme of only addressing random hardware failures. From this point of view exception - or error - handling mechanisms are designed for a random hardware failure which can quite rationally be handled by a backup system.

Although the failure was due to a systematic software design error, mechanisms can be introduced to mitigate this type of problem. For example the computers within the SRIs could have continued to provide their best estimates of the required attitude information. There is reason for concern that a software exception should be allowed, or even required, to cause a processor to halt while handling mission-critical equipment. Indeed, the loss of a proper software function is hazardous because the same software runs in both SRI units. In the case of Ariane 501, this resulted in the switch-off of two still healthy critical units of equipment.
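The idea of continuing to provide a best estimate instead of halting can be sketched as follows. This is a hypothetical Python illustration, not a description of any SRI design. The class BestEffortSource, the function scaled and the policy of repeating the last good value are all assumptions chosen for brevity, and the sketch does not show whether such a policy would be adequate for a real inertial system.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttitudeSample:
    value: float
    degraded: bool  # True when the value is a fallback, not a fresh computation


class BestEffortSource:
    """Confine arithmetic exceptions and keep delivering flagged data."""

    def __init__(self, compute: Callable[[float], float]) -> None:
        self._compute = compute
        self._last_good: Optional[float] = None

    def read(self, raw: float) -> AttitudeSample:
        try:
            value = self._compute(raw)
        except ArithmeticError:
            # Do not stop: repeat the last good value and mark it as degraded.
            fallback = self._last_good if self._last_good is not None else 0.0
            return AttitudeSample(fallback, degraded=True)
        self._last_good = value
        return AttitudeSample(value, degraded=False)


def scaled(raw: float) -> float:
    """Hypothetical computation that fails on out-of-range input."""
    if abs(raw) > 32767.0:
        raise OverflowError("input out of range")
    return raw * 0.5


source = BestEffortSource(scaled)
print(source.read(100.0))  # AttitudeSample(value=50.0, degraded=False)
print(source.read(1e6))    # AttitudeSample(value=50.0, degraded=True)
```

The degraded flag lets the consumer of the data decide how far to trust a sample, rather than losing the data source altogether.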

The original requirement accounting for the continued operation of the alignment software after lift-off was brought forward more than 10 years ago for the earlier models of Ariane, in order to cope with the rather unlikely event of a hold in the count-down e.g. between -9 seconds, when flight mode starts in the SRI of Ariane 4, and -5 seconds when certain events are initiated in the launcher which take several hours to reset. The period selected for this continued alignment operation, 50 seconds after the start of flight mode, was based on the time needed for the ground equipment to resume full control of the launcher in the event of a hold.

This special feature made it possible with the earlier versions of Ariane, to restart the count-down without waiting for normal alignment, which takes 45 minutes or more, so that a short launch window could still be used. In fact, this feature was used once, in 1989 on Flight 33.

The same requirement does not apply to Ariane 5, which has a different preparation sequence; it was maintained for commonality reasons, presumably based on the view that, unless proven necessary, it was not wise to make changes in software which worked well on Ariane 4.

Even in those cases where the requirement is found to be still valid, it is questionable for the alignment function to be operating after the launcher has lifted off. Alignment of mechanical and laser strap-down platforms involves complex mathematical filter functions to properly align the x-axis to the gravity axis and to find north direction from Earth rotation sensing. The assumption of preflight alignment is that the launcher is positioned at a known and fixed position. Therefore, the alignment function is totally disrupted when performed during flight, because the measured movements of the launcher are interpreted as sensor offsets and other coefficients characterising sensor behaviour.

Returning to the software error, the Board wishes to point out that software is an expression of a highly detailed design and does not fail in the same sense as a mechanical system. Furthermore software is flexible and expressive and thus encourages highly demanding requirements, which in turn lead to complex implementations which are difficult to assess.

An underlying theme in the development of Ariane 5 is the bias towards the mitigation of random failure. The supplier of the SRI was only following the specification given to it, which stipulated that in the event of any detected exception the processor was to be stopped. The exception which occurred was not due to random failure but a design error. The exception was detected, but inappropriately handled because the view had been taken that software should be considered correct until it is shown to be at fault. The Board has reason to believe that this view is also accepted in other areas of Ariane 5 software design. The Board is in favour of the opposite view, that software should be assumed to be faulty until applying the currently accepted best practice methods can demonstrate that it is correct.

This means that critical software - in the sense that failure of the software puts the mission at risk - must be identified at a very detailed level, that exceptional behaviour must be confined, and that a reasonable back-up policy must take software failures into account.

2.3 THE TESTING AND QUALIFICATION PROCEDURES

The Flight Control System qualification for Ariane 5 follows a standard procedure and is performed at the following levels :

The logic applied is to check at each level what could not be achieved at the previous level, thus eventually providing complete test coverage of each sub-system and of the integrated system.

Testing at equipment level was in the case of the SRI conducted rigorously with regard to all environmental factors and in fact beyond what was expected for Ariane 5. However, no test was performed to verify that the SRI would behave correctly when being subjected to the count-down and flight time sequence and the trajectory of Ariane 5.

It should be noted that for reasons of physical law, it is not feasible to test the SRI as a "black box" in the flight environment, unless one makes a completely realistic flight test, but it is possible to do ground testing by injecting simulated accelerometric signals in accordance with predicted flight parameters, while also using a turntable to simulate launcher angular movements. Had such a test been performed by the supplier or as part of the acceptance test, the failure mechanism would have been exposed.

The main explanation for the absence of this test has already been mentioned above, i.e. the SRI specification (which is supposed to be a requirements document for the SRI) does not contain the Ariane 5 trajectory data as a functional requirement.

The Board has also noted that the systems specification of the SRI does not indicate operational restrictions that emerge from the chosen implementation. Such a declaration of limitation, which should be mandatory for every mission-critical device, would have served to identify any non-compliance with the trajectory of Ariane 5.
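A declaration of limitation becomes checkable once it is written down as data. The following Python sketch, given purely as an illustration, compares a table of declared operating ranges against predicted trajectory samples and lists every sample that falls outside its range. The names Limit and find_violations, the variable name horizontal_bias and all numeric values are invented for the example and do not come from the SRI specification or from Ariane 5 trajectory data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Limit:
    """Declared operating range of one variable (hypothetical)."""
    low: float
    high: float


Sample = Tuple[float, Dict[str, float]]  # (time in seconds, variable values)


def find_violations(
    limits: Dict[str, Limit], trajectory: Iterable[Sample]
) -> List[Tuple[float, str, float]]:
    """Return (time, variable, value) for every value outside its declared range."""
    violations = []
    for time_s, values in trajectory:
        for name, value in values.items():
            limit = limits.get(name)
            if limit is not None and not limit.low <= value <= limit.high:
                violations.append((time_s, name, value))
    return violations


# Invented numbers, for illustration only.
declared = {"horizontal_bias": Limit(-32768.0, 32767.0)}
predicted = [
    (1.0, {"horizontal_bias": 12000.0}),
    (2.0, {"horizontal_bias": 41000.0}),
]
print(find_violations(declared, predicted))  # [(2.0, 'horizontal_bias', 41000.0)]
```

A check of this kind is only possible if both the declared restrictions and the trajectory data form part of the specification.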

The other principal opportunity to detect the failure mechanism beforehand was during the numerous tests and simulations carried out at the Functional Simulation Facility ISF, which is at the site of the Industrial Architect. The scope of the ISF testing is to qualify :

A large number of closed-loop simulations of the complete flight simulating ground segment operation, telemetry flow and launcher dynamics were run in order to verify :

In these tests many equipment items were physically present and exercised but not the two SRIs, which were simulated by specifically developed software modules. Some open-loop tests, to verify compliance of the On-Board Computer and the SRI, were performed with the actual SRI. It is understood that these were just electrical integration tests and "low-level" (bus communication) compliance tests.

It is not mandatory, even if preferable, that all the parts of the subsystem are present in all the tests at a given level. Sometimes this is not physically possible or it is not possible to exercise them completely or in a representative way. In these cases it is logical to replace them with simulators but only after a careful check that the previous test levels have covered the scope completely.

This procedure is especially important for the final system test before the system is operationally used (the tests performed on the 501 launcher itself are not addressed here since they are not specific to the Flight Control Electrical System qualification).

In order to understand the explanations given for the decision not to have the SRIs in the closed-loop simulation, it is necessary to describe the test configurations that might have been used.

Because it is not possible to simulate the large linear accelerations of the launcher in all three axes on a test bench (as discussed above), there are two ways to put the SRI in the loop:

The first approach is likely to provide an accurate simulation (within the limits of the three-axis dynamic table bandwidth) and is quite expensive; the second is cheaper and its performance depends essentially on the accuracy of the simulation. In both cases a large part of the electronics and the complete software are tested in the real operating environment.

When the project test philosophy was defined, the importance of having the SRIs in the loop was recognized and a decision was taken to select method B above. At a later stage of the programme (in 1992), this decision was changed. It was decided not to have the actual SRIs in the loop for the following reasons :

The opinion of the Board is that these arguments were technically valid, but since the purpose of a system simulation test is not only to verify the interfaces but also to verify the system as a whole for the particular application, there was a definite risk in assuming that critical equipment such as the SRI had been validated by qualification on its own, or by previous use on Ariane 4.

While high accuracy of a simulation is desirable, in the ISF system tests it is clearly better to compromise on accuracy but achieve all other objectives, amongst them to prove the proper system integration of equipment such as the SRI. The precision of the guidance system can be effectively demonstrated by analysis and computer simulation.

Under this heading it should be noted finally that the overriding means of preventing failures are the reviews which are an integral part of the design and qualification process, and which are carried out at all levels and involve all major partners in the project (as well as external experts). In a programme of this size, literally thousands of problems and potential failures are successfully handled in the review process and it is obviously not easy to detect software design errors of the type which were the primary technical cause of the 501 failure. Nevertheless, it is evident that the limitations of the SRI software were not fully analysed in the reviews, and it was not realised that the test coverage was inadequate to expose such limitations. Nor were the possible implications of allowing the alignment software to operate during flight realised. In these respects, the review process was a contributory factor in the failure.

2.4 POSSIBLE OTHER WEAKNESSES OF SYSTEMS INVOLVED

In accordance with its terms of reference, the Board has examined possible other weaknesses, primarily in the Flight Control System. No weaknesses were found which were related to the failure, but in spite of the short time available, the Board has conducted an extensive review of the Flight Control System based on experience gained during the failure analysis.

The review has covered the following areas :

In addition, the Board has made an analysis of methods applied in the development programme, in particular as regards software development methodology.

The results of these efforts have been documented in the Technical Report and it is the hope of the Board that they will contribute to further improvement of the Ariane 5 Flight Control System and its software.

3. CONCLUSIONS

3.1 FINDINGS

The Board reached the following findings:

3.2 CAUSE OF THE FAILURE

The failure of the Ariane 501 was caused by the complete loss of guidance and attitude information 37 seconds after start of the main engine ignition sequence (30 seconds after lift-off). This loss of information was due to specification and design errors in the software of the inertial reference system.

The extensive reviews and tests carried out during the Ariane 5 Development Programme did not include adequate analysis and testing of the inertial reference system or of the complete flight control system, which could have detected the potential failure.

4. RECOMMENDATIONS

On the basis of its analyses and conclusions, the Board makes the following recommendations.

R1 Switch off the alignment function of the inertial reference system immediately after lift-off. More generally, no software function should run during flight unless it is needed.

R2 Prepare a test facility including as much real equipment as technically feasible, inject realistic input data, and perform complete, closed-loop, system testing. Complete simulations must take place before any mission. A high test coverage has to be obtained.

R3 Do not allow any sensor, such as the inertial reference system, to stop sending best effort data.

R4 Organize, for each item of equipment incorporating software, a specific software qualification review. The Industrial Architect shall take part in these reviews and report on complete system testing performed with the equipment. All restrictions on use of the equipment shall be made explicit for the Review Board. Make all critical software a Configuration Controlled Item (CCI).

R5 Review all flight software (including embedded software), and in particular :

R6 Wherever technically feasible, consider confining exceptions to tasks and devise backup capabilities.

R7 Provide more data to the telemetry upon failure of any component, so that recovering equipment will be less essential.

R8 Reconsider the definition of critical components, taking failures of software origin into account (particularly single point failures).

R9 Include external (to the project) participants when reviewing specifications, code and justification documents. Make sure that these reviews consider the substance of arguments, rather than check that verifications have been made.

R10 Include trajectory data in specifications and test requirements.

R11 Review the test coverage of existing equipment and extend it where it is deemed necessary.

R12 Give the justification documents the same attention as code. Improve the technique for keeping code and its justifications consistent.

R13 Set up a team that will prepare the procedure for qualifying software, propose stringent rules for confirming such qualification, and ascertain that specification, verification and testing of software are of a consistently high quality in the Ariane 5 programme. Including external RAMS experts is to be considered.

R14 A more transparent organisation of the cooperation among the partners in the Ariane 5 programme must be considered. Close engineering cooperation, with clear cut authority and responsibility, is needed to achieve system coherence, with simple and clear interfaces between partners.

- END -