Tripping the Alarm

Howard Williamson, Yokogawa Marex Ltd, UK, looks at plantwide trip analysis.

Alarm management is now one of the most common topics in the process industries. The potential for increased numbers of alarms of ever more complexity has never been greater. This is compounded by the fact that fewer operators are being employed to manage them. Alarm management systems and best practice guidelines such as EEMUA 191 are available to help customers reduce, filter and sort their alarms to a level that is manageable. The systems focus on managing the alarms so that only those that are important are brought to the attention of operators. Yokogawa's new Consolidated Alarm Management System (CAMS) is an example of a modern online alarm system that incorporates EEMUA 191 recommendations.

An alarm audit, for example, will identify and help remove spurious alarms and reduce their quantity to a level that operators can realistically handle. EEMUA 191 recommends no more than one alarm every five minutes per operator, although a more desirable rate would be fewer than one every 10 minutes.
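The arithmetic behind those two figures can be sketched as follows. This is only an illustration: the function names and the example counts are hypothetical, and the thresholds are simply the two rates quoted above expressed per 10 minute period (two alarms and one alarm respectively).

```python
from datetime import timedelta

def average_alarm_rate(alarm_count, window, operators=1):
    """Average alarms per operator per 10 minute period over an observation window."""
    periods = window / timedelta(minutes=10)
    return alarm_count / periods / operators

def classify(rate_per_10_min):
    # One alarm every five minutes equals two alarms per 10 minutes.
    if rate_per_10_min < 1:
        return "desirable"
    if rate_per_10_min <= 2:
        return "within recommended maximum"
    return "above recommended maximum"

# Hypothetical audit result: 288 alarms logged in 24 hours for one operator.
rate = average_alarm_rate(288, timedelta(hours=24))
verdict = classify(rate)
```

An audit would normally look at peak rates as well as the average, since an acceptable daily average can hide a flood during an upset.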

It is clear from the considerable body of literature and products available that there is a significant focus on preventing plant incidents. In contrast, there are very few tools that assist operators and management to identify and recover from plant trips. Following a plant incident, plant management is under intense pressure to identify the cause of the trip and return the plant to a stable condition, ensuring that the problem is not repeated.

Incidents are likely to occur during times of change:

  • Unit or plant startup/shutdown.
  • Plant maintenance.
  • Operational change.
  • New operators.
  • Shift changeover.

The global economy provides process industries with the constant challenge of reducing costs, with one of the most common methods being 'asset sweating' (obtaining greater efficiencies from existing plant). Typically this results in operators driving production units closer to their limits. The very act of driving a plant harder increases the stresses under which it is forced to operate and increases the likelihood of a plant upset or shutdown.

Plant incidents can be categorised from minor to major, as shown in Table 1. Even a relatively 'minor' incident involving loss of production can result in significant financial loss. It is estimated that losses due to downtime cost the petrochemical industries around US$ 20 billion/yr. Typically a refinery will be affected by an incident every three years, at an average cost of US$ 80 million.

Alarm monitoring

Figure 1. SER trip report.
The red line indicates the detected trip.

There are many systems that generate alarms and events, such as EPS, F&G and ESD systems. Collecting alarm and event (A&E) data from individual disparate systems (using an SOE recorder, for example), while performing an important role, does not provide a complete or holistic view of what is happening.

The source of a trip may not be within a unit, but rather from an adjoining unit. For example, a boiler shutdown will shut down major units that require steam. This will cause trips from several plants that require the same utility.

When a plant trip occurs, the plant operations team is under significant pressure to determine the cause of the trip, rectify the underlying problem that caused the incident and get the plant up and running again as quickly as possible. The investigation team need ready access to the alarm and event messages and process data tags in a form they can filter, sort and manipulate so that the few key items that reveal the cause can be extracted and analysed.

A key requirement for analysis of the sequence of events leading up to and immediately following the trip is the pulling together of A&E messages and relevant process data in a consecutive time series, based on correctly synchronised time stamps.
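As a minimal sketch of that idea, the example below merges two already time-ordered streams into one timeline using Python's standard `heapq.merge`. The tags, values and times are invented for illustration and it is assumed that both sources already share a synchronised time base; it does not represent how any particular product performs the merge.

```python
import heapq
from datetime import datetime

# Hypothetical A&E messages, already sorted by time stamp.
alarms = [
    (datetime(2024, 1, 1, 10, 0, 5), "A&E", "PI-101 high pressure alarm"),
    (datetime(2024, 1, 1, 10, 0, 9), "A&E", "K-100 compressor trip"),
]

# Hypothetical process data samples, already sorted by time stamp.
process = [
    (datetime(2024, 1, 1, 10, 0, 0), "PV", "PI-101 = 11.8 bar"),
    (datetime(2024, 1, 1, 10, 0, 6), "PV", "PI-101 = 12.6 bar"),
]

# Merge the two sorted streams into one consecutive time series.
timeline = list(heapq.merge(alarms, process, key=lambda record: record[0]))

for timestamp, source, text in timeline:
    print(timestamp.isoformat(), source, text)
```

The merge is only meaningful if the time stamps from both sources are comparable, which is why clock synchronisation comes first.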

Systems that only provide a view of alarms and events, without also considering process data, cannot show the full picture, since they cannot also show related continuous process history. Process data such as pressure, temperature and flow can reveal issues otherwise hidden by an analysis of A&E data alone. For example, if the plant trip was caused by a very high temperature in the reactor, did that situation develop suddenly or was it the result of a prolonged steady increase over a 10 hour period, during which the plant operators were ignoring the alarms? If the temperature was rising gradually, then a key issue for investigation would be why it was not picked up. If it shot up because of an abnormal exothermic reaction that could not have been foreseen, the underlying problem would be completely different.
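The reactor temperature question can be illustrated with a small sketch that measures how long a value had been continuously above its alarm limit when the trip occurred. The function name, alarm limit and sample data are all hypothetical; a real investigation would use the historised process tags.

```python
from datetime import datetime, timedelta

def time_above_limit(samples, alarm_limit, trip_time):
    """Return how long the value had been continuously above alarm_limit at trip_time.

    samples: list of (timestamp, value) pairs sorted by timestamp.
    """
    start = None
    for timestamp, value in reversed(samples):
        if timestamp > trip_time:
            continue  # ignore post-trip samples
        if value <= alarm_limit:
            break  # the excursion began after this sample
        start = timestamp
    return trip_time - start if start is not None else timedelta(0)

trip = datetime(2024, 1, 1, 12, 0)

# Hypothetical gradual rise: 5 degrees per hour over the 12 hours before the trip.
gradual = [(trip - timedelta(hours=h), 250 - 5 * h) for h in range(12, -1, -1)]

# Hypothetical sudden rise in the final minute.
sudden = [
    (trip - timedelta(minutes=10), 180),
    (trip - timedelta(minutes=2), 185),
    (trip - timedelta(minutes=1), 240),
    (trip, 260),
]

gradual_duration = time_above_limit(gradual, 200, trip)
sudden_duration = time_above_limit(sudden, 200, trip)
```

A long duration points the investigation towards why a standing alarm went unanswered; a short one points towards the process itself.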

Time synchronisation

Figure 2. Exaquantum SER architecture.

Time synchronisation is a prerequisite for the correct time sequencing of alarm and event messages. If the different systems in a plant are not synchronised then trying to bring together events from more than one system is completely useless.

Most process data is likely to be time stamped at the one second level, but millisecond timing is often the norm for safety related systems and other fast acting systems such as electrical systems involving switchgear.

If time stamps accurate to one second are required, inaccuracies of a few milliseconds make little difference. If, however, time synchronisation in a plant is at the millisecond level, then the clocks have to be synchronised and be accurate to one millisecond between systems, otherwise the sequence of events will not be reliable.

The IEEE 1588 protocol provides a full solution for synchronising slave clocks to a master clock within an Ethernet network. This ensures that events and time stamps in all devices use the same time base. The protocol also measures and corrects time skew caused by clock offsets and network delays.
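The offset and delay correction can be sketched with the standard delay request-response calculation used by IEEE 1588, which takes four time stamps and assumes the network delay is the same in both directions. The function name and the example values are illustrative only.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate slave clock offset and one-way path delay from four time stamps.

    t1: master sends Sync (master clock)
    t2: slave receives Sync (slave clock)
    t3: slave sends Delay_Req (slave clock)
    t4: master receives Delay_Req (master clock)
    Assumes a symmetric network path.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay

# Hypothetical exchange in milliseconds: the slave clock runs 5 ms ahead
# of the master and the one-way network delay is 2 ms.
offset_ms, delay_ms = ptp_offset_and_delay(t1=1000, t2=1007, t3=1020, t4=1017)
```

The slave then subtracts the estimated offset from its clock. If the path is not symmetric, the asymmetry appears as an error in the offset estimate.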

A further pitfall can come in the biannual switch from winter to summer time and back. Here, clocks must all follow one rule, either to follow the changes or to remain with one time standard.
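The sketch below illustrates the pitfall using the UK clock change of 27 October 2024, with fixed offsets standing in for summer time (BST) and winter time (GMT). The two events are invented, and keeping everything on UTC is shown as one way of remaining with a single time standard, not as the only valid approach.

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")  # summer time, UTC+1
GMT = timezone.utc                         # winter time, UTC+0

# Hypothetical events from two systems on the night the clocks go back.
# One system is still on summer time, the other has already switched.
events = [
    ("valve closed", datetime(2024, 10, 27, 1, 30, tzinfo=BST)),
    ("pump tripped", datetime(2024, 10, 27, 1, 10, tzinfo=GMT)),
]

# Sorting on the local wall-clock reading gives a misleading sequence.
by_wall_clock = sorted(events, key=lambda e: e[1].replace(tzinfo=None))

# Sorting on a single time standard (UTC) gives the true sequence.
by_utc = sorted(events, key=lambda e: e[1].astimezone(timezone.utc))
```

Here the wall-clock ordering puts the pump trip first, whereas in reality the valve closed 40 minutes before the pump tripped.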

It is also important to remember that relative accuracy between systems matters more than absolute accuracy. As long as all time stamps are synchronised to the same reference clock, a trip report can reconstruct an accurate sequence of events, even if that clock is itself slightly off true time.

SER

Table 1. Plant incidents, ranked from major to minor

  • Loss of life (major).
  • Environmental.
  • Asset.
  • Production.
  • Quality (minor).

Exaquantum/Sequence of Events Recorder (SER) from Yokogawa is designed specifically to identify and extract A&E and process data relating to plant trip events. Working in conjunction with Exaquantum/PIMS, SER's trip detection system monitors all messages against predefined trip conditions stored in the system.

Many hundreds or thousands of alarms and process data tags will be generated by the plant following a trip, but perhaps just 10 or 20 will be useful in identifying its cause. SER's filtering, based on identifying a trip condition, enables these to be identified and extracted. Normally, data and messages relating to the item that has tripped and its surrounding equipment are extracted for investigation.

For example, if a compressor trips due to a high pressure alarm that is identified as a contributing factor, SER will typically collate events for related process equipment and utilities, along with critical flow, temperature and pressure data for the compressor.

Pre and post data collection

A trip report can contain a small number of alarms and events and process data or it may contain many thousands. The number depends on the filters and the time period selected before and after the event. With SER both time periods can be configured independently. As noted, pre-trip analysis reveals the plant conditions, alarms and operator actions that led up to the trip. Post-trip analysis reveals whether the plant process and safety systems responded correctly to the unexpected shutdown. For example, were they programmed properly and did the safety systems kick in and shut down the plant units in the right order? Post-trip data also allows for the detection of a trip condition that was not configured within SER and that occurred after the element which triggered SER.
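Independent pre and post periods can be pictured as a simple time window around the trip. The following is a minimal sketch with a hypothetical function name and invented records; it is not SER's configuration interface.

```python
from datetime import datetime, timedelta

def trip_window(records, trip_time, pre, post):
    """Return records with trip_time - pre <= timestamp <= trip_time + post.

    records: iterable of (timestamp, description) pairs.
    """
    start, end = trip_time - pre, trip_time + post
    return [r for r in records if start <= r[0] <= end]

trip = datetime(2024, 1, 1, 12, 0, 0)

# Hypothetical combined A&E and operator action records.
records = [
    (trip - timedelta(hours=2), "TI-301 high alarm"),
    (trip - timedelta(minutes=20), "operator changes FIC-110 setpoint"),
    (trip, "K-100 compressor trip"),
    (trip + timedelta(seconds=30), "ESD closes feed valve"),
    (trip + timedelta(hours=1), "restart permit issued"),
]

# 30 minutes of pre-trip data, 5 minutes of post-trip data.
report = trip_window(records, trip, pre=timedelta(minutes=30), post=timedelta(minutes=5))
```

Widening the pre-trip period to three hours would also bring the earlier temperature alarm into the report, which shows why the choice of period affects what the investigation sees.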

Data collection

SER uses Exaquantum/PIMS to provide base data collection, through OPC A&E and process data tags. Exaquantum's real time and historical database system is designed to cope with high data rates during critical periods from OPC servers, enabling it to collect both current (OPC DA and AE) and historical (OPC HDA and HAE) events. Since there is no licence charge for collecting alarm and event data, these can be collected at the same time as process tags, without a cost penalty.

The system monitors all messages against predefined trip conditions stored within the system. When a matching trip condition is detected, the required point data and A&E messages are collated into a trip report and stored within the system. These trips are then available for selection and viewing at the user's choice of location through a web browser. This could be the control room or anywhere with access to the internet.
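In outline, monitoring messages against stored trip conditions amounts to a lookup of each incoming message against a predefined set. The sketch below is a deliberately simplified illustration: the `Message` type, the tag names and the `TRIP_CONDITIONS` set are hypothetical and do not reflect SER's internal data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    time: float  # seconds on a common, synchronised time base
    tag: str
    state: str

# Hypothetical trip conditions: (tag, state) pairs that mark a plant trip.
TRIP_CONDITIONS = {("K-100", "TRIPPED"), ("R-200", "TRIPPED")}

def detect_trips(messages):
    """Return the messages that match a predefined trip condition."""
    return [m for m in messages if (m.tag, m.state) in TRIP_CONDITIONS]

messages = [
    Message(10.0, "PI-101", "HIGH"),
    Message(12.5, "K-100", "TRIPPED"),
    Message(13.0, "FI-205", "LOW"),
]

trips = detect_trips(messages)
```

Each detected trip would then act as the anchor time around which point data and A&E messages are collated into a report.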

Defining a plant trip

Figure 3. CAMS real time A&E monitor.

A key part of SER's configuration involves entering the parameters that define when a plant trip has occurred. While there are thousands of individual components in a plant, trips will inevitably result in a few key process conditions or events occurring. If a plant goes down, key process equipment such as reactors and compressors will invariably trip. They may not be the cause of the trip but the fact that they have tripped provides a very good indicator that the plant has gone down. Consequently, it is not necessary to identify all the possible trip conditions, but rather to identify a reliable indicator, such as a sequence of key events that, when they occur, reveal that the plant or process has tripped. For example, in an ethylene unit, if the main process compressor trips, it is inevitable that the plant has gone down. This may have been caused by a range of factors, including another related unit going down.
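A trip definition based on a sequence of key events could be sketched as below: the trip is declared only if the named events occur in order within a set time of the first one. The function, the event names and the 10 second limit are hypothetical choices made for illustration.

```python
def sequence_occurred(events, sequence, within):
    """Return True if the tags in `sequence` appear in order in `events`,
    all within `within` seconds of the first tag in the sequence.

    events: list of (time_in_seconds, tag) pairs sorted by time.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one tag")
    for i, (start, tag) in enumerate(events):
        if tag != sequence[0]:
            continue
        position = 1  # index of the next tag we are waiting for
        for time, later_tag in events[i + 1:]:
            if time - start > within:
                break
            if position < len(sequence) and later_tag == sequence[position]:
                position += 1
        if position == len(sequence):
            return True
    return False

# Hypothetical event log: compressor trips, then the reactor follows.
events = [(0.0, "FEED_LOW"), (3.0, "K-100_TRIP"), (5.0, "R-200_TRIP")]

plant_tripped = sequence_occurred(events, ["K-100_TRIP", "R-200_TRIP"], within=10.0)
```

Requiring a short sequence rather than a single message reduces the chance of a lone spurious event being reported as a plant trip.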

Sequence of events recording is a legal requirement for automated plants and traditionally has been addressed through serial printer outputs. This approach has met the legal requirement but does not provide retrospective analysis of root causes and more complex reporting. SER addresses the legal requirements and provides a powerful toolkit to operators, engineers and production staff alike to improve safety and maximise plant uptime.

Alarm management

Yokogawa's recently released alarm management package, CAMS, has been designed to meet the EEMUA 191 guidelines. CAMS is a practical, overarching real time alarm management system that delivers the right information to the right people at the right time. It provides operations staff with four key tools, which enable them to recognise and then deal with those alarms that need action.

Sorting/filtering

Operators can filter and sort alarm messages using alarm attribute identifiers included in the alarm messages as the keys. Engineering-free filters such as the process alarm filter and plant hierarchy filter can be used immediately. Users can also define their own personal filters and create temporary filters as needed.

Eclipsing

Alarm messages that are repeatedly activated by the same tag can be collapsed and displayed as a single line. Eclipsing reduces the number of alarm messages being displayed and allows operators to identify critical alarms easily.
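The effect of eclipsing can be sketched as collapsing a message list to one line per tag, keeping the most recent message and a count of activations. This is only an illustration of the idea with invented data; how CAMS implements eclipsing internally is not described here.

```python
def eclipse(alarms):
    """Collapse repeated alarms from the same tag into one line per tag.

    alarms: list of (time, tag, text) tuples sorted by time.
    Returns (time, tag, text, count) tuples, most recent first, where
    time and text come from the latest activation of each tag.
    """
    collapsed = {}
    for time, tag, text in alarms:
        count = collapsed[tag][3] + 1 if tag in collapsed else 1
        collapsed[tag] = (time, tag, text, count)
    return sorted(collapsed.values(), key=lambda alarm: alarm[0], reverse=True)

# Hypothetical alarm list: LI-1 activates three times, TI-2 once.
alarms = [
    (1, "LI-1", "HI"),
    (2, "TI-2", "HI"),
    (3, "LI-1", "HI"),
    (4, "LI-1", "HH"),
]

display = eclipse(alarms)
```

Four messages become two display lines, with the count showing the operator that LI-1 has activated repeatedly.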

Shelving

Unnecessary alarms can be temporarily moved to the 'shelves'. Operators can shelve unnecessary alarms manually. Chattering alarms can be automatically moved to the shelf for a certain period of time or at a certain clock time. Similarly, alarms that match user defined conditions can be automatically moved to the shelves.
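One simple way to picture automatic shelving of chattering alarms is to flag any tag that activates a set number of times within a short period. The sketch below assumes three activations within 60 seconds as the rule; both values and the function name are hypothetical, not CAMS settings.

```python
from collections import defaultdict, deque

def chattering_tags(activations, max_count=3, window=60.0):
    """Return the set of tags with at least max_count activations
    inside any period of `window` seconds.

    activations: list of (time_in_seconds, tag) pairs sorted by time.
    """
    recent = defaultdict(deque)  # tag -> activation times still inside the window
    chattering = set()
    for time, tag in activations:
        times = recent[tag]
        times.append(time)
        while time - times[0] > window:
            times.popleft()  # drop activations that have aged out
        if len(times) >= max_count:
            chattering.add(tag)
    return chattering

# Hypothetical log: LI-1 chatters, TI-2 activates only occasionally.
activations = [
    (0.0, "LI-1"), (0.0, "TI-2"), (10.0, "LI-1"),
    (20.0, "LI-1"), (100.0, "TI-2"), (200.0, "TI-2"),
]

to_shelve = chattering_tags(activations)
```

Tags returned by such a rule would be moved to the shelf for a defined period, leaving the remaining alarms visible to the operator.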

Load shedding

If too many alarms are displayed in a short period of time due to an unexpected event, predefined filters can be automatically activated to reduce the operators' monitoring load.
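Load shedding can be sketched as a filter that switches on only when the recent alarm count passes a threshold. In the example below the threshold of 10 alarms in 10 minutes, the priority scheme (1 is the highest) and the field names are all assumptions made for illustration.

```python
def visible_alarms(alarms, now, window=600.0, flood_threshold=10, keep_priority=1):
    """Return the alarms to display at time `now` (seconds).

    If more than flood_threshold alarms arrived in the last `window`
    seconds, show only alarms with priority <= keep_priority
    (priority 1 is assumed to be the highest).
    """
    recent = [a for a in alarms if now - window <= a["time"] <= now]
    if len(recent) > flood_threshold:
        return [a for a in recent if a["priority"] <= keep_priority]
    return recent

# Hypothetical flood: 12 alarms in 12 seconds, two of them priority 1.
flood = [
    {"time": float(i), "tag": f"TAG-{i}", "priority": 1 if i % 6 == 0 else 3}
    for i in range(12)
]

shown_during_flood = visible_alarms(flood, now=20.0)
shown_when_quiet = visible_alarms(flood[:3], now=20.0)
```

During the flood only the two priority 1 alarms are shown; with just three alarms the filter stays off and all three are displayed.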

