Embedded Control Example

From AadlWiki

Jump to: navigation, search

Contents

Introduction

This example is inspired by the embedded control example from PRISM. You can get more information about this initial case-study here: http://www.prismmodelchecker.org/casestudies/embedded.php This example is an exercise to map a concrete example into an AADL model and experience the following functionalities of our Error-Model toolset:

  • Fault Hazard Analysis (FHA)
  • Reliability Block Diagram
  • Export to PRISM

The model is available on github at this address: https://github.com/osate/examples/tree/master/embedded-control Please provide any feedback on the model itself.

The system can be described as follow: The system comprises an input processor (PI) which reads and processes data from three sensors (S1, S2, S3). It is then polled by a main processor (PM) which, in turn, passes instructions to an output processor (PO). This process occurs cyclically, with the length of the cycle controlled by a timer in the main processor. The output processor (PO) takes the instructions it receives and uses them to control two actuators (A1, A2). Finally, a bus connects the three processors together. A concrete example of such a system might be a gas boiler, where the sensors are thermostats and the actuators are valves.

Advanced Version

An advanced version with error propagation is also available. You can learn more about it on this page.


AADL model

The AADL is composed of the following components:

  • Three input devices (sensors). They send output to a processor and all are bound to the same processor.
  • Three processors: one for processing the input (pi), another for processing and monitoring the data (pm) and another to produce and send the outputs (po)
  • Two actuators, a1 and a2
  • One bus connecting the processors for exchanging data.

The following picture illustrates the instance model.

Image:Embeddedcontrol-instance.png

Reliability Modeling

Overview of the reliability policy

The fault and failure policy is the following:

  • A sensor fails every month
  • A processor fails every month
  • An actuator fails every two months
  • A processor as a transient failure every day. When it is in the failed mode, a recover event happens in about 30 seconds.

Any of the three sensors can fail, but they are used in triple modular redundancy: the input processor can determine sufficient information to proceed provided two of the three are functional. If more than one becomes unavailable, the processor reports this fact to the main processor and the system is shut down. In similar fashion, it is sufficient for only one of the two actuators to be working, but if this is not the case, the output processor tells the main processor to shut the system down. The I/O processors themselves can also fail. This can be either a permanent fault or a transient fault. In the latter case, the situation can be rectified automatically by the processor rebooting itself. In either case, if the PI or PO processor is unavailable and this leads to PM being unable either to read data from PI or send instructions to PO, then PM is forced to skip the current cycle.


Modelling the main system, use of composite behavior

The main system instance has two states: operational and failed. The failed mode is triggered under the following conditions:

  1. Two sensors are in the failed mode
  2. A processor is in the failed mode
  3. The two actuators are in the failed mode

We capture this requirement using the following composite error behavior (see AADL code below).

	use behavior ErrorModelLibrary::Simple;
	composite error behavior
	states
		[((s1.Operational and s2.Operational ) or (s1.Operational and s3.Operational ) or (s3.Operational and s2.Operational ) ) or (a1.Operational or a2.Operational)]-> Operational;
		[((s1.Failed and s2.Failed ) or (s1.Failed and s3.Failed ) or (s3.Failed and s2.Failed ) ) or (a1.Failed or a2.Failed)]-> Failed;
	end composite;

Also, in order to being able to make fault evaluation with the Reliability Block Diagram (RBD) approach, we add properties to characterize the probability for a component to be in a particular state. The following AADL code address that aspect.

		properties
		
		--  The following values are used rather for the RDB
		--  They are not used by PRISM itself
		EMV2::OccurrenceDistribution => [ ProbabilityValue => 0.00003 ; Distribution => Fixed;] applies to s1.Failed;
		EMV2::OccurrenceDistribution => [ ProbabilityValue => 0.00003 ; Distribution => Fixed;] applies to s2.Failed;
		EMV2::OccurrenceDistribution => [ ProbabilityValue => 0.00003 ; Distribution => Fixed;] applies to s3.Failed;
		EMV2::OccurrenceDistribution => [ ProbabilityValue => 0.0001 ; Distribution => Fixed;] applies to a1.Failed;
		EMV2::OccurrenceDistribution => [ ProbabilityValue => 0.0001 ; Distribution => Fixed;] applies to a2.Failed;

Modelling the sensors

Each sensor is a device component with an associated error model. Its related error state machine is very basic and has only two states: operational and failure. The component goes from the operational to the failed state with a probability of 3.85e-7. To reflect that, we add the probability value to the component using the OccurrenceDistribution property. We also add properties and information in order to be able to generate the FHA report. The AADL description of the component is replicated here:

device sensor
features
	tempval : out event data port temperature;
annex EMV2 {**
	use types ErrorLibrary;
	use behavior ErrorModelLibrary::Simple;
	properties
	    --  The following properties are used to fill the PRISM model
	    --  and the FHA report.
	    --  The value here corresponds to lambda_s in the PRISM model
	    --  lamba_s = 1/(30*24*60*60)
	    --  from the specifications, it corresponds to 1 fault per month
		EMV2::OccurrenceDistribution => [ ProbabilityValue => 3.85e-7 ; Distribution => Poisson;] applies to Failure;
		EMV2::severity => 1 applies to Failure;
		EMV2::likelihood => C applies to Failure;
		EMV2::hazard => 
		[	crossreference => "X.X.X.X";
			failure => "Loss of sensor readings";
			phase => "all";
			description => "No position readings due to sensor failure";
			comment => "Not critical as long as two sensors are operating";
		]
		applies to Failure;
**};
end sensor;


Modelling the processors

Each processor is specified using an AADL processor component. The processor has three states: operational, failed and transientfailure:

  • switch from operational to failuretransient happens once a day (Occurrence value: 1.15e-5)
  • switch from failuretransient to operational happens every 30 seconds (Occurrence value: 0.03)
  • switch from operational to failed happens every year (Occurrence value: 3.17e-8)

We add the appropriate OccurrenceDistribution property values on the error model to reflect this requirement. We also add other properties (severity, likelihood and hazard) in order to generate the FHA report.

processor generic_processor
features
   ba : requires bus access generic_bus.i;
annex EMV2 {**
	use types embedded_errlib;
	use behavior embedded_errlib::SimpleAndTransient;
	
	properties
		--  The following properties are used to fill the PRISM model
	    --  and the FHA report.
	    --  The value corresponds to lambda_s in the PRISM model
	    --  lamba_p = 1/(365*24*60*60)
	    --  from the specifications, it corresponds to 1 fault per year
		EMV2::OccurrenceDistribution => [ ProbabilityValue => 3.17e-8 ; Distribution => Poisson;] applies to Failure;
		EMV2::severity => 1 applies to Failure;
		EMV2::likelihood => C applies to Failure;
		EMV2::hazard => 
		[	crossreference => "Z.Z.Z";
			failure => "Cannot process data";
			phase => "all";
			description => "Permanent failure from the processor";
			comment => "Major issue, the processor is not operating";
		]
		applies to Failure;
	 	--  The following properties are used to fill the PRISM model
	    --  and the FHA report.
		--  The value corresponds to delta_r in the PRISM model
	    --  delta_r = 1/(1/30)
	    --  from the specifications, it corresponds to 1 recovery occurence each 30 seconds
		EMV2::OccurrenceDistribution => [ ProbabilityValue => 0.03 ; Distribution => Poisson;] applies to ResetEvent;
		
		
		--  The following properties are used to fill the PRISM model
	    --  and the FHA report.
		--  The value corresponds to delta_f in the PRISM model
	    --  delta_f = 1/(1/24*60*60)
	    --  from the specifications, it corresponds to 1 transient failure occurence each day
		EMV2::OccurrenceDistribution => [ ProbabilityValue => 1.15e-5 ; Distribution => Poisson;] applies to FailureTransient;
		EMV2::severity => 1 applies to FailureTransient;
		EMV2::likelihood => C applies to FailureTransient;
		EMV2::hazard => 
		[	crossreference => "A.A.A";
			failure => "Temporary unavailable";
			phase => "all";
			description => "Temporary failure of the processor";
			comment => "Happen once a day and likely recover quickly";
		]
		applies to FailureTransient;
**};
end generic_processor;


Modelling the actuators

An actuator is specified using a device component. As for the sensor, the error state is basic with two main states: operational and failed. The switch from operational to failed occurs every two months (occorrence value=1.92e-7). We add the appropriate properties as well as the one required for the generation of the FHA report. The corresponding AADL description is shown below:

device actuator
features
    action : in event data port action;
annex EMV2 {**
	use types ErrorLibrary;
	use behavior ErrorModelLibrary::Simple;
	properties
	    --  The following properties are used to fill the PRISM model
	    --  and the FHA report.
	    --  The value here corresponds to lambda_a in the PRISM model
	    --  lamba_a = 1/(2*30*24*60*60)
	    --  from the specifications, it corresponds to 1 fault per 2 months
		EMV2::OccurrenceDistribution => [ ProbabilityValue => 1.92e-7 ; Distribution => Poisson;] applies to Failure;
		EMV2::severity => 1 applies to Failure;
		EMV2::likelihood => C applies to Failure;
		EMV2::hazard => 
		[	crossreference => "Y.Y.Y";
			failure => "Cannot operate";
			phase => "all";
			description => "Cannot operate the device";
			comment => "Major hazard if both are not operating";
		]
		applies to Failure;
**};
end actuator;


Analysis and Reports

Fault Hazard report

From this model, we can first generate the Fault and Hazard Analysis report. It produces a CSV file that contains the description of system fault event and their propagation. It retrieves the property values of the model and aggregates them into an Excel document, as shown below. In this model, the following properties are used to produce this report: severity, likelihood and hazard.

Image:Embeddedcontrol-fha.PNG


Reliability Block Diagram

Then, we can generate some reliability metrics using the Reliability Block Diagram feature from OSATE. This gives a gross estimate of system reliability using en evaluation of state occurrence. This metrics is produced using the composite error behavior from the main system (root component) and the OccurenceDistribution value associated to its state in the same component. The figure below show the result of this analysis.

Image:Embeddedcontrol-rbd.PNG

Export to PRISM

Then, the model can be exported into PRISM. For this model, as we used the Poisson distributions value for specifying occurrence distribution concerns, we will use the CTMC PRISM model. For that, we change the type of PRISM model to be generated and produce the PRISM model. Then, once the model is generated, we can use it to make some verifications. For the rules used to generate a PRISM model from an AADL description, interested readers can read the associated manual available here: http://www.aadl.info/aadl/osate/osate-doc/osate-emv2/prism.html

One basic verification when using this model would consist in checking the probability for a processor to have a transient failure. To do so, we establish the following PRISM property:

P=? [ true U<=T*3600*24 ( pimain_i_instance_is_transientfailure | pomain_i_instance_is_transientfailure | pmmain_i_instance_is_transientfailure) ]

This property check the occurence for having a transient fault on one processor during a day, T being the number of days. In order to be validated, we add a constant T in the model, this would represent the time (in seconds). We can also see the result using a graphical representation by making experiments. This is available in PRISM and trace graphs using values for the model variable. We show below an experiment for the previous property with a value of T being 1 and 10. In other words, this experiment show the property for having a transient error on a processor during 10 days.

Image:Embeddedcontrol-prism.png

It appears that on the first day, the probability grows quickly and after 2 days, we are sure to have at least one transient failure on one processor. This is consistent with the failure description: on an average, a transient fault occurs daily on a processor (and is recovered within 30 seconds).


References

Personal tools