INCREASING RELIABILITY OF COTS-BASED EPS SUBSYSTEMS FOR NANOSATELLITES

David Selčan(1), Gregor Kirbiš(1), Iztok Kramberger(1)

(1) University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova ulica 17, 2000 Maribor, Slovenia, david.selcan@um.si

ABSTRACT

This paper presents an approach to increasing the reliability of a nanosatellite EPS (Electrical Power System) while expanding upon its functionality. The approach is split into three categories: replacing complex integrated circuit components with circuits of analog components, changing the architecture of the subsystem to rely on a protected FPGA, and using a combination of the two to implement the protection functionality of the EPS. These methods were applied during the design of the TRISAT satellite EPS. Specifically, the introduction of an Analog Maximum Power Point Tracking circuit and a Transformer Coupled Charge Sharing battery balancer using LiFePO4 battery cells to the EPS power generation and storage is presented. Additionally, the use of Latching Current Limiters both for protection of EPS functionality and for power distribution is shown. Finally, methods to protect the FPGA circuit itself, using a combination of careful logic design based on a TMR method and external circuitry, are demonstrated.

1 INTRODUCTION

An Electrical Power System (EPS) can be considered to be one of the most important components of a nanosatellite – its failure has a high chance of resulting in a complete failure of the whole mission. This can be considered an especially problematic aspect for nanosatellite developers as, due to the history of their development and their size and mass constraints, little effort has been invested into increasing their reliability, including their built-in redundancy. Coupled with the heavy use of COTS components, nanosatellites of today are ill prepared for operation in a high-radiation environment.

Even though the use of nanosatellites is primarily limited to the Low Earth Orbit (LEO), which means that most missions proceeded without major problems despite the potential lack of reliability, there have already been recorded failures of operational nanosatellites that occurred because of a failure of the EPS [1]. Additionally, with the increasing popularity of the nanosatellite platform (more than 110 different nanosatellites were launched in 2015 alone, with the number expected to increase even further in the following years [2]), their use cases have begun expanding, and talks are already underway to use the nanosatellite platform in harsher environments (e.g. interplanetary orbits). For nanosatellites to become prevalent in orbits above LEO, where the radiation environment and operational constraints are much harsher, one method would be to modify the designs of the EPS to be more in line with how larger satellites are designed and built [3].

The problem with this approach is that the EPS is typically the single component with the biggest impact on mass and volume, making it impractical to fly multiple redundant systems. As such, this paper presents several methods which, combined together, increase the reliability of a nanosatellite EPS while expanding upon its functionality. The methods themselves are showcased together with how they were actually implemented on the EPS of the TRISAT satellite, which will be used on the TRISAT mission – a 3U nanosatellite technology demonstration project being led by the University of Maribor.

The proposed methods are compatible with the most important properties of nanosatellites: their low costs and short development cycles. They are additionally based on the use of COTS components and impose as little impact as possible on the mass and size of the satellite EPS. The primary focus of the methods is the prevention of faults caused by radiation (specifically SEE – Single Event Effects), but they are also useful for preventing other types of faults, including those caused by design failures. By following the presented approach when designing an EPS for a nanosatellite, its reliable operation in harsher-than-LEO conditions can be assured.

2 IMPROVING RELIABILITY OF NANOSATELLITE ELECTRONIC SYSTEMS

The methods for improving the reliability of nanosatellite systems, while facilitating the use of COTS components, have been grouped into three distinct categories. The three categories, which can be applied separately or used together, are:

- Analog component replacement – this category consists of redesigning circuits which rely on digital ICs to instead use general (usually analog) COTS components, which are more resistant to radiation effects.
- Use of FPGA logic – this category includes replacing all digital components with a reliable anti-fuse or flash-based Field Programmable Gate Array (FPGA), which allows the more complex functionality programmed in microcontrollers or other dedicated ICs to be retained, while allowing the logic design to utilize methods that are more resistant to SEE.
- Mixed approach – this category represents a mixture of the previous two categories, which includes simplifying the remaining parts of the system to use a mixed design consisting of general COTS components and the logic inside the FPGA.

2.1 Analog component replacement method

The main idea behind the analog component replacement method is that COTS ICs are inherently vulnerable to SEE, especially SEL – Single Event Latch-ups. To mitigate this vulnerability, each such component must be protected by an LCL – Latching Current Limiter, which imposes further constraints on the mass and size footprint of the system. Alternatively, the functionality of most COTS ICs can be replaced by a specifically designed analog circuit, using carefully selected components. In addition to being more resistant to radiation effects, this replacement circuit is usually less complex (as unneeded functionality is not implemented), which makes it easier to assess its actual reliability.

Figure 1: Example of replacing an integrated component with an analog circuit

The critical part of this method is the selection of components to use in the replacement analog circuit. It is crucial that each component is immune to SEL effects and is operational in the target radiation environment. For this purpose, the following guidelines should be followed:

- Passive components (resistors, inductors, capacitors, diodes) and discrete bipolar transistors can be presumed immune to a TID of up to 30 krad [4].
- P-type MOSFET transistors and NPN-type bipolar transistors should be preferred where possible, as these types are more latch-up tolerant than their complements [5]. N-type MOSFETs should be avoided: because they are produced on a P-type substrate, a possible thyristor configuration can become active in the presence of radiation.
- More complex COTS components can be used, provided that their radiation hardness is analyzed. Specifically, the most important characteristics are the existence of radiation testing results and the fabrication process used in their production.
- The Silicon-on-Insulator – SOI fabrication process should be preferred, as it provides an inherent protection against SEL effects. Examples of this include the Texas Instruments’ BiCOM or Analog Devices’ XFCB process. Some components produced with a classical bipolar process can also be suitable, provided that hardness against Enhanced Low Dose Rate Sensitivity – ELDRS can be assured.
- Finally, if a specific functionality cannot be effectively implemented otherwise, COTS parts of at least automotive grade, which can also be obtained in a radiation-hardened version, should be used. It can be presumed that the underlying silicon structure of the COTS part is the same as that of the radiation-hardened part, meaning that such parts are more resistant to radiation than other parts, for which such assurances cannot be made.

2.2 FPGA replacement method

Alternatively, some functionality can be too complex to be implemented entirely in analog form. In this case, a single anti-fuse or flash-based FPGA, coupled with some external protection circuitry (consisting of at least an LCL), can be used.

The primary advantage afforded by the FPGA is that the designer has complete control over the design of the logic functionality. This enables the use of fault mitigation techniques such as error coding of registers or the use of Triple Modular Redundancy – TMR on the logic circuit, which increases the reliability of the system and mitigates soft SEE errors: Single Event Upsets – SEU and Single Event Transients – SET.

For example, to protect the synthesized logic inside the FPGA from said radiation-induced effects, a modified TMR scheme, termed Temporally-Redundant TMR, can be employed [6]. This scheme is based around the use of three separately implemented copies of the logic system, which work in parallel. When implemented correctly (synchronous with a single clock, with correctly designed asynchronous paths), the behavior of each such instance is deterministic – meaning that if an error occurs in a single instance, the other two still function correctly. Voter circuits on the output pins then ensure that errors are not propagated outside the FPGA circuit, so that all single SEU, SET and SEL events inside the logic circuit can be properly mitigated.
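The output voting described above can be illustrated with a minimal Python sketch. It is a behavioral model only, not the TRISAT logic design: the wrapping counter standing in for the logic instance, and the names `majority_vote`, `step_counter` and `run_tmr`, are assumptions made for illustration.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant output words."""
    return (a & b) | (a & c) | (b & c)


def step_counter(state: int, width: int = 8) -> int:
    """One clock tick of a deterministic logic instance (here a wrapping counter)."""
    return (state + 1) % (1 << width)


def run_tmr(ticks: int, upset_tick: int, upset_instance: int, upset_mask: int) -> list[int]:
    """Run three identical instances in lockstep and vote their outputs."""
    states = [0, 0, 0]
    voted_outputs = []
    for tick in range(ticks):
        states = [step_counter(s) for s in states]
        if tick == upset_tick:
            # Hypothetical single-event upset: flip bits in one instance only.
            states[upset_instance] ^= upset_mask
        voted_outputs.append(majority_vote(*states))
    return voted_outputs
```

In this model the upset instance stays corrupted after the event, because only the outputs are voted; the voted output nevertheless remains correct as long as the other two instances agree.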

This method requires that all inputs and outputs (including clocks) are connected to 3 pins on different I/O banks on the FPGA (see [7] for details). The primary benefit when used this way is that it is trivial to synthesize the logic with COTS FPGA synthesis tools. Another benefit of this modified TMR scheme is that its logic overhead is lower than that of more robust TMR, which uses voting circuits on all registers in a design.

2.3 Mixed approach

The FPGA approach has one shortfall – protecting the FPGA itself is of utmost importance. The two previously described methods can be combined: analog circuitry is used for simple functionality, while the more complex parts are implemented as logic inside the FPGA. In this way, the FPGA itself, as well as other parts of the system, can also be protected against SEE by external protection circuitry. In addition, this approach can be applied to most other parts of any subsystem, including the EPS. As an example, two important circuits are used for this purpose.

The first is a custom design of an LCL. Its primary advantage is that most of its complex functionality (delay after trigger, restart delay, etc.) can be implemented as counters inside the FPGA, while the switching element and supporting circuitry can be implemented as rugged dedicated components. In case the FPGA itself must be protected, the complex delay functionality can additionally be simplified and implemented using discrete components.
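A minimal Python sketch of how the trip delay and restart delay could be realized as counters is given below. The class `LclTimer`, its fields and the tick-based timing are hypothetical and serve only to illustrate the counter-based approach; they are not taken from the TRISAT design.

```python
from dataclasses import dataclass


@dataclass
class LclTimer:
    trip_delay: int      # ticks the over-current flag must persist before switching off
    restart_delay: int   # ticks to stay off before re-enabling the output
    enabled: bool = True
    over_count: int = 0
    off_count: int = 0

    def tick(self, over_current: bool) -> bool:
        """Advance one clock tick; return whether the power switch is enabled."""
        if self.enabled:
            # Count consecutive over-current ticks; any clean tick resets the count.
            self.over_count = self.over_count + 1 if over_current else 0
            if self.over_count >= self.trip_delay:
                self.enabled = False
                self.over_count = 0
                self.off_count = 0
        else:
            # While off, the load draws no current, so only the restart counter runs.
            self.off_count += 1
            if self.off_count >= self.restart_delay:
                self.enabled = True
        return self.enabled
```

With `trip_delay=3`, a two-tick current spike is ignored, while three consecutive over-current ticks switch the output off until the restart delay has elapsed.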

The other circuit is a carefully designed analog watchdog timer, which is used to reset the whole FPGA whenever a Single Event Functional Interrupt – SEFI occurs.

3 NANOSATELLITE ELECTRICAL POWER SYSTEM CASE STUDY

The methods described in the previous section were used in the design of the Electrical Power System (EPS) of the TRISAT nanosatellite.
For the purpose of explaining how the methods can be effectively applied, we have categorized the functionality of the EPS into four important sections: power generation, power storage, power distribution and system control. Power generation and conditioning consists of circuits which regulate the power obtained from the solar panels and condition it so that the battery pack is charged safely – this includes using Maximum Power Point Tracking – MPPT algorithms and undervoltage/over-current protections. In the next step, battery management is responsible for safe discharging of the battery pack and also performs the function of battery balancing. The power distribution part is responsible for switching power to individual subsystems on-board the satellite, including over-current protection and voltage monitoring. Finally, the system control part is responsible for monitoring important parameters of the EPS, performing local FDIR, controlling power distribution and reporting this data as telemetry.

3.1 Power generation and conditioning

By careful component selection and the use of advanced analog circuits, it was determined that it is possible to apply the first step of the approach to the power generation section of an EPS. For this purpose, an Analog Maximum Power Point Tracking (AMPPT) circuit, which includes an overvoltage protection circuit, was developed.
The primary operating principle of the AMPPT circuit relies on a modified perturb and observe method, where a current monitor is used to measure the output current of the AMPPT circuit. This signal is then differentiated using a differentiator circuit, discretized using a comparator and fed into a direction logic circuit. The output of the logic circuit is then integrated and compared with a triangle wave, resulting in a PWM signal, which is used to drive the step-down converter. Essentially, the direction logic circuit determines in which direction the parameters of the switching regulator are changed. When the derivative of the output current becomes negative, the direction is reversed. The direction is also reversed when the derivative remains negative for too long. To increase the tracking accuracy of the circuit, this delay period decreases with increasing output current. For a more complete explanation of the circuit refer to [8]. The proposed circuit achieves an 89% power conversion efficiency and a 96% tracking accuracy.
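The direction logic can be illustrated with a discrete-time Python sketch. It is a behavioral approximation of the analog loop under stated assumptions: the parabolic `output_current` curve with a peak at a duty cycle of 0.6, the step size and the fixed `timeout` are all hypothetical, and the current-dependent delay of the actual circuit is not modeled.

```python
def output_current(duty: float) -> float:
    """Hypothetical converter output current with a single peak at duty = 0.6."""
    return 1.0 - (duty - 0.6) ** 2


def track(duty: float, steps: int, step_size: float = 0.02, timeout: int = 5) -> float:
    """Perturb-and-observe loop that reverses direction on a falling output current."""
    direction = 1                      # +1 raises the duty cycle, -1 lowers it
    prev_current = output_current(duty)
    was_negative = False
    negative_ticks = 0
    for _ in range(steps):
        # Integrator: the duty cycle ramps in the current direction.
        duty = min(max(duty + direction * step_size, 0.0), 1.0)
        current = output_current(duty)
        # Differentiator followed by a comparator: only the sign is kept.
        is_negative = current < prev_current
        negative_ticks = negative_ticks + 1 if is_negative else 0
        # Reverse when the derivative turns negative, or stays negative too long.
        if is_negative and (not was_negative or negative_ticks > timeout):
            direction = -direction
            negative_ticks = 0
        was_negative = is_negative
        prev_current = current
    return duty
```

Starting from either side of the peak, the duty cycle in this model climbs toward the maximum and then oscillates within a few step sizes of it, which is the expected behavior of a perturb and observe tracker.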

The presented AMPPT circuit fulfills most of the primary functions of the power generation part of an EPS: management of power transfer from the solar panels to the battery pack and protection of the battery. The only functionality it does not possess is over-current charging protection. This is because LiFePO4 battery cells are used on-board the TRISAT satellite. Their high maximum charging and discharging current is their primary advantage and makes them suitable and robust for use on satellite missions. To determine the suitability of LiFePO4 batteries for use on space missions, a characterization in a vacuum chamber was performed, where it was found that the battery capacity remains within the specified range during extensive power cycling.

![Graph showing LiFePO4 extended cycling, 20°C](image)

**Figure 10:** Capacity in regards to charge/discharge cycle for LiFePO4 batteries

As the LiFePO4 batteries will be charged and discharged periodically for the whole duration of the mission, a battery balancing scheme also needs to be present. A specific problem with battery balancing is that it is difficult to perform active battery balancing without complex control algorithms, which are unsuitable for implementation as analog circuits. For this reason, a balancing scheme with a simple control algorithm was identified (for an overview of alternative battery balancing methods, refer to [9]). This battery balancing approach, named Transformer Coupled Charge Sharing (TCCS), utilizes a pair of FET transistors coupled with an inductive transformer for each battery cell being balanced. Its primary advantage is that it can be controlled with a constant duty cycle PWM signal.

![Diagram of Transformer Coupled Charge Sharing battery balancing](image)

**Figure 11:** Transformer Coupled Charge Sharing battery balancing

Using this battery balancing approach, an efficiency of 84% was achieved.

![Graph showing balancing of 3 LiFePO4 battery cells using the TCCS balancer](image)

**Figure 12:** Balancing of 3 LiFePO4 battery cells using the TCCS balancer

3.3 Power distribution

Because the power distribution system must be controlled through digital interfaces, it cannot be designed as a purely analog circuit. Instead, the mixed design approach was used to make it more robust. The power distribution switches are implemented as dedicated analog circuits, similar to the ones used for LCLs, while the telemetry monitoring functionality (which also performs the over-current shutdown action) is implemented as logic inside the FPGA.

![Power Distribution LCL with Delta-Sigma ADCs](image)

Figure 13: Power Distribution LCL with Delta-Sigma ADCs

The FPGA functionality includes a delta-sigma digital backend, coupled with a comparison circuit for shutting down the power output when an over-current condition occurs. Each ADC only requires a single external comparator for proper functionality. As the over-current protection is implemented inside the FPGA, its parameters remain highly stable with temperature variations – turn off time (the time it takes the circuit to turn off the power output after the current is limited by the analog circuit) is on the order of 10 ms, while the (settable) trip current varies by less than 5% across the whole temperature range.
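The principle of a delta-sigma converter built around a single external comparator can be sketched in Python as follows. The topology assumed here (a first-order loop in which an FPGA pin drives an RC feedback node that the comparator compares against the sensed voltage) and all values such as `v_ref`, `alpha`, `window` and `trip_fraction` are assumptions for illustration, not the TRISAT implementation.

```python
def ones_density(v_in: float, v_ref: float = 3.3, alpha: float = 0.05, window: int = 1024) -> float:
    """Fraction of clock ticks on which the comparator output is high."""
    v_fb = 0.0   # voltage on the RC feedback node (analog side)
    ones = 0     # counter in the digital back end
    for _ in range(window):
        bit = 1 if v_in > v_fb else 0          # the single external comparator
        v_fb += alpha * (bit * v_ref - v_fb)   # feedback pin charges or discharges the RC node
        ones += bit
    return ones / window


def over_current(v_sense: float, trip_fraction: float) -> bool:
    """Digital comparison that would command the power switch off."""
    return ones_density(v_sense) > trip_fraction
```

In this model the density of ones approximates `v_in / v_ref`, so the trip level is set purely by a digital constant, which is consistent with the stable trip current reported above.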

![Power Distribution LCL trip current and resistance with regards to temperature](image)

Figure 14: Power Distribution LCL trip current and resistance with regards to temperature

3.4 System control

The system control part (which includes interfaces to the communication bus and local Fault Detection, Isolation and Recovery (FDIR)) is implemented inside the FPGA, where the additional design techniques presented in the previous section were used to increase reliability. The flexibility that is introduced into the EPS by using an FPGA circuit not only increases the reliability of the EPS itself, but can be used to protect other subsystems from various faults as well (by implementing, for example, a watchdog timer that power cycles subsystems if they stop responding). In addition to the FPGA design techniques used to improve reliability, two independent oscillators can be used to provide the clock for all the logic inside the FPGA. This has the advantage of preventing complete loss of functionality in case a clock oscillator becomes damaged. Instead, the watchdog timer performs a reset, and the system uses the other, still working oscillator and continues with its intended operation.
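The subsystem watchdog mentioned above can be sketched as a simple counter. The class `SubsystemWatchdog` and its tick-based timeout are hypothetical names and parameters chosen for illustration only.

```python
class SubsystemWatchdog:
    """Requests a power cycle when a subsystem stops sending heartbeats."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.ticks_since_heartbeat = 0
        self.power_cycles = 0

    def heartbeat(self) -> None:
        """Called whenever the monitored subsystem responds."""
        self.ticks_since_heartbeat = 0

    def tick(self) -> bool:
        """Advance one tick; return True when a power cycle is requested."""
        self.ticks_since_heartbeat += 1
        if self.ticks_since_heartbeat >= self.timeout_ticks:
            # Restart the timeout so the subsystem has time to boot again.
            self.ticks_since_heartbeat = 0
            self.power_cycles += 1
            return True
        return False
```

A responsive subsystem keeps resetting the counter, so a power cycle is only requested after `timeout_ticks` consecutive ticks without a heartbeat.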

![Figure 15: Redundant clock distribution implementation](image)

To facilitate this, the two oscillators are each powered by a dedicated initially-on LCL. The FPGA then uses a pair of shift registers with clock domain crossing functionality to perform the clock selection and logic reset functionality. The oscillator which is first to stabilize is then used until the EPS FPGA is power cycled, at which point the arbitration process begins anew. The redundant clock is powered down after the arbitration process is finished to conserve energy.
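The arbitration can be sketched in Python as a race between two shift registers. This is a simplified behavioral model: the sampling of clock activity into boolean sequences, the register `depth` and the tie-breaking rule are assumptions, and the clock domain crossing logic of the real design is not modeled.

```python
from typing import Optional, Sequence


def arbitrate(edges_a: Sequence[bool], edges_b: Sequence[bool], depth: int = 4) -> Optional[str]:
    """Return 'A' or 'B' for the first oscillator to show `depth` consecutive active intervals."""
    full = (1 << depth) - 1
    shift_a = shift_b = 0
    for edge_a, edge_b in zip(edges_a, edges_b):
        # Each register shifts in a 1 when its oscillator toggled in this interval, else a 0.
        shift_a = ((shift_a << 1) | int(edge_a)) & full
        shift_b = ((shift_b << 1) | int(edge_b)) & full
        if shift_a == full:
            return "A"   # ties go to A in this sketch
        if shift_b == full:
            return "B"
    return None  # neither oscillator stabilized within the observed window
```

A dead or glitching oscillator never fills its register, so the other one is selected; once a winner is latched, the losing oscillator's LCL could be switched off as described above.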

![Figure 16: Redundant clock distribution implementation](image)

4 CONCLUSION

It can be expected that with the ever increasing use of the nanosatellite platform some missions will be undertaken where the reliability of the subsystems used will need to be higher than what is currently available off-the-shelf. For this purpose, this paper presented methods which can be used to design more robust nanosatellite electronic systems. Specifically, implementing simpler aspects of the system as purely analog circuits, which are then managed by an FPGA designed to be resistant to SEE, should prove to be especially resistant to various faults typically encountered in nanosatellite systems. The FPGA itself can be protected by using a simple TMR-based coding scheme, which can be implemented in most FPGA development environments that do not support TMR coding by default.

To showcase these techniques, the more relevant parts of the EPS designed for the TRISAT mission were presented. These include a purely analog MPPT implementation, an unmanaged battery balancing approach and the use of a novel battery technology, which synergizes especially well with the aforementioned systems. Additionally, with the addition of a robust FPGA, many functions of an EPS can be implemented in a more efficient manner: LCLs can be simplified by tightly integrating them with the FPGA logic, the power distribution can be managed more thoroughly by integrating delta-sigma ADCs with the LCLs, and the reliability of the FPGA circuit itself can be improved by using two separate clock sources and an external analog watchdog circuit.

Finally, while the methods can be selectively applied to parts of an existing nanosatellite EPS design, combined together, they can be used to eliminate all single points of failure of a nanosatellite power system.

5 REFERENCES