In contrast to my earlier posts about the project, this post is less technical and perhaps a bit more “dry” in nature: today I talk about the research perspective on the project.
Research questions
When I proposed my project to Idaflieg and DLR, I wrote down three primary questions that I wanted to answer:
- Is having continuous feedback of angle of attack perceived to have a positive/negative effect on the safe execution of the flight or no effect at all?
- Is the CL-approximation of angle of attack perceived to be helpful? In which maneuvers is it/is it not?
- Is the manner in which the device provides feedback perceived to be appropriate?
- Optionally: how does this improve over time if we can test multiple days? This didn’t happen.
Notice that these are subjective questions. They deal with the pilot’s perception, not with facts. This was a deliberate choice. Although one can probably draw more significant conclusions from facts, I figured that measuring those facts would require some sort of A/B testing. Unless we deliberately introduced potentially unsafe situations, the results would likely be very ambiguous. So, I decided to deal with the pilot’s perception for now…
Method of evaluation
It quickly became obvious that I would use some kind of questionnaire. Thanks to my wife I came across Expectation Confirmation Theory, and found a related paper. That paper dealt with the (future) evaluation of a user interface, which seemed related enough to serve as a basis.
A questionnaire was created that reflected the categories in the mentioned paper:
- Perceived Usefulness (PU), the user’s perceptions of the expected benefits of using Hobbes.
- Perceived Ease of Use (PEoU), the ease and convenience of using Hobbes.
- Perceived Performance (PP), how well and clearly Hobbes warns of stalls.
- Expectation (E), how Hobbes relates to the user’s expectations.
- Confirmation (C), how Hobbes relates to the user’s needs.
- Satisfaction (S), whether users are content with Hobbes.
- Continuance Intention (CI), whether users would like to fly more with it and recommend it.
- Interface Quality (IQ), detailed questions about the device’s audio feedback (interval, tone frequency, volume). Mainly intended for the fourth research question.
All questions are positively phrased and use a Likert scale. Each category contains at least two questions that try to measure the same thing using different wording. For example:
- “The device works better than expected”, and
- “The device fits my expectations”.
This is done in an attempt to find inconsistencies. If the answers to the above two questions are radically different (more than 2 points apart), this might be a sign of inconsistency or misinterpretation of one of the questions. I ignore those answers.
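The consistency check described above can be sketched in a few lines. This is an illustrative sketch, not the actual analysis code; the function name and the 2-point threshold match the rule stated in the text:

```python
def is_inconsistent(score_a: int, score_b: int, max_gap: int = 2) -> bool:
    """Two paired Likert answers are inconsistent if they differ by
    more than max_gap points; such pairs are discarded."""
    return abs(score_a - score_b) > max_gap

# "Works better than expected" = 5, "fits my expectations" = 2:
# a difference of 3 exceeds the 2-point threshold, so the pair is discarded.
print(is_inconsistent(5, 2))
```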
For each category and each flight, I calculate the average Likert score, so I end up with one score per category, per flight.
Next to averages I look at the correlation between the various categories, in an effort to find out more about the perception of the system and explain anomalies.
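The two analysis steps above — per-flight category averages and pairwise correlations between categories — can be sketched as follows. The data values, flight names, and category subset here are invented for illustration; the real questionnaire has eight categories:

```python
from math import sqrt
from statistics import mean

# answers[flight][category] -> list of Likert scores (invented example data)
answers = {
    "flight_1": {"PU": [4, 5], "C": [3, 4]},
    "flight_2": {"PU": [3, 3], "C": [4, 5]},
    "flight_3": {"PU": [5, 4], "C": [2, 3]},
}

# One average score per category, per flight
averages = {flight: {cat: mean(scores) for cat, scores in cats.items()}
            for flight, cats in answers.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Correlate two categories across all flights
pu = [averages[f]["PU"] for f in answers]
c = [averages[f]["C"] for f in answers]
print(round(pearson(pu, c), 2))  # → -0.87 for this invented data
```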
I leave room on the questionnaire to provide additional feedback, and I attempt to talk to pilots after their flight. I received video footage of three flights.
Results
In total, 8 evaluated flights took place with Hobbes. Flights were performed by student pilots, experienced cross-country pilots and a test pilot. The system was evaluated in both normal and inverted flight. Here are the results for each category. The rectangles reflect the range from minimum to maximum average value. The black bar reflects the average value.
The correlations look like this:
Three correlations stand out for me. I’ve tried to come up with an explanation for each:
1. The 0.00 correlation between Perceived Performance and Confirmation. This correlation illustrates the relationship between how well and clearly Hobbes warns of stalls versus how Hobbes relates to the user’s needs. This signals that the pilots that evaluated the system feel no need to fly with an AudioAoA system, even though my system works well enough.
2. The 0.28 correlation between Expectation and Confirmation. This correlation illustrates the relationship between how Hobbes relates to the user’s expectations versus how Hobbes relates to the user’s needs. This signals that although Hobbes works better than expected, it does not completely do what the pilots feel they need.
3. The 0.32 correlation between Confirmation and Satisfaction. This correlation illustrates the relationship between how Hobbes relates to the user’s needs versus whether users are content with Hobbes. This signals the same thing as number 2: it works, but it doesn’t really do what pilots feel they need.
Remarks and video observations
A few remarks show clear room for improvement, but also the added value of good test pilots who not only do as asked, but also investigate on their own:
- “We tried stalling dynamically, which almost doesn’t get picked up”.
- “Sometimes the sound cuts out once you are in a full stall. I tried different calibrations, and a calibration closer to stall speed worked better than the one close to the first mushiness of the controls”.
Other remarks show issues with integration:
- “For normal flight it gets a bit annoying as it also beeps in non-critical situations.”
- “Sometimes it was hard to concentrate on the device and also the variometer, such that just one got my attention.”
- “Sound is the first thing a pilot ignores in high-stress situations. Perhaps an added LED-indication would help.”
Conclusions & personal takeaways
With the above info, I believe we can provide credible answers to the original three research questions:
- Is having continuous feedback of angle of attack perceived to have a positive/negative effect on the safe execution of the flight or no effect at all?
  A positive effect: Perceived Usefulness averages 3.6 and never drops below 3.
- Is the CL-approximation of angle of attack perceived to be helpful? In which maneuvers is it/is it not?
  It is helpful, except for dynamic stalls in flapped gliders. It is unclear whether the dynamic-stall behavior is a result of the input data or of my arithmetic.
- Is the manner in which the device provides feedback perceived to be appropriate?
  Yes… and no. In general the feedback is positive, with an Interface Quality average of almost 4. However, the correlations indicate that some things are not yet satisfactory. Pilots indicate that the device is not loud enough. It should also probably not beep continuously if we’re to use it as a stall-warning device. Pilot feedback shows that integration with other cockpit audio sources is a complex problem, which was not solved satisfactorily.
Recommendations
- “Do or die” audio seems more appropriate than always giving feedback, although the gradient up to stall is perceived positively by pilots. Consider limiting feedback to a CL-range close to stall.
- Use a pilot-facing speaker and an LED to address the volume and noticeability issues.
- Spread the evaluation out over a longer period, with multiple flights per pilot, so pilots can get used to the system and more feedback can be gathered. This would also open up the opportunity to tweak parameters and re-evaluate, as I initially intended.
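The first recommendation above — limiting feedback to a CL-range close to stall — could be sketched as a simple gate on the CL estimate. The threshold values and names here are invented for illustration and would need to come from calibration:

```python
# Hypothetical sketch: only emit audio when the estimated CL is within a
# band below the calibrated stall CL. Both constants are assumptions.
CL_STALL = 1.4        # assumed stall lift coefficient from calibration
WARN_FRACTION = 0.85  # start warning at 85% of the stall CL

def should_beep(cl_estimate: float) -> bool:
    """Gate the audio feedback to a CL range close to stall,
    so the device stays silent in non-critical flight."""
    return cl_estimate >= WARN_FRACTION * CL_STALL

print(should_beep(1.0))   # well below the band -> silent
print(should_beep(1.25))  # inside the warning band -> beep
```

Within the band, the existing gradient up to stall (which pilots perceived positively) could still drive the beep interval as before.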
Outlook
While the experiment as such is finished, there are some things to look out for in the future:
- One of the Idaflieg pilots will continue to fly with Hobbes in his H101 Salto aerobatic glider. This allows for more iterations to be tested.
- Early tests with a flap-independent algorithm were very promising. This method requires knowledge of the glider’s glide polar, which creates the need to know the inverted polars of the applicable gliders. These have not yet been determined, but determining them will make for exciting test flights.
- The experiments clearly show that integration in glider cockpits might have great benefits for flight safety. This opens up a new and exciting area of research.