Building up a new standard of eye-tracking API

oleg

This discussion is about turning eye-tracking devices into something similar to the mouse in terms of ease and transparency in end-user software development. The expectation is that proposing an “everyone-agrees-and-supports” standard way to control devices and access their data will help achieve this goal.

oleg
Vision

“Damn! Where is that… hey you! no… god…” He left the game, nervously hitting the mouse buttons. “Restart it!” No effect… He tried every action he knew that might recover a device from an error: pulled the plug and the USB wire out and back in, then reinstalled the drivers… The eye tracker was dead. “God!” – today his score was better than usual and he had almost reached the top 10; just another 20 minutes and he would have made it. Disappointed, he stood up, paced a few circles around the room… then sat down. It was clear that the game was over for today. “Well, not a big deal…” he muttered, bracing himself and opening CNet. There were about 25 devices in the “Eye trackers” tab, and he selected one of the most popular, one that automatically detects its position relative to the screen and in space. “Half the price of mine” – his eyes were smiling already as his mood changed rapidly. “…and twice as good” – he easily paid the 70 euros, anticipating his coming wins, as the device was noticeably more accurate than the dead one, with almost no restriction on movement as long as his face was turned to the screen.
A few days later he was attaching four tiny plastic boxes to the corners of his monitor. The system notified him that a new device had been successfully recognized, so there was no need to install anything from the disk he found in the package. “Well, well,” he was impatiently moving the mouse cursor around the screen, eager to start the game after the long pause. But the system popped up a full-screen window with an exciting ad for the new device. Within half a minute it reported a successful calibration, and he could finally dive into the virtual world.
Sure, the tracker was much better than the old one. The pointing was simply perfect. Actions were executed just as he decided to execute them. It was as if the machine had finally learnt how to listen to his thoughts. Soon he noticed he no longer made the movements he used to make to correct the tracker’s pointer… That day he got his best-ever score, leading the rating list…

oleg
Who may be interested in this discussion and its results?

- Researchers – right now.
- Commercial developers – some of them already today (mostly those who provide usability services), others later.
- Manufacturers – standardization is usually the winning strategy, as history shows (oh, right: VHS, CD and Blu-ray are counter-examples, but these are exceptions). However, market leaders, IMHO, often resist this kind of activity. On the other hand, since leaders usually develop the best products, the proposed standard should borrow a lot from their solutions. After all, this discussion is not against manufacturers, and not about drawing limits for them: rather, it is about asking them for an alternative (and standard) way of accessing and communicating with hardware, in addition to what they already provide.

oleg
Why do we need standardization?

From COGAIN Deliverable 2.1 “Survey of De-Facto Standards in Eye Tracking”, chapter 2.1 “A need for standards”:

“At present, choosing and using a gaze communication system usually implies commitment to a certain eye-tracking device from a certain manufacturer, with gaze tracker and operating software forming a tight partnership. Typically, both eye-tracking hardware and software from a specific manufacturer or supplier will operate based on in-house standards that are not compatible with other manufacturers' operating standards.
This 'closed system' state typically restricts any future extensions, upgrades or modifications of a gaze tracking system by the addition of third-party hardware or software by the end user, unless these additions are from the original manufacturer. This greatly restricts choice for gaze communication system users in a discriminatory way not experienced by other users. For example, for other input devices such as a mouse, replacing the device with a newer version does not restrict the applications that can be operated with the chosen mouse.
This existing situation is understandable for several reasons. To date, the main application area of gaze trackers has not been human-computer interaction: their development history originates from medical and psychological diagnostics and experiments. Consequently, gaze trackers are not traditionally designed for compatibility with other systems and applications. This is an unfortunate situation, as this lack of inter-system compatibility results in end users having only a limited range of choice of systems and applications. Hence, to address this situation and give users choice over mixing and matching differing systems to best suit their needs, there is clearly a need for some form of accepted de-facto set of standards that manufacturers and developers can comply with, either by adopting such a standard, or by including such a standard as part of their proprietary standards.”

oleg
Don’t we have it already?

Yes, we do. The ETU-Driver has been developed within COGAIN Network of Excellence WP2, and it is the first attempt to achieve compatibility in communication with various eye trackers. It supports six devices and provides three simulators for testing purposes. Its architecture is based on plug-in modules. It has been downloaded 250–300 times since it was developed (June 2006), but I am not aware of any software written with it outside of the University of Tampere, with a couple of exceptions.

oleg
What is wrong with ETU-Driver?

As often happens, the best solution to a problem comes only after a less optimal solution has already been developed :). Some of the shortcomings of ETU-Driver, or things to improve, are listed below:

- The time-stamping algorithm: it uses the timestamp reported by the eye tracker, which is typically the timestamp of the moment the video frame was captured. In many cases, the time needed to compute the gaze position and transfer the data to the client application may be too long to treat as “close to 0” (for example, ETU-Driver API-Converters use Windows messages to report new data to the ETU-Driver core). The absence of a method to measure the true delay of a gaze sample is usually a greater problem than the delay itself. It is important to note that not all eye-tracking systems are able to provide the value of the delay, so this problem is not ETU-Driver's alone (a sketch of one possible remedy follows after this list).

- ETU-Driver filters (those marked as active) may cause a noticeable delay in data transfer, as they block the execution of the ETU-Driver core until they are done processing the gaze event.

- The data protocol in ETU-Driver lacks a sample validity flag; the file that contains gaze samples (_s) also contains document X/Y offset values that belong in another file (_x rather than _s).

- XML file storage format could be better structured.

- The COM object does not support multiple clients.
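
A minimal sketch of the time-stamping point above (not part of ETU-Driver; all names here are hypothetical): a sample that carries both the device-reported capture time and a locally measured arrival time, so that the transfer delay at least becomes observable.

code:

  /// <summary>
  /// Hypothetical sample wrapper: pairs the tracker-reported capture time
  /// with the time the sample actually reached the client.
  /// </summary>
  public struct TimedGazeSample
  {
    /// <summary>Capture time of the video frame, as reported by the tracker (ms).</summary>
    public long DeviceTime;

    /// <summary>Arrival time at the client, taken from a local clock (ms).</summary>
    public long ArrivalTime;

    /// <summary>
    /// Rough delay estimate. Meaningful only if the device clock and the
    /// local clock are synchronized; otherwise it also absorbs the clock offset.
    /// </summary>
    public long EstimatedDelay
    {
      get { return this.ArrivalTime - this.DeviceTime; }
    }
  }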

Other ETU-Driver implementation problems may exist and may be pointed out by others as well. But there is another problem, of a different kind, that could spoil even a perfect ETU-Driver implementation: the data transfer protocol and the API-Converter specification were never discussed with other researchers, developers, and eye-tracking device manufacturers. Because manufacturers have completely ignored ETU-Driver, its author is forced to develop the device-support modules himself, while it should be the other way round – manufacturers should provide such modules for their hardware. Moreover, ETU-Driver API-Converters will become useless once original equipment manufacturers ship libraries (or SDKs) that expose the same standard interfaces recognizable by ETU-Driver.

oleg
Searching for a better solution

As I suggested earlier, the best way to reach the desired compatibility in hardware and a single API for development is to open up a discussion. I suggest using the COGAIN public forum as the platform for this activity. The discussion may be split into several topics. Some of them may be started immediately, some only after the members of the discussion group agree on solutions to the problems currently under discussion. Here are some topics that came to my mind:

- Data structure of a gaze sample: what do we need to know about it?

- Device properties: which constant and adjustable device characteristics do we need?

- API type and structure: defining standard interfaces and services.
a. API type;
b. Basic interfaces/services;
c. Auxiliary interfaces/services: researcher approach;
d. Auxiliary interfaces/services: developer approach;

- ETUDE standard support by manufacturers: what do we need to do to make it real?

oleg
Final word

I have great hope that we – researchers, developers, and users – will finally get what we deserve (don’t we? :)) – a simple and easy way to use eye-tracking devices, as easy as any other device that has already left the labs and become an everyday gadget for hundreds of people (yes, they are mostly disabled users, but the proportion will soon change, I believe, especially if this project succeeds).

The discussion group will be named the “ETAPIS (Eye-Tracking Application Programming Interface Standardization) Group”.

Thiago Chaves de Oliveira Horta
Re: Building up a new standard of eye-tracking API

Hi,

I'm highly interested in helping set standards for eye tracking. Here are my first thoughts on the discussion:

- It is important that, when implementing a standard eye-tracking API, we rely on multi-platform and open-source technologies rather than platform-specific libraries. This would save effort in reimplementation (should the standard change) while reaching the maximum number of potential users and programmers.

- Such a standard should work in different layers:
* The lowest layer is merely concerned with getting raw data from the device and returning something like "The user is looking at the coordinates X,Y of the screen" or "This device's accuracy is 2.0 degrees".
* Higher layers should provide messages that convey the user's intention: "activate that drop-down menu", "select the third entry in that drop-down menu". Mouse emulation, though popular, is hardly sufficient for widgets that can be left-clicked, right-clicked, dragged, react to keypresses, etc. At higher levels it is better to have messages with more meaning.
* Higher layers should also proactively activate extra tools to facilitate the user's interaction, such as: "A text input area was selected, let's automatically activate the system's default gaze text-input software".
* Higher layers should also be able to disambiguate the user's intention when necessary. For example: "The user is trying to push either button 1 or button 2, but I'm not sure which. Let's call disambiguation software to help us decide what the user wants. Maybe we'll zoom in on this area automatically or push the buttons away from each other so it is easier to tell." (A code sketch of such intent-level messages follows below.)

- There's great potential for creating an intensely active market for eye trackers if it becomes easy to develop for them:
* Webcam providers could see an increase in webcam sales thanks to lower-accuracy software eye trackers such as OpenGazer.
* This could create a more active market for head-mounted cameras used for eye tracking. Current webcam producers could enter this market with little effort.
* Producers of more specialized eye trackers could see an increase in consumer interest in the area. Plus, they would have the advantage of providing higher-accuracy eye trackers, and even eye trackers with their own processor (rather than leeching processor time from the user's computer).
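
A rough code sketch of the layering idea above (all names invented here, not an existing API): higher-layer messages carry meaning and intention, not just coordinates.

code:

  /// <summary>Hypothetical message kinds for the layers sketched above.</summary>
  public enum IntentKind
  {
    RawGaze,           // lowest layer: "the user is looking at X,Y"
    WidgetActivate,    // "activate that drop-down menu"
    TextInputFocused,  // a text area was selected: launch the gaze text-input tool
    AmbiguousTarget    // "button 1 or button 2?" - call a disambiguation tool
  }

  /// <summary>One message as a higher layer might deliver it.</summary>
  public struct IntentMessage
  {
    public IntentKind Kind;
    public string TargetWidgetId;      // set for widget-level messages
    public string[] CandidateWidgets;  // set only when Kind == AmbiguousTarget
  }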

oleg
Re: Building up a new standard of eye-tracking API

thiago wrote:

- It is important that, when implementing a standard eye-tracking API, we rely on multi-platform and open-source technologies rather than platform-specific libraries. This would save effort in reimplementation (should the standard change) while reaching the maximum number of potential users and programmers.

Right, this strategy would be very handy. However, we should take into account that our modules have to be the quickest possible at transferring data from hardware to the client application. If cross-platform solutions turn out to be too slow, we will have to drop this idea.

thiago wrote:

- Such a standard should work in different layers:
* The lowest layer....
* Higher layers ...

Sure. The lowest layer will be the only one required from manufacturers. All possible higher levels will be built on top of it, say, by you and me :) (like the ETU-Driver core, which implements several device-independent features such as saving data to files).

thiago wrote:

- There's great potential for creating an intensely active market for eye trackers if it becomes easy to develop for them:

+1. I hope the market leaders understand this as well, and will support this discussion not in words only (although words, i.e. solutions and suggestions, are highly important too).

Tersia
Re: Building up a new standard of eye-tracking API

Hi Oleg,

As a researcher and developer of applications that use gaze as a PC input, I strongly agree that we need a standard ET API. So how do we go about achieving this?

Adrian Voßkühler
Re: Building up a new standard of eye-tracking API

Hello,
Thanks, Oleg, for taking the initiative! I think this discussion is overdue, and it will provide us with an easy-to-use, fast and reliable API sketch.

During the development of OGAMA we needed to create a very basic API interface, similar to the ETU-Driver definition, which I will show here for discussion as the lowest level of implementation that each hardware manufacturer should be able to provide. It's written in C#.

code:

  /// <summary>
  /// This interface introduces a possibility to add new tracking hardware
  /// to the <see cref="RecordModule"/>.
  /// </summary>
  /// <remarks>For an example of how to implement this interface, have a look
  /// at the existing implementations: MouseOnlyInterface
  /// for the tracking of mouse data, Tobii.TobiiInterface for
  /// tracking with a Tobii (www.tobii.com) system, and AleaInterface for
  /// tracking with an Alea Technologies (www.alea-technologies.com) system.
  /// Please also refer to the RecordModule source to add a
  /// new tracker to the user interface. Each tracker should have its own
  /// TabPage in the RecordModule.</remarks>
  public interface ITracker
  {
    /// <summary>
    /// An implementation of this event should
    /// send the new sampling data at each sampling time interval.
    /// </summary>
    event GazeDataChangedEventHandler GazeDataChanged;

    /// <summary>
    /// An implementation of this method should do all
    /// connection routines for the specific hardware, so that the
    /// system is ready for calibration.
    /// </summary>
    /// <returns><strong>True</strong> if successfully connected to the tracker,
    /// otherwise <strong>false</strong>.</returns>
    bool Connect();

    /// <summary>
    /// An implementation of this method should do the calibration
    /// for the specific hardware, so that the
    /// system is ready for recording.
    /// </summary>
    /// <returns><strong>True</strong> if successfully calibrated,
    /// otherwise <strong>false</strong>.</returns>
    bool Calibrate();

    /// <summary>
    /// An implementation of this method should start the recording
    /// for the specific hardware, so that the
    /// system sends <see cref="GazeDataChanged"/> events.
    /// </summary>
    void Record();

    /// <summary>
    /// An implementation of this method should do a clean up
    /// for the specific hardware, so that the
    /// system is ready for shut down.
    /// </summary>
    void CleanUp();

    /// <summary>
    /// An implementation of this method should supply
    /// the specific hardware systems current time, so that the
    /// recorder could retrieve system times.
    /// </summary>
    /// <returns>A <see cref="Int64"/> with the current time in
    /// milliseconds.</returns>
    long GetCurrentTime();

    /// <summary>
    /// An implementation of this method should show a hardware
    /// system specific dialog to change its settings like
    /// sampling rate or connection properties. It should also
    /// provide an XML serialization possibility for the settings,
    /// so that the user can store and back up system settings in
    /// a separate file. These settings should be implemented in
    /// a separate class and are stored in a special place of
    /// OGAMA's directory structure.
    /// </summary>
    /// <remarks>Please have a look at the existing implementation
    /// of the Tobii system in the namespace Tobii.</remarks>
    void ChangeSettings();
  }

The messages sent contain the GazeData structure, which provides the basic input values for standard screen-based devices.

code:

  /// <summary>
  /// Gaze data structure with fields that match the database columns
  /// that correspond to gaze data. It's a subset of
  /// <see cref="Modules.ImportExport.RawData"/>
  /// </summary>
  public struct GazeData
  {
    /// <summary>
    /// Time in milliseconds from the start of the recording.
    /// </summary>
    public long Time;

    /// <summary>
    /// x-diameter of pupil
    /// </summary>
    public float? PupilDiaX;

    /// <summary>
    /// y-diameter of pupil
    /// </summary>
    public float? PupilDiaY;

    /// <summary>
    /// x-coordinate of gaze position in values ranging between 0..1
    /// </summary>
    /// <remarks>0 means left margin of presentation screen,
    /// 1 means right margin of presentation screen.</remarks>
    public float? GazePosX;

    /// <summary>
    /// y-coordinate of gaze position in values ranging between 0..1
    /// </summary>
    /// <remarks>0 means top margin of presentation screen,
    /// 1 means bottom margin of presentation screen.</remarks>
    public float? GazePosY;
  }
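
For readers unfamiliar with OGAMA, here is a minimal sketch of how a recording client might drive the interface above. The event-args type and its property name are assumed for illustration; the exact OGAMA signatures are not shown in this post.

code:

  public class RecordingClient
  {
    public void Run(ITracker tracker)
    {
      tracker.GazeDataChanged += this.OnGazeData;

      // Connect and calibrate before any data can flow.
      if (!tracker.Connect() || !tracker.Calibrate())
      {
        return; // hardware not ready
      }

      tracker.Record();  // from now on GazeDataChanged events arrive
      // ... recording session runs ...
      tracker.CleanUp(); // prepare the hardware for shutdown
    }

    private void OnGazeData(object sender, GazeDataChangedEventArgs e)
    {
      GazeData sample = e.Gazedata; // assumed event-args property
      if (sample.GazePosX.HasValue && sample.GazePosY.HasValue)
      {
        // Map the normalized 0..1 coordinates to screen pixels and store them.
      }
    }
  }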

This is the current implementation, and it serves its purpose well, but for generalization into a standard API there are drawbacks:

The problem is its restriction to 2D; should we replace the X,Y values with degrees and add a DegreeToXY method to the ITracker interface?
It's missing a confidence value and is not suitable for devices that track both eyes.

I'll try to post a first GazeData structure that may serve as a basis for further discussion in the next post.

Adrian Voßkühler
Freie Universität Berlin
http://didaktik.physik.fu-berlin.de/projekte/ogama

Adrian Voßkühler
Re: Building up a new standard of eye-tracking API

Hi all,

This post is intended as a first sketch of a gaze data structure that should be flexible enough for use in a wide range of applications.

The '?' indicates that this value is allowed to be null.

code:

  public struct GazeData
  {
    public long Time;
    public Eye Eye;
    public Gaze? LeftGaze;
    public Gaze? RightGaze;
  }

It contains several substructures as defined below:

code:

  public enum Eye
  {
    None,
    Left,
    Right,
    Both,
  }

  public enum Validity
  {
    None,
    Poor,
    Uncertain,
    Good,
  }

  public enum Unit
  {
    None,
    Millimeter,
    Pixel,
    Degree,
  }

  public enum PupilDataType
  {
    None,
    XYDiameters,
    EllipseDiameters,
  }

  public struct Pupil
  {
    public Unit DiameterUnit;
    public PupilDataType DiameterType;
    public double DiameterOne;
    public double DiameterTwo;
  }

  public struct Gaze
  {
    public Pupil GazePupil;
    public Validity GazeValidity;
    public Unit GazePosUnit;
    public double? GazePosOne;
    public double? GazePosTwo;
  }
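
A quick usage sketch (values invented for illustration): how a tracker might populate this structure for a sample where only the left eye was found. The nullable fields are exactly what makes the lost right eye representable.

code:

  GazeData sample = new GazeData();
  sample.Time = 1234567;   // ms
  sample.Eye = Eye.Left;   // only the left eye was tracked in this frame

  sample.LeftGaze = new Gaze
  {
    GazeValidity = Validity.Good,
    GazePosUnit = Unit.Pixel,
    GazePosOne = 512.3,    // x
    GazePosTwo = 384.7,    // y
    GazePupil = new Pupil
    {
      DiameterUnit = Unit.Millimeter,
      DiameterType = PupilDataType.XYDiameters,
      DiameterOne = 3.1,
      DiameterTwo = 3.0
    }
  };

  sample.RightGaze = null; // right eye lost: the '?' allows null here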

So what do you think?

Best wishes,
Adrian Voßkühler
Freie Universität Berlin
http://didaktik.physik.fu-berlin.de/projekte/ogama/
adrian.vosskuehler@fu-berlin.de

oleg
Re: Building up a new standard of eye-tracking API

tersia wrote:

Hi Oleg,

As a researcher and developer of applications that use gaze as a PC input, I strongly agree that we need a standard ET API. So how do we go about achieving this?

We will start with a discussion of the API type, and then continue with suggestions for its implementation. Just follow the discussion :)

oleg
Re: Building up a new standard of eye-tracking API

adrian wrote:

Hello,

Thanks, Oleg, for taking the initiative! I think this discussion is overdue, and it will provide us with an easy-to-use, fast and reliable API sketch.

Adrian, thanks for the code. I am going to separate this discussion into several threads, so I'll soon move your posts to the threads best suited to their topics.

Actually, before we suggest anything here, I will present some "state of the art", so that we can all see what has been done already and what we can refer to.

Lars Hildebrandt
Re: Building up a new standard of eye-tracking API

Hi there,

I strongly encourage you not to think just about technologies like COM / low-level API / platform dependency etc.
It seems very important to me to introduce layers of abstraction for data access and eye tracker control as well. Strict requirements engineering is necessary so as not to get lost in too much functionality, while still supporting power users who want maximum control.
Even in this short thread we have a user who comes from the gaze interaction side. He wants data like desktop object selection from the eye tracker. The gaze analysis people want just gaze data (monocular or binocular); some researchers rely on the fixation/saccade algorithms that hardware manufacturers provide, while others like to implement their own filters and event detection algorithms. The next one wants to work on a pupillometry application with pupil diameter only.
An API that provides a single layer of data-access complexity will either scare off users who just want to write a tiny eye chess application, or it will be insufficient for researchers who want everything, from head acceleration to pupil diameter and millisecond-precise binocular gaze data with confidence measurements.
Anyway, first you should collect the application fields you want to address. Can we assume that, let's say, researchers who usually write every analysis on their own in Matlab will not use the API to access the raw pupil position? I have seen researchers who don't even rely on the calibration of the eye tracker: they just accessed the raw pupil position in video pixels and implemented their own gaze mapping.
It sometimes helps not only to think about what you want, but also about what you don't want.
The focus is on gaze interaction, right? Is "the more functionality put in the eye tracker, the better"? This would exclude a couple of low-tech systems that do only gaze x/y. Eye trackers designed for the gaze interaction market (Tobii/Alea) already provide a couple of convenient functions, like snapping to desktop elements, cumulative dwell on desktop elements, mouse click and zoom functions, and tracking status windows. Should all of this be part of the API as well, or should developers of gaze interaction applications do it?

Lars Hildebrandt

VP Development
www.alea-technologies.de

oleg
Re: Building up a new standard of eye-tracking API

Lars, you are absolutely right about the incompatible needs of various ET users and about layers of abstraction. Here (actually, in the "API type" topic) I'm trying to start with the simplest level: come-and-get-gaze-data. No video pixels, however – that is already too low-level. I suppose all other possible layers will be based on this basic level, and I already mentioned that the (possible) specification of such levels is the next task.

Next I'll try to answer your questions and clarify some things about this venture.

I strongly encourage you not to think just about technologies like COM / low-level API / platform dependency etc.

Sure, but this is important too.

Even in this short thread we have a user who comes from the gaze interaction side. He wants data like desktop object selection from the eye tracker. The gaze analysis people want just gaze data (monocular or binocular); some researchers rely on the fixation/saccade algorithms that hardware manufacturers provide, while others like to implement their own filters and event detection algorithms. The next one wants to work on a pupillometry application with pupil diameter only.

An API that provides a single layer of data-access complexity will either scare off users who just want to write a tiny eye chess application, or it will be insufficient for researchers who want everything, from head acceleration to pupil diameter and millisecond-precise binocular gaze data with confidence measurements.

The idea was to specify a flexible gaze data protocol: the richness of the data depends on the subscription type. Someone who subscribes to X/Y will get only these values; another may subscribe to a whole bunch of values, if they are available. Sorry for not mentioning this before; I was planning to present the layer-based approach when we start discussing the gaze data protocol.
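
To make the subscription idea concrete, here is a sketch with purely hypothetical names (the real protocol is exactly what this discussion should define): the client declares which fields it wants, and the device answers with the subset it can actually deliver.

code:

  [Flags]
  public enum GazeDataFields
  {
    None         = 0,
    GazePosition = 1 << 0,  // x/y only - enough for a tiny eye chess game
    PupilSize    = 1 << 1,
    Validity     = 1 << 2,
    EyePosition  = 1 << 3   // 3D eye location, if the device can report it
  }

  public interface IGazeSubscription
  {
    /// <summary>Returns the subset of the requested fields the device supports.</summary>
    GazeDataFields Subscribe(GazeDataFields requested);

    void Unsubscribe();
  }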

Anyway, first you should collect the application fields you want to address. Can we assume that, let's say, researchers who usually write every analysis on their own in Matlab will not use the API to access the raw pupil position? I have seen researchers who don't even rely on the calibration of the eye tracker: they just accessed the raw pupil position in video pixels and implemented their own gaze mapping.

My guess is a bit different: application fields are the concern of some higher layers. Do manufacturers of mice/keyboards/network cards/monitors/printers think about them? I'm not sure... though I'm not sure that way of thinking is valid either. Anyway, this talk is not about standardizing pupil detection from a video frame, but about the way and form of receiving the result of that procedure.

It sometimes helps not only to think about what you want, but also about what you don't want.

Absolutely true, and I'm thankful to you, Lars, for bringing this up in the discussion. A fresh and sober-minded point of view, far from our own, is a good start for this discussion.

The focus is on gaze interaction, right? Is "the more functionality put in the eye tracker, the better"? This would exclude a couple of low-tech systems that do only gaze x/y.

It should not... If someone needs x/y only, then why not let him use such a device? He will subscribe to x/y only, and stay happy :)

Eye trackers designed for the gaze interaction market (Tobii/Alea) already provide a couple of convenient functions, like snapping to desktop elements, cumulative dwell on desktop elements, mouse click and zoom functions, and tracking status windows. Should all of this be part of the API as well, or should developers of gaze interaction applications do it?

No. My idea was to start with the specification of the API type, the basic ET interface, and the gaze data protocol/format; once this specification is there, we may go further, if needed. All this discussion (at least at the beginning) is about how to get data from VARIOUS devices – the focus here is on the access itself rather than on the level of access.

Oleg

P.S. Nice comment! Others, please take a stand and keep up the level of heat :)

Thiago Chaves de Oliveira Horta
Re: Building up a new standard of eye-tracking API

Maybe I expressed myself poorly when talking about layering and platform independence.

Having a standard driver such as the ETU-Driver is important so that different devices can be treated similarly, without much need to redo driver-handling code. If there's a difference in accuracy, the number of eyes tracked, the rate at which the eye tracker's information is updated, or whatever else, the device itself can report it.

I might have jumped too far ahead when talking about desktop behavior, but the point still holds. Mapping eye-tracking information onto mouse events is relatively straightforward, but it is a sub-optimal solution. I'll hold that thought for the moment and bring it up again once there is more progress on lower-level API standardization.

Oleg, could you provide more information about the ETU driver you're talking about? Is it freely available? Is it open-source? What kind of information does it gather from a device? Can you provide some links?

-Thiago

oleg
Re: Building up a new standard of eye-tracking API

thiago wrote:

If there's a difference in accuracy, the number of eyes tracked, the rate at which the eye tracker's information is updated, or whatever else, the device itself can report it.

Thiago, you have read my mind :). Device self-announcement is already in the specification I keep here on my laptop, but I have not posted it to this forum yet, as I'm trying to separate the tasks and discuss them in different threads. The discussion on the information available via device self-announcement will soon appear in the "Basic API interface" topic.
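
As a teaser, here is a rough guess (nothing more than that) at the kind of properties device self-announcement could expose; the actual set is exactly what the "Basic API interface" topic should settle.

code:

  public interface IDeviceInfo
  {
    string Name { get; }             // manufacturer and model
    double SamplingRate { get; }     // Hz
    double Accuracy { get; }         // degrees of visual angle
    int TrackedEyes { get; }         // 1 = monocular, 2 = binocular
    bool ProvidesPupilSize { get; }
    bool ProvidesDelayEstimate { get; }
  }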

thiago wrote:

I might have jumped too far ahead when talking about desktop behavior, but the point still holds. Mapping eye-tracking information onto mouse events is relatively straightforward, but it is a sub-optimal solution. I'll hold that thought for the moment and bring it up again once there is more progress on lower-level API standardization.

Sure, we will pay attention to the higher layers, as Lars already suggested. I just thought that we would discuss API primitives first, and only then move to things like gaze-to-object mapping and selection.

thiago wrote:

Oleg, could you provide more information about the ETU driver you're talking about? Is it freely available? Is it open-source? What kind of information does it gather from a device? Can you provide some links?

I have not published the code anywhere, but it is available on request. If you install ETU-Driver, you will get its manual, but it is quite short and not very informative. The best way is to contact me (in your case – to come to my room) and get the answers to the questions you have.

jochen
Re: Building up a new standard of eye-tracking API

I just found out about this thread. Great initiative.

How about something along the lines of Adrian's suggestion? The gaze data structure looks OK, although I would use a double for the timestamp (there are already trackers with sub-millisecond resolution / sampling rates higher than 1 kHz). The API specification is basic enough. I'm only a user, but I have found that the listed functionality covers about 95% of my needs. I don't think C# is really an option for cross-platform development though – why not stick to plain old C? Maybe look at BSD interfaces as examples of good design? KISS. You can always provide a higher-level abstraction layer later.

In another thread a 50 ms delay was mentioned as desirable; this is definitely too long for research purposes.

my 2 cts, jochen

oleg
Re: Building up a new standard of eye-tracking API

jochen wrote:

I just found out about this thread. Great initiative.

Thank you

jochen wrote:

How about something along the lines of Adrian's suggestion?

Coming soon in the "Basic API interface" topic.

jochen wrote:

In another thread a 50 ms delay was mentioned as desirable; this is definitely too long for research purposes.

Sure, but it is enough for online interaction. The fact is that today's eye-tracking APIs have delays somewhere close to this value. A solution to eliminate this delay will be proposed in the "Gaze data" topic.

Lars Hildebrandt
Re: Building up a new standard of eye-tracking API

Hi Oleg,

oleg wrote:

Lars, you are absolutely right about the incompatible needs of various ET users and about layers of abstraction. Here (actually, in the "API type" topic) I'm trying to start with the simplest level: come-and-get-gaze-data. No video pixels, however – that is already too low-level. I suppose all other possible layers will be based on this basic level, and I already mentioned that the (possible) specification of such levels is the next task.

I just want to throw some balls into the air that might help you fix some decisions in advance. Providing binocular gaze x/y in screen pixels or degrees at the lowest level is itself a decision: you will already exclude some researchers from using the API. As mentioned before, I have met some who just wanted the pupil position in video pixels and did the screen mapping on their own. I am fine with that decision, but I want to stress that it matters which application field you are coming from. Researchers have a different view of an API's requirements than developers of usability-testing or gaze-interaction applications. Depending on who designs such an API, it will have a flavor. For instance, developers of AAC applications like TheGrid, Mind Express, Dasher, or Viking will not even consider the API if you tell them "here is gaze x/y, please detect your own fixations, saccades, blinks and eye gestures". I have the feeling that it is worth thinking about the higher levels as well before implementing the lowest level of data access. Will you rely on the manufacturers to provide abstract data, or do you want to implement the functionality of such a layer on your own? Most researchers are conditioned to work with gaze x/y, but is this necessary? The reason is probably that users of eye trackers trusted the hardware manufacturers in the past to detect the pupil and calculate gaze properly, but never trusted them to calculate fixations properly. On the other side, many established hardware manufacturers were too lazy to increase the comfort of eye tracker usage. So defining gaze x/y as the lowest level of data access is a very implicit decision.

Another thought to consider: the Swedish market leader doesn't give you access to gaze x/y with half of their products unless you pay a fortune for the API. Other manufacturers might follow this strategy. Gaze x/y is common and essential for researchers, but the trend goes towards higher levels of data abstraction once you leave the universities as application fields. The assistive and gaze interaction market has left the research/analysis market behind in sheer numbers. If you want to bring the API out of that niche, you should reflect this in the design of the API itself, not just as an add-on put on top of the lowest level. Higher data abstraction layers don't just work on standard gaze x/y data: sophisticated event detection and eye gesture detection algorithms take much more internal data from the image processing.

I don’t want to push anyone in a particular direction. I just want to raise questions that might help fix decisions and broaden the field of view.

Lars Hildebrandt

VP Development
www.alea-technologies.de

oleg
Re: Building up a new standard of eye-tracking API

Hi Lars.

The expected result of accepting and following some ET API standard, as I see it, is a spread and increase of applications and services based on eye-tracking technologies. This will (should :)) popularize these technologies among various types of users and encourage them to buy ET systems. As a consequence, ET system sales should increase, making it a real mass-market product. If that happens, the price of an ET system should drop to something affordable for the majority of the population.

Probably I'm too bad at making market forecasts, but this is my vision and belief.

lars wrote:

Providing binocular gaze x/y in screen pixels or degrees at the lowest level is itself a decision: you will already exclude some researchers from using the API. As mentioned before, I have met some who just wanted the pupil position in video pixels and did the screen mapping on their own.

I'd like to stress that the standard API is not going to replace 100% of existing APIs:

oleg wrote:

this discussion is not against manufacturers, and not about drawing limits for them: rather, it is about asking them for an alternative (and standard) way of accessing and communicating with hardware, in addition to what they already provide.

When I initiated this talk, I already knew that someone would need just X/Y, and someone else all possible data coming from the ET; that is why I listed, among the planned topics to discuss:

oleg wrote:

- API type and structure: defining standard interfaces and services.
a. API type;
b. Basic interfaces/services;
c. Auxiliary interfaces/services: researcher approach;
d. Auxiliary interfaces/services: developer approach;

lars wrote:

For instance, developers of AAC applications like TheGrid, Mind Express, Dasher, or Viking will not even consider the API if you tell them "here is gaze x/y, please detect your own fixations, saccades, blinks and eye gestures". I have the feeling that it is worth thinking about the higher levels as well before implementing the lowest level of data access.

Absolutely agree.

lars wrote:

Will you rely on the manufacturers to provide abstract data, or do you want to implement the functionality of such a layer on your own? Most researchers are conditioned to work with gaze x/y, but is this necessary? The reason is probably that users of eye trackers trusted the hardware manufacturers in the past to detect the pupil and calculate gaze properly, but never trusted them to calculate fixations properly.

I leave the implementation of the higher layers to third-party developers. My guess was that once we get gaze data in some standard way, we can apply the algorithms implementing the high-level functionality, parameterizing them by the characteristics of the ET system. Or am I wrong?

lars wrote:

On the other side, many established hardware manufacturers were too lazy to increase the comfort of eye tracker usage. So defining gaze x/y as the lowest level of data access is a very implicit decision.

Wow... So what is the lowest? Intermediate variables of the x/y calculation? Again, I expect that in the future the majority of ET systems will be used for interaction with computers; so once the x/y data is reliable, it should be OK as the lowest level.

lars wrote:

Another thought to consider: the Swedish market leader doesn't give you access to gaze x/y with half of their products unless you pay a fortune for the API. Other manufacturers might follow this strategy.

I noticed it already... :hm:
A bad trend (for us; good for business), leading to a monopoly... Those who wish to develop their own interaction tools (for ET systems) are now left "out of business".

lars wrote:

Gaze x/y is common and essential for researchers, but the trend goes towards higher levels of data abstraction once you leave the universities as application fields. The assistive and gaze interaction market has left the research/analysis market behind in sheer numbers.

And this is perfect!

lars wrote:

If you want to bring the API out of that niche, you should reflect this in the design of the API itself, not just as an add-on put on top of the lowest level. Higher data abstraction layers don't just work on standard gaze x/y data: sophisticated event detection and eye gesture detection algorithms take much more internal data from the image processing.

OK, this is something new for me... The reason I talked about x/y was that these values are always available from ET systems. Were available... If manufacturers ask for huge additional payments just for access to the data, then this idea will die. But once more: this initiative is devoted to proposing an alternative (and commonly accepted) way to communicate with ET systems. 300 downloads of ETU-Driver over 3 years show that there is a need for such a standard. And there are many developers awaiting something like this. They are creating gaze-contingent software that relies on samples (the basic data layer) or fixations (the next data layer), building their own layers (like gaze gestures) and sharing them with other developers.

That was a good story. I'm glad to see the point of view of someone who is on the commercial side and knows the things behind the curtain, so that naive romantics like me don't think they are capable of shifting giants so easily and asking them for favors.

Now, after this "cold shower" and a gaze into the business games, I feel we may forget about having any kind of standard API in high-end systems (since "sophisticated event detection and eye gesture detection algorithms take much more internal data from the image processing" and "...the leader doesn't give you access to gaze x/y with half of their products unless you pay a fortune for the API", and other manufacturers might follow this strategy, then there is no place for standards here, as these "standards" are commercial know-how, right?), and propose it only for amateurs and academics who are not thinking about $$$ but rather about making ET technology affordable and useful for ordinary people. If any commercial company grants us support for the standard we specify here, I will applaud!

oleg
Re: Building up a new standard of eye-tracking API

Lars, what kind of API would Alea Technologies support? Would it be based on many layers? If so, what are those layers, and how high would they reach? It would be very useful for us to know what commercial companies expect, rather than only suggesting our own solutions. If you reply in detail, please use the "API type", "Basic API interfaces" and "Gaze data" topics.

Oleg

Lars Hildebrandt
Re: Building up a new standard of eye-tracking API

Hi Oleg,

Don’t get me wrong, I don’t want to be a showstopper. I am as enthusiastic as you are about API standardization, but every thought and argument needs a cold shower if you don’t want to build on sand. Arguments that survive the first skeptical attack are more likely to be a solid basis for decisions.

oleg wrote:

Probably I'm too bad at making market forecasts, but this is my vision and belief.

I share your vision, but in addition I am saying that if you want to get more people on board, not just the usual suspects from universities, I would enlarge the scope of the API by design. There are tons of applications out there in the AAC market waiting for eye tracker data input, but the developers – often just one person per piece of software – don’t have the time and the knowledge to do anything useful with just gaze x/y. They need more comfort if they are to use a standardized API.

oleg wrote:

I leave the implementation of the higher layers to third-party developers. My guess was that once we get gaze data in some standard way, we can apply the algorithms implementing the high-level functionality, parameterizing them by the characteristics of the ET system. Or am I wrong?

Without going into details: much of the high-level functionality can’t be derived from a standardized set of parameters. State-of-the-art eye trackers have tons of internal parameters like head speed, head acceleration, image contrast, facial features, pupil-iris contrast, iris structures, the history of pupil data, and confidence values for all measures. High-level algorithms for gesture recognition or fixation detection don’t just take gaze x/y; they take a bunch more parameters. The published algorithms for event detection, as Salvucci and Goldberg describe them, work well for low-tech systems, but Tobii/Alea/SMI remote tracking systems perform better with their own proprietary high-level algorithms.

The decision to make is: do you want to force the hardware manufacturers to support very high-level data access and make application developers happy? Or do you want to leave high-level functionality to be programmed by third-party developers on top of gaze x/y, or by the application developers themselves? In the former case you put a lot of work on the shoulders of enthusiastic $100 eye tracker builders, but I think application developers will benefit from it. In the second case (which is the status quo) you make the life of application developers hard and waste some potential, because, as mentioned before, state-of-the-art eye trackers will always work better with proprietary high-level algorithms than with community high-level algorithms.

My feeling is that the API should support gaze x/y at its lowest level. This keeps the $100 eye tracker builders in the race. But you/we should also design higher functionality into the API and leave it open to the manufacturer whether to implement low-level and/or high-level functionality. This allows commercial companies to strategically decide what they want to support, while application developers can rely on the same set of functions. They might not work with every device, but the API is the same.

oleg wrote:

then there is no place for standards here, as these "standards" are commercial know-how, right?), and propose it only for amateurs and academics who are not thinking about $$$ but rather about making ET technology affordable and useful for ordinary people.

I just want to point out that there is a gap between the hundreds of applications in the gaze interaction market and the current gaze x/y interface. Closing that gap is the challenge, and I think it’s doable. I have seen a couple of larger AAC application programmers struggling with 5–6 different APIs from eye tracker manufacturers. It took them almost a year to enable gaze control for their application. A standardized interface with comfortable high-level data access (not just gaze x/y) would allow more application developers to do the same in much less time. That would be a great thing for the whole technology.

Lars Hildebrandt

VP Development
www.alea-technologies.de

Lars Hildebrandt
Re: Building up a new standard of eye-tracking API

oleg wrote:

Lars, what kind of API would Alea Technologies support? Would it be based on many layers? If so, what are those layers, and how high would they reach? It would be very useful for us to know what commercial companies expect, rather than only suggesting our own solutions.

I don't have to say that I am not neutral, do I?

Alea eye trackers currently strategically support many applications (analysis and AAC). We rely on application developers accessing an API. Other companies, which produce both the hardware and the application, don’t really have an interest in an open API, because it would make both of their components exchangeable. We would gladly see more and more applications popping up with support for our device, even if it’s not supported exclusively.
The layers which would make researchers and gaze interaction programmers happy could be:

1. Raw data access (Gaze x/y, Pupil diameter etc.)
2. Event data access ( fixations, saccades, blinks, eye gestures, head events )
3. GUI interaction events ( fixations inside a GUI element, blink on a GUI element )
4. GUI action events ( button/region clicked, button/region looked at )

Cases 3 and 4 look similar, but they aren’t. In case 4 the application doesn’t get anything but the information that the eye tracker engine has somehow clicked on a button – by blink or by dwell. Case 3 leaves the activation up to the application.
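
A sketch of the difference in code (all names invented): in layer 3 the application receives gaze events on its elements and decides about activation itself; in layer 4 the engine has already made that decision.

code:

  // Layer 3: the application learns what the gaze is doing on its GUI
  // elements and decides itself whether that constitutes an activation.
  public interface IGuiInteractionEvents
  {
    event EventHandler<GuiGazeEventArgs> FixationInsideElement;
    event EventHandler<GuiGazeEventArgs> BlinkOnElement;
  }

  // Layer 4: the eye tracker engine has already decided (dwell, blink, ...)
  // and only tells the application that the element was "clicked".
  public interface IGuiActionEvents
  {
    event EventHandler<GuiGazeEventArgs> ElementActivated;
  }

  public class GuiGazeEventArgs : EventArgs
  {
    public string ElementId;  // a GUI element previously registered with the API
  }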

-> Up for discussion

This approach could satisfy the needs of researchers who want to dig into binocular gaze data for microsaccades, and it could catch those hundreds of application developers who never have time but who want to easily add gaze support to their software.

Lars Hildebrandt

VP Development
www.alea-technologies.de

oleg
Re: Building up a new standard of eye-tracking API

Hi Lars!

lars wrote:

I am as enthusiastic as you are about API standardization, but every thought and argument needs a cold shower if you don’t want to build on sand. Arguments that survive the first skeptical attack are more likely to be a solid basis for decisions.

I see – you are posting to this forum close to midnight :). Without any irony: I can state that this discussion we are having is the most valuable on this forum so far, as it points to what commercial companies worry about and what vision they have. And you are completely right – a "next" API that is fine for researchers only will have the same support from manufacturers as ETU-Driver has.

lars wrote:

I am saying that if you want to get more people on board, not just the usual suspects from universities, I would enlarge the scope of the API by design. ...developers – often just one person per piece of software – don’t have the time and the knowledge to do anything useful with just gaze x/y. They need more comfort if they are to use a standardized API.

+10. The main issue I see is how to standardize high-level gaze data analysis. However, there is a student in the next room here at UTA who is going to work on it. Let's see what comes out of this...

lars wrote:

oleg wrote:

I leave the implementation of the higher layers to third-party developers. My guess was that once we get gaze data in some standard way, we can apply the algorithms implementing the high-level functionality, parameterizing them by the characteristics of the ET system. Or am I wrong?

Without going into details: much of the high-level functionality can’t be derived from a standardized set of parameters. State-of-the-art eye trackers have tons of internal parameters like head speed, head acceleration, image contrast, facial features, pupil-iris contrast, iris structures, the history of pupil data, and confidence values for all measures. High-level algorithms for gesture recognition or fixation detection don’t just take gaze x/y; they take a bunch more parameters. The published algorithms for event detection, as Salvucci and Goldberg describe them, work well for low-tech systems, but Tobii/Alea/SMI remote tracking systems perform better with their own proprietary high-level algorithms.

That was a perfect explanation! Lars, your impact on this discussion is becoming enormous :)

lars wrote:

The decision to make is: do you want to force the hardware manufacturers to support very high-level data access and make application developers happy? Or do you want to leave high-level functionality to be programmed by third-party developers on top of gaze x/y, or by the application developers themselves? In the former case you put a lot of work on the shoulders of enthusiastic $100 eye tracker builders, but I think application developers will benefit from it. In the second case (which is the status quo) you make the life of application developers hard and waste some potential, because, as mentioned before, state-of-the-art eye trackers will always work better with proprietary high-level algorithms than with community high-level algorithms.

Good questions... When starting this discussion I was not aware of the involvement of image processing in high-level interaction, so it is time to think about it.

lars wrote:

My feeling is that the API should support gaze x/y at its lowest level. This keeps the $100 eye tracker builders in the race. But you/we should also design higher functionality into the API and leave it open to the manufacturer whether to implement low-level and/or high-level functionality. This allows commercial companies to strategically decide what they want to support, while application developers can rely on the same set of functions. They might not work with every device, but the API is the same.

That was on my mind too. Maybe I did not imagine clearly how it should be (and this is why I initiated this discussion), but I was thinking about various layers too (for example, not only about X/Y, but also about something similar to the MyTobii MPA SDK... but much richer...). Now I feel that my mistake was to think that these layers would sit one on top of another (or, at least, on top of the basic layer), and would therefore appear as a separate API, not the one provided by manufacturers (like the basic API).

lars wrote:

I just want to point out that there is a gap between the hundreds of applications in the gaze interaction market and the current gaze x/y interface. Closing that gap is the challenge, and I think it’s doable. I have seen a couple of larger AAC application programmers struggling with 5–6 different APIs from eye tracker manufacturers. It took them almost a year to enable gaze control for their application. A standardized interface with comfortable high-level data access (not just gaze x/y) would allow more application developers to do the same in much less time. That would be a great thing for the whole technology.

I love this discussion :) Your post shoots straight to the heart (or the apple?). I hope you'll stay here helping us reach this vision.

oleg
Re: Building up a new standard of eye-tracking API

lars wrote:

1. Raw data access (Gaze x/y, Pupil diameter etc.)
2. Event data access ( fixations, saccades, blinks, eye gestures, head events )
3. GUI interaction events ( fixations inside a GUI element, blink on a GUI element )
4. GUI action events ( button/region clicked, button/region looked at )

That was the intention. Sorry for not saying so initially; I thought we would discuss it gradually...

lars wrote:

lars wrote:

I have the feeling that it’s worse to think about higher levels as well before implementing the lowest level of data access.

I hope everyone got the message. My Firefox integrated spell checker decided to prefer "worse" instead of "worth".

I corrected the typo and removed the post about it.

Lars Hildebrandt
Re: Building up a new standard of eye-tracking API

Hi Oleg,

Regarding the different layers of data access, let's do a top-down approach.
Imagine you are a poor programmer of communication software or a tiny game like eye chess, whatever. You know how to program a Windows application. You can make your software react to "MouseEnter", "MouseLeave", "MouseDoubleClick", "MouseSingleClick". You want to make your application gaze-aware. You have no clue about dwell, blinks, saccades etc. You don't want to deal with this, because this is the job of people who know about human cognition and the physiology of the eye. It is the job of eye tracking device manufacturers.

The highest level of comfort you expect is something similar to what you get from the mouse, if the eye tracker wants to be an alternative mouse input.

* GazeEnter ( a GUI element )
* GazeLeave ( a GUI element )
* GazeActivation ( user did a selection on a GUI element )
* Pause/UnPause
* Method-To-Tell-The-API-where-GUI-Elements-are.

This is all I can think of at first, but more than this you don't want to know about eye tracking as an application developer.
An API must be as simple as the list above if you want to enlarge the number of users working with it.
Even the animation of the dwell feedback is the job of the eye tracker manufacturer, because they know which feedback is distracting and which is not.

This is the highest level. The lowest is, of course, something that gives you binocular gaze x/y etc. In between these two levels there could be more layers that allow access to intermediate processing steps like fixations, saccades and blinks.
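
Translated directly into a C# sketch (names hypothetical, for discussion only), the highest level could be as small as this:

code:

  public interface IGazeAwareApi
  {
    event EventHandler<GazeElementEventArgs> GazeEnter;       // gaze entered a GUI element
    event EventHandler<GazeElementEventArgs> GazeLeave;       // gaze left a GUI element
    event EventHandler<GazeElementEventArgs> GazeActivation;  // the user selected the element

    void Pause();
    void UnPause();

    // the "Method-To-Tell-The-API-where-GUI-Elements-are"
    void RegisterElement(string elementId, System.Drawing.Rectangle screenBounds);
    void UnregisterElement(string elementId);
  }

  public class GazeElementEventArgs : EventArgs
  {
    public string ElementId;
  }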

Lars Hildebrandt

VP Development
www.alea-technologies.de

oleg
Re: Building up a new standard of eye-tracking API

lars wrote:

* GazeEnter ( a GUI element )
* GazeLeave ( a GUI element )
* GazeActivation ( user did a selection on a GUI element )
* Pause/UnPause
* Method-To-Tell-The-API-where-GUI-Elements-are.

Reminds you of the MPA SDK, doesn't it :)? Yes, I had the same vision. I think I mentioned that initially I thought to design this kind of layer as an auxiliary interface (thus, to be implemented by someone else, not by manufacturers), but since you said X/Y is not enough to make the best implementation of this layer, this layer will go into the "basic" API.

However, I still think of letting some layers be declared "optional". This means the manufacturer's API will contain a method for querying which layers it implements. The same optional layers can also be implemented (based on X/Y) by third-party developers, so that if a client application does not find the desired layer in the OEM API, it may search other modules (declared as part of ETUDE) for the same layer.
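
A sketch of how this optional-layer discovery and fallback could look (layer names borrowed from Lars' list above; everything else is invented):

code:

  [Flags]
  public enum ApiLayers
  {
    None           = 0,
    RawData        = 1,  // layer 1: gaze x/y, pupil diameter, ...
    Events         = 2,  // layer 2: fixations, saccades, blinks, gestures
    GuiInteraction = 4,  // layer 3: gaze events on GUI elements
    GuiActions     = 8   // layer 4: engine-decided activations
  }

  public interface IEtudeModule
  {
    /// <summary>Self-announcement: which layers does this module implement?</summary>
    ApiLayers SupportedLayers { get; }

    /// <summary>Returns the interface object for the given layer, or null.</summary>
    object GetLayer(ApiLayers layer);
  }

  public static class LayerLocator
  {
    /// <summary>
    /// Prefer the manufacturer's (OEM) implementation; otherwise fall back
    /// to any third-party ETUDE module offering the same layer.
    /// </summary>
    public static object Find(IEtudeModule oem, IEtudeModule[] thirdParty, ApiLayers wanted)
    {
      if ((oem.SupportedLayers & wanted) == wanted)
      {
        return oem.GetLayer(wanted);
      }

      foreach (IEtudeModule module in thirdParty)
      {
        if ((module.SupportedLayers & wanted) == wanted)
        {
          return module.GetLayer(wanted);
        }
      }

      return null;
    }
  }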

lars wrote:

Even the animation of the dwell feedback is the job of the eye tracker manufacturer, because they know which feedback is distracting and which is not.

Sure, it can – like in MPA. But there should be a way to control it (on/off).

I'll try to summarize this discussion and post a decision about the ET API later this week.

Lars Hildebrandt
Re: Building up a new standard of eye-tracking API

oleg wrote:

lars wrote:

* GazeEnter ( a GUI element )
* GazeLeave ( a GUI element )
* GazeActivation ( user did a selection on a GUI element )
* Pause/UnPause
* Method-To-Tell-The-API-where-GUI-Elements-are.

Reminds you of the MPA SDK, doesn't it :)? Yes, I had the same vision. I think I mentioned that initially I thought to design this kind of layer as an auxiliary interface (thus, to be implemented by someone else, not by manufacturers), but since you said X/Y is not enough to make the best implementation of this layer, this layer will go into the "basic" API.

Sounds like a good strategy. I can name you dozens of gaze interaction application developers who are desperately waiting for an interface like that. Even if not every hardware manufacturer supports this API level, I am sure they will gladly use it and bind their applications to the major manufacturers.

Lars Hildebrandt

VP Development
www.alea-technologies.de