API Type

19 replies
oleg
Joined: Sep 7 2009

One of the first tasks in the eye-tracking API specification is to specify the API type. Various technologies could be used to implement the API, so we need to find the most suitable one. Let's first look at what API types some manufacturers have created.

oleg
Re: API Type

The general architecture of existing eye-tracking APIs is shown below.

The blocks “High-level API (COM)” and “Dispatcher” are optional. A client application either calls functions from the low-level DLL, or uses methods of COM objects created from the high-level API (COM library). Developers who connect to a device only via the low-level libraries have to take care of the API DLL location and version themselves, and this causes various problems when they deploy their solutions to customers. The high-level API is usually free of this sort of problem.

The low-level API is a necessary component of every eye-tracking SDK. The most popular type of connection between the low-level API and the hardware driver (or dispatcher) is a TCP/IP connection (via WSOCK32/WS2_32). Of the five low-level APIs I know, only one uses direct file reading/writing (via a “COM” port) instead.
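To illustrate the TCP/IP link between a driver (or dispatcher) and a low-level API, here is a minimal sketch of the client side. The wire format (one ASCII line per sample: `timestamp x y validity`) and all names are hypothetical; every real SDK uses its own protocol.

```python
import socket
from dataclasses import dataclass
from typing import Iterator


@dataclass
class GazeSample:
    timestamp: float  # seconds
    x: float          # screen x, pixels
    y: float          # screen y, pixels
    validity: float   # 0 (invalid) .. 1 (valid)


def parse_sample(line: str) -> GazeSample:
    """Parse one line of the hypothetical 'timestamp x y validity' format."""
    t, x, y, v = line.split()
    return GazeSample(float(t), float(x), float(y), float(v))


def read_samples(sock: socket.socket) -> Iterator[GazeSample]:
    """Yield samples from a connected socket until the peer closes it."""
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # empty read: the driver/dispatcher closed the connection
            break
        buffer += chunk
        # One TCP read may hold several lines, or only part of one.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield parse_sample(line.decode("ascii"))
```

A client would open the connection with something like `socket.create_connection(("127.0.0.1", port))`, where the port is whatever the (hypothetical) driver listens on, and iterate over `read_samples(sock)`.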

At least one API uses a dispatcher that allows multiple connections from multiple clients; it is implemented as a Windows service. I am not sure whether the other APIs, which have no explicit dispatcher, allow multiple connections. Having multiple connections to the same device is a necessity, as there could be more than one application (process) using eye-tracking data at the same time.
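What a dispatcher does for multiple clients boils down to fanning every device sample out to each subscriber. The in-process sketch below shows that idea with one queue per client; `Dispatcher`, `subscribe` and `publish` are hypothetical names, and a real dispatcher (such as the Windows service mentioned above) would do the same across processes over sockets or another IPC channel.

```python
import queue
import threading
from typing import Any, Dict, Tuple


class Dispatcher:
    """Fan out samples from one device to any number of client queues."""

    def __init__(self) -> None:
        self._clients: Dict[int, queue.Queue] = {}
        self._next_id = 0
        self._lock = threading.Lock()  # device thread and clients may race

    def subscribe(self, maxsize: int = 1000) -> Tuple[int, queue.Queue]:
        """Register a new client and return its id and private queue."""
        with self._lock:
            client_id = self._next_id
            self._next_id += 1
            q: queue.Queue = queue.Queue(maxsize)
            self._clients[client_id] = q
        return client_id, q

    def unsubscribe(self, client_id: int) -> None:
        with self._lock:
            self._clients.pop(client_id, None)

    def publish(self, sample: Any) -> None:
        """Called by the device side for every new sample."""
        with self._lock:
            targets = list(self._clients.values())
        for q in targets:
            try:
                q.put_nowait(sample)
            except queue.Full:
                pass  # a slow client loses this sample; others are unaffected
```

Giving each client its own bounded queue is one possible design choice: a client that stops reading cannot stall the device or the other clients.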

P.S. If someone knows about other types of eye-tracking APIs, or if I am not correct in my descriptions, please let us know :)

oleg
Re: API Type

And here is our first task: prepare requirements and recommendations concerning the API type. I'll start with a list of my expectations; others, please correct or improve it if necessary.

1. The API must be implemented using a technology that is fast enough. However, it is hard to say what "enough" is. My own expectation is that samples are delivered to client applications with a delay of at most 50 ms; I guess human motor reaction to any visible stimulus takes longer than this threshold. My experience shows that the existing solutions (TCP/IP connection between driver/dispatcher and low-level API; wrapping the low-level API into COM objects) are good enough, but others may exist too, no strict restrictions here...
2. The API must be implemented so that it is visible globally (on a PC), and no files from the SDK have to be copied into the folder where the client application is installed. Again, COM technology seems the best fit here, IMHO, although I have heard some criticism of it (mainly due to overly long delays) and suggestions (not quite well argued, however) for other API types. The way I see it:

* During installation, the eye-tracking API registers itself under HKEY_LOCAL_MACHINE\SOFTWARE\ETUDE\Devices\ by creating a key named after the CLSID of the class that implements the basic eye-tracking interfaces. For example:

[HKEY_LOCAL_MACHINE\SOFTWARE\ETUDE\Devices\{00000000-0000-0000-0000-000000000000}]
@="EyeTrackerName"

* Client applications or helpers (auxiliary ETUDE classes) will read this registry key to obtain the list of installed devices, and create an object of the specified CLSID that implements a basic eye-tracking interface (to be discussed in a separate topic).
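On Windows, a client or helper could enumerate the proposed key with Python's standard `winreg` module. This is a sketch assuming the layout above (one subkey per CLSID, whose default value is the tracker name); the `SOFTWARE\ETUDE\Devices` path is the one proposed in this post, not an existing standard. The returned CLSIDs would then be passed to `CoCreateInstance` (or a COM binding) to create the tracker objects.

```python
import sys
from typing import Dict

# Key path proposed in this thread (an assumption, not an existing standard).
ETUDE_DEVICES = r"SOFTWARE\ETUDE\Devices"


def list_installed_trackers() -> Dict[str, str]:
    """Return {CLSID: EyeTrackerName} for every device registered under
    HKEY_LOCAL_MACHINE\\SOFTWARE\\ETUDE\\Devices (empty on non-Windows)."""
    if sys.platform != "win32":
        return {}
    import winreg  # only available on Windows

    devices: Dict[str, str] = {}
    try:
        root = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ETUDE_DEVICES)
    except FileNotFoundError:
        return devices  # no ETUDE-compliant tracker installed
    with root:
        index = 0
        while True:
            try:
                clsid = winreg.EnumKey(root, index)
            except OSError:  # raised when there are no more subkeys
                break
            # QueryValue(key, subkey) returns the subkey's default (@) value.
            devices[clsid] = winreg.QueryValue(root, clsid)
            index += 1
    return devices
```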

Any better suggestions on this topic? Would it be possible to use COM technology on Linux/Mac? If not, what other (cross-platform) solutions could satisfy the requirements listed above?

gdaunys
Re: API Type

As far as I know, COM is a Microsoft technology that can be used only in Microsoft Windows. A similar platform-independent client-server technology is CORBA. Java Remote Method Invocation (Java RMI) is often assigned to the same group. But it seems they are obsolete nowadays. Textbooks on Microsoft .NET often state that .NET remoting has advantages over COM. And there is the Mono project, which provides implementations of .NET for other platforms.

oleg
Re: API Type

gdaunys wrote:

Textbooks on Microsoft .NET often state that .NET remoting has advantages over COM. And there is the Mono project, which provides implementations of .NET for other platforms.

Indeed, something to keep in mind... I'll try to dig into it a bit to see whether it suits API development.

oleg
Re: API Type

Some words about the communication architecture. In my mind, it looks like the image below:

- “Driver + COM library” should be developed by the hardware manufacturer and should ALWAYS be shipped with the device. It implements all basic ETUDE interfaces.
- Third-party developers may create an “ETUDE Manager”. It implements auxiliary ETUDE interfaces and provides layered (as Lars noted) access to various services (gaze-to-object mapping, etc).
- End-user applications are clients of “Driver + COM library” and/or (optionally) “ETUDE Manager” software.

The proposed architecture allows two alternatives for communication between an end-user application and the API: direct or indirect. The direct way is rather close to what exists today. The indirect way goes through the ETUDE Manager, which serves as an intermediate layer implementing the most common actions that end-user applications would otherwise perform while communicating with a device directly: dialog boxes to adjust settings, streaming gaze data to disk, fixation detection algorithms, snapping the mouse to gaze, checking data quality, etc.
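As an example of a service the ETUDE Manager could implement once for all applications, here is a sketch of fixation detection using the dispersion-threshold (I-DT) approach: a window of samples spanning at least a minimum duration counts as a fixation while its dispersion (x range plus y range) stays below a threshold. The 50 px and 100 ms defaults are illustrative assumptions, not values proposed in this thread.

```python
from typing import List, NamedTuple, Sequence


class Point(NamedTuple):
    t: float  # seconds
    x: float  # pixels
    y: float  # pixels


class Fixation(NamedTuple):
    start: float
    end: float
    x: float  # centroid
    y: float


def dispersion(points: Sequence[Point]) -> float:
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples: Sequence[Point],
                     max_dispersion: float = 50.0,
                     min_duration: float = 0.1) -> List[Fixation]:
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i < n:
        # Initial window: the shortest run of samples spanning min_duration.
        j = i
        while j < n and samples[j].t - samples[i].t < min_duration:
            j += 1
        if j == n:
            break  # not enough data left for another fixation
        if dispersion(samples[i:j + 1]) > max_dispersion:
            i += 1  # first sample is not part of a fixation; slide on
            continue
        # Grow the window while it stays within the dispersion threshold.
        while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
            j += 1
        window = samples[i:j + 1]
        fixations.append(Fixation(
            window[0].t, window[-1].t,
            sum(p.x for p in window) / len(window),
            sum(p.y for p in window) / len(window)))
        i = j + 1
    return fixations
```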

gdaunys
Re: API Type

I want to emphasize the diversity of eye-tracker hardware.
If we consider other input devices (for example, a mouse), there really is a hardware component of a given class and a driver that lets applications communicate with the input device. When a PnP device is attached to the computer bus, it reports its device identifier. From the identifier, the OS determines which class the device belongs to, and the driver for the device is launched (or the user is asked to insert a disk with the driver if it is missing).
In the case of video-oculographic (VOG) eye-tracking systems, the device attached to the computer bus is a camera, and its identifier causes a camera driver to be launched. An eye tracker is a system of hardware and software components. So we must keep in mind that the real hardware driver is the camera driver, which is used by the image-processing software. The image-processing software uses calibration data to calculate point-of-gaze coordinates and transfers them to applications or to some intermediate software component such as a COM server.
Perhaps a few systems already exist where gaze data is produced by the hardware, for example EOG systems. I believe hardware VOG systems will be created in the near future (in principle, the image-analysis software will run on a smaller-scale computer without a user interface).
Another case is when gaze data are produced on another computer and transferred over the LAN to the computer where the applications run.
The API architecture must take all this diversity of eye trackers into account, because the data path to the ETUDE Manager differs in each case.
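One way to absorb this diversity is to hide each data path behind a single source interface that the ETUDE Manager (or a client) consumes. The sketch below is hypothetical: `GazeSource` and the three adapters are illustrative names for the cases described above (local image processing, gaze computed in hardware, data arriving over the LAN), and the remote text format is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Iterator, Optional, Tuple

Sample = Tuple[float, float, float]  # (timestamp in s, x in px, y in px)


class GazeSource(ABC):
    """What the ETUDE layer consumes, whatever produced the gaze data."""

    @abstractmethod
    def samples(self) -> Iterator[Sample]:
        ...


class ImageProcessingSource(GazeSource):
    """VOG case: camera frames are turned into gaze points on this PC."""

    def __init__(self, frames: Iterable[object],
                 estimate_gaze: Callable[[object], Sample]) -> None:
        self._frames = frames
        self._estimate_gaze = estimate_gaze  # image processing + calibration

    def samples(self) -> Iterator[Sample]:
        for frame in self._frames:
            yield self._estimate_gaze(frame)


class HardwareSource(GazeSource):
    """EOG or hardware VOG case: the device already outputs gaze points."""

    def __init__(self, read_packet: Callable[[], Optional[Sample]]) -> None:
        self._read_packet = read_packet  # returns None when the device stops

    def samples(self) -> Iterator[Sample]:
        while (packet := self._read_packet()) is not None:
            yield packet


class NetworkSource(GazeSource):
    """Remote case: another computer sends 'timestamp x y' lines over the LAN."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = lines

    def samples(self) -> Iterator[Sample]:
        for line in self._lines:
            t, x, y = line.split()
            yield float(t), float(x), float(y)
```

Code above the interface (fixation detection, logging, and so on) would then not need to know which of the three paths delivered the data.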

Adrian Voßkühler
Re: API Type

Think of the Tobii T60/T120 systems: their hardware already has a computer without a user interface integrated into the LCD display, so only the Ethernet connection is plugged into the end-user system. The data is accessed via low- and/or high-level APIs reading the TCP stream.
So with such a system there is no option to be registered automatically via PnP, because the only connection is via TCP.

But I think it would be "nice to have" eye-tracking hardware treated like any other standard group of devices (monitors, mice), so that the manufacturer itself could implement a PnP installation as a basic eye tracker (as Oleg said, e.g. in \ETUDE\Devices\) and additionally as a mouse.

Adrian

oleg
Re: API type

adrian wrote:

But I think it would be "nice to have" eye-tracking hardware treated like any other standard group of devices (monitors, mice), so that the manufacturer itself could implement a PnP installation as a basic eye tracker (as Oleg said, e.g. in \ETUDE\Devices) and additionally as a mouse.

It seems to me that the PnP approach will not work for eye trackers, since

gdaunys wrote:

In the case of video-oculographic (VOG) eye-tracking systems, the device attached to the computer bus is a camera, and its identifier causes a camera driver to be launched. An eye tracker is a system of hardware and software components. So we must keep in mind that the real hardware driver is the camera driver, which is used by the image-processing software. The image-processing software uses calibration data to calculate point-of-gaze coordinates and transfers them to applications or to some intermediate software component such as a COM server.

Moreover, MS Windows does not support eye-tracking devices at the level it supports the mouse and keyboard, so access to ET devices will be via OEM libraries, not via ntdll.dll, kernel32.dll or user32.dll. As long as an eye-tracking system is a combination of hardware and software, it is not going to be PnP, IMHO.

It seems that we have the same thing in mind, and I guess everyone will be happy if there is some commonly accepted way for an ET system to a) announce itself and its features, and b) expose its services (functions).

gdaunys wrote:

I want to emphasize the diversity of eye-tracker hardware.

Sure, but the role of API standardization is to leave such peculiarities of each system behind, so that accessing it and capturing data is fairly easy and straightforward.

What also seems important to me is that in future we may have many ET systems on a single machine, and many clients connected to the same or to distinct systems, i.e. many-to-many. Therefore the next expectation from an API is:

3. The API allows many-to-many connections. For example, there are a T60 and an X120 in the setup, and App1 and App2 clients running simultaneously. Each client gathers data from both devices simultaneously (each client contains several components, and each component communicates with a single device only). Such a setup should work. Even very complex setups would be possible:

One may notice that the Tobii SDK is already organized this way, so I would encourage others to follow it :). This expectation does not say how to implement such functionality in the API; it only describes its features.
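The many-to-many setup can be modeled as routing from each device to the client components attached to it, where each client owns several components and each component talks to exactly one device. A sketch with hypothetical names; "T60" and "X120" are just device labels here, and in a real system each callback would live in a different client process.

```python
from typing import Any, Callable, Dict, List

Component = Callable[[str, Any], None]  # receives (device_id, sample)


class ConnectionHub:
    """Many-to-many routing: each device -> the client components attached to it."""

    def __init__(self) -> None:
        self._components: Dict[str, List[Component]] = {}

    def connect(self, device_id: str, component: Component) -> None:
        """Attach one client component to exactly one device."""
        self._components.setdefault(device_id, []).append(component)

    def disconnect(self, device_id: str, component: Component) -> None:
        attached = self._components.get(device_id, [])
        if component in attached:
            attached.remove(component)

    def on_sample(self, device_id: str, sample: Any) -> None:
        """Deliver one device's sample to every component connected to it."""
        for component in list(self._components.get(device_id, [])):
            component(device_id, sample)
```

Usage: App1 and App2 each connect one component to "T60" and one to "X120"; every sample from either device then reaches both applications.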

oleg
Re: API Type

gregory wrote:

I vote for COM because COM objects can be used by applications other than traditional programs, such as Matlab. I will be glad to present for discussion the COM interface that ASL currently provides for access to real-time eye-tracking data.

Hi Gregory.

So far, I vote for COM as well. To minimize the effort of porting the API to Linux/Mac, I would suggest first implementing a low-level library in plain C/C++, so that it is a completely cross-platform solution, and then writing a COM wrapper for this library. The way an ET system declares itself in Windows was mentioned already (via a special key in the Registry).

However, we should also provide a recommendation for the way an ET system declares itself in Linux/Mac, thus becoming globally visible. I have no experience of coding for Linux/Mac, so I ask others to help me solve this question.

Lars Hildebrandt
Re: API Type

oleg wrote:

So far, I vote for COM as well. To minimize the effort of porting the API to Linux/Mac, I would suggest first implementing a low-level library in plain C/C++, so that it is a completely cross-platform solution.

oleg wrote:

- “Driver + COM library” should be developed by the hardware manufacturer and should ALWAYS be shipped with the device. It implements all basic ETUDE interfaces.

I am not getting all the points. If the lowest interface to the manufacturers' modules is the Driver + COM library, how will that be platform independent? If you standardize the COM interface, no hardware manufacturer will provide another, non-Windows interface, because there are virtually no customers for non-Windows eye-tracking applications. If you force manufacturers to provide a platform-independent common interface and implementation, they will probably be very reluctant to join the initiative.

You should think about whether you want platform independence and what the price (in time and complexity) would be. Again, define your main user groups and applications, do some research on what they expect from an API, and prioritize features.

Lars Hildebrandt

VP development
www.alea-technologies.de

oleg
Re: API Type

Right, COM is Windows-only. I think this specification will be for Windows only... at least its first version.

My suggestion was just for manufacturers to keep in mind that one day they may want to make their products available on other OSes. But there are no restrictions on how to implement the standard API for Windows...

oleg
Re: API Type

Here is the first draft of API assumptions and requirements:

API assumptions

  • This specification primarily targets HCI researchers and developers, but others may benefit from it as well.
  • This specification deals with both passive (data collection) and active (interaction) use of gaze data.


API requirements
  • Fast:
    The API must be implemented using a technology that is fast enough, i.e. samples are delivered to client applications no later than 50 ms after a video frame was captured. Most of today's solutions (TCP/IP connection between driver/dispatcher and low-level API; wrapping the low-level API into COM objects) are good enough, but others may exist too; no restrictions here.
  • Accurate timestamping:
    Timestamps have at least millisecond precision and are presented as a “double”-type value in seconds (or, alternatively, as two integers for the second and microsecond parts). One of the system information fields should report the moment at which the timestamp is taken: for example, -2 for a system that timestamps data by the frame start time, -1 for a system that timestamps data by the frame middle time, 0 for a system that timestamps data by the frame end time, and any positive value to denote the typical latency (in milliseconds) between the frame end time and the moment the timestamp is taken.
  • Allows many-to-many connections:
    Each client application can be connected to different eye-tracking systems in parallel and receive data from them, and each API can serve multiple client applications. Moreover, the standard should support systems that may track more than one person at a time.
  • Visible globally:
    There is an ETUDE Eye-Tracking System Manager (ETSYM) implemented in each API that knows how to detect certain types of eye-tracking systems. Each ETSYM installed on a PC declares its own name and the names of the manufacturers and systems (with their versions) that it recognizes. The ETSYM manages the list of currently available systems and provides this list to clients on request. It also informs clients when a system becomes available/unavailable. The list of ETSYMs installed on a PC is available via some global resource.
  • Auto-scan:
    The ETSYM scans for available systems and creates a list of them. It exposes a function that establishes a connection between a client application and an ET system. This function takes a system ID as a parameter and returns a descriptor of the ETUDE Eye-Tracking System (ETSYS) the client is now connected to. The system ID is either a) the ID value from the list of available systems, or b) the value received via an event that informs the client that a new system has become available.
  • Self-informative:
    ETSYS exposes properties that return the manufacturer name, system name, system version, and the number of possible system configurations. There should be at least one default configuration for each system; the default configuration has index “0”. ETSYS also exposes a property to get/set the current configuration, and a function that, given a configuration index, fills a predefined structure with the basic ET system characteristics (such as sampling frequency, min/max working distance, etc.; the structure also contains the name of the configuration). ETSYS provides functions to list, get and set system-specific parameters (not classified, addressed by name).
  • Notification of state change:
    Each API has 4 variables that indicate the system state, and it sends notifications to clients when any variable changes:
    Available | Connected | Calibrated | Tracking | Description
    no        | -         | -          | -        | System is not available
    yes       | no        | -          | -        | System is available but not connected to the client application
    yes       | yes       | no         | -        | System is connected to the client application, but is not calibrated yet
    yes       | yes       | yes        | no       | System is connected, calibrated and ready to deliver data
    yes       | yes       | yes        | yes      | System sends data to the client application
  • Easy-to-calibrate:
    Each API provides two means to calibrate a system. The first is a simple function that runs the whole calibration routine and returns after the routine is finished (either successfully or not). The second is custom calibration (the client application shows stimuli and informs the API when a new calibration point is ready to be added). Each ETSYS optionally provides access to an ETUDE Eye-Tracking System Calibration (ETSYC) object; if custom calibration is not available, the corresponding function returns NULL. The API also has a means to indicate how good the current or ready-to-set calibration is (for example, via a read-only property that returns a value between 0 (not calibrated) and 1 (ideal calibration)).
    The API stores a number of named calibrations that can be set as the current calibration. There are functions that a) return the number of available calibrations (can be 0 if this feature is not implemented), b) get the name of each calibration in the list, c) add the current calibration to the list of calibrations (taking a name as a parameter), and d) set a calibration as current given its index.
  • Leveled tracking quality report:
    The API provides means to inspect tracking quality. Ideally, it has three levels of reporting:
    1. numerical: a number between 0 (no valid samples) and 1 (all samples are valid), accessible to the client application via a callback function (estimated from data collected within a certain time interval)
    2. simple GUI component: a small GUI component to be placed in a client application to stay aware of tracking quality while tracking is on.
    3. full-featured GUI component: something similar to the existing Tobii TrackStatus component.
  • Layered data:
    The richness and meaning of the data a client application receives depend on which data layer it announces as its data source. A client can subscribe to many layers simultaneously. Possible layers (events): a) simple/complete/binocular sample, b) gaze event (fixation/saccade/smooth pursuit/blink) start/update/end, c) head event (?) start/update/end, d) gaze-awareness event (GUI object entering/leaving), e) gaze-control event (GUI object selection/click/double-click/???), f) interaction event (gesture/dwell/?) start/update/end. Each API declares which layers it implements. Before the system state changes to “Tracking=yes”, each client should subscribe to receive data of a certain layer. Each data packet contains a) a data validity value between 0 (invalid) and 1 (absolutely valid), and b) a timestamp.
  • Synchronization:
    Each API has a function that returns the current value of the ET system's internal clock (timestamp).
  • GUI for options adjustment:
    Each API has a function that shows a dialog with the ET system's options.

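The "Notification of state change" requirement can be sketched as four boolean flags whose combination maps onto the rows of the state table, with a notification sent for every flag that changes. `SystemState` and `StateNotifier` are hypothetical names; the draft does not prescribe an implementation.

```python
from dataclasses import dataclass, replace
from typing import Callable, List

Listener = Callable[[str, bool], None]  # receives (variable name, new value)
FLAGS = ("available", "connected", "calibrated", "tracking")


@dataclass(frozen=True)
class SystemState:
    available: bool = False
    connected: bool = False
    calibrated: bool = False
    tracking: bool = False

    def describe(self) -> str:
        # Mirrors the table: later flags only matter once earlier ones are set.
        if not self.available:
            return "System is not available"
        if not self.connected:
            return "System is available but not connected to the client application"
        if not self.calibrated:
            return "System is connected to the client application, but is not calibrated yet"
        if not self.tracking:
            return "System is connected, calibrated and ready to deliver data"
        return "System sends data to the client application"


class StateNotifier:
    def __init__(self) -> None:
        self.state = SystemState()
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def set(self, **changes: bool) -> None:
        """Update flags and notify listeners once per variable that changed."""
        old_state, self.state = self.state, replace(self.state, **changes)
        for name in FLAGS:
            new_value = getattr(self.state, name)
            if getattr(old_state, name) != new_value:
                for listener in self._listeners:
                    listener(name, new_value)
```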
oleg
Re: API Type

API requirements "Leveled tracking quality report" and "Accurate timestamping" have been added.
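The two added requirements can be illustrated together. The sketch converts between the two proposed timestamp representations (a "double" in seconds, or integer seconds plus microseconds) and computes the numerical level of the tracking quality report as the fraction of valid samples within a sliding time window. The 1 s window and the 0.5 validity cut-off are assumptions, not part of the draft. Note that a double still resolves better than a microsecond for timestamps of the order of 10^9 s.

```python
from collections import deque
from typing import Deque, Tuple


def to_sec_usec(timestamp: float) -> Tuple[int, int]:
    """Split a timestamp in seconds into (seconds, microseconds)."""
    total_usec = round(timestamp * 1_000_000)
    sec, usec = divmod(total_usec, 1_000_000)
    return sec, usec


def from_sec_usec(sec: int, usec: int) -> float:
    """Join (seconds, microseconds) back into a timestamp in seconds."""
    return sec + usec / 1_000_000


class TrackingQuality:
    """Fraction of valid samples received within the last `window` seconds."""

    def __init__(self, window: float = 1.0, valid_from: float = 0.5) -> None:
        self.window = window
        self.valid_from = valid_from  # assumed validity cut-off
        self._samples: Deque[Tuple[float, bool]] = deque()

    def add(self, timestamp: float, validity: float) -> float:
        """Record one sample and return the updated quality value."""
        self._samples.append((timestamp, validity >= self.valid_from))
        # Forget samples that fell out of the time window (never the newest).
        while self._samples[0][0] <= timestamp - self.window:
            self._samples.popleft()
        return self.value()

    def value(self) -> float:
        """0 = no valid samples, 1 = all samples valid."""
        if not self._samples:
            return 0.0
        return sum(ok for _, ok in self._samples) / len(self._samples)
```

An API could call `add()` for each packet and pass the returned value to the client's quality callback.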

Lars Hildebrandt
Re: API Type

Hi Oleg,
Good summary. It's a starting point for the discussion.

Could you please explain a "many-to-many connections" scenario? I lack the imagination to find a use case where N eye-tracking devices are operated at the same time on one computer. I can imagine use cases with one eye tracker and N clients, but the many-to-many connection seems cumbersome.

oleg
Re: API Type

Hi Lars.

Imagine a huge screen covering 90-120 degrees of your field of view. One device will hardly cover the whole range of gaze directions, but using two devices (not necessarily from the same vendor) may solve this problem. Imagine also that there are two applications running on a PC and a user works with them in parallel. Both applications have to receive gaze data from both devices... Sounds a bit odd, but I'm trying to predict the future :)

However, this requirement is not strict: an ET system may deny a connection from a client if another application has already established one... so this is more a "suggestion" than a strict "requirement".

oleg
Re: API Type

...and here is an example from real life:

Just yesterday we were discussing implementing a tool for our research. We are going to study text entry (ordinary, not by gaze) and reading, with one person per task: one types a text, another reads the result. The software runs on a single machine but uses two screens: one for the writer, one for the reader. The text shown to the reader is not "raw", i.e. the software tries not to show words while they are being typed or edited, but rather the complete result.
And in our study we are going to track the eye movements of both the writer and the reader. Obviously, we need two eye trackers connected to the same machine, and a single tool will collect the data.
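With two trackers on one machine, their samples have to be placed on a common time base. One common technique (the idea behind Cristian's clock-synchronization algorithm) is to read a device clock, for instance through the "Synchronization" function proposed in the draft requirements, and bracket that call with host clock readings: the midpoint gives an offset estimate whose error is at most half the round-trip time. `read_device_clock` is a hypothetical stand-in for such a function.

```python
import time
from typing import Callable, Optional, Tuple


def estimate_offset(read_device_clock: Callable[[], float],
                    host_clock: Callable[[], float] = time.perf_counter,
                    trials: int = 10) -> Tuple[float, float]:
    """Return (offset, max_error) such that host_time ~= device_time + offset.

    Keeps the trial with the shortest round trip, whose midpoint estimate
    has the tightest error bound (half of that round trip).
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    best: Optional[Tuple[float, float]] = None
    for _ in range(trials):
        t0 = host_clock()
        device_time = read_device_clock()
        t1 = host_clock()
        round_trip = t1 - t0
        if best is None or round_trip < best[1]:
            best = ((t0 + t1) / 2 - device_time, round_trip)
    offset, round_trip = best
    return offset, round_trip / 2


def to_host_time(device_timestamp: float, offset: float) -> float:
    """Map a device timestamp onto the host clock."""
    return device_timestamp + offset
```

Running `estimate_offset` once per tracker lets a single tool convert both data streams to host time; the estimate would need to be repeated periodically if the clocks drift.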

I'm not sure we can implement such a tool with our current ET systems...

Lars Hildebrandt
Re: API Type

Hi Oleg,

this example is odd. Of course it's probably a useful study. There will be plenty more examples of experiments where you have additional stimulus and subject data. For that specific case it would always be useful if an API did the job of the analysis software that is in charge of data synchronization. You have a very specific problem: the synchronization of two eye trackers. I would not try to solve that problem with a general-purpose API. The next researcher will ask: I want to use an additional brain interface in my experiment, can I get synchronized data from that API as well? You can pack much more stuff into the API. If you ask 10 more researchers, you will get 10 more different wishes for what the API should do.

But by doing this you increase complexity, and no Visual Basic developer of an on-screen keyboard will ever consider looking at the API and implementing eye-tracker support for his application.

My suggestion is: make the API as easy as possible. Fortunately, modern eye trackers have become so easy to use that application developers no longer have to be engineers to operate an eye tracker. Make the API easy to program, so that enthusiastic hobby programmers don't struggle with the next barrier. Remember, commercial AAC application developers are often a one-man show. They only have two days to add a feature like direct eye-tracker support to their application.

oleg
Re: API Type

OK, I agree that this is a very specific case, and the API should be as simple as possible. But I mentioned that this requirement is "weak" and is more of a recommendation: in future, it might be useful to have several synchronized devices. Anyway, not following this recommendation will not harm compatibility with the standard API...

oleg
Re: API Type

I tried to see how other tools are organized to be globally visible, and decided to find out how video-compression codecs are listed in the Windows registry. Here is what I found:

Video-compressing applications read the "HKEY_CURRENT_USER\Software\Microsoft\ActiveMovie\devenum\{33D9A760-90C8-11D0-BD43-00A0C911CE86}\" key to obtain a list of installed encoding formats. "{33D9A760-90C8-11D0-BD43-00A0C911CE86}" is the CLSID of the ICM Class Manager, which is one of the ActiveMovie filter categories. Each video encoder has a reference to the AVI Compressor "{D76E2820-1563-11CF-AC98-00AA004C0FA9}", the class that is used to compress video.

For example, I have the "Cinepak Codec by Radius" video-compressing codec on my PC. It appears as HKEY_CURRENT_USER\Software\Microsoft\ActiveMovie\devenum\{33D9A760-90C8-11D0-BD43-00A0C911CE86}\cvid. This key (and every other key in this folder) contains the following values:
"ClassManagerFlags=[0|1]"
"CLSID={D76E2820-1563-11CF-AC98-00AA004C0FA9}"
"FccHandler="
"FilterData="
"FriendlyName="

We could use a similar approach: each ET system registers its browsing class under "HKEY_LOCAL_MACHINE\SOFTWARE\ETUDE\Browsers\" (by creating a key named after its CLSID).
I wonder whether it is worth specifying additional values for each browser, such as "Name" and "Version"... and another key, "Systems", populated by keys with the CLSIDs (of classes implementing the IETUDE_System interface) of the supported systems.

Or is it better to do it the other way round: each ET system registers its system class under "HKEY_LOCAL_MACHINE\SOFTWARE\ETUDE\Systems\" (creating a key named after its CLSID) and adds "Name", "Version" and "BrowserCLSID" values? Or both?

Imagine you are the developer of a gaze-aware application. You may want to write an ET-system search function that creates a list of available systems and lets the user select the active system(s). I'd rather read the list of browsers and get the list of systems each has found. But what if two browsers find the same system? Probably I should search my list of ET systems and check whether the system is already on it. Thus, the first solution seems better. However, none of the values (Name, Version and Systems) is useful to me as a client developer.
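The client-side search described above (ask every registered browser for the systems it found, then skip systems already on the list) reduces to a merge keyed by a system identifier. A sketch, assuming each browser reports (system_id, name) pairs and that two browsers finding the same physical system report the same `system_id`; the browser interface, the ID format and the CLSID strings are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Found = Tuple[str, str]  # (system_id, display name)


def merge_browser_results(
        results: Dict[str, Iterable[Found]]) -> List[Tuple[str, str, str]]:
    """Merge per-browser lists into one list without duplicates.

    `results` maps a browser CLSID to the systems that browser found.
    The first browser (in iteration order) that reports a system wins.
    Returns (system_id, name, browser_clsid) triples.
    """
    seen: Set[str] = set()
    merged: List[Tuple[str, str, str]] = []
    for browser_clsid, systems in results.items():
        for system_id, name in systems:
            if system_id in seen:
                continue  # already found by another browser
            seen.add(system_id)
            merged.append((system_id, name, browser_clsid))
    return merged
```

Keeping the browser CLSID next to each system tells the client which browser to ask when it later connects to that system.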

If someone has comments on this, please share.

Oleg

oleg
Temporary lock

This thread is temporarily locked due to heavy spamming. Please contact the administrator if you wish to post a message here.