WP 1: Learning and Estimation of Compositional Representations
of Objects from Visual Data
To develop a new representation of visual object categories that surpasses current state-of-the-art representations in terms of generality, scalability, and robustness.
The representation will follow the principles of compositional hierarchies and will include both 2D and
3D information, making explicit the shape properties at various levels of detail relevant for
accomplishing the required robotic tasks.
Statistical learning will proceed layer by layer, capturing the regularities of the visual data and forming
visual vocabularies that increase in complexity and abstraction.
The compositional hierarchical structure of the representation will enable scalable learning by exploiting
transfer of knowledge and scaffolding.
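As a toy illustration of this layer-by-layer learning (the function, part labels, and support threshold below are illustrative assumptions, not the project's actual algorithm), a new vocabulary layer can be formed by promoting frequently co-occurring pairs of lower-layer parts to compositions:

```python
from collections import Counter
from itertools import combinations

def learn_layer(observations, min_support):
    """One vocabulary-learning step: promote frequently co-occurring
    pairs of lower-layer parts to compositions at the next layer.
    `observations` is a list of sets of part labels seen together."""
    pair_counts = Counter()
    for parts in observations:
        for pair in combinations(sorted(parts), 2):
            pair_counts[pair] += 1
    # Keep only compositions whose co-occurrence count reaches the threshold.
    return {pair for pair, n in pair_counts.items() if n >= min_support}

# Toy "edge fragment" co-occurrences; real input would be image features.
obs = [{"edge_h", "edge_v"}, {"edge_h", "edge_v"}, {"edge_h", "corner"}]
vocab_layer2 = learn_layer(obs, min_support=2)
# vocab_layer2 == {("edge_h", "edge_v")}
```

Repeating this step on the compositions themselves yields vocabularies of increasing complexity and abstraction, one layer at a time.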
The learned representation (visual vocabulary) will enable efficient real-time detection (due to efficient
indexing and matching) and robust performance in cases where the input information is unreliable or incomplete.
Building on the prior work of Leonardis and co-workers and of Piater and co-workers, we will focus first on
learning 2D compositional hierarchical models from multiple viewpoints, and then on learning a 3D
compositional shape vocabulary, which will subsequently be bound with the 2D shape vocabulary.
The representations developed and learned in this workpackage will be further augmented with multi-modal
information actively acquired under uncertainty, and then explored in grasping tasks and tasks related to
the dishwasher scenario.
WP 2: Multi-modal Compositional Representation of Objects
Integrate visual and non-visual modalities into a single representation, producing unified, rich,
compositional object and scene representations.
Non-visual features include haptic properties such as local surface shape and friction, as well as
grasp parameters that associate objects and parts with how to grasp them.
In all cases, a crucial innovative aspect is the association of these features with the appropriate
level of the compositional hierarchy, and the composability of higher-level features from lower-level ones.
To unleash the full potential of this compositionality, the hierarchical representations should be
structured such that their components correspond to meaningful parts. The construction of
compositional hierarchies driven by graspable parts will therefore be addressed. Finally, this workpackage
will put the learned compositional object models to use by creating scene models in terms of instantiated
object models.
These scene models will form the basis of the reasoning process of workpackage 3.
WP3: Active Haptic and Visual Information Gathering Under Uncertainty
Develop methods for active information gathering.
This requires the development of specific reactive techniques for control of haptic information
gathering from objects; methods for planning with such techniques which attempt to optimise the rate
at which the surface properties of the object are estimated; and methods for actively controlling gaze
so as to support grasping activities.
The result should be a set of techniques able to actively gather information that is fed into the
compositional hierarchical representation of the object.
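A common way to decide which sensing action to take next (sketched here as a minimal illustration with invented states, actions, and probabilities, not the project's planner) is to pick the action with the highest expected information gain, i.e. the expected reduction in entropy of the belief over the property being estimated:

```python
import math

def entropy(belief):
    """Shannon entropy (bits) of a discrete belief {state: probability}."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def update(belief, likelihood):
    """Bayes update: likelihood[state] = P(observation | state)."""
    post = {s: p * likelihood[s] for s, p in belief.items()}
    z = sum(post.values())
    return {s: p / z for s, p in post.items()}

def expected_info_gain(belief, action_model):
    """action_model[obs] = {state: P(obs | state, action)}."""
    gain = 0.0
    for obs_lik in action_model.values():
        p_obs = sum(belief[s] * obs_lik[s] for s in belief)
        if p_obs > 0:
            gain += p_obs * (entropy(belief) - entropy(update(belief, obs_lik)))
    return gain

# Toy belief over two surface types and two probing actions (all illustrative).
belief = {"smooth": 0.5, "rough": 0.5}
actions = {
    "probe_informative": {"lo": {"smooth": 0.9, "rough": 0.1},
                          "hi": {"smooth": 0.1, "rough": 0.9}},
    "probe_uninformative": {"lo": {"smooth": 0.5, "rough": 0.5},
                            "hi": {"smooth": 0.5, "rough": 0.5}},
}
best = max(actions, key=lambda a: expected_info_gain(belief, actions[a]))
# best == "probe_informative": it is expected to reduce uncertainty the most
```

The same greedy criterion can rank candidate haptic probes or gaze directions; planning over sequences of such actions is what optimises the rate at which surface properties are estimated.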
WP4: Grasping Under Uncertainty
Develop methods to control grasping actions given a detection of an object using the compositional representation.
Grasping will proceed in three stages: moving from the hand pre-shape to the first object contact;
performing an incipient grasp and assessing the first-order properties of the object's surfaces; and
actual grasp acquisition. In addition, grasps will be planned under pose and shape uncertainty.
We will develop:
i) a system that moves systematically between these stages,
ii) a method for grasping based on solving a belief-state problem, where the low-level actions of the robot will
be either generated by a path planner or given by the behaviours.
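The systematic movement between grasp stages can be pictured as a small state machine (a minimal sketch with invented predicate names; the project's actual controller would drive these transitions from real sensor events):

```python
from enum import Enum, auto

class GraspStage(Enum):
    APPROACH = auto()   # move from hand pre-shape to first object contact
    INCIPIENT = auto()  # light grasp; assess first-order surface properties
    ACQUIRE = auto()    # actual grasp acquisition
    DONE = auto()

def step(stage, contact, surface_estimated, grasp_stable):
    """Advance the grasp state machine based on sensed events
    (all three boolean predicates are illustrative placeholders)."""
    if stage is GraspStage.APPROACH and contact:
        return GraspStage.INCIPIENT
    if stage is GraspStage.INCIPIENT and surface_estimated:
        return GraspStage.ACQUIRE
    if stage is GraspStage.ACQUIRE and grasp_stable:
        return GraspStage.DONE
    return stage  # otherwise, stay in the current stage

stage = GraspStage.APPROACH
events = [dict(contact=True, surface_estimated=False, grasp_stable=False),
          dict(contact=True, surface_estimated=True, grasp_stable=False),
          dict(contact=True, surface_estimated=True, grasp_stable=True)]
for e in events:
    stage = step(stage, **e)
# stage is now GraspStage.DONE
```

In the full system, each transition would also feed the sensed contact and surface information back into the compositional representation, where the belief-state planner consumes it.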
The proposed approach takes advantage of the efficiency, scalability, and robustness of the
compositional representation to describe objects in a cognitive way, and allows this knowledge to be
transferred to the haptic actions that are to be performed on the object by interacting with it at
various levels. The information gained through interaction, i.e. first- and higher-order properties of
the object's surface patches, friction, and mass distribution, is in turn accumulated in the
compositional framework, allowing the information organized in the various nodes to be complemented
and reasoned upon in a natural way.
Moreover, estimates of the force-closure properties of the incipient grasp to be performed on an object
will be profitably employed within a probabilistic setting.
WP5: Scenario-based Evaluation
Evaluate the work of WPs 1-4 on sub-tasks from a dishwasher loading scenario.
The work to be done will involve integration of subsets of the components into three robot systems,
each of which is a bi-manual system based on Kuka Lightweight arms, and a common head with two
pairs of cameras (narrow and wide field of view).
The systems will all employ the compositional representation developed in WPs 1 and 2.
WP6: Project Management
The objective of this workpackage is to ensure smooth running of the project, both scientifically and in
terms of project-management issues such as procurement, personnel recruitment, timely production of
periodic reports (including financial reporting), and implementation of the consortium agreement.
WP7: Dissemination
Disseminate the work of the project as widely as possible to three different communities, for three
different reasons.
The first is the academic community in robotics and computer vision, in order to allow other scientists
to build on our work as effectively as possible.
The second is the general public, to give them an appreciation of the achievements of the project in
layperson's terms and an understanding of the challenges involved in flexible object manipulation.
The third is potential European industrialists who may have an interest either in taking up the
publicly accessible results or in entering into a future exploitation agreement with any of the partners
for exploitation of their foreground Intellectual Property. The main objective of this project is
scientific, and commercial exploitation activity is not fundable under this instrument, but we consider it
a worthwhile activity to make specific efforts to disseminate the results to both the general public and
to industry.