*** UPDATE November 17th, 2022: Semantic Heads in HuRIC ***
The linguistic interpretation of commands has been extended to make the Semantic Head of each argument explicit.
In a command, e.g. "take the red mug next to the keyboard", where "the red mug" corresponds to the argument Theme, the Semantic Head identifies only the word carrying the semantic category of the phrase, i.e. "mug", which is the main carrier of the meaning, instead of the entire span. This makes it possible to define an additional evaluation type, in which only the Semantic Heads are considered, enabling a wider usage of robotic action primitives: for a robotic function, only the Semantic Head ("mug") may be required to execute an action. The example below was updated accordingly.
Moreover, additional entities were added to the Semantic Map, enabling a fully grounded interpretation.
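The difference between span-level and head-level evaluation can be sketched as follows. This is a minimal illustration, not the official HuRIC scorer: the `FrameElement` class and the `span_match` and `head_match` helpers are hypothetical names introduced here, and the predicted argument is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameElement:
    """One annotated argument; token ids are 1-based, as in HuRIC."""
    role: str
    start_id: int
    end_id: int
    semantic_head: int


def span_match(gold: FrameElement, pred: FrameElement) -> bool:
    """Strict evaluation: the role and the whole span must coincide."""
    return (gold.role, gold.start_id, gold.end_id) == (
        pred.role, pred.start_id, pred.end_id)


def head_match(gold: FrameElement, pred: FrameElement) -> bool:
    """Head-only evaluation: the role and the Semantic Head must coincide."""
    return (gold.role, gold.semantic_head) == (pred.role, pred.semantic_head)


# "take the red mug next to the keyboard": tokens 2-4 are "the red mug",
# and token 4 ("mug") is the Semantic Head of the Theme.
gold = FrameElement("Theme", start_id=2, end_id=4, semantic_head=4)
# Hypothetical system output that misses the determiner "the".
pred = FrameElement("Theme", start_id=3, end_id=4, semantic_head=4)

print(span_match(gold, pred))  # False
print(head_match(gold, pred))  # True
```

Under the head-only criterion the prediction still counts as correct, which is sufficient whenever the robot only needs to know that the Theme is a mug.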
HuRIC (Human Robot Interaction Corpus) is a resource that has been gathered as a collaboration between the Semantic Analytics Group (SAG) from the University of Rome, Tor Vergata, and the Laboratory of Cognitive Cooperating Robots (Lab.Ro.Co.Co.) at Sapienza, University of Rome. The basic idea of this project is to build a corpus for Human Robot Interaction in Natural Language containing information that is oriented to a specific application domain, e.g. house service robotics, but at the same time inspired by sound linguistic theories, which are by definition decoupled from such a domain.
HuRIC is designed to enable the Grounded Language Interpretation of robotic commands, i.e., to make the interpretation of a robotic command dependent on the specific environment in which the utterance is expressed. Without any contextual information, a command such as "take the mug next to the keyboard" is ambiguous: it may express the need to pick up the mug that is near the keyboard, or to bring the mug, whose position is not expressed, to a new position near the keyboard. Without knowing the actual placement of the mug and the keyboard in the environment, it is not possible to select the suitable interpretation, i.e. to correctly assign the intended meaning to the command.
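As a rough illustration of why entity positions matter, the sketch below picks one of the two readings from the distance between the mug and the keyboard. It is a toy rule under stated assumptions, not the method used with HuRIC: the `interpret` function and the `NEAR_THRESHOLD` value are hypothetical, while the frame names `Taking` and `Bringing` and the first pair of coordinates come from the corpus example shown later in this document.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

# Hypothetical threshold, in grid units, below which two entities are
# considered "next to" each other. This value is an assumption.
NEAR_THRESHOLD = 1.5


def interpret(mug: Point, keyboard: Point) -> str:
    """Choose a frame for "take the mug next to the keyboard"."""
    if math.dist(mug, keyboard) <= NEAR_THRESHOLD:
        # The mug is already near the keyboard, so "next to the keyboard"
        # describes which mug to pick up.
        return "Taking"
    # The mug is elsewhere, so "next to the keyboard" is the destination.
    return "Bringing"


print(interpret((2.0, 5.0), (4.0, 1.0)))  # Bringing
print(interpret((4.0, 2.0), (4.0, 1.0)))  # Taking
```

With the coordinates of the example Semantic Map (mug at 2,5 and keyboard at 4,1) the rule selects `Bringing`, which agrees with the gold annotation of that example.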
HuRIC is based on the theory of Frame Semantics and captures cognitive information about the real-world situations and events expressed in sentences. Its most interesting feature is that HuRIC is not system- or robot-dependent, with respect to both the type of sentences it accepts and the formalism adopted for representing and extracting their interpretation.
In order to enable the learning of Grounded Language Interpretation processes, each command in HuRIC is paired with a Semantic Map, reflecting the naming and disposition of the entities in the environment that are referred to by the interpretation.
HuRIC is released as an open source resource, under the Apache 2.0 license.
HuRIC covers different situations representing possible commands given to a robot in a house environment. The corpus is composed of different subsets, characterized by different orders of complexity and designed to stress the language recognition architecture in different ways. Each sentence is annotated linguistically as well as conceptually. In linguistic terms, lemmas, POS tags, dependency trees, and Frame Semantics are annotated over the sentence. Semantic frames and frame elements are associated with sentence fragments (e.g. verbs and their syntactic arguments) and correspond to the meaning representation formalism adopted for the underlying command: they also conceptually reflect the actions requested of a robot, which are usually the actions it can carry out in a home environment.
HuRIC provides commands in two different languages: English and Italian. While the English subset contains 656 sentences, 241 commands are available in Italian. Almost all Italian sentences are translations of the original commands in English and the corpus keeps an alignment between them.
The number of annotated sentences, number of frames, and further statistics are reported in Table 1.
Statistic | English | Italian |
---|---|---|
Number of examples | 656 | 241 |
Number of frames | 18 | 14 |
Number of predicates | 762 | 272 |
Number of roles | 34 | 28 |
Predicates per sentence | 1.16 | 1.13 |
Sentences per frame | 36.44 | 17.21 |
Roles per sentence | 2.02 | 1.90 |
Entities per sentence | 6.59 | 6.97 |

Table 1: HuRIC: some statistics
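Two of the ratios in Table 1 can be recomputed from the raw counts in the same table. The short check below assumes that "Predicates per sentence" is predicates divided by examples and that "Sentences per frame" is examples divided by frames; this reading is inferred from the numbers rather than stated by the authors, and the `derived` helper is a hypothetical name.

```python
# Raw counts copied from Table 1.
english = {"examples": 656, "frames": 18, "predicates": 762}
italian = {"examples": 241, "frames": 14, "predicates": 272}


def derived(stats: dict) -> dict:
    """Recompute the per-sentence and per-frame ratios, rounded to 2 digits."""
    return {
        "predicates_per_sentence": round(stats["predicates"] / stats["examples"], 2),
        "sentences_per_frame": round(stats["examples"] / stats["frames"], 2),
    }


print(derived(english))  # {'predicates_per_sentence': 1.16, 'sentences_per_frame': 36.44}
print(derived(italian))  # {'predicates_per_sentence': 1.13, 'sentences_per_frame': 17.21}
```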
Detailed statistics about the number of sentences for each frame and frame elements are reported in Table 2 and Table 3 for the English and Italian subsets, respectively.
Frame / element | Ex | Frame / element | Ex | Frame / element | Ex |
---|---|---|---|---|---|
**Motion** | 143 | **Bringing** | 153 | **Cotheme** | 39 |
Goal | 129 | Theme | 153 | Cotheme | 39 |
Theme | 23 | Goal | 95 | Manner | 9 |
Direction | 9 | Beneficiary | 56 | Goal | 8 |
Path | 9 | Agent | 39 | Theme | 4 |
Manner | 4 | Source | 18 | Speed | 1 |
Area | 2 | Manner | 1 | Path | 1 |
Distance | 1 | Area | 1 | Area | 1 |
Source | 1 | | | | |
**Locating** | 90 | **Inspecting** | 29 | **Taking** | 80 |
Phenomenon | 89 | Ground | 28 | Theme | 80 |
Ground | 34 | Desired_state | 9 | Source | 16 |
Cognizer | 10 | Inspector | 5 | Agent | 8 |
Purpose | 5 | Unwanted_entity | 2 | Purpose | 2 |
Manner | 2 | | | | |
**Change_direction** | 11 | **Arriving** | 12 | **Giving** | 10 |
Direction | 11 | Goal | 11 | Recipient | 10 |
Angle | 3 | Path | 5 | Theme | 10 |
Theme | 1 | Manner | 1 | Donor | 4 |
Speed | 1 | Theme | 1 | Reason | 1 |
**Placing** | 52 | **Closure** | 19 | **Change_operational_state** | 49 |
Theme | 52 | Containing_object | 11 | Device | 49 |
Goal | 51 | Container_portal | 8 | Operational_state | 43 |
Agent | 7 | Agent | 7 | Agent | 17 |
Area | 1 | Degree | 2 | | |
**Being_located** | 38 | **Attaching** | 11 | **Releasing** | 9 |
Theme | 38 | Goal | 11 | Theme | 9 |
Location | 34 | Item | 6 | Goal | 5 |
Place | 1 | Items | 1 | | |
**Perception_active** | 6 | **Being_in_category** | 11 | **Manipulation** | 5 |
Phenomenon | 6 | Item | 11 | Entity | 5 |
Manner | 1 | Category | 11 | | |

Table 2: Distribution of frames (in bold) and their frame elements in the English dataset
Frame / element | Ex | Frame / element | Ex | Frame / element | Ex |
---|---|---|---|---|---|
**Motion** | 51 | **Locating** | 27 | **Inspecting** | 4 |
Goal | 28 | Phenomenon | 27 | Ground | 2 |
Direction | 20 | Ground | 6 | Unwanted_entity | 2 |
Distance | 13 | Manner | 2 | Desired_state | 2 |
Speed | 8 | Purpose | 1 | Instrument | 1 |
Theme | 3 | | | | |
Path | 2 | | | | |
Manner | 1 | | | | |
Source | 1 | | | | |
**Bringing** | 59 | **Cotheme** | 13 | **Placing** | 18 |
Theme | 60 | Cotheme | 13 | Theme | 18 |
Beneficiary | 31 | Manner | 6 | Goal | 17 |
Goal | 26 | Goal | 5 | Area | 1 |
Source | 8 | | | | |
**Closure** | 10 | **Giving** | 7 | **Change_direction** | 21 |
Container_portal | 6 | Theme | 7 | Direction | 21 |
Containing_object | 5 | Recipient | 6 | Angle | 9 |
Degree | 1 | Donor | 1 | Speed | 9 |
**Taking** | 22 | **Being_located** | 14 | **Being_in_category** | 4 |
Theme | 22 | Location | 14 | Item | 4 |
Source | 8 | Theme | 12 | Category | 4 |
**Releasing** | 8 | **Change_operational_state** | 14 | | |
Theme | 8 | Device | 14 | | |
Place | 3 | | | | |

Table 3: Distribution of frames (in bold) and their frame elements in the Italian dataset
This repository contains the whole HuRIC corpus, a collection of robotics commands.
It is composed of 2 versions, one for each language:
- `en`: the English version of HuRIC
- `it`: the Italian version of HuRIC
The English version is further decomposed into 7 subsets, characterized by different orders of complexity and designed to stress a labeling architecture in different ways.
The current release of HuRIC is made available through an XML-based format, whose extension is `.hrc`. An example is provided below.
The targeted command is "take the mug next to the keyboard":
```xml
<?xml version="1.0" encoding="UTF-8"?>
<huricExample id="2650">
  <commands>
    <command>
      <sentence>take the mug next to the keyboard</sentence>
      <tokens>
        <token id="1" lemma="take" pos="VB" surface="take" />
        <token id="2" lemma="the" pos="DT" surface="the" />
        <token id="3" lemma="mug" pos="NN" surface="mug" />
        <token id="4" lemma="next" pos="JJ" surface="next" />
        <token id="5" lemma="to" pos="TO" surface="to" />
        <token id="6" lemma="the" pos="DT" surface="the" />
        <token id="7" lemma="keyboard" pos="NN" surface="keyboard" />
      </tokens>
      <dependencies>
        <dep from="0" to="1" type="root" />
        <dep from="1" to="3" type="dobj" />
        <dep from="3" to="2" type="det" />
        <dep from="1" to="4" type="advmod" />
        <dep from="4" to="7" type="nmod" />
        <dep from="7" to="5" type="case" />
        <dep from="7" to="6" type="det" />
      </dependencies>
      <semantics>
        <frames>
          <frame name="Bringing">
            <lexicalUnit>
              <token id="1" />
            </lexicalUnit>
            <frameElements>
              <frameElement>
                <type name="Theme" semanticHead="3" />
                <span startId="2" endId="3" />
              </frameElement>
              <frameElement>
                <type name="Goal" semanticHead="7" />
                <span startId="4" endId="7" />
              </frameElement>
            </frameElements>
          </frame>
        </frames>
      </semantics>
    </command>
  </commands>
  <semanticMap>
    <entities>
      <entity atom="p1" type="Cup">
        <attributes>
          <attribute name="contain_ability">
            <value>true</value>
          </attribute>
          <attribute name="preferred_lexical_reference">
            <value>cup</value>
          </attribute>
          <attribute name="lexical_references">
            <value>cup</value>
            <value>mug</value>
            <value>coffee cup</value>
            <value>bowl</value>
          </attribute>
        </attributes>
        <coordinate angle="0.0" x="2.0" y="5.0" z="0.0" />
      </entity>
      ...
      <entity atom="k1" type="Keyboard">
        <attributes>
          <attribute name="contain_ability">
            <value>false</value>
          </attribute>
          <attribute name="lexical_references">
            <value>keyboard</value>
            <value>console</value>
          </attribute>
        </attributes>
        <coordinate angle="0.0" x="4.0" y="1.0" z="0.0" />
      </entity>
    </entities>
  </semanticMap>
  <lexicalGroundings>
    <lexicalGrounding atom="p1" tokenId="3" />
    <lexicalGrounding atom="k1" tokenId="7" />
  </lexicalGroundings>
</huricExample>
```
Hence, for each command, the following information is provided:
- the whole sentence (i.e., the `<sentence/>` tag), like the command above, `take the mug next to the keyboard`;
- the list of tokens composing the command, along with the corresponding lemmas and POS tags (i.e., the `<tokens/>` XML tag):
  - notice that each `token` is identified by an `id`, which is used in the rest of the file to refer to it;
- the syntactic information, in terms of dependency relations among tokens (i.e., the `<dependencies/>` tag):
  - in the example above, a row like `<dep from="1" to="3" type="dobj" />` means that the third word, identified by `<token id="3" lemma="mug" pos="NN" surface="mug" />`, expresses the direct object (i.e., the `dobj`) of the main verb `<token id="1" lemma="take" pos="VB" surface="take" />`;
  - dependency relations exist only for the English dataset and their tags are consistent with the Stanford Dependency Tagset;
- the semantics, based on the theory of Frame Semantics and expressed by frames (i.e., the `<frames/>` tag) and frame elements (i.e., the `<frameElements/>` tag):
  - even though a sentence may express an arbitrary number of frames, in the example above only the frame `Bringing` is expressed, with two frame elements: the `Theme` role, spanning from the second to the third token (`the mug`), and the `Goal` role, spanning from the fourth to the seventh token (`next to the keyboard`);
  - for each frame element, the semantic head is marked through the attribute `semanticHead`: the main carrier of the semantic meaning for `Theme` is `mug`, thus `semanticHead="3"` is assigned, while `keyboard` is the main carrier for the `Goal` role, i.e. `semanticHead="7"`;
- the configuration of the environment, in terms of entities populating the Semantic Map (SM), along with their semantic attributes (i.e., the `semanticMap` tag):
  - each entity is identified by a unique id (`atom`) and characterized by a `type`; in the example above, two objects are in the Semantic Map, such as the object `p1`, which is an instance of the class `Cup`;
  - entities are extended through semantic or lexical `<attributes/>`; in the example above, an instance of the class `Cup` may contain other entities, so the `contain_ability` property is `true`; these attributes also encode the multiple lexical references that can be used to refer to the entity, such as `cup`, `mug` or `bowl`;
  - entities are localized within the environment through `<coordinate/>` elements, which refer to an ideal gridmap;
- the gold groundings, providing the gold mapping between linguistic symbols (namely, words of the sentence) and entities of the Semantic Map (i.e., the `lexicalGroundings` tag). In the example, the token with id `3` (mug) refers to the entity `p1` (`Cup`), while token `7` (keyboard) refers to the entity `k1` (`Keyboard`).
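The layers described above can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is one possible reader, not an official HuRIC loader: the `read_example` function and the shape of its output are assumptions made for illustration, and the embedded string is a trimmed copy of the example (no dependencies, no entity attributes). It also assumes that token ids within a span are contiguous, as they are in the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example above, kept inline so the sketch is runnable.
HRC = """
<huricExample id="2650">
  <commands><command>
    <sentence>take the mug next to the keyboard</sentence>
    <tokens>
      <token id="1" lemma="take" pos="VB" surface="take" />
      <token id="2" lemma="the" pos="DT" surface="the" />
      <token id="3" lemma="mug" pos="NN" surface="mug" />
      <token id="4" lemma="next" pos="JJ" surface="next" />
      <token id="5" lemma="to" pos="TO" surface="to" />
      <token id="6" lemma="the" pos="DT" surface="the" />
      <token id="7" lemma="keyboard" pos="NN" surface="keyboard" />
    </tokens>
    <semantics><frames>
      <frame name="Bringing">
        <lexicalUnit><token id="1" /></lexicalUnit>
        <frameElements>
          <frameElement>
            <type name="Theme" semanticHead="3" />
            <span startId="2" endId="3" />
          </frameElement>
          <frameElement>
            <type name="Goal" semanticHead="7" />
            <span startId="4" endId="7" />
          </frameElement>
        </frameElements>
      </frame>
    </frames></semantics>
  </command></commands>
  <semanticMap><entities>
    <entity atom="p1" type="Cup" />
    <entity atom="k1" type="Keyboard" />
  </entities></semanticMap>
  <lexicalGroundings>
    <lexicalGrounding atom="p1" tokenId="3" />
    <lexicalGrounding atom="k1" tokenId="7" />
  </lexicalGroundings>
</huricExample>
"""


def read_example(root):
    """Return [(frame name, {role: details})] for one huricExample element."""
    command = root.find("./commands/command")
    # Token id -> surface form, taken from the <tokens> children only.
    surface = {t.get("id"): t.get("surface") for t in command.find("tokens")}
    # Gold groundings: token id -> entity atom, and atom -> entity type.
    atom_of = {g.get("tokenId"): g.get("atom") for g in root.iter("lexicalGrounding")}
    type_of = {e.get("atom"): e.get("type") for e in root.iter("entity")}
    frames = []
    for frame in command.iter("frame"):
        roles = {}
        for element in frame.iter("frameElement"):
            kind, span = element.find("type"), element.find("span")
            ids = range(int(span.get("startId")), int(span.get("endId")) + 1)
            head = kind.get("semanticHead")
            atom = atom_of.get(head)  # None if the head is not grounded
            roles[kind.get("name")] = {
                "span": " ".join(surface[str(i)] for i in ids),
                "head": surface[head],
                "atom": atom,
                "entity_type": type_of.get(atom),
            }
        frames.append((frame.get("name"), roles))
    return frames


frames = read_example(ET.fromstring(HRC.strip()))
for name, roles in frames:
    print(name, {role: info["atom"] for role, info in roles.items()})
# Bringing {'Theme': 'p1', 'Goal': 'k1'}
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring(...)`. Following the Semantic Head to its grounding yields a compact form such as `Bringing(Theme: p1, Goal: k1)`, which is the kind of information a robotic action primitive may need.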
This repository contains HuRIC 2.1. The previous version of HuRIC is available at the following link: http://sag.art.uniroma2.it/demo-software/huric/
Main changes with respect to HuRIC 2.0:
- Added Semantic Head attribute to each frame element with the corresponding ID.
- Updated the Semantic Map with new Entities.
Main changes with respect to HuRIC 1.0:
- Added additional annotated examples for English.
- Added brand new examples for Italian.
- Each sentence is now paired with a corresponding Semantic Map.
Together with the corpus, we developed a Spoken Language Understanding system called LU4R, based on a cascade of sequential labelers whose models have been trained over HuRIC. Consistently with the corpus, it is also designed for a context-aware interpretation of spoken commands.
More details on LU4R can be found at the following link: http://sag.art.uniroma2.it/lu4r.html
If you use HuRIC for your research, please cite the following paper:
Andrea Vanzo, Danilo Croce, Emanuele Bastianelli, Roberto Basili, Daniele Nardi (2020): Grounded language interpretation of robotic commands through structured learning. In: Artificial Intelligence, Volume 278, January 2020, 103181.
```bibtex
@article{DBLP:journals/ai/VanzoCBBN20,
  author    = {Andrea Vanzo and
               Danilo Croce and
               Emanuele Bastianelli and
               Roberto Basili and
               Daniele Nardi},
  title     = {Grounded language interpretation of robotic commands through structured
               learning},
  journal   = {Artificial Intelligence},
  volume    = {278},
  year      = {2020},
  url       = {https://doi.org/10.1016/j.artint.2019.103181},
  doi       = {10.1016/j.artint.2019.103181},
  biburl    = {https://dblp.org/rec/bib/journals/ai/VanzoCBBN20},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
Andrea Vanzo, Danilo Croce, Emanuele Bastianelli, Roberto Basili, Daniele Nardi (2020): Grounded language interpretation of robotic commands through structured learning. In: Artificial Intelligence, Volume 278, January 2020, 103181.
Emanuele Bastianelli, Giuseppe Castellucci, Danilo Croce, Roberto Basili, Daniele Nardi (2017): Structured learning for spoken language understanding in human-robot interaction. In: International Journal of Robotics Research, 36 (5-7), pp. 660–683, 2017.
Emanuele Bastianelli, Danilo Croce, Andrea Vanzo, Roberto Basili, Daniele Nardi (2016): A Discriminative Approach to Grounded Spoken Language Understanding in Interactive Robotics. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016.
Emanuele Bastianelli, Giuseppe Castellucci, Danilo Croce, Luca Iocchi, Roberto Basili, Daniele Nardi (2014): HuRIC: a Human Robot Interaction Corpus. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), European Language Resources Association (ELRA), Reykjavik, Iceland, 2014, ISBN: 978-2-9517408-8-4.
Emanuele Bastianelli, Giuseppe Castellucci, Danilo Croce, Roberto Basili, Daniele Nardi (2014): Effective and Robust Natural Language Understanding for Human Robot Interaction. In: Proceedings of the 21st European Conference on Artificial Intelligence (ECAI 2014), pp. 57 - 62, Prague, Czech Republic, 2014.