There are scenarios in which elements in the types column returned by get_dbpedia_uris() are not named lists. This is a) inconsistent and b) causes errors when filtering via the types_src argument, which relies on the elements of this column being named.
Example
See the following example:
library(dbpedia)
library(quanteda)

inaugural_paragraphs <- data_corpus_inaugural |>
  corpus_subset(Year == 2021) |>
  corpus_reshape(to = "paragraphs")

get_dbpedia_uris(
  x = inaugural_paragraphs["2021-Biden.145"],
  language = getOption("dbpedia.lang"),
  max_len = 5600L,
  confidence = 0.5,
  support = 20,
  types = character(),
  api = getOption("dbpedia.endpoint"), # English endpoint
  verbose = FALSE,
  progress = FALSE
)
This will result in an error:
Error in FUN(X[[i]], ...) : subscript out of bounds
Likely underlying issue
Currently, the way to populate the types column in get_dbpedia_uris() usually results in either an empty list (if there are no types for the entity) or a list of lists containing entity types (if there are types for an entity). The names of the nested lists refer to the source/ontology the type is derived from.
This fails, however, if the document passed to get_dbpedia_uris() has only one entity and only types from one source. In this case, types are added as unnamed list elements to the column. This seems to be happening only if resource_min (the data.table containing entities) has only one row.
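To make the inconsistency easier to spot, here is a minimal sketch of a shape check on the types column. It does not call the package: `is_consistent_types()` is a hypothetical helper, and `types_col` is a made-up stand-in for the column with invented values, so the exact shape of the real unnamed elements may differ.

```r
# Hypothetical helper (not part of the dbpedia package): TRUE if one entry of
# the `types` column is either empty or a list whose elements are all named.
is_consistent_types <- function(x) {
  if (length(x) == 0L) return(TRUE)
  nms <- names(x)
  is.list(x) && !is.null(nms) && all(nzchar(nms))
}

# Toy stand-in for the `types` column; values are invented for illustration.
types_col <- list(
  list(),                                                   # entity without types
  list(DBpedia = list("Person"), Schema = list("Person")),  # expected shape
  list("Person")                                            # unnamed: problematic shape
)

# Flag the entries that are neither empty nor named lists.
consistent <- vapply(types_col, is_consistent_types, logical(1))
which(!consistent)
```

Running a check like this on the actual return value should show whether the unnamed elements only appear in the one-row case.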
Error with types_src
This, in itself, is inconsistent and should be addressed. However, the lack of a name in the column results in an error in the subsequent mechanism to extract and filter the types by their source via the types_src argument. This relies on the elements in types being named.
Potential Solution
I think that when preparing the types for the column, it would be necessary to check if
- there are only types for a single element
- these types are all from the same source
In case there is only one type of a single source, e.g. "Person" from "DBpedia", wrapping this value into an additional list() should work.
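A rough sketch of what such a wrapping step might look like, under assumptions: `as_named_types()` is a hypothetical helper, not existing package code, and it assumes the source name (`src`) is known at the point where the column is built. The real fix may need to happen earlier, where the one-row case loses its structure.

```r
# Hypothetical helper (not part of the dbpedia package): coerce one entry of
# the `types` column into a list named by source. `src` is assumed to be known
# to the caller.
as_named_types <- function(x, src) {
  if (length(x) == 0L) return(list())                 # no types: keep empty list
  if (is.list(x) && !is.null(names(x))) return(x)     # already in expected shape
  stats::setNames(list(as.list(x)), src)              # wrap and name by source
}

# An unnamed single-source entry becomes a named list ...
fixed <- as_named_types(list("Person"), src = "DBpedia")

# ... so a name-based lookup by source works on it.
fixed[["DBpedia"]]

# Entries that are already named pass through unchanged.
unchanged <- as_named_types(list(DBpedia = list("Person")), src = "DBpedia")
```

The early return for named lists keeps the multi-source case untouched, so only the single-entity, single-source case would be affected.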