Support for biblio-glutton 0.3 #1086

Merged
merged 6 commits into from
Sep 14, 2024

@kermitt2 (Owner) commented Feb 25, 2024

This PR adds support for the latest version of biblio-glutton (0.3), which extends bibliographical reference matching beyond CrossRef records with a DOI to the HAL archive (around 3.5M records).

In practice, consolidation can now resolve a raw bibliographical reference against HAL records when the reference is not present in CrossRef. HAL IDs are also added when a DOI match corresponds to a record that is also present on HAL.
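
For readers who want to reproduce this, here is a minimal sketch of how a full-text request with citation consolidation might be prepared. It assumes a GROBID server at `http://localhost:8070` (an assumption, adjust as needed) and uses the `consolidateCitations` form parameter of the `/api/processFulltextDocument` endpoint. The helper name `consolidation_request` is hypothetical. Whether consolidation is served by biblio-glutton or by CrossRef is decided by the GROBID server configuration, not by the request.

```python
from urllib.parse import urljoin


def consolidation_request(base_url: str, consolidate_citations: int = 1) -> tuple:
    """Build the endpoint URL and form fields for a full-text request
    with citation consolidation (hypothetical helper, nothing is sent)."""
    if consolidate_citations not in (0, 1, 2):
        raise ValueError("consolidateCitations must be 0, 1 or 2")
    # Normalise the base URL so that urljoin appends the path instead of replacing it
    endpoint = urljoin(base_url.rstrip("/") + "/", "api/processFulltextDocument")
    fields = {"consolidateCitations": str(consolidate_citations)}
    return endpoint, fields


# Assumed local server address; replace with your own deployment
endpoint, fields = consolidation_request("http://localhost:8070")
print(endpoint, fields)
```

The PDF itself would be sent as the multipart form field `input` together with these fields, for instance with an HTTP client of your choice.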

For example, with the PDF https://hal.science/hal-04303155v2, the bibliographical reference section contains several consolidated entries with a HAL ID and no DOI:

                    <biblStruct xml:id="b14">
                        <analytic>
                            <title level="a" type="main">Grafting of nitrophenyl groups on carbon and metallic surfaces without electrochemical induction</title>
                            <author>
                                <persName>
                                    <forename type="first">A</forename>
                                    <surname>Adenier</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">E</forename>
                                    <surname>Cabet-Deliry</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">A</forename>
                                    <surname>Chaussé</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">S</forename>
                                    <surname>Griveau</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Florian</forename>
                                    <surname>Mercier</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">J</forename>
                                    <surname>Pinson</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Christine</forename>
                                    <surname>Vautrin-Ul</surname>
                                </persName>
                            </author>
                            <idno type="HALid">hal-00157436</idno>
                        </analytic>
                        <monogr>
                            <title level="j">Chem. Mater</title>
                            <idno type="ISSN">0897-4756</idno>
                            <imprint>
                                <biblScope unit="volume">17</biblScope>
                                <biblScope unit="page" from="491" to="501"/>
                                <date type="published" when="2005">2005</date>
                                <publisher>American Chemical Society</publisher>
                            </imprint>
                        </monogr>
                    </biblStruct>

as well as consolidated entries with both DOI and HAL ID:

                    <biblStruct xml:id="b16">
                        <analytic>
                            <title level="a" type="main">Evidence of the Grafting Mechanisms of Diazonium Salts on Gold Nanostructures</title>
                            <author>
                                <persName>
                                    <forename type="first">Stéphanie</forename>
                                    <surname>Betelu</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Inga</forename>
                                    <surname>Tijunelyte</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Leïla</forename>
                                    <surname>Boubekeur-Lecaque</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Ioannis</forename>
                                    <surname>Ignatiadis</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Joyce</forename>
                                    <surname>Ibrahim</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Stéphane</forename>
                                    <surname>Gaboreau</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Catherine</forename>
                                    <surname>Berho</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Timothée</forename>
                                    <surname>Toury</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Erwann</forename>
                                    <surname>Guenin</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Nathalie</forename>
                                    <surname>Lidgi-Guigui</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Nordin</forename>
                                    <surname>Felidj</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Emmanuel</forename>
                                    <surname>Rinnert</surname>
                                </persName>
                            </author>
                            <author>
                                <persName>
                                    <forename type="first">Marc</forename>
                                    <surname>Lamy De La Chapelle</surname>
                                </persName>
                            </author>
                            <idno type="DOI">10.1021/acs.jpcc.6b06486</idno>
                            <idno type="HALid">hal-01685660</idno>
                        </analytic>
                        <monogr>
                            <title level="j">J. Phys. Chem. C</title>
                            <idno type="ISSN">1932-7447</idno>
                            <imprint>
                                <biblScope unit="volume">120</biblScope>
                                <biblScope unit="issue">32</biblScope>
                                <biblScope unit="page" from="18158" to=" 18166"/>
                                <date type="published" when="2016">2016</date>
                                <publisher>American Chemical Society</publisher>
                            </imprint>
                        </monogr>
                    </biblStruct>
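
To check a TEI result for the two cases shown above, the following is a minimal sketch that lists the DOI and HAL ID of each `biblStruct`. It relies only on the `idno` types visible in the examples (`DOI` and `HALid`). The function names and the trimmed-down `SAMPLE` input are illustrative assumptions, not part of GROBID.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace, so xml:id resolves to this key
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so matching works with or without the TEI namespace."""
    return tag.rsplit("}", 1)[-1]


def identifiers_by_reference(tei: str) -> dict:
    """Return {xml:id: {"DOI": ..., "HALid": ...}} for every biblStruct in the input."""
    root = ET.fromstring(tei)
    result = {}
    for elem in root.iter():
        if local_name(elem.tag) != "biblStruct":
            continue
        ids = {"DOI": None, "HALid": None}
        for child in elem.iter():
            # Other idno types (for example ISSN) are ignored here
            if local_name(child.tag) == "idno" and child.get("type") in ids:
                ids[child.get("type")] = (child.text or "").strip()
        result[elem.get(XML_ID)] = ids
    return result


# Shortened version of the two entries above, keeping only the identifiers
SAMPLE = """<listBibl>
  <biblStruct xml:id="b14">
    <analytic><idno type="HALid">hal-00157436</idno></analytic>
  </biblStruct>
  <biblStruct xml:id="b16">
    <analytic>
      <idno type="DOI">10.1021/acs.jpcc.6b06486</idno>
      <idno type="HALid">hal-01685660</idno>
    </analytic>
  </biblStruct>
</listBibl>"""

refs = identifiers_by_reference(SAMPLE)
# References resolved through HAL only, i.e. with a HAL ID and no DOI
hal_only = [rid for rid, ids in refs.items() if ids["HALid"] and not ids["DOI"]]
print(hal_only)  # ['b14']
```

Matching on local element names is a deliberate choice here, so the same code can be run on a full GROBID TEI document, which declares the TEI namespace, as well as on bare fragments like the ones quoted in this PR.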

@lfoppiano lfoppiano added this to the 0.8.1 milestone May 21, 2024
@coveralls
Coverage Status

coverage: 40.77% (-0.02%) from 40.787%
when pulling a77114d on glutton-0.3
into 694f0ed on master.

@lfoppiano (Collaborator) left a comment

@kermitt2 I tested and revised the code. It seems to be working well, and I left a couple of comments.

@lfoppiano (Collaborator) commented

I tested grobid 0.8.0 with both glutton 0.2 and glutton 0.3, and grobid 0.8.1 with both glutton 0.2 and 0.3. They all work fine.

I also ran into some problems with Elasticsearch (ES) on my side, so I wrote a troubleshooting section in the documentation in case this happens again.

@lfoppiano lfoppiano merged commit 4b2bda6 into master Sep 14, 2024
5 of 7 checks passed
@lfoppiano lfoppiano deleted the glutton-0.3 branch September 14, 2024 09:24

3 participants