Documentation and example monobert
bpiwowar committed Jul 24, 2023
1 parent 7750213 commit 90126b1
Showing 8 changed files with 41 additions and 15 deletions.
36 changes: 25 additions & 11 deletions docs/source/letor/samplers.rst
@@ -1,35 +1,38 @@
 Samplers
 --------

+.. currentmodule:: xpmir.letor.samplers
+
 Samplers provide samples in the form of *records*. They all inherit from:

-.. autoxpmconfig:: xpmir.letor.samplers.Sampler
-.. autoclass:: xpmir.letor.samplers.SerializableIterator
+.. autoxpmconfig:: Sampler
+.. autoclass:: SerializableIterator


 Pointwise
 =========

-.. autoxpmconfig:: xpmir.letor.samplers.PointwiseSampler
+.. autoxpmconfig:: PointwiseSampler
    :members: pointwise_iter

-.. autoxpmconfig:: xpmir.letor.samplers.PointwiseModelBasedSampler
+.. autoxpmconfig:: PointwiseModelBasedSampler

 Pairwise
 =========

-.. autoxpmconfig:: xpmir.letor.samplers.PairwiseSampler
-.. autoxpmconfig:: xpmir.letor.samplers.PairwiseModelBasedSampler
+.. autoxpmconfig:: PairwiseSampler
+.. autoxpmconfig:: BatchwiseSampler
+.. autoxpmconfig:: PairwiseModelBasedSampler
 .. autoxpmconfig:: xpmir.documents.samplers.BatchwiseRandomSpanSampler

-.. autoxpmconfig:: xpmir.letor.samplers.TripletBasedSampler
-.. autoxpmconfig:: xpmir.letor.samplers.PairwiseDatasetTripletBasedSampler
+.. autoxpmconfig:: TripletBasedSampler
+.. autoxpmconfig:: PairwiseDatasetTripletBasedSampler

 Hard Negatives Sampling (Tasks)
-============
+===============================

-.. autoxpmconfig:: xpmir.letor.samplers.ModelBasedHardNegativeSampler
-.. autoxpmconfig:: xpmir.letor.samplers.TeacherModelBasedHardNegativesTripletSampler
+.. autoxpmconfig:: ModelBasedHardNegativeSampler
+.. autoxpmconfig:: TeacherModelBasedHardNegativesTripletSampler

 Distillation
 ============
@@ -45,3 +48,14 @@ Records for training

 .. automodule:: xpmir.letor.records
    :members: PointwiseRecord, PairwiseRecord
+
+
+Document samplers
+=================
+
+Useful for pre-training or when learning index parameters (e.g. for FAISS).
+
+.. currentmodule:: xpmir.documents.samplers
+.. autoxpmconfig:: DocumentSampler
+.. autoxpmconfig:: HeadDocumentSampler
+.. autoxpmconfig:: RandomDocumentSampler
7 changes: 7 additions & 0 deletions docs/source/text/huggingface.rst
@@ -12,6 +12,13 @@ Encoders
 .. autoxpmconfig:: SentenceTransformerTextEncoder
 .. autoxpmconfig:: OneHotHuggingFaceEncoder

+
+Tokenizers
+==========
+
+.. autoxpmconfig:: OneHotHuggingFaceEncoder
+.. autoxpmconfig:: HuggingfaceTokenizer
+
 Hooks
 =====
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,6 +1,6 @@
 # Experimaestro

-experimaestro>=0.29.1
+experimaestro>=1.0.0
 datamaestro>=0.8.13
 datamaestro_text>=2023.3.23
 ir_datasets
3 changes: 2 additions & 1 deletion src/xpmir/letor/samplers.py
@@ -61,7 +61,8 @@ def pairwise_iter(self) -> SerializableIterator[PairwiseRecord]:


 class BatchwiseSampler(Sampler):
-    """Batchwise samplers provide for each question a set of documents"""
+    """Base class for batchwise samplers, that provide for each question a list
+    of documents"""

     def batchwise_iter(self, batch_size: int) -> SerializableIterator[BatchwiseRecords]:
         """Iterate over batches of size (# of queries) batch_size
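A minimal sketch of the behaviour described by the `BatchwiseSampler` docstring above: each query comes with a list of documents, and the iterator groups `batch_size` queries per batch. The names `ToyBatchwiseRecord` and `toy_batchwise_iter` are hypothetical and are not part of xpmir; the real method returns a `SerializableIterator[BatchwiseRecords]`, which this plain generator does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class ToyBatchwiseRecord:
    """Hypothetical record (not an xpmir class): one query and its documents."""

    query: str
    documents: List[str]


def toy_batchwise_iter(
    candidates: Dict[str, List[str]], batch_size: int
) -> Iterator[List[ToyBatchwiseRecord]]:
    """Yield batches of `batch_size` queries; the last batch may be smaller."""
    batch: List[ToyBatchwiseRecord] = []
    for query, documents in candidates.items():
        batch.append(ToyBatchwiseRecord(query, list(documents)))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the remaining queries that did not fill a whole batch
        yield batch


# Toy data: three queries, two candidate documents each
candidates = {
    "q1": ["d1", "d2"],
    "q2": ["d3", "d4"],
    "q3": ["d5", "d6"],
}
batches = list(toy_batchwise_iter(candidates, batch_size=2))
```

Here the batch size counts queries, not documents, which matches the "(# of queries)" wording of the `batchwise_iter` docstring.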
3 changes: 3 additions & 0 deletions src/xpmir/papers/monobert/configuration.py
@@ -46,3 +46,6 @@ class Monobert(RerankerMSMarcoV1Configuration):

     dev_test_size: int = 0
     """Development test size (0 to leave it like this)"""
+
+    base: str = "bert-base-uncased"
+    """Identifier for the base model"""
2 changes: 1 addition & 1 deletion src/xpmir/papers/monobert/experiment.py
@@ -115,7 +115,7 @@ def run(

     monobert_scorer: CrossScorer = CrossScorer(
         encoder=DualTransformerEncoder(
-            model_id="bert-base-uncased", trainable=True, maxlen=512, dropout=0.1
+            model_id=cfg.base, trainable=True, maxlen=512, dropout=0.1
         )
     ).tag("scorer", "monobert")
2 changes: 2 additions & 0 deletions src/xpmir/papers/monobert/small.yaml
@@ -6,6 +6,8 @@ description: |
This model has been trained on MsMarco v1 but only a few iterations (debug)
gpu: true
base: "microsoft/MiniLM-L12-H384-uncased"
dev_test_size: 50

validation:
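The interplay between the new `base` field (default `"bert-base-uncased"`) and its override in `small.yaml` can be pictured with the sketch below. It uses a plain dataclass and a hand-written dict standing in for the parsed YAML file; `ToyMonobertConfig` and `load_config` are hypothetical names, and this is not how xpmir or experimaestro actually load configurations.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class ToyMonobertConfig:
    """Hypothetical stand-in for the Monobert configuration class."""

    dev_test_size: int = 0
    base: str = "bert-base-uncased"


def load_config(overrides: Dict[str, Any]) -> ToyMonobertConfig:
    """Build a configuration, letting file values replace the defaults.

    Keys that the toy class does not declare (e.g. `gpu`) are ignored here;
    that is a simplification made for this sketch only.
    """
    known = {f.name for f in fields(ToyMonobertConfig)}
    return ToyMonobertConfig(**{k: v for k, v in overrides.items() if k in known})


# Values mirroring the small.yaml hunk above
small = load_config(
    {"gpu": True, "base": "microsoft/MiniLM-L12-H384-uncased", "dev_test_size": 50}
)
# No overrides: the defaults declared on the class apply
default = load_config({})
```

With this shape, `experiment.py` can read `cfg.base` instead of hard-coding the model identifier, so a debug configuration can swap in a smaller model without touching the code.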
1 change: 0 additions & 1 deletion src/xpmir/text/huggingface.py
@@ -260,7 +260,6 @@ def static(self):
@deprecate
class HuggingfaceTokenizer(OneHotHuggingFaceEncoder):
    """The old encoder for one hot"""

    pass

