Sentence Embeddings in Android

An Android library that provides a port of sentence-transformers, which is used to generate sentence embeddings (fixed-size vectors) for text/sentences. Supports all-MiniLM-L6-v2, bge-small-en, snowflake-arctic and more.

App Demo

Setup

1. Add the Jitpack repository to settings.gradle.kts

The library is hosted on Jitpack. Add the jitpack.io repository to settings.gradle.kts so that Gradle can resolve Jitpack packages,

dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        google()
        mavenCentral()
        maven { url = uri("https://jitpack.io") }
    }
}

or with Groovy build scripts,

dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        google()
        mavenCentral()
        maven { url "https://jitpack.io" }
    }
}

2. Add the dependency to build.gradle.kts

Add the Sentence-Embeddings-Android dependency to build.gradle.kts,

dependencies {
    // ...
    implementation("com.github.shubham0204:Sentence-Embeddings-Android:0.0.3")
    // ...
}

Sync the Gradle scripts and rebuild the project.

3. (Optional) Download the ONNX model and tokenizer.json for all-MiniLM-L6-v2

Note

You may download the model and the tokenizer at runtime, as the library only expects the raw bytes of these files. If you wish to bundle them in the app's package, proceed with this step.

The ONNX model and the tokenizer can be downloaded from the sentence-transformers/all-MiniLM-L6-v2 repository on Hugging Face.

Place model.onnx and tokenizer.json in the assets folder of the application.
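If you prefer to fetch the files at runtime instead of bundling them (as the note above mentions), the sketch below shows one way to do it. The URLs and the downloadModelFiles helper are placeholders, not part of the library; the library itself only needs the resulting ByteArrays.

import java.net.URL
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Placeholder URLs: replace with wherever you host model.onnx and tokenizer.json
private const val MODEL_URL = "https://example.com/model.onnx"
private const val TOKENIZER_URL = "https://example.com/tokenizer.json"

// Downloads both files off the main thread and returns their raw bytes
suspend fun downloadModelFiles(): Pair<ByteArray, ByteArray> = withContext(Dispatchers.IO) {
    val modelBytes = URL(MODEL_URL).openStream().use { it.readBytes() }
    val tokenizerBytes = URL(TOKENIZER_URL).openStream().use { it.readBytes() }
    modelBytes to tokenizerBytes
}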

Usage

API

The library provides a SentenceEmbedding class with init and encode suspend functions that initialize the model and generate the sentence embedding respectively.

The init function takes two mandatory arguments, modelBytes and tokenizerBytes.

import com.ml.shubham0204.sentence_embeddings.SentenceEmbedding
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch

val sentenceEmbedding = SentenceEmbedding()
// Read the raw bytes of the model and the tokenizer from the assets folder
val modelBytes: ByteArray = context.assets.open("model.onnx").use { it.readBytes() }
val tokenizerBytes: ByteArray = context.assets.open("tokenizer.json").use { it.readBytes() }
CoroutineScope(Dispatchers.IO).launch {
    sentenceEmbedding.init(
        modelBytes,
        tokenizerBytes
    )
}

Once the init function completes its execution, we can call the encode function to transform a given sentence into an embedding,

CoroutineScope(Dispatchers.IO).launch {
    val embedding: FloatArray = sentenceEmbedding.encode( "Delhi has a population of 32 million" )
    println( "Embedding: ${embedding.contentToString()}" )
    println( "Embedding size: ${embedding.size}" )
}

Compute Cosine Similarity

The embeddings are vectors whose relative similarity can be computed by measuring the cosine of the angle between them, also termed cosine similarity.
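For two embeddings $x_1$ and $x_2$, this is the dot product divided by the product of their magnitudes, which is exactly what the cosineSimilarity function below computes:

$$ \text{similarity}(x_1, x_2) = \frac{x_1 \cdot x_2}{\lVert x_1 \rVert \, \lVert x_2 \rVert} $$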

Tip

Here's an excellent blog to understand cosine similarity

import kotlin.math.pow
import kotlin.math.sqrt

private fun cosineSimilarity(
    x1: FloatArray,
    x2: FloatArray
): Float {
    var mag1 = 0.0f
    var mag2 = 0.0f
    var product = 0.0f
    // Accumulate the dot product and the squared magnitudes in a single pass
    for (i in x1.indices) {
        mag1 += x1[i].pow(2)
        mag2 += x2[i].pow(2)
        product += x1[i] * x2[i]
    }
    mag1 = sqrt(mag1)
    mag2 = sqrt(mag2)
    return product / (mag1 * mag2)
}

CoroutineScope(Dispatchers.IO).launch {
    val e1: FloatArray = sentenceEmbedding.encode( "Delhi has a population of 32 million" )
    val e2: FloatArray = sentenceEmbedding.encode( "What is the population of Delhi?" )
    val e3: FloatArray = sentenceEmbedding.encode( "Cities with a population greater than 4 million are termed metro cities" )

    val d12 = cosineSimilarity( e1 , e2 )
    val d13 = cosineSimilarity( e1 , e3 )
    println( "Similarity between e1 and e2: $d12" )
    println( "Similarity between e1 and e3: $d13" )
}
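Since e1 and e2 refer to the same fact about Delhi's population while e3 is a more general statement, we would expect d12 to be higher than d13.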