OCI Function triggered by OCI Events when a new object is created in Object Storage. The function extracts metadata and text from various document types using Apache Tika and writes the output as JSON to OCI Streaming, an Object Storage bucket, and OpenSearch.
In the OCI console, go to the menu:
Functions / Application
Check that you are in the right compartment, then create an application with the following values:
Name: e.g. docs-application
VCN: <your existing VCN>
Subnet: <your existing private subnet>
fn list context
fn use context eu-frankfurt-1
fn update context oracle.compartment-id <your compartment ocid>
fn update context registry fra.ocir.io/xxxxxx/docsapp #xxxxxx is your tenancy namespace
docker login -u 'xxxxx/oracleidentitycloudservice/name@domain.com' fra.ocir.io #Generate an auth token under your OCI user and use it as the password to log in to the container registry.
git clone https://github.com/mkratky/docparser.git
cd docparser
fn -v deploy --app docs-application
After building the code, the deployment creates a function named docparser in docs-application.
In the OCI console, go to the menu:
Functions / Application
Add Key: OUTPUT_BUCKET , Value: <name of your existing Object Storage bucket e.g. docs-extract>
Add Key: STREAM_NAME , Value: <name of your existing Stream e.g. docsextract>
Add Key: SEARCH_ENDPOINT , Value: <your existing OpenSearch API endpoint>
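Fn passes function configuration keys to the function as environment variables, so a missing key usually shows up only when the function is invoked. As a minimal sketch (Python for illustration only, not the docparser code itself; `missing_config_keys` is a hypothetical helper), a check over the three keys above could look like this:

```python
import os

# The three configuration keys listed above.
REQUIRED_KEYS = ("OUTPUT_BUCKET", "STREAM_NAME", "SEARCH_ENDPOINT")


def missing_config_keys(env=None):
    """Return the required keys that are unset or blank in `env`.

    `env` defaults to the process environment, which is where Fn
    exposes the function configuration.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_config_keys()
    if missing:
        print("Missing configuration keys: " + ", ".join(missing))
    else:
        print("All configuration keys are set.")
```

Failing fast on an empty list of settings is usually easier to debug than a later error from the Streaming or OpenSearch client.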
Object Storage holds the documents to index.
In the OCI console, go to the menu:
Object Storage / Bucket
Choose the right compartment
Bucket name: e.g. docs-upload
Check: Emit Object Events
In the OCI console, go to the menu:
Event Rules
Check that you are in the right compartment
Display Name: docs-upload-rule
Add the Rules Condition:
Condition: Event Type
Service: Object Storage
Event Type: Object - Create, Object - Update
Add Another Condition
Condition: Attribute
Attribute Name: bucketName
Attribute value: docs-upload (Then press enter)
In the actions:
Action Type: Functions
Function Compartment: <your compartment name>
Function Application: docs-application
Function: docparser
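The two conditions above can also be read as a single JSON rule condition. The sketch below (Python for illustration) shows the form this condition is assumed to take, together with a deliberately simplified matcher. The event type identifiers are the Object Storage create and update types; the `matches` helper and the sample event are hypothetical and do not reproduce the Events service's own matching logic.

```python
import json

# Assumed JSON form of the rule condition built in the console steps above.
RULE_CONDITION = {
    "eventType": [
        "com.oraclecloud.objectstorage.createobject",
        "com.oraclecloud.objectstorage.updateobject",
    ],
    "data": {"additionalDetails": {"bucketName": ["docs-upload"]}},
}


def matches(condition, event):
    """Simplified matcher: every field in the condition must match the event.

    A list in the condition means "any of these values"; a dict recurses
    into the nested event object. Illustration only.
    """
    for key, expected in condition.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif isinstance(expected, list):
            if actual not in expected:
                return False
        elif actual != expected:
            return False
    return True


# Trimmed, made-up event; real events carry many more fields.
sample_event = {
    "eventType": "com.oraclecloud.objectstorage.createobject",
    "data": {
        "resourceName": "report.pdf",
        "additionalDetails": {"bucketName": "docs-upload"},
    },
}

print(json.dumps(RULE_CONDITION, indent=2))
print(matches(RULE_CONDITION, sample_event))
```

An upload to any other bucket in the compartment would fail the bucketName check, which is why the attribute condition matters: without it, the function would fire for every bucket that emits object events.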
The dynamic group lets you grant the function permission to read from Object Storage.
In the OCI console, go to the menu:
Dynamic Groups
Name: docs-fn-dyngroup
Description: docs-fn-dyngroup
Rule: ALL {resource.type = 'fnfunc', resource.compartment.id = '##COMPARTMENT_OCID##'}
Replace ##COMPARTMENT_OCID## with your compartment OCID.
In the OCI console, go to the menu:
Policies
Check that you are in the right compartment
Name: docs-fn-policy
Description: docs-fn-policy
Choose: Show manual editor
Copy and paste the policies:
Allow service cloudEvents to use functions-family in compartment ##COMPARTMENT_NAME##
Allow dynamic-group docs-fn-dyngroup to manage objects in compartment ##COMPARTMENT_NAME##
Allow dynamic-group docs-fn-dyngroup to inspect streams in compartment ##COMPARTMENT_NAME##
Allow dynamic-group docs-fn-dyngroup to use stream-push in compartment ##COMPARTMENT_NAME##
Replace ##COMPARTMENT_NAME## with your compartment name.
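To avoid leaving a stray placeholder behind when editing the four statements by hand, the substitution can be scripted. This is a small sketch in Python (for illustration only); `render_policies` is a hypothetical helper and `my-compartment` is a made-up compartment name.

```python
# The four policy statements above, with the placeholder left in place.
POLICY_TEMPLATE = """\
Allow service cloudEvents to use functions-family in compartment ##COMPARTMENT_NAME##
Allow dynamic-group docs-fn-dyngroup to manage objects in compartment ##COMPARTMENT_NAME##
Allow dynamic-group docs-fn-dyngroup to inspect streams in compartment ##COMPARTMENT_NAME##
Allow dynamic-group docs-fn-dyngroup to use stream-push in compartment ##COMPARTMENT_NAME##
"""


def render_policies(compartment_name):
    """Return the policy statements with the compartment name filled in."""
    name = compartment_name.strip()
    if not name:
        raise ValueError("compartment name must not be empty")
    return [
        line.replace("##COMPARTMENT_NAME##", name)
        for line in POLICY_TEMPLATE.splitlines()
        if line.strip()
    ]


if __name__ == "__main__":
    # "my-compartment" is a placeholder; use your own compartment name.
    for statement in render_policies("my-compartment"):
        print(statement)
```

The printed lines can be pasted directly into the manual policy editor.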
To run the application, upload the documents you wish to index to the Object Storage bucket.
In the OCI console, go to the menu:
Object Storage / Bucket
Choose the right compartment
Choose Files from your Computer (Drop files here or select files)
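The upload can also be scripted instead of using the console. The following is a minimal sketch, assuming the OCI Python SDK is installed (`pip install oci`), a default profile exists in `~/.oci/config`, and the bucket is named docs-upload as in the example above. `upload_document` is a hypothetical helper, not part of docparser.

```python
import os
import sys


def upload_document(client, namespace, bucket_name, path):
    """Upload one local file; the object is named after the file."""
    object_name = os.path.basename(path)
    with open(path, "rb") as body:
        # put_object accepts a file-like object as the request body.
        client.put_object(namespace, bucket_name, object_name, body)
    return object_name


if __name__ == "__main__":
    paths = sys.argv[1:]
    if paths:
        # Imported lazily so the helper above has no hard SDK dependency.
        import oci

        config = oci.config.from_file()  # assumes ~/.oci/config, DEFAULT profile
        client = oci.object_storage.ObjectStorageClient(config)
        namespace = client.get_namespace().data
        for path in paths:
            name = upload_document(client, namespace, "docs-upload", path)
            print("uploaded " + name)
```

Saved under a name of your choice (for example `upload.py`), it could be run as `python upload.py report.pdf slides.pptx`. Each upload emits an object event that should match the rule created earlier and invoke docparser.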