Backport #144 #140 to stable24 (#153)
* Feature/impl#144 (#145)

* Register EventService class

* Fire TextRecognizedEvent

* Add TextRecognizedEvent class

* Create sidecar and add recognized text to result

* Added PdfOcrProcessor constructor argument

* Added recognizedText variable to class

* Added EventService

* Refactored TextRecognizeEvent

* Added EventService

* Fixed tests

* composer run cs:fix

* Basic code cleanup

Signed-off-by: Robin Windey <ro.windey@gmail.com>

* Adjustments for #144

* Add additional tests
* Refactor code to use more "high-level" SidecarFileAccessor

Signed-off-by: Robin Windey <ro.windey@gmail.com>

* Add docs for #144

* Add section for events to README.md
* Remove TOC workflow

Signed-off-by: Robin Windey <ro.windey@gmail.com>

* Fix php7.4 syntax

Signed-off-by: Robin Windey <ro.windey@gmail.com>

* Add check if event is emitted

Signed-off-by: Robin Windey <ro.windey@gmail.com>

* Change TextRecognizedEvent interface to be more generic

Linked to #144

* Adjust docs to match new interface

Signed-off-by: Robin Windey <ro.windey@gmail.com>

* Fix codecov

Signed-off-by: Robin Windey <ro.windey@gmail.com>

Signed-off-by: Robin Windey <ro.windey@gmail.com>
Co-authored-by: Guido Schmitz <g.schmitz@iurfriend.com>
Co-authored-by: Robin Windey <ro.windey@gmail.com>

* Implement #140 (#148)

* Implement #140

Get installed tesseract languages from backend

Signed-off-by: Robin Windey <ro.windey@gmail.com>

* Fix OcrBackendInfoServiceTest for #140

Signed-off-by: Robin Windey <ro.windey@gmail.com>

* Introduce specific CommandException

Signed-off-by: Robin Windey <ro.windey@gmail.com>

Signed-off-by: Robin Windey <ro.windey@gmail.com>

Signed-off-by: Robin Windey <ro.windey@gmail.com>
Co-authored-by: g-schmitz <g.schmitz@addedlifevalue.com>
Co-authored-by: Guido Schmitz <g.schmitz@iurfriend.com>
3 people authored Sep 24, 2022
1 parent d89d43a commit cf9a9ec
Showing 41 changed files with 1,601 additions and 166 deletions.
5 changes: 3 additions & 2 deletions .github/workflows/build.yml
@@ -1,6 +1,7 @@
name: Build artifact

on:
pull_request:
workflow_dispatch:

env:
@@ -36,5 +37,5 @@ jobs:
- name: Upload artifacts
uses: actions/upload-artifact@v1
with:
name: ${{ env.APP_NAME }}.tar.gz
path: ${{ env.APP_NAME }}/build/artifacts/appstore/${{ env.APP_NAME }}.tar.gz
name: ${{ env.APP_NAME }}.tar.gz
path: ${{ env.APP_NAME }}/build/artifacts/appstore/${{ env.APP_NAME }}.tar.gz
1 change: 1 addition & 0 deletions .github/workflows/codecov.yml
@@ -42,6 +42,7 @@ jobs:
uses: actions/checkout@v2
with:
path: apps/${{ env.APP_NAME }}
fetch-depth: 0

- name: Set up php ${{ matrix.php-versions }}
uses: shivammathur/setup-php@v2
13 changes: 0 additions & 13 deletions .github/workflows/toc_generator.yml

This file was deleted.

1 change: 1 addition & 0 deletions Makefile
@@ -180,6 +180,7 @@ appstore:
--exclude="../$(app_name)/*.json" \
--exclude="../$(app_name)/*.lock" \
--exclude="../$(app_name)/*.cov" \
--exclude="../$(app_name)/psalm.xml" \
../$(app_name) \

.PHONY: test
95 changes: 67 additions & 28 deletions README.md
@@ -6,36 +6,36 @@
[![Generic badge](https://img.shields.io/github/v/release/R0Wi/workflow_ocr)](https://github.com/R0Wi/workflow_ocr/releases)
[![Generic badge](https://img.shields.io/badge/Nextcloud-24-orange)](https://github.com/nextcloud/server)

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
## Table of contents

- [Setup](#setup)
- [App installation](#app-installation)
- [Nextcloud background jobs](#nextcloud-background-jobs)
- [Backend](#backend)
- [Usage](#usage)
- [Useful triggers](#useful-triggers)
- [Trigger OCR if file was created or updated](#trigger-ocr-if-file-was-created-or-updated)
- [Trigger OCR on tag assigning](#trigger-ocr-on-tag-assigning)
- [Settings](#settings)
- [Per workflow settings](#per-workflow-settings)
- [Global settings](#global-settings)
- [Testing your configuration](#testing-your-configuration)
- [How it works](#how-it-works)
- [General](#general)
- [PDF](#pdf)
- [Images](#images)
- [Development](#development)
- [Dev setup](#dev-setup)
- [Debugging](#debugging)
- [`docker`-based setup](#docker-based-setup)
- [Executing tests](#executing-tests)
- [Adding a new `OcrProcessor`](#adding-a-new-ocrprocessor)
- [Limitations](#limitations)
- [Used libraries & components](#used-libraries--components)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->
- [Nextcloud Workflow OCR app](#nextcloud-workflow-ocr-app)
- [Table of contents](#table-of-contents)
- [Setup](#setup)
- [App installation](#app-installation)
- [Nextcloud background jobs](#nextcloud-background-jobs)
- [Backend](#backend)
- [Usage](#usage)
- [Useful triggers](#useful-triggers)
- [Trigger OCR if file was created or updated](#trigger-ocr-if-file-was-created-or-updated)
- [Trigger OCR on tag assigning](#trigger-ocr-on-tag-assigning)
- [Settings](#settings)
- [Per workflow settings](#per-workflow-settings)
- [Global settings](#global-settings)
- [Testing your configuration](#testing-your-configuration)
- [How it works](#how-it-works)
- [General](#general)
- [PDF](#pdf)
- [Images](#images)
- [Development](#development)
- [Dev setup](#dev-setup)
- [Debugging](#debugging)
- [`docker`-based setup](#docker-based-setup)
- [Executing tests](#executing-tests)
- [Adding a new `OcrProcessor`](#adding-a-new-ocrprocessor)
- [Events emitted by the app](#events-emitted-by-the-app)
- [`TextRecognizedEvent`](#textrecognizedevent)
- [Limitations](#limitations)
- [Used libraries & components](#used-libraries--components)

## Setup
### App installation
@@ -334,6 +334,45 @@ public static function registerOcrProcessors(IRegistrationContext $context) : vo

That's all. If you now create a new workflow based on your added mimetype, your implementation should be triggered by the app. The return value of `ocrFile(string $fileContent, WorkflowSettings $settings, GlobalSettings $globalSettings)` will be interpreted as the file content of the scanned file. This one is used to create a new file version in Nextcloud.

### Events emitted by the app

The app currently emits the following events from `lib/Events`. You can use these hooks to extend the app's functionality inside your own app.
Use the following sample code to implement a listener for the events:

```php
use OCA\WorkflowOcr\Events\TextRecognizedEvent;
use OCP\EventDispatcher\Event;
use OCP\EventDispatcher\IEventListener;

class TextRecognizedListener implements IEventListener {
    public function handle(Event $event): void {
        if (!$event instanceof TextRecognizedEvent) {
            return;
        }
        // Do something with the event ...
    }
}
```

Your implementation should then be registered in your app's `Application.php`:

```php
public function register(IRegistrationContext $context): void {
    $context->registerEventListener(TextRecognizedEvent::class, TextRecognizedListener::class);
}
```

#### `TextRecognizedEvent`

This event will be emitted when an OCR process has finished successfully. It contains the following information:

| Method | Type | Description |
|--------|-------|------------|
| `getRecognizedText()` | `string` | Contains the text which was recognized by the OCR process. |
| `getFile()` | `OCP\Files\File` | The Nextcloud file node in which the OCR-processed file was stored. |

> **Note:** this event will be emitted even if the OCR content was empty.

## Limitations
* **Currently only pdf documents (`application/pdf`) can be used as input.** Other mimetypes are currently ignored but might be added in the future.
* Pdf metadata (like author, comments, ...) is not available in the converted output pdf document.
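The event contract documented in the README hunk can be exercised outside Nextcloud with a small sketch. `DemoEvent`, `DemoListener`, `DemoDispatcher` and the `TextRecognizedEvent` below are hypothetical stand-ins, not the real `OCP\EventDispatcher` or app classes, and the stand-in event carries a path string where the real one exposes an `OCP\Files\File` through `getFile()`. It illustrates two points from the docs: the `instanceof` guard lets one listener class be registered for several events safely, and a listener should expect empty recognized text.

```php
<?php
// Hypothetical stand-ins: NOT OCP\EventDispatcher\Event or the app's event classes.
class DemoEvent {
}

class TextRecognizedEvent extends DemoEvent {
    private string $recognizedText;
    // Assumption: a path string instead of the OCP\Files\File returned by getFile().
    private string $filePath;

    public function __construct(string $recognizedText, string $filePath) {
        $this->recognizedText = $recognizedText;
        $this->filePath = $filePath;
    }

    public function getRecognizedText(): string {
        return $this->recognizedText;
    }

    public function getFilePath(): string {
        return $this->filePath;
    }
}

interface DemoListener {
    public function handle(DemoEvent $event): void;
}

// Hypothetical dispatcher: calls the listeners registered for the event's exact class.
class DemoDispatcher {
    /** @var array<string, DemoListener[]> */
    private array $listeners = [];

    public function addListener(string $eventClass, DemoListener $listener): void {
        $this->listeners[$eventClass][] = $listener;
    }

    public function dispatchTyped(DemoEvent $event): void {
        foreach ($this->listeners[get_class($event)] ?? [] as $listener) {
            $listener->handle($event);
        }
    }
}

// Collects non-empty OCR results, e.g. to feed a search index.
class TextRecognizedListener implements DemoListener {
    /** @var array<string, string> file path => recognized text */
    public array $indexed = [];

    public function handle(DemoEvent $event): void {
        if (!$event instanceof TextRecognizedEvent) {
            return; // also registered for other events; ignore them
        }
        if (trim($event->getRecognizedText()) === '') {
            return; // the event is also emitted when OCR found no text
        }
        $this->indexed[$event->getFilePath()] = $event->getRecognizedText();
    }
}

$dispatcher = new DemoDispatcher();
$listener = new TextRecognizedListener();
$dispatcher->addListener(TextRecognizedEvent::class, $listener);
$dispatcher->addListener(DemoEvent::class, $listener);

$dispatcher->dispatchTyped(new TextRecognizedEvent('Invoice 2022', '/alice/files/scan.pdf'));
$dispatcher->dispatchTyped(new TextRecognizedEvent('', '/alice/files/blank.pdf'));
$dispatcher->dispatchTyped(new DemoEvent());
```

Only the first dispatch ends up in `$listener->indexed`: the empty result is skipped, and the plain `DemoEvent` is rejected by the type guard.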
5 changes: 3 additions & 2 deletions appinfo/routes.php
@@ -27,7 +27,8 @@

return [
'routes' => [
['name' => 'GlobalSettings#getGlobalSettings', 'url' => '/globalsettings', 'verb' => 'GET'],
['name' => 'GlobalSettings#setGlobalSettings', 'url' => '/globalsettings', 'verb' => 'PUT']
['name' => 'GlobalSettings#getGlobalSettings', 'url' => '/globalSettings', 'verb' => 'GET'],
['name' => 'GlobalSettings#setGlobalSettings', 'url' => '/globalSettings', 'verb' => 'PUT'],
['name' => 'OcrBackendInfo#getInstalledLanguages', 'url' => '/ocrBackendInfo/installedLangs', 'verb' => 'GET']
]
];
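The new route follows the Nextcloud `routes.php` convention in which a name like `'OcrBackendInfo#getInstalledLanguages'` points at the `getInstalledLanguages()` method of `OcrBackendInfoController`. The function below is a simplified, hypothetical illustration of that mapping, not Nextcloud's router. The `/index.php/apps/workflow_ocr` prefix assumes the app id `workflow_ocr` and a server without pretty URLs.

```php
<?php
// Simplified sketch of the routes.php naming convention (NOT Nextcloud's router code):
// 'Controller#method' -> <Controller>Controller::<method>, served below /index.php/apps/<appId>.
function resolveRoute(array $route, string $appId): array {
    [$controller, $action] = explode('#', $route['name'], 2);
    // Also accept snake_case parts ('global_settings#get') by converting them to CamelCase.
    $toCamel = static function (string $s): string {
        return str_replace(' ', '', ucwords(str_replace('_', ' ', $s)));
    };
    return [
        'controller' => $toCamel($controller) . 'Controller',
        'method' => lcfirst($toCamel($action)),
        'verb' => $route['verb'],
        'path' => '/index.php/apps/' . $appId . $route['url'],
    ];
}

$routes = [
    ['name' => 'GlobalSettings#getGlobalSettings', 'url' => '/globalSettings', 'verb' => 'GET'],
    ['name' => 'OcrBackendInfo#getInstalledLanguages', 'url' => '/ocrBackendInfo/installedLangs', 'verb' => 'GET'],
];

foreach ($routes as $route) {
    $r = resolveRoute($route, 'workflow_ocr');
    echo "{$r['verb']} {$r['path']} -> {$r['controller']}::{$r['method']}\n";
}
```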
14 changes: 14 additions & 0 deletions lib/AppInfo/Application.php
@@ -28,13 +28,19 @@
namespace OCA\WorkflowOcr\AppInfo;

use OCA\WorkflowOcr\Helper\IProcessingFileAccessor;
use OCA\WorkflowOcr\Helper\ISidecarFileAccessor;
use OCA\WorkflowOcr\Helper\ProcessingFileAccessor;
use OCA\WorkflowOcr\Helper\SidecarFileAccessor;
use OCA\WorkflowOcr\Listener\RegisterFlowOperationsListener;
use OCA\WorkflowOcr\OcrProcessors\IOcrProcessorFactory;
use OCA\WorkflowOcr\OcrProcessors\OcrProcessorFactory;
use OCA\WorkflowOcr\Service\IEventService;
use OCA\WorkflowOcr\Service\EventService;
use OCA\WorkflowOcr\Service\GlobalSettingsService;
use OCA\WorkflowOcr\Service\IGlobalSettingsService;
use OCA\WorkflowOcr\Service\IOcrBackendInfoService;
use OCA\WorkflowOcr\Service\IOcrService;
use OCA\WorkflowOcr\Service\OcrBackendInfoService;
use OCA\WorkflowOcr\Service\OcrService;
use OCA\WorkflowOcr\Wrapper\CommandWrapper;
use OCA\WorkflowOcr\Wrapper\Filesystem;
@@ -46,7 +52,10 @@
use OCP\AppFramework\Bootstrap\IBootContext;
use OCP\AppFramework\Bootstrap\IBootstrap;
use OCP\AppFramework\Bootstrap\IRegistrationContext;
use OCP\ITempManager;
use OCP\WorkflowEngine\Events\RegisterOperationsEvent;
use Psr\Container\ContainerInterface;
use Psr\Log\LoggerInterface;

class Application extends App implements IBootstrap {
public const COMPOSER_DIR = __DIR__ . '/../../vendor/';
@@ -68,11 +77,16 @@ public function register(IRegistrationContext $context): void {
$context->registerServiceAlias(IViewFactory::class, ViewFactory::class);
$context->registerServiceAlias(IFilesystem::class, Filesystem::class);
$context->registerServiceAlias(IGlobalSettingsService::class, GlobalSettingsService::class);
$context->registerServiceAlias(IEventService::class, EventService::class);
$context->registerServiceAlias(IOcrBackendInfoService::class, OcrBackendInfoService::class);

// BUG #43
$context->registerService(ICommand::class, function () {
return new CommandWrapper();
}, false);
$context->registerService(ISidecarFileAccessor::class, function (ContainerInterface $c) {
return new SidecarFileAccessor($c->get(ITempManager::class), $c->get(LoggerInterface::class));
}, false);

$context->registerService(IProcessingFileAccessor::class, function () {
return ProcessingFileAccessor::getInstance();
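The `register()` hunk mixes two registration styles: `registerServiceAlias()` binds an interface such as `IEventService` to a concrete class, while `registerService(..., false)` supplies a factory whose third argument (`$shared`) is `false`, so each resolution builds a new instance. `MiniContainer` below is a hypothetical, heavily simplified container that only illustrates those semantics. Unlike Nextcloud's container it does not autowire constructors, so the alias target also needs its own factory.

```php
<?php
// Hypothetical, simplified container; NOT Nextcloud's DI container.
class MiniContainer {
    /** @var array<string, array{factory: callable, shared: bool}> */
    private array $factories = [];
    /** @var array<string, string> interface => concrete class */
    private array $aliases = [];
    /** @var array<string, object> cached instances of shared services */
    private array $instances = [];

    public function registerService(string $name, callable $factory, bool $shared = true): void {
        $this->factories[$name] = ['factory' => $factory, 'shared' => $shared];
    }

    public function registerServiceAlias(string $alias, string $target): void {
        $this->aliases[$alias] = $target;
    }

    public function get(string $name): object {
        $name = $this->aliases[$name] ?? $name;
        if (isset($this->instances[$name])) {
            return $this->instances[$name]; // shared service: reuse the cached instance
        }
        if (!isset($this->factories[$name])) {
            throw new RuntimeException("No service registered for $name");
        }
        $entry = $this->factories[$name];
        $instance = ($entry['factory'])($this);
        if ($entry['shared']) {
            $this->instances[$name] = $instance;
        }
        return $instance;
    }
}

// Hypothetical stand-ins for the app's interfaces and classes.
interface ISidecarFileAccessor {
}
class SidecarFileAccessor implements ISidecarFileAccessor {
}
interface IEventService {
}
class EventService implements IEventService {
}

$c = new MiniContainer();
// Not shared: every get() builds a fresh accessor.
$c->registerService(ISidecarFileAccessor::class, fn () => new SidecarFileAccessor(), false);
// Shared (default) concrete service, reached through an interface alias.
$c->registerService(EventService::class, fn () => new EventService());
$c->registerServiceAlias(IEventService::class, EventService::class);
```

The commit does not say why `SidecarFileAccessor` is registered as unshared; a plausible reason is that it holds per-run state such as a temporary sidecar file, which must not leak between OCR jobs.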
8 changes: 8 additions & 0 deletions lib/BackgroundJobs/ProcessFileJob.php
@@ -35,6 +35,7 @@
use OCA\WorkflowOcr\Helper\IProcessingFileAccessor;
use OCA\WorkflowOcr\Model\WorkflowSettings;
use OCA\WorkflowOcr\Service\IOcrService;
use OCA\WorkflowOcr\Service\IEventService;
use OCA\WorkflowOcr\Wrapper\IFilesystem;
use OCA\WorkflowOcr\Wrapper\IViewFactory;
use OCP\AppFramework\Utility\ITimeFactory;
@@ -57,6 +58,8 @@ class ProcessFileJob extends \OCP\BackgroundJob\QueuedJob {
private $rootFolder;
/** @var IOcrService */
private $ocrService;
/** @var IEventService */
private $eventService;
/** @var IViewFactory */
private $viewFactory;
/** @var IFilesystem */
@@ -72,6 +75,7 @@ public function __construct(
LoggerInterface $logger,
IRootFolder $rootFolder,
IOcrService $ocrService,
IEventService $eventService,
IViewFactory $viewFactory,
IFilesystem $filesystem,
IUserManager $userManager,
@@ -82,6 +86,7 @@ public function __construct(
$this->logger = $logger;
$this->rootFolder = $rootFolder;
$this->ocrService = $ocrService;
$this->eventService = $eventService;
$this->viewFactory = $viewFactory;
$this->filesystem = $filesystem;
$this->userManager = $userManager;
@@ -179,15 +184,18 @@ private function processFile(string $filePath, WorkflowSettings $settings) : voi
return;
}


$fileContent = $ocrFile->getFileContent();
$nodeId = $node->getId();
$originalFileExtension = $node->getExtension();
$newFileExtension = $ocrFile->getFileExtension();

if ($originalFileExtension === $newFileExtension) {
$this->createNewFileVersion($filePath, $fileContent, $nodeId);
$this->eventService->textRecognized($ocrFile, $node);
} else {
$this->createNewFileVersion($filePath.".pdf", $fileContent, $nodeId);
$this->eventService->textRecognized($ocrFile, $node);
}
}

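Both branches of the extension check in `processFile()` now call `textRecognized()`; they differ only in the path passed to `createNewFileVersion()`. The hypothetical helper `ocrTargetPath()` below isolates that path rule. Note that the `else` branch appends a literal `.pdf` rather than `$newFileExtension`, and the sketch mirrors that.

```php
<?php
// Hypothetical helper (not part of the app) mirroring the branch in ProcessFileJob::processFile().
function ocrTargetPath(string $filePath, string $originalExtension, string $newExtension): string {
    // Same extension: the OCR result becomes a new version of the original file.
    // Different extension: a sibling file with a literal ".pdf" suffix is written.
    return $originalExtension === $newExtension ? $filePath : $filePath . '.pdf';
}

// With such a helper the duplicated event call could be hoisted out of the branch:
//   $this->createNewFileVersion(ocrTargetPath($filePath, $originalFileExtension, $newFileExtension), $fileContent, $nodeId);
//   $this->eventService->textRecognized($ocrFile, $node);
```

In the `else` branch the event still receives `$node`, the original file, while the content is written to `$filePath.".pdf"`; whether `EventService` resolves the new node is not visible in this diff.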
41 changes: 41 additions & 0 deletions lib/Controller/ControllerBase.php
@@ -0,0 +1,41 @@
<?php

declare(strict_types=1);

/**
* @copyright Copyright (c) 2022 Robin Windey <ro.windey@gmail.com>
*
* @author Robin Windey <ro.windey@gmail.com>
*
* @license GNU AGPL version 3 or any later version
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as
* published by the Free Software Foundation, either version 3 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Affero General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*
*/

namespace OCA\WorkflowOcr\Controller;

use OCP\AppFramework\Controller;
use OCP\AppFramework\Http\JSONResponse;

abstract class ControllerBase extends Controller {
protected function tryExecute(callable $function) : JSONResponse {
try {
$result = $function();
return new JSONResponse($result);
} catch (\Throwable $e) {
return new JSONResponse(['error' => $e->getMessage()], 500);
}
}
}
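`tryExecute()` moves from `GlobalSettingsController` into the new abstract `ControllerBase` so that `OcrBackendInfoController` can reuse it. The framework-free sketch below mirrors its behavior, with a plain array standing in for `JSONResponse` (whose default status code is 200). It is an illustration, not the app's code: any `\Throwable`, not only `Exception`, becomes an HTTP 500 carrying the exception message.

```php
<?php
// Framework-free sketch of ControllerBase::tryExecute(); the array stands in for JSONResponse.
function tryExecute(callable $function): array {
    try {
        // Success: the callable's return value becomes the JSON body with status 200.
        return ['status' => 200, 'body' => $function()];
    } catch (\Throwable $e) {
        // Exceptions and PHP Errors alike become a 500 with the raw message.
        return ['status' => 500, 'body' => ['error' => $e->getMessage()]];
    }
}

$ok = tryExecute(fn () => ['deu', 'eng']);
$failed = tryExecute(function () {
    throw new RuntimeException('tesseract not found');
});
$error = tryExecute(fn () => intdiv(1, 0)); // DivisionByZeroError is an Error, not an Exception
```

Catching `\Throwable` turns engine errors into structured responses as well, at the cost of returning raw exception messages to the client.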
12 changes: 1 addition & 11 deletions lib/Controller/GlobalSettingsController.php
@@ -28,14 +28,13 @@

use OCA\WorkflowOcr\Model\GlobalSettings;
use OCA\WorkflowOcr\Service\IGlobalSettingsService;
use OCP\AppFramework\Controller;
use OCP\AppFramework\Http\JSONResponse;
use OCP\IRequest;

/**
* This is the backend API controller for the Admin.vue component.
*/
class GlobalSettingsController extends Controller {
class GlobalSettingsController extends ControllerBase {
/** @var IGlobalSettingsService */
private $globalSettingsService;

@@ -66,13 +65,4 @@ public function setGlobalSettings(array $globalSettings) : JSONResponse {
return $this->globalSettingsService->getGlobalSettings();
});
}

private function tryExecute(callable $function) : JSONResponse {
try {
$result = $function();
return new JSONResponse($result);
} catch (\Throwable $e) {
return new JSONResponse(['error' => $e->getMessage()], 500);
}
}
}
53 changes: 53 additions & 0 deletions lib/Controller/OcrBackendInfoController.php
@@ -0,0 +1,53 @@
<?php

declare(strict_types=1);

/**
* @copyright Copyright (c) 2022 Robin Windey <ro.windey@gmail.com>
*
* @author Robin Windey <ro.windey@gmail.com>
*
* @license GNU AGPL version 3 or any later version
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as
* published by the Free Software Foundation, either version 3 of the
* License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Affero General Public License for more details.
*
* You should have received a copy of the GNU Affero General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*
*/

namespace OCA\WorkflowOcr\Controller;

use OCA\WorkflowOcr\Service\IOcrBackendInfoService;
use OCP\AppFramework\Http\JSONResponse;
use OCP\IRequest;

/**
* This is the backend API controller which provides information about the OCR backend system.
*/
class OcrBackendInfoController extends ControllerBase {
/** @var IOcrBackendInfoService */
private $ocrBackendInfoService;

public function __construct($AppName, IRequest $request, IOcrBackendInfoService $ocrBackendInfoService) {
parent::__construct($AppName, $request);
$this->ocrBackendInfoService = $ocrBackendInfoService;
}

/**
* @return JSONResponse
*/
public function getInstalledLanguages() : JSONResponse {
return $this->tryExecute(function () {
return $this->ocrBackendInfoService->getInstalledLanguages();
});
}
}
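The controller delegates to `IOcrBackendInfoService::getInstalledLanguages()`, whose implementation is not part of this excerpt. The commit message says it gets the installed Tesseract languages from the backend; one way to do that is to parse the output of `tesseract --list-langs`, as sketched below. `parseTesseractLanguages()`, `installedLanguages()` and the `CommandException` stand-in are hypothetical and not the app's code, and the exact header line printed by Tesseract varies between versions.

```php
<?php
// Hypothetical stand-in for the app's CommandException (its real signature is not shown here).
class CommandException extends RuntimeException {
}

// Extracts language codes from `tesseract --list-langs` stdout.
function parseTesseractLanguages(string $stdout): array {
    $languages = [];
    foreach (preg_split('/\R/', trim($stdout)) as $line) {
        $line = trim($line);
        // Skip blank lines and the header, e.g. 'List of available languages in "..." (3):'.
        if ($line === '' || strpos($line, 'List of available languages') === 0) {
            continue;
        }
        $languages[] = $line;
    }
    return $languages;
}

// Fails loudly on a non-zero exit code instead of returning a partial list.
function installedLanguages(string $stdout, int $exitCode): array {
    if ($exitCode !== 0) {
        throw new CommandException("tesseract --list-langs exited with code $exitCode");
    }
    return parseTesseractLanguages($stdout);
}

$sample = "List of available languages in \"/usr/share/tesseract-ocr/4.00/tessdata/\" (3):\n"
    . "deu\neng\nosd\n";
print_r(installedLanguages($sample, 0));
```

`osd` is Tesseract's orientation and script detection model rather than a language; whether to filter it out before showing the list in the UI is a design choice the sketch leaves open.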