[{"id":0,"href":"/docs/guide-development/","title":"Guide Development","section":"Docs","content":"MMODA workflow development Guide # The MMODA platform provides access to the Astronomical Open Research Data Analysis Services (ORDAS). Good fraction of these services follow a simple scheme, they:\n access publicly available external astronomical data archives to fetch data relevant to specific source or source catalog, transform the fetched data using a workflow based on a Python notebook to derive a data product. display a preview of the data product on the MMODA frontend and/or return the data product to the user via Python API The users of MMODA are encouraged to become its developers and add new ORDAS. This help page provides a step-by step instructions on how to add new services to MMODA .\nWorkflows are all things that can be computed, broadly speaking. For reproducibility, we want our workflows to be repeatable: producing the same output every time they are computed. This looks easy enough in first approximation, but might be harder to achieve than it seems when the workflow relies on external data and compute resources and has to yield an \u0026ldquo;up-and-running\u0026rdquo; ORDAS in an ever-evolving compute environment. This help page is also aimed at helping the developers in ensuring reproducibility and reusability of their workflows converted to ORDAS.\nBuild a repeatable parameterized workflow # Suppose you have a jupyter notebook that derives some useful information (a lightcurve in the GeV gamma-ray band) on an astronomical source from data found in an astronomical data archive (in our example, it will be Fermi LAT data archive).\nThe first essential step in promoting the notebook to an ORDAS is to make the workflow of the notebook reusable by parameterizing it. For example, it is initeresting to enable generation of similar data products for different sources, by simply giving the source name or coordinates as input parameters to the workflow. It is also useful to explicitly tag the resulting data product (the lightcurve in our example) as the output, to make clear which of the numerous entities generated by the notebook is the final result. It is also possible to convert non-parametrized but strickly repeatable notebooks to services (for example, to assure reproducibility of a result published in a research publication), but this is less interesting since they always produce the same output data products.\nHow to designate input parameters and output cells of the notebook # In MMODA we use the approach of papermill) to tag the notebook cells that contain the input parameters and the outputs. In your notebook, you may create two dedicated cells with the \u0026ldquo;parameters\u0026rdquo; and \u0026ldquo;outputs\u0026rdquo; tags. In the Jupyter Lab environment this can be done by clicking on the cogwheel sign on the right top, red arrow in the image below, and adding new tag as pointed by the second red arrow: How to define input parameters in the dedicated parameters cell # The variables defined in the dedicated \u0026ldquo;parameters\u0026rdquo; cell, will be the input parameters of the workflow. They will be visualized in the frontend of the service and it will be possible to provide these parameters via the service API. For example, in the image of the parameters cell in our example (see above)\n the names of the declared variables will be used as parameter names in the MMODA service (except the default parameters, see below). 
## Default parameters

Several default common parameters are always set by the MMODA frontend. These include:

| Type annotation | Parameter default name |
| --- | --- |
| http://odahub.io/ontology#PointOfInterestRA | RA |
| http://odahub.io/ontology#PointOfInterestDEC | DEC |
| http://odahub.io/ontology#StartTime | T1 |
| http://odahub.io/ontology#EndTime | T2 |
| http://odahub.io/ontology#AstrophysicalObject | src_name |

The default parameters are common to all workflows in the MMODA ecosystem. They appear at the top of the MMODA frontend as shown below.

If the notebook contains parameters annotated with these types, their names will automatically be treated as the parameters appearing in the common parameter field of all services. If some of them are omitted, they will be added to the list of workflow parameters automatically.

Note that both the source name and the source coordinates are passed to the workflow, and in principle there is no guarantee that the coordinates are those of the source. We leave it up to the workflow developer to reconcile these parameters. Please explain the logic in the associated help page of the service.
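As an illustration, a parameters cell that adopts the common parameters above could look like the following sketch (the default values and the time format are illustrative assumptions):

```python
# cell tagged "parameters"
src_name = "Crab"             # http://odahub.io/ontology#AstrophysicalObject
RA = 83.63                    # http://odahub.io/ontology#PointOfInterestRA
DEC = 22.01                   # http://odahub.io/ontology#PointOfInterestDEC
T1 = "2021-01-01T00:00:00"    # http://odahub.io/ontology#StartTime
T2 = "2021-01-02T00:00:00"    # http://odahub.io/ontology#EndTime
```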
## Adding annotations to the entire notebook

Annotations can apply to individual parameters or to the entire notebook. In both cases they are kept in the notebook cell tagged "parameters". For example:

```
# oda:version "v0.1.1"
# oda:reference "https://doi.org/10.1051/0004-6361/202037850"

source_name = "Crab"   # oda:AstrophysicalObject
reference_energy = 20  # oda:keV
```

## Adding external resource annotations

In case your notebook explicitly calls some external resources, such as S3 storage or a compute cluster, this should be reflected in the annotations in the notebook cell tagged "parameters". The resource annotations currently supported are:

- oda:S3 - S3 storage
- oda:Dask - Dask compute cluster

All kinds of resources may have the resourceBindingEnvVarName property. If the resource is available, the corresponding environment variable stores a JSON document with the credentials needed to access the resource.

For example, in the code below we declare the S3 storage:

```
# oda:usesRequiredResource oda:MyS3 .
# oda: MyS3 a oda:S3 .
# oda: MyS3 oda:resourceBindingEnvVarName "MY_S3_CREDENTIALS" .
```

In the code below we initialize the S3 storage session using the credentials provided by means of the environment variable:

```python
import json
import os

from minio import Minio

credentials_env = os.environ.get('MY_S3_CREDENTIALS')
if credentials_env:
    credentials = json.loads(credentials_env)
    client = Minio(
        endpoint=credentials["endpoint"],
        access_key=credentials["access_key"],
        secret_key=credentials["secret_key"],
    )
```

In the example below we declare the Dask cluster resource requirements in the parameters cell:

```
# oda:usesRequiredResource oda:MyDaskCluster .
# oda: MyDaskCluster a oda:Dask .
# oda: MyDaskCluster oda:memory_per_process "2G" .
# oda: MyDaskCluster oda:n_processes "16" .
# oda: MyDaskCluster oda:resourceBindingEnvVarName "MY_DASK_CREDENTIALS" .
```

Here memory_per_process and n_processes define the minimal requirements on the resource.

In the code below we open the Dask cluster session (note that os must be imported to read the environment variable):

```python
import json
import os

from dask.distributed import Client

credentials_env = os.environ.get('MY_DASK_CREDENTIALS')
if credentials_env:
    credentials = json.loads(credentials_env)
    client = Client(address=credentials["address"])
```

## Adding token annotations

In case your notebook uses a token to access some resources, this should be reflected in the annotations in the notebook cell tagged "parameters" in the following way:

```
# oda:oda_token_access oda:InOdaContext .
```

The above expression enables the standard mechanism to supply the token using the oda context variable. The token can then be accessed from the code in the following way:

```python
from oda_api.api import get_context

token = get_context()['token']
```

However, we recommend using the higher-level oda_api.token API instead, which also provides token validation and token discovery methods as options. The code above is equivalent to the following higher-level code:

```python
from oda_api.token import discover_token, TokenLocation

token = discover_token(allow_invalid=True, token_discovery_methods=[TokenLocation.CONTEXT_FILE])
```

Below is the entire list of supported token locations:

```python
class TokenLocation(Enum):
    ODA_ENV_VAR = "environment variable ODA_TOKEN"
    FILE_CUR_DIR = ".oda-token file in current directory"
    FILE_HOME = ".oda-token file in home"
    CONTEXT_FILE = "context file current directory"
```

By default, token validation is enabled, and attempts are made to load the token from all the supported locations, in the order in which they appear in the TokenLocation class.

## How to annotate the notebook outputs

A cell tagged "outputs" defines the data product(s) that will be provided by the service.

The outputs may be strings, floats, lists, numpy arrays, astropy tables, etc. They may also be strings that contain filenames of valid files; if they do, the whole file will be considered as the output. Similar to the "parameters" cell, the "outputs" cell should contain the definitions of the output variables, followed by an assignment that gives them values and a comment that defines their type. For example, the variable lightcurve_astropy_table in the example shown above takes the value lightcurve, which is an astropy table; the comment field # http://odahub.io/ontology#ODAAstropyTable specifies this in terms of the MMODA ontology. If you want to give a more detailed description of the notebook inputs and outputs, use terms from the pre-defined ontology described here.

There is also one special type of output annotation, # http://odahub.io/ontology#WorkflowResultComment. An output variable of string type annotated with it will be returned as a comment, shown in a yellow field upon completion of the job.
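For instance, an outputs cell could look like the following sketch (the variable names are illustrative, lightcurve is assumed to have been computed earlier in the notebook, and the picture-product annotation should be checked against the ontology):

```python
# cell tagged "outputs"
lightcurve_astropy_table = lightcurve  # http://odahub.io/ontology#ODAAstropyTable
lightcurve_png = "lightcurve.png"      # http://odahub.io/ontology#ODAPictureProduct
result_comment = "No flare detected in the requested time interval"  # http://odahub.io/ontology#WorkflowResultComment
```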
## Querying external astronomical data archives from a notebook

It is very likely that your analysis workflow needs to access astronomical data retrievable from an external archive. A good practice is to avoid placing large volumes of data directly into the container where the analysis notebook runs (this would overload the renkulab.io platform which we use for new service deployment). A better approach is to query the relevant data using online services, for example Table Access Protocol (TAP) services, or services available through the astroquery Python module. In our example case of the Fermi/LAT analysis, we use the astroquery.fermi module to query the archive of publicly available Fermi/LAT data from the Fermi Science Support Center in Cell 4 of the notebook Lightcurve.ipynb.

Of course, relying on external data archives and data access services may pose a problem for the reproducibility of results and potentially for the operational stability of the service. The external services may have downtime periods, they may be upgraded and change their API, etc. We leave it to the developer to make sure the requests to external services are operational and reproducible. We encourage developers to supply tests (see the guide) that will be automatically executed from time to time during service operations and are supposed to always yield exactly the same results. If this is not so, it may signal that some external services are unavailable; in this case you, as the service developer, will be alerted and invited to investigate the origin of the problem. It may also happen that the notebook does not produce exactly the same result every time but is still reproducible (see the motivation on the difference between reproducibility and repeatability).
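As a simple illustration of this pattern (not the actual query used in Lightcurve.ipynb), a notebook can resolve the requested source through an online service at run time instead of shipping archive data with the repository:

```python
from astroquery.simbad import Simbad

src_name = "Mrk 421"  # normally taken from the parameters cell

# query an online service at run time instead of storing archive data in the repository
result_table = Simbad.query_object(src_name)
print(result_table)
```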
## Handling exceptions

It can happen that your analysis workflow is expected to produce no data products in some cases, for example if there is no data for the specified source and time interval, if the parameters specified by the user have the wrong format, or in other "exceptions". In such cases it is good to inform the user what happened. This can be done by raising a RuntimeError() directly in the notebook, as shown below.

Don't worry if you do not manage to foresee all possible exceptions at the initial development stage. If unforeseen exceptions occur when the service is already deployed and available to users, you will be notified each time one occurs and invited to patch your notebook to handle it (perhaps by raising a new RuntimeError() in the appropriate cell).
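A minimal sketch of such a check (the events_table variable is a placeholder standing in for whatever the archive query returned earlier in the notebook):

```python
# placeholder values standing in for the archive query result and the input parameters
events_table = None
src_name, T1, T2 = "Mrk 421", "2022-01-01", "2022-02-01"

if events_table is None or len(events_table) == 0:
    raise RuntimeError(
        f"No Fermi/LAT data found for {src_name} between {T1} and {T2}; "
        "please try another source or a longer time interval."
    )
```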
## How to add a test to the notebook

It is good practice to test the developed notebook. This helps make sure that the code remains valid in the future. A test is implemented as another notebook, except that its name starts with "test_". The test notebook should call the other notebooks and check that the output matches expectations. See an example of such a test here.
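A minimal sketch of what such a test notebook can do, here driving the workflow notebook with papermill (the parameter names and the error check are assumptions, not the actual test from the example repository):

```python
import papermill as pm
import nbformat

# execute the workflow notebook with a fixed set of parameters;
# papermill raises an exception if any cell fails
pm.execute_notebook(
    "Lightcurve.ipynb",
    "Lightcurve_test_output.ipynb",
    parameters=dict(src_name="Mrk 421", T1="2022-01-01", T2="2022-02-01"),
)

# additionally inspect the executed notebook for error outputs
nb = nbformat.read("Lightcurve_test_output.ipynb", as_version=4)
for cell in nb.cells:
    for output in cell.get("outputs", []):
        assert output.get("output_type") != "error", "notebook execution produced an error"
```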
## Reporting progress for long-running tasks

In case your computation task runs for a considerable amount of time and can be split into stages, consider reporting task progress using the ODA API class ProgressReporter:

```python
from oda_api.api import ProgressReporter

pr = ProgressReporter()
pr.report_progress(stage='simulation', progress=0, substage='step 1')
# implement step 1
pr.report_progress(stage='simulation', progress=50, substage='step 2')
# implement step 2
```

## Make the notebook available for deployment on MMODA via renkulab.io

The parameterized workflow formulated as a Python notebook can be converted into a service provided by MMODA by a bot that scans a specific location, astronomy/mmoda, in the project directory on the renkulab.io collaborative platform. Creating a new project in this directory will make it visible to the bot. In our example of the Fermi/LAT lightcurve workflow, it is in the fermi subdirectory of astronomy/mmoda.

To proceed along this way, you first need to make sure your notebook runs correctly in the renkulab.io environment. You can start a new project in astronomy/mmoda by clicking on the "Create new project" button.

You will need to choose the name of the new project. This name will be the name of your service appearing on the MMODA frontend and discoverable via the MMODA API. Place your project in the astronomy/mmoda namespace by specifying this in the "Namespace" field as shown below.

Choose the "Python 3" template for the project and then click the "Create project" button.

To start working on the newly created project, you can launch an interactive Jupyter Lab session by clicking on the "Start" button.

Once in the Jupyter Lab environment, you can update the project by uploading the notebook that you intend to promote to a service.

Your notebook most probably imports some Python packages that are not installed by default in a generic Python environment on renkulab.io. To add the necessary packages, you need to update the requirements.txt and possibly environment.yml files with the packages you need. In the example of the Fermi/LAT analysis we are considering, the packages astropy, matplotlib and astroquery will be needed. They can be added to the requirements.txt file as shown above.

Once you are done with uploading your notebook and adding the missing Python packages to the requirements.txt file, you can commit the changes to your project by going to the GitLab tab in the Jupyter Lab interface. You will see files that have been added or modified appearing as such in the dedicated sub-panel, as shown below. Promote these files to the "Staged" state by clicking on the "+" signs next to them, and commit the changes to your project by clicking on the "Commit" button just below the file list. Next, push the committed changes to GitLab by pressing the "push" button (see the screenshot above).

Now the GitLab CI/CD will build a new container in which all the necessary Python packages and the notebook to be promoted to a service are available. This may take a few minutes. You can test this new container in operation by starting a new interactive session on renkulab.io, using the container produced from your latest commit.

If the notebook runs as expected and produces correct outputs, you may proceed to the next stage and deploy a new service.

## Publish your workflow as an MMODA service

If your project is in the /astronomy/mmoda/ namespace, it is straightforward to convert it into an MMODA service. All you need to do is add live-workflow as a topic in the project GitLab, which you can access by clicking on the "GitLab" button on the main project page on renkulab.io.

On the GitLab webpage, go to the "Settings" section; the "Topics" field is in the "General" settings.

Note that you may add multiple topics in this field. In the example of Fermi/LAT shown above, there is an additional topic "MM Gamma-rays" that helps MMODA users classify workflows by messenger and waveband type. Any topic which starts with "MM " (note the space) will be shown as a messenger label in MMODA, excluding the "MM " prefix. The additional topic will appear in the name of the tab of your workflow on the MMODA frontend. The topics associated with your project are visible right below the project name on the GitLab pages.

Once the project is associated with the live-workflow topic, it becomes visible to a bot that periodically scans the GitLab repositories of the projects in the astronomy/mmoda domain, looking for new or modified "live" workflows that propose themselves as online services for MMODA. The bot will try to convert your notebook into a service and, if this works, it will automatically add the new service to MMODA (by default, to its staging instance). You can monitor the progress of the bot's work by visiting the "Continuous Integration / Continuous Development" (CI/CD) section of the GitLab page of your project. It will show that a pipeline is in progress, both for the build of the updated renkulab project container image and on the "External" MMODA side.

Once the deployment is finished, you will receive an email similar to this:

You may now connect to the MMODA frontend to test the functionality of your service, check the correctness of the appearance and format of the input parameters that you have specified in the parameters cell of the notebook, check the formatting and correctness of the data products that are produced upon pressing the "Submit" button on the frontend, etc.

Note that some of the input parameters in the example of the Fermi/LAT Lightcurve.ipynb notebook appear as multiple-choice parameters with pre-defined values, while others are query fields. For some of the parameters, units are specified just below the query window. The names of the parameters are the names of the variables defined in the parameters cell of the notebook (see the screenshot of the parameters cell in the section above of this help page). Have a look in the example at how this is regulated with parameter annotations.

If the outputs cell of your notebook contains multiple data products, they will be shown as a list on the MMODA frontend, as shown above. The names of the list items correspond to the names of the variables defined in the outputs cell. Each item of the list can be previewed or downloaded by clicking on the "View" button. The preview will depend on the type of the data product that has been specified after the comment hash # in the outputs cell.

You can explore different examples of notebooks converted to services in the astronomy/mmoda domain on renkulab.io, to see how to format the inputs and outputs. If unsure, first take a look at this simple repository. You can also experiment with further possibilities by exploring the ontology of the MMODA parameters and data products.

By default, all notebooks residing in the root of the repository (except the ones named test_*.ipynb) will be converted to separate data products. If the notebooks are in a subdirectory, one needs to add the configuration file mmoda.yaml with notebook_path: "subfolder/path". It is also possible to include only some notebooks by putting into mmoda.yaml e.g. filename_pattern: "prefix_.*" to define the notebook name regex pattern.
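For example, an mmoda.yaml combining both options could look like this (the path and the pattern are illustrative):

```yaml
# mmoda.yaml in the repository root
notebook_path: "subfolder/path"   # notebooks live in this subdirectory
filename_pattern: "prefix_.*"     # only convert notebooks whose names match this regex
```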
To add a help page for the workflow, one has to create a file mmoda_help_page.md in the root of the repository. This file will be converted by the bot into the help page associated with the corresponding instrument tab in the MMODA platform interface.

The file aknowledgemets.md is used to edit the acknowledgements text, which is shown at the bottom of the products window. The default text refers to the renku repository in which the workflow was created.

## Support the workflow development via renku plugins

To support the development of workflows in Renku, a set of dedicated functionalities, provided as Renku plugins, is made available. Specifically, these plugins aim to:

- offer a visualization of the project's Knowledge Graph (renku-graph-vis plugin);
- intercept calls to astroquery functions and store them in the project's Knowledge Graph (renku-aqs-annotation plugin).

See below how to install these plugins, which are not available by default.

### Visualizing the project Knowledge Graph with the renku-graph-vis plugin

This plugin provides a graphical representation of the renku repository's knowledge graph, possibly from within the renku session.

The plugin provides two CLI commands:

- display, to generate a representation of the graph as an output image;
- show-graph, to start an interactive visualization of the graph in the browser.

Furthermore, the plugin provides an interactive graph visualization feature for real-time monitoring during a renku session. To initiate or access the live graph visualization during your session, simply click on the Graph icon located on the main page, as shown in the image below.

The primary benefit introduced is the ability to have a live overview of the ongoing development within an interactive Renku session. This can be seen in the animation below, where the graph is automatically updated with information about the execution of a notebook upon its completion.

This visualization also includes the ODA ontology, providing valuable insights into the types of entities that are known to the ontology, and therefore helping during workflow development. The image below displays a graph where the ODA ontology has been imported; it can be seen that the SimbadClass node is an instance of the AstroqueryModule class, while Mrk 421 is an instance of the AstrophysicalObject class.

More technical details are presented in the README of the repository page: https://github.com/oda-hub/renku-graph-vis/

#### Installation

The plugin can be installed via pip in your own environment:

```
pip install renku_graph_vis
```

Alternatively, it can be made available within a Renku session by adding it to the list of requirements of the Renku project, within your requirements.txt file.

### Tracking access to astronomical archives and services in the project Knowledge Graph with the renku-aqs-annotation plugin

This plugin intercepts several key astroquery methods and stores annotations containing information about the calls to these methods (like the arguments used in the call) in the project's Knowledge Graph: https://github.com/oda-hub/renku-aqs-annotation

In the image below, the information added to the project Knowledge Graph is highlighted. Specifically, it can be seen that during a papermill run of the test-notebook.ipynb notebook (which produced out.ipynb as an output notebook) a call to the astroquery method query_object, via the SimbadClass, has been detected. This notebook is requesting the object Mrk 421. The highlighted labels on the edges provide information about the relationship between the two nodes: during the papermill execution, a call to the query_object method is executed (call label) and, in turn, this requests the AstrophysicalObject Mrk 421.

#### Installation

The plugin can be installed via pip in your own environment:

```
pip install renku_aqs_annotation
```

Just like the renku_graph_vis plugin, the renku_aqs_annotation plugin can be made available within a Renku session by adding it to the list of requirements in the requirements.txt file of the Renku project.

## Inform the MMODA team and suggest automatic test cases to ensure service stability

Please contact the MMODA team (see the contact form) to inform us that you have created a new Open Research Data Analysis Service (ORDAS). We will check the functionality and stability of your service, and we can inform the user community about the availability of this new service. We will also ask you to suggest automatic tests of the service operations that will be performed from time to time, to make sure your service does not break with future updates or because of unavailability of the external services providing the input data for your analysis workflow. We can also add acknowledgements to the data providers and to you as the workflow developer.
# Workflow Publishing and Discovery with KG: Astronomer Guide

Latest version: https://github.com/oda-hub/workflow-discovery/, also deployed as https://odahub.io/

## Purpose of this note

We want to demonstrate, on concrete and scientifically useful working examples, how an astronomer, who might indeed have relatively little interest in looking at the code, can leverage the ODA Knowledge Base and Knowledge Graphs together with other valuable resources (especially Renku) to:

- collaborate on workflows
- discover and use ODA-built services
- discover and use our record of globally available web-based data analysis services
- easily contribute their own analysis as web services
- annotate their work in ways ready for consumption by synthetic-astronomer robots, which perform some of the reasoning that is reasonable for scientists and, most of all, support scientists with a discovery space empowering their irreplaceable scientific capacities.

It is clear that much of this functionality is available in other frameworks, usually custom-built for the purpose. We make use of many re-usable open-source technologies, to support an ecosystem of tools and other developments which can be re-used between some of our projects.

## Developing the workflow

The simplest way to build a workflow is to write a Jupyter notebook. We will not go into every detail here; see the dedicated guide for step-by-step instructions.

Instead, this document focuses on workflow annotation, publishing, and discovery. These features are powered by an RDF Knowledge Graph. What exactly is stored in the Knowledge Graph is described by the ontologies (which are themselves stored in some KG).

## Ontology

We will describe here the simplest elements of the ontology, which are necessary for workflow annotation. We will not go into details about how to define various constraints and relations on/between things here.

An ontology describes relations between some things, or terms (represented as RDF URIs). A URI can look like a URL, e.g. https://odahub.io/ontology/sources/Mrk421 (the URL may or may not lead to a real location, although it generally should). The URI can also be shortened, assuming a namespace prefix:

```
PREFIX odaSources: <https://odahub.io/ontology/sources#>
```

(see some default list of prefixes here)

This way, https://odahub.io/ontology/sources/Mrk421 becomes odaSources:Mrk421.

It is necessary to annotate the workflow with these terms, specifically, to make relations between the workflow and these terms. Relations have the form of simple propositions, expressed as subject-predicate-object triples. For example:

```
oda:sdssWorkflow oda:isImportantIn oda:radioAstronomy .
```

or

```
oda:sdssWorkflow astroquery:uses astroquery:sdssArchive .
oda:sdssWorkflow oda:isAbout odaSources:Mrk421 .
```

Consider that it is beneficial to use terms already used by other people, described in existing ontologies. This way we speak the same language as other people, and will be able to combine our resources more easily. However, it can be quite an effort to understand what other people meant, which is necessary to use their terms correctly. This effort should be made consciously when possible. It is advisable to also discuss unclear points within our group, and come to a common solution.

Particular attention should be paid to the International Virtual Observatory Alliance (IVOA) vocabularies. See their RDF vocabularies here: https://www.ivoa.net/rdf/index.html and references therein. Also of interest are the developments used in the variety of tables managed by CDS-Strasbourg, where many of the needed terms for astrophysical entities can be found (one can start here).

It is, however, often important to adopt a project-specific, narrowed-down scope. For example, our understanding of what an AGN is may differ from that of CDS-Strasbourg. This is why, in unclear cases, we should not hesitate to use custom terms, such as odaSources:AGN. Then, we can also model and encode the equivalence between our own understanding of the AGN and that of CDS. For example, like so:

```
odaSources:Mrk421 oda:isSubclassOf odaSources:AGN .
odaSources:AGN oda:equivalentTo cds:AGN .
```

Later, these equivalences can be reduced under specific assumptions: for example, some agent may assume that oda:equivalentTo implies literal substitution in all contexts.
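Purely as an illustration of how such triples can be assembled and serialized programmatically (rdflib is our choice for this sketch, not a tool mandated by the guide; the oda namespace URI is an assumption):

```python
from rdflib import Graph, Namespace

ODA = Namespace("http://odahub.io/ontology#")  # assumed URI behind the oda: prefix
ODA_SOURCES = Namespace("https://odahub.io/ontology/sources#")

g = Graph()
g.bind("oda", ODA)
g.bind("odaSources", ODA_SOURCES)

# the subject-predicate-object propositions from the examples above
g.add((ODA.sdssWorkflow, ODA.isAbout, ODA_SOURCES.Mrk421))
g.add((ODA_SOURCES.Mrk421, ODA.isSubclassOf, ODA_SOURCES.AGN))

print(g.serialize(format="turtle"))
```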
## Workflow inputs

For our purposes, the most important workflow properties are set by their inputs and outputs.

We will use nb2workflow (the commands below require pip install nb2workflow) to add additional details and introspection to the workflow notebooks.

```python
name_input = "Mrk 421" # name of the object; if empty coordinates are used http://odahub.io/ontology/sourceName
radius_input = 3.0 # arcmin
```

They can be seen, for example, with:

```
$ nbinspect final.ipynb
...
"name_input": {
    "comment": " name of the object; if empty coordinates are used http://odahub.io/ontology/sourceName",
    "default_value": "Mrk 421",
    "name": "name_input",
    "owl_type": "http://odahub.io/ontology/sourceName",
    "python_type": "<class 'str'>",
    "value": "Mrk 421"
},
...
"radius_input": {
    "comment": " arcmin",
    "default_value": 3.0,
    "name": "radius_input",
    "owl_type": "http://www.w3.org/2001/XMLSchema#float",
    "python_type": "<class 'float'>",
    "value": 3.0
}
...
```

Notice how in the first case owl_type (OWL being the ontology definition language) is derived from the comment, while in the second case it is derived from the variable type.

To generate a graph from the notebook:

```
$ nb2rdf final-an.ipynb final.rdf
```

This graph can be viewed, for example, with WebVOWL. It can also be published in a common location (see nb2service --help).

TODO: show command to publish

## Workflow properties from capturing workflow behavior

Renku (with a renku plugin) is currently able to deduce that a workflow uses some algorithms, providing the basis for useful automatic annotation. The current plugin is dedicated to ML algorithms. We expect in the future to make an ODA-specific (or, more generally, astroquery-specific) renku plugin.

## Other domain-specific knowledge

It is possible to assign any other characteristics to the workflow. It should be seen case by case what makes sense. We used the oda:importantIn predicate to assign relevance to some domains, e.g. domain:transients. In many cases these predicates can be assigned based on reasoning rules.
## From Workflow to Web-Based Data Analysis

We made a simple tool to present an HTTP service executing a given notebook on demand. See https://github.com/oda-hub/nb2workflow

## Reasoning workflows

Reasoning is transformation of knowledge. Since our knowledge base is the knowledge graph, our reasoning is transformation of the knowledge graph.

It may sound ambitious and unreasonable to claim that our platform has the capacity to reason. However, this terminology has been accepted in the community for this restricted form of general reasoning.

Ingesting data into the graph also transforms the graph. Moreover, workflows ingesting data may be guided by the present graph content. When possible, we separate reasoning from external source ingestion by ingesting first and reasoning later: this allows the ingested state to be preserved. But it is not always feasible.

Reasoning is performed by executing these reasoning workflows: in response to external triggers, or just regularly.

## Workflow execution

Curiously, it is very convenient to see workflow execution as reasoning. See more details.

## Other reasoning rules

Various standard reasoning rules can be applied.

## Literature

### Literature parsing

Simple workflows to read astronomical and arXiv publications and produce some RDF: https://github.com/oda-hub/literature-to-facts

### Literature building

Integrating data into a paper: adding another compile step that derives data from various sources (a lot of the time, workflow executions) and produces macros for the LaTeX. Made use-case first, for the easiest possible LaTeX work: https://github.com/oda-hub/linked-data-latex

## Human interventions into the KG

Human agents are first-class citizens in the ODA KB/KG, on par with the automated workflows. Humans are not very reproducible, but they provide unique, intuitively guided inputs, owing to their own built-in, very large but somewhat vague Knowledge "Graphs". A key aspect of our development here is to allow data and workflow interoperability, so it is only natural that we are also concerned with human-ODA interoperability. Technically, human interactions are implemented in the same way as workflow executions.

Most humans experience the KB through various frontends. These multiple lightweight frontends allow making pre-defined actions, leading to workflow executions. Some pre-built frontends for development needs are presented here: https://in.odahub.io/odatests/

## Viewing computed workflows

As described in the details on the reasoning engine, computed workflows (fully curried workflows) are equivalent to simple data-fetching workflows.

## Adding a workflow

It should be as simple as pushing a button. Workflows could be synchronized from Renku. If Renku provided a simple, limited public graph, we could use it directly, without reproducing part of it in the ODA KG.

## Example of adding a new workflow which reacts to astro transients

TODO

# Ontology

## Purpose

The ontology defines the terms in which we describe what we do: data, workflows, publications, etc.

While creating and discovering workflows it is useful to learn to speak in these terms: to find and assign suitable annotations. In an increasingly large number of cases we identify and assign annotations automatically.

The tools we develop commit to implement the common understanding of these terms.

## Discovering terms to use

The terms look like URLs, e.g. https://odahub.io/ontology/#AstrophysicalObject. These URLs can be directly pasted in the browser, leading to some description.

We advise looking into public ontologies like https://www.ivoa.net/rdf/object-type/2020-10-06/object-type.rdf and https://odahub.io/ontology/ for the available terms. If it looks like there is nothing suitable there, it may be necessary to introduce new terms.

Advanced: it is also possible to look into an interactive graph explorer at http://graphdb.obsuks1.unige.ch/ and https://share.streamlit.io/oda-hub/streamlite-graph/javascript-lib-interaction/main/main.py .

## Adding new terms to the ontology

Sometimes it is necessary to add a new term. In principle, a Workflow Developer may add a new term at will: it is their own understanding of what is being labeled. But the term will not be fully used until it is related to other terms in the ontology, which is done either automatically or by the Ontology Developers.

Core Ontology Developers can improve the common ontology with http://webprotege.obsuks1.unige.ch . There is also an experimental edit interface here; the edited result should be stored and uploaded manually. External ontology changes should be suggested here.

## Ontology locations and versions

As the ODA ontology is evolving, it is version-controlled. The version is based on the git revision, and is (or should be) tracked in the ontology description. Also, at any given time, there are variants of the ODA ontology of different degrees of maturity.

The principal ontology is stored in git at https://github.com/oda-hub/ontology/blob/main/ontology.ttl and published as https://odahub.io/ontology/ontology.ttl.
# What if a user experiences a problem?

## Purpose

Explain to users how issues are handled, and what can be expected.

## Process

- The user receives a kind message: "treatment redirected to humans, follow-up promised". This may be delivered in the interactive session and/or by email.
- The issue is addressed by the support, and the request can be submitted again. It will generally be pre-computed by the time of the new request.
- If the user is informed by the platform that all is good, but it is clear to the user that the result is not satisfactory, there is a "feedback" button.
- If anything is unclear, please feel free to contact contact@odahub.io.

Please also consider consulting http://status.odahub.io/ to check for any current problems.

# Details about the reasoning engines

Workflow entities in the KG can undergo various transformations. One key transformation is currying, understood in the same way as function currying, since a workflow, for our purposes, is very similar to a function. Currying transforms a workflow with parameters into a workflow with fewer parameters (arguments), possibly without any parameters. We assume that only a workflow without parameters can be computed (executed).

This approach separates:

- workflow composition, which becomes one of the workflow transformation operations;
- workflow execution (computing).

The reasoning engine is itself a process (workflow) which takes as input some KG state, and produces new triples (which can be inserted back into the KG).

## Currying worker

## Execution

Workflows have a property which describes what can execute them. Two forms are used now.

Executing (computing) the workflow is also a reasoning action, deriving an equivalence between the given workflow and a trivial workflow which implements a request to the data store.

# Maintaining semantic coherence in workflow development progression: from Jupyter notebooks to Python modules, packages, APIs

At some point, it may be advisable to move part of the code into functions of a Python module (e.g. my_functions.py), stored in the same repository. The functions can then be called from the workflow notebook as from my_functions import my_nice_function; my_nice_function(argument).
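A minimal sketch of this layout (the function body is a placeholder):

```python
# my_functions.py, stored in the same repository as the notebook
def my_nice_function(argument):
    """A reusable piece of the workflow, extracted from the notebook."""
    return 2 * argument  # placeholder computation

# in the workflow notebook:
#   from my_functions import my_nice_function
#   my_nice_function(argument)
```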
If some functions are often re-used, they can be stored in external packages, and even published on PyPI (to allow pip install my_function_package).

Sometimes the function may in fact be called remotely, through an API. From the point of view of the workflow (e.g. the notebook) where the function is called, such a remotely executed function may look very similar to a local function from a module, giving similar advantages and posing similar challenges.

One should be wary that extracting functions somewhat obscures the content of the workflow, by introducing structure which is generally not automatically traced by workflow execution provenance tracking. So when a reusable part of the workflow matures, it may be extracted and treated as another workflow, providing inputs to the workflow currently under development.

It is not always feasible to design a workflow to use other workflows by consuming some pre-computed inputs. As described above in this section, workflow development progression often separates some functions from within the workflow, or uses external ones. The SmarkSky project, and in a way renku plugins in general, essentially acknowledge this feature of workflows: they use external functions from within the code at arbitrary locations, possibly calling them multiple times.

This additional information about the functions called by the workflow can be introduced into the workflow metadata with special annotations (see more about workflow annotation in the ODA Workflow Publishing and Discovery Guide), such as oda:requestsAstroqueryService. These annotations should also include information about the parameters used to annotate the workflow. This additional structure associated with workflows will be ingested into the KG. While it cannot be directly interpreted as a workflow provenance graph, it is possible to produce an additional, similar-looking graph with inferred provenance, which is different from but analogous to strict renku-derived provenance.