Data Collection and Annotation as a Service (DCaaS)
DCaaS is a microservice that collects and annotates input video data. The collected and annotated data can then be used for AI model training, fine-tuning, and statistical analysis.
It supports three design patterns, which are intended to cover different deployment scenarios:
Design Pattern | Remarks |
---|---|
Simple Data Collection | Data ingest from source and save in local database |
Simple Data Collection with Human Annotation | Data ingest from source and data annotation by a user along with review |
Simple Data Collection with Auto-Annotation | Data ingest from source and data annotation by an (AI) algorithm |
The figure below outlines the architecture of DCaaS. Please click here for more details.
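The three design patterns correspond to the mode sections that "Using DCaaS in other modes" mentions for config.json ("simple_storage", "human_annotation", "auto_annotation"). The sketch below shows one way to switch between them. It is only an illustration: the function name `select_mode` is hypothetical, the layout of config.json (one top-level section per mode, each with an "enabled" flag) is an assumption, and so is the idea that exactly one section is enabled at a time, which this README states only for Simple Storage Mode.

```python
import copy
import json

# Section names come from this README; their placement at the top level of
# config.json is an assumption made for this sketch.
MODES = ("simple_storage", "human_annotation", "auto_annotation")


def select_mode(config: dict, mode: str) -> dict:
    """Return a copy of `config` in which only `mode` has "enabled": true."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    for name in MODES:
        # Create the section if it is missing, then set its flag.
        updated.setdefault(name, {})["enabled"] = (name == mode)
    return updated


if __name__ == "__main__":
    sample = {name: {"enabled": False} for name in MODES}
    print(json.dumps(select_mode(sample, "simple_storage"), indent=2))
```

Working on a copy keeps the original configuration available for comparison before anything is written back to disk.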
How to run DCaaS

Basic pre-requisites

Git

OS | Python |
---|---|
Ubuntu 20.04 | 3.8 |
Ubuntu 22.04 | 3.10 |

Note: See Setup for Development for details.

Get Code base from GitHub

1. Install the repo tool.

   ```sh
   curl https://storage.googleapis.com/git-repo-downloads/repo > repo
   sudo mv repo /bin/repo
   sudo chmod a+x /bin/repo
   ```

2. Create a working directory and initialize the eii-manifest repo. For more details, read here.

   ```sh
   mkdir -p [WORKDIR]
   cd [WORKDIR]
   repo init -u "https://github.com/intel-innersource/applications.industrial.edge-insights.manifests.git" -b main
   repo sync
   ```

3. Clone the DCaaS repository.

   ```sh
   cd IEdgeInsights
   git clone https://github.com/intel-innersource/frameworks.ai.edgecsp.data-collection-as-a-service DataCollectionMicroservice
   cd DataCollectionMicroservice
   git checkout develop
   ```

Install pre-requisites and proxy setup

1. Install the docker utilities and python packages required during first installation.

   ```sh
   cd [WORKDIR]/IEdgeInsights/build
   sudo -E ./pre_requisites.sh
   ## run with proxy if needed
   # sudo -E ./pre_requisites.sh --proxy="<Proxy-DNS>:<port>"
   ```

2. Create the proxy configuration for docker. Create `[HOMEDIR]/.docker/config.json` and add the below with the appropriate `<Proxy-DNS>:<port>`.

   ```json
   {
     "proxies": {
       "default": {
         "httpProxy": "http://<Proxy-DNS>:<port>",
         "httpsProxy": "http://<Proxy-DNS>:<port>",
         "noProxy": "intel.com,*.intel.com,10.0.0.0/8,192.168.0.0/16,172.16.0.0/12"
       }
     }
   }
   ```

Configure, Build and Run the Services

1. Edit the `.env` file and add entries for the following:

   ```sh
   # DEB packages source location
   PKG_SRC=http://eii-nightly-devops.iind.intel.com/latest
   # Host ip address to be updated here
   HOST_IP=
   ETCD_HOST=
   # Service credentials
   ETCDROOT_PASSWORD=
   INFLUXDB_USERNAME=
   INFLUXDB_PASSWORD=
   MINIO_ACCESS_KEY=
   MINIO_SECRET_KEY=
   # These are required to be updated in PROD mode only
   WEBVISUALIZER_USERNAME=
   WEBVISUALIZER_PASSWORD=
   # Path where remote backup will happen
   DCAAS_STORAGE_DIR=/path/to/persistent/remote/storage
   # set the below variable if DEV_MODE is set to `true` in the current .env file
   DCAAS_UDF_DIR=/udfs
   ```

   Edit the `.env` file further as mentioned here.

2. Add `"DataCollectionMicroservice": ""` to the `"subscriber_list"` in `[WORKDIR]/IEdgeInsights/build/builder_config.json`.

   Note: Set `DCAAS_STORAGE_DIR` to the directory where you wish to remotely save the annotations and images (ex: `/path/to/persistent/remote/storage`) and run the below. Azure users need additional setup before being able to upload data to Azure. Refer to Remote Storage for more details.

   ```sh
   sudo chmod 777 /path/to/persistent/remote/storage
   ```

3. Run the builder script to generate deployment and configuration files. To learn more, click here.

   ```sh
   # Run inside the activated Python virtual environment (venv)
   python3 builder.py -f ../DataCollectionMicroservice/artifacts/sample-usecase.yml
   ```

4. Build the OEI use case. To learn more, click here.

   ```sh
   # Run inside the activated Python virtual environment (venv)
   docker compose -f docker-compose.yml build
   ```

5. Start License Manager

   Before DCaaS or any other service can be started, the license manager agent must be running. DCaaS performs a license check at startup and periodically at runtime. The license manager agent lets DCaaS check whether the user has a valid entitlement for usage. If this check fails, DCaaS exits gracefully. For first-time setup, please see the wiki. If you have already done the first-time setup, open a different terminal and navigate to the directory where the `LM_SCP_Start.sh` script is located (ideally the directory extracted during the first-time setup).

   ```sh
   ./LM_SCP_Start.sh
   ```

6. Start DCaaS and other Services

   ```sh
   # Run inside the activated Python virtual environment (venv)
   ./run.sh
   ```

   Once the above script runs, you should see its startup prints. To check the DCaaS container logs and for more information, please refer to Viewing logs.

7. Stop the Services

   ```sh
   # Run inside the activated Python virtual environment (venv)
   docker compose down -v
   ```

Using DCaaS in other modes

To run DCaaS in "Auto Annotation Mode", follow the instructions here.

To run DCaaS in "Simple Storage Mode" (stores the frames from "data_filter" in DataStore):

1. Remove `"DataCollectionMicroservice": ""` from the `"subscriber_list"` in `[WORKDIR]/IEdgeInsights/build/builder_config.json` if it already exists.
2. Edit the `enabled` flags in the default config.json as below:
   1. Ensure that you have set `"enabled": true` for "simple_storage".
   2. Ensure that you have set `"enabled": false` for both "auto_annotation" and "human_annotation".

License

License details can be found in License.pdf.

More Details

Please refer to USAGE.md for more details.

Troubleshooting

If you run into any issue, please refer to TROUBLESHOOTING.md.
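
Before digging further, it may be worth confirming that the `.env` entries listed under "Configure, Build and Run the Services" are actually filled in. The following is a minimal sketch of such a check, not part of DCaaS. The helper names `parse_env` and `missing_keys` are hypothetical, and treating these particular keys as required is an assumption based on the entries shown in this README (the `WEBVISUALIZER_*` and `DCAAS_UDF_DIR` entries are left out because they apply only in PROD mode or DEV mode respectively).

```python
# Keys taken from the .env listing in this README; treating them all as
# mandatory is an assumption of this sketch.
REQUIRED_KEYS = (
    "HOST_IP",
    "ETCD_HOST",
    "ETCDROOT_PASSWORD",
    "INFLUXDB_USERNAME",
    "INFLUXDB_PASSWORD",
    "MINIO_ACCESS_KEY",
    "MINIO_SECRET_KEY",
    "DCAAS_STORAGE_DIR",
)


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    # Inline sample instead of a real file, so the sketch runs anywhere.
    sample = "HOST_IP=10.0.0.1\nETCD_HOST=\n# Service credentials\n"
    for key in missing_keys(sample):
        print(f"missing or empty: {key}")
```

To check a real file, read its contents and pass the text to `missing_keys`; an empty list means every key in `REQUIRED_KEYS` has a value.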