COPKIT tools

The COPKIT project has developed data-driven policing technologies to support Law Enforcement Agencies in analysing, investigating, mitigating and preventing the use of new information and communication technologies by organized crime and terrorist groups.

The COPKIT toolkit has been designed to support an Early Warning/Early Action methodology. This methodology helps explain how crime is evolving, identify “weak signals” or trends and send alerts about new risks (Early Warning). It also provides a basis for helping decision-makers develop Early Action (preparedness, mitigation, prevention and other security policies).

Each component developed during the project is described below under the one of COPKIT’s six Early Warning/Early Action ecosystem phases that matches its functionality: data collection, information extraction, information enrichment, knowledge discovery, assessment, and forecasting. An innovative Human Machine Interface (HMI) supports the analyst’s mental process using visual analytics. Finally, the COPKIT project developed means to facilitate both the integration of tools into existing IT systems and the acquisition and take-up of new tools.

All of the tools, including their detection and knowledge-sharing capabilities, have been developed with ethical, legal (particularly regarding human rights and data protection) and societal aspects applied by design, to ensure that they are ethically acceptable and socially desirable.



Data Collection

Dark-web data collection tool – GENDSCRAP

Tool developed by Gendarmerie Nationale (GN)

A crawler and scraper that can operate on both the Clear Web and the Dark Web (especially Tor). It takes an exact snapshot of the scraped domain according to a user-defined crawling policy. The snapshot stores all data from the website, including texts, images and other multimedia entities.
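The user-defined crawling policy can be illustrated with a breadth-first crawl. The sketch below is not GENDSCRAP's implementation. The `CrawlPolicy` fields (`allowed_domains`, `max_depth`, `max_pages`) and the `fetch` callback are hypothetical, and fetching is abstracted away so that the example runs offline. A real crawler for .onion sites would typically send its requests through a Tor SOCKS proxy and would also store images and other media, not only text.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urljoin, urlparse

# Hypothetical fetcher: returns (page_text, outgoing_links) for a URL.
Fetcher = Callable[[str], tuple[str, list[str]]]


@dataclass
class CrawlPolicy:
    """Hypothetical user-defined crawling policy."""
    allowed_domains: set[str]
    max_depth: int = 2
    max_pages: int = 100


def crawl(start_url: str, fetch: Fetcher, policy: CrawlPolicy) -> dict[str, str]:
    """Breadth-first crawl that returns a snapshot {url: page_text}."""
    snapshot: dict[str, str] = {}
    queue = deque([(start_url, 0)])
    seen = {start_url}
    while queue and len(snapshot) < policy.max_pages:
        url, depth = queue.popleft()
        text, links = fetch(url)
        snapshot[url] = text
        if depth >= policy.max_depth:
            continue  # do not follow links beyond the allowed depth
        for link in links:
            absolute = urljoin(url, link)
            # Stay inside the domains allowed by the policy.
            if urlparse(absolute).hostname not in policy.allowed_domains:
                continue
            if absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
    return snapshot


# Offline usage with an in-memory site instead of real HTTP requests.
SITE = {
    "http://market.example/": ("home", ["/a", "/b", "http://other.example/x"]),
    "http://market.example/a": ("page a", ["/c"]),
    "http://market.example/b": ("page b", []),
    "http://market.example/c": ("page c", []),
}
snapshot = crawl("http://market.example/", lambda u: SITE.get(u, ("", [])),
                 CrawlPolicy({"market.example"}, max_depth=1))
print(sorted(snapshot))
```

With `max_depth=1`, page `/c` (two hops away) and the off-domain link are not collected.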


Information Extraction

Named Entity Recognition – CKNER

Tool developed by Austrian Institute of Technology (AIT)

A service integrating several state-of-the-art named entity recognizers, each with standard models trained on generic corpora and domain-specific models. The domain-specific models focus on text data acquired by crawling darknet marketplaces offering weapons and drugs, which typically consists of short, poorly written texts.
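One simple way to integrate several recognizers is majority voting over the entity spans they return. The sketch below assumes that voting scheme and uses hypothetical gazetteer-based recognizers as stand-ins for the real models. The source does not say how CKNER actually combines its recognizers.

```python
import re
from collections import Counter
from typing import Callable

Span = tuple[int, int, str]  # (start offset, end offset, label)
Recognizer = Callable[[str], list[Span]]


def gazetteer_recognizer(terms: dict[str, str]) -> Recognizer:
    """Hypothetical recognizer matching lowercase terms as whole words."""
    def recognize(text: str) -> list[Span]:
        lowered = text.lower()  # same length as text for ASCII input
        spans = []
        for term, label in terms.items():
            for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
                spans.append((m.start(), m.end(), label))
        return spans
    return recognize


def ensemble(text: str, recognizers: list[Recognizer], min_votes: int = 2) -> list[Span]:
    """Keep the spans proposed by at least `min_votes` recognizers."""
    # set() ensures each recognizer votes at most once per span.
    votes = Counter(span for r in recognizers for span in set(r(text)))
    return sorted(span for span, n in votes.items() if n >= min_votes)


text = "Selling Glock 17 and MDMA, ship from Berlin"
recognizers = [
    gazetteer_recognizer({"glock": "WEAPON", "mdma": "DRUG"}),
    gazetteer_recognizer({"glock": "WEAPON"}),
    gazetteer_recognizer({"mdma": "DRUG", "berlin": "LOCATION"}),
]
entities = ensemble(text, recognizers)
print([(text[s:e], label) for s, e, label in entities])
# [('Glock', 'WEAPON'), ('MDMA', 'DRUG')]
```

"Berlin" is dropped because only one recognizer proposes it. Lowering `min_votes` trades precision for recall.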


Relationship Extraction – CKRELEXT

Tool developed by Austrian Institute of Technology (AIT)

A REST service for recognizing relationships between entities (drugs, weapons, usernames, locations, etc.). It takes a text (e.g., one or several paragraphs taken from a darknet market advert) as input and produces a named entity graph. The component depends on the entities recognized by CKNER.
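The input/output contract described above takes a text plus the recognized entities and returns a named entity graph. The sketch below illustrates it by linking entities that co-occur in the same sentence. Sentence co-occurrence is only an assumed baseline, not CKRELEXT's method. The `co_occurs_with` relation label and the JSON field names are hypothetical.

```python
import json
import re
from itertools import combinations


def entity_graph(text: str, entities: list[tuple[str, str]]) -> dict:
    """Link entities, given as (surface form, label), that co-occur in a sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    labels = dict(entities)
    edges = set()
    for sentence in sentences:
        # Naive case-insensitive substring match of each entity in the sentence.
        present = sorted(name for name in labels if name.lower() in sentence.lower())
        edges.update(combinations(present, 2))
    return {
        "nodes": [{"id": name, "type": label} for name, label in sorted(labels.items())],
        "edges": [{"source": a, "target": b, "relation": "co_occurs_with"}
                  for a, b in sorted(edges)],
    }


advert = "Vendor dr_white sells MDMA. Shipping from Berlin only."
found = [("dr_white", "USERNAME"), ("MDMA", "DRUG"), ("Berlin", "LOCATION")]
graph = entity_graph(advert, found)
print(json.dumps(graph, indent=2))
```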


Annotation Tool – RECOGITOJS

Tool developed by Austrian Institute of Technology (AIT)

An embeddable web component that can be integrated into web applications to provide functionality for annotating named entities and the relationships between them.


Dataset Repository – DSR

Tool developed by Austrian Institute of Technology (AIT)

A middleware component supporting the management of datasets and model versions, in particular texts together with their annotations and the models learned from them for information extraction. The Dataset Repository integrates the RECOGITOJS embeddable annotation component as well as the Named Entity Recognition (CKNER) and Relationship Extraction (CKRELEXT) components. Together they support the annotation of dark net crawl data and demonstrate the recognition of entities and their relationships.
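Each learned model can be linked to the exact texts and annotations it was trained on by using content-addressed versions. In the sketch below, a dataset version is a truncated SHA-256 hash of its texts and annotations, so identical content always receives the same version. This approach is an assumption for illustration and does not describe DSR's actual storage. The `Repository` class and its methods are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical in-memory store linking dataset versions to models."""
    datasets: dict[str, dict] = field(default_factory=dict)
    models: dict[str, str] = field(default_factory=dict)  # model id -> dataset version

    def add_dataset(self, texts: list[str], annotations: list[dict]) -> str:
        """Store a dataset and return its content-derived version id."""
        payload = json.dumps({"texts": texts, "annotations": annotations}, sort_keys=True)
        version = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        self.datasets[version] = {"texts": texts, "annotations": annotations}
        return version

    def register_model(self, model_id: str, dataset_version: str) -> None:
        """Record which dataset version a model was trained on."""
        if dataset_version not in self.datasets:
            raise KeyError(f"unknown dataset version {dataset_version}")
        self.models[model_id] = dataset_version


repo = Repository()
v1 = repo.add_dataset(["Selling MDMA"], [{"start": 8, "end": 12, "label": "DRUG"}])
v2 = repo.add_dataset(["Selling MDMA"], [{"start": 8, "end": 12, "label": "DRUG"}])
v3 = repo.add_dataset(["Selling MDMA"], [])
repo.register_model("ner-model-1", v1)
print(v1 == v2, v1 == v3)  # True False
```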


Moment Recognizer – MOREC

Tool developed by IBM Ireland (IBM)

A REST service for extracting important moments from multi-party conversations, such as discussion forums on Dark Net Markets. The input to the service is a series of text messages exchanged between users of a system, and the service returns the important moments as events in JSON format.
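The service contract above takes a series of messages and returns important moments as JSON events. It can be sketched with simple cue patterns. This is an assumed rule-based stand-in, not MOREC's model. The event types `price_offer` and `contact_exchange`, the regular expressions and the JSON field names are all hypothetical.

```python
import json
import re

# Hypothetical cue patterns; a production system would learn such cues from data.
CUES = {
    "price_offer": re.compile(r"\b\d+(?:\.\d+)?\s?(?:usd|eur|btc|\$|€)", re.IGNORECASE),
    "contact_exchange": re.compile(r"\b(?:wickr|telegram|pgp|pm me)\b", re.IGNORECASE),
}


def important_moments(messages: list[dict]) -> str:
    """Return a JSON array of events for messages that match a cue.

    Each message is a dict with "user" and "text" keys.
    """
    events = []
    for index, message in enumerate(messages):
        for event_type, pattern in CUES.items():
            if pattern.search(message["text"]):
                events.append({"message": index, "user": message["user"], "type": event_type})
    return json.dumps(events)


thread = [
    {"user": "alice", "text": "anyone selling pills?"},
    {"user": "bob", "text": "50 EUR per 10, PM me"},
    {"user": "alice", "text": "thanks"},
]
print(important_moments(thread))
```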


Information Enrichment

COPKIT Knowledge Base – COPWIK

Tool developed by Universidad de Granada (UGR)

A repository storing expert and learnt knowledge relevant to the COPKIT ecosystem. It can be seen as a semantic database including definitions of concepts, instances of these concepts, and the relations between them. Other components can access this knowledge through application programming interfaces (APIs) that exchange JSON. It holds domain-specific knowledge, for instance URLs of markets, a taxonomy of firearms and information about them, and legal classifications. It also holds links to general knowledge that is not specifically related to crime or a crime type (e.g., places and concepts).
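The semantic-database view described above consists of concepts, instances and relations exposed through a JSON API, and it can be sketched as a small triple store. The sketch is an assumption for illustration. The predicates `subclass_of`, `instance_of` and `url`, the example facts and the JSON layout are hypothetical, and the source does not specify COPWIK's data model or query interface.

```python
import json
from collections import defaultdict


class KnowledgeBase:
    """Minimal triple store of (subject, predicate, object) facts."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()
        self.by_subject: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))
        self.by_subject[subject].add((predicate, obj))

    def instances_of(self, concept: str) -> list[str]:
        """Instances of `concept`, including instances of its subconcepts."""
        concepts = {concept}
        changed = True
        while changed:  # transitive closure over subclass_of links
            changed = False
            for s, p, o in self.triples:
                if p == "subclass_of" and o in concepts and s not in concepts:
                    concepts.add(s)
                    changed = True
        return sorted(s for s, p, o in self.triples if p == "instance_of" and o in concepts)

    def describe(self, subject: str) -> str:
        """JSON description of a subject, as an API might return it."""
        facts = sorted(self.by_subject.get(subject, set()))
        return json.dumps({"id": subject,
                           "facts": [{"predicate": p, "object": o} for p, o in facts]})


kb = KnowledgeBase()
kb.add("Handgun", "subclass_of", "Firearm")
kb.add("Rifle", "subclass_of", "Firearm")
kb.add("Glock 17", "instance_of", "Handgun")
kb.add("AK-47", "instance_of", "Rifle")
kb.add("market_x", "instance_of", "DarkNetMarket")
kb.add("market_x", "url", "http://example.onion")
print(kb.instances_of("Firearm"))  # ['AK-47', 'Glock 17']
print(kb.describe("market_x"))
```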


Knowledge Discovery

Frequent Item Set & Association Rules Discovery – FIS/ARD

Tool developed by Universidad de Granada (UGR)

Discovers associations (e.g., statistical correlations) between concepts or objects expressed in categorical form. Components are also provided to transform numerical values into categorical ones, for instance prices into price categories (“cheap”, “normal”, “expensive”). Typical application in COPKIT: detecting frequent associations in advertisement datasets between concepts such as the product sold, market platform, price category, time of publication or indicated origin.
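Frequent item sets and association rules are classically mined with the Apriori algorithm, sketched below. The sketch makes no claim about which algorithm FIS/ARD uses. Each advertisement becomes a set of categorical items, with the price discretized first. The price thresholds, item names and toy ads are hypothetical.

```python
from itertools import combinations


def price_category(price: float, cheap: float = 20.0, expensive: float = 100.0) -> str:
    """Turn a numerical price into a categorical item (thresholds are hypothetical)."""
    if price < cheap:
        return "price=cheap"
    if price > expensive:
        return "price=expensive"
    return "price=normal"


def frequent_itemsets(transactions: list[frozenset], min_support: float) -> dict[frozenset, float]:
    """Apriori: grow itemsets level by level, keeping those with enough support."""
    n = len(transactions)

    def support(items: frozenset) -> float:
        return sum(items <= t for t in transactions) / n

    level = {frozenset([item]) for t in transactions for item in t}
    result: dict[frozenset, float] = {}
    k = 1
    while level:
        frequent = {s: support(s) for s in level}
        frequent = {s: v for s, v in frequent.items() if v >= min_support}
        result.update(frequent)
        k += 1
        # Join step: candidate itemsets of size k built from frequent ones.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    return result


def association_rules(itemsets: dict[frozenset, float], min_confidence: float):
    """Return rules as (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for items, supp in itemsets.items():
        for r in range(1, len(items)):
            for antecedent in map(frozenset, combinations(items, r)):
                confidence = supp / itemsets[antecedent]  # subsets of frequent sets are frequent
                if confidence >= min_confidence:
                    rules.append((antecedent, items - antecedent, supp, confidence))
    return rules


ads = [
    frozenset({"product=pistol", "market=A", price_category(450)}),
    frozenset({"product=pistol", "market=A", price_category(500)}),
    frozenset({"product=mdma", "market=B", price_category(10)}),
    frozenset({"product=pistol", "market=B", price_category(300)}),
]
itemsets = frequent_itemsets(ads, min_support=0.5)
for antecedent, consequent, supp, conf in association_rules(itemsets, 0.9):
    print(sorted(antecedent), "=>", sorted(consequent), f"support={supp:.2f} confidence={conf:.2f}")
```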


Graph Partitioning: Community discovery & patterns of relationships – GP

Tool developed by Thales SIX GTS

Detects groups of similar nodes in networks of interactions, such as networks of relationships between people, blogs or cryptocurrency accounts. It can detect specific structures of criminal (covert) networks that differ from those commonly observed in Social Network Analysis.
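Community detection itself can be illustrated with label propagation, a simple and widely used technique. This generic sketch is chosen for illustration only. It is not GP's algorithm and does not capture the covert-network structures mentioned above. The classic algorithm breaks ties at random. The deterministic tie-breaking used here is an assumption that keeps the example reproducible, and results can depend on the update order.

```python
from collections import Counter


def label_propagation(edges: list[tuple[str, str]], max_iter: int = 100) -> dict[str, str]:
    """Community detection by (deterministic) label propagation.

    Each node starts with its own label and repeatedly adopts the most
    frequent label among its neighbours, keeping its current label when it
    is among the most frequent ones. Remaining ties are broken by taking the
    largest label so that results are reproducible.
    """
    neighbours: dict[str, set[str]] = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    labels = {node: node for node in neighbours}
    for _ in range(max_iter):
        changed = False
        for node in sorted(neighbours):
            counts = Counter(labels[n] for n in neighbours[node])
            best = max(counts.values())
            candidates = [label for label, c in counts.items() if c == best]
            new = labels[node] if labels[node] in candidates else max(candidates)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels


# Two tightly knit groups joined by a single link (d-w).
clique_1, clique_2 = "abcd", "wxyz"
edges = [(u, v) for group in (clique_1, clique_2)
         for i, u in enumerate(group) for v in group[i + 1:]]
edges.append(("d", "w"))
print(label_propagation(edges))
```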


Connection Finder – CF

Tool developed by Legind Technologies (LTA)

Searches for connections with uncertain links in graphs of heterogeneous entities (i.e., not limited to persons or identities). It can be used in online (including dark net) investigations to find relationships between various elements of digital identities (for instance, usernames and digital currency wallets). These results can be combined with internal intelligence databases where available (and where graph merging is possible).
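Suppose each uncertain link is given a probability and links are independent. Both are assumptions that the source does not state. Under them, the most probable connection between two identity elements is the path that maximizes the product of link probabilities. Taking the negative logarithm of each probability turns the product into a sum, so the problem becomes a shortest-path search, solved here with Dijkstra's algorithm. The node names and probabilities are hypothetical.

```python
import heapq
import math


def most_probable_path(edges: list[tuple[str, str, float]], source: str, target: str):
    """Return (path, probability) of the most probable connection, or (None, 0.0).

    Each edge (u, v, p) is an undirected uncertain link holding with
    probability p in (0, 1]. Maximizing the product of probabilities equals
    minimizing the sum of -log(p), i.e. a shortest-path problem.
    """
    graph: dict[str, list[tuple[str, float]]] = {}
    for u, v, p in edges:
        w = -math.log(p)
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    dist = {source: 0.0}
    previous: dict[str, str] = {}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            break
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(neighbour, math.inf):
                dist[neighbour] = nd
                previous[neighbour] = node
                heapq.heappush(queue, (nd, neighbour))
    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], math.exp(-dist[target])


links = [
    ("user:dr_white", "wallet:1A", 0.9),
    ("wallet:1A", "user:whitey", 0.8),
    ("user:dr_white", "email:x@example.org", 0.5),
    ("email:x@example.org", "user:whitey", 0.9),
]
path, probability = most_probable_path(links, "user:dr_white", "user:whitey")
print(path, round(probability, 2))  # ['user:dr_white', 'wallet:1A', 'user:whitey'] 0.72
```

Edges from internal intelligence sources could simply be appended to `links` when the graphs can be merged.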



Assessment

Contextualized Threat & Situation Assessment and Estimator – CTSAE

Tool developed by Thales Nederland (TNL)

Contextual Threat and Situation Assessment and Estimation: a probabilistic classification approach that combines machine learning with prior domain knowledge. It is applied to assessing dark net market advertisements according to concepts such as seriousness and priority for investigation, using the LEA’s own definitions and priorities.
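One way to combine machine learning with prior domain knowledge in a probabilistic classifier is a naive Bayes model. In such a model, the class prior can be supplied by experts instead of being estimated from the training data. This swapped-in technique is for illustration and is not necessarily CTSAE's approach. The `high`/`low` priority classes, the prior values and the training texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesAssessor:
    """Multinomial naive Bayes whose class prior comes from expert knowledge."""

    def __init__(self, expert_prior: dict[str, float], alpha: float = 1.0):
        self.prior = expert_prior  # e.g. the expected share of high-priority ads
        self.alpha = alpha         # Laplace smoothing constant
        self.word_counts = {c: Counter() for c in expert_prior}
        self.vocabulary: set[str] = set()

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesAssessor":
        """Count words per class; every label must be a key of the prior."""
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict_proba(self, text: str) -> dict[str, float]:
        """Posterior class probabilities for one advertisement text."""
        v = len(self.vocabulary)
        log_scores = {}
        for c, counts in self.word_counts.items():
            total = sum(counts.values())
            score = math.log(self.prior[c])
            for token in tokenize(text):
                if token in self.vocabulary:  # words never seen in training are ignored
                    score += math.log((counts[token] + self.alpha) / (total + self.alpha * v))
            log_scores[c] = score
        top = max(log_scores.values())  # normalize in log space for numerical stability
        z = sum(math.exp(s - top) for s in log_scores.values())
        return {c: math.exp(s - top) / z for c, s in log_scores.items()}


assessor = NaiveBayesAssessor({"high": 0.2, "low": 0.8}).fit(
    ["automatic rifle ammunition bulk", "rifle with silencer",
     "fake designer watch", "cheap watch replica"],
    ["high", "high", "low", "low"],
)
print(assessor.predict_proba("rifle silencer for sale"))
```

Here the expert prior pulls every prediction towards `low`, and only sufficiently strong textual evidence overturns it.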


Situation Assessment – SA

Tool developed by Legind Technologies (LTA)

It aims at the automated monitoring and assessment of dark net ads and similar data. It assesses the risks, threats, characteristics and anomalies in ads, aggregates this data by goods sold, provider, etc., and presents the outputs on a comprehensive dashboard.
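The aggregation and anomaly step can be sketched in three parts. It groups ads per product, counts ads and distinct vendors, and flags ads whose price is far from the product's median. The field names (`id`, `product`, `vendor`, `price`) and the factor-of-three rule are hypothetical choices for illustration, not SA's actual criteria.

```python
from collections import defaultdict
from statistics import median


def summarize(ads: list[dict], factor: float = 3.0) -> dict[str, dict]:
    """Aggregate ads per product and flag price anomalies.

    An ad is flagged when its price is more than `factor` times above or
    below the product's median price (a hypothetical rule of thumb).
    """
    by_product: dict[str, list[dict]] = defaultdict(list)
    for ad in ads:
        by_product[ad["product"]].append(ad)
    summary = {}
    for product, items in by_product.items():
        med = median(ad["price"] for ad in items)
        anomalies = [ad["id"] for ad in items
                     if ad["price"] > med * factor or ad["price"] < med / factor]
        summary[product] = {
            "ads": len(items),
            "vendors": len({ad["vendor"] for ad in items}),
            "median_price": med,
            "anomalies": anomalies,
        }
    return summary


ads = [
    {"id": "ad1", "product": "pistol", "vendor": "v1", "price": 500},
    {"id": "ad2", "product": "pistol", "vendor": "v2", "price": 450},
    {"id": "ad3", "product": "pistol", "vendor": "v1", "price": 520},
    {"id": "ad4", "product": "pistol", "vendor": "v3", "price": 50},
    {"id": "ad5", "product": "mdma", "vendor": "v4", "price": 10},
]
print(summarize(ads))
```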



Forecasting

Spatial-Temporal Forecaster – STF

Tool developed by IBM Ireland (IBM)

Spatial-temporal forecasting that includes predictors and factors and targets larger volumes of data. It can be applied to dark net data when geographical information is available and relevant for strategic trend analysis and policy decisions (for instance, analysis of advertisements for certain products and their indicated provenance).
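A minimal per-region forecast, assuming ad counts per period grouped by indicated provenance, fits a least-squares linear trend to each region and extrapolates it. This is only an illustration. It ignores the predictors and factors that STF includes, and the region codes and counts are hypothetical.

```python
def linear_trend_forecast(series: list[float], horizon: int = 1) -> list[float]:
    """Fit y = a + b*t by least squares (t = 0..n-1) and extrapolate `horizon` steps."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx if sxx else 0.0  # a single point gives a flat forecast
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


def forecast_by_region(counts: dict[str, list[float]], horizon: int = 1) -> dict[str, list[float]]:
    """Forecast ad counts separately for each region of indicated provenance."""
    return {region: linear_trend_forecast(series, horizon) for region, series in counts.items()}


monthly_counts = {"NL": [10, 12, 14, 16], "DE": [5, 5, 5, 5]}
print(forecast_by_region(monthly_counts, horizon=2))
```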


Change Point Detection – CPD

Tool developed by Thales SIX GTS

It aims to detect changes in trends in temporal data and to provide the means to detect changes in criminal activities, estimate the bias, cluster time periods and propose expected values.
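Changes in the level of a time series can be detected with binary segmentation. The method splits the series where the drop in squared error is largest, then recurses on both parts while the gain exceeds a threshold. The segment means serve as expected values for each period, and the difference between adjacent means gives the size of a shift. This standard technique is chosen for illustration, since the source does not say which method CPD uses. `min_gain` is a hypothetical tuning parameter.

```python
def _sse(xs: list[float]) -> float:
    """Sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def best_split(xs: list[float]) -> tuple[int, float]:
    """Index splitting xs into two segments with the largest SSE reduction."""
    total = _sse(xs)
    best_i, best_gain = 0, 0.0
    for i in range(1, len(xs)):
        gain = total - _sse(xs[:i]) - _sse(xs[i:])
        if gain > best_gain:
            best_i, best_gain = i, gain
    return best_i, best_gain


def binary_segmentation(xs: list[float], min_gain: float) -> list[int]:
    """Change points found by splitting recursively while the gain is >= min_gain."""
    if len(xs) < 2:
        return []
    i, gain = best_split(xs)
    if gain < min_gain or i == 0:
        return []
    left = binary_segmentation(xs[:i], min_gain)
    right = binary_segmentation(xs[i:], min_gain)
    return left + [i] + [i + j for j in right]


def segment_means(xs: list[float], change_points: list[int]) -> list[float]:
    """Expected value (mean) for each period between change points."""
    bounds = [0] + change_points + [len(xs)]
    return [sum(xs[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


series = [0, 0, 0, 10, 10, 10, 20, 20, 20]
cps = binary_segmentation(series, min_gain=50)
print(cps, segment_means(series, cps))  # [3, 6] [0.0, 10.0, 20.0]
```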


Context-Aware Spatial-Temporal Forecasting – CASTF

Tool developed by Thales Nederland (TNL)

Builds statistical models of spatial-temporal phenomena, typically criminal activities, taking into account explicit context/strategic-level intelligence, crime drivers, and spatial/geographical contexts. Using these models, the tool forecasts criminal activities. It also detects non-trivial spatial-temporal anomalies, i.e., anomalies that are not related to known crime facilitators.
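The distinction between anomalies explained by known context and non-trivial ones can be sketched with a log-linear Poisson model. The expected count in an area is a base rate multiplied by context effects. An area is flagged only if its observed count is improbably high even after accounting for that context. The context features (`port_city`, `near_border`), the effect sizes and the significance level are hypothetical. In practice such effects would be estimated from data, for instance by Poisson regression, and this is not necessarily how CASTF models its data.

```python
import math


def expected_count(base_rate: float, context: dict[str, float], effects: dict[str, float]) -> float:
    """Log-linear model: log(lambda) = log(base_rate) + sum of effect * feature."""
    return base_rate * math.exp(sum(effects.get(k, 0.0) * v for k, v in context.items()))


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):     # accumulate P(X = 0) ... P(X = k - 1)
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def non_trivial_anomalies(observations: dict[str, tuple[int, dict[str, float]]],
                          base_rate: float, effects: dict[str, float],
                          alpha: float = 0.01) -> list[str]:
    """Areas whose counts are unlikely even given their known context."""
    flagged = []
    for area, (count, context) in observations.items():
        lam = expected_count(base_rate, context, effects)
        if poisson_upper_tail(count, lam) < alpha:
            flagged.append(area)
    return sorted(flagged)


effects = {"port_city": 1.0, "near_border": 0.7}
observations = {
    "area_A": (6, {"port_city": 1.0}),  # high count, but explained by context
    "area_B": (9, {}),                  # high count with no known facilitator
    "area_C": (3, {}),
}
print(non_trivial_anomalies(observations, base_rate=2.0, effects=effects))  # ['area_B']
```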



Human Machine Interface – HMI

Tool developed by Thales Nederland (TNL)

Visualization for strategic and operational analysts that enables the Early Warning/Early Action methodology (e.g., the use of strategic insights in operational analysis and vice versa). It supports the visualization of various types of data (annotated texts, graphs of relations and spatial-temporal data). It also includes sharing and access control functions, as well as elements supporting ethical, legal and privacy aspects.



Secure Sharing & Adaptive Privacy – SSAP

Tool developed by Thales Nederland (TNL)

A middleware component in charge of information management between components. In particular, it manages the flow between data sources (in a broad sense: databases or results from other components) and a visualization component such as the “HMI for usage of multilevel intelligence”, enabling appropriate handling of security, privacy and uncertainty. It supports the integration of legacy databases (including distributed ones) without data migration and shields databases by exposing only the relevant data.
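Exposing only the relevant data from sources that stay where they are can be sketched as a read-only, role-filtered view over several sources. The roles, field names and the `federated_view` function are hypothetical. SSAP's actual policy model and its handling of uncertainty are not described in the source.

```python
from typing import Callable, Iterable

# A source is queried in place; nothing is copied into a central store.
Source = Callable[[], Iterable[dict]]

# Hypothetical role-based policy: the fields each role may see.
VISIBLE_FIELDS = {
    "strategic_analyst": {"product", "market", "date"},
    "investigator": {"product", "market", "date", "vendor"},
}


def federated_view(sources: dict[str, Source], role: str) -> list[dict]:
    """Query each source and expose only the fields allowed for `role`.

    Records are copied and filtered; nothing is migrated or written back,
    and unknown roles receive no data.
    """
    allowed = VISIBLE_FIELDS.get(role, set())
    view = []
    for name, source in sources.items():
        for record in source():
            filtered = {k: v for k, v in record.items() if k in allowed}
            if filtered:
                view.append({"source": name, **filtered})
    return view


legacy_db = [{"product": "pistol", "market": "A", "date": "2021-03-01", "vendor": "dr_white"}]
tool_output = [{"product": "mdma", "market": "B", "date": "2021-03-02",
                "vendor": "x", "buyer_ip": "10.0.0.1"}]
sources = {"legacy_db": lambda: legacy_db, "tool_output": lambda: tool_output}
print(federated_view(sources, "strategic_analyst"))
```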


Graph Anonymization – GA

Tool developed by Thales SIX GTS

It anonymizes networks of interactions, such as networks of relationships between people, blogs or cryptocurrency accounts. It does so either by removing metadata and attributes so that only the (social) link structure remains, or by performing optimal edge deletion and insertion to achieve k-anonymity.
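The first option described above keeps only the link structure. It can be sketched by dropping attributes and replacing identifiers with shuffled integers. The sketch also checks k-degree anonymity, i.e. whether every degree value is shared by at least k nodes, which is one common notion of k-anonymity for graphs. Treating k-anonymization as k-degree anonymity is an assumption, and the optimal edge deletion/insertion step that GA performs is not shown.

```python
import random
from collections import Counter
from typing import Optional


def strip_and_pseudonymize(nodes: dict[str, dict], edges: list[tuple[str, str]],
                           seed: Optional[int] = None):
    """Drop all node attributes and replace identifiers with shuffled integers.

    Only the link structure survives.
    """
    order = list(nodes)
    random.Random(seed).shuffle(order)  # ids must not reveal the original order
    mapping = {node: i for i, node in enumerate(order)}
    return sorted(mapping.values()), [(mapping[a], mapping[b]) for a, b in edges]


def is_k_degree_anonymous(node_ids: list[int], edges: list[tuple[int, int]], k: int) -> bool:
    """True if every degree value is shared by at least k nodes."""
    degree = Counter({n: 0 for n in node_ids})  # keep isolated nodes (degree 0)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return all(count >= k for count in Counter(degree.values()).values())


people = {"alice": {"age": 30}, "bob": {"city": "Lyon"}, "carol": {}, "dave": {}}
ring = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("dave", "alice")]
star = [("alice", "bob"), ("alice", "carol"), ("alice", "dave")]
for name, links in (("ring", ring), ("star", star)):
    ids, anon_edges = strip_and_pseudonymize(people, links, seed=1)
    print(name, is_k_degree_anonymous(ids, anon_edges, k=2))
```

In the star, the hub's unique degree identifies it, which is exactly what edge editing would have to remove.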


Take-up facilitation

Secure Test Lab – STL

Tool developed by Gendarmerie Nationale (GN)

A test environment (platform) enabling LEAs to test data processing components on their premises, in a physically isolated environment (i.e., with no internet connection), on their own data. LEAs can thus assess the statistical performance and/or usefulness of those components on that data. This component targets the phase in which LEAs are considering the acquisition or installation of tools provided by external partners.
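Assessing the statistical performance of a component on local data typically means comparing its output with annotations made by the LEA itself. A minimal sketch follows, assuming exact-match comparison of extracted entity spans. This is an illustrative choice, not a described feature of STL, and the spans are hypothetical.

```python
def evaluate(predicted: set[tuple], gold: set[tuple]) -> dict[str, float]:
    """Exact-match precision, recall and F1 of predicted items against gold items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical entity spans (start, end, label) on one locally annotated text.
gold = {(0, 5, "DRUG"), (10, 15, "WEAPON"), (20, 26, "LOCATION")}
predicted = {(0, 5, "DRUG"), (10, 15, "WEAPON"), (30, 35, "DRUG")}
print(evaluate(predicted, gold))
```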