THREATIQ, the reasoning engine of MOREAL, employs advanced data science and machine learning models, along with computations on graph and meta-graph models, to provide predictive and analytical capabilities for threat detection and risk estimation. The traffic that malware generates when communicating with C&C servers tends to remain statistically consistent, partly because changing protocols and distributing code changes takes considerable effort. Although some APT activities will continue to leverage never-before-seen malware, a significant number of ongoing APT campaigns can still be consistently detected through their network footprint. Worms also behave predictably: they try to spread quickly and communicate with many new hosts, which leaves a footprint of repetitive patterns. C&C domain names and IP addresses are constantly updated, so blocking them alone makes it difficult to maintain a defensive posture, but network patterns are less subject to change. Extrapolating methods and characteristics from known threat communication behaviors yields more generic and aggressive indicators, which extends the range of threats detected in the network. The historical communication profile kept for each important entity (devices, hosts, networks, users) enables anomaly detection and time-series prediction, and allows tracking of changes in temporal and statistical behavior that may indicate persistent attacks and ongoing campaigns. In this way, historical data are used both to build training data sets for the modeling components and for the unsupervised detection of ongoing campaigns. Apart from worms and trojans, THREATIQ detects threats that lend themselves to probabilistic detection, including volumetric DDoS attacks, reflection attacks, brute-force attacks, and suspicious port scanning and service discovery activities.
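The worm behavior described above (rapid contact with many new hosts) can be illustrated with a simple fan-out check. This is a minimal sketch, assuming flow records with a timestamp, source and destination; the `Flow` type, `flag_fanout` function, window length and threshold are hypothetical and not part of THREATIQ. It only tracks destinations seen within one call, whereas a real historical profile would persist across calls.

```python
from collections import defaultdict, deque
from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """Hypothetical minimal flow record."""
    ts: float  # timestamp in seconds
    src: str
    dst: str


def flag_fanout(flows: Iterable[Flow], window: float = 60.0, threshold: int = 20) -> set[str]:
    """Return sources that contacted more than `threshold` previously unseen
    destinations within a sliding window of `window` seconds."""
    seen_peers: dict[str, set[str]] = defaultdict(set)
    first_contacts: dict[str, deque] = defaultdict(deque)
    flagged: set[str] = set()
    for flow in sorted(flows, key=lambda f: f.ts):
        peers = seen_peers[flow.src]
        if flow.dst in peers:
            continue  # repeat contact; worm-like spread targets fresh hosts
        peers.add(flow.dst)
        times = first_contacts[flow.src]
        times.append(flow.ts)
        # Discard first-contact timestamps that fell out of the sliding window.
        while flow.ts - times[0] > window:
            times.popleft()
        if len(times) > threshold:
            flagged.add(flow.src)
    return flagged
```

A host that opens connections to 25 distinct addresses within a minute would be flagged with the defaults, while a host repeatedly talking to the same two servers would not. Counting only first contacts is the design choice that separates spreading behavior from ordinary high-volume traffic.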
THREATIQ is designed with strong learning capabilities: it continuously builds knowledge by adapting to and evaluating intermediate results. The component most important to its self-learning capabilities is historical profiling, which continuously updates the calculated statistical parameters for every important entity in the network. Because they rely on these stored statistical metrics and sequential patterns, the unsupervised and anomaly detection subsystems naturally adapt their behavior to newly acquired information. The supervised models, in turn, are retrained online, taking into account user feedback and customer-related communication profiles. Finally, the algorithms are trained on robust, non-exploitable features, which enables the system to adapt to evolving adversarial environments.
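One common way to keep per-entity statistics continuously updated without storing the full history is Welford's online algorithm for mean and variance. The sketch below assumes that technique together with a z-score test; the source does not say which statistics THREATIQ actually keeps, and `RunningProfile`, `observe`, the 3.0 threshold and the 10-sample warm-up are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningProfile:
    """Per-entity running statistics, updated one observation at a time (Welford)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; undefined for fewer than two observations.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def zscore(self, x: float) -> float:
        s = self.std
        return 0.0 if s == 0.0 else (x - self.mean) / s


profiles: dict[str, RunningProfile] = {}


def observe(entity: str, value: float, z_threshold: float = 3.0, warmup: int = 10) -> bool:
    """Score `value` against the entity's history, then fold it into the profile.
    Returns True when the value deviates strongly from past behavior."""
    profile = profiles.setdefault(entity, RunningProfile())
    anomalous = profile.n >= warmup and abs(profile.zscore(value)) > z_threshold
    profile.update(value)
    return anomalous
```

Each value is scored before being folded into the profile, so the baseline drifts with recent behavior; this mirrors, in a simplified way, how the anomaly detection subsystems adapt to newly acquired information.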
THREATDB is a novel platform built by Crypteia Networks to collect and aggregate reliable data on a variety of threats and vulnerabilities. THREATIQ is directly involved in the automated quality assessment of THREATDB entities and in calculating their Threat Risk Indicators (TRI). The platform introduces mechanisms that permit deeper assessment through statistical analysis based on historical records and a multi-faceted, metadata-based approach.
While most companies use Threat Intelligence (TI) indicator feeds as simple blocklists, THREATIQ uses the feeds to learn about the threats. It builds on metadata-enriched information to train threat intelligence-based models and to calculate risk indicators for several levels of entities, such as IPs, domains and networks. THREATIQ applies analytical methods to filter out data noise (including false positives) and to ensure an appropriate confidence level for decision-making. The assessed trustworthiness and reliability of reported threats is then propagated back to the original sources, thereby guiding the information collection process.
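Propagating reliability back to feeds, and using it to gauge confidence in an indicator, could be sketched as follows. This assumes a Beta-prior estimate of each feed's reliability from confirmed and refuted indicators, and a noisy-OR combination that treats feeds as independent. Both are illustrative choices, not THREATIQ's documented method, and `SourceStats`, `indicator_confidence` and `record_outcome` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SourceStats:
    """Hypothetical confirmation record for one TI feed."""
    confirmed: int = 0  # indicators later corroborated
    refuted: int = 0    # indicators later judged false positives

    def reliability(self, prior_good: float = 1.0, prior_bad: float = 1.0) -> float:
        # Posterior mean of a Beta(prior_good, prior_bad) prior after the observed outcomes.
        total = self.confirmed + self.refuted + prior_good + prior_bad
        return (self.confirmed + prior_good) / total


def indicator_confidence(sources: list[str], stats: dict[str, SourceStats]) -> float:
    """Noisy-OR: the indicator is wrong only if every reporting feed is wrong
    (assumes feeds err independently)."""
    p_all_wrong = 1.0
    for name in sources:
        p_all_wrong *= 1.0 - stats.get(name, SourceStats()).reliability()
    return 1.0 - p_all_wrong


def record_outcome(sources: list[str], stats: dict[str, SourceStats], confirmed: bool) -> None:
    """Feed a verdict on an indicator back to every source that reported it."""
    for name in sources:
        s = stats.setdefault(name, SourceStats())
        if confirmed:
            s.confirmed += 1
        else:
            s.refuted += 1
```

With this scheme, a feed that has 8 confirmed and 2 refuted indicators gets a reliability of 0.75, and indicators reported only by low-reliability feeds fall below a decision threshold instead of being blocked outright.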
THREATIQ extrapolates from the knowledge in existing TI feeds, taking steps similar to those an experienced analyst would take. The dynamic nature of threat-related data is modeled using the collected historical information. The THREATIQ scoring models include detection of dynamically and algorithmically generated domains (DGA), and they keep the false-positive rate low by excluding domain names from content-delivery networks. This detection relies on sophisticated natural language processing techniques that use n-grams, entropy-based models and other extracted features. Ground-truth labeling for classifier training draws on several sources, including customer logs, in a semi-supervised fashion. Non-malicious samples are mostly collected from external sources (including high-reputation sites and OpenDNS random domains).
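The kinds of features mentioned above (character entropy and n-grams) can be illustrated with a small extractor. This is a minimal sketch, not THREATIQ's model: it computes features that a trained classifier would consume but performs no classification itself. The CDN suffix list is a small illustrative sample, and naively taking the label left of the last dot as the registered name is an assumption; a production system would consult the Public Suffix List. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Optional

# Illustrative CDN suffixes only; a real allowlist would be curated and much larger.
CDN_SUFFIXES = ("cloudfront.net", "akamaiedge.net", "akamaihd.net", "fastly.net")


def shannon_entropy(s: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def bigram_model(benign_labels: list[str]) -> dict[str, float]:
    """Relative frequencies of character bigrams in known-benign domain labels."""
    counts = Counter(lbl[i:i + 2] for lbl in benign_labels for i in range(len(lbl) - 1))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}


def is_cdn(domain: str) -> bool:
    d = domain.lower().rstrip(".")
    return any(d == s or d.endswith("." + s) for s in CDN_SUFFIXES)


def dga_features(domain: str, model: dict[str, float], floor: float = 1e-6) -> Optional[dict[str, float]]:
    """Feature vector for a DGA classifier, or None for excluded CDN names."""
    if is_cdn(domain):
        return None  # CDN hostnames look random but are benign; skip them
    labels = domain.lower().rstrip(".").split(".")
    label = labels[-2] if len(labels) >= 2 else labels[0]  # naive registered-name guess
    bigrams = [label[i:i + 2] for i in range(len(label) - 1)]
    avg_logp = sum(math.log(model.get(bg, floor)) for bg in bigrams) / len(bigrams) if bigrams else 0.0
    return {
        "length": float(len(label)),
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / len(label) if label else 0.0,
        "avg_bigram_logp": avg_logp,  # low values indicate unusual character sequences
    }
```

Built from a benign corpus, the bigram model gives pronounceable names such as `google.com` a much higher average log-probability than a random-looking label, while CDN hostnames are removed before scoring.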
Entities are clustered using attributes such as proximity and behavioral similarity. These clusters span different facets, including larger networks (e.g., botnets), ISPs and locations. A risk indicator is calculated for each cluster based on the number and features of its malicious members. Initially, each newly encountered entity inherits the risk indicator of its cluster. A temporal decay scheme with statistically determined parameters, approximating a half-life attenuation function, is applied to TRIs. This reflects the assumption that bad networks are cleaned up, attackers change ISPs and proxies, and botnets may be shut down or relocated. Finally, THREATDB objects that match entries in customer logs and are confirmed by security devices receive higher scores.
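The cluster inheritance and half-life decay described above can be sketched as follows. This assumes a pure exponential half-life, TRI(t) = TRI0 · 0.5^(t / h), as a stand-in for the statistically fitted scheme the text mentions. Cluster risk here is a smoothed share of malicious members, ignoring the member features the text also uses, and the boost for confirmed matches is a hypothetical multiplier capped at 1.0. `Cluster`, `tri_for`, `decayed_tri` and `boost_if_confirmed` are illustrative names.

```python
from dataclasses import dataclass, field


def decayed_tri(initial: float, age_days: float, half_life_days: float) -> float:
    """Half-life attenuation: the indicator halves every `half_life_days`."""
    return initial * 0.5 ** (age_days / half_life_days)


@dataclass
class Cluster:
    """Hypothetical group of related entities (e.g. a network range or ISP)."""
    members: set[str] = field(default_factory=set)
    malicious: set[str] = field(default_factory=set)

    def risk(self, smoothing: float = 1.0) -> float:
        # Smoothed share of known-malicious members; `smoothing` damps tiny clusters.
        return len(self.malicious) / (len(self.members) + smoothing)


def tri_for(entity: str, cluster: Cluster, known_tri: dict[str, float]) -> float:
    """Return the entity's TRI; a newly encountered entity inherits its cluster's risk."""
    if entity not in known_tri:
        known_tri[entity] = cluster.risk()
    return known_tri[entity]


def boost_if_confirmed(tri: float, confirmed: bool, factor: float = 1.5) -> float:
    """Raise the score of objects seen in customer logs and confirmed by security devices."""
    return min(1.0, tri * factor) if confirmed else tri
```

For example, an entity first scored at 0.8 drops to 0.4 after one half-life and to 0.2 after two, so stale indicators fade unless new evidence refreshes them.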