tags, base_model, widget, pipeline_tag, library_name, metrics, model-index
sentence-transformers
sentence-similarity
feature-extraction
dense
generated_from_trainer
dataset_size:2400
loss:TripletLoss
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
source_sentence sentences
id=certification<NUM>@yahoo.com <NUM> Volume [<IP>] '<NUM>' id=c<NUM>a<NUM>ac<NUM> Latency Error to rendering connecting user:chorus_<NUM> [<NUM>a<NUM>bc] '<NUM>ecd<NUM>f' 'estimated<NUM>@example.org' started together [<NUM><NUM><NUM>] user:trying<NUM>@yandex.com present <NUM> id=<NUM>c<NUM>b<NUM>ad
'<NUM>';<NUM><NUM><NUM>;goals;failed;Client;'<IP>';Directory;killing;licence<NUM>@gmail.com;id=<NUM><NUM><NUM>;<NUM><NUM><NUM>;pound;Route;failed;authenticating;<NUM>;picture;through;Header;martin<NUM>@yahoo.com;<IP>;/var/log/unit.jpg;Route;deleted
id=positioning<NUM>@example.com;confidential;'/var/log/offer.awk';'/var/log/contain.dat';id=<NUM>;id=cute<NUM>@protonmail.com;'<NUM>';Packet;'<NUM>';locked;either;with;Transaction;updated;'<NUM>.<NUM>'
id=collaboration<NUM>@example.com <NUM> Volume [<IP>] '<NUM>' id=<NUM>ec<NUM>cbb Latency Error to rendering connecting user:depot_<NUM> [<NUM>eca] '<NUM>e<NUM>a<NUM>' 'prior<NUM>@yahoo.com' started together [<NUM><NUM><NUM>] user:solaris<NUM>@outlook.com present <NUM> id=<NUM>b<NUM>d<NUM>
source_sentence sentences
remote user:robbie_<NUM> <NUM> fundamental id=<NUM> User aborted user:/var/log/with.jpeg through '/var/log/love.md' cycling '<NUM>.<NUM>' private '<NUM>.<NUM>' 'indigenous_<NUM>' Database authenticating <NUM> 'universe<NUM>@protonmail.com' Query <NUM> id=chris_<NUM> names
user:/var/log/silver.doc <NUM> User remote <NUM> names aborted 'smoke<NUM>@duck.com' <NUM> authenticating '<NUM>.<NUM>' private cycling user:alto_<NUM> '<NUM>.<NUM>' id=<NUM> Query fundamental Database '/var/log/wall.mov' through id=jonathan_<NUM> 'identification_<NUM>'
fetching;[<NUM>ff<NUM>e<NUM>];available;HTTP/<NUM>;[<NUM>.<NUM>];POST;user:<NUM>.<NUM>;<NUM><NUM><NUM>;user:<NUM>;<NUM>.<NUM>;Session;System;user:san<NUM>@outlook.com;had;'<NUM>';user:/var/log/rich.tar.gz;Stack
remote user:dvds_<NUM> <NUM> fundamental id=<NUM> User aborted user:/var/log/from.csv through '/var/log/foot.dat' cycling '<NUM>.<NUM>' private '<NUM>.<NUM>' 'proposed_<NUM>' Database authenticating <NUM> 'exceptional<NUM>@protonmail.com' Query <NUM> id=website_<NUM> names
source_sentence sentences
projection;local;insecure;Thread;'<IP>';<IP>;[<NUM>];with;Interface;Buffer;updated;'/var/log/write.bmp';user:clearly_<NUM>;active;afford;id=<NUM>ab<NUM>;Latency;[strain<NUM>@live.com];stupid<NUM>@gmail.com;Key;created
projection;local;insecure;Thread;'<IP>';<IP>;[<NUM>];with;Interface;Buffer;updated;'/var/log/shoe.jar';user:mirrors_<NUM>;active;afford;id=bac<NUM>cfa;Latency;[associations<NUM>@yandex.com];laos<NUM>@example.org;Key;created
'commercial_<NUM>'|'/var/log/piece.tar.gz'|Table|user:catering_<NUM>|user:<NUM>|authorizing|'<IP>'|oxygen|URI|started|Component|Packet|<NUM><NUM><NUM>|Interface|'/var/log/made.exe'|GET|user:resist<NUM>@yahoo.com|Payload|[<NUM>]
Port|user:pdf_<NUM>|<NUM>|user:<NUM>.<NUM>|[<NUM>f<NUM>c<NUM>dc]|'adb<NUM>e<NUM>'|implementing|user:<NUM>cfb<NUM>e<NUM>a|<NUM>.<NUM>|discussed|<NUM>|Memory|id=/var/log/dance.m<NUM>u|<NUM>.<NUM>|ceo|remote|'<NUM>.<NUM>'|user:<NUM>a<NUM>|JS
source_sentence sentences
updated|national|rendering|comply|user:<NUM>|binding|Gateway|<IP>|resolving|responsible|[<NUM>]|'opportunities<NUM>@duck.com'|opens_<NUM>|JSON|retrying|Server|Error|'<NUM>ec<NUM>ca'|berkeley|id=<NUM>.<NUM>|System|torture|Job|id=f<NUM>d
connecting disconnected comes<NUM>@gmail.com unavailable Directory [/var/log/early.m<NUM>v] with memorabilia active Payload to Index 'watershed_<NUM>' validated created <NUM>ad<NUM>
origin<NUM>@yandex.com;'peaceful_<NUM>';user:<NUM>;URL;its;Gateway;Component;[<NUM>];[<NUM><NUM><NUM>];insecure;tune;'zero_<NUM>';Heap;HTTP/<NUM>;id=queue_<NUM>
updated|national|rendering|comply|user:<NUM>|binding|Gateway|<IP>|resolving|responsible|[<NUM>]|'tools<NUM>@duck.com'|jury_<NUM>|JSON|retrying|Server|Error|'e<NUM>a<NUM>b<NUM>ce'|berkeley|id=<NUM>.<NUM>|System|torture|Job|id=bb<NUM>bc
source_sentence sentences
authenticating YAML PATCH authorizing id=/var/log/seem.tar.xz [<NUM>] rendering 'pursue_<NUM>' [<NUM><NUM><NUM>] fresh online authenticating GET Heap CRITICAL Module id=bother_<NUM>
authenticating YAML PATCH authorizing id=/var/log/born.log [<NUM>] rendering 'school_<NUM>' [<NUM><NUM><NUM>] fresh online authenticating GET Heap CRITICAL Module id=brochure_<NUM>
user:<IP>;completed;<NUM>;id=/var/log/whose.jpg;user:<NUM>.<NUM>;resolving;allowed;Commit;Index;Daemon;building;length;hall;[/var/log/segment.doc];with
Heap;id=dim_<NUM>;[except<NUM>@gmail.com];dropped;determination;via;File;created;id=<NUM>;unavailable;id=/var/log/page.tar.xz;rendering;<NUM>b<NUM>ad<NUM>;id=/var/log/want.tar.gz;Kernel;JS;secure;HTTP/<NUM>;user:a<NUM>dd<NUM>d;user:<NUM><NUM><NUM>;resolving;Header
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics: cosine_accuracy
model-index:
  name: SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  results:
    task:
      type: triplet
      name: Triplet
    dataset:
      name: structural val
      type: structural-val
    metrics:
      type: cosine_accuracy
      value: 0.996666669845581
      name: Cosine Accuracy

SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

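Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 384 dimensions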

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
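
The Pooling module above enables only pooling_mode_mean_tokens, so a sentence embedding is the average of its token embeddings. The following is a minimal sketch of that idea in plain Python, not the library's implementation. It assumes padded positions are excluded through an attention mask, uses toy 3-dimensional vectors in place of the real 384-dimensional ones, and the name `mean_pool` is hypothetical.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy input: two real tokens and one padded position (assumed values).
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

The padded third vector does not affect the result because its mask entry is 0.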

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    "authenticating YAML PATCH authorizing id=/var/log/seem.tar.xz [<NUM>] rendering 'pursue_<NUM>' [<NUM><NUM><NUM>] fresh online authenticating GET Heap CRITICAL Module id=bother_<NUM>",
    "authenticating YAML PATCH authorizing id=/var/log/born.log [<NUM>] rendering 'school_<NUM>' [<NUM><NUM><NUM>] fresh online authenticating GET Heap CRITICAL Module id=brochure_<NUM>",
    'Heap;id=dim_<NUM>;[except<NUM>@gmail.com];dropped;determination;via;File;created;id=<NUM>;unavailable;id=/var/log/page.tar.xz;rendering;<NUM>b<NUM>ad<NUM>;id=/var/log/want.tar.gz;Kernel;JS;secure;HTTP/<NUM>;user:a<NUM>dd<NUM>d;user:<NUM><NUM><NUM>;resolving;Header',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.9960, -0.1292],
#         [ 0.9960,  1.0000, -0.1269],
#         [-0.1292, -0.1269,  1.0000]])

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.9967
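
As a rough illustration of what cosine_accuracy measures, the sketch below counts a triplet as correct when the anchor is more similar to the positive than to the negative under cosine similarity, then reports the fraction of correct triplets. This is an assumption about the metric's definition written in plain Python, not the evaluator's code; the function names and the toy 2-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)


toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # positive is closer: correct
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),  # negative is closer: wrong
]
accuracy = cosine_accuracy(toy_triplets)
print(accuracy)
```

Under this reading, the reported 0.9967 would mean that nearly all validation triplets place the positive closer to the anchor than the negative.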

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,400 training samples
  • Columns: sentence_0, sentence_1, and sentence_2
  • Approximate statistics based on the first 1000 samples:
              sentence_0     sentence_1     sentence_2
    type      string         string         string
    min       31 tokens      33 tokens      28 tokens
    mean      81.66 tokens   81.55 tokens   79.74 tokens
    max       128 tokens     128 tokens     128 tokens
  • Samples:
    Sample 1
      sentence_0: ERROR;[river_];;bit;.;watches;Table;user:.;/var/log/art.zip;/var/log/neck.docx;id=;.;schedules;watson_;DELETE;user:.;Session
      sentence_1: ERROR;[taxation_];;bit;.;watches;Table;user:.;/var/log/hunt.pps;/var/log/radio.z;id=;.;schedules;tab_;DELETE;user:.;Session
      sentence_2: [experiments_] id= watches DELETE Table user:. . . need_ /var/log/list.mov user:. schedules Session /var/log/pull.pptx bit ERROR
    Sample 2
      sentence_0: divided;defence;binding;user:helmet@outlook.com;hours;user:;parsing;rocky;API;Gateway;started;by;flexible;by;INFO;Interface;Memory;teens;JS;fetching;deleted
      sentence_1: divided;defence;binding;user:night@protonmail.com;hours;user:;parsing;rocky;API;Gateway;started;by;flexible;by;INFO;Interface;Memory;teens;JS;fetching;deleted
      sentence_2: by;binding;Interface;user:;divided;INFO;parsing;API;Memory;teens;user:cells@example.org;started;Gateway;by;deleted;JS;defence;hours;fetching;flexible;rocky
    Sample 3
      sentence_0: user:ced|queued||private|Session|blocked|at|user:bba|.|Rollback|Config||Config|user:margin@example.com|spawning||inactive
      sentence_1: user:ae|queued||private|Session|blocked|at|user:dbce|.|Rollback|Config||Config|user:travelers@yandex.com|spawning||inactive
      sentence_2: ;spawning;inactive;;user:dce;queued;Config;;user:promote@protonmail.com;Config;private;user:fad;at;Session;.;blocked;Rollback
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 0.5
    }
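
To make the loss parameters above concrete, here is a minimal sketch of a triplet loss with cosine distance and a margin of 0.5, using the common formulation max(0, d(anchor, positive) - d(anchor, negative) + margin) with d = 1 - cosine similarity. It is a plain-Python illustration under that assumption, not the training code; the function names and toy vectors are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Zero once the negative is at least `margin` farther away than the positive."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)


# Well-separated triplet: positive identical to anchor, negative orthogonal.
easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
# Inverted triplet: positive orthogonal to anchor, negative identical.
hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(easy, hard)
```

With this formulation, a triplet stops contributing to the loss as soon as the anchor-negative distance exceeds the anchor-positive distance by at least the margin.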
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
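
The list above combines learning_rate: 5e-05, lr_scheduler_type: linear, and warmup_steps: 0. The sketch below shows how such a schedule would decay, assuming the usual linear-decay-to-zero rule. The total of 114 steps is an assumption (38 steps per epoch, as reported in the training logs, times num_train_epochs: 3), and `linear_lr` is a hypothetical helper, not a library function.

```python
def linear_lr(step, base_lr=5e-05, total_steps=114, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Learning rate at the start, the midpoint, and the final step (assumed total).
schedule = [linear_lr(step) for step in (0, 57, 114)]
print(schedule)
```

Under these assumptions the rate starts at the full 5e-05, is halved at the midpoint, and reaches zero at the last step.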

Training Logs

Epoch Step structural-val_cosine_accuracy
1.0 38 0.9950
2.0 76 0.9967
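
The step counts in the table are consistent with the dataset size and batch size given earlier. A short check, assuming a single device, gradient_accumulation_steps: 1, and a final partial batch that is kept (dataloader_drop_last: False):

```python
import math

dataset_size = 2400  # training samples
batch_size = 64      # per_device_train_batch_size

# The last batch is smaller but still counts as a step.
steps_per_epoch = math.ceil(dataset_size / batch_size)
steps_after_two_epochs = 2 * steps_per_epoch
print(steps_per_epoch, steps_after_two_epochs)
```

This gives 38 steps per epoch and 76 steps after two epochs, matching the Step column.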

Framework Versions

  • Python: 3.12.2
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.12.0
  • Datasets: 4.4.1
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}