scitex_core.repro
Reproducibility utilities for scientific computing.
This module provides tools for ensuring reproducible scientific experiments: unique ID generation (gen_id), timestamp generation (gen_timestamp), array hashing for verification (hash_array), and random state management across libraries (RandomStateManager).
- scitex_core.repro.gen_id(time_format='%YY-%mM-%dD-%Hh%Mm%Ss', N=8)
Generate a unique identifier with timestamp and random characters.
Creates a unique ID by combining a formatted timestamp with random alphanumeric characters. Useful for creating unique experiment IDs, run identifiers, or temporary file names.
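- Parameters:
time_format (str, optional) – strftime format string for the timestamp part (default: '%YY-%mM-%dD-%Hh%Mm%Ss')
N (int, optional) – Number of random characters after the underscore (default: 8)
- Returns:
Unique identifier in format “{timestamp}_{random_chars}”
- Return type:
str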
Examples
>>> from scitex_core.repro import gen_id
>>> id1 = gen_id()
>>> print(id1)
2025Y-05M-31D-12h30m45s_a3Bc9xY2

>>> id2 = gen_id(time_format="%Y%m%d", N=4)
>>> print(id2)
20250531_xY9a

>>> # For experiment tracking
>>> exp_id = gen_id()
>>> save_path = f"results/experiment_{exp_id}.pkl"
Notes
Random component uses alphanumeric characters (a-z, A-Z, 0-9)
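IDs generated within the same second almost always differ because of the random component, although a collision is possible in principle
With the default time_format, IDs contain only letters, digits, hyphens, and one underscore, so they are safe to use in file names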
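The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the library's implementation: `sketch_gen_id`, its `n` and `now` parameters, and `ALPHABET` are hypothetical names introduced here for illustration.

```python
import random
import string
from datetime import datetime

ALPHABET = string.ascii_letters + string.digits  # a-z, A-Z, 0-9


def sketch_gen_id(time_format="%YY-%mM-%dD-%Hh%Mm%Ss", n=8, now=None):
    """Hypothetical stand-in for gen_id: '{timestamp}_{random_chars}'."""
    now = now or datetime.now()
    # Letters that follow a strftime directive (the "Y" in "%YY") are literal text.
    stamp = now.strftime(time_format)
    suffix = "".join(random.choices(ALPHABET, k=n))
    return f"{stamp}_{suffix}"


# A fixed datetime makes the timestamp part predictable.
example = sketch_gen_id(now=datetime(2025, 5, 31, 12, 30, 45))
stamp, suffix = example.rsplit("_", 1)
```

Passing `now` explicitly is only a convenience for testing the sketch; the documented function takes no such argument.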
- scitex_core.repro.gen_ID(time_format='%YY-%mM-%dD-%Hh%Mm%Ss', N=8)
Alias of gen_id(); see that entry for the parameters, return value, examples, and notes.
- scitex_core.repro.gen_timestamp()
Generate a timestamp string for file naming.
Returns a timestamp in the format YYYY-MMDD-HHMM, suitable for creating unique filenames or version identifiers.
- Returns:
Timestamp string in format “YYYY-MMDD-HHMM”
- Return type:
str
Examples
>>> from scitex_core.repro import gen_timestamp
>>> timestamp = gen_timestamp()
>>> print(timestamp)
2025-0531-1230

>>> filename = f"experiment_{gen_timestamp()}.csv"
>>> print(filename)
experiment_2025-0531-1230.csv
Notes
Format: YYYY-MMDD-HHMM (e.g., “2025-0531-1230”)
Month and day are zero-padded to 2 digits
Hour and minute are zero-padded to 2 digits
Suitable for filesystem use (no special characters except hyphen)
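The YYYY-MMDD-HHMM layout maps directly onto a strftime format string. The sketch below assumes that mapping; `sketch_timestamp` and its `now` parameter are hypothetical and exist only so the output can be checked against a fixed date.

```python
from datetime import datetime


def sketch_timestamp(now=None):
    """Hypothetical stand-in for gen_timestamp: 'YYYY-MMDD-HHMM'."""
    now = now or datetime.now()
    # %m, %d, %H and %M are all zero-padded to two digits.
    return now.strftime("%Y-%m%d-%H%M")


fixed = sketch_timestamp(datetime(2025, 5, 31, 12, 30))
filename = f"experiment_{fixed}.csv"
```

Because the format stops at minutes, two calls within the same minute return the same string; when uniqueness matters, an identifier with a random component such as gen_id() is the safer choice.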
- scitex_core.repro.timestamp()
Alias of gen_timestamp(); see that entry for the return value, examples, and notes.
- scitex_core.repro.hash_array(array_data)
Generate hash for array data.
Creates a deterministic hash for numpy arrays, useful for verifying data integrity and reproducibility.
- Parameters:
array_data (np.ndarray) – Array to hash
- Returns:
16-character hash string
- Return type:
str
Examples
>>> import numpy as np
>>> from scitex_core.repro import hash_array
>>> data = np.array([1, 2, 3, 4, 5])
>>> hash1 = hash_array(data)
>>> hash2 = hash_array(data)
>>> hash1 == hash2
True

>>> # Different data produces different hash
>>> data2 = np.array([1, 2, 3, 4, 6])
>>> hash3 = hash_array(data2)
>>> hash1 != hash3
True
Notes
Uses SHA-256 hashing algorithm
Returns first 16 characters of hex digest
Same array will always produce same hash
Useful for detecting changes in data
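One way to obtain a 16-character SHA-256 prefix for an array is sketched below. It is an illustration under assumptions, not the library's code: `sketch_hash_array` is a hypothetical name, and mixing the dtype and shape into the digest is a design choice made here, since the notes above do not say whether hash_array does so.

```python
import hashlib

import numpy as np


def sketch_hash_array(array_data):
    """Hypothetical stand-in for hash_array: first 16 hex chars of SHA-256."""
    arr = np.ascontiguousarray(array_data)
    digest = hashlib.sha256()
    # Assumption: include dtype and shape so that arrays sharing the same raw
    # bytes but differing in layout or type do not hash identically.
    digest.update(str(arr.dtype).encode())
    digest.update(str(arr.shape).encode())
    digest.update(arr.tobytes())
    return digest.hexdigest()[:16]


data = np.array([1, 2, 3, 4, 5])
same = sketch_hash_array(data) == sketch_hash_array(data.copy())
changed = sketch_hash_array(data) != sketch_hash_array(np.array([1, 2, 3, 4, 6]))
reshaped = sketch_hash_array(np.zeros(4)) != sketch_hash_array(np.zeros((2, 2)))
```

Hashing only `arr.tobytes()` would treat `np.zeros(4)` and `np.zeros((2, 2))` as identical, which is why the sketch adds the shape and dtype.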
- class scitex_core.repro.RandomStateManager(seed=42, verbose=False)
Bases: object
Simple, robust random state manager for scientific computing.
Provides centralized management of random number generators with deterministic seeding across multiple ML/scientific libraries.
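- Parameters:
seed (int, optional) – Base seed for all libraries and named generators (default: 42)
verbose (bool, optional) – Whether to print status messages (default: False)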
Examples
>>> from scitex_core.repro import RandomStateManager
>>>
>>> # Direct usage
>>> rng_manager = RandomStateManager(seed=42)
>>> gen = rng_manager("data")
>>> data = gen.random(100)
>>>
>>> # Verify reproducibility
>>> rng_manager.verify(data, "my_data")
>>>
>>> # Named generators for different purposes
>>> data_gen = rng_manager("data")
>>> model_gen = rng_manager("model")
>>> augment_gen = rng_manager("augment")
Notes
Automatically detects and seeds available libraries (numpy, torch, tf, jax)
Creates independent named generators for different experiment components
Verification cache stored in ~/.scitex/rng/
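Seeding whichever libraries happen to be installed usually comes down to guarded imports. The sketch below shows that pattern for the standard library, NumPy, and PyTorch; `sketch_seed_all` is a hypothetical helper, and how RandomStateManager actually performs detection is not stated in this page.

```python
import random


def sketch_seed_all(seed):
    """Seed every listed library that can be imported; return their names."""
    random.seed(seed)
    seeded = ["random"]
    try:
        import numpy as np

        np.random.seed(seed)  # legacy global NumPy state
        seeded.append("numpy")
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)
        seeded.append("torch")
    except ImportError:
        pass
    return seeded


seeded = sketch_seed_all(42)
first = random.random()
sketch_seed_all(42)
second = random.random()  # same value as `first` after reseeding
```

TensorFlow could be handled the same way with `tf.random.set_seed(seed)`. JAX has no global seed; it relies on explicit keys, so a manager can only hand out keys derived from the seed.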
- get_np_generator(name)
Get or create a named NumPy random generator.
- Parameters:
name (str) – Generator name (e.g., “data”, “model”, “augment”)
- Returns:
Independent NumPy random generator
- Return type:
numpy.random.Generator
Examples
>>> rng_manager = RandomStateManager(42)
>>> gen = rng_manager.get_np_generator("data")
>>> values = gen.random(100)
>>> perm = gen.permutation(100)
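A named generator can be made deterministic by deriving its seed from both the base seed and the name. The following sketch shows one possible derivation; `sketch_named_generator` is hypothetical, and the scheme the library uses is not documented here.

```python
import hashlib

import numpy as np


def sketch_named_generator(seed, name):
    """Hypothetical: build a Generator whose stream depends on (seed, name)."""
    # Use a stable digest; Python's built-in hash() of a str changes between runs.
    name_key = int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")
    # default_rng accepts a sequence of non-negative ints as seed material.
    return np.random.default_rng([seed, name_key])


a = sketch_named_generator(42, "data").random(3)
b = sketch_named_generator(42, "data").random(3)
c = sketch_named_generator(42, "model").random(3)
```

With this scheme, drawing extra numbers from the "data" stream never shifts the "model" stream, which is the practical benefit of keeping one generator per experiment component.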
- __call__(name, verbose=None)
Get or create a named NumPy random generator.
This is a convenience wrapper for get_np_generator().
- Parameters:
- Returns:
NumPy random generator with deterministic seed
- Return type:
- verify(obj, name=None, verbose=True)
Verify object matches cached hash (detects broken reproducibility).
First call: caches the object’s hash
Later calls: verifies that the object matches the cached hash
- Parameters:
obj (Any) – Object to verify (array, tensor, data, model weights, etc.). Supports numpy arrays, torch tensors, tf tensors, jax arrays, lists, dicts, pandas dataframes, and basic types.
name (str, optional) – Cache name. Auto-generated from caller location if not provided.
verbose (bool, optional) – Print verification results (default: True)
- Returns:
True if matches cache (or first call), False if different
- Return type:
bool
- Raises:
ValueError – If verification fails (object doesn’t match cached hash)
Examples
>>> data = generate_data()
>>> rng_manager.verify(data, "train_data")  # First run: caches
>>> # Next run:
>>> rng_manager.verify(data, "train_data")  # Verifies match
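The cache-then-compare behaviour can be illustrated with a small file-based check. This is a sketch under assumptions: `sketch_verify`, the `.hash` file extension, and the use of pre-serialized bytes are all hypothetical, and the sketch returns False on a mismatch rather than raising.

```python
import hashlib
import tempfile
from pathlib import Path


def sketch_verify(payload: bytes, name: str, cache_dir: Path) -> bool:
    """Hypothetical cache-then-compare check on an object's serialized bytes."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{name}.hash"
    current = hashlib.sha256(payload).hexdigest()
    if not cache_file.exists():
        cache_file.write_text(current)  # first call: record the hash
        return True
    return cache_file.read_text() == current  # later calls: compare


# A temporary directory stands in for the real cache location.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "rng"
    first = sketch_verify(b"train-data-v1", "train_data", cache)
    repeat = sketch_verify(b"train-data-v1", "train_data", cache)
    drifted = sketch_verify(b"train-data-v2", "train_data", cache)
```

Here `first` and `repeat` are True and `drifted` is False, mirroring the first-run and later-run cases in the example above.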
- _compute_hash(obj)
Compute hash for various object types.
Supports NumPy arrays, PyTorch tensors, TensorFlow tensors, JAX arrays, Pandas DataFrames/Series, lists, tuples, dicts, and basic types (int, float, str, bool).
- Return type:
- checkpoint(name='checkpoint')
Save current state of all generators.
- Parameters:
name (str, optional) – Checkpoint name (default: “checkpoint”)
- Returns:
Path to checkpoint file
- Return type:
Path
- restore(checkpoint)
Restore from checkpoint.
- Parameters:
checkpoint (str or Path) – Path to checkpoint file
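Checkpointing and restoring generator state follows a save-then-load pattern. The sketch below shows it for Python's built-in random module only; `sketch_checkpoint` and `sketch_restore` are hypothetical names, and the file format used by checkpoint() is not specified in this page.

```python
import pickle
import random
import tempfile
from pathlib import Path


def sketch_checkpoint(path: Path) -> Path:
    """Hypothetical: save the stdlib generator state and return the file path."""
    path.write_bytes(pickle.dumps(random.getstate()))
    return path


def sketch_restore(path: Path) -> None:
    """Hypothetical: load a state written by sketch_checkpoint."""
    random.setstate(pickle.loads(path.read_bytes()))


with tempfile.TemporaryDirectory() as tmp:
    random.seed(42)
    ckpt = sketch_checkpoint(Path(tmp) / "checkpoint.pkl")
    before = [random.random() for _ in range(3)]
    sketch_restore(ckpt)
    after = [random.random() for _ in range(3)]  # replays the same draws
```

A NumPy Generator exposes comparable state through `rng.bit_generator.state`, which can be read and assigned in the same way.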
- temporary_seed(seed)
Context manager for temporary seed change.
- Parameters:
seed (int) – Temporary seed value
Examples
>>> import numpy as np
>>> rng_manager = RandomStateManager(42)
>>> with rng_manager.temporary_seed(123):
...     data = np.random.random(10)
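A temporary seed is typically implemented by saving the current state, reseeding, and restoring the saved state on exit. The sketch below does this for the standard library's random module; `sketch_temporary_seed` is a hypothetical name and covers only that one library.

```python
import random
from contextlib import contextmanager


@contextmanager
def sketch_temporary_seed(seed):
    """Hypothetical: reseed inside the block, then restore the outer state."""
    saved = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(saved)  # restored even if the block raises


random.seed(42)
expected_next = random.random()  # what seed 42 yields first

random.seed(42)
with sketch_temporary_seed(123):
    inside = random.random()  # drawn from the temporary stream
resumed = random.random()  # outer stream continues as if untouched
```

The `finally` clause matters: without it, an exception inside the block would leave the temporary seed in effect for the rest of the program.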
- get_sklearn_random_state(name)
Get a random state for scikit-learn.
Scikit-learn uses integers for random_state parameter.
Examples
>>> rng_manager = RandomStateManager(42)
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test = train_test_split(
...     X, test_size=0.2,
...     random_state=rng_manager.get_sklearn_random_state("split")
... )
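Since scikit-learn expects an integer, a named random state can be reduced to a deterministic integer derived from the base seed and the name. The sketch below shows one such mapping; `sketch_int_seed` is hypothetical and is not taken from the library.

```python
import hashlib


def sketch_int_seed(seed: int, name: str) -> int:
    """Hypothetical: map (seed, name) to an int in [0, 2**32 - 1]."""
    digest = hashlib.sha256(f"{seed}:{name}".encode()).digest()
    # Four bytes give a value that fits the 32-bit range accepted as random_state.
    return int.from_bytes(digest[:4], "big")


split_seed = sketch_int_seed(42, "split")
repeat_seed = sketch_int_seed(42, "split")  # identical on every run
```

The result can be passed wherever scikit-learn accepts `random_state`, and it stays stable across interpreter sessions because it does not depend on Python's per-process string hashing.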
- get_torch_generator(name)
Get or create a named PyTorch generator.
- Parameters:
name (str) – Generator name
- Returns:
PyTorch generator with deterministic seed
- Return type:
torch.Generator
Examples
>>> import torch
>>> rng_manager = RandomStateManager(42)
>>> gen = rng_manager.get_torch_generator("model")
>>> torch.randn(5, 5, generator=gen)
- clear_cache(patterns=None)
Clear verification cache files.
- Parameters:
patterns (str or list of str, optional) – Specific cache patterns to clear. Can be a single name (“my_data”), a list of names ([“data1”, “data2”]), a glob pattern (“experiment_*”), or None to clear all cache files.
- Returns:
Number of cache files removed
- Return type:
Examples
>>> rng_manager = RandomStateManager(42)
>>> rng_manager.clear_cache()  # Clear all
>>> rng_manager.clear_cache("old_data")  # Clear specific
>>> rng_manager.clear_cache(["test1", "test2"])  # Clear multiple
>>> rng_manager.clear_cache("experiment_*")  # Clear pattern
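The four accepted forms of `patterns` can all be reduced to a list of glob patterns. The sketch below illustrates that normalization; `sketch_clear_cache`, its `cache_dir` argument, and the `.hash` file extension are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def sketch_clear_cache(cache_dir: Path, patterns=None) -> int:
    """Hypothetical: delete cache files by name or glob; return the count."""
    if patterns is None:
        patterns = ["*"]  # no argument: match every cache file
    elif isinstance(patterns, str):
        patterns = [patterns]  # single name or single glob
    removed = 0
    for pattern in patterns:
        # Materialize matches first so deletion does not disturb iteration.
        for path in list(cache_dir.glob(f"{pattern}.hash")):
            path.unlink()
            removed += 1
    return removed


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    names = ["old_data", "test1", "test2", "experiment_a", "experiment_b", "keep"]
    for name in names:
        (cache / f"{name}.hash").write_text("0")
    n_single = sketch_clear_cache(cache, "old_data")
    n_list = sketch_clear_cache(cache, ["test1", "test2"])
    n_glob = sketch_clear_cache(cache, "experiment_*")
    n_rest = sketch_clear_cache(cache)
```

A plain name works as a glob pattern that matches exactly one file, so names and patterns need no separate code paths.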
- scitex_core.repro.get(verbose=False)
Get or create the global RandomStateManager instance.
- Parameters:
verbose (bool, optional) – Whether to print status messages (default: False)
- Returns:
Global instance
- Return type:
Examples
>>> from scitex_core.repro import get
>>> rng_manager = get()
>>> data = rng_manager("data").random(100)
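"Get or create" describes a lazily initialized module-level singleton. The sketch below shows that pattern with a placeholder class; `SketchManager`, `sketch_get`, and `_GLOBAL_MANAGER` are hypothetical names, and the real get() may differ in detail.

```python
_GLOBAL_MANAGER = None


class SketchManager:
    """Hypothetical placeholder for RandomStateManager."""

    def __init__(self, seed=42, verbose=False):
        self.seed = seed
        self.verbose = verbose


def sketch_get(verbose=False):
    """Return the shared manager, creating it on first use."""
    global _GLOBAL_MANAGER
    if _GLOBAL_MANAGER is None:
        _GLOBAL_MANAGER = SketchManager(verbose=verbose)
    return _GLOBAL_MANAGER


first = sketch_get()
second = sketch_get(verbose=True)  # returns the existing instance unchanged
```

In this sketch the `verbose` argument only takes effect on the call that creates the instance; later calls return the same object regardless of the value passed.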