
[BUG] precision cast in _process_input_tensor breaks integer target tensors (cross_entropy / nll_loss / kl_div) #151

@moria97

Description


Summary

src/kernelbench/eval.py::_process_input_tensor unconditionally casts every input tensor to the configured precision (default torch.float32). For tasks whose get_inputs() returns integer class indices — e.g. torch.randint(0, num_classes, (batch_size,)) for CrossEntropyLoss / NLLLoss / KLDivLoss problems — this turns the int64 target tensor into float32, which breaks PyTorch's dispatch for ops that require Long targets.

The user-visible failure is a misleading error from PyTorch:

NotImplementedError: "nll_loss_forward_reduce_cuda_kernel_2d_index"
  not implemented for 'Float'

This looks like a CUDA arch / dtype coverage gap (and we initially diagnosed it as a Blackwell PyTorch wheel issue), but the root cause is purely the cast in this helper.
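
The dispatch failure is easy to demonstrate in isolation (a minimal sketch, independent of the harness; the shapes here are arbitrary):

import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))               # int64 class indices
F.cross_entropy(logits, targets)                   # fine

# The same call after the cast the harness performs fails:
# NotImplementedError on CUDA, RuntimeError on CPU (exact message differs).
F.cross_entropy(logits, targets.to(torch.float32))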

Affected code

https://github.com/ScalingIntelligence/KernelBench/blob/main/src/kernelbench/eval.py#L370-L391

def _process_input_tensor(input, device, backend="cuda", precision=torch.float32):
    ...
    # sometimes things like init inputs are floats (like in the case of labels / targets, classification losses, etc.)
    if not isinstance(input, torch.Tensor):
        return input

    # cast to the desired percision dtype for activations
    input_tensor = input.to(dtype=precision)   # ← casts int64 → float

    return input_tensor.to(device=device)

The comment on line 383 explicitly recognizes that "labels / targets / classification losses" are a special case, but the implementation does not actually exempt them — every tensor gets cast.

When was this introduced

The integer-dtype protection existed in earlier versions and was removed during the precision-support refactor in #80 ("Precision Support + TileLang Integration", merged 2025-11-05). Issue #79 was the original feature request; the regression was an unintended side-effect.

Reproduction

Environment:

  • PyTorch 2.9.0+cu128
  • CUDA 12.8
  • GPU: any (we hit it on RTX 5090, but this is a dispatch issue, not arch-specific)

Command:

python3 scripts/generate_and_eval_single_sample.py \
    dataset_src=local level=1 problem_id=95 \
    server_type=openai model_name=<any-model> \
    eval_mode=local backend=cuda precision=fp32 \
    gpu_arch="['Ada']"

Without any LLM-generated kernel involved, the reference implementation in level1/95_CrossEntropyLoss.py raises NotImplementedError because the target tensor was cast to Float by _process_input_tensor.
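
The cast can also be observed directly, without running the full pipeline (a hypothetical sketch; it assumes _process_input_tensor is importable from kernelbench.eval, matching the file path above):

import torch
from kernelbench.eval import _process_input_tensor

targets = torch.randint(0, 10, (4096,))    # int64, as get_inputs() returns for problem 95
processed = _process_input_tensor(targets, device="cuda")
print(processed.dtype)                     # torch.float32 on current main — the bug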

Affected level_1 problems we've seen fail with this:

  • 95_CrossEntropyLoss
  • 98_KLDivLoss
  • (any other problem whose get_inputs() returns integer indices)

Impact

  • Multiple level_1 problems cannot be evaluated on current main, regardless of which model is being benchmarked.
  • The error path produces compiled=False in the metrics, which falsely attributes the failure to the LLM-generated kernel rather than to the eval harness.
  • For RL / agent loops that consume these metrics, this introduces noise and a potentially misdirected reward signal.

Proposed fix

Skip the precision cast for non-floating-point tensors:

def _process_input_tensor(input, device, backend="cuda", precision=torch.float32):
    if not isinstance(input, torch.Tensor):
        return input
    if not input.is_floating_point():
        return input.to(device=device)        # int / bool: only move
    return input.to(dtype=precision).to(device=device)

A PR implementing this fix is open at #.
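
For regression coverage, a small test along these lines could accompany it (a hypothetical sketch; it assumes _process_input_tensor is importable from kernelbench.eval and uses CPU so no GPU is needed):

import torch
from kernelbench.eval import _process_input_tensor

def test_integer_inputs_keep_their_dtype():
    targets = torch.randint(0, 10, (16,))
    out = _process_input_tensor(targets, device="cpu", precision=torch.float16)
    assert out.dtype == torch.int64        # dtype preserved; tensor only moved

def test_float_inputs_are_cast():
    acts = torch.randn(16, 8)
    out = _process_input_tensor(acts, device="cpu", precision=torch.float16)
    assert out.dtype == torch.float16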
