..
 # Copyright (c) 2025, Arm Limited.
 #
 # SPDX-License-Identifier: MIT

############################
Kernel Regression Bisection
############################

Fastpath automates kernel regression bisection: it identifies the kernel commit that
introduced a performance regression by combining ``git bisect`` with Fastpath's
benchmarking and analysis capabilities.

********
Overview
********

The bisection process:

1. **Prepare** - Identify the good and bad kernels and create a bisection context (``fastpath bisect start``)
2. **Execute** - Git bisect builds, tests, and evaluates each candidate commit (``git bisect run``)
3. **Analyze** - Git bisect reports the first bad commit; ``git bisect log`` shows the steps taken

Prerequisites
=============

* A result store with benchmark results from both:

  * **Good swprofile** - Known working version with acceptable performance
  * **Bad swprofile** - Version showing the regression

* Both swprofiles must have been tested with:

  * The same SUT (System Under Test)
  * The same benchmark
  * Identical configuration (cmdline, sysctl, bootscript)

  The kernel git SHA should be the only difference between the two swprofiles.

* SSH access to the SUT for running tests
* Kernel source repository for building commits

*****************
Prepare Context
*****************

Create a bisection context file with ``fastpath bisect start``:

.. code-block:: shell

  # Basic command structure
  fastpath bisect start \
    --host <hostname> --user <user> --port <port> --keyfile <keyfile> \
    --sut <sut-id> \
    --good-swprofile <good-id> --bad-swprofile <bad-id> \
    --benchmark <suite/name> --resultclass <metric> \
    --resultstore <url> --context <output.yaml>

Example:

.. code-block:: shell

  fastpath bisect start \
    --host test-server --user root --keyfile ~/.ssh/id_rsa \
    --sut "Ampere Altra Max" \
    --good-swprofile "6.8.0-baseline" --bad-swprofile "6.9.0-regression" \
    --benchmark "sysbench/thread" --resultclass "sysbenchthread-110" \
    --context ./bisection_context.yaml

The command validates that baseline results exist, that the two profiles match (except
for ``kernel_git_sha``), and that the SUT is single-node. It then creates a temporary
result store containing copies of the baselines and generates ``bisection_context.yaml``.
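
The profile check can be pictured as a field-by-field comparison that ignores
``kernel_git_sha``. The Python sketch below is illustrative only and is not Fastpath's
implementation: the helper name ``mismatched_fields`` and the flat-dictionary
representation of a swprofile are assumptions made for this example.

.. code-block:: python

  _MISSING = object()  # sentinel so an absent field differs from a None value


  def mismatched_fields(good, bad, ignore=("kernel_git_sha",)):
      """Return the sorted names of fields that differ between two profiles."""
      keys = (set(good) | set(bad)) - set(ignore)
      return sorted(
          k for k in keys if good.get(k, _MISSING) != bad.get(k, _MISSING)
      )


  # Hypothetical profiles: only the kernel SHA differs, so nothing is reported.
  good = {"cmdline": ["quiet"], "sysctl": [], "bootscript": [],
          "kernel_git_sha": "a1b2c3d4e5f6"}
  bad = dict(good, kernel_git_sha="f6e5d4c3b2a1")
  print(mismatched_fields(good, bad))  # []

  # A differing cmdline would make the pair unsuitable for bisection.
  bad_cmdline = dict(bad, cmdline=["quiet", "nosmt"])
  print(mismatched_fields(good, bad_cmdline))  # ['cmdline']

An empty list means the kernel SHA is the only difference, which is the condition a
bisection relies on.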

.. note::

   **Result Store Isolation:** The original result store remains read-only. A temporary
   store holds the baseline copies plus any new bisection results.

**Generated Context File:**

The generated ``bisection_context.yaml`` file contains:

* Test plan (SUT connection, benchmark config, shared profile fields)
* Baseline profile names and kernel git SHAs
* Resultclass for performance evaluation
* Result store paths (original and temporary)

.. code-block:: yaml

   plan:
     sut:
       name: "Ampere Altra Max"
       connection:
         method: SSH
         params: {...}
     swprofiles:
       - cmdline: [...]
         sysctl: []
         bootscript: []
     benchmarks:
       - suite: sysbench
         name: thread
         ...
     defaults:
       benchmark:
         warmups: 1
         repeats: 3
         sessions: 2
   good-swprofile: "6.8.0-baseline"
   good_sha: "a1b2c3d4e5f6"
   bad-swprofile: "6.9.0-regression"
   bad_sha: "f6e5d4c3b2a1"
   resultclass: "sysbenchthread-110"
   resultstore: "mysql://..."
   output-resultstore: "/tmp/bisect-resultstore-abc123/"
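
The file mixes hyphenated keys (``good-swprofile``) with underscored ones (``good_sha``),
so lookups need the exact spelling. The following is a minimal Python sketch, assuming
the context has already been parsed into a dictionary (for example with
``yaml.safe_load``). The function name ``baselines`` is hypothetical, and the sample
values are copied from the example above.

.. code-block:: python

  def baselines(context):
      """Return {"good": (swprofile, sha), "bad": (swprofile, sha)}."""
      result = {}
      for side in ("good", "bad"):
          name_key, sha_key = f"{side}-swprofile", f"{side}_sha"
          for key in (name_key, sha_key):
              if key not in context:
                  raise KeyError(f"context file has no '{key}' entry")
          result[side] = (context[name_key], context[sha_key])
      return result


  # Stand-in for the parsed YAML; only the baseline entries are reproduced.
  context = {
      "good-swprofile": "6.8.0-baseline", "good_sha": "a1b2c3d4e5f6",
      "bad-swprofile": "6.9.0-regression", "bad_sha": "f6e5d4c3b2a1",
  }
  print(baselines(context)["good"])  # ('6.8.0-baseline', 'a1b2c3d4e5f6')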

***************
Run Bisection
***************

Run automated bisection in your kernel source repository:

.. code-block:: shell

  cd /path/to/kernel/source

  # Context file created by "fastpath bisect start"
  export CONTEXT=/path/to/bisection_context.yaml

  # Extract SHAs and start git bisect
  export GOOD_SHA=$(python3 -c "import sys, yaml; print(yaml.safe_load(open(sys.argv[1]))['good_sha'])" "$CONTEXT")
  export BAD_SHA=$(python3 -c "import sys, yaml; print(yaml.safe_load(open(sys.argv[1]))['bad_sha'])" "$CONTEXT")

  git bisect start
  git bisect good "$GOOD_SHA"
  git bisect bad "$BAD_SHA"

  # Run automated bisection
  git bisect run /path/to/fastpath/scripts/execute_bisection.sh \
    "$CONTEXT"

The bisection script will:

1. Build the kernel for each commit tested
2. Create a unique swprofile named ``bisect-<sha>`` (first 12 chars of SHA)
3. Execute the benchmark on the SUT
4. Compare results against good/bad baselines using confidence intervals
5. Report to git bisect: GOOD (0), BAD (1), SKIP (125), or ERROR (128)

Git will automatically test commits until it identifies the first bad commit.
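
``git bisect run`` decides what to do purely from the script's exit status: 0 marks the
commit good, 125 skips it, any other value from 1 to 127 marks it bad, and anything else
aborts the run. The Python sketch below restates that rule; the function name
``bisect_verdict`` is made up for illustration.

.. code-block:: python

  def bisect_verdict(exit_code):
      """Map a script exit status to the action `git bisect run` takes."""
      if exit_code == 0:
          return "good"
      if exit_code == 125:
          return "skip"
      if 1 <= exit_code <= 127:
          return "bad"
      return "abort"  # 128 and above (or a negative status) stops the run


  for code in (0, 1, 125, 128):
      print(code, bisect_verdict(code))

Reporting ERROR as 128 therefore stops the bisection instead of recording a misleading
"bad" verdict for a commit that could not be tested.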

For more control over each bisection step, run the loop manually:

.. code-block:: shell

  # Manual bisection loop
  cd /path/to/kernel/source
  
  # Extract SHAs and start git bisect
  export GOOD_SHA=$(python3 -c "import yaml; print(yaml.safe_load(open('bisection_context.yaml'))['good_sha'])")
  export BAD_SHA=$(python3 -c "import yaml; print(yaml.safe_load(open('bisection_context.yaml'))['bad_sha'])")
  
  git bisect start
  git bisect good $GOOD_SHA
  git bisect bad $BAD_SHA
  
  # For each commit picked by git bisect, repeat until done:
  
  # 1. Build the kernel
  ./scripts/build_local_kernel.sh
  source ./scripts/.env
  
  # 2. Test and evaluate
  fastpath bisect run \
    --context bisection_context.yaml \
    --kernel $KERNEL_PATH \
    --modules $MODULES_PATH \
    --gitsha $GITSHA
  
  # 3. Mark commit based on exit code (0=good, 1=bad, 125=skip, 128=error)
  git bisect good   # if exit code is 0
  git bisect bad    # if exit code is 1
  git bisect skip   # if exit code is 125
  # if exit code is 128, stop and fix the environment before continuing
  
  # Git bisect picks next commit and repeats until first bad commit found
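
The per-commit steps of the manual loop can be wrapped in a small driver. This is a
sketch, not a script shipped with Fastpath: ``step_argv``, ``mark_argv`` and ``run_step``
are hypothetical names, the flags are the ones shown in the shell example, and the build
step is assumed to have exported ``KERNEL_PATH``, ``MODULES_PATH`` and ``GITSHA`` into
the environment.

.. code-block:: python

  import os
  import subprocess

  # Exit statuses the manual loop acts on; anything else needs a human.
  VERDICTS = {0: "good", 1: "bad", 125: "skip"}


  def step_argv(context, env=os.environ):
      """Build the `fastpath bisect run` command line for the current commit."""
      return ["fastpath", "bisect", "run",
              "--context", context,
              "--kernel", env["KERNEL_PATH"],
              "--modules", env["MODULES_PATH"],
              "--gitsha", env["GITSHA"]]


  def mark_argv(exit_code):
      """Return the git command that records the verdict for this commit."""
      if exit_code not in VERDICTS:
          raise RuntimeError(f"exit status {exit_code}: stop and investigate")
      return ["git", "bisect", VERDICTS[exit_code]]


  def run_step(context):
      """Test the checked-out commit, then tell git bisect the outcome."""
      status = subprocess.run(step_argv(context)).returncode
      subprocess.run(mark_argv(status), check=True)

Calling ``run_step("bisection_context.yaml")`` after each build would replace steps 2
and 3. An unexpected status such as 128 raises instead of marking the commit.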

*******************
Bisection Output
*******************

**During each bisection step:**

.. code-block:: text

  Building kernel for current commit...
  Executing plan.yaml...
  Result: REGRESSION detected for resultclass 'sysbenchthread-110' 
          comparing 'bisect-a1b2c3d4e5f6' vs '6.8.0-baseline'.
  fastpath bisect run exited with status 1

**Final bisection result:**

.. code-block:: text

  a1b2c3d4e5f6 is the first bad commit
  commit a1b2c3d4e5f6
  Author: Developer Name <dev@example.com>
  Date:   Mon Nov 1 10:00:00 2025 +0000
  
      Subject line of the problematic commit

**************************
Understanding Results
**************************

Each commit is tested and classified by comparing its results against the good baseline:

* **GOOD (0)**: Performance matches or exceeds good baseline
* **BAD (1)**: Performance regression detected
* **SKIP (125)**: Inconclusive result (overlapping confidence intervals)
* **ERROR (128)**: Fatal environment or infrastructure failure; the bisection is aborted

**Adaptive Testing:** Each commit is first tested with 1 sample. If the result is
inconclusive (gap <1.5× confidence interval), 4 more samples are collected (5 total) for
a more robust classification.
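
The sampling rule can be sketched in Python as follows. The sample counts and the 1.5
factor come from the description above; everything else is an assumption made for
illustration. In particular, "gap" is read here as the absolute difference between the
first sample and the baseline mean, and "confidence interval" as the baseline's
half-width. Fastpath's actual definitions may differ.

.. code-block:: python

  INITIAL_SAMPLES = 1  # from the text: one sample first
  EXTRA_SAMPLES = 4    # from the text: four more when inconclusive
  GAP_FACTOR = 1.5     # from the text: gap < 1.5 x confidence interval


  def is_inconclusive(value, baseline_mean, ci_half_width, factor=GAP_FACTOR):
      """ASSUMPTION: gap = |value - baseline mean|, compared to factor x CI."""
      return abs(value - baseline_mean) < factor * ci_half_width


  def total_samples(first_value, baseline_mean, ci_half_width):
      """Number of samples a commit ends up with under the adaptive rule."""
      if is_inconclusive(first_value, baseline_mean, ci_half_width):
          return INITIAL_SAMPLES + EXTRA_SAMPLES
      return INITIAL_SAMPLES


  # Hypothetical numbers: baseline mean 101.0 with a CI half-width of 2.0.
  print(total_samples(100.0, 101.0, 2.0))  # 5 (gap 1.0 is below 3.0)
  print(total_samples(90.0, 101.0, 2.0))   # 1 (gap 11.0 is clearly outside)

The point of the rule is that a commit far from the baseline costs one benchmark run,
while a borderline commit gets enough samples to be classified with more confidence.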

**Limitations:** Only single-node SUTs are supported.