
Hypothesis & Experiment (HEEC) Template

Every Core activity in the AusIndustry portal asks the same questions. Use this template verbatim. The character limits below are enforced by the portal.

Template structure

Hypothesis (≤ 4,000 chars)

A single falsifiable statement with measurable parameters. Name the variables you'll change (independent), the variables you'll measure (dependent), the variables you'll hold constant, and the intended outcome.

Pattern: "We hypothesised that [system / approach X] would [produce outcome Y] when [conditions Z], measured by [metric M] within [tolerance T]."
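The pattern can be treated as a fill-in template. The sketch below is one possible way to render it and catch a blank slot before drafting further; the function name, field names and example values are illustrative assumptions, not part of the portal.

```python
# Hypothetical helper: fills the hypothesis pattern from this guide.
PATTERN = (
    "We hypothesised that {approach} would {outcome} when {conditions}, "
    "measured by {metric} within {tolerance}."
)


def render_hypothesis(approach, outcome, conditions, metric, tolerance):
    """Fill every slot of the pattern, rejecting blank values."""
    fields = {
        "approach": approach,
        "outcome": outcome,
        "conditions": conditions,
        "metric": metric,
        "tolerance": tolerance,
    }
    # A falsifiable statement needs all five parts, so refuse empty slots.
    blank = [name for name, value in fields.items() if not value.strip()]
    if blank:
        raise ValueError("missing hypothesis parts: " + ", ".join(blank))
    return PATTERN.format(**fields)


# Example values paraphrased for illustration only.
draft = render_hypothesis(
    approach="a hybrid HNSW + product-quantisation index",
    outcome="sustain <50ms p99 latency",
    conditions="serving 800 concurrent queries over a 12M-vector corpus",
    metric="p99 latency and recall@10",
    tolerance="a recall floor of 0.92 against an exact baseline",
)
print(draft)
```

Keeping the five parts as separate arguments makes it obvious when a draft has no stated metric or tolerance.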

Technical uncertainties

What competent professionals in the field could not have known or determined in advance. Not "we hadn't tried it", but "no public knowledge predicts the answer".

New knowledge sought

The knowledge you set out to generate. State it as a question you can answer at the end.

Sources investigated (≤ 1,000 chars)

Papers, docs, benchmarks, vendor literature you reviewed before deciding the experiment was necessary. Cite specifically - "PyTorch 2.4 release notes", not "industry literature".

Experiment

What you actually built and ran. The control / baseline. The variations. The measurement apparatus. Enough detail that another engineer could rebuild it.

Evaluation

The observations against the hypothesis. Numbers, tables, before/after. Where it confirmed, where it diverged.

Conclusion

What the experiment proved (or didn't), and the next hypothesis it surfaced. Often "the simple approach doesn't work because X" is the conclusion - and that's fine.
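The two character limits stated above (4,000 for the hypothesis, 1,000 for sources investigated) can be checked before pasting a draft into the portal. This is a minimal sketch: the field keys are hypothetical, and it assumes the portal counts characters the same way Python's `len` does, which this guide does not confirm.

```python
# Limits as stated in this guide; the dictionary keys are illustrative names.
LIMITS = {
    "hypothesis": 4000,
    "sources_investigated": 1000,
}


def over_limit(sections):
    """Return {field: excess characters} for every field above its limit.

    Assumption: one Python character (code point) equals one portal character.
    Fields without a stated limit are ignored.
    """
    excess = {}
    for name, text in sections.items():
        limit = LIMITS.get(name)
        if limit is not None and len(text) > limit:
            excess[name] = len(text) - limit
    return excess


# Usage with placeholder drafts.
drafts = {
    "hypothesis": "We hypothesised that ...",
    "sources_investigated": "x" * 1010,  # deliberately too long
    "experiment": "No limit is stated for this field in the guide.",
}
print(over_limit(drafts))
```

An empty result means every limited field fits; otherwise the values show how many characters to trim.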

Worked example: vector retrieval at sub-50ms p99

Hypothesis

We hypothesised that approximate-nearest-neighbour retrieval over a 12M-vector corpus could sustain <50ms p99 latency at 800 concurrent queries on a single A10G, using a hybrid HNSW + product-quantisation index, without recall dropping below 0.92@10 relative to an exact brute-force baseline. Independent variables: index type (HNSW vs IVF-PQ vs IVF-OPQ), M and ef-construction parameters, PQ subspace count. Dependent: p50/p99 latency, recall@10, memory footprint. Constants: corpus, query distribution, hardware, query batch size, OS, kernel.

Technical uncertainties

Public benchmarks cover either latency or recall in isolation, not the combined constraint on our query distribution (long-tail entity queries with 18% near-duplicates).

Experiment

Built a benchmark harness that replayed 14 days of production query logs against each index configuration. Held the corpus, query distribution, hardware, query batch size, OS and kernel constant across runs. Ran 200k queries per configuration.
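The two dependent metrics a harness like this reports can be sketched as follows. This is not the harness described above; it is a standard-library illustration that assumes per-query latencies and ranked result IDs have already been collected, and it uses the nearest-rank percentile definition, which is one of several in common use.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recall_at_k(approx_ids, exact_ids, k=10):
    """Share of the exact top-k neighbours that the approximate index
    also returned in its top-k, for a single query."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact baseline returned no neighbours")
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall_at_k(approx_results, exact_results, k=10):
    """Average recall@k over paired per-query result lists."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    if not scores:
        raise ValueError("no queries")
    return sum(scores) / len(scores)


# Toy inputs for illustration; real runs would feed logged measurements.
latencies_ms = [12.0, 15.5, 14.2, 48.9, 13.1]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 99))
print(recall_at_k([1, 2, 3, 4], [1, 2, 5, 6], k=4))
```

Stating the percentile definition in the Experiment field matters because different definitions can give slightly different p99 values on the same samples.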

Evaluation

HNSW M=48 ef=256 hit 38ms p99 at 0.94 recall - passed. IVF-PQ with 64 subspaces hit 22ms p99 but recall collapsed to 0.81 - failed. IVF-OPQ recovered recall to 0.90 at 41ms p99, inside the latency budget but still under the 0.92 recall floor - borderline.
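Checking each result against the hypothesis thresholds can be made mechanical. The sketch below uses only the figures reported above and the two thresholds from the hypothesis (<50ms p99, recall@10 of at least 0.92); the class and function names are illustrative.

```python
from dataclasses import dataclass

P99_BUDGET_MS = 50.0  # hypothesis: p99 latency strictly under 50 ms
RECALL_FLOOR = 0.92   # hypothesis: recall@10 must not drop below 0.92


@dataclass(frozen=True)
class Result:
    config: str
    p99_ms: float
    recall_at_10: float


def verdict(result):
    """Return 'pass', or 'fail: ...' naming each constraint that was missed."""
    checks = (
        ("latency", result.p99_ms < P99_BUDGET_MS),
        ("recall", result.recall_at_10 >= RECALL_FLOOR),
    )
    failed = [name for name, ok in checks if not ok]
    return "pass" if not failed else "fail: " + ", ".join(failed)


# Figures taken from the evaluation text above.
RESULTS = [
    Result("HNSW M=48 ef=256", 38, 0.94),
    Result("IVF-PQ, 64 subspaces", 22, 0.81),
    Result("IVF-OPQ", 41, 0.90),
]

for r in RESULTS:
    print(f"{r.config:<22} {r.p99_ms:>4.0f} ms  {r.recall_at_10:.2f}  {verdict(r)}")
```

A strict check like this reports IVF-OPQ as failing on recall, 0.02 short of the floor; whether to describe such a near miss as borderline is a judgement for the written evaluation.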

Conclusion

HNSW with the tuned parameters meets the latency-recall constraint on our distribution. Memory cost (28GB) exceeds budget, surfacing a new hypothesis on tiered storage.
