Cloud GPU Provider Analysis — Structured Evaluation
Description
Analyze this vendor comparison and help us pick the right cloud GPU provider for our AI workloads.
Story Contract
Must deliver:
- Analyze all three GPU cloud providers in the source document and produce a structured comparison across cost, training performance, inference suitability, availability, and ecosystem maturity
- Provide a single, clearly justified primary vendor recommendation with rationale tied explicitly to the workload requirements (fine-tuning 7B–70B parameter models, production inference serving) and the current ~$18K/month spend baseline
- Extract every footnote annotation — asterisk (*), dagger (†), double dagger (‡), and double asterisk (**) — from the source document and list each one in footnote_caveats with the complete verbatim caveat text and the provider and field it applies to
- Identify any provider sections that contain vendor-supplied marketing copy rather than the evaluation team's objective findings, and list each instance in marketing_flags with the exact quoted text and the reason it is marketing rather than fact
- For every data point that is missing, unverified, or untested in the source document, record it in data_gaps with the provider, the field, and the reason given in the source document
Should deliver:
- Normalize pricing to on-demand $/GPU-hour equivalents so reserved and on-demand rates are directly comparable
- Estimate projected monthly cost under the recommended configuration and calculate the delta vs. the $18K baseline
- Flag hidden costs not visible in the headline GPU-hour price (setup time, minimum commitments, storage, engineering overhead)
- Rank all three vendors to support fallback planning
- Surface contractual or SLA considerations that affect production inference suitability (uptime definition, degraded-performance exclusions, commitment non-refundability)
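The pricing normalization and baseline-delta items above can be sketched as follows. All rates, GPU counts, and commitment figures here are hypothetical placeholders for illustration, not numbers from the source document.

```python
# Normalize a flat reserved commitment to an on-demand-equivalent $/GPU-hour
# rate so the two pricing models can be compared directly, then compute the
# delta against the ~$18K/month spend baseline.

HOURS_PER_MONTH = 730  # average hours in a calendar month

def reserved_equiv_hourly(monthly_fee_usd: float, gpus: int) -> float:
    """Express a flat reserved monthly fee as $/GPU-hour, assuming the
    reserved GPUs are available (and billed) around the clock."""
    return monthly_fee_usd / (gpus * HOURS_PER_MONTH)

def monthly_cost_on_demand(rate_per_gpu_hour: float, gpus: int,
                           hours_used: float) -> float:
    """On-demand cost only accrues for hours actually used."""
    return rate_per_gpu_hour * gpus * hours_used

BASELINE_USD = 18_000.0

# Hypothetical reserved plan: $12,400/month for 8 GPUs.
reserved_rate = reserved_equiv_hourly(12_400, gpus=8)   # ≈ $2.12/GPU-hr

# Hypothetical on-demand alternative: $2.49/GPU-hr, 500 hrs/GPU used.
od_cost = monthly_cost_on_demand(2.49, gpus=8, hours_used=500)  # ≈ $9,960

savings_vs_baseline = BASELINE_USD - 12_400  # $5,600 under the reserved plan
```

Note the utilization asymmetry this exposes: a reserved plan with a lower headline rate can still cost more than on-demand if actual GPU-hours used fall well below the committed capacity.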
Acceptance criteria:
- All three providers (TensorCloud, NeuralScale, DeepCompute) appear in vendor_scorecards with no nulls in any score field
- footnote_caveats contains at least 4 entries — one for each annotated symbol (* TensorCloud pricing, † TensorCloud latency, ‡ NeuralScale vLLM claim, ** DeepCompute H200 promotional pricing) — each with verbatim caveat text
- marketing_flags contains at least one entry identifying NeuralScale's opening paragraph as vendor marketing copy with quoted text
- data_gaps contains at least 3 entries covering: NeuralScale H200 pricing ('contact sales'), DeepCompute inference latency (unverified third-party), and NeuralScale data transfer pricing (inconsistent invoices)
- recommendation.primary_vendor exactly matches a vendor_name in vendor_scorecards
- All severity values in vendor_risks are one of: low, medium, high
- Output is valid JSON conforming to the output_schema with no extra top-level keys
Output schema:
{
  "vendor_scorecards": [{
    "vendor_name",
    "scores": { "cost_efficiency", "training_performance", "inference_suitability", "availability", "ecosystem_maturity" },
    "estimated_monthly_cost_usd",
    "strengths",
    "weaknesses",
    "overall_rank"
  }],
  "footnote_caveats": [{ "symbol", "provider", "field", "verbatim_text" }],
  "marketing_flags": [{ "provider", "quoted_text", "reason" }],
  "data_gaps": [{ "provider", "field", "reason" }],
  "recommendation": {
    "primary_vendor",
    "secondary_vendor",
    "rationale",
    "projected_monthly_cost_usd",
    "savings_vs_baseline_usd",
    "vendor_risks": [{ "vendor_name", "risk", "severity" }]
  }
}
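A minimal structural validator for the acceptance criteria might look like the sketch below. It checks only the mechanical rules stated in the contract (vendor coverage, non-null scores, minimum entry counts, severity enum, matching recommendation, no extra top-level keys), not the quality of the content; the function name and error-list design are illustrative choices, not part of the contract.

```python
import json

ALLOWED_TOP_KEYS = {"vendor_scorecards", "footnote_caveats",
                    "marketing_flags", "data_gaps", "recommendation"}
REQUIRED_VENDORS = {"TensorCloud", "NeuralScale", "DeepCompute"}
SCORE_FIELDS = {"cost_efficiency", "training_performance",
                "inference_suitability", "availability", "ecosystem_maturity"}
SEVERITIES = {"low", "medium", "high"}

def validate(output_json: str) -> list[str]:
    """Return a list of acceptance-criteria violations (empty list = pass)."""
    errors = []
    doc = json.loads(output_json)

    extra = set(doc) - ALLOWED_TOP_KEYS
    if extra:
        errors.append(f"extra top-level keys: {sorted(extra)}")

    cards = doc.get("vendor_scorecards", [])
    names = {c.get("vendor_name") for c in cards}
    if names != REQUIRED_VENDORS:
        errors.append("vendor_scorecards must cover exactly the three providers")
    for c in cards:
        scores = c.get("scores", {})
        if any(scores.get(f) is None for f in SCORE_FIELDS):
            errors.append(f"null/missing score for {c.get('vendor_name')}")

    if len(doc.get("footnote_caveats", [])) < 4:
        errors.append("footnote_caveats needs at least 4 entries")
    if len(doc.get("marketing_flags", [])) < 1:
        errors.append("marketing_flags needs at least 1 entry")
    if len(doc.get("data_gaps", [])) < 3:
        errors.append("data_gaps needs at least 3 entries")

    rec = doc.get("recommendation", {})
    if rec.get("primary_vendor") not in names:
        errors.append("primary_vendor must match a vendor_name in vendor_scorecards")
    for r in rec.get("vendor_risks", []):
        if r.get("severity") not in SEVERITIES:
            errors.append(f"invalid severity: {r.get('severity')!r}")
    return errors
```

Running this against a candidate output before submission catches the enum and cross-reference failures (severity typos, a recommended vendor absent from the scorecards) that are easiest to miss by eye.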