Summarized Notes Report
Prepared May 15, 2026
Executive Summary
The current notes cover two connected areas:
- AI video generation costs across Grok Imagine, Kling 3.0, Seedance 2.0, and Veo 3.1.[S1][S2][S3][S4]
- Hosting and distribution architecture options for a short-video product.[S6]
For generation, the lowest listed raw output costs are Grok Imagine at 480p and Veo 3.1 Lite at 720p, both $0.05/sec. Kling 3.0 is also relatively affordable at standard quality without audio (about $0.06-$0.08/sec depending on package). Seedance is much more expensive per second in the captured pricing, at $0.19-$0.48/sec for 720p and $0.57-$1.20/sec for 1080p.[S1][S2][S3][S4]
For hosting, YouTube embed/distribution is not a good fit because YouTube prohibits creating a substantially similar competing service, and embedded monetization would flow to YouTube/the creator rather than the platform. Among the architecture options in the notes, only Cloudflare R2 DIY and Bunny Stream are included in the final architecture comparison, per request.[S6]
AI Video Generation Pricing Summary
Grok Imagine
Grok Imagine has simple output pricing by resolution:
| Resolution | Output cost | Video input cost | Image input cost |
|---|---|---|---|
| 480p | $0.05/sec | $0.01/sec | $0.002/image |
| 720p | $0.07/sec | $0.01/sec | $0.002/image |
Source: [S1] xAI Grok Imagine model pricing.
Main takeaway: Grok Imagine is one of the cheapest listed options for 480p and 720p output. The input pricing is also simple and low.[S1]
Veo 3.1
| Model | Resolution | Audio | Output cost |
|---|---|---|---|
| Veo 3.1 Standard | Not specified | Yes | $0.40/sec |
| Veo 3.1 Fast | 720p | Yes | $0.10/sec |
| Veo 3.1 Fast | 1080p | Yes | $0.12/sec |
| Veo 3.1 Lite | 720p | Yes | $0.05/sec |
| Veo 3.1 Lite | 1080p | Yes | $0.08/sec |
Sources: [S2a] Google Vertex AI Veo pricing for Standard and Fast. Veo 3.1 Lite resolution-specific pricing comes from public reporting [S2b], because it was not visible in the official Vertex AI pricing table at review time.
Main takeaway: Veo 3.1 Lite is very competitive if quality is acceptable. Standard, at $0.40/sec, costs eight times the Lite 720p rate.[S2]
Kling 3.0
Kling pricing depends on the resource pack's unit price and on the number of units deducted per second of video.
Package unit prices:
| Package | Units | USD/unit |
|---|---|---|
| Trial Package | 100 or 1,000 | $0.098 |
| Standard Package 1 | 5,000 | $0.140 |
| Standard Package 2 | 15,000 | $0.140 |
| Standard Package 3 | 30,000 | $0.126 |
| Standard Package 4 | 45,000 | $0.126 |
Source: [S3] online Kling 3.0 pricing references.
Representative USD/sec:
| Mode | Units/sec | Trial | Standard | Discounted standard |
|---|---|---|---|---|
| Standard, no audio | 0.6 | $0.059 | $0.084 | $0.076 |
| Standard, audio/no voice | 0.9 | $0.088 | $0.126 | $0.113 |
| Pro, no audio | 0.8 | $0.078 | $0.112 | $0.101 |
| Pro, audio/no voice | 1.2 | $0.118 | $0.168 | $0.151 |
| 4K, no audio | 3.0 | $0.294 | $0.420 | $0.378 |
| 4K, audio/no voice | 3.0 | $0.294 | $0.420 | $0.378 |
Source: [S3] online Kling 3.0 pricing references. USD/sec values are calculated from the captured unit deductions and package USD/unit.
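The USD/sec figures above are units/sec multiplied by the package USD/unit. A minimal sketch of that arithmetic, using only values from the two Kling tables; the function and dictionary names are illustrative and not part of any Kling API:

```python
# USD per unit for each package tier, from the package table above.
USD_PER_UNIT = {"trial": 0.098, "standard": 0.140, "discounted": 0.126}

# Units deducted per second of video for each mode, from the captured notes.
UNITS_PER_SEC = {
    "standard_no_audio": 0.6,
    "standard_audio": 0.9,
    "pro_no_audio": 0.8,
    "pro_audio": 1.2,
    "4k": 3.0,
}


def kling_usd_per_sec(mode: str, tier: str) -> float:
    """Return the cost of one second of video, rounded to 3 decimals."""
    return round(UNITS_PER_SEC[mode] * USD_PER_UNIT[tier], 3)
```

For example, `kling_usd_per_sec("pro_audio", "discounted")` is 1.2 x $0.126, which rounds to $0.151 as in the table.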
Main takeaway: Kling is cost-effective at standard and pro levels, but 4K becomes much more expensive. Audio increases cost for standard/pro modes, while 4K appears to cost the same with or without audio in the captured notes.[S3]
Seedance 2.0
Seedance pricing depends on the purchased credit pack:
| Package | Credits | USD/credit |
|---|---|---|
| Starter Pack | 1,000 | $0.040 |
| Creator Pack | 3,000 | $0.030 |
| Professional Pack | 8,000 | $0.025 |
| Max Pack | 30,000 | $0.020 |
| Super Pack | 100,000 | $0.019 |
Source: [S4] Seedance 2.0 pricing page.
Base output costs without video input:
| Model | Resolution | Credits/sec | Lowest listed cost | Highest listed cost |
|---|---|---|---|---|
| Seedance 2.0 | 480p | 6 | $0.11/sec | $0.24/sec |
| Seedance 2.0 | 720p | 12 | $0.23/sec | $0.48/sec |
| Seedance 2.0 | 1080p | 30 | $0.57/sec | $1.20/sec |
| Seedance 2.0 Fast | 480p | 5 | $0.10/sec | $0.20/sec |
| Seedance 2.0 Fast | 720p | 10 | $0.19/sec | $0.40/sec |
Source: [S4] Seedance 2.0 pricing page. Per-second costs are calculated from credits/sec and displayed USD/credit package rates.
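The lowest and highest listed costs are the credits/sec figure priced at the cheapest and the most expensive pack. The sketch below reproduces that range with `Decimal` so that half-cent values such as $0.095 round up to $0.10 the way the table does. All names are illustrative; only the numbers come from the tables above.

```python
from decimal import Decimal, ROUND_HALF_UP

# USD per credit for each pack, from the package table above.
USD_PER_CREDIT = {
    "Starter": Decimal("0.040"),
    "Creator": Decimal("0.030"),
    "Professional": Decimal("0.025"),
    "Max": Decimal("0.020"),
    "Super": Decimal("0.019"),
}

# Credits consumed per second of output, keyed by (model, resolution).
CREDITS_PER_SEC = {
    ("Seedance 2.0", "480p"): 6,
    ("Seedance 2.0", "720p"): 12,
    ("Seedance 2.0", "1080p"): 30,
    ("Seedance 2.0 Fast", "480p"): 5,
    ("Seedance 2.0 Fast", "720p"): 10,
}


def usd_per_sec_range(model: str, resolution: str) -> tuple[Decimal, Decimal]:
    """Return (lowest, highest) USD/sec across all packs, rounded to cents."""
    credits = CREDITS_PER_SEC[(model, resolution)]
    costs = [
        (credits * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        for rate in USD_PER_CREDIT.values()
    ]
    return min(costs), max(costs)
```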
Main takeaway: Seedance is among the most expensive options in the captured notes for normal 720p/1080p generation. At 1080p it costs $0.57-$1.20/sec, above even Veo 3.1 Standard at $0.40/sec. The Fast model helps, but it is still expensive compared with Grok, Kling standard, and Veo Lite/Fast.[S1][S2][S3][S4]
Cost Comparison for 5-Second Clips
Approximate 5-second generation costs based on the notes:
| Provider/model | Configuration | Approx. 5s cost |
|---|---|---|
| Grok Imagine | 480p output | $0.25 |
| Grok Imagine | 720p output | $0.35 |
| Veo 3.1 Lite | 720p with audio | $0.25 |
| Veo 3.1 Lite | 1080p with audio | $0.40 |
| Veo 3.1 Fast | 720p with audio | $0.50 |
| Veo 3.1 Fast | 1080p with audio | $0.60 |
| Veo 3.1 Standard | With audio | $2.00 |
| Kling 3.0 | Standard, no audio, discounted pack | $0.38 |
| Kling 3.0 | Pro, no audio, discounted pack | $0.50 |
| Kling 3.0 | Standard, audio/no voice, discounted pack | $0.57 |
| Kling 3.0 | 4K, discounted pack | $1.89 |
| Seedance 2.0 Fast | 480p, Super pack | $0.48 |
| Seedance 2.0 Fast | 720p, Super pack | $0.95 |
| Seedance 2.0 | 720p, Super pack | $1.14 |
| Seedance 2.0 | 1080p, Super pack | $2.85 |
Sources: [S1] xAI Grok Imagine pricing, [S2] Google/Veo pricing sources, [S3] online Kling 3.0 pricing references, and [S4] Seedance 2.0 pricing page. Grok and Veo values are per-second prices multiplied by 5 seconds; Kling and Seedance values come from the cited pricing tables and the report calculations.
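The 5-second figures can be rechecked with a short script. This is a sketch covering a subset of the rows, with per-second rates taken from the pricing tables in this report; the labels and function names are illustrative. `Decimal` is used so that $0.475 rounds half-up to $0.48 as in the table.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-second output rates in USD from the tables in this report.
# Kling uses the discounted pack ($0.126/unit); Seedance uses the Super pack ($0.019/credit).
USD_PER_SEC = {
    "Grok Imagine 480p": Decimal("0.05"),
    "Grok Imagine 720p": Decimal("0.07"),
    "Veo 3.1 Lite 720p": Decimal("0.05"),
    "Veo 3.1 Fast 720p": Decimal("0.10"),
    "Veo 3.1 Standard": Decimal("0.40"),
    "Kling 3.0 Standard, no audio": Decimal("0.6") * Decimal("0.126"),
    "Seedance 2.0 Fast 480p": 5 * Decimal("0.019"),
    "Seedance 2.0 1080p": 30 * Decimal("0.019"),
}


def clip_cost(usd_per_sec: Decimal, seconds: int = 5) -> Decimal:
    """Cost of one clip, rounded half-up to whole cents."""
    return (usd_per_sec * seconds).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def ranked_costs(seconds: int = 5) -> list[tuple[str, Decimal]]:
    """Return (name, cost) pairs sorted from cheapest to most expensive."""
    costs = [(name, clip_cost(rate, seconds)) for name, rate in USD_PER_SEC.items()]
    return sorted(costs, key=lambda pair: pair[1])
```

Changing `seconds` gives the same comparison for other clip lengths, assuming each provider bills linearly per second.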
Service Availability Notes
The goal is a service that already has a library of meaningful AI-generated short videos (drama, funny, story, romance, thriller, or other entertainment clips) that can be used through an API or commercial service for distribution inside our own TikTok-like application.
There does not currently appear to be a ready-made service that provides this exact thing: a usable API or service with a catalog of meaningful AI-generated short drama/funny/story videos suitable for direct distribution in our own app.[S5]
Observed service notes:
- Shaike appears to be more of a watching/community platform. It hosts its own and third-party videos, posts videos on YouTube, and may offer video creation for hire.[S5]
- AIFlixHub is focused on creating and publishing on its own platform, with no noted API for distribution.[S5]
- GenFlix appears similar: creation/publishing on its own platform rather than a usable distribution API.[S5]
- Stock footage exists, but it is generally not meaningful short-form story content and does not match the intended use case.[S5]
Main takeaway: a product in this space likely needs original generation, creator partnerships, or an internal content pipeline. There is no obvious plug-and-play catalog/API in the captured notes.[S5]
Hosting And Distribution Notes
YouTube
YouTube is a poor fit for this product direction.
Reasons:
- YouTube has policies against creating a substantially similar competing service.[S6]
- YouTube Shorts makes this especially risky for a short-video product.[S6]
- With embedded videos, ad revenue goes to YouTube and/or the creator, not to the new platform.[S6]
- Using YouTube would limit control over monetization, playback experience, recommendations, and platform identity.[S6]
1 Million View Delivery Example
Assumptions from the notes:
- 1,000,000 full views.[S6]
- Each view is a complete watch of a 3-minute video.[S6]
- Each full view transfers about 30 MB.[S6]
- Total transfer is about 30,000 GB, or 30 TB.[S6]
Relevant costs from the notes:
| Option | Estimated delivery/storage cost for example | Notes |
|---|---|---|
| Cloudflare R2 DIY[S6c] | About $0-$18 | Cheapest, but requires custom video pipeline. |
| Bunny Stream[S6d] | About $150 | More expensive than R2, but includes video-platform features. |
Sources: [S6c] Cloudflare R2 pricing and [S6d] Bunny Stream pricing. The example assumes 30 TB delivered from 1,000,000 full views of one 3-minute, 30 MB video.
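The transfer total and the per-GB rate implied by the Bunny Stream figure follow directly from the assumptions above. A minimal sketch, using decimal units (1 GB = 1,000 MB) as the report does; the function names are illustrative:

```python
def total_transfer_gb(views: int, mb_per_view: float) -> float:
    """Total data delivered in GB, using decimal units (1 GB = 1,000 MB)."""
    return views * mb_per_view / 1000


def implied_usd_per_gb(total_cost_usd: float, transfer_gb: float) -> float:
    """Per-GB delivery rate implied by a total cost and a transfer volume."""
    return total_cost_usd / transfer_gb


def average_bitrate_mbps(mb_per_view: float, seconds: float) -> float:
    """Average video bitrate in Mbit/s implied by file size and duration."""
    return mb_per_view * 8 / seconds


transfer_gb = total_transfer_gb(1_000_000, 30)        # 30,000 GB = 30 TB
bunny_rate = implied_usd_per_gb(150, transfer_gb)     # about $0.005 per GB
bitrate = average_bitrate_mbps(30, 3 * 60)            # about 1.33 Mbit/s
```

The implied bitrate of about 1.33 Mbit/s is a useful sanity check: if the product encodes at a higher bitrate, the 30 MB per view assumption and both cost estimates scale up proportionally.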
Other noted options, such as Mux (pricing listed under [S6e]), are excluded from the final architecture section by request.
Architecture Example
The application is a TikTok-like short-video feed, but the clips are AI-generated drama, comedy, story, or other entertainment videos instead of user-uploaded phone videos.
At a high level, the workflow would look like this:
Idea / prompt
-> script/story generation
-> image or reference asset generation
-> video generation
-> quality check / moderation
-> hosting and processing
-> feed delivery
-> user interaction
-> analytics, feedback, and error reporting
-> improve prompts, models, ranking, and retry flows
1. Content Planning
The system needs a way to decide what clips should be made.
This can come from:
- Admin-created prompts.
- AI-generated story ideas.
- Trending topics or formats.
- User requests.
- A content calendar.
For drama/funny clips, this stage matters a lot because the output must be more than visuals: each clip needs a clear scene, pacing, joke, conflict, reveal, or short story arc.
Typical output:
- Clip concept.
- Short script.
- Characters.
- Style.
- Target duration.
- Target resolution.
- Safety/moderation constraints.
2. Image Or Reference Asset Generation
Some video models can start from text only, but many workflows benefit from generated reference images.
This step may create:
- Character images.
- Scene backgrounds.
- Keyframes.
- Style references.
- Start/end frames for image-to-video generation.
This helps keep clips more consistent, especially if the product wants recurring characters, serial drama, or recognizable comedic formats.
3. Video Generation
The selected video model generates the actual clip.
Possible providers from the notes:
- Grok Imagine.[S1]
- Veo 3.1.[S2]
- Kling 3.0.[S3]
- Seedance 2.0.[S4]
The app may choose models based on:
- Cost.
- Quality.
- Speed.
- Resolution.
- Audio support.
- Style fit.
- Availability/API reliability.
The output of this stage is a raw generated video file, plus metadata about the prompt, provider, model, cost, duration, status, and any generation errors.
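One way to picture this stage is a small catalog of model options plus a record saved for every generation attempt. The sketch below picks the cheapest option for a target resolution and builds the metadata record described above. The class names, field names, and status strings are hypothetical; the rates are the Grok and Veo figures from this report, and real selection would also weigh quality, speed, and reliability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    provider: str
    model: str
    resolution: str
    usd_per_sec: float


@dataclass
class GenerationRecord:
    prompt: str
    provider: str
    model: str
    duration_sec: float
    cost_usd: float
    status: str                  # hypothetical values: "succeeded" or "failed"
    error: Optional[str] = None  # provider error message, if any


# Rates from the pricing tables in this report.
OPTIONS = [
    ModelOption("xAI", "Grok Imagine", "480p", 0.05),
    ModelOption("xAI", "Grok Imagine", "720p", 0.07),
    ModelOption("Google", "Veo 3.1 Lite", "720p", 0.05),
    ModelOption("Google", "Veo 3.1 Lite", "1080p", 0.08),
    ModelOption("Google", "Veo 3.1 Fast", "720p", 0.10),
    ModelOption("Google", "Veo 3.1 Fast", "1080p", 0.12),
]


def cheapest_option(resolution: str) -> Optional[ModelOption]:
    """Return the lowest-cost option at a resolution, or None if none match."""
    candidates = [o for o in OPTIONS if o.resolution == resolution]
    return min(candidates, key=lambda o: o.usd_per_sec, default=None)


def record_for(option: ModelOption, prompt: str, duration_sec: float) -> GenerationRecord:
    """Build the metadata record for a successful generation."""
    cost = round(option.usd_per_sec * duration_sec, 2)
    return GenerationRecord(prompt, option.provider, option.model,
                            duration_sec, cost, status="succeeded")
```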
4. Quality Check And Moderation
Before publishing, the clip should pass automated and/or human checks.
Checks can include:
- Did generation succeed?
- Is the video corrupted or blank?
- Is the duration correct?
- Is the resolution acceptable?
- Does it contain unsafe or prohibited content?
- Is the story coherent enough to publish?
- Are faces, text, hands, voices, or motion artifacts too broken?
Failed clips can be:
- Retried with the same prompt.
- Regenerated with a safer or clearer prompt.
- Sent to manual review.
- Discarded.
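The checks and outcomes above can be expressed as a small decision function. This is a sketch under stated assumptions: the thresholds (minimum height, duration tolerance, maximum attempts) and all names are hypothetical, and subjective checks such as story coherence or artifact quality would need a separate model or human reviewer.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PUBLISH = "publish"
    RETRY = "retry"
    MANUAL_REVIEW = "manual_review"
    DISCARD = "discard"


@dataclass
class ClipCheck:
    generation_succeeded: bool
    duration_sec: float
    target_duration_sec: float
    height_px: int
    flagged_unsafe: bool
    attempts: int  # generation attempts made so far, including this one


def decide(check: ClipCheck, min_height_px: int = 480,
           duration_tolerance_sec: float = 0.5, max_attempts: int = 3) -> Action:
    """Map automated check results to one of the outcomes listed above."""
    # Possibly unsafe content goes to a human instead of being retried blindly.
    if check.flagged_unsafe:
        return Action.MANUAL_REVIEW
    technical_ok = (
        check.generation_succeeded
        and abs(check.duration_sec - check.target_duration_sec) <= duration_tolerance_sec
        and check.height_px >= min_height_px
    )
    if technical_ok:
        return Action.PUBLISH
    # Technical failures are retried until the attempt budget runs out.
    return Action.RETRY if check.attempts < max_attempts else Action.DISCARD
```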
5. Hosting And Processing
After approval, the video is uploaded to the hosting layer.
With Cloudflare R2 DIY:
- Store the raw/generated video in R2.[S6]
- Run custom encoding/transcoding.[S6]
- Generate HLS or MP4 playback assets.[S6]
- Generate thumbnails.[S6]
- Save playback URLs and metadata.
- Handle cache behavior and access control.
With Bunny Stream:
- Upload the approved video to Bunny Stream.[S6]
- Bunny handles video processing and stream-ready delivery.[S6]
- Store Bunny video IDs, playback URLs, thumbnails, and status in the app database.
6. Feed Delivery
The app backend decides which clips users see.
The feed system can use:
- Freshness.
- Completion rate.
- Likes.
- Shares.
- Rewatches.
- Skips.
- Reports.
- User interests.
- Language/region.
- Content category, such as drama, funny, romance, thriller, or absurd comedy.
The frontend receives a list of video metadata and playback URLs, then plays clips in a vertical swipe feed.
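A rules-based feed can start as a weighted score over a few of the signals listed above. The sketch below is illustrative only: the weights, the freshness decay, and the field names are assumptions, not tuned values from the notes.

```python
from dataclasses import dataclass


@dataclass
class ClipStats:
    clip_id: str
    age_hours: float
    completion_rate: float  # share of views watched to the end, 0..1
    like_rate: float        # likes per view, 0..1
    report_rate: float      # reports per view, 0..1
    category: str


def score(clip: ClipStats, interests: set[str]) -> float:
    """Combine engagement, freshness, and interest match; penalize reports."""
    freshness = 1.0 / (1.0 + clip.age_hours / 24.0)  # 1.0 when new, 0.5 after a day
    interest_bonus = 0.5 if clip.category in interests else 0.0
    return (2.0 * clip.completion_rate
            + 1.0 * clip.like_rate
            + freshness
            + interest_bonus
            - 5.0 * clip.report_rate)


def rank_feed(clips: list[ClipStats], interests: set[str], limit: int = 20) -> list[ClipStats]:
    """Return the top clips for one user, highest score first."""
    return sorted(clips, key=lambda c: score(c, interests), reverse=True)[:limit]
```

Keeping the score a plain function makes it easy to log each term per clip, which helps explain ranking changes later.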
7. User Interaction
Users interact with clips through familiar TikTok-like actions:
- Watch.
- Swipe.
- Like.
- Comment.
- Share.
- Save.
- Follow a series/character/channel.
- Report.
- Request more like this.
- Request less like this.
These interactions become feedback signals for ranking and content generation.
8. Analytics, Feedback, And Error Reporting
The system needs to track both user behavior and technical playback health.
User/content analytics:
- Views.
- Watch time.
- Completion rate.
- Rewatch rate.
- Like rate.
- Share rate.
- Comment rate.
- Report rate.
- Skip timing.
Technical analytics:
- Playback failures.
- Buffering.
- Load time.
- CDN errors.
- Encoding failures.
- Generation failures.
- Moderation failures.
- Provider API errors.
These signals should feed back into the system:
- Better prompt templates.
- Better provider/model selection.
- Retry rules.
- Feed ranking changes.
- Content category decisions.
- Cost optimization.
- Moderation tuning.
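Most of the user/content metrics above are simple ratios over raw events. A minimal sketch, assuming each event is a dictionary with a hypothetical `type` field whose values (`view`, `complete`, `like`, `report`, `playback_error`) are invented here for illustration:

```python
from collections import Counter


def clip_metrics(events: list[dict]) -> dict:
    """Aggregate raw events for one clip into per-view rates."""
    counts = Counter(event["type"] for event in events)
    views = counts["view"]

    def rate(event_type: str) -> float:
        # Avoid division by zero for clips that have not been viewed yet.
        return counts[event_type] / views if views else 0.0

    return {
        "views": views,
        "completion_rate": rate("complete"),
        "like_rate": rate("like"),
        "report_rate": rate("report"),
        "playback_failures": counts["playback_error"],
    }
```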
9. Simplified System View
Admin / AI content planner
-> Prompt and script service
-> Image/reference generation
-> Video generation provider
-> Moderation and quality checks
-> Video hosting layer (Cloudflare R2 DIY or Bunny Stream)
-> App database
-> Feed/ranking API
-> Mobile/web client
-> Analytics + error reporting
-> Prompt/model/feed improvements
The most important architectural point is that generation and delivery are separate systems. The AI provider creates the video, but the product still needs its own publishing, hosting, feed ranking, analytics, reporting, moderation, and cost-control layers.
Build Time Estimation
These estimates assume a small experienced team building an MVP: roughly 1 frontend/mobile developer, 1 backend developer, and some product/design/QA support. A solo developer could still build it, but the calendar time would likely be longer.
Basis: implementation estimate derived from the MVP scope and externally sourced hosting constraints, especially the Cloudflare R2 DIY vs Bunny Stream capability/cost difference in [S6].
The estimates also assume the first version is a focused MVP, not a fully mature TikTok-scale system.
MVP Scope Assumption
The MVP includes:
- Vertical short-video feed.
- AI-generated clips uploaded into the system.
- Basic content categories such as drama, funny, romance, thriller, or similar.
- Like/save/share/report actions.
- Basic admin/content dashboard.
- AI generation workflow.
- Video hosting and playback.
- Basic moderation/quality checks.
- Basic analytics and error tracking.
- Simple recommendation/feed ranking.
It does not include advanced creator tools, livestreaming, complex social graph features, mature recommendation AI, full creator monetization, or large-scale human moderation operations.
Estimated Timeline By Area
| Area | Bunny Stream estimate | Cloudflare R2 DIY estimate | Notes |
|---|---|---|---|
| Product/design planning | 1-2 weeks | 1-2 weeks | Define feed UX, content types, admin workflow, and moderation rules. |
| Frontend/mobile app | 4-8 weeks | 4-8 weeks | Vertical swipe feed, video player, interactions, auth, profile/basic settings, reporting. |
| Backend/API | 4-7 weeks | 5-8 weeks | Users, feed API, metadata, interactions, admin APIs, permissions, jobs. |
| AI generation pipeline | 3-6 weeks | 3-6 weeks | Prompt/script flow, provider integration, job queue, retries, cost tracking, status handling. |
| Video hosting integration | 1-3 weeks | 5-10 weeks | Bunny is much faster because it handles processing. R2 needs custom encoding and playback assets. |
| Moderation/quality checks | 2-4 weeks | 2-4 weeks | Automated checks, manual review queue, report handling, takedown flow. |
| Feed ranking/recommendation MVP | 2-4 weeks | 2-4 weeks | Start with rules: freshness, completion, likes, reports, categories, user preferences. |
| Analytics/error reporting | 2-4 weeks | 3-5 weeks | User analytics, playback errors, generation failures, provider errors, cost dashboards. |
| Admin/content dashboard | 2-4 weeks | 2-4 weeks | Review generated clips, approve/reject, inspect errors, manage categories. |
| QA, polish, launch prep | 2-4 weeks | 3-5 weeks | Device testing, playback testing, moderation tests, load checks, bug fixing. |
Total MVP Estimate
| Architecture | Estimated build time | Difficulty | Main reason |
|---|---|---|---|
| Bunny Stream | 10-16 weeks | Medium | Video processing and delivery are mostly handled by Bunny. |
| Cloudflare R2 DIY | 16-26 weeks | Hard | The team must build and operate the video processing pipeline. |
These ranges assume work happens in parallel. If one person is building everything sequentially, the timeline is harder to estimate reliably because it depends heavily on the developer's experience with mobile video feeds, backend systems, AI provider integrations, and video infrastructure.
| Architecture | Solo/mostly solo estimate |
|---|---|
| Bunny Stream | Undetermined; likely significantly longer than the team estimate |
| Cloudflare R2 DIY | Undetermined; likely significantly longer and higher risk than the team estimate |
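For context, summing the per-area ranges from the timeline table gives the total effort if no work overlapped. The sketch below does that sum; the result is a rough indicator of total effort in weeks, not a solo calendar estimate, since a solo developer may be faster or slower than the table assumes in any given area.

```python
# (low, high) weeks per area from the timeline table:
# first pair is Bunny Stream, second pair is Cloudflare R2 DIY.
AREAS = {
    "Product/design planning": ((1, 2), (1, 2)),
    "Frontend/mobile app": ((4, 8), (4, 8)),
    "Backend/API": ((4, 7), (5, 8)),
    "AI generation pipeline": ((3, 6), (3, 6)),
    "Video hosting integration": ((1, 3), (5, 10)),
    "Moderation/quality checks": ((2, 4), (2, 4)),
    "Feed ranking/recommendation MVP": ((2, 4), (2, 4)),
    "Analytics/error reporting": ((2, 4), (3, 5)),
    "Admin/content dashboard": ((2, 4), (2, 4)),
    "QA, polish, launch prep": ((2, 4), (3, 5)),
}

BUNNY, R2 = 0, 1


def total_effort_weeks(architecture: int) -> tuple[int, int]:
    """Sum the low and high ends of every area for one architecture."""
    low = sum(ranges[architecture][0] for ranges in AREAS.values())
    high = sum(ranges[architecture][1] for ranges in AREAS.values())
    return low, high
```

The sums come to 23-46 weeks for Bunny Stream and 30-56 weeks for Cloudflare R2 DIY, compared with the parallel team estimates of 10-16 and 16-26 weeks.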
Frontend Estimate
Frontend/mobile work is likely 4-8 weeks for an MVP.
Main pieces:
- Vertical swipe video feed.
- Smooth video preloading.
- Like/comment/share/save/report controls.
- Category or interest selection.
- Basic profile/account screens.
- Login/signup.
- Empty/loading/error states.
- Basic admin dashboard if web-based.
Main hurdles:
- Video feed performance.
- Smooth autoplay and preloading.
- Avoiding high data usage.
- Handling playback failures cleanly.
- Making the app feel polished enough for entertainment content.
Backend Estimate
Backend work is likely 4-7 weeks for a Bunny-based MVP and 5-8 weeks for Cloudflare R2 DIY, matching the timeline table.
Main pieces:
- User accounts.
- Video metadata.
- Feed API.
- Like/comment/share/save/report APIs.
- Admin review APIs.
- AI generation jobs.
- Provider API integrations.
- Retry/error handling.
- Cost tracking.
- Moderation states.
- Analytics events.
Main hurdles:
- Coordinating async jobs: generation, moderation, upload, encoding, publishing.
- Keeping state clean when providers fail or return bad output.
- Tracking per-clip cost and provider reliability.
- Making sure unpublished/failed/rejected videos never leak into the public feed.
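A common way to keep async job state clean is an explicit state machine, with the public feed filtering on a single published state. The sketch below is one possible shape; the state names and allowed transitions are assumptions for illustration, not a design taken from the notes.

```python
from enum import Enum


class ClipState(Enum):
    QUEUED = "queued"
    GENERATING = "generating"
    IN_REVIEW = "in_review"
    PROCESSING = "processing"  # upload and encoding
    PUBLISHED = "published"
    REJECTED = "rejected"
    FAILED = "failed"


# Allowed next states for each state; anything else is a bug.
ALLOWED = {
    ClipState.QUEUED: {ClipState.GENERATING},
    ClipState.GENERATING: {ClipState.IN_REVIEW, ClipState.FAILED},
    ClipState.IN_REVIEW: {ClipState.PROCESSING, ClipState.REJECTED},
    ClipState.PROCESSING: {ClipState.PUBLISHED, ClipState.FAILED},
    ClipState.FAILED: {ClipState.QUEUED},       # retry
    ClipState.PUBLISHED: {ClipState.REJECTED},  # takedown after a report
    ClipState.REJECTED: set(),
}


def transition(current: ClipState, new: ClipState) -> ClipState:
    """Return the new state, or raise if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new


def public_feed(clips: dict[str, ClipState]) -> list[str]:
    """Only published clips are ever eligible for the public feed."""
    return [clip_id for clip_id, state in clips.items() if state is ClipState.PUBLISHED]
```

Because every path to `PUBLISHED` passes through review and processing, a failed or rejected clip cannot reach the feed without an explicit, logged transition.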
AI Pipeline Estimate
AI generation workflow is likely 3-6 weeks for the first usable version.
Main pieces:
- Prompt templates.
- Script/story generation.
- Optional image/reference generation.
- Video generation provider integration.
- Job queue.
- Retries.
- Error classification.
- Cost logging.
- Quality status.
Main hurdles:
- Generating clips that are actually entertaining, not just technically valid.
- Keeping characters/scenes consistent.
- Handling provider limits, failures, and slow generation.
- Preventing cost runaway from retries or bad prompts.
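Cost runaway from retries can be limited with a per-clip budget that caps both the number of attempts and the total spend. A minimal sketch; the class name, limits, and example costs are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class RetryBudget:
    max_attempts: int
    max_cost_usd: float
    attempts: int = 0
    spent_usd: float = 0.0

    def can_attempt(self, estimated_cost_usd: float) -> bool:
        """Allow another attempt only if both limits would still hold."""
        within_attempts = self.attempts < self.max_attempts
        within_cost = self.spent_usd + estimated_cost_usd <= self.max_cost_usd
        return within_attempts and within_cost

    def record(self, cost_usd: float) -> None:
        """Register one finished attempt and what it cost."""
        self.attempts += 1
        self.spent_usd += cost_usd


# Example: a clip that costs $0.40 per attempt with a $1.00 budget
# gets two attempts, because a third would push spend to $1.20.
budget = RetryBudget(max_attempts=3, max_cost_usd=1.00)
made = 0
while budget.can_attempt(0.40):
    budget.record(0.40)
    made += 1
```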
Hosting Estimate Difference
Bunny Stream:
- Estimated integration: 1-3 weeks.
- Upload approved videos.
- Wait for processing.
- Store playback URLs/statuses.
- Use Bunny playback in the app.
- Test playback and errors.
Cloudflare R2 DIY:
- Estimated integration: 5-10 weeks.
- Store files in R2.[S6]
- Build encoding/transcoding jobs.[S6]
- Generate HLS/MP4 playback assets.[S6]
- Generate thumbnails.[S6]
- Manage cache behavior.
- Build playback URL logic.
- Test browser/mobile compatibility.
- Build more custom analytics around playback.
This is the largest build-time difference between the two architectures.
Biggest Timeline Risks
- Video generation quality is not good enough for repeat viewing.
- Provider APIs are slow, unstable, expensive, or limited.
- The product needs stronger moderation than expected.
- Feed performance is poor on mobile.
- Cloudflare R2 DIY takes longer because video processing has many edge cases.
- Analytics are not detailed enough to understand why users skip or leave.
- Content supply becomes a bottleneck if generation is too slow or too expensive.
Practical Build Recommendation
For the first MVP, Bunny Stream is the faster path. Based on the total estimates (10-16 weeks versus 16-26 weeks), it likely reduces the build by roughly 6-10 weeks compared with Cloudflare R2 DIY because the team does not need to build encoding, adaptive streaming, and playback infrastructure from scratch.[S6]
Cloudflare R2 DIY is better treated as a later cost-optimization project once the product has proven users, retention, and content quality.
Architecture Summary: Cloudflare R2 DIY vs Bunny Stream
Only Cloudflare R2 DIY and Bunny Stream are included here, per request.
Option 1: Cloudflare R2 DIY Video Pipeline
Difficulty: Hard.
Cost profile: Very attractive. For the 1 million full-view example, expected cost is about $0-$18 depending on request patterns. Serving a single MP4 may remain near the free tier, while HLS chunking could create 45-60 requests per view and add about $13-$18.[S6]
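The $13-$18 range can be reproduced from request counts. In the sketch below, the rate of $0.36 per million read requests and the free allowance of 10 million requests per month are assumptions: they are believed to reflect R2 read-operation pricing and they reproduce the range in the notes, but they should be rechecked against [S6c]. The function name is illustrative.

```python
def r2_read_request_cost_usd(
    views: int,
    requests_per_view: int,
    usd_per_million: float = 0.36,        # assumed read-request rate, recheck [S6c]
    free_requests: int = 10_000_000,      # assumed monthly free allowance, recheck [S6c]
) -> float:
    """Estimated monthly cost of read requests for a given view count."""
    total_requests = views * requests_per_view
    billable = max(0, total_requests - free_requests)
    return billable / 1_000_000 * usd_per_million


single_mp4 = r2_read_request_cost_usd(1_000_000, 1)   # stays inside the free allowance
hls_low = r2_read_request_cost_usd(1_000_000, 45)     # about $12.60
hls_high = r2_read_request_cost_usd(1_000_000, 60)    # about $18.00
```

The estimate covers read requests only. Storage, write operations, and any compute used for encoding would be added on top.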
What needs to be built:
- Upload flow for generated videos.
- Storage in Cloudflare R2.
- Encoding/transcoding pipeline.
- HLS generation if adaptive streaming is needed.
- Thumbnail generation.
- Metadata management.
- Playback URLs and access control.
- Player behavior and compatibility testing.
- Caching strategy.
- Moderation and takedown workflow.
- Analytics for views, completion, bandwidth, and failures.
Main hurdles:
- Encoding pipeline: R2 is storage, not a video platform. The app needs a separate process to convert raw videos into playback-ready files.
- Adaptive playback: If the product needs smooth playback across network conditions, HLS/DASH packaging and multiple renditions must be generated and maintained.
- Operational complexity: Failed encodes, retries, queues, storage lifecycle, and cleanup need to be handled.
- Player quality: The team must validate browser/device playback, seeking, buffering behavior, subtitles/audio handling, and mobile edge cases.
- Analytics: Video-specific metrics are not automatically provided at the same level as a managed video platform.
- Abuse and moderation: A public video product needs workflows for copyright, illegal content, takedowns, and content review.
Blockers:
- No built-in encoding/transcoding.[S6]
- No built-in video player platform.[S6]
- No built-in adaptive streaming workflow.[S6]
- No built-in video analytics comparable to managed platforms.[S6]
- Requires engineering time before launch.[S6]
Best fit:
Cloudflare R2 DIY is best if cost is the highest priority and the team can afford to build and operate a custom media pipeline. It becomes more attractive at scale, but only if the engineering complexity is acceptable.
Option 2: Bunny Stream
Difficulty: Easy to Medium.
Cost profile: More expensive than Cloudflare R2 DIY, but still relatively low. For the 1 million full-view example, the expected delivery cost is about $150 on the Volume network.[S6]
What Bunny Stream provides:
- Video upload handling.[S6]
- Encoding/transcoding.[S6]
- Stream-ready delivery.[S6]
- Player-oriented video infrastructure.[S6]
- CDN-backed delivery.[S6]
- Simpler operational model than building on object storage directly.[S6]
Main hurdles:
- Higher delivery cost than Cloudflare R2 DIY.
- Less control over the low-level media pipeline.
- Platform dependency/vendor lock-in.
- Need to confirm API behavior, limits, encoding presets, quality controls, regions, and pricing details before committing.
- Need to validate that playback quality and latency match the expected user experience.
Blockers:
- Cost is not as low as R2 DIY at large scale.
- Architecture depends on Bunny's video platform capabilities and pricing staying acceptable.
- Custom video processing needs may be constrained by Bunny's supported workflow.
- Migration away later could require reprocessing or moving a large media library.
Best fit:
Bunny Stream is best if speed to launch matters more than achieving the absolute lowest delivery cost. It removes much of the early video infrastructure work and lets the product focus on content, feed logic, UX, moderation, and monetization.
Recommendation
For an early product, Bunny Stream is the easier architecture because it removes the largest video infrastructure blockers. It is more expensive than Cloudflare R2 DIY, but the difference in the example is about $150 versus $0-$18 per 1 million full views, which may be worth it if it saves weeks of engineering work and reduces launch risk.[S6]
Cloudflare R2 DIY is the better long-term cost-optimization path if the product proves demand and video delivery cost becomes a major margin issue. It should be treated as a later optimization unless the team already has strong video pipeline experience.
Sources
These are the external web sources used for the cited pricing, availability, policy, and hosting claims in this report. Pricing and product pages can change, so the citations should be rechecked before any financial commitment.
- [S1] xAI Docs: grok-imagine-video - Grok Imagine video API pricing, including 480p/720p per-second output pricing and image/video input pricing.
- [S2a] Google Cloud Vertex AI pricing: Veo; [S2b] GIGAZINE report on Veo 3.1 Lite pricing - Official Veo 3.1 and Veo 3.1 Fast pricing, plus public reporting for Veo 3.1 Lite 720p/1080p pricing.
- [S3] Kling 3.0 pricing math reference - Online Kling 3.0 Standard/Pro per-second pricing reference. The official Kling site was not accessible to automated retrieval during this update.
- [S4] Seedance 2.0 pricing - Seedance 2.0 credits-per-second table by model/resolution and credit package pricing.
- [S5a] Shaike.ai; [S5b] AIFlixHub; [S5c] GenFlix; [S5d] GenFlix FAQ - Public pages reviewed for AI-film/community/platform positioning and whether a direct catalog distribution API is presented.
- [S6a] YouTube API Services Developer Policies; [S6b] YouTube Help: ads on embedded videos; [S6c] Cloudflare R2 pricing; [S6d] Bunny Stream pricing; [S6e] Mux Video pricing - YouTube policy/embedded monetization details and hosting cost inputs for R2, Bunny Stream, and Mux.