Building a Multimodal Future
Open Ecosystems and Data at Play
[Image: Realistic bird studies, intricate psychedelic landscape]
PROPRIETARY & CONFIDENTIAL

Stability AI is a leader in open source generative AI

The open source ecosystem is mission critical infrastructure
- 85% of smartphones are Linux/Android powered
- 80% of Fortune 100 companies use Kafka
- 100% of 500 fastest supercomputers use Linux/Unix
- 90% of cloud infrastructure operates on Linux
[Chart: Popularity of open source database management systems, Open Source License vs. Commercial License; chart labels include 63% and 52%]

Closed versus Open
Closed AI Systems: No control, no ownership. Rent a 3rd party API to an opaque model that may eventually compete against you.
Open source foundational Models: You own everything. Transparent, open source methodologies with weights you can customize and take anywhere.
[Diagram labels: Value, Data, Applications, Start-ups]

Finding middle ground with escrow architectures

Open Source Advantage
[Image: Ant, macro lens photo]
Stable Suite of Models
- Own your models (weights)
- Interpretability, auditability, optimization, security
- Integrate within your environment/VPC
Benefits:
- Promotes equal knowledge access & economic gain
- Encourages innovation through global collaboration
- Empowers startups, fostering further innovation
- Enhances security via community accountability
- Tackles biases, misinformation & copyright concerns
- Promotes industry standards for better interoperability
- Prevents blackbox systems, increasing accountability

Open source flywheel effects
[Image: Photo of a wolf]
Stable Models are optimized on chip
- InstructPix2Pix: Follow image editing instructions
- Dreambooth: Textual inversion fine-tuning
- LoRA: Low-rank adaptation for fine-tuning
- Riffusion: Real-time music and audio generation
- InvokeAI: Leading creative engine & WebUI
- ControlNet: NN structure to control diffusion models

The cost & time to produce content is rapidly approaching zero
[Image: Cat made entirely of grapes]
To generate a high quality image with text: 30.0s | 2 and 0.5s | 0.2 (chart labels: 2022, 2023)

Significant improvements in capability and cost efficiency
Today: Stable Diffusion XL Beta on AWS SageMaker
- 2.6s: Seconds to generate an image on a ml.g5.xlarge*
- $0.0014: Cost per 1x 512x512 image
* Price and times quoted are industry benchmarks and not meant to be specific to Stability. g5.xlarge assumes 50% utilization.

Time to frustration
Prompt: An epic cg rendered fantasy shield logo where the game icon shield logo has stylized crossed swords and text: “NATE”
Models compared, one slide each: SD 1.5, SD 2.1, DALLE 2, SDXL v0.9 (two slides)

Emerging multimodal use cases

Stable Suite of Models, for every modality
- Stable Images: Stable Diffusion 2.X, Stable Diffusion XL, DeepFloyd “IF”
- Stable Audio: Stable Music, Stable Text-to-Voice (TBA)
- Stable 3D: Text-to-3D (Q3)
- Stable Language: StableLM, StableChat (Q3)
- Stable Video: Stable Video (Q4)
- Stable Code: StableCode (Q3)
[Image: Horse in Laser scanner data point cloud]

SDXL v0.9 is a leap forward in AI image generation
- Extended functionalities: image-to-image prompting, inpainting, outpainting
- 3.5B base model and 6.6B model ensemble pipeline
- Dual CLIP models (OpenCLIP ViT-G/14 and CLIP-ViT/L)
- High resolution (1024x1024) and depth
- Over 700,000 beta images generated

Data, the essential foundation for AI success
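
Returning to the SDXL Beta cost slide above: the per-image cost follows from the generation time, the instance's hourly price, and the utilization stated in the footnote. The sketch below shows that arithmetic only. The hourly rate is a hypothetical placeholder, since the deck does not state the rate behind its $0.0014 figure, and the function name `cost_per_image` is introduced here for illustration.

```python
def cost_per_image(seconds_per_image: float,
                   hourly_rate_usd: float,
                   utilization: float) -> float:
    """Instance cost attributed to one image.

    utilization is the fraction of each paid hour the instance spends
    generating images (the slide's footnote assumes 0.5).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    busy_seconds_per_hour = 3600 * utilization
    images_per_hour = busy_seconds_per_hour / seconds_per_image
    return hourly_rate_usd / images_per_hour


# ASSUMPTION: a round placeholder rate, not a quoted AWS price.
ASSUMED_HOURLY_RATE_USD = 1.00

# 2.6 s per image and 50% utilization are the figures from the slide.
estimate = cost_per_image(2.6, ASSUMED_HOURLY_RATE_USD, 0.5)
print(f"${estimate:.4f} per image")
```

With the placeholder rate of $1.00 per hour the result is about $0.0014, the same order as the slide's figure. Halving the utilization doubles the cost per image, which is why the footnote's utilization assumption matters as much as the generation time.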

Data + human in the loop
- Collected dataset with over half-a-million examples of text-to-image prompts and preferences
- Scoring function, PickScore, was created from this dataset, achieving a superhuman accuracy of 70.2% in predicting user preferences
- PickScore shows strong correlation with human preferences, outperforming the traditional FID metric in text-to-image evaluation
[Image: GUI screenshot]
[Chart caption: PickScore is more correlated with ground truth rankings, as determined by real users]

Continual sampling of human preferences
[Image: Glowing glass white silhouette playing violin]
Samplers: dpm_2, lms, dpm_adaptive, dpm_2_ancestral, uni_pc_bh2, dpmpp_sde, dpmpp_2m_sde, dpm_fast, dpmpp_2m, uni_pc, dpmpp_2s_ancestral, heun, euler, euler ancestral, ddim
Models: SDXL-vx-a, SDXL-vx-b, SDXL-vx-c, ..., SDXL-vx-i

Training and tuning rely on high quality data
Fine-tuning
- Image Variety: Emotion, Fashion, Hair Style, Head Rotation, Light, Body
- Complex subject = more images
- Simple subject = fewer images
- Quality, Quantity
Training Stable Diffusion
- Image Data Quality: SD 1.X: 2B, 12M aesthetic; SD 2.X: 5B (filtered)
- Synthetic data + human feedback
- CLIP: Contrastive Language-Image Pre-training
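
The accuracy quoted for PickScore on the human-in-the-loop slide is a pairwise measure: for each comparison, does the scoring function rank the image the user actually picked above the other one? A minimal sketch of that evaluation follows. The scorer `toy_score`, the record layout, and the sample data are all hypothetical stand-ins; the real PickScore is a learned model and is not reproduced here, and text descriptions stand in for images.

```python
from typing import Callable, Iterable, Tuple

# One human judgement: (prompt, image_a, image_b, choice), choice is "a" or "b".
# ASSUMPTION: images are represented by text descriptions for illustration.
Comparison = Tuple[str, str, str, str]


def preference_accuracy(score: Callable[[str, str], float],
                        comparisons: Iterable[Comparison]) -> float:
    """Fraction of comparisons where the higher-scored image is the human pick."""
    correct = 0
    total = 0
    for prompt, image_a, image_b, human_choice in comparisons:
        # Ties are resolved in favour of image "a".
        predicted = "a" if score(prompt, image_a) >= score(prompt, image_b) else "b"
        correct += predicted == human_choice
        total += 1
    if total == 0:
        raise ValueError("no comparisons given")
    return correct / total


def toy_score(prompt: str, image: str) -> float:
    """Hypothetical scorer: number of prompt words present in the description."""
    return float(len(set(prompt.split()) & set(image.split())))


toy_data = [
    ("red cat", "a red cat on a sofa", "a blue dog", "a"),
    ("red cat", "a green bird", "a red cat", "b"),
    ("tall tree", "a tall tower", "a tall tree", "a"),  # human disagrees with scorer
    ("tall tree", "a lake", "a tall oak tree", "b"),
]

accuracy = preference_accuracy(toy_score, toy_data)
print(f"pairwise accuracy: {accuracy:.2f}")
```

The same tally can be grouped by model and sampler to compare the combinations listed on the continual-sampling slide, assuming each comparison record also carries those two labels.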

Data challenges for video models
[Images: Lush jungle background; Urban town scene]
Data Storage
- Inefficient MP4 format = slow decoding
Data Cleaning
- Threshold of motion to filter
- Accurate clipping for fluent processing
Data Tagging & Variation
- Capture scene variation
- Maintain accurate context (middle frame)
Hardware Constraints
- Large batch sizes for stability
- CLIP image models 30k batch sizes

The future of creativity is already here
[Image: Rocket blasting into space]

The next generation of film, tv, animation & music, will be redefined by Generative AI
Dynamic & Interactive Content Personalized to Consumers
- Real-time adaptation of movies & shows with characters, scenes and whole storylines generated on-the-fly.
- Seamless dubbing and content translation to facilitate global accessibility and engagement.
- Hyper-personalized music & voice mixing to create the perfect composition.
Massive Leverage to Creators who have the Best Visions
- Large-scale ideation enabled across all building blocks of movie/show development.
- Democratized access to easy-to-use tooling to go from concept to high-quality content.
- Novel music and voices generated via a combination of text and existing tooling.
Example prompt: “Create a Jumanji themed world for my 5 years old son”

Three “types of open”
[Image: Technology pillars in a grassy field]
Open Models
- Freely available base models
- Weight files are similar to codecs
- Open, auditable variants
Open Source Interfaces
- Model inference and deployment
- FOSS developer community
- Builds the ecosystem
Open Data
- Data access is a challenge
- Essential for interpretability
- Future: national and synthetic datasets
Serving an optimised infrastructure and application stack for our customers to use our models.

Our Vision: Multimodal AI Models in an Open Ecosystem
With openness at our core
[Diagram labels: Train, Open Source Stable Building Blocks, Release, AI Solutions, Open Ecosystem, Craft, Standardize, Across Modalities]