State of AI Report
October 12, 2023
Nathan Benaich, Air Street Capital
#stateofai | stateof.ai

Introduction | Research | Industry | Politics | Safety | Predictions

About the authors

Nathan is the General Partner of Air Street Capital, a venture capital firm investing in AI-first technology and life science companies. He founded RAAIS and London.AI (AI community for industry and research), the RAAIS Foundation (funding open-source AI projects), and Spinout.fyi (improving university spinout creation). He studied biology at Williams College and earned a PhD from Cambridge in cancer research.

State of AI Report 2023 team

Othmane Sebbouh, Venture Fellow: Othmane is a Venture Fellow at Air Street Capital and an ML PhD student at ENS Paris, CREST-ENSAE and CNRS. He holds an MSc in management from ESSEC Business School and a Master in Applied Mathematics from ENSAE and Ecole Polytechnique.

Alex Chalmers, Platform Lead: Alex is Platform Lead at Air Street Capital. He was previously Associate Director at Milltown Partners, where he advised leading technology companies, including AI labs.

Corina Gurau, Venture Fellow: Corina is a Venture Fellow at Air Street Capital. She was previously an Applied Scientist at the autonomous driving company Wayve. She holds a PhD in AI from the University of Oxford.

Introduction

Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent
machines. We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.

The State of AI Report is now in its sixth year. Consider this report a compilation of the most interesting things we've seen, with the goal of triggering an informed conversation about the state of AI and its implications for the future. We consider the following key dimensions in our report:
- Research: Technology breakthroughs and their capabilities.
- Industry: Areas of commercial application for AI and its business impact.
- Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
- Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
- Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.

Produced by Nathan Benaich and the Air Street Capital team.

Definitions

Artificial intelligence (AI): a broad discipline with the goal of creating intelligent machines, as opposed to the natural intelligence that is demonstrated by humans and animals.

Artificial general intelligence (AGI): a term used to describe future machines that could match and then exceed the full range of human cognitive
ability across all economically valuable tasks.

AI agent: an AI-powered system that can take actions in an environment. For example, an LLM that has access to a suite of tools and has to decide which one to use in order to accomplish a task that it has been prompted to do.

AI safety: a field that studies and attempts to mitigate the risks (minor to catastrophic) which future AI could pose to humanity.

Computer vision (CV): the ability of a program to analyse and understand images and video.

Deep learning (DL): an approach to AI inspired by how neurons in the brain recognise complex patterns in data. The "deep" refers to the many layers of neurons in today's models that help to learn rich representations of data and achieve better performance.

Diffusion: an algorithm that iteratively denoises an artificially corrupted signal in order to generate new, high-quality outputs. In recent years it has been at the forefront of image generation.

Generative AI: a family of AI systems that are capable of generating new content (e.g. text, images, audio, or 3D assets) based on prompts.

Graphics Processing Unit (GPU): a semiconductor processing unit that enables a large number of calculations to be computed in parallel. Historically this was required for rendering computer graphics. Since 2012, GPUs have been adapted for training DL models, which also require a large number of parallel calculations.

(Large) language model (LM, LLM): a model trained on vast amounts of (often) textual data to predict the next word in a self-supervised manner. The term "LLM" is used to designate multi-billion-parameter LMs, but this is a moving definition.

Machine learning (ML): a subset of AI that often uses statistical techniques to give machines the ability to learn from data without being explicitly given the instructions for
how to do so. This process is known as "training" a "model" using a learning "algorithm" that progressively improves model performance on a specific task.

Model: an ML algorithm trained on data and used to make predictions.

Natural language processing (NLP): the ability of a program to understand human language as it is spoken and written.

Prompt: a user input, often written in natural language, that is used to instruct an LLM to generate something or take action.

Reinforcement learning (RL): an area of ML in which software agents learn goal-oriented behavior by trial and error in an environment that provides rewards or penalties in response to the actions (chosen by a "policy") they take towards achieving that goal.

Self-supervised learning (SSL): a form of unsupervised learning where manually labeled data is not needed. Raw data is instead modified in an automated way to create artificial labels to learn from. An example of SSL is learning to complete text by masking random words in a sentence and trying to predict the missing ones.

Transformer: a model architecture at the core of most state-of-the-art (SOTA) ML research. It is composed of multiple "attention" layers which learn which parts of the input data are the most important for a given task. Transformers started in NLP (specifically machine translation) and were subsequently expanded into computer vision, audio, and other modalities.

Model type legend: in the rest of the slides, icons in the top right corner indicate input and output modalities for the model.
- Input/output types: text, image, code, software tool use (text, code generation & execution), video, music, 3D, robot state.
- Model types: LLMs; multimodal LLMs; multimodal LLMs for robotics; text to code; text to software tool use; text to image; text to video; text to music; image to 3D; text to 3D.

Executive Summary

Research
- GPT-4 lands and demonstrates a capabilities chasm between proprietary models and the next-best open source alternatives, while also validating the power of reinforcement learning from human feedback.
- Efforts grow to clone or beat proprietary model performance with smaller models, better datasets, and longer context, powered by LLaMa-1/2.
- It's unclear how long human-generated data can sustain AI scaling trends (some estimate that data will
be exhausted by LLMs by 2025) and what the effects of adding synthetic data are. Videos and data locked up in enterprises are likely up next.
- LLMs and diffusion models continue to offer gifts to the life science community, producing new breakthroughs for molecular biology and drug discovery.
- Multimodality becomes the new frontier and excitement around agents of all flavors grows substantially.

Industry
- NVIDIA rips into the $1T market cap club with voracious demand for its GPUs from nation states, startups, big tech and researchers alike.
- Export controls rate-limit advanced chip sales to China, but major chip vendors create export-control-proof alternatives.
- Led by ChatGPT, GenAI apps have a breakout year across image, video, coding, voice, and CoPilots for everyone, driving $18B of VC and corporate investment.

Politics
- The world has divided into clear regulatory camps, but progress on global governance remains slow. The largest AI labs are stepping in to fill the vacuum.
- The chip wars continue unabated, with the US mobilising its allies, and the Chinese response remaining patchy.
- AI is forecast to affect a series of sensitive areas, including elections and employment, but we're yet to see a significant effect.

Safety
- The existential risk debate has reached the mainstream for the first time and intensified significantly.
- Many high-performing models are easy to jailbreak. To remedy RLHF challenges, researchers are exploring alternatives, e.g. self-alignment and pretraining with human preferences.
- As capabilities advance, it's becoming increasingly hard to evaluate SOTA models consistently. Vibes won't suffice.

Scorecard: Reviewing our predictions from 2022

- Prediction: A 10B parameter multimodal RL model is trained by DeepMind, 10x larger than Gato. Grade: NO. Evidence: So far there has been no publicly disclosed research along these lines.
- Prediction: NVIDIA announces a strategic relationship with an AGI-focused organisation. Evidence: Instead of one relationship, NVIDIA has ramped up its investment activities across many AGI-focused organisations, including Cohere, Inflection AI, and Adept.
- Prediction: A SOTA LM is trained on 10x more data points than Chinchilla, proving dataset scaling vs. parameter scaling. Grade: YES. Evidence: We don't know for sure, but GPT-4 was reportedly trained on 13T tokens vs. Chinchilla's 1.4T. Meta's LLaMa-2 was trained on 2T tokens.
- Prediction: Generative audio tools emerge that attract over 100,000 developers by September 2023. Grade: YES. Evidence: Both ElevenLabs and Resemble.ai claim over 1 million users each since launch.
- Prediction: GAFAM invests >$1B into an AGI or open source AI company (e.g. OpenAI). Grade: YES. Evidence: Microsoft invested a further $10B into OpenAI in Jan 2023.
- Prediction: Reality bites for semiconductor startups in the face of NVIDIA's dominance, and a high-profile start-up is shut down or acquired for <$100M.
- Prediction: >$100M is invested in dedicated AI Alignment organisations in the next year, as we become aware of the risk we are facing by letting AI capabilities run ahead of safety. Grade: YES. Evidence: Anthropic, an AI research and safety company, raised up to $4B in Sept 2023.
- Prediction: A major user-generated content site (e.g. Reddit) negotiates a commercial settlement with a start-up producing AI models (e.g. OpenAI) for training on their corpus of user-generated content. Grade: YES. Evidence: OpenAI has secured a 6-year license for access to additional Shutterstock training data (image, video and music libraries and associated metadata).

Section 1: Research
GPT-4 is out and it crushes every other LLM, and many humans

GPT-4 is OpenAI's latest large language model. In contrast with the text-only GPT-3 and its follow-ups, GPT-4 is multimodal: it was trained on both text and images and can, among other capabilities, generate text based on images. At 8,192 tokens when it was released, it had already exceeded the previous-best GPT-3.5 in possible input size. It is, of course, trained using RLHF. Equipped with these advances, GPT-4 is, as of the release of this report, the uncontested most generally capable AI model.

OpenAI did a comprehensive evaluation of GPT-4, not only on classical NLP benchmarks but also on exams designed to evaluate humans (e.g. the Bar exam, the GRE, and Leetcode). GPT-4 is the best model across the board.
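Exam-style evaluations like these can be reproduced with a small scoring harness. The sketch below is illustrative only: the `model` callable and the toy questions are hypothetical stand-ins for a real LLM API call and real exam data, not OpenAI's actual evaluation pipeline.

```python
def evaluate(model, exam):
    """Score a model on multiple-choice exam questions.

    `exam` is a list of (question, choices, answer-letter) tuples;
    `model` maps a formatted prompt to a single answer letter."""
    correct = 0
    for question, choices, answer in exam:
        prompt = question + "\n" + "\n".join(
            f"{letter}. {text}" for letter, text in zip("ABCD", choices))
        if model(prompt) == answer:
            correct += 1
    return correct / len(exam)

# Hypothetical exam items and a toy stand-in model that always answers "B".
exam = [
    ("2 + 2 = ?", ["3", "4", "5", "6"], "B"),
    ("Capital of France?", ["Paris", "Rome", "Berlin", "Madrid"], "A"),
]
print(evaluate(lambda prompt: "B", exam))  # → 0.5
```

In a real harness the `model` callable would wrap an API request, and accuracy would typically be converted into a human-referenced percentile, as OpenAI did for the bar exam.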
It solves some tasks that GPT-3.5 was unable to, like the Uniform Bar Exam, where GPT-4 scores in the 90th percentile compared to the 10th for GPT-3.5. On most tasks the added vision component had only a minor impact, but it helped tremendously on others. OpenAI reports that although GPT-4 still suffers from hallucinations, it is factually correct 40% more often than the previous-best ChatGPT model on an adversarial truthfulness dataset (generated to fool AI models).

Fueled by ChatGPT's success, RLHF becomes MVP

In last year's Safety section (Slide 100), we highlighted how Reinforcement Learning from Human Feedback (RLHF), used in InstructGPT, helped make OpenAI's models safer and more helpful for users. Despite a few hiccups, ChatGPT's success proved the technique's viability at a massive scale.

"RLHF involves humans ranking language model outputs sampled for a given input, using these rankings to learn a reward model of human preferences, and then using this as a reward signal to finetune the language model using RL."

In its modern form, RLHF dates back to 2017, when OpenAI and DeepMind researchers applied it to incorporate human feedback when training agents on Atari games and in other RL applications. RLHF is now central to the success of state-of-the-art LLMs, especially those designed for chat applications: Anthropic's Claude, Google's Bard, Meta's LLaMa-2-chat and, of course, OpenAI's ChatGPT. The typical steps of RLHF follow an initial step of supervised fine-tuning of a pre-trained language model, e.g. GPT-3. But RLHF requires hiring humans to evaluate and rank model outputs, and then modeling their preferences. This makes the technique hard, expensive, and biased, which has motivated researchers to look for alternatives. We will cover other issues of RLHF in the Safety section.

The false promise of imitating proprietary LLMs, or how RLHF is still king

Berkeley researchers show that fine-tuning small LLMs on the outputs of larger, more capable LLMs results in models which are stylistically impressive but which often produce inaccurate text. The researchers examine a range of LLMs of different sizes, pre-trained on varying amounts of data, and show that at a fixed model size, using more imitation data actually hurts the quality of the output; larger models, in turn, do benefit from imitation data. Using model size as a proxy for quality, the authors argue that more attention should be paid to better pre-training than to fine-tuning on more imitation data.

In the near future, RLHF seems here to stay. After careful ablation studies, Meta researchers concluded in their LLaMa-2 paper: "We posit that the superior writing abilities of LLMs, as manifested in surpassing human annotators in certain tasks, are fundamentally driven by RLHF".

Even so, researchers rush to find scalable alternatives to RLHF

In the wake of ChatGPT, many labs set out to answer the question: can we create models as capable and
safe as OpenAI's LLMs, but that drastically reduce human supervision? Anthropic proposed RL from AI feedback, which we cover in the Safety section. Other approaches do away with reinforcement learning entirely. In Less is More for Alignment (LIMA), Meta argues for using a few (1,000 in their paper) very carefully curated prompts and responses. According to human evaluations of model outputs, LIMA is competitive with GPT-4 in 43% of cases. In LLMs Can Self-Improve, Google researchers showed that LLMs can improve by training on their own outputs. In a similar vein, Self-Instruct is a framework in which a model generates its own instructions, input and output samples, and curates them to finetune its parameters. Yet another work in this direction is Meta's Self-Alignment with Instruction Backtranslation. Stanford researchers used Self-Instruct to generate instructions and outputs with GPT-3.5 and fine-tune Meta's LLaMa-7B.

The GPT-4 technical report puts the nail in the coffin of SOTA LLM research

OpenAI published a technical report on GPT-4 in which it didn't disclose any useful information for AI researchers, signalling the definitive industrialization of AI research. Google's PaLM-2 technical report suffered the same fate, while (OpenAI spinoff) Anthropic didn't bother releasing a technical report for its Claude models. "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar", OpenAI writes in the GPT-4 technical report published on arXiv. When Google released PaLM 2, its most capable LLM, the company wrote in the technical report: "Further details of model size and architecture are withheld from external publication." As the economic stakes and the safety concerns get higher (you can choose what to believe), traditionally open companies have embraced a culture of opacity about their most cutting-edge research.

...unless LLaMas reverse the trend

In February 2023, Meta released a series of models called LLaMa. At their release, they stood out as the most capable models trained exclusively on publicly available datasets. Meta initially granted access to the LLaMa model weights on demand, and only to researchers, but the weights were quickly leaked and published online. The LLaMa-1 models use regular transformers, with slight changes to the architecture; the authors also made a few changes to the optimizer and to the implementation of attention. As a result, "when training a 65B-parameter model, their code processes around 380 tokens/sec/GPU on 2048 A100 GPUs with 80GB of RAM. This means that training over their dataset containing 1.4T tokens takes approximately 21 days." The LLaMa-1 models outperform GPT-3 (the original one, not the InstructGPT variants) and are competitive with DeepMind's Chinchilla and Google's PaLM. LLaMa-1 didn't allow commercial use, prompting heavy criticism of the term "open-source" that Meta used to describe the model release. But a second LLaMa iteration
appeased most of the open-source community.

LLaMa sets off a race of open(ish) competitive Large Language Models

After Meta released LLaMa-1, other institutions joined the movement to release the weights of relatively large language models. A few of them stand out, like MosaicML's MPT-30B, TII UAE's Falcon-40B, Together's RedPajama, and Eleuther's Pythia. Meanwhile, another dynamic was taking place, in which the open-source community fine-tuned the smallest versions of LLaMa on specialized datasets and applied them to dozens of downstream applications. Mistral AI's 7B model also
recently emerged as the strongest small model. Notably, RedPajama aimed to exactly replicate LLaMa-1 in order to make it fully open-source. Falcon-40B came from a new entrant in the LLM sweepstakes, TII UAE, and was quickly made open-source. Falcon-180B was later released, but was notably trained on very little code and not tested on coding. Helped by parameter-efficient fine-tuning methods like LoRA (Low-Rank Adaptation of LLMs, initially from Microsoft), LM practitioners started fine-tuning these pre-trained LLMs for specific applications like (of course) chat. One example is LMSys's Vicuna, which is LLaMa fine-tuned on user-shared conversations with ChatGPT.

LLaMa-2: the most generally capable and publicly accessible LLM?

In July 2023, the LLaMa-2 series of models was released, giving (almost) everyone the right to commercial use. The base LLaMa-2 model is almost identical to LLaMa-1, but is further fine-tuned using instruction tuning and RLHF and optimized for dialogue applications. By September 2023, LLaMa-2 had almost 32M downloads. The pre-training corpus for LLaMa-2 has 2 trillion tokens (a 40% increase). For supervised fine-tuning, the researchers tried publicly available data, but what was most helpful was using a few (24,540) high-quality vendor-based annotations. For RLHF, they use binary comparison and split the RLHF process into prompts and answers designed to be helpful to the user and others designed
to be safe. LLaMa-2 70B is competitive with ChatGPT on most tasks except for coding, where it significantly lags behind. But CodeLLaMa, a version fine-tuned for code, beats all non-GPT-4 models (more on this later). Per Meta's terms, anyone (with enough hardware to run the models) can use the LLaMa-2 models,
as long as their commercial application didn't have more than 700M users at the time of LLaMa-2's release. (Figure: human evaluation of LLaMa-2 helpfulness vs. other open-source models.)

GPT and LLaMas win the popularity contest

ChatGPT has the highest number of mentions on X (5,430 times), followed by GPT-4 and LLaMA. While proprietary, closed-source models get the most attention, there's an increase in interest in LLMs that are open-source and allow commercial use. (Chart: mention counts for ChatGPT, GPT-4, LLaMA, and LLaMA 2.)

Trending topics

RLHF/instruction-tuning emerges as the most trending topic since the end of 2022.

Are emergent capabilities of language models a mirage?

Scaling laws that researchers developed for all types of ML models generally predict a smooth decrease in a model's loss as a function of its parameter count and number of training tokens. In contrast, it has often been observed that some of a model's capabilities emerge unpredictably when a given (unpredictable) scale
is surpassed. Some call this observation into question: emergent capabilities might be merely artifacts of researchers' choice of evaluation metrics. Others are not convinced and offer counterarguments to the points below. Stanford researchers found that emergent abilities appeared only under metrics that nonlinearly or discontinuously scale the model's per-token error rate. For example, 92% of reported emergent abilities on BIG-Bench (a comprehensive LLM benchmark) appeared under one of two discontinuous metrics. They test their hypotheses on new models and confirm that replacing nonlinear or discontinuous metrics with linear or continuous proxies results in continuous improvements rather than emergent capabilities.

Context length is the new parameter count

The AI community has extensively verified that when models are trained correctly, their parameter count is a proxy for their capabilities. But these capabilities are sometimes constrained by the size of the input that language models can process. Context length has consequently become an increasingly important theme of research. One of the most alluring promises of LLMs is their few-shot capability, i.e. the ability of an LLM to answer a request on a given input without further training on the user's specific use case. But that's hindered by a limited context length, due to the resulting compute and memory bottleneck. Several innovations have been used to increase the context length of LLMs. Some fundamentally make the memory footprint of attention smaller (FlashAttention). Others enable models to train on small contexts but run inference on larger ones (ALiBi), which is called length extrapolation, at the price of minimal finetuning and the removal of positional encodings. Other techniques worth looking into include RoPE and Positional Interpolation. Among long-context LLMs: Anthropic's Claude with 100K tokens, OpenAI's GPT-4 with 32K, MosaicML's MPT-7B with 65K+, and LMSys's LongChat with 16K. But is context all you need?

Lost in the Middle: long contexts (mostly) don't live up to the expectations

The race to the highest context length relies on the hypothesis that a larger context length will result in improved performance on downstream tasks. Research from Samaya.ai, UC Berkeley, Stanford, and LMSYS.org calls this hypothesis into question: when input length is long, even the best available language models can fail on some multi-document question answering and key-value retrieval tasks. The researchers found that model performance was better when the relevant information for the task occurred at the beginning or the end of the input, with a more or less dramatic dip in the middle depending on the model. They also found that model performance decreased as input length increased. The researchers examined the performance of the open models MPT-30B-Instruct (8K-token context) and LongChat-13B (16K), and the closed models gpt-3.5 (16K), Claude 1.3 (8K) and Claude 1.3-100K. They found that proprietary models struggled less than open ones.

Keeping up with high memory demands

FlashAttention introduces a significant memory saving by making attention's memory footprint linear instead of quadratic in sequence length. FlashAttention-2 further improves the computation of the attention matrix with fewer non-matmul FLOPs, better parallelism and better work partitioning. The result is a 2.8x training speedup for GPT-style models. Reducing the number of bits in the parameters reduces both the memory footprint and the latency of LLMs. The case for 4-bit precision: k-bit Inference Scaling Laws shows, across a variety of LLMs, that 4-bit quantisation is universally optimal for maximizing zero-shot accuracy while reducing the number of bits used. Speculative decoding enables decoding multiple tokens in parallel through multiple model heads rather than sequential forward passes, speeding up inference by 2-3x for certain models. SWARM Parallelism is a training algorithm designed for poorly connected and unreliable devices. It enables
training billion-scale LLMs on low-bandwidth networks and low-power GPUs while achieving high training throughput. Increased context length and large datasets require architectural innovations.
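To make the 4-bit quantisation idea concrete, here is a minimal sketch of symmetric absmax quantisation in pure Python. It is a toy illustration of the principle only, not the actual scheme from the k-bit Inference Scaling Laws work (which quantises real model weights block-wise with specialised data types):

```python
def quantize_absmax_4bit(weights):
    """Map floats to signed 4-bit integer codes via symmetric absmax scaling."""
    scale = max(abs(w) for w in weights) / 7.0  # largest magnitude maps to level 7
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.53, 0.88, -0.07, 0.41]
codes, scale = quantize_absmax_4bit(weights)
restored = dequantize(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)              # → [1, -4, 7, -1, 3]
print(round(max_err, 3))  # → 0.056
```

Each weight is stored in 4 bits instead of 32, at the cost of a small reconstruction error; in practice the quantisation is applied per block of weights so that one outlier does not blow up the scale for the whole tensor.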
Can small (with good data) rival big?

In a still largely exploratory work, Microsoft researchers showed that when small language models (SLMs) are trained on very specialized and curated datasets, they can rival models which are 50x larger. They also find that these models' neurons are more interpretable. One hypothesis for why small models often aren't as good as large ones, even on narrow tasks, is that they are "overwhelmed" when trained on very large, uncurated datasets. Assisted by GPT-3.5 and GPT-4, the researchers generated TinyStories, a synthetic dataset of very simple short stories that nonetheless capture English grammar and general reasoning rules. They then trained SLMs on TinyStories and showed that GPT-4 (which was used as an evaluation tool) preferred stories generated by a 28M-parameter SLM to those generated by GPT-2 XL (1.5B).
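Pairwise preference evaluations of this kind (one model's story vs. another's, judged by GPT-4) are typically aggregated into a win rate, with presentation order randomised to control for the judge's position bias. A minimal sketch, in which the `judge` callable and the toy data are hypothetical stand-ins for a real GPT-4 call and real model outputs:

```python
import random

def win_rate(outputs_a, outputs_b, judge):
    """Fraction of pairs where model A's output is preferred by the judge.

    Presentation order is randomised per pair so a judge that favors
    whichever answer comes first cannot systematically bias the result."""
    wins = 0
    for a, b in zip(outputs_a, outputs_b):
        a_first = random.random() < 0.5
        first, second = (a, b) if a_first else (b, a)
        verdict = judge(first, second)  # expected to return "first" or "second"
        if (verdict == "first") == a_first:
            wins += 1
    return wins / len(outputs_a)

# Toy judge that prefers the longer text; a real setup would prompt GPT-4.
toy_judge = lambda x, y: "first" if len(x) >= len(y) else "second"
stories_a = ["a long detailed story", "short", "another long story here"]
stories_b = ["tiny", "a much longer story", "ok"]
print(win_rate(stories_a, stories_b, toy_judge))  # → 0.6666666666666666
```

Model-based judges have known quirks (position bias, verbosity bias), which is why order swapping and multiple judgments per pair are standard precautions.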
In another work from the same group, the researchers selected a dataset of 7B tokens comprised of high-quality code and synthetic GPT-3.5-generated textbooks and exercises. They then trained several SLMs on this dataset, including the 1.3B-parameter phi-1, which they claim is the only sub-10B-parameter model to achieve 50% on HumanEval. They have since published the improved phi-1.5 version.

2022 Prediction: language models trained on huge amounts of data

In 2022, we predicted: "A SOTA LM is trained on 10x more data points than Chinchilla, proving dataset scaling vs. parameter scaling". Although OpenAI didn't confirm it, and we probably won't know for sure anytime soon, a sort of consensus seems to have been reached among experts about leaked information on the model size, architecture, and dollar cost of GPT-4. GPT-4 was reportedly trained on 13 trillion tokens, 9.3x more tokens than Chinchilla. The tiny corp founder George Hotz presented the most plausible rumour: "Sam Altman won't tell you that GPT-4 has 220B parameters and is a 16-way mixture model with 8 sets of weights", and Soumith Chintala, PyTorch co-founder, confirmed it. Neither the total size of the model nor the use of a Mixture-of-Experts model is unheard of. If the rumours are to be believed, no fundamental innovation underpins GPT-4's success.

Are we running out of human-generated data?

Assuming current data consumption and production rates hold, research from Epoch AI predicts that "we will have exhausted
the stock of low-quality language data by 2030 to 2050, high-quality language data before 2026, and vision data by 2030 to 2060." Notable innovations that might challenge the article's hypotheses are speech recognition systems like OpenAI's Whisper, which could make all audio data available to LLMs, as well as new OCR models like Meta's Nougat. It is rumored that plenty of transcribed audio data has already been made available to GPT-4.

Another perspective is that improving generative models opens up a way of expanding the pool of available training data via AI-generated content. We're nowhere near a definitive answer: synthetic data is becoming more helpful, but there is still evidence showing that in some cases generated data makes models forget. Despite seemingly infinite amounts of proprietary and publicly available data, the largest models are actually running out of data to train on, and are testing the limits of scaling laws. One way to alleviate this problem (which has been extensively explored in the past) is to train on AI-generated data, whose volume is only bounded by compute.

Breaking the data ceiling: AI-generated content

Researchers from Google fine-tuned the Imagen text-to-image model for class-conditional ImageNet, then generated one to 12 synthetic versions of ImageNet on which they
trained their models (in addition to the original ImageNet). They showed that increasing the size of the synthetic dataset monotonically improved the models' accuracy. Other researchers showed that the compounding errors from training on synthetic text online may result in model collapse, "where generated data end up polluting the training set of the next generation of models". The way forward might be carefully controlled data augmentation (as usual).

Disentangling the real and the fake, and surfacing the real behind the fake

As text and image generative models become ever more capable, the longstanding problem of identifying what is AI-generated, and whether it comes from a copyrighted source, becomes increasingly hard to solve. Research from the University of Maryland proposes a new technique for watermarking proprietary language model output, i.e. "inserting a hidden pattern in text that is imperceptible to humans, while making the text algorithmically identifiable as synthetic." The idea is to choose a few tokens at random and increase the probability of the LM generating them. They devise an open-source algorithm involving a statistical test which allows them to confidently detect watermarks. Google DeepMind launched SynthID, a tool which embeds a digital watermark directly into image pixels. While imperceptible to the human eye, it can identify Imagen-generated images.

Researchers from Google, DeepMind, ETH, Princeton, and UC Berkeley showed that Stable Diffusion (a model used by Stability AI, among others) memorizes individual images from its training set and emits them at generation time. The authors are able to extract 1,000+ images, including ones with trademarked company logos. They further show that diffusion models are much more prone to generating images from their training set than other generative models like GANs.

Breaking the data ceiling: overtraining

If we can't have more original training data, why not train more on what we have? Conflicting research indicates that the answer is, as always, it depends: training for one or two epochs will generally be optimal; in some cases, pushing for a few more epochs can help; but too many epochs generally equals overfitting. Before the large-scale deep learning era (say, post-GPT-2), most models were trained for multiple epochs over a given dataset. But as the size of models grew larger,
training for multiple epochs almost always resulted in overfitting, prompting most practitioners to train for a single epoch on the available data (which, for once, is the theoretically optimal thing to do).

#stateofai|31 Breaking the data ceiling: overtraining

#stateofai|32 Vibe check: evaluating general-purpose LLMs, leaderboards and "vibes"

The motto of the HELM benchmark is to evaluate as many things as you can, leaving the choice of specific tradeoffs to users. It evaluates models on 42 scenarios (benchmarks) across 59 metrics. Categories of metrics include accuracy, robustness, fairness, bias, etc.

As both open and closed LLMs multiply, users are left with a plethora of non-differentiated LLMs trained on more or less the same data. Based on challenging benchmarks, Stanford's HELM leaderboard and Hugging Face's LLM benchmark seem to be the current standard for comparing model capabilities. But beyond benchmarks or combinations thereof, with such flexible models, users seem to still prefer the more subjective "vibes". Contrary to HELM, which includes both open and closed LLMs, Hugging Face's benchmark only compares open LLMs, but it seems to be evaluated more often than HELM (evaluating the largest models is also much more costly). Despite relatively dynamic benchmarks, according to the omniscient machine learning source of truth, X/Twitter, users tend to disregard leaderboards and only trust their "vibes" when applying LLMs to their specific use case.

Both Unnatural CodeLLaMa and WizardCoder are trained not only on a large code pre-training dataset, but also using additional LM-generated instruction finetuning techniques adapted to code data. Meta used their Unnatural Instructions, while WizardLM used their EvolInstruct. Notably, CodeLLaMa is trained in a way that enables the model to do infilling (rather than only completion from past text), and all the CodeLLaMa models were released except for Unnatural CodeLLaMa.

Smaller LMs for code (including replit-code-v1-3b and StarCoder 3B) offer both low latency and good performance on code completion tasks. Their support for inference at the edge (e.g., ggml on Apple Silicon) has fostered the development of privacy-aware alternatives to GitHub Copilot. The leader in terms of coding abilities is unsurprisingly GPT-4, with Code Interpreter (now Advanced Data Analysis) leaving users in awe. Open alternatives like WizardLM's WizardCoder-34B and Unnatural CodeLLaMa hold up with ChatGPT in coding benchmarks, but their performance in production is still TBD.

#stateofai|33 State of LMs for code

DeepMind released AlphaDev, a deep RL agent based on AlphaZero that optimizes the low-level assembly code used to turn high-level code (e.g. in C++ or Python) into machine-readable binary code. Through simple deletes and edits to an existing algorithm, AlphaDev found a method that speeds up sorting small sequences by up to 70%. AlphaZero had been used to reach superhuman levels in chess, Go, and shogi, and even to improve chip design. AlphaDev reformulates code optimization as an RL problem: at time t, the state is a representation of the generated algorithm and of memory and registers; the agent then writes new instructions or deletes existing ones; its reward depends on both correctness and latency. The discovered algorithms for sort3, sort4, and sort5 led to improvements of 1.7% for sequences larger than 250K. These were open-sourced in the ubiquitous LLVM library. #stateofai|34
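AlphaDev's sort3/sort4/sort5 targets are short, fixed compare-and-exchange sequences. As a rough illustration of the kind of program being optimized, here is a textbook three-comparator sorting network sketched in Python (this is not AlphaDev's discovered assembly sequence):

```python
# Illustrative only: a fixed compare-exchange network for three elements,
# the kind of short, branch-free instruction sequence AlphaDev optimizes.

def compare_exchange(a, i, j):
    # Conditional-move style step: after this, a[i] <= a[j].
    lo, hi = min(a[i], a[j]), max(a[i], a[j])
    a[i], a[j] = lo, hi

def sort3(a):
    # Three compare-exchanges suffice to sort any 3-element input.
    compare_exchange(a, 0, 1)
    compare_exchange(a, 1, 2)
    compare_exchange(a, 0, 1)
    return a
```

Because the instruction sequence is fixed regardless of the input, correctness can be checked exhaustively over all input orderings, which is also how an RL reward on correctness can be computed cheaply.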
Interestingly, through careful prompting, a researcher managed to make GPT-4 come up with a similar (very simple) optimization to AlphaDev's for sort3.

AlphaZero is DeepMind's gift that keeps on giving, now for low-level code optimization

The quality of a prompt highly influences task performance. Chain of Thought prompting (CoT) asks the LLM to additionally output intermediate reasoning steps, which gives a boost in performance. Tree of Thought (ToT) further improves on that by sampling multiple times and representing the "thoughts" as nodes in a tree structure. The tree structure of a ToT can be explored with a variety of search algorithms. In order to leverage this search, the LLM also needs to assign a value to each node, for instance by classifying it as one of "sure", "likely" or "impossible". Graph of Thought (GoT) turns this reasoning tree into a graph by combining similar nodes.

#stateofai|35 Where are we prompting? Take a deep breath, it's getting sophisticated

It turns out that LLMs are also great prompt engineers. Auto-CoT matches or exceeds the performance of CoT on 10 reasoning tasks. Automatic Prompt Engineer (APE) shows the same on 19/24 tasks. APE-engineered prompts are also able to steer models towards truthfulness and/or informativeness. Optimization by Prompting (OPRO) shows that optimized prompts outperform human-designed prompts on GSM8K and Big-Bench Hard by a significant margin, sometimes over 50%.

Downstream tasks are highly dependent on underlying LLM performance. However, changes to the same version of GPT models are not announced by OpenAI, despite them being continuously updated. The same LLM version has been reported by users to have drastically different performance over time, so everyone has had to continuously monitor performance as well as update carefully curated prompts. The "How is ChatGPT's Behaviour Changing over Time?" report shows that the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 varied in performance on tasks like math questions (figure below), sensitive questions, opinion surveys, knowledge questions, generating code, US Medical License tests and visual reasoning.

#stateofai|36 Prompt engineering: trial and error

The most immediate way LLMs can have an impact on the economy today is when they are enabled to execute calls to diverse external tools. The most obvious tool is a web browser, allowing a model to stay up to date, but practitioners are fine-tuning language models on API calls to enable them to use virtually any possible tool. One example of tool-using LLMs is Meta and Universitat Pompeu Fabra's Toolformer, where researchers train a GPT-J-based model in a self-supervised manner "to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction." Notably, during training, Toolformer samples API calls and only retains the ones which result in reducing the training loss.

#stateofai|37 Welcome, Agent Smith: LLMs are learning to use software tools

Some models are more narrowly focused, like Google's Mind's Eye, where models run a physics simulation to answer physics reasoning questions, while others extended this approach to tens of thousands of possible external tools. LLMs which are able to use external tools are now commonly referred to as "agents". Stepping out from academic research, we have seen multiple tools devised by industry and the open source community, most notably ChatGPT plugins, Auto-GPT and BabyAGI.

Capable of code generation and execution, LLMs can be powerful planning agents in open-ended worlds. The best example of this is Voyager, a GPT-4-based agent capable of reasoning, exploration and skill acquisition in Minecraft. By iteratively prompting GPT-4 (LLMs still struggle at one-shot code generation), Voyager produces executable code to complete tasks. Note that GPT-4 has most likely seen a significant amount of Minecraft-related data, so this approach might not generalise to other games.

#stateofai|38 Open-ended learning with LLMs

The agent interacts with the environment through explicit JavaScript code via the Minecraft API. If the generated code succeeds at the task, it is stored as a new skill; otherwise GPT-4 gets prompted again with the error. GPT-4 generates the task curriculum based on Voyager's state to encourage it to solve progressively harder tasks. Without any training, Voyager obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA.
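The Voyager-style loop of proposing code, executing it, feeding errors back, and banking successes as skills can be sketched as follows. Everything here is a toy stand-in (the function names, the stubbed "LLM", and the exec-based environment are all hypothetical); the real system prompts GPT-4 for JavaScript run against the Minecraft API:

```python
# A toy sketch of a Voyager-style agent loop (all names hypothetical).

def iterative_agent(llm, execute, task, max_attempts=4):
    skills = {}        # skill library: task -> first program that worked
    feedback = None
    for _ in range(max_attempts):
        code = llm(task, feedback)      # propose code (re-prompt with errors)
        ok, result = execute(code)      # run it in the environment
        if ok:
            skills[task] = code         # success: store as a new skill
            return skills, result
        feedback = result               # failure: pass the error back
    return skills, None

# Toy stand-ins for the LLM and the environment:
def toy_llm(task, feedback):
    # First attempt is deliberately buggy; the "fix" arrives once the
    # error message is included in the prompt.
    return "1/0" if feedback is None else "result = 6 * 7"

def toy_execute(code):
    env = {}
    try:
        exec(code, env)
        return True, env.get("result")
    except Exception as e:
        return False, repr(e)

skills, result = iterative_agent(toy_llm, toy_execute, "compute the answer")
```

The key design point is that the environment's error message becomes part of the next prompt, and only verified programs enter the skill library.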
#stateofai|39 Reasoning with language model is planning with a world model

The world model can generate an action as well as predict the next state reached by taking that action. This produces a reasoning trace which makes the LM more coherent than Chain of Thought methods, which predict next actions but not next world states. The rewards are also obtained from the LM and used to maintain a state-action value function for planning with MCTS. While significantly more expensive, RAP outperforms Chain-of-Thought reasoning approaches on plan generation, math reasoning and logical reasoning. RAP on LLaMA-33B even outperforms CoT on GPT-4 in a Blocksworld setting. Reasoning has traditionally been thought of as searching a space of possible outcomes and picking the best one. By containing so much information about the world, LLMs offer the opportunity of generating this space (often called a world model) in which planning algorithms can explore. Reasoning via Planning (RAP) uses Monte Carlo Tree Search to find a high-reward reasoning path efficiently.

Another text-only agent based on GPT-4 is SPRING. It outperforms state-of-the-art RL baselines in open-world games with no training: it reads a game's original academic paper and plays the game through an LLM. RL has been the go-to for game-based problems like Minecraft and Crafter, despite being limited by high sample complexity and difficulty in incorporating prior knowledge. In contrast, the LLM processes the LaTeX source of the paper and reasons through a QA framework (a directed acyclic graph with questions as nodes and dependencies as edges) to take an environment action.

#stateofai|40 GPT-4 out-performs RL algorithms by studying papers and reasoning
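The QA-framework idea, answering a DAG of questions in dependency order with parent answers available as context, can be sketched with the standard library's graphlib (the question set and the answer function here are toy stand-ins, not SPRING's actual prompts):

```python
# A toy sketch of DAG-structured QA: questions are nodes, edges are
# dependencies, and each question is answered after its parents,
# with the parents' answers supplied as context.

from graphlib import TopologicalSorter

def answer_dag(questions, deps, answer_fn):
    # deps maps each question id to the set of question ids it depends on.
    # TopologicalSorter yields dependencies before dependents.
    answers = {}
    for q in TopologicalSorter(deps).static_order():
        context = {p: answers[p] for p in deps.get(q, ())}
        answers[q] = answer_fn(questions[q], context)
    return answers

# Toy stand-in for the LLM call: just records what context it was given.
def toy_answer(question, context):
    return f"answer({question!r}, given {sorted(context)})"

questions = {
    "q1": "What resources are nearby?",
    "q2": "What should the agent do next?",
}
deps = {"q1": set(), "q2": {"q1"}}
answers = answer_dag(questions, deps, toy_answer)
```

In SPRING the final node of the DAG asks which environment action to take, so the topological sweep ends with a single executable decision.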
In a new Visual Instruction Benchmark (VisIT-Bench) consisting of 592 queries with human-authored captions, vision-language models are tested against human-verified GPT-4 reference captions, and most fall short of expectations.

#stateofai|41 Vision-language models: GPT-4 wins (but API access is still limited)

According to human evaluators, the best model is LLaMa-Adapter-v2, despite it only winning against the GPT-4-verified reference captions in 27.4% of cases on VisIT-Bench. Earlier this year, a multimodal model that stood out was BLIP-2 from Salesforce. It was released early (before GPT-4) and had better performance than the closed-source Flamingo on VQAv2 while having 54x fewer trainable parameters. It uses an off-the-shelf frozen LLM and an off-the-shelf frozen pre-trained image encoder, and only trains a small transformer. However, its improved variant InstructBLIP has a win rate of only 12.3% against GPT-4 reference captions on VisIT-Bench.

Two methods, VisProg and ViperGPT, show how, given an input natural language query about an image, an LLM can decompose it into a sequence of interpretable steps that call predefined API functions for visual tasks. The visual programming approach aims to build general-purpose vision systems via compositional multi-step reasoning instead of end-to-end multitask training. Both methods use entirely off-the-shelf components.

#stateofai|42 Leveraging LLMs and world knowledge for compositional visual reasoning

An API for visual primitives calls into existing SOTA models (e.g. semantic segmentation, object detection, depth estimation). ViperGPT uses Codex to directly generate Python programs based on the API, which can be executed using a Python interpreter. VisProg prompts GPT-3 with examples of pseudocode instructions and interprets them as a visual program, relying on LLM in-context learning from examples. World knowledge in LLMs from training on internet-scale data is shown to aid in visual reasoning tasks (e.g. querying for a non-alcoholic drink in an image based on a detected brand). Both methods show state-of-the-art results across various complex visual tasks.

LINGO-1 is Wayve's vision-language-action model that provides driving commentary, such as information about the driving behaviour or the driving scene. It can also answer questions in a conversational manner. LINGO-1 can be a game changer in terms of explainability of end-to-end driving models, as well as improve reasoning and planning.

#stateofai|43 Leveraging LLMs for autonomous driving

PaLM-E is a 562-billion-parameter, general-purpose, embodied generalist model trained on vision, language and robot data. It can control a manipulator in real time while also setting a new SOTA on a VQA benchmark. Given its embodied intelligence advantage, PaLM-E is better at pure language tasks (particularly the ones involving geo-spatial reasoning) than text-only language models.

#stateofai|44 PaLM-E: a foundation model for robotics

The model combines PaLM-540B and ViT-22B and takes as input text, images and robot states, which are encoded into the same space as word token embeddings and then fed into a language model to perform next-token prediction.

Vision-language models can be fine-tuned all the way to low-level policies, showing impressive performance in manipulating objects. They also retain their ability to reason about web-scale data. RT-2 represents actions as tokens and trains vision-language-action models. Rather than naive finetuning on robot data only, RT-2 co-finetunes PaLI-X and PaLM-E on robot actions (6-DoF positional and rotational displacement of the robot end-effector).

#stateofai|45 From vision-language models to low-level robot control: RT-2

Internet-scale training enables generalisation to novel objects, interpreting commands not present in the robot training data, and semantic reasoning (figuring out what object to pick as an improvised hammer). For efficient real-time inference, RT-2 models are deployed in a multi-TPU cloud service. The largest RT-2 model (55B parameters) can run at a frequency of 1-3Hz.

RoboCat is a foundation agent for robotic manipulation that can generalise to new tasks and new robots, zero-shot or few-shot (100-1,000 examples), with impressive real-time performance on a variety of platforms. It's built on top of DeepMind's multi-modal, multi-task and multi-embodiment Gato. It uses a frozen VQ-GAN tokenizer trained on a variety of vision and control datasets. While Gato only predicted actions, RoboCat additionally predicts future VQ-GAN tokens. In terms of policy learning, the paper only mentions behaviour cloning. RoboCat is fine-tuned with a few demonstrations (via teleoperation) and re-deployed to generate new data for a given task, self-improving in subsequent training iterations. RoboCat can operate 36 real robots with different action specifications, in 253 tasks on 134 real objects, at an impressive speed (20Hz).

#stateofai|46 From vision-language models to low-level robot control: RoboCat

This is a first-time win for a robot in a competitive sport (first-person-view drone racing). Swift is an autonomous system that can race a quadrotor at the level of human world champions using only onboard sensors and computation. It won several races against three champions and set the fastest recorded time. Swift uses a combination of learning-based and more traditional techniques. It combines a VIO estimator with a gate detector that estimates the global position and orientation of the drone through a Kalman filter, to obtain an accurate estimate of the robot's state. Swift's policy is trained using on-policy model-free deep reinforcement learning in simulation, with a reward that combines progress towards the next gate and keeping it in the field of view (this increases pose estimation accuracy). The racing policy transfers well from sim to real when accounting for uncertainty in perception.
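For intuition, the predict/update cycle of the Kalman filtering Swift uses to fuse its VIO and gate-detector estimates can be shown in its simplest scalar form (a textbook sketch with illustrative noise values, not Swift's multivariate filter over position and orientation):

```python
# A scalar Kalman filter: blend a prediction with a noisy measurement,
# weighting by their relative uncertainties. Noise values are illustrative.

def kalman_step(x, P, z, q=1e-4, r=0.0025):
    # Predict: random-walk motion model, so the estimate is unchanged
    # and its variance grows by the process noise q.
    P = P + q
    # Update: the Kalman gain is the fraction of the innovation (z - x)
    # to absorb, given prediction variance P and measurement variance r.
    K = P / (P + r)
    x = x + K * (z - x)
    P = (1.0 - K) * P
    return x, P

# Track a constant true position of 3.0 from measurements corrupted by a
# deterministic +/-0.1 "noise" pattern (deterministic for reproducibility).
x, P = 0.0, 1.0
for i in range(200):
    z = 3.0 + (0.1 if i % 2 == 0 else -0.1)
    x, P = kalman_step(x, P, z)
```

The same predict/update structure generalises to the vector case (position, orientation, velocity) with matrices in place of the scalars.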
An autonomous system that races drones faster than human world champions

Map-building is an emergent phenomenon in the course of AI agents learning to navigate. It explains why we can feed neural networks images, with no explicit maps, and they can predict navigation policies. "The Emergence of Maps in the Memories of Blind Navigation Agents" shows that giving an agent knowledge of only ego-motion (the change in the agent's location and orientation as it moves) and the goal location is sufficient to successfully navigate to the goal. Note that this agent does not have any visual information as input, and yet its success rates are very similar to sighted agents'; only efficiency differs. The model doesn't have any inductive bias towards mapping and is trained with on-policy reinforcement learning. The only mechanism that explains this ability is the memory of the LSTM. It is possible to reconstruct metric maps and detect collisions solely from the hidden state of this agent.

#stateofai|48 The emergence of maps in the memories of blind navigation agents

Meta trained an AI agent to play a popular multiplayer strategy game called Diplomacy, which involves planning and negotiating in natural language with other players over multiple rounds. CICERO achieved double the average score of human players online and ranked in the top 10% of players who played more than one game. Fast parallel progress in strategic planning and language modeling allows for potentially great advancements at the intersection, with applications in human-AI cooperation. Meta tackles the game of Diplomacy as a benchmark for such progress. CICERO uses the dialogue history between players, as well as the board state and its history, to predict what everyone will do. It then iteratively refines these predictions using planning, then decides, according to a policy, which action it intends to take. CICERO then generates and filters candidate messages to communicate with players. The controllable dialogue model it uses is based on a 2.7B-parameter BART-like model fine-tuned on 40K online games of Diplomacy. CICERO uses a new iterative planning algorithm based on piKL, which improves the predictions of other players' moves after dialoguing with them.

#stateofai|49 CICERO masters natural language to beat humans at Diplomacy

Similar to last year (Slide 33), the race is between video diffusion and masked transformer models (although algorithmically the two are very similar). Last year's Make-A-Video and Imagen were based on diffusion, while Phenaki was based on a bidirectional masked transformer. VideoLDM is a latent diffusion model capable of high-resolution video generation (up to 1280 x 2048!). They build on pre-trained image diffusion models and turn them into video generators by fine-tuning with temporal alignment layers. MAGVIT is a masked generative video transformer. Similarly to Phenaki, it uses a 3D tokeniser to extract spatio-temporal tokens, and it introduces a novel masking approach. It currently has the best FVD on video generation benchmarks and is 250x faster than video diffusion.

#stateofai|50 The text-to-video generation race continues

Last year saw the emergence of a host of text-to-image generation models: DALL-E 2, Imagen, Parti, Midjourney, Stability and more. But controlling the generation requires experimenting extensively with prompts and custom syntax. This year has seen new methods enabling co-pilot-style capability for image generation and editing. InstructPix2Pix leverages pre-trained GPT-3 and Stable Diffusion to generate a large dataset of (input image, text instruction, generated image) triplets to train a supervised conditional diffusion model. Editing then happens in a feed-forward way without any per-image fine-tuning/inversion, enabling modifications in seconds. Masked inpainting methods such as Imagen Editor require providing the model with an overlay or "mask" to indicate the region to modify, alongside text instructions. Building on these approaches, startups such as Genmo AI's "Chat" provide a co-pilot-style interface for image generation with text-guided semantic editing.

#stateofai|51 Instruction-based editing assistants for text-to-image generation

A new NeRF contender based on 3D Gaussians shows impressive quality while also enabling real-time rendering.

#stateofai|52 Welcome, 3D Gaussian Splatting

MipNeRF360 (Barron '22): 0.06 fps, 48h training, PSNR 27.69. 3D Gaussian Splatting: 134 fps, 41min training, PSNR 27.21.

Instead of learning the parameters of a neural network, 3D Gaussian Splatting learns millions of Gaussian distributions (one for each 3D point) and performs rasterisation by calculating the contribution each Gaussian makes to each pixel in the final image. Areas that need more representational power use more Gaussians, while unnecessary computation is avoided in empty space, which is why, similarly to NeRFs, scenes look so beautifully detailed. It's now possible to render high-quality real-time (100 fps) novel views at 1080p resolution. *Note that Zip-NeRF has a training time of 53min and a PSNR of 28.54 on the same dataset (Multiscale 360).

NeRF-based generative models are a promising direction for large-scale creation of 3D assets. NeRFs have not only improved in speed and quality (see HyperDiffusion, MobileNeRF, Neuralangelo and DynIBAR) but also enabled GenAI to model 3D geometry.
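The per-pixel Gaussian accumulation behind 3D Gaussian Splatting can be sketched in a toy 2D form (illustrative only; the real method projects anisotropic 3D Gaussians with learned covariances and runs a tiled GPU rasterizer):

```python
# Toy 2-D splatting sketch: each projected Gaussian contributes an opacity
# falloff at a pixel, and contributions are alpha-composited front-to-back.

import math

def gaussian_alpha(px, py, g):
    # Isotropic 2-D Gaussian falloff at pixel (px, py); the real method
    # uses full anisotropic covariance matrices.
    d2 = (px - g["x"]) ** 2 + (py - g["y"]) ** 2
    return g["opacity"] * math.exp(-0.5 * d2 / g["sigma"] ** 2)

def render_pixel(px, py, gaussians):
    # Sort by depth, then composite front-to-back: each Gaussian's colour is
    # weighted by its alpha and by the light not yet absorbed (transmittance).
    color, transmittance = 0.0, 1.0
    for g in sorted(gaussians, key=lambda g: g["depth"]):
        a = gaussian_alpha(px, py, g)
        color += transmittance * a * g["color"]
        transmittance *= 1.0 - a
    return color

# Example: an opaque Gaussian in front fully occludes one behind it.
front = {"x": 0.0, "y": 0.0, "sigma": 1.0, "opacity": 1.0, "color": 0.8, "depth": 1.0}
back = {"x": 0.0, "y": 0.0, "sigma": 1.0, "opacity": 1.0, "color": 0.2, "depth": 2.0}
pixel = render_pixel(0.0, 0.0, [back, front])
```

Because the falloff decays exponentially, far-away Gaussians contribute essentially nothing to a pixel, which is what lets the real rasterizer skip empty space.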
#stateofai|53 NeRFs meet GenAI

DreamFusion and Score Jacobian Chaining were the first methods to use a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. Early attempts showed cartoonish-looking 3D models of single objects. RealFusion finetunes the diffusion prior on a specific image to increase that image's likelihood. SKED only alters a selected region of a NeRF, provided through a few guiding sketches. They preserve the quality of the base NeRF and ensure that the edited region respects the semantics of a text prompt. Instruct-NeRF2NeRF edits an entire NeRF scene rather than a region, instead of generating from scratch. They apply a latent diffusion model to each input image and iteratively update the NeRF scene, ensuring it stays consistent.

Zero-shot depth models have recently been used as conditioning for better image generation. This only requires relative depth prediction, while other downstream applications such as robotics require metric depth, which so far has not generalised well across datasets.

#stateofai|54 Zero-shot metric depth is here

"ZeroDepth: Towards Zero-Shot Scale-Aware Monocular Depth Estimation" is able to predict metric depth for images from different domains and with different camera parameters. They jointly encode image features and camera parameters, which enables the network to reason over the size of objects, and train in a variational framework. The depth network ends up learning scale priors that can be transferred across datasets. "ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth" is a relative depth model with an additional module fine-tuned on metric depth. This is the first model to train on multiple datasets without a significant drop in performance, able to generalise across both indoor and outdoor domains.

Taking inspiration from large language models, which are pre-trained on vast datasets and exhibit zero-shot capabilities via prompting, Meta researchers set out to build a model that enables general promptable segmentation: given any prompt, the model should be able to identify and segment any object in any image. Meta introduced a large-scale project called "Segment Anything", which included the release of 1B segmentation masks on an 11M-image dataset (SA-1B) and a segmentation model (SAM) with an Apache 2.0 commercial-use license. Meta tested SAM on 23 out-of-domain image datasets, outperforming existing SoTA in 70%+ of cases.

#stateofai|55 Segment Anything: a promptable segmentation model with zero-shot generalisation

The model has two components: (i) a heavyweight encoder (ViT) to compute a one-time image embedding; (ii) a lightweight interactive module (that can run on CPU in a browser) consisting of a prompt encoder that embeds the user prompt and a mask decoder that predicts the segmentation masks. A model-in-the-loop data engine was used to generate the training data, with the final SA-1B generated entirely automatically by applying SAM. Through prompt engineering, SAM can be applied to other tasks including edge detection, object proposal generation, and instance segmentation, and preliminary results were shown combining SAM + CLIP for text prompts.

DINOv2 is a self-supervised Vision Transformer model from Meta, producing universal visual features that can be used across a variety of image-level (e.g. classification) and pixel-level (e.g. segmentation) tasks without fine-tuning, and are competitive with SOTA open-source weakly-supervised alternatives. It is the first work to close the gap between self-supervised and weakly-supervised approaches. DINOv2 features are shown to contain information about object parts as well as semantic and low-level understanding of images.

#stateofai|56 DINOv2: the new default computer vision backbone

The authors made the training of self-supervised learning models more stable through additional regularisation methods and reduced the memory requirements, which enabled training larger models on more data for longer. They also provide compressed versions of the models obtained through distillation. Although any image can be used for training, a key component was curating the dataset and automatically balancing it across concepts (keeping 142M out of 1.2B source images). DINOv2 features can be used with linear classifiers to obtain strong results across many visual tasks.

Pangu-Weather is a 3D deep learning model with Earth-specific priors, trained on 39 years of global data, that can generate medium-range global weather forecasts. The system can be used for more accurate early-stage cyclone tracking vs the status quo. Skilful short-term precipitation predictions (nowcasting) today are blurry, prone to dissipation and slow. Medium-range global weather forecasts using the accurate numerical weather prediction method are computationally expensive. For both problems, learned methods and physics-informed models that incorporate relevant priors are able to deliver performance improvements preferred by professional meteorologists. New benchmark datasets such as Google's WeatherBench 2 help data-driven weather model development.

#stateofai|57 More accurate weather predictions, in the now(casts) and the longer ranges

NowcastNet is a nonlinear model that uses physical first principles and statistical-learning methods, unified under a deep generative model framework. Evaluated by 62 professional meteorologists from across
202、 China,the model ranks 1st in 71%of cases against leading methods.New models from Google,Meta,and the open source community significantly advance the quality of controllable music generation.stateof.ai 2023 Though not the best in terms of generated music quality,Riffusion was probably the most innov
203、ative model.Researchers fine-tuned Stable Diffusion on images of spectrograms,which are then converted into audio clips.With MusicLM,Google researchers“cast conditional music generation as a hierarchical seq2seq modeling task”.They are able to generate consistent music(24kHz)over several minutes.Sam
204、ples are available at https:/google-research.github.io/seanet/musiclm/examples/To our ears,Metas MusicGen strikes a better balance between adhering to text descriptions and generating a pleasant melody.It uses a single transformer LM and careful codebook interleaving techniques.Samples:https:/ai.hon
205、u.io/papers/musicgen/#stateofai|58Another year of progress in music generation Introduction|Research|Industry|Politics|Safety|Predictions Designing novel proteins from scratch such that they have desired functions or structural properties,de novo design,is of interest in both research and industry.I
206、nspired by their success in generative modelling of images and language,diffusion models are now applied to de novo protein engineering.stateof.ai 2023#stateofai|59Diffusion models design diverse functional proteins from simple molecular specifications Introduction|Research|Industry|Politics|Safety|
A model called RFdiffusion takes advantage of the high-precision, residue-level protein structure prediction capabilities of RoseTTAFold, fine-tuning it as the denoising network in a generative diffusion model using noised structures from the Protein Data Bank. Similar to AlphaFold 2, RFdiffusion trains best when the model conditions denoising on its previous predictions between timesteps. RFdiffusion can generate protein backbones with desired features, and ProteinMPNN can then be used to design sequences that encode these generated structures. The model can produce backbone designs for protein monomers, protein binders, symmetric oligomers, enzyme active-site scaffolding and more.

Learning the rules of protein structure at evolutionary-scale with language models
Atomic-level protein structure can now be predicted directly from amino acid sequences without relying on costly and slow multiple sequence alignment (MSA). To do so, a masked language modeling objective is applied over millions of evolutionarily diverse protein sequences, causing biological structure to materialize in the language model because structure is linked to sequence patterns. This model, Evolutionary Scale Modeling 2 (ESM-2), is used to characterize the structure of 617M metagenomic proteins (found in soil, bacteria, water, etc.). ESM-2 offers significant speedups compared to AlphaFold 2 (AF2): these results were produced in 2 weeks using a cluster of 2,000 GPUs. ESMFold is a fully end-to-end single-sequence structure predictor that adds a folding head to ESM-2. ESMFold structures are of AF2-grade quality as measured by TM-score, the accuracy of the prediction relative to the ground-truth structure.
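The masked-language-modeling objective described above can be illustrated with a toy sketch. This is a simplified, hypothetical helper (`mask_sequence` is an illustrative name; real ESM-2 training uses learned transformer embeddings and BERT-style 80/10/10 token corruption, omitted here):

```python
import random

def mask_sequence(seq, mask_rate=0.15, seed=0):
    """BERT-style masking over an amino acid sequence: hide a fraction of
    residues; a model must reconstruct them from the surrounding context,
    which forces it to internalise structure-linked sequence patterns."""
    rng = random.Random(seed)
    tokens, targets = [], {}
    for i, aa in enumerate(seq):
        if rng.random() < mask_rate:
            targets[i] = aa          # ground truth used by the training loss
            tokens.append("<mask>")
        else:
            tokens.append(aa)
    return tokens, targets

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example sequence
tokens, targets = mask_sequence(seq)
```

The language model is trained to predict each entry of `targets` from the masked token sequence; no structural labels are needed.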
Predicting the outcome of perturbing multiple genes without a cell-based experiment
Understanding how gene expression changes as a result of stimulating or repressing combinations of genes (i.e. perturbations) is important for unravelling biological pathways relevant to health and disease. But combinatorial explosion precludes us from running these experiments in living cells in the lab. Integrating deep learning with a knowledge graph of gene-gene relationships offers a solution. The graph-enhanced gene activation and repression simulator (GEARS) combines prior experimental knowledge to predict the gene expression outcome given unperturbed gene expression and the applied perturbation. For example, GEARS can be trained on post-perturbation gene expression profiles from one-gene and two-gene experiments, and then be tasked with predicting the post-perturbation gene expression for 5,460 pairwise combinations.

Pathogenic or not? Predicting the outcome of all single-amino acid changes
Individual changes in amino acid sequences that result from genetic variation ("missense variants") can either be benign or cause downstream problems for protein folding, activity or stability. Over 4M of these missense variants have been identified through human population-level genome sequencing. However, 98% of these variants lack any confirmed clinical classification (benign/pathogenic). A new system, AlphaMissense, makes use of AlphaFold predictions and unsupervised protein language modeling to close this gap. The AlphaMissense system is built by: (i) training on weak labels from population frequency data, avoiding circularity by not using human annotations; (ii) incorporating an unsupervised protein language modeling task to learn amino acid distributions conditioned on sequence context; and (iii) incorporating structural context by using an AlphaFold-derived system. AlphaMissense is then used to predict 71M missense variants, saturating the human proteome. Of these, 32% are likely pathogenic and 57% are likely benign. Additional resources include all 216M possible single amino acid substitutions across the 19,233 canonical human proteins.

Google's Med-PaLM 2 language model is an expert according to the USMLE
A year after releasing Med-PaLM, the first model to exceed a "passing" score on the US Medical Licensing Examination (USMLE), Med-PaLM 2 set a new SOTA result across more datasets as a result of base LLM improvements, medical domain finetuning and prompting strategies. In a pairwise ranking study on 1,066 consumer medical questions, Med-PaLM 2 answers were preferred over physician answers by a panel of physicians across eight of nine axes in the evaluation framework.
Next, Med-PaLM goes multimodal
To go beyond text-based medical Q&A, Google first created MultiMedBench, a 14-task dataset that includes medical Q&A, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. This dataset is used to train a large single multitask, multimodal version of Med-PaLM with a single set of model weights. The system exhibits novel emergent capabilities, such as generalisation to novel medical concepts and tasks. An alternative lighter-weight approach, ELIXR, was also proposed. ELIXR grafts language-aligned vision encoders onto a fixed LLM, which requires less compute to train and shows promise across tasks including visual QA, semantic search, and zero-shot classification.

Tweet storm: a SOTA pathology language-image pretrained model from medical Twitter
It's no secret that (quality) data is king for building capable AI systems, nowhere more so than in domains such as clinical medicine where (quality) data is expensive to produce. This work mines text-image pairs on Twitter to create the OpenPath dataset of 200+ pathology images paired with natural language descriptors. Inspired by OpenAI's Contrastive Language-Image Pretraining (CLIP) model, the authors create P(athology)LIP. Like CLIP, PLIP can perform zero-shot classification on unseen data, enabling it to distinguish several key tissue types. It can also be used to improve text-to-image and image-to-image retrieval of pathology images. Unlike other machine learning approaches in digital pathology that are predicated on learning from a fixed set of labels, PLIP can be applied more generally and is flexible to the changing nature of diagnostic criteria in pathology. Compared to CLIP, PLIP has 2-6x better Precision@10.
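A CLIP-style model such as PLIP is trained with a symmetric contrastive loss over paired image and text embeddings. Here is a minimal NumPy sketch under simplifying assumptions (random vectors stand in for encoder outputs; `clip_style_loss` is an illustrative name, not the paper's implementation):

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: for each image, the matching caption (the diagonal
    of the similarity matrix) should outscore every mismatched caption, and
    vice versa for each caption over images."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # pairwise cosine similarities
    labels = np.arange(logits.shape[0])

    def xent(l):                                  # softmax cross-entropy vs the diagonal
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 16))
aligned = clip_style_loss(emb, emb)         # every image/text pair matches
shuffled = clip_style_loss(emb, emb[::-1])  # pairings deliberately broken
```

With matched pairs the loss is near zero; breaking the pairing should raise it, which is the pressure that aligns the two encoders during training.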
Real world-inspired clinical system design for automated medical image analysis
Computer vision has been shown to be useful for breast cancer screening on mammograms and tuberculosis triaging. However, to enable practical and reliable use in the clinic, it is important to know when to rely on a predictive AI model and when to revert to a clinical workflow. Complementarity-Driven Deferral to Clinical Workflow (CoDoC) learns to decide whether to rely on a predictive AI model's output or defer to a clinical workflow instead. For breast cancer screening, CoDoC reduces false positives by 25% at the same false-negative rate compared to double reading with arbitration in the UK. Importantly, clinical workload is reduced by 66% as a result.
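CoDoC learns its deferral rule from data; purely to illustrate the decision interface, a hypothetical fixed-threshold sketch might look like this (function name and thresholds are assumptions for illustration, not the published method):

```python
def defer_decision(ai_score, lower=0.2, upper=0.8):
    """Use the AI's call only when it is confidently negative or positive;
    otherwise route the case to the standard clinical workflow.
    The thresholds here are illustrative; CoDoC learns its rule from data."""
    if ai_score <= lower:
        return "ai:negative"
    if ai_score >= upper:
        return "ai:positive"
    return "defer:clinician"

decisions = [defer_decision(s) for s in (0.05, 0.50, 0.95)]
```

The design point is that deferral is decided per case: confident predictions are automated, ambiguous ones keep the clinician in the loop, which is how workload drops without raising the false-negative rate.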
AI for science: medicine is growing fastest but mathematics captures the most attention
The top 20 scientific fields applying AI to accelerate progress span the physical, social, life and health sciences. Of all of them, Medicine shows the highest increase in the number of publications. We expect significant research breakthroughs in the foreseeable future as a result of AI's use in the sciences.

Most impactful research comes from very few places
70% of the most cited AI papers in the last 3 years have authors from US-based institutions and organisations.

Section 2: Industry

GPU demand sees NVIDIA print blowout earnings as it enters the $1T market cap club
Q2 23 data center revenue was a record $10.32B, up 141% from Q1 23 and up 171% from a year ago. The stock was bearish for 2022 even though annual revenue came in at $27B, a 61.4% increase from 2021. NVIDIA now commands a $1.1T market capitalisation, up from $8.5B (130x) 10 years ago.

Selling faster than Coachella: GPUs snapped up from upstart infra providers
CoreWeave, Lambda, and Crusoe Cloud, three selected NVIDIA partners that build and run GPU datacenters, together have tens of thousands of GPUs in their fleets. Lambda made 9 figures' worth of H100s available in its on-demand cloud and sold out in just over an hour. CoreWeave is one of the largest GPU operators in the market, with a scale similar to several hyperscalers. The company is fully booked through the end of the year on its build schedule and is signing contracts into Q1 2024.

Private companies are shoring up NVIDIA GPUs and wielding them as a competitive edge
Compute is the new oil in Gulf States?
Saudi Arabia's King Abdullah University of Science and Technology (KAUST) has allegedly purchased 3,000 H100s to build a supercomputer, Shaheen III, that should be operational by the end of 2023. Its LLM-focused researchers are primarily Chinese nationals who cannot access the US because their universities are restricted. Meanwhile, the United Arab Emirates' Technology Innovation Institute in Masdar City, which developed the Falcon LLM, is also said to be procuring compute resources from NVIDIA. Finally, Abu Dhabi-based G42 entered into a deal with US-based Cerebras to procure up to $900M worth of the company's wafer-scale compute systems and build 9 interconnected AI supercomputers. There is likely much more spend to come.

Compute Index: NVIDIA A100 clusters
The number of large-scale NVIDIA A100 GPU clusters has grown since last year, particularly at Tesla and Stability, alongside new clusters at Hugging Face.

Compute Index: NVIDIA H100 clusters
It's early days, but private and public companies are announcing new H100 infrastructure for large-scale model training. As of writing, Google and Inflection are not yet at full scale, and we understand others including OpenAI, Anthropic, Meta, Character.ai, Adept, Imbue, and more have significant capacity. We expect more to come online soon.
NVIDIA chips are used 19x more in AI research papers than all alternative chips combined
In last year's report, we began tracking the use of specific semiconductors in AI research papers and found that NVIDIA chips were cited vastly more than alternatives. In 2023, NVIDIA is even more popular: 31x more than FPGAs and 150x more than TPUs.

NVIDIA chips have remarkably long lifetime value: 5 years from launch to peak popularity
In 2023, all eyes were on NVIDIA's new H100 GPU, the more powerful successor to the A100. While H100 clusters are being built (not without hiccups), researchers are relying on the V100, A100 and RTX 3090. It is quite remarkable how much competitive longevity NVIDIA products have: the V100, released in 2017, is still the most commonly used chip in AI research. This suggests the A100, released in 2020, could peak in 2026, when the V100 is likely to hit its trough. The new H100 could therefore be with us until well into the next decade!

While NVIDIA is king, Cerebras ramps amongst the challenger crop
Cerebras, creator of the largest AI chip in the world, engaged in several open-source model training and dataset creation projects, which helped it gain traction with researchers versus its competitors. Overall, there is still a long road ahead for NVIDIA contenders.

Hyperscalers scale their spending on AI as a % of total capex
It is also rumored that NVIDIA is to ship between 1.5M and 2M H100s in 2024, up from the 500,000 expected this year.
Tesla marches towards a Top-5 largest compute cluster for AI in the world
In our Compute Index from 2022, Tesla ranked 4th based on its A100 GPU count. As of summer 2023, the company brought online a new 10,000-GPU H100 cluster, already making it one of the largest online to date.

More hyperscalers develop their own inference hardware for internal AI workloads
Meta announced MTIA, the company's first in-house accelerator, based on the open-source RISC-V architecture and designed for deep learning-based recommendation models. This is driven by the growing size and complexity of models deployed in production and the slow inference speeds offered by GPUs.
NVIDIA, Intel and AMD make export-control-proof chips for China
According to NVIDIA's CFO, China has historically accounted for 20-25% of NVIDIA's revenue from data centre-related products (Financial Times). As a result, as the US Commerce Department became increasingly aggressive with export controls on AI chips, NVIDIA (and its competitors) developed chips that fly right below the export-list thresholds. In late August 2022, NVIDIA's A100 and H100, their most powerful chips for AI applications, were added to the US Commerce Department's export control list and became out of reach for Chinese companies. By November, NVIDIA had already started advertising the A800 and H800 chips, which it designed to be below the performance threshold set by the US ban. Intel did the same with a new version of its Habana Gaudi 2 chip, and AMD expressed a similar intent. As a result, the likes of ByteDance and Baidu have ordered $1B worth of A800/H800 NVIDIA GPUs. There have also been reports of increasing A100/H100 GPU traffic into China, but on a much smaller scale.

Softbank re-lists Arm on the NASDAQ after its sale to NVIDIA was blocked
Arm, whose IP underpins the chips in 99% of the world's smartphones, is working to reposition itself as a player in the AI market. It has partnered with self-driving car company Cruise and with NVIDIA on its Grace Hopper chip (where Arm's tech acts in a supporting role). Back in 2020, we predicted that NVIDIA would fail to complete its acquisition of Arm. In September, Arm was relisted on the Nasdaq, achieving a valuation of $60 billion at the open. However, it won't be plain sailing. Revenue was flat over the last fiscal year, and 25% comes from Arm China, an independent subsidiary required for sales into the Chinese market. Arm may have the potential to raise its low royalty rates per device, considering its huge market share, but will need to balance this against growing open-source alternative architectures like RISC-V. As Arm does not sell physical chips, it has managed to swerve the impact of
sanctions so far, but as the US-China chip wars escalate, there is no guarantee this will last.

2022 Prediction: Generative AI applications grow in popularity
In 2022, we predicted: "Generative audio tools emerge that attract over 100,000 developers by September 2023." Both ElevenLabs (UK) and Resemble AI (US) exceeded that threshold. ElevenLabs now has over 2M registered users and is growing fast: it took half as long to gain its second million users as its first. Users have cumulatively uploaded over 10 years of audio content. Initially geared towards creators and publishers, ElevenLabs is now adapting to a wide range of use cases spanning AI agents, companionship, entertainment, and gaming. Another domain, product design, is seeing rapid integration of generative AI technology, to the benefit of fast-moving companies like Uizard. Uizard, a product design company powered by AI tools, said it recorded $3.2M ARR up to July 23, a 13x year-over-year increase. The company had crossed $1M ARR in April and went from $1M to $3M in 3 months.

Video too is a rapidly advancing frontier for GenAI. Founded in 2017, London-based Synthesia launched its AI-first video creator in 2020. The system generates multi-lingual avatars that enact a script, for use by consumers and enterprises alike. Once considered to be "fringe", Synthesia is now used by 44% of the Fortune 100 for learning and development, marketing, sales enablement, information security and customer service. Over 9.6M videos have been generated with the service since launch in 2020. (Chart note: 2020 data starts on 1 May; 2023 data stops on 1 Sept.)
OpenAI's ChatGPT is one of the fastest growing internet products

OpenAI is now printing real money at scale, but at what cost?
Only 12 months ago, the revenue projections made by OpenAI in the lead-up to its $10B fundraise were met with much scepticism. Today, the company is ripping past its targets. How long will this last? And at what cost?

Feeling the ChatGPT heat: education gets hit first and Chegg is fighting back
Chegg, an NYSE-listed company focused on improving learning and learning outcomes for students, was hit hard by the launch of ChatGPT. In May 2023, the company said: "In the first part of the year, we saw no noticeable impact from ChatGPT on our new account growth and we were meeting expectations on new sign-ups." Students who paid Chegg to practice exams and get homework feedback turned to ChatGPT instead. As a result, Chegg's share price plummeted 40%. In its August 2023 earnings call, the company said: "We've pivoted the company to harness AI to better serve learners." It is building internal LLMs in partnership with Scale AI.

Feeling the ChatGPT heat: coding is next, and developers are loving it!
Stack Overflow, the (pre-AI) de facto source for developers to find solutions to their coding problems, placed a ban on responses generated by ChatGPT and has suffered traffic losses as a result of ChatGPT's popularity. (Left figure credit: Andre Retterath)

Results are in: GitHub Copilot drives significant productivity gains for developers
If it's meant to be, it will be (no matter how long it takes). GitHub has finally launched its coding assistant, Copilot, trained on billions of lines of code, to a hugely positive reception. In Sept 2022, GitHub ran an experiment with 95 professional developers, split them randomly into two groups, and timed how long it took them to write an HTTP server in JavaScript. The experiment found significant productivity gains. In June 2023, GitHub reported data from 934,533 Copilot users. Interestingly, productivity dips a little before increasing significantly as Copilot users get acquainted with the tool, and the less experienced users benefit the most (a 32% productivity gain).

ChatGPT drives productivity in (repetitive, boring?) writing
A new MIT study supports popular wisdom: ChatGPT helps with writing. Specifically, for "mid-level professional writing", the study showed that, compared to a control group, workers using ChatGPT took 40% less time to complete their task, and output quality was measured to be 18% better.

Certain less obvious GenAI use cases have also gained significant traction
We've seen huge consumer interest in interacting with customised chatbots. A16z-backed Character.AI raised a $150M Series A and reported 200M monthly visits to its site ahead of the launch of its app. Many uses are benign, for example as grammar tools or in fanfiction communities, but we've seen commercial and ethical challenges: reports of users developing emotional dependencies on their bots, companies struggling with the trade-off between the popularity of explicit content and its implications for their brand, and claims of extremist content.

Text-to-image models: competition intensifies and integrations abound
After a breakout year in 2022 with the release of Stable Diffusion, Midjourney and Stability are still racing ahead with continuous improvements to their models. Though seemingly slower to react on the text-to-image front, OpenAI has released its best text-to-image model yet, DALL-E 3. And there are still new entrants like Ideogram, whose founders are the creators of Google's Imagen; their model, notably, can spell. Meanwhile, we've seen countless integrations of text-to-image models into popular products, most notably Adobe's Firefly, Photoroom, and even Discord. Midjourney's revenue, which had already reached $1M MRR in March 2022, is projected to reach $200M ARR
 in 2023. Its number of users grew from 2M to 14.8M YoY. Notably, Midjourney is integrated into Discord, where users can generate images on a Discord server. According to Discord, more than 30 million people use AI apps on its servers every month, creating more than 1 billion unique images. Photoroom, a French startup specializing in photo editing, said that after introducing generative AI in February, it doubled its revenue and user numbers over the last 6 months. (Image samples: Stability's SDXL, OpenAI's DALL-E 3, Midjourney v5.2, Ideogram v0.1.)

But GenAI's wow effect is (so far) insufficient for users to stick around
Compared to the most popular incumbent apps such as YouTube, Instagram, TikTok or WhatsApp, GenAI apps such as ChatGPT, Runway or Character.ai suffer from lower median retention and daily active users. (Figure credit: Sequoia Capital)

2022 Prediction: A major user generated content site negotiates a commercial settlement with a start-up producing AI models (e.g. OpenAI) for training on their corpus
In Oct 2022, Shutterstock, a leading stock multimedia provider, announced it would work with OpenAI to bring DALL-E-powered content onto the platform. Then in July 2023, the two companies signed a 6-year content licensing agreement giving OpenAI access to Shutterstock's image, video and music libraries and associated metadata for model training. Furthermore, Shutterstock will offer its customers indemnification for AI image creation. The company also entered into a content license with Meta for GenAI. This pro-GenAI stance is in stark contrast to Shutterstock's competitor, Getty Images, which is profoundly against GenAI, as evidenced by its ongoing lawsuit against Stability AI for copyright infringement, filed in Feb 2023. Separately, in July 2023, OpenAI and the Associated Press (AP) entered into a licensing agreement for partial access to AP's news stories dating back to 1985. Meanwhile, AP will gain access to OpenAI technology and product expertise to explore generative applications. Although AP doesn't have LLM-based applications in production, it has used AI systems to create automated corporate earnings and sporting event recaps.

A US District Court has reaffirmed the long-standing principle that human authorship is needed for copyright protection. While appeals are
 likely, important precedent may now have been set.

US Courts set precedent for AI-generated content being unsuitable for copyright protection, but then another on fair use
The US District Court for the District of Columbia rejected a claim from Stephen Thaler that the 2012 image "A Recent Entrance to Paradise" was worthy of copyright protection. The Copyright Office, however, has established an initiative to examine the impact of AI on copyright law and has released new copyright guidance covering literary, visual, audiovisual, and sound works. It stipulates that any artwork needs a human author and that applications must specify where AI was used. More challengingly for providers, in a May 2023 ruling in a copyright case over a 1981 portrait of Prince, the US Supreme Court applied a new, stricter interpretation of what
counts as transformative under fair use. This could well make the scraping of books and artwork for models' training data legally riskier.

But cases continue to be fought in multiple jurisdictions about copyright infringement
Cases featuring the major text and image generation models are being fought in the UK and US. While the companies contend that they are engaging in fair use or freedom of expression, there are signs that trouble may lie ahead. In the UK and US, Getty Images is suing Stability, arguing that Stability copied millions of photographs from its collection and altered or removed copyright information, and accusing Stable Diffusion of generating images that bear a modified version of the Getty Images watermark. OpenAI and Meta are facing lawsuits claiming that ChatGPT and LLaMa were trained on the plaintiffs' copyrighted books without consent. The New York Times is said to be mulling a similar suit against OpenAI. Three artists are suing Stability, DeviantArt and Midjourney for using their artwork to train an image generator that creates "infringing derivative works". The UK has a text and data mining exception to copyright law, but it only extends to non-commercial use; plans to widen the exemption have been shelved. The EU had a similar exemption, but the AI Act states that foundation model providers will have to provide summaries of the copyrighted material used to train their models (which could prove technically challenging). Microsoft has moved to reassure users of its Copilot tools that the corporation will assume the legal risks in the event of any copyright claims.

From labels to preferences
As instruction fine-tuning and RLHF became the default methods to fine-tune and align language models, companies offering labeling services like Scale AI and Surge HQ stand to register exceptional growth from the exploding popularity of LLMs. Both companies boast an impressive list of customers, from AI startups to large corporate clients to leading labs in LLM research. Scale AI was last valued at $7.3B back in 2021, pre-Stable Diffusion and pre-ChatGPT frenzy.

Open source AI is on a tear at a time when incumbents push for closed source AI
Hugging Face, the now 7-year-old company that has firmly become the town hall for open-source AI, is seeing significant momentum as the community vies to keep AI models and datasets accessible to all. Over 1,300 models have been submitted to its Open LLM Leaderboard in a few months, and there were 600 million model downloads in August 2023 alone. These models are exposed on Spaces as web applications built with tools such as Gradio or Streamlit, enabling broader accessibility and rapid prototyping. Monthly active Gradio users have grown 5x from 120k (Jan 23)
to 580k (Aug 23). Prior to the acquisition, Mosaic showed impressive engineering feats like training Stable Diffusion from scratch for 70% less cost. (Data as of 29 Sept 2023.)

The US continues to lead by number of AI unicorns, followed by China and the UK
The trend from 2022 continues: the US grows its unicorn count to 315 from 292 and total enterprise value to $5.9T from $4.6T. The UK adds 3 more unicorns but sees cumulative enterprise value regress to $155B from $207B. (Data as of 19 Sept 2023.)

Enterprise software, fintech and healthcare are the most invested AI categories globally
(Charts: $ invested in AI categories 2010-23; deal volume in AI categories 2022-23; % of deals for AI startups. Data as of 14 Sept 2023.)

Although IPOs dried up in 2023, the M&A market continues to stay strong
There was not much public market activity outside of a few SPACs (e.g. Arrival, Roadzen, Triller), vs. 98 in 2022. However, there were several large acquisitions: MosaicML + Databricks ($1.3B), Casetext + Thomson Reuters ($650M), and InstaDeep + BioNTech (500M). (Data as of 19 Sept 2023.)

24% of all corporate VC investments went into AI companies in 2023
In 2023, corporates refocused their investments towards GenAI. They cut investments into non-AI companies by 50% YoY while keeping AI investments roughly steady ($29B in 22 vs. $22B in 23). (Data as of 19 Sept 2023.)

2023 sees a massive acceleration in GenAI funding
Named after a textbook genre of artificial intelligence, GenAI companies are attracting mountains of capital (for comparison: $950M for 2010-18 combined). (Data as of 2 Oct 2023.)

Check out those GenAI round (GPU bill) sizes: $18B invested in 2023 alone!
Mega rounds capture the headlines and are driven by "foundation" or "frontier" model companies raising equity dollars to purchase the cloud computing capacity needed to train large-scale systems. This trend might finally see a break: CoreWeave raised a $2.3B debt facility (instead of equity) to buy its GPUs. (Data as of 2 Oct 2023.)

2022 Prediction: NVIDIA forms a strategic relationship with an AGI organization
Instead of one such relationship, NVIDIA pursues a multi-pronged land-grab on AI, which includes a) investments into private and public AI-first companies, b) arming specialized GPU cloud providers, and c) adding new industry verticals.
- Select investments: Recursion (drug discovery), Synthesia (video generation), Cohere (LLMs), Adept (process automation)
- GPU cloud providers: CoreWeave, Lambda
- Industry verticals: BioNeMo (GenAI cloud service in drug discovery), Picasso (GenAI cloud service for visual design), Omniverse (digital twins of the world)

A handful of corporates were at the center of some of the highest-profile AI fundraises
(Rounds pictured: monster round up to $4B; mega round $1.3B; Series C $141M; Series C $270M; Series D $235M; beast round $10B.)

GenAI companies raised 33% larger Seeds and 130% larger Series As than all startups in 2023
Compute and talent aren't coming cheap when the world's attention is
on you. (Data as of 19 Sept 2023.)

Section 3: Politics

Have we reached "peak" regulatory divergence? After years of speculation about mounting divergence in regulatory approaches, we're starting to see them stabilise and settle into a handful of distinct approaches:
- Relying on existing laws and regulations
- Introducing AI-specific legislative frameworks
- Banning specific services (e.g. ChatGPT)

"Light-touch" or "pro-innovation": scepticism of large-scale regulation. Represented by the UK and India, this approach operates on the basis that AI does not currently require any additional legislation. So far, both the UK and India have stressed the economic and social upside of AI, with the UK's March 2023 white paper and a parliamentary response from India's digital minister arguing that any current risks can be absorbed by existing sectoral regulations and privacy legislation. The UK did, however, include some AI principles (grounded in similar work from the OECD) for regulators to follow and invested an initial £100M in a taskforce focused on frontier model safety, led by SOAI co-author Ian Hogarth. The team appears to be a world first in attempting to build a dedicated unit drawing on industry and academia to assess risk at the frontier. The UK also secured a special agreement with Google DeepMind, Anthropic, and OpenAI to gain early access to their most advanced frontier models to improve its understanding of risk. While popular with industry, it is unclear if these approaches will survive: the UK Government recently dropped "light-touch" from its vocabulary and has repositioned itself as the home of the AI safety debate, while the Indian Ministry of Electronics and Information Technology has now said forthcoming legislation may indeed cover some forms of AI harms, alongside web3 and other technology.

The EU and China
are leading the pack in passing new, AI-specific legislation, with especially stringent measures around foundation models (the "wide-ranging legislation" approach). The EU's AI Act is entering its closing legislative stages in the coming months. The Parliament's current draft has added regulations around foundation models and general-purpose AI systems (which are stipulated separately). While the rest of the AI Act tiers requirements based on how high-risk a system's intended use is, in the Parliament's draft all commercial foundation model providers
are subject to special requirements. These include risk assessments, disclosing when content is AI-generated, preventing the model from generating illegal content, and publishing summaries of any copyrighted data used for training.

Meanwhile, China brought in specific legislation on recommender systems, alongside generative AI regulations. This updated previous deep synthesis regulation, which required AI-generated content to be labelled, included protections against misuse, barred anonymous accounts from using services, and imposed censorship requirements. Developers will also have to register their algorithms with the government, and there is a special "security assessment" for any deemed capable of influencing public opinion. China is expected to follow this up with a national AI law later this year, but details have not yet been released.

Hybrid models: the best or worst of both worlds? In other markets, we're seeing either slimmed-down national regulation or a preponderance of local laws. While avoiding some of the challenges of major legislation, these approaches also risk pleasing no one. The US is unlikely to pass a federal AI law anytime soon and in some respects is pursuing a UK-style approach, with an emphasis on voluntary commitments (e.g. the July White House agreement) and research to establish what constitutes good practice (e.g. the National Institute of Standards and Technology's AI Risk Management Framework). Some of these commitments, for example, involve third-party evaluations, but do not specify who the third party would be, and companies could theoretically ignore their findings. However, individual US states have been moving to introduce AI laws that vary in strictness. There are mandatory transparency laws around "profiling" and automated decisions in California, Colorado, Texas, Virginia, and others, while New York and Illinois have specific laws around the use of AI in hiring decisions.

Canada is attempting a slimmed-down version of the EU AI Act, banning certain applications and regulating others. Instead of an EU-style sliding scale of obligations, Canada's Artificial Intelligence and Data Act only regulates "high-risk" applications. Enforcement will fall to an existing department rather than a new regulator. This approach, however, has been attacked from both sides, with critics accusing it of going too far
or not far enough.

State action on global governance is in its early stages. Various global regulators have been floated as models, including the International Atomic Energy Agency, the Intergovernmental Panel on Climate Change, and CERN. These proposals, however, remain confined to academic papers for the moment. The EU and US have announced that they are working on a joint AI code of conduct, which will include non-binding international standards on risk audits and transparency, among other requirements. The G7 will create the Hiroshima AI process, in collaboration with the OECD and Global Partnership on AI, which wi