Monday, July 5, 2021

GPT-J for text generation: hardware requirements

Since the release of GPT-J, I have been working hard to add it to NLPCloud.io for text generation.

This is now done and the infrastructure has stabilized, but it was tricky, so I thought I would share my key takeaways here in case they can help some of you:

- On CPU, the model needs around 40GB of memory to load, and then around 20GB during runtime.

- On CPU, a standard text generation (around 50 words) uses approximately 12 CPU cores for about 11 seconds.

- On GPU, the model needs around 40GB of RAM to load, and then around 3GB of RAM plus 24GB of GPU memory during runtime. For a standard text generation (around 50 words), the latency is around 1.5 seconds.
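For those who want to reproduce these numbers, here is a minimal sketch of the kind of benchmark I'm describing, using the Hugging Face transformers API on a GPU. It's just an illustration, not necessarily the exact code we run in production; the checkpoint name, prompt, and generation settings are placeholders.

```python
# Minimal sketch: loading GPT-J with Hugging Face transformers on a GPU,
# then timing a ~50-token generation and reporting peak GPU memory.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda").eval()

prompt = "GPT-J is a 6-billion-parameter language model that"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

start = time.time()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
torch.cuda.synchronize()

print(f"latency: {time.time() - start:.2f}s")
print(tokenizer.decode(output[0], skip_special_tokens=True))
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```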

The two main challenges are the high amount of RAM needed at startup, and the high amount of GPU memory needed during runtime, which is quite impractical as most affordable NVIDIA GPUs dedicated to inference, like the Tesla T4, only have 16GB of memory...
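One common way to shrink the footprint (not something I benchmarked here) is to cast the weights to half precision (fp16), which roughly halves the GPU memory needed and might make a 16GB card workable. A rough sketch, with the checkpoint name again an assumption:

```python
# Rough sketch: casting GPT-J to fp16 to roughly halve the GPU memory footprint
# (~12GB of weights instead of ~24GB). Checkpoint name is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
# .half() converts the weights to fp16 before moving them to the GPU.
model = AutoModelForCausalLM.from_pretrained(model_id).half().to("cuda").eval()

prompt = "GPT-J is a 6-billion-parameter language model that"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```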

It's very interesting to note that, during my tests, the latency was pretty much the same as GPT-Neo 2.7B's on the same hardware, while accuracy of course seems much better.
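For what it's worth, the comparison itself is just a matter of running the same timed generation against both checkpoints; something along these lines (again an illustrative sketch, not my exact benchmark code):

```python
# Illustrative sketch: timing the same 50-token generation for GPT-Neo 2.7B
# and GPT-J 6B on the same GPU (model ids and prompt are assumptions).
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "GPT-J is a 6-billion-parameter language model that"

for model_id in ["EleutherAI/gpt-neo-2.7B", "EleutherAI/gpt-j-6B"]:
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id).to("cuda").eval()
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=5)  # warm-up run
        start = time.time()
        model.generate(**inputs, max_new_tokens=50)
        torch.cuda.synchronize()
    print(f"{model_id}: {time.time() - start:.2f}s for 50 new tokens")
    # free the GPU before loading the next model
    del model
    torch.cuda.empty_cache()
```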

If some of you also ran these kinds of benchmarks on GPT-J I'd love to see if we're aligned or not!


Comments URL: https://news.ycombinator.com/item?id=27740921

Points: 2

# Comments: 0



from Hacker News: Newest https://ift.tt/2TDXzZJ
via IFTTT
