WEKA Platform

Unleash Your Data for HPC & AI with the WEKA Data Platform

Join us for the Big Geodata Talk on the WEKA Platform!

The WEKA Data Platform (WEKA) is a high-performance, software-defined, parallel file system that delivers unsurpassed performance density (throughput, IOPS, and metadata) to unleash data for your HPC & AI use cases. WEKA delivers more performance per TB, scales to multiple exabytes in a single namespace, requires zero performance tuning, and is deployable on-premises, in the public cloud, or in hybrid environments. This talk will provide an introduction to WEKA, how the WEKA Data Platform works, and how it is deployed in a production environment.

Date

24 April 2024, 13:00-14:00 CEST

Venue

Online

Speaker

Derek Burke
Data Storage Specialist, WEKA

Derek Burke has over twenty years’ experience in scientific computing, with the past sixteen years focused on high-performance data storage. Burke began his career at General Electric and Silicon Graphics in the late 1990s to mid-2000s. Since then, he has held senior commercial positions in data storage companies such as Panasas, Seagate, Pure Storage and, for the past five years, WEKA. Burke is focused on bringing innovative data storage technologies to HPC & AI users. Whilst at Seagate, he briefly held the ‘VP for Industry’ position at ETP4HPC, a public-private partnership advising the European Commission on its high-performance computing strategy.

Video

Presentation

Questions and Answers

  • Are there any specific algorithms or machine learning models that are particularly well-suited for analyzing geospatial data within the WEKA Data Platform?

    Our customers are using a fairly broad gamut of machine learning frameworks. At its core, WEKA is a POSIX-compliant storage system with a POSIX-compliant client interface, so you do not need to change any of your machine learning frameworks to use WEKA. Your favorite framework, from PyTorch to Caffe, works out of the box with WEKA, and works extremely well.

    A lot of our customers are running very large models. They start off with models on a single GPU, then move to multiple GPUs in a single client, and then run very large models across multiple GPUs across multiple clients. WEKA works very well regardless of the model size and how many GPUs it runs across in parallel, and it provides good performance with lots of concurrent processes reading and writing to the same file. In summary, don't worry: as long as they are Linux applications, they will be POSIX-compliant and able to use WEKA, as the sketch below illustrates.
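    A minimal sketch of what that looks like in practice: because WEKA presents an ordinary POSIX mount, a stock PyTorch data pipeline can read from it unchanged. The mount point /mnt/weka/train is a hypothetical example, not a WEKA default.

    ```python
    # Hypothetical example: /mnt/weka/train is assumed to be a WEKA POSIX mount.
    # No WEKA-specific API is involved; these are plain file system calls.
    import os
    import torch
    from torch.utils.data import Dataset, DataLoader

    class WekaFileDataset(Dataset):
        """Loads tensors saved as .pt files from a POSIX-mounted path."""
        def __init__(self, root="/mnt/weka/train"):
            self.paths = [os.path.join(root, f)
                          for f in sorted(os.listdir(root)) if f.endswith(".pt")]

        def __len__(self):
            return len(self.paths)

        def __getitem__(self, idx):
            return torch.load(self.paths[idx])  # ordinary POSIX read

    # Multiple workers issue concurrent reads, the access pattern a
    # parallel file system like WEKA is designed to serve well.
    loader = DataLoader(WekaFileDataset(), batch_size=32, num_workers=8)
    ```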

  • It seems that WEKA encourages and supports moving execution closer to the data to reduce data copies. However, some on-premises infrastructures already separate the storage servers from the processing cluster. How does WEKA handle this situation efficiently?

    Most of our work is done with applications that run in parallel, leveraging highly distributed data. I don't know what particular file formats you're looking at, or which distributed I/O tools you're thinking about, but we've not come across any common tools in an HPC environment that have caused any issues. If you've got applications you're particularly concerned about, it would be interesting if you could share the file format and any details with me. We can run and test it for you, but I would be very surprised if it posed a problem for WEKA.

    With regards to latency, I mentioned that we read and write data over the network with sub-millisecond latency. The time it takes to pass data over a high-performance network is a tiny fraction of the time it takes to do a read or write operation on an SSD, so the network transfer is effectively negligible. We don't really worry about data locality in the same way we did when people were using 1 Gb or 10 Gb networks. Transferring data over a 100 Gb network or above is so quick that data locality is no longer an issue, especially with something like WEKA, which has a completely optimized network stack. The back-of-the-envelope calculation below makes this concrete.
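    To see why the wire time is negligible, here is a rough calculation using illustrative figures (the 100 Gb/s link is from the answer above; the NVMe latency is an assumed typical value, not a WEKA measurement).

    ```python
    # Rough sanity check: wire time of one 4 KiB I/O on a 100 Gb/s link
    # versus an assumed ~80 us NVMe read latency (illustrative figure).
    LINK_GBPS = 100          # network bandwidth in gigabits per second
    IO_SIZE_BYTES = 4096     # one 4 KiB I/O
    NVME_READ_US = 80        # assumed NVMe read latency in microseconds

    wire_time_us = IO_SIZE_BYTES * 8 / (LINK_GBPS * 1e9) * 1e6
    print(f"wire time for 4 KiB: {wire_time_us:.2f} us")   # ~0.33 us
    print(f"NVMe read latency:   {NVME_READ_US} us")
    print(f"network share of total: "
          f"{wire_time_us / (wire_time_us + NVME_READ_US):.1%}")  # under 1%
    ```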

  • The examples you gave involved quite high-end infrastructure and large volumes of storage, but parties with smaller infrastructure may also be interested in getting the best performance out of their existing infrastructure. Is there a minimum data volume above which WEKA becomes a feasible solution?

    That's a really good question, because we tend to show off a little bit with our largest customers, but most of our customers are significantly smaller than that. We've got customers starting as small as 50 to 100 TB of capacity. I would say that if you're going to invest in a network-attached storage system, then that's probably a good starting point for WEKA. You've got to invest in NVMe devices, servers, and networking to build a cluster (though you may already have enough network ports), so it only really makes sense once you've got 50-100 TB and above. That still makes WEKA suitable for small-scale enterprises or academic institutions.

  • My understanding is that the technology is optimized for NVMe, but could it also work well if somebody wanted to test it with spinning drives?

    Potentially it would work, but you're not really going to get the benefits of WEKA, because we've built and designed our software stack for NVMe. If you wanted to use SAS flash drives, for example, it would probably work, but you're not going to get the highest levels of performance. Similarly with spinning disk: it may run, but you won't get the performance our stack is designed to deliver, so we just wouldn't recommend doing so.

    There are certain workloads that HDDs are still good for, but what we see is that most modern workloads require either an all-flash system, or at least a primary flash tier where the hot data resides. Though spinning disks are getting more and more capacity on a single drive, their performance isn't necessarily going up. Through a file system, you're only going to get, let's say, 150 MB/s from a spinning disk, whereas you can get up to 5 GB/s or more from an NVMe device with PCIe Gen 5. Per GB or per TB of data, spinning disk is getting slower and slower over time (see the comparison below). So for the high-performance layer serving your applications, it's pretty hard to do without a flash tier these days.
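    A quick illustration of the "slower per TB" point: the 150 MB/s and 5 GB/s throughput figures come from the answer above, while the drive capacities are assumed typical values, not figures from the talk.

    ```python
    # Throughput per TB of capacity; capacities are illustrative assumptions.
    drives = {
        "20 TB HDD":         (20, 150),    # (capacity in TB, throughput in MB/s)
        "15 TB NVMe Gen 5":  (15, 5000),
    }
    for name, (capacity_tb, throughput_mb_s) in drives.items():
        print(f"{name}: {throughput_mb_s / capacity_tb:.0f} MB/s per TB")
    # 20 TB HDD:        8 MB/s per TB
    # 15 TB NVMe Gen 5: 333 MB/s per TB
    # As HDD capacities grow faster than throughput, MB/s per TB keeps falling.
    ```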

  • In your slides, you briefly mentioned the number of cores per disk requirements, but didn’t mention any memory requirements for the nodes. Do you have a kind of memory caching mechanism that requires a certain amount of memory?

    Our POSIX client running on your computer requires a few megabytes of memory; I can't remember the exact figure. On the storage server side, we're probably looking at anywhere from 256 to 512 GB of RAM, because the different services we run to operate the storage cluster need it, so we look for a fairly large amount.

  • Is there a way to test WEKA hands on?

    We can do a live demo where you'll get a better flavor of how WEKA works, how it's administered, and how it's used day-to-day. We can give people access to a demo lab where you can use some of our clients and a WEKA storage cluster to do some testing. We can also provide demo licenses if you want to do some on-premises testing. And finally, we have the option to build a WEKA cluster in the public cloud, or we can build it in your cloud environment and you can play around with it there. So there are a lot of different options for testing WEKA.

  • Can you provide information about how your business model works?

    The license is subscription-based and priced per usable TB. Typically, an on-premises customer might acquire a WEKA storage system with 500 TB of capacity; you would then buy a WEKA license for 500 TB, generally for the same length of time as your hardware support. So it's subscription-based for the period of time you want to use it, based on usable terabytes.