360SOFTY


Engineering Insights

Practical writing on software architecture, SaaS products, AI automation, legacy modernisation, and the business of building reliable systems.


Curated links from external sources — not 360Softy original articles.

External · AI
NVIDIA Technical Blog

CUDA 13.2 Introduces Enhanced CUDA Tile Support and New Python Features

CUDA 13.2 arrives with a major update: NVIDIA CUDA Tile is now supported on devices of compute capability 8.X architectures (NVIDIA Ampere and NVIDIA Ada), as well as 10.X, 11.X and 12.X architectures (NVIDIA Blackwell). In an upcoming release of the CUDA Toolkit, all GPU architectures starting with Ampere will be fully supported. If yo...

External · AI
NVIDIA Technical Blog

Implementing Falcon-H1 Hybrid Architecture in NVIDIA Megatron Core

In the rapidly evolving landscape of large language model (LLM) development, NVIDIA Megatron Core has emerged as the foundational framework for training massive transformer models at scale. The open source library offers industry-leading parallelism and GPU-optimized performance. Now developed GitHub-first in the NVIDIA/Megatron-LM re...

External · DevOps
Kubernetes Blog

Announcing the AI Gateway Working Group

The community around Kubernetes includes a number of Special Interest Groups (SIGs) and Working Groups (WGs) facilitating discussions on important topics between interested contributors. Today, we're excited to announce the formation of the AI Gateway Working Group, a new initiative focused on developing standards and best practices for networking infrastructure that supports AI workloads in Kubernetes environments.

What is an AI Gateway? In a Kubernetes context, an AI Gateway refers to network...

External · AI
NVIDIA Technical Blog

Enhancing Distributed Inference Performance with the NVIDIA Inference Transfer Library

Deploying large language models (LLMs) requires large-scale distributed inference, which spreads model computation and request handling across many GPUs and nodes to scale to more users while reducing latency. Distributed inference frameworks use techniques such as disaggregated serving, KV cache loading, and wide expert parallelism. In d...

External · AI
NVIDIA Technical Blog

Removing the Guesswork from Disaggregated Serving

Deploying and optimizing large language models (LLMs) for high-performance, cost-effective serving can be an overwhelming engineering problem. The ideal configuration for any given workload (such as hardware, parallelism, and prefill/decode split) resides in a massive, multi-dimensional search space that is impossible to explore manually or t...

External
Smashing Magazine

Persuasive Design: Ten Years Later

Many product teams still lean on usability improvements and isolated behavioral tweaks to address weak activation, drop-offs, and low retention – only to see results plateau or slip into shallow gamification. Anders Toxboe updates persuasive design for today’s reality, clarifying what has actually held up over the last decade.

