--- title: "Getting Started with Rmlx" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with Rmlx} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} library(Rmlx) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) # Detect available device device <- mlx_best_device() ``` ## Introduction Apple MLX (Machine Learning eXchange) is Apple’s high-performance array and machine-learning framework for Apple Silicon, built on top of Metal for GPU execution and optimized CPU kernels. It offers lazy evaluation, vectorized math, automatic differentiation, and neural network building blocks (see the [official MLX documentation](https://ml-explore.github.io/mlx/) for full details). `Rmlx` is a thin R layer over MLX that lets you: * Create MLX tensors from R data (`as_mlx()`). * Run GPU-accelerated math, linear algebra, FFTs, and reductions with familiar R syntax. * Use automatic differentiation (`mlx_grad()`, `mlx_value_grad()`) for optimization. * Build simple models with MLX modules and update them using SGD helpers. All heavy computation stays in MLX land; you only copy back to base R when you call functions like `as.matrix()`. ## System Requirements Before using Rmlx, ensure MLX is installed: ```bash # Using Homebrew (if available) brew install mlx # Or build from source git clone https://github.com/ml-explore/mlx.git cd mlx && mkdir build && cd build cmake .. && make && sudo make install ``` ## Creating MLX Arrays Convert R objects to MLX arrays using `as_mlx()`: ```{r} # From a vector v <- as_mlx(1:10) print(v) # From a matrix m <- matrix(1:12, nrow = 3, ncol = 4) x <- as_mlx(m) print(x) ``` Or use `mlx_vector()`, `mlx_matrix()` or `mlx_array()`: ```{r} # like as_mlx(1:10) but more explicit mlx_vector(1:10) # From a matrix mlx_matrix(1:12, 3, 4) ``` > **Precision note:** Numeric inputs use single precision (`float32`) by default. > `float64` arrays are supported on CPU only; request them with > `dtype = "float64", device = "cpu"` or explicitly finish GPU work with > `mlx_cast(x, dtype = "float64", device = "cpu")`. > Logical inputs are stored as MLX `bool` tensors (logical `NA` values are not > supported). Complex inputs are stored as `complex64` (single-precision real > and imaginary parts). ## Lazy Evaluation MLX arrays use lazy evaluation - operations are recorded but not computed until needed: ```{r} x <- mlx_matrix(1:100, 10, 10) y <- mlx_matrix(101:200, 10, 10) # These operations are not computed immediately z <- x + y * 2 # Force evaluation of a specific array mlx_eval(z) # Or convert to R (automatically evaluates) as.matrix(z) # Wait for all queued work on the available device mlx_synchronize(device) ``` ## Arithmetic Operations Rmlx supports standard arithmetic operators: ```{r} x <- as_mlx(matrix(1:12, 3, 4)) y <- as_mlx(matrix(13:24, 3, 4)) # Element-wise operations sum_xy <- x + y diff_xy <- x - y prod_xy <- x * y quot_xy <- x / y pow_xy <- x ^ 2 # Convert back to R to see results as.matrix(sum_xy) ``` ## Matrix Operations ### Matrix Multiplication ```{r} a <- as_mlx(matrix(1:6, 2, 3)) b <- as_mlx(matrix(1:6, 3, 2)) # Matrix multiplication c <- a %*% b as.matrix(c) ``` ### Transpose ```{r} x <- as_mlx(matrix(1:12, 3, 4)) x_t <- t(x) print(x_t) ``` ### Cross Products ```{r} x <- as_mlx(matrix(rnorm(20), 5, 4)) true_w <- as_mlx(matrix(c(2, -1, 0.5, 0.25), 4, 1)) y <- x %*% true_w w <- as_mlx(matrix(0, 4, 1)) # Loss must stay entirely in MLX-land: no conversions back to base R loss <- function(theta, data_x, data_y) { preds <- data_x %*% theta resids <- preds - data_y sum(resids * resids) / length(data_y) } grads <- mlx_grad(loss, w, x, y) # Wrong: converting to base R breaks the gradient bad_loss <- function(theta, data_x, data_y) { preds <- as.matrix(data_x %*% theta) # leaves MLX resids <- preds - as.matrix(data_y) sum(resids * resids) / nrow(resids) } try(mlx_grad(bad_loss, w, x, y)) # A small SGD loop using the module/optimizer helpers model <- mlx_linear(4, 1, bias = FALSE) # learns a single weight vector parameters <- mlx_parameters(model) opt <- mlx_optimizer_sgd(parameters, lr = 0.1) loss_fn <- function(mod, data_x, data_y) { theta <- mlx_param_values(parameters)[[1]] loss(theta, data_x, data_y) } loss_history <- numeric(50) for (step in seq_along(loss_history)) { step_res <- mlx_train_step(model, loss_fn, opt, x, y) loss_history[step] <- as.vector(step_res$loss) } # Check final loss and inspect learned parameters final_loss <- mlx_forward(model, x) residual_mse <- as.vector(mean((final_loss - y) * (final_loss - y))) residual_mse loss_history learned_w <- mlx_param_values(parameters)[[1]] as.matrix(learned_w) as.matrix(true_w) ``` ## Reductions Compute summaries across arrays: ```{r} x <- as_mlx(matrix(1:100, 10, 10)) # Overall reductions sum(x) mean(x) # Column and row means colMeans(x) rowMeans(x) # Convert to R to see values as.matrix(colMeans(x)) # Cumulative operations flatten the array in column-major order as.vector(cumsum(x)) ``` ## Indexing Subset MLX arrays similar to R: ```{r} x <- mlx_matrix(1:100, 10, 10) # Select rows and columns x_sub <- x[1:5, 1:5] # Select specific row row_1 <- x[1, ] # Select specific column col_1 <- x[, 1] ``` ## Device Management Control whether computations run on GPU or CPU: ```{r} # Check default device mlx_device() a <- b <- mlx_matrix(1:4, 2, 2) # Run an operation on the CPU mlx_device("cpu") a %*% b # Set back to best available device mlx_device(device) ``` Numeric computations default to `float32` for GPU compatibility. Use CPU `float64` when you need double precision, and cast back to `float32` before moving results to the GPU. ## Performance Comparison Here's a simple timing comparison for large matrix multiplication: ```{r} n <- 1000 # R base m1 <- matrix(rnorm(n * n), n, n) m2 <- matrix(rnorm(n * n), n, n) t1 <- system.time(r_result <- m1 %*% m2) # MLX x1 <- as_mlx(m1) x2 <- as_mlx(m2) mlx_eval(x1) mlx_eval(x2) t2 <- system.time( as.matrix(x1 %*% x2) ) cat("Base R:", t1["elapsed"], "seconds\n") cat("MLX:", t2["elapsed"], "seconds\n") ``` For more information see the Benchmarks vignette.