This content originally appeared on Level Up Coding - Medium and was authored by Kevin Berlemont, PhD
Data science is an ever-evolving field that requires professionals to stay up to date with the latest tools and techniques. Julia is an increasingly popular programming language that is well suited to data science and numerical computing, for several reasons:
- Performance: Julia is designed to be fast and efficient, especially for numerical computing. Its code is compiled just-in-time to native machine code, so it is often faster than equivalent pure R or Python code, which suits data-intensive tasks.
- Syntax: Julia has a syntax that is similar to languages like Python and Matlab, which makes it easy for users familiar with these languages to learn Julia.
- Good for machine learning: Julia has a number of packages and libraries available for machine learning, including support for deep learning.
- Interoperability: Julia can easily call C and Fortran libraries, which makes it easy to use existing code and libraries in Julia.
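The interoperability point can be illustrated with Base Julia alone. The sketch below is not from the original post; it assumes a platform where the C runtime's strlen symbol is visible to the Julia process, which is typically the case on Linux and macOS.

```julia
# Call the C standard library's `strlen` directly, with no wrapper code.
# Assumption: the C runtime's symbols are visible to the Julia process.
c_len = ccall(:strlen, Csize_t, (Cstring,), "Julia")

# The same call written with the @ccall macro (available since Julia 1.5).
c_len_macro = @ccall strlen("Julia"::Cstring)::Csize_t

println("strlen via ccall:  ", c_len)
println("strlen via @ccall: ", c_len_macro)
```

Both forms state the return type and argument types explicitly, because Julia cannot infer them from a compiled C library.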
In this blog post, I will provide a list of basic and useful functions focused on the three main parts of Data Science: data manipulation, data visualization and MLOps.
Data Manipulation Cheat Sheet
- Load data: Read a CSV file with CSV.File and pipe the result into the DataFrame constructor with the |> operator.
# Install the CSV and DataFrames packages (only needed once)
using Pkg
Pkg.add("CSV")
Pkg.add("DataFrames")
using CSV, DataFrames
df = CSV.File("file.csv") |> DataFrame
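For a version that runs without an existing file, the sketch below writes a tiny CSV to a temporary directory and reads it back. The file contents and column names are made up for illustration; CSV.read(path, DataFrame) is used here as a one-call alternative to the pipe form.

```julia
using CSV, DataFrames

# Write a tiny CSV to a temporary file so the example runs anywhere.
# The data is illustrative only.
path = joinpath(mktempdir(), "example.csv")
write(path, "col1,col2\n1,2.5\n-3,4.0\n5,\n")

# One-call alternative to `CSV.File(path) |> DataFrame`.
df = CSV.read(path, DataFrame)

# The empty field in the last row is parsed as `missing`.
println(df)
println("col2 element type: ", eltype(df.col2))
```

Checking the element types right after loading is a cheap way to spot columns that contain missing values.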
- Explore data: Use the size and names functions to get the dimensions and column names of a dataframe. Use the first and last functions to view the first and last few rows of a dataframe.
# Get dimensions
println("Number of rows: $(size(df)[1])")
println("Number of columns: $(size(df)[2])")
# Get column names
println("Column names: $(names(df))")
# View first five rows
first(df,5)
# View last five rows
last(df,5)
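Beyond size and names, DataFrames also exports nrow, ncol and describe, which give a quick per-column summary. A minimal sketch on a made-up dataframe:

```julia
using DataFrames

# Illustrative data; the column names mirror the ones used in this post.
df = DataFrame(col1 = [1, -3, 5, 2], col2 = [2.5, 4.0, missing, 1.0])

# Row and column counts without indexing into size(df).
println(nrow(df), " rows, ", ncol(df), " columns")

# One summary row per column (mean, min, median, max, missing count, type).
summary_df = describe(df)
println(summary_df)
```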
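- Select columns: Use the select function to select a subset of columns from a dataframe. select always returns a new dataframe, whereas indexing with df[!, :col1] (or df.col1) returns the column vector itself.
# Select a single column (returns a one-column dataframe)
df_col1 = select(df, :col1)
# Get the column as a vector instead (no copy is made)
col1 = df[!, :col1]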
# Select multiple columns
df_col1_col2 = select(df, [:col1, :col2])
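The difference between selecting and indexing matters once the data is mutated. The following sketch, on made-up data, shows which forms share memory with the original dataframe:

```julia
using DataFrames

df = DataFrame(col1 = [1, 2, 3], col2 = ["a", "b", "c"])

as_table  = select(df, :col1)   # one-column DataFrame, copies the data by default
as_vector = df[!, :col1]        # the stored vector itself, no copy
as_copy   = df[:, :col1]        # a fresh copy of the column

as_vector[1] = 100   # this changes df, because the vector is shared
as_copy[2]   = -1    # this does not change df

println(df.col1)         # reflects the change made through as_vector
println(as_table.col1)   # still holds the values from before the mutation
```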
- Filter rows: Use the filter function to select rows from a dataframe based on a condition.
# Select rows where col1 is greater than 0
df_filtered = filter(row -> row.col1 > 0, df)
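The row-wise anonymous function is the most readable form, but DataFrames offers column-based alternatives that avoid constructing a row object for each row. A short sketch on made-up data, assuming a recent DataFrames 1.x release:

```julia
using DataFrames

df = DataFrame(col1 = [3, -1, 0, 7], col2 = ["a", "b", "c", "d"])

# Three ways to keep rows where col1 is positive.
by_row    = filter(row -> row.col1 > 0, df)
by_column = filter(:col1 => >(0), df)
by_subset = subset(df, :col1 => ByRow(>(0)))

# subset accepts several conditions, which are combined with a logical AND.
both = subset(df, :col1 => ByRow(>(0)), :col2 => ByRow(!=("a")))

println(by_subset)
println(both)
```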
- Sort data: Use the sort function to sort a dataframe by one or more columns.
# Sort by col1 in ascending order
df_sorted = sort(df, :col1)
# Sort by col1 in descending order
df_sorted = sort(df, :col1, rev=true)
# Sort by col1 in ascending order, then by col2 in descending order
df_sorted = sort(df, [:col1, :col2], rev=[false, true])
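Mixed sort directions can also be written per column with the order helper exported by DataFrames, which some readers find easier to scan than a parallel rev vector. A sketch on made-up data:

```julia
using DataFrames

df = DataFrame(col1 = [2, 1, 2, 1], col2 = [10, 20, 30, 40])

# col1 ascending, then col2 descending, with the direction attached to the column.
df_sorted = sort(df, [:col1, order(:col2, rev = true)])

# sort! does the same in place, without allocating a new dataframe.
sort!(df, :col2)

println(df_sorted)
```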
- Group data: Use the groupby and combine functions to group a dataframe by one or more columns and apply a function to each group.
# mean is provided by the Statistics standard library
using Statistics
# Group by col1 and compute the mean of col2 for each group (stored in col2_mean)
df_grouped = combine(groupby(df, :col1), :col2 => mean)
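Several aggregations can be computed in one combine call, and each result column can be named explicitly. The output names below (col2_mean, col2_max, count) are chosen for illustration:

```julia
using DataFrames, Statistics

df = DataFrame(col1 = ["a", "b", "a", "b", "a"],
               col2 = [1.0, 2.0, 3.0, 4.0, 5.0])

gdf = groupby(df, :col1)

# source column => function => output column name
stats = combine(gdf,
                :col2 => mean    => :col2_mean,
                :col2 => maximum => :col2_max,
                nrow             => :count)

println(stats)
```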
Data Visualization Cheat Sheet
- Install the “Plots” package:
using Pkg
Pkg.add("Plots")
- Scatter plot: Use the scatter function to create a scatter plot.
using Plots
scatter(df.col1, df.col2)
- Line plot: Use the plot function to create a line plot.
plot(df.col1, df.col2)
- Bar plot: Use the bar function to create a bar plot.
bar(df.col1, df.col2)
- Histogram: Use the histogram function to create a histogram.
histogram(df.col1, bins=20)
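- Box plot: Use the boxplot function to create a box plot of col2 grouped by col1. It is provided by the “StatsPlots” package, which extends “Plots” and is installed with Pkg.add("StatsPlots").
using StatsPlots
boxplot(df.col1, df.col2)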
- Customize plots: Use keyword attributes such as “title”, “xlabel” and “ylabel” when creating a plot, or functions ending in ! such as “xlims!” and “ylims!” to modify the current plot afterwards.
scatter(df.col1, df.col2, title="Scatter Plot", xlabel="X", ylabel="Y")
xlims!((0, 10))
ylims!((0, 10))
- Save plots: Use the savefig function to save plots to a file.
savefig("plot.png")
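The individual plotting calls above can be combined into one figure. The sketch below uses synthetic data and a temporary output path, both chosen for illustration, and assumes the default Plots backend is installed.

```julia
using Plots

# Synthetic data so the example does not depend on a dataframe.
x = range(0, 10; length = 50)
y = sin.(x)

p1 = scatter(x, y; title = "Scatter", xlabel = "x", ylabel = "sin(x)", label = "samples")
p2 = histogram(y; bins = 10, title = "Histogram", xlabel = "sin(x)", label = "")

# Place both plots side by side in a single figure.
combined = plot(p1, p2; layout = (1, 2), size = (900, 350))

# Passing the plot object makes explicit which figure is saved.
out = joinpath(mktempdir(), "plots.png")
savefig(combined, out)
println("saved to ", out)
```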
This is just a small sampling of what you can do with Julia for data visualization. There are many other functions and customization options available in the “Plots” package, and you can also use other packages such as “Gadfly” and “Plotly” for additional visualization capabilities.
MLOps Cheat Sheet
- Install the “MLJ” package (developed by the Alan Turing Institute):
using Pkg
Pkg.add("MLJ")
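- Split data: Use the partition function from the "MLJ" package to split the data into a training set and a test set. Passing multi=true tells partition to split each element of a tuple (here the features and the target) in the same way.
using MLJ
# Assumes the target is the last column of df
(Xtrain, Xtest), (ytrain, ytest) = partition((df[:, 1:end-1], df[:, end]), 0.8, multi=true)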
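To make the idea of a holdout split concrete, here is a minimal sketch using only the standard library. The helper holdout_split is hypothetical and is not part of MLJ; it only illustrates a shuffled split of row indices applied consistently to features and target.

```julia
using Random

# Hypothetical helper (not an MLJ function): shuffle the row indices and
# cut them into a training part and a test part.
function holdout_split(n::Integer, fraction::Real; rng = Random.default_rng())
    idx = shuffle(rng, 1:n)
    n_train = round(Int, fraction * n)
    return idx[1:n_train], idx[n_train+1:end]
end

# Toy data: 10 rows, 2 features; the first feature equals the target.
X = reshape(collect(1.0:20.0), 10, 2)
y = collect(1:10)

train, test = holdout_split(10, 0.8; rng = MersenneTwister(42))

# Index features and target with the same rows so they stay aligned.
Xtrain, Xtest = X[train, :], X[test, :]
ytrain, ytest = y[train], y[test]

println("train rows: ", train)
println("test rows:  ", test)
```

Fixing the random number generator makes the split reproducible, which is the same role the rng keyword plays in MLJ's partition.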
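- Preprocess data: Use the “Standardizer” model from the "MLJ" package to rescale the features to zero mean and unit variance.
stand = Standardizer()
# Learn the means and standard deviations from the training features only
mach = machine(stand, Xtrain)
fit!(mach)
# Apply the same transformation to both sets
Xtrain_std = transform(mach, Xtrain)
Xtest_std = transform(mach, Xtest)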
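Standardization itself is a small computation. The sketch below reproduces the idea with the Statistics standard library on made-up numbers; it illustrates the technique and is not a description of MLJ's internal implementation.

```julia
using Statistics

train = [2.0, 4.0, 6.0, 8.0]
test  = [5.0, 10.0]

# Estimate the parameters on the training data only.
μ, σ = mean(train), std(train)

# Apply the same shift and scale to any data set.
standardize(x) = (x .- μ) ./ σ

z_train = standardize(train)
z_test  = standardize(test)

println("standardized train: ", z_train)
println("standardized test:  ", z_test)
```

Reusing the training mean and standard deviation for the test set avoids leaking information from the test data into preprocessing.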
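- Train model: Load the decision tree classifier from the “DecisionTree” package through MLJ's @load macro and train it. This requires the “MLJDecisionTreeInterface” package to be installed.
# The target must be categorical, e.g. ytrain = coerce(ytrain, Multiclass)
Tree = @load DecisionTreeClassifier pkg=DecisionTree
model = Tree()
mach = machine(model, Xtrain, ytrain)
fit!(mach)
- Make predictions: Use the predict function to make predictions on the test set. This classifier is probabilistic, so predict returns a distribution for each row; predict_mode returns the most likely class.
predictions = predict(mach, Xtest)
labels = predict_mode(mach, Xtest)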
- Evaluate model: Use the evaluate function on a model and data, or evaluate! on an existing machine. Here X and y are the full feature table and target.
# Evaluate the model directly on the data
evaluate(model, X, y,
         resampling=Holdout(fraction_train=0.7, shuffle=true, rng=1234),
         measure=[LogLoss(), Accuracy()])
# Equivalent call if a machine has been defined as above
evaluate!(mach,
          resampling=Holdout(fraction_train=0.7, shuffle=true, rng=1234),
          measure=[LogLoss(), Accuracy()])
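The training, prediction and evaluation steps can be tied together in one short script. This is a sketch under stated assumptions: it uses the iris dataset bundled with MLJ instead of the post's df, it assumes the MLJDecisionTreeInterface package is installed, and max_depth = 3 is an arbitrary choice. The exact form of the log loss output may differ between MLJ versions.

```julia
using MLJ

# Assumption: MLJDecisionTreeInterface is installed alongside MLJ.
Tree = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0

# Small built-in dataset; y is already a categorical vector.
X, y = @load_iris

model = Tree(max_depth = 3)
mach = machine(model, X, y)

# Split row indices rather than the data itself.
train, test = partition(eachindex(y), 0.7; shuffle = true, rng = 1234)
fit!(mach; rows = train, verbosity = 0)

probs  = predict(mach; rows = test)        # one distribution per test row
labels = predict_mode(mach; rows = test)   # most likely class per test row

println("accuracy: ", accuracy(labels, y[test]))
println("log loss: ", log_loss(probs, y[test]))
```

Training on row indices keeps a single machine bound to the full data, so the same object can later be passed to evaluate! with a different resampling strategy.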
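- Save model: Use the jldsave function from the "JLD2" package to save the trained machine to a file. Values are passed as keyword arguments after a semicolon, and each keyword becomes the name under which the value is stored. MLJ also provides its own MLJ.save function for machines.
using JLD2
jldsave("model.jld2"; mach)
- Load model: Use the load function from the "JLD2" package to read the stored object back by name.
mach = load("model.jld2", "mach")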
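The save and load round trip can be checked on a small object before relying on it for a real model. In the sketch below, the named tuple params is a made-up stand-in for a trained model, and the file is written to a temporary directory.

```julia
using JLD2

# Stand-in for a trained model: a plain Julia object (illustrative values).
params = (weights = [0.5, -1.2, 3.0], bias = 0.1)

path = joinpath(mktempdir(), "params.jld2")

# The keyword name ("params") becomes the key inside the file.
jldsave(path; params)

# Read the object back by that key.
restored = load(path, "params")

println(restored)
```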
Conclusion
With its high performance, easy-to-learn syntax, and powerful libraries, Julia is a great language for data science. The cheat sheets above are a good summary of useful functions and packages to get you started.
References
- https://alan-turing-institute.github.io/MLJ.jl/dev/
- https://dataframes.juliadata.org/stable/
- https://docs.juliaplots.org/stable/
Kevin Berlemont, PhD | Sciencx (2023-01-17T12:19:58+00:00) Julia for Data Science Cheat Sheet. Retrieved from https://www.scien.cx/2023/01/17/julia-for-data-science-cheat-sheet/