The SteadBlog

A mixture of Computer Science and Software Engineering writings, study notes, book reviews and general ramblings. Opinions are my own - enjoy!

A solution and, importantly, a proof for LeetCode Problem 11 - Container with Most Water.

Read more…

PySpark seemingly allows Python code to run on Apache Spark, a JVM-based computing framework. How is this possible? I recently needed to answer this question, and although the PySpark API itself is well documented, there is little in-depth information on its implementation. This article contains my findings from diving into the Spark source code to find out what’s really going on.
Spark vs PySpark
For the purposes of this article, Spark refers to the Spark JVM implementation as a whole.

Read more…

See the first post in The Pragmatic Programmer 20th Anniversary Edition series for an introduction. The first two challenges recommend some (excellent) books to the reader but do not provide a specific challenge for me to write about here. I shall, therefore, begin with the third challenge.
Challenge 3
In the first exercise that follows we look at sorting arrays of long integers. What is the impact if the keys are more complex, and the overhead of key comparison is high?

Read more…

See the first post in The Pragmatic Programmer 20th Anniversary Edition series for an introduction.
Exercise 25
A data feed from a vendor gives you an array of tuples representing key-value pairs. The key of DepositAccount will hold a string of the account number in the corresponding value: [ ... {:DepositAccount, "564-904-143-00"} ... ] It worked perfectly in test on the 4-core developer laptops and on the 12-core build machine, but on the production servers running in containers, you keep getting the wrong account numbers.

Read more…

See the first post in The Pragmatic Programmer 20th Anniversary Edition series for an introduction.
Exercise 24
Would a blackboard-style system be appropriate for the following applications? Why, or why not?
Image processing. You’d like to have a number of parallel processes grab chunks of an image, process them, and put the completed chunk back.
Group calendaring. You’ve got people scattered across the globe, in different time zones, and speaking different languages, trying to schedule a meeting.

Read more…