• 1 Post
  • 24 Comments
Joined 1 year ago
cake
Cake day: January 3rd, 2024

help-circle
rss
  • I’d actually argue Python stops people learning how to solve problems.

    I love teaching juniors and have done so for 10 years but I’ve noticed in the last 4-5 years since Python became the popular choice at universities Graduates aren’t learning anything about Static Types, Memory Management, Object Oriented Programming, Data Encapsulation, Composition, Service Oriented Architecture, etc…

    I used to expect most graduates to have a mixed grounding in those concepts and would find excuses for them to work on a small UI projects. I would do this as it gets them used to solving a small problem and UI’s give instant feedback. As Python became dominate university teaching language the graduates aren’t spending their time learning Typescript, Angular, HTML, etc… but instead getting overwhelmed by the concept of types.

    Those concepts I want them to learn were created to help make solving problems easier and each has their strengths and weaknesses but most graduates are coming through only knowing how to lay out a small amount of procedural logic using Python and really struggling to move beyond that.


  • QT is a cross platform UI development framework, its goal is to look native to the platform it operates on. This video by a linux maintainer from 2014 explains its benefits over GTK, its a fun video and I don’t think the issues have really changed.

    Most GTK advocates will argue QT is developed by Trolltech and isn’t GPL licensed so could go closed source! This argument seems to ignore open source projects use the Open Source releases of QT and if Trolltech did close source then the last open source would be maintained (much like GTK).

    Personally I would avoid Flutter on the grounds its a Google owned library and Google have the attention span of a toddler.

    Not helping that assessment is Google let go of the Fuschia team (which Flutter was being developed for) and seems to have let go a lot of Flutter developers.

    Personally I hate web frontends as local applications. They integrate poorly on the desktop and often the JS engine has weird memory leaks


  • It was a mixture of factors.

    Data was to be dumped into a S3 bucket (minio), this created an event and anouther team had built an orchestrator which would do a couple of things but eventually supply an endpoint a reference to a plain/txt file for analysis.

    For the Java devs they had to [modify the example camel docs.](https://camel.apache.org/manual/rest-dsl.html) and use the built in jackson library to convert the incoming object to a class. This used the default AWS S3 api to create a stream handle which fed into the OpenNLP docs. .

    The Python project first hit a wall in setting up Flask. They followed the instructions and it didn’t work from setup tools. The Java team had just created a new maven project from the Intelij but the same approach didn’t work for the Python team using pycharm. It lost them a couple days, I helped them overcome it.

    Then they hit a wall with Boto3, the team expected to stream data but Boto3 only supports downloading, there was also a complexity issue the AWS SDK in Java waa about 20 lines to setup and a single line to call, it was about 50 lines in Python. On the positive side I got to explain what all the config meant in S3.

    This caused the team anouther few days of delay because the team knew I used a 350MiB Samsung TV guide to test the robustness and had to go learn about Docker volume mounting and they thought they needed a stateful kubernetes service and I had to explain why that was wrong.

    Basically Python threw up a lot of additional complexity and the docs weren’t as helpful as they could have been.



  • You do, but considering the scales they process data I suspect Google would be better building Go tooling (or whatever the dominate internal language is).

    A few years back I was trying to teach some graduates the importance of looking at a programming language ecosystem and selecting it based on that.

    One of my comparison projects was Apache OpenNLP/Camel vs Flask/Spacy.

    Spacy is the go to for NLP, I expected it to be either quicker to develop, easier to use, better results or just less resources.

    I assigned Grads with Java experience Spacy and Python experience OpenNLP.

    The OpenNLP guys were done first, they raved about being able to stream data into the model and how much simplier it made life.

    When compared with the same corpus (Books, Team emails, corporate sharepoint, dev docs, etc…) OpenNLP would complete on 4GiB of RAM in less than a second on 0.5vCPU. Spacy needed 12GiB and was taking ~2 seconds with 2vCPU. They identified the same results…

    Me and a few others ended up spending a day reading the python and trying to optimise it, clearly the juniors had done something daft, they had not.

    It rather undermined my point.


  • My expectation is whatever the solution it needs to dockerise and be really easy to deploy via docker compose or Kubernetes so people can quickly and easily set up their own.

    The front end is effectively static files so I would probably choose Apache or Express (whichever gives me a smaller docker image)…

    For the backend I would choose Java for Spring Boot. An Alpine image with OpenJDK and the app is tiny. Spring has a library for every kind of interface making them trivial to implement but the main reason is hibernate.

    Hibernate (now Spring Data) was the first library for being able to switch out databases without having to change code (its all config). A lot of mastodon instances struggle with the resource requirements of elastic search so letting small instances use something like postgres would seem ideal.

    I have noticed Go/Rust still expect you to write or manage a lot of stuff Spring gives away for free. Python is ok if your backend is really tiny but there is a lot of boilerplate in how Python libraries work so complex projects get hard to manage and I assume interacting with the fediverse will add complexity.



  • Immutable distributions won’t solve the problem.

    You have 3 types of testing unit (descrete part of code), integration (how a software piece works with others) and system testing (e.g. the software running in its environment). Modern software development has build chains to simplify testing all 3 levels.

    Debian’s change freeze effectively puts a known state of software through system testing. The downside its effecitvely ‘free play’ testing of the software so it requires a big pool of users and a lot of time to be effective. This means software in debian can use releases up to 3 years old.

    Something like Fedora relies on the test packs built into the open source software, the issue here is testing in open source world is really variable in quality. So somethinng like Fedora can pull down broken code that passes its tests and compiles.

    The immutable concept is about testing a core set of utilities so you can run the containers of software on top. You haven’t stopped the code in the containers being released with bugs or breaking changes you’ve just given yourself a means to back out of it. It’s a band aid to the actual problem.

    The solution is to look at core parts of the software stack and look to improve the test infrastructure, phoronix manages to run the latest Kernel’s on various types of hardware for benchmarking, why hasn’t the Linux foundation set up a computing hall to compile and run system level testing for staged changes?

    Similarly website’s are largely developed with all 3 levels of testing, using things like Jest/Mocha/etc… for Unit/Integration testing and Robots/Cypress/Selenium/Storybook/etc… for system testing. While GTK and KDE apps all have unit/integration tests where are the system level test frameworks?

    All this is kinda boring while ‘containers!’ is exciting new technology




  • The developer behind KBin seems to have issues delegating/accepting contributors.

    If you look at the pull requests, most have been unreviewed for months and he tends to regularly push his branches once complete and just merge them in.

    That behaviour drove the MBin fork, where 4-5 people were really keen to contribute but were frustrated.

    To some extent that would be ok, its his project and if he doesn’t want to encourage contributions that is his decision but…

    KBin.social has gotten to the size where it really should have multiple admins (or a paid full time person). Which it doesn’t have.

    The developer has also told us he has gone through a divorce, moved into his own place, gotten a full time job and now had surgery.

    Thats a lot for any normal person and he is going through that while trying to wear 2 hats (dev & ops) each of which would consume most of your free time.

    Personally I moved to kbin.run which is run by one of the MBin devs


  • Firstly it was just a bit of fun but from memory…

    Twitter was listed as having 2 data centers and a couple dozen satellite offices.

    I forgot the data center estimate, but most of those satelites were tiny. Google gave me the floor area for a couple and they were for 20-60 people (assuming a desk consumes 6m2 and dividing the office area by that).

    Assuming an IT department of 20 for such an office is rediculous but I was trying to overestimate.


  • The Silicon Valley companies massively over hired.

    Using twitter as an example, they used to publicly disclose every site and their entire tech stack.

    I have to write proposals and estimates and when Elon decided to axe half the company of 8000 I was curious…

    I assigned the biggest functional team I could (e.g. just create units of 10 and plan for 2 teams to compete on everything). I assumed a full 20 person IT department at every site, etc… Then I added 20% to my total and then 20% again for management.

    I came up with an organisation of ~1200, Twitter was at 8000.

    I had excluded content moderators and ad sellers because I had no experience in estimating that but it gives a idea of the problem.

    I think the idea was to deny competition people but in reality that kind of staff bloat will hurt the big companies


  • Docker swarm was an idea worse than kubernetes, that came out after kubernetes, that isn’t really supported by anyone.

    Kubernetes has the concept of a storage layer, you create a volume and can then mount the volume into the docker image. The volume is then accessible to the docker image regardless of where it is running.

    There is also a difference between a volume for a deployment and a statefulset, since one is supposed to hold the application state and one is supposed to be transient.






  • I actually researched my list, most the technologies were used internally for years and either publically released after better public alternatives had been adopted or it seems buzz reached me years after Google’s first release. So I am wrong.

    Between 2012-2015 I used to consult on Apache Ivy projects (ideally moving them to Maven and purging the insanity people had written). As a result I would get called in when projects had dependency issues.

    The biggest culprits were Guava/GSon, projects would often choose to use them (because Google) and then would discover a bug that had been fixed in a later patch release (e.g. they used 2.2.1 and 2.2.2 had the fix). However the reason they used 2.2.1 was because a library they needed did. Bumping up the version usually caused things to break.

    The standard solution was to ask’why’ they needed Guava/GSon and everytime you would find they are usually some function found in one of the Apache Commons libraries. So I would pull down the commons library rewrite the bit (often they worked identically)

    Fun side note in 2016-2017 I got called to consult on a lot of Gradle projects to fix the same kind of convoluted bespoke things people did with Apache Ivy. Ivy knew the Gradle ‘feautres’ were a massive headache in 2012 and told you to use Maven for those reasons. Ce La vie.

    We tried using Protobuf in 2008 and it was worse than the Apache Axis for JSON conversion (which feels too harsh to say), similarly I had been using AMQP or Kafka for years and tried gRPC when it was released (google say 2016 but I am sure we tried in 2014) and it was worse in every metric I still don’t understand why it exists.

    I was using Vaadin in 2011 and honestly thought GWT was released in 2012. I had to use it in 2014 and the workflow, compile time and look of GWT is just worse than Vaadin.


  • The FAANG companies have an internal kind of elitisim that would make staff less effective.

    If you look at any Google Java library, GWT, GSon, Guava, Gradle, Protobuf, etc… there was a commonly used open source library that existed years before that covered 90% of the functionality.

    The Google staff just don’t think to look outside Google (after if Google hasn’t solved it no chance outsiders have) and so wrote something entirely from scratch.

    Then normally within 6 months the open source library has added the killer new feature. The Google library only persists because people hold FAANG as great “Its by Google so it must be good!” Yet it normally has serious issues/limitations.

    The Google libraries that actually suceeded weren’t owned by Google (E.g. Yahoo wrote Hadoop, Kubernetes got spun away from Google control, etc…).


  • I wouldn’t use “certified” in this context.

    Limiting support of software to specific software configurations makes sense.

    Its stuff like Debian might be using Python 3.8 Ubuntu Python 3.9, OpenSuse Python 3.9, etc… Your application might use a Python 3.9 requiring library and act odd on 3.8 but fine on 3.7, etc… so only supporting X distributions let you make the test/QA process sane.

    This is also why Docker/Flatpack exist since you can define all of this.

    However the normal mix is RHEL/Suse/Ubuntu because those target businesses and your target market will most likely be running one.