
Open Hour 2021-MAR-11

Agenda

  • General Updates
  • Release Updates
  • Package Updates
  • Salt Enhancement Proposals
  • Working Groups and Clinics
  • Retrospective
  • YouTube Video

General Updates

Salt Air relaunched!

Events

CANCELED Virtual Meetup on March 17, Wed. 5pm, MDT

  • Randy Thompson with GoDaddy will be presenting
  • Topic: the build pipeline for Tiamat-built Salt binaries, states, etc., as a walled-garden deploy setup for teams

Misc. Salt Project Links

Release Updates

  • Aluminium Release: March 31st
  • Code freeze was: Feb. 25th
  • RC1 is available – See the Package Post
  • Next week: review the RCA or report from the CVE release retrospective

Package Updates

Salt Enhancement Proposals

Open SEPs: https://github.com/saltstack/salt-enhancement-proposals/pulls

  • Many in the final stage
  • The Change Advisory Board SEP is still in-progress

We are working on a communications plan to make our communication more transparent and regularly scheduled.

Working Groups

  • Test needs a Captain and meeting day/time – it has some objectives and goals; Project Board
  • Release meets the 2nd Wednesday each month Zoom Meeting Link; Captain: Pablo Suarez Hernandez; Project Board; Slack Channel: #release-process in Community Slack
  • Cloud – see the calendar links
  • Docs – see the calendar links
  • Security – New! Needs a captain and a meeting day/time
  • Formulas – see calendar links

Retrospective

CVE Release issue on GitHub: https://github.com/saltstack/salt/issues/59631

SECURITY.md file: https://github.com/saltstack/salt/blob/master/SECURITY.md

What VMware can assist us with:

  • Access to security resources in VMware, allowing us to implement better security auditing/feedback
  • VMware is also certified to assist with CVSS scoring and the use of CVE IDs

Start: Action to change and why

  • Update the CVE reporting process to make it more seamless
  • Introduce preventative measures with baseline Bandit / static code analysis
  • Have more collaborative opportunities between working groups (security, testing, release, formulas)
  • Many overlaps could result in improvements to our CVE process, release process, and test suite
  • Static Code analysis – Bandit
  • Updates to Pull Request review process

Stop: What isn’t going well?

  • Patching for previous, unsupported versions: this took a lot of developer time, and is the last time we are releasing patches for unsupported versions of Salt
  • We should have cancelled the Open Hour upon discovering last minute problems with the CVE release and communicated the delay earlier

Continue: What is going well?

  • Holding Retrospectives
  • Having an asynchronous way of getting feedback via issues in the salt repo using label:retro
  • Current Open Issue https://github.com/saltstack/salt/issues/59631
  • Early communication around the upcoming release, and testing/early access to CVE updates
  • This allowed us to catch serious issues that delayed the initial release
  • Fixing the CVEs: we are fixing them when they are discovered!

General Questions and Discussion during the retrospective

Q: Have you considered whether you should/could be looking at issues around those CVEs proactively?

In-meeting response: We are doing this; in fact, of the 10 CVEs recently released, 4 were found by Salt engineers. One report was about not validating a cert; we looked for this throughout the code base and fixed it in more places than reported. As of this last September we decided to put in place a static code analysis tool, Bandit, and we will look to continually improve its use. It will not catch everything. We will also need to improve our Pull Request process to ask security-related questions when reviewing. We went several years not paying attention to failing tests. This stopped about 2 years ago, and we are still working to keep tests green at releases and to improve the test suite. We did that with pytest and unfortunately that has set us back a little, but we are still looking to improve the test suite and fix flaky tests. We also broke up the Test and Release Working Group this week into 2 groups to help focus on testing overall and not just testing at release time. The Testing group needs a community captain and will have Core Team involvement. There will also be overlap and a group of groups (like a scrum of scrums) to fill gaps and ensure the groups have visibility across groups, know what each is working on, and know how they can work together to achieve better results.
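To illustrate the kind of check the response above refers to, here is a minimal sketch of a static scan for disabled certificate validation. It is not Bandit's implementation and not the check used by the Salt team. It assumes the risky pattern appears as a `verify=False` keyword argument, as used by HTTP client libraries such as `requests`; the notes do not say what form the reported issue took. The function name `find_disabled_cert_checks` and the `SAMPLE` source are hypothetical.

```python
import ast

def find_disabled_cert_checks(source: str) -> list[int]:
    """Return line numbers of calls that pass the keyword verify=False.

    Hypothetical helper for illustration only; a real tool such as
    Bandit covers many more patterns than this single check.
    """
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            # Flag only a literal False, not variables that might be False.
            if (
                kw.arg == "verify"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is False
            ):
                findings.append(node.lineno)
    return findings

# Hypothetical code under review; it is parsed, never executed.
SAMPLE = '''
import requests

def fetch(url):
    return requests.get(url, verify=False)

def fetch_checked(url):
    return requests.get(url, verify=True)
'''

if __name__ == "__main__":
    lines = SAMPLE.splitlines()
    for lineno in find_disabled_cert_checks(SAMPLE):
        print(f"line {lineno}: {lines[lineno - 1].strip()}")
```

As the response notes, a scan like this will not catch everything: it only sees a literal `False` passed by keyword, which is one reason the notes pair static analysis with security questions during Pull Request review.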

Expanded response: The Test and Release Working Group has been split into two groups to focus on 2 very broad topics. There will be overlap and we will need to coordinate between them. The group was officially split into 2 in the second week of March, and although goals and objectives remain from the combined group, the Testing Working Group needs some attention. The Core Team will be involved here and we plan to set up specific tasks. You can see the Project Board in the community repo. This group needs a community captain, and a day/time will be established for meetings soon.

Q: What happened with a delay previously?

Response: (This was more than a year ago and not this latest CVE release.) The packages get built ahead of time and go through testing. Publishing those packages normally takes only a few minutes; we did not know until we started the process that it was going to take longer than the typical time to publish. The issue occurred during rsync: instead of uploading only new packages, every package was syncing, so publishing took much longer than usual. It wasn’t until publishing had run much longer than expected that we realized what was happening.
Q: What was the last-minute delay this release?

In-meeting response: In testing, the morning of the release, one merge wasn’t done correctly and we had to rebuild. Also of note, we packaged 3 versions and built patch files for about 12 different versions. (Follow-up Q: Will that continue?) No, absolutely not; we do not have enough resources and mistakes are often made. We will follow our Platform Support Policy. Communication at the time of the release contained this information, and we will continue to make that communication more widely known and circulated.

Expanded response: Late in the morning of the release, through testing, we found that one of the fixes was merged improperly and needed to be fixed and added to one package. When the team discussed how long this would take to correct, the original estimate was only a few hours, which kept the release time within reach. At a few minutes to 11 AM MST the team said that testing was needed and had not been included in the original estimate, and that the soonest time to release would be late in the evening. By the time this was discussed, the Community Open Hour meeting had already begun, and the Release Manager started drafting the delay communication. Communications were not finished and ready to deliver until about 11:45 AM MST, and the Release Manager decided to wait until confirmation of the close of the Open Hour meeting to publish the delay communication. Looking back, this was a two-fold mistake: once a problem was known, the community meeting should have been cancelled and the delay communicated right away. In the future the meeting will be cancelled and details of an imminent delay will be sent, with or without timeframes. The Release team members plan to work closely with the Release Working Group to improve this type of process in the event of a delay, and will improve the release process to ensure this type of delay is minimized. This delay is specific to the CVE release process; however, the release process for major, minor, and CVE releases will be reviewed and improved. Details about work being done will be published and widely circulated.

Q: There was a mention about coordination with OS vendors? If there isn’t already a workflow in place, I think we should have one so those packages are ready when the embargo is over.

In-meeting response: That is on our radar, but we haven’t had the resources to tackle it yet. We plan to coordinate with the Release Working Group to dive into the process. We were lucky in the last release in that one of the CVE reporters was an OpenBSD maintainer.

Expanded response: The Release Working Group members have already started to coordinate. Please join the Working Group next month, Wednesday 2021-APR-14, 9-10 AM MDT / 3-4 PM UTC. Zoom Meeting Link

Comment: If you need additional input, I could give some.

In-meeting response: Input is good and we always appreciate it, but action/cycles are even better. We have a really great community with a lot of active members whom we appreciate very much; we’re trying to figure out how we can expand and get more of those people involved in things that are typically more behind the scenes. (Discussion followed about the overlap in the testing/release/security groups. A group of groups? Like a scrum of scrums.) (Sage) We have discussed this within the Core Team. We hold a weekly Friday Planning meeting, and we are currently considering having the captains meet with us at that time and day, not as a required meeting, but open to them as their schedules allow, perhaps coordinating a monthly date to attend.

Expanded response: More details will be forthcoming. There are at least 2 groups that need community captains; if you are interested, please get in touch with someone on the Core Team in IRC or Slack, or email Sage Robins directly at sage@saltstack.com

Comment: CAB?

In-meeting response: As we look at the process, I asked people to send me details about their interest in being a member of the CAB, and since then I have received zero responses from people saying they would commit to being a member (Sage). We’re not really convinced that the CAB is the right way to go; we’re looking at alternative approaches that would help people contribute (Dwoz).

Q: You mentioned that for this last CVE release you stuck with the SaltStack (the company) process/policy. Is there a VMware-approved policy?

In-meeting response: There is an approved process [within VMware] for non-open source products. We are one of few open source projects that have a CVE process in place. It is our desire to iterate and improve that process. We have the opportunity to lead the way, but yes, there will be some changes, in that we were a combined company with a process that included the use of internal business units that no longer exist. There is also the legacy process that will, over a long period of time, evolve. This refers to legacy SaltStack customers having priority early access to the packages and patches during a CVE release. We are looking at how we can do that with community members: how to vet community members that we already work with and trust today, so that early access is not only for customers. Of course, we are also looking to improve other key aspects of the process. We now have the ability to collaborate with an entire VMware Security team; that is invaluable and will likely help us see where we have gaps in our process. This will likely be one shared objective between the Release Working Group and the new Security Working Group, which needs a captain.

Pending Actions (updated 2021-APR-01)

  • Responses will be input into the Retro Issue along with a link to these meeting notes – complete
  • Final report of all responses will be made available, including responses from the in-person meeting on MAR-11 and the responses within the issue (feedback was taken previously from Slack). It is known that not all feedback may be captured; these pieces of communication are an attempt to capture all feedback and give a general report of Actions to be defined, pursued, and documented – complete
  • Review of the report and feedback during the MAR-18 Open Hour weekly meeting – no longer applicable, but the report will be available during weekly meetings for discussion and review of outstanding Actions – pending
  • Releases will be planned for Tuesdays or Wednesdays and no longer on Thursdays – previously our policy was Tuesday/Wednesday/Thursday – this is the policy and the schedule will hold through the next year; however, this is pending documentation
  • Update the CVE reporting process to make it more seamless – this will be done with the Working Groups and is still pending
  • Preventative measures such as static code analysis and its use (complete); updates to Pull Request review process to include questions specific to security (to be defined and documented) – started
  • Define more collaborative opportunities between Working Groups (security, testing, release, formulas): Many overlaps could result in improvements to our CVE process, release process, and test suite – started, but still pending, this report is a start and continuation will be within Working Groups and with the Core Team
  • Enhance Platform Support Page and OS Support document as well as create more visibility – started, but still pending and will reside with Working Groups over the next release cycle
  • More communication about what versions will be addressed in CVE releases – started, but still pending, will continue as we also look at the action to enhance the Platform Support Page
  • Enhance CVE release process to include more details in the documentation regarding what is affected and how – pending and will be addressed by the Core Team in the Working Groups and made official with process documentation